Author Topic: Accented characters in SynEdit  (Read 12554 times)

typo

  • Hero Member
  • *****
  • Posts: 3051
Accented characters in SynEdit
« on: April 26, 2010, 06:30:45 pm »
When I load a file through the LoadFromFile method in SynEdit, the accented characters appear as question marks. This does not occur when I paste text from the clipboard. Why? What should I do to get the accented characters to appear correctly?

Thanks.
« Last Edit: April 26, 2010, 06:42:40 pm by typo »

Martin_fr

  • Administrator
  • Hero Member
  • *
  • Posts: 12386
  • Debugger - SynEdit - and more
    • wiki
Re: Accented characters in SynEdit
« Reply #1 on: April 26, 2010, 07:15:07 pm »
Sounds like your file is not utf-8 encoded?

Are you trying this in the IDE, or with SynEdit in your own app?

In the IDE, use the context menu of the source editor => file settings => encoding.

typo

  • Hero Member
  • *****
  • Posts: 3051
Re: Accented characters in SynEdit
« Reply #2 on: April 26, 2010, 07:19:33 pm »
I am trying this in my app.

Why can I paste text from the clipboard correctly? Why doesn't SynEdit do the same thing when loading a file as it does when pasting from the clipboard?

Martin_fr

  • Administrator
  • Hero Member
  • *
  • Posts: 12386
  • Debugger - SynEdit - and more
    • wiki
Re: Accented characters in SynEdit
« Reply #3 on: April 26, 2010, 07:26:19 pm »
SynEdit does nothing itself to the clipboard.

The clipboard is accessed via the LCL, which (probably) translates the content to UTF-8 (assuming your clipboard isn't UTF-8 anyway).

The LCL doesn't convert the content of files when you open one. You have to do that conversion explicitly.

Neither the LCL nor SynEdit can know which encoding a file's content uses, because the same bytes can often be interpreted correctly in many different encodings. (The file may not even be text, in which case an encoding conversion would destroy the data.)


Zoran

  • Hero Member
  • *****
  • Posts: 1988
    • http://wiki.lazarus.freepascal.org/User:Zoran
Re: Accented characters in SynEdit
« Reply #4 on: April 26, 2010, 07:33:13 pm »
The contents of the file are probably not encoded in UTF-8, so you should convert them. You can first load the file into a TStringList using LoadFromFile, then use the appropriate xxxxToUTF8 function from the LConvEncoding unit.
Code: [Select]
uses
   ... LConvEncoding;

...
var
  SL: TStringList;

...
  SL := TStringList.Create;
  try
    SL.LoadFromFile('yourfile.txt'); // the file name must be a string
    SynEdit1.Lines.Text := CP1250ToUTF8(SL.Text);
  finally
    SL.Free;
  end;

Of course, instead of CP1250ToUTF8, you should use the correct function for your file's encoding (if it's Greek, then CP1253ToUTF8; if it's Cyrillic, then CP1251ToUTF8...)
Swan, ZX Spectrum emulator https://github.com/zoran-vucenovic/swan

typo

  • Hero Member
  • *****
  • Posts: 3051
Re: Accented characters in SynEdit
« Reply #5 on: April 26, 2010, 07:41:08 pm »
Quote
SL.LoadFromFile('yourfile.txt');

I am trying this, and I guess that TStringList does not open long file names. I get the error message "Unable to open file XXXX".

Martin_fr

  • Administrator
  • Hero Member
  • *
  • Posts: 12386
  • Debugger - SynEdit - and more
    • wiki
Re: Accented characters in SynEdit
« Reply #6 on: April 26, 2010, 07:55:58 pm »
For now, just reimplement the following in your own code (from SynEditLines):

Code: [Select]
procedure TSynEditLines.LoadFromFile(const FileName: string);
var
  Reader: TSynEditFileReader;
begin
  Reader := TSynEditFileReader.Create(FileName);
  try
    BeginUpdate;
    try
      Clear;
      while not Reader.EOF do
        // Convert the result of Reader.ReadLine here, before adding it.
        Add(Reader.ReadLine);
      fDosFileFormat := Reader.DosFile;
    finally
      EndUpdate;
    end;
  finally
    Reader.Free;
  end;
end;

You need to use the SynEditLines unit, and this may not be 100% future compatible if it gets changed later.
Usually the interface of SynEdit itself is kept, but internal files can change without warning...


BTW: "fDosFileFormat" => I doubt it does anything... possibly just a leftover from before the UTF-8 time.

typo

  • Hero Member
  • *****
  • Posts: 3051
Re: Accented characters in SynEdit
« Reply #7 on: April 26, 2010, 08:03:47 pm »
Thanks.

typo

  • Hero Member
  • *****
  • Posts: 3051
Re: Accented characters in SynEdit
« Reply #8 on: April 26, 2010, 08:49:22 pm »
One more thing: how can I know whether a file is in UTF8 format or not, in order to convert it?

Thanks.

Martin_fr

  • Administrator
  • Hero Member
  • *
  • Posts: 12386
  • Debugger - SynEdit - and more
    • wiki
Re: Accented characters in SynEdit
« Reply #9 on: April 26, 2010, 09:15:55 pm »
As I said before: it's not possible to know. Well, at least not always.

A UTF-8 file can also be read as an 8-bit "ASCII" (codepage) file, at least for the Latin code pages.
It is just that each non-ASCII UTF-8 character (accented letters, graphical symbols) then shows up as a sequence of 2, 3 or more chars.

An 8-bit codepage file is not always valid UTF-8 => so that can be tested. (There probably are some tests available, but I don't know where, sorry.)

Furthermore, "ASCII" is not equal to "ASCII": there are many codepages. The exact same file can be interpreted in any codepage, and each will result in a different text.

There is also sometimes a UTF-8 header (the byte order mark, bytes EF BB BF) at the start of a file, but it is not present in all UTF-8 files. If it is, you need to remove it.

Sorry, I know it isn't much help...


Example:
let's say your file contains 2 bytes:    c3 84

It could be utf8: "Ä" (A-umlaut)
or Windows-1252 (Latin-1): "Ã„"  (A with ~ and lower/opening double quote)
or even some other combination of 2 chars, in another ascii table

All valid encodings => so who knows.
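
To make the two readings of those bytes concrete, here is a minimal sketch. It assumes the LConvEncoding unit from Lazarus' LazUtils package is available; HexDump is a hypothetical helper written only for this illustration.

```pascal
program AmbiguousBytes;
{$mode objfpc}{$H+}

uses
  LConvEncoding; // CP1252ToUTF8 (assumes Lazarus' LazUtils package)

// Hypothetical helper: write each byte of S as two hex digits plus a space.
function HexDump(const S: string): string;
var
  i: Integer;
begin
  Result := '';
  for i := 1 to Length(S) do
    Result := Result + HexStr(Ord(S[i]), 2) + ' ';
end;

var
  Raw: string;
begin
  Raw := #$C3#$84; // the two bytes from the example above

  // Read as UTF-8, the bytes already are U+00C4 (A-umlaut); nothing to convert.
  WriteLn('kept as UTF-8      : ', HexDump(Raw));

  // Read as CP1252, they are two characters (U+00C3 and U+201E), so the
  // conversion re-encodes each of them: C3 83 and E2 80 9E.
  WriteLn('converted from 1252: ', HexDump(CP1252ToUTF8(Raw)));
end.
```

Both results are well-formed UTF-8, which is exactly why the bytes alone cannot tell you which one the author of the file meant.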
« Last Edit: April 26, 2010, 09:21:48 pm by Martin_fr »

typo

  • Hero Member
  • *****
  • Posts: 3051
Re: Accented characters in SynEdit
« Reply #10 on: April 26, 2010, 09:30:58 pm »
Well, my app is very language specific, 1252 Latin. So I need to expect only 1252 Latin and UTF8.

Martin_fr

  • Administrator
  • Hero Member
  • *
  • Posts: 12386
  • Debugger - SynEdit - and more
    • wiki
Re: Accented characters in SynEdit
« Reply #11 on: April 26, 2010, 09:40:14 pm »
Look at LCLProc; there are some UTF-8 helpers.

FindInvalidUTF8Character may help...


However, it will report my example as valid UTF-8 (which is correct if the text really was an A-umlaut).

If the text was meant to be "Ã„", that can never be known, and you will still display an A-umlaut.

But if the text isn't valid UTF-8 (that is where you get the "?"), then it should detect it.
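
For the "only CP1252 or UTF-8" case, that test could be wrapped up roughly like this. It is a sketch, not tested against every Lazarus version: the unit name and ToUtf8Assuming1252 are hypothetical, and it assumes FindInvalidUTF8Character is still exported by LCLProc and returns -1 when it finds nothing invalid (newer Lazarus versions have moved the UTF-8 helpers to the LazUTF8 unit).

```pascal
unit Utf8Or1252;
{$mode objfpc}{$H+}

interface

// Hypothetical helper: returns Raw unchanged if it is valid UTF-8,
// otherwise treats it as CP1252 and converts it to UTF-8.
function ToUtf8Assuming1252(const Raw: string): string;

implementation

uses
  LCLProc,        // FindInvalidUTF8Character (assumption: see note above)
  LConvEncoding;  // CP1252ToUTF8

function ToUtf8Assuming1252(const Raw: string): string;
begin
  // A result below zero means no invalid UTF-8 sequence was found.
  if FindInvalidUTF8Character(PChar(Raw), Length(Raw)) < 0 then
    Result := Raw
  else
    Result := CP1252ToUTF8(Raw);
end;

end.
```

With the TStringList approach from reply #4, the call would be SynEdit1.Lines.Text := ToUtf8Assuming1252(SL.Text). A CP1252 file whose bytes happen to form valid UTF-8 is still taken as UTF-8, as described above.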

Zoran

  • Hero Member
  • *****
  • Posts: 1988
    • http://wiki.lazarus.freepascal.org/User:Zoran
Re: Accented characters in SynEdit
« Reply #12 on: April 27, 2010, 06:44:17 am »
In LConvEncoding unit there is a function "GuessEncoding". However, as its name says, guessing is not 100% reliable. Take a look and test it.
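
A minimal sketch of how the guess could be combined with a conversion, assuming the GuessEncoding, ConvertEncoding and EncodingUTF8 declarations in your version of LConvEncoding match the ones used here. The unit name and GuessedToUtf8 are hypothetical.

```pascal
unit GuessToUtf8;
{$mode objfpc}{$H+}

interface

// Hypothetical helper: converts Raw to UTF-8 from whatever encoding
// GuessEncoding reports for it. The guess may be wrong.
function GuessedToUtf8(const Raw: string): string;

implementation

uses
  LConvEncoding; // GuessEncoding, ConvertEncoding, EncodingUTF8

function GuessedToUtf8(const Raw: string): string;
begin
  Result := ConvertEncoding(Raw, GuessEncoding(Raw), EncodingUTF8);
end;

end.
```

It would be used like the xxxxToUTF8 functions, for example SynEdit1.Lines.Text := GuessedToUtf8(SL.Text).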

theo

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 1946
Re: Accented characters in SynEdit
« Reply #13 on: April 27, 2010, 08:44:28 am »
Or use charencstreams.pas: http://wiki.lazarus.freepascal.org/Theodp
It takes care of BOMs, UTF-16 files, etc. It uses GuessEncoding internally too.

 
