
Topic: XMLWriter/reader and Chinese characters

scribly

  • Jr. Member
  • Posts: 82
XMLWriter/reader and chinese characters
« on: January 10, 2011, 10:47:24 pm »
First off: I'm using ANSI strings throughout my app (converting it completely to Unicode is not an option).

Now the thing: when a Chinese user enters Chinese characters in a string, the rest of the GUI displays and handles it properly. TCanvas.TextOut shows the correct Chinese characters, and the text can be edited again in an edit box without any problems.

The problem occurs when the data is saved to an XML file and then read back (using XMLRead/XMLWrite).
The Chinese user sees question marks when he loads the file back.

The weird thing is that when I (English/US) copy Chinese characters from a website, paste them into my app, and save, they are loaded back properly.

Does anyone know what is going on here or how to fix it? Should I set an explicit encoding? Temporarily convert the string to Unicode and save that?

I use TDOMNode.TextContent to access the text in the fields.

typo

  • Hero Member
  • Posts: 3051
Re: XMLWriter/reader and chinese characters
« Reply #1 on: January 10, 2011, 10:57:33 pm »
Have you tried a TIniFile?

scribly

  • Jr. Member
  • Posts: 82
Re: XMLWriter/reader and chinese characters
« Reply #2 on: January 10, 2011, 11:02:32 pm »
No, I need the tree support XML provides:
an entry can contain one or more entries of the same kind, which in turn can have multiple entries, etc. (And part of it fills a TreeView.)

(Cheat Engine 6 cheat table)

typo

  • Hero Member
  • Posts: 3051
Re: XMLWriter/reader and chinese characters
« Reply #3 on: January 10, 2011, 11:19:30 pm »
The examples directory has an xmlreader example with a TreeView. Have you tested it with Chinese characters?

scribly

  • Jr. Member
  • Posts: 82
Re: XMLWriter/reader and chinese characters
« Reply #4 on: January 10, 2011, 11:23:52 pm »
No, but the thing is that I cannot test it myself.
When I copy/paste Chinese characters (I don't have a Chinese keyboard), it works fine.
But when a real Chinese person types in the characters, saves, and loads the file back, it just comes out as question marks.

So it's pretty much impossible for me to find out what's going wrong.

I'll give that xmlreader example to my tester and see what happens.
« Last Edit: January 10, 2011, 11:25:34 pm by scribly »

scribly

  • Jr. Member
  • Posts: 82
Re: XMLWriter/reader and chinese characters
« Reply #5 on: January 11, 2011, 04:38:06 am »
Not much useful information from the testers...
I did get one tip, though: use AnsiToUtf8 when setting the strings.

I've sent out a test version with this incorporated and am now waiting to see if it works.

scribly

  • Jr. Member
  • Posts: 82
Re: XMLWriter/reader and chinese characters
« Reply #6 on: January 11, 2011, 03:39:05 pm »
Didn't seem to work.
They still say it's "lousy code".

garlar27

  • Hero Member
  • Posts: 652
Re: XMLWriter/reader and chinese characters
« Reply #7 on: January 11, 2011, 04:58:49 pm »
Is the XML encoding the right one?


Code:
<?xml version="1.0" encoding="ISO-XXXX-X"?>

There are a couple of functions to check whether a name is valid (IsXmlName) and whether a value is a valid name token (IsXmlNmToken). There's also IsValidXmlEncoding!
They are located in the xmlutils unit.

Sorry, I don't know how to use them... I didn't find any help here...
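A minimal sketch of how the xmlutils checks can be called, assuming FPC's xmlutils unit and its default XML 1.0 rules (the sample strings are arbitrary). Note that these functions validate names, name tokens and encoding labels; they do not inspect element text, so they would not detect question marks that replaced Chinese text.

```pascal
program XmlUtilsCheck;

{$mode objfpc}{$H+}

uses
  xmlutils;

begin
  // Element/attribute names must start with a letter, '_' or ':'
  WriteLn(IsXmlName('entry'));            // TRUE
  WriteLn(IsXmlName('1entry'));           // FALSE: starts with a digit
  // A name token may start with any name character, including a digit
  WriteLn(IsXmlNmToken('1entry'));        // TRUE
  // Encoding labels: a letter, then letters, digits, '.', '_' or '-'
  WriteLn(IsValidXmlEncoding('UTF-8'));   // TRUE
  WriteLn(IsValidXmlEncoding('UTF 8'));   // FALSE: contains a space
end.
```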

scribly

  • Jr. Member
  • Posts: 82
Re: XMLWriter/reader and chinese characters
« Reply #8 on: February 19, 2011, 12:33:28 pm »
A little update on the problem.
I found a Chinese guy who speaks a bit more English, and it seems the problem was caused by adding quotes around the text field (the quotes were needed, otherwise the leading spaces in front of the text would get stripped).

It seems that the following code causes the problem:
Code:
  if (description<>'') and ((description[1]='"') and (description[length(description)]='"')) then
    description:=copy(description,2,length(description)-2);   
description is of type AnsiString.
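A minimal sketch of the quote/unquote pair described above, using hypothetical helper names AddQuotes and StripQuotes. It adds a length check so that a lone '"' is left alone instead of being turned into an empty string. The check only looks at single bytes: '"' is $22, while every byte of a multi-byte UTF-8 sequence is $80 or above, so the test itself cannot cut a UTF-8 character in half.

```pascal
program QuoteRoundTrip;

{$mode objfpc}{$H+}

// Hypothetical helper: wrap the value in double quotes before saving,
// so leading/trailing spaces are not lost.
function AddQuotes(const S: AnsiString): AnsiString;
begin
  Result := '"' + S + '"';
end;

// Hypothetical helper: remove exactly one pair of surrounding quotes
// after loading. Requires at least two characters, so a lone '"'
// is returned unchanged.
function StripQuotes(const S: AnsiString): AnsiString;
begin
  if (Length(S) >= 2) and (S[1] = '"') and (S[Length(S)] = '"') then
    Result := Copy(S, 2, Length(S) - 2)
  else
    Result := S;
end;

var
  Original, Restored: AnsiString;
begin
  Original := '   indented value';
  Restored := StripQuotes(AddQuotes(Original));
  WriteLn('[', Restored, ']');          // [   indented value]
  WriteLn(Restored = Original);         // TRUE
  WriteLn('[', StripQuotes('"'), ']');  // ["]
end.
```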


Also, one question: I see there's DOM, but also laz_dom and laz2_dom. I currently use DOM; would I get better results with laz2_dom?

marcov

  • Administrator
  • Hero Member
  • Posts: 12656
  • FPC developer.
Re: XMLWriter/reader and chinese characters
« Reply #9 on: February 19, 2011, 01:32:15 pm »
DOM (the XML unit of the FCL) is probably better, but it is all WideString/UnicodeString (UTF-16).

Somewhere, implicit conversions from AnsiString to UTF-16 and back are probably happening. These conversions use the default system encoding, not Lazarus' UTF-8 encoding.

The solution would be to convert all strings (assuming they contain UTF-8) with UTF8Decode before passing them to the XML units, and to convert the results back to UTF-8 with UTF8Encode.

Of course, for this to work, all the characters in the UTF-8 string must also exist in the system encoding.
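A minimal sketch of the UTF8Decode/UTF8Encode approach, assuming FPC's DOM, XMLRead and XMLWrite units and that the application's strings hold UTF-8 (as Lazarus controls return). The helpers SetTextUtf8/GetTextUtf8, the element names and the file name are hypothetical. The conversions in and out of the DOM are spelled out explicitly instead of being left to implicit AnsiString/WideString conversions.

```pascal
program Utf8DomRoundTrip;

{$mode objfpc}{$H+}

uses
  DOM, XMLRead, XMLWrite;

// Hypothetical helper: decode UTF-8 to UTF-16 explicitly before the
// text reaches the DOM.
procedure SetTextUtf8(Node: TDOMNode; const S: UTF8String);
begin
  Node.TextContent := UTF8Decode(S);
end;

// Hypothetical helper: encode the DOM's UTF-16 text back to UTF-8.
function GetTextUtf8(Node: TDOMNode): UTF8String;
begin
  Result := UTF8Encode(Node.TextContent);
end;

const
  FileName = 'roundtrip.xml';  // hypothetical file name

var
  Doc: TXMLDocument;
  Root, Desc: TDOMElement;
  W: UnicodeString;
  Original, Restored: UTF8String;
begin
  // U+5317 U+65B9 U+8BDD, built from code points so the example does
  // not depend on the encoding of this source file
  W := #$5317#$65B9#$8BDD;
  Original := UTF8Encode(W);

  Doc := TXMLDocument.Create;
  try
    Root := Doc.CreateElement('CheatTable');
    Doc.AppendChild(Root);
    Desc := Doc.CreateElement('Description');
    Root.AppendChild(Desc);
    SetTextUtf8(Desc, Original);
    WriteXMLFile(Doc, FileName);
  finally
    Doc.Free;
  end;

  ReadXMLFile(Doc, FileName);
  try
    Restored := GetTextUtf8(Doc.DocumentElement.FindNode('Description'));
  finally
    Doc.Free;
  end;

  WriteLn(Length(Restored));     // 9: three 3-byte UTF-8 sequences
  WriteLn(Restored = Original);  // TRUE
end.
```

Because the UTF-8 variables are declared as UTF8String rather than plain AnsiString, newer FPC versions (3.0 and later, which track string code pages) will not silently convert them to the system ANSI code page on assignment.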

scribly

  • Jr. Member
  • Posts: 82
Re: XMLWriter/reader and chinese characters
« Reply #10 on: March 07, 2011, 07:06:26 pm »
A little progress on this:
this time I found a Chinese guy who even understands Pascal code.

http://forum.cheatengine.org/viewtopic.php?p=5208238#5208238

As suggested, he used Utf8ToAnsi and AnsiToUtf8, and he says it's working fine for him now.
The problem now is that I can't use special characters anymore. For example, copying this text into the app and working with it works: "北方话", but calling AnsiToUtf8 on this string changes it to three ANSI question marks (the same happens with the Alt+1 (white face), Alt+2 (black face) and Alt+3 (heart) characters I tested with).

But I guess it's workable for them now, and I don't really use those special characters that often...
« Last Edit: March 07, 2011, 07:09:35 pm by scribly »

jixian.yang

  • Full Member
  • Posts: 173
Re: XMLWriter/reader and chinese characters
« Reply #11 on: March 08, 2011, 03:35:01 am »
Without <?xml version="1.0" encoding="GB2312"?> and Encoding = 'GBK', Utf8ToAnsi and AnsiToUtf8 work fine in the Lazarus xmlreader example.

If Edit.Text = "北方话", then it must be converted to an ANSI string with Utf8ToAnsi(Edit.Text).

Saving the XML text file as UTF-8 with Memo.Lines.SaveToFile(filename) works fine.


marcov

  • Administrator
  • Hero Member
  • Posts: 12656
  • FPC developer.
Re: XMLWriter/reader and chinese characters
« Reply #12 on: March 08, 2011, 10:12:37 am »
You could also try changing the system encoding of your application to UTF-8 (SetCodePage, SetACP, or something like that under Windows).

 
