
Author Topic: [SOLVED] UTF8Encode(String(Chr(i)))  (Read 15669 times)

WickedDum

  • Full Member
  • ***
  • Posts: 211
[SOLVED] UTF8Encode(String(Chr(i)))
« on: August 27, 2016, 09:15:05 pm »
Huh???   :D

This code doesn't work.  The upper 128 are continuously displayed in boxes.  I even added LazUTF8 to the Uses section.  There was no difference exhibited.

Is it FPC?  Lazarus?  Or, probably, me?

Code: Pascal  [Select]
  for i := 0 to 255 do
  { To display the 'extended' characters in Lazarus you will need explicitly to
  convert the char to UTF8. }
    ListBox1.Items.Add( 'Ascii ' + IntToStr(i) + ' = ' + UTF8Encode(String(Chr(i)))  );

I have read more than several articles on ASCII, EBCDIC, UNICODE, UTF8, UTF16, etc.  I have not found a resolution nor an avenue to pursue.

All suggestions/comments accepted!  Thanks!!

« Last Edit: September 06, 2016, 05:20:47 am by WickedDum »
Practice Safe Computing!!

Intel i5-4460K @ 3.2GHz | Win8.1 64-bit | FPC: v3.0 | Lazarus:  v1.6.0

lainz

  • Hero Member
  • *****
  • Posts: 3279
    • Lainz
Re: UTF8Encode(String(Chr(i)))
« Reply #1 on: August 27, 2016, 09:22:46 pm »
Code: Pascal  [Select]
implementation
uses
  LazUTF8;

{$R *.lfm}

{ TForm1 }

procedure TForm1.FormCreate(Sender: TObject);
var
  i: integer;
begin
  for i:=0 to 255 do
    ListBox1.Items.Add(UnicodeToUTF8(i));
end;

Phil

  • Hero Member
  • *****
  • Posts: 2750
Re: UTF8Encode(String(Chr(i)))
« Reply #2 on: August 27, 2016, 09:24:27 pm »
What codepage do you assume the upper chars are from?

If Latin-1, try using the LConvEncoding unit's CP1252ToUTF8 function instead of UTF8Encode.


WickedDum

  • Full Member
  • ***
  • Posts: 211
Re: UTF8Encode(String(Chr(i)))
« Reply #3 on: August 28, 2016, 08:11:26 am »
Thanks!

lainz:  It still does not display 127 to 160.  Any thoughts as to why?  Or, how to get the full display?

Phil:  I have no idea what the codepage is.  How do I figure it out?

Looking forward to hearing from you both again!!  And, anyone else?

Thanks!

JuhaManninen

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 3646
  • I like bugs.
Re: UTF8Encode(String(Chr(i)))
« Reply #4 on: August 28, 2016, 12:37:00 pm »
What are your 'extended' characters? What exactly are you trying to do?

wp

  • Hero Member
  • *****
  • Posts: 6334
Re: UTF8Encode(String(Chr(i)))
« Reply #5 on: August 28, 2016, 01:14:15 pm »
Quote
I have no idea what the codepage is.  How do I figure it out?

Read this: https://en.wikipedia.org/wiki/Windows_code_page.

I assume that you are on Windows. In that case the function WinCPToUTF8() (in unit LazUTF8) converts characters from the codepage used by your current Windows settings to the UTF-8 used by Lazarus:
Code: Pascal  [Select]
uses
  LazUTF8;

{ TForm1 }

procedure TForm1.FormCreate(Sender: TObject);
var
  i: Integer;
begin
  for i := 0 to 255 do
    ListBox1.Items.Add('Ascii ' + IntToStr(i) + ' = ' + WinCPToUTF8(String(Chr(i)))  );
end;
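
To see which codepage "your current Windows settings" refers to in practice, here is a minimal sketch, assuming a Windows target. It asks the Win32 API through GetACP, which is declared in FPC's Windows unit; the program name is only illustrative. On US English and Western European installations the result is commonly 1252, but that depends on the system locale rather than on Lazarus.

```pascal
program ShowCodePage;  // illustrative name, not from this thread
{$mode objfpc}{$H+}

uses
  Windows;  // declares GetACP (Windows only)

begin
  // GetACP returns the identifier of the active ANSI code page, e.g. 1252.
  WriteLn('Active ANSI code page: ', GetACP);
end.
```

This only reports the system setting. It does not change how Lazarus stores strings, which stay UTF-8.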
Lazarus trunk / fpc 3.0.4 / all 32-bit on Win-10

josh

  • Hero Member
  • *****
  • Posts: 754
Re: UTF8Encode(String(Chr(i)))
« Reply #6 on: August 28, 2016, 01:16:41 pm »
Hi
If you're displaying this on a form etc., does the font you are using have these characters defined?
Development Installation Lazarus 1.3, FPC 2.7.1,Windows 7/8 32/64, OSX, *nix

Test Environment Lazarus & FPC Trunk on Windows and OSX (Cocoa Mainly on OSX). Testing also Crosscompile windows to OSX.. 
Any posts made from 2015 will be based on Lazarus Trunk.

lainz

  • Hero Member
  • *****
  • Posts: 3279
    • Lainz
Re: UTF8Encode(String(Chr(i)))
« Reply #7 on: August 28, 2016, 05:06:15 pm »
Quote
lainz:  It still does not display 127 to 160.  Any thoughts as to why?  Or, how to get the full display?

On my Windows machine it displays all of them.
« Last Edit: August 28, 2016, 09:40:47 pm by lainz »

WickedDum

  • Full Member
  • ***
  • Posts: 211
Re: UTF8Encode(String(Chr(i)))
« Reply #8 on: August 29, 2016, 08:29:44 pm »
Thank you for all of your input!!  I haven't done so much research and studying since I left college!

Quote
What are your 'extended' characters? What exactly are you trying to do?

I am trying to display a complete list of the 255 ASCII characters.

Quote
I have no idea what the codepage is.  How do I figure it out?
Read this: https://en.wikipedia.org/wiki/Windows_code_page.

Now that I know what a codepage is, how do I
1- determine the codepage that my version of Windows is using?
2- assign a specific codepage for a particular project?
And, do all (American) English Windows versions use 1252?

Quote
Hi.  If you're displaying this on a form etc., does the font you are using have these characters defined?

I am using the standard Lazarus configuration.  I reviewed all of the configuration options and did not see where the font is set.  Where do I find the font setting?


A synopsis (for future reference):
UTF8Encode: does not display all of the characters.  128 onward are boxes.  The 'normal' 0, 1, 9, 10, 13, and 28-31 are not displayed.
UnicodeToUTF8: a large chunk (128-160) is missing, as well as the 'normal' 0, 1, 9, 10, 13, and 28-31.
WinCPToUTF8 and CP1252ToUTF8: in addition to the 'normal' non-displayed characters, these do not display 129, 141, 143-144, and 160.

Thank you all for your assistance!!!

JuhaManninen

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 3646
  • I like bugs.
Re: UTF8Encode(String(Chr(i)))
« Reply #9 on: August 29, 2016, 09:27:16 pm »
I still don't understand what your ultimate goal is with those extended ANSI characters.
You may be solving a problem that does not exist. If you don't even know what a codepage is, then you probably don't need one. Just use Unicode everywhere. If some of your input/output really uses codepages, convert the data when reading and writing, and keep everything else in Unicode.
Everything becomes easier. No explicit conversion functions are needed.
You didn't mention your Lazarus version. Use the latest 1.6 + FPC 3.0.

Quote
UnicodeToUTF8  A large chunk (128-160) is missing, as well as the 'normal' 0,1, 9, 10, 13, and 28-31.

Did you check what those U+... codes are? Please learn Unicode, it is important now and in the future. We are finally getting rid of the horrors of old codepages.


WickedDum

  • Full Member
  • ***
  • Posts: 211
Re: UTF8Encode(String(Chr(i)))
« Reply #10 on: August 29, 2016, 10:57:40 pm »
JuhaManninen:  My ultimate goal is to display the character set.  That's all.  I am taking a refresher course and the lesson was a FOR loop - I know FOR loops...  The lesson FOR ME is continued familiarity of GUI coding.  (I haven't programmed since TP5.  :'( )  The ListBox1.Items.Add(...) is something I have never done before.  And then I noticed the whole table wasn't being displayed.  To me, that is totally unsatisfactory.

Using Unicode everywhere would be great...but it didn't show me the most complete table.  Therefore, it is not the best solution.

My specifics are in my signature.  I am using the latest versions.

What do you mean by 'U+' codes?

And, could you assist with these questions, too:
Now that I know what a codepage is, how do I
1- determine the codepage that my version of Windows is using?
2- assign a specific codepage for a particular project?
And, do all (American) English Windows versions use 1252?

Thanks!


lainz

  • Hero Member
  • *****
  • Posts: 3279
    • Lainz
Re: UTF8Encode(String(Chr(i)))
« Reply #11 on: August 29, 2016, 11:28:11 pm »
What does it display? Log it into a memo and copy it to the clipboard. Then show us.

This is how it should look:
http://www.utf8-chartable.de/unicode-utf8-table.pl?unicodeinhtml=dec

This is how it looks on my PC, with the code I provided to you:

Code: Pascal  [Select]
  1. 
  2. 
  3. 
  4. 
  5. 
  6. 
  7. 
  8. 
  9.        
  10.  
  11.  
  12.  
  13.  
  14.  
  15. 
  16. 
  17. 
  18. 
  19. 
  20. 
  21. 
  22. 
  23. 
  24. 
  25. 
  26. 
  27. 
  28. 
  29. 
  30. 
  31. 
  32. 
  33.  
  34. !
  35. "
  36. #
  37. $
  38. %
  39. &
  40. '
  41. (
  42. )
  43. *
  44. +
  45. ,
  46. -
  47. .
  48. /
  49. 0
  50. 1
  51. 2
  52. 3
  53. 4
  54. 5
  55. 6
  56. 7
  57. 8
  58. 9
  59. :
  60. ;
  61. <
  62. =
  63. >
  64. ?
  65. @
  66. A
  67. B
  68. C
  69. D
  70. E
  71. F
  72. G
  73. H
  74. I
  75. J
  76. K
  77. L
  78. M
  79. N
  80. O
  81. P
  82. Q
  83. R
  84. S
  85. T
  86. U
  87. V
  88. W
  89. X
  90. Y
  91. Z
  92. [
  93. \
  94. ]
  95. ^
  96. _
  97. `
  98. a
  99. b
  100. c
  101. d
  102. e
  103. f
  104. g
  105. h
  106. i
  107. j
  108. k
  109. l
  110. m
  111. n
  112. o
  113. p
  114. q
  115. r
  116. s
  117. t
  118. u
  119. v
  120. w
  121. x
  122. y
  123. z
  124. {
  125. |
  126. }
  127. ~
  128. 
  129. €
  130. 
  131. ‚
  132. ƒ
  133. „
  134. …
  135. †
  136. ‡
  137. ˆ
  138. ‰
  139. Š
  140. ‹
  141. Œ
  142. 
  143. Ž
  144. 
  145. 
  146. ‘
  147. ’
  148. “
  149. ”
  150. •
  151. –
  152. —
  153. ˜
  154. ™
  155. š
  156. ›
  157. œ
  158. 
  159. ž
  160. Ÿ
  161.  
  162. ¡
  163. ¢
  164. £
  165. ¤
  166. ¥
  167. ¦
  168. §
  169. ¨
  170. ©
  171. ª
  172. «
  173. ¬
  174. ­
  175. ®
  176. ¯
  177. °
  178. ±
  179. ²
  180. ³
  181. ´
  182. µ
  183. ·
  184. ¸
  185. ¹
  186. º
  187. »
  188. ¼
  189. ½
  190. ¾
  191. ¿
  192. À
  193. Á
  194. Â
  195. Ã
  196. Ä
  197. Å
  198. Æ
  199. Ç
  200. È
  201. É
  202. Ê
  203. Ë
  204. Ì
  205. Í
  206. Î
  207. Ï
  208. Ð
  209. Ñ
  210. Ò
  211. Ó
  212. Ô
  213. Õ
  214. Ö
  215. ×
  216. Ø
  217. Ù
  218. Ú
  219. Û
  220. Ü
  221. Ý
  222. Þ
  223. ß
  224. à
  225. á
  226. â
  227. ã
  228. ä
  229. å
  230. æ
  231. ç
  232. è
  233. é
  234. ê
  235. ë
  236. ì
  237. í
  238. î
  239. ï
  240. ð
  241. ñ
  242. ò
  243. ó
  244. ô
  245. õ
  246. ö
  247. ÷
  248. ø
  249. ù
  250. ú
  251. û
  252. ü
  253. ý
  254. þ
  255. ÿ
  256.  

JuhaManninen

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 3646
  • I like bugs.
Re: UTF8Encode(String(Chr(i)))
« Reply #12 on: August 29, 2016, 11:50:00 pm »
Quote
JuhaManninen:  My ultimate goal is to display the character set.

Mighty goal! There are well over 100000 characters in Unicode.
However, FOR loop practice can use any subset of them. For example:
Code: Pascal  [Select]
for i := $2100 to $2A00 do
  ListBox1.Items.Add(UnicodeToUTF8(i));

Quote
Using Unicode everywhere would be great...but it didn't show me the most complete table.  Therefore, it is not the best solution.

Well, you have a lot to learn...  As you can see here, U+0080 .. U+009F (128-159) are control characters with no glyph, and U+00A0 (160) is a no-break space, so none of them show.
 https://en.wikibooks.org/wiki/Unicode/Character_reference/0000-0FFF

Quote
What do you mean by 'U+' codes?

"U+" followed by a hexadecimal number is the Unicode notation for a codepoint (the character's number), as opposed to the encoded bytes of that codepoint in UTF-8 or UTF-16.
Learn more:
 https://en.wikipedia.org/wiki/Unicode
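
To make the difference between a codepoint and its encoded value concrete, here is a small console sketch. EncodeUtf8Small and BytesAsHex are hypothetical helpers written only for this illustration (they are not part of LazUTF8), and the encoder deliberately handles only codepoints below U+0800, which is enough for the 0..255 range discussed in this thread.

```pascal
program CodePointDemo;  // illustrative name
{$mode objfpc}{$H+}

uses
  SysUtils;

// Hypothetical helper: encodes one codepoint below U+0800 as UTF-8 bytes.
function EncodeUtf8Small(cp: Integer): RawByteString;
begin
  if cp < $80 then
  begin
    // 7-bit codepoints are stored as a single byte, identical to ASCII.
    SetLength(Result, 1);
    Result[1] := AnsiChar(Byte(cp));
  end
  else
  begin
    // U+0080..U+07FF use two bytes: 110xxxxx 10xxxxxx.
    SetLength(Result, 2);
    Result[1] := AnsiChar(Byte($C0 or (cp shr 6)));
    Result[2] := AnsiChar(Byte($80 or (cp and $3F)));
  end;
end;

// Hypothetical helper: renders the raw bytes as hex, separated by spaces.
function BytesAsHex(const s: RawByteString): string;
var
  i: Integer;
begin
  Result := '';
  for i := 1 to Length(s) do
    Result := Result + IntToHex(Ord(s[i]), 2) + ' ';
  Result := Trim(Result);
end;

const
  Samples: array[0..2] of Integer = ($41, $E9, $FF);
var
  cp: Integer;
begin
  for cp in Samples do
    WriteLn('U+', IntToHex(cp, 4), ' -> ', BytesAsHex(EncodeUtf8Small(cp)));
  // Prints:
  //   U+0041 -> 41
  //   U+00E9 -> C3 A9
  //   U+00FF -> C3 BF
end.
```

The codepoint U+00E9 is one number, but in UTF-8 it occupies two bytes. That is why putting a single byte such as Chr(233) into a string that is expected to hold UTF-8 does not produce a valid character.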

For the rest of your questions: Forget Windows codepages, use Unicode.
« Last Edit: August 30, 2016, 01:03:53 am by JuhaManninen »

WickedDum

  • Full Member
  • ***
  • Posts: 211
Re: UTF8Encode(String(Chr(i)))
« Reply #13 on: August 30, 2016, 02:05:18 am »
lainz:  If I knew how to log it into a memo, I would... :(   That's why I'm going through this course...

JuhaManninen:  As for FOR exercises, I don't have a problem with the constructs, just the displaying of the results.

Here is a sample of an ASCII chart.  This is all I am/was trying to achieve.




lainz

  • Hero Member
  • *****
  • Posts: 3279
    • Lainz
Re: UTF8Encode(String(Chr(i)))
« Reply #14 on: August 30, 2016, 03:30:39 am »
Quote
lainz:  If I knew how to log it into a memo, I would... :(   That's why I'm going through this course...

Logging to a memo works the same way as this:

Code: Pascal  [Select]
uses
  LazUTF8;

{ TForm1 }

procedure TForm1.FormCreate(Sender: TObject);
var
  i: Integer;
begin
  for i := 0 to 255 do
    ListBox1.Items.Add('Ascii ' + IntToStr(i) + ' = ' + WinCPToUTF8(String(Chr(i)))  );
end;

But instead of putting a ListBox on the form you can use a Memo control. Change ListBox1 to Memo1, and note that a memo keeps its text in Lines rather than Items:

Code: Pascal  [Select]
uses
  LazUTF8;

{ TForm1 }

procedure TForm1.FormCreate(Sender: TObject);
var
  i: Integer;
begin
  for i := 0 to 255 do
    Memo1.Lines.Add('Ascii ' + IntToStr(i) + ' = ' + WinCPToUTF8(String(Chr(i)))  );
end;

But as JuhaManninen said, you will get the same result as I do. The characters you say are missing are control characters, which are not visible: they have no glyph to render on the screen.
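
As a rough illustration of which entries in a 0..255 table can be expected to show nothing, here is a console sketch that labels each Unicode codepoint in that range. Describe is a hypothetical helper written for this post, and its categories are a simplification of the Unicode character properties, not an official API.

```pascal
program ClassifyLatin1;  // illustrative name
{$mode objfpc}{$H+}

// Hypothetical helper: says why U+0000..U+00FF may appear empty in a list.
function Describe(cp: Integer): string;
begin
  case cp of
    $00..$1F, $7F: Result := 'control character (no glyph)';
    $80..$9F:      Result := 'C1 control character (no glyph)';
    $20, $A0:      Result := 'space (blank glyph)';
    $AD:           Result := 'soft hyphen (usually invisible)';
  else
    Result := 'printable';
  end;
end;

var
  i: Integer;
begin
  // List only the codepoints that will not show a visible character.
  for i := 0 to 255 do
    if Describe(i) <> 'printable' then
      WriteLn(i:3, '  ', Describe(i));
end.
```

In a ListBox or Memo you could add Describe(i) next to each entry, so that a blank row is explained instead of looking like a bug.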