Recent

Author Topic: Sharing a function (Also: Is there a BUG in lazUTF8.UTF8LengthFast)  (Read 17303 times)

engkin

  • Hero Member
  • *****
  • Posts: 3112
Re: Sharing a function (Also: Is there a BUG in lazUTF8.UTF8LengthFast)
« Reply #15 on: December 01, 2017, 07:24:58 am »
I think you are very smart. We all have our moments when we feel things are hard. Usually I take a break from what am doing.

tomitomy

  • Sr. Member
  • ****
  • Posts: 251
Re: Sharing a function (Also: Is there a BUG in lazUTF8.UTF8LengthFast)
« Reply #16 on: December 01, 2017, 07:53:45 am »
I think you are very smart. We all have our moments when we feel things are hard. Usually I take a break from what am doing.

Thank you for your encouragement, I really need a rest, keep reading the code makes me feel tired.

JuhaManninen

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 4467
  • I like bugs.
Re: Sharing a function (Also: Is there a BUG in lazUTF8.UTF8LengthFast)
« Reply #17 on: December 01, 2017, 03:02:59 pm »
Yes, this is a bug. One of the few bugs it has. We were working on it here. It evolved into several procedures using pure Pascal and assembly. Can you try this:
  ...
Thanks engkin! I improved function UTF8LengthFast in r56572 with your new code + some formatting changes.
Do you have a version with assembly? It would be an interesting optimization experiment for the most common CPUs, although I think the compiler is able to optimize this kind of code very well.
How about using SSE2 to count 16 bytes parallel in a cycle?

Quote
It needs to be tested for correct results and speed. Don't forget to use optimization level 3 or 4 if you test for speed.
I leave profiling of this code for you guys.
« Last Edit: December 01, 2017, 04:10:47 pm by JuhaManninen »
Mostly Lazarus trunk and FPC 3.2 on Manjaro Linux 64-bit.

tomitomy

  • Sr. Member
  • ****
  • Posts: 251
Re: Sharing a function (Also: Is there a BUG in lazUTF8.UTF8LengthFast)
« Reply #18 on: December 01, 2017, 05:24:27 pm »
Can you also post the addresses of the strings you used (the first parameter you pass to UTF8LengthFast)

I'm sorry, I have just reviewed your reply. Google gave me the wrong translation result: "你还可以发布你使用的字符串的地址(你传递给UTF8LengthFast的第一个参数)", which I didn't care much about. :-[

The code and the string addresses are follow:

Code: Pascal  [Select][+][-]
  1. program Project1;
  2.  
  3. uses
  4.   lazUTF8;
  5.  
  6. var
  7.   S: string;
  8. begin
  9.   S := '一二三';
  10.   writeln(lazUTF8.UTF8LengthFast(@S[Length('一') + 1], Length('二三')));
  11.   writeln(PtrInt(@S[Length('一') + 1]));
  12.   writeln(lazUTF8.UTF8LengthFast(@S[Length('一二') + 1], Length('三')));
  13.   writeln(PtrInt(@S[Length('一二') + 1]));
  14. end.

Code: [Select]
4
4776331
1
4776334

tomitomy

  • Sr. Member
  • ****
  • Posts: 251
Re: Sharing a function (Also: Is there a BUG in lazUTF8.UTF8LengthFast)
« Reply #19 on: December 01, 2017, 05:56:17 pm »
The new function UTF8Len result and addresses:

Code: Pascal  [Select][+][-]
  1. var
  2.   S: string;
  3.  
  4. begin
  5.   S := '一二三四五';
  6.   writeln(UTF8Len(@S[Length('一') + 1], Length('二三四五')));
  7.   writeln(PtrInt(@S[Length('一') + 1]));
  8.   writeln(UTF8Len(@S[Length('一二') + 1], Length('三四五')));
  9.   writeln(PtrInt(@S[Length('一二') + 1]));
  10.   writeln(UTF8Len(@S[Length('一二三') + 1], Length('四五')));
  11.   writeln(PtrInt(@S[Length('一二三') + 1]));
  12.   writeln(UTF8Len(@S[Length('一二三四') + 1], Length('五')));
  13.   writeln(PtrInt(@S[Length('一二三四') + 1]));
  14. end.

Code: [Select]
4
4777019
3
4777022
2
4777025
1
4777028

engkin

  • Hero Member
  • *****
  • Posts: 3112
Re: Sharing a function (Also: Is there a BUG in lazUTF8.UTF8LengthFast)
« Reply #20 on: December 02, 2017, 04:10:36 pm »
Yes, this is a bug. One of the few bugs it has. We were working on it here. It evolved into several procedures using pure Pascal and assembly. Can you try this:
  ...
Thanks engkin! I improved function UTF8LengthFast in r56572 with your new code + some formatting changes.
Do you have a version with assembly? It would be an interesting optimization experiment for the most common CPUs, although I think the compiler is able to optimize this kind of code very well.
How about using SSE2 to count 16 bytes parallel in a cycle?
Thank you! Yes, there is a version that handles 16 bytes in parallel. It is very fast for long strings. 20x faster than the typical version when using strings of more than 900 bytes (my random test had around 400 code points)

Quote
It needs to be tested for correct results and speed. Don't forget to use optimization level 3 or 4 if you test for speed.
I leave profiling of this code for you guys.
I tested both functions and compared the results with the original UTF8Length. I used a variable length random string that contains code points of all lengths with changing both ends.

Here is a sample of the results using -O4
UTF8Length-1 is the original version that comes with LazUTF8
UTF8Length-2 is the Pascal version
UTF8Length-3 is the SIMD version
Quote
Length: 986 bytes
UTF8Length-1: 414 in 19050 ticks
UTF8Length-2: 414 in  2658 ticks, ratio  7.17
UTF8Length-3: 414 in   935 ticks, ratio 20.37

Length: 980 bytes
UTF8Length-1: 412 in 18256 ticks
UTF8Length-2: 412 in  2685 ticks, ratio  6.80
UTF8Length-3: 412 in   952 ticks, ratio 19.18

Length: 973 bytes
UTF8Length-1: 410 in 18154 ticks
UTF8Length-2: 410 in  2632 ticks, ratio  6.90
UTF8Length-3: 410 in   868 ticks, ratio 20.91

...

Length: 401 bytes
UTF8Length-1: 166 in  7153 ticks
UTF8Length-2: 166 in  1131 ticks, ratio  6.32
UTF8Length-3: 166 in   525 ticks, ratio 13.62

Length: 395 bytes
UTF8Length-1: 164 in  7070 ticks
UTF8Length-2: 164 in  1101 ticks, ratio  6.42
UTF8Length-3: 164 in   470 ticks, ratio 15.04

Length: 390 bytes
UTF8Length-1: 162 in  6944 ticks
UTF8Length-2: 162 in  1076 ticks, ratio  6.45
UTF8Length-3: 162 in   433 ticks, ratio 16.04

....

Length: 21 bytes
UTF8Length-1: 8 in   335 ticks
UTF8Length-2: 8 in   135 ticks, ratio  2.48
UTF8Length-3: 8 in   153 ticks, ratio  2.19

Length: 16 bytes
UTF8Length-1: 6 in   278 ticks
UTF8Length-2: 6 in   162 ticks, ratio  1.72
UTF8Length-3: 6 in   247 ticks, ratio  1.13

Length: 11 bytes
UTF8Length-1: 4 in   192 ticks
UTF8Length-2: 4 in   129 ticks, ratio  1.49
UTF8Length-3: 4 in   262 ticks, ratio  0.73

Length: 3 bytes
UTF8Length-1: 2 in   110 ticks
UTF8Length-2: 2 in    86 ticks, ratio  1.28
UTF8Length-3: 2 in    70 ticks, ratio  1.57

Length: 0 bytes
UTF8Length-1: 0 in    42 ticks
UTF8Length-2: 0 in    62 ticks, ratio  0.68
UTF8Length-3: 0 in    53 ticks, ratio  0.79

These results come from a 32bit old machine using FPC 3.0.2

engkin

  • Hero Member
  • *****
  • Posts: 3112
Re: Sharing a function (Also: Is there a BUG in lazUTF8.UTF8LengthFast)
« Reply #21 on: December 02, 2017, 04:16:04 pm »
Can you also post the addresses of the strings you used (the first parameter you pass to UTF8LengthFast)

I'm sorry, I have just reviewed your reply. Google gave me the wrong translation result: "你还可以发布你使用的字符串的地址(你传递给UTF8LengthFast的第一个参数)", which I didn't care much about. :-[
LOL. Google can give real funny translations.

The code and the string addresses are follow:

Code: Pascal  [Select][+][-]
  1. program Project1;
  2.  
  3. uses
  4.   lazUTF8;
  5.  
  6. var
  7.   S: string;
  8. begin
  9.   S := '一二三';
  10.   writeln(lazUTF8.UTF8LengthFast(@S[Length('一') + 1], Length('二三')));
  11.   writeln(PtrInt(@S[Length('一') + 1]));
  12.   writeln(lazUTF8.UTF8LengthFast(@S[Length('一二') + 1], Length('三')));
  13.   writeln(PtrInt(@S[Length('一二') + 1]));
  14. end.

Code: [Select]
4
4776331
1
4776334

The bug is fixed as visible in your later post. Thank you for testing.  :)

tomitomy

  • Sr. Member
  • ****
  • Posts: 251
Re: Sharing a function (Also: Is there a BUG in lazUTF8.UTF8LengthFast)
« Reply #22 on: December 02, 2017, 05:08:30 pm »
The bug is fixed as visible in your later post. Thank you for testing.  :)

This sentence, Google translated it as "该错误已修复为在您以后的帖子中可见。谢谢你的测试。", I am daze for a long time, finally I understand that Google was wrong again.  %)


engkin

  • Hero Member
  • *****
  • Posts: 3112
Re: Sharing a function (Also: Is there a BUG in lazUTF8.UTF8LengthFast)
« Reply #23 on: December 02, 2017, 05:12:30 pm »
The bug is fixed as visible in your later post. Thank you for testing.  :)

This sentence, Google translated it as "该错误已修复为在您以后的帖子中可见。谢谢你的测试。", I am daze for a long time, finally I understand that Google was wrong again.  %)
LOOOOL, computers are not good enough to translate, not yet.

tomitomy

  • Sr. Member
  • ****
  • Posts: 251
Re: Sharing a function (Also: Is there a BUG in lazUTF8.UTF8LengthFast)
« Reply #24 on: December 02, 2017, 05:24:03 pm »
LOOOOL, computers are not good enough to translate, not yet.

My post is also translated with Google, and I think there should be a lot of jokes in my post. I can not accurately express my thoughts in English, I hope people do not mind it.

engkin

  • Hero Member
  • *****
  • Posts: 3112
Re: Sharing a function (Also: Is there a BUG in lazUTF8.UTF8LengthFast)
« Reply #25 on: December 02, 2017, 05:27:19 pm »
Your posts are amazingly good.

tomitomy

  • Sr. Member
  • ****
  • Posts: 251
Re: Sharing a function (Also: Is there a BUG in lazUTF8.UTF8LengthFast)
« Reply #26 on: December 02, 2017, 05:38:04 pm »
Your posts are amazingly good.

I'll be relieved you say that. It's very late in China now. I should go to sleep. Good night, everybody! (if you are in China)

JuhaManninen

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 4467
  • I like bugs.
Re: Sharing a function (Also: Is there a BUG in lazUTF8.UTF8LengthFast)
« Reply #27 on: December 02, 2017, 11:30:28 pm »
Thank you! Yes, there is a version that handles 16 bytes in parallel. It is very fast for long strings. 20x faster than the typical version when using strings of more than 900 bytes (my random test had around 400 code points)
Lines like:
Code: [Select]
MOVDQA xmm0, ZEROMASKgive me errors like:
Code: [Select]
lazutf8.pas(554,16) Error: Generating PIC, but reference is not PIC-safeI don't know why FPC would generate Position Independent Code (PIC). I did not set any flags for it.

BTW, as good as the speedup looks, it has only little practical importance because no good algorithm depends on counting codepoints.
Still, a nice exercise. I also like optimized code. :)
« Last Edit: December 03, 2017, 01:00:35 am by JuhaManninen »
Mostly Lazarus trunk and FPC 3.2 on Manjaro Linux 64-bit.

engkin

  • Hero Member
  • *****
  • Posts: 3112
Re: Sharing a function (Also: Is there a BUG in lazUTF8.UTF8LengthFast)
« Reply #28 on: December 03, 2017, 06:31:44 am »
Thank you! Yes, there is a version that handles 16 bytes in parallel. It is very fast for long strings. 20x faster than the typical version when using strings of more than 900 bytes (my random test had around 400 code points)
Lines like:
Code: [Select]
MOVDQA xmm0, ZEROMASKgive me errors like:
Code: [Select]
lazutf8.pas(554,16) Error: Generating PIC, but reference is not PIC-safeI don't know why FPC would generate Position Independent Code (PIC). I did not set any flags for it.
Maybe a Linux requirement?

It is possible to remove the constants altogether and replace them with a few instructions, or embed the constants inside the code as follows:
Code: Pascal  [Select][+][-]
  1. function UTF8Len_aligned16(p16: pchar; BlockCount: PtrInt):PtrInt; assembler; { PIC }
  2.  
  3. Label
  4.  loop, ZEROMASK, ONEMASK, ONEMASKx80, ONEMASKxFF, EndLbl;
  5. asm
  6.   push ecx
  7.   push edi
  8.   push ebx
  9.  
  10.   { tmp counter }
  11.   MOV ecx, 0
  12.  
  13.   { masks }
  14.   MOVDQA xmm0, ZEROMASK
  15.   MOVDQA xmm1, ONEMASK
  16.   MOVDQA xmm2, ONEMASKx80
  17.   MOVDQA xmm3, ONEMASKxFF
  18.  
  19. Loop:
  20.   { get 16 bytes }
  21.   MOVDQA xmm4, [p16]
  22.  
  23.   { Invert 16 bytes }
  24.   MOVDQA xmm5, xmm4
  25.   ANDNPD xmm5, xmm3 { xmm5 = not xmm4}
  26.  
  27.   { Shift the inverted bytes 6 bits to the right }
  28.   PSRLQ  xmm5, 6
  29.  
  30.   { Keep msb of each non-inverted byte }
  31.   PAND xmm4, ONEMASKx80
  32.  
  33.   { Shift them to right 7 bits }
  34.   PSRLQ  xmm4, 7 { Shift Right Logical QWord }
  35.  
  36.   { A one in the 1st bit means: NOT the first byte of a codepoint }
  37.   PAND xmm5, xmm4
  38.  
  39.   { Count them ;-) }
  40.   PSADBW xmm5, xmm0
  41.  
  42.   MOVD edi, xmm5
  43.   PEXTRW ebx, xmm5, 4
  44.  
  45.   ADD ecx, ebx
  46.   ADD ecx, edi
  47.  
  48.   { Next 16 bytes }
  49.   ADD p16, 16
  50.   DEC edx
  51.   JNZ Loop
  52.  
  53.   { Result }
  54.   Mov eax, ecx
  55.  
  56.   pop ebx
  57.   pop edi
  58.   pop ecx
  59.  
  60.   jmp EndLbl
  61.  
  62. align 16
  63.   ZEROMASK:
  64.   db $00,$00,$00,$00,$00,$00,$00,$00,$00,$00,$00,$00,$00,$00,$00,$00
  65.  
  66.   ONEMASK:
  67.   db $01,$01,$01,$01,$01,$01,$01,$01,$01,$01,$01,$01,$01,$01,$01,$01
  68.  
  69.   ONEMASKx80:
  70.   db $80,$80,$80,$80,$80,$80,$80,$80,$80,$80,$80,$80,$80,$80,$80,$80
  71.  
  72.   ONEMASKxFF:
  73.   db $FF,$FF,$FF,$FF,$FF,$FF,$FF,$FF,$FF,$FF,$FF,$FF,$FF,$FF,$FF,$FF
  74. EndLbl:
  75. end;

I could not replicate the problem on my side. The compiler neglected my simple request to produce PIC code. :-/

BTW, as good as the speedup looks, it has only little practical importance because no good algorithm depends on counting codepoints.
Yes, I kinda agree with you, but did you do a search for UTF8Length in LCL folder?

JuhaManninen

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 4467
  • I like bugs.
Re: Sharing a function (Also: Is there a BUG in lazUTF8.UTF8LengthFast)
« Reply #29 on: December 03, 2017, 04:04:37 pm »
I could not replicate the problem on my side. The compiler neglected my simple request to produce PIC code. :-/
I get the same error with the new code, too. I am using FPC trunk now, maybe it matters. I will switch to 3.0.4 in near future.

Quote
BTW, as good as the speedup looks, it has only little practical importance because no good algorithm depends on counting codepoints.
Yes, I kinda agree with you, but did you do a search for UTF8Length in LCL folder?
There are some, true. They don't look very critical for performance though.
The fast function version could be used in places where a valid UTF-8 data can be safely assumed.
Mostly Lazarus trunk and FPC 3.2 on Manjaro Linux 64-bit.

 

TinyPortal © 2005-2018