Recent

Author Topic: Which way is faster? Comparing $XX or Integer?  (Read 6854 times)

Gizmo

  • Hero Member
  • *****
  • Posts: 831
Which way is faster? Comparing $XX or Integer?
« on: September 16, 2014, 06:27:50 pm »
Hi

While parsing binary data, my program compares byte values to see if they match certain byte structures. It moves along the data in single byte sequences constantly comparing the byte it has landed on and then a number of bytes forwards from it, until it finds a match. It works well, but I want to check I can't make it faster. 

I do it with a "then continue" loop, e.g. to find the hex pattern $202123 in the buffer segment then do something with the byte range:

Code: [Select]
var
  Buffer : array [0..32767] of byte // 32Kb reads
begin
...
for i := 0 to SizeOf(Buffer) do
  if Buffer[i] <> $20 then continue
    if Buffer[i+1] <> $21 then continue
      if Buffer[i+2] <> $23 then continue
        // now do something

However, I'm curious to know if it would be notibaly quicker to compare the integer value of the byte rather than the hex? I realise of course IT WILL be quicker, but will it be signifcantly quicker? Bearing mind there are billions of bytes to compare in a file of say 4Gb, if I can make my program take only 5 minutes instead of 7 or 8, then it's worth it. But if it will only take the speed down from 10 minutes to say 9 mins and 45 seconds, I'll keep the current system which is easier to read by me, and others.

Code: [Select]
for i := 0 to SizeOf(Buffer) do
  if Buffer[i] <> 33 then continue
    if Buffer[i+1] <> 34 then continue
      if Buffer[i+2] <> 35 then continue

Or, are there any faster methods folks can suggest generally to achieve the same effect?
   

taazz

  • Hero Member
  • *****
  • Posts: 5368
Re: Which way is faster? Comparing $XX or Integer?
« Reply #1 on: September 16, 2014, 06:54:31 pm »
The boyer moore string search algorithm should speed things up a bit. http://en.wikipedia.org/wiki/Boyer%E2%80%93Moore_string_search_algorithm
Good judgement is the result of experience … Experience is the result of bad judgement.

OS : Windows 7 64 bit
Laz: Lazarus 1.4.4 FPC 2.6.4 i386-win32-win32/win64

Blaazen

  • Hero Member
  • *****
  • Posts: 3237
  • POKE 54296,15
    • Eye-Candy Controls
Re: Which way is faster? Comparing $XX or Integer?
« Reply #2 on: September 16, 2014, 07:00:53 pm »
There is no difference between
Code: [Select]
if Buffer[i] <> $20 then continueand
Code: [Select]
if Buffer[i] <> 32 then continueit is evaluated during compilation.
And $20 is 32 decimal, not 33.
Lazarus 2.3.0 (rev main-2_3-2863...) FPC 3.3.1 x86_64-linux-qt Chakra, Qt 4.8.7/5.13.2, Plasma 5.17.3
Lazarus 1.8.2 r57369 FPC 3.0.4 i386-win32-win32/win64 Wine 3.21

Try Eye-Candy Controls: https://sourceforge.net/projects/eccontrols/files/

hy

  • Full Member
  • ***
  • Posts: 221
Re: Which way is faster? Comparing $XX or Integer?
« Reply #3 on: September 16, 2014, 11:08:53 pm »
continue is as bad as goto
_____
***hy
OS: debian sid(64bit)  [fpc 3.20] Lazarus 2.0.12

circular

  • Hero Member
  • *****
  • Posts: 4181
    • Personal webpage
Re: Which way is faster? Comparing $XX or Integer?
« Reply #4 on: September 16, 2014, 11:21:56 pm »
The code here does not work
Code: [Select]
for i := 0 to SizeOf(Buffer) do
  if Buffer[i] <> $20 then continue
    if Buffer[i+1] <> $21 then continue
      if Buffer[i+2] <> $23 then continue
        // now do something
You need to add a begin to stay in the for-loop:
Code: [Select]
for i := 0 to SizeOf(Buffer)-3 do //avoid going after the end of the buffer
begin
  if Buffer[i] <> $20 then continue;
  if Buffer[i+1] <> $21 then continue;
  if Buffer[i+2] <> $23 then continue;
  // now do something
end;
Conscience is the debugger of the mind

Basile B.

  • Guest
Re: Which way is faster? Comparing $XX or Integer?
« Reply #5 on: September 17, 2014, 12:11:00 am »
Hi

While parsing binary data, my program compares byte values to see if they match certain byte structures. It moves along the data in single byte sequences constantly comparing the byte it has landed on and then a number of bytes forwards from it, until it finds a match. It works well, but I want to check I can't make it faster. 

I do it with a "then continue" loop, e.g. to find the hex pattern $202123 in the buffer segment then do something with the byte range:

Code: [Select]
var
  Buffer : array [0..32767] of byte // 32Kb reads
begin
...
for i := 0 to SizeOf(Buffer) do
  if Buffer[i] <> $20 then continue
    if Buffer[i+1] <> $21 then continue
      if Buffer[i+2] <> $23 then continue
        // now do something

However, I'm curious to know if it would be notibaly quicker to compare the integer value of the byte rather than the hex? I realise of course IT WILL be quicker, but will it be signifcantly quicker? Bearing mind there are billions of bytes to compare in a file of say 4Gb, if I can make my program take only 5 minutes instead of 7 or 8, then it's worth it. But if it will only take the speed down from 10 minutes to say 9 mins and 45 seconds, I'll keep the current system which is easier to read by me, and others.

Code: [Select]
for i := 0 to SizeOf(Buffer) do
  if Buffer[i] <> 33 then continue
    if Buffer[i+1] <> 34 then continue
      if Buffer[i+2] <> 35 then continue

Or, are there any faster methods folks can suggest generally to achieve the same effect?
   

Even better, you can read a DWord or a Cardinal from the first byte and avoid two comparisons, combined with a bitmasking operation:
If the pattern is always 3 bytes:
Code: [Select]
for i := 0 to SizeOf(Buffer)-3 do
begin
  if (PCardinal(@Buffer[i])^ and $FFFFFF00) <> $23212000 then continue; // little-endian version
  //if (PCardinal(@Buffer[i])^ and $00FFFFFF) <> $00202123 then continue; // big-endian version
  // now do something
end;

comparing "$20" or "33" is just the same. It seems that you've missed something in the understanding of how data are from the hardware point of view.
« Last Edit: September 17, 2014, 12:16:55 am by Basile B. »

engkin

  • Hero Member
  • *****
  • Posts: 3112
Re: Which way is faster? Comparing $XX or Integer?
« Reply #6 on: September 17, 2014, 02:14:49 am »
I agree with Taazz, according to this link:
Quote
The benchmark is conducted on a MacBook Pro with OS X Snow Leopard and Intel Core 2 Duo 2.4 Ghz.
...
This benchmark searches for the needle “I have control\n” in a 200 MB file containing random binary data, 10 iterations.
Boyer-Moore    1191 ms
If my calculations are correct, searching for this string: “I have control\n” in a 4 GB should take around 25 seconds on that machine. Excluding the time spent on reading. How long does it take your system to read 4 GB?

benohb

  • Full Member
  • ***
  • Posts: 213
Re: Which way is faster? Comparing $XX or Integer?
« Reply #7 on: September 17, 2014, 09:35:15 pm »

I've been playing with  codeI've tried it in  loop 40000 times and I got  speed 142% on 32bit linux; we can get to 182% faster on 64 bit but the code will change (more local variabel ..i,aa,bb,cc,cca,dd,+,+,+);



3282 ms <-Your source code
2250 ms <-My  code




Code: [Select]
function TForm1.Getindexid2(By1,By2,By3:byte;arrg:array of byte):integer;
 var i:Longword;
begin
   for i := 0 to SizeOf(arrg) do
   if arrg[i] = $20 then
     if arrg[i+1] = $21 then
       if arrg[i+2] = $23 then
         begin
          Getindexid2:=i;
          Break;
         end;
end; 


VS

Code: [Select]
function TForm1.Getindexid(By1,By2,By3:byte;size:integer;Buffer:PLongword):integer;
var i,aa,bb,cc,cca,dd:Longword;
begin
 aa:=(By3 shl 24) + (By2 shl 16)+(By1 shl 8);
 bb:=aa shr 8;
 cc:= (By2 shl 8)+ By1;
 cca :=By3 shl 24  ;
 dd:= (By2 shl 16)+(By3 shl 24);
 for i := 0 to size-1 do
 begin
  if Buffer[i] shl 8 = aa then begin getindexid:=(i*4);Break;end;
  if Buffer[i] shr 8 = bb then begin getindexid:=(i*4)+1;Break;end;
  if (Buffer[i] shr 16=cc)  and (Buffer[i+1] shl 24 = cca) then begin getindexid:=(i*4)+2;Break;end;
  if (Buffer[i] shr 24=By1) and (Buffer[i+1] shl 16 = dd ) then begin getindexid:=(i*4)+3;Break;end;
 end;
end;

This is for 3 byte  By1,By2,By3  = 20$21$23
size               = ength(array) div 4;

Buffer:PLongword = @array
« Last Edit: September 18, 2014, 05:03:26 pm by benohb »

circular

  • Hero Member
  • *****
  • Posts: 4181
    • Personal webpage
Re: Which way is faster? Comparing $XX or Integer?
« Reply #8 on: September 18, 2014, 05:45:40 pm »
@benohb
What about this:

Code: [Select]
function TForm1.Getindexid(By1,By2,By3:byte;size:integer;Buffer:PLongword):integer;
var i,aa,bb,cc,cca,dd:Longword;
  p, pEnd: PLongword;
begin
 result := -1; //not found
 if size = 0 then exit;
 aa:=(By3 shl 24) + (By2 shl 16)+(By1 shl 8);
 bb:=aa shr 8;
 cc:= (By2 shl 8)+ By1;
 cca :=By3 shl 24  ;
 dd:= (By2 shl 16)+(By3 shl 24);
 p := Buffer;
 pEnd := p+size;
 while true do
 begin
  if p^ shl 8 = aa then begin getindexid:=(p-Buffer)*4;Break;end;
  if p^ shr 8 = bb then begin getindexid:=((p-Buffer)*4)+1;Break;end;
  inc(p);
  if p >= pEnd then break; 
  if ((p-1)^ shr 16=cc)  and (p^ shl 24 = cca) then begin getindexid:=((p-Buffer)*4)-2;Break;end;
  if ((p-1)^ shr 24=By1) and (p^ shl 16 = dd ) then begin getindexid:=((p-Buffer)*4)-1;Break;end;
 end;
end;
Conscience is the debugger of the mind

benohb

  • Full Member
  • ***
  • Posts: 213
Re: Which way is faster? Comparing $XX or Integer?
« Reply #9 on: September 19, 2014, 10:44:03 am »

@circular
7374 ms <- You
8407 ms <- Me
10621 ms <-Gizmo 
Code: [Select]
-  if p >= pEnd then break;
+ if p > pEnd then break;
Fix last index .
i am  also  find error in my code


Code: [Select]
......
+ getindexid:=-1;
.....
.....

-   for i := 0 to size-1 do
+  for i := 0 to size    do


Great work as especially in " inc(p);"  and its  place :)



« Last Edit: September 19, 2014, 02:07:26 pm by benohb »

Sergios

  • New Member
  • *
  • Posts: 10
Re: Which way is faster? Comparing $XX or Integer?
« Reply #10 on: September 29, 2014, 02:43:55 am »
My approach is less portable, but much flexible and faster...
Code: [Select]
program ByteSearch;

uses SysUtils;

function RDTSC: Int64; // dirty counter...
var
  TSC: record
    case Byte of
      1: (Ticks: Int64);
      2: (LO, HI: LongWord);
  end;
begin
  TSC.Ticks:=0;
  {$ASMMODE INTEL}
  asm
    db $0F, $31;
  {$IFDEF CPU386}
    MOV [TSC.LO], EAX
    MOV [TSC.HI], EDX
  {$ELSE}
    NOP
  {$FATAL This mode of the CPU isn't supported currently!}
  {$ENDIF}
  end;
  Result:=TSC.Ticks;
end;

function GizmoSearch(By1,By2,By3: Byte; Buf: array of byte):LongInt;
 var I: Longword;
begin
   for I := 0 to High(Buf) do
   if Buf[I] = By1 then
     if Buf[I+1] = By2 then
       if Buf[I+2] = By3 then
         begin
          Result:=I;
          Break;
         end;
end;

function ByteSearch (const Haystack: array of byte; const Needle: array of byte): LongInt;
 var
  LenBuf, LenPat: LongInt;
  PB, PP: Pointer;
begin
  PB:=@Haystack;  //for testing purposes (there is no need to hold locally)
  PP:=@Needle;    //for testing purposes (there is no need to hold locally)
  LenBuf:=Length(Haystack);
  LenPat:=Length(Needle);
  if (LenPat=0) or (LenBuf=0) or (LenPat>LenBuf) then Exit(-2);
  {$ASMMODE INTEL}
  asm
  {$IFDEF CPU386}
              CLD
              MOV   ECX, DWORD(LenBuf)
              MOV   EDX, DWORD(LenPat)
              MOV   EDI, DWORD PTR PB //MOV   EDI, DWORD PTR Haystack
              MOV   ESI, DWORD PTR PP //MOV   ESI, DWORD PTR Needle
              MOV   EAX, [ESI]
              AND   EAX, $000000FF
     @SEARCH: REPNE SCASB
              JECXZ @NOT_FOUND
              PUSH  ECX
              PUSH  ESI
              PUSH  EDI
              LEA   ECX, [EDX-1]
              MOV   ESI, DWORD PTR PP //MOV   ESI, DWORD PTR Needle
              INC   ESI
              REPE  CMPSB
              JZ    @FOUND
              POP   EDI
              POP   ESI
              POP   ECX
              JMP   @SEARCH
      @FOUND: POP   EDI
              POP   ESI
              POP   ECX
              INC   ECX
              MOV   EAX, DWORD(LenBuf)
              SUB   EAX, ECX
              JMP   @EXIT
  @NOT_FOUND: XOR   EAX,EAX // will return -1
              NOT   EAX
       @EXIT: MOV   @RESULT, EAX
  {$ELSE} // not written yet...
              NOP
  {$FATAL This mode of the CPU isn't supported currently!}
  {$ENDIF}
  end;
end;

type
  DInt64 = record
    TSC1:Int64;
    TSC2:Int64;
  end;

const
 Zero = 0;
 MaxArr = 1000;
 ScaleFactor = 1.0E6;

var
  Len_P, Len_B, I, J: Integer;
  Offset: LongWord;
  Buffer, Pattern: array of byte;
  T1, T2:  Int64;
  Data: array [1..MaxArr] of DInt64;
  Ave1: Double = Zero;
  Ave2: Double = Zero;
begin
  Len_B:=32*1024; // 32KB
  SetLength(Buffer, Len_B);

  // create pattern
  Len_P:=3;
  SetLength(Pattern, Len_P);
  Pattern[0]:=$20;
  Pattern[1]:=$21;
  Pattern[2]:=$23;

  Writeln('Estimation will take several minutes...');
  for J:=1 to MaxArr do
    begin
      // fill buffer
      Randomize;
      for I:=0 to (Len_B-1) do Buffer[I]:=Byte(Random(256)); //Buffer[I]:=$20
      Offset:=0; //Offset:=Len_B-3;
      Buffer[Len_B-Offset-3]:=$20;
      Buffer[Len_B-Offset-2]:=$21;
      Buffer[Len_B-Offset-1]:=$23;

      T1:=RDTSC;
      for I:=1 to 1000 do GizmoSearch ($20, $21, $23, Buffer);
      T2:=RDTSC;
      Data[J].TSC1:=T2-T1;

      T1:=RDTSC;
      for I:=1 to 1000 do ByteSearch (Buffer, Pattern);
      T2:=RDTSC;
      Data[J].TSC2:=T2-T1;
    end;

  // estimate averages
  for J:=1 to MaxArr do
    begin
      Ave1:=Ave1+(Data[J].TSC1/(MaxArr*ScaleFactor));
      Ave2:=Ave2+(Data[J].TSC2/(MaxArr*ScaleFactor));
    end;

  Writeln('GizmoSearch took on the average ' + FloatToStrF(Ave1*ScaleFactor, ffFixed, 0, 3)+' CPU ticks');
  Writeln('ByteSearch took on the average ' + FloatToStrF(Ave2*ScaleFactor, ffFixed, 0, 3)+' CPU ticks');
  Writeln('...');
  Writeln ('GizmoSearch/ByteSearch ratio = '+FloatToStrF((Ave1/Ave2), ffFixed, 0, 3));

  while true do
   begin
     ;
   end;
end.
« Last Edit: September 30, 2014, 12:53:59 pm by Sergios »
Windows 8.1 Pro, Lazarus 1.2.4, FPC 2.6.4

BigChimp

  • Hero Member
  • *****
  • Posts: 5740
  • Add to the wiki - it's free ;)
    • FPCUp, PaperTiger scanning and other open source projects
Re: Which way is faster? Comparing $XX or Integer?
« Reply #11 on: September 29, 2014, 08:13:01 am »
Want quicker answers to your questions? Read http://wiki.lazarus.freepascal.org/Lazarus_Faq#What_is_the_correct_way_to_ask_questions_in_the_forum.3F

Open source including papertiger OCR/PDF scanning:
https://bitbucket.org/reiniero

Lazarus trunk+FPC trunk x86, Windows x64 unless otherwise specified

 

TinyPortal © 2005-2018