Recent

Author Topic: Yet another generics collection  (Read 19430 times)

avk

  • Sr. Member
  • ****
  • Posts: 411
    • my self-education project
Re: Yet another generics collection
« Reply #75 on: January 05, 2021, 05:25:13 pm »
It's a good question anyway. I do my best to avoid adding nodes directly, as this is error-prone. However, double parsing isn't a very attractive solution either.
I just hastily added a new ArrayAppend method, maybe that's what you need?

lainz

  • Hero Member
  • *****
  • Posts: 4042
  • Leandro Diaz
Re: Yet another generics collection
« Reply #76 on: January 05, 2021, 09:12:42 pm »
Yes that works.

Anyways I get a slower time this time with the change. But I'm not measuring it well, so I can't say, I'm measuring a whole process that depends on internet, so speed can vary and is not only by the json library, but the server speed, internet speed and other things.
« Last Edit: January 05, 2021, 09:22:43 pm by lainz »
https://lainz.github.io/ - My Website :)
https://lazpaint.github.io/ -  Download LazPaint

avk

  • Sr. Member
  • ****
  • Posts: 411
    • my self-education project
Re: Yet another generics collection
« Reply #77 on: January 18, 2021, 03:10:59 pm »
Added pull-style JSON stream reader.

It provides, at the user's choice
  • read all tokens from the stream one by one
  • skip some structures
  • copy selected structures
  • iterate over all items of the current structure
  • search for an item in the current structure by a given key
  • search for an item by a specified path

As for performance, query
Code: Pascal  [Select][+][-]
  1.   Reader.FindPath('/features/100001')
  2.  
on this JSON(~190MB) takes about 0.7 s. on my machine.

avk

  • Sr. Member
  • ****
  • Posts: 411
    • my self-education project
Re: Yet another generics collection
« Reply #78 on: February 22, 2021, 06:59:47 am »
Added implementation of two string matching algorithms (unit lgStrHelpers), both are inheritors of the Boyer-Moore algorithm with minor improvements or simplifications.
The TBmSearch entity implements a Boyer-Moore incarnation called Fast-Search, and TBmhrSearch implements the Boyer-Moore-Horspool-Raita algorithm(sometimes just Raita). TBmSearchCI is a case-insensitive version of TBmSearch.

A simple comparative benchmark on various inputs; BuildinBM denotes FindMatchesBoyerMooreCaseSensitive from StrUtils.

Random string(200000000 bytes) over a five-character alphabet, search for 10 patterns(running time in ms, lower is better).
Code: Text  [Select][+][-]
  1.  pattern len       TBmSearch    TBmhrSearch  BuildinBM
  2.      5               3157          3500        4017
  3.     10               2235          2487        2814
  4.     20               2078          2687        2627
  5.     50               1455          2078        1812
  6.    100               1437          2502        1767
  7.    200               1375          2422        1749
  8.  

Natural language text(English, "The Forsyte Saga", 2079763 bytes), search for 1000 patterns(running time in ms, lower is better).
Code: Text  [Select][+][-]
  1.  pattern len       TBmSearch    TBmhrSearch  BuildinBM
  2.      5               1969          2266        2799
  3.     10               1125          1297        1593
  4.     20                704           811         984
  5.     50                470           516         609
  6.    100                376           405         468
  7.    200                312           327         374
  8.  

Search for a single pattern in a list(200000 items) of moderate-length strings(5000-15000 bytes)(running time in ms, lower is better).
Code: Text  [Select][+][-]
  1.  pattern len       TBmSearch    TBmhrSearch  BuildinBM
  2.      5               2653          3311        3343
  3.     10               1016          1171        1530
  4.     20                842           875        1169
  5.     50                358           375         558
  6.    100                374           419         672
  7.    200                296           314         640
  8.  

Case insensitive search, Pascal sources(83996301 bytes), search for 25 patterns, BuildinBM_CI here denotes FindMatchesBoyerMooreCaseInSensitive(running time in ms, lower is better).
Code: Text  [Select][+][-]
  1.  pattern len       TBmSearchCI    BuildinBM_CI
  2.      5               2565            3250
  3.     10               1488            1744
  4.     20                889            1042
  5.     50                618             709
  6.    100                444             517
  7.    200                390             442
  8.  
« Last Edit: February 22, 2021, 08:17:46 am by avk »

440bx

  • Hero Member
  • *****
  • Posts: 2310
Re: Yet another generics collection
« Reply #79 on: February 22, 2021, 07:36:35 am »
The TBmSearch entity implements a Boyer-Moore incarnation called Fast-Search, and TBmhrSearch implements the Boyer-Moore-Horspool-Raita algorithm(sometimes just Raita). TBmSearchCI is a case-insensitive version of TBmSearch.
Nicely done!.
FPC v3.0.4 and Lazarus 1.8.2 on Windows 7 64bit.

avk

  • Sr. Member
  • ****
  • Posts: 411
    • my self-education project
Re: Yet another generics collection
« Reply #80 on: February 22, 2021, 07:52:21 am »
Thank you.

OkobaPatino

  • Full Member
  • ***
  • Posts: 197
Re: Yet another generics collection
« Reply #81 on: February 22, 2021, 02:05:57 pm »
Well done avk.

 

TinyPortal © 2005-2018