Added implementation of two string matching algorithms (unit lgStrHelpers), both are inheritors of the Boyer-Moore algorithm with minor improvements or simplifications.
The TBmSearch entity implements a Boyer-Moore incarnation called Fast-Search, and TBmhrSearch implements the Boyer-Moore-Horspool-Raita algorithm(sometimes just Raita). TBmSearchCI is a case-insensitive version of TBmSearch.
A simple comparative benchmark on various inputs; BuildinBM denotes FindMatchesBoyerMooreCaseSensitive from StrUtils.
Random string(200000000 bytes) over a five-character alphabet, search for 10 patterns(running time in ms, lower is better).
pattern len TBmSearch TBmhrSearch BuildinBM
5 3157 3500 4017
10 2235 2487 2814
20 2078 2687 2627
50 1455 2078 1812
100 1437 2502 1767
200 1375 2422 1749
Natural language text(English, "The Forsyte Saga", 2079763 bytes), search for 1000 patterns(running time in ms, lower is better).
pattern len TBmSearch TBmhrSearch BuildinBM
5 1969 2266 2799
10 1125 1297 1593
20 704 811 984
50 470 516 609
100 376 405 468
200 312 327 374
Search for a single pattern in a list(200000 items) of moderate-length strings(5000-15000 bytes)(running time in ms, lower is better).
pattern len TBmSearch TBmhrSearch BuildinBM
5 2653 3311 3343
10 1016 1171 1530
20 842 875 1169
50 358 375 558
100 374 419 672
200 296 314 640
Case insensitive search, Pascal sources(83996301 bytes), search for 25 patterns, BuildinBM_CI here denotes FindMatchesBoyerMooreCaseInSensitive(running time in ms, lower is better).
pattern len TBmSearchCI BuildinBM_CI
5 2565 3250
10 1488 1744
20 889 1042
50 618 709
100 444 517
200 390 442