Recent

Author Topic: Case insensitive LIKE (and ORDER) in SQLite, above ASCII [SOLVED by LacaK]  (Read 12482 times)

totya

  • Hero Member
  • *****
  • Posts: 720
Hi!

I created this topic for this important problem, previous similar topic this.

In normal case, this code:

Code: Pascal  [Select][+][-]
  1. SELECT * FROM MYTABLE WHERE MYDATAFIELD LIKE "%searchedtext%" ORDER BY MYDATAFIELD;

works only with ascII code.

For example see these caharceters (from hungarian language):
lower case version:
"á"
same as uppecase version:
"Á"

Like "%á%" doesn't work records which contains "Á".

This is known problem with SQLITE, see this faq

Quote
(18) Case-insensitive matching of Unicode characters does not work.

The default configuration of SQLite only supports case-insensitive comparisons of ASCII characters. The reason for this is that doing full Unicode case-insensitive comparisons and case conversions requires tables and logic that would nearly double the size of the SQLite library. The SQLite developers reason that any application that needs full Unicode case support probably already has the necessary tables and functions and so SQLite should not take up space to duplicate this ability.
Instead of providing full Unicode case support by default, SQLite provides the ability to link against external Unicode comparison and conversion routines. The application can overload the built-in NOCASE collating sequence (using sqlite3_create_collation()) and the built-in like(), upper(), and lower() functions (using sqlite3_create_function()). The SQLite source code includes an "ICU" extension that does these overloads. Or, developers can write their own overloads based on their own Unicode-aware comparison routines already contained within their project.

So, I'd like to know, how can I use LIKE operator with non ASCII text in SQLITE, with Lazarus.

From this faq I see two different ways:

1. collation , see this message

2. with ICU

Anyone can you show me Lazarus code examples for these ways?

The best, if the solution works with all UTF8 chars.

Thanks!

Remark:
- I need SQLite really, because I need local file, without any other (server) installation, like as Firebird.
« Last Edit: October 09, 2016, 08:58:48 pm by totya »

LacaK

  • Hero Member
  • *****
  • Posts: 691
Re: Case insensitive LIKE in SQLite above ASCII
« Reply #1 on: October 03, 2016, 08:59:04 am »
I have added http://wiki.freepascal.org/SQLite#Creating_user_defined_collations
But this does not solve LIKE operator  :(, which requires more work ...

In short you must define user function, which will overload default LIKE() function:

Code: Pascal  [Select][+][-]
  1. procedure UTF8xLike(ctx: psqlite3_context; N: cint; V: ppsqlite3_value); cdecl;
  2. var Y1,X1: AnsiString;
  3.     Y2,X2: UTF8String;
  4. begin
  5.   // like(X,Y) = Y LIKE X
  6.   X1 := sqlite3_value_text(V[0]);
  7.   Y1 := sqlite3_value_text(V[1]);
  8.   X2 := UTF8Encode(AnsiLowerCase(UTF8Decode(X1)));
  9.   Y2 := UTF8Encode(AnsiLowerCase(UTF8Decode(Y1)));
  10.   sqlite3_result_int(ctx, ord(sqlite3_strlike(PAnsiChar(X2), PAnsiChar(Y2), 0)=0));
  11. end;
  12.  

then register this user function:
Code: Pascal  [Select][+][-]
  1. sqlite3_create_function(SQLite3.Handle, 'like', 2, SQLITE_UTF8 or SQLITE_DETERMINISTIC, nil, @UTF8xLike, nil, nil);
  2.  

But it will not work as is, because it requires also additions to SQLite3 header file

Another approach can be overload default LOWER() function:
Code: Pascal  [Select][+][-]
  1. procedure UTF8xLower(ctx: psqlite3_context; N: cint; V: ppsqlite3_value); cdecl;
  2. var S: AnsiString;
  3. begin
  4.   SetString(S, sqlite3_value_text(V[0]), sqlite3_value_bytes(V[0]));
  5.   S := UTF8Encode(AnsiLowerCase(UTF8Decode(S)));
  6.   sqlite3_result_text(ctx, PAnsiChar(S), Length(S), sqlite3_destructor_type(SQLITE_TRANSIENT));
  7. end;
  8.  

Code: Pascal  [Select][+][-]
  1.  // override default LOWER function
  2.   sqlite3_create_function(SQLite3.Handle, 'lower', 1, SQLITE_UTF8 or SQLITE_DETERMINISTIC, nil, @UTF8xLower, nil, nil);
  3.  

and then use:
  lower(column1) like '%á%'
« Last Edit: October 03, 2016, 02:04:55 pm by LacaK »

totya

  • Hero Member
  • *****
  • Posts: 720
Re: Case insensitive LIKE in SQLite above ASCII
« Reply #2 on: October 03, 2016, 11:46:15 pm »
Hi!

Big thanks to you!

This second method is working correctly with SQL LIKE, with all hungarian chars, above ASCII, like as:
Quote
öüóűőúéáí ÖÜÓŐÚÉÁÍ

Test data about:
Quote
ööö
ÖÖÖ
üüü
ÜÜÜ
óóó
ÓÓÓ
űűű
ŰŰŰ
őőő
ŐŐŐ
úúú
ÚÚÚ
ééé
ÉÉÉ
ááá
ÁÁÁ
ííí
ÍÍÍ

When I type to searchbar these chars in my app, I get two records certainly, upper and lowercase verions of these chars. I'm happy :)

I'm use TSQLite3Dataset, and I needed some small thing to compile your excellent code, for example:

Code: Pascal  [Select][+][-]
  1. uses
  2.   SqLite3, ctypes;

Code: Pascal  [Select][+][-]
  1. const
  2.   SQLITE_DETERMINISTIC =$800;  

See this link.

In this code:
Code: Pascal  [Select][+][-]
  1. sqlite3_create_function(SQLite3.Handle, 'lower', 1, SQLITE_UTF8 or SQLITE_DETERMINISTIC, nil, @UTF8xLower, nil, nil);

Because I use TSQLite3Dataset, I need to swap:
Code: Pascal  [Select][+][-]
  1. SQLite3.Handle
Code: Pascal  [Select][+][-]
  1. Sqlite3Dataset1.SqliteHandle

SQL code:

Code: Pascal  [Select][+][-]
  1. case SQLSearchMethodRadioGroup.ItemIndex of
  2.  
  3.             0: SQL:= 'SELECT * FROM '+TableName+
  4.                      ' WHERE '+'lower('+DataFieldName+')'+
  5.                      ' LIKE '+SearchStr+
  6.                      ' ORDER BY '+'lower('+DataFieldName+')'+' ASC'+
  7.                      ';';
  8.  
  9.           end;                                

So, everything is fine, except one thing, but this isn't my question before. As you can see, I use ORDER BY ASC, but the result is incorrectly for above asc II chars, with or without lower() conversion. Do you have any idea to the proper sorting? Like as this topic.

But this isn't my original question, big thanks again for your answer!
« Last Edit: October 04, 2016, 12:39:02 am by totya »

LacaK

  • Hero Member
  • *****
  • Posts: 691
Re: Case insensitive LIKE in SQLite above ASCII
« Reply #3 on: October 04, 2016, 07:20:18 am »
As you can see, I use ORDER BY ASC, but the result is incorrectly for above asc II chars, with or without lower() conversion. Do you have any idea to the proper sorting?

Look at http://wiki.freepascal.org/SQLite#Creating_user_defined_collations

Then use:
Code: Pascal  [Select][+][-]
  1. sqlite3_create_collation(SQLite3.Handle, 'UTF8_CI', SQLITE_UTF8, nil, @UTF8xCompare_CI);

and in SQL:
Code: Pascal  [Select][+][-]
  1. SQL:= 'SELECT * FROM '+TableName+' WHERE '+'lower('+DataFieldName+')'+' LIKE '+SearchStr+
  2. ' ORDER BY '+DataFieldName+' COLLATE UTF8_CI ASC';
  3.  

totya

  • Hero Member
  • *****
  • Posts: 720
Re: Case insensitive LIKE in SQLite above ASCII [SOLVED]
« Reply #4 on: October 05, 2016, 02:49:06 am »
Hi!

Many thanks to you!

It seems to me it's working!

Remarks:

- I use it with TSQLite3Dataset.

- Sqlite3dyn unit in uses line cause crash (sigsev error), but with sqlite3 in uses line seems to me it's okay.

- In my country the accents chars order is important, for example the "a" smaller than "á" and so on, but the sorting method handle these chars with same chars. But it isn't a big problem, because the Excel works similar too. But I can create correct order, if I will modify the UTF8xCompare_CI function.

Thanks again!

I create unit for usage:

Code: Pascal  [Select][+][-]
  1. unit Unit_SQLite3Dataset_LanguageSupport;
  2.  
  3. {
  4. SQLite3LanguageSupport made by totya from LacaK code, see forum:
  5. http://forum.lazarus.freepascal.org/index.php/topic,34259.0.html
  6.  
  7. Usage: (after table created)
  8. SQLite3LanguageSupport (Sqlite3Dataset1.SqliteHandle);
  9.  
  10. changelog:
  11. v1.0 (2016.10.05 02:30) initial version
  12.  
  13. totya remarks:
  14. - Tested with TSQLite3Dataset (rev34598) on Win7 x64, Lazarus 1.6
  15. - The recommended sqlite3dyn unit doesn't need, but if present, I got sigsev error.
  16. }
  17.  
  18. {$mode objfpc}{$H+}
  19.  
  20. interface
  21.  
  22. uses
  23.   Classes, SysUtils,
  24.  
  25.   sqlite3,
  26.  
  27. //  sqlite3dyn,
  28.  
  29.   ctypes;
  30.  
  31. procedure SQLite3Dataset_LanguageSupport (const _Handle: pointer);
  32.  
  33. implementation
  34.  
  35. const
  36.   SQLITE_UTF8= $01;
  37.   SQLITE_DETERMINISTIC =$800;
  38.  
  39. // http://forum.lazarus.freepascal.org/index.php/topic,34259.msg224029.html#msg224029
  40. procedure UTF8xLower (ctx: psqlite3_context; {%H-}N: cint; V: ppsqlite3_value); cdecl;
  41. var S: AnsiString;
  42. begin
  43.   SetString(S, sqlite3_value_text(V[0]), sqlite3_value_bytes(V[0]));
  44.   S := UTF8Encode(AnsiLowerCase(UTF8Decode(S)));
  45.   sqlite3_result_text(ctx, PAnsiChar(S), Length(S), sqlite3_destructor_type(SQLITE_TRANSIENT));
  46. end;
  47.  
  48. // utf8 case-sensitive compare callback function
  49. // http://wiki.freepascal.org/SQLite#Creating_user_defined_collations
  50. function UTF8xCompare_CS ({%H-}user: pointer; len1: longint; data1: pointer; len2: longint; data2: pointer): longint; cdecl;
  51. var S1, S2: AnsiString;
  52. begin
  53.   SetString(S1, data1, len1);
  54.   SetString(S2, data2, len2);
  55.   Result := UnicodeCompareStr(UTF8Decode(S1), UTF8Decode(S2));
  56. end;
  57.  
  58. // utf8 case-insensitive compare callback function
  59. // http://wiki.freepascal.org/SQLite#Creating_user_defined_collations
  60. function UTF8xCompare_CIS ({%H-}user: pointer; len1: longint; data1: pointer; len2: longint; data2: pointer): longint; cdecl;
  61. var S1, S2: AnsiString;
  62. begin
  63.   SetString(S1, data1, len1);
  64.   SetString(S2, data2, len2);
  65.   Result := UnicodeCompareText(UTF8Decode(S1), UTF8Decode(S2));
  66. end;
  67.  
  68. // http://forum.lazarus.freepascal.org/index.php/topic,34259.msg224029.html#msg224029
  69. // http://forum.lazarus.freepascal.org/index.php/topic,34259.msg224152.html#msg224152
  70. procedure SQLite3Dataset_LanguageSupport (const _Handle: pointer);
  71. begin
  72.   if Assigned(_Handle)
  73.   then
  74.     begin
  75.       sqlite3_create_function
  76.         (_Handle,
  77.         'lower',
  78.         1,
  79.         SQLITE_UTF8 or SQLITE_DETERMINISTIC,
  80.         nil,
  81.         @UTF8xLower,
  82.         nil, nil);
  83.  
  84.       sqlite3_create_collation(_Handle, 'UTF8_CS', SQLITE_UTF8, nil, @UTF8xCompare_CS);
  85.       sqlite3_create_collation(_Handle, 'UTF8_CIS', SQLITE_UTF8, nil, @UTF8xCompare_CIS);
  86.     end
  87.   else raise Exception.Create('Unassigned handle in SQLite3Dataset_LanguageSupport');
  88. end;
  89.  
  90. initialization
  91.  
  92. finalization
  93.  
  94. end.
  95.  
« Last Edit: October 05, 2016, 06:35:13 am by totya »

LacaK

  • Hero Member
  • *****
  • Posts: 691
Re: Case insensitive LIKE in SQLite above ASCII [SOLVED]
« Reply #5 on: October 05, 2016, 08:44:59 am »
- In my country the accents chars order is important, for example the "a" smaller than "á" and so on, but the sorting method handle these chars with same chars. But it isn't a big problem, because the Excel works similar too. But I can create correct order, if I will modify the UTF8xCompare_CI function.

For me it works as expected ("a" is before "á").
How do you have modified UTF8xCompare_CI ?

totya

  • Hero Member
  • *****
  • Posts: 720
Re: Case insensitive LIKE in SQLite above ASCII [SOLVED]
« Reply #6 on: October 05, 2016, 07:05:11 pm »
- In my country the accents chars order is important, for example the "a" smaller than "á" and so on, but the sorting method handle these chars with same chars. But it isn't a big problem, because the Excel works similar too. But I can create correct order, if I will modify the UTF8xCompare_CI function.

For me it works as expected ("a" is before "á").
How do you have modified UTF8xCompare_CI ?

Hi!

Thank you for the answer!

My full unit you can see in my previous post... I didn't change anything on it.

My words test for example:

Quote
affe
ale
abd
ács
ábbas

The result after sorting (bad):

Quote
ábbas
abd
ács
affe
ale

See attached demp project.

But as I said, the result same as in MSExcel.

LacaK

  • Hero Member
  • *****
  • Posts: 691
Re: Case insensitive LIKE (and ORDER) in SQLite, above ASCII [SOLVED]
« Reply #7 on: October 07, 2016, 01:36:07 pm »
Probably in Hungarian collation is "a"="á" (in 1st level ordering)?

https://en.wikipedia.org/wiki/Alphabetical_order:
" In collating, accented vowels are equivalent with their non-accented counterparts and double and triple characters follow their single originals. Hungarian alphabetic order is: A=Á, B, C, Cs, D, Dz, Dzs, E=É, F, G, Gy, H, I=Í, J, K, L, Ly, M, N, Ny, O=Ó, Ö=Ő, P, Q, R, S, Sz, T, Ty, U=Ú, Ü=Ű, V, W, X, Y, Z, Zs."

https://en.wikipedia.org/wiki/Hungarian_alphabet#Alphabetical_ordering_.28collation.29
« Last Edit: October 07, 2016, 01:53:11 pm by LacaK »

totya

  • Hero Member
  • *****
  • Posts: 720
Re: Case insensitive LIKE (and ORDER) in SQLite, above ASCII [SOLVED]
« Reply #8 on: October 07, 2016, 03:22:22 pm »
Hi!

Thanks for the answer!

After I read your post, I searched on the web many times. Well, the rule, what you wrote, that's correct. (LOL!)

Rules in my language (the best trustworthy source)
Rule
Rule II

Otherwise as I see on forums, many peoples think the correct sort order in my language similar of this: ("a".."á" etc). I asked my younger sister, and two my friends, and both told me: words with "a" the first, and the second is with "á"...

But once again, you are right, because this is the rule, but this rule is very ugly and pointless, and hard to use.

Okay, but if I want own sort with "non-rule logic order" what is the shortest way?
Okay, I understand, this isn't SQL related question, this is UnicodeCompare(x) function question.
« Last Edit: October 07, 2016, 03:38:30 pm by totya »

LacaK

  • Hero Member
  • *****
  • Posts: 691
Re: Case insensitive LIKE (and ORDER) in SQLite, above ASCII [SOLVED]
« Reply #9 on: October 08, 2016, 05:47:52 pm »
But once again, you are right, because this is the rule, but this rule is very ugly and pointless, and hard to use.
But it is like your language ordering is officially defined. No sense to fight against it. ;D
If you want introduce your own ordering you can do it by comparing char by char like:

Code: Pascal  [Select][+][-]
  1. function UTF8xCompare_CI2 ({%H-}user: pointer; len1: longint; data1: pointer; len2: longint; data2: pointer): longint; cdecl;
  2. var S1, S2: UTF8String;
  3.     US1, US2: UnicodeString;
  4.     i,l: integer;
  5. begin
  6.   SetString(S1, data1, len1);
  7.   SetString(S2, data2, len2);
  8.   US1 := UTF8Decode(S1);
  9.   US2 := UTF8Decode(S2);
  10.   if Length(US1) < Length(US2) then
  11.     l := Length(US1)
  12.   else
  13.     l := Length(US2);
  14.   i := 1;
  15.   Result := 0;
  16.   while (Result=0) and (i <= l) do begin
  17.     Result := UnicodeCompareText(US1[i], US2[i]);
  18.     Inc(i);
  19.   end;
  20.   if Result = 0 then
  21.     Result := Length(US1) - Length(US2);
  22. end;
  23.  

totya

  • Hero Member
  • *****
  • Posts: 720
Re: Case insensitive LIKE (and ORDER) in SQLite, above ASCII [SOLVED]
« Reply #10 on: October 09, 2016, 08:57:37 pm »
Hi LacaK!

With your sample, I sucessfully created the own order method, and it's working correctly, but that wasn't too easy... My lucky, hungarian chars doesn't need utf8 or unicode codepage, so I convert string to my cp with UTF8ToCP1250, then I use (double) case with charcode to compare. Surprise, the sort isn't much slower, what I thinked before.

Your help in this topik  is invaluable, thank you very much for your very big help!

totya

  • Hero Member
  • *****
  • Posts: 720
Re: Case insensitive LIKE in SQLite above ASCII
« Reply #11 on: January 15, 2017, 05:44:46 pm »
Code: Pascal  [Select][+][-]
  1. procedure UTF8xLower(ctx: psqlite3_context; N: cint; V: ppsqlite3_value); cdecl;
  2. var S: AnsiString;
  3. begin
  4.   SetString(S, sqlite3_value_text(V[0]), sqlite3_value_bytes(V[0]));
  5.   S := UTF8Encode(AnsiLowerCase(UTF8Decode(S)));
  6.   sqlite3_result_text(ctx, PAnsiChar(S), Length(S), sqlite3_destructor_type(SQLITE_TRANSIENT));
  7. end;
  8.  

Hi!

Thanks again! I tested (revised/retested) this above code, and this line:

Code: Pascal  [Select][+][-]
  1. S := UTF8Encode(AnsiLowerCase(UTF8Decode(S)));

Slightly overcomplicated for me. This code equivalent with these simpler codes:

Code: Pascal  [Select][+][-]
  1. S:= UTF8LowerCase(S);

Code: Pascal  [Select][+][-]
  1. S:= AnsiLowerCase(S);

I tested it with the all of UTF8 chars under 100.000, where the lower and upper case is different, these are 828 UTF8 chars.

This code produce bad result certainly:

Code: Pascal  [Select][+][-]
  1. S:= LowerCase(S);
« Last Edit: January 15, 2017, 10:08:06 pm by totya »

 

TinyPortal © 2005-2018