PosEx variant for case-insensitive search

AlexTP

Hero Member
Posts: 2401

PosEx variant for case-insensitive search

« on: September 17, 2020, 09:37:10 pm »

PosEx is ASM based so it's very fast. (Uses IndexWord ASM based func.)
For CudaText, I need a variant with case-insensitive matching and WideChar/UnicodeString params.
It can avoid the WidestringManager by using a callback (CudaText has such a callback to do UpperCase/LowerCase for a WideChar; it doesn't use the WidestringManager, it uses a table lookup).
Please?

« Last Edit: September 17, 2020, 09:39:44 pm by Alextp »


CudaText editor - ATSynEdit - More from me

marcov

Administrator
Hero Member
Posts: 11451
FPC developer.

Re: PosEx variant for case-insensitive search

« Reply #1 on: September 17, 2020, 09:41:05 pm »

Quote from: Alextp on September 17, 2020, 09:37:10 pm

PosEx is ASM based so it's very fast. (Uses IndexWord ASM based func.)
For CudaText, I need variant with case-insensitive match, with WideChar/UnicodeString params.
Please?

Nope, it uses IndexByte.

For UnicodeString you would need to base it on IndexWord, but that assumes there is a single word-sized value to search for.

And this is hard because Unicode (and Unicode-based case sensitivity) is simply hard. There is no chance that such a version would even be in the same ballpark as the ASCII version.


AlexTP

Hero Member
Posts: 2401

Re: PosEx variant for case-insensitive search

« Reply #2 on: September 17, 2020, 11:23:36 pm »

Then we can use a trick: pass TWO UnicodeString params to PosExI (example name), str1 and str2 (the uppercase and lowercase forms of the search string). It is the app's job to prepare them; CudaText will prepare them using its table lookup.


ASBzone

Hero Member
Posts: 678
Automation leads to relaxation...

Re: PosEx variant for case-insensitive search

« Reply #3 on: September 18, 2020, 02:33:16 am »

Quote from: Alextp on September 17, 2020, 11:23:36 pm

Then we can use a trick: pass TWO UnicodeString params to PosExI (example name), str1 and str2 (the uppercase and lowercase forms of the search string). It is the app's job to prepare them; CudaText will prepare them using its table lookup.

Okay, but UPPERCASE and lowercase are only two options in the case-insensitive continuum. What about CamelCase, or jUsTmIxEdUpCaSe?


-ASB: https://www.BrainWaveCC.com/

Lazarus v2.2.7-ada7a90186 / FPC v3.2.3-706-gaadb53e72c
(Windows 64-bit install w/Win32 and Linux/Arm cross-compiles via FpcUpDeluxe on both instances)

My Systems: Windows 10/11 Pro x64 (Current)

CM630

Hero Member
Posts: 1091
I am not sure that I understand you.

Re: PosEx variant for case-insensitive search

« Reply #4 on: September 18, 2020, 09:12:48 am »

Just to mention:
In English the capital letter for „i“ is „I“.
In Turkish the capital letter for „i“ is „İ“, while the capital letter for „ı“ is „I“. This is the only exception that I am aware of; there might be hundreds.
So lowercase and uppercase might be problematic.
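
A minimal sketch of this concern, written in Python purely for illustration. It assumes Python's built-in, locale-independent `str.upper()`/`str.lower()` as a stand-in for whatever case table an application uses; `utf16_len` is a hypothetical helper that counts WideChars. It shows that the default mapping is not the Turkish one, and that some mappings change the string length.

```python
# Python's str.upper()/str.lower() apply locale-independent Unicode mappings,
# so they stand in here for a generic (non-Turkish) case table.

def utf16_len(s):
    """Number of UTF-16 code units (WideChars) needed to store s."""
    return len(s.encode("utf-16-le")) // 2

dotless_i = "\u0131"     # ı LATIN SMALL LETTER DOTLESS I
dotted_cap_i = "\u0130"  # İ LATIN CAPITAL LETTER I WITH DOT ABOVE

# Default mapping: both "i" and "ı" uppercase to plain "I".
print("i".upper() == "I", dotless_i.upper() == "I")

# Lowercasing "İ" gives "i" followed by U+0307 COMBINING DOT ABOVE,
# so one WideChar becomes two.
print([hex(ord(c)) for c in dotted_cap_i.lower()])

# "ß" uppercases to "SS", another mapping that changes the length.
print(utf16_len("ß"), utf16_len("ß".upper()))
```

A per-character table lookup that always returns exactly one WideChar sidesteps the length changes, at the cost of not handling such cases linguistically.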


Lazarus 3,2 32 bit (sometimes 64 bit); FPC3,2,2; rev: Lazarus_3_0 on Win10 64bit.

AlexTP

Hero Member
Posts: 2401

Re: PosEx variant for case-insensitive search

« Reply #5 on: September 18, 2020, 10:08:22 am »

Quote

>Okay, but UPPERCASE and lowercase are only two options in the case-insensitive continuum. What about CamelCase, or jUsTmIxEdUpCaSe?

PosExI will search WideChar by WideChar, using the chars from both str1 and str2. It requires Len(str1)=Len(str2) and compares each next char of the text against the pair str1_i and str2_i. If both comparisons are False, the char is a mismatch. Otherwise, the char is ok.
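
The pairwise comparison described above can be sketched as follows. This is only an illustration in Python, not the proposed RTL routine: `pos_ex_i` and `utf16_units` are hypothetical names, strings are modelled as lists of UTF-16 code units, and a plain loop stands in for any IndexWord-style ASM scan. Returning 0 for an empty needle is an assumption of this sketch.

```python
from typing import Sequence

def utf16_units(s):
    """Encode a Python string as a list of UTF-16 code units (WideChars)."""
    raw = s.encode("utf-16-le")
    return [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]

def pos_ex_i(upper: Sequence[int], lower: Sequence[int],
             text: Sequence[int], offset: int = 1) -> int:
    """Return the 1-based position of the first match at or after offset, or 0.

    upper and lower are the caller-prepared uppercase and lowercase forms
    of the needle; they must have the same length.
    """
    if len(upper) != len(lower):
        raise ValueError("upper and lower needles must have the same length")
    n = len(upper)
    if n == 0 or offset < 1:
        return 0
    for start in range(offset - 1, len(text) - n + 1):
        for i in range(n):
            unit = text[start + i]
            # A text unit is ok if it equals either case form at index i.
            if unit != upper[i] and unit != lower[i]:
                break
        else:
            return start + 1
    return 0

needle = "Pos"
up = utf16_units(needle.upper())
lo = utf16_units(needle.lower())
text = utf16_units("find pOsEx here")
print(pos_ex_i(up, lo, text))
```

Because every position accepts either case form independently, mixed-case text such as "pOs" matches without any extra work.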

« Last Edit: September 18, 2020, 10:11:38 am by Alextp »


AlexTP

Hero Member
Posts: 2401

Re: PosEx variant for case-insensitive search

« Reply #6 on: September 18, 2020, 10:10:59 am »

Quote

>In English the capital letter for „i“ is „I“. In Turkish the capital letter for „i“ is „İ“,

No, in Unicode we have a single result for UpperCase(wchar).


Thaddy

Hero Member
Posts: 14371
Sensorship about opinions does not belong here.

Re: PosEx variant for case-insensitive search

« Reply #7 on: September 18, 2020, 10:17:28 am »

Quote from: Alextp on September 18, 2020, 10:10:59 am

Quote
>In English the capital letter for „i“ is „I“. In Turkish the capital letter for „i“ is „İ“,
No, in Unicode we have single result for UpperCase(wchar).

No, wchar does not expand to unicodechar by itself, so that only partially works (the UCS2 subset of UTF16, afaik).

« Last Edit: September 18, 2020, 10:22:04 am by Thaddy »


Object Pascal programmers should get rid of their "component fetish" especially with the non-visuals.

AlexTP

Hero Member
Posts: 2401

Re: PosEx variant for case-insensitive search

« Reply #8 on: September 18, 2020, 10:20:32 am »

If wchar is not in the Unicode surrogate range (my code has functions IsCharSurrogateLow/...High), then it maps directly to a unicodechar. If it is, we need the next wchar2 to make a unicodechar from the 2 wchars.
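
A minimal sketch of that rule, in Python for illustration only. The function names here are hypothetical and are not the CudaText functions mentioned above; passing unpaired surrogates through unchanged is an assumption of this sketch.

```python
def is_high_surrogate(unit):
    """True for the first (leading) unit of a surrogate pair."""
    return 0xD800 <= unit <= 0xDBFF

def is_low_surrogate(unit):
    """True for the second (trailing) unit of a surrogate pair."""
    return 0xDC00 <= unit <= 0xDFFF

def decode_utf16(units):
    """Turn a list of UTF-16 code units into a list of Unicode code points."""
    out = []
    i = 0
    while i < len(units):
        u = units[i]
        if (is_high_surrogate(u) and i + 1 < len(units)
                and is_low_surrogate(units[i + 1])):
            # Combine the two 10-bit halves and add the 0x10000 offset.
            out.append(0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00))
            i += 2
        else:
            # BMP unit (or an unpaired surrogate, passed through as-is).
            out.append(u)
            i += 1
    return out

print([hex(cp) for cp in decode_utf16([0x41, 0xD83D, 0xDE00])])
```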


Thaddy

Hero Member
Posts: 14371
Sensorship about opinions does not belong here.

Re: PosEx variant for case-insensitive search

« Reply #9 on: September 18, 2020, 10:26:14 am »

Quote from: Alextp on September 18, 2020, 10:20:32 am

If wchar is not in the Unicode surrogate range (my code has functions IsCharSurrogateLow/...High), then it maps directly to a unicodechar. If it is, we need the next wchar2 to make a unicodechar from the 2 wchars.

Maybe UTF32 is an option, because it maps to everything (including both UTF8 and UTF16). It is expensive in space but cheap in compute.


AlexTP

Hero Member
Posts: 2401

Re: PosEx variant for case-insensitive search

« Reply #10 on: September 18, 2020, 10:30:37 am »

No problem with my idea about str1+str2 (of the same Len). If we have a surrogate pair in str1, we must have the same surrogate pair in str2 (because Uppercase/Lowercase for a surrogate pair doesn't change it, AFAIK).
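
The AFAIK can be checked with a quick sketch, in Python for illustration and assuming its built-in Unicode case mappings (an application's own table may differ). A few scripts outside the BMP, Deseret among them, do have case pairs, so the surrogate pair can change. The length in WideChars stays the same in this example, so the Len(str1)=Len(str2) requirement still holds. `utf16_units` is a hypothetical helper.

```python
def utf16_units(s):
    """Encode a Python string as a list of UTF-16 code units (WideChars)."""
    raw = s.encode("utf-16-le")
    return [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]

small = "\U00010428"    # DESERET SMALL LETTER LONG I
capital = "\U00010400"  # DESERET CAPITAL LETTER LONG I

# The two letters are a case pair under the default Unicode mapping.
print(small.upper() == capital)

# Both are surrogate pairs of the same length; only the low surrogate differs.
print([hex(u) for u in utf16_units(small)])
print([hex(u) for u in utf16_units(capital)])
```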
