
Author Topic: My NaturalSort unit  (Read 13521 times)

JD

  • Hero Member
  • *****
  • Posts: 1848
Re: My NaturalSort unit
« Reply #15 on: May 25, 2015, 11:08:39 am »
Thanks a lot for your good work and for sharing it, Typo. Can you please add sorting of IP addresses to it?
IP addresses are nothing more than numbers, so they should sort correctly.
O:-)
Windows - Lazarus 2.1/FPC 3.2 (built using fpcupdeluxe),
Linux Mint - Lazarus 2.1/FPC 3.2 (built using fpcupdeluxe)

mORMot; Zeos 8; SQLite, PostgreSQL & MariaDB; VirtualTreeView

typo

  • Hero Member
  • *****
  • Posts: 3051
Re: My NaturalSort unit
« Reply #16 on: May 25, 2015, 12:57:15 pm »
IP addresses conflict with thousand-separated or decimal numbers. Could you please post a short sample list of the IP addresses you use (about 10 items)?
« Last Edit: May 25, 2015, 01:24:57 pm by typo »

taazz

  • Hero Member
  • *****
  • Posts: 5368
Re: My NaturalSort unit
« Reply #17 on: May 25, 2015, 01:30:49 pm »
Quote from: typo
IP addresses conflict with thousand-separated or decimal numbers. Could you please post a short sample list of the IP addresses you use (about 10 items)?

There should not be any support for thousand separators or decimal points in natural compare. For that matter, the specification, as far as I remember, does not even support numbers of different lengths; the zero padding is an extension. So when anything other than a digit is found, the numeric check should revert to string comparison. Strings are not a database, after all.
Good judgement is the result of experience … Experience is the result of bad judgement.

OS : Windows 7 64 bit
Laz: Lazarus 1.4.4 FPC 2.6.4 i386-win32-win32/win64

typo

  • Hero Member
  • *****
  • Posts: 3051
Re: My NaturalSort unit
« Reply #18 on: May 25, 2015, 01:36:03 pm »
Quote from: taazz
There should not be any support for thousand separators or decimal points in natural compare. For that matter, the specification, as far as I remember, does not even support numbers of different lengths; the zero padding is an extension. So when anything other than a digit is found, the numeric check should revert to string comparison. Strings are not a database, after all.

This is a bonus.

This list should sort as expected:

10.145.254.9
10.145.255.9
10.145.255.10
10.146.254.9
121.243.100.0
255.255.255.254

taazz

  • Hero Member
  • *****
  • Posts: 5368
Re: My NaturalSort unit
« Reply #19 on: May 25, 2015, 01:46:29 pm »
Quote from: typo
This is a bonus.

Yes, it is. It was implemented as an extension to make things a bit more natural. It was not meant to be a fuzzy grammar parser. For more accurate results you need stricter naming rules and custom compare routines.

Quote from: typo
This list should sort as expected:

10.145.254.9
10.145.255.9
10.145.255.10
10.146.254.9
121.243.100.0
255.255.255.254

Of course, the variable-length numeric extension makes sure of that.
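
A minimal sketch of how comparing variable-length digit runs gives that order, assuming each run of digits is treated as an unsigned integer and everything else is compared character by character. NaturalCompare and CompareItems are hypothetical names for illustration and are not taken from the NaturalSort unit; decimal and thousand separators are deliberately ignored here:

```pascal
program NaturalIpSketch;
{$mode objfpc}{$H+}

uses
  Classes, SysUtils;

// Hypothetical helper: compares A and B, treating every run of digits
// as an unsigned integer of arbitrary length.
function NaturalCompare(const A, B: string): Integer;
var
  i, j, si, sj, lenA, lenB: Integer;
begin
  i := 1;
  j := 1;
  while (i <= Length(A)) and (j <= Length(B)) do
  begin
    if (A[i] in ['0'..'9']) and (B[j] in ['0'..'9']) then
    begin
      // Leading zeros do not change the numeric value, so skip them.
      while (i <= Length(A)) and (A[i] = '0') do Inc(i);
      while (j <= Length(B)) and (B[j] = '0') do Inc(j);
      si := i;
      sj := j;
      while (i <= Length(A)) and (A[i] in ['0'..'9']) do Inc(i);
      while (j <= Length(B)) and (B[j] in ['0'..'9']) do Inc(j);
      lenA := i - si;
      lenB := j - sj;
      // A longer digit run (without leading zeros) is the larger number.
      if lenA <> lenB then
        Exit(lenA - lenB);
      // Same length: plain string order equals numeric order.
      Result := CompareStr(Copy(A, si, lenA), Copy(B, sj, lenB));
      if Result <> 0 then
        Exit;
    end
    else
    begin
      // Non-numeric part: ordinary character comparison.
      if A[i] <> B[j] then
        Exit(Ord(A[i]) - Ord(B[j]));
      Inc(i);
      Inc(j);
    end;
  end;
  // Whichever string still has characters left sorts last.
  Result := (Length(A) - i) - (Length(B) - j);
end;

// Adapter with the signature expected by TStringList.CustomSort.
function CompareItems(List: TStringList; Index1, Index2: Integer): Integer;
begin
  Result := NaturalCompare(List[Index1], List[Index2]);
end;

var
  Lines: TStringList;
begin
  Lines := TStringList.Create;
  try
    Lines.Add('255.255.255.254');
    Lines.Add('10.145.255.10');
    Lines.Add('121.243.100.0');
    Lines.Add('10.145.255.9');
    Lines.Add('10.146.254.9');
    Lines.Add('10.145.254.9');
    Lines.CustomSort(@CompareItems);
    Write(Lines.Text);
  finally
    Lines.Free;
  end;
end.
```

Because the dot is handled as an ordinary character, 10.145.255.9 comes before 10.145.255.10: the last digit runs have lengths 1 and 2, so the shorter one is the smaller number.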

typo

  • Hero Member
  • *****
  • Posts: 3051
Re: My NaturalSort unit
« Reply #20 on: May 25, 2015, 01:59:00 pm »
@JD

If you handle large IP lists, maybe you need a fast deduplicator too:

http://wiki.lazarus.freepascal.org/LazUtils#TDictionaryStringList

TDictionaryStringList dedupes string lists without changing their order. The lists don't need to be sorted in order to be deduped. It is in the LazUtils directory of your Lazarus installation.
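
A minimal sketch of order-preserving deduplication using only the Classes unit. It is a stand-in for illustration and does not use TDictionaryStringList itself; DedupeKeepOrder is a hypothetical helper, and a sorted TStringList serves as the lookup set (binary search rather than a hash dictionary):

```pascal
program DedupeSketch;
{$mode objfpc}{$H+}

uses
  Classes;

// Hypothetical helper: removes repeated lines from List in place,
// keeping the first occurrence of each line in its original position.
procedure DedupeKeepOrder(List: TStrings);
var
  Seen: TStringList;
  i, Dummy: Integer;
begin
  Seen := TStringList.Create;
  try
    Seen.CaseSensitive := True;
    Seen.Sorted := True;  // lets Find use a binary search
    i := 0;
    while i < List.Count do
      if Seen.Find(List[i], Dummy) then
        List.Delete(i)     // already seen: drop it, stay on the same index
      else
      begin
        Seen.Add(List[i]); // first occurrence: remember it and move on
        Inc(i);
      end;
  finally
    Seen.Free;
  end;
end;

var
  Lines: TStringList;
begin
  Lines := TStringList.Create;
  try
    Lines.Add('10.145.255.9');
    Lines.Add('10.145.254.9');
    Lines.Add('10.145.255.9');   // duplicate
    Lines.Add('121.243.100.0');
    Lines.Add('10.145.254.9');   // duplicate
    DedupeKeepOrder(Lines);
    Write(Lines.Text);           // three lines remain, in first-seen order
  finally
    Lines.Free;
  end;
end.
```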
« Last Edit: May 25, 2015, 02:24:16 pm by typo »

typo

  • Hero Member
  • *****
  • Posts: 3051
Re: My NaturalSort unit
« Reply #21 on: May 25, 2015, 08:37:36 pm »

JD

  • Hero Member
  • *****
  • Posts: 1848
Re: My NaturalSort unit
« Reply #22 on: May 25, 2015, 09:22:13 pm »
Quote from: typo
@JD

If you handle large IP lists, maybe you need a fast deduplicator too:

http://wiki.lazarus.freepascal.org/LazUtils#TDictionaryStringList

TDictionaryStringList dedupes string lists without changing their order. The lists don't need to be sorted in order to be deduped. It is in the LazUtils directory of your Lazarus installation.

Thanks for letting me know about this. That said, I don't need to sort large IP lists, just about 15 to 20 IP addresses, and the function I posted earlier does the job. I'll test the IPs with your NaturalSort unit.

JD
« Last Edit: May 25, 2015, 09:25:41 pm by JD »

typo

  • Hero Member
  • *****
  • Posts: 3051
Re: My NaturalSort unit
« Reply #23 on: May 26, 2015, 01:19:10 am »
The main problem is that there is a conflict with thousand separators in my language (DecimalSeparator = comma and ThousandSeparator = dot).

Maybe the change below can solve the problem:

Code:
while (pch^ <> #0) and (IsNumber(pch^) or
    ((not FoundDecSeparator) and ((pch^ = ADecSeparator)
    {$IFDEF THOUSAND_SEPARATORS_SUPPORT}
      or ((pch^ = AThousandSeparator) and IsThousand(pch))
    {$ENDIF}
    ))) do

But with this change, support for thousand separators is gone for my decimal system.

For my decimal system, thousand-separator support rules out subtopics greater than 99 in topic enumerators, and IP addresses too, because it is impossible to distinguish them from thousands groups.
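
A minimal sketch of that ambiguity, assuming a thousands group is recognised as a separator followed by exactly three digits. LooksLikeThousandGroup is a hypothetical check written for illustration; it is not the unit's IsThousand, whose body is not shown in this thread:

```pascal
program ThousandAmbiguitySketch;
{$mode objfpc}{$H+}

// Hypothetical check: True when the separator at position SepPos in S
// is followed by exactly three digits (and not a fourth one).
function LooksLikeThousandGroup(const S: string; SepPos: Integer): Boolean;
var
  k: Integer;
begin
  Result := False;
  for k := 1 to 3 do
    if (SepPos + k > Length(S)) or not (S[SepPos + k] in ['0'..'9']) then
      Exit;
  // A fourth digit means this is not a three-digit group.
  Result := (SepPos + 4 > Length(S)) or not (S[SepPos + 4] in ['0'..'9']);
end;

begin
  WriteLn(LooksLikeThousandGroup('1.234', 2));         // TRUE: a real thousands group
  WriteLn(LooksLikeThousandGroup('10.145.254.9', 3));  // TRUE: an IP octet with the same shape
  WriteLn(LooksLikeThousandGroup('2.100', 2));         // TRUE: topic 2, subtopic 100
  WriteLn(LooksLikeThousandGroup('2.99', 2));          // FALSE: subtopic 99 is safe
end.
```

With dot as the thousand separator, the first three inputs are indistinguishable by shape alone, which is why subtopics above 99 and IP addresses both break once thousands support is enabled.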

http://en.wikipedia.org/wiki/Decimal_mark#/media/File:DecimalSeparator.svg

The same decimal and thousand separators as my country uses are used in Europe, Latin America, Africa, Canada and New Zealand.

According to my tests, the THOUSAND_SEPARATORS_SUPPORT define can coexist with decimal systems where dot is the decimal separator and comma is the thousand separator, but only for topic enumerators, not for IP addresses.

Still under test, anyway.
« Last Edit: May 26, 2015, 05:02:45 am by typo »

JD

  • Hero Member
  • *****
  • Posts: 1848
Re: My NaturalSort unit
« Reply #24 on: May 26, 2015, 09:57:41 am »
Quote from: typo
The main problem is that there is a conflict with thousand separators in my language (DecimalSeparator = comma and ThousandSeparator = dot).

I've got the same problem as you have. My language is French. That is why I was interested in how NaturalSort works with floating point numbers where our DecimalSeparator = comma and our ThousandSeparator = space.

JD
« Last Edit: May 26, 2015, 01:56:42 pm by JD »

typo

  • Hero Member
  • *****
  • Posts: 3051
Re: My NaturalSort unit
« Reply #25 on: May 26, 2015, 01:07:16 pm »
Well, I think that in this case there is no conflict and no change is required. To be tested.

First tests:
1) French system (space-comma) is OK with IPs, but FAILS with topic enumerators greater than 99
2) US system (comma-dot) FAILS with IPs
3) Brazilian system (dot-comma) FAILS with topic enumerators greater than 99

Also, the space character is usually replaced by an underscore in URLs. In that case the recognition of thousands is broken.
« Last Edit: May 26, 2015, 03:14:47 pm by typo »

typo

  • Hero Member
  • *****
  • Posts: 3051
Re: My NaturalSort unit
« Reply #26 on: May 26, 2015, 05:37:49 pm »
I have radically simplified the unit and now keep support for integers, floats and alphanumeric only.
« Last Edit: May 26, 2015, 05:39:46 pm by typo »

JD

  • Hero Member
  • *****
  • Posts: 1848
Re: My NaturalSort unit
« Reply #27 on: May 26, 2015, 05:43:15 pm »
Quote from: typo
Well, I think that in this case there is no conflict and no change is required. To be tested.

First tests:
1) French system (space-comma) is OK with IPs, but FAILS with topic enumerators greater than 99
2) US system (comma-dot) FAILS with IPs
3) Brazilian system (dot-comma) FAILS with topic enumerators greater than 99

Also, the space character is usually replaced by an underscore in URLs. In that case the recognition of thousands is broken.

Our IPs are the standard IPs used worldwide :D. The space-comma format is only for monetary values.

rvk

  • Hero Member
  • *****
  • Posts: 6110
Re: My NaturalSort unit
« Reply #28 on: May 26, 2015, 05:47:31 pm »
Next up.... IPv6 support? :o :D

taazz

  • Hero Member
  • *****
  • Posts: 5368
Re: My NaturalSort unit
« Reply #29 on: May 26, 2015, 05:50:09 pm »
Quote from: rvk
Next up.... IPv6 support? :o :D

No, human genome support.  :P

 
