
Author Topic: ASCII characters Questions  (Read 4861 times)

JLWest

  • Hero Member
  • *****
  • Posts: 1293
Re: ASCII characters Questions
« Reply #15 on: May 11, 2019, 07:01:00 pm »
An ASCIIfy function, yeah, that's what I need.
I'm really surprised that FPC doesn't have this built in.

I have to convert 64,000 x 7 (64,000 lines of text with 7 fields in each line) to a standard form so I can compare them to a file with verified data.
FPC 3.2.0, Lazarus IDE v2.0.4
 Windows 10 Pro 32-GB
 Intel i7 770K CPU 4.2GHz 32702MB Ram
GeForce GTX 1080 Graphics - 8 Gig
4.1 TB

Bart

  • Hero Member
  • *****
  • Posts: 5275
    • Bart en Mariska's Webstek
Re: ASCII characters Questions
« Reply #16 on: May 11, 2019, 07:11:51 pm »
Quote from: JLWest
S : String = 'ÄÖÜß   ñâ'
I would get:        C = 'AOUB na'

Why would you want to convert a lowercase beta (or is it a German Ringel-S? I can't tell just from looking at it) to an uppercase B?
A bit inconsistent IMHO.

None of this is going to work, of course, if you do not know beforehand what encoding the original string (or file) is using.
GuessEncoding in the lconvencoding unit (part of Lazarus) may be of help here.

Bart

Bart

  • Hero Member
  • *****
  • Posts: 5275
    • Bart en Mariska's Webstek
Re: ASCII characters Questions
« Reply #17 on: May 11, 2019, 07:38:44 pm »
Assigning the input string to a string with codepage CP_ASCII does some of the magic for you.

Code: Pascal
program test;

{$ifdef fpc}
{$mode objfpc}
{$h+}
{$endif}

type
  // AnsiString whose declared code page is 7-bit ASCII
  AsciiString = type AnsiString(CP_ASCII);

var
  A: AsciiString;
  S: String;

begin
  // Keep converting input lines until an empty line is entered
  repeat
    write('S: ');
    readln(S);
    A := S;  // the implicit code page conversion happens on this assignment
    writeln('A -> ', A);
  until S = '';
end.

Code:
C:\Users\Bart\LazarusProjecten\ConsoleProjecten>test
S: 123
A -> 123
S: äëïöü
A -> aeiou
S: ÄËÏÖÜ
A -> AEIOU

Bart

Zoran

  • Hero Member
  • *****
  • Posts: 1829
    • http://wiki.lazarus.freepascal.org/User:Zoran
Re: ASCII characters Questions
« Reply #18 on: May 11, 2019, 09:11:52 pm »
Bart, how nice and simple!
So, let's test the following ASCIIfy function: :)

Code: Pascal
function ASCIIfy(const S: AnsiString): AnsiString;
begin
  Result := S;
  // Convert = True re-encodes the string data into the new code page
  SetCodePage(RawByteString(Result), CP_ASCII, True);
end;

It seems to work quite well at stripping the accents from Latin letters, but the German ß, as well as Greek and Cyrillic letters, are just turned into question marks:
ASCIIfy('ÄÖÜß ñâ ŽĐŁ čćš αβγδε абвгд') returns 'AOU? na ZDL ccs ????? ?????'

So, if you expect only accented Latin letters, it is good enough. For every other character you have to add code. Here is an example which turns the German ß into ss and the Cyrillic lowercase г into the Latin g:

Code: Pascal
// StringReplace and rfReplaceAll come from the SysUtils unit
function ASCIIfy(const S: AnsiString): AnsiString;
begin
  Result := S;

  // You need to replace all but accented Latin letters one by one:
  Result := StringReplace(Result, 'ß', 'ss', [rfReplaceAll]); // ß will become ss
  Result := StringReplace(Result, 'г', 'g', [rfReplaceAll]); // г will become g
  // etc.

  SetCodePage(RawByteString(Result), CP_ASCII, True);
end;

Now:
ASCIIfy('ÄÖÜß ñâ ŽĐŁ čćš αβγδε абвгд') returns 'AOUss na ZDL ccs ????? ???g?'
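
Once the list of special cases grows, the one-by-one replacements could be driven from a table instead. This is only a minimal sketch: `TReplacement`, `Replacements` and `ReplaceSpecials` are made-up names, and the table assumes the input is UTF-8, so the source characters are written as explicit UTF-8 byte sequences to keep the example independent of the source file encoding. The result would still go through the SetCodePage step shown above.

```pascal
program asciitable;

{$mode objfpc}{$H+}

uses
  SysUtils;

type
  // Hypothetical helper type: one entry per character that needs manual mapping
  TReplacement = record
    Utf8: string;   // UTF-8 byte sequence of the source character
    Ascii: string;  // ASCII text to substitute
  end;

const
  // Illustrative table only; extend it with the characters found in your data
  Replacements: array[0..1] of TReplacement = (
    (Utf8: #$C3#$9F; Ascii: 'ss'),  // German sharp s, U+00DF
    (Utf8: #$D0#$B3; Ascii: 'g')    // Cyrillic small ghe, U+0433
  );

// Apply every table entry to S; the search is byte-wise and case-sensitive
function ReplaceSpecials(const S: string): string;
var
  i: Integer;
begin
  Result := S;
  for i := Low(Replacements) to High(Replacements) do
    Result := StringReplace(Result, Replacements[i].Utf8,
      Replacements[i].Ascii, [rfReplaceAll]);
end;

begin
  // 'Stra' + sharp s + 'e', with the sharp s given as its UTF-8 bytes
  WriteLn(ReplaceSpecials('Stra'#$C3#$9F'e'));  // prints Strasse
end.
```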

jamie

  • Hero Member
  • *****
  • Posts: 6091
Re: ASCII characters Questions
« Reply #19 on: May 11, 2019, 09:16:17 pm »
OK,
there is also the reverse function:

UTF8ToWinCP(YourUTF8String): String, etc.

The only true wisdom is knowing you know nothing

JLWest

  • Hero Member
  • *****
  • Posts: 1293
Re: ASCII characters Questions
« Reply #20 on: May 11, 2019, 09:41:57 pm »
Quote from: Bart on May 11, 2019, 07:11:51 pm
S : String = 'ÄÖÜß   ñâ'
I would get:        C = 'AOUB na'

Why would you want to convert a lowercase beta (or is it a German Ringel-S? I can't tell just from looking at it) to an uppercase B?
A bit inconsistent IMHO.

None of this is going to work, of course, if you do not know beforehand what encoding the original string (or file) is using.
GuessEncoding in the lconvencoding unit (part of Lazarus) may be of help here.

Bart

That was an example I used to explain what I wanted to do. I think it's UTF8.

I had a text file with 7.9 million lines of data. We managed to load it into an array. That file was UTF8.

From that I extracted some 45,000 +/- records. Some of the fields in the records have the odd characters, e.g. É or Ö. The fields consist of city and country names. From another source I have a list of 55,000 cities and 265 countries. They are not UTF8. When I create the final data set I need to verify it against the 55,000 cities and 265 countries.

Record 1:
[KPHX][12548][Renòiò][Perû][SB][41.5101654][-125.4156] <-- Just an example

Convert to:
Record 2:
[KPHX][12548][Renoio][Peru][SB][41.5101654][-125.4156]

Verify fields 1, 3, 4, and 5 against a data set which has:

Peru
Renoio
United States
SB
K1
Mexico
Canada
KPHX
KJFK

If a field can't be validated, replace it with 'Nil' and save the record for hand editing.
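
The validation step described here could look roughly like the sketch below. It assumes the bracketed record layout from the example above and a verified list that is already loaded into a TStringList; `SplitFields` and `ValidateRecord` are hypothetical names, and 'Atlantis' is just a made-up city that is not in the list.

```pascal
program validate;

{$mode objfpc}{$H+}

uses
  Classes, SysUtils;

// Split a line such as '[a][b][c]' into its bracketed fields
procedure SplitFields(const Line: string; Fields: TStrings);
var
  i, Start: Integer;
begin
  Fields.Clear;
  Start := 0;
  for i := 1 to Length(Line) do
    if Line[i] = '[' then
      Start := i + 1
    else if (Line[i] = ']') and (Start > 0) then
    begin
      Fields.Add(Copy(Line, Start, i - Start));
      Start := 0;
    end;
end;

// Replace fields 1, 3, 4 and 5 with 'Nil' when they are not in Known,
// then rebuild the bracketed line
function ValidateRecord(const Line: string; Known: TStrings): string;
const
  CheckedFields: array[0..3] of Integer = (1, 3, 4, 5);  // 1-based, as in the post
var
  Fields: TStringList;
  i, Idx: Integer;
begin
  Result := '';
  Fields := TStringList.Create;
  try
    SplitFields(Line, Fields);
    for i := Low(CheckedFields) to High(CheckedFields) do
    begin
      Idx := CheckedFields[i] - 1;
      if (Idx < Fields.Count) and (Known.IndexOf(Fields[Idx]) < 0) then
        Fields[Idx] := 'Nil';
    end;
    for i := 0 to Fields.Count - 1 do
      Result := Result + '[' + Fields[i] + ']';
  finally
    Fields.Free;
  end;
end;

var
  Known: TStringList;
begin
  Known := TStringList.Create;
  try
    Known.CaseSensitive := True;
    Known.Add('Peru');
    Known.Add('Renoio');
    Known.Add('SB');
    Known.Add('KPHX');
    Known.Sorted := True;  // sorted list, so IndexOf uses a binary search
    // Every checked field is known, so the line comes back unchanged
    WriteLn(ValidateRecord('[KPHX][12548][Renoio][Peru][SB][41.5101654][-125.4156]', Known));
    // 'Atlantis' is not in the list, so field 3 becomes Nil
    WriteLn(ValidateRecord('[KPHX][12548][Atlantis][Peru][SB][41.5101654][-125.4156]', Known));
  finally
    Known.Free;
  end;
end.
```

The comparison is exact and case-sensitive, so it only makes sense after both sides have been converted to the same encoding and form.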


   

lucamar

  • Hero Member
  • *****
  • Posts: 4219
Re: ASCII characters Questions
« Reply #21 on: May 11, 2019, 10:22:06 pm »
I think you would be better off using UTF8 as your "standard" and converting everything that is not UTF8 to it: that way you will run a much lower risk of losing any information through conversion.

And it has the advantage of being Lazarus-friendly :)
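
A minimal sketch of that direction, assuming the LazUtils package (which ships with Lazarus) is available to the project. GuessEncoding, ConvertEncoding and EncodingUTF8 come from its LConvEncoding unit; `ToUTF8` is just a made-up wrapper name. GuessEncoding is a heuristic, so if you already know the encoding of the non-UTF8 list it is safer to pass that name directly.

```pascal
program toutf8;

{$mode objfpc}{$H+}

uses
  LConvEncoding;  // part of LazUtils; add LazUtils as a project dependency

// Convert text of unknown encoding to UTF-8, relying on a best-effort guess
function ToUTF8(const Raw: string): string;
begin
  Result := ConvertEncoding(Raw, GuessEncoding(Raw), EncodingUTF8);
end;

begin
  // Plain ASCII is the same in UTF-8, so it should pass through unchanged
  WriteLn(ToUTF8('Peru'));
end.
```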
Turbo Pascal 3 CP/M - Amstrad PCW 8256 (512 KB !!!) :P
Lazarus/FPC 2.0.8/3.0.4 & 2.0.12/3.2.0 - 32/64 bits on:
(K|L|X)Ubuntu 12..18, Windows XP, 7, 10 and various DOSes.

JLWest

  • Hero Member
  • *****
  • Posts: 1293
Re: ASCII characters Questions
« Reply #22 on: May 12, 2019, 02:41:14 am »
Yeah, you might be right.

I don't think there is much danger of losing data. It would run just in this program.
                     {City}    {Cty} {RG}
[KPHX][12548][Renoio][Peru][SB][41.5101654][-125.4156]

This is one of the finished products of this program.

However, it could be:   [KPHX][12548][Nil][Nil][Nil][41.5101654][-125.4156]  because the city and country didn't validate and the region code wasn't entered.

It's an old record. Now they require region and country. Not city; some airports are in the wilderness.

When these records are generated I will have 9,000 +/- complete 7-field records and about 26,200
with 1 to 3 Nils in each record.

If the Nil is in the country field but the record has the region 'K1', the country is United States.
'LE' is Spain.
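
That rule is a plain lookup. A small sketch, using the Name=Value support of TStringList; only the K1 and LE pairs come from this post, and `RegionToCountry` and `CountryFor` are made-up names.

```pascal
program regioncountry;

{$mode objfpc}{$H+}

uses
  Classes;

var
  RegionToCountry: TStringList;  // entries of the form 'Region=Country'

// Return the country implied by the region when the country field is 'Nil';
// otherwise return the country unchanged
function CountryFor(const Region, Country: string): string;
begin
  Result := Country;
  if (Country = 'Nil') and (RegionToCountry.IndexOfName(Region) >= 0) then
    Result := RegionToCountry.Values[Region];
end;

begin
  RegionToCountry := TStringList.Create;
  try
    RegionToCountry.Add('K1=United States');
    RegionToCountry.Add('LE=Spain');
    WriteLn(CountryFor('K1', 'Nil'));   // prints United States
    WriteLn(CountryFor('LE', 'Nil'));   // prints Spain
    WriteLn(CountryFor('SB', 'Nil'));   // prints Nil, SB is not in the table
    WriteLn(CountryFor('SB', 'Peru'));  // prints Peru, already filled in
  finally
    RegionToCountry.Free;
  end;
end.
```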

No region or country but a city (a rare occurrence): I can search for the city in the group of 9,000 records, which all have all their fields filled out. If found, I know the country and maybe the region. (Some countries have multiple regions.)

Three Nil fields: difficult but not impossible. I have written and tested a program that will do the haversine distance from any location to any location.

All records must have ICAO, hash, lat and lon info. Pick one of the records with 3 Nils and do a haversine against the 9,000 good records. You can set a distance, say 10 miles.
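
For readers who don't have such a program at hand, the haversine great-circle distance can be sketched as below. This is not the program mentioned above, just the textbook formula; the mean Earth radius of 3958.8 miles is an assumed constant and `HaversineMiles` is a made-up name.

```pascal
program haversine;

{$mode objfpc}{$H+}

uses
  Math;

const
  EarthRadiusMiles = 3958.8;  // assumed mean Earth radius in statute miles

// Great-circle distance in miles between two points given in decimal degrees
function HaversineMiles(Lat1, Lon1, Lat2, Lon2: Double): Double;
var
  dLat, dLon, a: Double;
begin
  dLat := DegToRad(Lat2 - Lat1);
  dLon := DegToRad(Lon2 - Lon1);
  a := Sqr(Sin(dLat / 2)) +
       Cos(DegToRad(Lat1)) * Cos(DegToRad(Lat2)) * Sqr(Sin(dLon / 2));
  if a > 1 then
    a := 1;  // guard against rounding error before taking ArcSin
  Result := 2 * EarthRadiusMiles * ArcSin(Sqrt(a));
end;

begin
  // One degree of longitude along the equator
  WriteLn(HaversineMiles(0, 0, 0, 1):0:2);  // prints 69.09
end.
```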

If airport 'A' and airport 'B' are within 10 miles of one another, they are probably in the same country, maybe in the same region.

You can also run 'A' and 'B' against the 'Middle Latitudes and Longitudes'. These are airports in the center of each country. For the US it's Kansas City, for Spain it's Madrid.

Adjust the distance to 400 miles. Any record that gets a haversine distance within 400 miles of Madrid would be in Spain, with a region code of LE. 100 miles from Kansas City is in the U.S. but maybe not in a K1 region.

City: any airport within 5 miles of a city would get the city name in the city field.

You start very small, say 5 miles against the 'Middle Latitudes and Longitudes', and run the 9,000 against the 26,000. Reload the 9,000 and it might be 9,400. Widen the distance, say to 25 miles, and go again. Reload and go to 65 miles. After about 4 runs the 9,000 will grow to 14,000 and the 26,000 will shrink to 21,000. Then you go back to 25 miles. Because you have a large sampling of good records, you can get within 1, 2 or 4 miles of known cities and airports. The Phoenix area has 9 airports.
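
The run, reload and widen loop can be sketched as follows. Everything here is illustrative: the three airports and their coordinates are invented, only the country field is propagated, and `TAirport`, `FillPass` and the radii are made-up names and values. Records filled during a pass only start acting as references in the next pass, which stands in for the "reload".

```pascal
program propagate;

{$mode objfpc}{$H+}

uses
  Math;

type
  TAirport = record
    ICAO: string;
    Lat, Lon: Double;
    Country: string;  // 'Nil' when unknown
  end;

const
  EarthRadiusMiles = 3958.8;  // assumed mean Earth radius in statute miles

function HaversineMiles(Lat1, Lon1, Lat2, Lon2: Double): Double;
var
  dLat, dLon, a: Double;
begin
  dLat := DegToRad(Lat2 - Lat1);
  dLon := DegToRad(Lon2 - Lon1);
  a := Sqr(Sin(dLat / 2)) +
       Cos(DegToRad(Lat1)) * Cos(DegToRad(Lat2)) * Sqr(Sin(dLon / 2));
  if a > 1 then
    a := 1;
  Result := 2 * EarthRadiusMiles * ArcSin(Sqrt(a));
end;

// One pass: each 'Nil' airport takes the country of the nearest airport that
// was already known before this pass and lies within MaxMiles.
// Returns the number of records filled in.
function FillPass(var Airports: array of TAirport; MaxMiles: Double): Integer;
var
  i, j, Best: Integer;
  d, BestDist: Double;
  WasKnown: array of Boolean;
begin
  Result := 0;
  SetLength(WasKnown, Length(Airports));
  for i := 0 to High(Airports) do
    WasKnown[i] := Airports[i].Country <> 'Nil';
  for i := 0 to High(Airports) do
  begin
    if WasKnown[i] then
      Continue;
    Best := -1;
    BestDist := MaxMiles;
    for j := 0 to High(Airports) do
      if WasKnown[j] then
      begin
        d := HaversineMiles(Airports[i].Lat, Airports[i].Lon,
                            Airports[j].Lat, Airports[j].Lon);
        if d <= BestDist then
        begin
          Best := j;
          BestDist := d;
        end;
      end;
    if Best >= 0 then
    begin
      Airports[i].Country := Airports[Best].Country;
      Inc(Result);
    end;
  end;
end;

function MakeAirport(const AICAO: string; ALat, ALon: Double;
  const ACountry: string): TAirport;
begin
  Result.ICAO := AICAO;
  Result.Lat := ALat;
  Result.Lon := ALon;
  Result.Country := ACountry;
end;

const
  Radii: array[0..1] of Double = (10, 25);  // widen the radius on each run

var
  Airports: array[0..2] of TAirport;
  r: Integer;
begin
  // Invented test points: BBBB is about 6.9 miles from AAAA,
  // CCCC is about 27.6 miles from AAAA and 20.7 miles from BBBB
  Airports[0] := MakeAirport('AAAA', 40.0, -3.0, 'Spain');
  Airports[1] := MakeAirport('BBBB', 40.1, -3.0, 'Nil');
  Airports[2] := MakeAirport('CCCC', 40.4, -3.0, 'Nil');
  for r := Low(Radii) to High(Radii) do
    WriteLn(Radii[r]:0:0, ' miles: filled ', FillPass(Airports, Radii[r]));
end.
```

With these points the 10-mile run fills BBBB and the 25-mile run then fills CCCC from BBBB. That chaining is also how a wrong guess can spread, so checking border cases against known-good airports first is worthwhile.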

I ran all this with a set of bad data. I now have a data file of airports that is correct as far as the airport ICAO codes go: KPHX, KJFK, etc. I'm trying to validate as much data as possible before I go to haversine.

Long way of saying I need to validate the records and verify the numbers.

Thanks All.



« Last Edit: May 12, 2019, 02:44:00 am by JLWest »

lucamar

  • Hero Member
  • *****
  • Posts: 4219
Re: ASCII characters Questions
« Reply #23 on: May 12, 2019, 05:57:17 am »
Quote from: JLWest on May 12, 2019, 02:41:14 am
If airport 'A' and airport 'B' are within 10 miles of one another, they are probably in the same country, maybe in the same region.

Don't assume too much: test with known-good airports, say Charles de Gaulle in Paris and the closest one in Belgium, or one in Andorra vs. the Barcelona one. They might be closer than you think. Though 10 miles (~16 km) sounds good: that's less than the real distance between Madrid and its airport :)

JLWest

  • Hero Member
  • *****
  • Posts: 1293
Re: ASCII characters Questions
« Reply #24 on: May 12, 2019, 07:42:30 am »
Testing is underway. It will probably take a week just to test the building of the record with all 7 fields.

Bourges is right in the center of France.

Using Madrid-Spain, Bourges-France, Kansas City-US, Bad Hersfeld-Germany. There are 246
in the airport Middle Lat/Lon dataset.

All are in the middle of their respective countries. You don't always get the one you want because it may not be in the X-Plane airport list.

See you in a week.

 