
Topic: How to remove non-printable characters from INSIDE a string (Read 16384 times)

Gizmo

  • Hero Member
  • Posts: 831
How to remove non-printable characters from INSIDE a string
« on: November 29, 2011, 04:21:56 pm »
I've read more string function pages than I can count, such as this one http://www.freepascal.org/docs-html/rtl/sysutils/stringfunctions.html and others besides, and for the life of me I can't find a function or procedure to do what I need.

Basically, I have a StringList that is populated from a file stream (using LoadFromStream). Each line contains some unprintable characters, though, such as carriage returns (0x0D, etc.) and others which appear as tiny blocks of 4 squares in my Memo object. I am able to remove the unprintable chars from the front and end of each line using Trim, but how can I remove such chars INSIDE the string itself?
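To see why Trim alone is not enough: FPC's SysUtils.Trim strips spaces and control characters (anything up to #32) only at the two ends of a string, so a control character in the middle survives. A minimal sketch; the sample string is made up for illustration.

```pascal
program TrimDemo;

{$mode objfpc}{$H+}

uses
  SysUtils;

var
  s, t: string;
begin
  // made-up sample line: a tab at the start, a CR in the middle, a LF at the end
  s := #9'abc'#13'def'#10;
  t := Trim(s);
  // Trim removes the leading #9 and the trailing #10 ...
  WriteLn('Length before: ', Length(s), ', after: ', Length(t));
  // ... but the #13 inside the string is still there
  WriteLn('Inner CR still present: ', Pos(#13, t) > 0);
end.
```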

This page (http://everything.explained.at/Trim_(programming)/), for example, mentions functions from other languages such as normalize-space() and stripToNull(). Is there something like that, or another way of doing it, in FPC?
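Something like normalize-space() can be written by hand in a few lines. The sketch below is a hypothetical NormalizeSpace helper, not an RTL function. Unlike XPath's normalize-space(), which only treats space, tab, CR and LF as whitespace, it assumes every character from #0 to #32 counts as whitespace; it collapses each run of them into one space and drops leading and trailing runs.

```pascal
program NormalizeSpaceDemo;

{$mode objfpc}{$H+}

// Hypothetical helper, not part of the FPC RTL: treats every character
// from #0 to #32 (space and all ASCII control characters) as whitespace,
// collapses each run of such characters into one space, and drops
// leading and trailing runs entirely.
function NormalizeSpace(const S: string): string;
var
  i, n: Integer;
  PendingSpace: Boolean;
begin
  SetLength(Result, Length(S)); // the output can never be longer than the input
  n := 0;
  PendingSpace := False;
  for i := 1 to Length(S) do
    if S[i] <= ' ' then
      // remember the gap, but only if some text has already been written
      PendingSpace := n > 0
    else
    begin
      if PendingSpace then
      begin
        Inc(n);
        Result[n] := ' ';
        PendingSpace := False;
      end;
      Inc(n);
      Result[n] := S[i];
    end;
  SetLength(Result, n); // cut off the unused tail (also drops a trailing gap)
end;

begin
  WriteLn('[', NormalizeSpace(#9'  abc'#13#10'def'#0'  ghi  '#10), ']');
end.
```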

Ta

Ted

BigChimp

  • Hero Member
  • Posts: 5740
  • Add to the wiki - it's free ;)
    • FPCUp, PaperTiger scanning and other open source projects
Re: How to remove non-printable characters from INSIDE a string
« Reply #1 on: November 29, 2011, 04:29:42 pm »
I'll jump in with some naive examples. I'm sure there will be gurus here shortly with the perfect answer  :D

Code:
YourStringList.Text := StringReplace(YourStringList.Text, #9, ' ', [rfReplaceAll]);  // tab
YourStringList.Text := StringReplace(YourStringList.Text, #10, ' ', [rfReplaceAll]); // line feed
YourStringList.Text := StringReplace(YourStringList.Text, #13, ' ', [rfReplaceAll]); // carriage return
// ...and so on for the other control characters.
// Note: #10 and #13 are also the line breaks of .Text, so replacing them joins all lines into one.
Or go through each string of the stringlist in a loop, check whether each character is below (or above) some threshold, e.g. s[counter] < #32, and replace it with ' '.

I agree it would be nice if there were something that could get rid of the junk in one go. Perhaps there is such a function somewhere...
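The loop idea, written out as a minimal sketch: one pass per line that replaces every ASCII control character (#0..#31, plus DEL #127) with a space. The function name ReplaceControlChars and the exact threshold are assumptions; bytes from #128 upwards are deliberately left alone so UTF-8 text is not broken. Working on SL[i] rather than on SL.Text keeps the list's line breaks intact.

```pascal
program ReplaceControlCharsDemo;

{$mode objfpc}{$H+}

uses
  Classes;

// Hypothetical helper: replaces every ASCII control character (#0..#31)
// and DEL (#127) with a space in a single pass. Bytes from #128 upwards
// are left alone so that UTF-8 encoded text is not damaged.
function ReplaceControlChars(const S: string): string;
var
  i: Integer;
begin
  Result := S;
  for i := 1 to Length(Result) do
    if (Result[i] < ' ') or (Result[i] = #127) then
      Result[i] := ' ';
end;

var
  SL: TStringList;
  i: Integer;
begin
  SL := TStringList.Create;
  try
    SL.Add('abc'#3'def'#9'ghi'); // made-up sample data
    SL.Add(#7'bell');
    // work line by line so the list keeps its line structure
    for i := 0 to SL.Count - 1 do
      SL[i] := ReplaceControlChars(SL[i]);
    for i := 0 to SL.Count - 1 do
      WriteLn('[', SL[i], ']');
  finally
    SL.Free;
  end;
end.
```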
Want quicker answers to your questions? Read http://wiki.lazarus.freepascal.org/Lazarus_Faq#What_is_the_correct_way_to_ask_questions_in_the_forum.3F

Open source including papertiger OCR/PDF scanning:
https://bitbucket.org/reiniero

Lazarus trunk+FPC trunk x86, Windows x64 unless otherwise specified

Arbee

  • Full Member
  • Posts: 223
Re: How to remove non-printable characters from INSIDE a string
« Reply #2 on: November 29, 2011, 05:04:25 pm »
For such non-standard functions I have been using, on and off, a module developed by a guy in Germany that emulates string functions from the REXX programming language.
One of the functions in there is TRANSLATE, which could be used for that purpose.
Its syntax is:
Code:
str_out := translate(str_in, output_table, input_table);
It looks up each character of str_in in input_table (which is also just a string of characters) and, when found, replaces it with the character at the same position in output_table. If not found, it copies the character unchanged. So BigChimp's example above would boil down to:

Code:
YourStringList.Text := translate(YourStringList.Text, '   ', #9+#10+#13);

Here's the source of "rexxstring00.pp"
« Last Edit: November 29, 2011, 05:09:36 pm by Arbee »
1.0/2.6.0  XP SP3 & OS X 10.6.8
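The rexxstring00.pp source itself is not reproduced in this thread, so the following is only a hypothetical re-implementation of TRANSLATE as described above, under the name TranslateChars. The one extra assumption is what happens when output_table is shorter than input_table: the sketch falls back to a space, which matches REXX's default pad character.

```pascal
program TranslateDemo;

{$mode objfpc}{$H+}

// Hypothetical re-implementation (not the rexxstring00.pp code): each
// character of S that occurs in InTable is replaced by the character at
// the same position in OutTable; all other characters are copied unchanged.
function TranslateChars(const S, OutTable, InTable: string): string;
var
  i, p: Integer;
begin
  Result := S;
  for i := 1 to Length(Result) do
  begin
    p := Pos(Result[i], InTable);
    if p = 0 then
      Continue; // not in InTable: keep the character unchanged
    if p <= Length(OutTable) then
      Result[i] := OutTable[p]
    else
      Result[i] := ' '; // OutTable too short: assume a space as padding
  end;
end;

begin
  WriteLn('[', TranslateChars('a'#9'b'#13#10'c', '   ', #9#10#13), ']');
end.
```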

theo

  • Global Moderator
  • Hero Member
  • Posts: 1927
Re: How to remove non-printable characters from INSIDE a string
« Reply #3 on: November 29, 2011, 06:11:14 pm »
Another possibility would be to think for 30 seconds instead of searching for hours. ;-)

Code: Pascal
for i:=1 to Length(s) do if s[i] in [#10,#13,#9] then s[i]:='_';

You could populate a second string in a similar way, to copy only printable chars.
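Building a second string that keeps only the printable characters might look like this. A sketch under the assumption that "printable" means anything other than #0..#31 and #127; the output buffer is allocated once with SetLength and cut to size at the end, which avoids growing the string one character at a time. The name StripControlChars is hypothetical.

```pascal
program StripControlCharsDemo;

{$mode objfpc}{$H+}

// Hypothetical helper: returns a copy of S containing only its printable
// characters. "Printable" is assumed to mean anything that is not an ASCII
// control character (#0..#31) or DEL (#127); bytes >= #128 are kept so
// that UTF-8 multi-byte sequences stay intact.
function StripControlChars(const S: string): string;
var
  i, n: Integer;
begin
  SetLength(Result, Length(S)); // allocate once, shrink at the end
  n := 0;
  for i := 1 to Length(S) do
    if (S[i] >= ' ') and (S[i] <> #127) then
    begin
      Inc(n);
      Result[n] := S[i];
    end;
  SetLength(Result, n);
end;

begin
  WriteLn('[', StripControlChars('abc'#3#4'def'#13#10), ']');
end.
```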

Gizmo

  • Hero Member
  • Posts: 831
Re: How to remove non-printable characters from INSIDE a string
« Reply #4 on: November 29, 2011, 09:28:50 pm »
Thanks for your help gents.

The following does work, though I had to change the suggestion a bit to get it working (thanks for the tip, BigChimp):

Code: Pascal
// I am replacing these non-printable chars with the ',' char instead
SLFile[i] := StringReplace(SLFile.Strings[i], #3, ',', [rfReplaceAll]); // End of Text, 0x03
SLFile[i] := StringReplace(SLFile.Strings[i], #4, ',', [rfReplaceAll]); // End of Transmission, 0x04
SLFile[i] := StringReplace(SLFile.Strings[i], #5, ',', [rfReplaceAll]); // Enquiry, 0x05
SLFile[i] := StringReplace(SLFile.Strings[i], #6, ',', [rfReplaceAll]); // Acknowledgement, 0x06
SLFile[i] := StringReplace(SLFile.Strings[i], #7, ',', [rfReplaceAll]); // Bell, 0x07
SLFile[i] := StringReplace(SLFile.Strings[i], #9, ',', [rfReplaceAll]); // Horizontal tab, 0x09
// ...and so on for my full list
Memo1.Lines.Add(Trim(SLFile.Strings[i])); // trim any remaining bits and pieces from the ends, then add to the memo

However, I just know that this is a very inefficient way of doing it. A bit like getting to 8 with "2+2+2+2" instead of "4x2".
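On the efficiency point: each StringReplace call scans the whole line again, so a chain of N calls means N passes over every line. A single pass that tests each character against a set does the same job. A minimal sketch; ReplaceCharsInSet and the sample data are hypothetical, and the set only lists the characters shown above (#3..#7 and #9), so it would need extending to cover the rest of the list.

```pascal
program ReplaceCharSetDemo;

{$mode objfpc}{$H+}

uses
  Classes;

type
  TCharSet = set of Char;

// Hypothetical helper: one pass over S instead of one StringReplace call
// (one full scan) per character. Every character in Chars becomes NewChar.
function ReplaceCharsInSet(const S: string; const Chars: TCharSet;
  NewChar: Char): string;
var
  i: Integer;
begin
  Result := S;
  for i := 1 to Length(Result) do
    if Result[i] in Chars then
      Result[i] := NewChar;
end;

var
  SLFile: TStringList;
  i: Integer;
begin
  SLFile := TStringList.Create;
  try
    SLFile.Add('one'#3'two'#9'three'); // made-up sample data
    for i := 0 to SLFile.Count - 1 do
      // the same characters as in the StringReplace chain: #3..#7 and #9
      SLFile[i] := ReplaceCharsInSet(SLFile[i], [#3..#7, #9], ',');
    WriteLn(SLFile[0]);
  finally
    SLFile.Free;
  end;
end.
```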

Theo - thanks for the suggestion, which I have also tried, but I am being told "Error: Operator is not overloaded".
Code: Pascal
for j := 1 to Length(SLfile[i]) do
begin
  if SLFile.Strings[i] in [#3,#4,#5,#6,#7,#8,#9,] then SLFile.Strings[i]:=',';
end;

theo

  • Global Moderator
  • Hero Member
  • Posts: 1927
Re: How to remove non-printable characters from INSIDE a string
« Reply #5 on: November 29, 2011, 09:58:34 pm »

Yes, don't you see the problem?
You have to access the string char by char, but you are accessing whole strings of the stringlist.
SLFile[i] (or SLFile.Strings[i], which is the same thing) is a string, not a character, so the "in [#3, ...]" set test cannot be applied to it.
I'd write a separate function for the replacement which takes and returns a string.
« Last Edit: November 29, 2011, 10:00:59 pm by theo »

Arbee

  • Full Member
  • Posts: 223
Re: How to remove non-printable characters from INSIDE a string
« Reply #6 on: November 30, 2011, 07:57:10 am »
You don't have to write such a function. The "rexxstring00.pp" file I posted above already contains one.

typo

  • Hero Member
  • Posts: 3051
Re: How to remove non-printable characters from INSIDE a string
« Reply #7 on: November 30, 2011, 10:57:18 am »
Code:
  for i := 0 to SLfile.Count - 1 do
  begin
    s := SLfile[i];
    for j := 1 to Length(s) do
      if (s[j] in [#3..#13]) then
        s[j] := '_';
    SLfile[i] := s;
  end;

or:

Code:
function CleanupString(S :string) :string;
var
  i :integer;
begin
  Result := S;
  // replace each control character in the range #3..#13 with an underscore
  for i := 1 to Length(Result) do
    if Result[i] in [#3..#13] then
      Result[i] := '_';
end;

procedure TForm1.Button1Click(Sender: TObject);
{...}
begin
{...}
  for i := 0 to SLfile.Count - 1 do
    SLfile[i] := CleanupString(SLfile[i]);   
{...}
end;

« Last Edit: November 30, 2011, 11:25:01 am by typo »

Gizmo

  • Hero Member
  • Posts: 831
Re: How to remove non-printable characters from INSIDE a string
« Reply #8 on: November 30, 2011, 12:26:21 pm »
Typo

Thank you so much, it works perfectly. It is very clear now that I see it in front of me! I am at that annoying stage in my programming life where I can understand something written for me, but I struggle when it comes down to sitting here and thinking "How do I code this...?". Sorry to have taken up people's time with something which, as it turns out, is not as advanced as I thought.

Ted

typo

  • Hero Member
  • Posts: 3051
Re: How to remove non-printable characters from INSIDE a string
« Reply #9 on: November 30, 2011, 12:38:08 pm »
Of course some library will magically do the same.

PaulANormanNZ

  • Full Member
  • Posts: 115
Re: How to remove non-printable characters from INSIDE a string
« Reply #10 on: July 26, 2017, 05:30:22 am »
Hi,

I would like to give an attribution and license for this library (rexxstring00.pp, quoted below), and wondered if anyone knows who this "guy in Germany" is. I've tried Googling.

Thanks,
Paul

Quote from: Arbee on November 29, 2011, 05:04:25 pm
For such non-standard functions I have been using, on and off, a module developed by a guy in Germany that emulates string functions from the REXX programming language.
One of the functions in there is TRANSLATE, which could be used for that purpose.
Its syntax is:
Code:
str_out := translate(str_in, output_table, input_table);
It looks up each character of str_in in input_table (which is also just a string of characters) and, when found, replaces it with the character at the same position in output_table. If not found, it copies the character unchanged. So BigChimp's example above would boil down to:

Code:
YourStringList.Text := translate(YourStringList.Text, '   ', #9+#10+#13);

Here's the source of "rexxstring00.pp"

Bart

  • Hero Member
  • Posts: 5275
    • Bart en Mariska's Webstek
Re: How to remove non-printable characters from INSIDE a string
« Reply #11 on: July 26, 2017, 11:05:17 pm »
The LazUtf8.Utf8EscapeControlChars() function may or may not be of help to you.
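The exact behaviour of LazUtf8.Utf8EscapeControlChars is not shown in this thread, but the general idea of escaping, as opposed to deleting, is to make control characters visible. The following hypothetical EscapeControlChars is NOT the LazUtf8 code; it is a plain sketch that writes each character below #32 as a Pascal-style literal such as #13.

```pascal
program EscapeControlCharsDemo;

{$mode objfpc}{$H+}

uses
  SysUtils;

// Hypothetical helper (NOT the LazUtf8 implementation): instead of removing
// control characters, make them visible by writing each one as a
// Pascal-style character literal such as #13.
function EscapeControlChars(const S: string): string;
var
  i: Integer;
begin
  Result := '';
  for i := 1 to Length(S) do
    if S[i] < ' ' then
      Result := Result + '#' + IntToStr(Ord(S[i]))
    else
      Result := Result + S[i];
end;

begin
  WriteLn(EscapeControlChars('abc'#13#10'def'));
end.
```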

Bart

 
