
Author Topic: Find text  (Read 14000 times)

justnewbie

  • Sr. Member
  • ****
  • Posts: 292
Find text
« on: April 10, 2018, 03:29:29 pm »
An interesting task that is beyond my knowledge (again  ::)).

I have a text file and need to find the highlighted words in it (see picture).
Each of these words follows one of these key strings: int, double, string and boolean.
The number of space characters between the key string and the highlighted word can vary.
I need to get the highlighted words into a TMemo.

Is anyone up for this challenge?  :)
(Text file and example picture attached.)
« Last Edit: April 10, 2018, 03:32:01 pm by justnewbie »

lainz

  • Hero Member
  • *****
  • Posts: 4460
    • https://lainz.github.io/
Re: Find text
« Reply #1 on: April 10, 2018, 03:42:23 pm »
Use a TStringList.

Load the file into it, so each line becomes one string.

Then, for each line, create a second TStringList that splits the line on spaces.

In that second TStringList, search for each reserved word. The entry at the next index is the word you're looking for.

TStringList
* string hello -> TStringList ([string], [hello]) -> hello is at index 1
* int some[] -> TStringList ([int], [some[]]) -> some[] is at index 1 -> remove "[]" -> it becomes "some"

You may also need to strip any uninteresting characters from the second word, if that is what you need.
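
The splitting idea above could look roughly like the following. This is only a sketch: the helper names IsKeyWord and WordAfterKeyWord are made up for illustration, the key strings are the ones mentioned in the thread, only single spaces are treated as separators (tabs are not), and only a trailing "[]" is stripped. Note that DelimitedText also gives QuoteChar a special meaning, so lines containing double quotes may split differently.

```pascal
program SplitDemo;
{$mode objfpc}{$H+}

uses
  Classes, SysUtils;

const
  // Key strings as listed in the thread (assumption: matched case-sensitively).
  KeyWords: array[0..3] of string = ('int', 'double', 'string', 'bool');

// Hypothetical helper: True if S is exactly one of the key strings.
function IsKeyWord(const S: string): Boolean;
var
  K: string;
begin
  Result := False;
  for K in KeyWords do
    if S = K then
      Exit(True);
end;

// Hypothetical helper: returns the first non-empty entry that follows
// a key string in Line, with any "[]" removed, or '' if there is none.
function WordAfterKeyWord(const Line: string): string;
var
  Parts: TStringList;
  i, j: Integer;
begin
  Result := '';
  Parts := TStringList.Create;
  try
    Parts.Delimiter := ' ';
    Parts.StrictDelimiter := True;   // split on spaces only
    Parts.DelimitedText := Line;
    for i := 0 to Parts.Count - 1 do
      if IsKeyWord(Parts[i]) then
        // Runs of spaces can produce empty entries, so skip those.
        for j := i + 1 to Parts.Count - 1 do
          if Parts[j] <> '' then
          begin
            Result := StringReplace(Parts[j], '[]', '', [rfReplaceAll]);
            Exit;
          end;
  finally
    Parts.Free;
  end;
end;

begin
  WriteLn(WordAfterKeyWord('string hello'));
  WriteLn(WordAfterKeyWord('int    some[]'));
end.
```

Because whole list entries are compared, a line starting with "interceptor" does not count as a hit for "int".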

Thaddy

  • Hero Member
  • *****
  • Posts: 14201
  • Probably until I exterminate Putin.
Re: Find text
« Reply #2 on: April 10, 2018, 04:44:03 pm »
That would work, but it can be done much more easily for any string if you use one of the IndexOfAny overloads of the string type helper:
Code: Pascal
    Function IndexOfAny(const AnyOf: array of String): Integer; overload;
    Function IndexOfAny(const AnyOf: array of String; StartIndex: Integer): Integer; overload;
    Function IndexOfAny(const AnyOf: array of String; StartIndex: Integer; ACount: Integer): Integer; overload;
    Function IndexOfAny(const AnyOf: array of String; StartIndex: Integer; ACount: Integer; Out AMatch : Integer): Integer; overload;
    function IndexOfAnyUnquoted(const AnyOf: array of string; StartQuote, EndQuote: Char; StartIndex: Integer; Out Matched: Integer): Integer; overload;

Since these are type helpers, they work on any string directly, e.g. AString.IndexOfAny(...)
Specialize a type, not a var.

justnewbie

Re: Find text
« Reply #3 on: April 10, 2018, 05:37:51 pm »
Guys, it is a bit difficult for me.
So far I have this (the file is already loaded into Memo1):
Code: Pascal
procedure TForm1.Button2Click(Sender: TObject);
var
  i, j: integer;
  actPos: integer;
begin
  Memo2.Lines.Clear;
  for i := 0 to Memo1.Lines.Count - 1 do
  begin
    for j := low(keyWords) to high(keyWords) do
    begin
      actPos := pos(keyWords[j], Memo1.Lines[i]);
      if actPos > 0 then
      begin
        Memo2.Lines.Append(keyWords[j] + '  #  R:' + IntToStr(i + 1) + ' / C:' + IntToStr(actPos));
      end;
    end;
  end;
end;

My problem: this code reports a hit for "int" when the line contains, e.g., "interceptor".
How can I get only the proper hits (whole words)?
Based on my code, please.

Also, how can I get the next whole word right after a keyWord?
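
One way to keep the Pos-based approach and still reject hits such as "interceptor" is to check the characters on both sides of each hit. The sketch below is an illustration, not a tested solution for the attached file: WholeWordPos and WordAfter are made-up names, and the IdentChars set (letters, digits, underscore) is an assumption about what counts as part of a word.

```pascal
program WholeWordDemo;
{$mode objfpc}{$H+}

uses
  SysUtils, StrUtils;

const
  // Assumption: these characters can be part of a word.
  IdentChars = ['A'..'Z', 'a'..'z', '0'..'9', '_'];

// Hypothetical helper: 1-based position of Key as a whole word in Line,
// or 0 if Key only occurs inside longer words (or not at all).
function WholeWordPos(const Key, Line: string): SizeInt;
var
  P, AfterKey: SizeInt;
begin
  Result := 0;
  P := PosEx(Key, Line, 1);
  while P > 0 do
  begin
    AfterKey := P + Length(Key);
    // A hit counts only if no word character touches it on either side.
    if ((P = 1) or not (Line[P - 1] in IdentChars)) and
       ((AfterKey > Length(Line)) or not (Line[AfterKey] in IdentChars)) then
      Exit(P);
    P := PosEx(Key, Line, P + 1);   // try the next occurrence
  end;
end;

// Hypothetical helper: skips spaces/tabs from StartPos, then returns
// the run of word characters found there ('' if there is none).
function WordAfter(const Line: string; StartPos: SizeInt): string;
var
  I, J: SizeInt;
begin
  I := StartPos;
  while (I <= Length(Line)) and (Line[I] in [' ', #9]) do
    Inc(I);
  J := I;
  while (J <= Length(Line)) and (Line[J] in IdentChars) do
    Inc(J);
  Result := Copy(Line, I, J - I);
end;

var
  Line: string;
  P: SizeInt;
begin
  Line := 'interceptor int  count;';
  P := WholeWordPos('int', Line);
  if P > 0 then
    WriteLn('C:', P, ' -> ', WordAfter(Line, P + Length('int')));
end.
```

In Button2Click, pos(keyWords[j], Memo1.Lines[i]) could be swapped for a call like WholeWordPos(keyWords[j], Memo1.Lines[i]), which still returns a column for the "R:/C:" output.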

justnewbie

Re: Find text
« Reply #4 on: April 10, 2018, 06:58:31 pm »
Well, by using "IsWordPresent" I can get the whole keyWords (int, double etc ...)
One question remains: how can I get the words after the keyWords?

lainz

Re: Find text
« Reply #5 on: April 10, 2018, 07:01:37 pm »
Quote from: justnewbie
Well, by using "IsWordPresent" I can get the whole keyWords (int, double etc ...)
One question remains: how can I get the words after the keyWords?

What I do is read character by character until a reserved word or reserved character appears.

For example:

int some=10;

You have already found int, so read character by character until the = sign appears; then you can remove the spaces and get the word "some". The same goes for finding "10": find the = and then go char by char, checking that each char is a valid digit, until you reach ";", the end of the line, a space, or whatever you need.
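
For the single line "int some=10;" the char-by-char reading described above might be sketched as follows. NameAfter and NumberAfterEquals are made-up names, and the sketch assumes the position right after the key string is already known and that the value consists of decimal digits only.

```pascal
program ScanDemo;
{$mode objfpc}{$H+}

// Hypothetical helper: reads from Start (1-based) up to '=', ';' or the
// end of the line and returns what it saw, with spaces dropped.
function NameAfter(const Line: string; Start: Integer): string;
var
  I: Integer;
begin
  Result := '';
  I := Start;
  while (I <= Length(Line)) and not (Line[I] in ['=', ';']) do
  begin
    if Line[I] <> ' ' then
      Result := Result + Line[I];
    Inc(I);
  end;
end;

// Hypothetical helper: finds the first '=', skips spaces after it and
// collects digits until any other character or the end of the line.
function NumberAfterEquals(const Line: string): string;
var
  I: Integer;
begin
  Result := '';
  I := Pos('=', Line);
  if I = 0 then
    Exit;
  Inc(I);
  while (I <= Length(Line)) and (Line[I] = ' ') do
    Inc(I);
  while (I <= Length(Line)) and (Line[I] in ['0'..'9']) do
  begin
    Result := Result + Line[I];
    Inc(I);
  end;
end;

begin
  // 'int' occupies positions 1..3, so reading starts at position 4.
  WriteLn(NameAfter('int some=10;', 4));        // some
  WriteLn(NumberAfterEquals('int some=10;'));   // 10
end.
```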

justnewbie

Re: Find text
« Reply #6 on: April 10, 2018, 07:37:59 pm »
Quote from: lainz
What I do is read character by character until a reserved word or reserved character appears.

For example:

int some=10;

You have already found int, so read character by character until the = sign appears; then you can remove the spaces and get the word "some". The same goes for finding "10": find the = and then go char by char, checking that each char is a valid digit, until you reach ";", the end of the line, a space, or whatever you need.

That is not enough; please look at the example picture in post #1. There are different cases.
Also, I found "int", but I did not get its position (only true/false). Furthermore, don't forget that there can be, e.g., "internet" in the same line as well.

howardpc

  • Hero Member
  • *****
  • Posts: 4144
Re: Find text
« Reply #7 on: April 10, 2018, 07:50:36 pm »
By setting up your keywords array with a bit of cunning, you can avoid the need for secondary parsing using a temporary stringlist, and use only the Pos function. This assumes there is never more than one keyword per line.
See the attached example.

justnewbie

Re: Find text
« Reply #8 on: April 10, 2018, 08:01:20 pm »
@howardpc:
A little misunderstanding.
These are not the keyWords:
Code: Pascal
  FKeywords := TStringArray.Create(' blVal', ' dblVal', ' intVal', ' strVal');
These are the words I want to get, but I don't know them in advance. Let's call them "wantedWords".

Here are the keyWords, which are known in advance:
Code: Pascal
  keyWords: array [0..3] of string = ('int', 'double', 'string', 'bool');
The wantedWords follow the keyWords (see the picture in post #1).
« Last Edit: April 10, 2018, 08:08:16 pm by justnewbie »

lainz

Re: Find text
« Reply #9 on: April 10, 2018, 08:16:46 pm »
Quote from: justnewbie
It is not good, please look at the example picture of post#1. There are different cases.
Also, I found "int", but I did not get its position (only true/false). Furthermore, don't forget, there can be eg. "internet" in the same line as well.

Well, it is in fact the best way: going char by char, you can check for any rule you need.

A whole word is usually a word between spaces or between special symbols.

int internet=10

Here int is followed by a space, so it is not the same as internet, where an e follows the int.

justnewbie

Re: Find text
« Reply #10 on: April 10, 2018, 08:35:17 pm »
@lainz:
OK, see this pic below. These are all different cases.
How can you get all of the "myint"?
(Don't forget the spaces before/after.)

lainz

Re: Find text
« Reply #11 on: April 10, 2018, 08:42:58 pm »
Char by char.

For example, take this website I made:
https://lainz.github.io/webapps/pseudocodigo/

It converts a Spanish pseudocode language to JavaScript:
view-source:https://lainz.github.io/webapps/pseudocodigo/app.js

And yes, I can differentiate between keywords and variables.

I can find them no matter the amount of spaces between the words, because I skip those spaces. As you can see in the source, there is also a list of keywords and operators that I use to determine everything.

It is not perfect, but there you have code you can use.

I will add a demo, but not now, since I'm working.

In simple terms: ignore spaces, and ignore everything you don't need.

Or, if you want to do it really well, analyze everything (spaces, symbols, all of it), so you know exactly what was typed and what you need to extract from it.
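
In Pascal, the "skip the spaces, look at everything else" idea might be sketched like this. It is not taken from the app.js linked above; WantedWords and IsKeyWord are made-up names, the key strings are the ones from this thread, and treating any non-space symbol between a key string and the next word as "not a match" is a design assumption.

```pascal
program TokenDemo;
{$mode objfpc}{$H+}

const
  KeyWords: array[0..3] of string = ('int', 'double', 'string', 'bool');
  // Assumption: these characters can be part of a word.
  IdentChars = ['A'..'Z', 'a'..'z', '0'..'9', '_'];

// Hypothetical helper: True if S is exactly one of the key strings.
function IsKeyWord(const S: string): Boolean;
var
  K: string;
begin
  Result := False;
  for K in KeyWords do
    if S = K then
      Exit(True);
end;

// Hypothetical helper: walks Line char by char and returns, separated by
// commas, every word that directly follows a key string.
function WantedWords(const Line: string): string;
var
  I, Start: Integer;
  Token: string;
  PrevWasKey: Boolean;
begin
  Result := '';
  PrevWasKey := False;
  I := 1;
  while I <= Length(Line) do
    if Line[I] in IdentChars then
    begin
      // Collect one complete word.
      Start := I;
      while (I <= Length(Line)) and (Line[I] in IdentChars) do
        Inc(I);
      Token := Copy(Line, Start, I - Start);
      if PrevWasKey then
      begin
        if Result <> '' then
          Result := Result + ',';
        Result := Result + Token;
      end;
      PrevWasKey := IsKeyWord(Token);
    end
    else
    begin
      // Spaces and tabs are skipped; any other symbol ends the pairing.
      if not (Line[I] in [' ', #9]) then
        PrevWasKey := False;
      Inc(I);
    end;
end;

begin
  WriteLn(WantedWords('int   myint=5; double  d ;internet'));
end.
```

Since whole words are compared, "internet" and "myint" are never mistaken for the key string "int", however many spaces surround them.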
« Last Edit: April 10, 2018, 08:44:37 pm by lainz »

justnewbie

Re: Find text
« Reply #12 on: April 10, 2018, 09:10:43 pm »
I made my version, not perfect, but pretty good (for just a newbie  :)):

Code: Pascal
// Requires StrUtils in the uses clause (ExtractWord, Brackets).
const myWordDelims = [#0..' ', ',', '.', ';', '/', '\', ':', '''', '"', '`', '=', '&', '+'] + Brackets;

//...

procedure TForm1.Button2Click(Sender: TObject);
var
  i, j: integer;
  actWord: integer;
  curWord, wText: string;
begin
  wText := '';
  for i := 0 to Memo1.Lines.Count - 1 do
  begin
    actWord := 1;
    curWord := ExtractWord(actWord, Memo1.Lines[i], myWordDelims);
    // ExtractWord returns '' once actWord is past the last word of the line
    while curWord <> '' do
    begin
      for j := low(keyWords) to high(keyWords) do
        if curWord = keyWords[j] then
          // the word right after a keyWord is a wanted word
          wText := wText + ExtractWord(actWord + 1, Memo1.Lines[i], myWordDelims) + LineEnding;
      Inc(actWord);
      curWord := ExtractWord(actWord, Memo1.Lines[i], myWordDelims);
    end;
  end;
  Memo2.Text := wText; // assign once, after all lines have been scanned
  Label1.Caption := 'Found: ' + IntToStr(Memo2.Lines.Count);
end;

Noodly

  • Jr. Member
  • **
  • Posts: 70
Re: Find text
« Reply #13 on: April 10, 2018, 09:18:01 pm »
Don't forget to take into account upper and lower case when matching.
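
If case-insensitive matching is wanted (whether it is depends on the language of the text file, which the thread does not state), a comparison with SameText from SysUtils is one option. A small sketch, with IsKeyWordNoCase as a made-up name:

```pascal
program CaseDemo;
{$mode objfpc}{$H+}

uses
  SysUtils;

const
  KeyWords: array[0..3] of string = ('int', 'double', 'string', 'bool');

// Hypothetical helper: True if S equals one of the key strings,
// ignoring upper/lower case (SameText compares case-insensitively).
function IsKeyWordNoCase(const S: string): Boolean;
var
  K: string;
begin
  Result := False;
  for K in KeyWords do
    if SameText(S, K) then
      Exit(True);
end;

begin
  WriteLn(IsKeyWordNoCase('INT'));
  WriteLn(IsKeyWordNoCase('internet'));
end.
```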
Windows 10 Home, Lazarus 2.02 (svn 60954), FPC 3.04

Thaddy

Re: Find text
« Reply #14 on: April 10, 2018, 09:21:24 pm »
That's encouraging! Compliments!!
Now find out how to write it in three lines... :P 8-)

 
