Topic: [SOLVED] POS counts accentuation as characters

alveselvis2

  • New Member
  • Posts: 10
[SOLVED] POS counts accentuation as characters
« on: July 16, 2018, 05:30:35 am »
Hi people, I have a problem with Pos: it counts accented characters such as "ç é à í ó" as if each one were two characters, so that:

Pos(' ','boné cama') returns a position one greater than Pos(' ','bone cama').

That is breaking my code. Is there anything I can do?

Regards
« Last Edit: July 16, 2018, 08:54:02 pm by alveselvis2 »

valdir.marcos

  • Hero Member
  • Posts: 1106
Re: POS counts accentuation as characters
« Reply #1 on: July 16, 2018, 06:32:21 am »
Quote from: alveselvis2 (original post)
One possible alternative:
Code: Pascal
  uses LazUTF8;  // provides UTF8Pos
  ...
  procedure TForm1.Button1Click(Sender: TObject);
  begin
    // Pos returns a byte index; UTF8Pos returns a character (code point) index
    ShowMessage('Pos "boné cama":' + IntToStr(Pos(' ', 'boné cama')) + LineEnding +
                'Pos "bone cama":' + IntToStr(Pos(' ', 'bone cama')) + LineEnding +
                'UTF8Pos "boné cama":' + IntToStr(UTF8Pos(' ', 'boné cama')) + LineEnding +
                'UTF8Pos "bone cama":' + IntToStr(UTF8Pos(' ', 'bone cama')));
  end;
Result:
Pos "boné cama":6
Pos "bone cama":5
UTF8Pos "boné cama":5
UTF8Pos "bone cama":5
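Why the numbers differ: in UTF-8 "é" is stored as two bytes ($C3 $A9), Pos returns a byte index, and UTF8Pos returns a character (code point) index. The plain-FPC sketch below assumes valid UTF-8 input and does not need LazUTF8. It converts the byte index from Pos into a character index by counting every byte that is not a UTF-8 continuation byte (bit pattern 10xxxxxx). ByteIndexToCharIndex is a hypothetical helper written for illustration, not a library function.

```pascal
program BytePosVsCharPos;

{$mode objfpc}{$H+}

{ Hypothetical helper (not part of LazUTF8): converts a 1-based byte index
  (0..Length(S)) into a 1-based character index, assuming valid UTF-8.
  Continuation bytes have the bit pattern 10xxxxxx and are not counted. }
function ByteIndexToCharIndex(const S: string; ByteIdx: Integer): Integer;
var
  I: Integer;
begin
  Result := 0;
  for I := 1 to ByteIdx do
    if (Ord(S[I]) and $C0) <> $80 then
      Inc(Result);
end;

var
  Accented: string;
  BytePos: Integer;
begin
  // "boné cama" spelled out byte by byte: "é" is $C3 $A9 in UTF-8
  Accented := 'bon' + Chr($C3) + Chr($A9) + ' cama';
  BytePos := Pos(' ', Accented);
  WriteLn('Pos (bytes):        ', BytePos);                                 // 6
  WriteLn('Character index:    ', ByteIndexToCharIndex(Accented, BytePos)); // 5
  WriteLn('Pos in "bone cama": ', Pos(' ', 'bone cama'));                   // 5
end.
```

The test string is built from explicit bytes so that the result does not depend on the encoding the source file is saved in.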

alveselvis2

  • New Member
  • Posts: 10
Re: POS counts accentuation as characters
« Reply #2 on: July 16, 2018, 06:48:07 am »
Interesting! It is better now, but I still have a problem with words that come after an accented word.
Here's my code. Imagine this procedure pastes whatever is in my clipboard at the end of the word the cursor is in (the end of a word being the position just before the next space):

Code: Pascal
  var
    posi:integer;
    caret: TPoint;
  begin
    Caret.y:=Memo.CaretPos.y;
    Caret.x:=Memo.CaretPos.x;
    posi:=UTF8Pos(' ',Memo.Lines[Caret.y].Substring(Caret.x,Length(Memo.Lines[Caret.y])));
    If posi<>0 then
    begin
      Caret.x:=Memo.CaretPos.x+posi-1;
      Memo.CaretPos:=Caret;
      Memo.PasteFromClipboard;
    end
  end;

This way, when my cursor is in the middle of the word "bagunça" in the sentence "vc bagunça demais as coisas" and my clipboard holds "_TAG", "bagunça" becomes "bagunça_TAG". Perfect.

However, if the cursor is in "demais", it becomes "demais _TAGas".

Sentences without any accented characters are tagged perfectly.

Any ideas?
« Last Edit: July 16, 2018, 04:27:25 pm by JuhaManninen »
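A likely cause, sketched under the assumption (consistent with the results described above) that Memo.CaretPos.x counts characters: Substring and Length work in bytes, so Substring(Caret.x, ...) starts one byte too early for every two-byte character such as "ç" before the caret, and the position found in that substring is then added to a character count. The program below reproduces the "demais _TAGas" case with plain FPC strings instead of a TMemo, then shows one way to keep the units consistent: convert the caret to a byte offset, search with PosEx, and convert the match back to characters. CharIndexToByteIndex and ByteIndexToCharIndex are hypothetical helpers, not LazUTF8 functions. The substring in this example is plain ASCII, so Pos and UTF8Pos would return the same value on it.

```pascal
program CaretMismatch;

{$mode objfpc}{$H+}

uses
  StrUtils; // PosEx

{ Hypothetical helper: 1-based byte index where the character with the
  given 0-based index starts; Length(S) + 1 if CharIdx is past the end. }
function CharIndexToByteIndex(const S: string; CharIdx: Integer): Integer;
var
  Seen: Integer;
begin
  Seen := 0;
  Result := 1;
  while Result <= Length(S) do
  begin
    if (Ord(S[Result]) and $C0) <> $80 then // a new character starts here
    begin
      if Seen = CharIdx then
        Exit;
      Inc(Seen);
    end;
    Inc(Result);
  end;
end;

{ Hypothetical helper: 1-based index of the character containing byte
  ByteIdx (0 if ByteIdx is 0). }
function ByteIndexToCharIndex(const S: string; ByteIdx: Integer): Integer;
var
  I: Integer;
begin
  Result := 0;
  for I := 1 to ByteIdx do
    if (Ord(S[I]) and $C0) <> $80 then
      Inc(Result);
end;

var
  Line, Tail: string;
  CaretX, BytePos: Integer;
begin
  // "vc bagunça demais as coisas"; "ç" is the two bytes $C3 $A7
  Line := 'vc bagun' + Chr($C3) + Chr($A7) + 'a demais as coisas';
  CaretX := 13; // 0-based character position: between "de" and "mais"

  // Buggy: the character count is used as a byte offset, as Substring does
  Tail := Copy(Line, CaretX + 1, MaxInt); // "emais as coisas": one byte too early
  WriteLn('buggy caret: ', CaretX + Pos(' ', Tail) - 1); // 18: after the space

  // Fixed: characters -> bytes, search in bytes, bytes -> characters
  BytePos := PosEx(' ', Line, CharIndexToByteIndex(Line, CaretX));
  if BytePos > 0 then
    WriteLn('fixed caret: ', ByteIndexToCharIndex(Line, BytePos) - 1); // 17: before the space
end.
```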

bytebites

  • Hero Member
  • Posts: 624
Re: POS counts accentuation as characters
« Reply #3 on: July 16, 2018, 07:42:41 am »
Add
Code: Pascal
  {$codepage utf8}
to the beginning of the file.

alveselvis2

  • New Member
  • Posts: 10
Re: POS counts accentuation as characters
« Reply #4 on: July 16, 2018, 07:55:08 am »
Not yet... It's as if every accented character counted as one extra character in the line.
So, for these sentences (before >> after tagging):

(1) "aaaaaaaa bbbbbb" >> "aaaaaaaa_TAG bbbbbb"
(2) "aaaaãaaa bbbbbb" >> "aaaaãaaa _TAGbbbbbb"
(3) "aaaaãããã bbbbbb" >> "aaaaãããã bbb_TAGbbb"
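The shifts in these examples (0, 1 and 4 positions) equal the number of extra bytes in the first word: "ã" takes two bytes in UTF-8, while a character-based caret counts it as one. A quick check, assuming UTF-8 text; ExtraBytes is a hypothetical helper that counts continuation bytes:

```pascal
program ShiftCheck;

{$mode objfpc}{$H+}

{ Hypothetical helper: how many more bytes than characters S has.
  For valid UTF-8 this is the number of continuation bytes (10xxxxxx). }
function ExtraBytes(const S: string): Integer;
var
  I: Integer;
begin
  Result := 0;
  for I := 1 to Length(S) do
    if (Ord(S[I]) and $C0) = $80 then
      Inc(Result);
end;

const
  ATilde = #$C3#$A3; // "ã" (U+00E3) in UTF-8
begin
  WriteLn(ExtraBytes('aaaaaaaa'));                                 // 0
  WriteLn(ExtraBytes('aaaa' + ATilde + 'aaa'));                    // 1
  WriteLn(ExtraBytes('aaaa' + ATilde + ATilde + ATilde + ATilde)); // 4
end.
```

A three-byte character such as "€" would add two extra bytes, so "one extra position per accent" only holds for two-byte characters like the accented Portuguese letters.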

JuhaManninen

  • Global Moderator
  • Hero Member
  • Posts: 4458
  • I like bugs.
Re: POS counts accentuation as characters
« Reply #5 on: July 16, 2018, 09:21:43 am »
alveselvis2, you should read this page:
 http://wiki.freepascal.org/Unicode_Support_in_Lazarus
and especially this one:
 http://wiki.freepascal.org/UTF8_strings_and_characters

Hint: you can also use the byte offset returned by Pos() with Unicode text, because a space and a TAB are each a single byte in UTF-8.
Your first code snippet was right, but you must have broken something later.
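The hint works because of how UTF-8 is designed: every byte of a multi-byte character is $80 or above, while a space is $20 and a TAB is $09, so a byte-level search for either can never match in the middle of an accented letter. A minimal sketch that splits a line into words using only byte-based PosEx and Copy; SplitOnSpaces and TWordArray are hypothetical names used for illustration:

```pascal
program ByteSplit;

{$mode objfpc}{$H+}

uses
  StrUtils; // PosEx

type
  TWordArray = array of string;

{ Splits S at single spaces using byte offsets only. This is safe for UTF-8
  text because the space byte ($20) never occurs inside a multi-byte character. }
function SplitOnSpaces(const S: string): TWordArray;
var
  Start, SpacePos: Integer;
begin
  Result := nil;
  Start := 1;
  repeat
    SpacePos := PosEx(' ', S, Start);
    SetLength(Result, Length(Result) + 1);
    if SpacePos = 0 then
      Result[High(Result)] := Copy(S, Start, MaxInt)
    else
      Result[High(Result)] := Copy(S, Start, SpacePos - Start);
    Start := SpacePos + 1;
  until SpacePos = 0;
end;

var
  W: string;
begin
  // "vc bagunça demais"; "ç" is the two bytes $C3 $A7
  for W in SplitOnSpaces('vc bagun' + Chr($C3) + Chr($A7) + 'a demais') do
    WriteLn(W, ' (', Length(W), ' bytes)'); // vc (2), bagunça (8), demais (6)
end.
```

Only positions handed to character-based APIs such as CaretPos need converting; the byte offsets themselves are safe to use with Copy.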

Quote from: bytebites
  Add {$codepage utf8} to the beginning of the file.
bytebites, please don't give wrong advice!
{$codepage utf8} is not needed, and it has nothing to do with the issue alveselvis2 is trying to solve.
« Last Edit: July 16, 2018, 09:37:19 am by JuhaManninen »
Mostly Lazarus trunk and FPC 3.2 on Manjaro Linux 64-bit.

bytebites

  • Hero Member
  • Posts: 624
Re: POS counts accentuation as characters
« Reply #6 on: July 16, 2018, 01:36:04 pm »
Indeed, it is not needed.
This works with the latest stable version:
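Code: Pascal
  var
    posi:integer;
    caret: TPoint;
  begin
    Caret.y:=Memo.CaretPos.y;
    Caret.x:=Memo.CaretPos.x;
    // UTF8Pos (unit LazUTF8) counts characters, the same unit as CaretPos.x.
    // StartPos is 1-based, so passing the 0-based caret.x starts the search
    // at the character just before the caret.
    posi:=utf8Pos(' ',Memo.Lines[Caret.y],caret.x);
    If posi<>0 then
    begin
      Caret.x:=posi-1; // 1-based index of the space -> 0-based caret just before it
      Memo.CaretPos:=Caret;
      Memo.PasteFromClipboard;
    end
  end;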
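This version keeps everything in one unit. Assuming LazUTF8's UTF8Pos(SearchForText, SearchInText, StartPos) treats StartPos and its result as character indexes into the whole line (StartPos defaults to 1, so it is 1-based), while CaretPos.x is a 0-based character position, passing caret.x starts the search one character before the caret. In the middle of a word that makes no difference, but with the caret right after a space the search finds that space and the tag lands on the previous word. CharPos below is a hypothetical stand-in for those semantics, limited to a single ASCII search character; it is not the LazUTF8 implementation.

```pascal
program CharPosSketch;

{$mode objfpc}{$H+}

{ Hypothetical stand-in for UTF8Pos(Sub, S, StartPos), restricted to a single
  ASCII character: StartPos and the result are 1-based character indexes
  into the whole of S, and 0 means "not found". }
function CharPos(Sub: Char; const S: string; StartPos: Integer): Integer;
var
  I, CharNo: Integer;
begin
  Result := 0;
  CharNo := 0;
  for I := 1 to Length(S) do
    if (Ord(S[I]) and $C0) <> $80 then // a new character starts at byte I
    begin
      Inc(CharNo);
      if (CharNo >= StartPos) and (S[I] = Sub) then
        Exit(CharNo);
    end;
end;

var
  Line: string;
begin
  // "vc bagunça demais as coisas"; "ç" is the two bytes $C3 $A7
  Line := 'vc bagun' + Chr($C3) + Chr($A7) + 'a demais as coisas';

  // Caret in the middle of "demais" (CaretPos.x = 13)
  WriteLn(CharPos(' ', Line, 13) - 1);     // 17: tag goes right after "demais"

  // Caret at the start of "demais" (CaretPos.x = 11): searching from caret.x
  // finds the space before the word, so the tag lands after "bagunça"
  WriteLn(CharPos(' ', Line, 11) - 1);     // 10
  // Searching from caret.x + 1 finds the space after "demais" instead
  WriteLn(CharPos(' ', Line, 11 + 1) - 1); // 17
end.
```

Under the same assumption, passing caret.x + 1 as StartPos to UTF8Pos would start the search exactly at the caret.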

alveselvis2

  • New Member
  • Posts: 10
Re: POS counts accentuation as characters
« Reply #7 on: July 16, 2018, 08:51:28 pm »
Quote from: JuhaManninen (reply #5)


I spent hours reading these links yesterday. I didn't understand much, partly because English is not my first language and partly because I'm not used to this kind of theoretical text... but thanks for pointing me to them; I'll try to read them more patiently.

Quote from: bytebites (reply #6)

It works great! Thanks a lot; it looks like I was doing it the hard way. Solved!
