Lazarus

Title: Microexamples
Post by: Spoonhorse on June 15, 2021, 12:08:11 pm
FPC's great and all but it's very hard to find out how to do stuff in it. A lot of the time when I Google some feature the top hits aren't the documentation (because what even is that (https://wiki.freepascal.org/Programming_Using_Interfaces)?) but people on message boards bickering about some abstruse detail of the feature and whether certain wicked people are betraying an unholy love of C dialects.

Then when I do find something it's often howlingly bad, even I can see that. Things done with people's handful of known hacks, without insight. I was trying to find out how to turn off Paste in TEdit and TMemo. Suggestions included "make any key event erase the contents of the clipboard", "use the key events to erase the Shift state when the user presses Ctrl" and this absolute monstrosity (https://www.swissdelphicenter.ch/en/showcode.php?id=1403). I believe that it works, I believe that the author was proud of it.

And then there's the obstacle presented by the "good" code I find on the Internet.

I put "good" in quotation marks because the one good coding practice it never seems to follow is proper commenting. If you're going to show something to other people, please please explain how it works, otherwise you're just saying "and here we chant the mystic runes" and I and everyone else who reads it are left with something we can neither understand nor modify.

(This is, I think, how some of the bad code happens. Features of the good code are retained when it's modified even though the modification makes them unnecessary. Rinse, repeat.)

In other ways the good code is often obnoxiously good in that the actual point I'm trying to get to is hidden under excellent software engineering. The author will kick off by defining the public constants that the user should use when setting the property of the component that's going to control the behavior ... and then goes on through the exception handling ... and so on down to the end where the author says where to register the component.

Meanwhile what I want for my own purposes are the handful of lines that actually do the clever thing. Instead I'm being presented with a complete solution to something different from what I want to do and I have to start digging for these lines, which will not be marked as the particularly useful bit, or explained in any way.

What one needs is the exact opposite, a minimal example which is thoroughly explained. Like this.

Code: Pascal
// The OS sends messages to the visual components which we can intercept by
// overriding the WndProc method of the class.

// (For historical reasons everything related to this feature including the names
// of the constants is Windows-related but the feature is not in fact OS specific.)

// This example is implemented as an interposer class, see
// https://forum.lazarus.freepascal.org/index.php?topic=54971.msg408998#msg408998
// for more details.

unit NoPaste;

{$mode objfpc}{$H+}

interface

uses
  Classes, SysUtils, StdCtrls, Windows;

type
  TEdit = class(StdCtrls.TEdit)
  protected
    procedure WndProc(var M: TMessage); override;
  end;

implementation

procedure TEdit.WndProc(var M: TMessage);
begin
  if M.msg = WM_PASTE then M.msg := WM_NULL;   // We turn off the message ...
  inherited;   // ... and then go ahead and do what the method would usually do.
end;

end.

// You can use it to turn off any of the other Windows messages to any
// components (a list of the messages can be found here:
// https://wiki.winehq.org/List_Of_Windows_Messages ) and indeed
// to do things other than just shutting them off, though that alone
// could be done with nicer syntactic sugar using the "message" keyword.

This tiny bit of code which can easily be understood and repurposed is way more useful to someone who stumbles across it than a good bit of code that can't. I shall be putting some more stuff here, I've found out some gritty technical things about messing with TEdit and TMemo and TRichMemo, of which the bits other people need to know could be distilled down to five or six snippets of code which are much more readable than my source.

I wish more people could do the same. Perhaps they all have already and it's all sitting in a repository I don't know about.
Title: Re: Microexamples
Post by: Spoonhorse on June 15, 2021, 01:00:07 pm
One more example-of-an-example tonight to show what I'm talking about. Let's learn how to mess with the input field of a TEdit, changing the user's input like an autocorrect or autocomplete. Similar techniques will work for a TMemo or TRichMemo. (More about that later.) As in the previous example, this is implemented as an interposer class, because no-one would want it as-is.

So there are two parts to knowing how to do this. The first is knowing where to do it. Trying to do it using the event handlers simply doesn't work so well, try it if you don't believe me. This Is The Way. (I'll explain why if anyone's interested.) Second, when you reset the Text field of a TEdit the OS "helpfully" puts the cursor at the start of the TEdit. We need to put it where it should be. We can do this by assuming that the change took place just behind the cursor. Then after the change has been made we need to put the cursor back where it was and then increase its position by the difference between the new length of the string and the old length. High school algebra should show you that this is what the expressions in the code do.

I'll talk about why we're using the UTF16 length later. If you don't do that, see what happens if you use some of the higher-range Unicode characters, such as Chinese.

Code: Pascal
unit SmartEdit;

{$mode objfpc}{$H+}

interface

uses
  Classes, SysUtils, StdCtrls, StrUtils, LazUTF16, Windows;

type
  TEdit = class(StdCtrls.TEdit)
  protected
    procedure TextChanged; override;
  end;

implementation

procedure TEdit.TextChanged;
var
  offset: integer;
begin
  // Cursor position minus the old length: the (zero or negative) distance
  // of the cursor from the end of the text.
  offset := SelStart - utf16length(Text);
  // Resetting Text sends the cursor back to the start of the TEdit ...
  Text := replacestr(Text, 'x', 'yy');
  // ... so we put it back, the same distance from the end of the new text.
  SelStart := offset + utf16length(Text);
end;

end.

The code given will slickly and seamlessly turn 'x' into 'yy'. Anyone who wants to do anything else with it can take it from there much more easily than if I'd posted code that does something useful.
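
For anyone who wants to check the high school algebra without a form in front of them, here is a rough sketch of the same cursor arithmetic on plain strings. ShiftedCursor is a made-up helper for illustration, not anything in the LCL, and the text is plain ASCII so that Length and the UTF16 length agree.

```pascal
program CursorShift;

{$mode objfpc}{$H+}

uses
  SysUtils;

// Hypothetical helper (not part of the LCL): given the text before and after
// a change made just behind the cursor, work out where the cursor belongs.
function ShiftedCursor(const OldText, NewText: string; OldCursor: Integer): Integer;
begin
  // Same algebra as in TextChanged: (OldCursor - old length) + new length.
  Result := OldCursor - Length(OldText) + Length(NewText);
end;

var
  TextBefore, TextAfter: string;
begin
  TextBefore := 'abxcd';   // the user has just typed the 'x', cursor is at 3
  TextAfter := StringReplace(TextBefore, 'x', 'yy', [rfReplaceAll]);
  WriteLn(TextAfter);                                  // abyycd
  WriteLn(ShiftedCursor(TextBefore, TextAfter, 3));    // 4, just after the 'yy'
end.
```

The point of measuring from the end of the text is that the part after the cursor is assumed not to change, so that distance is the one thing that stays the same.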
Title: Re: Microexamples
Post by: dbannon on June 15, 2021, 01:05:15 pm
Hmm, looking at your first example, I wonder how cross platform it is ?   8)

https://wiki.freepascal.org/Portal:HowTo_Demos might be useful.  But honestly, to expect a nicely packaged snippet for every specific problem is probably a bit optimistic.

But don't let that put you off !

Davo
Title: Re: Microexamples
Post by: Spoonhorse on June 15, 2021, 01:12:58 pm
dbannon, I haven't tested that because all I have is this one laptop BUT as I say in the comments, the fact that the thing uses Windows terminology is just a historical relic presumably left over from Delphi days, according to the documentation stuff I found on the Internet Lazarus is set up so the same messages are passed by every OS.

And no, there's not going to be a snippet for every problem but if people are going to solve problems and then tell people about the solutions on the Internet they could consider doing it in the most snippety way possible.
Title: Re: Microexamples
Post by: Kays on June 15, 2021, 03:15:02 pm
[…] I was trying to find out how to turn off Paste in TEdit and TMemo. […]
[…] the input field of a TEdit, changing the user's input like an autocorrect or autocomplete. […]
I think you’re looking for recipes, a “cookbook”, not what you call “micro-examples”. The documentation and textbooks do contain “micro-examples”, source code (fragments) explaining individual details, but alone they aren’t that helpful if you’re trying to solve problems like you want, because you have to arrange multiple steps in the correct order, you know. That’s a “recipe” to me, and I’m afraid there is simply no Delphi-specific cookbook available (via the internet, for free).
Title: Re: Microexamples
Post by: Zvoni on June 15, 2021, 03:41:29 pm
I think you’re looking for recipes, a “cookbook”, not what you call “micro-examples”. *snip*

And you do remember what happened to the kitchen having 10 chefs, and only 2 waiters.......
Title: Re: Microexamples
Post by: Spoonhorse on June 15, 2021, 05:02:08 pm
Kai, no, I already know how to arrange multiple steps in the correct order. Supplied with the ingredients I can make my own recipes. It's finding the individual bits that has been the problem.

(In this case, the difficult thing was finding out how to capture the content of the Text field after the user's keyboard input has changed it but before the change registers on the screen. Everything else is just string handling.)
Title: Re: Microexamples
Post by: Spoonhorse on June 15, 2021, 08:04:27 pm
Let's continue looking at how to mess with the text entry components (TEdit, TMemo, TRichMemo if you have it, etc), specifically at the cursors of the components. We'll take TEdit as an example and look at the extra complexities of the memos later.

The TEdit component has a cursor position given by the property SelStart. (The length of the highlighted text, if any, is then recorded by SelLength). You would think that the SelStart property in the TEdits would reflect the number of characters between the start of the line and the cursor, and similarly the length of SelLength would record a number of characters. You would be wrong. Let's look at why.

Originally there was ASCII which encoded characters using the low seven bits of an eight-bit byte — the numbers from 0 to 127. This situation couldn't last.

FPC uses the UTF8 encoding. This represents further characters using one, two, three, or four bytes. You don't need to know the exact details.

Microsoft went a different way, and since FPC mimics Delphi, which was written for Windows, elements of Microsoft's preferred system still need to be taken into account when writing FPC. And they used the encoding UTF16. This represents every character as either one or two sixteen-bit "words".

So: the cursor position SelStart of a TEdit is given by the number of sixteen-bit words it would take to UTF16-encode the characters between the cursor and the start of the line (and SelLength is given in the same metric). Meanwhile when we ask for the actual Text field of the TEdit we're given a UTF8 string. And the original Pascal function Length, applied to a UTF8-encoded string, returns the number of bytes it takes up. (If it happens to also be ASCII, this will also be the number of characters, otherwise not.)
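
To see the three different "lengths" side by side, here is a small sketch that needs only the FPC RTL, no LCL. CodePointCount and Utf16UnitCount are names invented for this example. The test string is spelled out as raw UTF8 bytes so that it doesn't depend on how the source file is saved.

```pascal
program ThreeLengths;

{$mode objfpc}{$H+}

// Characters (code points): count every byte that is not a UTF-8
// continuation byte (continuation bytes look like 10xxxxxx).
function CodePointCount(const S: RawByteString): Integer;
var
  i: Integer;
begin
  Result := 0;
  for i := 1 to Length(S) do
    if (Ord(S[i]) and $C0) <> $80 then
      Inc(Result);
end;

// Sixteen-bit words needed in UTF-16: convert with the RTL's UTF8Decode
// and take the length of the resulting UnicodeString.
function Utf16UnitCount(const S: RawByteString): Integer;
begin
  Result := Length(UTF8Decode(S));
end;

var
  S: RawByteString;
begin
  // 'A', then U+671B (three bytes), then U+10400 (four bytes)
  S := 'A'#$E6#$9C#$9B#$F0#$90#$90#$80;
  WriteLn(Length(S));           // 8 bytes
  WriteLn(CodePointCount(S));   // 3 characters
  WriteLn(Utf16UnitCount(S));   // 4 sixteen-bit words
end.
```

Three questions about the same string, three different answers, and SelStart is asking the third one.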

To help us deal with this situation we have the Pascal libraries LazUTF8 and LazUTF16. These contain the functions utf8length and utf16length. The naming of these functions is sheer madness and confused the heck out of me. Now it's your turn.

utf8length takes a string encoded in utf8 and says how many characters are represented by the encoding.

utf16length takes a string encoded in utf8 and tells you how many sixteen-bit words it would take to represent it if it was encoded in utf16 instead.

This means that if we want to put the cursor at a given position in our (UTF8-encoded) Text string, it's quite easy. Suppose we want to stick it after the 远 in 望远镜座. Then we can set the cursor position to be utf16length('望远').

Going the other way takes a little more work — or at least if there's an easier way than what follows I haven't found it. Maybe it's in a library somewhere.

The two-word UTF16 encodings use exclusively the hexadecimal ranges $D800 - $DBFF for the first of the two words and $DC00 - $DFFF for the second. All we have to do is go through the string one 16-bit word at a time looking for stuff like that, and count each such pair of words as one character. The following function does just that, converting a position in the string in the UTF16 metric to the number of characters to the left of that position.

Code: Pascal
function utf8cur(x:integer; s:string):integer;
var WS: WideString;
    i,j:integer;
begin
utf8cur:=0;
WS:=utf8toutf16(s); //   We convert the string to utf16
j:=0;
//As FPC knows a WideString has word elements, length(WS) is the number of words, not bytes.
for i:=1 to length(WS) do
    begin
    if not ((ord(WS[i])>= $D800) and (ord(WS[i])<$DC00)) then
        j:=j+1;        // So we count every word except the first word of each two-word pair.
    if i=x then utf8cur:=j;
    end;
end;

So if we have a TEdit called AEdit, for example, then the substring of characters to the left of the cursor is given by utf8copy(AEdit.Text, 1, utf8cur(AEdit.SelStart, AEdit.Text))

That was a lot of explanation for a few lines of code, but you do understand it now. This is almost everything you need to know to get your own code to work with cursors, except we also need to talk about memos ...
Title: Re: Microexamples
Post by: engkin on June 15, 2021, 09:42:08 pm
Thank you for sharing your understanding. I think this paragraph is not correct:
Quote
utf16length takes a string encoded in utf8 and tells you how many sixteen-bit words it would take to represent it if it was encoded in utf16 instead.

Can you please check the parameter(s) taken by utf16length?
Title: Re: Microexamples
Post by: winni on June 15, 2021, 09:55:34 pm
Hi!

Utf16length calls UTF16CharacterLength per UTF8char.

The result of UTF16CharacterLength can be 0, 1 or 2.

Winni
Title: Re: Microexamples
Post by: Spoonhorse on June 15, 2021, 10:16:43 pm
engkin, you're quite right, I think some sort of implicit type conversion is going on, I'll rewrite that bit when my brain is less wooly. It is a maddening subject.

winni, thanks! Under what circumstances would it be 0?
Title: Re: Microexamples
Post by: winni on June 15, 2021, 11:33:10 pm
Hi!

Just have a look at the source!

If the PWideChar is Nil then zero is returned:


Code: Pascal
function UTF16CharacterLength(p: PWideChar): integer;
// returns length of UTF16 character in number of words
// The endianess of the machine will be taken.
begin
  if p<>nil then begin
    if (ord(p[0]) < $D800) or (ord(p[0]) > $DFFF) then
      Result:=1
    else
      Result:=2;
  end else begin
    Result:=0;
  end;
end;


Winni
Title: Re: Microexamples
Post by: engkin on June 16, 2021, 12:05:59 am
The example you used does not show the problem. Try something like:
ABC-𐐀𐐁𐐂𐐃
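
A rough sketch of why a string like that is the interesting case, using only the FPC RTL. CharsLeftOf is a hypothetical stand-in written for this example, not a LazUTF8 function, and the string is a shortened version of the one above (just two of the four-byte letters) written as raw UTF8 bytes.

```pascal
program SurrogateDemo;

{$mode objfpc}{$H+}

// Hypothetical stand-in for utf8cur: the number of characters to the left
// of a position that is given in UTF-16 words.
function CharsLeftOf(U16Pos: Integer; const S: RawByteString): Integer;
var
  W: UnicodeString;
  i: Integer;
begin
  Result := 0;
  W := UTF8Decode(S);
  if U16Pos > Length(W) then
    U16Pos := Length(W);
  for i := 1 to U16Pos do
    // Skip the second word of each pair ($DC00..$DFFF) so a pair counts once.
    if (Ord(W[i]) < $DC00) or (Ord(W[i]) > $DFFF) then
      Inc(Result);
end;

var
  S: RawByteString;
begin
  // 'ABC-' followed by U+10400 and U+10401
  S := 'ABC-'#$F0#$90#$90#$80#$F0#$90#$90#$81;
  WriteLn(Length(S));               // 12 bytes
  WriteLn(Length(UTF8Decode(S)));   // 8 sixteen-bit words
  WriteLn(CharsLeftOf(6, S));       // 5: 'ABC-' plus one of the letters
  WriteLn(CharsLeftOf(8, S));       // 6
end.
```

With Chinese text the word count and the character count come out the same, which is why that example hides the problem. Here they differ as soon as the cursor passes the first four-byte letter.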
Title: Re: Microexamples
Post by: engkin on June 16, 2021, 01:58:16 am
When UTF16 codepoint is 2 words long, UTF8 codepoint is 4 bytes long. Based on that we can rewrite your code to get rid of string conversion between UTF8 and UTF16:
Code: Pascal
function U16IndexToU8Index(u16Idx:integer; s:string):integer;
var
  U8CPSize:integer;
  p:PChar;
  U8Idx:integer absolute Result;//U8Idx and Result are the same variable, just two names
begin
  U8Idx:=0;
  p:=pchar(s);
  while u16idx>0 do
  begin
    U8CPSize:=UTF8CodepointSizeFast(p);
    case U8CPSize of
      1..3:dec(u16idx,1);//U16CPSize is 1 here
      4:dec(u16idx,2);   //U16CPSize is 2 here
      else
        exit;
    end;
    inc(p,U8CPSize);//Next utf8 codepoint
    inc(U8Idx,1);
  end;
end;

This way we can write your SelStart example like:
Code: Pascal
  u8Idx:=U16IndexToU8Index(Edit1.SelStart, Edit1.Text);
  s:=UTF8Copy(Edit1.Text, 1, u8Idx);

But we can make it faster by not using UTF8Copy. To do so we adjust the previous code to give byte index. Instead of adding 1 to U8Idx, we can add the size:
Code: Pascal
function U16IndexToU8ByteIndex(u16Idx:integer; s:string):integer;
var
  U8CPSize:integer;
  p:PChar;
  U8ByteIdx:integer absolute Result;//U8ByteIdx and Result are the same variable, just two names
begin
  U8ByteIdx:=0;
  p:=pchar(s);
  while u16idx>0 do
  begin
    U8CPSize:=UTF8CodepointSizeFast(p);
    case U8CPSize of
      1..3:dec(u16idx,1);//U16CPSize is 1 here
      4:dec(u16idx,2);   //U16CPSize is 2 here
      else
        exit;
    end;
    inc(p,U8CPSize);//Next utf8 codepoint
    inc(U8ByteIdx,U8CPSize);
  end;
end;

Again, your SelStart example:
Code: Pascal
  u8BIdx:=U16IndexToU8ByteIndex(Edit1.SelStart, Edit1.Text);
  s:=Copy(Edit1.Text, 1, u8BIdx);
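
The size relationship the functions above rely on (a UTF8 codepoint of 1 to 3 bytes is one UTF16 word, a codepoint of 4 bytes is two) can be checked without the LCL. The following is only a sketch under the assumption that the input is valid UTF8; Utf16UnitsFromUtf8 is a name made up for this example, and it is compared against the RTL's own UTF8Decode.

```pascal
program LeadByteWidths;

{$mode objfpc}{$H+}

// UTF-16 words needed for a UTF-8 string, read off the lead bytes alone.
// Assumes S is valid UTF-8; no conversion is performed.
function Utf16UnitsFromUtf8(const S: RawByteString): Integer;
var
  i: Integer;
  B: Byte;
begin
  Result := 0;
  for i := 1 to Length(S) do
  begin
    B := Ord(S[i]);
    if (B and $C0) = $80 then
      Continue             // continuation byte, belongs to an earlier lead byte
    else if B >= $F0 then
      Inc(Result, 2)       // lead byte of a 4-byte sequence: two words
    else
      Inc(Result, 1);      // 1- to 3-byte sequence: one word
  end;
end;

var
  S: RawByteString;
begin
  // 'A' (1 byte), U+00E9 (2 bytes), U+671B (3 bytes), U+10400 (4 bytes)
  S := 'A'#$C3#$A9#$E6#$9C#$9B#$F0#$90#$90#$80;
  WriteLn(Utf16UnitsFromUtf8(S));    // 5
  WriteLn(Length(UTF8Decode(S)));    // 5, the same answer via conversion
end.
```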
Title: Re: Microexamples
Post by: Spoonhorse on June 16, 2021, 06:03:46 am
Engkin, thank you. I didn't know that there was a constant relationship between the sizes of the encodings, this is good stuff.
Title: Re: Microexamples
Post by: Spoonhorse on June 16, 2021, 12:40:12 pm
So, on to the memos, TMemo and TRichMemo. As TRichMemo was designed to mimic TMemo (AND VERY NEARLY DOES BUT SEE WARNING BELOW) we can just talk about TMemo for now and it'll carry over.

The data in a TMemo can be accessed in two ways:

(1) The Lines stringlist. This contains the lines as they appear on the screen. It will change when the TMemo is resized, or when word wrap is turned on and off.

(2) The Text field in which the data is all one big string in which the paragraphs (not the Lines, but the paragraphs of text) are separated by a carriage return character and a line feed character (#13 and #10).

These characters do not appear in the Lines, and so do not affect the length of a Line that they terminate.

So, the TMemo has two ways to address the cursor position.

(1) There is CaretPos, of type TPoint (that is, having two fields, CaretPos.X and CaretPos.Y).

CaretPos.Y is a (zero-indexed) count of which Line of the TMemo the cursor is in.

And then CaretPos.X measures the distance between the start of the Line and the cursor in the same old UTF16 metric we're familiar with from TEdit (see above).

(2) There is SelStart and SelLength, like in a TEdit, which give the position in the Text. While you can use both CaretPos and SelStart to change the position of the cursor, so far as I know you can only use SelLength to choose the length of a selection, there's no equivalent to CaretPos for that. (If you find one, let me know.) These also use the UTF16 metric.

And these do include the #13 and #10 characters in their count.

IMPORTANT WARNING ABOUT TRichMemo. TRichMemo is mostly designed to work the same as TMemo, but there are some infuriating discrepancies and one is that everything above applies except that there is only one character (#13) between paragraphs and not two. All code must be adapted accordingly.

So if we want to convert from one metric to the other, we need to take the paragraph breaks into account. But how do we find the paragraph breaks? If we have word wrap turned off it's easy enough. Each line then corresponds to one paragraph, so we can do this (this is the TMemo version with two characters per paragraph break). You hardly need me to tell you:

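Code: Pascal
function xytox(M: TMemo): integer;
var
  i: integer;
begin
  Result := 0;
  // Every Line above the cursor adds its own length plus the two
  // paragraph-break characters (#13 and #10).
  for i := 0 to M.CaretPos.Y - 1 do
    Result := Result + utf16length(M.Lines[i]) + 2;
  // Then add the distance along the Line the cursor is actually in.
  Result := Result + M.CaretPos.X;
end;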
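
The same sum can be tried out without a form by standing in a TStringList for the memo's Lines. This is a sketch only: CaretToOffset and its BreakLen parameter are invented for this example (2 for the TMemo case described above, 1 for TRichMemo), and it assumes word wrap is off so that each Line is one paragraph.

```pascal
program MemoOffset;

{$mode objfpc}{$H+}

uses
  Classes;

// Hypothetical LCL-free version of xytox. CaretX and CaretY play the part
// of CaretPos.X and CaretPos.Y; BreakLen is the number of characters
// between paragraphs.
function CaretToOffset(Lines: TStrings; CaretX, CaretY, BreakLen: Integer): Integer;
var
  i: Integer;
begin
  Result := CaretX;
  for i := 0 to CaretY - 1 do
    // Length of the decoded line is its length in sixteen-bit words.
    Result := Result + Length(UTF8Decode(Lines[i])) + BreakLen;
end;

var
  L: TStringList;
begin
  L := TStringList.Create;
  try
    L.Add('abc');
    L.Add('de');
    L.Add('fghi');
    WriteLn(CaretToOffset(L, 2, 2, 2));   // 11 = (3+2) + (2+2) + 2
    WriteLn(CaretToOffset(L, 2, 2, 1));   // 9  = (3+1) + (2+1) + 2
  finally
    L.Free;
  end;
end.
```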

But what if we have word wrap turned on? We need to look at the underlying Text. The end of a Line in the memo either does or doesn't have a newline after it in the Text and by measuring off distances in the Text corresponding to those in the Line we can look and see if they're there. (I haven't implemented this because everything I myself need to do has word wrap turned off and if I do do it I don't think I'm going to fit it in tonight but there's no reason why that wouldn't work, is there?)
Title: Re: Microexamples
Post by: Spoonhorse on June 20, 2021, 09:19:11 am
Going to post something longer later. Just wanted to register my bafflement, 'cos I just found out ... working with TRichMemo (don't know about TMemo), changing the Text from an empty string to anything else or vice-versa triggers a resize event.