
Author Topic: A Challenge for Lazarus Gurus  (Read 1195 times)

tfurnivall

  • New Member
  • *
  • Posts: 49
A Challenge for Lazarus Gurus
« on: June 14, 2025, 06:41:18 pm »
A Challenge for Lazarus Gurus
Introduction
I’ve had occasion recently to think about the file system that underlies Lazarus and FPC.  I’m transferring some utility code to Lazarus that has its roots in older (some mainframe, some more recent) environments, and I’m running into some conceptual and philosophical issues. This forum is populated by people who have vast amounts of collective knowledge, and it occurred to me to pose a challenge (in the hopes that it might provoke people to dump some data that they might not otherwise think to share).
So here it is! A problem statement and series of questions. The original document is uploaded so you can read it off-line if you wish.
Good luck, and thank you in advance!
Statement
You are given a value (an integer of some size) which, you are told, represents your “gateway to a file”.
This integer value may be passed to previously compiled code in order to make changes to, or retrieve information from, the file. This includes meta-information (i.e. information about the file, rather than information contained within the file).
Questions
You get to specify the size (in bits) of the integer, and (if necessary) whether it is signed or unsigned.
1.   What are the possible semantics of the integer value? i.e. What does the integer represent? What other areas of memory might be involved with the value?
2.   What Lazarus (or FPC) facilities use this definition of ‘gateway to a file’?
3.   What Lazarus (or FPC) facilities cannot be used with this definition of ‘gateway to a file’?
4.   What additional information do you need to answer any of these questions?
5.   Imagine yourself to be a piece of code that has nothing but this integer value. Thinking about your answer to questions 3 and 4, how would you determine:
5.1.   The name of the file
5.2.   The type of the file
5.3.   The security constraints of the file
5.4.   How the file was opened
5.5.   The location, on disk, of the start of the file
5.6.   The size of the file in bytes
5.7.   The size of the file in sectors (or any other disc allocation unit you may care to use)

jamie

  • Hero Member
  • *****
  • Posts: 6989
Re: A Challenge for Lazarus Gurus
« Reply #1 on: June 14, 2025, 06:51:00 pm »
I didn't look at your attachment, but what you are talking about is a FILE HANDLE. If you use the OS file functions, it can be used to navigate the file it's attached to.

 Passing a handle to different processes does not guarantee it's going to work because there may be some permission issues.

 Also, if it does work, it can mess up the operation of the process that originated it if it's working on the file too, in the event that you change its file position index.
 
 Happy hacking.

 Jamie


The only true wisdom is knowing you know nothing

tfurnivall

  • New Member
  • *
  • Posts: 49
Re: A Challenge for Lazarus Gurus
« Reply #2 on: June 14, 2025, 08:06:50 pm »
Exactly, and that excludes any file opened with reset or rewrite, I think. It also implies that shareability (via a semaphore of some kind) is contained within the data structure pointed at by the handle. (Actually I think that would be indirectly pointed at, because the file itself is the locus of the semaphore, while the attributes of the user and the file together determine whether or not access is granted).
Example:
Two users both have 'theoretical' access to the file (via the security matrix), but one of them requests (and is granted) exclusive access. Any attempt to open the file by any other process will now fail.  (Or possibly wait on the semaphore, but that's a pretty extreme way to do it IMHO).
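For readers who want to try the exclusive-access case, here is a minimal sketch using SysUtils.FileOpen with fmShareExclusive. The helper name OpenExclusive and the temporary file are only for illustration. Whether the second open is actually refused depends on the OS and on how it enforces sharing, so the program reports what it sees rather than assuming an outcome.

```pascal
program ExclusiveOpenSketch;
{$mode objfpc}{$H+}

uses
  SysUtils;

// Hypothetical helper: ask for read/write access and request that nobody
// else may open the file. Returns feInvalidHandle on failure.
function OpenExclusive(const AFileName: string): THandle;
begin
  Result := FileOpen(AFileName, fmOpenReadWrite or fmShareExclusive);
end;

var
  Name: string;
  h1, h2: THandle;
begin
  // Create an empty scratch file so there is something to open.
  Name := GetTempFileName;
  h1 := FileCreate(Name);
  if h1 = feInvalidHandle then
    Halt(1);
  FileClose(h1);

  h1 := OpenExclusive(Name);
  // A second, independent open while the exclusive handle is still held.
  h2 := FileOpen(Name, fmOpenRead or fmShareDenyNone);
  if h2 = feInvalidHandle then
    WriteLn('second open refused while the exclusive handle is held')
  else
  begin
    WriteLn('second open succeeded: sharing was not enforced here');
    FileClose(h2);
  end;

  if h1 <> feInvalidHandle then
    FileClose(h1);
  DeleteFile(Name);
end.
```

Note that nothing here waits on a semaphore: the open either succeeds or fails immediately.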

The attachment is simply a duplicate of the post (with one word changed - apparently 's t i m u l a t e' is considered verboten on the forum)!

Thanks for the response, Jamie, I look forward to reading others...


Tony

Martin_fr

  • Administrator
  • Hero Member
  • *
  • Posts: 11446
  • Debugger - SynEdit - and more
    • wiki
Re: A Challenge for Lazarus Gurus
« Reply #3 on: June 14, 2025, 08:07:04 pm »
Do you want to write code that provides such a handle?

Or do you want to use existing code that provides it?

If you want to use existing code (like the Win32 API) then you don't need to worry much. The API will tell you the size of the handle (e.g. 32 or 64 bit), you have to follow that, and then you can use the API.



If you want to write your own such API:

You can freely define the size of that value (signed-ness too, but that usually does not matter since outside the API only the bit pattern matters, there is no meaning to it).


Of course inside the code that makes up the API, you need to be consistent.

If someone says
  myHandle := GetHandleForFile('/foo/bar.txt');


Then your code must return a value fitting into the handle (e.g. a number, an address, some form of token).

The important bit is, your code must later be able to map that handle back to the filename, or to any info it has about that file being open or not, and whatever else may be known.

You can return
- an address, and store the info at this address (then the handle must be big enough to hold an address)
- a number, that is an index into a list. Then you can keep your info in a list.
- a UUID like value => anything unique. Then again you keep a list, and you can search that list.
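
As a rough illustration of the "index into a list" option, here is a minimal sketch. GetHandleForFile is the name used in the example above; TFileInfo, TMyHandle, FileNameOfHandle, ReleaseHandle and the fixed table size are assumptions made up for this sketch, not an existing API.

```pascal
program HandleTableSketch;
{$mode objfpc}{$H+}

uses
  SysUtils;

type
  // Hypothetical per-file record; the fields are illustrative only.
  TFileInfo = record
    InUse: Boolean;
    FileName: string;
    OsHandle: THandle;
  end;

  TMyHandle = Integer;  // index into Table

const
  MaxOpenFiles = 16;
  InvalidMyHandle = -1;

var
  Table: array[0..MaxOpenFiles - 1] of TFileInfo;

// Open a file read-only and return the slot index as the handle.
function GetHandleForFile(const AFileName: string): TMyHandle;
var
  i: Integer;
  h: THandle;
begin
  Result := InvalidMyHandle;
  for i := 0 to MaxOpenFiles - 1 do
    if not Table[i].InUse then
    begin
      h := FileOpen(AFileName, fmOpenRead or fmShareDenyNone);
      if h = feInvalidHandle then
        Exit;  // could not open: report failure
      Table[i].InUse := True;
      Table[i].FileName := AFileName;
      Table[i].OsHandle := h;
      Exit(i);
    end;
end;

// Map the handle back to what we know about the file.
function FileNameOfHandle(AHandle: TMyHandle): string;
begin
  if (AHandle >= 0) and (AHandle < MaxOpenFiles) and Table[AHandle].InUse then
    Result := Table[AHandle].FileName
  else
    Result := '';
end;

procedure ReleaseHandle(AHandle: TMyHandle);
begin
  if (AHandle >= 0) and (AHandle < MaxOpenFiles) and Table[AHandle].InUse then
  begin
    FileClose(Table[AHandle].OsHandle);
    Table[AHandle].InUse := False;
    Table[AHandle].FileName := '';
  end;
end;

var
  h: TMyHandle;
begin
  h := GetHandleForFile(ParamStr(0));  // the running executable should exist
  WriteLn('handle = ', h, ', name = ', FileNameOfHandle(h));
  ReleaseHandle(h);
end.
```

The caller only ever sees a small integer; everything else (name, OS handle, open state) lives in the table behind the API.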


tfurnivall

  • New Member
  • *
  • Posts: 49
Re: A Challenge for Lazarus Gurus
« Reply #4 on: June 14, 2025, 08:11:19 pm »
My immediate goal is to integrate the treatment of text files and other files. In the questions I've posted recently, there is generally a SourceFile (which I will stipulate, for now, is a text file). But how do I pass a thingamyjig to a text file which is compatible with a handle from FileOpen? And if I pass a textfile to FileOpen, then I can't retrieve a line unless I do all the unblocking, translation etc. What I want is a happy medium until I can tackle the BIG problems involved in that!

Thanks for the response,

Tony

Martin_fr

  • Administrator
  • Hero Member
  • *
  • Posts: 11446
  • Debugger - SynEdit - and more
    • wiki
Re: A Challenge for Lazarus Gurus
« Reply #5 on: June 14, 2025, 08:12:53 pm »
Not any longer. "stimulate".

Some words (usually URLs of sites used in spam) are blocked. That sometimes includes misspelled versions of the word.

Unfortunately, and I didn’t know that, this also matches within other words. There is a known sales platform that is similar to "timu" (just the I is wrong). And stimulate contained that.

440bx

  • Hero Member
  • *****
  • Posts: 5575
Re: A Challenge for Lazarus Gurus
« Reply #6 on: June 14, 2025, 08:27:10 pm »
Just FYI,

In Windows, if the file is opened by some process then it is possible for another process to get the handle to the file and, with a little mucking around, determine all the information you listed about the file.

The code to get that done isn't really simple, but Process Hacker and its newer incarnation System Informer do that (C source is available for both, so you can see how it's done).  On other O/Ss, I simply have no idea how to get the information.




From your additional posts, it sounds like a much simpler solution is available.  If all you want to do is encode the file open information so it is available to another thread or process then there is a simple solution as long as all the processes/threads involved are aware of it.

For another thread, it's really simple: just keep a global list/log of all the files opened, for any thread to look at.

Across processes, it's slightly more involved.  The list/log must be in a shared memory area the processes know about and use a mutex to control write-access to it.   It's a bit more involved but still fairly simple.

With that method, you could identify the file by its index in the log.  Index "n" gives all the information you put in the log to any process which has access to the shared area.   
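
A minimal sketch of the single-process (multi-thread) variant of that log, guarded by a critical section. TLogEntry, LogFileOpen and LookupLog are names invented for this sketch. The cross-process version would need the log in shared memory plus a named mutex, which is not shown here.

```pascal
program OpenFileLogSketch;
{$mode objfpc}{$H+}

uses
  {$ifdef unix}cthreads,{$endif}
  SysUtils;

type
  // Hypothetical log entry: put whatever the other threads need in here.
  TLogEntry = record
    FileName: string;
    OpenMode: Integer;
  end;

var
  LogLock: TRTLCriticalSection;
  OpenLog: array of TLogEntry;

// Append an entry and return its index, which serves as the shared "handle".
function LogFileOpen(const AFileName: string; AMode: Integer): Integer;
begin
  EnterCriticalSection(LogLock);
  try
    Result := Length(OpenLog);
    SetLength(OpenLog, Result + 1);
    OpenLog[Result].FileName := AFileName;
    OpenLog[Result].OpenMode := AMode;
  finally
    LeaveCriticalSection(LogLock);
  end;
end;

// Copy entry AIndex out of the log; returns False if the index is unknown.
function LookupLog(AIndex: Integer; out AEntry: TLogEntry): Boolean;
begin
  EnterCriticalSection(LogLock);
  try
    Result := (AIndex >= 0) and (AIndex < Length(OpenLog));
    if Result then
      AEntry := OpenLog[AIndex];
  finally
    LeaveCriticalSection(LogLock);
  end;
end;

var
  n: Integer;
  e: TLogEntry;
begin
  InitCriticalSection(LogLock);
  n := LogFileOpen('catalog.dat', fmOpenRead);  // file name is illustrative
  if LookupLog(n, e) then
    WriteLn('entry ', n, ': ', e.FileName, ' (mode ', e.OpenMode, ')');
  DoneCriticalSection(LogLock);
end.
```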

HTH.
(FPC v3.0.4 and Lazarus 1.8.2) or (FPC v3.2.2 and Lazarus v4.0rc3) on Windows 7 SP1 64bit.

MarkMLl

  • Hero Member
  • *****
  • Posts: 8453
Re: A Challenge for Lazarus Gurus
« Reply #7 on: June 15, 2025, 02:30:27 pm »
Quote
My immediate goal is to integrate the treatment of text files and other files. In the questions I've posted recently, there is generally a SourceFile (which I will stipulate, for now, is a text file). But how do I pass a thingamyjig to a text file which is compatible with a handle from FileOpen? And if I pass a textfile to FileOpen, then I can't retrieve a line unless I do all the unblocking, translation etc. What I want is a happy medium until I can tackle the BIG problems involved in that!

I'm getting involved with misgivings, since it would be easy for this to degenerate into an OS- or Architecture-war or for one of the usual suspects to start telling you that mainframes are the tool of capitalism and that those who associate with them are forever damned.

First, the dominant architectures supported by FPC and Lazarus all genuflect towards "the unix way": MS-DOS v2 and all its successors share with unix the concept of variable-length lines of text terminated with some form of EOL and- as Jamie points out- once opened are represented by a handle which is usually a small number.

Second, files on these systems do not carry record- or block-size metadata around with them: and yes, I was brought up on disk packs and tape spools as well.

Third, things like sharing semantics are an attribute of the operating system, and are not directly held either in the file metadata or in the language's runtime library. OK, so some of the RTL routines allow opening and sharing mode to be specified, but that doesn't mean that the OS will obey it: unix in particular has historically been very lax about its sharing modes (and I don't believe you've told us what OS you're using for FPC/Lazarus).

I'm assuming that you're aware of the FPC documentation root at https://www.freepascal.org/docs.html and the non-authoritative wiki https://wiki.freepascal.org/Main_Page and in particular https://wiki.freepascal.org/File_types .

I'd urge you to think very hard about what your requirements actually are before trying to mix text and non-text handling of the same files.

Text files automatically handle EOL, and if the input is UTF-8 etc. by and large handle it correctly. The TextRec (managed by the RTL i.e. part of the running program) structure carries around the EOL type to be assumed and can- at a pinch- be used to get at the file handle, however something that you cannot ever do with a textfile is seek to a given position. I believe that similar points apply to streams.

Pascal has, from its earliest days, had the concept of a (non-text) file being opened (i.e. Assign(), Reset() etc.) with reference to a record type of fixed size. This is defined as part of the application program, and it's entirely possible for a program to associate different record structures with the same file at different times; a canonical example of that would be if you wanted to first open a file to read (say) the first 1024 bytes, and from the content of that divine what EOL to assume and possibly what codepage transformation to make.
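
A minimal sketch of that "peek first, decide later" idea: open the file as an untyped file with a record size of one byte, read up to 1024 bytes, and guess the EOL convention from the first line ending found. TEolKind and DetectEol are hypothetical names, and the heuristic is deliberately naive (it ignores mixed line endings and any codepage questions).

```pascal
program SniffEolSketch;
{$mode objfpc}{$H+}

type
  TEolKind = (eolUnknown, eolLF, eolCRLF, eolCR);

const
  EolNames: array[TEolKind] of string = ('unknown', 'LF', 'CRLF', 'CR');

// Classify the first line ending found in the first Count bytes of Buf.
function DetectEol(const Buf: array of Byte; Count: Integer): TEolKind;
var
  i: Integer;
begin
  Result := eolUnknown;
  if Count > Length(Buf) then
    Count := Length(Buf);
  for i := 0 to Count - 1 do
    case Buf[i] of
      10: Exit(eolLF);
      13: begin
            if (i + 1 < Count) and (Buf[i + 1] = 10) then
              Exit(eolCRLF)
            else
              Exit(eolCR);
          end;
    end;
end;

var
  F: file;
  Buf: array[0..1023] of Byte;
  Got: LongInt;
  Name: string;
begin
  // Use the first argument if given, otherwise inspect the program itself.
  if ParamCount >= 1 then
    Name := ParamStr(1)
  else
    Name := ParamStr(0);

  Assign(F, Name);
  FileMode := 0;   // open read-only
  Reset(F, 1);     // untyped file with a record size of 1 byte
  Got := 0;
  BlockRead(F, Buf, SizeOf(Buf), Got);
  Close(F);

  WriteLn(Name, ': first EOL looks like ', EolNames[DetectEol(Buf, Got)]);
end.
```

After this the same file could be reopened as a Text file (or with a different record size) once the program has decided how to treat it.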

I confess to not being in any way an expert on FPC/Lazarus's handling of codepages, either using string declarations or explicit conversion utilities. To a very large extent that's due to lack of necessity as a Briton: I find myself dealing with odd ALGOL or APL fonts on occasion, but those are not well-handled by the standard codepage mechanism in any extant system.

I hope that's helpful, and that we haven't collectively started barking up the wrong tree on your behalf. I must admit that when I read your OP I was immediately reminded of Project Xanadu's "tumblers" which were used to specify the location and interrelationship of documents...

MarkMLl
MT+86 & Turbo Pascal v1 on CCP/M-86, multitasking with LAN & graphics in 128Kb.
Logitech, TopSpeed & FTL Modula-2 on bare metal (Z80, '286 protected mode).
Pet hate: people who boast about the size and sophistication of their computer.
GitHub repositories: https://github.com/MarkMLl?tab=repositories

tfurnivall

  • New Member
  • *
  • Posts: 49
Re: A Challenge for Lazarus Gurus
« Reply #8 on: June 15, 2025, 02:45:21 pm »
Thanks for your reasoned response, Mark. The last thing I wanted to do was to start any OS wars! I was trying to surface the absence of the meta-data you refer to, and wondering how people manage around that. AFAICT there is no handle associated with a file opened via reset or rewrite, which means that there is no possibility of dealing with that file in any way except monolithically.

Confession - That's my primary goal - to be able to pass something from a text file into a routine that is a couple of levels below where the file is opened. I can't find any way to do this - and working suggestions will result in a (fairly) immediate marking of this thread as [SOLVED]

The end goal is a message catalog that can be shared - read-only for the most part, but able to be guarded by a semaphore so that one process that does the seeking for a message doesn't get kicked out of the water by a different process due to having exceeded a time-slice. (A friend of mine once referred to this possibility as sending a passionate love-letter to the ex with whom you had a major break-up).

So - NO FLAME WARS but welcome ideas on how to get around the need to write pseudo-file-system code.

Thanks for all the responses, so far,

Tony

jamie

  • Hero Member
  • *****
  • Posts: 6989
Re: A Challenge for Lazarus Gurus
« Reply #9 on: June 15, 2025, 03:04:29 pm »
For windows:

 https://learn.microsoft.com/en-us/windows/win32/api/winuser/ns-winuser-copydatastruct

Something to think about.

Also, Lazarus has an IPCServer and Client that work on Windows and Linux!

You can pass FileName info with a file Index value and have your local app open a file under that name and move the pointer to that location etc.

Jamie

MarkMLl

  • Hero Member
  • *****
  • Posts: 8453
Re: A Challenge for Lazarus Gurus
« Reply #10 on: June 15, 2025, 03:13:47 pm »
So what are you actually working with: big files containing messages of arbitrary length comprised of lines of arbitrary length?

In the past (long predating FPC etc.) I've tackled this sort of thing by building an external index showing the byte-offset of each line and a second index showing the line-number of each message start, /but/ I think the FPC/Lazarus developers would throw a fit if anybody tried (or even suggested) applying Seek() etc. to the handle in a TextRec. I've fiddled around at that level (to write some custom Telnet stuff) but been very cautious.

It might be worth adding that on unix there are two ways of storing email messages or discussion group postings: either as one message per file or as concatenated messages (necessitating a seek, hence my points above apply).

Have you considered slurping your messages into a database and indexing them there?

Or does your input corpus comprise mainframe-style fixed-length lines/records?

By and large the unix (and derivative) community does without record/block metadata by either assuming that files for a specific program are of known structure (including the case where they're variable-length lines) or by having an application-specific header at the start (or end) of each file: don't expect files from competing companies to be immediately compatible!

With deference to any of the core team who drop in, I think that you could probably open a textfile, seek to an indexed point using the handle in the textrec, and then start normal reading. But the sequence there would be absolutely critical and completely intolerant of any interleaved reads etc.
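
A minimal sketch of that Reset()-then-seek sequence, with the same caveats: it pokes at the handle inside TextRec, which the documentation asks you to treat as opaque, so this is an experiment rather than supported usage. WriteSampleFile and ReadLineAt are hypothetical helpers, and the sample file is written with a one-byte LF line ending so that the byte offsets are predictable.

```pascal
program TextSeekSketch;
{$mode objfpc}{$H+}

uses
  SysUtils;

// Write three short lines, each terminated by a single LF byte.
procedure WriteSampleFile(const AFileName: string);
var
  T: Text;
begin
  Assign(T, AFileName);
  SetTextLineEnding(T, #10);
  Rewrite(T);
  WriteLn(T, 'first');    // occupies bytes 0..5
  WriteLn(T, 'second');   // occupies bytes 6..12
  WriteLn(T, 'third');
  Close(T);
end;

// Open as a text file, move the OS file position, then read one line.
// The seek must come immediately after Reset, before anything is read,
// so that the RTL's text buffer is still empty.
function ReadLineAt(const AFileName: string; AOffset: Int64): string;
var
  T: Text;
begin
  Result := '';
  Assign(T, AFileName);
  Reset(T);
  try
    if FileSeek(TextRec(T).Handle, AOffset, fsFromBeginning) = AOffset then
      ReadLn(T, Result);
  finally
    Close(T);
  end;
end;

var
  Name: string;
begin
  Name := GetTempFileName;
  WriteSampleFile(Name);
  WriteLn(ReadLineAt(Name, 6));  // offset 6 is where the second line starts
  DeleteFile(Name);
end.
```

Any read between Reset() and the seek would leave buffered data behind, and the next ReadLn would then return text from the old position.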

I don't know whether the somewhat more recent stream API would be more tolerant...

Apropos opening mode: you can probably rely on that because it's all within the scope of the RTL (i.e. the same program etc.). Apropos sharing mode: as I've said that was tacked onto (in particular) unix rather late, and in some cases was advisory rather than mandatory (i.e. a program had to explicitly ask whether it was allowed to do something).

If you really do want simultaneous multiuser access I really do feel that you should consider putting your stuff in a database with decent multiuser support, even if the fancier ones (e.g. PostgreSQL) do have a lot of storage overhead.

That's slightly stream-of-consciousness I'm afraid, I've got other stuff I need to do...

MarkMLl

MarkMLl

  • Hero Member
  • *****
  • Posts: 8453
Re: A Challenge for Lazarus Gurus
« Reply #11 on: June 15, 2025, 03:23:55 pm »
Quote
For windows:

 https://learn.microsoft.com/en-us/windows/win32/api/winuser/ns-winuser-copydatastruct

Something to think about.

But at that point he'd have to do without the standard Pascal text handling.

Quote
Also, Lazarus has an IPCServer and Client that work on Windows and Linux!

You can pass FileName info with a file Index value and have your local app open a file under that name and move the pointer to that location etc.

No you can't, since you can't *legitimately* seek a textfile.

Note my emphasis there: I think it might be possible to specify the starting point immediately after Reset() is called using the handle in the TextRec.

MarkMLl

tfurnivall

  • New Member
  • *
  • Posts: 49
Re: A Challenge for Lazarus Gurus
« Reply #12 on: June 15, 2025, 04:37:12 pm »
Thanks, Mark!

Yes - the top-down structure is:

CATALOG    A File containing multiple SETs of messages. The sets are indexed such that the starting address (block? byte? whatever?) of each SET is known (SET Index).

SET  A group of messages which - as a group can specify various properties (see below). The messages are further indexed with an offset from a starting point. This could be byte offset from start of file or byte offset from first message in set (MESSAGE Index). The index entries for both SET and MESSAGE contain small amounts of meta-data  as well as the important Offset value.

MESSAGE A text string with optional parameters (here's where the current love and the ex- make their appearances ;)). Parameters may be substituted when the message is retrieved. The message (as mentioned) has an offset from the start of the first message in the set.

Some of the possible SETS might include, for example, error messages from sub-systems, in which case they might have a prefix (for example a file error might have FSERR as a prefix, or a message saying that the user has entered an invalid command might have a CIERR prefix). When the message is retrieved, the prefix is augmented by the message number (FSERR23 or CIERR976) and added to the message text, perhaps surrounded by parentheses, as in (FSERR23), or not, as in CIERR976. At any rate there is meta-data at both the SET and MESSAGE level.
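
A minimal sketch of how the retrieved message could be assembled from the SET prefix, the message number and the parameters. FormatCatalogMessage and its argument list are assumptions for illustration; the parameter substitution simply borrows SysUtils.Format placeholders, which may not match the placeholder syntax the real catalog uses.

```pascal
program MsgIdSketch;
{$mode objfpc}{$H+}

uses
  SysUtils;

// Build "<prefix><number> <text>", optionally wrapping the id in parentheses.
// Template uses Format-style placeholders such as %s and %d.
function FormatCatalogMessage(const Prefix: string; MsgNo: Integer;
  const Template: string; const Args: array of const;
  Parenthesise: Boolean): string;
var
  Id: string;
begin
  Id := Prefix + IntToStr(MsgNo);
  if Parenthesise then
    Id := '(' + Id + ')';
  Result := Id + ' ' + Format(Template, Args);
end;

begin
  WriteLn(FormatCatalogMessage('FSERR', 23, 'File not found: %s', ['x.txt'], True));
  WriteLn(FormatCatalogMessage('CIERR', 976, 'Invalid command', [], False));
end.
```

Because the prefix and the parenthesis choice come from the SET-level meta-data, a translated catalog only needs different Template strings.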

The source for this is (obviously?) a text file, but the actual catalog file is quite a sophisticated double-indexed structure. The messages themselves, however, can actually be stored as 'just text', because the starting byte is known after the source-file has been processed. This leads to the possibility of appending the entire source-file to the end of the index structure.....

One of the advantages of having this type of structure is that translation to a new language involves no change whatever to the application program, merely (?) a translation of the messages.

That's the goal - and I hope it's possible to tease out the file operation challenge from this description.

One possibility did occur to me, overnight (isn't it amazing how that happens), and I'll report back on it after I set up a LAB to play with it.

Tony

PS Obviously the compiled Catalog file is not a proper text file, but it does have a significant text component when it comes to the actual messages. The handle to the catalog is passed to a routine called CATREAD, which is aware of the structure, and can lookup the offset for both the SET and the Message, and then read the text of the message...
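
To make the CATREAD idea concrete, here is a minimal sketch that uses an OS-level handle from FileOpen rather than a Pascal Text file. The on-disk layout (a count, then fixed-size index entries, then the raw message text) is invented for this sketch and has only one level of index; the real catalog described above has a SET index as well, and its actual layout is not given in this thread.

```pascal
program CatReadSketch;
{$mode objfpc}{$H+}

uses
  SysUtils;

type
  // Hypothetical on-disk index entry.
  TMsgIndexEntry = packed record
    Offset: Int64;   // byte offset of the message text from start of file
    Len: LongInt;    // length of the message text in bytes
  end;

// Write a tiny single-set catalog: count, index entries, then the text.
procedure WriteCatalog(const AFileName: string; const Msgs: array of string);
var
  H: THandle;
  Count, i: LongInt;
  E: TMsgIndexEntry;
  Next: Int64;
begin
  H := FileCreate(AFileName);
  if H = feInvalidHandle then
    raise Exception.Create('cannot create ' + AFileName);
  try
    Count := Length(Msgs);
    FileWrite(H, Count, SizeOf(Count));
    Next := SizeOf(Count) + Int64(Count) * SizeOf(E);  // first byte after the index
    for i := 0 to Count - 1 do
    begin
      E.Offset := Next;
      E.Len := Length(Msgs[i]);
      FileWrite(H, E, SizeOf(E));
      Inc(Next, E.Len);
    end;
    for i := 0 to Count - 1 do
      if Length(Msgs[i]) > 0 then
        FileWrite(H, Msgs[i][1], Length(Msgs[i]));
  finally
    FileClose(H);
  end;
end;

// Given only the handle, look up message MsgNo and return its text.
function CatRead(H: THandle; MsgNo: LongInt): string;
var
  Count: LongInt;
  E: TMsgIndexEntry;
begin
  Result := '';
  FileSeek(H, Int64(0), fsFromBeginning);
  if FileRead(H, Count, SizeOf(Count)) <> SizeOf(Count) then
    Exit;
  if (MsgNo < 0) or (MsgNo >= Count) then
    Exit;
  FileSeek(H, SizeOf(Count) + Int64(MsgNo) * SizeOf(E), fsFromBeginning);
  if FileRead(H, E, SizeOf(E)) <> SizeOf(E) then
    Exit;
  if E.Len <= 0 then
    Exit;
  SetLength(Result, E.Len);
  FileSeek(H, E.Offset, fsFromBeginning);
  if FileRead(H, Result[1], E.Len) <> E.Len then
    Result := '';
end;

var
  Name: string;
  H: THandle;
begin
  Name := GetTempFileName;
  WriteCatalog(Name, ['File not found: %s', 'Invalid command']);
  H := FileOpen(Name, fmOpenRead or fmShareDenyWrite);
  if H <> feInvalidHandle then
  begin
    WriteLn(CatRead(H, 1));
    FileClose(H);
  end;
  DeleteFile(Name);
end.
```

Since the text is located by offset and length, no line-oriented Text file handling is needed on the reading side at all.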


MarkMLl

  • Hero Member
  • *****
  • Posts: 8453
Re: A Challenge for Lazarus Gurus
« Reply #13 on: June 15, 2025, 05:16:32 pm »
I took a look at my https://github.com/MarkMLl/telnetsrv/blob/main/telnettextrec.pas earlier which confirms that the standard TextRec type contains a handle which is compatible (on unix) with fpSeek(), which suggests that it's probably compatible with the portable FileSeek() which is normally a thin wrapper.

Previous discussion with Sven (one of the core team) suggested that where the TextRec documentation says "It should be treated as opaque and never manipulated directly" it doesn't really mean that it's likely to change at the drop of a hat, however you'll see from my code that at the very least it's worth doing a bit of consistency checking.

I'm moderately confident that the developers will concede you the Reset()-FileSeek() sequence with nothing in between. I'm somewhat less confident that you could use FilePos() to get the byte offset of the start of each line: it would be safer to count how long each string you read is and to allow for the EOL length.
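
A minimal sketch of that "count the string lengths" approach. BuildLineIndex and WriteSampleFile are hypothetical names. The sketch assumes every line ends with the same EOL of known byte length and that no codepage conversion changes the byte length of what ReadLn returns; a file with mixed line endings would produce wrong offsets.

```pascal
program LineIndexSketch;
{$mode objfpc}{$H+}

uses
  SysUtils;

type
  TOffsetArray = array of Int64;

// Write three short lines, each terminated by a single LF byte.
procedure WriteSampleFile(const AFileName: string);
var
  T: Text;
begin
  Assign(T, AFileName);
  SetTextLineEnding(T, #10);
  Rewrite(T);
  WriteLn(T, 'alpha');
  WriteLn(T, 'be');
  WriteLn(T, 'gamma');
  Close(T);
end;

// Return the byte offset at which each line starts, computed from the
// length of each line read plus EolLen bytes for its terminator.
function BuildLineIndex(const AFileName: string; EolLen: Integer): TOffsetArray;
var
  T: Text;
  Line: string;
  Offset: Int64;
  n: Integer;
begin
  Result := nil;
  Offset := 0;
  n := 0;
  Assign(T, AFileName);
  Reset(T);
  try
    while not Eof(T) do
    begin
      ReadLn(T, Line);
      SetLength(Result, n + 1);
      Result[n] := Offset;                 // where this line starts
      Inc(Offset, Length(Line) + EolLen);  // where the next one starts
      Inc(n);
    end;
  finally
    Close(T);
  end;
end;

var
  Name: string;
  Idx: TOffsetArray;
  i: Integer;
begin
  Name := GetTempFileName;
  WriteSampleFile(Name);
  Idx := BuildLineIndex(Name, 1);  // 1 = length of the LF terminator
  for i := 0 to High(Idx) do
    WriteLn('line ', i, ' starts at byte ', Idx[i]);
  DeleteFile(Name);
end.
```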

Hopefully, you've got your head around the pointers I've given you to the documentation. However, when I was looking at https://www.freepascal.org/docs-html/current/rtl/index-8.html#SECTIONG I noticed an entry for https://www.freepascal.org/docs-html/current/rtl/system/gettextcodepage.html ... which is a bit odd since there is no documented codepage field in the TextRec.

However when I looked at the RTL source I found

Code: Pascal
    ...
    LineEnd   : TLineEndStr;
    buffer    : textbuf;
{$ifdef FPC_HAS_CPSTRING}
    CodePage  : TSystemCodePage;
{$endif}
  End;

and I can confirm that that field is present at least on FPC 3.2.2 for Linux x86_64.

Moral: find your way around the documentation. Find your way around the RTL source. Make sure that the version of FPC/Lazarus you're using allows you to step through library routines. Oh, and there's a permuted index at http://www.kdginstruments.co.uk/public/fpc-ptx.zip which you might find useful, generation code is at https://github.com/MarkMLl/fpc-ptx

MarkMLl

jamie

  • Hero Member
  • *****
  • Posts: 6989
Re: A Challenge for Lazarus Gurus
« Reply #14 on: June 15, 2025, 05:27:50 pm »
It almost looks like what could help the cause is a File Monitor, one that monitors folder/directory activity and notifies the main app so it can go and check changes made to the files inside.

I know that can be done in Windows because I have code for that which runs in a secondary thread and notifies the main thread of an event taking place.


 
