Lazarus

Free Pascal => FPC development => Topic started by: croco on September 06, 2020, 11:57:31 am

Title: SeekEof in pipelines
Post by: croco on September 06, 2020, 11:57:31 am
Hi all,

several years ago I noticed that, under unix/linux systems, the SeekEof function refuses to work with pipelines, and strace shown me that on anything but the unredirected stdin it tries the lseek(2) syscall on the stream, fails and makes all subsequent things fail as well.

I even found its source code (in fpcsrc/rtl/inc/text.inc) and noticed it simply cannot work properly by the nature of its implementation (well, that "isdevice" variable suggests by its name that wrong assumptions are used; the question is really not whether it is a "device" or not, but whether the stream is seekable or not; e.g. block devices on Linux are perfectly seekable, and on FreeBSD, there are no block devices, but char devs can be seekable too.

I had no time to create a fix at that time, unfortunately.  I've got to admit Pascal is not my language of choice for general programming tasks. The reason I took a dive into all this is that I wrote and published a kinda large book for beginners (it is in Russian which is my native; sorry guys; I doubt whether I should post a reference to the book here) and, as a teacher, I'm absolutely sure there's presently only one programming language that can be used as The First Lang in one's life, and the language is Pascal.  It is absolutely necessary for a serious programmer to know plain C, but it is strongly unacceptable to start learning programming from C, for two main reasons: (1) pointers are too complicated concept for a newbie, so they should be learned BEFORE the newbie starts with C (well, there are no other languages with real pointers, only plain C and Pascal); and (2) getting used to think of a program's execution as a sequence of side effects, as it is in C, seriously and unrecoverably damages the one's brain.  So, in my book the first language is Pascal (yes, Free Pascal, under unix-like systems, because all my book uses unix-like systems only).  Explaining the concept of a char stream which is basic for Unix, I had to mention SeekEof.  The problem is that my readers told me (when the first volume of the book was already published) there's no correct possibility in Free Pascal to write a program which is as simple as computing a sum of integers supplied on the standard input (until EOF): it fails in pipelines, and actually in any situation when stdin is redirected (which is everyday practice in Unix) from something other than a disk text file (e.g., from a FIFO, from a socket, from a char device etc). 

4 years passed, and now I'm preparing the second edition of the book, but the problem is still there.  I noticed someone tried to solve it forcing the InOutRes variable to zero after trying with lseek, but that doesn't solve the problem, because subsequent operations produce more IO errors.  However, this time I had a time to investigate, and I patched the function so that it now works correctly.  Here it goes, with comments "by Croco" marking the changes:

Code: Pascal  [Select][+][-]
  1. Function SeekEof (Var t : Text) : Boolean;
  2. var
  3.   oldfilepos : Int64;
  4.   oldbufpos, oldbufend : SizeInt;
  5.   reads: longint;
  6.   isdevice: boolean;
  7. Begin
  8.   If (InOutRes<>0) then
  9.    exit(true);
  10.   if (TextRec(t).mode<>fmInput) Then
  11.    begin
  12.      if TextRec(t).mode=fmOutPut then
  13.       InOutRes:=104
  14.      else
  15.       InOutRes:=103;
  16.      exit(true);
  17.    end;
  18.   { try to save the current position in the file, seekeof() should not move }
  19.   { the current file position (JM)                                          }
  20.   oldbufpos := TextRec(t).BufPos;
  21.   oldbufend := TextRec(t).BufEnd;
  22.   reads := 0;
  23.   oldfilepos := -1;
  24.   isdevice := Do_IsDevice(TextRec(t).handle);
  25.   repeat
  26.     If TextRec(t).BufPos>=TextRec(t).BufEnd Then
  27.      begin
  28.        { signal that the we will have to do a seek }
  29.        inc(reads);
  30.        if not isdevice and
  31.           (reads = 1) then
  32.          begin
  33.            oldfilepos := Do_FilePos(TextRec(t).handle) - TextRec(t).BufEnd;
  34.            { 2 lines inserted by Croco }
  35.            if InOutRes <> 0 then
  36.              isdevice := true;
  37.            InOutRes:=0;
  38.          end;
  39.        FileFunc(TextRec(t).InOutFunc)(TextRec(t));
  40.        If TextRec(t).BufPos>=TextRec(t).BufEnd Then
  41.         begin
  42.           { if we only did a read in which we didn't read anything, the }
  43.           { old buffer is still valid and we can simply restore the     }
  44.           { pointers (JM)                                               }
  45.           dec(reads);
  46.           SeekEof := true;
  47.           break;
  48.         end;
  49.      end;
  50.     case TextRec(t).Bufptr^[TextRec(t).BufPos] of
  51.       #26 :
  52.         if CtrlZMarksEOF then
  53.           begin
  54.             SeekEof := true;
  55.             break;
  56.           end;
  57.      #10,#13,#9,' ' :
  58.        ;
  59.     else
  60.      begin
  61.        SeekEof := false;
  62.        break;
  63.      end;
  64.     end;
  65.    inc(TextRec(t).BufPos);
  66.   until false;
  67.   { restore file position if not working with a device }
  68.   if not isdevice then
  69.     { if we didn't modify the buffer, simply restore the BufPos and BufEnd  }
  70.     { (the latter because it's now probably set to zero because nothing was }
  71.     {  was read anymore)                                                    }
  72.     if (reads = 0) then
  73.       begin
  74.         TextRec(t).BufPos:=oldbufpos;
  75.         TextRec(t).BufEnd:=oldbufend;
  76.       end
  77.     { otherwise return to the old filepos and reset the buffer }
  78.     else
  79.       begin
  80.         do_seek(TextRec(t).handle,oldfilepos);
  81.           { the following if added by Croco, replacing 3 old lines }
  82.         if InOutRes = 0 then
  83.           begin
  84.             FileFunc(TextRec(t).InOutFunc)(TextRec(t));
  85.             TextRec(t).BufPos:=oldbufpos;
  86.           end
  87.         else
  88.           InOutRes:=0;
  89.       end;
  90. End;
  91.  

I rebuilt the entire Free Pascal (because I'm not familiar with the build system so I don't know how to only rebuild parts) and as far as I can tell, it, well, looks like working correctly.

However, I'd like to point out this fix is hackish and I dislike it.  The problem is that, by the very nature of a text stream, all that effort to return the stream to where it was using lseek(2) is a nonsense.  One of the main properties of the very paradigm "byte stream is everything", which is essential for what Unix is, is that the stream behaves on a precisely same manner not depending on what it actually is -- a file, a pipeline, a socket, a tty, whatever, unless the user or programmer explicitly calls for operations available only on certain types of streams.  So SeekEof, actually, not only "should" move (in contrast with what is there in the comment), actually it MUST move.

Furthermore, if you take a look on that Do_IsDevice function, namely on its implementation on different platforms, you can notice there's no good in having a function which is SO inconsistent on different archs; it is an obvious source of unportabilities. Furthermore, there are only two uses of it: in SeekEof and in Close, and, well, it prevents the file from closing (!).  So what I strongly suggest is to eliminate that Do_IsDevice completely, and remove all the "repositioning" code from SeekEof.  Actually, as of present, (at least on Linux where Do_Isdevice actually is isatty(3) -- so that we can't close a tty once we opened it... funny, heh), a program computing the sum of integer:

Code: Pascal  [Select][+][-]
  1. program SimpleSum;
  2. var
  3.     sum, count, n: longint;
  4. begin
  5.     sum := 0;
  6.     count := 0;
  7.     while not SeekEof do
  8.     begin
  9.         read(n);
  10.         sum := sum + n;
  11.         count := count + 1;
  12.     end;
  13.     writeln(count, ' ', sum)
  14. end.
  15.  

being redirected from a disk file (or, actually, anything but a tty) emits lseek(2) syscalls which, well, obviously surprising.

Actually, I can do all the job, take diff and send it, if the decision will be made the fix should be in one of the next releases.  Please instruct me on the procedure details (e.g., where to send the patch, what persons to contact etc).

However, if the Free Pascal team decides it is for any reason unacceptable or undesired, please at least replace the present version of SeekEof with what I posted above, so that it doesn't fail in pipelines, on sockets and FIFOs.  Folks, this is really serious for me.  Free Pascal is the only supported implementation of Pascal available around, and there's no correct way to read numbers from a stream up to EOF, other than using SeekEof; if you refuse to fix it,I'll have to write in my book that the only existing Pascal doesn't provide correct mechanisms of reading numbers so everyone must read them char-by-char building numbers manually.  I hate to do so.
Title: Re: SeekEof in pipelines
Post by: Jonas Maebe on September 06, 2020, 12:12:45 pm
If you use Eof instead of SeekEof in your program, then you test program works fine for me. Is there a reason why you want to use SeekEof instead?

As the name implies, SeekEof indeed does not work at all with pipes or other non-seekable inputs. That doesn't mean its error handling can't be improved, of course.
Title: Re: SeekEof in pipelines
Post by: croco on September 06, 2020, 12:29:34 pm
Quote
If you use Eof instead of SeekEof

Definitely it doesn't work, and it can not.  Running without redirections, I have to press Ctrl-D twice instead of once, and (even worse!) read reads an extra zero, so if I type 5 numbers, it tells there were 6 of them:

Code: [Select]
crocodil@manul:~$ ./se2
1 2 3 4 5
6 15
crocodil@manul:~$

BTW, this behaviour is absolutely correct if we recall what text stream is, and what is the EOF situation (well, I'd suggest read raises io error in case it can't read the number, instead of simply pretending it got zero... but it doesn't).  Incorrect here is the program. You can't use eof together with read on text streams if you read anything but chars. This is exactly what SeekEof is for.

Quote
As the name implies, SeekEof indeed does not work at all with pipes

This is wrong statement.  Neither the name nor the specification of SeekEof has anything to do with seekability of the stream (btw, just like with seekeoln, which doesn't have any problems on any streams). And SeekEof DOES work on non-seekable streams -- but, in its present form, on ttys (that is, terminals) only (e.g. on a non-redirected stdin),

Quote
That doesn't mean its error handling can't be improved, of course.

The fix I posted has nothing to do with error handling, it only detects the stream is not seekable.  Actually, all these isdevice and Do_IsDevice tried to achieve precisely the same thing, but incorrectly.

In the form I posted, it will work on any stream but will still try to emit lseek(2) calls.  But if we, e.g., decide isdevice is always true and remove all the code which works for isdevice=false, SeekEof will work on all streams equally, and will never emit lseek(2)s.  Furthermore, if we eliminate Do_IsDevice, we'll have one less source of unportability.
TinyPortal © 2005-2018