Changing file type while reading

MarkMLl:
The NetPBM family of file formats starts off with a brief text header before, in some cases, switching to binary. So for example,


00000000: 5036 0a38 3438 2038 3930 0a32 3535 0a0f  P6.848 890.255..
00000010: 140b 0f14 0b0f 140b 0f14 0b0f 140b 0f14  ................


where the final byte of the header is 0a and the first pixel is 0f 14 0b.

If a file is initially opened as text, will reading a single line, where that line is known to be terminated with LF rather than CR LF etc., be reliable on all platforms?

In a separate and more general case, are there any good ways of handling a text file which might comprise content written in an 8-bit codepage (recognisable by content), or possibly Unicode recognisable by BOM etc.? Is the only reasonable method to open the file, inspect, close and reopen, or is there some way of "changing horses in the middle of the race"?
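One way to avoid the open, inspect, close and reopen cycle is to open the file as binary, peek at the first few bytes for a BOM, and then wrap the same handle in a text decoder. The following is only a sketch in Python: `open_text_sniffed` and its `fallback` parameter are hypothetical names, the `cp1252` default is an assumption standing in for whatever 8-bit codepage applies, and recognising a codepage from content is not attempted.

```python
import codecs
import io

# Longer BOMs first: the UTF-32 LE BOM begins with the UTF-16 LE BOM.
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def open_text_sniffed(raw, fallback="cp1252"):
    """Wrap an already-open, seekable binary stream as text.

    The encoding is taken from a BOM if one is present; otherwise the
    (assumed) 8-bit fallback codepage is used. The stream is never
    closed and reopened.
    """
    head = raw.read(4)
    for bom, name in BOMS:
        if head.startswith(bom):
            raw.seek(len(bom))  # position just past the BOM
            return io.TextIOWrapper(raw, encoding=name)
    raw.seek(0)  # no BOM: rewind and decode from the first byte
    return io.TextIOWrapper(raw, encoding=fallback)
```

For a file on disk this would be used as `open_text_sniffed(open(path, "rb"))`. The stream has to be seekable, so it would not work unchanged on a pipe.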

MarkMLl

MarkMLl:
Answering my first question myself, it's probably necessary to parse NetPBM files byte-by-byte since the separators between header fields are specified to be any whitespace character, rather than spaces in the header and a newline after it.
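A byte-by-byte header reader along those lines might look like the sketch below (Python, for illustration only). The names `read_token` and `read_pnm_header` are hypothetical, and `#` comments are handled only between tokens, which is a simplification. Because exactly one whitespace byte is consumed after each token, the stream is left at the first byte of the pixel data once the last header field has been read.

```python
import io

WHITESPACE = b" \t\n\v\f\r"


def read_token(stream):
    """Read one whitespace-delimited token from a binary stream.

    Skips leading whitespace and '#' comments, then consumes the token
    plus exactly one trailing whitespace byte.
    """
    ch = stream.read(1)
    while ch:
        if ch == b"#":
            # A comment runs to the next CR or LF.
            while ch and ch not in b"\r\n":
                ch = stream.read(1)
        elif ch in WHITESPACE:
            ch = stream.read(1)
        else:
            break
    token = b""
    while ch and ch not in WHITESPACE:
        token += ch
        ch = stream.read(1)
    if not token:
        raise ValueError("unexpected end of header")
    return token


def read_pnm_header(stream):
    """Return (magic, width, height, maxval, data_offset)."""
    magic = read_token(stream)
    if magic not in (b"P1", b"P2", b"P3", b"P4", b"P5", b"P6"):
        raise ValueError("not a PBM/PGM/PPM file")
    width = int(read_token(stream))
    height = int(read_token(stream))
    # Bitmaps (P1, P4) carry no maxval field.
    maxval = 1 if magic in (b"P1", b"P4") else int(read_token(stream))
    return magic, width, height, maxval, stream.tell()
```

Fed the bytes from the hex dump above, this should report a 848 x 890 P6 image with maxval 255 and pixel data starting at offset 15, so the next three bytes read are 0f 14 0b.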

I'd still be interested in anybody's thoughts on handling a text file with unknown encoding.

MarkMLl

Thaddy:
spawning horses? of course that is not a good idea in the middle of the race...
Hence all options regarding files.

MarkMLl:

--- Quote from: Thaddy on November 22, 2022, 01:17:46 pm ---spawning horses? of course that is not a good idea in the middle of the race...

--- End quote ---

"Changing horses"... sorry, it's an English idiom.

I've just spent an hour trying to work out whether it's possible to parse the header and save its length, either as a number of bytes to skip (for the binary variants of the NetPBM files) or as a number of lines to skip (for the text variants), but have concluded that I'm flogging a dead horse: the format as defined is quite simply too vague, since the header may contain variable amounts of whitespace, which might or might not include newlines.

MarkMLl

Thaddy:
No, Mark. I really meant spawning horses...
