OK, guys, I have another problem with these files in DELTA format, and once again I ask for your help (even at the risk of being deemed inconvenient and banned from this forum forever!).
So, I have a file with 'real' descriptions coded in DELTA, as follows (just two species are included in this example). The first line of each record (preceded by a '#' character) gives the species name, and the subsequent lines hold the coded description of that species. The number before each comma is the descriptor number (as listed in another file, formatted just like the one I used as an example when starting this thread), and the value after the comma is the attribute value (or "state") for that descriptor: so, for example, 2,2 codes state 2 for descriptor 2, etc. Attribute-value pairs are separated by spaces, and blank lines separate the records for each species.
Notice that the lines can be (and are) broken by carriage returns.
# E. australasicum <(F. Muell.) Koern.>/
1,2-10 2,2 3,2 4,2-6.5 5,1.3-2.2 6,1 7,3-5 8,1 9,2 10,2-10 11,4-6 12,2
13,2 14,14-35 15,2 16,2 17,2-3 18,2.5-3.5 19,1-2 20,4<to broadly
elliptic> 21,1.5-1.75 22,0.75-1 23,3 24,2 25,2 26,1 27,2 28,2
29<narrow->,4 30,1.2-1.6 31,0.5-0.6 32,2 33,2 34,2 35,1 36,1 37,1
38,3/4<rarely> 39,1 40,2 41,2 42,0.5-0.77 43,0.1 44,1 45,2 46,4< one
larger> 47,1 50,2 51,2 52,2 53,5 54,1 56,1/3<upper flowers typically
lacking perianth> 57,1 58,2 59,1 60,2 61,0.9 62,0.01 63,1 64,2 65,1 75,2
76,1 77,0.21-0.32 78,3 79,1 80,0.25-0.28 81,0.175-0.21 82,2 83,5
# E. australe <R. Br.>/
1,33-100 2,2 3,2 4,23-80 5,2-12 6,3 7,8-18 8,1 9,1<at base with dense
long hyaline hairs and scattered hairs along lamina> 10,33-100 11,5-7
12,1 13,2 14,120-150 15,2-1<sparsely> 16,1<to depressed hemispheric>
17,4-8 18,7-10.5 19,4 20,2<broad> 21,1.5-3.5 22,2.5-3.5 23,3 24,1<at
base of outer bracts with hyaline hairs> 25,2 26,2 27,2 28,4 29,1
30,2.5-3 31,1.75-2.5 32,3&4<tip inflexed> 33,1<apically with short white
papillae> 34,2 35,3 36,2 38,4 39,3 40,1 42,1.75-3 43,0.7-1.5 44,5 45,2
46,4 47,1 50,2 51,1<with apical fringe of white hairs> 52,2 53,5 54,2
56,4 57,1 58,1 59,2<, laterals> 60,8&12<with dissected margin> 61,2-3
62,0.75-1.25 63,6 64,1<at base of crest with patch of white hairs,
median sepal linear, 2-2.25 x 0.25-0.6 mm, obtuse, glabrous> 65,4 66,1
67,1 68,1 69,2 70,2-2.5 71,0.1-0.2 72,2 73,1<apically with white hairs,
hyaline hairs scattered on margin> 74,2 75,2 76,1 77,1.25-1.5 78,3 79,1
80,0.75 81,0.45-0.5 82,3<with c. 12 longitudinal rows of peltate hairs>
83,5
So, I want to read the data for each species as a single long line, that is, with the carriage returns removed, so that I can parse it into separate attributes. For example, given the first lines of the first record (for E. australasicum):
1,2-10 2,2 3,2 4,2-6.5 5,1.3-2.2 6,1 7,3-5 8,1 9,2 10,2-10 11,4-6 12,2
13,2 14,14-35 15,2 16,2 17,2-3 18,2.5-3.5 19,1-2 20,4<to broadly
elliptic>
after parsing I should have:
1,2-10
2,2
3,2
4,2-6.5
5,1.3-2.2
6,1
7,3-5
8,1
9,2
10,2-10
11,4-6
12,2
13,2
14,14-35
15,2
16,2
17,2-3
18,2.5-3.5
19,1-2
20,4<to broadly elliptic>
(Notice that '<' and '>' are to be treated as start and end quotes: the text between them, spaces included, should be kept together and not changed.)
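To pin down the splitting rule described above, here is a minimal sketch of it in Free Pascal. It is only an illustration under some assumptions: SplitAttributes is a made-up helper name, the record is assumed to be already joined into one string, and every '<' is assumed to have a matching '>'.

```pascal
program SplitSketch;
{$MODE OBJFPC}{$H+}
uses
  SysUtils;

// Hypothetical helper: split an already-joined record on spaces,
// but keep spaces that fall inside a <...> comment.
function SplitAttributes(const Rec: string): TStringArray;
var
  parts: TStringArray;
  cur: string;
  i, depth, n: Integer;

  // Append the pending token (if any) to the result list.
  procedure Flush;
  begin
    if cur = '' then Exit;
    SetLength(parts, n + 1);
    parts[n] := cur;
    Inc(n);
    cur := '';
  end;

begin
  parts := nil;
  cur := '';
  depth := 0;
  n := 0;
  for i := 1 to Length(Rec) do
  begin
    // Track whether we are inside a <...> comment.
    case Rec[i] of
      '<': Inc(depth);
      '>': if depth > 0 then Dec(depth);
    end;
    if (Rec[i] = ' ') and (depth = 0) then
      Flush                 // a space outside a comment ends the token
    else
      cur := cur + Rec[i];  // everything else belongs to the token
  end;
  Flush;
  Result := parts;
end;

var
  a: TStringArray;
  i: Integer;
begin
  a := SplitAttributes('19,1-2 20,4<to broadly elliptic> 21,1.5-1.75');
  for i := 0 to High(a) do
    WriteLn(a[i]);
end.
```

With the sample string in the main block, the three tokens printed are 19,1-2, then 20,4<to broadly elliptic>, then 21,1.5-1.75. Runs of several spaces simply produce no empty tokens, because Flush ignores an empty pending token.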
As a first attempt at tackling this vexing problem, I wrote the lame lines of code below:
program ReadDescriptions;
{$APPTYPE CONSOLE}
{$MODE DELPHI}
uses
  Classes, SysUtils, StrUtils;
var
  infile: TextFile;
  s: string;
  a: TStringArray;
  i: integer;
  recs: integer = 0;

// Split one line into attributes, treating '<' and '>' as quote characters.
function ParseAttributes(Line: string): TStringArray;
var
  token: string;
  arr: TStringArray;
begin
  // Turn any embedded line-break characters into spaces, then collapse runs of spaces.
  token := StringReplace(StringReplace(Line, #10, #32, [rfReplaceAll]), #13, #32, [rfReplaceAll]);
  token := Trim(DelSpace1(token));
  arr := token.Split([' '], '<', '>');
  Result := arr;
end;

begin
  AssignFile(infile, 'items_erio');
  Reset(infile);
  recs := 0;
  while not EoF(infile) do begin
    // ReadLn returns one physical line, so each line is parsed on its own.
    ReadLn(infile, s);
    s := Trim(s);
    if Length(s) > 0 then begin
      if s[1] = '#' then begin
        // Header line: species name.
        Inc(recs);
        WriteLn(s);
      end else begin
        a := ParseAttributes(s);
        for i := 0 to Length(a) - 1 do
          WriteLn(a[i]);
      end;
    end;
  end;
  CloseFile(infile);
  WriteLn(recs, ' records read');
end.
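This code runs, but it parses each physical line on its own instead of first joining the lines of a record, so the carriage returns are never really removed. As a result, an attribute such as 20,4<to broadly elliptic>, which is broken across two lines in the file, is parsed as two separate pieces: '20,4<to broadly' and 'elliptic>'.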
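One direction that might work is to accumulate the lines of a record into a single string and only parse once the record is complete. The sketch below is just that idea in isolation, not a tested solution for the whole file: JoinRecords is a hypothetical name, the input lines are hard-coded instead of read from items_erio, and a record is assumed to end at a blank line, at the next '#' line, or at the end of the input.

```pascal
program JoinSketch;
{$MODE OBJFPC}{$H+}
uses
  SysUtils;

// Hypothetical helper: returns header lines unchanged and, after each
// header, the attribute lines of that record joined into one string.
function JoinRecords(const Lines: array of string): TStringArray;
var
  outp: TStringArray;
  rec, s: string;
  i, n: Integer;

  // Append a non-empty string to the output list.
  procedure Push(const t: string);
  begin
    if t = '' then Exit;
    SetLength(outp, n + 1);
    outp[n] := t;
    Inc(n);
  end;

begin
  outp := nil;
  n := 0;
  rec := '';
  for i := Low(Lines) to High(Lines) do
  begin
    s := Trim(Lines[i]);
    if (s = '') or (s[1] = '#') then
    begin
      Push(rec);              // a blank or header line closes the pending record
      rec := '';
      Push(s);                // keep the header itself (blank lines are dropped)
    end
    else if rec = '' then
      rec := s
    else
      rec := rec + ' ' + s;   // the line break becomes a single space
  end;
  Push(rec);                  // last record, if the input did not end with a blank line
  Result := outp;
end;

var
  a: TStringArray;
  i: Integer;
begin
  a := JoinRecords(['# E. australasicum <(F. Muell.) Koern.>/',
                    '18,2.5-3.5 19,1-2 20,4<to broadly',
                    'elliptic> 21,1.5-1.75 22,0.75-1 23,3']);
  for i := 0 to High(a) do
    WriteLn(a[i]);
end.
```

For the three sample lines this yields two strings: the header, and the joined record in which 20,4<to broadly elliptic> is whole again. Each joined record could then be handed to the attribute splitter in place of the single line read by ReadLn.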
Could you wizards give me a hand with this?
Thanks in advance!