I am working on an application that translates data from one file format to another. The read part is very efficient: I allocate memory for my arrays in big chunks at a time with SetLength, so reading a 2.4 GB file from a file stream, plus some processing, takes about 18 s from an internal 2 TB Intel M.2 SSD (a somewhat slow one, but still faster than a SATA SSD).
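To make that concrete, here is a compilable sketch of the chunked-growth pattern I use during the read (the names and sizes are illustrative, not my actual code):

```pascal
program ChunkGrowDemo;
{$mode objfpc}
// Grow a dynamic array in big chunks with SetLength instead of
// element by element; this is the pattern used on the read side.
const
  ChunkSize = 100000; // grow this many elements at a time
var
  Data: array of SmallInt;
  Count, i: Integer;
begin
  Count := 0;
  for i := 1 to 250000 do // stand-in for "read 250000 samples"
  begin
    if Count >= Length(Data) then
      SetLength(Data, Length(Data) + ChunkSize); // one realloc per chunk
    Data[Count] := SmallInt(i and $7FFF);
    Inc(Count);
  end;
  SetLength(Data, Count); // trim to the actual size
  WriteLn('Elements: ', Count); // 250000, with only 3 reallocations
end.
```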
The write part, from my generated array to files, takes a lot more time: about 12 minutes in total. In Task Manager I can see the memory use of the MemoryStream slowly creeping up while each file is processed; it drops after the file is written and then starts creeping up again. (I am splitting the original array into 4 files, so the procedures below run multiple times.) If my understanding of MemoryStream is correct, namely that the code below first fills the stream and then writes the file in one go (supported by the observation that writing directly to a file stream is much slower), then it seems there is a lot of overhead in the piece-by-piece memory growth of the MemoryStream. While I can live with this, is there a way to speed it up, i.e. to make the memory adjustments in bigger jumps (as I do during the read)? Should I consider alternatives to MemoryStream?
I tried adding a call to SetSize after the `kmax := RecSamples;` line in the code below, but I still see the same gradual memory increase, not one jump:
MSize := (2 * imax * jmax * kmax) + mStream.Position;
mStream.SetSize(MSize);
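What I expected SetSize to give me is a single up-front allocation, after which Write only overwrites already-allocated memory. A minimal standalone demo of that expectation (sizes are illustrative):

```pascal
program PreallocDemo;
{$mode objfpc}
uses
  Classes;
var
  ms: TMemoryStream;
  v: SmallInt;
  i: Integer;
begin
  ms := TMemoryStream.Create;
  try
    ms.SetSize(2 * 1000000); // one allocation for a million SmallInts
    ms.Position := 0;        // rewind; Writes should now overwrite, not grow
    for i := 1 to 1000000 do
    begin
      v := SmallInt(i and $7FFF);
      ms.Write(v, SizeOf(v));
    end;
    WriteLn('Size: ', ms.Size); // 2000000; no growth beyond the prealloc
  finally
    ms.Free;
  end;
end.
```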
(I am of course aware that the write speed of an SSD is much lower than its read speed, but I find the above disproportionate, and the slow memory increase indicates that the main bottleneck is not the SSD.)
Here is the code snippet in question:
procedure WriteCAREDFDataRecords(const ix: integer; const subj: integer;
  mStream: TMemoryStream);
var
  m, i, j, k, CARBuffVar: Integer;
  rawValue: SmallInt;
begin
  m := 0; // index for buffered CAR data
  imax := CAREDFdoc.iNumOfDataRecs;
  jmax := CAREDFdoc.iNumOfSignals; // number of signals
  kmax := RecSamples;              // total number of samples per record
  for i := 0 to imax - 1 do
  begin
    for j := 0 to jmax - 1 do
    begin
      CARBuffVar := SubjVarIdx[CarSubj, j + 1] - 1;
      for k := 0 to kmax - 1 do
      begin
        if (CAREDFdoc.iNumOfSamples[j] > 0) and (m + k <= ix) then
        begin
          rawValue := RawBuffArr[m + k, CARBuffVar];
          mStream.Write(NToLE(rawValue), 2);
        end;
      end;
    end;
    m := m + kmax;
  end;
end;
procedure WriteCAREDFStream(const ix: integer; const subj: integer;
  aStream: TStream);
//const aBaseURI: ansistring);
var
  mStream: TMemoryStream;
  Stat: Integer;
begin
  mStream := TMemoryStream.Create;
  if Assigned(CAREDFDoc) then
    try
      CAREDFDoc.WriteHeaderToStream(mStream);
      if CAREDFDoc.StatusCode = noErr then
      begin
        WriteCAREDFDataRecords(ix, subj, mStream);
        mStream.Position := 0;
        aStream.CopyFrom(mStream, mStream.Size);
      end;
    except
      CAREDFDoc.StatusCode := saveErr;
    end;
  mStream.Free;
end;
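One alternative I have been wondering about: instead of one 2-byte Write per sample, collect each record's samples into a local array and push it to the stream with a single Write. A compilable sketch of that pattern (SampleBuf and the record counts are illustrative, not from my code):

```pascal
program BatchWriteDemo;
{$mode objfpc}
uses
  Classes;
var
  ms: TMemoryStream;
  SampleBuf: array of SmallInt; // one record's worth of samples
  i, k: Integer;
begin
  ms := TMemoryStream.Create;
  try
    SetLength(SampleBuf, 5000);       // e.g. kmax samples per record
    for i := 1 to 100 do              // e.g. 100 records
    begin
      for k := 0 to High(SampleBuf) do
        SampleBuf[k] := SmallInt(k and $7FFF); // stand-in for NToLE'd data
      // one Write per record instead of one Write per sample
      ms.Write(SampleBuf[0], Length(SampleBuf) * SizeOf(SmallInt));
    end;
    WriteLn('Total bytes: ', ms.Size); // 100 * 5000 * 2 = 1000000
  finally
    ms.Free;
  end;
end.
```

Would this kind of batching remove the per-call overhead, or is the stream growth itself the problem?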
I am currently using Lazarus 2.0.8 64-bit on Windows 10.