
Author Topic: Importing a string matrix into Lazarus - problem only ONE column  (Read 14765 times)

Mickel

  • New Member
  • *
  • Posts: 21
Importing a string matrix into Lazarus - problem only ONE column
« on: November 02, 2015, 03:06:49 pm »
I tried to import a large CSV/text file into Lazarus with the following code:

procedure LinkdProc();
  var  matrix1 : TKUStringMatrix;

  begin
       matrix1 := TKUStringMatrix.Create('Compustat v005.csv');

       Readln;
  end;   


However, the imported matrix has only one column, and the ";" separators stay inside every string. Is there a way to account for European CSV files, which use ";" as the separator?

Kind regards,
Mickel

rvk

  • Hero Member
  • *****
  • Posts: 7043
Re: Importing a string matrix into Lazarus - problem only ONE column
« Reply #1 on: November 02, 2015, 03:45:53 pm »
I'm not sure where you got TKUStringMatrix from (I can't find KUMatrixGeneric anywhere), but if it's anything like TStringList, maybe it has a Delimiter option. The problem is that you need to set it before reading the file. So maybe create the TKUStringMatrix without a file, set the delimiter, and then read the file. (Of course, that only works if it has such an option.)
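
For comparison, this is roughly how the delimiter works on a plain TStringList. It is only a sketch: the sample row is made up, and whether TKUStringMatrix offers anything similar is an assumption.

```pascal
program DelimiterSketch;

{$mode objfpc}{$H+}

uses
  Classes;

var
  fields: TStringList;
  i: Integer;
begin
  fields := TStringList.Create;
  try
    // Set the separator before assigning the text that has to be split.
    fields.Delimiter := ';';
    // Without StrictDelimiter, spaces would also act as separators.
    fields.StrictDelimiter := True;
    // Made-up sample row; a real row would come from the file.
    fields.DelimitedText := 'col1;col2;col3';
    for i := 0 to fields.Count - 1 do
      WriteLn(i + 1, ': ', fields[i]);
  finally
    fields.Free;
  end;
end.
```

The order matters here: assigning DelimitedText splits with whatever delimiter is set at that moment, so setting Delimiter afterwards has no effect on text that was already split.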

Mickel

  • New Member
  • *
  • Posts: 21
Re: Importing a string matrix into Lazarus - problem only ONE column
« Reply #2 on: November 02, 2015, 03:49:34 pm »
Thank you for replying! KUMatrix refers to code given by my professor. Where should I adjust the code?

constructor TKUStringMatrix.Create(filename: String; delimiter: Char);
var
  DataFile: TextFile;
  len: Integer;
  seencols: Integer;
  item: _T;
  i: Integer;
  delimlist: TStringList;
begin
     if not FileExists(filename) then
        ShowError('TKUStringMatrix.Create: Matrix bestand bestaat niet', filename, 0, nil, true);

     self.Create(0, 0);
     ZeroMemory(@item, SizeOf(item));

     len := 0;
     seencols := 0;
     SetLength(self.Items, len);
     AssignFile(DataFile, FileName);
     Reset(DataFile);

     while not EOF(DataFile) do
     begin
          seencols := 0;
          self.Rows := self.Rows + 1;
          while not EOLN(DataFile) do
          begin
               read(DataFile, item);
               delimlist := Split(item, delimiter);
               for i := 0 to delimlist.Count-1 do
               begin
                    seencols := seencols + 1;
                    len := len + 1;
                    SetLength(self.Items, len);
                    self.Items[len-1] := delimlist[i];
               end;
          end;
          if self.Cols = 0 then
             self.Cols := seencols
          else if self.Cols <> seencols then
             ShowError('TKUStringMatrix.Create: Matrix bestand bevat ongeldige lijn', filename, 0, nil, true);
          readln(DataFile);
     end;
     CloseFile(DataFile);

     if (self.Rows = 0) or (self.Cols = 0) then
     begin
          self.Rows := 0;
          self.Cols := 0;
     end;
end;

constructor TKUMatrix.Create(nrrows: Integer; nrcols: Integer);
var item :_T;
begin
     ZeroMemory(@item, SizeOf(item));
     self.Create(nrrows, nrcols, item);
end;

constructor TKUMatrix.Create(nrrowsorcols : Integer);
begin
   self.Create(1,nrrowsorcols);
end;

constructor TKUMatrix.Create(nrrows: Integer; nrcols: Integer; value: _T);
var r, c: Integer;
begin
     self.Rows := nrrows;
     self.Cols := nrcols;
     SetLength(self.Items, nrrows * nrcols);
     for r := 1 to self.NrRows() do
         for c := 1 to self.NrCols() do
             self.SetCell(r, c, value);
end;

function TKUMatrix.IsEqual(othermatrix: TKUMatrix): Boolean;
var
  r, c: Integer;
begin
  { Should use Equals() override here but type equality checking is hard... }
  result := true;
  if othermatrix = nil then result := false
  else if self.nrrows <> othermatrix.nrrows then result := false
  else if self.nrcols <> othermatrix.nrcols then result := false
  else
  begin
       for r := 1 to self.NrRows() do
       begin
            for c := 1 to self.NrCols() do
            begin
                 if self.getcell(r,c) <> othermatrix.getcell(r,c) then
                 begin
                      result := false;
                      break;
                 end;
            end;
            if not result then break;
       end;
  end;
end;

function TKUMatrix.Copy(): TKUMatrix;
var r, c: Integer;
begin
     result := TKUMatrix.Create(self.NrRows, self.NrCols);
     for r := 1 to self.NrRows() do
         for c := 1 to self.NrCols() do
             result.SetCell(r, c, self.GetCell(r, c));
end;

function  TKUMatrix.NrRows(): Integer;
begin
     result := self.Rows;
end;

function  TKUMatrix.NrCols(): Integer;
begin
     result := self.Cols;
end;

procedure TKUMatrix.SetCell(row: Integer; col: Integer; value: _T);
var
  pos: Integer;
begin
     if (row < 1) or (row > self.NrRows) then
        ShowError('TKUMatrix.SetCell: Ongeldig rijnummer', '', row, self, true);
     if (col < 1) or (col > self.NrCols) then
        ShowError('TKUMatrix.SetCell: Ongeldig kolomnummer', '', col, self, true);
     pos := (row - 1) * self.NrCols() + col - 1;
     self.Items[pos] := value;
end;

function TKUMatrix.GetCell(row: Integer; col: Integer): _T;
var
  pos: Integer;
begin
     if (row < 1) or (row > self.NrRows) then
        ShowError('TKUMatrix.GetCell: Ongeldig rijnummer', '', row, self, true);
     if (col < 1) or (col > self.NrCols) then
        ShowError('TKUMatrix.GetCell: Ongeldig kolomnummer', '', col, self, true);
     pos := (row - 1) * self.NrCols() + col - 1;
     result := self.Items[pos];
end;

procedure TKUMatrix.SetCell(roworcol: Integer; value: _T);
begin
     if (self.NrRows > 1) and (self.NrCols > 1) then
        ShowError('TKUMatrix.SetCell: Deze matrix is geen vector (geef rij en kolom mee)', '', roworcol, self, true);
     if self.NrRows = 1 then self.SetCell(1, roworcol, value)
     else self.SetCell(roworcol, 1, value);
end;

function TKUMatrix.GetCell(roworcol: Integer): _T;
begin
     if (self.NrRows > 1) and (self.NrCols > 1) then
        ShowError('TKUMatrix.GetCell: Deze matrix is geen vector (geef rij en kolom mee)', '', roworcol, self, true);
     if self.NrRows = 1 then result := self.GetCell(1, roworcol)
     else result := self.GetCell(roworcol, 1);
end;

procedure TKUMatrix.SwapRows(row1, row2: Integer);
var
   i: Integer;
begin
     for i := 1 to self.NrCols do
     begin
          self.SwapCells(row1, i, row2, i);
     end;
end;

procedure TKUMatrix.SwapCols(col1, col2: Integer);
var
   i: Integer;
begin
     for i := 1 to self.NrRows do
     begin
          self.SwapCells(i, col1, i, col2);
     end;
end;

procedure TKUMatrix.SwapCells(row1, col1, row2, col2: Integer);
var
   v: _T;
begin
     v := self.GetCell(row1, col1);
     self.SetCell(row1, col1, self.GetCell(row2, col2));
     self.SetCell(row2, col2, v);
end;

procedure TKUMatrix.SwapCells(roworcol1, roworcol2: Integer);
var
   v: _T;
begin
     v := self.GetCell(roworcol1);
     self.SetCell(roworcol1, self.GetCell(roworcol2));
     self.SetCell(roworcol2, v);
end;

procedure TKUMatrix.Save(FileName: String);
var
  DataFile: TextFile;
  r, c: Integer;
begin
     AssignFile(DataFile, FileName);
     ReWrite(DataFile);

     for r := 1 to self.NrRows() do
     begin
          for c := 1 to self.NrCols do
          begin
               write(DataFile, self.GetCell(r, c));
               if (TypeInfo(_T) <> TypeInfo(Char)) and (c <> self.NrCols) then
                  write(DataFile, #9);
          end;
          writeln(DataFile);
     end;
     CloseFile(DataFile);
end;

function TKUMatrix.ToStr(): String;
var
  r, c: Integer;
begin
     result := '';

     result := result + '     ';
     for c := 1 to self.NrCols do
          result := result + ' | ' + Format('%5d', [c]);
     result := result + #13 + #10;

     result := result + '-----';
     for c := 1 to self.NrCols do
          result := result + '-+-' + '-----';
     result := result + #13 + #10;

     for r := 1 to self.NrRows do
     begin
          result := result + Format('%5d', [r]);
          for c := 1 to self.NrCols do
          begin
               result := result + ' | ';
               if TypeInfo(_T) = TypeInfo(Char) then result := result + '    ';
               result := result + self.GenericFormat(self.GetCell(r, c));
          end;
          result := result + #13 + #10;
     end;
     result := result + #13 + #10;
end;

function TKUMatrix.ToShortStr(): String;
var
  r, c: Integer;
begin
     result := '';
     for r := 1 to self.NrRows do
     begin
          for c := 1 to self.NrCols do
          begin
               result := result + self.GenericFormat(self.GetCell(r, c));
          end;
          result := result + #13 + #10;
     end;
     result := result + #13 + #10;
end;

procedure TKUMatrix.Show();
begin
     write(self.ToStr());
end;

procedure TKUMatrix.ShowShort();
begin
     write(self.ToShortStr());
end;

end.                                                 

rvk

  • Hero Member
  • *****
  • Posts: 7043
Re: Importing a string matrix into Lazarus - problem only ONE column
« Reply #3 on: November 02, 2015, 03:56:10 pm »
Well, looking at that code, your initialization of TKUStringMatrix looks wrong. Did that work for you?

You did this:
Code: Pascal
matrix1 := TKUStringMatrix.Create('Compustat v005.csv'); // <-- this should have given an error
but according to the code you were given, the constructor is declared like this:
Code: Pascal
constructor TKUStringMatrix.Create(filename: String; delimiter: Char);
So either there is more code, or an inherited Create takes only a filename, or this should result in an error.

As I see it you could do this:
Code: Pascal
matrix1 := TKUStringMatrix.Create('Compustat v005.csv', ';');

The second parameter of TKUStringMatrix.Create is the delimiter (the character that separates the fields).
« Last Edit: November 02, 2015, 03:57:47 pm by rvk »

Mickel

  • New Member
  • *
  • Posts: 21
Re: Importing a string matrix into Lazarus - problem only ONE column
« Reply #4 on: November 02, 2015, 04:21:45 pm »
Thanks I tried it, but if I add matrix1 := TKUStringMatrix.Create('Compustat v005.csv', ';'); it returns a matrix that increases by 1 each cell. However it does include columns now to my matrix.

matrix1 := TKUStringMatrix.Create('Compustat v005.csv'); works, but it returns a matrix with 1 million rows and one column. All information is written like this:
1004;31/05/1993;may/93;54595; ...

If I do it with a IntegerMatrix just with integer numbers, there is no problem importing them. As soon as I make it a Stringmatrix, it will count the csv comma delimited as semicolons ";" when creating a matrix.


rvk

  • Hero Member
  • *****
  • Posts: 7043
Re: Importing a string matrix into Lazarus - problem only ONE column
« Reply #5 on: November 02, 2015, 04:30:58 pm »
Are you saying this line does not give any errors?
Code: Pascal
matrix1 := TKUStringMatrix.Create('Compustat v005.csv');

I think you need to post the complete code.

Quote
Thanks I tried it, but if I add matrix1 := TKUStringMatrix.Create('Compustat v005.csv', ';'); it returns a matrix that increases by 1 each cell. However it does include columns now to my matrix.
"a matrix that increases by 1 each cell." I'm not sure what you mean by this. If I had the code and some test data, it would be clearer.

Quote
If I do it with a IntegerMatrix just with integer numbers, there is no problem importing them. As soon as I make it a Stringmatrix, it will count the csv comma delimited as semicolons ";" when creating a matrix.
How would you separate the integer numbers in that file? Also with ";"?
You didn't show TKUIntegerMatrix so I can't say anything about that.


Mickel

  • New Member
  • *
  • Posts: 21
Re: Importing a string matrix into Lazarus - problem only ONE column
« Reply #6 on: November 02, 2015, 05:35:37 pm »
1) Yes, the code can create the matrix, never used the delimiter before. There is a simplified version without delimiter.

2)
I get a matrix : (1 2 3 4 5 6 7 8 9 ... ), the exact code is:
constructor TKUStringMatrix.Create(filename: String);
begin
     self.Create(filename, #0);
end;

constructor TKUStringMatrix.Create(filename: String; delimiter: Char);
var
  DataFile: TextFile;
  len: Integer;
  seencols: Integer;
  item: _T;
  i: Integer;
  delimlist: TStringList;
begin
     if not FileExists(filename) then
        ShowError('TKUStringMatrix.Create: Matrix bestand bestaat niet', filename, 0, nil, true);

     self.Create(0, 0);
     ZeroMemory(@item, SizeOf(item));

     len := 0;
     seencols := 0;
     SetLength(self.Items, len);
     AssignFile(DataFile, FileName);
     Reset(DataFile);

     while not EOF(DataFile) do
     begin
          seencols := 0;
          self.Rows := self.Rows + 1;
          while not EOLN(DataFile) do
          begin
               read(DataFile, item);
               delimlist := Split(item, delimiter);
               for i := 0 to delimlist.Count-1 do
               begin
                    seencols := seencols + 1;
                    len := len + 1;
                    SetLength(self.Items, len);
                    self.Items[len-1] := delimlist[i];
               end;
          end;
          if self.Cols = 0 then
             self.Cols := seencols
          else if self.Cols <> seencols then
             ShowError('TKUStringMatrix.Create: Matrix bestand bevat ongeldige lijn', filename, 0, nil, true);
          readln(DataFile);
     end;
     CloseFile(DataFile);

     if (self.Rows = 0) or (self.Cols = 0) then
     begin
          self.Rows := 0;
          self.Cols := 0;
     end;
end;                 


3)
The Integer (Real) one just works by itself, no delimiter needed. The code recognizes the fact that a ";" or tab or a "," is not an integer and assumes it is a delimiter for a column.



 

rvk

  • Hero Member
  • *****
  • Posts: 7043
Re: Importing a string matrix into Lazarus - problem only ONE column
« Reply #7 on: November 02, 2015, 06:00:22 pm »
Quote
1) Yes, the code can create the matrix, never used the delimiter before. There is a simplified version without delimiter.
Ah, that explains why your code works. But in that case the default delimiter is #0 (the null character), so your file would have to look like this:
Code:
line1_col1#0line1_col2
line2_col1#0line2_col2

(I can't see how the Split() function is implemented, but logically it would split the line into separate columns at each occurrence of the delimiter character, #0 or otherwise.)

There is a Read() which reads a value of type _T. Does that just read a complete line from your source file (like Readln)? If so, I still think passing ';' should have worked. What is type _T?

Quote
3) The Integer (Real) one just works by itself, no delimiter needed. The code recognizes the fact that a ";" or tab or a "," is not an integer and assumes it is a delimiter for a column.
In that case the ; as delimiter should have worked too.

What does Split() look like?
If _T is a string type, you could add debug lines that show what item contains after the first Read and how it is interpreted in the loop that assigns to self.Items.

(Again, without the complete code I can't help you much further.)
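
A debugging sketch along those lines, assuming _T is String. SplitLine is a hypothetical stand-in for the unit's Split(), which was not posted, and the file name is the one from the first post. The program only prints the first three lines, so it stays usable on a very large file.

```pascal
program SplitDebugSketch;

{$mode objfpc}{$H+}

uses
  Classes;

// Hypothetical stand-in for the unit's Split(); the real one may differ.
// The caller owns the returned list and must free it.
function SplitLine(const line: String; delimiter: Char): TStringList;
begin
  Result := TStringList.Create;
  Result.Delimiter := delimiter;
  Result.StrictDelimiter := True;  // split on the delimiter only, not on spaces
  Result.DelimitedText := line;
end;

var
  DataFile: TextFile;
  line: String;
  parts: TStringList;
  i, shown: Integer;
begin
  AssignFile(DataFile, 'Compustat v005.csv');
  Reset(DataFile);
  try
    shown := 0;
    while (not EOF(DataFile)) and (shown < 3) do
    begin
      // Reading a string takes everything up to the end of the current line.
      ReadLn(DataFile, line);
      WriteLn('raw line: [', line, ']');
      parts := SplitLine(line, ';');
      try
        for i := 0 to parts.Count - 1 do
          WriteLn('  field ', i + 1, ': [', parts[i], ']');
      finally
        parts.Free;
      end;
      Inc(shown);
    end;
  finally
    CloseFile(DataFile);
  end;
end.
```

If every raw line comes back as a single field, the delimiter passed in does not occur in the file, which would point at the file layout rather than at the matrix code.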

Mickel

  • New Member
  • *
  • Posts: 21
Re: Importing a string matrix into Lazarus - problem only ONE column
« Reply #8 on: November 02, 2015, 08:02:33 pm »
I honestly have no clear understanding of the KUMatrixGeneric code; it is just provided so that we do not have to code everything ourselves and can use easy functions for simple problems. I can attach my code in a zip file, but the only code I wrote myself is below. Don't mind the details of the function linken; the problem is that I want to select the item in row i and column 4, but without more than one column in my imported file I cannot proceed.

How can I check what _T is?


uses
  Classes, SysUtils, KUMatrixGeneric;

function linken(matrix1, matrix2 : TKUStringMatrix): TKUStringMatrix;
var
  i, j : Integer;
begin
  for i := 1 to 10 do
  begin
    for j := 2 to matrix2.NrRows do
    begin
      if matrix1[i,4] = matrix2[j,4] then    // PERMCO | date equal
      begin
        matrix1.SetCell(i, 31, matrix2[j,4]);
        //matrix2.remove
      end;
    end;
    if i = 5 then
      writeln('Halfweg');
  end;

  matrix1.Save('outputBestand.txt');
  Result := matrix1;  // return the updated matrix; the original left the result unassigned
end;


procedure LinkdProc();
  // import data
  var  matrix1, matrix2: TKUStringMatrix;

  begin
       matrix1 := TKUStringMatrix.Create('Compustat v005.csv');
       matrix2 := TKUStringMatrix.Create('CRPS dataset 1993-2002.csv');

       matrix1.show();
       writeln();
       matrix2.show();

       writeln('Import bestanden gelukt');
       //linken(matrix1, matrix2);

       Readln;
  end;

  begin
  LinkdProc();
  end.                               
« Last Edit: November 02, 2015, 08:06:32 pm by Mickel »

rvk

  • Hero Member
  • *****
  • Posts: 7043
Re: Importing a string matrix into Lazarus - problem only ONE column
« Reply #9 on: November 02, 2015, 08:31:48 pm »
Ah, now I see it  :D I needed to see KUMatrixGeneric and what it actually does. It works with generics.

The _T would be String in your TKUStringMatrix. I did the following and it worked perfectly. So what exactly is your problem?

test.csv
Code:
test1; col2; col3
line2; col2; col3

project1.lpr
Code: Pascal
program project1;

{$mode objfpc}{$H+}
{$APPTYPE CONSOLE}

uses
  Classes,
  SysUtils,
  KUMatrixGeneric;

  procedure LinkdProc();
  var
    matrix1: TKUStringMatrix;
  begin
    matrix1 := TKUStringMatrix.Create('test.csv', ';');
    matrix1.Show();
    writeln('Import bestanden gelukt');
    Readln;
  end;

begin
  LinkdProc();
end.

And the result is:
Code:
      |     1 |     2 |     3
------+-------+-------+------
    1 | test1 |  col2 |  col3
    2 | line2 |  col2 |  col3

Import bestanden gelukt

Mickel

  • New Member
  • *
  • Posts: 21
Re: Importing a string matrix into Lazarus - problem only ONE column
« Reply #10 on: November 03, 2015, 12:17:34 pm »
Thanks a lot! I don't know why the program is not working then, perhaps my imported files have the wrong lay-out?

Would the following work if I try it with a text file?

 matrix1 := TKUStringMatrix.Create('Compustat_v005.txt', 'TAB');

because somehow the csv is not working with:
  matrix1 := TKUStringMatrix.Create('test.csv', ';');
or
  matrix1 := TKUStringMatrix.Create('test.csv', ',');

« Last Edit: November 03, 2015, 12:20:40 pm by Mickel »

rvk

  • Hero Member
  • *****
  • Posts: 7043
Re: Importing a string matrix into Lazarus - problem only ONE column
« Reply #11 on: November 03, 2015, 12:26:59 pm »
Quote
I don't know why the program is not working then, perhaps my imported files have the wrong lay-out?
If the columns in your Compustat_v005.txt are separated by a real tab character, you could use:
Code: Pascal
matrix1 := TKUStringMatrix.Create('Compustat_v005.txt', #9);
You can't use 'TAB' because that is a string, not a character. You need to use #9 (the character code for tab).
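
A tiny illustration of the difference, independent of KUMatrixGeneric. The variable name is just for the example.

```pascal
program TabCharSketch;

{$mode objfpc}{$H+}

var
  delimiter: Char;
begin
  delimiter := #9;            // a single tab character
  WriteLn(Ord(delimiter));    // prints 9
  delimiter := ';';           // a one-character literal is also a Char
  WriteLn(Ord(delimiter));    // prints 59
  // delimiter := 'TAB';      // would not compile: a three-character string is not a Char
end.
```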

Could you post a sample of your Compustat_v005.csv with ";" as separator? Maybe we can find out what went wrong with that one.
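
In the meantime, a quick check like this might show which separator the file really contains. It is only a sketch: it assumes the file name from the first post and looks at the first line only.

```pascal
program CountDelims;

{$mode objfpc}{$H+}

// Counts how often ch occurs in s.
function CountChar(const s: String; ch: Char): Integer;
var
  i: Integer;
begin
  Result := 0;
  for i := 1 to Length(s) do
    if s[i] = ch then
      Inc(Result);
end;

var
  DataFile: TextFile;
  firstLine: String;
begin
  firstLine := '';
  AssignFile(DataFile, 'Compustat v005.csv');
  Reset(DataFile);
  try
    // Only the first line is inspected.
    ReadLn(DataFile, firstLine);
  finally
    CloseFile(DataFile);
  end;
  WriteLn('semicolons: ', CountChar(firstLine, ';'));
  WriteLn('commas:     ', CountChar(firstLine, ','));
  WriteLn('tabs:       ', CountChar(firstLine, #9));
end.
```

Whichever character has the highest count is the most likely candidate to pass as the second argument of TKUStringMatrix.Create.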

Mickel

  • New Member
  • *
  • Posts: 21
Re: Importing a string matrix into Lazarus - problem only ONE column
« Reply #12 on: November 03, 2015, 12:38:17 pm »
Okay, I will test with a text file as well then. Attached you can find a small sample of the csv, as well as pictures of the results I get when importing.

rvk

  • Hero Member
  • *****
  • Posts: 7043
Re: Importing a string matrix into Lazarus - problem only ONE column
« Reply #13 on: November 03, 2015, 12:42:50 pm »
You attached a Sample.xlsx.
That's not a .csv file.
I need the .csv you used and which you say doesn't work.

Mickel

  • New Member
  • *
  • Posts: 21
Re: Importing a string matrix into Lazarus - problem only ONE column
« Reply #14 on: November 03, 2015, 12:52:46 pm »
My apologies, I was too quick. I have attached the correct file now; thank you for helping.
Text files do not seem to work either.

 
