Author Topic: LARGE DBF FILES IN LAZARUS  (Read 8804 times)

jufau

  • New Member
  • *
  • Posts: 42
LARGE DBF FILES IN LAZARUS
« on: July 07, 2015, 06:16:02 pm »
Hello helpers!!!
Is there any component, library, or other way to open large DBF files (over 6GB) in Lazarus? I have a file which needs to be exported to MySQL 5.6.x. I DO NOT want to use third-party tools to do the job; I want to do it in my own application.
Right now my application already does exactly what I need, but I got stuck on these huge files. So I am wondering whether there is any component that would let me open this file in my app.

Thank you guys
J
« Last Edit: July 07, 2015, 07:44:28 pm by jufau »

rvk

  • Hero Member
  • *****
  • Posts: 6171
Re: LARGE DBF FILES
« Reply #1 on: July 07, 2015, 06:24:40 pm »
How did you create DBF files larger than 2GB??

I thought the limit was 2GB.
(also according to this)

(Or can DBF 7 files be larger?)

What kind/version of DBF is this and with what software is it created?

Also... what kind of problems are you experiencing (and with what component)?

If it's for one-time reading of such large files, I would recommend just reading the DBF file raw and exporting it.
« Last Edit: July 07, 2015, 06:31:00 pm by rvk »

jufau

  • New Member
  • *
  • Posts: 42
Re: LARGE DBF FILES
« Reply #2 on: July 07, 2015, 06:47:02 pm »
Actually, the file was given to me as is. I believe it was created/generated using FlagShip. We have several large DBF files here with sizes from 2 to 3GB, and I have no issues opening them. But this big boy does not open at all.

I don't know how it got to this size. But there is a Linux tool called DBFDump (http://search.cpan.org/~janpaz/DBD-XBase-1.03/bin/dbfdump.PL) which we use here to export the data from DBF to CSV so it can be opened anywhere. But I would like to open the original DBF directly from my application instead of using external tools.
« Last Edit: July 07, 2015, 07:12:00 pm by jufau »

kpeters58

  • Sr. Member
  • ****
  • Posts: 267
Re: LARGE DBF FILES IN LAZARUS
« Reply #3 on: July 07, 2015, 08:05:27 pm »
I am pretty certain that this will not be possible, as rvk has already hinted.

Apparently, the FlagShip people have made modifications that are not standard-compatible, so you should not expect support for them in any Pascal DBF libs...

If DBFDump is actually capable of handling such files, and since it happens to be command-line driven, maybe you can execute it from within an app you write, if that fulfills your requirement of controlling the conversion process.
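Something along these lines might work (an untested sketch; the shell redirection is an assumption, and RunDbfDump is a made-up helper name):
Code: [Select]
uses Process;

// Untested sketch: run dbfdump via the shell and let the shell redirect
// its CSV output straight to a file, so the (potentially huge) output
// never has to pass through our own process.
procedure RunDbfDump(const DbfFileName, CsvFileName: string);
var
  P: TProcess;
begin
  P := TProcess.Create(nil);
  try
    P.Executable := '/bin/sh'; // assumes a Unix-like system with dbfdump on the PATH
    P.Parameters.Add('-c');
    P.Parameters.Add('dbfdump ' + DbfFileName + ' > ' + CsvFileName);
    P.Options := [poWaitOnExit];
    P.Execute;
  finally
    P.Free;
  end;
end;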
« Last Edit: July 07, 2015, 08:08:01 pm by kpeters58 »
Lazarus 2.0.4/FPC 3.0.4/Win 64

jufau

  • New Member
  • *
  • Posts: 42
Re: LARGE DBF FILES IN LAZARUS
« Reply #4 on: July 07, 2015, 08:13:36 pm »
Yeah!!! This is exactly what I want to avoid: using any external tool to get the data. Even though DBFDump would just transfer from DBF to CSV, I want to open the DBF file in my application and do whatever I need from there.

kpeters58

  • Sr. Member
  • ****
  • Posts: 267
Re: LARGE DBF FILES IN LAZARUS
« Reply #5 on: July 07, 2015, 08:55:50 pm »
Well, then do it yourself.

Forget about indexes and memos (unless you have a lot of time to blow), learn about the file format in question (the headers in particular), and use something like TFileStream to process the raw file...
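Just to show the idea (a minimal, untested sketch; the file name and PeekDbfHeader are just examples): TFileStream works with Int64 positions, so files over 2GB are no problem as long as your own offset math is also done in Int64.
Code: [Select]
uses Classes, SysUtils;

// Minimal sketch: peek at the first header byte of a (possibly huge) DBF.
procedure PeekDbfHeader(const FileName: string);
var
  FS: TFileStream;
  VerByte: Byte;
begin
  FS := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
  try
    FS.ReadBuffer(VerByte, 1);        // byte 0 = DBF version/type marker
    WriteLn('Version byte: $', IntToHex(VerByte, 2));
    WriteLn('File size: ', FS.Size);  // Size and Position are Int64
  finally
    FS.Free;
  end;
end;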
Lazarus 2.0.4/FPC 3.0.4/Win 64

JD

  • Hero Member
  • *****
  • Posts: 1848
Re: LARGE DBF FILES IN LAZARUS
« Reply #6 on: July 07, 2015, 11:32:54 pm »
Yeah!!! This is exactly what I want to avoid: using any external tool to get the data. Even though DBFDump would just transfer from DBF to CSV, I want to open the DBF file in my application and do whatever I need from there.
This may sound silly, but since you want to read DBF files in your application AND the FlagShip DBF files are not "standard" DBF files, why don't you move the data from DBF to CSV using DBFDump (or something else like Exportizer, http://www.vlsoftware.net/exportizer/), and then from CSV to a standard DBF that your application can read?

Note, however, that the free version of Exportizer can export directly to an SQL script. You may then be able to use this in MySQL 5.6. I haven't tried it; it is just a suggestion.

JD
Windows - Lazarus 2.1/FPC 3.2 (built using fpcupdeluxe),
Linux Mint - Lazarus 2.1/FPC 3.2 (built using fpcupdeluxe)

mORMot; Zeos 8; SQLite, PostgreSQL & MariaDB; VirtualTreeView

rvk

  • Hero Member
  • *****
  • Posts: 6171
Re: LARGE DBF FILES IN LAZARUS
« Reply #7 on: July 08, 2015, 12:21:56 am »
Also... did you check that DBFDump can read the 6GB file without problems? Maybe it's corrupt, and that's why the standard DBF components have trouble with it. Also, make sure not to include the index files: for simple first/next access you don't need them, and they could be too large for the standard components.

rvk

  • Hero Member
  • *****
  • Posts: 6171
Re: LARGE DBF FILES IN LAZARUS
« Reply #8 on: July 21, 2015, 02:54:31 pm »
I can confirm TDbf doesn't support DBF files over 2GB.

Probably because positioning is done with an Integer (max 2,147,483,647). And indeed, the code below should produce a DBF of over 6GB, but it only produces a file of 2,147,484,141 bytes (just over 2GB, because the last record is still written).

Creates a test.dbf in c:\temp
Code: [Select]
procedure TForm1.Button1Click(Sender: TObject);
var
  MyDbf: TDbf;
  I: integer;
begin
  MyDbf := TDbf.Create(nil);
  try
    MyDbf.FilePathFull := 'c:\temp';
    MyDbf.TableName := 'test.dbf';
    MyDbf.TableLevel := 7;
    with MyDbf.FieldDefs do
    begin
      Add('Id', ftAutoInc, 0, True); //Autoincrement field called Id
      Add('str1', ftString, 80);
      Add('str2', ftString, 80);
      Add('str3', ftString, 80);
      Add('str4', ftString, 80);
      Add('str5', ftString, 80);
      Add('str6', ftString, 80);
      Add('str7', ftString, 80);
      Add('str8', ftString, 80);
    end;
    MyDbf.CreateTable;
    MyDbf.Exclusive := True;
    MyDbf.Open;
    // 10 million records of a few hundred bytes each should give
    // a file well over 6GB if 2GB were not the limit
    for I := 1 to 10000000 do
    begin
      MyDbf.Insert;
      MyDbf.FieldByName('str1').AsString := IntToStr(I);
      Button1.Caption := IntToStr(I); // progress indicator (only repaints when messages are processed)
      MyDbf.Post;
    end;
    MyDbf.Close;
  finally
    MyDbf.Free;
  end;
end;


The problem lies in TPagedFile (which TDbf uses to read the file, I think). It uses Integer to calculate the file offset, which of course cannot address files over 2GB. Here is an excerpt from dbf_pgfile.pas:
Code: [Select]
function TPagedFile.ReadRecord(IntRecNum: Integer; Buffer: Pointer): Integer;
var
  Offset: Integer;
begin
  if FBufferAhead then
  begin
    Offset := (IntRecNum - FBufferPage) * PageSize;
    if (FBufferPage <> -1) and (FBufferPage <= IntRecNum) and
        (Offset+RecordSize <= FBufferReadSize) then
    begin
      // have record in buffer, nothing to do here
    end else begin
      // ... (rest of the method omitted)

I leave it up to the experts to comment on this.
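To make the wraparound concrete, here is a tiny illustration (the record size is a made-up example, not taken from the file above):
Code: [Select]
program OffsetOverflow;
{$R-}{$Q-} // range/overflow checks off, as in typical library code

var
  RecNum, RecSize: Integer;
begin
  RecNum  := 3300000;
  RecSize := 725;                    // hypothetical record size in bytes
  WriteLn(RecNum * RecSize);         // 32-bit math wraps: prints a negative number
  WriteLn(Int64(RecNum) * RecSize);  // Int64 math is correct: 2392500000
end.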

But as we said before... maybe it's easier (if no fix is found) to raw-read the file.

jufau

  • New Member
  • *
  • Posts: 42
Re: LARGE DBF FILES IN LAZARUS
« Reply #9 on: July 21, 2015, 04:04:04 pm »
Hey rvk,
Thanks for updating the topic with your skills.
Look, I am willing to do what you said about raw-reading the DBF file, but I would like a starting point, like how to read the DBF file in raw mode in Lazarus. If you could post some examples that I could work from, I would appreciate that a lot.

thanks

rvk

  • Hero Member
  • *****
  • Posts: 6171
Re: LARGE DBF FILES IN LAZARUS
« Reply #10 on: July 21, 2015, 04:25:18 pm »
Look, I am willing to do what you said about raw-reading the DBF file, but I would like a starting point, like how to read the DBF file in raw mode in Lazarus. If you could post some examples that I could work from, I would appreciate that a lot.
Ok... the following is very RAW and does require some extra work to read the fields from the buffer. It also assumes the DBF version is level 7 (DBF VII).
Depending on your programming skills you could dismember the buffer yourself...

It first reads rDbfHdr and checks whether the version byte is 4 (DBF VII). If so, it reads an extra rEndFixedHdrVII. After that it reads the field definitions (rFieldDescVII) into an array. The number of fields can be calculated as (DbfHdr.FullHdrSize - SizeOf(rDbfHdr) - SizeOf(rEndFixedHdrVII)) divided by SizeOf(rFieldDescVII). The code also reads one character extra (and seeks back) so that it stops when it encounters the $0D terminator. This is not strictly necessary, but it is extra security against corruption (the TDbf component does this too).

And last... it reads all the records into Buffer^. Again... the code to dismember the buffer into the correct fields (described by the FieldsDescVII array) you can try yourself. As you can see, this code is really strict about the DBF being version VII. Of course it could be made more flexible by checking for and reading other versions too, but for now it just shows how to raw-read a DBF file.

Note: these structures etc. are taken from dbf_struct.inc in TDbf. By looking through the code in dbf.pas, dbf_dbffile.pas and dbf_pgfile.pas you can learn a lot about reading a DBF file.

Code: [Select]
type
  rDbfHdr = packed record
    VerDBF      : Byte;     // 0 DBF file type/version
    Year        : Byte;     // 1 year last updated
    Month       : Byte;     // 2 month last updated
    Day         : Byte;     // 3 day last updated
    RecordCount : Integer;  // 4-7 number of records in file
    FullHdrSize : Word;     // 8-9 size of the header in bytes
    RecordSize  : Word;     // 10-11 sum of all field sizes, including delete flag
    Dummy1      : Word;     // 12-13
    IncTrans    : Byte;     // 14
    Encrypt     : Byte;     // 15 DBase encryption flag
    MultiUse    : Integer;  // 16-19
    LastUserID  : Integer;  // 20-23
    Dummy2      : array[24..27] of Byte;
    // $01: mdx (or cdx for VFP) index file present
    // $02: (Visual FoxPro): associated memo file?
    // $04: (Visual FoxPro): is this a dbc/database container
    MDXFlag     : Byte;     // 28 Flags:
    Language    : Byte;     // 29 code page mark
    Dummy3      : Word;     // 30-31 reserved
  end;

  rEndFixedHdrVII = packed record
    LanguageDriverName  : array[32..63] of Char; //starting position 32 of header
    Dummy               : array[64..67] of Byte;
  end;

  rFieldDescVII = packed record
    FieldName      : array [0..31] of Char;
    FieldType      : Char;  // 32
    FieldSize      : Byte;  // 33
    FieldPrecision : Byte;  // 34
    Reserved1      : Word;  // 35-36
    MDXFlag        : Byte;  // 37
    // NOTE: the docs say Reserved2 is 2 bytes, and Reserved3 is 4 bytes
    //   but testing shows BDE has them the other way around
    //   be BDE compatible :S
    Reserved2      : Cardinal; // 38-41
    NextAutoInc    : Cardinal; // 42-45
    Reserved3      : Word; // 46-47
  end;

const
  FIELD_DESCRIPTOR_ARRAY_TERMINATOR = $0D; // Marker at end of list of fields within header
  EofTerminator = $1A;

type
 EMyException = class(Exception);

procedure TForm1.Button1Click(Sender: TObject);
var
  DbfHdr: rDbfHdr;
  EndFixedHdrVII: rEndFixedHdrVII;
  FieldsDescVII: array of rFieldDescVII;
  DbfFile: TFileStream;
  Columns, I: Int64;
  Ch: Char;
  Buffer: Pointer;
begin
  DbfFile := TFileStream.Create('c:\temp\test2.dbf', fmOpenRead or fmShareDenyWrite);
  try
    DbfFile.ReadBuffer(DbfHdr, SizeOf(DbfHdr));
    if DbfHdr.VerDBF <> 4 then
      raise EMyException.Create('Only DBase level 7 accepted');

    DbfFile.ReadBuffer(EndFixedHdrVII, SizeOf(EndFixedHdrVII));
    Columns := (DbfHdr.FullHdrSize - SizeOf(DbfHdr) - SizeOf(EndFixedHdrVII))
               div SizeOf(rFieldDescVII);

    Ch := #0;
    I := -1;
    SetLength(FieldsDescVII, Columns);
    repeat
      Inc(I);
      DbfFile.ReadBuffer(FieldsDescVII[I], SizeOf(rFieldDescVII));
      // ShowMessage(FieldsDescVII[I].FieldName);
      if (DbfFile.Read(Ch, 1) = 1) and (Ord(Ch) <> FIELD_DESCRIPTOR_ARRAY_TERMINATOR) then
        DbfFile.Seek(-1, soCurrent); // unread it if not the terminator character
    until (I >= Columns - 1) or (Ord(Ch) = FIELD_DESCRIPTOR_ARRAY_TERMINATOR);

    // FullHdrSize is the offset of the first record; seek there explicitly
    // in case extra header bytes (e.g. field properties) follow the terminator
    DbfFile.Seek(DbfHdr.FullHdrSize, soBeginning);

    I := -1;
    GetMem(Buffer, DbfHdr.RecordSize);
    try
      repeat
        Inc(I);
        DbfFile.ReadBuffer(Buffer^, DbfHdr.RecordSize);

        // now do something with the record in Buffer^
        // Use FieldsDescVII to determine the field-offsets
        // ...

        if (DbfFile.Read(Ch, 1) = 1) and (Ord(Ch) <> EofTerminator) then
          DbfFile.Seek(-1, soCurrent); // unread it if not the EOF terminator
      until (I >= DbfHdr.RecordCount - 1) or (Ord(Ch) = EofTerminator);
    finally
      FreeMem(Buffer, DbfHdr.RecordSize);
    end;
  finally
    DbfFile.Free;
  end;
end;
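To get you started on the dismembering part: a minimal sketch (FieldAsString is a hypothetical helper, and it assumes plain character (C) fields; other field types need their own decoding, so check FieldType before converting):
Code: [Select]
// Hypothetical helper, assuming character (C) fields only.
// Byte 0 of each record is the delete flag ('*' = deleted);
// the fields follow back-to-back in definition order.
function FieldAsString(Buffer: PChar;
  const Fields: array of rFieldDescVII; Index: Integer): string;
var
  Offset, I: Integer;
begin
  Offset := 1; // skip the delete-flag byte
  for I := 0 to Index - 1 do
    Inc(Offset, Fields[I].FieldSize);
  SetString(Result, Buffer + Offset, Fields[Index].FieldSize);
  Result := TrimRight(Result); // character fields are right-padded with spaces
end;

Inside the record loop you could then call, for example, FieldAsString(PChar(Buffer), FieldsDescVII, 1) to get the second field as a string.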

jufau

  • New Member
  • *
  • Posts: 42
Re: LARGE DBF FILES IN LAZARUS
« Reply #11 on: July 21, 2015, 06:31:01 pm »
Thank you rvk,
I will play around with it and I will let you know the results.

J

 
