Author Topic: Maximum (UTF8)String length on 64-bit  (Read 4319 times)

SymbolicFrank

  • Hero Member
  • *****
  • Posts: 1313
Maximum (UTF8)String length on 64-bit
« on: February 09, 2016, 10:22:41 am »
I have some XML and SQL files that are more than 1 GB in size. As I expect them to grow in the future, I wonder what the maximum string size is.

The Free Pascal wiki says string length is unlimited, or limited only by available memory. But the diagrams there show 4 bytes for the length field. And the program I'm working on already requires 5 GB of RAM at the moment.

If I have a TStringList with more than 2 GB of (UTF8)String data, can I use SaveToFile?

marcov

  • Administrator
  • Hero Member
  • *
  • Posts: 11446
  • FPC developer.
Re: Maximum (UTF8)String length on 64-bit
« Reply #1 on: February 09, 2016, 11:00:15 am »
No. The limit is probably a signed 32-bit integer, i.e. 2 GB. The question is whether the code path you use builds one single string out of everything or not.

P.S. Worse, operations on TStringList often need several times the memory of the data itself.
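
For anyone who wants to check the integer widths on their own target, here is a minimal sketch. It only prints the range of a signed 32-bit LongInt (2,147,483,647, i.e. just under 2 GiB) next to the size of SizeInt, which I am assuming is the type the RTL uses for string lengths. It does not prove where the practical limit is, because individual routines along the code path may still take a 32-bit count.

```pascal
program IntWidths;
{$mode objfpc}{$H+}

begin
  // Upper bound of a signed 32-bit integer: the suspected 2 GB limit.
  WriteLn('High(LongInt)   = ', High(LongInt));
  // SizeInt follows the pointer size of the target platform.
  // Assumption: this is the type used for string lengths.
  WriteLn('SizeOf(SizeInt) = ', SizeOf(SizeInt), ' bytes');
  WriteLn('High(SizeInt)   = ', High(SizeInt));
end.
```

Compile it once for the 32-bit and once for the 64-bit target and compare the last two lines.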

SymbolicFrank

  • Hero Member
  • *****
  • Posts: 1313
Re: Maximum (UTF8)String length on 64-bit
« Reply #2 on: February 09, 2016, 11:22:38 am »
OK, thanks. I'll see if SaveToFile / LoadFromFile works. If not, I'll see if I can change them to avoid going through .Text in both cases.
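
A minimal sketch of what avoiding .Text on the save side could look like, assuming SaveToFile does build the complete text in one string first. SaveLinesToFile is a hypothetical helper, not part of the RTL, and the file name and SQL lines are made up for illustration. Each line is written to the stream on its own, so no single string ever has to hold the whole file.

```pascal
program SaveLinesDemo;
{$mode objfpc}{$H+}

uses
  Classes, SysUtils;

// Hypothetical helper: writes the list line by line instead of
// concatenating everything into one big string first.
procedure SaveLinesToFile(Lines: TStrings; const FileName: string);
var
  Stream: TFileStream;
  i: Integer;
  S: string;
begin
  Stream := TFileStream.Create(FileName, fmCreate);
  try
    for i := 0 to Lines.Count - 1 do
    begin
      // S is never empty because LineEnding is appended,
      // so indexing S[1] is safe.
      S := Lines[i] + LineEnding;
      Stream.WriteBuffer(S[1], Length(S));
    end;
  finally
    Stream.Free;
  end;
end;

var
  List: TStringList;
begin
  List := TStringList.Create;
  try
    // Made-up sample data.
    List.Add('INSERT INTO address VALUES (1);');
    List.Add('INSERT INTO address VALUES (2);');
    SaveLinesToFile(List, 'out.sql');
  finally
    List.Free;
  end;
end.
```

One unbuffered write per line is slow for millions of lines, so a real version would probably collect lines into a buffer of a few MB before each write. This also only helps with the size of a single string; the list itself still has to fit in memory.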

avra

  • Hero Member
  • *****
  • Posts: 2514
Re: Maximum (UTF8)String length on 64-bit
« Reply #3 on: February 10, 2016, 11:00:22 am »
Quote from: SymbolicFrank
I have some XML and SQL files that are more than 1 GB in size.
I can only imagine table-like data of that size as a query result from some outside system. If you ever run into a size limit (which you shouldn't), a memory dataset with custom parsing that loads the data line by line could be a solution.

Where in the FPC wiki is this 4-byte length limit mentioned? In real life I have only seen XML messages prefixed with a 4-byte length integer in some TCP/IP messaging protocols (which is understandable), but that should not be a general XML size limit inside or outside of FPC.

Why don't you create some really huge XML and TStringList data in memory, try to save it, and check the results?
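
As a rough sketch of the line-by-line idea: the program below reads a text file one line at a time and counts the lines that contain a marker, so only the current line is held in memory. CountMatchingLines, the file name input.xml and the <postcode> marker are all made up for illustration, and it assumes the XML actually contains line breaks, which is not guaranteed. Real parsing would replace the Pos check.

```pascal
program ReadLinesDemo;
{$mode objfpc}{$H+}

uses
  SysUtils;

// Hypothetical helper: scans the file line by line and counts
// the lines that contain Marker.
function CountMatchingLines(const FileName, Marker: string): Int64;
var
  F: TextFile;
  Line: string;
begin
  Result := 0;
  AssignFile(F, FileName);
  Reset(F);  // fails with a runtime error if the file does not exist
  try
    while not Eof(F) do
    begin
      ReadLn(F, Line);
      // Custom parsing of the current line would go here.
      if Pos(Marker, Line) > 0 then
        Inc(Result);
    end;
  finally
    CloseFile(F);
  end;
end;

begin
  // File name and marker are placeholders.
  WriteLn(CountMatchingLines('input.xml', '<postcode>'));
end.
```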
« Last Edit: February 10, 2016, 11:02:16 am by avra »

SymbolicFrank

  • Hero Member
  • *****
  • Posts: 1313
Re: Maximum (UTF8)String length on 64-bit
« Reply #4 on: February 10, 2016, 11:32:14 am »
It's geographical data from the government that I'm parsing to extract the addresses and zip codes. The minimal input data set is 19 GB of XML data (it's 55 GB in total after unzipping, but I don't need all of it).

If it were internally consistent, I could create an SQL script on the fly, but it isn't, so I have to process everything in memory.

The generated SQL script is 1.3 GB in size. I could chop it up, but it would make little difference. The resulting database is only 80 MB, so that is far easier to distribute.


My main bottleneck is querying the database to find changes, which takes far longer than a day. So it's easier to just generate a new database each time, which only takes a few hours. Hence the SQL script.

If it were up to me, I wouldn't even use a database, because the data itself isn't that big. I would just dump it to / read it from a custom binary file on the server.
