I'm wondering how to tackle a problem.
I have very large text files (word lists, essentially) that need to be read, with duplicate lines removed.
For small files of a few MB, a TStringList could be used, but I'm not sure how to tackle this with large files of sometimes several GB. If the whole file cannot be loaded into memory, how does the program know what may lie ahead in the file? E.g. if the word "Peter" is found on line 5 but also on line 1,000,000,000, how can my program know about the entry on that billionth line when it reads the first one on line 5?
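To make the small-file case concrete, here is a minimal sketch (in Python for brevity, not Delphi) of the in-memory approach: keep a set of lines seen so far and emit each line only the first time it appears. A sorted TStringList with Duplicates set to dupIgnore achieves much the same effect, but either way the set of distinct lines must fit in RAM.

```python
def dedupe_in_memory(lines):
    """Keep the first occurrence of each line; drop later duplicates.

    Works fine for files of a few MB, but the 'seen' set must hold
    every distinct line in memory, which rules it out for multi-GB files.
    """
    seen = set()
    result = []
    for line in lines:
        if line not in seen:
            seen.add(line)
            result.append(line)
    return result

print(dedupe_in_memory(["Peter", "Anna", "Peter", "Bob"]))
# → ['Peter', 'Anna', 'Bob']
```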
I'm guessing a TFileStream will be needed, but I'm still unsure how to tackle the problem efficiently without re-reading the whole file from start to end for every line in it. At the moment, that's the only way I can think of: i.e. ReadLine from the stream, then check every other line to see if it equals the line being read. If none does, keep it; if one does, delete the duplicate. But that seems very inefficient to me.
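For clarity, the naive approach described above can be sketched like this (again in Python rather than Delphi, and over an in-memory list rather than a stream, purely to show the shape of the algorithm): for each line, scan all earlier lines for a duplicate. That inner scan is what makes it O(n²) in the number of lines, and hence impractical for a billion-line file.

```python
def dedupe_naive(lines):
    """The quadratic approach: for each line, re-scan everything
    read so far to decide whether it is a duplicate.

    lines[:i] is the 'every other line' check from the question;
    with n lines this does O(n^2) comparisons overall.
    """
    result = []
    for i, line in enumerate(lines):
        if line not in lines[:i]:  # linear scan of all earlier lines
            result.append(line)
    return result

print(dedupe_naive(["Peter", "Anna", "Peter", "Bob"]))
# → ['Peter', 'Anna', 'Bob']
```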
I don't necessarily expect code, as I realise I haven't given any yet. I'm just after ideas from those with experience.
Ta