Turning them into indexed hashes and sorting those sounds good to me. Depending on the sort algorithm, the comparisons are often what takes the most time. With hashes you only read each entry once, to compute its hash, and after that every comparison is between short fixed-size values.
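Here is a minimal sketch of the indexed-hash idea. It assumes the goal is to find identical entries: hash order is not alphabetical order, so this groups duplicates rather than producing a sorted list. The helper name `duplicate_groups` and the choice of 8-byte BLAKE2b digests are my own illustrative assumptions, not something from the thread.

```python
import hashlib
from itertools import groupby


def duplicate_groups(entries):
    """Hypothetical helper: return lists of indices whose entries are identical.

    Sorts (hash, index) pairs instead of the entries themselves. Hash order is
    not alphabetical order, so this groups duplicates but does not sort text.
    """
    # One pass over the data: each entry is read once to build a short digest
    # (8-byte BLAKE2b is an arbitrary choice for this sketch).
    keyed = [
        (hashlib.blake2b(e.encode("utf-8"), digest_size=8).digest(), i)
        for i, e in enumerate(entries)
    ]
    # Sorting now compares 8-byte digests (then indices), never the entries.
    keyed.sort()

    groups = []
    for _, run in groupby(keyed, key=lambda pair: pair[0]):
        indices = [i for _, i in run]
        if len(indices) < 2:
            continue
        # Equal digests almost always mean equal entries; re-check the
        # original text so a rare hash collision cannot merge different entries.
        by_text = {}
        for i in indices:
            by_text.setdefault(entries[i], []).append(i)
        groups.extend(g for g in by_text.values() if len(g) > 1)
    return groups


print(sorted(duplicate_groups(["apple", "pear", "apple", "fig", "pear"])))
# [[0, 2], [1, 4]]
```

The order of the returned groups follows the digests, which is why the example sorts them before printing.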
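Then again, if the entries are long sentences that usually differ near the start, comparing them directly might be faster: a string comparison stops at the first differing character, while hashing has to read the whole entry. I can think of edge cases either way.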
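As a rough sketch of that trade-off, you can count how many characters a comparison-based sort actually examines and compare that with the single full read hashing needs. The helper names, the 16-entry data and the 1000-character padding are made up for illustration, and the count ignores the cheap digest comparisons the hash approach still has to do.

```python
from functools import cmp_to_key


def sort_counting_chars(entries):
    """Hypothetical helper: sort strings with direct comparisons,
    counting how many characters the comparisons examine.

    Each comparison stops at the first differing character, like a normal
    string comparison.
    """
    read = 0

    def compare(a, b):
        nonlocal read
        for x, y in zip(a, b):
            read += 1
            if x != y:
                return -1 if x < y else 1
        # One is a prefix of the other (or they are equal): shorter sorts first.
        return (len(a) > len(b)) - (len(a) < len(b))

    result = sorted(entries, key=cmp_to_key(compare))
    return result, read


def chars_read_by_hashing(entries):
    # Hashing reads every character of every entry exactly once.
    return sum(len(e) for e in entries)


order = "pdmahkbgcnjfoeli"  # 16 distinct letters in a scrambled order
early = [c + "x" * 1000 for c in order]  # entries differ at the first character
late = ["x" * 1000 + c for c in order]   # entries share a 1000-character prefix

for name, data in (("differ early", early), ("shared prefix", late)):
    _, compared = sort_counting_chars(data)
    print(name, compared, chars_read_by_hashing(data))
```

When the entries differ early, the direct sort touches only a handful of characters. When they share a long prefix, every comparison rereads it, and because each entry takes part in several comparisons the total ends up well above the one pass that hashing needs.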