1. First of all, check the file's signature (its "magic number", the first few bytes of the file; a FourCC code is one kind of signature). For a plain-text file, a byte-order mark (BOM) in that position can also tell you the encoding.
2. You could scan the file for non-printable characters, i.e. bytes whose value is less than 32, not counting tab, line feed and carriage return. If the file contains too many of them, it is probably not a text file.
But Unicode encodings make this a little trickier: UTF-16 text is full of zero bytes, and UTF-8 uses byte values above 127 for non-ASCII characters, so a plain ASCII check is not enough.
3. You could search the content for specific tags or markers that identify formats such as HTML, RTF, DOC, etc.
Also, you should check UTF-8 Tools by Theo.
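To make the three checks concrete, here is a rough Python sketch. The function name `sniff`, the 5% threshold, the 4096-byte sample size and the short signature list are all my own assumptions for illustration, not a standard; a real detector would need a much longer signature table.

```python
import codecs

# Byte-order marks, UTF-32 first so FF FE 00 00 is not mistaken for UTF-16 LE.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

# A few well-known signatures (illustrative only, far from exhaustive).
SIGNATURES = [
    (b"{\\rtf", "rtf"),
    (b"%PDF-", "pdf"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "ole2"),  # legacy .doc/.xls container
    (b"PK\x03\x04", "zip"),                          # also .docx/.odt
]

ALLOWED_CONTROLS = {9, 10, 12, 13}  # tab, LF, form feed, CR


def sniff(data: bytes, threshold: float = 0.05) -> str:
    """Guess what kind of file `data` comes from (hypothetical helper)."""
    # Step 1: signature / BOM at the start of the file.
    for bom, name in BOMS:
        if data.startswith(bom):
            return "text:" + name
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            return name

    # Step 3: format-specific tags near the start.
    head = data[:512].lstrip().lower()
    if head.startswith((b"<!doctype html", b"<html")):
        return "html"

    # Step 2: ratio of control bytes in a sample of the file.
    sample = data[:4096]
    if not sample:
        return "text:empty"
    bad = sum(1 for b in sample if b < 32 and b not in ALLOWED_CONTROLS)
    if bad / len(sample) > threshold:
        return "binary"

    # Incremental decoding tolerates a multi-byte sequence cut off
    # at the end of the sample, but still rejects invalid bytes.
    try:
        codecs.getincrementaldecoder("utf-8")().decode(sample, final=False)
        return "text:utf-8"
    except UnicodeDecodeError:
        return "text:unknown-8bit"
```

You would call it with the first few kilobytes of the file, e.g. `sniff(open(path, "rb").read(4096))`. Note that UTF-16 text without a BOM will come out as "binary" here because of its zero bytes, which is exactly the tricky case mentioned in point 2.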