
Author Topic: [SOLVED] How to identify Base64 strings from non-Base64 strings  (Read 31245 times)

Gizmo

  • Hero Member
  • *****
  • Posts: 831
[SOLVED] How to identify Base64 strings from non-Base64 strings
« on: October 17, 2012, 11:13:27 pm »
Hi

Further to this thread (http://www.lazarus.freepascal.org/index.php/topic,18605.msg105185.html#msg105185), I am now able to find potential Base64 strings in my data and, where one is found, decode it. I am using DecodeStringBase64, which I found by reading Marco van de Voort's edited reply here: http://stackoverflow.com/questions/10242580/how-to-encode-file-of-any-type-into-base64-string-and-then-decode-it-into-file-a

s:=DecodeStringBase64(s);

However, my question is whether it is possible, either by examining the string before decoding it or by examining the decoded result in my program, to work out whether a string that matches a Base64 regular expression is actually Base64 in the first place, rather than some other data that merely happens to match the expression. AFAIK, there is no strict flag that you can pass to DecodeStringBase64, as there is with the PHP function base64_decode(), which accepts true as its second parameter (http://stackoverflow.com/questions/2556345/detect-base64-encoding-in-php):

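Code:
// PHP version of Base64Decode; passing true as the second argument enables strict mode,
// in which base64_decode() returns false if the input contains characters outside the Base64 alphabet:
$decodedString = base64_decode('Base64EncodedData', true);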

For example, a fairly long and moderately unique string like 'VGhlIFNseSBGb3g=' decodes to 'The Sly Fox'.

However, my name ('ted') encodes to simply 'dGVk'. There is not even an '=' padding character in this instance to set it apart from just the letters d, G, V and k. So I can't even code my program to say "Only decode the suspected Base64 value if it is X characters in length" because, potentially, something as short as 'dGVk' ('ted') could be a Base64 value.

Another equally viable Base64 value is WeJcFMQ/8+8QJ/w0hHh+0g== but this does not decode to readable text; it returns Yâ\Ä?óï'ü4„xÒ. Is there any way to validate the returned decoded data?

What I have done is to use mod to check that the length of the string is divisible by 4, as the length of any padded Base64 string has to be a multiple of 4. But other than that, I am stumped.
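The length test described above can be extended into a slightly stricter structural pre-check. The following is only a minimal sketch: LooksLikeBase64 is a hypothetical helper name, not part of the FPC base64 unit, and it assumes standard padded Base64 (alphabet A-Z, a-z, 0-9, '+' and '/', with up to two trailing '=' characters).

```pascal
program Base64ShapeCheck;

{$mode objfpc}{$H+}

// Hypothetical helper (not part of the FPC base64 unit): returns True only
// when S has the outward shape of standard, padded Base64.
function LooksLikeBase64(const S: string): Boolean;
var
  i, PadCount, DataLen: Integer;
begin
  Result := False;

  // Encoded output always comes in groups of four characters.
  if (Length(S) = 0) or (Length(S) mod 4 <> 0) then
    Exit;

  // Count trailing '=' padding; at most two are allowed.
  // Length(S) is at least 4 here, so S[Length(S) - 1] is a valid index.
  PadCount := 0;
  if S[Length(S)] = '=' then
  begin
    Inc(PadCount);
    if S[Length(S) - 1] = '=' then
      Inc(PadCount);
  end;
  DataLen := Length(S) - PadCount;

  // Everything before the padding must come from the Base64 alphabet.
  for i := 1 to DataLen do
    if not (S[i] in ['A'..'Z', 'a'..'z', '0'..'9', '+', '/']) then
      Exit;

  Result := True;
end;

begin
  WriteLn(LooksLikeBase64('VGhlIFNseSBGb3g='));          // TRUE
  WriteLn(LooksLikeBase64('dGVk'));                      // TRUE
  WriteLn(LooksLikeBase64('WeJcFMQ/8+8QJ/w0hHh+0g=='));  // TRUE
  WriteLn(LooksLikeBase64('ted'));                       // FALSE: length is not a multiple of 4
  WriteLn(LooksLikeBase64('dG=k'));                      // FALSE: '=' appears before the end
end.
```

A check like this can only rule strings out. Any ordinary word whose length happens to be a multiple of 4 (for example 'test') still passes, so it does not prove that the input was ever Base64-encoded.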

Any ideas? The best idea I keep thinking of is a check to see if the decoded value is "human English" as opposed to gibberish. For example: if it decodes to "Hello World", accept it, but if it decodes to "?`¬!", reject it. I'm not sure how to do that, though. Better still, are there any error returns from DecodeStringBase64? In other words, does it do any checks of its own to see whether a value legitimately decodes? I suspect it can't, given the nature of Base64, but you never know, so I thought I'd ask (AFAIK, it just returns the resulting string).
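One rough way to approximate the "human English versus gibberish" idea is to accept a decoded value only if every byte is printable ASCII. This is a sketch under that assumption, not a real language detector: IsPrintableAscii is a hypothetical helper name, and the heuristic will also reject legitimate Base64 that carries binary data or non-ASCII text.

```pascal
program DecodedTextCheck;

{$mode objfpc}{$H+}

uses
  base64;

// Hypothetical heuristic: treat the decoded bytes as plausible text only if
// the string is non-empty and every byte is printable ASCII or a tab, line
// feed or carriage return.
function IsPrintableAscii(const S: string): Boolean;
var
  i: Integer;
begin
  Result := Length(S) > 0;
  for i := 1 to Length(S) do
    if not (S[i] in [#9, #10, #13, #32..#126]) then
      Exit(False);
end;

begin
  // Decodes to 'The Sly Fox', so every byte is printable.
  WriteLn(IsPrintableAscii(DecodeStringBase64('VGhlIFNseSBGb3g=')));          // TRUE
  // The second decoded byte is $E2, which lies outside printable ASCII.
  WriteLn(IsPrintableAscii(DecodeStringBase64('WeJcFMQ/8+8QJ/w0hHh+0g==')));  // FALSE
end.
```

The accepted byte range is a design choice: widening it (for example to allow UTF-8 sequences) reduces false rejections but lets more random data through.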

Before anyone says "Google it", I have already done so and read the replies. On the whole, it seems it is not possible to know for sure, and the techniques I have already employed (checking the length with mod 4, for example) are probably as good as it gets. But I thought I'd ask.

http://stackoverflow.com/questions/6889450/methods-for-identifying-encoding-type-using-php
http://stackoverflow.com/questions/8571501/how-to-check-whether-the-string-is-base64-encoded-or-not
... and more besides
« Last Edit: October 18, 2012, 11:18:26 pm by tedsmith »

Gizmo

  • Hero Member
  • *****
  • Posts: 831
Re: How to identify Base64 strings from non-Base64 strings
« Reply #1 on: October 18, 2012, 11:18:10 pm »
Answered here by TLama and Marco, with an improvement to be added to FPC 2.6.2:

http://stackoverflow.com/questions/12943971/validating-base64-input-with-free-pascal-and-decodestringbase64

Thanks

 
