Author Topic: Clean an email address?  (Read 4944 times)

KarenT

  • Full Member
  • ***
  • Posts: 120
Clean an email address?
« on: June 11, 2018, 08:05:23 pm »
Hello,

I need to clean incoming email addresses. Can someone please point me at some code snippets?
All I want is the actual address. e.g.

something@somewhere.com

From reading the RFC and searching online, it seems an email address can be quite a mess.
See here for starters, page-down x 2:
https://haacked.com/archive/2007/08/21/i-knew-how-to-validate-an-email-address-until-i.aspx/

But I have seen headers like these:
DesignSpark <designspark@news.rs-online.com> Using nlserver, Build 6.1.1.8705
"B&H Photo" <ord-status@bhphotovideo.com> Using MIME::Lite 3.01 (E2.72; F4.60; Q2.21; G4.21)

It gets complicated fast.

dsiders

  • Hero Member
  • *****
  • Posts: 1052
Re: Clean an email address?
« Reply #1 on: June 11, 2018, 09:24:04 pm »
I know Indy10 has the TIdEMailAddressItem and TIdEMailAddressList classes to help with the process (lib/Protocols/IdEmailAddress.pas). If nothing else, look at the logic for TIdEMailAddressItem.SetText for the particulars.

Hope that helps.

Don
Preview Lazarus 3.99 documentation at: https://dsiders.gitlab.io/lazdocsnext

Zvoni

  • Hero Member
  • *****
  • Posts: 2319
Re: Clean an email address?
« Reply #2 on: June 12, 2018, 09:34:06 am »
I have always found it helpful to think through the logic/algorithm I would use to implement something myself before using someone else's code.

Get the incoming string containing the email address.
Parse the string looking for the "<" character.
From that position + 1, parse the string further looking for the ">" character.
Now you've found your token containing the address.
Then check that the token contains one (and only one) "@" symbol by splitting the token at the "@" character.
The left part is the name (local part), the right part is the domain.
Finally, check that the domain contains at least one "." character (two or more if it has subdomains, e.g. myname@sub.domain.com).
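
The steps above could be sketched roughly as follows. This is only a minimal illustration under stated assumptions, not the split function mentioned in this post: the names ExtractAngleAddress and LooksLikeSimpleAddress are made up for the example, and the check deliberately ignores quoted local parts and other RFC corner cases.

```pascal
program angleaddr;
{$mode objfpc}{$H+}
uses
  SysUtils;

// Hypothetical helper: returns the text between the first '<' and the next '>'.
// Without any '<' the trimmed input is returned; '<' without '>' yields ''.
function ExtractAngleAddress(const Header: string): string;
var
  OpenPos, ClosePos: Integer;
begin
  Result := Trim(Header);
  OpenPos := Pos('<', Header);
  if OpenPos = 0 then
    Exit;
  // Look for '>' only in the text that follows the '<'
  ClosePos := Pos('>', Copy(Header, OpenPos + 1, MaxInt));
  if ClosePos = 0 then
    Exit('');
  Result := Trim(Copy(Header, OpenPos + 1, ClosePos - 1));
end;

// Hypothetical check: exactly one '@', non-empty local part and domain,
// and a '.' in the domain that is not its first character.
function LooksLikeSimpleAddress(const Addr: string): Boolean;
var
  AtPos: Integer;
  Domain: string;
begin
  Result := False;
  AtPos := Pos('@', Addr);
  if (AtPos <= 1) or (AtPos = Length(Addr)) then
    Exit;
  Domain := Copy(Addr, AtPos + 1, MaxInt);
  // A second '@' would show up in the domain part
  if Pos('@', Domain) > 0 then
    Exit;
  Result := Pos('.', Domain) > 1;
end;

var
  Found: string;
begin
  Found := ExtractAngleAddress('"B&H Photo" <ord-status@bhphotovideo.com> Using MIME::Lite 3.01');
  WriteLn(Found);                          // ord-status@bhphotovideo.com
  WriteLn(LooksLikeSimpleAddress(Found));  // TRUE
end.
```

Returning an empty string for an unclosed "<" is a design choice for the sketch; a real routine might prefer to fall back to some other heuristic.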

And what do you know?
I've written such a split function as an exercise to get familiar with Pascal.

EDIT: I've just read your link.
I didn't know that you could use the @-symbol in the local part if you escape or quote it!!  :o :o :o
« Last Edit: June 12, 2018, 09:41:07 am by Zvoni »
One System to rule them all, One Code to find them,
One IDE to bring them all, and to the Framework bind them,
in the Land of Redmond, where the Windows lie
---------------------------------------------------------------------
Code is like a joke: If you have to explain it, it's bad

ASBzone

  • Hero Member
  • *****
  • Posts: 678
  • Automation leads to relaxation...
    • Free Console Utilities for Windows (and a few for Linux) from BrainWaveCC
Re: Clean an email address?
« Reply #3 on: June 12, 2018, 03:03:49 pm »
Quote from: Zvoni
EDIT: I've just read your link.
I didn't know that you could use the @-symbol in the local part if you escape or quote it!!  :o :o :o

To be fair, I have never seen this used anywhere.

It seems to me that the logic you used would address the vast majority of the use-cases that are likely to occur on any regular basis.

And once you have extracted the contents in between "<" and ">" you can still choose to handle the occurrence of a second "@" if necessary.
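
One way to cope with a second "@" after extraction might be to split at the last "@" rather than the first. The sketch below assumes the domain itself never contains "@", which holds for ordinary host names but is an assumption all the same; SplitAtLastAt is a made-up name for the example.

```pascal
program lastat;
{$mode objfpc}{$H+}

// Hypothetical helper: splits Addr at the LAST '@', so any extra '@'
// characters (legal only in a quoted or escaped local part) stay in LocalPart.
// Returns False when there is no '@' or when either side would be empty.
function SplitAtLastAt(const Addr: string; out LocalPart, Domain: string): Boolean;
var
  I, AtPos: Integer;
begin
  LocalPart := '';
  Domain := '';
  AtPos := 0;
  // Scan from the end so the first hit is the last '@'
  for I := Length(Addr) downto 1 do
    if Addr[I] = '@' then
    begin
      AtPos := I;
      Break;
    end;
  Result := (AtPos > 1) and (AtPos < Length(Addr));
  if Result then
  begin
    LocalPart := Copy(Addr, 1, AtPos - 1);
    Domain := Copy(Addr, AtPos + 1, Length(Addr) - AtPos);
  end;
end;

var
  L, D: string;
begin
  if SplitAtLastAt('"odd@name"@example.com', L, D) then
    WriteLn(L, ' | ', D);
end.
```

This does not validate the local part at all; it only keeps the split from going wrong when more than one "@" is present.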
-ASB: https://www.BrainWaveCC.com/

Lazarus v2.2.7-ada7a90186 / FPC v3.2.3-706-gaadb53e72c
(Windows 64-bit install w/Win32 and Linux/Arm cross-compiles via FpcUpDeluxe on both instances)

My Systems: Windows 10/11 Pro x64 (Current)

KarenT

Re: Clean an email address?
« Reply #4 on: June 12, 2018, 03:24:26 pm »
Quote from: Zvoni
EDIT: I've just read your link.

:D As I was reading your post a grin began to form as I was pretty sure you had not checked what is allowable -- and -- not only a second "@", but as many as you like.

I am already doing as you suggested and a lot more, but still occasionally get a blank "To" or "From" address meaning my attempts have failed to clean it up.

Seems to my simple mind that something as basic as an email address should be defined in the RFC way more rigidly. But what do I know. :)

KarenT

Re: Clean an email address?
« Reply #5 on: June 12, 2018, 03:32:39 pm »
Quote from: dsiders
I know Indy10

Thanks, I had already checked, but their checking is not as comprehensive as what I have already developed. And, as mentioned in my other reply, my function still occasionally cannot resolve to a clean address.

In my Windows/Delphi days many years back, I remember seeing something for cleaning email addresses as part of a package like Synautils. I have spent two hours going through old backup HDDs and cannot find it. It was a weirdly named thing like "u18263address(..."; the number was probably naming an RFC or something like that.

ASBzone

Re: Clean an email address?
« Reply #6 on: June 12, 2018, 03:47:57 pm »
Quote from: KarenT
Seems to my simple mind that something as basic as an email address should be defined in the RFC way more rigidly. But what do I know. :)

Two words:  Backwards Compatibility

Some of these earlier standards were developed when there was quite a bit of flexibility across a variety of proprietary systems.  Today, there is a greater tendency to be a bit more structured in the RFCs...

Zvoni

Re: Clean an email address?
« Reply #7 on: June 12, 2018, 06:00:17 pm »
Quote from: KarenT
I am already doing as you suggested and a lot more, but still occasionally get a blank "To" or "From" address meaning my attempts have failed to clean it up.

What are you struggling with?
Do you have examples that fail?

KarenT

Re: Clean an email address?
« Reply #8 on: June 12, 2018, 08:13:48 pm »
Quote from: Zvoni
What are you struggling with?

I don't have anything at the moment and I am not so much struggling as in "don't know how to do it," but more along the lines of "can't keep up with the moving window." :)

No sooner do I handle something weird in my "Clean" routine than I get another, even weirder one, albeit usually weeks apart. I was hoping someone had been down this path before and had an all-singing, all-dancing version that coped with everything.

Zvoni

Re: Clean an email address?
« Reply #9 on: June 13, 2018, 08:02:47 am »
Quote from: KarenT
I was hoping someone had been down this path before and had an all singing, all dancing version that coped with everything.

Ah no! Even the world's most famous physicists haven't found the Formula of Everything.
The Unified Field Theory of Programming still has to be discovered!  :P :P :P

Thaddy

  • Hero Member
  • *****
  • Posts: 14201
  • Probably until I exterminate Putin.
Re: Clean an email address?
« Reply #10 on: June 13, 2018, 11:52:30 am »
This may help:
Code: Pascal
program findemail;
{$mode objfpc}
// finds most common email addresses in a string (roughly 99%)
uses regexpr;
var
  Expr: TRegExpr;
begin
  // ASCII-only pattern; the top-level domain is limited to 2-6 letters
  Expr := TRegExpr.Create('[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,6}');
  try
    if Expr.Exec('vincent@vangogh.museum designspark@news.rs-online.com    <ord-status@bhphotovideo.com> some stuff "myemail@mail.info" mail@mail.mail.com') then
      repeat
        // Match[0] is the whole match, i.e. one address
        WriteLn(Expr.Match[0]);
      until not Expr.ExecNext;
  finally
    Expr.Free;
  end;
end.
There is also a widely circulated regex intended to conform to RFC 5322 that you could use, but:
a) it is rather complex.
b) it will fail in a case such as you mentioned.
c) I could not get it to work with TRegExpr. :o :( Maybe someone else can.
RFC 5322 regexpr:
Code:
(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])

Probably needs some extra escaping to make that work in TRegExpr.

« Last Edit: June 13, 2018, 12:52:15 pm by Thaddy »
Specialize a type, not a var.

Thaddy

Re: Clean an email address?
« Reply #11 on: June 13, 2018, 02:05:28 pm »
@KarenT
Here is the same code as a command-line utility:
Code: Pascal
program findemails;
{$mode objfpc}
// simple command-line utility
// finds most email addresses in a text (including HTML) file (roughly 99%)
uses sysutils, classes, regexpr;
var
  MyFile: TStringList;
  Expr: TRegExpr;
begin
  if ParamCount <> 1 then
  begin
    WriteLn('Use: findemails <filename>');
    Halt(1);
  end;
  if not FileExists(ParamStr(1)) then
  begin
    WriteLn('File not found: ', ParamStr(1));
    Halt(1);
  end;
  MyFile := TStringList.Create;
  try
    // Load inside the try block so the list is freed even if loading fails
    MyFile.LoadFromFile(ParamStr(1));
    Expr := TRegExpr.Create('[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,6}');
    try
      if Expr.Exec(MyFile.Text) then
        repeat
          WriteLn(Expr.Match[0]);
        until not Expr.ExecNext;
    finally
      Expr.Free;
    end;
  finally
    MyFile.Free;
  end;
end.
« Last Edit: June 13, 2018, 02:09:19 pm by Thaddy »

RayoGlauco

  • Full Member
  • ***
  • Posts: 176
  • Beers: 1567
Re: Clean an email address?
« Reply #12 on: June 13, 2018, 02:18:43 pm »
Just to mess things up a little more: what about IDN domains, which include international characters? https://en.wikipedia.org/wiki/Internationalized_domain_name
To err is human, but to really mess things up, you need a computer.

RayoGlauco

Re: Clean an email address?
« Reply #13 on: June 13, 2018, 02:44:48 pm »
I found some examples of internationalised email addresses here: https://en.wikipedia.org/wiki/International_email
  用户@例子.广告   
  अजय@डाटा.भारत   
  квіточка@пошта.укр   
  θσερ@εχαμπλε.ψομ   
  Dörte@Sörensen.example.com
  аджай@экзампл.рус   
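
Addresses like these contain bytes outside 7-bit ASCII, so a pattern built only from A-Z, a-z and 0-9 character classes has no way to describe them. A cheap first step might be to detect such input and route it to separate handling. The sketch below assumes the string holds UTF-8 bytes in an AnsiString; HasNonAscii is a made-up name for the example.

```pascal
program nonascii;
{$mode objfpc}{$H+}

// Hypothetical helper: True if any byte of S lies outside 7-bit ASCII.
// Assumes S holds raw UTF-8 bytes, where every non-ASCII character is
// encoded using bytes with values above 127.
function HasNonAscii(const S: string): Boolean;
var
  I: Integer;
begin
  Result := False;
  for I := 1 to Length(S) do
    if Ord(S[I]) > 127 then
      Exit(True);
end;

begin
  // 'Dörte@Sörensen.example.com' with each "ö" written as its UTF-8 bytes
  // C3 B6, so the example does not depend on the source file encoding.
  WriteLn(HasNonAscii('D'#$C3#$B6'rte@S'#$C3#$B6'rensen.example.com'));
  WriteLn(HasNonAscii('something@somewhere.com'));
end.
```

This only flags the address; it says nothing about whether the internationalised address is valid.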