
Author Topic: Extract hostname from URL  (Read 1725 times)

pcurtis

  • Hero Member
  • *****
  • Posts: 951
Extract hostname from URL
« on: December 03, 2020, 07:43:20 am »
Hi All,

How do I extract the host name from a URL?

Thanks in advance.
Windows 10 20H2
Laz 2.2.0
FPC 3.2.2

trev

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 2020
  • Former Delphi 1-7, 10.2 user
Re: Extract hostname from URL
« Reply #1 on: December 03, 2020, 07:44:30 am »
I'd use a regex.

pcurtis

  • Hero Member
  • *****
  • Posts: 951
Re: Extract hostname from URL
« Reply #2 on: December 03, 2020, 07:49:06 am »
That's nice,  :) but how?

Roland57

  • Sr. Member
  • ****
  • Posts: 416
    • msegui.net
Re: Extract hostname from URL
« Reply #3 on: December 03, 2020, 08:27:34 am »
Hello !

Here is a quick example.

Code: Pascal
{$mode objfpc}{$H+}  // needed for the "result" identifier
uses
  SysUtils, RegExpr;

// Returns the host part of AUrl, or '' when the pattern does not match.
// Note: the pattern captures at most three dot-separated labels made of
// word characters, so longer host names are cut short.
function HostName(const AUrl: string): string;
const
  CExpr = '(http://|https://)(\w+\.)?(\w+\.\w+).*';
var
  LExpr: TRegExpr;
begin
  LExpr := TRegExpr.Create(CExpr);
  try
    if LExpr.Exec(AUrl) then
      result := LExpr.Match[2] + LExpr.Match[3]
    else
      result := '';
  finally
    LExpr.Free;  // release the regex object even if Exec raises
  end;
end;

const
  CSample: array[0..3] of string = (
    'https://forum.lazarus.freepascal.org/index.php?action=forum',
    'https://www.lazarusforum.de/index.php',
    'https://duckduckgo.com/',
    'http://www.blockmrecords.org/bach/index.htm'
  );

var
  s: string;

begin
  // Print each sample URL followed by the host name extracted from it.
  for s in CSample do
    WriteLn(s, LineEnding, HostName(s));
end.
« Last Edit: December 05, 2020, 05:27:31 am by Roland57 »
My projects are on Gitlab and on Codeberg.

pcurtis

  • Hero Member
  • *****
  • Posts: 951
Re: Extract hostname from URL
« Reply #4 on: December 03, 2020, 08:53:34 am »
Thanks. I'll have a look.

Bart

  • Hero Member
  • *****
  • Posts: 5265
    • Bart en Mariska's Webstek
Re: Extract hostname from URL
« Reply #5 on: December 03, 2020, 09:01:12 am »
Isn't copying everything up to the first forward slash after the protocol enough?
Assuming a URL is [protocol://]someserver.some_toplevel_domain/and_then_the_rest?

Bart

Thaddy

  • Hero Member
  • *****
  • Posts: 14161
  • Probably until I exterminate Putin.
Re: Extract hostname from URL
« Reply #6 on: December 03, 2020, 09:10:02 am »
No, because you can have things like my old http://thaddy.co.uk (no longer functioning). Note that co.uk is a valid top-level domain extension.
AFAICT this also fails with the above regular expression.
Specialize a type, not a var.

PascalDragon

  • Hero Member
  • *****
  • Posts: 5444
  • Compiler Developer
Re: Extract hostname from URL
« Reply #7 on: December 03, 2020, 09:34:53 am »
Bart was talking about forward slash, not dot. And as long as the code also handles the case of there being no forward slash after the protocol designation this would indeed work correctly.
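
The approach Bart describes could be sketched roughly as follows. This is only an illustration, not code from the thread: the function name HostUpToSlash and the program name are made up, and it assumes the URL has the shape [protocol://]server/rest.

```pascal
program SlashDemo;
{$mode objfpc}{$H+}

// Hypothetical helper: returns everything between the optional
// "protocol://" prefix and the first forward slash that follows it.
function HostUpToSlash(const AUrl: string): string;
var
  P: Integer;
begin
  Result := AUrl;
  // Drop the protocol designation if there is one.
  P := Pos('://', Result);
  if P > 0 then
    Delete(Result, 1, P + 2);
  // Cut at the first forward slash; if there is none, keep the rest as is.
  P := Pos('/', Result);
  if P > 0 then
    SetLength(Result, P - 1);
end;

begin
  WriteLn(HostUpToSlash('https://duckduckgo.com/'));  // with trailing slash
  WriteLn(HostUpToSlash('http://thaddy.co.uk'));      // no slash after the host
end.
```

Because nothing here looks at dots, a name such as thaddy.co.uk comes through whole. Anything else that sits before the first slash (a port number or user info, for example) stays in the result as well.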

dsiders

  • Hero Member
  • *****
  • Posts: 1045
Re: Extract hostname from URL
« Reply #8 on: December 03, 2020, 09:36:55 am »
Quote from: pcurtis on December 03, 2020, 07:43:20 am
Hi All,

How do I extract the host name from a URL?

Thanks in advance.

I would use the ParseURI routine in URIParser.pp. It returns a TURI record with the values, and it handles encoded characters too.

https://www.freepascal.org/docs-html/current/fcl/uriparser/parseuri.html
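
A minimal sketch of how that might look, assuming the FCL URIParser unit is available on the unit search path. The program name and the choice of sample URLs (taken from earlier in this thread) are only for illustration.

```pascal
program ParseUriDemo;
{$mode objfpc}{$H+}

uses
  URIParser;

const
  // Sample URLs taken from earlier posts in this thread.
  CSample: array[0..2] of string = (
    'https://forum.lazarus.freepascal.org/index.php?action=forum',
    'https://duckduckgo.com/',
    'http://thaddy.co.uk'
  );

var
  s: string;
  U: TURI;
begin
  for s in CSample do
  begin
    U := ParseURI(s);   // fills a TURI record with the URL's components
    WriteLn(s, LineEnding, U.Host);
  end;
end.
```

The Host field holds the part between the protocol and the path, so no knowledge of top-level domains is needed.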
Preview Lazarus 3.99 documentation at: https://dsiders.gitlab.io/lazdocsnext

marcov

  • Administrator
  • Hero Member
  • *
  • Posts: 11351
  • FPC developer.
Re: Extract hostname from URL
« Reply #9 on: December 03, 2020, 09:59:02 am »
http(s)://fqdndomainname:port/path/file?params=#tag

There can be a port number too.

I would use a ready-made routine that implements a URI parsing specification, as dsiders suggests.
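
To illustrate the point about the port, here is a small sketch that runs a URL of that shape through ParseURI from the FCL URIParser unit and prints the individual fields of the resulting TURI record. The URL itself is a made-up example.

```pascal
program UriPartsDemo;
{$mode objfpc}{$H+}

uses
  URIParser;

var
  U: TURI;
begin
  // Made-up URL with a port, a path, a query string and a fragment.
  U := ParseURI('http://www.example.com:8080/path/file?params=1#tag');
  WriteLn('Protocol: ', U.Protocol);
  WriteLn('Host:     ', U.Host);      // host name without the port
  WriteLn('Port:     ', U.Port);      // numeric port (a Word)
  WriteLn('Path:     ', U.Path);
  WriteLn('Document: ', U.Document);
  WriteLn('Params:   ', U.Params);    // query string
  WriteLn('Bookmark: ', U.Bookmark);  // fragment
end.
```

The port ends up in its own field, so the host name does not need to be cleaned up by hand.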

MarkMLl

  • Hero Member
  • *****
  • Posts: 6647
Re: Extract hostname from URL
« Reply #10 on: December 03, 2020, 10:31:52 am »
Quote from: pcurtis on December 03, 2020, 07:43:20 am
How do I extract the host name from a URL?

In the general case, you can't.

You can extract a domain name, but more often than not that will resolve (via DNS) to an IP address. That IP address will represent either one or a cluster of machines at an ISP, which handle the traffic for multiple (possibly thousands of) domains; those machines might have one or more host names from the POV of the ISP but this might or might not be public.

If you were running the system yourself, you might be telling people that your URL was something like http://www.thaddy.co.uk. In that particular case, the hostname would be www and the domain would be thaddy.co.uk (Thaddy, sorry for co-opting and mangling your example).
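
For that particular example, the split could be sketched as below. This is a deliberately naive illustration: the procedure name SplitAtFirstDot is made up, and it simply assumes that the first label is the host name, which is not true in general (thaddy.co.uk without the www would be split into thaddy and co.uk).

```pascal
program SplitHostDemo;
{$mode objfpc}{$H+}

// Hypothetical helper: splits a fully qualified name at the first dot.
// Assumes the first label is the host name, which need not be the case.
procedure SplitAtFirstDot(const AName: string; out AHost, ADomain: string);
var
  P: Integer;
begin
  P := Pos('.', AName);
  if P > 0 then
  begin
    AHost := Copy(AName, 1, P - 1);
    ADomain := Copy(AName, P + 1, Length(AName) - P);
  end
  else
  begin
    // No dot at all: treat the whole name as the host.
    AHost := AName;
    ADomain := '';
  end;
end;

var
  H, D: string;
begin
  SplitAtFirstDot('www.thaddy.co.uk', H, D);
  WriteLn(H, ' | ', D);  // prints: www | thaddy.co.uk
end.
```

Telling reliably where the registered domain starts would need a list of public suffixes such as co.uk, which this sketch does not attempt.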

MarkMLl
MT+86 & Turbo Pascal v1 on CCP/M-86, multitasking with LAN & graphics in 128Kb.
Pet hate: people who boast about the size and sophistication of their computer.
GitHub repositories: https://github.com/MarkMLl?tab=repositories

Thaddy

  • Hero Member
  • *****
  • Posts: 14161
  • Probably until I exterminate Putin.
Re: Extract hostname from URL
« Reply #11 on: December 03, 2020, 11:02:54 am »
I gave up that domain... and stick to .com, .nl and .org.  :D I forgot about example.com etc...

 
