Author Topic: Extract hostname from URL  (Read 602 times)

pcurtis

  • Sr. Member
  • ****
  • Posts: 377
Extract hostname from URL
« on: December 03, 2020, 07:43:20 am »
Hi All,

How do I extract the host name from a URL?

Thanks in advance.
Windows 10 / Linux Mint 20
Laz 2.10.0
FPC 3.2.0

trev

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 1185
  • Former Delphi 1-7, 10.2 User
Re: Extract hostname from URL
« Reply #1 on: December 03, 2020, 07:44:30 am »
I'd use a regex.
o Lazarus 2.1.0 r64368, FPC 3.3.1 r48100, macOS 10.14.6, Xcode 11.3.1
o Lazarus 2.1.0 r64392, FPC 3.3.1 Jan 13 21:24, macOS 11.1 (aarch64), Xcode 12.3
o Lazarus 2.1.0 r61574, FPC 3.3.1 r42318, FreeBSD 12.1 amd64 (VMware VM)
o Lazarus 2.1.0 r61574, FPC 3.0.4, Ubuntu 20.04 (Parallels VM)

pcurtis

  • Sr. Member
  • ****
  • Posts: 377
Re: Extract hostname from URL
« Reply #2 on: December 03, 2020, 07:49:06 am »
That's nice,  :) but how?

Roland57

  • Full Member
  • ***
  • Posts: 114
Re: Extract hostname from URL
« Reply #3 on: December 03, 2020, 08:27:34 am »
Hello!

Here is a quick example.

Code: Pascal
program HostNameDemo;

{$mode objfpc}{$H+}

uses
  SysUtils, RegExpr;

// Returns the host part matched by CExpr, or '' when AUrl does not match.
// Note: the pattern captures at most three dot-separated labels made of
// \w characters, so longer host names are truncated.
function HostName(const AUrl: string): string;
const
  CExpr = '(http://|https://)(\w+\.)?(\w+\.\w+).*';
var
  LExpr: TRegExpr;
begin
  LExpr := TRegExpr.Create(CExpr);
  try
    if LExpr.Exec(AUrl) then
      Result := LExpr.Match[2] + LExpr.Match[3]
    else
      Result := '';
  finally
    LExpr.Free; // released even if Exec raises an exception
  end;
end;

const
  CSample: array[0..3] of string = (
    'https://forum.lazarus.freepascal.org/index.php?action=forum',
    'https://www.lazarusforum.de/index.php',
    'https://duckduckgo.com/',
    'http://www.blockmrecords.org/bach/index.htm'
  );

var
  s: string;

begin
  // Print each sample URL followed by the host name extracted from it.
  for s in CSample do
    WriteLn(s, LineEnding, HostName(s));
end.
« Last Edit: December 05, 2020, 05:27:31 am by Roland57 »

pcurtis

  • Sr. Member
  • ****
  • Posts: 377
Re: Extract hostname from URL
« Reply #4 on: December 03, 2020, 08:53:34 am »
Thanks. I'll have a look.

Bart

  • Hero Member
  • *****
  • Posts: 4093
    • Bart en Mariska's Webstek
Re: Extract hostname from URL
« Reply #5 on: December 03, 2020, 09:01:12 am »
Isn't copying everything up to the first forward slash after the protocol enough?
Assuming a URL is [protocol://]someserver.some_toplevel_domain/and_then_the_rest?

Bart

Thaddy

  • Hero Member
  • *****
  • Posts: 10683
Re: Extract hostname from URL
« Reply #6 on: December 03, 2020, 09:10:02 am »
No, because you can have things like my old http://thaddy.co.uk (not functioning anymore). Note that co.uk is a valid top-level domain extension.
AFAICT this also fails with the regular expression above.

PascalDragon

  • Hero Member
  • *****
  • Posts: 2578
  • Compiler Developer
Re: Extract hostname from URL
« Reply #7 on: December 03, 2020, 09:34:53 am »
Bart was talking about the forward slash, not the dot. And as long as the code also handles the case of there being no forward slash after the protocol designation, this would indeed work correctly.
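
For illustration, here is a minimal sketch of the forward-slash approach, including the case where no slash follows the protocol. The function name HostBySlash and the sample URLs are assumptions made for this example; it is not code posted by Bart or PascalDragon.

```pascal
program HostBySlashDemo;

{$mode objfpc}{$H+}

uses
  StrUtils; // PosEx

// Hypothetical helper: returns everything between the optional
// "protocol://" prefix and the next forward slash (or the end of the string).
function HostBySlash(const AUrl: string): string;
var
  StartPos, SlashPos: Integer;
begin
  StartPos := Pos('://', AUrl);
  if StartPos > 0 then
    Inc(StartPos, 3)                 // skip past '://'
  else
    StartPos := 1;                   // no protocol given
  SlashPos := PosEx('/', AUrl, StartPos);
  if SlashPos = 0 then
    SlashPos := Length(AUrl) + 1;    // no path: the host runs to the end
  Result := Copy(AUrl, StartPos, SlashPos - StartPos);
end;

begin
  WriteLn(HostBySlash('https://forum.lazarus.freepascal.org/index.php?action=forum'));
  WriteLn(HostBySlash('http://thaddy.co.uk'));
  WriteLn(HostBySlash('duckduckgo.com/'));
end.
```

This sketch does no further validation: a port suffix, user credentials, or a query string that follows the host without a slash would all remain in the result.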

dsiders

  • Sr. Member
  • ****
  • Posts: 410
Re: Extract hostname from URL
« Reply #8 on: December 03, 2020, 09:36:55 am »
Quote from: pcurtis
Hi All,

How do I extract the host name from a URL?

Thanks in advance.

I would use the ParseURI routine in URIParser.pp. It returns a TURI record with the parsed values, and it handles encoded characters too.

https://www.freepascal.org/docs-html/current/fcl/uriparser/parseuri.html
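
A short sketch of how ParseURI might be called, assuming the URIParser unit that ships with FPC's FCL. The sample URL, including its port number, is made up for illustration.

```pascal
program ParseUriDemo;

{$mode objfpc}{$H+}

uses
  URIParser;

var
  U: TURI;
begin
  // Illustrative URL; the port was added only to show that it is split off.
  U := ParseURI('https://forum.lazarus.freepascal.org:8080/index.php?action=forum');
  WriteLn('Protocol: ', U.Protocol);
  WriteLn('Host:     ', U.Host);
  WriteLn('Port:     ', U.Port);
  WriteLn('Path:     ', U.Path);
  WriteLn('Document: ', U.Document);
  WriteLn('Params:   ', U.Params);
end.
```

The Host field is the part the original question asks for; the other fields are printed only to show what else the record carries.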
Lazarus 2.1 (SVN) / FPC 3.0.4 / FPC 3.2.0 / x86-win64 / Windows 8.1

marcov

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 8998
  • FPC developer.
Re: Extract hostname from URL
« Reply #9 on: December 03, 2020, 09:59:02 am »
http(s)://fqdn:port/path/file?params=#tag

There can be a port number too.

I would use an existing routine that implements a URI parsing specification, as dsiders suggests.

MarkMLl

  • Hero Member
  • *****
  • Posts: 1710
Re: Extract hostname from URL
« Reply #10 on: December 03, 2020, 10:31:52 am »
Quote from: pcurtis
How do I extract the host name from a URL?

In the general case, you can't.

You can extract a domain name, but more often than not that will resolve (via DNS) to an IP address. That IP address will represent either one or a cluster of machines at an ISP, which handle the traffic for multiple (possibly thousands of) domains; those machines might have one or more host names from the POV of the ISP but this might or might not be public.

If you were running the system yourself, you might be telling people that your URL was something like http://www.thaddy.co.uk. In that particular case, the hostname would be www and the domain would be thaddy.co.uk (Thaddy, sorry for co-opting and mangling your example).
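
To make the distinction concrete, here is a naive sketch that splits a fully qualified name at its first dot. The procedure name SplitFirstLabel is hypothetical, and the split is only right when the name really starts with a host label.

```pascal
program SplitHostLabelDemo;

{$mode objfpc}{$H+}

// Hypothetical helper: splits a name at the first dot into a host label
// and the remaining domain. It has no knowledge of registered domains.
procedure SplitFirstLabel(const AName: string; out AHost, ADomain: string);
var
  DotPos: Integer;
begin
  DotPos := Pos('.', AName);
  if DotPos = 0 then
  begin
    AHost := AName;   // single label, no domain part
    ADomain := '';
  end
  else
  begin
    AHost := Copy(AName, 1, DotPos - 1);
    ADomain := Copy(AName, DotPos + 1, Length(AName) - DotPos);
  end;
end;

var
  H, D: string;
begin
  SplitFirstLabel('www.thaddy.co.uk', H, D);
  WriteLn(H, ' | ', D);   // prints: www | thaddy.co.uk
end.
```

Given the bare name thaddy.co.uk, the same routine would report thaddy as the host and co.uk as the domain, which shows why the split cannot be done reliably from the string alone.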

MarkMLl
Turbo Pascal v1 on CCP/M-86, multitasking with LAN and graphics in 128Kb.
Pet hate: people who boast about the size and sophistication of their computer.

Thaddy

  • Hero Member
  • *****
  • Posts: 10683
Re: Extract hostname from URL
« Reply #11 on: December 03, 2020, 11:02:54 am »
I gave up that domain... and stick to .com, .nl and .org.  :D I forgot about example.com etc...

 
