Author Topic: Tool to extract m3u8 stream from embedded video player?  (Read 14254 times)

BosseB

  • Sr. Member
  • ****
  • Posts: 484
Tool to extract m3u8 stream from embedded video player?
« on: August 28, 2023, 09:33:26 am »
This question is about Lazarus/fpc on Linux Ubuntu.
I would like to create a tool that extracts the m3u8 video stream URL, given the URL of the webpage where the player is embedded. It would be used on streaming video sites that change this URL regularly, at short intervals.
With the m3u8 URL I can then call ffmpeg to do the actual stream download.

Some such sites can be processed by a command like this in a shell script to extract the url:
Code: [Select]
curl -s "<url to webpage>" | grep -o -e 'https://.\+m3u8'
But others (most) need some other way of finding the m3u8 url for the stream.
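
The pattern in the curl | grep one-liner above is greedy, so two URLs on the same line would be merged into one match. A minimal sketch of a stricter extraction follows, assuming the URL appears literally in the page source and ends at a quote, angle bracket or whitespace. `extract_m3u8` is a hypothetical helper name, not an existing tool.

```shell
#!/usr/bin/env bash
# Sketch only: print every m3u8 URL found in HTML read from stdin.
# extract_m3u8 is a hypothetical helper name (assumption, not from the thread).
extract_m3u8() {
  # A URL is taken to end at the first quote, angle bracket or whitespace,
  # so two URLs on one line are reported separately. Anything after
  # ".m3u8" (for example a query string with a token) is kept.
  grep -oE "https?://[^\"'<>[:space:]]+\.m3u8[^\"'<>[:space:]]*"
}
```

Usage would be along the lines of `curl -s "<url to webpage>" | extract_m3u8`. It does nothing for pages that build the URL in JavaScript at run time.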

When playing such a site in Firefox one can hit F12 to bring up the developer tools, find the m3u8 URL under the "Network" tab, and right-click to copy it.
The problem is that on some sites the m3u8 URL changes at intervals ranging from hours to days, so manual examination and extraction is not practical when downloading the video with ffmpeg in unattended mode.

I would like to create an fpc utility that can do this on an arbitrary webpage, so that when the download script starts it calls the utility to get the m3u8 URL and passes it to the ffmpeg command that does the download itself.

So my question is really whether someone here can suggest a way, using fpc/Lazarus, to create a command-line utility that runs on Ubuntu and extracts the m3u8 URL, given the URL of the webpage where the player is embedded.

Note:
There is a Firefox extension named "Video Controller" which can do this extraction, but I have not found its source, so I cannot see how it is done...
Because of this it cannot be automated.

Any suggestions?
--
Bo Berglund
Sweden

Fibonacci

  • Hero Member
  • *****
  • Posts: 788
  • Internal Error Hunter
Re: Tool to extract m3u8 stream from embedded video player?
« Reply #1 on: August 28, 2023, 11:10:38 am »
The hardest thing is to catch the URL of the m3u8 file. It can be well hidden, dynamically created, obfuscated or sent via WebSockets.

In pure FPC you can do as much as your curl | grep command.

How would I do it?

I would do it as a browser extension, in JavaScript. It would catch all HTTP requests, look for m3u8 in the URL, and send it together with the cookies to an FPC app, which would be a simple HTTP server that just receives the m3u8 URL and cookies. I would then download that file and pass it to ffmpeg or ytdl.

Instead of an extension I would probably do it in a headless browser such as PhantomJS, or in a browser driven by Selenium.

Using chromedriver (or another browser driver) is also an option. A forum member ported WebDriver4D to Lazarus. I don't think capturing network traffic is possible with this unit, but you can inject some JS code (I guess).

BTW, aren't Firefox extensions just JavaScript files? Search for *.xpi files and open them as ZIP; they will contain the full source code of the extension you mentioned.

BosseB

  • Sr. Member
  • ****
  • Posts: 484
Re: Tool to extract m3u8 stream from embedded video player?
« Reply #2 on: August 28, 2023, 12:44:15 pm »
Quote from: Fibonacci
BTW, aren't Firefox extensions just JavaScript files? Search for *.xpi files and open them as ZIP; they will contain the full source code of the extension you mentioned.
Maybe, I don't know.
I looked inside this folder:
Code: [Select]
C:\Users\Username\AppData\Roaming\Mozilla\Firefox\Profiles\lxxxxxl9m.default-release\extensions
and I found a number of files of this kind there:
Code: [Select]
langpack-sv-SE@firefox.mozilla.org.xpi
privatkopiera@firefox.stefansundin.com.xpi
swedish@dictionaries.addons.mozilla.org.xpi
uBlock0@raymondhill.net.xpi
{06b3e550-d8ff-4273-8fa0-03cd60feee8a}.xpi
{6cc0a66e-ae3d-4cd8-9a03-5cd93b392903}.xpi
{943b8007-a895-44af-a672-4f4ea548c95f}.xpi
{b9db16a4-6edc-47ec-a1f4-b86292ed211d}.xpi
{e4a8a97b-f2ed-450b-b12d-ee082ba24781}.xpi

How can I know which file belongs to the extension I mentioned?
Those with actual names I recognize but not the others with GUID names...
--
Bo Berglund
Sweden

Fibonacci

  • Hero Member
  • *****
  • Posts: 788
  • Internal Error Hunter
Re: Tool to extract m3u8 stream from embedded video player?
« Reply #3 on: August 28, 2023, 12:49:01 pm »
Extract them all and check; you will know which one it is when you see what's inside.

TRon

  • Hero Member
  • *****
  • Posts: 4377
Re: Tool to extract m3u8 stream from embedded video player?
« Reply #4 on: August 28, 2023, 03:04:19 pm »
Quote from: BosseB
Some such sites can be processed by a command like this in a shell script to extract the url:
Code: [Select]
curl -s "<url to webpage>" | grep -o -e 'https://.\+m3u8'
In such cases you can use fphttpclient.

Quote
But others (most) need some other way of finding the m3u8 url for the stream.
If you are able to locate the m3u8 file in the HTML source code, then you can try to extract it using jurassic pork's webdriver. If you have no idea what that does, search for Selenium WebDriver: it lets you control a browser (embedded and, if wanted, hidden in the background). The trickery used to hide files/links in webpages can be daunting, so be warned.
Today is tomorrow's yesterday.

BosseB

  • Sr. Member
  • ****
  • Posts: 484
Re: Tool to extract m3u8 stream from embedded video player?
« Reply #5 on: August 28, 2023, 03:09:59 pm »
Quote from: Fibonacci
Extract them all and check; you will know which one it is when you see what's inside.
OK, will do.
Meanwhile I have been checking manually on one of the video streams I want to use, and it turns out that every hour the URL changes: a 3-hour timestamp in it is advanced by 1 hour...
The part of the URL that changes looks like this:

Code: [Select]
FUu9Jj3Tre1H_KYVTSBE0g==,1693238086
(=August 28, 2023 17:54:46 GMT+02:00 DST)
6OwnmgK447Rhi7Z8z6NTeQ==,1693241698
(=August 28, 2023 18:54:58 GMT+02:00 DST)
so the last numeric part seems to be the expiry time, which I have decoded inside the parentheses.
Every hour it advances by 1 hour, but the preceding mumbo-jumbo also changes...

So if I could get the proper url once every 2 hours or so it would probably work.
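
If the trailing number really is the expiry time (an assumption based only on the two samples above), a download script could check how long the current URL remains valid before fetching a new one. A minimal sketch, with hypothetical helper names `token_expiry` and `needs_refresh`:

```shell
#!/usr/bin/env bash
# Sketch only: read the expiry time out of a token shaped like "<hash>,<epoch>".
# token_expiry and needs_refresh are hypothetical helper names (assumptions).

token_expiry() {
  # Keep only what follows the last comma: the presumed Unix timestamp.
  printf '%s\n' "${1##*,}"
}

needs_refresh() {
  # $1 = token, $2 = safety margin in seconds,
  # $3 = current epoch time (optional, defaults to the system clock).
  # Succeeds when the token expires within the safety margin.
  local expiry now
  expiry=$(token_expiry "$1")
  now=${3:-$(date +%s)}
  [ $(( expiry - now )) -le "$2" ]
}
```

A script could then call something like `needs_refresh "$TOKEN" 600 && fetch_new_url`, where `TOKEN` and `fetch_new_url` are placeholders. On Ubuntu, `date -d @1693238086` prints the timestamp in readable form.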
--
Bo Berglund
Sweden

BosseB

  • Sr. Member
  • ****
  • Posts: 484
Re: Tool to extract m3u8 stream from embedded video player?
« Reply #6 on: August 28, 2023, 03:53:48 pm »
Quote from: Fibonacci
Extract them all and check; you will know which one it is when you see what's inside.
I found it, but to no avail...
It is apparently a lot of JavaScript code put into files with lines much longer than 10000 characters.
Not possible for me to decode. :(
--
Bo Berglund
Sweden

delphius

  • Jr. Member
  • **
  • Posts: 83
Re: Tool to extract m3u8 stream from embedded video player?
« Reply #7 on: August 29, 2023, 09:34:23 am »
Quote from: TRon
If you have no idea what that does, search for Selenium WebDriver: it lets you control a browser (embedded and, if wanted, hidden in the background).
At the moment, there is an easier way to get a similar result: the Chrome browser running in headless mode, using the webui library.

WebDriver is tied to the browser version and is not small, and in general the tool chain turns out to be long, time-consuming and unreliable in practical use, except for testing purposes (which is what it was originally intended for).
« Last Edit: August 29, 2023, 09:39:10 am by delphius »
fpmtls - ssl/tls 1.3 implementation in pure pascal
fpmailsend - sending a simple email message
pascal-webui - use web browser as gui and fpc as backend

BosseB

  • Sr. Member
  • ****
  • Posts: 484
Re: Tool to extract m3u8 stream from embedded video player?
« Reply #8 on: August 29, 2023, 02:44:51 pm »
Well, the server that my download code is running on is really an Ubuntu *server* with no GUI so I doubt that there are any web browsers or the like installed to hook into...
--
Bo Berglund
Sweden

delphius

  • Jr. Member
  • **
  • Posts: 83
Re: Tool to extract m3u8 stream from embedded video player?
« Reply #9 on: August 29, 2023, 05:57:18 pm »
Quote from: BosseB
Well, the server that my download code is running on is really an Ubuntu *server* with no GUI so I doubt that there are any web browsers or the like installed to hook into...

Of course, there are certain difficulties with installing on the server, but they are mostly solvable.

In any case, before trying any solution directly on the server, you need to fully understand the problem and the algorithm for solving it, and as far as I understand that clarity is not there yet.

Are you sure that the algorithms for obfuscating links to the stream will be the same for different sites?

So that someone here can competently advise on the most convenient way to do this, please give an example of such a site with such links.

Firing a cannon at sparrows is not the way to go either  ::)
« Last Edit: August 29, 2023, 06:04:12 pm by delphius »

Fibonacci

  • Hero Member
  • *****
  • Posts: 788
  • Internal Error Hunter
Re: Tool to extract m3u8 stream from embedded video player?
« Reply #10 on: August 29, 2023, 06:33:38 pm »
Quote from: delphius
At the moment, there is an easier way to get a similar result: the Chrome browser running in headless mode, using the webui library.

I checked this webui lib you linked to because I'm interested in alternatives to LCL. It looks nice; I was even able to statically compile the lib into my app, so no DLL file is needed. Single exe file, 261 KB.

But one thing: with webui it's not possible to browse other webpages. It creates a socket connection between the browser and your app, and as soon as the connection is closed webui no longer works. The socket connection management is JavaScript appended to your HTML. So if, for example, you execute the JS "document.location.href = 'some domain...'", the webui socket script is no longer running, the connection breaks and your app closes - webui_wait() thinks the browser window was closed.

So you can't use it the same way as webdriver: you can't make it go to some URL and play with the DOM. URL changes = socket connection breaks = webui loses control.

BosseB

  • Sr. Member
  • ****
  • Posts: 484
Re: Tool to extract m3u8 stream from embedded video player?
« Reply #11 on: August 29, 2023, 10:51:00 pm »
Today I discovered that the curl commands which did not get results failed because something curl does makes the server send false data...
So I changed the code as shown below, and now I get the m3u8 URL for half of the sites I am using. Half of the remaining sources do not change the m3u8 very often, so there I can do the manual extraction and save the result in the url-file.
I still have a handful of sources where the extraction needs to dig deeper...
Here is a snippet of the changed code, where I use wget instead:

Code: Bash [Select]
# New command working for sources 1-6:
TMPFILE=$(mktemp)
# Download the page source to a temp file (quoted, so no eval is needed)
wget -q "${STREAMURL}" -O "${TMPFILE}"
# Take the second double-quote-delimited field of each line containing m3u8
M3U8=$(grep m3u8 "${TMPFILE}" | cut -d '"' -f2)
rm "${TMPFILE}"

STREAMURL is the URL to the webpage hosting the streaming player.
--
Bo Berglund
Sweden

delphius

  • Jr. Member
  • **
  • Posts: 83
Re: Tool to extract m3u8 stream from embedded video player?
« Reply #12 on: August 30, 2023, 06:35:24 am »
Quote from: Fibonacci
I checked this webui lib you linked to because I'm interested in alternatives to LCL. It looks nice; I was even able to statically compile the lib into my app, so no DLL file is needed. Single exe file, 261 KB.
It's great that you managed to try the library and link it statically. Could you write short instructions on how to do this for other users?

Quote from: Fibonacci
But one thing: with webui it's not possible to browse other webpages. It creates a socket connection between the browser and your app, and as soon as the connection is closed webui no longer works. The socket connection management is JavaScript appended to your HTML. So if, for example, you execute the JS "document.location.href = 'some domain...'", the webui socket script is no longer running, the connection breaks and your app closes - webui_wait() thinks the browser window was closed.
The library itself is not intended for testing web pages; its purpose is precisely to replace the application interface with the browser. But I use some tricks (at least they worked in the old version of the library) that let me work around this problem: something like this, which autoloads the webui JS code, or you can manipulate the DOM manually, including webui.js in each page you load.

And it was my suggestion to use the library in headless mode, because I'm not so interested in using it as an interface for applications, but I really need a full-fledged replacement for the heavy webdriver.

Quote from: BosseB
Here is a snippet of the changed code, where I use wget instead
At first glance, the presented commands are not difficult to reproduce in Pascal, even without third-party modules, since the entire page is simply downloaded and the necessary link is found in the text, which the server delivers statically when the page is generated. Difficulties begin when the link is generated on the fly while the page is running in the browser; that requires the methods discussed in this topic.
« Last Edit: August 30, 2023, 07:29:41 am by delphius »

BosseB

  • Sr. Member
  • ****
  • Posts: 484
Re: Tool to extract m3u8 stream from embedded video player?
« Reply #13 on: August 30, 2023, 08:03:10 am »
Quote from: delphius
Quote from: BosseB
Here is a snippet of the changed code, where I use wget instead
At first glance, the presented commands are not difficult to reproduce in Pascal, even without third-party modules, since the entire page is simply downloaded and the necessary link is found in the text, which the server delivers statically when the page is generated. Difficulties begin when the link is generated on the fly while the page is running in the browser; that requires the methods discussed in this topic.

I thought that these pages too were doing some cloaking like the few I still have to handle manually, but that was not the case - just a curl problem.

But there is another problem this script does not handle:
Pages with multiple video players...
It is not very common, but the page I used back in 2018 suddenly added a second player on the same page, and I guess this would need to be handled. That page is long gone, though, so currently there is no problem like that.
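
One way to handle a page with several players is to list the distinct m3u8 URLs in page order and pick one by position. A minimal sketch, assuming each player's URL appears literally in the page source. `pick_m3u8` is a hypothetical helper name, and the position has to be chosen by inspecting the page.

```shell
#!/usr/bin/env bash
# Sketch only: choose one stream when a page embeds several players.
# pick_m3u8 is a hypothetical helper (assumption). It reads HTML on stdin
# and prints the Nth distinct m3u8 URL, counting from 1 (default: first).
pick_m3u8() {
  local n=${1:-1}
  grep -oE "https?://[^\"'<>[:space:]]+\.m3u8[^\"'<>[:space:]]*" \
    | awk '!seen[$0]++' \
    | sed -n "${n}p"   # prints nothing if fewer than N URLs were found
}
```

With the wget snippet from earlier in the thread this could be used as `M3U8=$(pick_m3u8 2 < "${TMPFILE}")`. An empty result means the page had fewer streams than expected.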
--
Bo Berglund
Sweden

delphius

  • Jr. Member
  • **
  • Posts: 83
Re: Tool to extract m3u8 stream from embedded video player?
« Reply #14 on: August 30, 2023, 08:20:46 am »
Quote from: BosseB
But there is another problem this script does not handle:
Pages with multiple video players...
It is not very common, but the page I used back in 2018 suddenly added a second player on the same page, and I guess this would need to be handled. That page is long gone, though, so currently there is no problem like that.
Theoretically, there should be one video source, and there may be several players, but they all use the same link to the stream, or is it not so?

 
