Author Topic: How to acces html elements on web page from stand alone application?  (Read 10855 times)

Ronan

  • Full Member
  • ***
  • Posts: 132
Hi,

I need to access the HTML elements of a web page and fill them in with the appropriate user data, all from the stand-alone application side, so that the only thing left for the end user is to enter the captcha and press the submit button. Is it possible to achieve that, preferably with Synapse?

sysrpl

  • Sr. Member
  • ****
  • Posts: 315
    • Get Lazarus
Re: How to acces html elements on web page from stand alone application?
« Reply #1 on: February 27, 2016, 03:27:45 am »
Depending on the page complexity, I would recommend using an http implementation (synapse) last. The reason is that often form state may depend on authorization, cookies, https, hidden form variables, json submission and other variable methods of posting data.

If you're on Windows I'd say use TWebBrowser. You can access the full document with late binding through variants. The WebBrowser.Document property provides this type of access.

The next most powerful method to automate web form submission is to install tampermonkey. This allows you to add scripts to any page. You can run a separate server to hand your tampermonkey scripts data. For example your script can open an XmlHttpRequest to ask for information which can be used in form fields. Although this sounds complex, it's actually a lot easier in many situations since you don't have to worry about the various transport methods attached to submit buttons.

Finally, yes, you can use an HTTP POST implementation with something like Synapse. If you're working with a pure form submission with no cookies, you can just pass an HTTP string to a socket send. It will look something like this (you need to URL-escape the POST parameters):

POST /url/to/post/action.whatever HTTP/1.1
Host: www.domain.com
Connection: Close
Content-Type: application/x-www-form-urlencoded
Content-Length: 43
 
first_name=John&last_name=Doe&action=Submit
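
As a rough sketch of the same request sent through Synapse instead of a hand-written socket string: HttpPostURL (unit httpsend) sets the form content type and the Content-Length for you, and EncodeURLElement (unit synacode) does the URL escaping. The host, path and field values are just the placeholders from the request above, and cookies, https and hidden form fields are not handled here.

```pascal
program SynapsePostSketch;
{$mode objfpc}{$H+}
uses
  Classes, httpsend, synacode;

var
  Body: string;
  Response: TStringStream;
begin
  // Build the application/x-www-form-urlencoded body; every value is escaped.
  Body := 'first_name=' + EncodeURLElement('John') +
          '&last_name=' + EncodeURLElement('Doe') +
          '&action=' + EncodeURLElement('Submit');

  Response := TStringStream.Create('');
  try
    // Placeholder URL taken from the example request above.
    if HttpPostURL('http://www.domain.com/url/to/post/action.whatever', Body, Response) then
      WriteLn(Response.DataString)   // page returned by the server
    else
      WriteLn('POST failed');
  finally
    Response.Free;
  end;
end.
```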

Ronan

  • Full Member
  • ***
  • Posts: 132
Re: How to acces html elements on web page from stand alone application?
« Reply #2 on: February 27, 2016, 08:34:52 am »
Thanks in advance,
Quote
Depending on the page complexity, I would recommend using an http implementation (synapse) last. The reason is that often form state may depend on authorization, cookies, https, hidden form variables, json submission and other variable methods of posting data.

The web site was created with Joomla, and I'm trying to target one of the pages Joomla generated.

TWebBrowser seems to be the best option. I googled the component, but the results didn't say whether it is a visual component, and I can't find it on my component palette. For backward compatibility I use an old version of Lazarus, 0.9.30.2; is that probably the reason?

The Synapse option seems to be second best, but it has another difficulty: I need to familiarize myself with how to fill in the HTTP POST content for the targeted HTML elements. For example, this is exactly the element that I want to fill automatically with data from the end user and then submit:

<textarea id="comments-form-comment" name="comment" cols="65" rows="8" tabindex="5"></textarea>

Any simple code snippet will be appreciated.

All of this hassle is to create a software registration mechanism: the stand-alone application collects data from the end user's machine and sends it to the web site so that I can process it and send the registration info back to the user. If this is a very awkward way of handling software registration, or if there is a much easier way, I'm open to any ideas.

Your enlightenment will be appreciated,

BeniBela

  • Hero Member
  • *****
  • Posts: 908
    • homepage
Re: How to acces html elements on web page from stand alone application?
« Reply #3 on: February 27, 2016, 02:40:07 pm »
Quote from: Ronan link=topic=31710.msg203486#msg203486
The Synapse option seems to be second best, but it has another difficulty: I need to familiarize myself with how to fill in the HTTP POST content for the targeted HTML elements. For example, this is exactly the element that I want to fill automatically with data from the end user and then submit:

<textarea id="comments-form-comment" name="comment" cols="65" rows="8" tabindex="5"></textarea>

Any simple code snippet will be appreciated.

My Internet Tools send the http request, if  you give them the webpage address and the form element names:

Code: [Select]
uses xquery, synapseinternetaccess;
xqvalue('http://your-site').retrieve().map('form(//form,{"comment": $_1})', your-data).retrieve()

Everything except javascript is handled automatically

Quote from: Ronan link=topic=31710.msg203486#msg203486
All of this hassle is to create a software registration mechanism: the stand-alone application collects data from the end user's machine and sends it to the web site so that I can process it and send the registration info back to the user. If this is a very awkward way of handling software registration, or if there is a much easier way, I'm open to any ideas.

Although, if you control the server, it might be more efficient to do it without HTML.

Ronan

  • Full Member
  • ***
  • Posts: 132
Re: How to acces html elements on web page from stand alone application?
« Reply #4 on: February 27, 2016, 04:15:25 pm »
Thanks for commenting,

Quote
My Internet Tools send the http request, if  you give them the webpage address and the form element names:

Code: [Select]
uses xquery, synapseinternetaccess;
xqvalue('http://your-site').retrieve().map('form(//form,{"comment": $_1})', your-data).retrieve()

Everything except javascript is handled automatically

What I don't understand in this scenario is this: I have a web page that I'm trying to connect to (it has a bunch of HTML elements in it), and I'm trying to insert some data into one specific HTML element (<textarea id="comments-form-comment" name="comment" cols="65" rows="8" tabindex="5">) that receives user input, and then submit it. How does this line of code come to understand all of this? I was expecting at least getElementById to be invoked on the targeted page, traversing it to find the specific HTML element and, if it is found, modifying and submitting it.

Regards,

BeniBela

  • Hero Member
  • *****
  • Posts: 908
    • homepage
Re: How to acces html elements on web page from stand alone application?
« Reply #5 on: February 27, 2016, 04:31:19 pm »
Quote from: Ronan
I have a web page that I'm trying to connect to (it has a bunch of HTML elements in it), and I'm trying to insert some data into one specific HTML element (<textarea id="comments-form-comment" name="comment" cols="65" rows="8" tabindex="5">) that receives user input, and then submit it. How does this line of code come to understand all of this?

If we do it step-by-step:

Create a wrapped string with your address, but do not do anything with it:

Code: [Select]
xqvalue('http://your-site')

Download the webpage and parse it:

Code: [Select]
xqvalue('http://your-site').retrieve()

Now it knows all the HTML elements.

Find all the <form>-elements among them: (if there are more than one, you might want to use (//form)[1] or (//form)[2] to choose the i-th instead)

Code: [Select]
xqvalue('http://your-site').retrieve().map('//form')

Prepare a HTTP-request that corresponds to the user submitting the form without entering anything by calling a function called form() that returns such a request:

Code: [Select]
xqvalue('http://your-site').retrieve().map('form(//form)')

Override the comment parameter in that request with 123 (it can find that element from the name 'comment'):

Code: [Select]
xqvalue('http://your-site').retrieve().map('form(//form, {"comment": "123" })')

Use your data instead of 123:

Code: [Select]
xqvalue('http://your-site').retrieve().map('form(//form, {"comment": $_1 })', [your-data])

Send the prepared request to the server (also downloads and parses the returned page, but that can be ignored).

Code: [Select]
xqvalue('http://your-site').retrieve().map('form(//form,{"comment": $_1})', [your-data]).retrieve()

Ronan

  • Full Member
  • ***
  • Posts: 132
Re: How to acces html elements on web page from stand alone application?
« Reply #6 on: February 27, 2016, 05:48:27 pm »
Thank you very much for the live debug. I took a look at your project on GitHub; it seems to be doing a great deal of work at the back end (parsing, concatenating) that obviously requires deliberate study from a novice, yet you have managed to boil it down to one line. I'll study it, but that will definitely take some time.

Instead, is it possible to achieve this in plain Pascal (default LCL) or with some other component?

Leledumbo

  • Hero Member
  • *****
  • Posts: 8757
  • Programming + Glam Metal + Tae Kwon Do = Me
Re: How to acces html elements on web page from stand alone application?
« Reply #7 on: February 27, 2016, 10:47:42 pm »
fcl-web and fcl-xml have everything you need. Grab the webpage with TFPHTTPClient.SimpleGet (or construct the object and call Get after configuring some properties if you need more tuning for the request). Then use DOM, SAX or XPath to process the elements, depending on the style you prefer.
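
A minimal sketch of that route, assuming an FPC release whose fcl-web ships the fphttpclient unit (the FPC bundled with Lazarus 0.9.30.2 may be too old for it). The two URLs are placeholders, and "comment" is the name attribute of the textarea quoted earlier in the thread. Hidden form fields, cookies and the captcha are not handled; the fetched page would still have to be parsed with fcl-xml to pick those up.

```pascal
program FclWebFormSketch;
{$mode objfpc}{$H+}
uses
  Classes, SysUtils, fphttpclient;

var
  Html, Reply: string;
  Fields: TStringList;
begin
  // Fetch the page that contains the form (placeholder URL).
  Html := TFPHTTPClient.SimpleGet('http://your-site/page-with-form');
  WriteLn('Fetched ', Length(Html), ' bytes');

  Fields := TStringList.Create;
  try
    // Name=Value pairs; SimpleFormPost URL-encodes them before sending.
    Fields.Values['comment'] := 'data collected from the end user';
    // Placeholder for the URL found in the form's action attribute.
    Reply := TFPHTTPClient.SimpleFormPost('http://your-site/form-action', Fields);
    WriteLn(Reply);
  finally
    Fields.Free;
  end;
end.
```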

CC

  • Full Member
  • ***
  • Posts: 149
Re: How to acces html elements on web page from stand alone application?
« Reply #8 on: July 26, 2016, 10:43:41 am »
BeniBela,

your project  is really promising. 8-)

I have tried to grab a login page.

defaultInternet.get(URL) returns the page correctly,
but xqvalue(URL).retrieve.toString returns something which appears to be only a part of the page.

I hoped it would be possible to start with defaultInternet.get(URL), get to the point that xqvalue(URL).retrieve reaches, and go on with the mapping from there. But I could not find a way to initialize an IXQValue variable with the result of defaultInternet.get(URL).

I am also looking for an example of how to handle session cookies with internet-tools. (The initial request sets a cookie which needs to be sent with the login form and other requests.)

« Last Edit: July 26, 2016, 12:42:54 pm by CC »

BeniBela

  • Hero Member
  • *****
  • Posts: 908
    • homepage
Re: How to acces html elements on web page from stand alone application?
« Reply #9 on: July 26, 2016, 02:52:55 pm »
Quote from: CC
I have tried to grab a login page.

defaultInternet.get(URL) returns the page correctly,
but xqvalue(URL).retrieve.toString returns something which appears to be only a part of the page.

Retrieve runs it through an HTML parser, and toString removes all HTML tags, so just the text remains.

xqvalue(URL).retrieve.toNode.outerHTML would convert it to HTML again, but you usually do not need it that way.

Quote from: CC
I am also looking for an example of how to handle session cookies with internet-tools. (The initial request sets a cookie which needs to be sent with the login form and other requests.)

Cookies are handled automatically

CC

  • Full Member
  • ***
  • Posts: 149
Re: How to acces html elements on web page from stand alone application?
« Reply #10 on: July 26, 2016, 04:44:50 pm »
BeniBela,

Thanks for the quick answer. :)

Page2:= Page.map('form("tform", {"logonIdX": $_1, "password": $_2})', [User, Psw]).retrieve();

runs without error now, but Page2.toNode.outerHTML is empty.

If there is documentation that answers my questions and that I am not aware of, please point me in the right direction.

1. Can persisted cookies be added to the initial request next time?
2. It would be nice to have some sample code covering all possible levels of error handling (mapping errors, ...).


BeniBela

  • Hero Member
  • *****
  • Posts: 908
    • homepage
Re: How to acces html elements on web page from stand alone application?
« Reply #11 on: July 27, 2016, 06:30:10 pm »

Quote from: CC
Page2:= Page.map('form("tform", {"logonIdX": $_1, "password": $_2})', [User, Psw]).retrieve();

runs without error now, but Page2.toNode.outerHTML is empty.

If there is documentation that answers my questions and that I am not aware of, please point me in the right direction.

You cannot pass "tform" to form. It is a string, not a form, what should it do?

See http://www.benibela.de/documentation/internettools/xquery.TXQueryEngine.html. The form function needs a parameter "as node()", which is a sequence type according to https://www.w3.org/TR/xquery-30/#id-sequencetype-syntax


Quote from: CC
1. Can persisted cookies be added to the initial request next time?

Persistent across program restarts?

The field is not public: https://github.com/benibela/internettools/issues/7

Quote from: CC
2. It would be nice to have some sample code covering all possible levels of error handling (mapping errors, ...).


Error handling?

The exceptions, or the HTTP error codes?

There are three base exceptions: EInternetException, ETreeParseException and EXQException, for HTTP, parsing and evaluation errors respectively.


CC

  • Full Member
  • ***
  • Posts: 149
Re: How to acces html elements on web page from stand alone application?
« Reply #12 on: July 28, 2016, 02:17:47 pm »
BeniBela,

I have checked the docs you had suggested, but it is not clear yet.

This is my first attempt to use internet-tools  with a login form, probably best to paste the relevant html elements here:

Form:
<form id="loginForm" name="tform" action="/login/UI/Login" method="post"><div style="width:0px;height:0px;position:absolute;left:-100px;top:-100px;overflow:hidden"><input type="hidden" name="loginForm_hf_0" id="loginForm_hf_0"></div>

User:
 <div class="fInput">
  <input name="logonIdX" id="tf-logonIdX" type="text" class="textInp loginid error" min="0" step="any" autocomplete="off" tabindex="1" value="">                  

Password:
<div class="fInput">
<input name="logonId" id="tf-logonId" type="hidden" value="">     [[One could think that this should be the input for the user, but the browser debugger  shows the previous one ]]
<input name="password" id="tf-password1" type="password" class="textInp password error" autocomplete="off" tabindex="2" value="">   

These are other ways I have tried but the login is still unsuccessful:

Page2:= Page.map('form(id("loginForm"), {id("tf-logonId"): $_1, id("tf-password1"): $_2})', ...
Page2:= Page.map('form(id("loginForm"), {id("tf-logonIdX"): $_1, id("tf-password1"): $_2})', ...


About error handling:

I need to make sure that the mapping finds all elements and is able to assign a value to them before posting the form.
There are 2 ways it could work:
  1. validating all the element references before mapping
  2. handling the errors returned by the mapping (I tried a deliberately wrong form id, but there was no sign of an error.)


A skeleton of error handling code would be nice, to see how to write solid code with internet-tools.
« Last Edit: July 28, 2016, 03:10:00 pm by CC »

BeniBela

  • Hero Member
  • *****
  • Posts: 908
    • homepage
Re: How to acces html elements on web page from stand alone application?
« Reply #13 on: July 28, 2016, 03:35:31 pm »
Quote from: CC
Page2:= Page.map('form(id("loginForm"), {id("tf-logonIdX"): $_1, id("tf-password1"): $_2})', ...

Well, the parameters have to be the opposite, the name rather than the node/id.

So "logonIdX" or id("tf-logonIdX")/@name

Quote from: CC
I need to make sure that the mapping finds all elements and is able to assign a value to them before posting the form.
There are 2 ways it could work:
  1. validating all the element references before mapping
  2. handling the errors returned by the mapping (I tried a deliberately wrong form id, but there was no sign of an error.)

Everything always returns a sequence

When it is a single form as in your case, it is a sequence of one element.
When it fails, it is just an empty sequence.

You could wrap everything in calls to exactly-one() to check that the sequence has the correct count
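
A rough sketch of how that could look, combined with the three base exceptions named earlier in the thread. The unit names in the uses clause (internetaccess for EInternetException, simplehtmltreeparser for ETreeParseException) are assumptions about where Internet Tools declares them, and the URL, user name and password are placeholders. The form id and field names come from the HTML posted above.

```pascal
program LoginCheckSketch;
{$mode objfpc}{$H+}
uses
  SysUtils, xquery, internetaccess, simplehtmltreeparser, synapseinternetaccess;

var
  Page, Page2: IXQValue;
begin
  try
    // Download and parse the login page (placeholder URL).
    Page := xqvalue('http://your-site/login').retrieve();
    // exactly-one() turns "no form or several forms found" into an
    // XQuery error instead of a silent empty sequence.
    Page2 := Page.map(
      'form(exactly-one(id("loginForm")), {"logonIdX": $_1, "password": $_2})',
      ['placeholder-user', 'placeholder-password']).retrieve();
    WriteLn(Page2.toNode.outerHTML);
  except
    on E: EInternetException do
      WriteLn('HTTP error: ', E.Message);
    on E: ETreeParseException do
      WriteLn('HTML parsing error: ', E.Message);
    on E: EXQException do
      WriteLn('Query evaluation error: ', E.Message);
  end;
end.
```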

CC

  • Full Member
  • ***
  • Posts: 149
Re: How to acces html elements on web page from stand alone application?
« Reply #14 on: July 29, 2016, 02:38:31 am »
BeniBela,


Where else can samples like the ones you have already provided be found? (I hate to take up your time with my questions if there are other resources to learn from.)

It turned out that I never actually submitted the form: the mapped fields and values became URL parameters of a GET request.
So the next task is to figure out how to "push" this button:

<button id="submitLogin" type="submit" class="btn mag" tabindex="3">Log in</button>

 
