Author Topic: Is this a case for a RegEx?  (Read 812 times)

guest58172

  • Guest
Re: Is this a case for a RegEx?
« Reply #15 on: February 17, 2020, 12:56:48 am »
Given your technical means, I would say that the correct way to do this, IMHO, is to process the DOM and build the list from that.

Also, whoever provides the medication data should offer a REST API, e.g. one that returns the contents of the page as JSON, so that the individual elements can be retrieved accurately. Using the DOM is a similar idea but still fragile: if the formatting changes (as you said, this happens 2 or 3 times per year), the results might become wrong. That said, with some unit tests run against a particular page you can detect a change, e.g. when reading the DOM no longer gives the expected result.
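A format check like that could look roughly like the sketch below. It is only an illustration under assumptions: the helper CountTags, the Reference page and the expected counts are made up here, and fcl-xml's ReadHTMLFile is just one possible way to obtain the DOM.

```pascal
program formatcanary;
{$mode objfpc}{$H+}

uses
  Classes, SysUtils, DOM, DOM_HTML, SAX_HTML;

// Hypothetical helper: parses an HTML string and counts the elements
// that carry the given tag name.
function CountTags(const Html, Tag: String): Integer;
var
  Doc: THTMLDocument;
  Src: TStringStream;
  Nodes: TDOMNodeList;
begin
  Src := TStringStream.Create(Html);
  try
    ReadHTMLFile(Doc, Src);
  finally
    Src.Free;
  end;
  try
    // The node list returned by fcl-xml must be freed by the caller.
    Nodes := Doc.GetElementsByTagName(Tag);
    try
      Result := Nodes.Count;
    finally
      Nodes.Free;
    end;
  finally
    Doc.Free;
  end;
end;

const
  // Hypothetical saved reference page, known to list exactly two
  // medications (assumption: name in <b>, dosage in <li>).
  Reference =
    '<table><tbody>' +
    '<tr><td><b>ciprofloxacine</b></td>' +
    '<td><ul><li>2 dd 1 tablet van 500mg</li></ul></td></tr>' +
    '<tr><td><b>koelzalf</b></td>' +
    '<td><ul><li>zo nodig</li></ul></td></tr>' +
    '</tbody></table>';

begin
  // If the layout changes, the counts stop matching and the check fails.
  if (CountTags(Reference, 'b') <> 2) or (CountTags(Reference, 'li') <> 2) then
  begin
    WriteLn('Page format changed: expected 2 names and 2 dosages');
    Halt(1);
  end;
  WriteLn('Format check passed');
end.
```

In practice the reference would be a freshly downloaded page for a patient whose medication list is known, so a non-zero exit code signals that the parser needs attention before its output is trusted.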

For obtaining the DOM there are several libraries: https://forum.lazarus.freepascal.org/index.php?topic=47423.0

BTW, I was thinking... you said the format of the page can change, but what about the medications themselves, are they set in stone? (i.e. same format but different content, for example if at some point they switch from one brand to another, assuming both supply similar products)

zamronypj

  • Jr. Member
  • **
  • Posts: 70
    • Fano Framework, Free Pascal web application framework
Re: Is this a case for a RegEx?
« Reply #16 on: February 17, 2020, 01:33:13 am »
Instead of working with the inconsistent plain-text format, if you can get the HTML source of the document, it would be easier to extract the data with a regex, especially if the HTML tags are consistent every time the page is loaded.
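As a rough sketch of that idea, using the TRegExpr class from the RegExpr unit that ships with Free Pascal: the HTML fragment and the pattern below are assumptions for illustration only (name inside b tags, dosage inside the next font tag), not the real page.

```pascal
program medsregex;
{$mode objfpc}{$H+}

uses
  RegExpr;

const
  // Hypothetical fragment; the real page may be laid out differently.
  Html =
    '<td><b>lactulose</b></td><td><ul><li><font color="black">' +
    '1 dd 15 milliliter van 670mg/ml</font></li></ul></td>';

var
  Re: TRegExpr;
begin
  Re := TRegExpr.Create;
  try
    // Group 1: text inside <b>; group 2: text inside the following <font>.
    // The non-greedy .*? keeps each match within one name/dosage pair.
    Re.Expression := '<b>([^<]+)</b>.*?<font[^>]*>([^<]+)</font>';
    if Re.Exec(Html) then
      repeat
        WriteLn(Re.Match[1], ' | ', Re.Match[2]);
      until not Re.ExecNext;
  finally
    Re.Free;
  end;
end.
```

This only holds as long as the tags really are identical on every load; an extra attribute or a reordered cell silently changes what the groups capture.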
Fano Framework, Free Pascal web application framework https://fanoframework.github.io
Personal Projects https://v3.juhara.com
Github https://github.com/zamronypj

Bart

  • Hero Member
  • *****
  • Posts: 3725
    • Bart en Mariska's Webstek
Re: Is this a case for a RegEx?
« Reply #17 on: February 17, 2020, 08:45:56 pm »
The HTML is ugly and looks like this:
Code: Text
<table cellspacing="0" cellpadding="0"><tbody><tr valign="top"><td><span onclick="$('#sub_0').toggle();swapImage('Img_0');"><img id="Img_0" src="https://www.farmedvisie.nl/tmz/images/plus.gif"></span></td><td><table cellspacing="0" cellpadding="0"><tbody><tr valign="top"><td style="width:14em;"><b>calciumcarbonaat/colecalciferol</b></td><td><ul style="margin:0;"><li><font color="black">1 dd 1 tablet van 1,25g/800ie</font></li></ul></td></tr></tbody></table></td>
  </tr>
  <tr>
    <td colspan="2">
    </td>
  </tr>
<tr height="6"><td></td></tr>  <tr valign="top">
    <td><span onclick="$('#sub_1').toggle();swapImage('Img_1');"><img id="Img_1" src="https://www.farmedvisie.nl/tmz/images/plus.gif"></span></td><td><table cellspacing="0" cellpadding="0"><tbody><tr valign="top"><td style="width:14em;"><b>ciprofloxacine</b></td><td><ul style="margin:0;"><li><font color="black">2 dd 1 tablet van 500mg  (= 500 mg)</font></li></ul></td></tr></tbody></table></td>
  </tr>
  <tr>
    <td colspan="2">
    </td>
  </tr>
<tr height="6"><td></td></tr>  <tr valign="top">
    <td><span onclick="$('#sub_2').toggle();swapImage('Img_2');"><img id="Img_2" src="https://www.farmedvisie.nl/tmz/images/plus.gif"></span></td><td><table cellspacing="0" cellpadding="0"><tbody><tr valign="top"><td style="width:14em;"><b>koelzalf</b></td><td><ul style="margin:0;"><li><font color="black">zo nodig</font></li></ul></td></tr></tbody></table></td>
  </tr>
  <tr>
    <td colspan="2">
    </td>
  </tr>
<tr height="6"><td></td></tr>  <tr valign="top">
    <td><span onclick="$('#sub_3').toggle();swapImage('Img_3');"><img id="Img_3" src="https://www.farmedvisie.nl/tmz/images/plus.gif"></span></td><td><table cellspacing="0" cellpadding="0"><tbody><tr valign="top"><td style="width:14em;"><b>lactulose</b></td><td><ul style="margin:0;"><li><font color="black">1 dd 15 milliliter van 670mg/ml</font></li></ul></td></tr></tbody></table></td></tr></tbody></table>

Bart

Thaddy

  • Hero Member
  • *****
  • Posts: 9809
Re: Is this a case for a RegEx?
« Reply #18 on: February 17, 2020, 09:20:07 pm »
Can you try and feed it to this (official) example?
Code: Pascal
program testhtml;
{
  simple demo to demonstrate rewriting an HTML file
}
uses sysutils, dom_html, sax_html, XMLWrite;

Var
  H : THTMLDocument;

begin
  if ParamCount<>2 then
    begin
    Writeln('Usage: ',ExtractFileName(Paramstr(0)),' inputfile outputfile');
    Halt(1);
    end;
  // Parse the HTML input file into a DOM tree
  ReadHTMLFile(H,ParamStr(1));
  // Write the tree back out as XML
  WriteXMLFile(H,Paramstr(2));
  // Release the document
  H.Free;
end.
This is from fcl-xml.
If you can get a nice DOM tree, you can use events to filter it.
(Btw, as a word of caution: I would not use regex in a medical application. Regular means regular; it does not cater for any precision, although it often looks like it does...)
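For what it's worth, here is a minimal sketch of pulling name/dosage pairs out of such a tree. It walks the finished DOM instead of filtering SAX events, and it runs on a trimmed stand-in for the markup Bart posted, so the Sample constant, the NextElement helper and the assumed layout (name in b, dosage in the li of the next cell) are illustrative guesses, not the real page.

```pascal
program listmeds;
{$mode objfpc}{$H+}

uses
  Classes, SysUtils, DOM, DOM_HTML, SAX_HTML;

const
  // Trimmed, hypothetical stand-in for the posted markup.
  Sample =
    '<table><tbody>' +
    '<tr><td><b>ciprofloxacine</b></td>' +
    '<td><ul><li><font color="black">2 dd 1 tablet van 500mg</font></li></ul></td></tr>' +
    '<tr><td><b>koelzalf</b></td>' +
    '<td><ul><li><font color="black">zo nodig</font></li></ul></td></tr>' +
    '</tbody></table>';

// Hypothetical helper: next sibling that is an element, skipping any
// whitespace text nodes in between.
function NextElement(Node: TDOMNode): TDOMNode;
begin
  Result := Node.NextSibling;
  while (Result <> nil) and (Result.NodeType <> ELEMENT_NODE) do
    Result := Result.NextSibling;
end;

var
  Doc: THTMLDocument;
  Src: TStringStream;
  Names, Doses: TDOMNodeList;
  DoseCell: TDOMNode;
  i: Integer;
begin
  Src := TStringStream.Create(Sample);
  try
    ReadHTMLFile(Doc, Src);
  finally
    Src.Free;
  end;
  try
    Names := Doc.GetElementsByTagName('b');
    try
      for i := 0 to Integer(Names.Count) - 1 do
      begin
        // <b> sits in a <td>; the dosage list is expected in the next <td>.
        DoseCell := NextElement(Names[i].ParentNode);
        if DoseCell = nil then
          Continue;
        Doses := TDOMElement(DoseCell).GetElementsByTagName('li');
        try
          if Doses.Count > 0 then
            WriteLn(Names[i].TextContent, ' | ', Doses[0].TextContent);
        finally
          Doses.Free;
        end;
      end;
    finally
      Names.Free;
    end;
  finally
    Doc.Free;
  end;
end.
```

Starting from the b elements rather than from the tr rows avoids counting an entry twice when tables are nested, as they are in the posted page. A name without a matching dosage cell is skipped here; in a medical setting it would probably be safer to report it as an error.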

« Last Edit: February 17, 2020, 09:21:46 pm by Thaddy »
I am more like donkey than shrek

asdf1337

  • New Member
  • *
  • Posts: 21
Re: Is this a case for a RegEx?
« Reply #19 on: February 17, 2020, 09:25:48 pm »
What about Xidel, or using Internet Tools directly, if you want to parse the HTML?

Thaddy

  • Hero Member
  • *****
  • Posts: 9809
Re: Is this a case for a RegEx?
« Reply #20 on: February 17, 2020, 09:34:52 pm »
asdf1337 wrote: "What about Xidel or using Internet Tools directly if you want to parse the HTML."
internettools is GPL'd.
And fcl-xml is a standard package that ships with the distribution, which also means it is high quality and you don't need to download it separately. That said, internettools is also high quality.