Topic: Basic HTML Generation...

commodianus

  • New Member
  • *
  • Posts: 18
Basic HTML Generation...
« on: March 24, 2010, 05:16:19 am »
I'm trying to find a way to shorten the time it takes me to convert certain texts (which I currently do by hand) to another format for use in a popular software program.

Basically, I want to parse the text in one memo and output the generated code in another memo, so I can just copy and paste it into my document.

There are two basic types of parsing I need to do, and I really have no idea how to go about either.

The first thing I need generated is the HTML <p> and </p> tags. Example:

Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.

Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.

Would become:

<p>
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
</p>

<p>
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
</p>

The second type of generation I need is this:

Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris "nisi ut aliquip ex ea commodo consequat". Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.

would become:

Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris <i>"nisi ut aliquip ex ea commodo consequat"</i>. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.

Can someone help me out here? It would be greatly appreciated.

Just a note: I'm using the Lazarus IDE under Ubuntu. I have set up a two-panel form with two memos and a button that triggers the parsing.

eny

  • Hero Member
  • *****
  • Posts: 1634
Re: Basic HTML Generation...
« Reply #1 on: March 24, 2010, 06:32:29 am »
If the above is all you want: go through the lines of the input, insert a <p> as soon as you hit a non-empty line, copy all non-empty lines, and insert a </p> at the first empty line (or at the end of the text).
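A minimal sketch of that line-based state machine, written as a Free Pascal console program with TStringList standing in for the two memos. WrapParagraphs is a hypothetical name; because it takes TStrings, it should also accept Memo1.Lines and Memo2.Lines, although that has not been tried in a form here. Two assumptions go beyond the description: whitespace-only lines count as empty, and a run of several empty lines closes only one paragraph.

```pascal
program WrapParagraphsDemo;

{$mode objfpc}{$H+}

uses
  Classes, SysUtils;

{ Hypothetical helper: copies Source to Dest, wrapping each run of non-empty
  lines in <p> ... </p>. Whitespace-only lines count as empty, and several
  empty lines in a row close only one paragraph. }
procedure WrapParagraphs(Source, Dest: TStrings);
var
  i: Integer;
  InParagraph: Boolean;  // the only state: are we between <p> and </p>?
begin
  InParagraph := False;
  for i := 0 to Source.Count - 1 do  // indices run from 0 to Count - 1
  begin
    if Trim(Source[i]) <> '' then
    begin
      if not InParagraph then
      begin
        if Dest.Count > 0 then
          Dest.Add('');              // blank line between paragraphs
        Dest.Add('<p>');             // first non-empty line opens a paragraph
        InParagraph := True;
      end;
      Dest.Add(Source[i]);           // copy the text line unchanged
    end
    else if InParagraph then
    begin
      Dest.Add('</p>');              // first empty line closes the paragraph
      InParagraph := False;
    end;
  end;
  if InParagraph then
    Dest.Add('</p>');                // text ended inside a paragraph
end;

var
  Src, Dst: TStringList;
begin
  Src := TStringList.Create;
  Dst := TStringList.Create;
  try
    Src.Add('First paragraph.');
    Src.Add('');
    Src.Add('');
    Src.Add('Second paragraph.');
    WrapParagraphs(Src, Dst);
    Write(Dst.Text);
  finally
    Dst.Free;
    Src.Free;
  end;
end.
```

The demo prints both paragraphs in the same layout as the desired output in the first post, even though the input has two empty lines between them. In a button handler, the equivalent would be to call Memo2.Lines.Clear and then WrapParagraphs(Memo1.Lines, Memo2.Lines).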

Same for the '"': go through each line character by character. At the first '"', insert <i>, and at the second, insert </i>.

Understanding the concept of state machines can help.
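The character-by-character quote handling can be sketched the same way, with one Boolean as the state. This is again a console program, and ItalicizeQuotes is a hypothetical name. It keeps the quotation marks inside the tags, as in the requested output, and it assumes a quotation never spans two lines: the state starts fresh on every line, and a quote still open at the end of a line is closed there.

```pascal
program ItalicizeQuotesDemo;

{$mode objfpc}{$H+}

{ Hypothetical helper: an opening '"' becomes <i>" and the matching closing
  '"' becomes "</i>. A quote left open at the end of the line is closed. }
function ItalicizeQuotes(const Line: string): string;
var
  i: Integer;
  InQuote: Boolean;  // state: are we between an opening and a closing quote?
begin
  Result := '';
  InQuote := False;
  for i := 1 to Length(Line) do        // Pascal strings are 1-based
  begin
    if Line[i] = '"' then
    begin
      if InQuote then
        Result := Result + '"</i>'      // second quote: close the tag
      else
        Result := Result + '<i>"';      // first quote: open the tag
      InQuote := not InQuote;           // state transition
    end
    else
      Result := Result + Line[i];       // ordinary character: copy it
  end;
  if InQuote then
    Result := Result + '</i>';          // unbalanced quote: close it anyway
end;

begin
  WriteLn(ItalicizeQuotes('laboris "nisi ut aliquip". Duis'));
end.
```

The demo prints laboris <i>"nisi ut aliquip"</i>. Duis. In the form, each line would go through the function, for example Memo2.Lines.Add(ItalicizeQuotes(Memo1.Lines[i])) inside a loop from 0 to Memo1.Lines.Count - 1.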
All posts based on: Win10 (Win64); Lazarus 2.0.10 'stable' (x64) unless specified otherwise...

GordonShumway

  • Newbie
  • Posts: 6
Re: Basic HTML Generation...
« Reply #2 on: March 24, 2010, 06:40:16 am »
If your original (base) text will always be formatted the way you've shown, and you always want the <p> tag around each paragraph, you could load the text from a plain text file into a buffer and go through it looking for line feeds and the like, then simply insert the code you need. Watch out for line endings, though: Linux uses a line feed, Windows a carriage return plus line feed, and classic Mac OS a lone carriage return, so unless you know which one your source text uses, you will need to compensate. A quick example:

Code:
uses
  StrUtils;
procedure AddCode;
var
  aFile: TextFile;
  txtStr: String; // A simple holder.
  txtCoded: String; // Final with code added.
  sStart, sPos: Integer;
begin
  // Of course replace with the real thing.
  Assign(aFile, 'TheRealFile.txt');
  Reset(aFile);  // Open for reading.
  
  // Now we go through line by line.
  while not Eof(aFile) do
  begin
    ReadLn(aFile, txtStr); // We store each new line into the holder variable.
    sStart := 0;
    sPos := 0; // Beginning of string.
    while sStart <> Length(txtStr) do
    begin
      // We find the first instance of the quotation mark in the string and store it.
      sStart := PosEx('"', txtStr, sStart);
      // Next we copy the text before the quotation mark, then include your tag.
      txtCoded := Copy(txtStr, sPos, (sStart - 1)) + '<i>"';
      // Update our current string position to the character immediately AFTER the quotation mark.
      sPos := sStart + 1;
      // Find the next instance of the quotation mark.
      sStart := PosEx('"', txtStr, (sStart + 1));
      // Copy everything from right AFTER the opening quotation mark to right BEFORE the closing quotation mark.
      // then add our closing quotation mark with your tag.
      txtCoded := Copy(txtStr, sPos, (sStart - 1)) + ' " </i>';
      // Set the new start position to right AFTER the closing quotation mark.
      Inc(sStart);
    end;
    // Add the newly formated text to your Memo control.
    Memo1.Lines.Add(txtCoded);
    // Clear our variable.
    txtCoded := '';
  end;
end;

Notice in the above that we never copy the quotation marks from the source text; we always add our own!

This is just a simple example, and I haven't tested this version, but it should give you an idea of how to do what you're looking for. If you need memo-to-memo support, read from the first Memo line by line, do something similar to the above, and add the result to your second Memo. There are certainly faster and better methods, but something like this should work for what you're doing. If you need any more help, let me know.
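One way to compensate for the different line endings, as a sketch: convert every line break to a single line feed before looking for paragraphs. NormalizeLineEndings is a hypothetical helper; StringReplace and rfReplaceAll come from SysUtils. The order of the two replacements matters, because replacing lone #13 characters first would turn every #13#10 pair into two line breaks.

```pascal
program NormalizeLineEndingsDemo;

{$mode objfpc}{$H+}

uses
  SysUtils;

{ Hypothetical helper: turns Windows (#13#10) and classic Mac OS (#13) line
  breaks into Unix line feeds (#10). CR+LF pairs are replaced first; doing the
  lone #13 first would turn each pair into two line breaks. }
function NormalizeLineEndings(const S: string): string;
begin
  Result := StringReplace(S, #13#10, #10, [rfReplaceAll]);
  Result := StringReplace(Result, #13, #10, [rfReplaceAll]);
end;

begin
  // Mixed input: CR+LF after 'one', a lone CR after 'two', a lone LF after 'three'.
  Write(NormalizeLineEndings('one'#13#10'two'#13'three'#10));
end.
```

After this step, a blank line is always two consecutive #10 characters, whichever system the text came from.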

Hope it helps. :D
« Last Edit: March 24, 2010, 06:47:18 am by GordonShumway »

commodianus

  • New Member
  • *
  • Posts: 18
Re: Basic HTML Generation...
« Reply #3 on: March 24, 2010, 09:17:08 am »
OK, I think I've got something here, though I could use a hand figuring out why it's not looping through. I've been pulling my hair out; maybe some fresh eyes can help. For example, in a two-paragraph setup, the first paragraph is tagged properly but the second one gets no tags.

Code:
procedure TForm1.Button4Click(Sender: TObject);
var
  PastLine, LineCount, I: integer;
  Holder, Sniff: String;
  WasCls, WasOp, HadStart, Trig, FirstLine, LastLine, PrevLine, HadBreak: Boolean;
begin
  HadBreak := False;
  LineCount := 0;

  for I := 0 to Memo1.Lines.Count do
  begin
    Sniff := Copy(Memo1.Lines[I], 1, 1);
    LineCount := LineCount + 1;

    if LineCount = 1 then //First line, insert <p> #13#10
    begin
      Holder := Memo1.Lines[I];
      Memo1.Lines[I] := '<p>' + #13#10 + Holder;
      HadBreak := False;
      WasOp := True;
      WasCls := False;
    end
    else
    begin
      if LineCount = Memo1.Lines.Count then //Last Line, add </p> & Break
      begin
        Holder := Memo1.Lines[I];
        Memo1.Lines[I] := '</p>' + #13#10#13#10 + Holder;
        HadBreak := True;
        WasCls := True;
        WasOP := False;
      end else; //last line

      if Sniff = '' then //Line Break Found
      begin
        if WasOp = False then
        begin
          if HadBreak = False then //This is our first break. Start Paragraph.
          begin
            HadBreak := True;
            Holder := Memo1.Lines[I];
            Memo1.Lines[I] := '<p>' + Holder;
            WasOP := True;
            WasCls := False;
          end
          else // This is the end of a paragraph.
          begin
            Holder := Memo1.Lines[I];
            Memo1.Lines[I] := '</p>' + Holder;
            WasOp := False;
            WasCls := True;
          end;
        end; //</p>
      end;
    end;
  end;
end;

eny

  • Hero Member
  • *****
  • Posts: 1634
Re: Basic HTML Generation...
« Reply #4 on: March 24, 2010, 10:58:50 am »
No exceptions?  8-)

Make sure to count only up to Count - 1, since the line indices start at 0:
Code:
for I := 0 to Memo1.Lines.Count - 1 do

GordonShumway

  • Newbie
  • Posts: 6
Re: Basic HTML Generation...
« Reply #5 on: March 24, 2010, 04:04:48 pm »
I think we're making this more complicated than it has to be. The code below will do what you're looking for if the paragraphs are separated by a single blank line:

First paragraph.
                                      <- Blank line :D
Second paragraph.
                                      <- Blank line :D
Third paragraph.

Code:
procedure TForm1.btnTransferClick(Sender: TObject);
var
 I: Integer;
begin
  // We add the opening '<p>' tag.
  Memo2.Lines.Add('<p>');
  // Now we start going through the lines of your original text.
  for I := 0 to (Memo1.Lines.Count - 1) do
  begin
    // If the line is blank (e.g. from CR/LF) then we add a closing '</p>' and
    // start a new one.
    if Memo1.Lines[i] = '' then
      Memo2.Lines.Add('</p><p>')
    else
      Memo2.Lines.Add(Memo1.Lines[i]); // If the line wasn't blank copy it over.
  end;
  Memo2.Lines.Add('</p>'); // Put out final closing tag.
end;

Comes out to:

<p>
First paragraph.
</p><p>
Second paragraph.
</p><p>
Third paragraph.
</p>
« Last Edit: March 24, 2010, 04:25:46 pm by GordonShumway »

eny

  • Hero Member
  • *****
  • Posts: 1634
Re: Basic HTML Generation...
« Reply #6 on: March 24, 2010, 04:25:28 pm »
Interesting concept.
If you really want to make things simple  ;)
Code:
 Memo2.lines.Text  := format('<p>%s</p>', [
                         StringReplace(memo1.Lines.Text,
                                       #13#10#13#10,
                                       #13#10'</p><p>'#13#10,
                                       [rfReplaceAll]) ] );

<<edit>>
And my 100th post  8-)
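One caveat for the Ubuntu setup mentioned in the first post: on Linux, TStrings.Text normally joins lines with LineEnding, which is #10 there, so the hard-coded #13#10#13#10 pattern may never match and the whole text would end up in a single <p>. A sketch of the same one-liner built on LineEnding instead; WrapParagraphsText is a hypothetical name, and like the original it still assumes exactly one blank line between paragraphs.

```pascal
program WrapParagraphsTextDemo;

{$mode objfpc}{$H+}

uses
  SysUtils;

{ Hypothetical variant of the Format/StringReplace one-liner. LineEnding is the
  platform's line break (#10 on Linux, #13#10 on Windows); this assumes Text was
  built with LineEnding as well, as TStrings.Text normally is. }
function WrapParagraphsText(const Text: string): string;
const
  BlankLine = LineEnding + LineEnding;  // end of a line followed by an empty line
begin
  Result := Format('<p>%s</p>', [
              StringReplace(Text, BlankLine,
                            LineEnding + '</p><p>' + LineEnding,
                            [rfReplaceAll])]);
end;

begin
  Write(WrapParagraphsText('First paragraph.' + LineEnding + LineEnding +
                           'Second paragraph.' + LineEnding));
  WriteLn;
end.
```

In the form this would be used as Memo2.Lines.Text := WrapParagraphsText(Memo1.Lines.Text); the leading <p> and the final </p> come from the Format call.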

GordonShumway

  • Newbie
  • Posts: 6
Re: Basic HTML Generation...
« Reply #7 on: March 24, 2010, 04:27:57 pm »
LOL Yep! That would do it; it's just hard on the eyes  %)

Edit: Congrats!  ;D

commodianus

  • New Member
  • *
  • Posts: 18
Re: Basic HTML Generation...
« Reply #8 on: March 24, 2010, 07:16:46 pm »
Oh wow, yeah, that's much simpler, eny. I'll try implementing that. Thanks to everyone who's helped.

commodianus

  • New Member
  • *
  • Posts: 18
Re: Basic HTML Generation...
« Reply #9 on: March 24, 2010, 07:17:55 pm »
The only thing I can think of is, how will this address putting the first <p> in before the first paragraph? Testing...

commodianus

  • New Member
  • *
  • Posts: 18
Re: Basic HTML Generation...
« Reply #10 on: March 24, 2010, 07:46:37 pm »
This is what I'm going with as far as <p> </p> formatting goes. It works the way I want. Now I'm trying to add the quotation-finding code, and I should be all set, barring any problems. Thanks, folks!

Code:
procedure TForm1.Button4Click(Sender: TObject);
var
 I: Integer;
begin
  // We add the opening '<p>' tag.
  Memo2.Lines.Add('<p>');
  // Now we start going through the lines of your original text.
  for I := 0 to (Memo1.Lines.Count - 1) do
  begin
    // If the line is blank (e.g. from CR/LF) then we add a closing '</p>' and
    // start a new one.
    if Memo1.Lines[i] = '' then
      Memo2.Lines.Add('</p>' + #13#10#13#10 + '<p>')
    else
      Memo2.Lines.Add(Memo1.Lines[i]); // If the line wasn't blank copy it over.
  end;
  Memo2.Lines.Add('</p>'); // Put out final closing tag.
end;

eny

  • Hero Member
  • *****
  • Posts: 1634
Re: Basic HTML Generation...
« Reply #11 on: March 25, 2010, 06:32:05 am »
Quote from commodianus: "The only thing I can think of is, how will this address putting the first <p> in before the first paragraph? Testing..."

Hm, I wonder what this piece of code does ...
Code:
... := format('<p>%s</p>', [ ...

GordonShumway

  • Newbie
  • Posts: 6
Re: Basic HTML Generation...
« Reply #12 on: March 25, 2010, 06:35:11 am »
You're welcome, commodianus! Good luck!

commodianus

  • New Member
  • *
  • Posts: 18
Re: Basic HTML Generation...
« Reply #13 on: March 25, 2010, 08:35:49 pm »
Gordon,

I've modified your procedure to read from a memo instead of a file. As far as I can tell, nothing I've changed should make it freeze, but it freezes anyway, and I can't track down why.

Code:
procedure TForm1.Button1Click(Sender: TObject);
var
  aFile: TextFile;
  txtStr: String; // A simple holder.
  txtCoded: String; // Final with code added.
  I, sStart, sPos: Integer;
begin

  for I := 0 to Memo1.Lines.Count-1 do
  begin
    txtStr := Memo1.Lines[I];
    sStart := 0;
    sPos := 0; // Beginning of string.
    while sStart <> Length(txtStr) do
    begin
      // We find the first instance of the quotation mark in the string and store it.
      sStart := PosEx('"', txtStr, sStart);
      // Next we copy the text before the quotation mark, then include your tag.
      txtCoded := Copy(txtStr, sPos, (sStart - 1)) + '<i>"';
      // Update our current string position to the character immediately AFTER the quotation mark.
      sPos := sStart + 1;
      // Find the next instance of the quotation mark.
      sStart := PosEx('"', txtStr, (sStart + 1));
      // Copy everything from right AFTER the opening quotation mark to right BEFORE the closing quotation mark.
      // then add our closing quotation mark with your tag.
      txtCoded := Copy(txtStr, sPos, (sStart - 1)) + ' " </i>';
      // Set the new start position to right AFTER the closing quotation mark.
      Inc(sStart);
    end;
    // Add the newly formated text to your Memo control.
    Memo2.Lines.Add(txtCoded);
    // Clear our variable.
    txtCoded := '';
  end;
end;

eny

  • Hero Member
  • *****
  • Posts: 1634
Re: Basic HTML Generation...
« Reply #14 on: March 25, 2010, 10:24:30 pm »
The example given is a dangerous one because there is no well-defined loop invariant.
One side effect is that your program hangs when (at least) one of the lines doesn't contain a quotation character.
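For reference, the PosEx loop can also be repaired directly. In the posted version the loop only stops when sStart happens to equal Length(txtStr), which never happens once PosEx finds no quote and returns 0; in addition, Copy's third argument is a character count rather than an end position, and txtCoded is overwritten on each pass instead of appended to. A minimal sketch with those three points fixed, as a console program with a hypothetical TagQuotes function:

```pascal
program TagQuotesDemo;

{$mode objfpc}{$H+}

uses
  StrUtils;  // PosEx

{ Hypothetical repaired version of the PosEx approach. Every pass either
  tags one quoted span and moves Start past it, or finds no complete pair
  and leaves the loop, so the loop always terminates. }
function TagQuotes(const Line: string): string;
var
  Start, OpenPos, ClosePos: Integer;
begin
  Result := '';
  Start := 1;                                   // strings and PosEx offsets are 1-based
  repeat
    OpenPos := PosEx('"', Line, Start);
    if OpenPos = 0 then
      Break;                                    // no more quotes on this line
    ClosePos := PosEx('"', Line, OpenPos + 1);
    if ClosePos = 0 then
      Break;                                    // unmatched quote: leave the rest as-is
    // Copy's third argument is a COUNT of characters, not an end position.
    Result := Result + Copy(Line, Start, OpenPos - Start)
              + '<i>' + Copy(Line, OpenPos, ClosePos - OpenPos + 1) + '</i>';
    Start := ClosePos + 1;                      // continue after the closing quote
  until False;
  Result := Result + Copy(Line, Start, MaxInt); // text after the last tagged span
end;

begin
  WriteLn(TagQuotes('laboris "nisi ut aliquip". Duis'));
end.
```

Unlike eny's version below, this keeps the quotation marks inside the tags, as in the original request, and it leaves an unmatched quote untouched instead of closing it.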

Another approach:
Code:
procedure TForm1.Button3Click(Sender: TObject);
const C_BOOL2HTML_ITALIC: array[boolean] of string = ('<i>', '</i>');
var i      : integer;
    s      : string;
    InQuote: boolean;
begin
  Memo2.Lines.Clear;
  for i := 0 to Memo1.Lines.Count-1 do
  begin
    s := Memo1.Lines[i];
    InQuote := false;
    while pos('"',s) > 0 do
    begin
      s := StringReplace(s, '"', C_BOOL2HTML_ITALIC[InQuote], []);
      InQuote := not InQuote;
    end;
    if InQuote then
      s := s + C_BOOL2HTML_ITALIC[InQuote];
    Memo2.Append(s);
  end;
end;

([edit] Code shortened)
« Last Edit: March 25, 2010, 10:32:55 pm by eny »

 
