Author Topic: fpOdf - Creating and modifying OpenDocument ODF files with freepascal  (Read 11476 times)

dgaspary

  • Jr. Member
  • **
  • Posts: 55
This announcement is mainly for developers who could help improve and/or test the library.

For those who just want to create OpenDocument files, the best way to work with ODF files in Free Pascal today seems to be fpVectorial[1].

This project could be classified as being at Alpha or even pre-Alpha stage.

You have been warned. :)

fpOdf[2] is a library meant to help FPC developers generate and modify OASIS OpenDocument[3] files. OpenDocument is the main file format adopted by LibreOffice, OpenOffice and Calligra (the KDE KOffice fork), and it is supported by recent versions of Microsoft Office.

The library extends DOM with classes and procedures that use structures containing ODF elements, attributes, types and namespaces. These structures (enums, arrays and functions) are automatically generated with tools[4] that read the official OASIS ODF specification files (in Relax NG format).

To date, the only main features are reading and writing ODF files, both as packages (Zip: ODT) and as plain XML (the FODT extension in LibreOffice/OpenOffice).

Due to these restrictions, the library could be classified as similar to Apache OdfToolkit OdfDom[5]. It is not as complete, however.

The intended workflow is:

1 - Create a new ODT file, or read an existing one;

2 - Modify it adding elements and attributes using the methods of TOdfDocument and TOdfElement classes;

3 - Save the file.
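
The three steps can be sketched with the plain FPC DOM and XMLWrite units. This is only an illustration of the create/modify/save cycle for a flat (FODT-style) document: it does not use the fpOdf classes, and BuildHelloDocument and the output file name are hypothetical names chosen for the sketch. The namespace URIs are the ones defined by the OpenDocument specification.

```pascal
program FlatOdtSketch;

{$mode objfpc}{$H+}

uses
  DOM, XMLWrite;

const
  // Namespace URIs taken from the OpenDocument specification.
  OfficeNS = 'urn:oasis:names:tc:opendocument:xmlns:office:1.0';
  TextNS   = 'urn:oasis:names:tc:opendocument:xmlns:text:1.0';

// Steps 1 and 2: create an empty document and add one paragraph to it.
// Hypothetical helper; the caller owns and frees the returned document.
function BuildHelloDocument(const AText: string): TXMLDocument;
var
  Root, Body, TextEl, Para: TDOMElement;
begin
  Result := TXMLDocument.Create;

  // Root element of a flat ODF text document.
  Root := Result.CreateElementNS(OfficeNS, 'office:document');
  Root.SetAttributeNS(OfficeNS, 'office:version', '1.2');
  Root.SetAttributeNS(OfficeNS, 'office:mimetype',
    'application/vnd.oasis.opendocument.text');
  Result.AppendChild(Root);

  // office:body > office:text > text:p holds the visible content.
  Body := Result.CreateElementNS(OfficeNS, 'office:body');
  Root.AppendChild(Body);
  TextEl := Result.CreateElementNS(OfficeNS, 'office:text');
  Body.AppendChild(TextEl);
  Para := Result.CreateElementNS(TextNS, 'text:p');
  Para.AppendChild(Result.CreateTextNode(AText));
  TextEl.AppendChild(Para);
end;

var
  Doc: TXMLDocument;
begin
  Doc := BuildHelloDocument('Hello from Free Pascal');
  try
    // Step 3: save the document as plain XML.
    WriteXMLFile(Doc, 'hello.fodt');
  finally
    Doc.Free;
  end;
end.
```

Every element and attribute name above is typed by hand as a string, which is exactly the kind of work the generated enums in fpOdf are meant to take over.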

Step 2 is not easy for those unfamiliar with the internals of the ODF file format. Decompressing LibreOffice-generated files is a good way of learning how and what to modify.
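
Since an ODT package is a Zip archive, it can be unpacked with the Zipper unit that ships with FPC. The following is a minimal sketch; the program name, UnpackOdt and DefaultOutputDir are hypothetical helpers, not part of fpOdf. After unpacking, content.xml and styles.xml are the files worth reading first.

```pascal
program UnpackOdtSketch;

{$mode objfpc}{$H+}

uses
  SysUtils, Zipper;

// Hypothetical helper: 'letter.odt' -> 'letter_unpacked'.
function DefaultOutputDir(const AFileName: string): string;
begin
  Result := ChangeFileExt(AFileName, '') + '_unpacked';
end;

// Extract every entry of the package (content.xml, styles.xml, ...)
// into AOutputDir so the XML can be inspected with a text editor.
procedure UnpackOdt(const AFileName, AOutputDir: string);
var
  UnZipper: TUnZipper;
begin
  UnZipper := TUnZipper.Create;
  try
    UnZipper.FileName := AFileName;
    UnZipper.OutputPath := AOutputDir;
    UnZipper.Examine;
    UnZipper.UnZipAllFiles;
  finally
    UnZipper.Free;
  end;
end;

begin
  if ParamCount >= 1 then
    UnpackOdt(ParamStr(1), DefaultOutputDir(ParamStr(1)))
  else
    WriteLn('Usage: unpackodtsketch <file.odt>');
end.
```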

The Pros of fpOdf are:

1 - Easy to extend functionality: Just create TOdfElement descendants.

2 - Using the enum items, you avoid misspelling the names of namespaces, elements and attributes.

3 - The methods to create and modify the document automatically set the correct namespaces of Elements and Attributes.
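
Point 2 can be illustrated with a tiny, hand-written version of the idea. The type, the enum items and the lookup function below are hypothetical stand-ins, not the identifiers actually generated for fpOdf; the point is only that a misspelled enum item is rejected by the compiler, while a misspelled string literal is not.

```pascal
program EnumNamesSketch;

{$mode objfpc}{$H+}

type
  // Hypothetical subset of ODF elements; the real list is generated.
  TSketchElement = (seTextP, seTextSpan, seTextTab, seTextLineBreak);

const
  // One qualified name per enum item, kept in the same order.
  SketchElementNames: array[TSketchElement] of string = (
    'text:p', 'text:span', 'text:tab', 'text:line-break');

// Map an enum item to the element name written into the XML.
function ElementName(AElement: TSketchElement): string;
begin
  Result := SketchElementNames[AElement];
end;

begin
  // Writing seTextSpann here would not compile, whereas the string
  // 'text:spann' would silently produce an invalid document.
  WriteLn(ElementName(seTextSpan));
end.
```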

The Cons:

1 - It needs (a lot) more testing, and unit tests still have to be written.

2 - The library increases program size by nearly 1 MB. Procedures totalling more than 7500 lines of code will do that. Absurd, I know. Suggestions for improvement are welcome.

3 - The user is responsible for creating the right elements and attributes and placing them at the right location.

[1] http://wiki.freepascal.org/fpvectorial
[2] https://github.com/dgaspary/fpOdf
[3] http://en.wikipedia.org/wiki/OpenDocument
[4] I will upload these files soon. They need cleaning and organizing.
[5] http://incubator.apache.org/odftoolkit/odfdom/index.html
« Last Edit: August 26, 2013, 01:05:52 am by dgaspary »

herux

  • Full Member
  • ***
  • Posts: 100
Re: fpOdf - Creating and modifying OpenDocument ODF files with freepascal
« Reply #1 on: August 26, 2013, 05:08:04 am »
cool  8)

Mike.Cornflake

  • Hero Member
  • *****
  • Posts: 1248
Re: fpOdf - Creating and modifying OpenDocument ODF files with freepascal
« Reply #2 on: August 26, 2013, 06:32:05 pm »
This is...  amazing  :o   You coded that by hand?  I'm impressed.  Out of curiosity, how long did that take you?  I've been working on the docx writer in my spare time for a month, and I'm nowhere near as feature complete as this appears to be.  And my code is nowhere near as elegant.

I'm still stepping through the code to understand your design, but I needed to change my test code to produce a working odt file:

The text
Code: [Select]
#11 + '<test>&"This shouldn''t break the resulting document."</test>' + #11;
Should be rendered as
Code: [Select]
<text:span text:style-name="Text_Body"><text:tab/>&lt;test&gt;&amp;&quot;This shouldn&#39;t break the resulting document.&quot;&lt;/test&gt;<text:tab/></text:span>
Instead you're rendering it as:
Code: [Select]
<text:span text:style-name="Text_Body">&#xB;&lt;test&gt;&amp;"This shouldn't break the resulting document."&lt;/test&gt;&#xB;
</text:span>

Tabs, single quotes and double quotes are rendered differently.  I don't know which one it is, but something in that set breaks the resulting odt file.   For now I've just commented out the line in my test case that attempts to export that line.

Right, back to stepping through code...  This may take a while :-)
Lazarus Trunk/FPC Trunk on Windows [7, 10]
  Have you tried searching this forum or the wiki?:   http://wiki.lazarus.freepascal.org/Alternative_Main_Page
  BOOKS! (Free and otherwise): http://wiki.lazarus.freepascal.org/Pascal_and_Lazarus_Books_and_Magazines

dgaspary

  • Jr. Member
  • **
  • Posts: 55
Re: fpOdf - Creating and modifying OpenDocument ODF files with freepascal
« Reply #3 on: August 26, 2013, 07:14:29 pm »
Quote
This is...  amazing  :o

Thank you, Mike and herux.

Quote
You coded that by hand?

The .inc files were generated by a tool I created, which converts the ODF specification files. I will upload this tool (in reality a set of tools), but I need to organize them first.

Quote
Out of curiosity, how long did that take you?  I've been working on the docx writer in my spare time for a month, and I'm nowhere near as feature complete as this appears to be.  And my code is nowhere near as elegant.

I don't know for sure. I have been developing it for more than a year, but only in my spare time. I believe it adds up to about two months of work in total. And this is the second incarnation of the project; before this code I tried another design.

And today I found that some attributes are repeated. It's an issue, but not a real problem at the moment, as I'm not using the attributes array.

So, don't worry, you are developing much faster than I.

Quote
I'm still stepping through the code to understand your design, but I needed to change my test code to produce a working odt file

Can you attach your example or upload it to some sharing service?

I believe it could be something related to Unicode/UTF-8. I hope not, because I need to learn more about it. :)

Mike.Cornflake

  • Hero Member
  • *****
  • Posts: 1248
Re: fpOdf - Creating and modifying OpenDocument ODF files with freepascal
« Reply #4 on: August 26, 2013, 07:41:47 pm »
The example I use is in the examples folder for fpvectorial:    fpvtextwritetest2

As Felipe and I are working on different aspects at the same time, we've each got our own test code :-)  Everything in fpvtextwritetest2 is currently supported by the docxwriter, though the docx writer still doesn't implement everything in the fpvectorial document class.  It's close though.

dgaspary

  • Jr. Member
  • **
  • Posts: 55
Re: fpOdf - Creating and modifying OpenDocument ODF files with freepascal
« Reply #5 on: August 27, 2013, 12:37:53 am »
Quote
The text
Code: [Select]
#11 + '<test>&"This shouldn''t break the resulting document."</test>' + #11;
Should be rendered as
Code: [Select]
<text:span text:style-name="Text_Body"><text:tab/>&lt;test&gt;&amp;&quot;This shouldn&#39;t break the resulting document.&quot;&lt;/test&gt;<text:tab/></text:span>

The problem is the tab character. The attached example is a simple way to see what's happening.
Laz2_DOM is writing the tab as &#xB;, which seems to be illegal in XML. I don't know whether this is the correct behavior. Can someone comment on this?
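
For reference, XML 1.0 allows only three control characters below #32: tab (#9), line feed (#10) and carriage return (#13). The vertical tab (#11, written &#xB;) is not among them, so a document containing it is not well-formed. A minimal sketch of a filter follows, assuming a UTF-8 encoded string processed byte by byte; IsLegalXmlChar and StripIllegalXmlChars are hypothetical names, and the check covers only the ASCII control range, not the other code points that XML 1.0 excludes.

```pascal
program XmlCharsSketch;

{$mode objfpc}{$H+}

// True for bytes that may appear in XML 1.0 character data.
// Only the C0 control range is checked; bytes >= #32 (including
// the bytes of UTF-8 multi-byte sequences) are passed through.
function IsLegalXmlChar(C: Char): Boolean;
begin
  Result := (C >= #32) or (C in [#9, #10, #13]);
end;

// Return S without the control characters that XML 1.0 forbids.
function StripIllegalXmlChars(const S: string): string;
var
  I: Integer;
begin
  Result := '';
  for I := 1 to Length(S) do
    if IsLegalXmlChar(S[I]) then
      Result := Result + S[I];
end;

begin
  // The vertical tabs (#11) are dropped, the horizontal tab (#9) is kept.
  WriteLn(Length(StripIllegalXmlChars(#11 + 'a' + #9 + 'b' + #11)));
end.
```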

But somewhere in fpVectorial the tab is being translated to <text:tab/>. I don't know where.

p.s.: fpvtextwritetest2 fails to compile:

fpvtextwritetest2.pas(34,9) Error: identifier idents no member "AddStandardODTTextDocumentStyles"


Mike.Cornflake

  • Hero Member
  • *****
  • Posts: 1248
Re: fpOdf - Creating and modifying OpenDocument ODF files with freepascal
« Reply #6 on: August 27, 2013, 09:53:57 am »
Quote
The problem is the tab character. The attached example is a simple way to see what's happening.
Laz2_DOM is writing the tab as &#xB;, which seems to be illegal in XML. I don't know whether this is the correct behavior. Can someone comment on this?

Translating Tab to <text:tab/> is an ODT specific item, not a generic XML item.  It needs to be converted prior to the XMLDom save routine.  In your FPVectorial wrapper, the place to put this code would be in WriteTextSpan.  This would also be a good place to call EscapeHTML() as this will also convert the quotes.  I forget the unit EscapeHTML is in, htmlsupport, something like that.  Now I think of it there's another translation that needs to occur: CR (#13) inside a text:span to <text:line-break/>

I'm away from code right now, so I'm unsure where to convert these special chars within your code (which is where it should happen, not in the FP Vectorial wrapper, which will only solve it for that specific case).   Perhaps you could supply an EscapeText routine, and rely on the user calling this before he adds TextSpan?
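
A string-based sketch of such an EscapeText routine is shown here. EscapeOdfText is a hypothetical name, not a function of fpOdf or fpVectorial. It assumes the horizontal tab (#9) is the character to turn into <text:tab/>, escapes the five XML special characters, and maps CR (#13) to <text:line-break/>; a line feed (#10) is left untouched.

```pascal
program EscapeOdfTextSketch;

{$mode objfpc}{$H+}

// Turn plain text into markup suitable for the inside of a text:span.
// Hypothetical helper for a writer that builds its XML as strings.
function EscapeOdfText(const S: string): string;
var
  I: Integer;
begin
  Result := '';
  for I := 1 to Length(S) do
    case S[I] of
      '&':  Result := Result + '&amp;';
      '<':  Result := Result + '&lt;';
      '>':  Result := Result + '&gt;';
      '"':  Result := Result + '&quot;';
      '''': Result := Result + '&#39;';
      #9:   Result := Result + '<text:tab/>';         // horizontal tab
      #13:  Result := Result + '<text:line-break/>';  // carriage return
    else
      Result := Result + S[I];
    end;
end;

begin
  WriteLn(EscapeOdfText(
    #9 + '<test>&"This shouldn''t break the resulting document."</test>' + #9));
end.
```

This only suits a writer that assembles the XML as strings. In DOM-based code the tab and line break have to be added as element nodes between text nodes, because a string that already contains <text:tab/> would be escaped a second time when stored in a text node.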

I'll send you an update to the FPVectorial wrapper you created tonight, which will fix these chars in that writer only.

Quote
p.s.: fpvtextwritetest2 fails to compile:

fpvtextwritetest2.pas(34,9) Error: identifier idents no member "AddStandardODTTextDocumentStyles"


Sigh, yes it is, sorry about that :-(  I saw this and fixed it in my local copy easily enough, but those changes haven't been uploaded yet.  The interim fix is to find the similar line in fpvtextwritetest, and replace the offending line in fpvtextwritetest2.  AddStandardODTTextDocumentStyles has been renamed to something like AddStandardTextDocumentStyles.  It currently requires a parameter being sent through (parameter is currently ignored, so you can pass any document type through).  At some stage I'd like to make that parameter optional.

dgaspary

  • Jr. Member
  • **
  • Posts: 55
Re: fpOdf - Creating and modifying OpenDocument ODF files with freepascal
« Reply #7 on: August 29, 2013, 11:18:20 pm »
Quote
Translating Tab to <text:tab/> is an ODT specific item, not a generic XML item.  It needs to be converted prior to the XMLDom save routine.  In your FPVectorial wrapper, the place to put this code would be in WriteTextSpan.

I know, I was talking about Felipe's ODT writer. But he hasn't implemented it yet either.

Anyway, I have read up on how some special characters must be treated, but I have no time right now.

Mike.Cornflake

  • Hero Member
  • *****
  • Posts: 1248
Re: fpOdf - Creating and modifying OpenDocument ODF files with freepascal
« Reply #8 on: August 30, 2013, 08:25:29 am »
 :)  Time is an issue that is affecting all of us right now.  I'm about to return home, and the change in time zones and the travel means I won't be able to do any coding until mid next week now.

Many thanks for all your time to date.  Look forward to working with you when time is more forgiving :)

dgaspary

  • Jr. Member
  • **
  • Posts: 55
Re: fpOdf - Creating and modifying OpenDocument ODF files with freepascal
« Reply #9 on: September 01, 2013, 05:26:49 am »
I have started implementing the handling of some special characters. It's not working yet. I will replace it with the Text Tab element until some better solution appears.

But the specs do not mention the vertical tab (#11), at least as far as I could find; they mention only the "traditional" horizontal tab, #9.

This [1] thread, its links, and some others led me to think that #11 should be replaced by a Line Feed, Carriage Return or some other *vertical* "movement character".

[1] http://stackoverflow.com/questions/3380538/what-is-a-vertical-tab

Mike.Cornflake

  • Hero Member
  • *****
  • Posts: 1248
Re: fpOdf - Creating and modifying OpenDocument ODF files with freepascal
« Reply #10 on: September 02, 2013, 05:21:29 am »
Doh!

Nice catch :-)  I got my ASCII codes crossed.  Absolutely correct, #9 should be the character I'm escaping as tab.   You know, I've never heard of a vertical tab, so I've no idea how it should be interpreted by Word/OfficeWriter.

Right, now to go change some code....


dgaspary

  • Jr. Member
  • **
  • Posts: 55
Re: fpOdf - Creating and modifying OpenDocument ODF files with freepascal
« Reply #12 on: September 07, 2013, 03:39:21 am »
Quote
These structures (enums, arrays and functions) are automatically generated with tools[4] that read the official OASIS ODF specification files (in Relax NG format).

[4] I will upload these files soon. They need cleaning and organizing.


A week later than I was expecting, but here it is:

https://github.com/dgaspary/fpOdf/tree/master/code_generation/ODF_Processor

This project is not the generator proper. It reads the ODF Relax NG file and creates a list of elements with their child elements and attributes in a much simpler format (it can easily be changed to create a JSON/INI/whatever file).

The real generator simply reads this file and writes some code for each item (my enums in the .inc files). I will upload the code generator later, but it's not important; the real "magic" happens in the processor.
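
To give a rough idea of what the generator step does, here is a minimal sketch that turns a list of qualified element names into a Pascal enum declaration. It is not the actual generator: the 'oe' prefix, the type name TSketchElement, and the assumption that the input is simply one element name per entry are all hypothetical.

```pascal
program EnumGeneratorSketch;

{$mode objfpc}{$H+}

uses
  Classes;

// 'text:line-break' -> 'oeTextLineBreak' (the 'oe' prefix is made up).
function EnumIdent(const AQualifiedName: string): string;
var
  I: Integer;
  UpperNext: Boolean;
begin
  Result := 'oe';
  UpperNext := True;
  for I := 1 to Length(AQualifiedName) do
    if AQualifiedName[I] in [':', '-'] then
      UpperNext := True          // drop the separator, capitalize what follows
    else if UpperNext then
    begin
      Result := Result + UpCase(AQualifiedName[I]);
      UpperNext := False;
    end
    else
      Result := Result + AQualifiedName[I];
end;

// Build the text of an enum declaration, ready to be written to a .inc file.
function EnumDeclaration(const ATypeName: string; ANames: TStrings): string;
var
  I: Integer;
begin
  Result := ATypeName + ' = (';
  for I := 0 to ANames.Count - 1 do
  begin
    if I > 0 then
      Result := Result + ', ';
    Result := Result + EnumIdent(ANames[I]);
  end;
  Result := Result + ');';
end;

var
  Names: TStringList;
begin
  Names := TStringList.Create;
  try
    Names.Add('text:p');
    Names.Add('text:span');
    Names.Add('text:line-break');
    // Prints: TSketchElement = (oeTextP, oeTextSpan, oeTextLineBreak);
    WriteLn(EnumDeclaration('TSketchElement', Names));
  finally
    Names.Free;
  end;
end.
```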

To run the project the fpRelaxNG package is required: https://github.com/dgaspary/fpRelaxNG