Idea for a Lazarus addition: resource string validator
JuhaManninen:
Even translated strings can cause crash bugs in Lazarus. See:
Lazarus IDE shortcuts can't be changed
http://bugs.freepascal.org/view.php?id=20811
If someone is looking for an idea for a project, here is one:
Make a Lazarus package that checks for Format() parameter errors in translated .po files.
It should also check for unused and duplicate resource strings. There are plenty of them.
Juha
Bart:
If I had any knowledge about .po files, I wouldn't mind having a go at it.
If I understand any of it, then msgid and msgstr (if not "") should hold the same format arguments?
I assume msgstr is the translation of msgid?
I also assume we would check the created bla.xx.po against bla.po?
This should not be too difficult then?
What exactly do you mean by unused resourcestrings?
Can you also give an example of a duplicate resourcestring?
Would we want it to be a GUI or a console program (with batch processing of all .po files)?
Bart
JuhaManninen:
--- Quote from: Bart on December 06, 2011, 11:08:52 pm ---If I had any knowledge about .po files, I wouldn't mind having a go at it.
If I understand any of it, then msgid and msgstr (if not "") should hold the same format arguments?
I assume msgstr is the translation of msgid?
I also assume we would check the created bla.xx.po against bla.po?
--- End quote ---
Yes, msgstr is the translation of msgid, and msgid is originally defined in a Pascal source file under a resourcestring section.
For the validator program's purposes, the "master .po file" (bla.po in your example) can be used as the main source for resource strings.
The format params (%x) in those strings should be compared with the ones in the translated strings. The language code in translation file names is typically 2 chars (like .de.po) but can be 5 chars (like .pt_BR.po).
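As a rough illustration of that comparison, here is a minimal sketch in Python (not Pascal, only to show the idea). The names format_args and check_entry are made up for this example, and the placeholder pattern is my assumption of how Format() specifiers look: "%", an optional "index:", an optional "-", optional width and precision, then a type letter. Arguments are compared by index, so a translation may reorder them with explicit indexes.

```python
import re

# Assumed placeholder shape: "%" [index ":"] ["-"] [width] ["." precision] type.
# "%%" is a literal percent sign and is skipped.
SPEC = re.compile(r"%(?:%|(?:(\d+):)?-?(?:\d+|\*)?(?:\.(?:\d+|\*))?([a-zA-Z]))")

def format_args(text):
    """Return the placeholder types found in text, keyed by argument index."""
    args = {}
    next_index = 0
    for match in SPEC.finditer(text):
        explicit, kind = match.groups()
        if kind is None:  # matched "%%", not an argument
            continue
        # Assumption: an unindexed placeholder takes the argument after the previous one.
        index = int(explicit) if explicit is not None else next_index
        args[index] = kind.lower()
        next_index = index + 1
    return args

def check_entry(msgid, msgstr):
    """Return True when msgstr is empty (untranslated) or uses the same arguments."""
    if msgstr == "":
        return True
    return format_args(msgid) == format_args(msgstr)
```

For example, check_entry("File %s has %d lines", "%1:d Zeilen in Datei %0:s") passes because both sides use a string at index 0 and an integer at index 1, while check_entry("File %s", "Datei %d") fails.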
--- Quote ---This should not be too difficult then?
What exactly do you mean by unused resourcestrings?
Can you also give an example of a duplicate resourcestring?
--- End quote ---
"Unused resourcestrings" means that a string is defined but not used anywhere in the project's pascal source.
To find out if the string is used or not you need to scan all the source files. A simple search operation without context checking should be enough.
You will always find one instance of the string which is its definition. You can make a rule:
- if no instances are found -> something is wrong, should not happen.
- one instance found -> unused in source.
- more than one instance -> ok, used.
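The rule above could be sketched like this in Python. The function name classify_usage and the result labels are hypothetical, and the sources are assumed to be already read into strings. It is a plain text search without context checking, case-insensitive because Pascal identifiers are.

```python
import re

def classify_usage(name, sources):
    """Classify a resourcestring identifier by how often it occurs in the sources.

    sources: iterable of Pascal source texts, already read into strings.
    """
    # \b keeps the search from matching inside longer identifiers.
    pattern = re.compile(r"\b" + re.escape(name) + r"\b", re.IGNORECASE)
    count = sum(len(pattern.findall(text)) for text in sources)
    if count == 0:
        return "missing"  # not even the definition was found
    if count == 1:
        return "unused"   # only the definition itself
    return "used"
```

Because there is no context checking, an identifier that appears only in a comment would still count as used. That is probably acceptable for a first version.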
"Duplicate resourcestring" means that 2 or more string definitions have the exact same text.
However they can't always be combined into 1 resourcestring because their meaning may depend on context and they may need a different translation.
In practice you need to keep all the string names and values in a hash map or tree map for fast lookup.
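A minimal sketch of the duplicate check with a hash map, again in Python for illustration only. The name find_duplicates and the input shape (a dict of resourcestring name -> text) are assumptions for this example.

```python
from collections import defaultdict

def find_duplicates(strings):
    """Map each text that occurs more than once to the names that share it.

    strings: dict of resourcestring name -> text.
    """
    names_by_text = defaultdict(list)
    for name, text in strings.items():
        names_by_text[text].append(name)
    # Keep only the texts that are shared by two or more names.
    return {text: names for text, names in names_by_text.items() if len(names) > 1}
```

The result is only a report of candidates. Whether two names with the same text can really be merged still has to be decided by a person, for the context reason mentioned above.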
--- Quote ---Would we want it to be a GUI or a console program (with batch processing of all .po files)?
--- End quote ---
Being a Lazarus package it should have a GUI. It could show reports of its findings in listboxes for example.
It could be simple at first. People (including the author) usually start to get ideas for improvements once a first simple version is done.
My Tools -> Example Projects ... feature went the same way. It was very simple at first. Then Martin suggested a load of improvements and I figured out some more myself.
In other words, keep it simple initially. Refinement and complexity come later by themselves.
This validator would benefit all translated applications made with Lazarus, not only the Lazarus project.
Juha
ludob:
--- Quote ---If I understand any of it, then msgid and msgstr (if not "") should hold the same format arguments?
I assume msgstr is the translation of msgid?
--- End quote ---
The spec is here: http://www.gnu.org/s/hello/manual/gettext/PO-Files.html
If the tool is to be a general .po checker, then you need to pay attention to some oddities such as the string concatenation rule found at the bottom of the spec. PO files can be generated with all kinds of external tools.
--- Quote ---"Unused resourcestrings" means that a string is defined but not used anywhere in the project's pascal source.
To find out if the string is used or not you need to scan all the source files. A simple search operation without context checking should be enough.
You will always find one instance of the string which is its definition. You can make a rule:
- if no instances are found -> something is wrong, should not happen.
- one instance found -> unused in source.
- more than one instance -> ok, used.
--- End quote ---
Again, if the tool is to be a general-purpose tool, then resource strings are only part of it. A lot of people, including myself, use gnugettext exclusively with the _() function. The advantage of this method is that the strings and the code are kept close together.
An interesting open source (Mozilla Public License) project and a good starting point is dxgettext http://dxgettext.po.dk/download/ which is written in Delphi but includes support for different Pascal dialects. It contains a .po parser, a string extractor, a "what strings have changed in the source compared to the .po" tool, etc.
JuhaManninen:
I don't know exactly how to use the gnugettext _() function.
Do you just pass the string as a parameter for the function, and you don't need any resourcestring sections?
Lazarus does not use that. I would suggest that the validator tool first supports this resourcestring system and keeps gettext function support as a ToDo item.
If there is existing open source .po parser code it can be used of course.
Still, parsing the .po file is not very difficult, even with the string concatenation rules.
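To give an idea of the amount of work, here is a minimal Python sketch of such a parser. The names unquote and parse_po are made up for this example. It only handles msgid, msgstr and the quoted continuation lines that get concatenated onto them. Comments, msgctxt and plural forms are ignored, and only a few common escapes are decoded.

```python
# Escapes decoded by this sketch; any other escaped character is kept as-is.
ESCAPES = {"n": "\n", "t": "\t", '"': '"', "\\": "\\"}

def unquote(token):
    """Strip the surrounding quotes from a PO string and decode its escapes."""
    body = token[1:-1]
    out = []
    i = 0
    while i < len(body):
        if body[i] == "\\" and i + 1 < len(body):
            out.append(ESCAPES.get(body[i + 1], body[i + 1]))
            i += 2
        else:
            out.append(body[i])
            i += 1
    return "".join(out)

def parse_po(text):
    """Parse PO text into a list of (msgid, msgstr) pairs."""
    entries = []
    msgid = msgstr = None
    target = None  # which field a quoted continuation line belongs to
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("msgid "):
            if msgstr is not None:  # previous entry is complete
                entries.append((msgid, msgstr))
            msgid, msgstr, target = unquote(line[6:].strip()), None, "msgid"
        elif line.startswith("msgstr "):
            msgstr, target = unquote(line[7:].strip()), "msgstr"
        elif line.startswith('"') and target == "msgid":
            msgid += unquote(line)   # concatenation rule: append to msgid
        elif line.startswith('"') and target == "msgstr":
            msgstr += unquote(line)  # concatenation rule: append to msgstr
        else:
            target = None  # comment, blank line or unsupported keyword
    if msgstr is not None:
        entries.append((msgid, msgstr))
    return entries
```

The header entry (the one with an empty msgid) comes back as an ordinary pair, so a validator would probably want to skip it before comparing format arguments.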
Juha