Announcements => Lazarus => Topic started by: JuhaManninen on December 06, 2011, 04:33:28 pm

Title: Idea for a Lazarus addition: resource string validator
Post by: JuhaManninen on December 06, 2011, 04:33:28 pm
Even translated strings can cause crash bugs in Lazarus. See:
 Lazarus IDE shortcuts can't be changed
 http://bugs.freepascal.org/view.php?id=20811

If someone is looking for an idea for a project, here is one:

Make a Lazarus package that checks for Format() parameter errors in translated .po files.
It should also check for unused and duplicate resource strings. There are plenty of them.

Juha
Title: Re: Idea for a Lazarus addition: resource string validator
Post by: Bart on December 06, 2011, 11:08:52 pm
If I had any knowledge of .po files, I wouldn't mind having a go at it.

If I understand any of it, then msgid and msgstr (if not "") should hold the same format arguments?
I assume msgstr is the translation of msgid?
I also assume we would check the created bla.xx.po against bla.po?

This should not be too difficult then?

What exactly do you mean by unused resourcestrings?
Can you also give an example of a duplicate resourcestring?

Would we want it to be a gui or console program (with batch processing all po files?)?

Bart
Title: Re: Idea for a Lazarus addition: resource string validator
Post by: JuhaManninen on December 07, 2011, 01:00:09 am
Quote
If I had any knowledge of .po files, I wouldn't mind having a go at it.

If I understand any of it, then msgid and msgstr (if not "") should hold the same format arguments?
I assume msgstr is the translation of msgid?
I also assume we would check the created bla.xx.po against bla.po?

Yes, msgstr is the translation of msgid, and msgid is originally defined in a Pascal source file under a resourcestring section.
For the validator program's purposes, the "master .po file" (bla.po in your example) can be used as the main source for resource strings.
The format params (%x) in those strings should be compared with the translated strings. The language code in translation files is typically 2 chars (like .de.po) but can be 5 chars (like .pt_BR.po).
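
A minimal sketch of that comparison, written in Python rather than Pascal to keep it short. The names SPEC, format_args and same_format_args are made up for illustration. It assumes the usual Format() placeholder shape (optional "index:", optional "-", width, ".precision", then a type letter) and that an explicit index also resets the running argument position; treat both as assumptions to check against the Format() documentation.

```python
import re

# "%%" is a literal percent sign; anything else must end in a Format() type letter.
SPEC = re.compile(r"%(?:%|(?:(\d+):)?-?\d*(?:\.\d+)?([defgmnpsuxDEFGMNPSUX]))")

def format_args(text):
    """Return the set of (argument index, type letter) pairs used in text."""
    args = set()
    position = 0
    for match in SPEC.finditer(text):
        if match.group(0) == "%%":
            continue  # escaped percent, consumes no argument
        if match.group(1) is not None:
            position = int(match.group(1))  # explicit "index:" form
        args.add((position, match.group(2).lower()))
        position += 1
    return args

def same_format_args(msgid, msgstr):
    """True if the translation uses the same arguments as the original."""
    if msgstr == "":
        return True  # untranslated entry, nothing to compare
    return format_args(msgid) == format_args(msgstr)
```

Comparing sets of (index, type) pairs rather than the raw placeholder order lets a translator reorder arguments with the indexed form, while a dropped or mistyped placeholder still shows up as a mismatch.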

Quote
This should not be too difficult then?

What exactly do you mean by unused resourcestrings?
Can you also give an example of a duplicate resourcestring?

"Unused resourcestrings" means that a string is defined but not used anywhere in the project's pascal source.
To find out if the string is used or not you need to scan all the source files. A simple search operation without context checking should be enough.
You will always find one instance of the string which is its definition. You can make a rule:
- if no instances are found -> something is wrong, should not happen.
- one instance found -> unused in source.
- more than one instance -> ok, used.
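
The rule above could be sketched like this (Python for brevity; classify_usage and the sample identifiers in the usage note are hypothetical). It does a plain whole-word search without any context checking and ignores case, on the assumption that Pascal identifiers are case-insensitive.

```python
import re

def classify_usage(name, sources):
    """Classify a resourcestring identifier by how often it occurs.

    sources is a list of strings, each holding the text of one source file.
    """
    pattern = re.compile(r"\b" + re.escape(name) + r"\b", re.IGNORECASE)
    count = sum(len(pattern.findall(text)) for text in sources)
    if count == 0:
        return "missing"  # not even the definition was found
    if count == 1:
        return "unused"   # only the definition itself
    return "used"
```

For example, with one unit that defines rsHello and rsBye and another that only calls ShowMessage(rshello), rsHello is reported as "used" and rsBye as "unused".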

"Duplicate resourcestring" means that 2 or more string definitions have the exact same text.
However they can't always be combined into 1 resourcestring because their meaning may depend on context and they may need a different translation.

In practice you need to keep all the string names and values in a hash- or tree-map for a fast lookup.
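
As a rough illustration of such a lookup (Python, with a made-up function name and sample data), a map from string value to the names that define it is enough to report duplicates. The report only lists candidates; whether they can really be merged still depends on context, as noted above.

```python
from collections import defaultdict

def find_duplicate_values(entries):
    """Group resourcestring names by value and keep values defined more than once.

    entries is an iterable of (name, value) pairs.
    """
    names_by_value = defaultdict(list)
    for name, value in entries:
        names_by_value[value].append(name)
    return {value: names for value, names in names_by_value.items() if len(names) > 1}
```

The same function works unchanged if the (name, value) pairs come from several files, since the map does not care where an entry came from.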

Quote
Would we want it to be a gui or console program (with batch processing all po files?)?

Being a Lazarus package it should have a GUI.  It could show reports of its findings in listboxes for example.
It could be simple first. It usually happens that people (and the author himself) start to get ideas for improvements after a first simple version is done.
Like in my Tools -> Example Projects ... feature. It was very simple first. Then Martin suggested a load of improvements and I figured some more myself.
It means, initially keep it simple. The refinement and complication comes later by itself.

This validator would benefit all translated applications made with Lazarus, not only the Lazarus project.

Juha
Title: Re: Idea for a Lazarus addition: resource string validator
Post by: ludob on December 07, 2011, 09:10:44 am
Quote
If I understand any of it, then msgid and msgstr (if not "") should hold the same format arguments?
I assume msgstr is the translation of msgid?
Spec is here: http://www.gnu.org/s/hello/manual/gettext/PO-Files.html
If the tool is to be a general po checker then you need to pay attention to some oddities such as the string concatenation rule found at the bottom of the spec. PO files can be generated with all kinds of external tools.
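
To illustrate the concatenation rule, here is a small sketch in Python (parse_po is a hypothetical helper, not taken from any existing parser). It joins continuation lines onto the preceding msgid or msgstr. It deliberately leaves escape sequences untouched and skips msgctxt and plural forms, so it is nowhere near a complete PO reader.

```python
def parse_po(text):
    """Return a list of (msgid, msgstr) pairs from PO file text."""
    entries = []
    entry = {}
    field = None  # the field that continuation lines are appended to
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("msgid "):
            if "msgid" in entry:
                entries.append(entry)
            entry = {"msgid": line[6:].strip()[1:-1]}
            field = "msgid"
        elif line.startswith("msgstr "):
            entry["msgstr"] = line[7:].strip()[1:-1]
            field = "msgstr"
        elif line.startswith('"') and field is not None:
            entry[field] += line[1:-1]  # adjacent quoted strings are concatenated
        else:
            field = None  # comment, blank line or unsupported keyword
    if "msgid" in entry:
        entries.append(entry)
    return [(e["msgid"], e.get("msgstr", "")) for e in entries]
```

An entry written as msgid "" followed by "Delete old " and "file %s" on separate lines is read as the single string "Delete old file %s".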

Quote
"Unused resourcestrings" means that a string is defined but not used anywhere in the project's pascal source.
To find out if the string is used or not you need to scan all the source files. A simple search operation without context checking should be enough.
You will always find one instance of the string which is its definition. You can make a rule:
- if no instances are found -> something is wrong, should not happen.
- one instance found -> unused in source.
- more than one instance -> ok, used.
Again, if the tool is to be a general-purpose tool, then resource strings are only part of it.  A lot of people, including myself, use gnugettext exclusively with the _() function. The advantage of this method is that the strings and code are kept close together.

An interesting open source (Mozilla Public License) project and a good starting point is dxgettext, http://dxgettext.po.dk/download/, written in Delphi but including support for different Pascal dialects. It contains a po parser, a string extractor, a "what strings have changed in the source compared to po" tool, etc.
Title: Re: Idea for a Lazarus addition: resource string validator
Post by: JuhaManninen on December 07, 2011, 10:23:16 am
I don't know exactly how to use the gnugettext _() function.
Do you just pass the string as a parameter for the function, and you don't need any resourcestring sections?

Lazarus does not use that. I would suggest the validator tool first supports this resourcestring system and has the gettext func support as a ToDo item.
If there is existing open source .po parser code it can be used of course.
Yet, parsing the .po file is not very difficult even with the string concatenation rules.

Juha
Title: Re: Idea for a Lazarus addition: resource string validator
Post by: ludob on December 07, 2011, 11:08:39 am
Quote
I don't know exactly how to use the gnugettext _() function.
Do you just pass the string as a parameter for the function, and you don't need any resourcestring sections?
That's it. No resourcestring. It is also very easy when translation comes as a late requirement in the project: just put a _() around the strings that need to be translated.

Quote
If there is existing open source .po parser code it can be used of course.
Yet, parsing the .po file is not very difficult even with the string concatenation rules.
po file parsing is indeed the easy part. The dxgettext code is the interesting part. It contains code to extract all kinds of strings from sources (dpr, pas, dfm, rc, ...) and even exe files. LFM and LPR are probably easy to add. It does some basic Pascal parsing to extract only resourcestrings and gnugettext function parameters (i.e. text to translate).
At least all the basic building blocks seem to be there to get a quick start on this project.
 
Title: Re: Idea for a Lazarus addition: resource string validator
Post by: Bart on December 07, 2011, 02:43:27 pm
You are making this too big for me.
I can see myself writing a tool for processing po files and identifying wrong format parameters and duplicate resourcestring values (in one po file, so not across different po files, or is this "required" too?), but making it a general purpose translation checking tool is way beyond me.

B.t.w. looking for unused resourcestrings, can't we get that info from the compiler (doesn't it generate a hint), if we build Lazarus?
There are also probably other standard (and highly specialized) tools available for checking vast amounts of files for the occurrence of a particular string?

If you want a simple tool, I think I can write one.
It can have a GUI, and it can have a console front-end (which can be used for automated testing).

If you want more than that, I don't think I can participate.

Bart
Title: Re: Idea for a Lazarus addition: resource string validator
Post by: JuhaManninen on December 07, 2011, 10:26:28 pm
Quote
You are making this too big for me.

You are a "Hero Member", how can it be big for you?
... ok, just kidding :)
But seriously, often people are too shy or afraid or something with their own code. In reality they can create good code. Maybe the core developers' critical comments scare them away.

For example I can't handle very big or complex things myself, yet I commit code to a big Lazarus project. The secret is to go in small steps. Learn existing code, copy and modify other people's code shamelessly, make a small improvement... Repeat that and finally you have a big feature.

Quote
I can see myself writing a tool for processing po files and identifying wrong format parameters and duplicate resourcestring values (in one po file, so not across different po files, or is this "required" too?), but making it a general purpose translation checking tool is way beyond me.

You can go ahead with that. Later someone will add features anyway.
BTW, checking duplicate string values from many files is as easy as from one file. In practice you keep the values and names in a string-string map and you check if a value already exists there. It makes no difference if those values came from 2 files.

Quote
B.t.w. looking for unused resourcestrings, can't we get that info from the compiler (doesn't it generate a hint), if we build Lazarus?
There are also probably other standard (and highly specialized) tools available for checking vast amounts of files for the occurrence of a particular string?

It may be easier to search the files than parse the compiler output. Besides it would require compiling all files in the project when you just want a report of resource strings.
There may be specialized tools for searching a word but this one would be even more specialized. It would search all resource strings in all source files of a Lazarus project. No such tool exists yet.
But, this "unused resourcestring" feature can be a ToDo item if you just implement the 2 things (format param check and duplicate value).

Quote
If you want a simple tool, I think I can write one.
It can have a GUI, and it can have a console front-end (which can be used for automated testing).

If you want more than that, I don't think I can participate.

People will come with improvement ideas and patches. I also will participate in it.
It is a psychological effect that people find it easier to improve an existing feature than to create a whole new feature. You just have to realize it and not be possessive about your code when someone wants to change it. That is how open source works.

I already mentioned the Example Projects dialog. Another one is the "Add Used Unit" dialog (Alt-F11). There was a feature request for such a dialog for years but nobody had implemented it, until I did.
Then many people started to push patches to improve it and it still continues. It is getting too complex for my taste already but that is OK.

Juha
Title: Re: Idea for a Lazarus addition: resource string validator
Post by: Bart on December 07, 2011, 11:25:10 pm
Quote
But seriously, often people are too shy or afraid or something with their own code. In reality they can create good code. Maybe the core developers' critical comments scare them away.

I kind of am the unofficial maintainer of the maskedit unit, which at one stage I almost completely rewrote.
However, that kind of coding I find myself comfortable with.

Criticism of my coding is welcome, and it doesn't scare me. In the past such criticism from Lazarus/FPC devels has inspired me to improve my coding and explore new things in Pascal.
I have posted fixes in the bugtracker which didn't meet the standards and rewritten them until they did.
If it turns out to be more than I can chew on, I (gracefully) stand down.
I feel no shame in that.

I have no experience with string-string maps, search trees and so on.
So, my first job would be to think about a suitable data model: how do I store the strings/string-pairs, in a way I can search it fast.

Given enough time, I'll come up with something.

Where to share my initial code then?

Bart
Title: Re: Idea for a Lazarus addition: resource string validator
Post by: JuhaManninen on December 08, 2011, 12:07:52 am
Quote
I have no experience with string-string maps, search trees and so on.
So, my first job would be to think about a suitable data model: how do I store the strings/string-pairs, in a way I can search it fast.

Given enough time, I'll come up with something.

Where to share my initial code then?

TStringToStringTree in unit AvgLvlTree is a good one.
I could commit the code; just post it here, in the bugtracker, or somewhere else.

Juha
Title: Re: Idea for a Lazarus addition: resource string validator
Post by: Bart on December 08, 2011, 06:31:11 pm
I'll have a look at it.

If I come up with anything acceptable (to me), I'll post it in the bugtracker (which category b.t.w.?), and I'll assign it to you, if that's ok.
May take a while though...

Bart
Title: Re: Idea for a Lazarus addition: resource string validator
Post by: Bart on December 11, 2011, 05:44:29 pm
ATM I have the following working:

I ended up adapting the code from Translations unit, stripping everything out that is not needed atm.
Still it takes up to 2-3 seconds to load lazaruside.xx.po (the bloody thing being up to > 0.5 MB)

I found 13 format argument errors in lazaruside.nl.po file to start with.

Source is at http://home.tiscali.nl/~knmg0017/software/gpocheck_bron.zip
(Lazarus 0.9.31 r33459 FPC 2.4.4 i386-win32-win32/win64)

Mind you, this is just a testing example; to begin with you need to alter the hardcoded paths for the po-files in main.pp!

No error checking whatsoever has been added.
Test at your own risk.

Plans:

Known bugs:

Juha: can you take a look. Is the TSimplePoFile class feasible for what we want to achieve, or should I go another way?
I'd rather hear this now than try to rewrite the base class of the app much later on in the process.

Bart
Title: Re: Idea for a Lazarus addition: resource string validator
Post by: JuhaManninen on December 12, 2011, 12:57:23 pm
Quote
I ended up adapting the code from Translations unit, stripping everything out that is not needed atm.
Still it takes up to 2-3 seconds to load lazaruside.xx.po (the bloody thing being up to > 0.5 MB)

[... skip ...]

Juha: can you take a look. Is the TSimplePoFile class feasible for what we want to achieve, or should I go another way?
I'd rather hear this now than try to rewrite the base class of the app much later on in the process.

It is good to use the existing parser as you did, from Translations unit. I had never looked at it before.
Your code found format param errors also in Finnish translation.

About the speed, this kind of reporting tool is not very speed critical.
Yet it would be interesting to see how much faster it becomes if you replace the StringHashList container with a real hash map.

StringHashList is a weird combination of containers. I was reading about it already when I used Delphi and did some tests. Its performance is poor, especially when adding items.
It calculates an integer hash key for the strings, but instead of using the key as an index for a bucket array, it inserts the keys into a sorted list!
Inserting items into a sorted list is always an expensive operation.
To find the strings it does a binary search in the integer hash list. It is a little faster than a sorted StringList because comparing integers is faster than comparing strings. It still needs to calculate the hash key before the binary search, which pretty much nullifies the speed gain.
The real benefit of this class is that you don't need to decide the bucket array size in the beginning as you need with a hash map, and also it saves some memory.
Maybe it made sense in the 1980s or early 90s with very limited memory, but I don't see much use for this container any more.
Somehow it made its way to Lazarus code base. I bet Mattias didn't do any speed tests before using it!
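
To make the difference concrete, here is a toy Python model of the structure as described above: hash keys kept in a sorted list and located by binary search. SortedHashList is a made-up class for illustration, not a port of StringHashList, and using zlib.crc32 as the hash function is an arbitrary assumption.

```python
import bisect
import zlib

class SortedHashList:
    """Toy container: (hash, key, value) tuples kept in sorted order."""

    def __init__(self):
        self._items = []

    @staticmethod
    def _hash(key):
        return zlib.crc32(key.encode("utf-8"))

    def add(self, key, value):
        # Finding the slot is O(log n), but inserting shifts the tail: O(n).
        bisect.insort(self._items, (self._hash(key), key, value))

    def find(self, key):
        # The hash must be computed first, then a binary search locates it.
        h = self._hash(key)
        i = bisect.bisect_left(self._items, (h, key))
        if i < len(self._items) and self._items[i][:2] == (h, key):
            return self._items[i][2]
        return None
```

A bucket-based hash map (a plain dict in Python) does the same add and find in O(1) on average, which is the difference being pointed out; with only a few thousand entries the gap may be too small to measure.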

If you want to test the speed difference, you could use this:
 http://wiki.lazarus.freepascal.org/StringHashMap
or the similar class in LCL somewhere (forgot the name). My wild guess is that > 50% of .po file reading time is spent for adding items to the container. Using a hash map with O(1) time can make a BIG difference.

[Edit] In LCL there is TMap which is based on TAvgLvlTree. It is a balanced tree and pretty fast.
Still, hash maps are superior when there is lots of data. FCL contnrs has TFPDataHashTable which is a string-pointer hash map. I have not tested it myself.

Juha
Title: Re: Idea for a Lazarus addition: resource string validator
Post by: Bart on December 13, 2011, 04:19:42 pm
Felipe: I have done some speedtesting.
Mind you, my machine is an 11-year-old Intel Celeron 700 MHz with 512 MB memory.

Original code: 5713 ticks
Skip adding entries to TStringHashList: 5798 ticks
Skip adding entries altogether: 5105 ticks

It seems that the internally used TFPList and TStringHashList are fast enough.
It is the parsing of the > 500 KB strings that is the bottleneck here.

Bart
Title: Re: Idea for a Lazarus addition: resource string validator
Post by: Bart on December 13, 2011, 04:23:28 pm
I rewrote the ReadPoText() procedure to use Strings instead of PChars.

Original code using PChars: 6427 ticks.
New code using Strings: 300 ticks.

Wow: 20 times faster!

I updated the sources (see link in previous post), so you can test it.

Bart
Title: Re: Idea for a Lazarus addition: resource string validator
Post by: JuhaManninen on December 14, 2011, 11:11:06 am
Quote
I rewrote the ReadPoText() procedure to use Strings instead of PChars.

Original code using PChars: 6427 ticks.
New code using Strings: 300 ticks.

Wow: 20 times faster!

It is indeed faster. But why is it faster? Now a stringlist is used:
    SL.LoadFromStream(AStream);
Earlier the data was read into a string also from a stream. What makes the difference?

Is there a cross-platform alternative to "GetTickCount"? I had to comment them out when testing on Linux.

The container speed was also a surprise for me. I guess the reason is that there are only a few thousand items. The speed differences become relevant only when there is much more data.

Juha
Title: Re: Idea for a Lazarus addition: resource string validator
Post by: Bart on December 15, 2011, 12:43:01 pm
Quote
I rewrote the ReadPoText() procedure to use Strings instead of PChars.

To me it looks as if handling Pascal type strings in a Pascal type manner is faster than handling the entire data as a PChar.

In ReadPOText(s: String) the string is typecast to a PChar and then there is much pointer calculation to determine line endings etc.
This overhead is redundant once we treat the data as a stringlist.

Quote
Is there a cross-platform alternative to "GetTickCount"? I had to comment them out when testing on Linux.

I'll take it out when I am happy with the testing code.
It was just a crude way for me to see if I could get any speed improvement.
If I had done it on Linux I simply would have used Now().

Quote
The container speed was also a surprise for me. I guess the reason is that there are only a few thousand items. The speed differences become relevant only when there is much more data.

May very well be.
For the time being I leave it as is.
I can test all lazaruside.xx.po files in a matter of seconds on my ancient system.

Bart
Title: Re: Idea for a Lazarus addition: resource string validator
Post by: JuhaManninen on December 15, 2011, 01:56:05 pm
Quote
To me it looks as if handling Pascal type strings in a Pascal type manner is faster than handling the entire data as a PChar.

In ReadPOText(s: String) the string is typecast to a PChar and then there is much pointer calculation to determine line endings etc.
This overhead is redundant once we treat the data as a stringlist.

Could you please leave the old parser code there, too, so they are easy to compare, with IFDEF or other switch.

Mattias will have to comment on this issue. Whole codetools uses such parsing with PChar. Would it also become much faster with a stringlist? Maybe not because in pascal source the newline has no special meaning, it is just a whitespace like the other whitespace chars.
Still, 20 times speedup for PO files is a lot.

Juha
Title: Re: Idea for a Lazarus addition: resource string validator
Post by: Bart on December 15, 2011, 05:28:00 pm
Updated the sources (see link above).

Replaced the TMemo with a TSynEdit; apparently a TMemo cannot hold the amount of text we might need.
Also the report is now on a separate form.

(I could not Tab out of the SynEdit (readonly = true, wanttabs = false), is this normal behaviour?
I worked around it in OnKeyDown of the form.)

Cleaned up the constructors (now there's only one).

Quote from: Juha
Could you please leave the old parser code there, too, so they are easy to compare, with IFDEF or other switch.

I ifdef-ed the old code with {$ifdef ReadPoTextPChar}

Bart
Title: Re: Idea for a Lazarus addition: resource string validator
Post by: Bart on December 17, 2011, 04:04:55 pm
Hi Juha,

I included an experimental po-file highlighter for the synedit in the results form.
Updated sources at: http://home.tiscali.nl/~knmg0017/software/gpocheck_bron.zip

I would like to offer the highlighter as an addition to the synedit package of Lazarus.

Some things I know what/how to do:

Things I don't know how to:

Bart
Title: Re: Idea for a Lazarus addition: resource string validator
Post by: JuhaManninen on December 17, 2011, 07:57:57 pm
The highlighter is cool.
I hope Martin or someone else can help with synedit integration. I have never worked with it.

In your GUI, the "select all tests" still creates a RunError 202. It may be caused by a recursive event handler loop and a stack overflow. Tested on Linux with QT bindings.

I was thinking if this is made a package, it could fetch all .po files under the active project, or all  .po files of a certain locale. These things can also be tweaked later. The basic functionality looks very good.

Juha
Title: Re: Idea for a Lazarus addition: resource string validator
Post by: Bart on December 18, 2011, 01:27:12 am
Quote
In your GUI, the "select all tests" still creates a RunError 202. It may be caused by a recursive event handler loop and a stack overflow. Tested on Linux with QT bindings.

What do you mean by that?
Just clicking on the checkbox of this entry, or when indeed running the tests (after clicking the "Run Tests" button)?

B.t.w. there is no recursion in any of my code AFAIK.

It runs smoothly under windows.
I'll test GTK2 Linux.

Quote
I was thinking if this is made a package, ...

I also have no clue as to how to achieve that.
Why not just have it as a stand-alone tool?

Bart
Title: Re: Idea for a Lazarus addition: resource string validator
Post by: Bart on December 18, 2011, 01:39:07 am
@Juha: Just saw you filed a bugreport on the issue (#20927 (http://bugs.freepascal.org/view.php?id=20927)), so it seems it's not my part of the code  O:-)

Bart
Title: po hl [Re: Idea for a Lazarus addition: resource string validator]
Post by: Martin_fr on December 18, 2011, 02:10:40 am
Quote
I included an experimental po-file highlighter for the synedit in the results form.
Updated sources at: http://home.tiscali.nl/~knmg0017/software/gpocheck_bron.zip

I would like to offer the highlighter as an addition to the synedit package of Lazarus.

I'll have a look at it, when I have time. Probably not before January though
Title: Re: Idea for a Lazarus addition: resource string validator
Post by: JuhaManninen on December 18, 2011, 11:06:07 am
Quote
@Juha: Just saw you filed a bugreport on the issue (#20927 (http://bugs.freepascal.org/view.php?id=20927)), so it seems it's not my part of the code  O:-)

Correct. I was just planning to write about it.

Juha
Title: Re: Idea for a Lazarus addition: resource string validator
Post by: Bart on December 18, 2011, 01:52:54 pm
Just for my own info (so I can easily find it back).

The original bugreport for adding bat and ini highlighters is #18230 (http://bugs.freepascal.org/view.php?id=18230) (Note that the svn (http://svn.freepascal.org/cgi-bin/viewvc.cgi?view=rev&root=lazarus&revision=28745) wrongly states that the issue nr is #18320).


Bart
Title: Re: po hl [Re: Idea for a Lazarus addition: resource string validator]
Post by: Martin_fr on December 20, 2011, 12:40:41 am
Quote
I included an experimental po-file highlighter for the synedit in the results form.
Updated sources at: http://home.tiscali.nl/~knmg0017/software/gpocheck_bron.zip

I would like to offer the highlighter as an addition to the synedit package of Lazarus.

Quote
I'll have a look at it, when I have time. Probably not before January though

I started to add it

It appears in the IDE now (for source editor)
But has empty default colors

If someone has a preference for colors, send me an export; otherwise I will put in some random colors


As for adding it to the component palette, it is missing an icon in the same style as the other highlighters
Title: Re: po hl [Re: Idea for a Lazarus addition: resource string validator]
Post by: Bart on December 20, 2011, 10:23:50 pm
Quote
I started to add it
It appears in the IDE now (for source editor)
But has empty default colors
If someone has a preference for colors, send me an export; otherwise I will put in some random colors

I attached my editoroptions.xml (changed the extension, the forum won't let me attach xml), which provides a colorscheme for the po hl, applying the defaults as they are set in the constructor of the po hl.

Here's the relevant part of it:

Code:
    <Color Version="8">
      <Langpo_language_files Version="8">
        <SchemeDefault>
          <Key Style="fsBold"/>
          <Flags Foreground="clTeal"/>
          <String Foreground="clFuchsia"/>
          <Comment Style="fsItalic" Foreground="clGreen"/>
          <Identifier Style="fsBold" Foreground="clGreen"/>
          <Previous_value Style="fsItalic" Foreground="clOlive"/>
        </SchemeDefault>
      </Langpo_language_files>
    </Color>

Bart
Title: Re: po hl [Re: Idea for a Lazarus addition: resource string validator]
Post by: Bart on December 22, 2011, 06:29:27 pm
Quote
It appears in the IDE now (for source editor)
But has empty default colors

Why does it override the default colors if no section with user defined colors exists?

Bart
Title: Re: Idea for a Lazarus addition: resource string validator
Post by: Martin_fr on December 22, 2011, 07:16:26 pm
I am away for a few days.

Not sure why it doesn't use the defaults. It needs to be debugged, but I am not sure I will do that anytime soon (there are so many compatibility hacks in there...).
Title: Re: Idea for a Lazarus addition: resource string validator
Post by: Bart on December 22, 2011, 08:24:41 pm
I added a check for duplicate original values (untranslated strings) in the master po-file.

Sources still at http://home.tiscali.nl/~knmg0017/software/gpocheck_bron.zip

@Juha: time to add this to Lazarus tools?
(You can remove the highlighter files, it is in trunk already.)

Bart
Title: Re: po hl [Re: Idea for a Lazarus addition: resource string validator]
Post by: Bart on December 23, 2011, 12:02:27 am
Quote
It appears in the IDE now (for source editor)
But has empty default colors

If someone has a preference for colors, send me an export; otherwise I will put in some random colors

I attached a diff for ColourDefault.xml in issue #20953 (http://bugs.freepascal.org/view.php?id=20953) in the bugtracker.

Bart
Title: Re: Idea for a Lazarus addition: resource string validator
Post by: JuhaManninen on December 23, 2011, 05:27:39 pm
Quote
@Juha: time to add this to Lazarus tools?

Bart, is it ok for you if I make it a package? Then it would be registered under Tools menu but source would be under Components directory.
This would allow at least 2 improvements in future:
1. Find automatically and verify the PO files that belong to the current project. This is very intuitive because we are dealing with translations of a Lazarus project always.
2. Use the IDE integration features better. The results window could be part of the editor windows for example.

Those improvements are not necessarily needed now but IMO this should be made a Lazarus IDE package because it works with a Lazarus project.
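
A rough sketch of how the PO files of a project could be grouped automatically (Python; group_po_families and the regular expression are illustrative assumptions, not the package's actual code). It relies only on the naming scheme mentioned earlier in the thread: a master bla.po plus translations named bla.xx.po or bla.xx_YY.po.

```python
import re

# <base>.<lang>.po, where lang is "xx" or "xx_YY"
TRANSLATION = re.compile(r"^(?P<base>.+)\.(?P<lang>[a-z]{2}(?:_[A-Z]{2})?)\.po$")

def group_po_families(filenames):
    """Map each master .po file name to a dict of {language code: file name}."""
    families = {}
    for name in filenames:
        match = TRANSLATION.match(name)
        if match:
            master = match.group("base") + ".po"
            families.setdefault(master, {})[match.group("lang")] = name
        elif name.endswith(".po"):
            families.setdefault(name, {})  # a master file, possibly untranslated
    return families
```

Given lazaruside.po, lazaruside.nl.po and lazaruside.pt_BR.po, the result is one family keyed by lazaruside.po with the entries nl and pt_BR.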

Juha
Title: Re: Idea for a Lazarus addition: resource string validator
Post by: Bart on December 24, 2011, 01:15:17 pm
Juha,

Feel free to make it a package.
I would still like to have this tool available as a separate project for fpc/Lazarus users though...

When you commit the code, credits are appreciated  O:-)

Bart
Title: Re: Idea for a Lazarus addition: resource string validator
Post by: JuhaManninen on December 25, 2011, 03:35:41 pm
Quote
Feel free to make it a package.
I would still like to have this tool available as a separate project for fpc/Lazarus users though...

When you commit the code, credits are appreciated  O:-)

Ok, I did so. Now the package can be selected in the list of IDE packages. It installs under Tools menu.
Please test.

The stand-alone application version should be maintained somewhere else. Do you have access to CCR? Maybe some external repository?
There could also be project files under the package directory and one could build the application from the same sources.
However, that will not work when the package gets more IDE integration, so I don't consider it a good idea.

Juha
Title: Re: Idea for a Lazarus addition: resource string validator
Post by: Bart on December 25, 2011, 04:23:21 pm
Quote
Ok, I did so. Now the package can be selected in the list of IDE packages. It installs under Tools menu.
Please test.

Will do when I upgrade, may take quite a while though...

Quote
The stand-alone application version should be maintained somewhere else. Do you have access to CCR?

No I don't think so.

Quote
There could also be project files under the package directory and one could build the application from the same sources.
However, that will not work when the package gets more IDE integration, so I don't consider it a good idea.

As long as the SimplePoFiles and PoFamilies units, which basically do all the reading and testing, remain independent  of the IDE (and I see no reason why they shouldn't) a simple stand-alone version is IMHO still a feasible project to maintain.

Bart
Title: Re: Idea for a Lazarus addition: resource string validator
Post by: JuhaManninen on December 26, 2011, 12:09:03 am
Quote
As long as the SimplePoFiles and PoFamilies units, which basically do all the reading and testing, remain independent  of the IDE (and I see no reason why they shouldn't) a simple stand-alone version is IMHO still a feasible project to maintain.

I will look at it. Maybe just the main window should be copied.

I think the shared code also needs more attention. I guess it is a char-encoding problem.
For example "lazaruside.he.po" reports 260 errors in Format params although they look ok.

Juha
Title: Re: Idea for a Lazarus addition: resource string validator
Post by: Bart on December 26, 2011, 12:37:46 am
Quote
I think the shared code also needs more attention. I guess it is a char-encoding problem.
For example "lazaruside.he.po" reports 260 errors in Format params although they look ok.

Took a quick look.
The lazaruside.he.po file says it is UTF-8.
Looking at it, I see several s% instead of %s.
I know Hebrew (I guess it is Hebrew, I cannot display the characters on my system) is RTL, but should this affect the %s as I can see them in a UTF-8 capable (SynEdit) editor?

The lazaruside.zh_CN.po file (which is some kind of Chinese, I guess, and so also RTL) otoh has only 13 errors in this test.
So maybe the Hebrew translation actually is off???

Bart
Title: Re: Idea for a Lazarus addition: resource string validator
Post by: JuhaManninen on December 26, 2011, 12:49:23 am
Quote from: Bart
Looking at it, I see several s% instead of %s.
I know Hebrew (I guess it is Hebrew, I cannot display the characters on my system) is RTL, but should this affect the %s as I can see them in a UTF-8 capable (SynEdit) editor?

The lazaruside.zh_CN.po file (which is some kind of Chinese, I guess, and so also RTL) otoh has only 13 errors in this test.
So maybe the Hebrew translation actually is off???

Right! I didn't look at it carefully.
I don't think the Format function will interpret it correctly either. Right-to-Left languages will need attention in many places.
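
To make the comparison concrete, here is a minimal sketch of how the Format arguments of a msgid and its msgstr could be compared. It is not the code in pofamilies.pp: FormatSignature and SameFormatArgs are hypothetical names, the sample strings are made up, and the sketch assumes the arguments appear in the same order in both strings (a translation that reorders them with index specifiers such as %1:s would be flagged by mistake).

```pascal
program FormatArgCheck;
{$mode objfpc}{$H+}

// Collect the conversion letters of all Format() placeholders in S,
// in order of appearance. '%%' is a literal percent sign and is skipped.
// A '%' that is not followed by a conversion letter is recorded as '?'.
function FormatSignature(const S: string): string;
var
  i: Integer;
  C: Char;
begin
  Result := '';
  i := 1;
  while i <= Length(S) do
  begin
    if S[i] = '%' then
    begin
      Inc(i);
      if (i <= Length(S)) and (S[i] = '%') then
      begin
        Inc(i);          // literal '%%', not a placeholder
        Continue;
      end;
      // skip optional index, flag, width and precision characters
      while (i <= Length(S)) and (S[i] in ['0'..'9', ':', '-', '.', '*']) do
        Inc(i);
      if (i <= Length(S)) and (S[i] in ['a'..'z', 'A'..'Z']) then
      begin
        C := S[i];
        if C in ['A'..'Z'] then
          C := Chr(Ord(C) + 32);   // treat %S and %s alike
        Result := Result + C;
      end
      else
        Result := Result + '?';    // e.g. the 's%' seen in the Hebrew file
    end;
    Inc(i);
  end;
end;

// An empty msgstr means "not translated yet", so there is nothing to check.
function SameFormatArgs(const MsgId, MsgStr: string): Boolean;
begin
  Result := (MsgStr = '') or
            (FormatSignature(MsgId) = FormatSignature(MsgStr));
end;

begin
  // made-up sample strings, only for illustration
  WriteLn(SameFormatArgs('Delete old file %s', 'Oude file %s verwijderen')); // TRUE
  WriteLn(SameFormatArgs('(%s seconds delay)', '(s% delay)'));               // FALSE
end.
```

Comparing only the conversion letters keeps the check independent of the character encoding and of the text direction of the translation.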

Juha
Title: Re: Idea for a Lazarus addition: resource string validator
Post by: Bart on December 26, 2011, 01:51:22 am
I think the lazaruside.he.po file is flawed.

Try this:
Options -> Editor -> Display -> Markup and Colors
In that form there is a slider to set some delay time. Above it there is a label that says (in English, with Dutch locale): (1,50 seconds delay).
This is controlled by the resourcestring "dlgeddelayinsec".

Now set desktop language to Hebrew:

The same label now says ( s), note: the "1,50" is gone.

Also, when you do File -> Save As, the prompt "Delete old file [FileName]" (controlled by "lisdeleteoldfile") now says "s [FileName]".

(I then could not set the language back to English, since on that particular frame all controls were invisible.
I had to manually edit environmentoptions.xml.)

These experiments IMHO show that the language file lazaruside.he.po is incorrect.

Bart
Title: Re: Idea for a Lazarus addition: resource string validator
Post by: JuhaManninen on January 03, 2012, 04:32:04 pm
Bart, I added the project files for PoChecker into the "Proj" directory. The project uses the same source files except for the main form file, which I copied from the package. (Earlier I improved its anchors and layout.)
Please test.

The test for duplicate values is broken. If you have an idea how to fix it, please tell me. Otherwise I will learn the code better at some time.
I guess the values should be stored in another hash map.
There are many duplicate values and this feature would be nice.

Juha
Title: Re: Idea for a Lazarus addition: resource string validator
Post by: Bart on January 03, 2012, 05:48:47 pm
Quote from: JuhaManninen
Bart, I added the project files for PoChecker into the "Proj" directory. The project uses the same source files except for the main form file, which I copied from the package. (Earlier I improved its anchors and layout.)
Please test.

I'll test it when I have updated (that will take a while, I'm busy with other problems).

Quote from: JuhaManninen
The test for duplicate values is broken. If you have an idea how to fix it, please tell me. Otherwise I will learn the code better at some time.

What do you mean by broken?

What it tries to do is this:
It searches only the master .po file for duplicate values of (untranslated) resourcestrings.
For this it stores each untranslated string in a separate list.

I roughly tested this with a dummy po-file and it found all duplicates.

e.g.:

Code: [Select]
#: dummy.wiseditform
msgid "&Edit"
msgstr ""

#: dummy.wiseditsource
msgid "&Edit"
msgstr ""

It will warn that the value for dummy.wiseditsource is a duplicate of dummy.wiseditform.
(If it has more than 1 duplicate it will only list the first one as reference.)
If it does not do this, the test is broken.
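
As a rough illustration of the idea (not the actual code in pofamilies.pp), the check can be sketched like this: remember each untranslated string the first time it is seen, and report any later entry with the same text. TEntry and ReportDuplicates are hypothetical names, and the third sample entry is made up.

```pascal
program DupCheck;
{$mode objfpc}{$H+}

uses
  Classes;

type
  TEntry = record
    Identifier: string;  // e.g. 'dummy.wiseditform'
    Original: string;    // the msgid text
  end;

// Report every entry whose msgid text was already used by an earlier entry.
// Only the first user of a text is named as the reference.
procedure ReportDuplicates(const Entries: array of TEntry; Report: TStrings);
var
  SeenTexts, FirstIds: TStringList;  // parallel lists: text -> first identifier
  i, Idx: Integer;
begin
  SeenTexts := TStringList.Create;
  FirstIds := TStringList.Create;
  try
    SeenTexts.CaseSensitive := True;
    for i := Low(Entries) to High(Entries) do
    begin
      Idx := SeenTexts.IndexOf(Entries[i].Original);
      if Idx < 0 then
      begin
        SeenTexts.Add(Entries[i].Original);
        FirstIds.Add(Entries[i].Identifier);
      end
      else
        Report.Add(Entries[i].Identifier + ' duplicates ' + FirstIds[Idx]);
    end;
  finally
    FirstIds.Free;
    SeenTexts.Free;
  end;
end;

var
  Entries: array[0..2] of TEntry;
  Report: TStringList;
  i: Integer;
begin
  Entries[0].Identifier := 'dummy.wiseditform';
  Entries[0].Original := '&Edit';
  Entries[1].Identifier := 'dummy.wiseditsource';
  Entries[1].Original := '&Edit';
  Entries[2].Identifier := 'dummy.wisclose';   // made-up, not a duplicate
  Entries[2].Original := '&Close';

  Report := TStringList.Create;
  try
    ReportDuplicates(Entries, Report);
    for i := 0 to Report.Count - 1 do
      WriteLn(Report[i]);  // dummy.wiseditsource duplicates dummy.wiseditform
  finally
    Report.Free;
  end;
end.
```

The linear IndexOf lookup is fine for a dummy file; for something the size of lazaruside.po a hash map keyed on the msgid text would probably be the better choice.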

Maybe I misunderstood the meaning of duplicates in this context.
In that case the test uses the wrong concept and must be re-designed.
In that case please create a dummy master po-file (max 20 entries) with duplicates and explain what is wrong with the file and what the test should report.

Bart
Title: Re: Idea for a Lazarus addition: resource string validator
Post by: JuhaManninen on January 03, 2012, 07:54:59 pm
I am testing with the big LazarusIDEStrConsts.
If I load lazaruside.fi.po or lazaruside.po, the checker always says "No errors found".
I know there are many identical strings.

BTW, I have changed the color of "No errors found" to green. Red usually indicates an error.

Juha
Title: Re: Idea for a Lazarus addition: resource string validator
Post by: Bart on January 04, 2012, 02:23:58 am
In lazaruside.po, can you please point out one duplicate (with the corresponding line numbers) so I can re-test?

Bart
Title: Re: Idea for a Lazarus addition: resource string validator
Post by: JuhaManninen on January 04, 2012, 01:44:25 pm
Quote from: Bart
In lazaruside.po, can you please point out one duplicate (with the corresponding line numbers) so I can re-test?

at line 3209:
#: lazarusidestrconsts.dlgunitdepbrowse
msgctxt "lazarusidestrconsts.dlgunitdepbrowse"
msgid "Open"

at line 8515:
#: lazarusidestrconsts.lishintopen
msgctxt "lazarusidestrconsts.lishintopen"
msgid "Open"

at line 11126:
#: lazarusidestrconsts.lismenutemplateopen
msgctxt "lazarusidestrconsts.lismenutemplateopen"
msgid "Open"

And there are many others.

Juha
Title: Re: Idea for a Lazarus addition: resource string validator
Post by: Bart on January 04, 2012, 06:13:32 pm
Juha,

I found approx. 250 of them, but all of them have a context specified.
I excluded items with a context from the check, based on the documentation I found in the wiki: http://wiki.lazarus.freepascal.org/Translations_/_i18n_/_localizations_for_programs#Fuzzy_entries

I can simply add a check for all duplicates if you want, but I think 2 (or more) entries with the same text, but with a context specified, should not be considered an error?

Bart
Title: Re: Idea for a Lazarus addition: resource string validator
Post by: JuhaManninen on January 04, 2012, 11:12:49 pm
Quote from: Bart
I found approx. 250 of them, but all of them have a context specified.

Where do you see 250 of them?

Quote
I excluded items with a context from the check, based on the documentation I found in the wiki: http://wiki.lazarus.freepascal.org/Translations_/_i18n_/_localizations_for_programs#Fuzzy_entries

I can simply add a check for all duplicates if you want, but I think 2 (or more) entries with the same text, but with a context specified, should not be considered an error?

It is not an error; it could be marked as a warning instead.
The context is created automatically by the .po file generator. It means there is never a duplicate without a context.
Some of the duplicates need to be there but most of them could use the same string, IMO. It should be decided case by case.

Juha
Title: Re: Idea for a Lazarus addition: resource string validator
Post by: Bart on January 05, 2012, 02:12:51 am
We should have separate Errors and Warnings (and separate counters for them then)?

Quote from: Juha
Where do you see 250 of them?

In pofamilies.pp, procedure TPoFamily.CheckDuplicateOriginals, change these lines

Code: [Select]
  for i := FMaster.Count - 1 downto 0 do
  begin
    PoItem := FMaster.PoItems[i];
    Dup := FMaster.OriginalToItem(PoItem.Original);
    if Assigned(Dup) and (Dup.Identifier <> PoItem.Identifier) and (Dup.Context = '') then


into

Code: [Select]
  for i := FMaster.Count - 1 downto 0 do
  begin
    PoItem := FMaster.PoItems[i];
    Dup := FMaster.OriginalToItem(PoItem.Original);
    // remove the check for (Dup.Context = '')
    if Assigned(Dup) and (Dup.Identifier <> PoItem.Identifier)  then

Open lazaruside.po, run only this test and see:

Code: [Select]
--------------------------------------------------
Errors reported by CheckDiplicateOriginals for:
lazaruside.po
--------------------------------------------------

[Line: 17722]
This resourcestring:
#: lazarusidestrconsts.uemsetfreebookmark
msgid "Set a Free Bookmark"
msgctxt "lazarusidestrconsts.uemsetfreebookmark"
has the same value as idenftifier lazarusidestrconsts.lismenusetfreebookmark at line 10971
For this entry it is recommended to set: msgctxt="lazarusidestrconsts.uemsetfreebookmark"

...snip ...

[Line: 12]
This resourcestring:
#: lazarusidestrconsts.dbgbreakgroupdlgcaptionenable
msgid "Select Groups"
msgctxt "lazarusidestrconsts.dbgbreakgroupdlgcaptionenable"
has the same value as idenftifier lazarusidestrconsts.dbgbreakgroupdlgcaptiondisable at line 7
For this entry it is recommended to set: msgctxt="lazarusidestrconsts.dbgbreakgroupdlgcaptionenable"

Found 253 errors.
--------------------------------------------------


Total errors found: 253

Bart
Title: Re: Idea for a Lazarus addition: resource string validator
Post by: JuhaManninen on January 06, 2012, 01:40:08 am
Quote from: Bart
We should have separate Errors and Warnings (and separate counters for them then)?

[...]

In pofamilies.pp, procedure TPoFamily.CheckDuplicateOriginals, change these lines

I made the change in r34606 and now it reports the duplicates. Good.
I also commented out the message text "For this entry it is recommended..." because it doesn't make sense after the change.

Ideally there should be separate Errors and Warnings. For now I just changed the message text from Error to Error / Warning. No big deal.
In future this test could also be separate from the other checkboxes and the results could be presented in a grid.
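
For what it is worth, separate counters would not need much. The following is only a sketch of one possible shape, with TSeverity and TCheckLog as hypothetical names that do not exist in the package, and made-up messages.

```pascal
program SeverityCounters;
{$mode objfpc}{$H+}

uses
  Classes, SysUtils;

type
  TSeverity = (sevWarning, sevError);

  // Collects messages and keeps one counter per severity.
  TCheckLog = class
  private
    FLines: TStringList;
    FCounts: array[TSeverity] of Integer;
  public
    constructor Create;
    destructor Destroy; override;
    procedure Add(ASeverity: TSeverity; const AMsg: string);
    function CountOf(ASeverity: TSeverity): Integer;
    property Lines: TStringList read FLines;
  end;

const
  SeverityNames: array[TSeverity] of string = ('Warning', 'Error');

constructor TCheckLog.Create;
begin
  inherited Create;
  FLines := TStringList.Create;
end;

destructor TCheckLog.Destroy;
begin
  FLines.Free;
  inherited Destroy;
end;

procedure TCheckLog.Add(ASeverity: TSeverity; const AMsg: string);
begin
  Inc(FCounts[ASeverity]);
  FLines.Add(SeverityNames[ASeverity] + ': ' + AMsg);
end;

function TCheckLog.CountOf(ASeverity: TSeverity): Integer;
begin
  Result := FCounts[ASeverity];
end;

var
  Log: TCheckLog;
begin
  Log := TCheckLog.Create;
  try
    // a Format mismatch can crash the IDE, a duplicate value cannot
    Log.Add(sevError, 'format arguments differ between msgid and msgstr');
    Log.Add(sevWarning, 'value "Open" is used by more than one identifier');
    Write(Log.Lines.Text);
    WriteLn(Format('Errors: %d, warnings: %d',
      [Log.CountOf(sevError), Log.CountOf(sevWarning)]));
  finally
    Log.Free;
  end;
end.
```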

Juha
Title: Re: Idea for a Lazarus addition: resource string validator
Post by: Bart on January 06, 2012, 02:16:57 am
I'm not happy with the current listing of these duplicate errors (not your modification, but how it turns up in the ErrorLog).

I would like to have it somewhat like this:

The value 'Bar' has duplicates and is used by the following IDs
[Line x] lazarusidestrconsts.myres
[Line y] lazarusidestrconsts.anotherres

The value 'Foo' has duplicates and is used by the following IDs
[Line z] lazarusidestrconsts.yetanotherres
etc.
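
A grouped listing like this could be produced roughly as follows. This is only a sketch under assumptions: TEntry, MakeEntry and WriteGroupedReport are hypothetical names, and the entries and line numbers are made up.

```pascal
program GroupedDupReport;
{$mode objfpc}{$H+}

uses
  Classes, SysUtils;

type
  TEntry = record
    Line: Integer;
    Identifier: string;
    Original: string;
  end;

function MakeEntry(ALine: Integer; const AId, AOriginal: string): TEntry;
begin
  Result.Line := ALine;
  Result.Identifier := AId;
  Result.Original := AOriginal;
end;

// Group the entries by msgid text and list only texts with several users.
procedure WriteGroupedReport(const Entries: array of TEntry; Report: TStrings);
var
  Groups: TStringList;  // one item per distinct text; Objects[] holds its users
  Users: TStringList;
  i, j, Idx: Integer;
begin
  Groups := TStringList.Create;
  try
    Groups.CaseSensitive := True;
    for i := Low(Entries) to High(Entries) do
    begin
      Idx := Groups.IndexOf(Entries[i].Original);
      if Idx < 0 then
        Idx := Groups.AddObject(Entries[i].Original, TStringList.Create);
      TStringList(Groups.Objects[Idx]).Add(
        Format('[Line %d] %s', [Entries[i].Line, Entries[i].Identifier]));
    end;
    for i := 0 to Groups.Count - 1 do
    begin
      Users := TStringList(Groups.Objects[i]);
      if Users.Count > 1 then
      begin
        Report.Add(Format(
          'The value ''%s'' has duplicates and is used by the following IDs',
          [Groups[i]]));
        for j := 0 to Users.Count - 1 do
          Report.Add(Users[j]);
      end;
    end;
  finally
    for i := 0 to Groups.Count - 1 do
      Groups.Objects[i].Free;   // free the per-text user lists
    Groups.Free;
  end;
end;

var
  Entries: array[0..2] of TEntry;
  Report: TStringList;
begin
  Entries[0] := MakeEntry(10, 'lazarusidestrconsts.myres', 'Bar');
  Entries[1] := MakeEntry(20, 'lazarusidestrconsts.anotherres', 'Bar');
  Entries[2] := MakeEntry(30, 'lazarusidestrconsts.yetanotherres', 'Foo');

  Report := TStringList.Create;
  try
    WriteGroupedReport(Entries, Report);
    Write(Report.Text);  // lists 'Bar' with its two users; 'Foo' is unique here
  finally
    Report.Free;
  end;
end.
```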

I'll re-think that and when I have time re-implement.
Then I'll split it into warnings and errors as well.

Bart
Title: Re: Idea for a Lazarus addition: resource string validator
Post by: Bart on January 08, 2012, 05:22:35 pm
Hi Juha,

I made some changes to the po-checker, uploaded the patch to the bugtracker at http://bugs.freepascal.org/view.php?id=21049 and assigned it to you.


Bart