
Author Topic: TRegExpr equivalent of PHP preg_match_all  (Read 8194 times)

Martin_fr

  • Administrator
  • Hero Member
  • *
  • Posts: 9908
  • Debugger - SynEdit - and more
    • wiki
Re: TRegExpr equivalent of PHP preg_match_all
« Reply #15 on: August 19, 2017, 11:22:13 pm »
I did a quick test:

Code: Pascal
  writeln(Syllables('tier'));
  writeln(Syllables('funnier'));
  writeln(Syllables('verbose'));
  writeln(Syllables('goat'));

outputs
1
2
2
1

tier: 1 correct
funnier: 2 wrong, should be 3
  http://www.dictionary.com/browse/funnier
  https://www.howmanysyllables.com/words/funnier
verbose: 2 correct
goat: 1 wrong, should be 2

---------------
You may look at https://www.howmanysyllables.com/howtocountsyllables (method 5)
for more tips.

However, the diphthong list given there (au, oy, oo) is not sufficient.
"ie" may indicate 1 or 2 syllables (tier vs. funnier).

------------
Not tested, but your test for an "e" at the end of the word will go wrong for
soiree https://www.howmanysyllables.com/words/soiree
theatre
lee

« Last Edit: August 19, 2017, 11:29:33 pm by Martin_fr »

howardpc

  • Hero Member
  • *****
  • Posts: 4144
Re: TRegExpr equivalent of PHP preg_match_all
« Reply #16 on: August 20, 2017, 12:43:42 am »
Counting syllables is a bit of a minefield.
However goat is only 1 syllable.

Nevertheless, I'm sure that it would be possible to find English words that defy almost any rule-based algorithm, because English is a fairly irregular language and British English has some most peculiar spellings.
Sometimes this arises from the many foreign loan-words that are now part of the language, but even basic English spelling is a bit of a black art, with numerous exceptions to the 'rules', particularly for proper names.
For example, how many people learning the language would suppose that both "Cholmondeley" and "Leicester" have only two syllables?

Martin_fr

Re: TRegExpr equivalent of PHP preg_match_all
« Reply #17 on: August 20, 2017, 12:54:35 am »
Right about goat.

I had Goa (2 syllables), but since this is a name, I looked for something similar.

maurobio

  • Hero Member
  • *****
  • Posts: 623
  • Ecology is everything.
    • GitHub
Re: TRegExpr equivalent of PHP preg_match_all
« Reply #18 on: August 20, 2017, 01:03:59 am »
Gentlemen,

As howardpc pointed out, counting syllables is a complex matter, which therefore may not have any single solution.

As I wrote in an earlier post to this thread, all programs that I could find which implement the computation of readability indexes provide 24 as the number of syllables for the "Platypus text". However, the website Readable.io (https://readable.io/) does provide 26 for the same text. This is also the result provided by the Pascal routines from here: http://eljaco.se/SWAG/TEXTFILE/0047.PAS.html.

A flexible solution would be to implement in my program an option for the user to choose the method to be used for counting syllables; this would be a new functionality in this kind of software, as all the others only implement one method.

Best regards,
UCSD Pascal / Burroughs 6700 / Master Control Program
Delphi 7.0 Personal Edition
Lazarus 2.0.12 - FPC 3.2.0 on GNU/Linux Mint 19.1, Lubuntu 18.04, Windows XP SP3, Windows 7 Professional, Windows 10 Home

Martin_fr

Re: TRegExpr equivalent of PHP preg_match_all
« Reply #19 on: August 20, 2017, 02:03:43 am »
Actually your code is a good start.

Suggestion: Add more test cases (than just the one sentence), and see which other rules may help (and whether you need them, or whether you can live with the error margin).
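
One way to collect such test cases is a small table-driven check. The following is only a sketch: TSyllableFunc, TTestCase, RunTests and NaiveSyllables are hypothetical names, and NaiveSyllables is a deliberately crude stand-in (it just counts runs of vowels), not the Syllables function posted earlier in this thread. The expected counts are the ones agreed on in this thread.

```pascal
program SyllableTests;
{$mode objfpc}{$H+}

type
  // Any function that maps a word to a syllable count can be tested.
  TSyllableFunc = function(const AWord: string): Integer;

  TTestCase = record
    Sample: string;
    Expected: Integer;
  end;

const
  // Expected counts as discussed in this thread.
  Cases: array[0..3] of TTestCase = (
    (Sample: 'tier';    Expected: 1),
    (Sample: 'funnier'; Expected: 3),
    (Sample: 'verbose'; Expected: 2),
    (Sample: 'goat';    Expected: 1)
  );

// Stand-in counter (an assumption, not the code from this thread):
// counts runs of consecutive vowels and applies no other rule.
function NaiveSyllables(const AWord: string): Integer;
var
  i: Integer;
  InVowelRun: Boolean;
begin
  Result := 0;
  InVowelRun := False;
  for i := 1 to Length(AWord) do
    if UpCase(AWord[i]) in ['A', 'E', 'I', 'O', 'U', 'Y'] then
    begin
      if not InVowelRun then
        Inc(Result);
      InVowelRun := True;
    end
    else
      InVowelRun := False;
end;

// Runs every case through Counter, prints one line per word and
// returns the number of mismatches.
function RunTests(Counter: TSyllableFunc): Integer;
var
  i, Got: Integer;
begin
  Result := 0;
  for i := Low(Cases) to High(Cases) do
  begin
    Got := Counter(Cases[i].Sample);
    if Got = Cases[i].Expected then
      WriteLn(Cases[i].Sample, ': ', Got, ' ok')
    else
    begin
      WriteLn(Cases[i].Sample, ': ', Got, ', expected ', Cases[i].Expected);
      Inc(Result);
    end;
  end;
end;

begin
  WriteLn(RunTests(@NaiveSyllables), ' mismatch(es)');
end.
```

To test the real routine, pass it in place of NaiveSyllables and extend the Cases table with the words that currently go wrong.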

I am not a native English speaker, but it looks like the following may improve your code, even if they can produce false positives.

1) You are removing the e at the end, except for le.
Make that: except for le or <consonant>re.

2) Count ie as one syllable, except if the word ends in ier (and is longer than 4 letters, or has more than one syllable, to avoid tier).


Maybe there is also a rule for ua?
in-fat-u-at-ed

(careful if the u is after q)

Or ia:
in·e·bri·ate
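
A rough sketch of rules 1) and 2) on top of plain vowel-group counting might look like the following. This is an illustration under assumptions, not the code posted earlier in the thread: CountSyllables and EndsWith are hypothetical names, y is treated as a vowel, and the ua / ia cases are not handled at all.

```pascal
program SyllableRules;
{$mode objfpc}{$H+}

uses
  SysUtils;

const
  Vowels = ['a', 'e', 'i', 'o', 'u', 'y'];  // assumption: y counts as a vowel

// True if S ends in Suffix.
function EndsWith(const S, Suffix: string): Boolean;
begin
  Result := (Length(S) >= Length(Suffix)) and
    (Copy(S, Length(S) - Length(Suffix) + 1, Length(Suffix)) = Suffix);
end;

function CountSyllables(const AWord: string): Integer;
var
  W: string;
  i, N: Integer;
  InGroup, KeepFinalE: Boolean;
begin
  Result := 0;
  W := LowerCase(Trim(AWord));
  N := Length(W);
  if N = 0 then
    Exit;

  // Rule 1: drop a final e, except in "le" or <consonant>"re".
  if W[N] = 'e' then
  begin
    KeepFinalE := EndsWith(W, 'le') or
      ((N >= 3) and EndsWith(W, 're') and not (W[N - 2] in Vowels));
    if not KeepFinalE then
      SetLength(W, N - 1);
  end;

  // Base count: every run of consecutive vowels is one syllable,
  // so "ie" counts as one by default.
  InGroup := False;
  for i := 1 to Length(W) do
    if W[i] in Vowels then
    begin
      if not InGroup then
        Inc(Result);
      InGroup := True;
    end
    else
      InGroup := False;

  // Rule 2: a word longer than 4 letters ending in "ier" gets one
  // extra syllable (funnier), while "tier" stays at one.
  if (N > 4) and EndsWith(W, 'ier') then
    Inc(Result);

  // Every non-empty word has at least one syllable ("the").
  if Result < 1 then
    Result := 1;
end;

const
  Samples: array[0..3] of string = ('tier', 'funnier', 'verbose', 'goat');
var
  i: Integer;
begin
  for i := Low(Samples) to High(Samples) do
    WriteLn(Samples[i], ': ', CountSyllables(Samples[i]));
end.
```

For the four words from the quick test this sketch gives 1, 3, 2 and 1. The le exception will still miscount words such as "whale", so false positives remain.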

« Last Edit: August 20, 2017, 02:07:29 am by Martin_fr »

maurobio

Re: TRegExpr equivalent of PHP preg_match_all
« Reply #20 on: August 20, 2017, 02:13:39 am »
Thanks for the suggestion. I will look into the issues you pointed out.

Merci!  :D

 
