
Author Topic: Algorithm for searching duplicate documents  (Read 1343 times)

vladvons

« on: November 08, 2013, 07:49:53 am »
I have a PostgreSQL database of anecdotes, articles, stories, etc.
I want to check newly entered text for originality.
PostgreSQL has the FuzzySearch extension, but it limits text length to 255 characters.

Does anyone have a Pascal or PostgreSQL implementation of an algorithm for finding duplicate documents?
It is also known as the 'shingles algorithm'.
http://www.cs.princeton.edu/courses/archive/spr05/cos598E/bib/Princeton.pdf
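
To make clear what I mean, here is a rough sketch of the shingles idea as it is commonly described: split each text into overlapping windows of w words (the "shingles"), then compare the two shingle sets by the size of their intersection divided by the size of their union. It is written in Python only for illustration and is not taken from the linked paper. The names shingles and resemblance, the default w=4, and the simple lowercase word tokenizer are my own assumptions.

```python
import re


def shingles(text, w=4):
    """Return the set of w-word shingles (overlapping word windows) in text.

    Assumption: words are runs of letters/digits/underscores, lowercased.
    """
    words = re.findall(r"\w+", text.lower())
    if not words:
        return set()
    if len(words) < w:
        # Text shorter than one window: treat the whole text as one shingle.
        return {tuple(words)}
    return {tuple(words[i:i + w]) for i in range(len(words) - w + 1)}


def resemblance(a, b, w=4):
    """Jaccard similarity of the two shingle sets: |A & B| / |A | B|."""
    sa, sb = shingles(a, w), shingles(b, w)
    union = sa | sb
    if not union:
        # Assumption: two empty texts give no evidence of duplication.
        return 0.0
    return len(sa & sb) / len(union)


if __name__ == "__main__":
    old = "one two three four five"
    new = "one two three four six"
    print(resemblance(old, new))
```

A new text would be flagged as a probable duplicate when its resemblance to a stored text exceeds some threshold, which has to be tuned on real data. Comparing against every stored document this way is slow for a large table, so a real solution would probably store hashes of the shingles instead of the shingles themselves.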
Windows 7, Ubuntu 12.04, Lazarus 1.2.2, FPC 2.6.4, PostgreSQL 9.2

 
