seo - Delphi 'duplicate content' library or example


I need to test texts to check for duplicate content, for SEO purposes.

I have 2 texts (in 2 strings, s1 and s2), and I need to determine the percentage of similarity between the 2 strings. My first attempt works; it determines the percentage as

100 x (number of common words in s1 and s2) / (number of words in the shorter of s1 and s2).

But I am not sure about this algorithm.

Does anyone have experience with this, or a code example to share?
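The word-overlap measure described in the question could be sketched roughly like this. This is only an illustration under a few assumptions that are not in the question: words are runs of ASCII letters and digits, comparison ignores case, and each distinct word is counted once. The names CollectWords and WordOverlapPercent are made up for the example.

```pascal
program WordOverlapDemo;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

// Splits Text into distinct lower-cased words. Any character that is not
// an ASCII letter or digit acts as a separator (an assumption of this sketch).
procedure CollectWords(const Text: string; Words: TStringList);
var
  I: Integer;
  Current: string;
begin
  Words.Sorted := True;          // Duplicates only takes effect on sorted lists
  Words.Duplicates := dupIgnore; // keep each distinct word once
  Current := '';
  for I := 1 to Length(Text) + 1 do
    if (I <= Length(Text)) and
       CharInSet(Text[I], ['a'..'z', 'A'..'Z', '0'..'9']) then
      Current := Current + Text[I]
    else if Current <> '' then
    begin
      Words.Add(LowerCase(Current));
      Current := '';
    end;
end;

// Returns 100 * (distinct words found in both texts) /
// (distinct words in the text that has fewer of them); 0 if a text is empty.
function WordOverlapPercent(const S1, S2: string): Double;
var
  W1, W2: TStringList;
  I, Common, Shorter: Integer;
begin
  Result := 0;
  W1 := TStringList.Create;
  try
    W2 := TStringList.Create;
    try
      CollectWords(S1, W1);
      CollectWords(S2, W2);
      Common := 0;
      for I := 0 to W1.Count - 1 do
        if W2.IndexOf(W1[I]) >= 0 then
          Inc(Common);
      if W1.Count < W2.Count then
        Shorter := W1.Count
      else
        Shorter := W2.Count;
      if Shorter > 0 then
        Result := 100.0 * Common / Shorter;
    finally
      W2.Free;
    end;
  finally
    W1.Free;
  end;
end;

begin
  // 2 shared words ("the", "brown") out of 4 distinct words: prints 50.0
  Writeln(WordOverlapPercent('the quick brown fox', 'the lazy brown dog'):0:1);
end.
```

Note that a measure like this ignores word order completely, so two texts with the same words in a different order score 100.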

What you are trying to do is find the percentage of similarity between 2 strings.

There are some algorithms out there that solve this exact problem. I have mainly been using:

  • Levenshtein distance
  • N-gram distance

I had a quick search for Delphi source code and found source code for Levenshtein in Delphi.

The Levenshtein algorithm tries to find how many changes are needed to turn one string back into the other.
The n-gram distance compares the strings by splitting them into short pieces.
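To make the "how many changes" idea concrete, here is a minimal sketch of the classic dynamic-programming Levenshtein distance. It is not the Delphi source mentioned above, just an illustration; the conversion to a percentage (dividing by the length of the longer string) is one common convention and an assumption of this example, as are the names Min3 and LevenshteinSimilarity.

```pascal
program LevenshteinDemo;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils;

type
  TIntRow = array of Integer;

function Min3(A, B, C: Integer): Integer;
begin
  Result := A;
  if B < Result then Result := B;
  if C < Result then Result := C;
end;

// Number of single-character insertions, deletions and substitutions
// needed to turn S into T. Only two rows of the table are kept in memory.
function LevenshteinDistance(const S, T: string): Integer;
var
  Prev, Curr, Tmp: TIntRow;
  I, J, Cost: Integer;
begin
  SetLength(Prev, Length(T) + 1);
  SetLength(Curr, Length(T) + 1);
  for J := 0 to Length(T) do
    Prev[J] := J;                          // empty prefix of S: J insertions
  for I := 1 to Length(S) do
  begin
    Curr[0] := I;                          // empty prefix of T: I deletions
    for J := 1 to Length(T) do
    begin
      if S[I] = T[J] then Cost := 0 else Cost := 1;
      Curr[J] := Min3(Prev[J] + 1,         // deletion
                      Curr[J - 1] + 1,     // insertion
                      Prev[J - 1] + Cost); // substitution or match
    end;
    Tmp := Prev; Prev := Curr; Curr := Tmp; // swap rows for the next pass
  end;
  Result := Prev[Length(T)];
end;

// Similarity in percent, relative to the longer string (assumed convention):
// 100 means identical strings.
function LevenshteinSimilarity(const S, T: string): Double;
var
  MaxLen: Integer;
begin
  MaxLen := Length(S);
  if Length(T) > MaxLen then MaxLen := Length(T);
  if MaxLen = 0 then
    Result := 100
  else
    Result := 100.0 * (MaxLen - LevenshteinDistance(S, T)) / MaxLen;
end;

begin
  Writeln(LevenshteinDistance('kitten', 'sitting'));  // prints 3
  Writeln(LevenshteinSimilarity('abc def | klm mno', 'klm mn | abc def'):0:1);
end.
```

Because every character position matters, moving a block of text from the end to the start costs many edits, which is why this measure is sensitive to order.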


For example, Levenshtein will see the string "abc def | klm mno" as very different from "klm mn | abc def".
The n-gram distance will see them as 100% similar.

So it depends on whether you want the order of the string taken into account.


I couldn't find source code for the n-gram distance in Delphi, but you can translate it from Java to Delphi.
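In the meantime, the general idea of comparing strings by their pieces can be sketched with a simple bigram overlap (the Dice coefficient over pairs of adjacent characters). To be clear, this is a swapped-in, simpler technique and not a translation of the Java n-gram distance; the names CollectBigrams and BigramSimilarity are hypothetical.

```pascal
program BigramDemo;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

// Adds every pair of adjacent characters of Text (lower-cased) to Grams.
procedure CollectBigrams(const Text: string; Grams: TStringList);
var
  I: Integer;
  Lower: string;
begin
  Lower := LowerCase(Text);
  for I := 1 to Length(Lower) - 1 do
    Grams.Add(Copy(Lower, I, 2));
end;

// Dice coefficient over character bigrams, in percent:
// 100 * 2 * (shared bigrams) / (bigrams in S1 + bigrams in S2).
function BigramSimilarity(const S1, S2: string): Double;
var
  G1, G2: TStringList;
  I, J, Shared, Total: Integer;
begin
  Result := 0;
  G1 := TStringList.Create;
  try
    G2 := TStringList.Create;
    try
      CollectBigrams(S1, G1);
      CollectBigrams(S2, G2);
      Total := G1.Count + G2.Count;
      Shared := 0;
      for I := 0 to G1.Count - 1 do
        for J := 0 to G2.Count - 1 do
          if G2[J] = G1[I] then
          begin
            Inc(Shared);
            G2.Delete(J); // each bigram of S2 may be matched only once
            Break;
          end;
      if Total > 0 then
        Result := 100.0 * 2 * Shared / Total;
    finally
      G2.Free;
    end;
  finally
    G1.Free;
  end;
end;

begin
  Writeln(BigramSimilarity('abc def', 'def abc'):0:1);  // prints 66.7
  Writeln(BigramSimilarity('night', 'nacht'):0:1);      // prints 25.0
end.
```

Swapping the two halves of a string only breaks the bigrams at the seam, so the score stays high where an edit-distance measure would drop sharply.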

The source code in Java comes from Lucene, an open source search software. It implements a lot more string metric algorithms, which are worth checking out in the same package.

