seo - Delphi 'duplicate content' library or example
I need to test texts for duplicate content, for SEO purposes.
I have two texts (in two strings, s1 and s2) and need to determine the percentage of similarity between the two strings. My first code works OK. It determines the percentage as
100 x (number of words common to s1 and s2) / (number of words in the shorter of s1 and s2).
But I am not sure about the algorithm.
Does anyone have experience or a code example to share?
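For reference, the word-overlap percentage described in the question could be sketched as follows. This is only a minimal illustration in Java (the language of the sources referenced in the answer), not the asker's actual code. The class name `WordOverlap`, the method names, and the choice to count distinct lower-cased words (so the result never exceeds 100) are all assumptions made for this example.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper: word-overlap percentage between two texts.
final class WordOverlap {

    // Returns the distinct lower-cased words of a text.
    // Assumption: a "word" is any run of letters or digits.
    static Set<String> words(String text) {
        Set<String> result = new HashSet<>();
        for (String w : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!w.isEmpty()) {
                result.add(w);
            }
        }
        return result;
    }

    // 100 x (distinct words common to s1 and s2) / (distinct words in the shorter text).
    static double percent(String s1, String s2) {
        Set<String> a = words(s1);
        Set<String> b = words(s2);
        int shorter = Math.min(a.size(), b.size());
        if (shorter == 0) {
            return 0.0; // nothing to compare against
        }
        Set<String> common = new HashSet<>(a);
        common.retainAll(b); // keep only the words present in both texts
        return 100.0 * common.size() / shorter;
    }
}
```

For example, `WordOverlap.percent("the quick brown fox", "the lazy brown dog")` gives 50.0, because two of the four distinct words are shared. Note that this measure ignores word order entirely.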
What you are trying to do is find the percentage of similarity between two strings.
There are several algorithms out there that solve this exact problem. I have mainly been using:
- Levenshtein distance
- N-gram distance
I did a quick search for Delphi source code and found a Levenshtein implementation in Delphi.
The Levenshtein algorithm counts how many changes (single-character insertions, deletions, or substitutions) it takes to turn one string back into the other.
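To make the idea concrete, here is a minimal sketch of the classic dynamic-programming Levenshtein distance in Java. It is not the Delphi or Lucene source mentioned in this answer. The class name `Levenshtein` and the `similarityPercent` normalisation (dividing by the length of the longer string) are assumptions chosen for illustration.

```java
// Hypothetical helper: Levenshtein edit distance and a percentage derived from it.
final class Levenshtein {

    // Minimum number of single-character insertions, deletions,
    // or substitutions needed to turn a into b.
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into the first j chars of b takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning the first i chars of a into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(
                        Math.min(curr[j - 1] + 1, prev[j] + 1), // insertion, deletion
                        prev[j - 1] + cost);                    // substitution or match
            }
            int[] tmp = prev; // reuse the two rows instead of a full matrix
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Assumed normalisation: 100% for identical strings, 0% when every character differs.
    static double similarityPercent(String a, String b) {
        int longest = Math.max(a.length(), b.length());
        if (longest == 0) {
            return 100.0; // two empty strings are identical
        }
        return 100.0 * (longest - distance(a, b)) / longest;
    }
}
```

For example, `Levenshtein.distance("kitten", "sitting")` is 3. Because the comparison is character by character, moving a block of words to a different place in the text costs many edits.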
N-gram distance compares the words by splitting them into small pieces.
Levenshtein will see the string "abc def | klm mno" as very different from "klm mn | abc def".
N-gram distance will see them as 100% similar.
So it depends on whether you want the order of the string to be taken into account.
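The order-insensitive behaviour described above can be illustrated with a simplified n-gram comparison. The sketch below is an assumption-laden example, not Lucene's n-gram distance implementation, whose behaviour may differ. It splits each word into character n-grams, collects them in a set, and scores the overlap with the Dice coefficient. The class name `NGramOverlap` and its method names are hypothetical.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper: order-insensitive n-gram overlap between two texts.
final class NGramOverlap {

    // Collects the distinct character n-grams of every whitespace-separated word.
    // Words no longer than n are kept whole.
    static Set<String> ngrams(String text, int n) {
        Set<String> grams = new HashSet<>();
        for (String word : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (word.isEmpty()) {
                continue;
            }
            if (word.length() <= n) {
                grams.add(word);
                continue;
            }
            for (int i = 0; i + n <= word.length(); i++) {
                grams.add(word.substring(i, i + n));
            }
        }
        return grams;
    }

    // Dice coefficient over the two n-gram sets, expressed as a percentage.
    static double similarityPercent(String s1, String s2, int n) {
        Set<String> a = ngrams(s1, n);
        Set<String> b = ngrams(s2, n);
        if (a.isEmpty() && b.isEmpty()) {
            return 100.0; // two empty texts are treated as identical
        }
        Set<String> common = new HashSet<>(a);
        common.retainAll(b); // n-grams present in both texts
        return 100.0 * 2 * common.size() / (a.size() + b.size());
    }
}
```

With this sketch, `NGramOverlap.similarityPercent("abc def klm mno", "klm mno abc def", 2)` returns 100.0, since both texts contain exactly the same words and only their order differs. A character-level edit distance on the same pair would report many changes.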
I couldn't find source code for n-gram distance in Delphi, but you can translate the Java version to Delphi.
- Source code for Levenshtein in Delphi
- Source code for Levenshtein in Java
- Source code for n-gram distance in Java
The Java source code comes from Lucene, an open-source search library. It implements many more string metric algorithms, which you can check out in the same package.