| Package | Description |
|---|---|
| com.wcohen.ss | This package contains a number of approximate string comparators, plus code for performing controlled experiments with them. |
| com.wcohen.ss.api | |
| com.wcohen.ss.expt | |
| com.wcohen.ss.lookup | |
| com.wcohen.ss.tokens | |

| Modifier and Type | Class and Description |
|---|---|
| class | AbbreviationAlignment: Abbreviation distance metric which evaluates the probability of a short-form string being an abbreviation/acronym of another long-form string. |
| class | AbstractSourcedStatisticalTokenDistance: Abstract token distance metric that uses frequency statistics. |
| class | AbstractSourcedTokenizedStringDistance: Abstract distance metric for tokenized strings. |
| class | AbstractStatisticalTokenDistance: Abstract token distance metric that uses frequency statistics. |
| class | AbstractStringDistance: Abstract class which implements StringDistanceLearner as well as StringDistance. |
| class | AbstractTokenizedStringDistance: Abstract distance metric for tokenized strings. |
| class | AffineGap: Affine-gap string distance, following Durbin et al. |
| class | ApproxNeedlemanWunsch: Needleman-Wunsch string distance, following Durbin et al. |
| class | AveragedStringDistanceLearner: Abstract StringDistanceLearner class which averages results of a number of inner distance metrics, learned by a number of inner distance learners. |
| class | CombinedStringDistanceLearner: Abstract StringDistanceLearner class which combines results of a number of inner distance metrics, learned by a number of inner distance learners. |
| class | DirichletJS: Jensen-Shannon distance of two unigram language models, smoothed using a Dirichlet prior. |
| class | Jaccard: Jaccard distance implementation. |
| class | Jaro: Jaro distance metric. |
| class | JaroTFIDF: Soft TFIDF-based distance metric, extended to use "soft" token-matching with the Jaro distance metric. |
| class | JaroWinkler: Jaro distance metric, as extended by Winkler. |
| class | JaroWinklerTFIDF: Soft TFIDF-based distance metric, extended to use "soft" token-matching with the JaroWinkler distance metric. |
| class | JelinekMercerJS: Jensen-Shannon distance of two unigram language models, smoothed using a Jelinek-Mercer mixture model. |
| class | JensenShannonDistance: Distance metrics based on Jensen-Shannon distance of two smoothed unigram language models. |
| class | Level2: Generic version of Monge & Elkan's "level 2" recursive field matching. |
| class | Level2Jaro: "Level 2" recursive field matching algorithm, based on Jaro distance. |
| class | Level2JaroWinkler: "Level 2" recursive field matching algorithm, based on Jaro-Winkler distance. |
| class | Level2Levenstein: "Level 2" recursive field matching algorithm using Levenshtein distance. |
| class | Level2MongeElkan: Monge & Elkan's "level 2" recursive field matching algorithm. |
| class | Levenstein: Levenshtein string distance. |
| class | Mixture: Mixture-based distance metric. |
| class | MongeElkan: The match method proposed by Monge and Elkan. |
| class | MongeElkanTFIDF: Soft TFIDF-based distance metric, extended to use "soft" token-matching with the MongeElkan distance metric. |
| class | NeedlemanWunsch: Needleman-Wunsch string distance, following Durbin et al. |
| class | ScaledLevenstein: Scaled Levenshtein string distance. |
| class | SmithWaterman: Smith-Waterman string distance, following Durbin et al. |
| class | SoftTFIDF: TFIDF-based distance metric, extended to use "soft" token-matching. |
| class | SoftTokenFelligiSunter: Highly simplified model of Fellegi-Sunter's method 1, applied to tokens. |
| class | SourcedSoftTFIDF: TFIDF-based distance metric, extended to use "soft" token-matching. |
| class | SourcedTFIDF: Sourced-based distance metric. |
| class | TagLink |
| class | TFIDF: TFIDF-based distance metric. |
| class | TokenFelligiSunter: Highly simplified model of Fellegi-Sunter's method 1, applied to tokens. |
| class | UnsmoothedJS: Jensen-Shannon distance of two unsmoothed unigram language models. |
| class | WinklerRescorer: Winkler's reweighting scheme for distance metrics. |

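Several of the classes listed above are named after well-known character-level metrics. As a rough illustration of what the Levenshtein, Jaro, and Jaro-Winkler measures compute, here is a minimal, self-contained sketch. The class name `StringMetricsSketch` and its methods are hypothetical and are not part of com.wcohen.ss. The library's own classes may use different scoring conventions (sign, scaling, tokenization), and the prefix weight of 0.1 and the prefix cap of 4 are the values commonly quoted for Winkler's extension, assumed here rather than taken from this page.

```java
// Illustrative sketch only; NOT the com.wcohen.ss implementation.
final class StringMetricsSketch {

    /** Unit-cost Levenshtein edit distance: insert, delete, substitute each cost 1. */
    static int levenshtein(String s, String t) {
        int[] prev = new int[t.length() + 1];
        int[] curr = new int[t.length() + 1];
        for (int j = 0; j <= t.length(); j++) prev[j] = j;
        for (int i = 1; i <= s.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= t.length(); j++) {
                int sub = prev[j - 1] + (s.charAt(i - 1) == t.charAt(j - 1) ? 0 : 1);
                curr[j] = Math.min(sub, Math.min(prev[j] + 1, curr[j - 1] + 1));
            }
            int[] tmp = prev; prev = curr; curr = tmp; // reuse the two rows
        }
        return prev[t.length()];
    }

    /** Jaro similarity in [0, 1]; 1 means identical. */
    static double jaro(String s, String t) {
        if (s.isEmpty() && t.isEmpty()) return 1.0;
        int window = Math.max(0, Math.max(s.length(), t.length()) / 2 - 1);
        boolean[] sMatched = new boolean[s.length()];
        boolean[] tMatched = new boolean[t.length()];
        int matches = 0;
        for (int i = 0; i < s.length(); i++) {
            int lo = Math.max(0, i - window);
            int hi = Math.min(t.length() - 1, i + window);
            for (int j = lo; j <= hi; j++) {
                if (!tMatched[j] && s.charAt(i) == t.charAt(j)) {
                    sMatched[i] = true;
                    tMatched[j] = true;
                    matches++;
                    break;
                }
            }
        }
        if (matches == 0) return 0.0;
        // Count matched characters that appear in a different order.
        int k = 0, outOfOrder = 0;
        for (int i = 0; i < s.length(); i++) {
            if (!sMatched[i]) continue;
            while (!tMatched[k]) k++;
            if (s.charAt(i) != t.charAt(k)) outOfOrder++;
            k++;
        }
        double m = matches;
        double transpositions = outOfOrder / 2; // integer division, as in the usual definition
        return (m / s.length() + m / t.length() + (m - transpositions) / m) / 3.0;
    }

    /** Jaro similarity boosted for a shared prefix (assumed weight 0.1, prefix capped at 4). */
    static double jaroWinkler(String s, String t) {
        double j = jaro(s, t);
        int max = Math.min(4, Math.min(s.length(), t.length()));
        int prefix = 0;
        while (prefix < max && s.charAt(prefix) == t.charAt(prefix)) prefix++;
        return j + prefix * 0.1 * (1.0 - j);
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("kitten", "sitting"));
        System.out.println(jaro("MARTHA", "MARHTA"));
        System.out.println(jaroWinkler("MARTHA", "MARHTA"));
    }
}
```

In this sketch `levenshtein("kitten", "sitting")` evaluates to 3. Note that Levenshtein is a distance (lower is closer) while the Jaro variants here are similarities (higher is closer).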
| Modifier and Type | Field and Description |
|---|---|
| protected StringDistanceLearner[] | CombinedStringDistanceLearner.innerLearners |

| Modifier and Type | Method and Description |
|---|---|
| static StringDistanceLearner | DistanceLearnerFactory.build(String classNames): Generate a StringDistanceLearner from a class name, or a sequence of class names separated by slashes. |
| static StringDistanceLearner | DistanceLearnerFactory.build(String[] classNames): Generate a StringDistanceLearner from a sequence of class names. |

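The two `build` overloads above differ only in how the class names arrive: as one slash-separated string or as an array. The sketch below shows one plausible way a slash-separated specification could be split into individual names before being handed to the array form. `LearnerSpecSketch` and `splitClassNames` are hypothetical names introduced for illustration; how DistanceLearnerFactory actually resolves names to classes, and how it nests the resulting learners, is not described on this page and is not modeled here.

```java
// Illustrative sketch only; NOT the DistanceLearnerFactory implementation.
final class LearnerSpecSketch {

    /**
     * Splits a specification such as "SoftTFIDF/Jaro" into its class names.
     * Rejects empty specifications and empty name segments.
     */
    static String[] splitClassNames(String spec) {
        if (spec == null || spec.trim().isEmpty()) {
            throw new IllegalArgumentException("empty learner specification");
        }
        String[] parts = spec.trim().split("/");
        for (int i = 0; i < parts.length; i++) {
            parts[i] = parts[i].trim();
            if (parts[i].isEmpty()) {
                throw new IllegalArgumentException("empty class name in: " + spec);
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        // "SoftTFIDF" and "Jaro" are class names from the table above; whether this
        // particular combination is meaningful to the library is an assumption.
        for (String name : splitClassNames("SoftTFIDF/Jaro")) {
            System.out.println(name);
        }
    }
}
```

A single name with no slash comes back as a one-element array, so both the single-class and multi-class cases take the same path.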
| Constructor and Description |
|---|
| AveragedStringDistanceLearner(StringDistanceLearner[] innerLearners, String delim) |
| CombinedStringDistanceLearner(StringDistanceLearner[] innerLearners, String delim) |

| Modifier and Type | Method and Description |
|---|---|
| StringDistance | StringDistanceTeacher.train(StringDistanceLearner learner) |

| Constructor and Description |
|---|
| MatchExpt(MatchData data, StringDistanceLearner learner) |
| MatchExpt(MatchData data, StringDistanceLearner learner, Blocker blocker) |
| SpecialMatchExpt(MatchData data, StringDistanceLearner learner, Blocker blocker, boolean useTrueClusters, String moreNamesFile, String similarTokenFile, boolean untrained) |

| Constructor and Description |
|---|
| SoftDictionary(StringDistanceLearner distanceLearner) |
| SoftDictionary(StringDistanceLearner distanceLearner, Tokenizer tokenizer) |

| Modifier and Type | Class and Description |
|---|---|
| class | TagLinkToken |

Copyright © 2016. All rights reserved.