| Package | Description |
|---|---|
| com.wcohen.ss | This package contains a bunch of approximate string comparators, plus code for performing controlled experiments with them. |
| com.wcohen.ss.api | |
| com.wcohen.ss.expt | |
| com.wcohen.ss.lookup | |
| com.wcohen.ss.tokens | |

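
The com.wcohen.ss package collects approximate string comparators. As a self-contained illustration of what the simplest of them measures, here is a sketch of unit-cost Levenshtein edit distance (the library spells its class `Levenstein`). The class name `EditDistanceSketch` is hypothetical, and the sketch does not claim to match the library's sign, scaling, or scoring conventions.

```java
/**
 * Hypothetical sketch of unit-cost Levenshtein edit distance.
 * Not the library's Levenstein class; it returns a raw edit count.
 */
class EditDistanceSketch {
    /** Minimum number of single-character insertions, deletions, and substitutions turning s into t. */
    static int distance(String s, String t) {
        // prev[j] holds the distance between the first i-1 chars of s and the first j chars of t
        int[] prev = new int[t.length() + 1];
        int[] curr = new int[t.length() + 1];
        for (int j = 0; j <= t.length(); j++) {
            prev[j] = j; // turning "" into the first j chars of t takes j insertions
        }
        for (int i = 1; i <= s.length(); i++) {
            curr[0] = i; // deleting the first i chars of s
            for (int j = 1; j <= t.length(); j++) {
                int cost = s.charAt(i - 1) == t.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(prev[j] + 1, curr[j - 1] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; // swap rows so prev always holds the last completed row
            prev = curr;
            curr = tmp;
        }
        return prev[t.length()];
    }
}
```

For example, `EditDistanceSketch.distance("kitten", "sitting")` is 3. Only two rows of the dynamic-programming table are kept, so memory is linear in the length of `t`.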
| Modifier and Type | Class and Description |
|---|---|
| class | AbbreviationAlignment: Abbreviation distance metric which evaluates the probability that a short-form string is an abbreviation or acronym of another, long-form string. |
| class | AbstractSourcedStatisticalTokenDistance: Abstract token distance metric that uses frequency statistics. |
| class | AbstractSourcedTokenizedStringDistance: Abstract distance metric for tokenized strings. |
| class | AbstractStatisticalTokenDistance: Abstract token distance metric that uses frequency statistics. |
| class | AbstractStringDistance: Abstract class which implements StringDistanceLearner as well as StringDistance. |
| class | AbstractTokenizedStringDistance: Abstract distance metric for tokenized strings. |
| class | AffineGap: Affine-gap string distance, following Durbin et al. |
| class | ApproxNeedlemanWunsch: Needleman-Wunsch string distance, following Durbin et al. |
| protected class | CombinedStringDistanceLearner.CombinedStringDistance: Abstract class for combining the innerDistances. |
| class | DirichletJS: Jensen-Shannon distance of two unigram language models, smoothed using a Dirichlet prior. |
| class | Jaccard: Jaccard distance implementation. |
| class | Jaro: Jaro distance metric. |
| class | JaroTFIDF: TFIDF-based distance metric, extended to use "soft" token-matching with the Jaro distance metric. |
| class | JaroWinkler: Jaro distance metric, as extended by Winkler. |
| class | JaroWinklerTFIDF: TFIDF-based distance metric, extended to use "soft" token-matching with the JaroWinkler distance metric. |
| class | JelinekMercerJS: Jensen-Shannon distance of two unigram language models, smoothed using a Jelinek-Mercer mixture model. |
| class | JensenShannonDistance: Distance metrics based on the Jensen-Shannon distance between two smoothed unigram language models. |
| class | Level2: Generic version of Monge & Elkan's "level 2" recursive field matching. |
| class | Level2Jaro: "Level 2" recursive field matching algorithm, based on Jaro distance. |
| class | Level2JaroWinkler: "Level 2" recursive field matching algorithm, based on Jaro-Winkler distance. |
| class | Level2Levenstein: "Level 2" recursive field matching algorithm, based on Levenshtein distance. |
| class | Level2MongeElkan: Monge & Elkan's "level 2" recursive field matching algorithm. |
| class | Levenstein: Levenshtein string distance. |
| class | Mixture: Mixture-based distance metric. |
| class | MongeElkan: The match method proposed by Monge and Elkan. |
| class | MongeElkanTFIDF: TFIDF-based distance metric, extended to use "soft" token-matching with the MongeElkan distance metric. |
| class | MultiStringAvgDistance: StringDistance defined over Strings that are broken into fields, with distance defined as the average distance between corresponding fields. |
| class | MultiStringDistance: Abstract StringDistance defined over Strings that are broken into fields. |
| class | NeedlemanWunsch: Needleman-Wunsch string distance, following Durbin et al. |
| class | ScaledLevenstein: Scaled variant of Levenshtein string distance. |
| class | SmithWaterman: Smith-Waterman string distance, following Durbin et al. |
| class | SoftTFIDF: TFIDF-based distance metric, extended to use "soft" token-matching. |
| class | SoftTokenFelligiSunter: Highly simplified model of Fellegi-Sunter's method 1, applied to tokens. |
| class | SourcedSoftTFIDF: TFIDF-based distance metric, extended to use "soft" token-matching. |
| class | SourcedTFIDF: Sourced-based distance metric. |
| class | TagLink |
| class | TFIDF: TFIDF-based distance metric. |
| class | TokenFelligiSunter: Highly simplified model of Fellegi-Sunter's method 1, applied to tokens. |
| class | UnsmoothedJS: Jensen-Shannon distance of two unsmoothed unigram language models. |
| class | WinklerRescorer: Winkler's reweighting scheme for distance metrics. |

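
Several classes above (Jaro, JaroWinkler, WinklerRescorer, and the Jaro-based TFIDF and Level2 variants) rest on the Jaro similarity and Winkler's prefix boost. The following is a self-contained sketch of the commonly cited formulation: characters match if equal and within a window of `max(|s|,|t|)/2 - 1` positions, half the out-of-order matches count as transpositions, and Winkler adds `prefix * 0.1 * (1 - jaro)` for a common prefix of up to 4 characters. The class name, the 0.1 scale, and the 4-character cap are assumptions of this sketch, not values read from the library; some formulations also apply the boost only above a threshold (often 0.7), which this sketch omits.

```java
/**
 * Hypothetical sketch of Jaro and Jaro-Winkler similarity (1.0 = identical).
 * Not the library's Jaro/JaroWinkler classes; prefix scale 0.1 and a
 * 4-character prefix cap are assumed defaults.
 */
class JaroWinklerSketch {
    static double jaro(String s, String t) {
        if (s.isEmpty() && t.isEmpty()) return 1.0;
        if (s.isEmpty() || t.isEmpty()) return 0.0;
        // characters count as matching only within this distance of each other
        int window = Math.max(0, Math.max(s.length(), t.length()) / 2 - 1);
        boolean[] sMatched = new boolean[s.length()];
        boolean[] tMatched = new boolean[t.length()];
        int matches = 0;
        for (int i = 0; i < s.length(); i++) {
            int lo = Math.max(0, i - window);
            int hi = Math.min(t.length() - 1, i + window);
            for (int j = lo; j <= hi; j++) {
                if (!tMatched[j] && s.charAt(i) == t.charAt(j)) {
                    sMatched[i] = true;
                    tMatched[j] = true;
                    matches++;
                    break;
                }
            }
        }
        if (matches == 0) return 0.0;
        // walk both matched sequences in order; half the mismatched pairs are transpositions
        int outOfOrder = 0;
        int k = 0;
        for (int i = 0; i < s.length(); i++) {
            if (!sMatched[i]) continue;
            while (!tMatched[k]) k++;
            if (s.charAt(i) != t.charAt(k)) outOfOrder++;
            k++;
        }
        double m = matches;
        return (m / s.length() + m / t.length() + (m - outOfOrder / 2.0) / m) / 3.0;
    }

    static double jaroWinkler(String s, String t) {
        double j = jaro(s, t);
        // length of the common prefix, capped at 4 characters
        int prefix = 0;
        int max = Math.min(4, Math.min(s.length(), t.length()));
        while (prefix < max && s.charAt(prefix) == t.charAt(prefix)) prefix++;
        return j + prefix * 0.1 * (1.0 - j);
    }
}
```

For "MARTHA" versus "MARHTA" all six characters match with one transposition, so the Jaro score is 17/18 (about 0.944); the shared prefix "MAR" lifts the Jaro-Winkler score to about 0.961.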
| Modifier and Type | Field and Description |
|---|---|
| protected StringDistance[] | CombinedStringDistanceLearner.CombinedStringDistance.innerDistances |

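
CombinedStringDistance holds an array of inner distances, and the page also lists an AveragedStringDistanceLearner; how the library actually combines the inner scores (it mentions a learned combination scheme) is not documented here. As a minimal sketch under that uncertainty, the simplest combination is an unweighted mean of the inner scores. `java.util.function.ToDoubleBiFunction` stands in for `StringDistance`, and the class name is hypothetical.

```java
import java.util.List;
import java.util.function.ToDoubleBiFunction;

/**
 * Hypothetical sketch: combine several inner scoring functions by the unweighted
 * mean of their scores. ToDoubleBiFunction stands in for StringDistance; the
 * library may instead learn weights for the inner distances.
 */
class AveragedDistanceSketch implements ToDoubleBiFunction<String, String> {
    private final List<ToDoubleBiFunction<String, String>> innerDistances;

    AveragedDistanceSketch(List<ToDoubleBiFunction<String, String>> innerDistances) {
        if (innerDistances.isEmpty()) {
            throw new IllegalArgumentException("need at least one inner distance");
        }
        this.innerDistances = List.copyOf(innerDistances); // defensive, immutable copy
    }

    @Override
    public double applyAsDouble(String s, String t) {
        double sum = 0.0;
        for (ToDoubleBiFunction<String, String> d : innerDistances) {
            sum += d.applyAsDouble(s, t); // score from one inner distance
        }
        return sum / innerDistances.size();
    }
}
```

A learned scheme would replace the fixed `1/n` weights with trained ones; the averaging version is only the baseline.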
| Modifier and Type | Method and Description |
|---|---|
| static StringDistance[] | DistanceLearnerFactory.buildArray(String classNames): Generate a StringDistance array from a sequence of class names separated by slashes. |
| abstract StringDistance | CombinedStringDistanceLearner.getDistance(): Get the final string distance, which is based on the distances learned by the inner learners as well as the combination scheme learned by comboSetAnswer, comboTrain, and so on. |
| StringDistance | AveragedStringDistanceLearner.getDistance() |
| StringDistance | AbstractStringDistance.getDistance(): Implements the StringDistanceLearner API by returning a StringDistance. |
| protected abstract StringDistance | MultiStringDistance.getDistance(int i): Get the distance used for the i-th pair of fields. |
| protected StringDistance | MultiStringAvgDistance.getDistance(int i) |
| protected StringDistance[] | CombinedStringDistanceLearner.getInnerDistances(): Get an array of trained inner distances. |

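
DistanceLearnerFactory.buildArray builds distances from slash-separated class names. The sketch below shows one way such a factory can work using standard Java reflection (`Class.forName`, `getDeclaredConstructor().newInstance()`). It assumes fully qualified names and public no-argument constructors; how the library resolves short names or configures instances is not shown on this page. The class name is hypothetical, and it returns a `List` rather than an array to avoid generic array creation.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of building objects from slash-separated class names,
 * e.g. "java.util.ArrayList/java.util.LinkedList". Assumes fully qualified
 * names and public no-argument constructors.
 */
class SlashSeparatedFactorySketch {
    static <T> List<T> build(String classNames, Class<T> type) throws ReflectiveOperationException {
        List<T> result = new ArrayList<>();
        for (String name : classNames.split("/")) {
            String trimmed = name.trim();
            if (trimmed.isEmpty()) continue; // tolerate "A//B" and stray slashes
            Object instance = Class.forName(trimmed).getDeclaredConstructor().newInstance();
            result.add(type.cast(instance)); // throws ClassCastException if not a T
        }
        return result;
    }
}
```

Calling `build(..., SomeInterface.class)` checks every instance against the expected type up front, so a misspelled or incompatible class fails when the list is built rather than at first use.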
| Modifier and Type | Method and Description |
|---|---|
| protected static void | MultiStringDistance.doMain(StringDistance d, String[] argv): Default main routine for testing. |
| protected static void | AbstractStringDistance.doMain(StringDistance d, String[] argv): Default main routine for testing. |
| void | MultiStringWrapper.prepare(StringDistance[] innerDistances): Prepare each field with the appropriate distance. |

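
MultiStringAvgDistance is constructed from a single distance and a delimiter, and MultiStringDistance.getDistance(int i) selects the distance for the i-th pair of fields. A minimal sketch of that idea: split both strings on the literal delimiter, score corresponding fields, and average. Treating missing fields as empty strings when the field counts differ is an assumption of this sketch, as is the hypothetical class name; `ToDoubleBiFunction` stands in for `StringDistance`.

```java
import java.util.function.ToDoubleBiFunction;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch of a multi-field distance: split each string on a literal
 * delimiter and average a per-field score over corresponding fields.
 * Missing fields (when field counts differ) are treated as empty strings.
 */
class FieldAverageSketch {
    private final ToDoubleBiFunction<String, String> fieldDistance;
    private final String delim;

    FieldAverageSketch(ToDoubleBiFunction<String, String> fieldDistance, String delim) {
        this.fieldDistance = fieldDistance;
        this.delim = delim;
    }

    double score(String s, String t) {
        // Pattern.quote treats the delimiter literally; limit -1 keeps trailing empty fields
        String[] sFields = s.split(Pattern.quote(delim), -1);
        String[] tFields = t.split(Pattern.quote(delim), -1);
        int n = Math.max(sFields.length, tFields.length); // split always yields at least one field
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            String a = i < sFields.length ? sFields[i] : "";
            String b = i < tFields.length ? tFields[i] : "";
            sum += fieldDistance.applyAsDouble(a, b); // score for the i-th pair of fields
        }
        return sum / n;
    }
}
```

With an exact-match field scorer and delimiter `"|"`, `"John|Smith|NY"` versus `"John|Smyth|NY"` scores 2/3, since two of the three fields agree.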
| Constructor and Description |
|---|
| CombinedStringDistance(StringDistance[] innerDistances, MultiStringWrapper prototype) |
| Level2(Tokenizer tokenizer, StringDistance tokenDistance) |
| MultiStringAvgDistance(StringDistance distance, String delim) |
| SoftTFIDF(StringDistance tokenDistance) |
| SoftTFIDF(StringDistance tokenDistance, double tokenMatchThreshold) |
| SoftTFIDF(Tokenizer tokenizer, StringDistance tokenDistance, double tokenMatchThreshold) |
| SoftTokenFelligiSunter(Tokenizer tokenizer, StringDistance tokenDistance, double tokenMatchThreshold, double mismatchFactor) |
| SourcedSoftTFIDF(SourcedTokenizer tokenizer, StringDistance tokenDistance, double tokenMatchThreshold) |
| SourcedSoftTFIDF(StringDistance tokenDistance) |
| SourcedSoftTFIDF(StringDistance tokenDistance, double tokenMatchThreshold) |
| WinklerRescorer(StringDistance innerDistance): Rescore the innerDistance's scores to account for the subjectively greater importance of the first few characters. |

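
The Level2 constructor takes a tokenizer and a token-level distance, matching the "level 2" recursive scheme of Monge and Elkan. In its commonly stated form, each token of `s` gets its best score against any token of `t` under the inner similarity, and those best scores are averaged: `sim(s, t) = (1/|s|) * sum over a in s of max over b in t of sim'(a, b)`. The sketch below assumes whitespace tokenization in place of the library's `Tokenizer`, non-negative inner scores, and a hypothetical class name.

```java
import java.util.function.ToDoubleBiFunction;

/**
 * Hypothetical sketch of Monge-Elkan style "level 2" matching: each token of s
 * takes its best inner score against the tokens of t, and the best scores are
 * averaged. Whitespace tokenization stands in for the library's Tokenizer.
 */
class Level2Sketch {
    private final ToDoubleBiFunction<String, String> tokenSimilarity;

    Level2Sketch(ToDoubleBiFunction<String, String> tokenSimilarity) {
        this.tokenSimilarity = tokenSimilarity;
    }

    double score(String s, String t) {
        String[] sTokens = s.trim().split("\\s+");
        String[] tTokens = t.trim().split("\\s+");
        double total = 0.0;
        for (String a : sTokens) {
            double best = 0.0; // assumes inner similarities are non-negative
            for (String b : tTokens) {
                best = Math.max(best, tokenSimilarity.applyAsDouble(a, b));
            }
            total += best; // best match for this token of s
        }
        return total / sTokens.length;
    }
}
```

The measure is asymmetric: with exact token matching, `"a b c"` against `"b c"` scores 2/3, while `"b c"` against `"a b c"` scores 1.0. Plugging a character-level similarity such as Jaro in as `tokenSimilarity` gives the flavor of the Level2Jaro variant.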
| Modifier and Type | Method and Description |
|---|---|
| StringDistance | StringDistanceLearner.getDistance(): Return the learned distance. |
| StringDistance | StringDistanceTeacher.train(StringDistanceLearner learner) |

| Modifier and Type | Method and Description |
|---|---|
| StringDistance | SpecialMatchExpt.getLearnedDistance() |

| Constructor and Description |
|---|
| RescoringSoftTFIDFDictionary(FastLookup inner, double innerMinScore, StringDistance rescorer) |

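
The RescoringSoftTFIDFDictionary constructor takes an inner lookup, a minimum inner score, and a rescoring distance, which suggests a two-stage lookup: a cheaper first pass proposes candidates, and a second distance reranks them. That reading is inferred from the parameter names only. The sketch below illustrates the pattern with plain functions; every name in it is hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.function.ToDoubleBiFunction;

/**
 * Hypothetical two-stage lookup: keep dictionary entries whose cheap "inner"
 * score reaches innerMinScore, then rank the survivors by a rescoring function.
 * The semantics are inferred from constructor parameter names only.
 */
class RescoringLookupSketch {
    static List<Map.Entry<String, Double>> lookup(String query, List<String> dictionary,
            ToDoubleBiFunction<String, String> inner, double innerMinScore,
            ToDoubleBiFunction<String, String> rescorer) {
        List<Map.Entry<String, Double>> hits = new ArrayList<>();
        for (String entry : dictionary) {
            if (inner.applyAsDouble(query, entry) >= innerMinScore) { // stage 1: cheap filter
                hits.add(Map.entry(entry, rescorer.applyAsDouble(query, entry))); // stage 2: rescore
            }
        }
        // highest rescored value first
        hits.sort(Map.Entry.<String, Double>comparingByValue(Comparator.reverseOrder()));
        return hits;
    }
}
```

The design keeps the expensive rescorer off entries the inner pass already rejected, which is where such a scheme would save work on large dictionaries.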
| Modifier and Type | Class and Description |
|---|---|
| class | TagLinkToken |

Copyright © 2016. All rights reserved.