| Package | Description |
|---|---|
| com.wcohen.ss | This package contains a collection of approximate string comparators, plus code for performing controlled experiments with them. |
| com.wcohen.ss.tokens | |

| Modifier and Type | Class and Description |
|---|---|
| class | AbbreviationAlignment: Abbreviation distance metric that estimates the probability that a short-form string is an abbreviation or acronym of a long-form string. |
| class | AbstractSourcedStatisticalTokenDistance: Abstract token distance metric that uses frequency statistics. |
| class | AbstractSourcedTokenizedStringDistance: Abstract distance metric for tokenized strings. |
| class | AbstractStatisticalTokenDistance: Abstract token distance metric that uses frequency statistics. |
| class | AbstractTokenizedStringDistance: Abstract distance metric for tokenized strings. |
| class | AffineGap: Affine-gap string distance, following Durbin et al. |
| class | ApproxNeedlemanWunsch: Needleman-Wunsch string distance, following Durbin et al. |
| class | DirichletJS: Jensen-Shannon distance of two unigram language models, smoothed using a Dirichlet prior. |
| class | Jaccard: Jaccard distance implementation. |
| class | Jaro: Jaro distance metric. |
| class | JaroTFIDF: Soft TFIDF-based distance metric, extended to use "soft" token-matching with the Jaro distance metric. |
| class | JaroWinkler: Jaro distance metric, as extended by Winkler. |
| class | JaroWinklerTFIDF: Soft TFIDF-based distance metric, extended to use "soft" token-matching with the JaroWinkler distance metric. |
| class | JelinekMercerJS: Jensen-Shannon distance of two unigram language models, smoothed using a Jelinek-Mercer mixture model. |
| class | JensenShannonDistance: Distance metrics based on the Jensen-Shannon distance of two smoothed unigram language models. |
| class | Level2: Generic version of Monge & Elkan's "level 2" recursive field matching. |
| class | Level2Jaro: "Level 2" recursive field matching algorithm, based on Jaro distance. |
| class | Level2JaroWinkler: "Level 2" recursive field matching algorithm, based on Jaro-Winkler distance. |
| class | Level2Levenstein: "Level 2" recursive field matching algorithm, using Levenshtein distance. |
| class | Level2MongeElkan: Monge & Elkan's "level 2" recursive field matching algorithm. |
| class | Levenstein: Levenshtein string distance. |
| class | Mixture: Mixture-based distance metric. |
| class | MongeElkan: The match method proposed by Monge and Elkan. |
| class | MongeElkanTFIDF: Soft TFIDF-based distance metric, extended to use "soft" token-matching with the MongeElkan distance metric. |
| class | NeedlemanWunsch: Needleman-Wunsch string distance, following Durbin et al. |
| class | ScaledLevenstein: Levenshtein string distance. |
| class | SmithWaterman: Smith-Waterman string distance, following Durbin et al. |
| class | SoftTFIDF: TFIDF-based distance metric, extended to use "soft" token-matching. |
| class | SoftTokenFelligiSunter: Highly simplified model of Fellegi-Sunter's method 1, applied to tokens. |
| class | SourcedSoftTFIDF: TFIDF-based distance metric, extended to use "soft" token-matching. |
| class | SourcedTFIDF: Sourced-based distance metric. |
| class | TagLink |
| class | TFIDF: TFIDF-based distance metric. |
| class | TokenFelligiSunter: Highly simplified model of Fellegi-Sunter's method 1, applied to tokens. |
| class | UnsmoothedJS: Jensen-Shannon distance of two unsmoothed unigram language models. |
| class | WinklerRescorer: Winkler's reweighting scheme for distance metrics. |

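
As a rough illustration of what a character-based metric such as the Levenshtein distance listed above computes, the following standalone sketch may help. It does not use or reproduce the com.wcohen.ss classes: the class name `EditDistanceSketch`, both method names, and the normalization in `scaledSimilarity` are assumptions made for this example, and the library's own scoring conventions may differ.

```java
// Standalone sketch; class and method names are hypothetical and are not
// part of com.wcohen.ss.
class EditDistanceSketch {

    /**
     * Unit-cost Levenshtein distance: the minimum number of single-character
     * insertions, deletions, and substitutions needed to turn s into t.
     */
    static int levenshtein(String s, String t) {
        // Two rolling rows of the dynamic-programming table.
        int[] prev = new int[t.length() + 1];
        int[] curr = new int[t.length() + 1];
        for (int j = 0; j <= t.length(); j++) {
            prev[j] = j; // turning "" into t[0..j) takes j insertions
        }
        for (int i = 1; i <= s.length(); i++) {
            curr[0] = i; // turning s[0..i) into "" takes i deletions
            for (int j = 1; j <= t.length(); j++) {
                int substitute = prev[j - 1]
                        + (s.charAt(i - 1) == t.charAt(j - 1) ? 0 : 1);
                int delete = prev[j] + 1;
                int insert = curr[j - 1] + 1;
                curr[j] = Math.min(substitute, Math.min(delete, insert));
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[t.length()];
    }

    /**
     * Similarity in [0, 1], computed as 1 - distance / max(length).
     * This normalization is an assumption made for the sketch.
     */
    static double scaledSimilarity(String s, String t) {
        int maxLen = Math.max(s.length(), t.length());
        if (maxLen == 0) {
            return 1.0; // two empty strings are treated as identical
        }
        return 1.0 - (double) levenshtein(s, t) / maxLen;
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("kitten", "sitting"));
        System.out.println(scaledSimilarity("kitten", "sitting"));
    }
}
```

For "kitten" and "sitting" the distance is 3 (two substitutions and one insertion). The two-row table keeps memory proportional to the length of the second string rather than to the product of both lengths.
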
| Constructor and Description |
|---|
| TagLink(AbstractStringDistance tokenDistance): TagLink constructor requires a character-based string metric. |
| TagLink(String[] dataSetArray, AbstractStringDistance tokenDistance): TagLink constructor requires a dataset string array, used to compute the IDF weights, and a tokenDistance metric. |
| TagLink(Tokenizer tokenizer, AbstractStringDistance tokenDistance): TagLink constructor requires a tokenizer and a tokenDistance metric. |

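
One of the constructors above takes a dataset string array in order to compute IDF weights. How TagLink derives those weights is not stated here, so the following is only a sketch of the textbook inverse document frequency, idf(t) = ln(N / df(t)), under stated assumptions: each array element is treated as one record, tokens are lower-cased and split on whitespace, and the names `IdfSketch` and `idfWeights` are hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Standalone sketch; not the TagLink implementation.
class IdfSketch {

    /**
     * Computes idf(t) = ln(N / df(t)), where N is the number of records and
     * df(t) is the number of records that contain token t at least once.
     */
    static Map<String, Double> idfWeights(String[] dataSet) {
        Map<String, Integer> docFreq = new HashMap<>();
        for (String record : dataSet) {
            // Count each token at most once per record.
            Set<String> seen = new HashSet<>();
            for (String token : record.toLowerCase().trim().split("\\s+")) {
                if (!token.isEmpty() && seen.add(token)) {
                    docFreq.merge(token, 1, Integer::sum);
                }
            }
        }
        Map<String, Double> idf = new HashMap<>();
        for (Map.Entry<String, Integer> entry : docFreq.entrySet()) {
            idf.put(entry.getKey(),
                    Math.log((double) dataSet.length / entry.getValue()));
        }
        return idf;
    }

    public static void main(String[] args) {
        String[] dataSet = {"acme corp", "acme inc", "widget corp ltd"};
        System.out.println(idfWeights(dataSet));
    }
}
```

Tokens that occur in few records (here "inc", "widget", and "ltd") receive a higher weight than tokens shared by many records ("acme", "corp"), which is why a dataset is needed before any two strings can be compared with IDF weighting.
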
| Modifier and Type | Class and Description |
|---|---|
| class | TagLinkToken |

Copyright © 2016. All rights reserved.