| Package | Description |
|---|---|
| com.wcohen.ss | This package contains approximate string comparators, plus code for performing controlled experiments with them. |
| com.wcohen.ss.api | |
| com.wcohen.ss.expt | |
| com.wcohen.ss.lookup | |
| com.wcohen.ss.tokens | |
| Modifier and Type | Field and Description |
|---|---|
| protected Tokenizer | AbstractTokenizedStringDistance.tokenizer |
| Constructor and Description |
|---|
| AbstractSourcedTokenizedStringDistance(Tokenizer tokenizer) |
| AbstractStatisticalTokenDistance(Tokenizer tokenizer) |
| AbstractTokenizedStringDistance(Tokenizer tokenizer) |
| DirichletJS(Tokenizer tokenizer, double pseudoCount) |
| Jaccard(Tokenizer tokenizer) |
| JelinekMercerJS(Tokenizer tokenizer, double lambda) |
| JensenShannonDistance(Tokenizer tokenizer) |
| Level2(Tokenizer tokenizer, StringDistance tokenDistance) |
| Mixture(Tokenizer tokenizer) |
| SoftTFIDF(Tokenizer tokenizer, StringDistance tokenDistance, double tokenMatchThreshold) |
| SoftTokenFelligiSunter(Tokenizer tokenizer, StringDistance tokenDistance, double tokenMatchThreshold, double mismatchFactor) |
| TagLink(Tokenizer tokenizer, AbstractStringDistance tokenDistance) The TagLink constructor requires a tokenizer and a token-distance metric. |
| TFIDF(Tokenizer tokenizer) |
| TokenFelligiSunter(Tokenizer tokenizer, double mismatchFactor) |
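Each of these constructors pairs a Tokenizer with the parameters of a particular token-based measure. As an illustration of the simplest one, here is a minimal, self-contained sketch of what Jaccard(Tokenizer) computes over two strings: the size of the intersection of their token sets divided by the size of their union. The lowercase whitespace tokenization below is an assumption, standing in for whatever Tokenizer is actually configured:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class JaccardSketch {
    // Stand-in for a Tokenizer: lowercase, split on whitespace.
    static Set<String> tokenize(String s) {
        return new HashSet<>(Arrays.asList(s.toLowerCase().split("\\s+")));
    }

    // Jaccard similarity over token sets: |A ∩ B| / |A ∪ B|.
    public static double jaccard(String a, String b) {
        Set<String> ta = tokenize(a), tb = tokenize(b);
        Set<String> union = new HashSet<>(ta);
        union.addAll(tb);
        Set<String> inter = new HashSet<>(ta);
        inter.retainAll(tb);
        return union.isEmpty() ? 0.0 : (double) inter.size() / union.size();
    }

    public static void main(String[] args) {
        // "apple" is shared; three distinct tokens overall.
        System.out.println(jaccard("apple pie", "apple tart"));
    }
}
```

The hybrid measures in this table (Level2, SoftTFIDF, SoftTokenFelligiSunter, TagLink) additionally take a token-level StringDistance, which replaces the exact token-equality test in the intersection above with an approximate one.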
| Modifier and Type | Interface and Description |
|---|---|
| interface | SourcedTokenizer Split a string into tokens, retaining provenance. |
| Modifier and Type | Field and Description |
|---|---|
| protected Tokenizer | TokenBlocker.tokenizer |
| protected Tokenizer | AbbreviationsBlocker.tokenizer |
| Constructor and Description |
|---|
| AbbreviationsBlocker(Tokenizer tokenizer, double maxFraction) |
| ClusterTokenBlocker(Tokenizer tokenizer, double maxFraction) |
| TokenBlocker(Tokenizer tokenizer, double maxFraction) |
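Each blocker pairs a Tokenizer with a maxFraction parameter. A plausible reading (an assumption, not stated in the table) is that maxFraction bounds how common a token may be before it is discarded as a blocking key. A minimal self-contained sketch of token blocking under that reading: index each record by its tokens, skip tokens that appear in more than maxFraction of the records, and emit candidate pairs that share a surviving token:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class TokenBlockerSketch {
    // Candidate index pairs (i < j) of records sharing at least one token
    // that occurs in at most maxFraction of all records.
    public static Set<List<Integer>> candidatePairs(List<String> records, double maxFraction) {
        Map<String, List<Integer>> index = new HashMap<>();
        for (int i = 0; i < records.size(); i++) {
            // Deduplicate tokens within a record so each record is indexed once per token.
            Set<String> toks = new HashSet<>(Arrays.asList(records.get(i).toLowerCase().split("\\s+")));
            for (String tok : toks) {
                index.computeIfAbsent(tok, k -> new ArrayList<>()).add(i);
            }
        }
        int limit = (int) Math.floor(maxFraction * records.size());
        Set<List<Integer>> pairs = new HashSet<>();
        for (List<Integer> ids : index.values()) {
            if (ids.size() > limit) continue; // too-frequent token: poor blocking key
            for (int a = 0; a < ids.size(); a++)
                for (int b = a + 1; b < ids.size(); b++)
                    pairs.add(List.of(ids.get(a), ids.get(b)));
        }
        return pairs;
    }

    public static void main(String[] args) {
        System.out.println(candidatePairs(
            List.of("william cohen", "w cohen", "acme corp"), 0.9));
    }
}
```

Blocking like this avoids scoring all O(n²) record pairs with an expensive string distance; only pairs that share a reasonably rare token are compared.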
| Constructor and Description |
|---|
| SoftDictionary(StringDistanceLearner distanceLearner, Tokenizer tokenizer) |
| SoftDictionary(Tokenizer tokenizer) |
| SoftTFIDFDictionary(Tokenizer tokenizer) |
| SoftTFIDFDictionary(Tokenizer tokenizer, double minTokenSimilarity) |
| SoftTFIDFDictionary(Tokenizer tokenizer, double minTokenSimilarity, int windowSize, int maxInvertedIndexSize) Create a new SoftTFIDFDictionary. |
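These dictionaries support approximate lookup: a query retrieves stored entries whose strings are merely similar, not identical. A minimal self-contained sketch of the idea, using plain token-set Jaccard in place of the library's SoftTFIDF scoring and a hypothetical minScore threshold (both substitutions are assumptions for illustration):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class SoftDictionarySketch {
    private final Map<String, Set<String>> entries = new LinkedHashMap<>();
    private final double minScore; // hypothetical similarity threshold

    public SoftDictionarySketch(double minScore) { this.minScore = minScore; }

    private static Set<String> tokens(String s) {
        return new HashSet<>(Arrays.asList(s.toLowerCase().split("\\s+")));
    }

    public void put(String key) { entries.put(key, tokens(key)); }

    // Returns the stored key most similar to the query, or null if no
    // entry scores at least minScore. Similarity here is token-set Jaccard.
    public String lookup(String query) {
        Set<String> q = tokens(query);
        String best = null;
        double bestScore = minScore;
        for (Map.Entry<String, Set<String>> e : entries.entrySet()) {
            Set<String> union = new HashSet<>(q);
            union.addAll(e.getValue());
            Set<String> inter = new HashSet<>(q);
            inter.retainAll(e.getValue());
            double score = union.isEmpty() ? 0.0 : (double) inter.size() / union.size();
            if (score >= bestScore) { bestScore = score; best = e.getKey(); }
        }
        return best;
    }

    public static void main(String[] args) {
        SoftDictionarySketch d = new SoftDictionarySketch(0.4);
        d.put("william w cohen");
        d.put("acme corporation");
        System.out.println(d.lookup("w cohen")); // matches the similar entry, not an exact key
    }
}
```

The real SoftTFIDFDictionary additionally weights tokens by inverse document frequency and uses an inverted index (hence the windowSize and maxInvertedIndexSize parameters above) so lookups need not scan every entry.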
| Modifier and Type | Class and Description |
|---|---|
| class | CharacterTokenizer Character tokenizer implementation. |
| class | NGramTokenizer Wraps another tokenizer, and computes all ngrams of characters from each token produced by the inner tokenizer. |
| class | SimpleSourcedTokenizer Simple implementation of a SourcedTokenizer. |
| class | SimpleTokenizer Simple implementation of a Tokenizer. |
| Constructor and Description |
|---|
| NGramTokenizer(int minNGramSize, int maxNGramSize, boolean keepOldTokens, Tokenizer innerTokenizer) |
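The NGramTokenizer constructor wraps an inner tokenizer and emits character n-grams of each inner token for every n from minNGramSize to maxNGramSize, optionally keeping the original tokens as well (keepOldTokens). A minimal self-contained sketch of that expansion for a single token:

```java
import java.util.ArrayList;
import java.util.List;

public class NGramSketch {
    // All character n-grams of token, for n in [minSize, maxSize],
    // optionally including the whole token itself first.
    public static List<String> ngrams(String token, int minSize, int maxSize, boolean keepOldToken) {
        List<String> out = new ArrayList<>();
        if (keepOldToken) out.add(token);
        for (int n = minSize; n <= maxSize; n++) {
            for (int i = 0; i + n <= token.length(); i++) {
                out.add(token.substring(i, i + n));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Bigrams and trigrams of "cohen", without the original token.
        System.out.println(ngrams("cohen", 2, 3, false));
    }
}
```

Character n-grams make token-based distances robust to small spelling variations, since misspelled tokens still share most of their n-grams.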
Copyright © 2016. All rights reserved.