public class SoftTFIDF extends TFIDF
On the WHIRL datasets, a JaroWinkler token-match threshold of 0.9 or 0.95 seems to be about right.
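To make the role of the token-match threshold concrete, here is a minimal, self-contained sketch of thresholded "soft" token matching. It is not this class's implementation. The class `SoftMatchSketch` and its methods are hypothetical, a normalized edit-distance similarity stands in for JaroWinkler, and every token gets a uniform unit-vector weight instead of a trained TF-IDF weight. The cutoffs used (0.9 and 0.7) apply to this stand-in similarity only.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical illustration class; not part of the documented library.
class SoftMatchSketch {

    /** Stand-in token similarity in [0, 1]: 1 - editDistance / maxLength. This is NOT JaroWinkler. */
    static double tokenSimilarity(String a, String b) {
        int[] prev = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            int[] cur = new int[b.length() + 1];
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int substitute = prev[j - 1] + (a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1);
                int insertOrDelete = Math.min(prev[j], cur[j - 1]) + 1;
                cur[j] = Math.min(substitute, insertOrDelete);
            }
            prev = cur;
        }
        int maxLength = Math.max(a.length(), b.length());
        return maxLength == 0 ? 1.0 : 1.0 - (double) prev[b.length()] / maxLength;
    }

    /** Lower-cases and splits on whitespace, keeping each distinct token once. */
    static Set<String> tokens(String s) {
        return new LinkedHashSet<>(Arrays.asList(s.toLowerCase().trim().split("\\s+")));
    }

    /**
     * Each token of s is paired with its most similar token in t. The pair
     * contributes to the score only if that similarity reaches the threshold.
     * Weights are uniform unit-vector weights (an assumption made for brevity).
     */
    static double softScore(String s, String t, double threshold) {
        Set<String> sTokens = tokens(s);
        Set<String> tTokens = tokens(t);
        double weightS = 1.0 / Math.sqrt(sTokens.size());
        double weightT = 1.0 / Math.sqrt(tTokens.size());
        double score = 0.0;
        for (String w : sTokens) {
            double best = 0.0;
            for (String v : tTokens) {
                best = Math.max(best, tokenSimilarity(w, v));
            }
            if (best >= threshold) {
                score += weightS * weightT * best;
            }
        }
        return score;
    }

    public static void main(String[] args) {
        // Strict cutoff: only the exact token "smith" counts as a match.
        System.out.println(softScore("jon smith", "john smith", 0.9));
        // Looser cutoff: "jon" also matches "john" (stand-in similarity 0.75).
        System.out.println(softScore("jon smith", "john smith", 0.7));
    }
}
```

With the strict cutoff only the exact token contributes, giving a score of roughly 0.5. With the looser cutoff the near-match "jon"/"john" is added and the score rises to roughly 0.875. A higher threshold therefore trades recall of misspelled tokens for fewer spurious matches.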
Nested classes inherited from TFIDF: TFIDF.UnitVector
Inherited fields: collectionSize, documentFrequency, totalTokenCount; tokenizer

| Constructor and Description |
|---|
| SoftTFIDF() |
| SoftTFIDF(StringDistance tokenDistance) |
| SoftTFIDF(StringDistance tokenDistance, double tokenMatchThreshold) |
| SoftTFIDF(Tokenizer tokenizer, StringDistance tokenDistance, double tokenMatchThreshold) |

| Modifier and Type | Method and Description |
|---|---|
| String | explainScore(StringWrapper s, StringWrapper t) Explain how the distance was computed. |
| double | getTokenMatchThreshold() |
| double | score(StringWrapper s, StringWrapper t) This method needs to be implemented by subclasses. |
| void | setTokenMatchThreshold(double d) |
| void | setTokenMatchThreshold(Double d) |
| String | toString() |

Methods inherited from TFIDF and its superclasses, grouped by declaring class:
asUnitVector, getCollectionSize, getDocumentFrequency, getTokens, getVocabularySize, getWeight, main, prepare, setCollectionSize, setDocumentFrequency, setTokenCount
checkTrainingHasHappened, tokenIterator, train
asBagOfTokens, prepare, setStringWrapperPool
addExample, doMain, explainScore, getDistance, hasNextQuery, nextQuery, prepare, score, setDistanceInstancePool

public SoftTFIDF(Tokenizer tokenizer, StringDistance tokenDistance, double tokenMatchThreshold)
public SoftTFIDF(StringDistance tokenDistance, double tokenMatchThreshold)
public SoftTFIDF(StringDistance tokenDistance)
public SoftTFIDF()
public void setTokenMatchThreshold(double d)
public void setTokenMatchThreshold(Double d)
public double getTokenMatchThreshold()
public double score(StringWrapper s, StringWrapper t)
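Description copied from class AbstractStringDistance: This method needs to be implemented by subclasses.
Specified by: score in interface StringDistance
Overrides: score in class TFIDF
public String explainScore(StringWrapper s, StringWrapper t)
Specified by: explainScore in interface StringDistance
Overrides: explainScore in class TFIDF

Copyright © 2016. All rights reserved.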