
| Package | Description |
|---|---|
| `com.wcohen.ss` | This package contains a collection of approximate string comparators, plus code for performing controlled experiments with them. |
| `com.wcohen.ss.api` | |
| `com.wcohen.ss.tokens` | |


| Modifier and Type | Method and Description |
|---|---|
| `Token[]` | `TFIDF.getTokens()`: Access the tokens of the last `prepare()`-ed string. |
| `Token[]` | `SourcedTFIDF.getTokens()`: Access the tokens of the last `prepare()`-ed string. |

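
`TFIDF` and `SourcedTFIDF` follow a stateful pattern. `prepare()` builds a weighted token vector, and `getTokens()` reads from that cached vector. So does `getWeight()`, listed among the methods that take a `Token`. The following is a minimal sketch of the pattern under stated assumptions:

- `MiniTfidf` is a hypothetical class, not the library's.
- Tokens are plain whitespace-separated `String`s rather than `Token` objects.
- The weighting `log(tf + 1) * (log((N + 1) / (df + 1)) + 1)` with L2 normalization is one common TF-IDF variant. It is not necessarily the formula SecondString uses.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical MiniTfidf: illustrates "tokens and weights of the last prepare()-ed
// string" with String tokens. It is NOT the SecondString TFIDF implementation.
class MiniTfidf {
    private final Map<String, Integer> documentFrequency = new HashMap<>();
    private int collectionSize = 0;
    // Unit-length TF-IDF vector of the most recent prepare() call, in token order.
    private Map<String, Double> lastWeights = new LinkedHashMap<>();

    // Assumed tokenization: lower-case, split on whitespace.
    private static String[] split(String s) {
        String t = s.trim().toLowerCase();
        return t.isEmpty() ? new String[0] : t.split("\\s+");
    }

    // Each distinct token of a training string adds 1 to its document frequency.
    void train(String doc) {
        collectionSize++;
        for (String tok : new HashSet<>(Arrays.asList(split(doc)))) {
            documentFrequency.merge(tok, 1, Integer::sum);
        }
    }

    int getDocumentFrequency(String tok) {
        return documentFrequency.getOrDefault(tok, 0);
    }

    void setDocumentFrequency(String tok, int df) {
        documentFrequency.put(tok, df);
    }

    // Build and cache the normalized TF-IDF vector of s.
    void prepare(String s) {
        Map<String, Integer> tf = new LinkedHashMap<>();
        for (String tok : split(s)) tf.merge(tok, 1, Integer::sum);

        Map<String, Double> w = new LinkedHashMap<>();
        double sumSq = 0.0;
        for (Map.Entry<String, Integer> e : tf.entrySet()) {
            // Smoothed IDF (an assumption): >= 1 whenever df <= collectionSize.
            double idf = Math.log((collectionSize + 1.0)
                    / (getDocumentFrequency(e.getKey()) + 1.0)) + 1.0;
            double weight = Math.log(e.getValue() + 1.0) * idf;
            w.put(e.getKey(), weight);
            sumSq += weight * weight;
        }
        double norm = Math.sqrt(sumSq);
        if (norm > 0) {
            for (Map.Entry<String, Double> e : w.entrySet()) e.setValue(e.getValue() / norm);
        }
        lastWeights = w;
    }

    // Distinct tokens of the last prepare()-ed string, in first-occurrence order.
    String[] getTokens() {
        return lastWeights.keySet().toArray(new String[0]);
    }

    // Weight of tok in the last prepared vector; 0 if tok did not occur.
    double getWeight(String tok) {
        return lastWeights.getOrDefault(tok, 0.0);
    }
}
```

The caching design means `getTokens()` and `getWeight()` are only meaningful right after a `prepare()` call. Each new `prepare()` replaces the stored vector.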

| Modifier and Type | Method and Description |
|---|---|
| `protected double` | `JensenShannonDistance.backgroundProb(Token tok)`: Probability of the token in the background language model. |
| `int` | `TFIDF.getDocumentFrequency(Token token)`: Get the document frequency of the token. |
| `int` | `SourcedTFIDF.getDocumentFrequency(Token token)`: Get the document frequency of the token. |
| `int` | `AbstractStatisticalTokenDistance.getDocumentFrequency(Token tok)` |
| `int` | `AbstractSourcedStatisticalTokenDistance.getDocumentFrequency(Token tok)` |
| `double` | `TFIDF.getWeight(Token token)`: Access the weight of a token in the vector created for the last `prepare()`-ed string. |
| `double` | `SourcedTFIDF.getWeight(Token token)`: Access the weight of a token in the vector created for the last `prepare()`-ed string. |
| `void` | `TFIDF.setDocumentFrequency(Token token, int df)`: Set the document frequency of the token to `df`. |
| `void` | `SourcedTFIDF.setDocumentFrequency(Token token, int df)`: Set the document frequency of the token to `df`. |
| `protected double` | `UnsmoothedJS.smoothedProbability(Token tok, double freq, double totalWeight)`: Unsmoothed probability of the token. |
| `protected abstract double` | `JensenShannonDistance.smoothedProbability(Token tok, double freq, double totalWeight)`: Smoothed probability of the token with frequency `freq` in a bag with the given `totalWeight`. |
| `protected double` | `JelinekMercerJS.smoothedProbability(Token tok, double freq, double totalWeight)`: Jelinek-Mercer smoothed probability of the token. |
| `protected double` | `DirichletJS.smoothedProbability(Token tok, double freq, double totalWeight)`: Dirichlet smoothed probability of the token. |

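
`JensenShannonDistance` leaves `smoothedProbability(tok, freq, totalWeight)` abstract. `UnsmoothedJS`, `JelinekMercerJS`, and `DirichletJS` each supply a different estimate, and the smoothed ones can mix in `backgroundProb(tok)`. The following is a minimal sketch of that design with hypothetical `*Sketch` class names and `String` tokens. It assumes the textbook estimators:

- unsmoothed: `freq / totalWeight`
- Jelinek-Mercer: `(1 - lambda) * freq / totalWeight + lambda * pBg(tok)`
- Dirichlet: `(freq + mu * pBg(tok)) / (totalWeight + mu)`

SecondString's exact formulas and parameter names may differ. The divergence is summed in base 2 over the background vocabulary plus both bags.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of Jensen-Shannon divergence over smoothed token distributions.
// Formulas are textbook estimators, assumed rather than copied from SecondString.
abstract class JsSketch {
    protected final Map<String, Double> background = new HashMap<>();

    // Background language model: relative frequency of each token in a corpus.
    JsSketch(Map<String, Integer> corpusCounts) {
        double total = 0;
        for (int c : corpusCounts.values()) total += c;
        for (Map.Entry<String, Integer> e : corpusCounts.entrySet()) {
            background.put(e.getKey(), e.getValue() / total);
        }
    }

    protected double backgroundProb(String tok) {
        return background.getOrDefault(tok, 0.0);
    }

    // Probability of tok, given frequency freq in a bag of total weight totalWeight.
    protected abstract double smoothedProbability(String tok, double freq, double totalWeight);

    // Base-2 JS divergence in [0, 1] between two bags (token -> frequency).
    double divergence(Map<String, Double> a, Map<String, Double> b) {
        double wa = a.values().stream().mapToDouble(Double::doubleValue).sum();
        double wb = b.values().stream().mapToDouble(Double::doubleValue).sum();
        Set<String> vocab = new HashSet<>(background.keySet());
        vocab.addAll(a.keySet());
        vocab.addAll(b.keySet());
        double js = 0;
        for (String t : vocab) {
            double p = smoothedProbability(t, a.getOrDefault(t, 0.0), wa);
            double q = smoothedProbability(t, b.getOrDefault(t, 0.0), wb);
            double m = 0.5 * (p + q);
            js += 0.5 * klTerm(p, m) + 0.5 * klTerm(q, m);
        }
        return js;
    }

    // p * log2(p / m), using the convention 0 * log 0 = 0.
    private static double klTerm(double p, double m) {
        return p == 0 ? 0 : p * Math.log(p / m) / Math.log(2);
    }
}

class UnsmoothedJsSketch extends JsSketch {
    UnsmoothedJsSketch(Map<String, Integer> corpus) { super(corpus); }

    @Override
    protected double smoothedProbability(String tok, double freq, double totalWeight) {
        return freq / totalWeight; // maximum-likelihood estimate
    }
}

class JelinekMercerJsSketch extends JsSketch {
    private final double lambda; // weight on the background model, 0 < lambda < 1

    JelinekMercerJsSketch(Map<String, Integer> corpus, double lambda) {
        super(corpus);
        this.lambda = lambda;
    }

    @Override
    protected double smoothedProbability(String tok, double freq, double totalWeight) {
        return (1 - lambda) * freq / totalWeight + lambda * backgroundProb(tok);
    }
}

class DirichletJsSketch extends JsSketch {
    private final double mu; // pseudo-count mass given to the background model

    DirichletJsSketch(Map<String, Integer> corpus, double mu) {
        super(corpus);
        this.mu = mu;
    }

    @Override
    protected double smoothedProbability(String tok, double freq, double totalWeight) {
        return (freq + mu * backgroundProb(tok)) / (totalWeight + mu);
    }
}
```

Both smoothed estimators give every background token nonzero probability in each bag. As a result, two bags with no tokens in common score below the unsmoothed maximum of 1.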

| Constructor and Description |
|---|
| `UnitVector(String s, Token[] tokens)` |
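
The `UnitVector(String s, Token[] tokens)` constructor is listed without a description. Suppose it builds an L2-normalized vector over the string's tokens. Then the dot product of two unit vectors is their cosine similarity. The hypothetical `UnitVectorSketch` class below makes that assumption concrete. It uses raw token counts as weights, and `String` tokens stand in for `Token`.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical UnitVectorSketch: ASSUMES a unit vector is the L2-normalized
// token-count vector of a string. Not the SecondString UnitVector class.
class UnitVectorSketch {
    final String source;                                 // the original string
    private final Map<String, Double> weights = new HashMap<>();

    UnitVectorSketch(String s, String[] tokens) {
        source = s;
        for (String t : tokens) weights.merge(t, 1.0, Double::sum); // raw counts
        double sumSq = 0;
        for (double w : weights.values()) sumSq += w * w;
        final double norm = Math.sqrt(sumSq);
        if (norm > 0) weights.replaceAll((t, w) -> w / norm);   // scale to length 1
    }

    // Dot product of two unit vectors, i.e. their cosine similarity.
    double dot(UnitVectorSketch other) {
        double sum = 0;
        for (Map.Entry<String, Double> e : weights.entrySet()) {
            sum += e.getValue() * other.weights.getOrDefault(e.getKey(), 0.0);
        }
        return sum;
    }
}
```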

| Modifier and Type | Interface and Description |
|---|---|
| `interface` | `SourcedToken`: An interned version of a string, with provenance information. |


| Modifier and Type | Method and Description |
|---|---|
| `Token` | `Tokenizer.intern(String s)`: Convert a given string into a token. |
| `Token[]` | `Tokenizer.tokenize(String input)`: Return a tokenized version of a string. |

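
`Tokenizer.intern(String s)` converts a string into a token, and `tokenize(String input)` returns the tokens of a string. Interning conventionally means that each distinct string maps to exactly one shared token object. The following is a minimal sketch assuming that meaning. `InternedToken` and `InterningTokenizer` are hypothetical names, and splitting on runs of characters other than ASCII letters and digits is an assumed rule.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical token type: one instance per distinct string.
final class InternedToken {
    final String value;
    final int index; // order in which this token was first interned

    InternedToken(String value, int index) {
        this.value = value;
        this.index = index;
    }

    @Override
    public String toString() { return value; }
}

// Hypothetical tokenizer that interns every token it produces.
class InterningTokenizer {
    private final Map<String, InternedToken> table = new HashMap<>();

    // Return the unique token for s, creating it on first use.
    InternedToken intern(String s) {
        InternedToken tok = table.get(s);
        if (tok == null) {
            tok = new InternedToken(s, table.size());
            table.put(s, tok);
        }
        return tok;
    }

    // Lower-case, split on non-alphanumeric runs (an assumption), intern each piece.
    InternedToken[] tokenize(String input) {
        List<InternedToken> out = new ArrayList<>();
        for (String piece : input.toLowerCase().split("[^a-z0-9]+")) {
            if (!piece.isEmpty()) out.add(intern(piece));
        }
        return out.toArray(new InternedToken[0]);
    }
}
```

Because equal strings share one object, downstream code can compare tokens with `==` and use them as inexpensive hash keys.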

| Modifier and Type | Class and Description |
|---|---|
| `class` | `BasicSourcedToken`: An interned version of a string, with provenance information. |
| `class` | `BasicToken`: An interned version of a string. |

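
`SourcedToken` and `BasicSourcedToken` are described as interned strings that carry provenance information. One way to model that is to intern on the pair (source, value), so the same word from two sources becomes two distinct tokens that share the same string. The hypothetical `SourcedTokenSketch` class below takes this approach. The keying is an assumption, not a documented property of `BasicSourcedToken`.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a token interned on (source, value). ASSUMED design,
// not the SecondString BasicSourcedToken implementation.
final class SourcedTokenSketch {
    // Global intern table: source -> (value -> token). Not thread-safe.
    private static final Map<String, Map<String, SourcedTokenSketch>> TABLE = new HashMap<>();

    final String value;  // the interned string
    final String source; // provenance, e.g. the field or file the string came from

    private SourcedTokenSketch(String value, String source) {
        this.value = value;
        this.source = source;
    }

    // Return the unique token for (value, source), creating it on first use.
    static SourcedTokenSketch intern(String value, String source) {
        return TABLE.computeIfAbsent(source, s -> new HashMap<>())
                    .computeIfAbsent(value, v -> new SourcedTokenSketch(v, source));
    }

    @Override
    public String toString() {
        return value + "@" + source;
    }
}
```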

| Modifier and Type | Method and Description |
|---|---|
| `Token` | `SimpleTokenizer.intern(String s)` |
| `Token` | `NGramTokenizer.intern(String s)` |
| `Token` | `CharacterTokenizer.intern(String s)` |
| `Token[]` | `SimpleTokenizer.tokenize(String input)`: Return a tokenized version of a string. |
| `Token[]` | `NGramTokenizer.tokenize(String input)`: Return a tokenized version of a string. |
| `Token[]` | `CharacterTokenizer.tokenize(String input)`: Return a tokenized version of a string. |

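
`SimpleTokenizer`, `NGramTokenizer`, and `CharacterTokenizer` each implement `tokenize(String input)` differently. The hypothetical `TokenizerSketches` helpers below contrast three strategies on plain `String`s:

- word tokens;
- single-character tokens;
- character n-grams of lengths `minN` to `maxN`, taken inside each word.

Case folding, the ASCII-only word-splitting rule, and the n-gram window are assumptions. The library's tokenizers may follow different rules.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helpers contrasting three tokenization strategies.
// Rules here are assumptions, not SecondString's exact behavior.
final class TokenizerSketches {
    private TokenizerSketches() {}

    // Word tokens: lower-cased runs of ASCII letters and digits.
    static List<String> words(String input) {
        List<String> out = new ArrayList<>();
        for (String w : input.toLowerCase().split("[^a-z0-9]+")) {
            if (!w.isEmpty()) out.add(w);
        }
        return out;
    }

    // Character tokens: one token per Unicode code point.
    static List<String> characters(String input) {
        List<String> out = new ArrayList<>();
        input.codePoints().forEach(cp -> out.add(new String(Character.toChars(cp))));
        return out;
    }

    // Character n-grams of every length minN..maxN, taken inside each word token.
    static List<String> ngrams(String input, int minN, int maxN) {
        List<String> out = new ArrayList<>();
        for (String w : words(input)) {
            for (int n = minN; n <= maxN; n++) {
                for (int i = 0; i + n <= w.length(); i++) {
                    out.add(w.substring(i, i + n));
                }
            }
        }
        return out;
    }
}
```

For example, `ngrams("jon", 2, 2)` yields `[jo, on]` and `ngrams("john", 2, 2)` yields `[jo, oh, hn]`. The shared `jo` lets an n-gram-based distance credit a near-match that word tokens miss entirely.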

| Modifier and Type | Method and Description |
|---|---|
| `Iterator<Token>` | `CharacterTokenizer.tokenIterator()` |

Copyright © 2016. All rights reserved.