public class PlingStemmer extends Object
System.out.println(PlingStemmer.stem("boy"));
----> boy
System.out.println(PlingStemmer.stem("boys"));
----> boy
System.out.println(PlingStemmer.stem("biophysics"));
----> biophysics
System.out.println(PlingStemmer.stem("automata"));
----> automaton
System.out.println(PlingStemmer.stem("genus"));
----> genus
System.out.println(PlingStemmer.stem("emus"));
----> emu
Some word forms can be either plural or singular. Examples include "physics" (the science, or the plural of the medicine "physic"), "quarters" (the housing, or the plural of "quarter", i.e. 1/4) and "people" (the singular of "peoples", or the plural of "person"). In these cases, the stemmer assumes the word is a plural form and returns the singular form. The methods isPlural, isSingular and isSingularAndPlural can be used to differentiate the cases.
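The relationship between the three predicates can be illustrated with a small self-contained model. This is only a sketch under stated assumptions: the class `AmbiguousFormsSketch`, its toy lookup table and its toy set are hypothetical stand-ins, and the way the predicates are derived from `stem` here is one plausible reading of the description above, not the PlingStemmer source.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical toy model for illustration only; not the PlingStemmer implementation.
class AmbiguousFormsSketch {
    // Toy plural -> singular table standing in for the real stemming rules.
    static final Map<String, String> TOY_STEMS = Map.of(
        "boys", "boy",
        "physics", "physic",
        "quarters", "quarter",
        "people", "person");

    // Toy stand-in for a set of forms that are both singular and plural.
    static final Set<String> TOY_SING_AND_PLUR = Set.of("physics", "quarters", "people");

    // Ambiguous forms are treated as plurals, so they map to a singular form.
    static String stem(String s) {
        return TOY_STEMS.getOrDefault(s, s);
    }

    // Assumption: a form counts as plural if stemming changes it.
    static boolean isPlural(String s) {
        return !s.equals(stem(s));
    }

    static boolean isSingularAndPlural(String s) {
        return TOY_SING_AND_PLUR.contains(s);
    }

    // Assumption: ambiguous forms count as singular as well as plural.
    static boolean isSingular(String s) {
        return isSingularAndPlural(s) || !isPlural(s);
    }

    public static void main(String[] args) {
        for (String w : new String[] {"boy", "boys", "people"}) {
            System.out.println(w + ": stem=" + stem(w)
                + " plural=" + isPlural(w)
                + " singular=" + isSingular(w)
                + " both=" + isSingularAndPlural(w));
        }
    }
}
```

In this model an ambiguous form such as "people" is reported as both plural and singular, which is what lets a caller decide whether to trust the stemmed result.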
It cannot be guaranteed that the stemmer correctly stems a plural word or correctly ignores a singular word -- let alone that it treats an ambiguous word form in the way expected by the user.
The PlingStemmer uses material from WordNet.
It requires the class FinalSet from the Java Tools.
| Modifier and Type | Field and Description |
|---|---|
| static Set<String> | category00: Words that do not have a distinct plural form (like "atlas" etc.) |
| static Set<String> | categoryCHE_CHES: Words that change from "-che" to "-ches" (like "brioche" etc.), listed in their plural forms |
| static Set<String> | categoryEX_ICES: Words that change from "-ex" to "-ices" (like "index" etc.), listed in their plural forms |
| static Set<String> | categoryICS: Words that end with "-ics" and do not exist as nouns without the 's' (like "aerobics" etc.) |
| static Set<String> | categoryIE_IES: Words that change from "-ie" to "-ies" (like "auntie" etc.), listed in their plural forms |
| static Set<String> | categoryIS_ES: Words that change from "-is" to "-es" (like "axis" etc.), listed in their plural forms |
| static Set<String> | categoryIX_ICES: Words that change from "-ix" to "-ices" (like "appendix" etc.), listed in their plural forms |
| static Set<String> | categoryO_I: Words that change from "-o" to "-i" (like "libretto" etc.), listed in their plural forms |
| static Set<String> | categoryOE_OES: Words that change from "-oe" to "-oes" (like "toe" etc.), listed in their plural forms |
| static Set<String> | categoryON_A: Words that change from "-on" to "-a" (like "phenomenon" etc.), listed in their plural forms |
| static Set<String> | categorySE_SES: Words that end in "-se" in their plural forms (like "nurse" etc.) |
| static Set<String> | categorySSE_SSES: Words that change from "-sse" to "-sses" (like "finesse" etc.), listed in their plural forms |
| static Set<String> | categoryU_US: Words that change from "-u" to "-us" (like "emu" etc.), listed in their plural forms |
| static Set<String> | categoryUM_A: Words that change from "-um" to "-a" (like "curriculum" etc.), listed in their plural forms |
| static Set<String> | categoryUS_I: Words that change from "-us" to "-i" (like "fungus" etc.), listed in their plural forms |
| static Map<String,String> | irregular: Maps irregular Germanic English plural nouns to their singular form |
| static Set<String> | singAndPlur: Contains word forms that can either be plural or singular |
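To show how sets of plural forms like these could drive stemming, here is a minimal sketch. Everything in it is an assumption made for illustration: the class `CategoryLookupSketch`, the three toy sets and their contents, the order of the checks and the naive fallback rule are hypothetical and are not taken from the PlingStemmer source.

```java
import java.util.Set;

// Hypothetical sketch of category-based stemming; not the PlingStemmer implementation.
class CategoryLookupSketch {
    // Toy stand-ins for three of the category sets (contents are illustrative).
    static final Set<String> TOY_CATEGORY_00 = Set.of("atlas");
    static final Set<String> TOY_CATEGORY_ON_A = Set.of("phenomena", "automata");
    static final Set<String> TOY_CATEGORY_UM_A = Set.of("curricula");

    static String stem(String s) {
        // No distinct plural form: return the word unchanged.
        if (TOY_CATEGORY_00.contains(s)) {
            return s;
        }
        // "-a" plural of an "-on" word: drop the "a", append "on".
        if (TOY_CATEGORY_ON_A.contains(s)) {
            return s.substring(0, s.length() - 1) + "on";
        }
        // "-a" plural of an "-um" word: drop the "a", append "um".
        if (TOY_CATEGORY_UM_A.contains(s)) {
            return s.substring(0, s.length() - 1) + "um";
        }
        // Naive fallback (assumption): strip a trailing "s".
        if (s.endsWith("s")) {
            return s.substring(0, s.length() - 1);
        }
        return s;
    }

    public static void main(String[] args) {
        for (String w : new String[] {"atlas", "automata", "curricula", "boys"}) {
            System.out.println(w + " -> " + stem(w));
        }
    }
}
```

Listing the plural forms explicitly is what allows two categories with the same plural ending, such as "-on"/"-a" and "-um"/"-a", to be told apart by lookup rather than by suffix alone.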
| Constructor and Description |
|---|
| PlingStemmer() |
| Modifier and Type | Method and Description |
|---|---|
| static String | cut(String s, String suffix): Cuts a suffix from a string (that is the number of chars given by the suffix) |
| static boolean | isPlural(String s): Tells whether a word form is plural. |
| static boolean | isSingular(String s): Tells whether a word form is singular. |
| static boolean | isSingularAndPlural(String s): Tells whether a word form is the singular form of one word and at the same time the plural form of another. |
| static boolean | noLatin(String s): Returns true if a word is probably not Latin |
| static String | stem(String s): Stems an English noun |
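The description of `cut` suggests that it removes as many characters from the end of the string as the suffix is long. The following is a minimal sketch of that reading, not the library code: the class `CutSketch` is hypothetical, and whether the real method checks that the string actually ends with the suffix is not stated in this documentation.

```java
// Hypothetical re-implementation based on the one-line description of cut.
class CutSketch {
    // Removes suffix.length() characters from the end of s.
    // Assumption: no check is made that s really ends with suffix.
    static String cut(String s, String suffix) {
        return s.substring(0, s.length() - suffix.length());
    }

    public static void main(String[] args) {
        // Drop a plural "s".
        System.out.println(cut("boys", "s"));
        // Swap a plural ending for a singular one ("-ices" to "-ex").
        System.out.println(cut("indices", "ices") + "ex");
    }
}
```

Under this reading, the caller is responsible for testing the ending first (for example with `String.endsWith`), since a suffix longer than the string would make `substring` throw an exception.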
public static Set<String> categorySE_SES
public static Set<String> category00
public static Set<String> categoryUM_A
public static Set<String> categoryON_A
public static Set<String> categoryO_I
public static Set<String> categoryUS_I
public static Set<String> categoryIX_ICES
public static Set<String> categoryIS_ES
public static Set<String> categoryOE_OES
public static Set<String> categoryEX_ICES
public static Set<String> categoryU_US
public static Set<String> categorySSE_SSES
public static Set<String> categoryCHE_CHES
public static Set<String> categoryICS
public static Set<String> categoryIE_IES
public static Map<String,String> irregular
public static boolean isPlural(String s)
public static boolean isSingular(String s)
public static boolean isSingularAndPlural(String s)
public static String cut(String s, String suffix)
public static boolean noLatin(String s)
Copyright © 2015 Bluebrain Project. All rights reserved.