public interface ISegmenter
| Modifier and Type | Method and Description |
|---|---|
| int | computeSegments(String text): Calculates the segmentation of a given plain text string. |
| int | computeSegments(TextContainer container): Calculates the segmentation of a given TextContainer object. |
| LocaleId | getLanguage(): Gets the language used to apply the rules. |
| Range | getNextSegmentRange(TextContainer container): Computes the range of the next segment for a given TextContainer object. |
| List<Range> | getRanges(): Gets the list of all segment ranges calculated when calling computeSegments(String) or computeSegments(TextContainer). |
| List<Integer> | getSplitPositions(): Gets the list of all the split positions in the text that was last segmented. |
| boolean | includeEndCodes(): Indicates if end codes should be included (see SRX implementation notes). |
| boolean | includeIsolatedCodes(): Indicates if isolated codes should be included (see SRX implementation notes). |
| boolean | includeStartCodes(): Indicates if start codes should be included (see SRX implementation notes). |
| boolean | oneSegmentIncludesAll(): Indicates if, when there is a single segment in a text, it should include the whole text (no spaces or codes trimmed on the left or right). |
| void | reset(): Resets the options to their defaults, and the compiled rules to nothing. |
| boolean | segmentSubFlows(): Indicates if sub-flows must be segmented. |
| void | setIncludeEndCodes(boolean includeEndCodes) |
| void | setIncludeIsolatedCodes(boolean includeIsolatedCodes) |
| void | setIncludeStartCodes(boolean includeStartCodes) |
| void | setLanguage(LocaleId locale): Sets the locale used to apply the rules. |
| void | setOneSegmentIncludesAll(boolean oneSegmentIncludesAll) |
| void | setOptions(boolean segmentSubFlows, boolean includeStartCodes, boolean includeEndCodes, boolean includeIsolatedCodes, boolean oneSegmentIncludesAll, boolean trimLeadingWS, boolean trimTrailingWS): Sets the options for this segmenter. |
| void | setSegmentSubFlows(boolean segmentSubFlows) |
| void | setTreatIsolatedCodesAsWhitespace(boolean treatIsolatedCodesAsWhitespace) |
| void | setTrimCodes(boolean trimCodes) |
| void | setTrimLeadingWS(boolean trimLeadingWS) |
| void | setTrimTrailingWS(boolean trimTrailingWS) |
| boolean | treatIsolatedCodesAsWhitespace(): Indicates if the segmenter should treat each isolated code as a single whitespace character (U+0020) when applying segmentation. |
| boolean | trimLeadingWhitespaces(): Indicates if leading white-spaces should be left outside the segments. |
| boolean | trimTrailingWhitespaces(): Indicates if trailing white-spaces should be left outside the segments. |
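The method descriptions suggest a call order: run computeSegments first, then read the result with getRanges. The sketch below illustrates that sequence without using the real library. MiniSegmenter redeclares two of the methods listed here so the example compiles on its own. Everything else is an assumption made for illustration: the local Range class, ToySegmenter and its punctuation-based break rule, the end-exclusive range convention, and the meaning of the int return value (taken here to be the segment count).

```java
import java.util.ArrayList;
import java.util.List;

public class SegmenterUsageSketch {

    // Hypothetical stand-in for the library's Range type (end is assumed exclusive).
    static final class Range {
        final int start;
        final int end;
        Range(int start, int end) { this.start = start; this.end = end; }
    }

    // Two of the documented methods, redeclared locally so the sketch is self-contained.
    interface MiniSegmenter {
        int computeSegments(String text);
        List<Range> getRanges();
    }

    // Toy implementation (not the library's): breaks after '.', '!' or '?' followed by a space.
    static class ToySegmenter implements MiniSegmenter {
        private List<Range> ranges; // stays null until computeSegments is called

        @Override
        public int computeSegments(String text) {
            ranges = new ArrayList<>();
            int start = 0;
            for (int i = 0; i < text.length() - 1; i++) {
                char c = text.charAt(i);
                if ((c == '.' || c == '!' || c == '?') && text.charAt(i + 1) == ' ') {
                    ranges.add(new Range(start, i + 1));
                    start = i + 2; // leave the separating space outside both segments
                }
            }
            if (start < text.length()) {
                ranges.add(new Range(start, text.length()));
            }
            return ranges.size(); // assumption: the count of segments found
        }

        @Override
        public List<Range> getRanges() {
            return ranges;
        }
    }

    // Compute first, then read the ranges, guarding against the documented null case.
    static List<String> segment(MiniSegmenter segmenter, String text) {
        List<String> out = new ArrayList<>();
        segmenter.computeSegments(text);
        List<Range> ranges = segmenter.getRanges();
        if (ranges == null) {
            return out;
        }
        for (Range r : ranges) {
            out.add(text.substring(r.start, r.end));
        }
        return out;
    }

    public static void main(String[] args) {
        for (String s : segment(new ToySegmenter(), "Hello there. How are you? Fine.")) {
            System.out.println("[" + s + "]");
        }
    }
}
```

The null check mirrors the documented behavior of getRanges, which returns null when no ranges have been defined yet.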
int computeSegments(String text)
text - plain text to segment.
int computeSegments(TextContainer container)
container - the object to segment.
Range getNextSegmentRange(TextContainer container)
container - the text container in which to look for the next segment.
List<Integer> getSplitPositions()
Call computeSegments(TextContainer) or computeSegments(String) before calling this method.
A split position is the first character position of a new segment.
IMPORTANT: The positions returned here do NOT take into account any options for trimming leading and trailing white-spaces.
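Because split positions ignore the trimming options, a caller who wants trimmed segment boundaries has to apply the trimming separately. The following sketch shows one way that could look. It is not the library's code: the toRanges helper, the int[] {start, end} representation (end exclusive), and the sample split positions are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SplitPositionSketch {

    // Hypothetical helper: turns raw split positions into {start, end} pairs
    // (end exclusive), optionally leaving leading/trailing whitespace outside.
    static List<int[]> toRanges(String text, List<Integer> splits,
                                boolean trimLeadingWS, boolean trimTrailingWS) {
        List<Integer> bounds = new ArrayList<>(splits);
        bounds.add(text.length()); // the last segment ends where the text ends

        List<int[]> ranges = new ArrayList<>();
        int start = 0;
        for (int bound : bounds) {
            int s = start;
            int e = bound;
            if (trimLeadingWS) {
                while (s < e && Character.isWhitespace(text.charAt(s))) s++;
            }
            if (trimTrailingWS) {
                while (e > s && Character.isWhitespace(text.charAt(e - 1))) e--;
            }
            if (e > s) {
                ranges.add(new int[] {s, e}); // skip segments that trim down to nothing
            }
            start = bound; // each split position starts the next segment
        }
        return ranges;
    }

    public static void main(String[] args) {
        String text = "One. Two. Three.";
        // Assumed split positions: each new segment starts at the space after a period.
        List<Integer> splits = Arrays.asList(4, 9);

        for (int[] r : toRanges(text, splits, false, false)) {
            System.out.println("raw     [" + text.substring(r[0], r[1]) + "]");
        }
        for (int[] r : toRanges(text, splits, true, true)) {
            System.out.println("trimmed [" + text.substring(r[0], r[1]) + "]");
        }
    }
}
```

With the raw positions, the second segment starts at index 4 and includes the leading space. With both trim flags set, it starts at index 5.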
List<Range> getRanges()
Gets the list of all segment ranges calculated when calling computeSegments(String) or computeSegments(TextContainer). Each range is a Range object where start is the start and end is the end of the range.
Returns null if no ranges have been defined yet.
LocaleId getLanguage()
boolean includeEndCodes()
boolean includeIsolatedCodes()
boolean includeStartCodes()
void reset()
boolean segmentSubFlows()
boolean trimLeadingWhitespaces()
boolean trimTrailingWhitespaces()
boolean oneSegmentIncludesAll()
boolean treatIsolatedCodesAsWhitespace()
void setLanguage(LocaleId locale)
locale - code of the language to use to apply the rules.
void setIncludeEndCodes(boolean includeEndCodes)
void setIncludeIsolatedCodes(boolean includeIsolatedCodes)
void setIncludeStartCodes(boolean includeStartCodes)
void setOneSegmentIncludesAll(boolean oneSegmentIncludesAll)
void setOptions(boolean segmentSubFlows,
boolean includeStartCodes,
boolean includeEndCodes,
boolean includeIsolatedCodes,
boolean oneSegmentIncludesAll,
boolean trimLeadingWS,
boolean trimTrailingWS)
segmentSubFlows - true to segment sub-flows, false to not segment them.
includeStartCodes - true to include start codes just before a break in the 'left' segment, false to put them in the next segment.
includeEndCodes - true to include end codes just before a break in the 'left' segment, false to put them in the next segment.
includeIsolatedCodes - true to include isolated codes just before a break in the 'left' segment, false to put them in the next segment.
oneSegmentIncludesAll - true to include everything (no trimming) when a text contains a single segment.
trimLeadingWS - true to trim leading white-spaces from the segments, false to keep them.
trimTrailingWS - true to trim trailing white-spaces from the segments, false to keep them.
void setSegmentSubFlows(boolean segmentSubFlows)
void setTrimCodes(boolean trimCodes)
void setTrimLeadingWS(boolean trimLeadingWS)
void setTrimTrailingWS(boolean trimTrailingWS)
void setTreatIsolatedCodesAsWhitespace(boolean treatIsolatedCodesAsWhitespace)
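The treatIsolatedCodesAsWhitespace option is described as treating each isolated code as a single whitespace character (U+0020) when applying segmentation. The sketch below illustrates why that can change where breaks are found. It rests on several assumptions that do not come from the library: an isolated code is modeled as a single placeholder character (U+E000), and the break rule is a toy regular expression, not a real SRX rule set.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class IsolatedCodeSketch {

    // Hypothetical placeholder for one isolated inline code (for example a line-break tag).
    static final char CODE = '\uE000';

    // Toy break rule: sentence-ending punctuation, one whitespace, then an uppercase letter.
    static final Pattern RULE = Pattern.compile("[.!?]\\s(?=\\p{Lu})");

    // Counts rule matches, optionally viewing each placeholder as U+0020.
    // The substitution is one character for one character, so positions are unchanged.
    static int countBreaks(String text, boolean treatIsolatedCodesAsWhitespace) {
        String view = treatIsolatedCodesAsWhitespace ? text.replace(CODE, ' ') : text;
        Matcher m = RULE.matcher(view);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "First line." + CODE + "Second line.";
        System.out.println("codes as-is:         " + countBreaks(text, false));
        System.out.println("codes as whitespace: " + countBreaks(text, true));
    }
}
```

In this model, the rule finds no break while the placeholder sits between the period and the next sentence, and finds one once the placeholder is viewed as a space.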
Copyright © 2021. All rights reserved.