public class TextFragment extends Object implements Appendable, CharSequence, Comparable<TextFragment>
The model uses two objects to store the data: a coded text string and a list of
Code objects.
The coded text string is composed of normal characters and markers.
A marker is a sequence of two special characters in the Unicode Private Use
Area (PUA). The first character indicates the type of the underlying code
(opening, closing, or isolated). The second is an index pointing to the
corresponding Code object, where more information can be found. The index
value is encoded as a Unicode PUA character. Use the toChar(int) and
toIndex(char) methods to encode and decode the index value.
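The following is a minimal sketch of the marker layout described above. It is written in plain Java so it runs without the Okapi library. The class MarkerCodec and its marker() helper are hypothetical.

The constant values (0xE101 to 0xE103 for the three marker types, 0xE110 for the index base) are assumptions for illustration. This page names MARKER_OPENING, MARKER_CLOSING, MARKER_ISOLATED, and CHARBASE but does not give their values. The sketch also types the markers as char for convenience, while the page lists them as int.

```java
// Hypothetical sketch of the two-character marker encoding; not Okapi code.
// All constant values below are assumptions for illustration.
final class MarkerCodec {
    static final char MARKER_OPENING = '\uE101';  // assumed value
    static final char MARKER_CLOSING = '\uE102';  // assumed value
    static final char MARKER_ISOLATED = '\uE103'; // assumed value
    static final int CHARBASE = 0xE110;           // assumed value

    // Encodes a code index as a single PUA character (the role of toChar(int)).
    static char toChar(int index) {
        return (char) (index + CHARBASE);
    }

    // Decodes the index character of a marker (the role of toIndex(char)).
    static int toIndex(char indexAsChar) {
        return indexAsChar - CHARBASE;
    }

    // Builds a marker: the tag-type character followed by the index character.
    static String marker(char markerType, int index) {
        return new String(new char[] { markerType, toChar(index) });
    }

    public static void main(String[] args) {
        // "Hello <b>world</b>" where code 0 is "<b>" and code 1 is "</b>"
        String coded = "Hello " + marker(MARKER_OPENING, 0) + "world" + marker(MARKER_CLOSING, 1);
        System.out.println(coded.length());           // 15
        System.out.println(toIndex(coded.charAt(7))); // 0
    }
}
```

Under these assumptions, "Hello <b>world</b>" is stored as 15 coded characters. That is 11 ordinary characters plus two 2-character markers, whatever the length of the code data.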
To get the coded text of a TextFragment object, use getCodedText(); to get
its list of codes, use getCodes().
You can modify the coded text or the codes directly and re-apply them to the
TextFragment object using setCodedText(String) and
setCodedText(String, List).
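setCodedText() is documented to throw InvalidContentException when the coded text does not correspond to the codes, so it helps to see what such a check involves. The sketch below is a hypothetical stand-in: CodedTextCheck is not an Okapi class, and the marker values are the same assumptions as elsewhere on this page.

It does three things:
- collects the code indices that markers refer to;
- rejects dangling or out-of-range markers;
- reports the codes that no marker uses, which are the codes the allowCodeDeletion flag is described as covering.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical check modeled on the setCodedText() contract; not Okapi code.
final class CodedTextCheck {
    static final int CHARBASE = 0xE110; // assumed index base

    static boolean isMarker(char ch) {
        return ch == '\uE101' || ch == '\uE102' || ch == '\uE103'; // assumed values
    }

    /** Returns the code indices used by markers; throws if a marker is malformed. */
    static List<Integer> usedIndices(String codedText, int codeCount) {
        List<Integer> used = new ArrayList<>();
        for (int i = 0; i < codedText.length(); i++) {
            if (!isMarker(codedText.charAt(i))) continue;
            if (i + 1 >= codedText.length()) {
                throw new IllegalArgumentException("Marker without index at " + i);
            }
            int index = codedText.charAt(++i) - CHARBASE; // also skips the index char
            if (index < 0 || index >= codeCount) {
                throw new IllegalArgumentException("No code for index " + index);
            }
            used.add(index);
        }
        return used;
    }

    /** Returns the indices of codes that no marker refers to. */
    static List<Integer> unusedIndices(String codedText, int codeCount) {
        List<Integer> used = usedIndices(codedText, codeCount);
        List<Integer> unused = new ArrayList<>();
        for (int i = 0; i < codeCount; i++) {
            if (!used.contains(i)) unused.add(i);
        }
        return unused;
    }
}
```

In this sketch, a non-empty result from unusedIndices() is where a strict setter would reject the text and a permissive one would drop those codes.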
You can add a code to the coded text in two ways:
- append a new code with append(TagType, String, String);
- change a section of the coded text into a code with changeToCode(int, int, TagType, String).
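To make the two-part model concrete, here is a hypothetical, much simplified stand-in for appending codes. MiniFragment, MiniCode, and the PLACEHOLDER constant are illustrative names, not the Okapi API. The rule that a closing code reuses the ID of the latest open code of the same type is an assumption of this sketch.

The constructor follows the documented lastCodeId rule: the first new code gets lastCodeId + 1, and values below -1 are reset to -1.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical, simplified model of coded text plus a code list.
// NOT the Okapi TextFragment; marker values are assumptions.
final class MiniFragment {
    enum TagType { OPENING, CLOSING, PLACEHOLDER }

    record MiniCode(TagType tagType, int id, String type, String data) {}

    private static final int CHARBASE = 0xE110; // assumed index base
    private final StringBuilder text = new StringBuilder();
    private final List<MiniCode> codes = new ArrayList<>();
    private final Deque<MiniCode> open = new ArrayDeque<>(); // opening codes not yet closed
    private int lastCodeId;

    MiniFragment(int lastCodeId) {
        this.lastCodeId = Math.max(lastCodeId, -1); // values below -1 are reset to -1
    }

    MiniFragment append(String plain) {
        text.append(plain);
        return this;
    }

    MiniCode append(TagType tagType, String type, String data) {
        MiniCode match = null;
        if (tagType == TagType.CLOSING) {
            // Assumption: pair with the most recent open code of the same type.
            for (MiniCode candidate : open) {
                if (candidate.type().equals(type)) { match = candidate; break; }
            }
        }
        int id;
        if (match != null) {
            open.remove(match);
            id = match.id();
        } else {
            id = ++lastCodeId; // the first new code gets lastCodeId + 1
        }
        MiniCode code = new MiniCode(tagType, id, type, data);
        if (tagType == TagType.OPENING) open.push(code);
        // Two-character marker: tag-type character + index of the code in the list.
        text.append(markerFor(tagType)).append((char) (codes.size() + CHARBASE));
        codes.add(code);
        return code;
    }

    private static char markerFor(TagType tagType) {
        switch (tagType) {
            case OPENING: return '\uE101'; // assumed value
            case CLOSING: return '\uE102'; // assumed value
            default: return '\uE103';      // isolated; assumed value
        }
    }

    String getCodedText() { return text.toString(); }

    List<MiniCode> getCodes() { return codes; }

    // Expands each marker into its code's data, in the spirit of toText().
    String toText() {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char ch = text.charAt(i);
            if (ch >= '\uE101' && ch <= '\uE103') {
                out.append(codes.get(text.charAt(++i) - CHARBASE).data());
            } else {
                out.append(ch);
            }
        }
        return out.toString();
    }
}
```

Consider this sequence of calls: append "Hello ", an opening "bold" code with data "<b>", "world", and a closing "bold" code with data "</b>". The result is a 15-character coded text, two codes that share ID 0, and "Hello <b>world</b>" from toText().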
| Modifier and Type | Class | Description |
|---|---|---|
| static class | TextFragment.CompareMode | Enum constants to specify how compareTo should work. |
| static class | TextFragment.Marker | List of the marker types as an Enum. |
| static class | TextFragment.TagType | List of the types of tag usable for in-line codes. |
| Modifier and Type | Field | Description |
|---|---|---|
| static int | CHARBASE | Special value used as the base of inline code indices. |
| protected List<Code> | codes | List of the inline codes for this fragment. |
| protected boolean | isBalanced | Flag indicating whether the opening/closing inline codes of this fragment have been balanced. |
| protected int | lastCodeID | Value of the last inline code ID in this fragment. |
| static int | MARKER_CLOSING | Special character marker for a closing inline code. |
| static int | MARKER_ISOLATED | Special character marker for an isolated inline code. |
| static int | MARKER_OPENING | Special character marker for an opening inline code. |
| static Pattern | MARKERS_REGEX | |
| static String | REFMARKER_END | Marker for end of reference. |
| static String | REFMARKER_SEP | Marker for reference separator. |
| static String | REFMARKER_START | Marker for start of reference. |
| protected StringBuilder | text | Coded text buffer of this fragment. |
| Constructor | Description |
|---|---|
| TextFragment() | Creates an empty TextFragment. |
| TextFragment(String text) | Creates a TextFragment with a given text. |
| TextFragment(String text, int lastCodeId) | Creates a TextFragment with a given text and an initial ID value for codes. |
| TextFragment(String codedText, List<Code> codes) | Creates a TextFragment whose content is made of a given coded text and a list of codes. |
| TextFragment(TextFragment fragment) | Creates a TextFragment with the content of a given TextFragment. |
| Modifier and Type | Method | Description |
|---|---|---|
| void | alignCodeIds(TextFragment base) | |
| void | alignCodeIds(TextFragment base, CodeMatchStrategy strategy) | Aligns the code IDs of this fragment with the ones of a given fragment. |
| int | annotate(int start, int end, String type, InlineAnnotation annotation) | Annotates a section of this text. |
| TextFragment | append(char value) | Appends a character to the fragment. |
| TextFragment | append(CharSequence csq) | Appends the specified character sequence to this fragment. |
| void | append(CharSequence text, Function<Code,Code> codeProcessor) | Appends a CharSequence. |
| TextFragment | append(CharSequence csq, int start, int end) | Appends a subsequence of the specified character sequence to this fragment. |
| TextFragment | append(Code code) | Appends an existing code to this fragment. |
| TextFragment | append(String text) | Appends a string to the fragment. |
| void | append(String text, Function<Code,Code> codeProcessor) | Appends a string. |
| Code | append(TextFragment.TagType tagType, String type, InlineAnnotation annotation) | Appends an annotation-type code to this text. |
| Code | append(TextFragment.TagType tagType, String type, String data) | Appends a new code to the text. |
| Code | append(TextFragment.TagType tagType, String type, String data, int id) | Appends a new code to the text, when the code has a defined identifier. |
| TextFragment | append(TextFragment fragment) | Appends a TextFragment object to this fragment. |
| TextFragment | append(TextFragment fragment, boolean keepCodeIds) | Appends a TextFragment object to this fragment. |
| void | balanceMarkers() | Balances the markers based on the tag type of the codes. |
| int | changeToCode(int start, int end, TextFragment.TagType tagType, String type) | Changes a section of the coded text into a single code. |
| int | changeToCode(int start, int end, TextFragment.TagType tagType, String type, boolean setDisplayText) | Changes a section of the coded text into a single code. |
| char | charAt(int index) | Returns the character at the specified index in the coded text of this fragment. |
| TextFragment | cleanCodes() | Removes all codes, both in the codes list and the markers. |
| void | cleanUnusedCodes() | Removes all codes that have no data and no annotation. |
| void | clear() | Clears the fragment of all content. |
| TextFragment | clone() | Clones this TextFragment. |
| void | collapseWhitespace() | Collapses all whitespace to a single space character. |
| int | compareTo(TextFragment tf) | Compares an object with this TextFragment. |
| int | compareTo(TextFragment frag, boolean codeSensitive) | Deprecated. Use compareTo(TextFragment, CompareMode) instead. |
| int | compareTo(TextFragment frag, TextFragment.CompareMode compMode) | Compares with another TextFragment. |
| boolean | equals(Object object) | |
| int | findClosingCodePosition(int id, int indexOfOpening) | Finds the position in this coded text of the closing code for a given opening code. |
| int | findOpeningCodePosition(int id, int indexOfClosing) | Finds the position in this coded text of the opening code for a given closing code. |
| static int | fromFragmentToString(TextFragment frag, int pos) | Converts a position in a fragment into the corresponding position in the fragment's string representation. |
| List<AnnotatedSpan> | getAnnotatedSpans(String type) | Gets the list of all spans of text annotated with a given type of annotation. |
| List<Code> | getClonedCodes() | Gets a list of copies of the codes for this fragment. |
| Code | getCode(char indexAsChar) | Gets the code for a given index formatted as a character (the second special character in a marker in a coded text string). |
| Code | getCode(Code fc) | Finds the first code with a given ID and tag type in this fragment, or null if there is no such code. |
| Code | getCode(int index) | Gets the code for a given index. |
| String | getCodedText() | Gets the coded text representation of the fragment. |
| String | getCodedText(int start, int end) | Gets the portion of coded text for a given section of the coded text. |
| int | getCodePosition(int index) | |
| List<Code> | getCodes() | Gets the list of all codes for the fragment. |
| List<Code> | getCodes(int start, int end) | Gets a copy of the list of the codes that are within a given section of coded text. |
| int | getIndex(int id) | Gets the index value for the first in-line code (in the codes list) with a given identifier. |
| int | getIndexForClosing(int id) | Gets the index value for the closing in-line code (in the codes list) with a given identifier. |
| Code | getLastCode() | Returns the last code appended to this fragment, or null if there are no codes. |
| int | getLastCodeId() | Gets the last value used for a code ID. |
| static Object[] | getRefMarker(StringBuilder text) | Helper method to retrieve a reference marker from a string. |
| String | getText() | Gets the text of the fragment (all codes are removed). |
| static String | getText(String codedText) | Helper method that takes a coded string and returns a text-only version. |
| boolean | hasAnnotation() | Indicates if this text has at least one annotation. |
| boolean | hasAnnotation(String type) | Indicates if this text has at least one annotation of a given type. |
| boolean | hasCode() | Indicates if the fragment contains at least one code. |
| int | hashCode() | |
| boolean | hasReference() | Indicates if this TextFragment contains any in-line code with a reference. |
| boolean | hasText() | Indicates if this fragment contains at least one character other than whitespace. |
| boolean | hasText(boolean whiteSpacesAreText) | Indicates if this fragment contains at least one character (inline codes, segment markers, and annotation markers do not count as characters). |
| static int | indexOfFirstNonWhitespace(String codedText, int fromIndex, int untilIndex, boolean openingMarkerIsWS, boolean closingMarkerIsWS, boolean isolatedMarkerIsWS, boolean whitespaceIsWS) | Helper method to find the first non-whitespace character of a coded text, starting at a given position and going no farther than another given position. |
| static int | indexOfLastNonWhitespace(String codedText, int fromIndex, int untilIndex, boolean openingMarkerIsWS, boolean closingMarkerIsWS, boolean isolatedMarkerIsWS, boolean whitespaceIsWS) | Helper method to find, scanning backward, the last non-whitespace character of a coded text, starting at a given position and going no farther than another given position. |
| void | insert(int offset, Code code) | Inserts a Code object into this fragment. |
| void | insert(int offset, String str) | Inserts a String object into this fragment. |
| void | insert(int offset, TextFragment fragment) | Inserts a TextFragment object into this fragment. |
| void | insert(int offset, TextFragment fragment, boolean keepCodeIds) | Inserts a TextFragment object into this fragment. |
| void | invalidate() | Sets the fragment in a state where it has to be re-balanced before being used for output. |
| boolean | isEmpty() | Indicates if the fragment is empty (no text and no codes). |
| static boolean | isMarker(char ch) | Helper method that checks if a given character is an inline code marker. |
| int | length() | Returns the number of characters in the coded text of this fragment. |
| void | ltrim() | Removes leading whitespace from this fragment. |
| static String | makeRefMarker(String id) | Helper method to build a reference marker string from a given identifier. |
| static String | makeRefMarker(String id, String propertyName) | Helper method to build a reference marker string from a given identifier and a property name. |
| int | minimumIdValue() | Returns the smallest ID value. |
| void | remove(int start, int end) | Removes a section of the fragment (including its codes). |
| void | removeAnnotations() | Removes all annotations in this text. |
| void | removeAnnotations(String type) | Removes all annotations of a given type in this text. |
| void | removeCode(Code code) | Removes the Code from this fragment. |
| int | renumberCodes() | Renumbers the IDs of the codes in the fragment. |
| int | renumberCodes(int idBase) | Re-assigns IDs of the codes in this fragment in sequential order, starting from a given base. |
| int | renumberCodes(int idBase, boolean mindPosition) | Re-assigns IDs of the codes in this fragment in sequential order, starting from a given base. |
| void | rtrim() | Removes trailing whitespace from this fragment. |
| void | setCodedText(String newCodedText) | Sets the coded text of the fragment, using its existing codes. |
| void | setCodedText(String newCodedText, boolean allowCodeDeletion) | Sets the coded text of the fragment, using its existing codes. |
| void | setCodedText(String newCodedText, List<Code> newCodes) | Sets the coded text of the fragment and its corresponding codes. |
| void | setCodedText(String newCodedText, List<Code> newCodes, boolean allowCodeDeletion) | Sets the coded text of the fragment and its corresponding codes. |
| protected void | setCodes(List<Code> codes) | |
| TextFragment | subSequence(int start, int end) | Gets a copy of a sub-sequence of this object. |
| static char | toChar(int index) | Helper method to convert a marker index to its character value in the coded text string. |
| static int | toIndex(char index) | Helper method to convert the index-coded-as-character part of a marker into its index value. |
| String | toString() | Gets the coded text for this fragment. |
| String | toText() | Returns the content of this fragment, including the original codes whenever possible. |
| void | trim() | Trims whitespace from the beginning and the end of this fragment. |
| static void | unwrap(TextFragment frag) | Unwraps the content of a TextFragment. |
Methods inherited from class Object: finalize, getClass, notify, notifyAll, wait, wait, wait
Methods inherited from interface CharSequence: chars, codePoints

public static final int MARKER_OPENING
public static final int MARKER_CLOSING
public static final int MARKER_ISOLATED
public static final int CHARBASE
public static final String REFMARKER_START
public static final String REFMARKER_END
public static final String REFMARKER_SEP
public static final Pattern MARKERS_REGEX
protected StringBuilder text
protected boolean isBalanced
protected int lastCodeID
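The page gives no value for MARKERS_REGEX. The helpers below assume it matches one marker-type character followed by its index character. MarkerText is a hypothetical class that shows how isMarker(char) and the static getText(String) could strip markers from a coded string, once with a regular expression and once with a character scan. The marker values are assumptions.

```java
import java.util.regex.Pattern;

// Hypothetical helpers in the spirit of MARKERS_REGEX, isMarker(char), and
// getText(String); marker values and the pattern are assumptions.
final class MarkerText {
    // Assumed pattern: a marker-type character followed by any index character.
    static final Pattern MARKERS = Pattern.compile("[\uE101\uE102\uE103].");

    static boolean isMarker(char ch) {
        return ch == '\uE101' || ch == '\uE102' || ch == '\uE103';
    }

    // Removes every two-character marker using the regular expression.
    static String getText(String codedText) {
        return MARKERS.matcher(codedText).replaceAll("");
    }

    // Same result with a scan: skip each marker and the index character after it.
    static String getTextByScan(String codedText) {
        StringBuilder sb = new StringBuilder(codedText.length());
        for (int i = 0; i < codedText.length(); i++) {
            char ch = codedText.charAt(i);
            if (isMarker(ch)) {
                i++; // skip the index character too
                continue;
            }
            sb.append(ch);
        }
        return sb.toString();
    }
}
```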
public TextFragment()
public TextFragment(String text)
Parameters:
text - the text to use.

public TextFragment(String text, int lastCodeId)
Parameters:
text - the text to use.
lastCodeId - value to use to start the code ID. The first new code will have
this value + 1 as its ID. The value should be -1 or a positive number;
values below -1 are automatically reset to -1.

public TextFragment(TextFragment fragment)
Parameters:
fragment - the content to use.

public static char toChar(int index)
Parameters:
index - the index value to encode.

public static int toIndex(char index)
Parameters:
index - the character to decode.

public static String makeRefMarker(String id)
Parameters:
id - the identifier to use.

public static String makeRefMarker(String id, String propertyName)
Parameters:
id - the identifier to use.
propertyName - the name of the property to use.

public static Object[] getRefMarker(StringBuilder text)
Parameters:
text - the text to search for a reference marker.

public static int fromFragmentToString(TextFragment frag, int pos)
For example, if you find a match in a coded text string, use this method to convert the boundaries of the match into character positions in the string representation of the fragment. Here "xx" and "yy" stand for two-character markers: position 4 in "xxyyMATCHyyxx" becomes position 6 in "{b}{i}MATCH{/i}{/b}".
Parameters:
frag - the fragment where the position is located.
pos - the position.

public static int indexOfLastNonWhitespace(String codedText, int fromIndex, int untilIndex, boolean openingMarkerIsWS, boolean closingMarkerIsWS, boolean isolatedMarkerIsWS, boolean whitespaceIsWS)
Parameters:
codedText - the coded text to process.
fromIndex - the first position to check (must be greater than or equal to untilIndex). Use -1 to point to the last position of the text.
untilIndex - the last position to check (must be less than or equal to fromIndex).
openingMarkerIsWS - indicates if opening markers count as whitespace.
closingMarkerIsWS - indicates if closing markers count as whitespace.
isolatedMarkerIsWS - indicates if isolated markers count as whitespace.
whitespaceIsWS - indicates if whitespace characters count as whitespace.

public static int indexOfFirstNonWhitespace(String codedText, int fromIndex, int untilIndex, boolean openingMarkerIsWS, boolean closingMarkerIsWS, boolean isolatedMarkerIsWS, boolean whitespaceIsWS)
Parameters:
codedText - the coded text to process.
fromIndex - the first position to check (must be less than or equal to untilIndex).
untilIndex - the last position to check (must be greater than or equal to fromIndex). Use -1 to point to the last position of the text.
openingMarkerIsWS - indicates if opening markers count as whitespace.
closingMarkerIsWS - indicates if closing markers count as whitespace.
isolatedMarkerIsWS - indicates if isolated markers count as whitespace.
whitespaceIsWS - indicates if whitespace characters count as whitespace.

public static void unwrap(TextFragment frag)
Parameters:
frag - the text fragment to unwrap.

public static boolean isMarker(char ch)
Parameters:
ch - the character to check.

public static String getText(String codedText)
Parameters:
codedText - string with possible TextFragment codes.

public TextFragment clone()
public boolean hasReference()
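The fromFragmentToString() example above ("xxyyMATCHyyxx" to "{b}{i}MATCH{/i}{/b}") comes down to one rule. Each 2-character marker before the position is replaced by the length of its code's text.

The sketch below applies that rule to a plain coded string and a list of code strings. PositionMapper is hypothetical, it assumes each code expands to its data string, and the marker values are assumptions.

```java
import java.util.List;

// Hypothetical re-implementation of the documented position mapping; not Okapi code.
final class PositionMapper {
    static final int CHARBASE = 0xE110; // assumed index base

    static boolean isMarker(char ch) {
        return ch >= '\uE101' && ch <= '\uE103'; // assumed marker characters
    }

    /**
     * Maps a position in the coded text to the matching position in the
     * expanded string, assuming each code expands to its data string.
     */
    static int toStringPosition(String codedText, List<String> codeData, int pos) {
        int result = 0;
        for (int i = 0; i < pos; i++) {
            char ch = codedText.charAt(i);
            if (isMarker(ch)) {
                // 2 coded characters become the full text of the code.
                result += codeData.get(codedText.charAt(i + 1) - CHARBASE).length();
                i++; // skip the index character
            } else {
                result++;
            }
        }
        return result;
    }
}
```

With the codes "{b}", "{i}", "{/i}", and "{/b}", the start of "MATCH" (coded position 4) maps to position 6, as in the example.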
public TextFragment append(String text)
Parameters:
text - the string to append.

public void append(String text, Function<Code,Code> codeProcessor)
Parameters:
text - the string to append.
codeProcessor - when a Code is generated to mask an Okapi marker, this
function is called on it and can modify or replace the generated code.

public void append(CharSequence text, Function<Code,Code> codeProcessor)
Parameters:
text - the character sequence to append.
codeProcessor - when a Code is generated to mask an Okapi marker, this
function is called on it and can modify or replace the generated code.

public TextFragment append(TextFragment fragment)
Parameters:
fragment - the TextFragment to append.

public TextFragment append(TextFragment fragment, boolean keepCodeIds)
Parameters:
fragment - the TextFragment to append.
keepCodeIds - if true, do not renumber the code IDs (Code.id).

public TextFragment append(Code code)
Parameters:
code - the existing code to append.

public Code append(TextFragment.TagType tagType, String type, InlineAnnotation annotation)
Parameters:
tagType - the tag type of the code (e.g. TagType.OPENING).
type - the type of the annotation (e.g. "protected").
annotation - the annotation to add (can be null).

public Code append(TextFragment.TagType tagType, String type, String data)
Parameters:
tagType - the tag type of the code (e.g. TagType.OPENING).
type - the type of the code (e.g. "bold").
data - the raw code itself (e.g. "<b>").

public Code append(TextFragment.TagType tagType, String type, String data, int id)
Parameters:
tagType - the tag type of the code (e.g. TagType.OPENING).
type - the type of the code (e.g. "bold").
data - the raw code itself (e.g. "<b>").
id - the identifier to use for this code.

public void insert(int offset,
String str)
Inserts a String object into this fragment.
Parameters:
offset - position in the coded text where to insert the new String. You
can use -1 to append at the end of the current content.
str - String to insert.
Throws:
InvalidPositionException - when offset points inside a marker.

public void insert(int offset,
Code code)
Inserts a Code object into this fragment.
Parameters:
offset - position in the coded text where to insert the new Code. You
can use -1 to append at the end of the current content.
code - Code to insert.
Throws:
InvalidPositionException - when offset points inside a marker.

public void insert(int offset,
TextFragment fragment)
Parameters:
offset - position in the coded text where to insert the new fragment.
You can use -1 to append at the end of the current content.
fragment - the TextFragment to insert.
Throws:
InvalidPositionException - when offset points inside a marker.

public void insert(int offset,
TextFragment fragment,
boolean keepCodeIds)
Parameters:
offset - position in the coded text where to insert the new fragment.
You can use -1 to append at the end of the current content.
fragment - the TextFragment to insert.
keepCodeIds - true to keep the IDs of the codes of the inserted
TextFragment unchanged.

public void clear()
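Like most position-based methods on this page, the insert methods above throw InvalidPositionException when a position points inside a marker. A position is inside a marker when it falls between a marker character and its index character, so looking one character back is enough.

The hypothetical PositionCheck below mirrors that rule, including the -1 shortcut for the end of the content. It throws IllegalArgumentException because InvalidPositionException is an Okapi type not reproduced here, and the marker values are assumptions.

```java
// Hypothetical check mirroring the documented InvalidPositionException rule;
// marker values are assumptions.
final class PositionCheck {
    static boolean isMarker(char ch) {
        return ch >= '\uE101' && ch <= '\uE103'; // assumed marker characters
    }

    /** Resolves -1 to the end of the text, then rejects positions inside a marker. */
    static int checkPosition(String codedText, int pos) {
        int p = (pos == -1) ? codedText.length() : pos;
        if (p < 0 || p > codedText.length()) {
            throw new IndexOutOfBoundsException("Position out of range: " + pos);
        }
        // If the previous character is a marker, p points at its index character.
        if (p > 0 && isMarker(codedText.charAt(p - 1))) {
            throw new IllegalArgumentException("Position " + p + " is inside a marker");
        }
        return p;
    }
}
```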
public void trim()
public void ltrim()
public void rtrim()
public void collapseWhitespace()
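trim(), ltrim(), and rtrim() have to skip whitespace without splitting a marker. The indexOfFirstNonWhitespace() and indexOfLastNonWhitespace() helpers documented above do that job.

The sketch below is a simplified, hypothetical version. It folds the three marker flags into a single markersAreWs flag and always treats whitespace characters as whitespace. WhitespaceScan is an illustrative name, and the marker values are assumptions.

```java
// Hypothetical, simplified version of the first/last non-whitespace helpers.
// One flag replaces the three per-marker flags; marker values are assumptions.
final class WhitespaceScan {
    static boolean isMarker(char ch) {
        return ch >= '\uE101' && ch <= '\uE103';
    }

    static int firstNonWs(String coded, boolean markersAreWs) {
        for (int i = 0; i < coded.length(); i++) {
            char ch = coded.charAt(i);
            if (isMarker(ch)) {
                if (!markersAreWs) return i;
                i++; // skip the index character too
            } else if (!Character.isWhitespace(ch)) {
                return i;
            }
        }
        return -1; // nothing but whitespace (and skipped markers)
    }

    static int lastNonWs(String coded, boolean markersAreWs) {
        for (int i = coded.length() - 1; i >= 0; i--) {
            if (i > 0 && isMarker(coded.charAt(i - 1))) { // i is an index character
                if (!markersAreWs) return i;
                i--; // skip the marker character too
            } else if (!Character.isWhitespace(coded.charAt(i))) {
                return i;
            }
        }
        return -1;
    }
}
```

Take markersAreWs as false and a text that is not all whitespace. Then coded.substring(firstNonWs(coded, false), lastNonWs(coded, false) + 1) trims the surrounding spaces but keeps any leading or trailing codes whole.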
public String getText()
public String getCodedText()
public void setCodedText(String newCodedText)
Parameters:
newCodedText - the coded text to apply.
Throws:
InvalidContentException - when the coded text is not valid, or does not
correspond to the existing codes.

public String getCodedText(int start, int end)
Parameters:
start - the position of the first character or marker of the section (in
the coded text representation).
end - the position just after the last character or marker of the
section (in the coded text representation). Use -1 to end the section
at the end of the fragment.
Throws:
InvalidPositionException - when start or end points inside a marker.

public Code getCode(char indexAsChar)
Parameters:
indexAsChar - the index value coded as a character.

public Code getCode(int index)
Parameters:
index - the index of the code.

public List<Code> getCodes()
public List<Code> getClonedCodes()
public List<Code> getCodes(int start, int end)
Parameters:
start - the position of the first character or marker of the section (in
the coded text representation).
end - the position just after the last character or marker of the
section (in the coded text representation).
Throws:
InvalidPositionException - when start or end points inside a marker.

public int getIndex(int id)
Parameters:
id - the identifier to look for.

public int getIndexForClosing(int id)
Parameters:
id - the identifier of the closing tag to look for.

public boolean isEmpty()
public boolean hasText()
public boolean hasText(boolean whiteSpacesAreText)
Parameters:
whiteSpacesAreText - indicates whether whitespace should be considered as
characters for the purpose of checking if this fragment is empty.

public boolean hasCode()
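The hasText(boolean) rule above can be sketched as a single scan over the coded text: codes never count as characters, and whitespace counts only when requested. TextPresence is hypothetical, the marker values are assumptions, and only inline-code markers are modeled (not segment or annotation markers). From the summary descriptions, hasText(coded, false) appears to correspond to the no-argument hasText().

```java
// Hypothetical sketch of the hasText(boolean) rule; marker values are assumptions.
final class TextPresence {
    static boolean isMarker(char ch) {
        return ch >= '\uE101' && ch <= '\uE103';
    }

    static boolean hasText(String coded, boolean whiteSpacesAreText) {
        for (int i = 0; i < coded.length(); i++) {
            char ch = coded.charAt(i);
            if (isMarker(ch)) {
                i++; // a marker and its index character are not text
                continue;
            }
            if (whiteSpacesAreText || !Character.isWhitespace(ch)) return true;
        }
        return false;
    }
}
```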
public void remove(int start,
int end)
Parameters:
start - the position of the first character or marker of the section (in
the coded text representation).
end - the position just after the last character or marker of the
section (in the coded text representation). You can use -1 to indicate
the end of the fragment.
Throws:
InvalidPositionException - when start or end points inside a marker.

public TextFragment subSequence(int start, int end)
Specified by:
subSequence in interface CharSequence
Parameters:
start - the position of the first character or marker of the section (in
the coded text representation).
end - the position just after the last character or marker of the
section (in the coded text representation). Use -1 to end the section
at the end of the fragment.

public void setCodedText(String newCodedText, boolean allowCodeDeletion)
Parameters:
newCodedText - the coded text to apply.
allowCodeDeletion - true when in-line codes missing from the coded text
mean the corresponding codes should be deleted from the fragment.
Throws:
InvalidContentException - when the coded text is not valid, or does not
correspond to the existing codes.

public void setCodedText(String newCodedText, List<Code> newCodes)
Parameters:
newCodedText - the coded text to apply.
newCodes - the list of the corresponding codes.
Throws:
InvalidContentException - when the coded text is not valid or does not
correspond to the new codes.

public void setCodedText(String newCodedText, List<Code> newCodes, boolean allowCodeDeletion)
Parameters:
newCodedText - the coded text to apply.
newCodes - the list of the corresponding codes.
allowCodeDeletion - true when in-line codes missing from the coded text
mean the corresponding codes should be deleted from the fragment.
Throws:
InvalidContentException - when the coded text is not valid or does not
correspond to the new codes.

public String toString()
Gets the coded text for this fragment. This is the same as calling
getCodedText().
Each code is represented by a placeholder made of two special characters. To
get the content with the codes expanded as their original data, use
toText().
Specified by:
toString in interface CharSequence
Overrides:
toString in class Object

public String toText()
To get the coded text instead, use getCodedText() or toString().

public int compareTo(TextFragment tf)
Same as calling compareTo(tf, CompareMode.IGNORE_CODE).
Specified by:
compareTo in interface Comparable<TextFragment>
Parameters:
tf - the object to compare with this TextFragment.

@Deprecated public int compareTo(TextFragment frag, boolean codeSensitive)
Parameters:
frag - the TextFragment to compare with this one.
codeSensitive - true if the codes' positions and contents need to be compared as well.

public int compareTo(TextFragment frag, TextFragment.CompareMode compMode)
Caveat #1:
The current implementation assumes that code indexes are in the normal ascending order in the coded text.
For example, if
tf1.text="ABC", tf1.codes={{tagType:OPENING,id:1,data:"<em>"}, {tagType:CLOSING,id:1,data:"</em>"}}
and
tf2.text="ABC", tf2.codes={{tagType:CLOSING,id:1,data:"</em>"}, {tagType:OPENING,id:1,data:"<em>"}}
tf1.equals(tf2) returns false in all comparison modes, although they are semantically equal.
Parameters:
frag - the TextFragment to compare with this one.
compMode - the comparison mode to use.

public int changeToCode(int start,
int end,
TextFragment.TagType tagType,
String type)
Parameters:
start - the position of the first character or marker of the section
(in the coded text representation).
end - the position just after the last character or marker of the
section (in the coded text representation).
tagType - the tag type of the new code.
type - the type of the new code.
Throws:
InvalidPositionException - when start or end points inside a marker.

public int changeToCode(int start,
int end,
TextFragment.TagType tagType,
String type,
boolean setDisplayText)
Parameters:
start - the position of the first character or marker of the
section (in the coded text representation).
end - the position just after the last character or marker of
the section (in the coded text representation).
tagType - the tag type of the new code.
type - the type of the new code.
setDisplayText - if true, set the subsequence (sub) as the displayText of
the code.
Throws:
InvalidPositionException - when start or end points inside a marker.

public int findClosingCodePosition(int id,
int indexOfOpening)
Parameters:
id - identifier of the opening code.
indexOfOpening - index of the opening code.

public int findOpeningCodePosition(int id,
int indexOfClosing)
Parameters:
id - identifier of the opening code.
indexOfClosing - index of the closing code.

public int annotate(int start,
int end,
String type,
InlineAnnotation annotation)
Parameters:
start - the position of the first character or marker of the
section to annotate (in the coded text representation).
end - the position just after the last character or marker of the
section to annotate (in the coded text representation).
type - the type of annotation to set.
annotation - the annotation to set (can be null).
Throws:
InvalidPositionException - when start or end points inside a marker.

public void removeAnnotations()
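The sketch below illustrates the idea behind changeToCode(): a span of coded text is replaced by one marker, and the span's text becomes the new code's data.

ChangeToCodeSketch is hypothetical and deliberately narrow:
- only an isolated code is created;
- spans that already contain markers are rejected;
- positions inside markers are not checked;
- the marker values are assumptions.

It does not model the method's int return value, which this page does not describe.

```java
import java.util.List;

// Hypothetical sketch of turning a span of coded text into one isolated code.
// Marker values are assumptions; this is not the Okapi implementation.
final class ChangeToCodeSketch {
    static final int CHARBASE = 0xE110;           // assumed index base
    static final char MARKER_ISOLATED = '\uE103'; // assumed value

    static boolean isMarker(char ch) {
        return ch >= '\uE101' && ch <= '\uE103';
    }

    /** Replaces coded[start, end) with a marker; the span text becomes the code data. */
    static String changeToCode(String coded, List<String> codeData, int start, int end) {
        String span = coded.substring(start, end);
        for (char ch : span.toCharArray()) {
            if (isMarker(ch)) {
                throw new IllegalArgumentException("This sketch does not handle spans with codes");
            }
        }
        int index = codeData.size();
        codeData.add(span); // the raw text becomes the new code's data
        String marker = new String(new char[] { MARKER_ISOLATED, (char) (index + CHARBASE) });
        return coded.substring(0, start) + marker + coded.substring(end);
    }
}
```

A typical use is protecting a variable such as "{0}" in "Press {0} now". The variable turns into a 2-character marker, so a translator cannot alter it.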
public void removeAnnotations(String type)
Parameters:
type - the type of annotation to remove.

public boolean hasAnnotation()
public boolean hasAnnotation(String type)
Parameters:
type - the type of annotation to look for.

public void cleanUnusedCodes()
public TextFragment cleanCodes()
Returns:
the TextFragment, with the codes removed.

public int getCodePosition(int index)
public List<AnnotatedSpan> getAnnotatedSpans(String type)
Parameters:
type - the type of annotation to look for.

public int renumberCodes()
public int renumberCodes(int idBase)
Parameters:
idBase - the base from which code IDs start numbering.

public int renumberCodes(int idBase,
boolean mindPosition)
Parameters:
idBase - the base from which code IDs start numbering.
mindPosition - if true, codes with lower positions in this text fragment
get lower IDs. If false, codes with lower original IDs are assigned
lower IDs.

public void removeCode(Code code)
Removes the Code from this fragment.
Parameters:
code - the Code to remove.

public void balanceMarkers()
Balances the markers based on the tag type of the codes. Call
invalidate() prior to calling this method.

public void alignCodeIds(TextFragment base, CodeMatchStrategy strategy)
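Aligns the code IDs of this fragment with the ones of a given fragment. This
is useful, for example, when the source is "%d equals %s" and the target is
"%s equals %d", where %s and %d are codes: you want the codes with the same
content to have matching IDs.
Parameters:
base - the fragment to use as the base for the synchronization.

public void alignCodeIds(TextFragment base)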
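The "%d equals %s" / "%s equals %d" case above can be sketched as follows: each code in the fragment takes the ID of the first not-yet-matched base code with identical data. IdAligner and SimpleCode are hypothetical. Matching on data alone is an assumption, standing in for whatever CodeMatchStrategy the real method uses.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: give each code the ID of a base code with the same data.
// Data-only matching is an assumption; the real method takes a CodeMatchStrategy.
final class IdAligner {
    record SimpleCode(int id, String data) {}

    static List<SimpleCode> align(List<SimpleCode> base, List<SimpleCode> target) {
        Set<Integer> usedBase = new HashSet<>(); // each base code is matched at most once
        List<SimpleCode> result = new ArrayList<>();
        for (SimpleCode code : target) {
            int newId = code.id(); // keep the old ID when no match is found
            for (int i = 0; i < base.size(); i++) {
                if (!usedBase.contains(i) && base.get(i).data().equals(code.data())) {
                    usedBase.add(i);
                    newId = base.get(i).id();
                    break;
                }
            }
            result.add(new SimpleCode(newId, code.data()));
        }
        return result;
    }
}
```

Codes without a match keep their old ID here, which can collide with a base ID. A fuller version would need a policy for that case.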
public TextFragment append(char value)
Specified by:
append in interface Appendable
Parameters:
value - the character to append.

public TextFragment append(CharSequence csq)
Specified by:
append in interface Appendable
Parameters:
csq - the character sequence to append. If the parameter is null, the
string "null" is appended.

public TextFragment append(CharSequence csq, int start, int end)
Specified by:
append in interface Appendable
Parameters:
csq - the character sequence to append. If csq is null, characters are
appended as if csq contained the string "null".
start - the index of the first character in the subsequence.
end - the index of the character following the last character in the
subsequence.

public char charAt(int index)
For example, if the fragment is "A[xy]B" and "[xy]" is a code, charAt(3) returns 'B', not 'y'.
If the specified index falls on a code placeholder, the character returned is
either a marker (first character of the placeholder) or a special index to
access the underlying code (second character of the placeholder). Markers can
be identified using isMarker(char).
Specified by:
charAt in interface CharSequence
Parameters:
index - the index of the character to be returned.
Throws:
IndexOutOfBoundsException - if the index argument is negative or
not less than the length of the coded text.
See Also:
isMarker(char)

public int length()
This is not the length of the content with all its codes. In the coded text, each code is represented by a placeholder made of two characters, regardless of the size of the code. For example, if the fragment is "A[xy]B" and "[xy]" is a code, length() returns 4, not 6.
To get the length of the content including codes, use toText().length().
Note that codes with references are not expanded by toText().
Specified by:
length in interface CharSequence
See Also:
toText()

public void invalidate()
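The difference between the coded view (charAt(), length()) and the expanded view (toText()) is easy to check with plain strings. This hypothetical snippet builds the "A[xy]B" example by hand, with an assumed marker value and index base. It also assumes the code expands to its data "[xy]".

```java
// Hypothetical illustration of coded length versus expanded length.
// The marker value (0xE103) and index base (0xE110) are assumptions.
final class CodedLengthDemo {
    static final String CODE_DATA = "[xy]";
    // Coded text: 'A', an isolated-code marker, the index character of code 0, 'B'.
    static final String CODED = "A" + '\uE103' + (char) 0xE110 + "B";
    // What toText() would produce if code 0 expands to its data.
    static final String EXPANDED = "A" + CODE_DATA + "B";

    public static void main(String[] args) {
        System.out.println(CODED.length());     // 4
        System.out.println(EXPANDED.length());  // 6
        System.out.println(CODED.charAt(3));    // B
        System.out.println(EXPANDED.charAt(3)); // y
    }
}
```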
public int getLastCodeId()
public Code getLastCode()
public Code getCode(Code fc)
Parameters:
fc - the Code to look for.

public int minimumIdValue()
Copyright © 2022. All rights reserved.