Package net.sf.okapi.common.resource
Class TextUnitUtil
java.lang.Object
    net.sf.okapi.common.resource.TextUnitUtil

public class TextUnitUtil extends Object
-
Constructor Summary
Constructor: TextUnitUtil()
-
Method Summary
- static AltTranslationsAnnotation addAltTranslation(Segment seg, AltTranslation alt)
  Adds an AltTranslation object to a given Segment.
- static AltTranslationsAnnotation addAltTranslation(TextContainer targetContainer, AltTranslation alt)
  Adds an AltTranslation object to a given TextContainer.
- static void addQualifiers(ITextUnit textUnit, String qualifier)
  Adds qualifiers (quotation marks etc.) to the skeleton of a given text unit resource, to appear around the text.
- static void addQualifiers(ITextUnit textUnit, String startQualifier, String endQualifier)
  Adds qualifiers (quotation marks etc.) to the skeleton of a given text unit resource, to appear around the text.
- static ITextUnit buildGenericTU(String source)
  Creates a new generic text unit resource based on a given string, which becomes the source text of the text unit.
- static ITextUnit buildGenericTU(String srcPart, String skelPart)
  Creates a new generic text unit resource based on a given string, which becomes the source text of the text unit, and a skeleton string, which is appended to the new text unit's skeleton.
- static ITextUnit buildGenericTU(ITextUnit textUnit, String name, TextContainer source, TextContainer target, LocaleId locId, String comment)
  Creates a new generic text unit resource or updates the one passed as the parameter.
- static ITextUnit buildGenericTU(TextContainer source)
  Creates a new generic text unit resource based on a given text container object, which becomes the source part of the text unit.
- static void convertTextPart_whitespaceCodesToText(TextPart textPart)
- static void convertTextParts_whitespaceCodesToText(TextContainer tc)
- static void convertTextPartsToCodes(TextContainer tc)
  Converts all TextParts (not Segments) in a given TextContainer so that each contains a single code with the part's text.
- static void convertTextPartToCode(TextPart textPart)
  Creates a single code with a given TextPart's text.
- static GenericSkeleton convertToSkeleton(ITextUnit textUnit)
  Copies the source and target text of a given text unit into a newly created skeleton.
- static void deleteLastChar(TextFragment textFragment)
  Deletes the last non-whitespace, non-code character of a given text fragment.
- static boolean endsWith(TextFragment textFragment, String substr)
  Indicates whether a given text fragment ends with a given substring.
- static TextFragment expandCodes(TextFragment tf)
  Expands codes that have been previously merged.
- static TextFragment extractSegMarkers(TextFragment tf, TextFragment original, boolean removeFromOriginal)
  Extracts segment and text part markers from a given string, creates codes (placeholder type) for those markers, and appends them to a given text fragment.
- static GenericSkeleton forceSkeleton(ITextUnit tu)
  Makes sure that a given text unit contains a skeleton.
- static String getCodedText(TextFragment textFragment)
  Gets the text of a given text fragment object, possibly containing inline codes.
- static char getLastChar(TextFragment textFragment)
  Gets the last character of a given text fragment.
- static <A extends IAnnotation> A getSourceAnnotation(ITextUnit textUnit, Class<A> type)
  Gets an annotation attached to the source part of a given text unit resource.
- static String getSourceText(ITextUnit textUnit)
  Gets the coded text of the first part of the source of a given text unit resource.
- static String getSourceText(ITextUnit textUnit, boolean removeCodes)
  Gets the coded text of the first part of the source of a given text unit resource, optionally with inline codes removed.
- static <A extends IAnnotation> A getTargetAnnotation(ITextUnit textUnit, LocaleId locId, Class<A> type)
  Gets an annotation attached to the target part of a given text unit resource in a given locale.
- static String getTargetText(ITextUnit textUnit, LocaleId locId)
  Gets the text of the first part of the target of a given text unit resource in the given locale.
- static String getText(TextFragment textFragment)
  Extracts text from the given text fragment.
- static String getText(TextFragment textFragment, List<Integer> markerPositions)
  Extracts text from the given text fragment.
- static boolean hasExternalRefMarker(Code code)
- static boolean hasMergedCode(TextFragment tf)
- static boolean hasSegEndMarker(Code code)
- static boolean hasSegOrTpMarker(Code code)
- static boolean hasSegStartMarker(Code code)
- static boolean hasSource(ITextUnit textUnit)
  Indicates if a given text unit resource is null, or its source part is null or empty.
- static boolean hasTpEndMarker(Code code)
- static boolean hasTpStartMarker(Code code)
- static boolean isApproved(ITextUnit tu, LocaleId targetLocale)
- static boolean isEmpty(ITextUnit textUnit)
  Indicates whether a given text unit resource is null, or its source part is null or empty.
- static boolean isEmpty(ITextUnit textUnit, boolean ignoreWS)
  Indicates whether a given text unit resource is null, or its source part is null or empty.
- static boolean isEmpty(TextFragment textFragment)
  Indicates whether a given text fragment object is null, or the text it contains is null or empty.
- static boolean isStandalone(ITextUnit tu)
- static boolean isWellformed(TextContainer tc)
- static boolean isWellformed(TextFragment tf)
- static int lastIndexOf(TextFragment textFragment, String findWhat)
  Returns the index (within a given text fragment object) of the rightmost occurrence of the specified substring.
- static boolean needsPreserveWhitespaces(ITextUnit tu)
- static boolean needsPreserveWhitespaces(TextContainer tc)
  Detects whether a given TextContainer contains whitespace characters to be preserved in XML.
- static String printMarkerIndexes(TextFragment textFragment)
- static String printMarkers(TextFragment textFragment)
- static String removeAndReplaceCodes(String codedText, String isolatedCodeReplacement)
  Removes the opening and closing codes and replaces the isolated codes in the text with the specified string.
- static String removeCodes(String codedText)
  Removes all inline tags from a given coded text.
- static void removeCodes(ITextUnit textUnit, boolean removeTargetCodes)
  Removes all inline tags from the source (and optionally the target) of a given text unit resource.
- static void removeCodes(TextContainer tc)
  Removes all inline tags from the given TextContainer.
- static void removeCodes(TextFragment tf)
  Removes all inline tags from the given TextFragment.
- static boolean removeQualifiers(ITextUnit textUnit, String qualifier)
  Removes qualifiers (quotation marks etc.) around the text from the source part of a given text unit resource.
- static boolean removeQualifiers(ITextUnit textUnit, String startQualifier, String endQualifier)
  Removes qualifiers (parentheses, quotation marks etc.) around the text from the source part of a given un-segmented text unit resource.
- static void renumberCodes(TextContainer tc)
- static String restoreSegmentation(TextContainer tc, TextFragment segStorage)
  Restores the original segmentation of a given text container from a given text fragment created with storeSegmentation().
- static void setSourceAnnotation(ITextUnit textUnit, IAnnotation annotation)
  Attaches an annotation to the source part of a given text unit resource.
- static void setSourceText(ITextUnit textUnit, String text)
  Sets the coded text of the un-segmented source of a given text unit resource.
- static void setTargetAnnotation(ITextUnit textUnit, LocaleId locId, IAnnotation annotation)
  Attaches an annotation to the target part of a given text unit resource in a given language.
- static void setTargetText(ITextUnit textUnit, LocaleId locId, String text)
  Sets the coded text of the target part of a given text unit resource in a given language.
- static void simplifyCodes(ITextUnit textUnit, String rules, boolean removeLeadingTrailingCodes)
  Simplifies all possible tags in the source part of a given text unit resource.
- static void simplifyCodes(ITextUnit textUnit, String rules, boolean removeLeadingTrailingCodes, boolean mergeCodes)
  Simplifies all possible tags in the source part of a given text unit resource.
- static TextFragment[] simplifyCodes(TextContainer tc, String rules, boolean removeLeadingTrailingCodes)
  Simplifies all possible tags in a given text container.
- static TextFragment[] simplifyCodes(TextContainer tc, String rules, boolean removeLeadingTrailingCodes, boolean mergeCodes)
  Simplifies all possible tags in a given text container.
- static TextFragment[] simplifyCodes(TextFragment tf, String rules, boolean removeLeadingTrailingCodes)
  Simplifies all possible tags in a given text fragment.
- static TextFragment[] simplifyCodes(TextFragment tf, String rules, boolean removeLeadingTrailingCodes, boolean mergeCodes)
  Simplifies all possible tags in a given text fragment.
- static void simplifyCodesPostSegmentation(ITextUnit textUnit, String rules, boolean removeLeadingTrailingCodes, boolean mergeCodes)
  Simplifies all possible tags in the source part of a given text unit resource.
- static void simplifyCodesPostSegmentation(TextContainer tc, String rules, boolean removeLeadingTrailingCodes, boolean mergeCodes)
  Simplifies all possible tags in the source part of a given text unit resource.
- static TextFragment storeSegmentation(TextContainer tc)
- static String testMarkers()
- static String toText(String text, List<Code> codes)
  Returns a representation of a given coded text with the code data enclosed in brackets.
- static String toText(TextFragment tf)
  Returns the content of a given text fragment, including the original codes whenever possible.
- static void trimLeading(TextFragment textFragment)
  Removes leading whitespace from a given text fragment.
- static void trimLeading(TextFragment textFragment, GenericSkeleton skel)
  Removes leading whitespace from a given text fragment and puts the removed whitespace into the given skeleton.
- static void trimSegments(TextContainer tc)
- static void trimSegments(TextContainer tc, boolean trimLeading, boolean trimTrailing)
  Trims the segments of a given text container that contain leading or trailing whitespace.
- static void trimTrailing(TextFragment textFragment)
  Removes trailing whitespace from a given text fragment.
- static void trimTrailing(TextFragment textFragment, GenericSkeleton skel)
  Removes trailing whitespace from a given text fragment and puts the removed whitespace into the given skeleton.
- static void trimTU(ITextUnit textUnit, boolean trimLeading, boolean trimTrailing)
  Removes leading and/or trailing whitespace from the source part of a given text unit resource.
- static void unsegmentTU(ITextUnit tu)
-
Method Detail
-
trimLeading
public static void trimLeading(TextFragment textFragment)
Removes leading whitespace from a given text fragment.
- Parameters:
  textFragment - the text fragment whose leading whitespace is to be removed.
-
trimLeading
public static void trimLeading(TextFragment textFragment, GenericSkeleton skel)
Removes leading whitespace from a given text fragment and puts the removed whitespace into the given skeleton.
- Parameters:
  textFragment - the text fragment whose leading whitespace is to be removed.
  skel - the skeleton that receives the removed whitespace.
-
trimTrailing
public static void trimTrailing(TextFragment textFragment)
Removes trailing whitespace from a given text fragment.
- Parameters:
  textFragment - the text fragment whose trailing whitespace is to be removed.
-
trimTrailing
public static void trimTrailing(TextFragment textFragment, GenericSkeleton skel)
Removes trailing whitespace from a given text fragment and puts the removed whitespace into the given skeleton.
- Parameters:
  textFragment - the text fragment whose trailing whitespace is to be removed.
  skel - the skeleton that receives the removed whitespace.
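The skeleton variants of trimLeading and trimTrailing move whitespace out of the translatable text instead of discarding it. The following is a minimal sketch of that idea, assuming plain Java strings stand in for TextFragment and a StringBuilder stands in for GenericSkeleton; inline codes are ignored, and the class name TrimSketch is hypothetical, not part of Okapi.

```java
// Hypothetical sketch: String stands in for TextFragment and StringBuilder
// for GenericSkeleton. Inline codes are not handled. Not Okapi's implementation.
final class TrimSketch {

    /** Returns text without leading whitespace; the removed run is appended to skel. */
    static String trimLeading(String text, StringBuilder skel) {
        int start = 0;
        while (start < text.length() && Character.isWhitespace(text.charAt(start))) {
            start++;
        }
        if (skel != null) {
            skel.append(text, 0, start); // keep the whitespace for the output document
        }
        return text.substring(start);
    }

    /** Returns text without trailing whitespace; the removed run is appended to skel. */
    static String trimTrailing(String text, StringBuilder skel) {
        int end = text.length();
        while (end > 0 && Character.isWhitespace(text.charAt(end - 1))) {
            end--;
        }
        if (skel != null) {
            skel.append(text, end, text.length()); // keep the whitespace for the output document
        }
        return text.substring(0, end);
    }
}
```

For example, trimLeading("  Hello ", skel) returns "Hello " and leaves the two leading spaces in skel, so the extracted text is clean while the original layout can still be rebuilt.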
-
endsWith
public static boolean endsWith(TextFragment textFragment, String substr)
Indicates whether a given text fragment ends with a given substring. Trailing spaces are ignored.
- Parameters:
  textFragment - the text fragment to examine.
  substr - the text to look up.
- Returns:
  true if the given text fragment ends with the given substring.
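Because trailing spaces are not counted, a fragment such as "Hello.  " still ends with ".". A minimal sketch of that rule over a plain String, assuming any Java whitespace counts as a trailing space (the library may be narrower); EndsWithSketch is a hypothetical name and inline codes are not modeled.

```java
// Hypothetical sketch over a plain String; not Okapi's implementation.
// Trailing whitespace is skipped before the suffix comparison.
final class EndsWithSketch {

    static boolean endsWith(String text, String substr) {
        int end = text.length();
        while (end > 0 && Character.isWhitespace(text.charAt(end - 1))) {
            end--; // ignore trailing spaces, per the method description
        }
        return text.substring(0, end).endsWith(substr);
    }
}
```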
-
isEmpty
public static boolean isEmpty(ITextUnit textUnit)
Indicates whether a given text unit resource is null, or its source part is null or empty.
- Parameters:
  textUnit - the text unit to check.
- Returns:
  true if the given text unit resource is null, or its source part is null or empty.
-
hasSource
public static boolean hasSource(ITextUnit textUnit)
Indicates if a given text unit resource is null, or its source part is null or empty. Whitespace is not taken into account: if the text unit contains only whitespace, it is considered empty.
- Parameters:
  textUnit - the text unit to check.
- Returns:
  true if the given text unit resource is null, or its source part is null or empty.
-
isEmpty
public static boolean isEmpty(ITextUnit textUnit, boolean ignoreWS)
Indicates whether a given text unit resource is null, or its source part is null or empty. If ignoreWS is true, whitespace is not taken into account: a text unit that contains only whitespace is considered empty.
- Parameters:
  textUnit - the text unit to check.
  ignoreWS - if true and the text unit contains only whitespace, the text unit is considered empty.
- Returns:
  true if the given text unit resource is null, or its source part is null or empty.
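The effect of ignoreWS can be summarized as a small decision rule. The sketch below models it with a String, where null stands for a null text unit or source; IsEmptySketch is hypothetical and is not the library's code.

```java
// Hypothetical sketch: a null String models a null text unit or null source,
// so both cases collapse into one check here. Not Okapi's implementation.
final class IsEmptySketch {

    static boolean isEmpty(String sourceText, boolean ignoreWS) {
        if (sourceText == null || sourceText.isEmpty()) {
            return true; // null or empty source is always empty
        }
        // With ignoreWS, a whitespace-only source also counts as empty.
        return ignoreWS && sourceText.chars().allMatch(Character::isWhitespace);
    }
}
```

So isEmpty("  ", false) is false, while isEmpty("  ", true) is true.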
-
getSourceText
public static String getSourceText(ITextUnit textUnit)
Gets the coded text of the first part of the source of a given text unit resource.
- Parameters:
  textUnit - the text unit resource whose source text should be returned.
- Returns:
  the source text of the given text unit resource.
-
getSourceText
public static String getSourceText(ITextUnit textUnit, boolean removeCodes)
Gets the coded text of the first part of the source of a given text unit resource. If removeCodes is true and the text contains inline codes, the codes are removed.
- Parameters:
  textUnit - the text unit resource whose source text should be returned.
  removeCodes - true if possible inline codes should be removed.
- Returns:
  the source text of the given text unit resource.
-
getTargetText
public static String getTargetText(ITextUnit textUnit, LocaleId locId)
Gets the text of the first part of the target of a given text unit resource in the given locale.
- Parameters:
  textUnit - the text unit resource whose target text should be returned.
  locId - the locale of the target part being sought.
- Returns:
  the target text of the given text unit resource in the given locale, or an empty string if the text unit doesn't contain one.
-
getCodedText
public static String getCodedText(TextFragment textFragment)
Gets the text of a given text fragment object, possibly containing inline codes.
- Parameters:
  textFragment - the given text fragment object.
- Returns:
  the text of the given text fragment object, possibly containing inline codes.
-
getText
public static String getText(TextFragment textFragment, List<Integer> markerPositions)
Extracts text from the given text fragment. Used to create a copy of the original string without code markers; the original string is not stripped of code markers and remains intact.
- Parameters:
  textFragment - the TextFragment object, possibly containing codes.
  markerPositions - a list to store the initial positions of removed code markers; use null to not store them.
- Returns:
  a copy of the string contained in the TextFragment, without code markers.
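In Okapi's coded text, each inline code is represented by a marker character followed by an index character. The sketch below shows how a copy without markers can be built while recording where each marker sat. It assumes the marker values U+E101, U+E102 and U+E103 for opening, closing and isolated codes (modeled on TextFragment's marker constants; treat the exact values as an assumption), and it assumes the recorded positions are indexes into the coded text. GetTextSketch is hypothetical.

```java
import java.util.List;

// Hypothetical sketch, not Okapi's implementation. The marker values below are
// assumptions modeled on Okapi's coded-text convention: each inline code is a
// marker character followed by one index character.
final class GetTextSketch {
    static final char OPENING = '\uE101';
    static final char CLOSING = '\uE102';
    static final char ISOLATED = '\uE103';

    static boolean isMarker(char c) {
        return c == OPENING || c == CLOSING || c == ISOLATED;
    }

    /**
     * Returns a copy of codedText without code markers. If markerPositions is
     * not null, the index of each removed marker in codedText is added to it.
     * The input string itself is never modified.
     */
    static String getText(String codedText, List<Integer> markerPositions) {
        StringBuilder sb = new StringBuilder(codedText.length());
        for (int i = 0; i < codedText.length(); i++) {
            char c = codedText.charAt(i);
            if (isMarker(c)) {
                if (markerPositions != null) {
                    markerPositions.add(i);
                }
                i++; // also skip the index character that follows the marker
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }
}
```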
-
printMarkerIndexes
public static String printMarkerIndexes(TextFragment textFragment)
-
printMarkers
public static String printMarkers(TextFragment textFragment)
-
getText
public static String getText(TextFragment textFragment)
Extracts text from the given text fragment. Used to create a copy of the original string without code markers; the original string is not stripped of code markers and remains intact.
- Parameters:
  textFragment - the TextFragment object, possibly containing codes.
- Returns:
  a copy of the string contained in the TextFragment, without code markers.
-
getLastChar
public static char getLastChar(TextFragment textFragment)
Gets the last character of a given text fragment.
- Parameters:
  textFragment - the text fragment to examine.
- Returns:
  the last character of the given text fragment, or '\0'.
-
deleteLastChar
public static void deleteLastChar(TextFragment textFragment)
Deletes the last non-whitespace, non-code character of a given text fragment.
- Parameters:
  textFragment - the text fragment to examine.
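"Last non-whitespace, non-code character" means scanning backward past trailing whitespace and past whole inline codes before deleting anything. A minimal sketch over coded text, assuming each code is a marker character (values assumed to be U+E101 to U+E103, modeled on TextFragment's marker constants) followed by one index character; DeleteLastCharSketch is hypothetical.

```java
// Hypothetical sketch, not Okapi's implementation. Each code is assumed to be
// a marker character followed by one index character.
final class DeleteLastCharSketch {
    static final char OPENING = '\uE101';
    static final char CLOSING = '\uE102';
    static final char ISOLATED = '\uE103';

    static boolean isMarker(char c) {
        return c == OPENING || c == CLOSING || c == ISOLATED;
    }

    /** Returns codedText without its last character that is neither whitespace nor part of a code. */
    static String deleteLastChar(String codedText) {
        for (int i = codedText.length() - 1; i >= 0; i--) {
            if (i > 0 && isMarker(codedText.charAt(i - 1))) {
                i--; // position i is an index char: skip it together with its marker
                continue;
            }
            if (Character.isWhitespace(codedText.charAt(i))) {
                continue; // trailing whitespace is kept
            }
            return codedText.substring(0, i) + codedText.substring(i + 1);
        }
        return codedText; // nothing deletable found
    }
}
```

With the fragment "Hello." followed by a closing code and a space, the period is removed while the code and the space stay in place.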
-
lastIndexOf
public static int lastIndexOf(TextFragment textFragment, String findWhat)
Returns the index (within a given text fragment object) of the rightmost occurrence of the specified substring.
- Parameters:
  textFragment - the text fragment to examine.
  findWhat - the substring to search for.
- Returns:
  if the string argument occurs one or more times as a substring within this object, the index of the first character of the last such substring; otherwise -1.
-
isEmpty
public static boolean isEmpty(TextFragment textFragment)
Indicates whether a given text fragment object is null, or the text it contains is null or empty.
- Parameters:
  textFragment - the text fragment to examine.
- Returns:
  true if the given text fragment object is null, or the text it contains is null or empty.
-
buildGenericTU
public static ITextUnit buildGenericTU(TextContainer source)
Creates a new generic text unit resource based on a given text container object, which becomes the source part of the text unit. WARNING: Not all filters use GenericSkeleton. Use with caution.
- Parameters:
  source - the given text container, which becomes the source part of the text unit.
- Returns:
  a new text unit resource with the given text container object as its source part.
-
buildGenericTU
public static ITextUnit buildGenericTU(String source)
Creates a new generic text unit resource based on a given string, which becomes the source text of the text unit. WARNING: Not all filters use GenericSkeleton. Use with caution.
- Parameters:
  source - the given string, which becomes the source text of the text unit.
- Returns:
  a new text unit resource with the given string as its source text.
-
buildGenericTU
public static ITextUnit buildGenericTU(String srcPart, String skelPart)
Creates a new generic text unit resource based on a given string, which becomes the source text of the text unit, and a skeleton string, which is appended to the new text unit's skeleton. WARNING: Not all filters use GenericSkeleton. Use with caution.
- Parameters:
  srcPart - the given string, which becomes the source text of the created text unit.
  skelPart - the skeleton string appended to the new text unit's skeleton.
- Returns:
  a new text unit resource with the given string as its source text and the skeleton string in its skeleton.
-
buildGenericTU
public static ITextUnit buildGenericTU(ITextUnit textUnit, String name, TextContainer source, TextContainer target, LocaleId locId, String comment)
Creates a new generic text unit resource or updates the one passed as the parameter. You can use this method to create a new text unit or to modify an existing one (adding or modifying its field values). WARNING: Not all filters use GenericSkeleton. Use with caution.
- Parameters:
  textUnit - the text unit to be modified, or null to create a new text unit.
  name - the name of the new text unit, or a new name for the existing one.
  source - the text container object that becomes the source part of the text unit.
  target - the text container object that becomes the target part of the text unit.
  locId - the locale of the target part (passed in the target parameter).
  comment - the optional comment that becomes a NOTE property of the text unit.
- Returns:
  a reference to the original or newly created text unit.
-
forceSkeleton
public static GenericSkeleton forceSkeleton(ITextUnit tu)
Makes sure that a given text unit contains a skeleton. If no skeleton is attached to the text unit yet, a new skeleton object is created and attached to it.
- Parameters:
  tu - the given text unit that must have a skeleton.
- Returns:
  the skeleton of the text unit.
-
convertToSkeleton
public static GenericSkeleton convertToSkeleton(ITextUnit textUnit)
Copies the source and target text of a given text unit into a newly created skeleton. The original text unit remains intact and serves as the pattern for the new skeleton's contents.
- Parameters:
  textUnit - the text unit to be copied into a skeleton.
- Returns:
  the newly created skeleton, whose contents reflect the given text unit.
-
getSourceAnnotation
public static <A extends IAnnotation> A getSourceAnnotation(ITextUnit textUnit, Class<A> type)
Gets an annotation attached to the source part of a given text unit resource.
- Type Parameters:
  A - a class implementing IAnnotation.
- Parameters:
  textUnit - the given text unit resource.
  type - a reference to the requested annotation type.
- Returns:
  the annotation, or null if not found.
-
setSourceAnnotation
public static void setSourceAnnotation(ITextUnit textUnit, IAnnotation annotation)
Attaches an annotation to the source part of a given text unit resource.
- Parameters:
  textUnit - the given text unit resource.
  annotation - the annotation to be attached to the source part of the text unit.
-
getTargetAnnotation
public static <A extends IAnnotation> A getTargetAnnotation(ITextUnit textUnit, LocaleId locId, Class<A> type)
Gets an annotation attached to the target part of a given text unit resource in a given locale.
- Type Parameters:
  A - a class implementing IAnnotation.
- Parameters:
  textUnit - the given text unit resource.
  locId - the locale of the target part being sought.
  type - a reference to the requested annotation type.
- Returns:
  the annotation, or null if not found.
-
setTargetAnnotation
public static void setTargetAnnotation(ITextUnit textUnit, LocaleId locId, IAnnotation annotation)
Attaches an annotation to the target part of a given text unit resource in a given language.
- Parameters:
  textUnit - the given text unit resource.
  locId - the locale of the target part the annotation is attached to.
  annotation - the annotation to be attached to the target part of the text unit.
-
setSourceText
public static void setSourceText(ITextUnit textUnit, String text)
Sets the coded text of the un-segmented source of a given text unit resource.
- Parameters:
  textUnit - the given text unit resource.
  text - the text to be set.
-
setTargetText
public static void setTargetText(ITextUnit textUnit, LocaleId locId, String text)
Sets the coded text of the target part of a given text unit resource in a given language.
- Parameters:
  textUnit - the given text unit resource.
  locId - the locale of the target part being set.
  text - the text to be set.
-
trimTU
public static void trimTU(ITextUnit textUnit, boolean trimLeading, boolean trimTrailing)
Removes leading and/or trailing whitespace from the source part of a given text unit resource.
- Parameters:
  textUnit - the given text unit resource.
  trimLeading - true to remove leading whitespace, if any.
  trimTrailing - true to remove trailing whitespace, if any.
-
addQualifiers
public static void addQualifiers(ITextUnit textUnit, String startQualifier, String endQualifier)
Adds qualifiers (quotation marks etc.) to the skeleton of a given text unit resource, to appear around the text. This method is useful when the starting and ending qualifiers are different.
- Parameters:
  textUnit - the given text unit resource.
  startQualifier - the qualifier to be added before the text.
  endQualifier - the qualifier to be added after the text.
-
addQualifiers
public static void addQualifiers(ITextUnit textUnit, String qualifier)
Adds qualifiers (quotation marks etc.) to the skeleton of a given text unit resource, to appear around the text.
- Parameters:
  textUnit - the given text unit resource.
  qualifier - the qualifier to be added before and after the text.
-
removeQualifiers
public static boolean removeQualifiers(ITextUnit textUnit, String startQualifier, String endQualifier)
Removes qualifiers (parentheses, quotation marks etc.) around the text from the source part of a given un-segmented text unit resource. This method is useful when the starting and ending qualifiers are different.
- Parameters:
  textUnit - the given text unit resource.
  startQualifier - the qualifier to be removed before the source text.
  endQualifier - the qualifier to be removed after the source text.
- Returns:
  true if the qualifiers were found and removed.
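The removeQualifiers variants only succeed when the text is actually wrapped in the given qualifiers. A minimal sketch of that text-side check over a plain String follows; QualifierSketch is hypothetical, it returns the unwrapped text or null instead of a boolean, and it does not model what the library does with the skeleton.

```java
// Hypothetical sketch, not Okapi's implementation: returns the text without
// its surrounding qualifiers, or null when the text is not wrapped in them.
final class QualifierSketch {

    static String removeQualifiers(String text, String startQualifier, String endQualifier) {
        if (text == null || startQualifier.isEmpty() || endQualifier.isEmpty()) {
            return null;
        }
        // Too short to hold both qualifiers (e.g. a lone quote when start == end).
        if (text.length() < startQualifier.length() + endQualifier.length()) {
            return null;
        }
        if (!text.startsWith(startQualifier) || !text.endsWith(endQualifier)) {
            return null;
        }
        return text.substring(startQualifier.length(), text.length() - endQualifier.length());
    }
}
```

The length guard matters for the single-qualifier case: a source consisting of one quotation mark both starts and ends with that mark but is not a quoted string.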
-
simplifyCodes
public static void simplifyCodes(ITextUnit textUnit, String rules, boolean removeLeadingTrailingCodes)
Simplifies all possible tags in the source part of a given text unit resource.
- Parameters:
  textUnit - the given text unit.
  rules - rules for the data-driven simplification.
  removeLeadingTrailingCodes - true to remove leading and/or trailing codes of the source part and place their text in the skeleton.
-
simplifyCodes
public static void simplifyCodes(ITextUnit textUnit, String rules, boolean removeLeadingTrailingCodes, boolean mergeCodes)
Simplifies all possible tags in the source part of a given text unit resource.
- Parameters:
  textUnit - the given text unit.
  rules - rules for the data-driven simplification.
  removeLeadingTrailingCodes - true to remove leading and/or trailing codes of the source part and place their text in the skeleton.
  mergeCodes - true to merge adjacent codes, false to leave them as-is.
-
simplifyCodesPostSegmentation
public static void simplifyCodesPostSegmentation(ITextUnit textUnit, String rules, boolean removeLeadingTrailingCodes, boolean mergeCodes)
Simplifies all possible tags in the source part of a given text unit resource. If the text unit has a target, simplification is skipped.
- Parameters:
  textUnit - the given text unit.
  rules - rules for the data-driven simplification.
  removeLeadingTrailingCodes - true to remove leading and/or trailing codes of the source part and place their text in the corresponding inter-segment TextPart.
  mergeCodes - true to merge adjacent codes, false to leave them as-is.
-
simplifyCodesPostSegmentation
public static void simplifyCodesPostSegmentation(TextContainer tc, String rules, boolean removeLeadingTrailingCodes, boolean mergeCodes)
Simplifies all possible tags in the source part of a given text unit resource. If the text unit has a target, simplification is skipped.
- Parameters:
  tc - the given text container.
  rules - rules for the data-driven simplification.
  removeLeadingTrailingCodes - true to remove leading and/or trailing codes of the source part and place their text in the corresponding inter-segment TextPart.
  mergeCodes - true to merge adjacent codes, false to leave them as-is.
-
expandCodes
public static TextFragment expandCodes(TextFragment tf)
Expands codes that have been previously merged.
- Parameters:
  tf - the original TextFragment, possibly with merged codes.
- Returns:
  a new TextFragment with expanded codes, or the original if there are no codes or they have not been merged.
-
hasMergedCode
public static boolean hasMergedCode(TextFragment tf)
-
removeCodes
public static void removeCodes(ITextUnit textUnit, boolean removeTargetCodes)
Removes all inline tags from the source (and optionally the target) of a given text unit resource.
- Parameters:
  textUnit - the given text unit.
  removeTargetCodes - true to also remove the inline tags of the target.
-
removeCodes
public static void removeCodes(TextContainer tc)
Removes all inline tags from the given TextContainer.
- Parameters:
  tc - the given text container.
-
removeCodes
public static void removeCodes(TextFragment tf)
Removes all inline tags from the given TextFragment.
- Parameters:
  tf - the given text fragment.
-
removeCodes
public static String removeCodes(String codedText)
Removes all inline tags from a given coded text.
- Parameters:
  codedText - the given coded text string.
- Returns:
  the string without code markers.
-
removeAndReplaceCodes
public static String removeAndReplaceCodes(String codedText, String isolatedCodeReplacement)
Removes the opening and closing codes and replaces the isolated codes in the text with the specified string.
- Parameters:
  codedText - the given coded text string.
  isolatedCodeReplacement - the string that replaces each isolated code.
- Returns:
  the string without code markers.
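removeCodes(String) and removeAndReplaceCodes differ only in how isolated codes are treated: the first drops every code, the second keeps a visible stand-in for each isolated code (for example a placeholder token or a space). A minimal sketch, assuming coded text where each code is a marker character (values assumed to be U+E101 to U+E103, modeled on TextFragment's marker constants) followed by one index character; CodeStripSketch is hypothetical.

```java
// Hypothetical sketch, not Okapi's implementation. Marker values are assumed;
// each marker is followed by one index character.
final class CodeStripSketch {
    static final char OPENING = '\uE101';
    static final char CLOSING = '\uE102';
    static final char ISOLATED = '\uE103';

    /** Drops opening/closing codes; each isolated code becomes isolatedReplacement. */
    static String removeAndReplaceCodes(String codedText, String isolatedReplacement) {
        StringBuilder sb = new StringBuilder(codedText.length());
        for (int i = 0; i < codedText.length(); i++) {
            char c = codedText.charAt(i);
            switch (c) {
                case OPENING:
                case CLOSING:
                    i++; // skip the marker's index character
                    break;
                case ISOLATED:
                    sb.append(isolatedReplacement);
                    i++; // skip the marker's index character
                    break;
                default:
                    sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Drops every code, which is the same as replacing isolated codes with "". */
    static String removeCodes(String codedText) {
        return removeAndReplaceCodes(codedText, "");
    }
}
```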
-
simplifyCodes
public static TextFragment[] simplifyCodes(TextFragment tf, String rules, boolean removeLeadingTrailingCodes)
Simplifies all possible tags in a given text fragment.
- Parameters:
  tf - the given text fragment.
  rules - rules for the data-driven simplification.
  removeLeadingTrailingCodes - true to remove leading and/or trailing codes of the source part and place their text in the skeleton.
- Returns:
  null if no leading or trailing codes were removed; otherwise an array with the original data of the removed codes. The first element is set if there was a leading code, the second if there was a trailing code; either can be null.
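The return contract above (null when nothing was removed, otherwise a two-element array for leading and trailing codes) can be illustrated on a coded String. The sketch naively strips any run of codes at either end, whereas the library may apply additional rules about which codes qualify; LeadTrailSketch, its Result class, the marker values (modeled on TextFragment's marker constants), and the use of Strings instead of TextFragment objects are all assumptions.

```java
// Hypothetical sketch, not Okapi's implementation. Each code is assumed to be
// a marker character followed by one index character.
final class LeadTrailSketch {
    static final char OPENING = '\uE101';
    static final char CLOSING = '\uE102';
    static final char ISOLATED = '\uE103';

    static boolean isMarker(char c) {
        return c == OPENING || c == CLOSING || c == ISOLATED;
    }

    static final class Result {
        final String text;      // coded text with the outer codes removed
        final String[] removed; // null, or {leading, trailing}; either entry may be null

        Result(String text, String[] removed) {
            this.text = text;
            this.removed = removed;
        }
    }

    static Result stripOuterCodes(String coded) {
        int start = 0;
        while (start + 1 < coded.length() && isMarker(coded.charAt(start))) {
            start += 2; // skip a leading marker plus its index char
        }
        int end = coded.length();
        while (end - 2 >= start && isMarker(coded.charAt(end - 2))) {
            end -= 2; // skip a trailing marker plus its index char
        }
        String[] removed = null;
        if (start > 0 || end < coded.length()) {
            removed = new String[] {
                start > 0 ? coded.substring(0, start) : null,
                end < coded.length() ? coded.substring(end) : null };
        }
        return new Result(coded.substring(start, end), removed);
    }
}
```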
-
simplifyCodes
public static TextFragment[] simplifyCodes(TextFragment tf, String rules, boolean removeLeadingTrailingCodes, boolean mergeCodes)
Simplifies all possible tags in a given text fragment.
- Parameters:
  tf - the given text fragment.
  rules - rules for the data-driven simplification.
  removeLeadingTrailingCodes - true to remove leading and/or trailing codes of the source part and place their text in the skeleton.
  mergeCodes - true to merge adjacent codes, false to leave them as-is.
- Returns:
  null if no leading or trailing codes were removed; otherwise an array with the original data of the removed codes. The first element is set if there was a leading code, the second if there was a trailing code; either can be null.
-
simplifyCodes
public static TextFragment[] simplifyCodes(TextContainer tc, String rules, boolean removeLeadingTrailingCodes)
Simplifies all possible tags in a given text container.
- Parameters:
  tc - the given text container.
  rules - rules for the data-driven simplification.
  removeLeadingTrailingCodes - true to remove leading and/or trailing codes of the source part and place their text in the skeleton.
- Returns:
  null if no leading or trailing codes were removed; otherwise an array with the original data of the removed codes. The first element is set if there was a leading code, the second if there was a trailing code; either can be null.
-
simplifyCodes
public static TextFragment[] simplifyCodes(TextContainer tc, String rules, boolean removeLeadingTrailingCodes, boolean mergeCodes)
Simplifies all possible tags in a given text container.
- Parameters:
  tc - the given text container.
  rules - rules for the data-driven simplification.
  removeLeadingTrailingCodes - true to remove leading and/or trailing codes of the source part and place their text in the skeleton.
  mergeCodes - true to merge adjacent codes, false to leave them as-is.
- Returns:
  null if no leading or trailing codes were removed; otherwise an array with the original data of the removed codes. The first element is set if there was a leading code, the second if there was a trailing code; either can be null.
-
removeQualifiers
public static boolean removeQualifiers(ITextUnit textUnit, String qualifier)
Removes qualifiers (quotation marks etc.) around the text from the source part of a given text unit resource.
- Parameters:
  textUnit - the given text unit resource.
  qualifier - the qualifier to be removed before and after the source text.
- Returns:
  true if the qualifiers were found and removed.
-
addAltTranslation
public static AltTranslationsAnnotation addAltTranslation(TextContainer targetContainer, AltTranslation alt)
Adds an AltTranslation object to a given TextContainer. The AltTranslationsAnnotation annotation is created if it does not exist already.
- Parameters:
  targetContainer - the container to which the object is added.
  alt - the alternate translation to add.
- Returns:
  the annotation to which the object was added; it may be a new annotation or the one already associated with the container.
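Both addAltTranslation overloads follow a get-or-create pattern: reuse the AltTranslationsAnnotation already attached to the container or segment, or create one on first use, so repeated calls accumulate candidates in a single annotation. The sketch below models that pattern with hypothetical stand-in classes (Container, Alt, AltAnnotation) and a map keyed by annotation class; it is not Okapi's annotation API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for TextContainer, AltTranslation and
// AltTranslationsAnnotation; only the get-or-create logic is illustrated.
final class AltTransSketch {

    static final class Alt {
        final String text;
        Alt(String text) { this.text = text; }
    }

    static final class AltAnnotation {
        final List<Alt> alts = new ArrayList<>();
    }

    static final class Container {
        final Map<Class<?>, Object> annotations = new HashMap<>();
    }

    static AltAnnotation addAltTranslation(Container target, Alt alt) {
        AltAnnotation ann = (AltAnnotation) target.annotations.get(AltAnnotation.class);
        if (ann == null) {
            ann = new AltAnnotation(); // created on first use only
            target.annotations.put(AltAnnotation.class, ann);
        }
        ann.alts.add(alt);
        return ann; // same instance on every later call for this container
    }
}
```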
-
addAltTranslation
public static AltTranslationsAnnotation addAltTranslation(Segment seg, AltTranslation alt)
Adds an AltTranslation object to a given Segment. The AltTranslationsAnnotation annotation is created if it does not exist already.
- Parameters:
  seg - the segment to which the object is added.
  alt - the alternate translation to add.
- Returns:
  the annotation to which the object was added; it may be a new annotation or the one already associated with the segment.
-
storeSegmentation
public static TextFragment storeSegmentation(TextContainer tc)
-
trimSegments
public static void trimSegments(TextContainer tc, boolean trimLeading, boolean trimTrailing)
Trims leading and/or trailing whitespace from the segments of a given text container. The removed whitespace is placed in newly created whitespace-only text parts before and after each trimmed segment.
- Parameters:
tc - the given text container
trimLeading - true to remove the leading whitespace of each segment
trimTrailing - true to remove the trailing whitespace of each segment
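The trimming behavior can be sketched on a simplified model in which a container is a list of parts, each either a segment or a text part. The `Part` class and this `trimSegments` helper are hypothetical; the sketch assumes `Character.isWhitespace` defines whitespace, which may differ from Okapi's own definition.

```java
import java.util.ArrayList;
import java.util.List;

final class TrimSegmentsSketch {

    // Hypothetical part model: segments are translatable, text parts are not.
    static final class Part {
        final boolean isSegment;
        final String text;

        Part(boolean isSegment, String text) {
            this.isSegment = isSegment;
            this.text = text;
        }

        @Override
        public String toString() {
            return (isSegment ? "S" : "T") + "[" + text + "]";
        }
    }

    static List<Part> trimSegments(List<Part> parts, boolean trimLeading, boolean trimTrailing) {
        List<Part> out = new ArrayList<>();
        for (Part p : parts) {
            if (!p.isSegment) {
                out.add(p); // text parts are kept as-is
                continue;
            }
            String s = p.text;
            int start = 0;
            int end = s.length();
            if (trimLeading) {
                while (start < end && Character.isWhitespace(s.charAt(start))) start++;
            }
            if (trimTrailing) {
                while (end > start && Character.isWhitespace(s.charAt(end - 1))) end--;
            }
            // Removed whitespace goes into whitespace-only text parts around the segment.
            if (start > 0) out.add(new Part(false, s.substring(0, start)));
            out.add(new Part(true, s.substring(start, end)));
            if (end < s.length()) out.add(new Part(false, s.substring(end)));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Part> parts = new ArrayList<>();
        parts.add(new Part(true, "  Hello world. "));
        System.out.println(trimSegments(parts, true, true));
        // [T[  ], S[Hello world.], T[ ]]
    }
}
```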
-
trimSegments
public static void trimSegments(TextContainer tc)
-
extractSegMarkers
public static TextFragment extractSegMarkers(TextFragment tf, TextFragment original, boolean removeFromOriginal)
Extracts segment and text part markers from a given original text fragment, creates codes (placeholder type) for those markers, and appends them to a given text fragment.
- Parameters:
tf - the given text fragment to which the extracted codes are appended
original - the original text fragment to scan for markers
removeFromOriginal - true to remove the found markers from the original fragment
- Returns:
- the given original fragment if removeFromOriginal is false; otherwise, the original fragment modified to remove the markers
-
hasSegOrTpMarker
public static boolean hasSegOrTpMarker(Code code)
-
hasSegStartMarker
public static boolean hasSegStartMarker(Code code)
-
hasSegEndMarker
public static boolean hasSegEndMarker(Code code)
-
hasTpStartMarker
public static boolean hasTpStartMarker(Code code)
-
hasTpEndMarker
public static boolean hasTpEndMarker(Code code)
-
hasExternalRefMarker
public static boolean hasExternalRefMarker(Code code)
-
restoreSegmentation
public static String restoreSegmentation(TextContainer tc, TextFragment segStorage)
Restores the original segmentation of a given text container from a given text fragment created with storeSegmentation().
- Parameters:
tc - the given text container
segStorage - the text fragment created with storeSegmentation() that holds the original segmentation information
- Returns:
- a test string containing the sequence of markers created by the internal algorithm; used for testing only.
-
testMarkers
public static String testMarkers()
-
toText
public static String toText(TextFragment tf)
Returns the content of a given text fragment, including the original codes whenever possible. Codes are enclosed in '[' and ']' to distinguish them from regular text.
- Parameters:
tf - the given text fragment
- Returns:
- the content of the given fragment
-
toText
public static String toText(String text, List<Code> codes)
Returns a representation of a given coded text with the code data enclosed in brackets.
- Parameters:
text - the given coded text
codes - the given list of codes
- Returns:
- the content of the given coded text
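The bracket decoration can be sketched on a simplified coded-text model. The `CODE_MARKER` character is a hypothetical placeholder, not Okapi's actual coded-text encoding; the sketch assumes that the n-th marker in the text corresponds to the n-th entry of the code list.

```java
import java.util.Arrays;
import java.util.List;

final class ToTextSketch {

    // Hypothetical placeholder character standing for one code in the coded text.
    // (This is not Okapi's actual coded-text encoding.)
    static final char CODE_MARKER = '\uE000';

    /** Replaces the n-th marker with the n-th code's data wrapped in brackets. */
    static String toText(String codedText, List<String> codeData) {
        StringBuilder sb = new StringBuilder();
        int next = 0;
        for (int i = 0; i < codedText.length(); i++) {
            char ch = codedText.charAt(i);
            if (ch == CODE_MARKER) {
                sb.append('[').append(codeData.get(next++)).append(']');
            } else {
                sb.append(ch);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String coded = "Click " + CODE_MARKER + "here" + CODE_MARKER + ".";
        System.out.println(toText(coded, Arrays.asList("<a href='x'>", "</a>")));
        // Click [<a href='x'>]here[</a>].
    }
}
```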
-
convertTextPartsToCodes
public static void convertTextPartsToCodes(TextContainer tc)
Converts all TextParts (not Segments) in a given TextContainer so that each contains a single code holding the part's text. This protects the text of a text part (e.g., one created from original codes) from being escaped by an encoder.
- Parameters:
tc - the given TextContainer
-
convertTextPartToCode
public static void convertTextPartToCode(TextPart textPart)
Creates a single code holding a given TextPart's text. This protects the text of the text part from being escaped by an encoder. If the TextPart already has codes, no conversion is performed.
- Parameters:
textPart - the given TextPart
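The reason for converting text parts into codes can be shown with a toy encoder that escapes plain text but writes code data verbatim. All types here (`Piece`, `Part`) and the `encode` method are hypothetical, and the skip for empty text is an assumption; the sketch also returns new parts rather than modifying them in place as the void Okapi methods do.

```java
import java.util.ArrayList;
import java.util.List;

final class ProtectTextPartSketch {

    // Hypothetical content piece: plain text, or a code whose data is written verbatim.
    static final class Piece {
        final boolean isCode;
        final String data;

        Piece(boolean isCode, String data) {
            this.isCode = isCode;
            this.data = data;
        }
    }

    // Hypothetical part: a segment or a (non-segment) text part, made of pieces.
    static final class Part {
        final boolean isSegment;
        final List<Piece> pieces;

        Part(boolean isSegment, List<Piece> pieces) {
            this.isSegment = isSegment;
            this.pieces = pieces;
        }
    }

    /** Wraps a code-free text part's text in one code; other parts are returned unchanged. */
    static Part convertTextPartToCode(Part p) {
        if (p.isSegment) {
            return p; // segments are never converted
        }
        StringBuilder text = new StringBuilder();
        for (Piece piece : p.pieces) {
            if (piece.isCode) {
                return p; // the part already has codes: no conversion
            }
            text.append(piece.data);
        }
        if (text.length() == 0) {
            return p; // nothing to protect (assumption of this sketch)
        }
        List<Piece> single = new ArrayList<>();
        single.add(new Piece(true, text.toString()));
        return new Part(false, single);
    }

    /** Applies convertTextPartToCode to every part of a container-like list. */
    static List<Part> convertTextPartsToCodes(List<Part> parts) {
        List<Part> out = new ArrayList<>();
        for (Part p : parts) {
            out.add(convertTextPartToCode(p));
        }
        return out;
    }

    /** Toy XML encoder: escapes text pieces but writes code data verbatim. */
    static String encode(Part p) {
        StringBuilder sb = new StringBuilder();
        for (Piece piece : p.pieces) {
            sb.append(piece.isCode ? piece.data
                    : piece.data.replace("&", "&amp;").replace("<", "&lt;"));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Piece> pieces = new ArrayList<>();
        pieces.add(new Piece(false, "<br/>"));
        Part textPart = new Part(false, pieces);
        System.out.println(encode(textPart));                        // &lt;br/>
        System.out.println(encode(convertTextPartToCode(textPart))); // <br/>
    }
}
```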
-
convertTextParts_whitespaceCodesToText
public static void convertTextParts_whitespaceCodesToText(TextContainer tc)
-
convertTextPart_whitespaceCodesToText
public static void convertTextPart_whitespaceCodesToText(TextPart textPart)
-
isStandalone
public static boolean isStandalone(ITextUnit tu)
-
renumberCodes
public static void renumberCodes(TextContainer tc)
-
needsPreserveWhitespaces
public static boolean needsPreserveWhitespaces(TextContainer tc)
Detects whether a given TextContainer contains whitespace that must be preserved in XML. A single space (0x20) does not need to be preserved; any other whitespace character, as well as a sequence of two or more spaces, does.
- Parameters:
tc - the given TextContainer object
- Returns:
- true if the given TextContainer has whitespace that needs to be preserved
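The whitespace rule above translates directly into a check on a plain string. This `needsPreserveWhitespaces(String)` overload is a hypothetical sketch, not part of TextUnitUtil; it assumes `Character.isWhitespace` identifies the "other whitespace characters" (note that it does not treat the no-break space U+00A0 as whitespace).

```java
final class PreserveWhitespaceSketch {

    /**
     * Returns true if the text contains whitespace that XML would not keep as-is:
     * any whitespace character other than U+0020, or two or more consecutive spaces.
     */
    static boolean needsPreserveWhitespaces(String text) {
        for (int i = 0; i < text.length(); i++) {
            char ch = text.charAt(i);
            if (ch == ' ') {
                if (i + 1 < text.length() && text.charAt(i + 1) == ' ') {
                    return true; // run of two or more spaces
                }
            } else if (Character.isWhitespace(ch)) {
                return true; // tab, line feed, carriage return, etc.
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(needsPreserveWhitespaces("one two"));  // false
        System.out.println(needsPreserveWhitespaces("one  two")); // true
        System.out.println(needsPreserveWhitespaces("one\ttwo")); // true
    }
}
```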
-
needsPreserveWhitespaces
public static boolean needsPreserveWhitespaces(ITextUnit tu)
-
isWellformed
public static boolean isWellformed(TextFragment tf)
-
isWellformed
public static boolean isWellformed(TextContainer tc)
-
unsegmentTU
public static void unsegmentTU(ITextUnit tu)
-
-