public class BidiFormatter extends Object
1. Bidi Wrapping
When text in one language is mixed into a document in another, opposite-directionality language (e.g. when an English business name is embedded in a Hebrew web page), both the inserted string and the text surrounding it may be displayed incorrectly unless the inserted string is explicitly separated from the surrounding text by a "wrapper" that:
- Declares its directionality so that the string is displayed correctly. This can be done in HTML markup (e.g. a 'span dir="rtl"' element) by spanWrap(java.lang.String, boolean, boolean) and similar methods, or, only in contexts where markup can't be used, in Unicode bidi formatting codes by unicodeWrap(java.lang.String, boolean, boolean) and similar methods. Optionally, the markup can be inserted even when the directionality is the same, in order to keep the DOM structure more stable.
- Isolates the string's directionality, so it does not unduly affect the surrounding content. Currently, this can only be done using invisible Unicode characters of the same direction as the context (LRM, U+200E, or RLM, U+200F) in addition to the directionality declaration above, thus "resetting" the directionality to that of the context. The "reset" may need to be done at both ends of the string. Without a "reset" after the string, the string will "stick" to a number or to logically separate opposite-direction text that happens to follow it in-line (even if separated by neutral content like spaces and punctuation). Without a "reset" before the string, the same can happen there, but only with more opposite-direction text, not with a number. One approach is to "reset" the direction only after each string, on the theory that if the preceding opposite-direction text is itself bidi-wrapped, the "reset" after it will prevent the sticking. (Doing the "reset" only before each string definitely does not work, because we do not want to require bidi-wrapping numbers, and a bidi-wrapped opposite-direction string could be followed by a number.) Still, the safest policy is to do the "reset" on both ends of each string, since RTL message translations often contain untranslated Latin-script brand names and technical terms, and one of these can be followed by a bidi-wrapped inserted value. On the other hand, when one has such a message, it is best to do the "reset" manually in the message translation itself, since the message's opposite-direction text could be followed by an inserted number, which we would not bidi-wrap anyway. Thus, "reset" only after the string is the current default. As an alternative to "reset", recent additions to the HTML, CSS, and Unicode standards allow the isolation to be part of the directionality declaration.
This form of isolation is better than "reset" because it takes less space, does not require knowing the context directionality, has a gentler effect than "reset", and protects both ends of the string. However, we do not yet allow using it because required platforms do not yet support it.
Providing these wrapping services is the basic purpose of the bidi formatter.
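As an illustration of the two wrapper duties above (declaring the directionality and "resetting" after the string), here is a minimal, self-contained sketch using Unicode formatting characters. It is not the formatter's implementation: the class UnicodeWrapSketch, its local Dir enum, and the wrap method are hypothetical, the string's directionality is assumed to be already known, and, as a simplification, the trailing mark is added only when the overall directionality is opposite (the formatter also checks the string's exit directionality). The code points used are LRE (U+202A), RLE (U+202B), PDF (U+202C), LRM (U+200E), and RLM (U+200F).

```java
/**
 * Hypothetical sketch of Unicode-based bidi wrapping with a trailing "reset".
 * Not the library's implementation; the string's directionality is given.
 */
final class UnicodeWrapSketch {
  enum Dir { LTR, RTL, NEUTRAL }

  static final char LRE = '\u202A'; // LEFT-TO-RIGHT EMBEDDING
  static final char RLE = '\u202B'; // RIGHT-TO-LEFT EMBEDDING
  static final char PDF = '\u202C'; // POP DIRECTIONAL FORMATTING
  static final char LRM = '\u200E'; // LEFT-TO-RIGHT MARK
  static final char RLM = '\u200F'; // RIGHT-TO-LEFT MARK

  /**
   * Wraps str for output in a context of direction contextDir (LTR or RTL).
   * Simplification: only the overall directionality decides whether to reset.
   */
  static String wrap(Dir contextDir, Dir strDir, String str, boolean isolate) {
    boolean opposite = strDir != Dir.NEUTRAL && strDir != contextDir;
    StringBuilder out = new StringBuilder();
    if (opposite) {
      // Declare the string's own directionality with an embedding.
      out.append(strDir == Dir.RTL ? RLE : LRE).append(str).append(PDF);
    } else {
      // Same direction as the context, or neutral: nothing to declare.
      out.append(str);
    }
    if (isolate && opposite) {
      // "Reset" back to the context direction after the string.
      out.append(contextDir == Dir.RTL ? RLM : LRM);
    }
    return out.toString();
  }
}
```

For example, wrapping an RTL string in an LTR context yields RLE + str + PDF + LRM, while a string whose direction matches the context is returned unchanged.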
2. Directionality estimation
How does one know whether a string about to be inserted into surrounding text has the same directionality? Well, in many cases, one knows that this must be the case when writing the code doing the insertion, e.g. when a localized message is inserted into a localized page. In such cases there is no need to involve the bidi formatter at all. In some other cases, it need not be the same as the context, but is either constant (e.g. URLs are always LTR) or otherwise known. In the remaining cases, e.g. when the string is user-entered or comes from a database, the language of the string (and thus its directionality) is not known a priori, and must be estimated at run-time. The bidi formatter can do this automatically.
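To make run-time estimation concrete, the sketch below uses a simple word-count heuristic: each whitespace-separated word is classified by its first strong character (via java.lang.Character.getDirectionality), and the string is treated as RTL if enough of its directional words are RTL. This is a stand-in heuristic, not necessarily what estimateDirection does; the class name, the Dir enum, and the 40% threshold are assumptions chosen for illustration.

```java
/**
 * Hypothetical word-count directionality estimator (illustrative only).
 * ASSUMPTION: the 0.4 threshold and per-word classification are not taken
 * from the formatter's documentation.
 */
final class DirEstimatorSketch {
  enum Dir { LTR, RTL, NEUTRAL }

  // Assumed: RTL when at least 40% of directional words start RTL.
  static final double RTL_THRESHOLD = 0.4;

  static Dir estimate(String str) {
    int rtlWords = 0;
    int totalWords = 0;
    for (String word : str.trim().split("\\s+")) {
      Dir d = firstStrong(word);
      if (d == Dir.NEUTRAL) {
        continue; // digits, punctuation, empty input: no vote
      }
      totalWords++;
      if (d == Dir.RTL) {
        rtlWords++;
      }
    }
    if (totalWords == 0) {
      return Dir.NEUTRAL;
    }
    return rtlWords >= RTL_THRESHOLD * totalWords ? Dir.RTL : Dir.LTR;
  }

  // Direction of the first strong (L, R, or AL) character in s.
  static Dir firstStrong(String s) {
    for (int i = 0; i < s.length(); ) {
      int cp = s.codePointAt(i);
      byte bidi = Character.getDirectionality(cp);
      if (bidi == Character.DIRECTIONALITY_LEFT_TO_RIGHT) {
        return Dir.LTR;
      }
      if (bidi == Character.DIRECTIONALITY_RIGHT_TO_LEFT
          || bidi == Character.DIRECTIONALITY_RIGHT_TO_LEFT_ARABIC) {
        return Dir.RTL;
      }
      i += Character.charCount(cp);
    }
    return Dir.NEUTRAL;
  }
}
```

Counting words rather than characters is one way to keep a single long word from dominating the estimate; input with no strong characters at all (e.g. "123 !!") comes out NEUTRAL.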
3. Escaping
When wrapping plain text (i.e. text that is not already HTML or HTML-escaped) in HTML markup, the text must first be HTML-escaped to prevent XSS attacks and other nasty business. This is always true, but because the escaping cannot be done after the string has already been wrapped in markup, the bidi formatter serves as a last chance to do it and therefore includes escaping services.
Thus, in a single call, the formatter can escape the input string as specified, determine its directionality, and wrap it as necessary. It is then up to the caller to insert the return value in the output.
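The single-call flow described above (escape, then wrap) can be sketched as follows. The names SpanWrapSketch, escapeHtml, and spanWrap here are hypothetical and do not reproduce the formatter's API; the string's directionality is passed in rather than estimated, isolation is assumed to be on, and only the overall directionality decides whether to append the reset mark. The point is the ordering: plain text is escaped before the span markup is added, because escaping afterwards would also escape the markup.

```java
/**
 * Hypothetical escape-then-wrap sketch for HTML output (illustrative only).
 */
final class SpanWrapSketch {
  enum Dir { LTR, RTL, NEUTRAL }

  // Escapes the five HTML-significant characters.
  static String escapeHtml(String s) {
    StringBuilder out = new StringBuilder(s.length());
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      switch (c) {
        case '&': out.append("&amp;"); break;
        case '<': out.append("&lt;"); break;
        case '>': out.append("&gt;"); break;
        case '"': out.append("&quot;"); break;
        case '\'': out.append("&#39;"); break;
        default: out.append(c);
      }
    }
    return out.toString();
  }

  static String spanWrap(Dir contextDir, Dir strDir, String str, boolean isHtml) {
    // 1. Escape plain text first; once markup is added it is too late.
    String body = isHtml ? str : escapeHtml(str);
    boolean opposite = strDir != Dir.NEUTRAL && strDir != contextDir;
    if (!opposite) {
      return body; // escaped regardless of whether it is wrapped
    }
    // 2. Declare the opposite directionality on a span.
    String dirValue = strDir == Dir.RTL ? "rtl" : "ltr";
    // 3. "Reset" after the span with a mark matching the context (LRM/RLM).
    String mark = contextDir == Dir.RTL ? "\u200F" : "\u200E";
    return "<span dir=\"" + dirValue + "\">" + body + "</span>" + mark;
  }
}
```

Note that escaping happens even when no span is needed, which matches the "escapes regardless of wrapping" behavior described for spanWrap.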
| Modifier and Type | Class and Description |
|---|---|
| static class | BidiFormatter.Builder: A class for building a BidiFormatter with non-default options. |
| Modifier and Type | Method and Description |
|---|---|
| String | dirAttr(String str): Operates like dirAttr(String, boolean), but assumes isHtml is false. |
| String | dirAttr(String str, boolean isHtml): Returns "dir=\"ltr\"" or "dir=\"rtl\"", depending on str's estimated directionality, if it is not the same as the context directionality. |
| String | dirAttrValue(String str, boolean isHtml): Returns "rtl" if str's estimated directionality is RTL, and "ltr" if it is LTR. |
| String | endEdge(): Returns "left" for RTL context directionality. |
| static Dir | estimateDirection(String str, boolean isHtml): Estimates the directionality of a string using the best known general-purpose method. |
| boolean | getAlwaysSpan() |
| Dir | getContextDir() |
| static BidiFormatter | getInstance(boolean rtlContext): Factory for creating an instance of BidiFormatter given the context directionality. |
| static BidiFormatter | getInstance(Dir contextDir): Factory for creating an instance of BidiFormatter given the context directionality. |
| static BidiFormatter | getInstanceWithNoContext(): Factory for creating an instance of BidiFormatter for an unknown directionality context. |
| boolean | getStereoReset() |
| boolean | isRtlContext() |
| String | knownDirAttr(Dir dir): Returns "dir=\"ltr\"" or "dir=\"rtl\"", depending on the given directionality, if it is not NEUTRAL or the same as the context directionality. |
| String | knownDirAttrValue(Dir dir): Returns "rtl" if the given directionality is RTL, and "ltr" if it is LTR. |
| String | mark(): Returns the Unicode bidi mark matching the context directionality (LRM for LTR context directionality, RLM for RTL context directionality), or the empty string for unknown context directionality. |
| String | markAfter(String str): Operates like markAfter(String, boolean), but assumes isHtml is false. |
| String | markAfter(String str, boolean isHtml): Returns a Unicode bidi mark matching the context directionality (LRM or RLM) if either the overall or the exit directionality of a given string is opposite to the context directionality. |
| String | markAfterKnownDir(Dir dir, String str): Operates like markAfterKnownDir(Dir, String, boolean), but assumes that isHtml is false. |
| String | markAfterKnownDir(Dir dir, String str, boolean isHtml): Returns a Unicode bidi mark matching the context directionality (LRM or RLM) if either the overall or the exit directionality of a given string is opposite to the context directionality. |
| String | markBefore(String str): Operates like markBefore(String, boolean), but assumes isHtml is false. |
| String | markBefore(String str, boolean isHtml): Returns a Unicode bidi mark matching the context directionality (LRM or RLM) if either the overall or the entry directionality of a given string is opposite to the context directionality. |
| String | markBeforeKnownDir(Dir dir, String str): Operates like markBeforeKnownDir(Dir, String, boolean), but assumes that isHtml is false. |
| String | markBeforeKnownDir(Dir dir, String str, boolean isHtml): Returns a Unicode bidi mark matching the context directionality (LRM or RLM) if either the overall or the entry directionality of a given string is opposite to the context directionality. |
| String | spanWrap(String str) |
| String | spanWrap(String str, boolean isHtml): Operates like spanWrap(String, boolean, boolean), but assumes isolate is true. |
| String | spanWrap(String str, boolean isHtml, boolean isolate): Formats a given string of unknown directionality for use in HTML output of the context directionality, so an opposite-directionality string is neither garbled nor garbles its surroundings. |
| String | spanWrapWithKnownDir(Dir dir, String str): Operates like spanWrapWithKnownDir(Dir, String, boolean, boolean), but assumes isHtml is false and isolate is true. |
| String | spanWrapWithKnownDir(Dir dir, String str, boolean isHtml): Operates like spanWrapWithKnownDir(Dir, String, boolean, boolean), but assumes isolate is true. |
| String | spanWrapWithKnownDir(Dir dir, String str, boolean isHtml, boolean isolate): Formats a string of given directionality for use in HTML output of the context directionality, so an opposite-directionality string is neither garbled nor garbles its surroundings. |
| String | startEdge(): Returns "right" for RTL context directionality. |
| String | unicodeWrap(String str): Operates like unicodeWrap(String, boolean, boolean), but assumes isHtml is false and isolate is true. |
| String | unicodeWrap(String str, boolean isHtml): Operates like unicodeWrap(String, boolean, boolean), but assumes isolate is true. |
| String | unicodeWrap(String str, boolean isHtml, boolean isolate): Formats a given string of unknown directionality for use in plain-text output of the context directionality, so an opposite-directionality string is neither garbled nor garbles its surroundings. |
| String | unicodeWrapWithKnownDir(Dir dir, String str): Operates like unicodeWrapWithKnownDir(Dir, String, boolean, boolean), but assumes isHtml is false and isolate is true. |
| String | unicodeWrapWithKnownDir(Dir dir, String str, boolean isHtml): Operates like unicodeWrapWithKnownDir(Dir, String, boolean, boolean), but assumes isolate is true. |
| String | unicodeWrapWithKnownDir(Dir dir, String str, boolean isHtml, boolean isolate): Formats a string of given directionality for use in plain-text output of the context directionality, so an opposite-directionality string is neither garbled nor garbles its surroundings. |
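Several methods in the table above depend only on the context directionality (mark(), startEdge(), endEdge(), knownDirAttr(Dir)). The following is a hypothetical, self-contained sketch of that logic as described in the table; it is not the library class. Assumptions: a null context models the "unknown" context, and non-RTL contexts (including unknown) get "left" from startEdge() and "right" from endEdge(), since the table only states the RTL case.

```java
/**
 * Hypothetical sketch of the context-only helpers (illustrative only).
 * contextDir == null models an unknown context.
 */
final class ContextHelpersSketch {
  enum Dir { LTR, RTL, NEUTRAL }

  private final Dir contextDir;

  ContextHelpersSketch(Dir contextDir) {
    // The context directionality must not be NEUTRAL (null means unknown).
    if (contextDir == Dir.NEUTRAL) {
      throw new IllegalArgumentException("context must not be NEUTRAL");
    }
    this.contextDir = contextDir;
  }

  boolean isRtlContext() {
    return contextDir == Dir.RTL;
  }

  // LRM for LTR context, RLM for RTL context, "" for unknown context.
  String mark() {
    if (contextDir == null) {
      return "";
    }
    return contextDir == Dir.RTL ? "\u200F" : "\u200E";
  }

  // ASSUMPTION: non-RTL (including unknown) contexts use the LTR values.
  String startEdge() {
    return isRtlContext() ? "right" : "left";
  }

  String endEdge() {
    return isRtlContext() ? "left" : "right";
  }

  // "" if dir is NEUTRAL or matches the context; otherwise a dir attribute.
  String knownDirAttr(Dir dir) {
    if (dir == null) {
      throw new NullPointerException("dir must not be null");
    }
    if (dir == Dir.NEUTRAL || dir == contextDir) {
      return "";
    }
    return dir == Dir.RTL ? "dir=\"rtl\"" : "dir=\"ltr\"";
  }
}
```

With an unknown context, knownDirAttr declares either directionality, which mirrors the note under getInstance(Dir) that the wrapping methods then wrap text of either directionality.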
public static BidiFormatter getInstance(@Nullable Dir contextDir)
Factory for creating an instance of BidiFormatter given the context directionality. By default, spanWrap(java.lang.String, boolean, boolean) and its variations avoid span wrapping unless there is a reason to wrap (a 'dir' attribute needs to be added).
Parameters:
contextDir - The context directionality. Must not be NEUTRAL. It can be (Dir) null to indicate that the context is unknown, but this is not recommended: the wrapping methods then wrap text of either directionality, and cannot "reset" the directionality back to the context.

public static BidiFormatter getInstance(boolean rtlContext)
Factory for creating an instance of BidiFormatter given the context directionality. By default, spanWrap(java.lang.String, boolean, boolean) and its variations avoid span wrapping unless there is a reason to wrap (a 'dir' attribute needs to be added).
Parameters:
rtlContext - Whether the context directionality is RTL

public static BidiFormatter getInstanceWithNoContext()
Factory for creating an instance of BidiFormatter for an unknown directionality context. By default, spanWrap(java.lang.String, boolean, boolean) and its variations avoid span wrapping when they can (which is only for neutral content).

public boolean isRtlContext()

public boolean getAlwaysSpan()
Returns whether the spanWrap(java.lang.String, boolean, boolean) and spanWrapWithKnownDir(com.google.template.soy.data.Dir, java.lang.String, boolean, boolean) methods should produce a stable span structure, i.e. wrap the string in a span even when its directionality does not need to be declared.

public boolean getStereoReset()

public String dirAttrValue(String str, boolean isHtml)
Returns "rtl" if str's estimated directionality is RTL, and "ltr" if it is LTR. In case it's NEUTRAL, returns "rtl" if the context directionality is RTL, and "ltr" otherwise. Needed for GXP, which can't handle dirAttr.
Parameters:
str - String whose directionality is to be estimated
isHtml - Whether str is HTML / HTML-escaped
Returns:
"rtl" if str's estimated directionality is RTL, and "ltr" otherwise.

public String knownDirAttrValue(Dir dir)
Returns "rtl" if the given directionality is RTL, and "ltr" if it is LTR.
Parameters:
dir - Given directionality. Must not be null.

public String dirAttr(String str, boolean isHtml)
Returns "dir=\"ltr\"" or "dir=\"rtl\"", depending on str's estimated directionality, if it is not the same as the context directionality. Otherwise, returns the empty string.
Parameters:
str - String whose directionality is to be estimated
isHtml - Whether str is HTML / HTML-escaped

public String dirAttr(String str)
Operates like dirAttr(String, boolean), but assumes isHtml is false.
Parameters:
str - String whose directionality is to be estimated

public String knownDirAttr(Dir dir)
Returns "dir=\"ltr\"" or "dir=\"rtl\"", depending on the given directionality, if it is not NEUTRAL or the same as the context directionality.
Parameters:
dir - Given directionality. Must not be null.

public String spanWrap(String str, boolean isHtml, boolean isolate)
Formats a given string of unknown directionality for use in HTML output of the context directionality, so an opposite-directionality string is neither garbled nor garbles its surroundings.
The algorithm: estimates the directionality of the given string. If its directionality doesn't match the context directionality, wraps it in a 'span' element with a 'dir' attribute (either 'dir=\"rtl\"' or 'dir=\"ltr\"').
If the formatter was built using alwaysSpan(true), the input is always wrapped in a span, and only the dir attribute is omitted when it's not needed.
If isolate is true, directionally isolates the string so that it does not garble its surroundings. Currently, this is done by "resetting" the directionality after the string: a trailing Unicode bidi mark matching the context directionality (LRM or RLM) is appended when either the overall directionality or the exit directionality of the string is opposite to that of the context. If the formatter was built using stereoReset(true), a Unicode bidi mark matching the context directionality is also prepended when either the overall directionality or the entry directionality of the string is opposite to that of the context.
If isHtml is false, HTML-escapes the string regardless of whether it is wrapped.
Parameters:
str - The input string
isHtml - Whether str is HTML / HTML-escaped
isolate - Whether to directionally isolate the string to prevent it from garbling the content around it

public String spanWrap(String str, boolean isHtml)
Operates like spanWrap(String, boolean, boolean), but assumes isolate is true.
Parameters:
str - The input string
isHtml - Whether str is HTML / HTML-escaped

public String spanWrap(String str)
Parameters:
str - The input string

public String spanWrapWithKnownDir(@Nullable Dir dir, String str, boolean isHtml, boolean isolate)
Formats a string of given directionality for use in HTML output of the context directionality, so an opposite-directionality string is neither garbled nor garbles its surroundings.
The algorithm: if the given directionality doesn't match the context directionality, wraps the string in a 'span' element with a 'dir' attribute (either 'dir=\"rtl\"' or 'dir=\"ltr\"').
If the formatter was built using alwaysSpan(true), the input is always wrapped in a span, and only the dir attribute is omitted when it's not needed.
If isolate is true, directionally isolates the string so that it does not garble its surroundings. Currently, this is done by "resetting" the directionality after the string: a trailing Unicode bidi mark matching the context directionality (LRM or RLM) is appended when either the overall directionality or the exit directionality of the string is opposite to that of the context. If the formatter was built using stereoReset(true), a Unicode bidi mark matching the context directionality is also prepended when either the overall directionality or the entry directionality of the string is opposite to that of the context. Note that, unlike the overall directionality, the entry and exit directionalities are determined from the string itself.
If isHtml is false, HTML-escapes the string regardless of whether it is wrapped.
Parameters:
dir - str's directionality. If null, i.e. unknown, it is estimated.
str - The input string
isHtml - Whether str is HTML / HTML-escaped
isolate - Whether to directionally isolate the string to prevent it from garbling the content around it

public String spanWrapWithKnownDir(@Nullable Dir dir, String str, boolean isHtml)
Operates like spanWrapWithKnownDir(Dir, String, boolean, boolean), but assumes isolate is true.
Parameters:
dir - str's directionality
str - The input string
isHtml - Whether str is HTML / HTML-escaped

public String spanWrapWithKnownDir(@Nullable Dir dir, String str)
Operates like spanWrapWithKnownDir(Dir, String, boolean, boolean), but assumes isHtml is false and isolate is true.
Parameters:
dir - str's directionality
str - The input string

public String unicodeWrap(String str, boolean isHtml, boolean isolate)
Formats a given string of unknown directionality for use in plain-text output of the context directionality, so an opposite-directionality string is neither garbled nor garbles its surroundings. Unlike spanWrap(java.lang.String, boolean, boolean), this uses Unicode bidi formatting characters. In HTML, its *only* valid use is inside elements within which markup is not allowed, e.g. the 'option' and 'title' elements.
The algorithm: estimates the directionality of the given string. If it doesn't match the context directionality, wraps it with Unicode bidi formatting characters: RLE+str+PDF for RTL text, or LRE+str+PDF for LTR text.
If isolate is true, directionally isolates the string so that it does not garble its surroundings. Currently, this is done by "resetting" the directionality after the string: a trailing Unicode bidi mark matching the context directionality (LRM or RLM) is appended when either the overall directionality or the exit directionality of the string is opposite to that of the context. If the formatter was built using stereoReset(true), a Unicode bidi mark matching the context directionality is also prepended when either the overall directionality or the entry directionality of the string is opposite to that of the context.
Does *not* do HTML-escaping, regardless of the value of isHtml.
Parameters:
str - The input string
isHtml - Whether str is HTML / HTML-escaped
isolate - Whether to directionally isolate the string to prevent it from garbling the content around it

public String unicodeWrap(String str, boolean isHtml)
Operates like unicodeWrap(String, boolean, boolean), but assumes isolate is true.
Parameters:
str - The input string
isHtml - Whether str is HTML / HTML-escaped

public String unicodeWrap(String str)
Operates like unicodeWrap(String, boolean, boolean), but assumes isHtml is false and isolate is true.
Parameters:
str - The input string

public String unicodeWrapWithKnownDir(@Nullable Dir dir, String str, boolean isHtml, boolean isolate)
Formats a string of given directionality for use in plain-text output of the context directionality, so an opposite-directionality string is neither garbled nor garbles its surroundings. Unlike spanWrapWithKnownDir(com.google.template.soy.data.Dir, java.lang.String, boolean, boolean), this uses Unicode bidi formatting characters. In HTML, its *only* valid use is inside elements that do not allow markup, e.g. the 'option' and 'title' elements.
The algorithm: if the given directionality doesn't match the context directionality, wraps the string with Unicode bidi formatting characters: RLE+str+PDF for RTL text, or LRE+str+PDF for LTR text.
If isolate is true, directionally isolates the string so that it does not garble its surroundings. Currently, this is done by "resetting" the directionality after the string: a trailing Unicode bidi mark matching the context directionality (LRM or RLM) is appended when either the overall directionality or the exit directionality of the string is opposite to that of the context. If the formatter was built using stereoReset(true), a Unicode bidi mark matching the context directionality is also prepended when either the overall directionality or the entry directionality of the string is opposite to that of the context. Note that, unlike the overall directionality, the entry and exit directionalities are determined from the string itself.
Does *not* do HTML-escaping, regardless of the value of isHtml.
Parameters:
dir - str's directionality. If null, i.e. unknown, it is estimated.
str - The input string
isHtml - Whether str is HTML / HTML-escaped
isolate - Whether to directionally isolate the string to prevent it from garbling the content around it

public String unicodeWrapWithKnownDir(@Nullable Dir dir, String str, boolean isHtml)
Operates like unicodeWrapWithKnownDir(Dir, String, boolean, boolean), but assumes isolate is true.
Parameters:
dir - str's directionality
str - The input string
isHtml - Whether str is HTML / HTML-escaped

public String unicodeWrapWithKnownDir(@Nullable Dir dir, String str)
Operates like unicodeWrapWithKnownDir(Dir, String, boolean, boolean), but assumes isHtml is false and isolate is true.
Parameters:
dir - str's directionality
str - The input string

public String markAfter(String str, boolean isHtml)
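Returns a Unicode bidi mark matching the context directionality (LRM or RLM) if either the overall or the exit directionality of a given string is opposite to the context directionality.
Parameters:
str - String after which the mark may need to appear
isHtml - Whether str is HTML / HTML-escaped

public String markAfter(String str)
Operates like markAfter(String, boolean), but assumes isHtml is false.
Parameters:
str - String after which the mark may need to appear

public String markAfterKnownDir(@Nullable Dir dir, String str, boolean isHtml)
Returns a Unicode bidi mark matching the context directionality (LRM or RLM) if either the overall or the exit directionality of a given string is opposite to the context directionality. The exit directionality is determined from the string itself, while the overall directionality is given by dir.
Parameters:
dir - str's overall directionality. If null, i.e. unknown, it is estimated.
str - String after which the mark may need to appear
isHtml - Whether str is HTML / HTML-escaped

public String markAfterKnownDir(@Nullable Dir dir, String str)
Operates like markAfterKnownDir(Dir, String, boolean), but assumes that isHtml is false.
Parameters:
dir - str's overall directionality
str - The input string

public String markBefore(String str, boolean isHtml)
Returns a Unicode bidi mark matching the context directionality (LRM or RLM) if either the overall or the entry directionality of a given string is opposite to the context directionality.
Parameters:
str - String before which the mark may need to appear
isHtml - Whether str is HTML / HTML-escaped

public String markBefore(String str)
Operates like markBefore(String, boolean), but assumes isHtml is false.
Parameters:
str - String before which the mark may need to appear

public String markBeforeKnownDir(@Nullable Dir dir, String str, boolean isHtml)
Returns a Unicode bidi mark matching the context directionality (LRM or RLM) if either the overall or the entry directionality of a given string is opposite to the context directionality. The entry directionality is determined from the string itself, while the overall directionality is given by dir.
Parameters:
dir - str's overall directionality. If null, i.e. unknown, it is estimated.
str - String before which the mark may need to appear
isHtml - Whether str is HTML / HTML-escaped

public String markBeforeKnownDir(@Nullable Dir dir, String str)
Operates like markBeforeKnownDir(Dir, String, boolean), but assumes that isHtml is false.
Parameters:
dir - str's overall directionality
str - String before which the mark may need to appear

public String mark()
Returns the Unicode bidi mark matching the context directionality (LRM for LTR context directionality, RLM for RTL context directionality), or the empty string for unknown context directionality.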
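
public String startEdge()
Returns "right" for RTL context directionality.

public String endEdge()
Returns "left" for RTL context directionality.

public static Dir estimateDirection(String str, boolean isHtml)
Estimates the directionality of a string using the best known general-purpose method.
Parameters:
str - String whose directionality is to be estimated
isHtml - Whether str is HTML / HTML-escaped
Returns:
str's estimated overall directionality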
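The entry/exit distinction used by markBefore and markAfter can be illustrated with a small sketch. Here "entry" and "exit" directionality are simplified to the direction of the first and last strong character (found with java.lang.Character.getDirectionality), and the caller supplies the overall directionality, much like the KnownDir variants. This is an assumption-laden illustration, not the formatter's algorithm: the class MarkSketch and its methods are hypothetical, explicit bidi formatting characters and HTML markup are ignored, and an unknown context (null) returns the empty string because there is no context direction to "reset" to.

```java
/**
 * Hypothetical sketch of markBefore/markAfter decisions (illustrative only).
 * Simplification: entry/exit = first/last strong character; embeddings and
 * HTML markup are not handled.
 */
final class MarkSketch {
  enum Dir { LTR, RTL, NEUTRAL }

  static final String LRM = "\u200E";
  static final String RLM = "\u200F";

  // Maps a code point to LTR/RTL if it is a strong character, else NEUTRAL.
  static Dir strongDir(int cp) {
    switch (Character.getDirectionality(cp)) {
      case Character.DIRECTIONALITY_LEFT_TO_RIGHT:
        return Dir.LTR;
      case Character.DIRECTIONALITY_RIGHT_TO_LEFT:
      case Character.DIRECTIONALITY_RIGHT_TO_LEFT_ARABIC:
        return Dir.RTL;
      default:
        return Dir.NEUTRAL;
    }
  }

  // Simplified entry directionality: first strong character.
  static Dir entryDir(String s) {
    for (int i = 0; i < s.length(); ) {
      int cp = s.codePointAt(i);
      Dir d = strongDir(cp);
      if (d != Dir.NEUTRAL) return d;
      i += Character.charCount(cp);
    }
    return Dir.NEUTRAL;
  }

  // Simplified exit directionality: last strong character.
  static Dir exitDir(String s) {
    for (int i = s.length(); i > 0; ) {
      int cp = s.codePointBefore(i);
      Dir d = strongDir(cp);
      if (d != Dir.NEUTRAL) return d;
      i -= Character.charCount(cp);
    }
    return Dir.NEUTRAL;
  }

  static boolean opposite(Dir d, Dir contextDir) {
    return d != Dir.NEUTRAL && d != contextDir;
  }

  static String contextMark(Dir contextDir) {
    return contextDir == Dir.RTL ? RLM : LRM;
  }

  // Mark after str if its overall or exit direction opposes the context.
  static String markAfter(Dir contextDir, Dir overallDir, String str) {
    if (contextDir == null) return ""; // unknown context: cannot reset
    boolean needed = opposite(overallDir, contextDir) || opposite(exitDir(str), contextDir);
    return needed ? contextMark(contextDir) : "";
  }

  // Mark before str if its overall or entry direction opposes the context.
  static String markBefore(Dir contextDir, Dir overallDir, String str) {
    if (contextDir == null) return "";
    boolean needed = opposite(overallDir, contextDir) || opposite(entryDir(str), contextDir);
    return needed ? contextMark(contextDir) : "";
  }
}
```

For instance, an LTR string ending in a Hebrew word needs a mark after it in an LTR context (its exit directionality is RTL) but not before it (its entry directionality is LTR).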