public class BidiUtils extends Object
| Modifier and Type | Class and Description |
|---|---|
| static class | BidiUtils.Format - A container class for Unicode formatting characters and for directionality string constants. |
| Modifier and Type | Field and Description |
|---|---|
| static String | LEFT - "left" string constant. |
| static String | RIGHT - "right" string constant. |
| Modifier and Type | Method and Description |
|---|---|
| static Dir | estimateDirection(String str) - Estimates the directionality of a string based on relative word counts, as detailed below. |
| static Dir | estimateDirection(String str, boolean isHtml) - Like estimateDirection(String), but can treat str as HTML, ignoring HTML tags and escapes that would otherwise be mistaken for LTR text. |
| static Dir | getEntryDir(String str) - Like getEntryDir(String, boolean), but assumes str is not HTML or HTML-escaped. |
| static Dir | getEntryDir(String str, boolean isHtml) - Returns the directionality of the first character with strong directionality in the string, or Dir.NEUTRAL if none was encountered. |
| static Dir | getExitDir(String str) - Like getExitDir(String, boolean), but assumes str is not HTML or HTML-escaped. |
| static Dir | getExitDir(String str, boolean isHtml) - Returns the directionality of the last character with strong directionality in the string, or Dir.NEUTRAL if none was encountered. |
| static Dir | getUnicodeDir(String str) - Like getUnicodeDir(String, boolean), but assumes str is not HTML or HTML-escaped. |
| static Dir | getUnicodeDir(String str, boolean isHtml) - Returns the directionality of a string as defined by the UBA's rules P2 and P3. |
| static boolean | hasAnyLtr(String str) - Like hasAnyLtr(String, boolean), but assumes str is not HTML / HTML-escaped. |
| static boolean | hasAnyLtr(String str, boolean isHtml) - Checks if the given string has any LTR characters in it. |
| static boolean | hasAnyRtl(String str) - Like hasAnyRtl(String, boolean), but assumes str is not HTML / HTML-escaped. |
| static boolean | hasAnyRtl(String str, boolean isHtml) - Checks if the given string has any RTL characters in it. |
| static boolean | isRtlLanguage(String locale) - Returns whether a locale, given as a string in the ICU syntax, is RTL. |
| static boolean | isRtlLanguage(com.ibm.icu.util.ULocale locale) - Returns whether a locale is RTL. |
| static Dir | languageDir(String locale) - Returns the directionality of a locale, given as a string in the ICU syntax. |
| static Dir | languageDir(com.ibm.icu.util.ULocale locale) - Returns the directionality of a locale. |
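
To make the "first strong" and "last strong" descriptions of getEntryDir and getExitDir concrete, here is a minimal sketch built only on the JDK's Character.getDirectionality. It is not the BidiUtils implementation: the class StrongDirSketch, its local Dir enum, and the methods entryDir and exitDir are hypothetical names, and the sketch ignores HTML markup and any special handling of embedding or override characters that the real methods may apply.

```java
// Hypothetical illustration; not part of BidiUtils.
final class StrongDirSketch {
    enum Dir { LTR, RTL, NEUTRAL }

    /** Maps one code point to LTR/RTL if it is strongly directional, else NEUTRAL. */
    static Dir strongDir(int codePoint) {
        switch (Character.getDirectionality(codePoint)) {
            case Character.DIRECTIONALITY_LEFT_TO_RIGHT:
                return Dir.LTR;
            case Character.DIRECTIONALITY_RIGHT_TO_LEFT:
            case Character.DIRECTIONALITY_RIGHT_TO_LEFT_ARABIC:
                return Dir.RTL;
            default:
                return Dir.NEUTRAL;
        }
    }

    /** Direction of the first strongly directional code point, scanning forward. */
    static Dir entryDir(String str) {
        for (int i = 0; i < str.length(); ) {
            int cp = str.codePointAt(i);
            Dir dir = strongDir(cp);
            if (dir != Dir.NEUTRAL) {
                return dir;
            }
            i += Character.charCount(cp);
        }
        return Dir.NEUTRAL;
    }

    /** Direction of the last strongly directional code point, scanning backward. */
    static Dir exitDir(String str) {
        for (int i = str.length(); i > 0; ) {
            int cp = str.codePointBefore(i);
            Dir dir = strongDir(cp);
            if (dir != Dir.NEUTRAL) {
                return dir;
            }
            i -= Character.charCount(cp);
        }
        return Dir.NEUTRAL;
    }

    public static void main(String[] args) {
        String mixed = "abc \u05d0\u05d1\u05d2"; // Latin letters followed by Hebrew letters
        System.out.println(entryDir(mixed) + " " + exitDir(mixed));
    }
}
```

Digits and punctuation are not strongly directional, so a string made up only of such characters falls through to Dir.NEUTRAL in this sketch.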
public static final String RIGHT
public static final String LEFT
public static Dir languageDir(com.ibm.icu.util.ULocale locale)
public static Dir languageDir(String locale)
public static boolean isRtlLanguage(com.ibm.icu.util.ULocale locale)
public static boolean isRtlLanguage(String locale)
public static boolean hasAnyLtr(String str, boolean isHtml)
str - the string to be tested
isHtml - whether str is HTML / HTML-escaped
public static boolean hasAnyLtr(String str)
Like hasAnyLtr(String, boolean), but assumes str is not HTML / HTML-escaped.
str - the string to be tested
public static boolean hasAnyRtl(String str, boolean isHtml)
str - the string to be tested
isHtml - whether str is HTML / HTML-escaped
public static boolean hasAnyRtl(String str)
Like hasAnyRtl(String, boolean), but assumes str is not HTML / HTML-escaped.
str - the string to be tested
public static Dir getUnicodeDir(String str, boolean isHtml)
str - the string to check
isHtml - whether str is HTML / HTML-escaped
public static Dir getUnicodeDir(String str)
Like getUnicodeDir(String, boolean), but assumes str is not HTML or HTML-escaped.
public static Dir getEntryDir(String str, boolean isHtml)
str - the string to check
isHtml - whether str is HTML / HTML-escaped
public static Dir getEntryDir(String str)
Like getEntryDir(String, boolean), but assumes str is not HTML or HTML-escaped.
public static Dir getExitDir(String str, boolean isHtml)
str - the string to check
isHtml - whether str is HTML / HTML-escaped
public static Dir getExitDir(String str)
Like getExitDir(String, boolean), but assumes str is not HTML or HTML-escaped.
public static Dir estimateDirection(String str)
The parts of the text embedded between LRE/RLE and the matching PDF are ignored, since the directionality in which the string as a whole is displayed will not affect their display anyway, and we want to base it on the remainder.
The parts of the text embedded between LRO/RLO and the matching PDF are considered LTR/RTL "words". This is primarily in order to treat "fake bidi" pseudolocalized text as RTL.
The remaining parts of the text are divided into "words" on whitespace and, inside numbers, on neutral characters that break the LTR flow around them when used inside a number in an RTL context. (This is most of them, the primary exceptions being period, comma, NBSP and colon, i.e. bidi class CS not including slash, which a long-standing Microsoft bug treats as ES.)
Each word is assigned a type - LTR, RTL, URL, signed "European" number, unsigned "European" number, negative "Arabic" number, "Arabic" number with leading plus sign, and unsigned "Arabic" number - as follows:
- Words that start with "http[s]://" (possibly preceded by some neutrals) are URLs.
- Of the remaining words, those that contain any strongly directional characters are classified as LTR or RTL based on their first strongly directional character.
- Of the remaining words, those that contain any digits are classified as "European" or "Arabic" numbers based on the type of their first digit, and as signed or unsigned depending on whether that first digit is immediately preceded by a plus or minus sign (bidi class ES).
- The remaining words are classified as "neutral" and ignored.
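
The classification rules above can be sketched as follows. This is an illustration under stated assumptions, not the BidiUtils source: WordTypeSketch, WordType, and classify are hypothetical names, the input is assumed to be a single word that has already been split out, and sign detection is simplified to the three characters '+', '-', and U+2212 rather than the full ES bidi class.

```java
// Hypothetical illustration; not part of BidiUtils.
final class WordTypeSketch {
    enum WordType {
        URL, LTR, RTL, SIGNED_EN, UNSIGNED_EN, NEGATIVE_AN, PLUS_AN, UNSIGNED_AN, NEUTRAL
    }

    static boolean isStrong(byte d) {
        return d == Character.DIRECTIONALITY_LEFT_TO_RIGHT
                || d == Character.DIRECTIONALITY_RIGHT_TO_LEFT
                || d == Character.DIRECTIONALITY_RIGHT_TO_LEFT_ARABIC;
    }

    static boolean isDigit(byte d) {
        return d == Character.DIRECTIONALITY_EUROPEAN_NUMBER
                || d == Character.DIRECTIONALITY_ARABIC_NUMBER;
    }

    static WordType classify(String word) {
        // Rule 1: skip leading neutrals, then look for an http:// or https:// prefix.
        int start = 0;
        while (start < word.length()) {
            int cp = word.codePointAt(start);
            byte d = Character.getDirectionality(cp);
            if (isStrong(d) || isDigit(d)) {
                break;
            }
            start += Character.charCount(cp);
        }
        if (word.startsWith("http://", start) || word.startsWith("https://", start)) {
            return WordType.URL;
        }
        // Rule 2: the first strongly directional character decides LTR vs. RTL.
        for (int i = 0; i < word.length(); ) {
            int cp = word.codePointAt(i);
            byte d = Character.getDirectionality(cp);
            if (d == Character.DIRECTIONALITY_LEFT_TO_RIGHT) {
                return WordType.LTR;
            }
            if (isStrong(d)) {
                return WordType.RTL;
            }
            i += Character.charCount(cp);
        }
        // Rule 3: the first digit decides the number type; the character just before it, the sign.
        int prev = -1;
        for (int i = 0; i < word.length(); ) {
            int cp = word.codePointAt(i);
            byte d = Character.getDirectionality(cp);
            if (isDigit(d)) {
                boolean plus = prev == '+';
                boolean minus = prev == '-' || prev == 0x2212; // simplified sign check (assumption)
                if (d == Character.DIRECTIONALITY_EUROPEAN_NUMBER) {
                    return (plus || minus) ? WordType.SIGNED_EN : WordType.UNSIGNED_EN;
                }
                if (minus) {
                    return WordType.NEGATIVE_AN;
                }
                return plus ? WordType.PLUS_AN : WordType.UNSIGNED_AN;
            }
            prev = cp;
            i += Character.charCount(cp);
        }
        // Rule 4: nothing strong and no digits.
        return WordType.NEUTRAL;
    }
}
```

The sketch leaves out the word-splitting step described earlier, in particular the splitting of numbers on most neutral characters, so "123-4567" would need to be passed in as two separate words to match the documented behavior.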
Once the words of each type have been counted, the directionality is decided as follows:
If the number of RTL words exceeds 40% of the total of LTR and RTL words, return Dir.RTL. The threshold favors RTL because LTR words and phrases are used in RTL sentences more commonly than RTL in LTR.
Otherwise, if there are any LTR words, return Dir.LTR.
Otherwise (i.e. if there are no LTR or RTL words), if there are any URLs, or any signed "European" numbers, or an "Arabic" number with a leading plus sign, or more than one unsigned "European" number, return Dir.LTR. This ensures that the text is displayed LTR even in an RTL context, where things like "http://www.google.com/", "-5", "+١٢٣٤٢٣٤٦٧٨٩" (assuming it is intended as an international phone number, not an explicitly signed positive number, which is a very rare use case), "3 - 2 = 1", "(03) 123 4567", and, when preceded by an Arabic letter, even "123-4567" and "400×300" would otherwise be displayed incorrectly. (Most neutrals, including those in the last two examples, are treated as ending a number in order to treat such expressions as containing more than one "European" number, and thus to force their display in LTR.) Considering a string containing more than one "European" number to be LTR also makes sense because math expressions in "European" digits need to be displayed LTR even in RTL languages. However, that is probably not a very important consideration, since math expressions would usually also contain strongly LTR or RTL variable names that should set the overall directionality. Ranges like "$1 - $5" *are* an important consideration, but their preferred direction unfortunately varies among the RTL languages. Since LTR is preferred for ranges in Persian and Urdu, and is the more widespread usage in Hebrew, LTR seems like a reasonable choice. Please note that native Persian digits are included in the "European" class because the unary minus is preferred on the left in Persian, and Persian math is written LTR.
Otherwise, if there are any negative "Arabic" numbers, return Dir.RTL. This is because the unary minus is supposed to be displayed to the right of a number written in "Arabic" digits.
Otherwise, return Dir.NEUTRAL. This includes the common case of a single unsigned number, which will display correctly in either "European" or "Arabic" digits in either directionality, so it is best not to force it to either. It also includes an otherwise neutral string containing two or more "Arabic" numbers. We do *not* consider it to be RTL because it is unclear that it is important to display "Arabic"-digit math and ranges in RTL even in an LTR context, and because we have no idea how to handle phone numbers spelled (or, more likely, misspelled) in "Arabic" digits with non-CS separators. But it is quite clear that we do not want to force it to LTR.
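
The decision steps above can be condensed into a small function over the per-type word counts. This is a sketch under assumptions, not the BidiUtils source: DirectionDecisionSketch, decide, and the parameter names are hypothetical, and the 40% test is written in integer arithmetic to avoid floating-point rounding at the boundary.

```java
// Hypothetical illustration; not part of BidiUtils.
final class DirectionDecisionSketch {
    enum Dir { LTR, RTL, NEUTRAL }

    /**
     * Applies the documented decision order to word counts gathered beforehand.
     * Unsigned "Arabic" numbers never influence the result, so they are not a parameter.
     */
    static Dir decide(int ltrWords, int rtlWords, int urls, int signedEuropean,
                      int unsignedEuropean, int negativeArabic, int plusArabic) {
        // RTL words exceed 40% of all strongly directional words: rtl / total > 40 / 100.
        if (rtlWords * 100 > (ltrWords + rtlWords) * 40) {
            return Dir.RTL;
        }
        if (ltrWords > 0) {
            return Dir.LTR;
        }
        // No LTR or RTL words remain past this point.
        if (urls > 0 || signedEuropean > 0 || plusArabic > 0 || unsignedEuropean > 1) {
            return Dir.LTR;
        }
        if (negativeArabic > 0) {
            return Dir.RTL;
        }
        // Covers a single unsigned number and strings of unsigned "Arabic" numbers.
        return Dir.NEUTRAL;
    }

    public static void main(String[] args) {
        // Three LTR words and two RTL words: RTL is exactly 40%, which does not exceed the threshold.
        System.out.println(decide(3, 2, 0, 0, 0, 0, 0));
    }
}
```

Because the threshold is strict, a string with exactly 40% RTL words falls through to the LTR branch in this sketch, which matches the wording "exceeds 40%".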
str - the string to check
public static Dir estimateDirection(String str, boolean isHtml)
Like estimateDirection(String), but can treat str as HTML, ignoring HTML tags and escapes that would otherwise be mistaken for LTR text.
str - the string to check
isHtml - whether str is HTML / HTML-escaped