Class XMLString
- java.lang.Object
  - org.htmlunit.cyberneko.xerces.xni.XMLString
- All Implemented Interfaces:
CharSequence
public class XMLString extends Object implements CharSequence
This class is meant to replace the old XMLString in all areas where performance and memory efficiency are key. XMLString compatibility remains in place in case one has used it in their own code.
This buffer is mutable, so work with it responsibly. In many cases, the buffer is reused to avoid fresh memory allocations, hence you have to pay attention to its usage pattern. It is not meant to be a general String replacement.
This class avoids many of the standard runtime checks that will result in a runtime or array exception anyway. Why check twice and raise the same exception?
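The following sketch illustrates why a reused mutable buffer needs care. It uses java.lang.StringBuilder as a stand-in for any reused buffer (an assumption made for illustration; it is not XMLString, and the class name ReusedBufferDemo is hypothetical). A reference kept across a reuse shows the latest content, while a copy taken at the time stays stable.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ReusedBufferDemo {
    /** Stores every token twice: as a live reference to the buffer and as a String copy. */
    static List<List<String>> collect(String... tokens) {
        // Stand-in for a reused mutable buffer (assumption: not the real XMLString).
        StringBuilder reused = new StringBuilder();
        List<CharSequence> references = new ArrayList<>();
        List<String> copies = new ArrayList<>();
        for (String token : tokens) {
            reused.setLength(0);            // reset to length 0, keep the backing array
            reused.append(token);
            references.add(reused);         // risky: every entry points to the same buffer
            copies.add(reused.toString());  // safe: content is copied right away
        }
        List<String> seenViaReferences = new ArrayList<>();
        for (CharSequence cs : references) {
            seenViaReferences.add(cs.toString());
        }
        return Arrays.asList(seenViaReferences, copies);
    }

    public static void main(String[] args) {
        List<List<String>> result = collect("html", "body");
        System.out.println(result.get(0)); // [body, body]
        System.out.println(result.get(1)); // [html, body]
    }
}
```

When content has to outlive the current usage of a buffer, taking a copy (for XMLString, toString() or clone() per the method summary) avoids this effect.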
- Since:
- 3.10.0
- Author:
- René Schwietzke
-
-
Field Summary
Fields (modifier and type, field):
- static int CAPACITY_GROWTH
- static XMLString EMPTY
- static int INITIAL_CAPACITY
-
Constructor Summary
Constructors (constructor, description):
- XMLString(): Constructs an XMLCharBuffer with a default size.
- XMLString(char[] ch, int offset, int length): Constructs an XMLString structure preset with the specified values.
- XMLString(int startSize): Constructs an XMLCharBuffer with a desired size.
- XMLString(int startSize, int growBy): Constructs an XMLCharBuffer with a desired size.
- XMLString(String src): Constructs an XMLCharBuffer from a string.
- XMLString(XMLString src): Constructs an XMLCharBuffer from another buffer.
- XMLString(XMLString src, int addCapacity): Constructs an XMLCharBuffer from another buffer.
-
Method Summary
Methods (modifier and type, method, description):
- XMLString append(char c): Appends a single character to the buffer.
- XMLString append(char[] src, int offset, int length): Add data from a char array to this buffer with the ability to specify a range to copy from.
- XMLString append(char c1, char c2): Append two characters at once, mainly to make a codePoint add more efficient.
- XMLString append(String src): Append a string to this buffer without copying the string first.
- XMLString append(XMLString src): Add another buffer to this one.
- boolean appendCodePoint(int codePoint): Append a character to an XMLCharBuffer.
- int capacity(): Returns the current max capacity without growth.
- void characters(ContentHandler contentHandler)
- char charAt(int index): Returns the char at the given position.
- XMLString clear(): Resets the buffer to 0 length.
- XMLString clearAndAppend(char c): Resets the buffer to 0 length and sets the new data.
- XMLString clone(): Returns a content copy of this buffer.
- void comment(LexicalHandler lexicalHandler)
- boolean contains(XMLString s): See if this string contains the other.
- boolean endsWith(String s): Does this buffer end with this string?
- static boolean equals(CharSequence sequence, XMLString s): Compares a CharSequence with an XMLString in a null-safe manner.
- boolean equals(Object o): Two buffers are identical when the length and the content of the backing array (only for the data in view) are identical.
- boolean equalsIgnoreCase(CharSequence s): Compares this with a CharSequence in a case-insensitive manner.
- static boolean equalsIgnoreCase(CharSequence sequence, XMLString s): Compares a CharSequence with an XMLString in a null-safe manner.
- char[] getChars(): Get the characters as char array, this will be a copy!
- int getGrowBy(): Tell us how much the capacity grows if needed.
- int hashCode(): We don't cache the hashcode because we mutate often.
- void ignorableWhitespace(ContentHandler contentHandler)
- int indexOf(char c): Find the first occurrence of a char.
- int indexOf(XMLString s): Search for the first occurrence of another buffer in this buffer.
- boolean isWhitespace(): Check if we have only whitespaces.
- int length(): Returns the current length.
- XMLString prepend(char c): Inserts a character at the beginning.
- XMLString reduceToContent(String startMarker, String endMarker): Deprecated. Use the new method trimToContent(String, String) instead.
- XMLString shortenBy(int count): Shortens the buffer by that many positions.
- CharSequence subSequence(int start, int end): Returns a CharSequence that is a subsequence of this sequence.
- XMLString toLowerCase(Locale locale): This lowercases an XMLString in place and will likely not consume extra memory unless the character might grow.
- String toString(): Returns a string representation of this buffer.
- String toString(FastHashMap<XMLString,String> cache): Returns a string representation of this buffer using a cache as source to avoid duplicates.
- static String toString(XMLString seq): Returns a string representation of a buffer.
- static String toString(XMLString seq, FastHashMap<XMLString,String> cache): Returns a string representation of the buffer using a cache as source to avoid duplicates.
- XMLString toUpperCase(Locale locale): This uppercases an XMLString in place and will likely not consume extra memory unless the character might grow.
- XMLString trim(): Trims the string similar to String.trim().
- XMLString trimLeading(): Removes all whitespace before the first non-whitespace char.
- XMLString trimToContent(String startMarker, String endMarker): Reduces the buffer to the content between start and end marker when only whitespaces are found before the startMarker as well as after the end marker.
- XMLString trimTrailing(): Removes all whitespace at the end.
- XMLString trimWhitespaceAtEnd(): Deprecated. Use trimTrailing() instead.
- char unsafeCharAt(int index): Returns the char at the given position.
Methods inherited from class java.lang.Object
finalize, getClass, notify, notifyAll, wait, wait, wait
-
Methods inherited from interface java.lang.CharSequence
chars, codePoints
-
-
-
-
Field Detail
-
CAPACITY_GROWTH
public static final int CAPACITY_GROWTH
- See Also:
- Constant Field Values
-
INITIAL_CAPACITY
public static final int INITIAL_CAPACITY
- See Also:
- Constant Field Values
-
EMPTY
public static final XMLString EMPTY
-
-
Constructor Detail
-
XMLString
public XMLString()
Constructs an XMLCharBuffer with a default size.
-
XMLString
public XMLString(int startSize)
Constructs an XMLCharBuffer with a desired size.
- Parameters:
startSize - the size of the buffer to start with
-
XMLString
public XMLString(int startSize, int growBy)
Constructs an XMLCharBuffer with a desired size.
- Parameters:
startSize - the size of the buffer to start with
growBy - by how much the buffer grows when needed
-
XMLString
public XMLString(XMLString src)
Constructs an XMLCharBuffer from another buffer. Copies the data over. The new buffer capacity matches the length of the source.
- Parameters:
src - the source buffer to copy from
-
XMLString
public XMLString(XMLString src, int addCapacity)
Constructs an XMLCharBuffer from another buffer. Copies the data over. You can add more capacity on top of the source length. If you specify 0, the capacity will match the src length.
- Parameters:
src - the source buffer to copy from
addCapacity - how much capacity to add to the source length
-
XMLString
public XMLString(String src)
Constructs an XMLCharBuffer from a string. To avoid too much allocation, we just take the string array as is and don't allocate extra space in the first place.
- Parameters:
src - the string to copy from
-
XMLString
public XMLString(char[] ch, int offset, int length)
Constructs an XMLString structure preset with the specified values. There will not be any room to grow. If you need that, construct an empty one and append.
No range checks are performed. Make sure your data is correct.
- Parameters:
ch - the character array, must not be null
offset - the offset into the character array
length - the length of characters from the offset
-
-
Method Detail
-
capacity
public int capacity()
Returns the current max capacity without growth. Does not indicate how much capacity is already in use. Use length() for that.
- Returns:
- the current capacity, not taking any usage into account
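The difference between capacity and length can be sketched with java.lang.StringBuilder, which exposes the same two notions. StringBuilder is only a stand-in here (an assumption for illustration, not XMLString), and the class name CapacityVsLength is hypothetical.

```java
final class CapacityVsLength {
    /** Returns {capacity, length} pairs: fresh buffer, after an append, after a reset. */
    static int[][] snapshots() {
        // Stand-in buffer with an initial capacity of 32 chars.
        StringBuilder sb = new StringBuilder(32);
        int[] fresh = {sb.capacity(), sb.length()};

        sb.append("abc");                 // uses 3 of the 32 chars, no growth needed
        int[] filled = {sb.capacity(), sb.length()};

        sb.setLength(0);                  // reset the length, the backing array is kept
        int[] reset = {sb.capacity(), sb.length()};

        return new int[][] {fresh, filled, reset};
    }

    public static void main(String[] args) {
        for (int[] s : snapshots()) {
            System.out.println("capacity=" + s[0] + ", length=" + s[1]);
        }
    }
}
```

The capacity stays at 32 in all three snapshots, while the length goes from 0 to 3 and back to 0.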
-
append
public XMLString append(char c)
Appends a single character to the buffer.
- Parameters:
c - the character to append
- Returns:
- this instance
-
append
public XMLString append(char c1, char c2)
Append two characters at once, mainly to make a codePoint add more efficient.
- Parameters:
c1 - the first character to append
c2 - the second character to append
- Returns:
- this instance
-
append
public XMLString append(String src)
Append a string to this buffer without copying the string first.
- Parameters:
src - the string to append
- Returns:
- this instance
-
append
public XMLString append(XMLString src)
Add another buffer to this one.
- Parameters:
src - the buffer to append
- Returns:
- this instance
-
append
public XMLString append(char[] src, int offset, int length)
Add data from a char array to this buffer with the ability to specify a range to copy from.
- Parameters:
src - the source char array
offset - the position to start copying from
length - the length of the data to copy
- Returns:
- this instance
-
prepend
public XMLString prepend(char c)
Inserts a character at the beginning.
- Parameters:
c - the char to insert at the beginning
- Returns:
- this instance
-
length
public int length()
Returns the current length.
- Specified by:
length in interface CharSequence
- Returns:
- the length of the charbuffer data
-
getGrowBy
public int getGrowBy()
Tell us how much the capacity grows if needed.
- Returns:
- the value that determines how much we grow the backing array in case we have to
-
clear
public XMLString clear()
Resets the buffer to 0 length. The backing array is not resized, to avoid memory churn.
- Returns:
- this instance for fluent programming
-
clearAndAppend
public XMLString clearAndAppend(char c)
Resets the buffer to 0 length and sets the new data. This is a little cheaper than clear().append(c), depending on where it is called and on the inlining decisions.
- Parameters:
c - the char to set
- Returns:
- this instance for fluent programming
-
endsWith
public boolean endsWith(String s)
Does this buffer end with this string? If we check for the empty string, we get true. If we supported JDK 11, we could use Arrays.mismatch and be way faster.
- Parameters:
s - the string to check the end against
- Returns:
- true if the end of the buffer matches the string, false otherwise
-
reduceToContent
public XMLString reduceToContent(String startMarker, String endMarker)
Deprecated. Use the new method trimToContent(String, String) instead.
Reduces the buffer to the content between start and end marker when only whitespaces are found before the startMarker as well as after the end marker. If both strings overlap due to identical characters such as "foo" and "oof" and the buffer is " foof ", we don't do anything.
If a marker is empty, it behaves like String.trim() on that side.
- Parameters:
startMarker - the start string to find, must not be null
endMarker - the end string to find, must not be null
- Returns:
- this instance
-
trimToContent
public XMLString trimToContent(String startMarker, String endMarker)
Reduces the buffer to the content between start and end marker when only whitespaces are found before the startMarker as well as after the end marker. If both strings overlap due to identical characters such as "foo" and "oof" and the buffer is " foof ", we don't do anything.
If a marker is empty, it behaves like String.trim() on that side.
- Parameters:
startMarker - the start string to find, must not be null
endMarker - the end string to find, must not be null
- Returns:
- this instance
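The contract described for trimToContent can be sketched on plain Strings. This is an illustrative re-implementation of the documented behavior only, not the library code. The class name TrimToContentSketch is hypothetical, and two points are assumptions: whitespace is detected with the same "<= ' '" rule as String.trim(), and the input is returned unchanged when the markers are not found.

```java
final class TrimToContentSketch {
    /** String-based sketch of the documented contract; returns s unchanged when it does not apply. */
    static String trimToContent(String s, String startMarker, String endMarker) {
        int start = 0;
        int end = s.length();
        // Skip whitespace on both sides (assumption: same rule as String.trim()).
        while (start < end && s.charAt(start) <= ' ') {
            start++;
        }
        while (end > start && s.charAt(end - 1) <= ' ') {
            end--;
        }
        int contentStart = start + startMarker.length();
        int contentEnd = end - endMarker.length();
        // Overlapping markers, such as "foo" and "oof" in " foof ": do nothing.
        if (contentStart > contentEnd) {
            return s;
        }
        // Both markers must sit directly next to the surrounding whitespace.
        if (!s.startsWith(startMarker, start) || !s.startsWith(endMarker, contentEnd)) {
            return s;
        }
        return s.substring(contentStart, contentEnd);
    }

    public static void main(String[] args) {
        System.out.println("[" + trimToContent("  <!-- x -->  ", "<!--", "-->") + "]"); // [ x ]
        System.out.println("[" + trimToContent(" foof ", "foo", "oof") + "]");          // [ foof ]
        System.out.println("[" + trimToContent("  hi  ", "", "") + "]");                // [hi]
    }
}
```

Unlike this sketch, the real method changes the buffer in place and returns the same instance.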
-
isWhitespace
public boolean isWhitespace()
Check if we have only whitespaces.
- Returns:
- true if we have only whitespace, false otherwise
-
trim
public XMLString trim()
Trims the string similar to String.trim().
- Returns:
- a string with removed whitespace at the beginning and the end
-
trimLeading
public XMLString trimLeading()
Removes all whitespace before the first non-whitespace char. If all are whitespaces, we get an empty buffer.
- Returns:
- this instance
-
trimWhitespaceAtEnd
public XMLString trimWhitespaceAtEnd()
Deprecated. Use trimTrailing() instead.
Removes all whitespace at the end. If all are whitespace, we get an empty buffer.
- Returns:
- this instance
-
trimTrailing
public XMLString trimTrailing()
Removes all whitespace at the end. If all are whitespace, we get an empty buffer.
- Returns:
- this instance
-
shortenBy
public XMLString shortenBy(int count)
Shortens the buffer by that many positions. If the count is larger than the length, we get just an empty buffer. If you pass in a negative value, this fails, often silently. It is all about performance and not a general all-purpose API.
- Parameters:
count - a positive number, no runtime checks; if count is larger than length, we get length = 0
- Returns:
- this instance
-
getChars
public char[] getChars()
Get the characters as char array, this will be a copy!
- Returns:
- a copy of the underlying char data
-
toString
public String toString()
Returns a string representation of this buffer. This will be a copy operation. If the buffer is empty, we get a constant empty String back to avoid any overhead.
- Specified by:
toString in interface CharSequence
- Overrides:
toString in class Object
- Returns:
- a string of the content of this buffer
-
toString
public static String toString(XMLString seq)
Returns a string representation of a buffer. This will be a copy operation. If the buffer is empty, we get a constant empty String back to avoid any overhead. Method exists to deliver null-safety.
- Returns:
- a string of the content of this buffer
-
toString
public String toString(FastHashMap<XMLString,String> cache)
Returns a string representation of this buffer using a cache as source to avoid duplicates. You have to make sure that the cache supports concurrency in case you use it in a concurrent context.
The cache will be filled with a copy of the XMLString to ensure immutability. This copy is minimally sized.
- Parameters:
cache - the cache to be used
- Returns:
- a string of the content of this buffer, preferably taken from the cache
-
toString
public static String toString(XMLString seq, FastHashMap<XMLString,String> cache)
Returns a string representation of the buffer using a cache as source to avoid duplicates. You have to make sure that the cache supports concurrency in case you use it in a concurrent context.
The cache will be filled with a copy of the XMLString to ensure immutability. This copy is minimally sized.
- Parameters:
seq - the XMLString to convert
cache - the cache to be used
- Returns:
- a string of the content of this buffer, preferably taken from the cache, null if seq was null
-
charAt
public char charAt(int index)
Returns the char at the given position. Will complain if we try to read outside the range. We do a range check here because we might not notice when we are within the buffer but outside the current length.
- Specified by:
charAt in interface CharSequence
- Parameters:
index - the position to read from
- Returns:
- the char at the position
- Throws:
IndexOutOfBoundsException - in case one tries to read outside of the valid buffer range
-
unsafeCharAt
public char unsafeCharAt(int index)
Returns the char at the given position. No checks are performed. It is up to the caller to make sure we read correctly. Reading outside of the array will cause an IndexOutOfBoundsException, but using an incorrect position in the array (such as beyond length) might stay unnoticed! This is a performance method, use at your own risk.
- Parameters:
index - the position to read from
- Returns:
- the char at the position
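The difference between a checked and an unchecked read can be sketched with a minimal buffer. TinyBuffer is a hypothetical class written for this illustration, with a fixed backing array of 8 chars. It only mirrors the described idea of a backing array plus a current length and is not the XMLString implementation.

```java
final class TinyBuffer {
    private final char[] data = new char[8]; // fixed capacity keeps the sketch short
    private int length;

    TinyBuffer append(char c) {
        data[length++] = c;
        return this;
    }

    /** Resets the length only; old chars stay in the backing array. */
    TinyBuffer clear() {
        length = 0;
        return this;
    }

    int length() {
        return length;
    }

    /** Checked read: rejects positions outside the current length. */
    char charAt(int index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index " + index + ", length " + length);
        }
        return data[index];
    }

    /** Unchecked read: only the array bounds protect us. */
    char unsafeCharAt(int index) {
        return data[index];
    }

    public static void main(String[] args) {
        TinyBuffer b = new TinyBuffer().append('a').append('b').append('c');
        b.clear().append('x');                 // content is now "x", length 1
        System.out.println(b.unsafeCharAt(1)); // prints stale 'b' without any complaint
        try {
            b.charAt(1);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("charAt(1) rejected: " + e.getMessage());
        }
    }
}
```

The unchecked read returns leftover data from an earlier use of the buffer, which is the kind of unnoticed error the description warns about.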
-
clone
public XMLString clone()
Returns a content copy of this buffer
-
subSequence
public CharSequence subSequence(int start, int end)
Returns a CharSequence that is a subsequence of this sequence. The subsequence starts with the char value at the specified index and ends with the char value at index end - 1. The length (in chars) of the returned sequence is end - start, so if start == end then an empty sequence is returned.
- Specified by:
subSequence in interface CharSequence
- Parameters:
start - the start index, inclusive
end - the end index, exclusive
- Returns:
- the specified subsequence
- Throws:
IndexOutOfBoundsException - if start or end are negative, if end is greater than length(), or if start is greater than end
-
equals
public boolean equals(Object o)
Two buffers are identical when the length and the content of the backing array (only for the data in view) are identical.
-
equals
public static boolean equals(CharSequence sequence, XMLString s)
Compares a CharSequence with an XMLString in a null-safe manner. For more, see equals(Object). The XMLString can be null, but the CharSequence must not be null. This mimics the typical use case "string".equals(null), which returns false without raising an exception.
- Parameters:
sequence - the sequence to compare to, null is permitted
s - the XMLString to use for comparison
- Returns:
- true if the sequence matches, false otherwise
-
hashCode
public int hashCode()
We don't cache the hashcode because we mutate often. Don't use this in hashmaps as key. But you can use that to look up in a hashmap against a string using the CharSequence interface.
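Why a mutable object with a content-based hash code makes a poor HashMap key can be sketched as follows. The Key class is hypothetical and stands in for any mutable buffer whose hashCode() depends on its content; it is not XMLString.

```java
import java.util.HashMap;
import java.util.Map;

final class MutableKeyDemo {
    /** Hypothetical mutable key whose hash code and equality follow its content. */
    static final class Key {
        private final StringBuilder content = new StringBuilder();

        Key append(String s) {
            content.append(s);
            return this;
        }

        @Override
        public int hashCode() {
            return content.toString().hashCode();
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Key && ((Key) o).content.toString().equals(content.toString());
        }
    }

    /** Puts a key into a map, mutates it, and reports whether the map still finds it. */
    static boolean foundAfterMutation() {
        Map<Key, String> map = new HashMap<>();
        Key key = new Key().append("a");
        map.put(key, "value");   // stored under the hash of "a"
        key.append("b");         // the key now hashes like "ab"
        return map.containsKey(key);
    }

    public static void main(String[] args) {
        System.out.println(foundAfterMutation()); // false
    }
}
```

The entry is still in the map, but neither the mutated key nor a fresh key with the old content can reach it anymore.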
-
appendCodePoint
public boolean appendCodePoint(int codePoint)
Append a character to an XMLCharBuffer. The character is an int value, and can either be a single UTF-16 character or a supplementary character represented by two UTF-16 code units (a surrogate pair).
- Parameters:
codePoint - the character value
- Returns:
- this instance for fluid programming
- Throws:
IllegalArgumentException - if the specified codePoint is not a valid Unicode code point
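How a code point maps to one or two chars can be sketched with the JDK's Character helpers. This shows the general UTF-16 rule only; the class name CodePointSplit is hypothetical, and the sketch does not claim to be how appendCodePoint is implemented.

```java
final class CodePointSplit {
    /** Returns the UTF-16 chars for a code point: one for the BMP, two (a surrogate pair) otherwise. */
    static char[] toUtf16(int codePoint) {
        if (!Character.isValidCodePoint(codePoint)) {
            throw new IllegalArgumentException("Not a valid Unicode code point: " + codePoint);
        }
        if (Character.isBmpCodePoint(codePoint)) {
            // Fits into a single char.
            return new char[] {(char) codePoint};
        }
        // Supplementary character: high surrogate first, then low surrogate.
        return new char[] {Character.highSurrogate(codePoint), Character.lowSurrogate(codePoint)};
    }

    public static void main(String[] args) {
        System.out.println(toUtf16('A').length);      // 1
        char[] pair = toUtf16(0x1F600);               // U+1F600, outside the BMP
        System.out.println(pair.length);              // 2
        System.out.println(Integer.toHexString(pair[0]) + " " + Integer.toHexString(pair[1])); // d83d de00
    }
}
```

A two-char result is the case that append(char c1, char c2) is described as making more efficient.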
-
toUpperCase
public XMLString toUpperCase(Locale locale)
This uppercases an XMLString in place and will likely not consume extra memory unless the character might grow. This conversion can be incorrect for certain characters from some locales. See String.toUpperCase().
We cannot correctly deal with ß for instance.
Note: This changes the current XMLString and returns this instance, not a copy.
- Parameters:
locale - the locale to use in case we have to bail out and convert using String; this also means that the result is not perfect when compared to String.toUpperCase(Locale)
- Returns:
- this updated instance
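The ß case can be reproduced with plain JDK calls, which shows why a char-by-char, in-place conversion cannot match String.toUpperCase(Locale) for every input. The class name SharpSDemo is hypothetical; the JDK methods are standard.

```java
import java.util.Locale;

final class SharpSDemo {
    /** Per-char conversion, as an in-place approach would do it. */
    static char upperChar(char c) {
        return Character.toUpperCase(c);
    }

    /** Full String conversion, which may change the length. */
    static String upperString(String s) {
        return s.toUpperCase(Locale.GERMAN);
    }

    public static void main(String[] args) {
        // \u00DF is the German sharp s.
        System.out.println(upperChar('\u00DF') == '\u00DF');     // true: no single-char uppercase form
        System.out.println(upperString("Stra\u00DFe"));          // STRASSE
        System.out.println(upperString("Stra\u00DFe").length()); // 7, one char more than the input
    }
}
```

The String result grows by one char, so it cannot be written back into the same positions of a buffer.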
-
toLowerCase
public XMLString toLowerCase(Locale locale)
This lowercases an XMLString in place and will likely not consume extra memory unless the character might grow. This conversion can be incorrect for certain characters from some locales. See String.toLowerCase().
Note: This changes the current XMLString and returns this instance, not a copy.
- Parameters:
locale - the locale to use in case we have to bail out and convert using String; this also means that the result is not perfect when compared to String.toLowerCase(Locale)
- Returns:
- this updated instance
-
equalsIgnoreCase
public static boolean equalsIgnoreCase(CharSequence sequence, XMLString s)
Compares a CharSequence with an XMLString in a null-safe manner. For more, see equalsIgnoreCase(CharSequence). The XMLString can be null, but the CharSequence must not be null. This mimics the typical use case "string".equalsIgnoreCase(null), which returns false without raising an exception.
- Parameters:
sequence - the sequence to compare to, null is permitted
s - the XMLString to use for comparison
- Returns:
- true if the sequence matches case-insensitive, false otherwise
-
equalsIgnoreCase
public boolean equalsIgnoreCase(CharSequence s)
Compares this with a CharSequence in a case-insensitive manner.
This code might have subtle edge-case defects for some rare locales and related characters. See String.toLowerCase(Locale). The locales tr, az, lt and the extra letters GREEK CAPITAL LETTER SIGMA and LATIN CAPITAL LETTER I WITH DOT ABOVE are our challengers. If the input matches with equals(Object), everything is fine; only when we have to check for a casing difference might we see a problem.
But this is for XML/HTML characters and we know what we compare, hence this should not be any issue for us.
- Parameters:
s - the sequence to compare to, null is permitted
- Returns:
- true if the sequences match case-insensitive, false otherwise
-
indexOf
public int indexOf(char c)
Find the first occurrence of a char.
- Parameters:
c - the char to search
- Returns:
- the position or -1 otherwise
-
indexOf
public int indexOf(XMLString s)
Search for the first occurrence of another buffer in this buffer.
- Parameters:
s - the buffer to be searched for
- Returns:
- the first found position or -1 if not found
-
contains
public boolean contains(XMLString s)
See if this string contains the other.
- Parameters:
s - the XMLString to search and match
- Returns:
- true if s is in this string or false otherwise
-
characters
public void characters(ContentHandler contentHandler) throws SAXException
- Throws:
SAXException
-
ignorableWhitespace
public void ignorableWhitespace(ContentHandler contentHandler) throws SAXException
- Throws:
SAXException
-
comment
public void comment(LexicalHandler lexicalHandler) throws SAXException
- Throws:
SAXException
-
-