Use variations of substring( ) from StringUtils. The next example parses a string that contains six numbers delimited by parentheses, brackets, and a pipe symbol (N0 * (N1,N2) [N3,N4] | N5):
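import java.io.PrintStream;
import org.apache.commons.lang.StringUtils;

String formatted = " 25 * (30,40) [50,60] | 30";
PrintStream out = System.out;
// Isolate each delimited pair first; a bare "," is ambiguous because it occurs twice
String parens = StringUtils.substringBetween( formatted, "(", ")" );   // "30,40"
String brackets = StringUtils.substringBetween( formatted, "[", "]" ); // "50,60"
out.print("N0: " + StringUtils.trim( StringUtils.substringBeforeLast( formatted, "*" ) ) );
out.print(", N1: " + StringUtils.substringBefore( parens, "," ) );
out.print(", N2: " + StringUtils.substringAfter( parens, "," ) );
out.print(", N3: " + StringUtils.substringBefore( brackets, "," ) );
out.print(", N4: " + StringUtils.substringAfter( brackets, "," ) );
out.print(", N5: " + StringUtils.trim( StringUtils.substringAfterLast( formatted, "|" ) ) );
This parses the formatted text and prints the following output: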
N0: 25, N1: 30, N2: 40, N3: 50, N4: 60, N5: 30
The following public static methods come in handy when extracting information from a formatted string:
StringUtils.substringBetween( ): Captures the content between the first occurrence of an opening string and the next occurrence of a closing string
StringUtils.substringAfter( ): Captures the content that occurs after the first occurrence of a specified string
StringUtils.substringBefore( ): Captures the content that occurs before the first occurrence of a specified string
StringUtils.substringBeforeLast( ): Captures the content before the last occurrence of a specified string
StringUtils.substringAfterLast( ): Captures the content after the last occurrence of a specified string
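The difference between the first-occurrence and last-occurrence variants is easiest to see side by side. The following plain-Java sketch imitates these five methods with String.indexOf( ) and String.lastIndexOf( ) so that it runs without Commons Lang. SubstringSketch and its methods are hypothetical stand-ins, not the library's source, and they assume non-null, non-empty separators. The not-found results follow the behavior described in the StringUtils Javadoc: the Before variants return the whole string, the After variants return an empty string, and substringBetween( ) returns null.

```java
public class SubstringSketch {

    // Text before the FIRST occurrence of sep; the whole string if sep is absent.
    static String substringBefore(String str, String sep) {
        if (str == null) return null;
        int pos = str.indexOf(sep);
        return pos == -1 ? str : str.substring(0, pos);
    }

    // Text after the FIRST occurrence of sep; an empty string if sep is absent.
    static String substringAfter(String str, String sep) {
        if (str == null) return null;
        int pos = str.indexOf(sep);
        return pos == -1 ? "" : str.substring(pos + sep.length());
    }

    // Text before the LAST occurrence of sep; the whole string if sep is absent.
    static String substringBeforeLast(String str, String sep) {
        if (str == null) return null;
        int pos = str.lastIndexOf(sep);
        return pos == -1 ? str : str.substring(0, pos);
    }

    // Text after the LAST occurrence of sep; an empty string if sep is absent.
    static String substringAfterLast(String str, String sep) {
        if (str == null) return null;
        int pos = str.lastIndexOf(sep);
        return pos == -1 ? "" : str.substring(pos + sep.length());
    }

    // Text between the first `open` and the first `close` that follows it; null if either is missing.
    static String substringBetween(String str, String open, String close) {
        if (str == null) return null;
        int start = str.indexOf(open);
        if (start == -1) return null;
        int end = str.indexOf(close, start + open.length());
        return end == -1 ? null : str.substring(start + open.length(), end);
    }

    public static void main(String[] args) {
        String s = "a,b,c";
        System.out.println(substringBefore(s, ","));     // a
        System.out.println(substringAfter(s, ","));      // b,c
        System.out.println(substringBeforeLast(s, ",")); // a,b
        System.out.println(substringAfterLast(s, ","));  // c
        // The search for "]" starts right after the FIRST comma, so too much is captured
        System.out.println(substringBetween("(1,2) [3,4]", ",", "]")); // 2) [3,4
        // Unambiguous delimiters capture exactly the bracketed content
        System.out.println(substringBetween("[3,4]", "[", "]"));       // 3,4
    }
}
```

The last two calls show why delimiter choice matters: substringBetween( ) begins looking for the closing string right after the first opening string, so an opening string that occurs more than once, such as a comma, can capture far more than intended.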
To illustrate the use of these methods, here is an example of a feed of sports scores. Each record in the feed has a defined format, which resembles this feed description:
<SOT><sport>[<team1>,<team2>] (<score1>,<score2>)<ETX>
Notes: <SOT> is ASCII character 2, "Start of Text"; <ETX> is ASCII character 4, "End of Transmission".
Example:
<SOT>Baseball[BOS,SEA] (24,22)<ETX>
<SOT>Basketball[CHI,NYC] (29,5)<ETX>
The following example parses a record from this feed using StringUtils methods such as trim( ), substringBetween( ), and substringBefore( ). The boxScore variable holds a test string to parse; once the string is parsed, the code prints the game score:
import java.io.PrintStream;
import org.apache.commons.lang.StringUtils;

// Create a formatted string to parse - get this from a feed
char SOT = '\u0002';
char ETX = '\u0004';
String boxScore = SOT + "Basketball[CHI,BOS](69,75)\r\n" + ETX;
// Get rid of the archaic control characters (and the trailing CR/LF)
boxScore = StringUtils.trim( boxScore );
// Parse the score into component parts
String sport = StringUtils.substringBefore( boxScore, "[" );
String team1 = StringUtils.substringBetween( boxScore, "[", "," );
String team2 = StringUtils.substringBetween( boxScore, ",", "]" );
String score1 = StringUtils.substringBetween( boxScore, "(", "," );
// Search only after "(" so the comma between the team names is not matched
String score2 = StringUtils.substringBetween( StringUtils.substringAfter( boxScore, "(" ), ",", ")" );
PrintStream out = System.out;
out.println( "**** " + sport + " Score" );
out.println( "\t" + team1 + "\t" + score1 );
out.println( "\t" + team2 + "\t" + score2 );
This code parses a score and prints the following output:
**** Basketball Score
        CHI     69
        BOS     75
In the previous example, StringUtils.trim( ) rids the text of the SOT and ETX control characters, along with the trailing carriage return and line feed. StringUtils.substringBefore( ) then reads the sport name, "Basketball", and substringBetween( ) retrieves the teams and scores.
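A real feed delivers many records back to back. As a rough sketch of how the single-record parsing above might extend to a whole feed, the following plain-Java class splits a feed on the SOT and ETX characters and then extracts the fields of each record. BoxScoreFeed, records( ), and parse( ) are hypothetical names; the code assumes SOT is '\u0002' and ETX is '\u0004', as in the previous example, and that every record is well formed (there is no error handling). It uses only J2SE String methods so that it runs without Commons Lang, and it looks for the score parentheses only after the closing bracket so the comma between the team names is never mistaken for the score separator.

```java
import java.util.ArrayList;
import java.util.List;

public class BoxScoreFeed {

    // Record delimiters, matching the control characters used in the example above (assumption)
    static final char SOT = '\u0002';
    static final char ETX = '\u0004';

    // Returns the trimmed text of each SOT...ETX pair; text outside a pair is ignored.
    static List<String> records(String feed) {
        List<String> out = new ArrayList<>();
        int start = feed.indexOf(SOT);
        while (start != -1) {
            int end = feed.indexOf(ETX, start + 1);
            if (end == -1) break; // incomplete trailing record
            out.add(feed.substring(start + 1, end).trim());
            start = feed.indexOf(SOT, end + 1);
        }
        return out;
    }

    // Splits "<sport>[<team1>,<team2>] (<score1>,<score2>)" into {sport, team1, team2, score1, score2}.
    static String[] parse(String record) {
        int lb = record.indexOf('[');
        int rb = record.indexOf(']', lb);
        String[] teams = record.substring(lb + 1, rb).split(",");
        // Search for the score parentheses only after the team list
        int lp = record.indexOf('(', rb);
        int rp = record.indexOf(')', lp);
        String[] scores = record.substring(lp + 1, rp).split(",");
        return new String[] { record.substring(0, lb), teams[0], teams[1], scores[0], scores[1] };
    }

    public static void main(String[] args) {
        String feed = SOT + "Baseball[BOS,SEA] (24,22)" + ETX
                    + SOT + "Basketball[CHI,NYC] (29,5)" + ETX;
        for (String record : records(feed)) {
            String[] f = parse(record);
            System.out.println(f[0] + ": " + f[1] + " " + f[3] + ", " + f[2] + " " + f[4]);
        }
    }
}
```

Running main( ) prints one line per record: "Baseball: BOS 24, SEA 22" followed by "Basketball: CHI 29, NYC 5".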
At first glance, the value of these substring( ) variations is not obvious. The previous example parsed this simple formatted string with a few static methods on StringUtils, but how difficult would it be to implement the same parsing without the aid of Commons Lang? The following example parses the same string using only methods available in the Java 1.4 J2SE:
// Parse the box score without using StringUtils
boxScore = boxScore.trim( );
int firstBracket = boxScore.indexOf( "[" );
String sport = boxScore.substring( 0, firstBracket );
int firstComma = boxScore.indexOf( "," );
String team1 = boxScore.substring( firstBracket + 1, firstComma );
int secondBracket = boxScore.indexOf( "]" );
String team2 = boxScore.substring( firstComma + 1, secondBracket );
int firstParen = boxScore.indexOf( "(" );
int secondComma = boxScore.indexOf( ",", firstParen );
String score1 = boxScore.substring( firstParen + 1, secondComma );
int secondParen = boxScore.indexOf( ")" );
String score2 = boxScore.substring( secondComma + 1, secondParen );
This parses the string in a similar number of lines, but the code is less straightforward and much more difficult to maintain. Instead of simply calling substringBetween( ), the previous example calls String.indexOf( ) and performs arithmetic on the returned indexes before calling String.substring( ). Additionally, the substring( ) methods on StringUtils are null-safe; the Java 1.4 example throws a NullPointerException if boxScore is null.
String.trim( ) and StringUtils.trim( ) behave the same way on non-null input: both strip leading and trailing whitespace and ASCII control characters (every character whose code is at or below U+0020), leaving interior characters untouched. StringUtils.trim( ) is simply a wrapper for String.trim( ), but it handles null input gracefully: passing null to StringUtils.trim( ) returns null.
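The following plain-Java sketch illustrates both points: String.trim( ) removes the leading SOT and the trailing CR, LF, and ETX from the box score while keeping interior spaces, and a one-line null check provides the null-in, null-out behavior described for StringUtils.trim( ). The helper nullSafeTrim( ) is hypothetical, written only for illustration; it is not part of Commons Lang.

```java
public class TrimSketch {

    // Hypothetical helper mirroring the null handling of StringUtils.trim(): null in, null out
    static String nullSafeTrim(String str) {
        return str == null ? null : str.trim();
    }

    public static void main(String[] args) {
        String raw = '\u0002' + "Basketball[CHI,BOS](69,75)\r\n" + '\u0004';
        // String.trim() strips leading/trailing chars <= U+0020: here SOT, CR, LF, and ETX
        System.out.println(raw.trim());                            // Basketball[CHI,BOS](69,75)
        // Only the ends are trimmed; the interior space survives
        System.out.println("[" + nullSafeTrim("  a b  ") + "]");  // [a b]
        String missing = null;
        System.out.println(nullSafeTrim(missing));                 // null
        try {
            missing.trim();
        } catch (NullPointerException e) {
            System.out.println("String.trim() on null throws NullPointerException");
        }
    }
}
```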
