public final class EmailAddressValidator extends Object
For real-world addresses, this class is roughly 3-4 times slower than parsing with InternetAddress (although recent versions of this class might be faster), but it can handle a whole lot more. Because of sensible design tradeoffs made in javamail, if InternetAddress has trouble parsing, it might throw an exception, but often it will silently leave the entire original string in the result of ia.getAddress(). This class, by contrast, can be trusted to return only validated results.
This class has been used successfully on many billions of real-world addresses in live production environments, but it's not perfect yet.
Comments/Questions/Corrections welcome: https://github.com/bbottema/email-rfc2822-validator/issues
History:
Started with code by Les Hazlewood: leshazlewood.com.
Modified/added (Casey Connor): removed some functions, added support for CFWS token, corrected FWSP token, added some boolean flags, added getInternetAddress and extractHeaderAddresses and other methods, some optimization.
Modified/added (Benny Bottema): modularized the code and separated configuration, validation and extraction functions.
Where Mr. Hazlewood's version was more for ensuring certain forms that were passed in during registrations, etc., this handles more types of verifying as well as a few forms of extracting the data in predictable, cleaned-up chunks.
Note: CFWS means the "comments and/or folding white space" token from RFC 2822, in other words, whitespace and comment text that is enclosed in ()'s.
Limitations: doesn't support nested CFWS (comments within (other) comments), doesn't support mailbox groups except when flat-extracting addresses from headers or when doing verification, doesn't support any of the obs-* tokens. Also: the getInternetAddress and extractHeaderAddresses methods return InternetAddress objects; if the personal name has any quotes or \'s in it at all, the InternetAddress object will always escape the name entirely and put it in quotes, so multiple-token personal names with those characters somewhere in them will always be munged into one big escaped string. This is not really a big deal at all, but I mention it anyway. (And you could get around it by a simple modification to those methods to not use InternetAddress objects.) See the docs of those methods for more info.
Note: Unlike InternetAddress, this class will preserve any RFC-2047-encoding of international characters. Thus doing my_internetaddress.getPersonal() will return the 2047-encoded string, ready for use in an RFC-822-compliant message, whereas the common InternetAddress constructor (when used outside the context of EmailAddressValidator) would return the decoded version of the text, if any was needed. If you need the decoded form, you can do something like this (where ia is the InternetAddress object returned from an EmailAddressValidator method):
ia.setPersonal(javax.mail.internet.MimeUtility.decodeText(ia.getPersonal()));
...subsequent calls to ia.getPersonal() will then return the decoded text.
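To make the difference between the encoded and decoded forms concrete, here is a minimal JDK-only sketch of what an RFC 2047 encoded-word looks like and how its Base64 ("B") variant decodes. It is not how MimeUtility.decodeText is implemented: the class name EncodedWordSketch and the method decodeBWords are hypothetical, and "Q" encoding, language tags, and whitespace handling between adjacent encoded-words are deliberately left out.

```java
import java.nio.charset.Charset;
import java.util.Base64;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; use MimeUtility.decodeText for real messages.
class EncodedWordSketch {
    // One encoded-word in "B" (Base64) form: =?charset?B?payload?=
    private static final Pattern B_ENCODED_WORD =
            Pattern.compile("=\\?([^?]+)\\?[Bb]\\?([^?]*)\\?=");

    /** Decodes "B" encoded-words found in the input; all other text is copied unchanged. */
    static String decodeBWords(String input) {
        Matcher m = B_ENCODED_WORD.matcher(input);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(input, last, m.start());           // text before the encoded-word
            Charset charset = Charset.forName(m.group(1)); // e.g. UTF-8
            byte[] bytes = Base64.getDecoder().decode(m.group(2));
            out.append(new String(bytes, charset));
            last = m.end();
        }
        out.append(input, last, input.length());           // trailing text
        return out.toString();
    }

    public static void main(String[] args) {
        // The encoded form is what getPersonal() would hand back from this class.
        String encoded = "=?UTF-8?B?SsO8cmdlbg==?=";
        System.out.println(encoded + " decodes to " + decodeBWords(encoded));
    }
}
```

The encoded form is pure ASCII, which is why it can be dropped straight into a message header, while the decoded form is what you would show to a user.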
Note: This class does not do any header-length-checking. There are no such limitations on the email address grammar in 2822, though email headers in general do have length restrictions. So if the return path is 40000 unfolded characters long, but otherwise valid under 2822, this class will pass it.
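Since length checks are left to the caller, a pre-check along the following lines could run before validation. This is only a sketch: the class and method names are hypothetical, and the 998-character figure is the per-line hard limit from RFC 2822 section 2.1.1 (excluding the trailing CRLF), not something this library enforces.

```java
// Hypothetical caller-side check; not part of EmailAddressValidator.
class HeaderLineLengthSketch {
    // RFC 2822 section 2.1.1: a line MUST NOT exceed 998 characters, excluding the CRLF.
    static final int MAX_LINE_LENGTH = 998;

    /** Returns true if every CRLF-delimited line of the (possibly folded) header fits the limit. */
    static boolean linesWithinLimit(String header) {
        for (String line : header.split("\r\n", -1)) {
            if (line.length() > MAX_LINE_LENGTH) {
                return false;
            }
        }
        return true;
    }

    /** Builds an unfolded Return-Path header whose local part has the given length. */
    static String longReturnPath(int localPartLength) {
        StringBuilder sb = new StringBuilder("Return-Path: <");
        for (int i = 0; i < localPartLength; i++) {
            sb.append('a');
        }
        return sb.append("@example.com>").toString();
    }

    public static void main(String[] args) {
        System.out.println(linesWithinLimit("Return-Path: <bob@example.com>"));
        System.out.println(linesWithinLimit(longReturnPath(40000)));
    }
}
```

A header that fails this check may still contain a grammatically valid address, so the two checks answer different questions and are best kept separate.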
Examples of passing (2822-valid) addresses, believe it or not:
bob @example.com
"bob" @ example.com
bob (comment) (other comment) @example.com (personal name)
"<bob \" (here) " < (hi there) "bob(the man)smith" (hi) @ (there) example.com (hello) > (again)
(none of which are permitted by javamail's InternetAddress parsing, incidentally)
By using getInternetAddress(), you can retrieve an InternetAddress object that, when toString()'ed, would reveal that the parser had converted the above into:
<bob@example.com>
<bob@example.com>
"personal name" <bob@example.com>
"<bob\" (here)" <"bob(the man)smith"@example.com>
(respectively)
If parsing headers, however, you'll probably be calling extractHeaderAddresses().
A future improvement may be to use this class to extract info from corrupted addresses, but for now, it does not permit them.
Some of the configuration booleans allow a bit of tweaking already. The source code can be compiled with these booleans in various states. They are configured to what is probably the most commonly-useful state.
| Modifier and Type | Method and Description |
|---|---|
| static boolean | isValid(@Nullable String email)<br>Validates an e-mail with default validation flags. |
| static boolean | isValid(@Nullable String email, @NotNull EnumSet<EmailAddressCriteria> criteria)<br>Using the given validation criteria, checks to see if the specified string is a valid email address according to the RFC 2822 specification, which is remarkably squirrely. |
| static boolean | isValidStrict(@Nullable String email)<br>Validates an e-mail with default validation flags that remains true to RFC 2822. |
public static boolean isValid(@Nullable String email)
Validates an e-mail with default validation flags. [...] EmailAddressCriteria.ALLOW_DOMAIN_LITERALS criteria, which results in exclusions on single domains. Useful for cleaning up email strings that other middleware (i.e. the next server) will be able to understand.
Parameters:
email - A string representing an email address.
See Also:
EmailAddressCriteria.RECOMMENDED
public static boolean isValidStrict(@Nullable String email)
Validates an e-mail with default validation flags that remains true to RFC 2822.
Parameters:
email - A string representing an email address.
See Also:
EmailAddressCriteria.RFC_COMPLIANT
public static boolean isValid(@Nullable String email, @NotNull EnumSet<EmailAddressCriteria> criteria)
Using the given validation criteria, checks to see if the specified string is a valid email address according to the RFC 2822 specification, which is remarkably squirrely.
Note 1: By passing criteria to this method, you are overriding the default criteria entirely, not extending them. E.g. you may want to include both EmailAddressCriteria.ALLOW_QUOTED_IDENTIFIERS and EmailAddressCriteria.ALLOW_PARENS_IN_LOCALPART in addition to whatever additional criteria you desire.
Note 2: If being used on a 2822 header, this method applies to the Sender and Resent-Sender headers only, although you can also use it on the Return-Path if you know it to be non-empty (see doc for isValidReturnPath()!). Folded header lines should work OK, but I haven't tested that.
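The override-versus-extend point in Note 1 can be sketched with plain EnumSet operations. To keep the example compilable without the library, it uses a hypothetical stand-in enum named Criteria that borrows three constant names from EmailAddressCriteria; the BASELINE set simply mirrors the two flags Note 1 suggests keeping and is an assumption, not a statement of what the library's defaults contain.

```java
import java.util.EnumSet;

// Hypothetical illustration; swap Criteria for EmailAddressCriteria in real code.
class CriteriaOverrideSketch {
    // Stand-in for EmailAddressCriteria, reduced to the flags named in this documentation.
    enum Criteria { ALLOW_DOMAIN_LITERALS, ALLOW_QUOTED_IDENTIFIERS, ALLOW_PARENS_IN_LOCALPART }

    // Assumed baseline: the two flags Note 1 recommends carrying along.
    static final EnumSet<Criteria> BASELINE =
            EnumSet.of(Criteria.ALLOW_QUOTED_IDENTIFIERS, Criteria.ALLOW_PARENS_IN_LOCALPART);

    /** Returns a new set holding the baseline plus the extras; the baseline itself is not modified. */
    static EnumSet<Criteria> extend(EnumSet<Criteria> baseline, Criteria... extras) {
        EnumSet<Criteria> result = EnumSet.copyOf(baseline);
        for (Criteria extra : extras) {
            result.add(extra);
        }
        return result;
    }

    public static void main(String[] args) {
        // Passing only this set would drop the baseline flags: an override.
        EnumSet<Criteria> overriding = EnumSet.of(Criteria.ALLOW_DOMAIN_LITERALS);
        // Copying the baseline first keeps them: an extension.
        EnumSet<Criteria> extending = extend(BASELINE, Criteria.ALLOW_DOMAIN_LITERALS);
        System.out.println("override: " + overriding);
        System.out.println("extend:   " + extending);
    }
}
```

Copying before adding matters because a shared constant set should not be mutated in place; the same pattern would apply to a set passed as the criteria argument of isValid.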
Parameters:
email - A complete email address.
criteria - A set of criteria flags that restrict or relax RFC 2822 compliance.
See Also:
EmailAddressCriteria
Copyright © 2016–2020. All rights reserved.