Class Path

java.lang.Object
org.evolvis.tartools.rfc822.Path
Direct Known Subclasses:
UXAddress

public class Path extends Object

Parser for the content of eMail address headers, that is, RFC822 (and successors) From, To, and subsets, for use on the public internet. Handling of line endings is lenient: CRLF := ([CR] LF) / CR, so CR LF, a bare LF and a bare CR are all accepted as a line ending.
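The lenient line-ending rule can be illustrated with a small stand-alone sketch. It is not taken from this class; the class name LineEndingSketch and the method lineEndingLength are hypothetical and only show how the production CRLF := ([CR] LF) / CR maps onto a length check at a given index.

```java
final class LineEndingSketch {
    /**
     * Returns the length (0, 1 or 2) of the line ending that starts at index i,
     * following CRLF := ([CR] LF) / CR. A result of 0 means "no line ending here".
     */
    static int lineEndingLength(String s, int i) {
        if (i < 0 || i >= s.length()) {
            return 0;
        }
        char c = s.charAt(i);
        if (c == '\n') {
            return 1; // bare LF
        }
        if (c == '\r') {
            boolean lfFollows = i + 1 < s.length() && s.charAt(i + 1) == '\n';
            return lfFollows ? 2 : 1; // CR LF, or bare CR
        }
        return 0;
    }

    public static void main(String[] args) {
        String folded = "To: a@example.com,\r\n b@example.com";
        int at = folded.indexOf('\r');
        System.out.println("line ending length: " + lineEndingLength(folded, at));
    }
}
```

A parser built this way treats all three spellings identically, which is why header text pasted from Unix or classic Mac sources would still parse.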

In domain literals (square brackets), the General-address-literal syntax is not recognised because downstream MUAs cannot support it as no use is specified at the moment. Similarly, an IPv6 scope (Zone Identifier) is not supported because this parser targets use on the general internet. This class is concerned with on-wire formats; separate classes will implement MIME support and the like later.

To use this class, create a new instance via the of(String) factory method, passing the string to analyse for eMail address(es). Then call one of the parse methods on the instance, depending on what you expect:

  • asAddrSpec() checks for unlabelled addr-spec, such as foo@example.com, which are useful for MSA invocations.
  • forSender(boolean) with false argument validates one mailbox, that is Foo <foo@example.com>, such as used for the Sender header. Labels must be ASCII and conform to the RFC.
  • forSender(boolean) with true argument validates one address, that is either a mailbox as above or a group (Test:a@example.com,b@example.com;); RFC6854 adds them to Sender headers under the RFC2026 §3.3(d) Limited Use caveat.
  • asMailboxList() validates a comma-separated list of mailboxen as above, normally for the From header.
  • asAddressList() validates a comma-separated list that can mix mailbox and group addresses. It is normally used for recipient headers (To, …) but, under the same Limited Use caveat, can be used per RFC6854 for a From (or similar) header.

All of these return an instance of Path.ParserResult, or null if the parsing failed. Path.ParserResult.isValid() returns true only if, in addition, extra syntax and semantic checks passed; only then can the address list be used safely on the public internet. Path.ParserResult.toString() pretty-prints the on-wire representation. Some result objects have extra methods that can be useful.

Author:
mirabilos (t.glaser@tarent.de)
  • Nested Class Summary

    Nested Classes
    Modifier and Type
    Class
    Description
    static final class 
    Path.Address
    Representation for an address (either mailbox or group).
    static final class 
    Path.AddressList
    Representation for an address-list or a mailbox-list.
    static final class 
    Path.AddrSpec
    Representation for an addr-spec (eMail address).
    protected final class 
    Path.AddrSpecSIDE
    Representation for a local-part (FWS unfolded) or a domain (dot-atom only).
    static interface 
    Path.ParserResult
    Methods all Path parser results implement.
    protected final class 
    Path.UnfoldedSubstring
    Representation for a substring of the input string, FWS unfolded.
  • Constructor Summary

    Constructors
    Modifier
    Constructor
    Description
    protected
    Path(String input)
    Constructor for internal use; call of(String) instead.
  • Method Summary

    Modifier and Type
    Method
    Description
    protected final int
    accept()
    Advances the current position to the next character.
    Path.AddressList
    asAddressList()
    Parses the address as address-list, such as for the Reply-To, To, Cc, (optionally) Bcc, Resent-To, … headers.
    Path.AddrSpec
    asAddrSpec()
    Parses the address as addr-spec (unlabelled address).
    Path.AddressList
    asMailboxList()
    Parses the address as mailbox-list, such as for the From and Resent-From headers.
    protected final int
    bra(int deltapos)
    Jumps to a specified input character position, relative jump.
    protected final int
    cur()
    Returns the wide character at the current position.
    Path.Address
    forSender(boolean allowRFC6854forLimitedUse)
    Parses the address for the Sender and Resent-Sender headers.
    protected static boolean
    is(int c, byte what)
     
    protected static boolean
    isAtext(int c)
     
    protected static boolean
    isCtext(int c)
     
    protected static boolean
    isDtext(int c)
     
    protected boolean
    isMailboxListSeparator()
     
    protected static boolean
    isQtext(int c)
     
    protected final int
    jmp(int pos)
    Jumps to a specified input character position, absolute jump.
    protected static <T extends org.evolvis.tartools.rfc822.Parser> T
    of(Class<T> cls, String input)
    Constructs a parser.
    static Path
    of(String addresses)
    Creates and initialises a new (strict) parser for eMail addresses.
    protected Path.Address
    pAddress()
     
    protected Path.AddressList
    pAddressList()
     
    protected Path.AddrSpec
    pAddrSpec()
     
    protected Path.AddrSpec
    pAngleAddr()
     
    protected org.evolvis.tartools.rfc822.Path.Word
    pAtom()
    Returns the parse result of the atom production:
    protected boolean
    pCcontent()
     
    protected org.evolvis.tartools.rfc822.Parser.Substring
    pCFWS()
    Parses CFWS.
    protected org.evolvis.tartools.rfc822.Parser.Substring
    pComment()
    Parses comment.
    protected org.evolvis.tartools.rfc822.Parser.Substring
    pDisplayName()
     
    protected org.evolvis.tartools.rfc822.Parser.Substring
    pDomain()
     
    protected Path.AddrSpecSIDE
    pDomainDotAtom(org.evolvis.tartools.rfc822.Parser.Substring da)
     
    protected org.evolvis.tartools.rfc822.Parser.Substring
    pDomainLiteral()
     
    protected org.evolvis.tartools.rfc822.Parser.Substring
    pDotAtom()
     
    protected final int
    peek()
    Returns the wide character after the one at the current position.
    protected org.evolvis.tartools.rfc822.Parser.Substring
    pFWS()
    Parses FWS.
    protected Path.Address
    pGroup()
     
    protected Path.AddrSpecSIDE
    pLocalPart()
     
    protected Path.Address
    pMailbox()
     
    protected Path.AddressList
    pMailboxList()
     
    protected Path.Address
    pNameAddr()
     
    protected final int
    pos()
    Returns the current input character position.
    protected org.evolvis.tartools.rfc822.Parser.Substring
    pPhrase()
     
    protected int
    pQcontent()
     
    protected int
    pQuotedPair()
     
    protected org.evolvis.tartools.rfc822.Path.Word
    pQuotedString()
    Returns the parse result of the quoted-string production:
    protected org.evolvis.tartools.rfc822.Path.Word
    pWord()
     
    protected final String
    s()
    Returns the input string, for use with substring comparisons.
    protected final int
    skip(ContextlessMatcher matcher)
    Advances the current position using a regular matcher.
    protected final int
    skipPeek(LookaheadMatcher matcher)
    Advances the current position using a peeking matcher.
    static String
    unfold(String s)
    Removes all occurrences of CR and/or LF from a string.
    protected org.evolvis.tartools.rfc822.Parser.Substring
    unfold(org.evolvis.tartools.rfc822.Parser.Substring ss)
    Unfolds FWS in the passed Substring if necessary.

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Constructor Details

    • Path

      protected Path(String input)
      Constructor for internal use (factory methods and subclasses). Use the factory method of(String) instead.
      Parameters:
      input - string to analyse
  • Method Details

    • is

      protected static boolean is(int c, byte what)
    • isAtext

      protected static boolean isAtext(int c)
    • isCtext

      protected static boolean isCtext(int c)
    • isDtext

      protected static boolean isDtext(int c)
    • isQtext

      protected static boolean isQtext(int c)
    • unfold

      public static String unfold(String s)
      Removes all occurrences of CR and/or LF from a string.
      Parameters:
      s - input string
      Returns:
      null if there was nothing to remove, a new shorter String otherwise
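      The return contract (null when nothing was removed) is easy to get wrong at the call site, so here is a stand-alone sketch of it. This is not the library's code: UnfoldSketch, stripLineBreaks and orOriginal are hypothetical names, and how the real method treats a null argument is not documented here, so the sketch simply assumes a non-null input.

```java
final class UnfoldSketch {
    /**
     * Removes every CR and LF from s.
     * Returns null if s contains neither, a new shorter String otherwise.
     * Assumes s is not null.
     */
    static String stripLineBreaks(String s) {
        StringBuilder sb = null; // allocated lazily, on the first CR or LF
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\r' || c == '\n') {
                if (sb == null) {
                    // first removal: copy everything seen so far
                    sb = new StringBuilder(s.length()).append(s, 0, i);
                }
            } else if (sb != null) {
                sb.append(c);
            }
        }
        return sb == null ? null : sb.toString();
    }

    /** Caller-side idiom: fall back to the original when nothing changed. */
    static String orOriginal(String raw) {
        String unfolded = stripLineBreaks(raw);
        return unfolded == null ? raw : unfolded;
    }

    public static void main(String[] args) {
        System.out.println(orOriginal("Foo\r\n <foo@example.com>"));
        System.out.println(orOriginal("foo@example.com"));
    }
}
```

      Returning null instead of the unchanged input lets a caller detect cheaply that no copy was made; the price is that every call site needs the fallback shown in orOriginal.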
    • unfold

      protected org.evolvis.tartools.rfc822.Parser.Substring unfold(org.evolvis.tartools.rfc822.Parser.Substring ss)
      Unfolds FWS in the passed Substring if necessary.
      Parameters:
      ss - Parser.Substring to unfold
      Returns:
      instance of an unfolded equivalent of the original substring
    • of

      public static Path of(String addresses)
      Creates and initialises a new (strict) parser for eMail addresses.
      Parameters:
      addresses - to parse
      Returns:
      null if addresses was null or very large, the new parser instance otherwise
    • asMailboxList

      public Path.AddressList asMailboxList()
      Parses the address as mailbox-list, such as for the From and Resent-From headers. See asAddressList(), however, for what RFC6854 permits under the RFC2026 §3.3(d) Limited Use caveat.
      Returns:
      parser result; remember to call isValid() on it first!
    • forSender

      public Path.Address forSender(boolean allowRFC6854forLimitedUse)

      Parses the address for the Sender and Resent-Sender headers.

      These headers normally use the mailbox production, but RFC6854 allows the address production under the RFC2026 §3.3(d) Limited Use caveat, which permits it only in specific circumstances.

      Parameters:
      allowRFC6854forLimitedUse - use address instead of mailbox parsing
      Returns:
      parser result; remember to call isValid() on it first!
    • asAddrSpec

      public Path.AddrSpec asAddrSpec()

      Parses the address as addr-spec (unlabelled address).

      This method is mostly used in input validation or for constructing arguments for invoking an MSA. In most cases it may be better to use forSender(false) instead, which permits mailboxen like “user <lcl@example.com>”, and then extract the addr-spec “lcl@example.com” from the return value via getMailbox().

      Returns:
      parser result; remember to call isValid() on it first!
    • asAddressList

      public Path.AddressList asAddressList()
      Parses the address as address-list, such as for the Reply-To, To, Cc, (optionally) Bcc, Resent-To, … headers. RFC6854 (under RFC2026 §3.3(d) Limited Use circumstances) permits using this production for the From and Resent-From headers, normally covered by the asMailboxList() method.
      Returns:
      parser result; remember to call isValid() on it first!
    • pAddressList

      protected Path.AddressList pAddressList()
    • isMailboxListSeparator

      protected boolean isMailboxListSeparator()
    • pMailboxList

      protected Path.AddressList pMailboxList()
    • pAddress

      protected Path.Address pAddress()
    • pGroup

      protected Path.Address pGroup()
    • pMailbox

      protected Path.Address pMailbox()
    • pNameAddr

      protected Path.Address pNameAddr()
    • pAngleAddr

      protected Path.AddrSpec pAngleAddr()
    • pDisplayName

      protected org.evolvis.tartools.rfc822.Parser.Substring pDisplayName()
    • pPhrase

      protected org.evolvis.tartools.rfc822.Parser.Substring pPhrase()
    • pWord

      protected org.evolvis.tartools.rfc822.Path.Word pWord()
    • pAtom

      protected org.evolvis.tartools.rfc822.Path.Word pAtom()

      Returns the parse result of the atom production:

      result.body is a raw Parser.Substring of the atom, with surrounding CFWS stripped (no unfolding necessary), no extra data

      result.cfws is null or the trailing CFWS as raw Parser.Substring, not unfolded

      Returns:
      result (see above) as Path.Word
    • pQuotedPair

      protected int pQuotedPair()
    • pQcontent

      protected int pQcontent()
    • pQuotedString

      protected org.evolvis.tartools.rfc822.Path.Word pQuotedString()

      Returns the parse result of the quoted-string production:

      result.body is a Path.UnfoldedSubstring of the entire quoted string, with surrounding double quotes; its String data is dequoted and backslash-removed

      result.cfws is null or the trailing CFWS as raw Parser.Substring, not unfolded

      Returns:
      result (see above) as Path.Word
    • pFWS

      protected org.evolvis.tartools.rfc822.Parser.Substring pFWS()
      Parses FWS.
      Returns:
      raw Parser.Substring, not unfolded
    • pCcontent

      protected boolean pCcontent()
    • pComment

      protected org.evolvis.tartools.rfc822.Parser.Substring pComment()
      Parses comment.
      Returns:
      raw Parser.Substring, not unfolded (unfolded is human-visible form for now; may wish to simplify quoted-pairs)
    • pCFWS

      protected org.evolvis.tartools.rfc822.Parser.Substring pCFWS()
      Parses CFWS.
      Returns:
      raw Parser.Substring, not unfolded
    • pDotAtom

      protected org.evolvis.tartools.rfc822.Parser.Substring pDotAtom()
    • pLocalPart

      protected Path.AddrSpecSIDE pLocalPart()
    • pDomainLiteral

      protected org.evolvis.tartools.rfc822.Parser.Substring pDomainLiteral()
    • pDomain

      protected org.evolvis.tartools.rfc822.Parser.Substring pDomain()
    • pDomainDotAtom

      protected Path.AddrSpecSIDE pDomainDotAtom(org.evolvis.tartools.rfc822.Parser.Substring da)
    • pAddrSpec

      protected Path.AddrSpec pAddrSpec()
    • of

      protected static <T extends org.evolvis.tartools.rfc822.Parser> T of(Class<T> cls, String input)

      Constructs a parser. Intended to be used by subclasses from static factory methods *only*; see of(String) for an example.

      Type Parameters:
      T - subclass of Parser to construct
      Parameters:
      cls - subclass of Parser to construct
      input - user-provided String to parse
      Returns:
      null if input was null or too large, the new parser subclass instance otherwise
    • jmp

      protected final int jmp(int pos)
      Jumps to a specified input character position, absolute jump.
      Parameters:
      pos - to jump to
      Returns:
      the codepoint at that position
      Throws:
      IndexOutOfBoundsException - if pos is not in or just past the input
    • bra

      protected final int bra(int deltapos)
      Jumps to a specified input character position, relative jump.
      Parameters:
      deltapos - to add to the current position
      Returns:
      the codepoint at that position
      Throws:
      IndexOutOfBoundsException - if the resulting position is not in or just past the input
    • pos

      protected final int pos()
      Returns the current input character position. Useful for saving and restoring (with jmp(int)) and for error messages.
      Returns:
      position
    • s

      protected final String s()
      Returns the input string, for use with substring comparisons. (This is safe because Java™ strings are immutable.)
      Returns:
      String input
    • cur

      protected final int cur()
      Returns the wide character at the current position.
      Returns:
      UCS-4 codepoint, or -1 if end of input is reached
    • peek

      protected final int peek()
      Returns the wide character after the one at the current position.
      Returns:
      UCS-4 codepoint, or -1 if end of input is reached
    • accept

      protected final int accept()
      Advances the current position to the next character.
      Returns:
      codepoint of the next character, or -1 if end of input is reached
      Throws:
      IndexOutOfBoundsException - if end of input was already reached
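      The cursor primitives documented so far (pos, jmp, cur, peek, accept) are what make backtracking possible: save pos(), try a production, and jmp back on failure. The stand-alone sketch below mimics the documented return values and exceptions. It is an illustration only: CursorSketch and tryLiteral are hypothetical, and it assumes positions are UTF-16 indexes, which this page does not state.

```java
final class CursorSketch {
    private final String src;
    private int ofs; // assumed: UTF-16 index of the current character

    CursorSketch(String input) {
        src = input;
    }

    int pos() {
        return ofs;
    }

    /** Codepoint at the current position, or -1 at end of input. */
    int cur() {
        return ofs < src.length() ? src.codePointAt(ofs) : -1;
    }

    /** Codepoint after the current one, or -1 if there is none. */
    int peek() {
        if (ofs >= src.length()) {
            return -1;
        }
        int next = ofs + Character.charCount(src.codePointAt(ofs));
        return next < src.length() ? src.codePointAt(next) : -1;
    }

    /** Moves to the next character and returns it (-1 once the end is reached). */
    int accept() {
        if (ofs >= src.length()) {
            throw new IndexOutOfBoundsException("already at end of input");
        }
        ofs += Character.charCount(src.codePointAt(ofs));
        return cur();
    }

    /** Absolute jump; the position just past the input is allowed. */
    int jmp(int pos) {
        if (pos < 0 || pos > src.length()) {
            throw new IndexOutOfBoundsException("bad position " + pos);
        }
        ofs = pos;
        return cur();
    }

    /** Hypothetical helper: consume an ASCII literal or restore the position. */
    boolean tryLiteral(String lit) {
        int saved = pos();
        for (int i = 0; i < lit.length(); i++) {
            if (cur() != lit.charAt(i)) {
                jmp(saved); // backtrack: nothing was consumed
                return false;
            }
            accept();
        }
        return true;
    }

    public static void main(String[] args) {
        CursorSketch c = new CursorSketch("a@b");
        System.out.println(c.tryLiteral("a#") + " at " + c.pos());
        System.out.println(c.tryLiteral("a@") + " at " + c.pos());
    }
}
```

      Because a failed attempt restores the saved position, alternatives in a grammar can be tried one after another without the caller tracking how far each attempt got.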
    • skipPeek

      protected final int skipPeek(LookaheadMatcher matcher)
      Advances the current position using a peeking matcher. Continues as long as the matcher returns true and end of input is not yet reached.
      Parameters:
      matcher - LookaheadMatcher called with cur() and peek() as arguments to determine whether to skip ahead
      Returns:
      codepoint of the first character for which the matcher returned false, or -1 if end of input is reached
    • skip

      protected final int skip(ContextlessMatcher matcher)
      Advances the current position using a regular matcher. Continues as long as the matcher returns true and end of input is not yet reached.
      Parameters:
      matcher - ContextlessMatcher called with just cur() as argument to determine whether to skip ahead
      Returns:
      codepoint of the first character for which the matcher returned false, or -1 if end of input is reached
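      The difference between the two skipping styles can be sketched as follows. The real ContextlessMatcher and LookaheadMatcher interfaces are not shown on this page, so the sketch substitutes java.util.function.IntPredicate and a hypothetical TwoCharPredicate for them; SkipSketch itself is likewise a stand-in, assuming UTF-16 positions.

```java
import java.util.function.IntPredicate;

final class SkipSketch {
    /** Stand-in for a lookahead matcher: sees the current and the following codepoint. */
    interface TwoCharPredicate {
        boolean test(int cur, int next);
    }

    private final String src;
    private int ofs;

    SkipSketch(String input) {
        src = input;
    }

    int pos() {
        return ofs;
    }

    int cur() {
        return ofs < src.length() ? src.codePointAt(ofs) : -1;
    }

    int peek() {
        if (ofs >= src.length()) {
            return -1;
        }
        int next = ofs + Character.charCount(src.codePointAt(ofs));
        return next < src.length() ? src.codePointAt(next) : -1;
    }

    /** Skips while the matcher accepts the current codepoint. */
    int skip(IntPredicate matcher) {
        while (cur() != -1 && matcher.test(cur())) {
            ofs += Character.charCount(cur());
        }
        return cur();
    }

    /** Skips while the matcher accepts the (current, next) pair. */
    int skipPeek(TwoCharPredicate matcher) {
        while (cur() != -1 && matcher.test(cur(), peek())) {
            ofs += Character.charCount(cur());
        }
        return cur();
    }

    public static void main(String[] args) {
        SkipSketch s = new SkipSketch("  a.b..c");
        s.skip(c -> c == ' ' || c == '\t');            // leading blanks
        s.skipPeek((c, n) -> !(c == '.' && n == '.')); // stop at a doubled dot
        System.out.println("stopped at index " + s.pos());
    }
}
```

      A context-free test is enough for runs of one character class such as whitespace; the two-argument form is needed when the decision depends on what follows, as with the doubled dot that a dot-atom must not contain.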