Class Path

  • Direct Known Subclasses:
    UXAddress

    public class Path
    extends Object

    Parser for the content of eMail address headers, that is, the RFC822 (and successors) From, To and similar headers and subsets thereof, for use on the public internet. Handling of line endings is lenient: CRLF := ([CR] LF) / CR
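
    The lenient line-ending rule can be illustrated with a small stand-alone matcher. This is only a sketch and not the code this class uses; LenientEol and eolLength are hypothetical names introduced for the example.

```java
// Sketch only: stand-alone matcher for the lenient rule CRLF := ([CR] LF) / CR.
// LenientEol and eolLength are hypothetical names, not part of Path.
final class LenientEol {
    private LenientEol() {
    }

    /**
     * Returns how many chars starting at index i form one line ending:
     * 2 for CR LF, 1 for a bare LF or a bare CR, 0 if there is none.
     */
    static int eolLength(final CharSequence s, final int i) {
        if (i < 0 || i >= s.length()) {
            return 0;
        }
        final char c = s.charAt(i);
        if (c == '\n') {
            return 1; // bare LF
        }
        if (c == '\r') {
            // CR LF is one ending; a CR not followed by LF stands alone
            return (i + 1 < s.length() && s.charAt(i + 1) == '\n') ? 2 : 1;
        }
        return 0;
    }
}
```

    Under this rule CR LF counts as a single line ending, whereas LF CR counts as two.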

    In domain literals (square brackets), the General-address-literal syntax is not recognised: no use for it is specified at the moment, so downstream MUAs cannot support it. Similarly, an IPv6 scope (Zone Identifier) is not supported because this parser targets use on the general internet. This class is concerned with on-wire formats; separate classes will implement MIME support and the like later.

    To use, create a new instance via the of(String) factory method passing the string to analyse for eMail address(es). Then call one of the parse methods on the instance, depending on what to expect:

    • asAddrSpec() checks for unlabelled addr-spec, such as foo@example.com, which are useful for MSA invocations.
    • forSender(boolean) with false argument validates one mailbox, that is Foo <foo@example.com>, such as used for the Sender header. Labels must be ASCII and conform to the RFC.
    • forSender(boolean) with true argument validates one address, that is either a mailbox as above or a group (Test:a@example.com,b@example.com;); RFC6854 adds them to Sender headers under the RFC2026 §3.3(d) Limited Use caveat.
    • asMailboxList() validates a comma-separated list of mailboxen as above, normally for the From header.
    • asAddressList() validates a comma-separated list that can include a mix of mailbox and group addresses and normally is used for recipient headers (To, …) but, under the same Limited Use caveat, can be used per RFC6854 for a From (and like) header.

    All of these return an instance of Path.ParserResult, or null if the parsing failed. Path.ParserResult.isValid() returns true only if, in addition, extra syntax and semantic checks passed; only then can the address list be used safely on the public internet. Path.ParserResult.toString() pretty-prints the on-wire representation. Some result objects have extra methods that can be useful.
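
    The two-step check described above (non-null result first, then isValid()) can be wrapped on the caller side. The following is a minimal sketch using a hypothetical stand-in interface so that it compiles on its own; ResultCheck, Validatable and usable are assumptions for illustration and not part of this library.

```java
import java.util.Optional;

// Sketch only: caller-side helper for "null means parse failure,
// isValid() means the extra checks passed". All names are hypothetical.
final class ResultCheck {
    private ResultCheck() {
    }

    /** Stand-in for the isValid() part of Path.ParserResult. */
    interface Validatable {
        boolean isValid();
    }

    /** Empty if parsing failed (null) or if the extra checks did not pass. */
    static <R extends Validatable> Optional<R> usable(final R result) {
        return Optional.ofNullable(result).filter(Validatable::isValid);
    }
}
```

    With the real classes, the argument would be the return value of one of the parse methods. Note that of(String) can itself return null, which a caller has to check before invoking a parse method.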

    Author:
    mirabilos (t.glaser@tarent.de)
    • Nested Class Summary

      Nested Classes
      Modifier and Type   Class                    Description
      static class        Path.Address             Representation for an address (either mailbox or group).
      static class        Path.AddressList         Representation for an address-list or a mailbox-list.
      static class        Path.AddrSpec            Representation for an addr-spec (eMail address).
      protected class     Path.AddrSpecSIDE        Representation for a local-part (FWS unfolded) or a domain (dot-atom only).
      static interface    Path.ParserResult        Methods all Path parser results implement.
      protected class     Path.UnfoldedSubstring   Representation for a substring of the input string, FWS unfolded.
    • Constructor Summary

      Constructors
      Modifier    Constructor          Description
      protected   Path(String input)   Protected constructor; use the factory method of(String) instead.
    • Constructor Detail

      • Path

        protected Path​(String input)
        Protected constructor, not intended for direct use. Use the factory method of(String) instead.
        Parameters:
        input - string to analyse
    • Method Detail

      • is

        protected static boolean is​(int c,
                                    byte what)
      • isAtext

        protected static boolean isAtext​(int c)
      • isCtext

        protected static boolean isCtext​(int c)
      • isDtext

        protected static boolean isDtext​(int c)
      • isQtext

        protected static boolean isQtext​(int c)
      • unfold

        public static String unfold​(String s)
        Removes all occurrences of CR and/or LF from a string.
        Parameters:
        s - input string
        Returns:
        null if there was nothing to remove, a new shorter String otherwise
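
        The documented contract (strip every CR and LF, return null when nothing had to be removed) can be sketched as follows. This is an independent illustration, not the library's implementation; UnfoldSketch is a hypothetical name, and the sketch assumes a non-null argument because the behaviour for null input is not documented here.

```java
// Sketch only: re-implements the documented contract of unfold(String).
// UnfoldSketch is a hypothetical name; null input is not handled.
final class UnfoldSketch {
    private UnfoldSketch() {
    }

    static String unfold(final String s) {
        StringBuilder sb = null; // allocated lazily, on the first CR or LF
        for (int i = 0; i < s.length(); i++) {
            final char c = s.charAt(i);
            if (c == '\r' || c == '\n') {
                if (sb == null) {
                    // first removal: copy everything seen so far
                    sb = new StringBuilder(s.length()).append(s, 0, i);
                }
            } else if (sb != null) {
                sb.append(c);
            }
        }
        // null signals "nothing to remove", so the caller keeps using s
        return sb == null ? null : sb.toString();
    }
}
```

        Because of the null return, a caller typically falls back to the original string when the result is null.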
      • unfold

        protected org.evolvis.tartools.rfc822.Parser.Substring unfold​(org.evolvis.tartools.rfc822.Parser.Substring ss)
        Unfolds FWS in the passed Substring if necessary.
        Parameters:
        ss - Parser.Substring to unfold
        Returns:
        instance of an unfolded equivalent of the original substring
      • of

        public static Path of​(String addresses)
        Creates and initialises a new (strict) parser for eMail addresses.
        Parameters:
        addresses - to parse
        Returns:
        null if addresses was null or very large, the new parser instance otherwise
      • asMailboxList

        public Path.AddressList asMailboxList()
        Parses the address as mailbox-list, such as for the From and Resent-From headers. Note, however, that RFC6854 permits an address-list in these headers under the RFC2026 §3.3(d) Limited Use caveat; see asAddressList().
        Returns:
        parser result; remember to call isValid() on it first!
      • forSender

        public Path.Address forSender​(boolean allowRFC6854forLimitedUse)

        Parses the address for the Sender and Resent-Sender headers.

        These headers normally use the mailbox production, but RFC6854 allows the address production under the RFC2026 §3.3(d) Limited Use caveat, that is, only in specific circumstances.

        Parameters:
        allowRFC6854forLimitedUse - use address instead of mailbox parsing
        Returns:
        parser result; remember to call isValid() on it first!
      • asAddrSpec

        public Path.AddrSpec asAddrSpec()

        Parses the address as addr-spec (unlabelled address).

        This method is mostly used in input validation or for constructing arguments for invoking an MSA. In most cases it may be better to use forSender(false) instead, which permits mailboxen like “user <lcl@example.com>”, and then extract the addr-spec “lcl@example.com” from the return value via getMailbox().

        Returns:
        parser result; remember to call isValid() on it first!
      • asAddressList

        public Path.AddressList asAddressList()
        Parses the address as address-list, such as for the Reply-To, To, Cc, (optionally) Bcc, Resent-To, … headers. RFC6854 (under RFC2026 §3.3(d) Limited Use circumstances) permits using this production for the From and Resent-From headers, normally covered by the asMailboxList() method.
        Returns:
        parser result; remember to call isValid() on it first!
      • isMailboxListSeparator

        protected boolean isMailboxListSeparator()
      • pDisplayName

        protected org.evolvis.tartools.rfc822.Parser.Substring pDisplayName()
      • pPhrase

        protected org.evolvis.tartools.rfc822.Parser.Substring pPhrase()
      • pWord

        protected org.evolvis.tartools.rfc822.Path.Word pWord()
      • pAtom

        protected org.evolvis.tartools.rfc822.Path.Word pAtom()

        Returns the parse result of the atom production:

        result.body is a raw Parser.Substring of the atom, with surrounding CFWS stripped (no unfolding necessary), no extra data

        result.cfws is null or the trailing CFWS as raw Parser.Substring, not unfolded

        Returns:
        result (see above) as Path.Word
      • pQuotedPair

        protected int pQuotedPair()
      • pQcontent

        protected int pQcontent()
      • pQuotedString

        protected org.evolvis.tartools.rfc822.Path.Word pQuotedString()

        Returns the parse result of the quoted-string production:

        result.body is a Path.UnfoldedSubstring of the entire quoted string, with surrounding double quotes; its String data has the quotes and backslashes removed

        result.cfws is null or the trailing CFWS as raw Parser.Substring, not unfolded

        Returns:
        result (see above) as Path.Word
      • pFWS

        protected org.evolvis.tartools.rfc822.Parser.Substring pFWS()
        Parses FWS.
        Returns:
        raw Parser.Substring, not unfolded
      • pCcontent

        protected boolean pCcontent()
      • pComment

        protected org.evolvis.tartools.rfc822.Parser.Substring pComment()
        Parses comment.
        Returns:
        raw Parser.Substring, not unfolded (unfolded is human-visible form for now; may wish to simplify quoted-pairs)
      • pCFWS

        protected org.evolvis.tartools.rfc822.Parser.Substring pCFWS()
        Parses CFWS.
        Returns:
        raw Parser.Substring, not unfolded
      • pDotAtom

        protected org.evolvis.tartools.rfc822.Parser.Substring pDotAtom()
      • pDomainLiteral

        protected org.evolvis.tartools.rfc822.Parser.Substring pDomainLiteral()
      • pDomain

        protected org.evolvis.tartools.rfc822.Parser.Substring pDomain()
      • pDomainDotAtom

        protected Path.AddrSpecSIDE pDomainDotAtom​(org.evolvis.tartools.rfc822.Parser.Substring da)
      • of

        protected static <T extends org.evolvis.tartools.rfc822.Parser> T of​(Class<T> cls,
                                                                             String input)

        Constructs a parser. Intended to be used by subclasses from static factory methods *only*; see of(String) for an example.

        Type Parameters:
        T - subclass of Parser to construct
        Parameters:
        cls - subclass of Parser to construct
        input - user-provided String to parse
        Returns:
        null if input was null or too large, the new parser subclass instance otherwise
      • jmp

        protected final int jmp​(int pos)
        Jumps to a specified input character position, absolute jump.
        Parameters:
        pos - to jump to
        Returns:
        the codepoint at that position
        Throws:
        IndexOutOfBoundsException - if pos is not in or just past the input
      • bra

        protected final int bra​(int deltapos)
        Jumps to a specified input character position, relative jump.
        Parameters:
        deltapos - to add to the current position
        Returns:
        the codepoint at that position
        Throws:
        IndexOutOfBoundsException - if the resulting position is not in or just past the input
      • pos

        protected final int pos()
        Returns the current input character position. Useful for saving and restoring (with jmp(int)) and for error messages.
        Returns:
        position
      • s

        protected final String s()
        Returns the input string, for use with substring comparisons. (This is safe because Java™ strings are immutable.)
        Returns:
        String input
      • cur

        protected final int cur()
        Returns the wide character at the current position.
        Returns:
        UCS-4 codepoint, or -1 if end of input is reached
      • peek

        protected final int peek()
        Returns the wide character after the one at the current position.
        Returns:
        UCS-4 codepoint, or -1 if end of input is reached
      • accept

        protected final int accept()
        Advances the current position to the next character.
        Returns:
        codepoint of the next character, or -1 if end of input is reached
        Throws:
        IndexOutOfBoundsException - if end of input was already reached
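
        The position primitives jmp, bra, pos, cur, peek and accept fit together as a small cursor over the input. The sketch below models the documented return values and exceptions only; CursorSketch is a hypothetical class, and indexing the input by codepoint is an assumption made for the example (the real parser may index differently).

```java
// Sketch only: a minimal cursor with the documented behaviour of
// jmp/bra/pos/cur/peek/accept. CursorSketch is a hypothetical name;
// positions counting codepoints is an assumption of this sketch.
final class CursorSketch {
    private final int[] cps;
    private int ofs = 0;

    CursorSketch(final String input) {
        cps = input.codePoints().toArray();
    }

    /** Current position, for saving and later restoring with jmp. */
    int pos() {
        return ofs;
    }

    /** Codepoint at the current position, or -1 at end of input. */
    int cur() {
        return ofs < cps.length ? cps[ofs] : -1;
    }

    /** Codepoint after the current one, or -1 if there is none. */
    int peek() {
        return ofs + 1 < cps.length ? cps[ofs + 1] : -1;
    }

    /** Absolute jump; "just past the input" (== length) is allowed. */
    int jmp(final int pos) {
        if (pos < 0 || pos > cps.length) {
            throw new IndexOutOfBoundsException("bad position: " + pos);
        }
        ofs = pos;
        return cur();
    }

    /** Relative jump, expressed via the absolute one. */
    int bra(final int deltapos) {
        return jmp(ofs + deltapos);
    }

    /** Advances by one; fails if end of input was already reached. */
    int accept() {
        if (ofs >= cps.length) {
            throw new IndexOutOfBoundsException("already at end of input");
        }
        ++ofs;
        return cur();
    }
}
```

        Saving pos() before trying a production and calling jmp with the saved value on failure gives simple backtracking.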
      • skipPeek

        protected final int skipPeek​(LookaheadMatcher matcher)
        Advances the current position using a peeking matcher. Continues as long as the matcher returns true and end of input is not yet reached.
        Parameters:
        matcher - LookaheadMatcher called with cur() and peek() as arguments to determine whether to skip ahead
        Returns:
        codepoint of the first character for which the matcher returned false, or -1 if end of input is reached
        See Also:
        skip(ContextlessMatcher)
      • skip

        protected final int skip​(ContextlessMatcher matcher)
        Advances the current position using a regular matcher. Continues as long as the matcher returns true and end of input is not yet reached.
        Parameters:
        matcher - ContextlessMatcher called with just cur() as argument to determine whether to skip ahead
        Returns:
        codepoint of the first character for which the matcher returned false, or -1 if end of input is reached
        See Also:
        skipPeek(LookaheadMatcher)
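
        The difference between skip and skipPeek is only in what the matcher gets to see: the current codepoint, or the current and the following one. A minimal sketch under stated assumptions: SkipSketch and PeekPredicate are hypothetical names, java.util.function.IntPredicate stands in for ContextlessMatcher, and PeekPredicate stands in for LookaheadMatcher, whose actual method names are not shown in this documentation.

```java
import java.util.function.IntPredicate;

// Sketch only: models the documented behaviour of skip and skipPeek.
// SkipSketch and PeekPredicate are hypothetical stand-ins.
final class SkipSketch {
    /** Stand-in for a lookahead matcher: sees cur() and peek(). */
    interface PeekPredicate {
        boolean test(int cur, int next);
    }

    private final int[] cps;
    private int ofs = 0;

    SkipSketch(final String input) {
        cps = input.codePoints().toArray();
    }

    private int at(final int i) {
        return i < cps.length ? cps[i] : -1;
    }

    int pos() {
        return ofs;
    }

    int cur() {
        return at(ofs);
    }

    int peek() {
        return at(ofs + 1);
    }

    /** Advances while the matcher accepts cur() and input remains. */
    int skip(final IntPredicate matcher) {
        while (cur() != -1 && matcher.test(cur())) {
            ++ofs;
        }
        return cur();
    }

    /** Same, but the matcher also sees the following codepoint. */
    int skipPeek(final PeekPredicate matcher) {
        while (cur() != -1 && matcher.test(cur(), peek())) {
            ++ofs;
        }
        return cur();
    }
}
```

        A lookahead matcher is useful where one codepoint is not enough to decide, for example to step over a CR only when an LF follows it.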