Class UnicodeData


  • public final class UnicodeData
    extends Object
    Unicode test data. Some of the strings were obtained by running the output of Google Translate through an online converter.
    Since:
    0.6.0
    Version:
    $Id$
    Author:
    tlerios@marketcetera.com
    • Field Summary

      Fields 
      Modifier and Type Field Description
      static String COMBO
      A combo string that includes "Hello" in English, "Language" in Norwegian, "HELLO" in Greek, "house" in Arabic, "goodbye" in Japanese, and the G-clef, each successive pair separated by exactly one space.
      static char[] COMBO_CHARS
      The combo string, as a character array.
      static byte[] COMBO_NAT
      The combo string, in the default encoding.
      static int[] COMBO_UCPS
      The combo string, as a Unicode code point array.
      static byte[] COMBO_UTF16BE
      The combo string, in UTF-16BE.
      static byte[] COMBO_UTF16LE
      The combo string, in UTF-16LE.
      static byte[] COMBO_UTF32BE
      The combo string, in UTF-32BE.
      static byte[] COMBO_UTF32LE
      The combo string, in UTF-32LE.
      static byte[] COMBO_UTF8
      The combo string, in UTF-8.
      static String G_CLEF_MSC
      The musical symbol G-clef.
      static char[] G_CLEF_MSC_CHARS
      The G-clef, as a character array.
      static byte[] G_CLEF_MSC_NAT
      The G-clef, in the default encoding.
      static int[] G_CLEF_MSC_UCPS
      The G-clef, as a Unicode code point array.
      static byte[] G_CLEF_MSC_UTF16BE
      The G-clef, in UTF-16BE.
      static byte[] G_CLEF_MSC_UTF16LE
      The G-clef, in UTF-16LE.
      static byte[] G_CLEF_MSC_UTF32BE
      The G-clef, in UTF-32BE.
      static byte[] G_CLEF_MSC_UTF32LE
      The G-clef, in UTF-32LE.
      static byte[] G_CLEF_MSC_UTF8
      The G-clef, in UTF-8.
      static String GOATS_LNB
      The Linear B ideograms for she-goat and he-goat (in this order and separated by a space).
      static char[] GOATS_LNB_CHARS
      The Linear B goat ideograms, as a character array.
      static byte[] GOATS_LNB_NAT
      The Linear B goat ideograms, in the default encoding.
      static int[] GOATS_LNB_UCPS
      The Linear B goat ideograms, as a Unicode code point array.
      static byte[] GOATS_LNB_UTF16BE
      The Linear B goat ideograms, in UTF-16BE.
      static byte[] GOATS_LNB_UTF16LE
      The Linear B goat ideograms, in UTF-16LE.
      static byte[] GOATS_LNB_UTF32BE
      The Linear B goat ideograms, in UTF-32BE.
      static byte[] GOATS_LNB_UTF32LE
      The Linear B goat ideograms, in UTF-32LE.
      static byte[] GOATS_LNB_UTF8
      The Linear B goat ideograms, in UTF-8.
      static String GOODBYE_JA
      "goodbye" (pronounced "sayonara") in Japanese, in the Hiragana writing system.
      static char[] GOODBYE_JA_CHARS
      "goodbye" in Japanese, as a character array.
      static byte[] GOODBYE_JA_NAT
      "goodbye" in Japanese, in the default encoding.
      static int[] GOODBYE_JA_UCPS
      "goodbye" in Japanese, as a Unicode code point array.
      static byte[] GOODBYE_JA_UTF16BE
      "goodbye" in Japanese, in UTF-16BE.
      static byte[] GOODBYE_JA_UTF16LE
      "goodbye" in Japanese, in UTF-16LE.
      static byte[] GOODBYE_JA_UTF32BE
      "goodbye" in Japanese, in UTF-32BE.
      static byte[] GOODBYE_JA_UTF32LE
      "goodbye" in Japanese, in UTF-32LE.
      static byte[] GOODBYE_JA_UTF8
      "goodbye" in Japanese, in UTF-8.
      static String HELLO_EN
      "Hello" in English.
      static char[] HELLO_EN_CHARS
      "Hello" in English, as a character array.
      static byte[] HELLO_EN_NAT
      "Hello" in English, in the default encoding.
      static int[] HELLO_EN_UCPS
      "Hello" in English, as a Unicode code point array.
      static byte[] HELLO_EN_UTF16BE
      "Hello" in English, in UTF-16BE.
      static byte[] HELLO_EN_UTF16LE
      "Hello" in English, in UTF-16LE.
      static byte[] HELLO_EN_UTF32BE
      "Hello" in English, in UTF-32BE.
      static byte[] HELLO_EN_UTF32LE
      "Hello" in English, in UTF-32LE.
      static byte[] HELLO_EN_UTF8
      "Hello" in English, in UTF-8.
      static String HELLO_GR
      "HELLO" (pronounced "yassou") in Greek: this is the word "hello" in all uppercase Greek letters (it is, in fact, two Greek words, separated by a space).
      static char[] HELLO_GR_CHARS
      "HELLO" in Greek, as a character array.
      static byte[] HELLO_GR_NAT
      "HELLO" in Greek, in the default encoding.
      static int[] HELLO_GR_UCPS
      "HELLO" in Greek, as a Unicode code point array.
      static byte[] HELLO_GR_UTF16BE
      "HELLO" in Greek, in UTF-16BE.
      static byte[] HELLO_GR_UTF16LE
      "HELLO" in Greek, in UTF-16LE.
      static byte[] HELLO_GR_UTF32BE
      "HELLO" in Greek, in UTF-32BE.
      static byte[] HELLO_GR_UTF32LE
      "HELLO" in Greek, in UTF-32LE.
      static byte[] HELLO_GR_UTF8
      "HELLO" in Greek, in UTF-8.
      static String HOUSE_AR
      "house" (pronounced "manzil") in Arabic.
      static char[] HOUSE_AR_CHARS
      "house" in Arabic, as a character array.
      static byte[] HOUSE_AR_NAT
      "house" in Arabic, in the default encoding.
      static int[] HOUSE_AR_UCPS
      "house" in Arabic, as a Unicode code point array.
      static byte[] HOUSE_AR_UTF16BE
      "house" in Arabic, in UTF-16BE.
      static byte[] HOUSE_AR_UTF16LE
      "house" in Arabic, in UTF-16LE.
      static byte[] HOUSE_AR_UTF32BE
      "house" in Arabic, in UTF-32BE.
      static byte[] HOUSE_AR_UTF32LE
      "house" in Arabic, in UTF-32LE.
      static byte[] HOUSE_AR_UTF8
      "house" in Arabic, in UTF-8.
      static String INVALID
      An invalid string, comprising an isolated 16-bit surrogate.
      static char[] INVALID_CHARS
      An invalid string, comprising an isolated 16-bit surrogate, as a character array.
      static int[] INVALID_UCPS
      A Unicode code point array comprising an isolated surrogate code point.
      static byte[] INVALID_UTF16BE
      A byte array comprising an invalid UTF-16BE byte sequence (an isolated 16-bit surrogate).
      static byte[] INVALID_UTF16LE
      A byte array comprising an invalid UTF-16LE byte sequence (an isolated 16-bit surrogate).
      static byte[] INVALID_UTF32BE
      A byte array comprising an invalid UTF-32BE byte sequence (a 32-bit value outside the valid range for Unicode scalar values).
      static byte[] INVALID_UTF32LE
      A byte array comprising an invalid UTF-32LE byte sequence (a 32-bit value outside the valid range for Unicode scalar values).
      static byte[] INVALID_UTF8
      A byte array comprising an invalid UTF-8 byte sequence (the first 3 bytes of a 4-byte sequence).
      static String LANGUAGE_NO
      "Language" (pronounced "sprook") in Norwegian: this is the word "language" in Norwegian, with the first letter capitalized.
      static char[] LANGUAGE_NO_CHARS
      "Language" in Norwegian, as a character array.
      static byte[] LANGUAGE_NO_NAT
      "Language" in Norwegian, in the default encoding.
      static int[] LANGUAGE_NO_UCPS
      "Language" in Norwegian, as a Unicode code point array.
      static byte[] LANGUAGE_NO_UTF16BE
      "Language" in Norwegian, in UTF-16BE.
      static byte[] LANGUAGE_NO_UTF16LE
      "Language" in Norwegian, in UTF-16LE.
      static byte[] LANGUAGE_NO_UTF32BE
      "Language" in Norwegian, in UTF-32BE.
      static byte[] LANGUAGE_NO_UTF32LE
      "Language" in Norwegian, in UTF-32LE.
      static byte[] LANGUAGE_NO_UTF8
      "Language" in Norwegian, in UTF-8.
      static String SPACE
      The space character.
      static char[] SPACE_CHARS
      The space character, as a character array.
      static byte[] SPACE_NAT
      The space character, in the default encoding.
      static int[] SPACE_UCPS
      The space character, as a Unicode code point array.
      static byte[] SPACE_UTF16BE
      The space character, in UTF-16BE.
      static byte[] SPACE_UTF16LE
      The space character, in UTF-16LE.
      static byte[] SPACE_UTF32BE
      The space character, in UTF-32BE.
      static byte[] SPACE_UTF32LE
      The space character, in UTF-32LE.
      static byte[] SPACE_UTF8
      The space character, in UTF-8.
    • Constructor Summary

      Constructors 
      Modifier Constructor Description
      private UnicodeData()
      Constructor.
    • Field Detail

      • SPACE_CHARS

        public static final char[] SPACE_CHARS
        The space character, as a character array.
      • SPACE_UCPS

        public static final int[] SPACE_UCPS
        The space character, as a Unicode code point array.
      • SPACE_NAT

        public static final byte[] SPACE_NAT
        The space character, in the default encoding.
      • SPACE_UTF8

        public static final byte[] SPACE_UTF8
        The space character, in UTF-8.
      • SPACE_UTF16BE

        public static final byte[] SPACE_UTF16BE
        The space character, in UTF-16BE.
      • SPACE_UTF16LE

        public static final byte[] SPACE_UTF16LE
        The space character, in UTF-16LE.
      • SPACE_UTF32BE

        public static final byte[] SPACE_UTF32BE
        The space character, in UTF-32BE.
      • SPACE_UTF32LE

        public static final byte[] SPACE_UTF32LE
        The space character, in UTF-32LE.
      • HELLO_EN_CHARS

        public static final char[] HELLO_EN_CHARS
        "Hello" in English, as a character array.
      • HELLO_EN_UCPS

        public static final int[] HELLO_EN_UCPS
        "Hello" in English, as a Unicode code point array.
      • HELLO_EN_NAT

        public static final byte[] HELLO_EN_NAT
        "Hello" in English, in the default encoding.
      • HELLO_EN_UTF8

        public static final byte[] HELLO_EN_UTF8
        "Hello" in English, in UTF-8.
      • HELLO_EN_UTF16BE

        public static final byte[] HELLO_EN_UTF16BE
        "Hello" in English, in UTF-16BE.
      • HELLO_EN_UTF16LE

        public static final byte[] HELLO_EN_UTF16LE
        "Hello" in English, in UTF-16LE.
      • HELLO_EN_UTF32BE

        public static final byte[] HELLO_EN_UTF32BE
        "Hello" in English, in UTF-32BE.
      • HELLO_EN_UTF32LE

        public static final byte[] HELLO_EN_UTF32LE
        "Hello" in English, in UTF-32LE.
      • LANGUAGE_NO

        public static final String LANGUAGE_NO
        "Language" (pronounced "sprook") in Norwegian: this is the word "language" in Norwegian, with the first letter capitalized.
        See Also:
        Constant Field Values
      • LANGUAGE_NO_CHARS

        public static final char[] LANGUAGE_NO_CHARS
        "Language" in Norwegian, as a character array.
      • LANGUAGE_NO_UCPS

        public static final int[] LANGUAGE_NO_UCPS
        "Language" in Norwegian, as a Unicode code point array.
      • LANGUAGE_NO_NAT

        public static final byte[] LANGUAGE_NO_NAT
        "Language" in Norwegian, in the default encoding.
      • LANGUAGE_NO_UTF8

        public static final byte[] LANGUAGE_NO_UTF8
        "Language" in Norwegian, in UTF-8.
      • LANGUAGE_NO_UTF16BE

        public static final byte[] LANGUAGE_NO_UTF16BE
        "Language" in Norwegian, in UTF-16BE.
      • LANGUAGE_NO_UTF16LE

        public static final byte[] LANGUAGE_NO_UTF16LE
        "Language" in Norwegian, in UTF-16LE.
      • LANGUAGE_NO_UTF32BE

        public static final byte[] LANGUAGE_NO_UTF32BE
        "Language" in Norwegian, in UTF-32BE.
      • LANGUAGE_NO_UTF32LE

        public static final byte[] LANGUAGE_NO_UTF32LE
        "Language" in Norwegian, in UTF-32LE.
      • HELLO_GR

        public static final String HELLO_GR
        "HELLO" (pronounced "yassou") in Greek: this is the word "hello" in all uppercase Greek letters (it is, in fact, two Greek words, separated by a space).
        See Also:
        Constant Field Values
      • HELLO_GR_CHARS

        public static final char[] HELLO_GR_CHARS
        "HELLO" in Greek, as a character array.
      • HELLO_GR_UCPS

        public static final int[] HELLO_GR_UCPS
        "HELLO" in Greek, as a Unicode code point array.
      • HELLO_GR_NAT

        public static final byte[] HELLO_GR_NAT
        "HELLO" in Greek, in the default encoding.
      • HELLO_GR_UTF8

        public static final byte[] HELLO_GR_UTF8
        "HELLO" in Greek, in UTF-8.
      • HELLO_GR_UTF16BE

        public static final byte[] HELLO_GR_UTF16BE
        "HELLO" in Greek, in UTF-16BE.
      • HELLO_GR_UTF16LE

        public static final byte[] HELLO_GR_UTF16LE
        "HELLO" in Greek, in UTF-16LE.
      • HELLO_GR_UTF32BE

        public static final byte[] HELLO_GR_UTF32BE
        "HELLO" in Greek, in UTF-32BE.
      • HELLO_GR_UTF32LE

        public static final byte[] HELLO_GR_UTF32LE
        "HELLO" in Greek, in UTF-32LE.
      • HOUSE_AR_CHARS

        public static final char[] HOUSE_AR_CHARS
        "house" in Arabic, as a character array.
      • HOUSE_AR_UCPS

        public static final int[] HOUSE_AR_UCPS
        "house" in Arabic, as a Unicode code point array.
      • HOUSE_AR_NAT

        public static final byte[] HOUSE_AR_NAT
        "house" in Arabic, in the default encoding.
      • HOUSE_AR_UTF8

        public static final byte[] HOUSE_AR_UTF8
        "house" in Arabic, in UTF-8.
      • HOUSE_AR_UTF16BE

        public static final byte[] HOUSE_AR_UTF16BE
        "house" in Arabic, in UTF-16BE.
      • HOUSE_AR_UTF16LE

        public static final byte[] HOUSE_AR_UTF16LE
        "house" in Arabic, in UTF-16LE.
      • HOUSE_AR_UTF32BE

        public static final byte[] HOUSE_AR_UTF32BE
        "house" in Arabic, in UTF-32BE.
      • HOUSE_AR_UTF32LE

        public static final byte[] HOUSE_AR_UTF32LE
        "house" in Arabic, in UTF-32LE.
      • GOODBYE_JA

        public static final String GOODBYE_JA
        "goodbye" (pronounced "sayonara") in Japanese, in the Hiragana writing system.
        See Also:
        Constant Field Values
      • GOODBYE_JA_CHARS

        public static final char[] GOODBYE_JA_CHARS
        "goodbye" in Japanese, as a character array.
      • GOODBYE_JA_UCPS

        public static final int[] GOODBYE_JA_UCPS
        "goodbye" in Japanese, as a Unicode code point array.
      • GOODBYE_JA_NAT

        public static final byte[] GOODBYE_JA_NAT
        "goodbye" in Japanese, in the default encoding.
      • GOODBYE_JA_UTF8

        public static final byte[] GOODBYE_JA_UTF8
        "goodbye" in Japanese, in UTF-8.
      • GOODBYE_JA_UTF16BE

        public static final byte[] GOODBYE_JA_UTF16BE
        "goodbye" in Japanese, in UTF-16BE.
      • GOODBYE_JA_UTF16LE

        public static final byte[] GOODBYE_JA_UTF16LE
        "goodbye" in Japanese, in UTF-16LE.
      • GOODBYE_JA_UTF32BE

        public static final byte[] GOODBYE_JA_UTF32BE
        "goodbye" in Japanese, in UTF-32BE.
      • GOODBYE_JA_UTF32LE

        public static final byte[] GOODBYE_JA_UTF32LE
        "goodbye" in Japanese, in UTF-32LE.
      • GOATS_LNB

        public static final String GOATS_LNB
        The Linear B ideograms for she-goat and he-goat (in this order and separated by a space).
        See Also:
        Constant Field Values
      • GOATS_LNB_CHARS

        public static final char[] GOATS_LNB_CHARS
        The Linear B goat ideograms, as a character array.
      • GOATS_LNB_UCPS

        public static final int[] GOATS_LNB_UCPS
        The Linear B goat ideograms, as a Unicode code point array.
      • GOATS_LNB_NAT

        public static final byte[] GOATS_LNB_NAT
        The Linear B goat ideograms, in the default encoding.
      • GOATS_LNB_UTF8

        public static final byte[] GOATS_LNB_UTF8
        The Linear B goat ideograms, in UTF-8.
      • GOATS_LNB_UTF16BE

        public static final byte[] GOATS_LNB_UTF16BE
        The Linear B goat ideograms, in UTF-16BE.
      • GOATS_LNB_UTF16LE

        public static final byte[] GOATS_LNB_UTF16LE
        The Linear B goat ideograms, in UTF-16LE.
      • GOATS_LNB_UTF32BE

        public static final byte[] GOATS_LNB_UTF32BE
        The Linear B goat ideograms, in UTF-32BE.
      • GOATS_LNB_UTF32LE

        public static final byte[] GOATS_LNB_UTF32LE
        The Linear B goat ideograms, in UTF-32LE.
      • G_CLEF_MSC_CHARS

        public static final char[] G_CLEF_MSC_CHARS
        The G-clef, as a character array.
      • G_CLEF_MSC_UCPS

        public static final int[] G_CLEF_MSC_UCPS
        The G-clef, as a Unicode code point array.
      • G_CLEF_MSC_NAT

        public static final byte[] G_CLEF_MSC_NAT
        The G-clef, in the default encoding.
      • G_CLEF_MSC_UTF8

        public static final byte[] G_CLEF_MSC_UTF8
        The G-clef, in UTF-8.
      • G_CLEF_MSC_UTF16BE

        public static final byte[] G_CLEF_MSC_UTF16BE
        The G-clef, in UTF-16BE.
      • G_CLEF_MSC_UTF16LE

        public static final byte[] G_CLEF_MSC_UTF16LE
        The G-clef, in UTF-16LE.
      • G_CLEF_MSC_UTF32BE

        public static final byte[] G_CLEF_MSC_UTF32BE
        The G-clef, in UTF-32BE.
      • G_CLEF_MSC_UTF32LE

        public static final byte[] G_CLEF_MSC_UTF32LE
        The G-clef, in UTF-32LE.
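The G-clef fields above describe one character in several representations. The following is a minimal sketch of how those representations relate, assuming the symbol is U+1D11E (MUSICAL SYMBOL G CLEF). The class name `GClefSketch`, its constants, and the `hex` helper are hypothetical and are not part of `UnicodeData`; the actual field values are not shown in this page.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class GClefSketch {
    // Assumption: the G-clef is U+1D11E, which lies outside the Basic Multilingual Plane.
    static final int G_CLEF_CODE_POINT = 0x1D11E;

    // Hypothetical stand-in for a String constant holding the G-clef.
    static final String G_CLEF = new String(Character.toChars(G_CLEF_CODE_POINT));

    // Formats a byte array as uppercase hex digits, two per byte.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        char[] chars = G_CLEF.toCharArray();          // UTF-16 code units (a surrogate pair)
        int[] ucps = G_CLEF.codePoints().toArray();   // Unicode code points

        System.out.println(chars.length);  // 2
        System.out.println(ucps.length);   // 1

        System.out.println(hex(G_CLEF.getBytes(StandardCharsets.UTF_8)));       // F09D849E
        System.out.println(hex(G_CLEF.getBytes(StandardCharsets.UTF_16BE)));    // D834DD1E
        System.out.println(hex(G_CLEF.getBytes(StandardCharsets.UTF_16LE)));    // 34D81EDD
        System.out.println(hex(G_CLEF.getBytes(Charset.forName("UTF-32BE")))); // 0001D11E
        System.out.println(hex(G_CLEF.getBytes(Charset.forName("UTF-32LE")))); // 1ED10100

        // The "default encoding" variant depends on the platform, so no fixed output is shown.
        System.out.println(hex(G_CLEF.getBytes(Charset.defaultCharset())));
    }
}
```

Because the symbol needs a surrogate pair in UTF-16, the character array has two elements while the code point array has one, which is what makes this kind of data useful for testing conversions.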
      • COMBO

        public static final String COMBO
        A combo string that includes "Hello" in English, "Language" in Norwegian, "HELLO" in Greek, "house" in Arabic, "goodbye" in Japanese, and the G-clef, each successive pair separated by exactly one space.
        See Also:
        Constant Field Values
      • COMBO_CHARS

        public static final char[] COMBO_CHARS
        The combo string, as a character array.
      • COMBO_UCPS

        public static final int[] COMBO_UCPS
        The combo string, as a Unicode code point array.
      • COMBO_NAT

        public static final byte[] COMBO_NAT
        The combo string, in the default encoding.
      • COMBO_UTF8

        public static final byte[] COMBO_UTF8
        The combo string, in UTF-8.
      • COMBO_UTF16BE

        public static final byte[] COMBO_UTF16BE
        The combo string, in UTF-16BE.
      • COMBO_UTF16LE

        public static final byte[] COMBO_UTF16LE
        The combo string, in UTF-16LE.
      • COMBO_UTF32BE

        public static final byte[] COMBO_UTF32BE
        The combo string, in UTF-32BE.
      • COMBO_UTF32LE

        public static final byte[] COMBO_UTF32LE
        The combo string, in UTF-32LE.
      • INVALID

        public static final String INVALID
        An invalid string, comprising an isolated 16-bit surrogate.
        See Also:
        Constant Field Values
      • INVALID_CHARS

        public static final char[] INVALID_CHARS
        An invalid string, comprising an isolated 16-bit surrogate, as a character array.
      • INVALID_UCPS

        public static final int[] INVALID_UCPS
        A Unicode code point array comprising an isolated surrogate code point.
      • INVALID_UTF8

        public static final byte[] INVALID_UTF8
        A byte array comprising an invalid UTF-8 byte sequence (the first 3 bytes of a 4-byte sequence).
      • INVALID_UTF16BE

        public static final byte[] INVALID_UTF16BE
        A byte array comprising an invalid UTF-16BE byte sequence (an isolated 16-bit surrogate).
      • INVALID_UTF16LE

        public static final byte[] INVALID_UTF16LE
        A byte array comprising an invalid UTF-16LE byte sequence (an isolated 16-bit surrogate).
      • INVALID_UTF32BE

        public static final byte[] INVALID_UTF32BE
        A byte array comprising an invalid UTF-32BE byte sequence (a 32-bit value outside the valid range for Unicode scalar values).
      • INVALID_UTF32LE

        public static final byte[] INVALID_UTF32LE
        A byte array comprising an invalid UTF-32LE byte sequence (a 32-bit value outside the valid range for Unicode scalar values).
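The invalid-data fields are meant to exercise error handling in encoders and decoders. The sketch below shows one way such data might be detected with the standard `java.nio.charset` API. The class `InvalidDataSketch` and its constants are hypothetical stand-ins chosen to match the descriptions above (a lone high surrogate, and the first 3 bytes of a 4-byte UTF-8 sequence); they are assumptions, not the actual `INVALID*` values.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class InvalidDataSketch {
    // Assumed example of an isolated 16-bit surrogate (a high surrogate with no low surrogate).
    static final String LONE_SURROGATE = "\uD834";

    // Assumed example of a truncated sequence: the first 3 bytes of the 4-byte UTF-8 form of U+1D11E.
    static final byte[] TRUNCATED_UTF8 = {(byte) 0xF0, (byte) 0x9D, (byte) 0x84};

    // Returns true if the bytes decode as UTF-8 when malformed input is reported rather than replaced.
    static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Strict decoding rejects the truncated sequence.
        System.out.println(isValidUtf8(TRUNCATED_UTF8)); // false

        // A strict encoder cannot encode an unpaired surrogate.
        System.out.println(StandardCharsets.UTF_8.newEncoder().canEncode(LONE_SURROGATE)); // false

        // The String constructor is lenient: it substitutes U+FFFD instead of failing.
        String lenient = new String(TRUNCATED_UTF8, StandardCharsets.UTF_8);
        System.out.println(lenient.contains("\uFFFD")); // true
    }
}
```

The contrast between the strict decoder and the lenient `String` constructor is the reason tests usually need explicit invalid inputs: code that relies on the lenient path silently loses data.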
    • Constructor Detail

      • UnicodeData

        private UnicodeData()
        Constructor. It is private so that no instances can be created.
    • Method Detail

      • concat

        private static byte[] concat(byte[]... arrays)
        Concatenates the given byte arrays and returns the result.
        Parameters:
        arrays - The arrays.
        Returns:
        The concatenated arrays.
      • concat

        private static int[] concat(int[]... arrays)
        Concatenates the given integer arrays and returns the result.
        Parameters:
        arrays - The arrays.
        Returns:
        The concatenated arrays.
      • concat

        private static char[] concat(char[]... arrays)
        Concatenates the given character arrays and returns the result.
        Parameters:
        arrays - The arrays.
        Returns:
        The concatenated arrays.
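The `concat` methods are private and their source is not shown here, so the following is only a sketch of how a varargs byte-array concatenation could be written. The class `ConcatSketch` and the sample arrays are hypothetical; the `int[]` and `char[]` overloads would follow the same pattern.

```java
import java.util.Arrays;

final class ConcatSketch {
    // One possible implementation: size the result once, then copy each array in order.
    static byte[] concat(byte[]... arrays) {
        int total = 0;
        for (byte[] array : arrays) {
            total += array.length;
        }
        byte[] result = new byte[total];
        int offset = 0;
        for (byte[] array : arrays) {
            System.arraycopy(array, 0, result, offset, array.length);
            offset += array.length;
        }
        return result;
    }

    public static void main(String[] args) {
        byte[] hi = {0x48, 0x69}; // "Hi" in UTF-8
        byte[] space = {0x20};    // the space character in UTF-8
        byte[] combined = concat(hi, space, hi);
        System.out.println(Arrays.toString(combined)); // [72, 105, 32, 72, 105]
    }
}
```

A helper of this shape would be one plausible way to assemble a combined byte array from the per-word arrays and the space array for each encoding, though the page does not say how the combined fields are actually built.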