Class TextMDMetadata

java.lang.Object
edu.harvard.hul.ois.jhove.TextMDMetadata

public class TextMDMetadata extends Object
Encapsulation of the textMD metadata for text files. See http://www.loc.gov/standards/textMd for more information.
Author:
Thomas Ledoux
  • Field Details

    • NAMESPACE

      public static final String NAMESPACE
      textMD namespace and version
    • DEFAULT_LOCATION

      public static final String DEFAULT_LOCATION
    • VERSION

      public static final String VERSION
    • BYTE_ORDER

      public static final String[] BYTE_ORDER
      Uses enumerated values of 'big', 'little', and 'middle' endian.
    • BYTE_ORDER_BIG

      public static final int BYTE_ORDER_BIG
    • BYTE_ORDER_LITTLE

      public static final int BYTE_ORDER_LITTLE
    • BYTE_ORDER_MIDDLE

      public static final int BYTE_ORDER_MIDDLE
    • LINEBREAK

      public static final String[] LINEBREAK
      Uses enumerated values of 'CR', 'LF' and 'CR/LF' for the identification of the linebreak.
    • LINEBREAK_CR

      public static final int LINEBREAK_CR
    • LINEBREAK_LF

      public static final int LINEBREAK_LF
    • LINEBREAK_CRLF

      public static final int LINEBREAK_CRLF
    • UNKNOWN_JAVA_CHARSET

      protected static final String[] UNKNOWN_JAVA_CHARSET
      Array of textMD charsets unknown to java.nio.charset.Charset
    • setOfUnknownJavaCharset

      protected static Set setOfUnknownJavaCharset
      Set of charsets unknown to Java
    • fromISO_639_2_T2B

      protected static Map<String,String> fromISO_639_2_T2B
      Map from ISO 639-2/T to ISO 639-2/B
    • CHARSET_ASCII

      public static final String CHARSET_ASCII
    • CHARSET_UTF8

      public static final String CHARSET_UTF8
    • CHARSET_ISO8859_1

      public static final String CHARSET_ISO8859_1
    • NILL

      public static final int NILL
      To represent the unknown
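The BYTE_ORDER and LINEBREAK fields each pair a String[] of labels with int constants, and NILL marks an unknown value. The following is a minimal, self-contained sketch of how such a pairing can be used. It does not call JHOVE: the class name EnumeratedLabelSketch is hypothetical, and the array ordering, the numeric constant values, the value of NILL, and the choice to return null for an unknown value are all assumptions made for illustration.

```java
public class EnumeratedLabelSketch {
    // Label array and index constants. The ordering and the numeric
    // values are assumptions made for this illustration.
    public static final String[] LINEBREAK = { "CR", "LF", "CR/LF" };
    public static final int LINEBREAK_CR = 0;
    public static final int LINEBREAK_LF = 1;
    public static final int LINEBREAK_CRLF = 2;

    // Sentinel for "unknown"; -1 is an assumed value.
    public static final int NILL = -1;

    private int linebreak = NILL;

    public int getLinebreak() {
        return linebreak;
    }

    public void setLinebreak(int linebreak) {
        // Store NILL for any index that has no label in the array.
        boolean valid = linebreak >= 0 && linebreak < LINEBREAK.length;
        this.linebreak = valid ? linebreak : NILL;
    }

    public String getLinebreakString() {
        // Returning null for an unknown value is a choice of this sketch.
        return linebreak == NILL ? null : LINEBREAK[linebreak];
    }

    public static void main(String[] args) {
        EnumeratedLabelSketch md = new EnumeratedLabelSketch();
        System.out.println(md.getLinebreakString());
        md.setLinebreak(LINEBREAK_CRLF);
        System.out.println(md.getLinebreakString());
    }
}
```

Storing the int and deriving the label on demand keeps the setter cheap to validate and leaves the label array as the single place where the textual form is defined.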
  • Constructor Details

    • TextMDMetadata

      public TextMDMetadata()
  • Method Details

    • getCharset

      public String getCharset()
      Returns:
      the charset
    • setCharset

      public void setCharset(String charset)
      Parameters:
      charset - the charset to set
    • getByte_order

      public int getByte_order()
      Returns:
      the byte_order
    • getByte_orderString

      public String getByte_orderString()
    • setByte_order

      public void setByte_order(int byte_order)
      Parameters:
      byte_order - the byte_order to set
    • getByte_size

      public String getByte_size()
      Returns:
      the byte_size
    • setByte_size

      public void setByte_size(String byte_size)
      Parameters:
      byte_size - the byte_size to set
    • getCharacter_size

      public String getCharacter_size()
      Returns:
      the character_size
    • setCharacter_size

      public void setCharacter_size(String character_size)
      Parameters:
      character_size - the character_size to set
    • getLinebreak

      public int getLinebreak()
      Returns:
      the linebreak
    • getLinebreakString

      public String getLinebreakString()
      Returns:
      the linebreak in String form
    • setLinebreak

      public void setLinebreak(int linebreak)
      Parameters:
      linebreak - the linebreak to set
    • getLanguage

      public String getLanguage()
      Returns:
      the language
    • setLanguage

      public void setLanguage(String language)
      Parameters:
      language - the language to set
    • getMarkup_basis

      public String getMarkup_basis()
      Returns:
      the markup_basis
    • setMarkup_basis

      public void setMarkup_basis(String markup_basis)
      Parameters:
      markup_basis - the markup_basis to set
    • getMarkup_basis_version

      public String getMarkup_basis_version()
      Returns:
      the markup_basis_version
    • setMarkup_basis_version

      public void setMarkup_basis_version(String markup_basis_version)
      Parameters:
      markup_basis_version - the markup_basis_version to set
    • getMarkup_language

      public String getMarkup_language()
      Returns:
      the markup_language
    • setMarkup_language

      public void setMarkup_language(String markup_language)
      Parameters:
      markup_language - the markup_language to set
    • getMarkup_language_version

      public String getMarkup_language_version()
      Returns:
      the markup_language_version
    • setMarkup_language_version

      public void setMarkup_language_version(String markup_language_version)
      Parameters:
      markup_language_version - the markup_language_version to set
    • toTextMDCharset

      public static String toTextMDCharset(String srcCharset)
      Transforms a given charset into one from the "authorized" list given in the textMD schema enumeration. The schema documentation on charset (http://www.loc.gov/standards/textMD/elementSet/index.html#element_charset) reads: "The character set employed by the text. Controlled vocab using IANA names for character sets: http://www.iana.org/assignments/character-sets." The problem arises because the Java Charset uses the preferred MIME name, where textMD uses the Name ...
      Parameters:
      srcCharset - charset from the file
      Returns:
      normalized charset
    • toISO_639_2

      public static String toISO_639_2(String srcLang)
      Transforms a language into its ISO 639-2 code (the only enumeration allowed in the textMD schema).
      Parameters:
      srcLang - language in the file
      Returns:
      normalized language in 3 letters (except qaa-qtz)
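The two static normalizers, toTextMDCharset and toISO_639_2, both map a value found in a file onto a controlled vocabulary. The sketch below illustrates the general idea using only the Java standard library. It is not the JHOVE implementation: the class TextMDNormalizerSketch, its method names, and both lookup tables are hypothetical, and the tables hold only a few illustrative entries. The charset table maps a preferred MIME name to the corresponding IANA registry Name, and the language table maps ISO 639-2/T codes to their ISO 639-2/B equivalents.

```java
import java.nio.charset.Charset;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.MissingResourceException;

public class TextMDNormalizerSketch {
    // Illustrative excerpts only; not the tables used by JHOVE.
    static final Map<String, String> MIME_TO_IANA_NAME = new HashMap<>();
    static final Map<String, String> T_TO_B = new HashMap<>();
    static {
        MIME_TO_IANA_NAME.put("ISO-8859-1", "ISO_8859-1:1987");
        MIME_TO_IANA_NAME.put("ISO-8859-2", "ISO_8859-2:1987");
        T_TO_B.put("deu", "ger");
        T_TO_B.put("fra", "fre");
        T_TO_B.put("nld", "dut");
        T_TO_B.put("zho", "chi");
    }

    /** Resolves aliases through java.nio, then looks up the IANA Name. */
    public static String normalizeCharset(String srcCharset) {
        String canonical;
        try {
            canonical = Charset.forName(srcCharset).name();
        } catch (IllegalArgumentException e) {
            // Null, illegal, or unsupported name: return the input unchanged.
            return srcCharset;
        }
        return MIME_TO_IANA_NAME.getOrDefault(canonical, canonical);
    }

    /** Expands 2-letter codes to 3 letters, then converts T codes to B codes. */
    public static String normalizeLanguage(String srcLang) {
        if (srcLang == null) {
            return null;
        }
        String code = srcLang.trim().toLowerCase(Locale.ROOT);
        if (code.length() == 2) {
            try {
                String iso3 = Locale.forLanguageTag(code).getISO3Language();
                if (!iso3.isEmpty()) {
                    code = iso3;
                }
            } catch (MissingResourceException e) {
                // No 3-letter code known for this language; keep the input.
            }
        }
        return T_TO_B.getOrDefault(code, code);
    }

    public static void main(String[] args) {
        System.out.println(normalizeCharset("latin1"));
        System.out.println(normalizeLanguage("fr"));
    }
}
```

Unrecognized inputs are passed through unchanged in this sketch so that the caller can decide how to report them; whether the real methods behave that way is not stated in this page.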