Package edu.harvard.hul.ois.jhove
Class TextMDMetadata
java.lang.Object
edu.harvard.hul.ois.jhove.TextMDMetadata
Encapsulation of the textMD metadata for text files.
See http://www.loc.gov/standards/textMd for more information.
- Author:
- Thomas Ledoux
-
Field Summary
Fields
Modifier and Type, Field, Description
static final String[] BYTE_ORDER: Uses enumerated values of 'big', 'little', and 'middle' endian.
static final int BYTE_ORDER_BIG
static final int BYTE_ORDER_LITTLE
static final int BYTE_ORDER_MIDDLE
static final String CHARSET_ASCII
static final String CHARSET_ISO8859_1
static final String CHARSET_UTF8
static final String DEFAULT_LOCATION
fromISO_639_2_T2B: Map from ISO 639/2 T to ISO 639/2 B
static final String[] LINEBREAK: Uses enumerated values of 'CR', 'LF' and 'CR/LF' for the identification of the linebreak.
static final int LINEBREAK_CR
static final int LINEBREAK_CRLF
static final int LINEBREAK_LF
static final String NAMESPACE: textMD namespace and version
static final int NILL: To represent the unknown
protected static Set setOfUnknownJavaCharset: Set of unknown charsets in Java
protected static final String[] UNKNOWN_JAVA_CHARSET: Array of textMD charsets unknown to java.nio.charset.Charset
static final String VERSION
-
Constructor Summary
Constructors
TextMDMetadata()
-
Method Summary
Modifier and Type, Method, Description
int getByte_order()
int getLinebreak()
void setByte_order(int byte_order)
void setByte_size(String byte_size)
void setCharacter_size(String character_size)
void setCharset(String charset)
void setLanguage(String language)
void setLinebreak(int linebreak)
void setMarkup_basis(String markup_basis)
void setMarkup_basis_version(String markup_basis_version)
void setMarkup_language(String markup_language)
void setMarkup_language_version(String markup_language_version)
static String toISO_639_2(String srcLang): Transform a language to the ISO_639-2 language (the only enumeration allowed in the textMD schema).
static String toTextMDCharset(String srcCharset): Transform a given charset into its counterpart in the "authorized" list given in the textMD schema enumeration.
-
Field Details
-
NAMESPACE
textMD namespace and version
-
DEFAULT_LOCATION
-
VERSION
-
BYTE_ORDER
Uses enumerated values of 'big', 'little', and 'middle' endian. -
BYTE_ORDER_BIG
public static final int BYTE_ORDER_BIG
-
BYTE_ORDER_LITTLE
public static final int BYTE_ORDER_LITTLE
-
BYTE_ORDER_MIDDLE
public static final int BYTE_ORDER_MIDDLE
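As an illustration of where a byte-order label might come from, the sketch below maps the JVM's own `java.nio.ByteOrder` values to the 'big' and 'little' labels quoted above. The class `ByteOrderLabelSketch` and its method are hypothetical and not part of JHOVE; `java.nio.ByteOrder` has no middle-endian value, so that case is not covered here.

```java
import java.nio.ByteOrder;

/** Hypothetical helper, not part of JHOVE. */
class ByteOrderLabelSketch {
    // Labels taken from the BYTE_ORDER description: 'big' and 'little'.
    static final String BIG = "big";
    static final String LITTLE = "little";

    /** Returns "big" for BIG_ENDIAN and "little" for any other value. */
    static String label(ByteOrder order) {
        return ByteOrder.BIG_ENDIAN.equals(order) ? BIG : LITTLE;
    }

    public static void main(String[] args) {
        // Prints the label for the platform the JVM is running on.
        System.out.println(label(ByteOrder.nativeOrder()));
    }
}
```

The resulting label would still have to be translated to the matching BYTE_ORDER_* constant before calling setByte_order(int), since the numeric values of those constants are not shown on this page.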
-
LINEBREAK
Uses enumerated values of 'CR', 'LF' and 'CR/LF' for the identification of the linebreak. -
LINEBREAK_CR
public static final int LINEBREAK_CR
-
LINEBREAK_LF
public static final int LINEBREAK_LF
-
LINEBREAK_CRLF
public static final int LINEBREAK_CRLF
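The following sketch shows one way the first linebreak in a piece of text could be classified and turned into one of the 'CR', 'LF' and 'CR/LF' labels. It is only an illustration: the class `LinebreakSketch`, its integer codes, and the "unknown" fallback are assumptions, since this page does not show the actual values of LINEBREAK_CR, LINEBREAK_LF and LINEBREAK_CRLF.

```java
/** Hypothetical helper, not part of JHOVE. */
class LinebreakSketch {
    // Assumed codes; the real LINEBREAK_* values are not shown on this page.
    static final int CR = 0;
    static final int LF = 1;
    static final int CRLF = 2;
    static final int UNKNOWN = -1;

    // Labels taken from the LINEBREAK description, indexed by the codes above.
    static final String[] LABELS = {"CR", "LF", "CR/LF"};

    /** Classifies the first linebreak found in the text. */
    static int detect(String text) {
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\n') {
                return LF;
            }
            if (c == '\r') {
                boolean followedByLf = i + 1 < text.length() && text.charAt(i + 1) == '\n';
                return followedByLf ? CRLF : CR;
            }
        }
        return UNKNOWN; // no linebreak in the text
    }

    /** Returns the label for a code, or "unknown" when the code is out of range. */
    static String label(int code) {
        return (code >= 0 && code < LABELS.length) ? LABELS[code] : "unknown";
    }
}
```

Looking only at the first linebreak keeps the sketch short; a file with mixed line endings would need a policy of its own.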
-
UNKNOWN_JAVA_CHARSET
Array of textMD charsets unknown to java.nio.charset.Charset -
setOfUnknownJavaCharset
Set of unknown charsets in Java -
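To make the idea of a charset "unknown in Java" concrete, the sketch below uses the standard `Charset.isSupported` call to split a list of names into those the running JVM recognizes and those it does not. The class `UnknownCharsetSketch` and its methods are hypothetical; how JHOVE actually populates UNKNOWN_JAVA_CHARSET is not described on this page.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of JHOVE. */
class UnknownCharsetSketch {
    /** True when the running JVM can resolve the name (canonical name or alias). */
    static boolean isKnownToJvm(String name) {
        try {
            return Charset.isSupported(name);
        } catch (IllegalCharsetNameException e) {
            // Names with characters that are illegal in charset names are treated as unknown.
            return false;
        }
    }

    /** Returns the names from the input that the JVM cannot resolve, in input order. */
    static List<String> unknownToJvm(List<String> names) {
        List<String> unknown = new ArrayList<>();
        for (String name : names) {
            if (!isKnownToJvm(name)) {
                unknown.add(name);
            }
        }
        return unknown;
    }
}
```

The result depends on the charsets installed in the JVM at hand, so a fixed array such as UNKNOWN_JAVA_CHARSET and a runtime check like this one may disagree on some platforms.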
fromISO_639_2_T2B
Map from ISO 639/2 T to ISO 639/2 B -
CHARSET_ASCII
-
CHARSET_UTF8
-
CHARSET_ISO8859_1
-
NILL
public static final int NILL
To represent the unknown
-
-
Constructor Details
-
TextMDMetadata
public TextMDMetadata()
-
-
Method Details
-
getCharset
- Returns:
- the charset
-
setCharset
- Parameters:
charset- the charset to set
-
getByte_order
public int getByte_order()
- Returns:
- the byte_order
-
getByte_orderString
-
setByte_order
public void setByte_order(int byte_order)
- Parameters:
byte_order- the byte_order to set
-
getByte_size
- Returns:
- the byte_size
-
setByte_size
- Parameters:
byte_size- the byte_size to set
-
getCharacter_size
- Returns:
- the character_size
-
setCharacter_size
- Parameters:
character_size- the character_size to set
-
getLinebreak
public int getLinebreak()
- Returns:
- the linebreak
-
getLinebreakString
- Returns:
- the linebreak in String form
-
setLinebreak
public void setLinebreak(int linebreak)
- Parameters:
linebreak- the linebreak to set
-
getLanguage
- Returns:
- the language
-
setLanguage
- Parameters:
language- the language to set
-
getMarkup_basis
- Returns:
- the markup_basis
-
setMarkup_basis
- Parameters:
markup_basis- the markup_basis to set
-
getMarkup_basis_version
- Returns:
- the markup_basis_version
-
setMarkup_basis_version
- Parameters:
markup_basis_version- the markup_basis_version to set
-
getMarkup_language
- Returns:
- the markup_language
-
setMarkup_language
- Parameters:
markup_language- the markup_language to set
-
getMarkup_language_version
- Returns:
- the markup_language_version
-
setMarkup_language_version
- Parameters:
markup_language_version- the markup_language_version to set
-
toTextMDCharset
Transform a given charset into its counterpart in the "authorized" list given in the textMD schema enumeration. From the schema documentation on charset (http://www.loc.gov/standards/textMD/elementSet/index.html#element_charset): The character set employed by the text. Controlled vocab using IANA names for character sets: http://www.iana.org/assignments/character-sets. The problem arises because the Java Charset uses the preferred MIME name, whereas textMD uses the Name ...
- Parameters:
srcCharset- charset from the file
- Returns:
- normalized charset
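The mismatch described above can be seen with ISO-8859-1: the IANA registry lists it under the Name "ISO_8859-1:1987" with "ISO-8859-1" as its preferred MIME name, and Java reports the latter as the canonical name. The sketch below shows one possible normalization; the class `IanaNameSketch`, its one-entry lookup table, and the pass-through default are assumptions for illustration, not the logic of toTextMDCharset.

```java
import java.nio.charset.Charset;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper, not part of JHOVE. */
class IanaNameSketch {
    // Illustrative excerpt only: Java canonical name -> IANA registry Name.
    private static final Map<String, String> MIME_TO_IANA = new HashMap<>();
    static {
        MIME_TO_IANA.put("ISO-8859-1", "ISO_8859-1:1987");
    }

    /**
     * Resolves the name through java.nio.charset.Charset, then swaps the
     * canonical Java name for the IANA Name when the table has an entry.
     * Charset.forName throws UnsupportedCharsetException for names the JVM
     * does not know.
     */
    static String toIanaName(String srcCharset) {
        String canonical = Charset.forName(srcCharset).name();
        return MIME_TO_IANA.getOrDefault(canonical, canonical);
    }
}
```

A real table would need an entry for every charset whose IANA Name differs from its preferred MIME name, plus a separate path for charsets the JVM cannot resolve at all.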
-
toISO_639_2
Transform a language to the ISO_639-2 language (only enumeration allowed in textMD schema).
- Parameters:
srcLang- language in the file
- Returns:
- normalized language in 3 letters (except qaa-qtz)
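As a rough illustration of this kind of normalization, the sketch below turns a two-letter language tag into a three-letter code using `Locale.getISO3Language()`, which returns ISO 639-2/T codes, and then applies a small T-to-B table in the spirit of fromISO_639_2_T2B. The class `Iso639Sketch`, its three table entries, and the choice of the bibliographic form as the target are assumptions; the actual behaviour of toISO_639_2 is not shown on this page.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper, not part of JHOVE. */
class Iso639Sketch {
    // Illustrative excerpt: terminology (T) codes that differ from bibliographic (B) codes.
    private static final Map<String, String> T_TO_B = new HashMap<>();
    static {
        T_TO_B.put("fra", "fre"); // French
        T_TO_B.put("deu", "ger"); // German
        T_TO_B.put("nld", "dut"); // Dutch
    }

    /**
     * Converts a language tag such as "fr" to a three-letter code.
     * getISO3Language() may throw MissingResourceException when no
     * three-letter code is available for the language.
     */
    static String toIso639_2B(String srcLang) {
        String terminology = Locale.forLanguageTag(srcLang).getISO3Language();
        return T_TO_B.getOrDefault(terminology, terminology);
    }
}
```

For example, `Iso639Sketch.toIso639_2B("fr")` yields "fre", while "en" passes through as "eng" because its T and B codes are identical.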
-