Package edu.harvard.hul.ois.jhove
Class TextMDMetadata
- java.lang.Object
-
- edu.harvard.hul.ois.jhove.TextMDMetadata
-
public class TextMDMetadata extends Object
Encapsulation of the textMD metadata for text files. See http://www.loc.gov/standards/textMd for more information.
- Author:
- Thomas Ledoux
-
-
Field Summary
Fields
Modifier and Type | Field | Description
static String[] | BYTE_ORDER | Uses enumerated values of 'big', 'little', and 'middle' endian.
static int | BYTE_ORDER_BIG |
static int | BYTE_ORDER_LITTLE |
static int | BYTE_ORDER_MIDDLE |
static String | CHARSET_ASCII |
static String | CHARSET_ISO8859_1 |
static String | CHARSET_UTF8 |
static String | DEFAULT_LOCATION |
protected static Map<String,String> | fromISO_639_2_T2B | Map from ISO 639/2 T to ISO 639/2 B
static String[] | LINEBREAK | Uses enumerated values of 'CR', 'LF' and 'CR/LF' for the identification of the linebreak.
static int | LINEBREAK_CR |
static int | LINEBREAK_CRLF |
static int | LINEBREAK_LF |
static String | NAMESPACE | textMD namespace and version
static int | NILL | To represent the unknown
protected static Set | setOfUnknownJavaCharset | Set of unknown charsets in Java
protected static String[] | UNKNOWN_JAVA_CHARSET | Array of textMD charsets unknown by java.nio.charset.Charset
static String | VERSION |
-
Constructor Summary
Constructors Constructor Description TextMDMetadata()
-
Method Summary
All Methods | Static Methods | Instance Methods | Concrete Methods
Modifier and Type | Method | Description
int | getByte_order() |
String | getByte_orderString() |
String | getByte_size() |
String | getCharacter_size() |
String | getCharset() |
String | getLanguage() |
int | getLinebreak() |
String | getLinebreakString() |
String | getMarkup_basis() |
String | getMarkup_basis_version() |
String | getMarkup_language() |
String | getMarkup_language_version() |
void | setByte_order(int byte_order) |
void | setByte_size(String byte_size) |
void | setCharacter_size(String character_size) |
void | setCharset(String charset) |
void | setLanguage(String language) |
void | setLinebreak(int linebreak) |
void | setMarkup_basis(String markup_basis) |
void | setMarkup_basis_version(String markup_basis_version) |
void | setMarkup_language(String markup_language) |
void | setMarkup_language_version(String markup_language_version) |
static String | toISO_639_2(String srcLang) | Transform a language to the ISO_639-2 language (only enumeration allowed in textMD schema).
static String | toTextMDCharset(String srcCharset) | Transform a given charset in the "authorized" list given in the textMD schema enumeration.
-
-
-
Field Detail
-
NAMESPACE
public static final String NAMESPACE
textMD namespace and version
- See Also:
- Constant Field Values
-
DEFAULT_LOCATION
public static final String DEFAULT_LOCATION
- See Also:
- Constant Field Values
-
VERSION
public static final String VERSION
- See Also:
- Constant Field Values
-
BYTE_ORDER
public static final String[] BYTE_ORDER
Uses enumerated values of 'big', 'little', and 'middle' endian.
-
BYTE_ORDER_BIG
public static final int BYTE_ORDER_BIG
- See Also:
- Constant Field Values
-
BYTE_ORDER_LITTLE
public static final int BYTE_ORDER_LITTLE
- See Also:
- Constant Field Values
-
BYTE_ORDER_MIDDLE
public static final int BYTE_ORDER_MIDDLE
- See Also:
- Constant Field Values
-
LINEBREAK
public static final String[] LINEBREAK
Uses enumerated values of 'CR', 'LF' and 'CR/LF' for the identification of the linebreak.
-
LINEBREAK_CR
public static final int LINEBREAK_CR
- See Also:
- Constant Field Values
-
LINEBREAK_LF
public static final int LINEBREAK_LF
- See Also:
- Constant Field Values
-
LINEBREAK_CRLF
public static final int LINEBREAK_CRLF
- See Also:
- Constant Field Values
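The LINEBREAK labels and the LINEBREAK_* codes above suggest an index-into-array pattern, where an int code selects its string label. The following is a minimal sketch of that pattern and not the JHOVE implementation: the class `LinebreakSketch`, its constant values (0, 1, 2, and -1 for unknown), the label order, and the `detect` helper are all assumptions made for illustration, since the actual constant values are not shown on this page.

```java
public class LinebreakSketch {
    // Hypothetical stand-ins for LINEBREAK and LINEBREAK_*; real values may differ.
    static final String[] LABELS = { "CR", "LF", "CR/LF" };
    static final int CR = 0;
    static final int LF = 1;
    static final int CRLF = 2;
    static final int UNKNOWN = -1; // plays the role of NILL

    /** Classifies the first line terminator found in the text. */
    static int detect(String text) {
        int cr = text.indexOf('\r');
        int lf = text.indexOf('\n');
        if (cr < 0 && lf < 0) return UNKNOWN; // no terminator at all
        if (cr < 0) return LF;                // only '\n' occurs
        if (lf < 0) return CR;                // only '\r' occurs
        if (lf < cr) return LF;               // '\n' comes first
        return (lf == cr + 1) ? CRLF : CR;    // "\r\n" pair, or a lone '\r' first
    }

    /** Maps a code to its label, or null when the code is unknown. */
    static String label(int code) {
        return (code >= 0 && code < LABELS.length) ? LABELS[code] : null;
    }

    public static void main(String[] args) {
        System.out.println(label(detect("first\r\nsecond")));
        System.out.println(label(detect("first\nsecond")));
        System.out.println(label(detect("no terminator")));
    }
}
```

Keeping the labels in an array indexed by the int codes lets a String getter be a simple bounds-checked lookup, with the unknown value falling outside the array.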
-
UNKNOWN_JAVA_CHARSET
protected static final String[] UNKNOWN_JAVA_CHARSET
Array of textMD charsets unknown to java.nio.charset.Charset
-
setOfUnknownJavaCharset
protected static Set setOfUnknownJavaCharset
Set of unknown charsets in Java
-
fromISO_639_2_T2B
protected static Map<String,String> fromISO_639_2_T2B
Map from ISO 639-2/T (terminology) codes to ISO 639-2/B (bibliographic) codes
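To illustrate what a T-to-B map is for: Java's `Locale.getISO3Language()` returns terminology codes such as "fra" and "deu", while the bibliographic forms are "fre" and "ger". The sketch below shows one way such a lookup could work. It is not the JHOVE code; the class `Iso639Sketch`, the method `toBibliographic`, and the partial table are hypothetical, and the real `toISO_639_2` may behave differently.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.MissingResourceException;

public class Iso639Sketch {
    // Partial, illustrative T-to-B table; ISO 639-2 defines roughly twenty such pairs.
    static final Map<String, String> T_TO_B = new HashMap<>();
    static {
        T_TO_B.put("fra", "fre"); // French
        T_TO_B.put("deu", "ger"); // German
        T_TO_B.put("nld", "dut"); // Dutch
        T_TO_B.put("zho", "chi"); // Chinese
        T_TO_B.put("ces", "cze"); // Czech
        T_TO_B.put("ell", "gre"); // Modern Greek
    }

    /** Returns the bibliographic code for a two- or three-letter language code. */
    static String toBibliographic(String lang) {
        if (lang == null) return null;
        String code = lang.toLowerCase(Locale.ROOT);
        if (code.length() == 2) {
            try {
                // Expand a two-letter code to its three-letter terminology code.
                code = Locale.forLanguageTag(code).getISO3Language();
            } catch (MissingResourceException e) {
                return lang; // no three-letter form known; leave the input untouched
            }
        }
        // Codes without a distinct B form are identical in both sets.
        return T_TO_B.getOrDefault(code, code);
    }

    public static void main(String[] args) {
        System.out.println(toBibliographic("fr"));
        System.out.println(toBibliographic("deu"));
        System.out.println(toBibliographic("eng"));
    }
}
```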
-
CHARSET_ASCII
public static final String CHARSET_ASCII
- See Also:
- Constant Field Values
-
CHARSET_UTF8
public static final String CHARSET_UTF8
- See Also:
- Constant Field Values
-
CHARSET_ISO8859_1
public static final String CHARSET_ISO8859_1
- See Also:
- Constant Field Values
-
NILL
public static final int NILL
To represent the unknown
- See Also:
- Constant Field Values
-
-
Method Detail
-
getCharset
public String getCharset()
- Returns:
- the charset
-
setCharset
public void setCharset(String charset)
- Parameters:
charset - the charset to set
-
getByte_order
public int getByte_order()
- Returns:
- the byte_order
-
getByte_orderString
public String getByte_orderString()
-
setByte_order
public void setByte_order(int byte_order)
- Parameters:
byte_order - the byte_order to set
-
getByte_size
public String getByte_size()
- Returns:
- the byte_size
-
setByte_size
public void setByte_size(String byte_size)
- Parameters:
byte_size - the byte_size to set
-
getCharacter_size
public String getCharacter_size()
- Returns:
- the character_size
-
setCharacter_size
public void setCharacter_size(String character_size)
- Parameters:
character_size - the character_size to set
-
getLinebreak
public int getLinebreak()
- Returns:
- the linebreak
-
getLinebreakString
public String getLinebreakString()
- Returns:
- the linebreak in String form
-
setLinebreak
public void setLinebreak(int linebreak)
- Parameters:
linebreak - the linebreak to set
-
getLanguage
public String getLanguage()
- Returns:
- the language
-
setLanguage
public void setLanguage(String language)
- Parameters:
language - the language to set
-
getMarkup_basis
public String getMarkup_basis()
- Returns:
- the markup_basis
-
setMarkup_basis
public void setMarkup_basis(String markup_basis)
- Parameters:
markup_basis - the markup_basis to set
-
getMarkup_basis_version
public String getMarkup_basis_version()
- Returns:
- the markup_basis_version
-
setMarkup_basis_version
public void setMarkup_basis_version(String markup_basis_version)
- Parameters:
markup_basis_version - the markup_basis_version to set
-
getMarkup_language
public String getMarkup_language()
- Returns:
- the markup_language
-
setMarkup_language
public void setMarkup_language(String markup_language)
- Parameters:
markup_language - the markup_language to set
-
getMarkup_language_version
public String getMarkup_language_version()
- Returns:
- the markup_language_version
-
setMarkup_language_version
public void setMarkup_language_version(String markup_language_version)
- Parameters:
markup_language_version - the markup_language_version to set
-
toTextMDCharset
public static String toTextMDCharset(String srcCharset)
Transforms a given charset name into one from the "authorized" list given in the textMD schema enumeration.
From the schema documentation on charset (http://www.loc.gov/standards/textMD/elementSet/index.html#element_charset): "The character set employed by the text. Controlled vocab using IANA names for character sets: http://www.iana.org/assignments/character-sets."
The problem arises because the Java Charset uses the preferred MIME name, whereas textMD uses the Name ...
- Parameters:
srcCharset - charset from the file
- Returns:
- normalized charset
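The mismatch described above can be seen with ISO Latin-1: `java.nio.charset.Charset` reports the canonical name "ISO-8859-1", while the IANA registry lists the Name "ISO_8859-1:1987" with "ISO-8859-1" as its preferred MIME name. The following sketch shows one plausible normalization approach. It is not the JHOVE implementation; the class `CharsetNameSketch`, the method `normalize`, and the two-entry table are assumptions made for illustration.

```java
import java.nio.charset.Charset;
import java.util.HashMap;
import java.util.Map;

public class CharsetNameSketch {
    // Illustrative table: Java canonical name -> IANA registry Name.
    static final Map<String, String> JAVA_TO_IANA = new HashMap<>();
    static {
        JAVA_TO_IANA.put("ISO-8859-1", "ISO_8859-1:1987");
        JAVA_TO_IANA.put("ISO-8859-2", "ISO_8859-2:1987");
    }

    /** Resolves aliases through Charset, then maps to the IANA Name if one is listed. */
    static String normalize(String srcCharset) {
        if (srcCharset == null) return null;
        String canonical;
        try {
            // Charset.forName accepts aliases and ignores case; name() is canonical.
            canonical = Charset.forName(srcCharset).name();
        } catch (IllegalArgumentException e) {
            // Illegal or unsupported name in this JVM: return the input unchanged.
            return srcCharset;
        }
        return JAVA_TO_IANA.getOrDefault(canonical, canonical);
    }

    public static void main(String[] args) {
        System.out.println(normalize("latin1"));
        System.out.println(normalize("utf8"));
        System.out.println(normalize("not-a-charset"));
    }
}
```

Charsets that Java does not recognize cannot be resolved this way, which is presumably why the class keeps a separate list of textMD charsets unknown to Java (UNKNOWN_JAVA_CHARSET).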
-
-