Package edu.harvard.hul.ois.jhove.module
Class WarcModule
java.lang.Object
edu.harvard.hul.ois.jhove.ModuleBase
edu.harvard.hul.ois.jhove.module.WarcModule
- All Implemented Interfaces:
Module
JHOVE module for identifying, validating and characterizing WARC files.
Ported from the JHOVE2 WARC module and based on the JWAT-tool, both
created by nicl@kb.dk (nclarkekb@git).
This is a non-recursive validation. It only validates the WARC file format
and WARC headers, not the actual payload of the WARC records.
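The header-only approach described above can be illustrated with a small standard-library sketch. It is not the module's implementation (the module delegates to the JWAT `WarcReader`); it only assumes the general WARC record layout of a `WARC/` version line, named header fields, a blank line, and a block of `Content-Length` bytes. The class name `WarcHeaderWalkSketch`, its methods, and the `"version"` map key are hypothetical names chosen for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of JHOVE or JWAT.
final class WarcHeaderWalkSketch {

    // Reads one line terminated by LF (an optional preceding CR is dropped).
    // Bytes are mapped one-to-one to chars, which is enough for ASCII headers.
    // Returns null at end of stream.
    static String readLine(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        int b;
        while ((b = in.read()) != -1) {
            if (b == '\n') {
                int len = sb.length();
                if (len > 0 && sb.charAt(len - 1) == '\r') {
                    sb.setLength(len - 1);
                }
                return sb.toString();
            }
            sb.append((char) b);
        }
        return sb.length() == 0 ? null : sb.toString();
    }

    // Collects the header fields of every record and skips each record block
    // without inspecting it. Field names are matched exactly in this sketch.
    static List<Map<String, String>> readHeaders(InputStream in) throws IOException {
        List<Map<String, String>> records = new ArrayList<>();
        String line;
        while ((line = readLine(in)) != null) {
            if (line.isEmpty()) {
                continue; // blank lines between records
            }
            if (!line.startsWith("WARC/")) {
                throw new IOException("Not a WARC version line: " + line);
            }
            Map<String, String> header = new LinkedHashMap<>();
            header.put("version", line); // hypothetical key for the version line
            while ((line = readLine(in)) != null && !line.isEmpty()) {
                int colon = line.indexOf(':');
                if (colon < 0) {
                    throw new IOException("Malformed header line: " + line);
                }
                header.put(line.substring(0, colon).trim(), line.substring(colon + 1).trim());
            }
            String length = header.get("Content-Length");
            if (length == null) {
                throw new IOException("Missing Content-Length");
            }
            long remaining = Long.parseLong(length);
            while (remaining > 0) { // skip the block; the payload is never parsed
                long skipped = in.skip(remaining);
                if (skipped <= 0) {
                    if (in.read() == -1) {
                        throw new EOFException("Truncated record block");
                    }
                    skipped = 1;
                }
                remaining -= skipped;
            }
            records.add(header);
        }
        return records;
    }

    // Convenience overload for in-memory data; wraps IOException as unchecked.
    static List<Map<String, String>> readHeaders(byte[] data) {
        try {
            return readHeaders(new ByteArrayInputStream(data));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling `WarcHeaderWalkSketch.readHeaders(bytes)` on an uncompressed WARC byte array returns one map per record. Whatever is embedded in the record block is skipped rather than parsed, which mirrors the non-recursive behaviour described here.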
- Author:
- jolf@kb.dk
Field Summary
Fields inherited from class edu.harvard.hul.ois.jhove.ModuleBase
_app, _bigEndian, _checksumFinished, _ckSummer, _countStream, _coverage, _crc32, _cstream, _date, _defaultParams, _dstream, _features, _format, _init, _isRandomAccess, _je, _logger, _md5, _mimeType, _name, _nByte, _note, _param, _release, _repInfoNote, _rights, _sha1, _sha256, _signature, _specification, _validityNote, _vendor, _verbosity, _wellFormedNote
Fields inherited from interface edu.harvard.hul.ois.jhove.Module
MAXIMUM_VERBOSITY, MINIMUM_VERBOSITY
Constructor Summary
Constructors
WarcModule()    Constructor.
Method Summary
Modifier and Type    Method    Description
void              checkSignatures(File file, InputStream stream, RepInfo info)
void              checkSignatures(File file, RandomAccessFile raf, RepInfo info)
int               parse(InputStream stream, RepInfo info, int parseIndex)
void              parse(RandomAccessFile file, RepInfo info)
protected void    parseRecords(org.jwat.warc.WarcReader reader)    Parse WARC records.
protected void    processRecord(org.jwat.warc.WarcRecord record)    Process a WARC record.
void              resetParams()    Reset parameter settings.
protected void    setReaderOptions(org.jwat.warc.WarcReader reader)    Set digest options for WARC reader.
Methods inherited from class edu.harvard.hul.ois.jhove.ModuleBase
addIntegerProperty, addIntegerProperty, applyDefaultParams, calcRAChecksum, checksumIfRafNotCopied, getApp, getBase, getBufferedDataStream, getCoverage, getCRC32, getDate, getDefaultParams, getFeatures, getFormat, getMimeType, getName, getNByte, getNote, getRelease, getRepInfoNote, getRights, getSignature, getSpecification, getValidityNote, getVendor, getWellFormedNote, hasFeature, init, initFeatures, initInfo, initParse, isBigEndian, isParamInDefaults, isRandomAccess, param, readByteBuf, readDouble, readDouble, readDouble, readFloat, readFloat, readSignedByte, readSignedByte, readSignedByte, readSignedInt, readSignedInt, readSignedInt, readSignedLong, readSignedRational, readSignedRational, readSignedShort, readSignedShort, readSignedShort, readUnsignedByte, readUnsignedByte, readUnsignedByte, readUnsignedInt, readUnsignedInt, readUnsignedInt, readUnsignedRational, readUnsignedRational, readUnsignedRational, readUnsignedShort, readUnsignedShort, readUnsignedShort, setApp, setBase, setChecksums, setCRC32, setDefaultParams, setMD5, setNByte, setSHA1, setSHA256, setupDataStream, setValidityNote, setVerbosity, show, skipBytes, skipBytes, skipDstreamToEnd, vectorToPropArray
-
Constructor Details
-
WarcModule
public WarcModule()
Constructor.
-
Method Details
-
resetParams
public void resetParams()
Reset parameter settings. Returns to a default state without any parameters.
- Specified by:
resetParams in interface Module
- Overrides:
resetParams in class ModuleBase
-
checkSignatures
- Specified by:
checkSignatures in interface Module
- Overrides:
checkSignatures in class ModuleBase
- Throws:
IOException
-
checkSignatures
- Specified by:
checkSignatures in interface Module
- Overrides:
checkSignatures in class ModuleBase
- Throws:
IOException
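As a rough illustration of what a signature check on a stream involves, the sketch below tests whether the data begins with the ASCII bytes `WARC/`, which open the version line of an uncompressed WARC record. This is an assumption-level example, not the logic of `checkSignatures` itself; the class and method names are hypothetical, and gzip-compressed WARC files would not match this simple test.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not part of JHOVE or JWAT.
final class WarcSignatureSketch {

    // Leading bytes of the version line of an uncompressed WARC record.
    private static final byte[] MAGIC = "WARC/".getBytes(StandardCharsets.US_ASCII);

    // Returns true if the next bytes of the stream equal the magic bytes.
    // Consumes up to MAGIC.length bytes from the stream.
    static boolean startsWithWarcMagic(InputStream in) throws IOException {
        for (byte expected : MAGIC) {
            int actual = in.read(); // -1 at end of stream never matches
            if (actual != (expected & 0xFF)) {
                return false;
            }
        }
        return true;
    }

    // Convenience overload for in-memory data; wraps IOException as unchecked.
    static boolean startsWithWarcMagic(byte[] data) {
        try {
            return startsWithWarcMagic(new ByteArrayInputStream(data));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because the stream variant consumes the bytes it inspects, a caller that wants to parse the same stream afterwards would need to buffer it or reopen the source.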
-
parse
- Specified by:
parse in interface Module
- Overrides:
parse in class ModuleBase
- Throws:
IOException
-
parse
- Specified by:
parse in interface Module
- Overrides:
parse in class ModuleBase
- Throws:
IOException
-
setReaderOptions
Set digest options for WARC reader.
- Parameters:
reader - WARC reader instance
- Throws:
JhoveException
-
parseRecords
Parse WARC records. Parsing should be straightforward, with all records accessible through the same source.
- Parameters:
reader - WARC reader used to parse records
- Throws:
IOException - if an IO error occurs while processing
JhoveException - if a serious problem needs to be reported
-
processRecord
Process a WARC record. Does not characterize the record payload.
- Parameters:
record - WARC record from WARC reader
- Throws:
IOException - if an IO error occurs while processing
-