Package edu.harvard.hul.ois.jhove.module
Class WarcModule
- java.lang.Object
  - edu.harvard.hul.ois.jhove.ModuleBase
    - edu.harvard.hul.ois.jhove.module.WarcModule
- All Implemented Interfaces:
Module
public class WarcModule extends ModuleBase
JHOVE module for identifying, validating and characterizing WARC files. Ported from the JHOVE2 WARC module and based on the JWAT-tool, both created by nicl@kb.dk (nclarkekb@git). This is a non-recursive validation: it validates only the WARC file format and the WARC record headers, not the payloads of the WARC records.
- Author:
  jolf@kb.dk
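As an illustration of header-only (non-recursive) validation, the following minimal sketch checks one record's version line and the four fields that WARC 1.0 and 1.1 make mandatory, without ever reading the record block. The class WarcHeaderCheck and its problems method are hypothetical and are not part of JHOVE or JWAT; folded header lines and pre-1.0 WARC versions are assumed away for brevity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper (not part of JHOVE or JWAT): checks the version line and
// header fields of one WARC record and never reads the record block.
final class WarcHeaderCheck {
    // Fields that WARC 1.0 and 1.1 require in every record.
    static final List<String> MANDATORY =
            List.of("WARC-Record-ID", "Content-Length", "WARC-Date", "WARC-Type");

    // headerBlock: the version line plus header lines, separated by CRLF.
    static List<String> problems(String headerBlock) {
        List<String> problems = new ArrayList<>();
        String[] lines = headerBlock.split("\r\n");
        String version = lines[0];
        if (!version.equals("WARC/1.0") && !version.equals("WARC/1.1")) {
            problems.add("Unsupported version line: " + version);
        }
        // Field names are compared case-insensitively.
        Map<String, String> fields = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        for (int i = 1; i < lines.length; i++) {
            if (lines[i].isEmpty()) {
                break; // a blank line ends the header
            }
            int colon = lines[i].indexOf(':');
            if (colon <= 0) {
                problems.add("Malformed header line: " + lines[i]);
                continue;
            }
            fields.put(lines[i].substring(0, colon).trim(), lines[i].substring(colon + 1).trim());
        }
        for (String name : MANDATORY) {
            if (!fields.containsKey(name)) {
                problems.add("Missing mandatory field: " + name);
            }
        }
        return problems;
    }
}
```

A header with a bad version line and only a WARC-Type field yields four problems: the version, plus the three missing mandatory fields.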
Field Summary
Fields inherited from class edu.harvard.hul.ois.jhove.ModuleBase
_app, _bigEndian, _checksumFinished, _ckSummer, _countStream, _coverage, _crc32, _cstream, _date, _defaultParams, _dstream, _features, _format, _init, _isRandomAccess, _je, _logger, _md5, _mimeType, _name, _nByte, _note, _param, _release, _repInfoNote, _rights, _sha1, _sha256, _signature, _specification, _validityNote, _vendor, _verbosity, _wellFormedNote
Fields inherited from interface edu.harvard.hul.ois.jhove.Module
MAXIMUM_VERBOSITY, MINIMUM_VERBOSITY
Constructor Summary
- WarcModule(): Constructor.
Method Summary
- void checkSignatures(File file, InputStream stream, RepInfo info)
- void checkSignatures(File file, RandomAccessFile raf, RepInfo info)
- int parse(InputStream stream, RepInfo info, int parseIndex)
- void parse(RandomAccessFile file, RepInfo info)
- protected void parseRecords(org.jwat.warc.WarcReader reader): Parse WARC records.
- protected void processRecord(org.jwat.warc.WarcRecord record): Process a WARC record.
- void resetParams(): Reset parameter settings.
- protected void setReaderOptions(org.jwat.warc.WarcReader reader): Set digest options for the WARC reader.
Methods inherited from class edu.harvard.hul.ois.jhove.ModuleBase
addIntegerProperty, addIntegerProperty, applyDefaultParams, calcRAChecksum, checksumIfRafNotCopied, getApp, getBase, getBufferedDataStream, getCoverage, getCRC32, getDate, getDefaultParams, getFeatures, getFormat, getMimeType, getName, getNByte, getNote, getRelease, getRepInfoNote, getRights, getSignature, getSpecification, getValidityNote, getVendor, getWellFormedNote, hasFeature, init, initFeatures, initInfo, initParse, isBigEndian, isParamInDefaults, isRandomAccess, param, readByteBuf, readDouble, readDouble, readDouble, readFloat, readFloat, readSignedByte, readSignedByte, readSignedByte, readSignedInt, readSignedInt, readSignedInt, readSignedLong, readSignedRational, readSignedRational, readSignedShort, readSignedShort, readSignedShort, readUnsignedByte, readUnsignedByte, readUnsignedByte, readUnsignedInt, readUnsignedInt, readUnsignedInt, readUnsignedRational, readUnsignedRational, readUnsignedRational, readUnsignedShort, readUnsignedShort, readUnsignedShort, setApp, setBase, setChecksums, setCRC32, setDefaultParams, setMD5, setNByte, setSHA1, setSHA256, setupDataStream, setValidityNote, setVerbosity, show, skipBytes, skipBytes, skipDstreamToEnd, vectorToPropArray
Method Detail
resetParams
public void resetParams()
Reset parameter settings. Returns to a default state without any parameters.
- Specified by:
  resetParams in interface Module
- Overrides:
  resetParams in class ModuleBase
checkSignatures
public void checkSignatures(File file, InputStream stream, RepInfo info) throws IOException
- Specified by:
  checkSignatures in interface Module
- Overrides:
  checkSignatures in class ModuleBase
- Throws:
  IOException
checkSignatures
public void checkSignatures(File file, RandomAccessFile raf, RepInfo info) throws IOException
- Specified by:
  checkSignatures in interface Module
- Overrides:
  checkSignatures in class ModuleBase
- Throws:
  IOException
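A signature check of this kind usually inspects only the first few bytes of the file. The sketch below is an assumption about what such a check can look like, not the WarcModule code: it treats a leading "WARC/" as an uncompressed WARC and the gzip magic bytes 0x1F 0x8B as a possible .warc.gz. The WarcSignature class is hypothetical, and whether WarcModule itself accepts compressed input is not stated on this page.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical signature sniffing (not the WarcModule code). The caller passes
// the first bytes of the file, read from an InputStream or RandomAccessFile.
final class WarcSignature {
    enum Kind { WARC, GZIP, UNKNOWN }

    private static final byte[] WARC_MAGIC = "WARC/".getBytes(StandardCharsets.US_ASCII);

    static Kind sniff(byte[] head) {
        // gzip member header 0x1F 0x8B: possibly a .warc.gz, contents unverified
        if (head.length >= 2 && (head[0] & 0xFF) == 0x1F && (head[1] & 0xFF) == 0x8B) {
            return Kind.GZIP;
        }
        // Uncompressed WARC: the version line starts with "WARC/"
        if (head.length >= WARC_MAGIC.length
                && Arrays.equals(Arrays.copyOf(head, WARC_MAGIC.length), WARC_MAGIC)) {
            return Kind.WARC;
        }
        return Kind.UNKNOWN;
    }
}
```

Taking a byte array keeps the check independent of whether the caller holds an InputStream or a RandomAccessFile, which mirrors the two checkSignatures overloads.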
parse
public void parse(RandomAccessFile file, RepInfo info) throws IOException
- Specified by:
  parse in interface Module
- Overrides:
  parse in class ModuleBase
- Throws:
  IOException
parse
public int parse(InputStream stream, RepInfo info, int parseIndex) throws IOException
- Specified by:
  parse in interface Module
- Overrides:
  parse in class ModuleBase
- Throws:
  IOException
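Assuming the usual JHOVE Module contract, parseIndex is 0 on the first call, and a nonzero return value asks the caller to call parse again, on a freshly opened stream, with that value; 0 means parsing is finished. The sketch below models that calling loop with a hypothetical MultiPassParser interface and ParseDriver class (neither is part of JHOVE). WarcModule may well complete in a single pass; the loop only illustrates the contract.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.function.Supplier;

// Hypothetical stand-in for the stream-based parse contract: return 0 when
// parsing is complete, or a nonzero index to request another pass.
@FunctionalInterface
interface MultiPassParser {
    int parse(InputStream stream, int parseIndex) throws IOException;
}

final class ParseDriver {
    // Calls parse with parseIndex 0, then again with each nonzero return value,
    // reopening the source each time. Returns the number of passes made.
    static int run(MultiPassParser parser, Supplier<InputStream> source) {
        int passes = 0;
        int parseIndex = 0;
        do {
            try (InputStream in = source.get()) {
                parseIndex = parser.parse(in, parseIndex);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            passes++;
        } while (parseIndex != 0);
        return passes;
    }
}
```

A parser that returns 1 on its first call and 0 afterwards is driven through exactly two passes.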
setReaderOptions
protected void setReaderOptions(org.jwat.warc.WarcReader reader) throws JhoveException
Set digest options for the WARC reader.
- Parameters:
  reader - WARC reader instance
- Throws:
  JhoveException
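Digest options govern whether the reader recomputes digests and compares them with header values such as WARC-Block-Digest. The following is a minimal, hypothetical sketch of that comparison (it does not use JWAT), assuming the common `sha1:` label with RFC 4648 Base32 encoding; other algorithms and encodings occur in practice. The class BlockDigest and its methods are illustrative names only.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical illustration of block-digest checking (not JWAT code): a header
// value such as "sha1:<base32>" is compared with a SHA-1 over the record block.
final class BlockDigest {
    private static final char[] BASE32 = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567".toCharArray();

    // RFC 4648 Base32 without padding. A 20-byte SHA-1 digest is 160 bits,
    // exactly 32 Base32 characters, so no padding is ever needed for it.
    static String base32(byte[] data) {
        StringBuilder out = new StringBuilder();
        int buffer = 0;
        int bits = 0;
        for (byte b : data) {
            buffer = (buffer << 8) | (b & 0xFF);
            bits += 8;
            while (bits >= 5) {
                out.append(BASE32[(buffer >> (bits - 5)) & 0x1F]);
                bits -= 5;
            }
        }
        if (bits > 0) {
            out.append(BASE32[(buffer << (5 - bits)) & 0x1F]); // leftover bits, zero-filled
        }
        return out.toString();
    }

    static String sha1Label(byte[] block) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-1").digest(block);
            return "sha1:" + base32(digest);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 is required on every Java platform", e);
        }
    }

    // True if the header value matches the recomputed digest of the block.
    static boolean matches(String headerValue, byte[] block) {
        return sha1Label(block).equalsIgnoreCase(headerValue.trim());
    }
}
```

For an empty block, sha1Label returns `sha1:3I42H3S6NNFQ2MSVX7XZKYAYSCX5QBYJ`, the Base32 form of the SHA-1 of zero bytes.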
parseRecords
protected void parseRecords(org.jwat.warc.WarcReader reader) throws IOException, JhoveException
Parse WARC records. Parsing should be straightforward, since all records are accessible through the same source.
- Parameters:
  reader - WARC reader used to parse records
- Throws:
  IOException - if an I/O error occurs while processing
  JhoveException - if a serious problem needs to be reported
processRecord
protected void processRecord(org.jwat.warc.WarcRecord record) throws IOException
Process a WARC record. Does not characterize the record payload.
- Parameters:
  record - WARC record from the WARC reader
- Throws:
  IOException - if an I/O error occurs while processing
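Per-record processing can still check header values without touching the payload. As a hypothetical example (not the actual processRecord logic), the sketch below checks that WARC-Date is a UTC timestamp in W3C-ISO8601 form, allowing the optional fractional seconds that WARC 1.1 permits, and that WARC-Record-ID is a URI enclosed in angle brackets. The class name WarcRecordFields and the exact regular expressions are assumptions.

```java
import java.time.DateTimeException;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical per-record header checks (not the WarcModule implementation).
// Only header values are examined; the record block is never read.
final class WarcRecordFields {
    // UTC timestamp to second precision, with an optional fraction (WARC 1.1).
    private static final Pattern DATE =
            Pattern.compile("\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}(\\.\\d{1,9})?Z");
    // A URI in angle brackets, e.g. <urn:uuid:...>: scheme, colon, non-space rest.
    private static final Pattern RECORD_ID =
            Pattern.compile("<[A-Za-z][A-Za-z0-9+.-]*:[^<>\\s]+>");

    static List<String> check(String warcDate, String recordId) {
        List<String> problems = new ArrayList<>();
        if (!DATE.matcher(warcDate).matches()) {
            problems.add("WARC-Date not in UTC W3C-ISO8601 form: " + warcDate);
        } else {
            try {
                Instant.parse(warcDate); // rejects impossible values such as month 13
            } catch (DateTimeException e) {
                problems.add("WARC-Date is not a valid instant: " + warcDate);
            }
        }
        if (!RECORD_ID.matcher(recordId).matches()) {
            problems.add("WARC-Record-ID is not an angle-bracketed URI: " + recordId);
        }
        return problems;
    }
}
```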