Class WarcRecordData
java.lang.Object
edu.harvard.hul.ois.jhove.module.warc.WarcRecordData
Copied from JHOVE2 WARC module.
This class is a wrapper for the information available in a WARC record.
Since the WARC reader is not persistent, its data must be moved to a simpler
data class which can be persisted instead.
Note: Some populate methods currently do not include any functionality.
However, they are retained for compatibility in case the ISO
standard changes and extra properties are required.
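The wrapper pattern described above can be sketched roughly as follows. This is an illustration only, not the module's code: `TransientRecord` is a hypothetical stand-in for the reader-side record (the real constructor takes an `org.jwat.warc.WarcRecord`), `RecordDataSketch` is a hypothetical name, and only three of the documented fields are shown.

```java
// Minimal sketch of copying reader-side state into a plain data holder.
// TransientRecord is a hypothetical stand-in for the reader's record object.
class TransientRecord {
    long startOffset = 0L;
    String warcType = "response";
    String warcTargetUri = "http://example.org/";
}

class RecordDataSketch {
    // Field names mirror three of those documented for WarcRecordData.
    protected Long startOffset;
    protected String warcType;
    protected String warcTargetUri;

    // No-argument constructor, as a persistence layer typically requires.
    public RecordDataSketch() {
    }

    // Copies values out of the transient record so they outlive the reader.
    public RecordDataSketch(TransientRecord record) {
        this.startOffset = record.startOffset;
        this.warcType = record.warcType;
        this.warcTargetUri = record.warcTargetUri;
    }

    public static void main(String[] args) {
        RecordDataSketch data = new RecordDataSketch(new TransientRecord());
        System.out.println(data.warcType + " " + data.warcTargetUri);
    }
}
```

Once the values are copied, the holder no longer depends on the reader or its input stream, which is what makes it suitable for persisting.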
- Author:
- nicl
-
Field Summary
Fields
Modifier and Type | Field | Description
protected Boolean | bHasPayload |
protected Boolean | bIsNonCompliant | Boolean indicating whether this record is compliant or not.
protected String | computedBlockDigest | Computed block digest.
protected String | computedBlockDigestAlgorithm | Computed block digest algorithm.
protected String | computedBlockDigestEncoding | Computed block digest encoding.
protected String | computedPayloadDigest | Computed payload digest, if applicable.
protected String | computedPayloadDigestAlgorithm | Computed payload digest algorithm, if applicable.
protected String | computedPayloadDigestEncoding | Computed payload digest encoding, if applicable.
protected Long | consumed | Number of bytes consumed validating record.
protected String | contentLength | Content-Length read from header.
protected String | contentType | Content-type read from header.
protected String | ipVersion | IP version of WARC-IP-Address (4 or 6).
protected Boolean | isValidBlockDigest | Boolean indicating whether the block digest is valid or not.
protected Boolean | isValidPayloadDigest | Boolean indicating whether the payload digest is valid or not.
protected String | payloadLength | Payload length, without payload header (version block/HTTP header).
protected String | protocolContentType | Content-type read from HTTP header, if present.
protected String | protocolServer | Server header entry read from HTTP header, if present.
protected String | protocolUserAgent | User-Agent header entry read from HTTP header, if present.
protected String | protocolVersion | Protocol version read from HTTP header, if present.
protected String | recordIdScheme | WARC-Record-Id scheme used.
protected String | resultCode | Result-code read from HTTP header, if present.
protected Long | startOffset | Start offset of record in input stream.
protected String | warcBlockDigest | Block digest read from header.
protected String | warcBlockDigestAlgorithm | Block digest algorithm read from header.
protected String | warcBlockDigestEncoding | Block digest encoding auto-detected from digest and algorithm.
 | warcConcurrentToList | List of WARC-Concurrent-To read from header.
protected String | warcDate | WARC-Date read from header.
protected String | warcFilename | WARC-Filename read from header.
protected String | warcIdentifiedPayloadType | WARC-Identified-Payload-Type read from header.
protected String | warcIpAddress | WARC-IP-Address read from header.
protected String | warcPayloadDigest | Payload digest read from header.
protected String | warcPayloadDigestAlgorithm | Payload digest algorithm read from header.
protected String | warcPayloadDigestEncoding | Payload digest encoding auto-detected from digest and algorithm.
protected String | warcProfile | WARC-Profile read from header.
protected String | warcRecordId | WARC-Record-Id read from header.
protected String | warcRefersTo | WARC-Refers-To read from header.
protected String | warcSegmentNumber | WARC-Segment-Number read from header.
protected String | warcSegmentOriginId | WARC-Segment-Origin-ID read from header.
protected String | warcSegmentTotalLength | WARC-Segment-Total-Length read from header.
protected String | warcTargetUri | WARC-Target-URI read from header.
protected String | warcTruncated | WARC-Truncated read from header.
protected String | warcType | WARC-Type read from header.
protected String | warcVersionStr | WARC version read from header.
protected String | warcWarcinfoId | WARC-Warcinfo-ID read from header.
Constructor Summary
Constructors
Constructor | Description
WarcRecordData() | Constructor required by the persistence layer.
WarcRecordData(org.jwat.warc.WarcRecord record) | Constructs an object using the data in the WarcRecord object.
Method Summary
-
Field Details
-
startOffset
Start offset of record in input stream. -
consumed
Number of bytes consumed validating record. -
warcVersionStr
WARC version read from header. -
warcType
WARC-Type read from header. -
warcFilename
WARC-Filename read from header. -
warcRecordId
WARC-Record-Id read from header. -
warcDate
WARC-Date read from header. -
contentLength
Content-Length read from header. -
contentType
Content-type read from header. -
warcTruncated
WARC-Truncated read from header. -
warcIpAddress
WARC-IP-Address read from header. -
warcConcurrentToList
List of WARC-Concurrent-To read from header. -
warcRefersTo
WARC-Refers-To read from header. -
warcTargetUri
WARC-Target-URI read from header. -
warcWarcinfoId
WARC-Warcinfo-ID read from header. -
warcIdentifiedPayloadType
WARC-Identified-Payload-Type read from header. -
warcProfile
WARC-Profile read from header. -
warcSegmentNumber
WARC-Segment-Number read from header. -
warcSegmentOriginId
WARC-Segment-Origin-ID read from header. -
warcSegmentTotalLength
WARC-Segment-Total-Length read from header. -
warcBlockDigest
Block digest read from header. -
warcBlockDigestAlgorithm
Block digest algorithm read from header. -
warcBlockDigestEncoding
Block digest encoding auto-detected from digest and algorithm. -
warcPayloadDigest
Payload digest read from header. -
warcPayloadDigestAlgorithm
Payload digest algorithm read from header. -
warcPayloadDigestEncoding
Payload digest encoding auto-detected from digest and algorithm. -
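One plausible way to auto-detect a digest encoding from the digest and its algorithm is to compare the digest string's length with the lengths each encoding would produce for that algorithm's output size. The sketch below assumes that approach; it is not taken from the module or from JWAT, the class and method names are hypothetical, and it covers only SHA-1 and SHA-256.

```java
import java.util.Locale;

// Hypothetical sketch: infer the text encoding of a digest from its length.
class DigestEncodingSketch {

    // Raw digest size in bytes for the algorithms covered here, or -1 if unknown.
    static int digestBytes(String algorithm) {
        String name = algorithm.toLowerCase(Locale.ROOT);
        if (name.equals("sha1")) {
            return 20;
        }
        if (name.equals("sha256")) {
            return 32;
        }
        return -1;
    }

    // Returns "base16", "base32", "base64", or null when no length matches.
    static String detectEncoding(String algorithm, String digest) {
        if (algorithm == null || digest == null) {
            return null;
        }
        int bytes = digestBytes(algorithm);
        if (bytes < 0) {
            return null;
        }
        int len = digest.length();
        if (len == bytes * 2) {
            return "base16"; // two hex characters per byte
        }
        if (len == ((bytes + 4) / 5) * 8) {
            return "base32"; // 8 characters per 5-byte group, padded
        }
        if (len == ((bytes + 2) / 3) * 4) {
            return "base64"; // 4 characters per 3-byte group, padded
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(detectEncoding("sha1", "3I42H3S6NNFQ2MSVX7XZKYAYSCX5QBYJ"));
    }
}
```

Length alone is not always enough: for a 16-byte digest such as MD5, the hex form and the padded base32 form are both 32 characters long, so a fuller implementation would also need to inspect the characters used.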
computedBlockDigest
Computed block digest. -
computedBlockDigestAlgorithm
Computed block digest algorithm. -
computedBlockDigestEncoding
Computed block digest encoding. -
computedPayloadDigest
Computed payload digest, if applicable. -
computedPayloadDigestAlgorithm
Computed payload digest algorithm, if applicable. -
computedPayloadDigestEncoding
Computed payload digest encoding, if applicable. -
recordIdScheme
WARC-Record-Id scheme used. -
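As an illustration of what a record ID scheme is, the sketch below pulls the URI scheme out of a WARC-Record-ID value such as `<urn:uuid:...>`. It assumes the value is a URI optionally wrapped in angle brackets; the class and method names are hypothetical, and the module may derive the scheme differently.

```java
import java.net.URI;

// Hypothetical sketch: extract the URI scheme from a WARC-Record-ID value.
class RecordIdSchemeSketch {

    // Returns the scheme (for example "urn"), or null if none can be parsed.
    static String schemeOf(String warcRecordId) {
        if (warcRecordId == null) {
            return null;
        }
        String value = warcRecordId.trim();
        // Strip the enclosing angle brackets when both are present.
        if (value.startsWith("<") && value.endsWith(">")) {
            value = value.substring(1, value.length() - 1);
        }
        try {
            return URI.create(value).getScheme();
        } catch (IllegalArgumentException e) {
            // Not a syntactically valid URI.
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(schemeOf("<urn:uuid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6>"));
    }
}
```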
bIsNonCompliant
Boolean indicating whether this record is compliant or not. -
isValidBlockDigest
Boolean indicating whether the block digest is valid or not. -
isValidPayloadDigest
Boolean indicating whether the payload digest is valid or not. -
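The relationship between the computed digests and the validity flags can be sketched as below. This is an assumption-laden illustration rather than the module's logic: it handles only hex-encoded SHA-1, the class and method names are hypothetical, and returning `null` when either digest is missing is a guess at how a `Boolean` flag might express "not checked".

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch: compute a digest and compare it with the header value.
class DigestCheckSketch {

    // Returns the lowercase hex SHA-1 digest of the given bytes.
    static String sha1Hex(byte[] block) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            StringBuilder sb = new StringBuilder();
            for (byte b : md.digest(block)) {
                sb.append(String.format("%02x", b & 0xff));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is a required algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    // Returns null when there is nothing to compare, otherwise the match result.
    static Boolean isValid(String headerDigestHex, String computedDigestHex) {
        if (headerDigestHex == null || computedDigestHex == null) {
            return null;
        }
        // Hex digits are case-insensitive, so compare without regard to case.
        return headerDigestHex.equalsIgnoreCase(computedDigestHex);
    }

    public static void main(String[] args) {
        String computed = sha1Hex(new byte[0]);
        System.out.println(isValid("DA39A3EE5E6B4B0D3255BFEF95601890AFD80709", computed));
    }
}
```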
bHasPayload
-
payloadLength
Payload length, without payload header (version block/HTTP header). -
ipVersion
IP version of WARC-IP-Address (4 or 6). -
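A simple way to derive the IP version from an address literal is shown below. It is a sketch under stated assumptions, not the module's implementation: the class and method names are hypothetical, the result is a `String` only because the field above is documented as one, and the check is purely syntactic (it does not validate octet ranges or IPv6 grouping).

```java
// Hypothetical sketch: classify an IP address literal as version 4 or 6.
class IpVersionSketch {

    // Returns "4", "6", or null when the value looks like neither.
    static String ipVersion(String address) {
        if (address == null || address.isEmpty()) {
            return null;
        }
        // Only IPv6 literals contain colons.
        if (address.indexOf(':') >= 0) {
            return "6";
        }
        // Dotted quad: four groups of one to three digits.
        if (address.matches("\\d{1,3}(\\.\\d{1,3}){3}")) {
            return "4";
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(ipVersion("192.0.2.1"));
        System.out.println(ipVersion("2001:db8::1"));
    }
}
```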
resultCode
Result-code read from HTTP header, if present. -
protocolVersion
Protocol version read from HTTP header, if present. -
protocolContentType
Content-type read from HTTP header, if present. -
protocolServer
Server header entry read from HTTP header, if present. -
protocolUserAgent
User-Agent header entry read from HTTP header, if present.
-
-
Constructor Details
-
WarcRecordData
public WarcRecordData()
Constructor required by the persistence layer. -
WarcRecordData
public WarcRecordData(org.jwat.warc.WarcRecord record)
Constructs an object using the data in the WarcRecord object.
- Parameters:
record - parsed WARC record
-