Class WarcRecordData
- java.lang.Object
-
- edu.harvard.hul.ois.jhove.module.warc.WarcRecordData
-
public class WarcRecordData extends Object
Copied from the JHOVE2 WARC module. This class is a wrapper for the information available in a WARC record. Since the WARC reader is not persistent, its data must be moved to a simpler data class which can be persisted instead. Note: Some populate methods currently do not include any functionality. However, they are included for backwards compatibility in case the ISO standard changes and extra properties are required.
- Author:
- nicl
-
-
Field Summary
Fields
Modifier and Type, Field: Description
- protected Boolean bHasPayload
- protected Boolean bIsNonCompliant: Boolean indicating whether this record is compliant or not.
- protected String computedBlockDigest: Computed block digest.
- protected String computedBlockDigestAlgorithm: Computed block digest algorithm.
- protected String computedBlockDigestEncoding: Computed block digest encoding.
- protected String computedPayloadDigest: Computed payload digest, if applicable.
- protected String computedPayloadDigestAlgorithm: Computed payload digest algorithm, if applicable.
- protected String computedPayloadDigestEncoding: Computed payload digest encoding, if applicable.
- protected Long consumed: Number of bytes consumed validating record.
- protected String contentLength: Content-Length read from header.
- protected String contentType: Content-type read from header.
- protected String ipVersion: IP version of WARC-IP-Address (4 or 6).
- protected Boolean isValidBlockDigest: Boolean indicating whether the block digest is valid or not.
- protected Boolean isValidPayloadDigest: Boolean indicating whether the payload digest is valid or not.
- protected String payloadLength: Payload length, without payload header (version block/HTTP header).
- protected String protocolContentType: Content-type read from HTTP header, if present.
- protected String protocolServer: Server header entry read from HTTP header, if present.
- protected String protocolUserAgent: User-Agent header entry read from HTTP header, if present.
- protected String protocolVersion: Protocol version read from HTTP header, if present.
- protected String recordIdScheme: WARC-Record-Id scheme used.
- protected String resultCode: Result-code read from HTTP header, if present.
- protected Long startOffset: Start offset of record in input stream.
- protected String warcBlockDigest: Block digest read from header.
- protected String warcBlockDigestAlgorithm: Block digest algorithm read from header.
- protected String warcBlockDigestEncoding: Block digest encoding auto-detected from digest and algorithm.
- protected List<String> warcConcurrentToList: List of WARC-Concurrent-To read from header.
- protected String warcDate: WARC-Date read from header.
- protected String warcFilename: WARC-Filename read from header.
- protected String warcIdentifiedPayloadType: WARC-Identified-Payload-Type read from header.
- protected String warcIpAddress: WARC-IP-Address read from header.
- protected String warcPayloadDigest: Payload digest read from header.
- protected String warcPayloadDigestAlgorithm: Payload digest algorithm read from header.
- protected String warcPayloadDigestEncoding: Payload digest encoding auto-detected from digest and algorithm.
- protected String warcProfile: WARC-Profile read from header.
- protected String warcRecordId: WARC-Record-Id read from header.
- protected String warcRefersTo: WARC-Refers-To read from header.
- protected String warcSegmentNumber: WARC-Segment-Number read from header.
- protected String warcSegmentOriginId: WARC-Segment-Origin-ID read from header.
- protected String warcSegmentTotalLength: WARC-Segment-Total-Length read from header.
- protected String warcTargetUri: WARC-Target-URI read from header.
- protected String warcTruncated: WARC-Truncated read from header.
- protected String warcType: WARC-Type read from header.
- protected String warcVersionStr: WARC version read from header.
- protected String warcWarcinfoId: WARC-Warcinfo-ID read from header.
-
Constructor Summary
Constructors
Constructor: Description
- WarcRecordData(): Constructor required by the persistence layer.
- WarcRecordData(org.jwat.warc.WarcRecord record): Constructs an object using the data in the WarcRecord object.
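The two constructors reflect the pattern described in the class comment: values are copied out of a reader-bound record into a plain object that can be persisted. The following is a minimal sketch of that pattern only. `TransientRecord` and `RecordSnapshot` are hypothetical stand-ins invented for illustration; they are not the JWAT `WarcRecord` or the real `WarcRecordData`, and the actual constructor may populate its fields differently.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a record tied to a non-persistent reader. */
class TransientRecord {
    String type;
    String recordId;
    List<String> concurrentTo = new ArrayList<>();
    long startOffset;
}

/** Simplified data holder in the spirit of WarcRecordData (illustration only). */
class RecordSnapshot {
    protected Long startOffset;
    protected String warcType;
    protected String warcRecordId;
    protected List<String> warcConcurrentToList;

    /** No-argument constructor, as a persistence layer typically requires. */
    RecordSnapshot() {
    }

    /** Copies values out of the transient record so they outlive the reader. */
    RecordSnapshot(TransientRecord record) {
        startOffset = record.startOffset;
        warcType = record.type;
        warcRecordId = record.recordId;
        // Defensive copy: later changes to the source list do not leak in.
        warcConcurrentToList = new ArrayList<>(record.concurrentTo);
    }
}
```

Copying the list rather than sharing it is a design choice of this sketch; it keeps the snapshot stable after the reader moves on to the next record.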
-
Field Detail
-
startOffset
protected Long startOffset
Start offset of record in input stream.
-
consumed
protected Long consumed
Number of bytes consumed validating record.
-
warcVersionStr
protected String warcVersionStr
WARC version read from header.
-
warcType
protected String warcType
WARC-Type read from header.
-
warcFilename
protected String warcFilename
WARC-Filename read from header.
-
warcRecordId
protected String warcRecordId
WARC-Record-Id read from header.
-
warcDate
protected String warcDate
WARC-Date read from header.
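Because `warcDate` is kept as a raw `String`, a consumer that needs a timestamp has to parse it. The sketch below assumes the value is a UTC timestamp of the form `YYYY-MM-DDThh:mm:ssZ`, as WARC-Date values normally are; `WarcDateSketch` and `parseWarcDate` are hypothetical names and not part of this class.

```java
import java.time.Instant;
import java.time.format.DateTimeParseException;

class WarcDateSketch {
    /**
     * Parses a WARC-Date string into an Instant.
     * Returns null when the value is missing or not a valid ISO-8601 instant.
     */
    static Instant parseWarcDate(String warcDate) {
        if (warcDate == null) {
            return null;
        }
        try {
            return Instant.parse(warcDate.trim());
        } catch (DateTimeParseException e) {
            return null;
        }
    }
}
```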
-
contentLength
protected String contentLength
Content-Length read from header.
-
contentType
protected String contentType
Content-type read from header.
-
warcTruncated
protected String warcTruncated
WARC-Truncated read from header.
-
warcIpAddress
protected String warcIpAddress
WARC-IP-Address read from header.
-
warcConcurrentToList
protected List<String> warcConcurrentToList
List of WARC-Concurrent-To read from header.
-
warcRefersTo
protected String warcRefersTo
WARC-Refers-To read from header.
-
warcTargetUri
protected String warcTargetUri
WARC-Target-URI read from header.
-
warcWarcinfoId
protected String warcWarcinfoId
WARC-Warcinfo-ID read from header.
-
warcIdentifiedPayloadType
protected String warcIdentifiedPayloadType
WARC-Identified-Payload-Type read from header.
-
warcProfile
protected String warcProfile
WARC-Profile read from header.
-
warcSegmentNumber
protected String warcSegmentNumber
WARC-Segment-Number read from header.
-
warcSegmentOriginId
protected String warcSegmentOriginId
WARC-Segment-Origin-ID read from header.
-
warcSegmentTotalLength
protected String warcSegmentTotalLength
WARC-Segment-Total-Length read from header.
-
warcBlockDigest
protected String warcBlockDigest
Block digest read from header.
-
warcBlockDigestAlgorithm
protected String warcBlockDigestAlgorithm
Block digest algorithm read from header.
-
warcBlockDigestEncoding
protected String warcBlockDigestEncoding
Block digest encoding auto-detected from digest and algorithm.
-
warcPayloadDigest
protected String warcPayloadDigest
Payload digest read from header.
-
warcPayloadDigestAlgorithm
protected String warcPayloadDigestAlgorithm
Payload digest algorithm read from header.
-
warcPayloadDigestEncoding
protected String warcPayloadDigestEncoding
Payload digest encoding auto-detected from digest and algorithm.
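The page does not say how the encoding is auto-detected from the digest and algorithm. One plausible approach, sketched here as an assumption rather than as the module's actual logic, compares the digest string's length and alphabet with what the algorithm's byte length would produce in base16, padded base32, and padded base64. `DigestEncodingGuess` and `guessEncoding` are hypothetical names, and the algorithm is assumed to be given as a JCA name such as `SHA-1` or `MD5`.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class DigestEncodingGuess {
    /**
     * Guesses "base16", "base32" or "base64" from the digest text and the
     * algorithm's digest length. Returns null when nothing fits.
     */
    static String guessEncoding(String algorithm, String digest) {
        if (algorithm == null || digest == null) {
            return null;
        }
        int n;
        try {
            // Digest length in bytes, e.g. 20 for SHA-1 and 16 for MD5.
            n = MessageDigest.getInstance(algorithm).getDigestLength();
        } catch (NoSuchAlgorithmException e) {
            return null;
        }
        if (n <= 0) {
            return null;
        }
        int len = digest.length();
        // Base16: two characters per byte.
        if (len == 2 * n && digest.matches("[0-9A-Fa-f]+")) {
            return "base16";
        }
        // Padded base32: eight characters per started group of five bytes.
        if (len == 8 * ((n + 4) / 5) && digest.matches("[A-Z2-7]+=*")) {
            return "base32";
        }
        // Padded base64: four characters per started group of three bytes.
        if (len == 4 * ((n + 2) / 3) && digest.matches("[A-Za-z0-9+/]+=*")) {
            return "base64";
        }
        return null;
    }
}
```

Checking length before alphabet matters because the alphabets overlap: an uppercase hex string can consist entirely of valid base64 characters, so alphabet alone cannot decide.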
-
computedBlockDigest
protected String computedBlockDigest
Computed block digest.
-
computedBlockDigestAlgorithm
protected String computedBlockDigestAlgorithm
Computed block digest algorithm.
-
computedBlockDigestEncoding
protected String computedBlockDigestEncoding
Computed block digest encoding.
-
computedPayloadDigest
protected String computedPayloadDigest
Computed payload digest, if applicable.
-
computedPayloadDigestAlgorithm
protected String computedPayloadDigestAlgorithm
Computed payload digest algorithm, if applicable.
-
computedPayloadDigestEncoding
protected String computedPayloadDigestEncoding
Computed payload digest encoding, if applicable.
-
recordIdScheme
protected String recordIdScheme
WARC-Record-Id scheme used.
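As an illustration of what a record ID scheme is, the sketch below extracts the URI scheme from a WARC-Record-ID such as `<urn:uuid:...>`. It assumes the header value is a URI optionally wrapped in angle brackets; `RecordIdScheme` and `schemeOf` are hypothetical names, and the module may derive `recordIdScheme` differently.

```java
import java.net.URI;
import java.net.URISyntaxException;

class RecordIdScheme {
    /** Returns the URI scheme of a record ID, or null if none can be found. */
    static String schemeOf(String warcRecordId) {
        if (warcRecordId == null) {
            return null;
        }
        String s = warcRecordId.trim();
        // Strip the surrounding angle brackets if both are present.
        if (s.startsWith("<") && s.endsWith(">")) {
            s = s.substring(1, s.length() - 1);
        }
        try {
            return new URI(s).getScheme();
        } catch (URISyntaxException e) {
            return null;
        }
    }
}
```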
-
bIsNonCompliant
protected Boolean bIsNonCompliant
Boolean indicating whether this record is compliant or not.
-
isValidBlockDigest
protected Boolean isValidBlockDigest
Boolean indicating whether the block digest is valid or not.
-
isValidPayloadDigest
protected Boolean isValidPayloadDigest
Boolean indicating whether the payload digest is valid or not.
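The two validity flags presumably result from comparing the digest read from the header with the digest computed over the bytes actually read. The sketch below shows one way such a check could look for hex-encoded digests. `DigestCheck`, `hexDigest`, and `isValid` are hypothetical names, and returning `null` when no header digest exists is an assumption suggested only by the fields being boxed `Boolean` values.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class DigestCheck {
    /** Computes a lowercase hex digest of the data with the given JCA algorithm. */
    static String hexDigest(String algorithm, byte[] data) {
        try {
            byte[] raw = MessageDigest.getInstance(algorithm).digest(data);
            StringBuilder sb = new StringBuilder(raw.length * 2);
            for (byte b : raw) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalArgumentException("Unknown algorithm: " + algorithm, e);
        }
    }

    /**
     * Returns TRUE or FALSE when a header digest is present,
     * and null when there is nothing to compare against (assumption).
     */
    static Boolean isValid(String headerHexDigest, String algorithm, byte[] data) {
        if (headerHexDigest == null) {
            return null;
        }
        // Hex digits are case-insensitive, so compare without regard to case.
        return headerHexDigest.equalsIgnoreCase(hexDigest(algorithm, data));
    }
}
```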
-
bHasPayload
protected Boolean bHasPayload
Boolean indicating whether this record has a payload.
-
payloadLength
protected String payloadLength
Payload length, without payload header (version block/HTTP header).
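To make "without payload header" concrete: for an HTTP response record, the payload is what follows the blank line that ends the HTTP header. The sketch below computes that length from a record block held in memory. It is an illustration under that assumption, not the module's implementation; `PayloadLengthSketch` and `payloadLength` are hypothetical names.

```java
class PayloadLengthSketch {
    /**
     * Returns the number of bytes after the first CRLF CRLF sequence,
     * or -1 if the block contains no header terminator.
     */
    static long payloadLength(byte[] block) {
        for (int i = 0; i + 3 < block.length; i++) {
            if (block[i] == '\r' && block[i + 1] == '\n'
                    && block[i + 2] == '\r' && block[i + 3] == '\n') {
                // Everything after the four terminator bytes is payload.
                return block.length - (i + 4);
            }
        }
        return -1;
    }
}
```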
-
ipVersion
protected String ipVersion
IP version of WARC-IP-Address (4 or 6).
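A value of "4" or "6" can be derived from the address literal itself. The sketch below is one possible way to do this with the standard `java.net` classes and is not taken from the module; `IpVersionGuess` and `ipVersion` are hypothetical names. The shape checks are there so that obvious hostnames are rejected before `InetAddress.getByName` is called, since that method would otherwise attempt a name lookup.

```java
import java.net.Inet4Address;
import java.net.Inet6Address;
import java.net.InetAddress;
import java.net.UnknownHostException;

class IpVersionGuess {
    /** Returns "4" or "6" for an IP address literal, or null otherwise. */
    static String ipVersion(String address) {
        if (address == null) {
            return null;
        }
        // Only strings shaped like address literals are passed on.
        boolean v4Shape = address.matches("\\d{1,3}(\\.\\d{1,3}){3}");
        boolean v6Shape = address.contains(":") && address.matches("[0-9A-Fa-f:.]+");
        if (!v4Shape && !v6Shape) {
            return null;
        }
        try {
            InetAddress parsed = InetAddress.getByName(address);
            if (parsed instanceof Inet4Address) {
                return "4";
            }
            if (parsed instanceof Inet6Address) {
                return "6";
            }
        } catch (UnknownHostException e) {
            // Shaped like a literal but not a valid address.
        }
        return null;
    }
}
```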
-
resultCode
protected String resultCode
Result-code read from HTTP header, if present.
-
protocolVersion
protected String protocolVersion
Protocol version read from HTTP header, if present.
-
protocolContentType
protected String protocolContentType
Content-type read from HTTP header, if present.
-
protocolServer
protected String protocolServer
Server header entry read from HTTP header, if present.
-
protocolUserAgent
protected String protocolUserAgent
User-Agent header entry read from HTTP header, if present.
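The five HTTP-related fields above map naturally onto the status line and header lines of an HTTP response. The sketch below shows how such values could be pulled from a header block; it is a simplified illustration, not the JWAT parser the module relies on. `HttpHeaderSketch` and `parse` are hypothetical names, and folded or repeated headers are not handled.

```java
import java.util.Map;
import java.util.TreeMap;

class HttpHeaderSketch {
    String protocolVersion;
    String resultCode;
    // Header names are case-insensitive, so use a case-insensitive map.
    final Map<String, String> headers = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);

    /** Parses a response header block such as "HTTP/1.1 200 OK\r\nServer: x\r\n\r\n". */
    static HttpHeaderSketch parse(String headerBlock) {
        HttpHeaderSketch h = new HttpHeaderSketch();
        String[] lines = headerBlock.split("\r\n");
        // Status line: protocol version, result code, optional reason phrase.
        String[] status = lines[0].split(" ", 3);
        if (status.length >= 2 && status[0].startsWith("HTTP/")) {
            h.protocolVersion = status[0];
            h.resultCode = status[1];
        }
        for (int i = 1; i < lines.length; i++) {
            int colon = lines[i].indexOf(':');
            if (colon > 0) {
                String name = lines[i].substring(0, colon).trim();
                String value = lines[i].substring(colon + 1).trim();
                h.headers.put(name, value);
            }
        }
        return h;
    }
}
```

With this sketch, `headers.get("Content-Type")`, `headers.get("Server")`, and `headers.get("User-Agent")` would correspond to `protocolContentType`, `protocolServer`, and `protocolUserAgent`, each `null` when the header is absent.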
-