Class WarcRecordData


  • public class WarcRecordData
    extends Object
    Copied from the JHOVE2 WARC module. This class is a wrapper for the information available in a WARC record. Since the WARC reader is not persistent, its data must be copied to a simpler data class that can be persisted instead. Note: Some populate methods currently have no functionality. However, they are retained for backwards compatibility in case the ISO standard changes and extra properties are required.
    Author:
    nicl
    • Field Detail

      • startOffset

        protected Long startOffset
        Start offset of record in input stream.
      • consumed

        protected Long consumed
        Number of bytes consumed validating record.
      • warcVersionStr

        protected String warcVersionStr
        WARC version read from header.
      • warcType

        protected String warcType
        WARC-Type read from header.
      • warcFilename

        protected String warcFilename
        WARC-Filename read from header.
      • warcRecordId

        protected String warcRecordId
        WARC-Record-ID read from header.
      • warcDate

        protected String warcDate
        WARC-Date read from header.
      • contentLength

        protected String contentLength
        Content-Length read from header.
      • contentType

        protected String contentType
        Content-Type read from header.
      • warcTruncated

        protected String warcTruncated
        WARC-Truncated read from header.
      • warcIpAddress

        protected String warcIpAddress
        WARC-IP-Address read from header.
      • warcConcurrentToList

        protected List<String> warcConcurrentToList
        List of WARC-Concurrent-To read from header.
      • warcRefersTo

        protected String warcRefersTo
        WARC-Refers-To read from header.
      • warcTargetUri

        protected String warcTargetUri
        WARC-Target-URI read from header.
      • warcWarcinfoId

        protected String warcWarcinfoId
        WARC-Warcinfo-ID read from header.
      • warcIdentifiedPayloadType

        protected String warcIdentifiedPayloadType
        WARC-Identified-Payload-Type read from header.
      • warcProfile

        protected String warcProfile
        WARC-Profile read from header.
      • warcSegmentNumber

        protected String warcSegmentNumber
        WARC-Segment-Number read from header.
      • warcSegmentOriginId

        protected String warcSegmentOriginId
        WARC-Segment-Origin-ID read from header.
      • warcSegmentTotalLength

        protected String warcSegmentTotalLength
        WARC-Segment-Total-Length read from header.
      • warcBlockDigest

        protected String warcBlockDigest
        Block digest read from header.
      • warcBlockDigestAlgorithm

        protected String warcBlockDigestAlgorithm
        Block digest algorithm read from header.
      • warcBlockDigestEncoding

        protected String warcBlockDigestEncoding
        Block digest encoding auto-detected from digest and algorithm.
      • warcPayloadDigest

        protected String warcPayloadDigest
        Payload digest read from header.
      • warcPayloadDigestAlgorithm

        protected String warcPayloadDigestAlgorithm
        Payload digest algorithm read from header.
      • warcPayloadDigestEncoding

        protected String warcPayloadDigestEncoding
        Payload digest encoding auto-detected from digest and algorithm.
      • computedBlockDigest

        protected String computedBlockDigest
        Computed block digest.
      • computedBlockDigestAlgorithm

        protected String computedBlockDigestAlgorithm
        Computed block digest algorithm.
      • computedBlockDigestEncoding

        protected String computedBlockDigestEncoding
        Computed block digest encoding.
      • computedPayloadDigest

        protected String computedPayloadDigest
        Computed payload digest, if applicable.
      • computedPayloadDigestAlgorithm

        protected String computedPayloadDigestAlgorithm
        Computed payload digest algorithm, if applicable.
      • computedPayloadDigestEncoding

        protected String computedPayloadDigestEncoding
        Computed payload digest encoding, if applicable.
      • recordIdScheme

        protected String recordIdScheme
        WARC-Record-ID scheme used.
      • bIsNonCompliant

        protected Boolean bIsNonCompliant
        Boolean indicating whether this record is non-compliant.
      • isValidBlockDigest

        protected Boolean isValidBlockDigest
        Boolean indicating whether the block digest is valid or not.
      • isValidPayloadDigest

        protected Boolean isValidPayloadDigest
        Boolean indicating whether the payload digest is valid or not.
      • bHasPayload

        protected Boolean bHasPayload
        Boolean indicating whether this record has a payload.
      • payloadLength

        protected String payloadLength
        Payload length, without payload header (version block/HTTP header).
      • ipVersion

        protected String ipVersion
        IP version of WARC-IP-Address (4 or 6).
      • resultCode

        protected String resultCode
        Result-code read from HTTP header, if present.
      • protocolVersion

        protected String protocolVersion
        Protocol version read from HTTP header, if present.
      • protocolContentType

        protected String protocolContentType
        Content-Type read from HTTP header, if present.
      • protocolServer

        protected String protocolServer
        Server header entry read from HTTP header, if present.
      • protocolUserAgent

        protected String protocolUserAgent
        User-Agent header entry read from HTTP header, if present.
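
The warcBlockDigestEncoding and warcPayloadDigestEncoding fields are described as auto-detected from the digest and algorithm. The following is only a sketch of how such detection could work, assuming it is based on the length of the digest string; it is not taken from this class or from JWAT. The class name DigestEncodingSketch, the method detectEncoding, the algorithm table, and the returned labels are all hypothetical.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical helper for illustration; not part of WarcRecordData or JWAT.
final class DigestEncodingSketch {

    // Raw digest sizes in bits for a few common algorithms (assumed table).
    private static final Map<String, Integer> DIGEST_BITS =
            Map.of("md5", 128, "sha1", 160, "sha256", 256);

    /**
     * Guesses the text encoding of a digest from its length.
     * Returns "base16", "base32", "base64", or null if nothing matches.
     */
    static String detectEncoding(String digest, String algorithm) {
        if (digest == null || algorithm == null) {
            return null;
        }
        // Normalize names such as "SHA-1" to "sha1".
        String key = algorithm.toLowerCase(Locale.ROOT).replace("-", "");
        Integer bits = DIGEST_BITS.get(key);
        if (bits == null) {
            return null;
        }
        // Ignore trailing '=' padding so padded and unpadded forms compare equal.
        int len = digest.replaceAll("=+$", "").length();
        if (len == bits / 4) {
            return "base16";          // 4 bits per character
        }
        if (len == (bits + 4) / 5) {
            return "base32";          // 5 bits per character, rounded up
        }
        if (len == (bits + 5) / 6) {
            return "base64";          // 6 bits per character, rounded up
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(detectEncoding("3I42H3S6NNFQ2MSVX7XZKYAYSCX5QBYJ", "sha1"));
    }
}
```

A length check of this kind cannot confirm that the characters belong to the guessed alphabet, so a real implementation would presumably also try to decode the value before trusting the result.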
    • Constructor Detail

      • WarcRecordData

        public WarcRecordData()
        Constructor required by the persistence layer.
      • WarcRecordData

        public WarcRecordData(org.jwat.warc.WarcRecord record)
        Constructs an object using the data in the WarcRecord object.
        Parameters:
        record - parsed WARC record
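
The two constructors reflect a common pattern: a no-argument constructor for the persistence layer, and a second constructor that copies plain values out of a parsed, non-persistent record. The sketch below illustrates that pattern in a simplified form. It is not the real class: RecordSnapshotSketch, guessIpVersion, and the header map standing in for a parsed WarcRecord are hypothetical, and Serializable stands in for whatever persistence mechanism is actually used.

```java
import java.io.Serializable;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, simplified stand-in; not the real WarcRecordData.
final class RecordSnapshotSketch implements Serializable {
    private static final long serialVersionUID = 1L;

    protected String warcType;
    protected String warcRecordId;
    protected String warcIpAddress;
    protected String ipVersion;

    /** No-argument constructor, as persistence layers commonly require. */
    RecordSnapshotSketch() {
    }

    /** Copies plain values from a header map standing in for a parsed record. */
    RecordSnapshotSketch(Map<String, String> headers) {
        warcType = headers.get("WARC-Type");
        warcRecordId = headers.get("WARC-Record-ID");
        warcIpAddress = headers.get("WARC-IP-Address");
        ipVersion = guessIpVersion(warcIpAddress);
    }

    /** Crude heuristic (assumption): a colon marks an IPv6 literal, otherwise IPv4. */
    static String guessIpVersion(String address) {
        if (address == null || address.isEmpty()) {
            return null;
        }
        return address.indexOf(':') >= 0 ? "6" : "4";
    }

    public static void main(String[] args) {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("WARC-Type", "response");
        headers.put("WARC-IP-Address", "192.0.2.1");
        RecordSnapshotSketch snapshot = new RecordSnapshotSketch(headers);
        System.out.println(snapshot.warcType + " over IPv" + snapshot.ipVersion);
    }
}
```

Once the values are copied, the snapshot no longer depends on the reader or its input stream, which is what allows it to be stored after the reader is closed.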