Package org.dspace.checker
Provides content fixity checking (using checksums) for bitstreams stored in DSpace software.
The main access point to org.dspace.checker is on the command line
via ChecksumChecker.main(String[]),
but it is also simple to get programmatic access to ChecksumChecker
if you wish, via a CheckerCommand object.
CheckerCommand is a simple Command object. You initialize it with
a strategy for iterating through bitstreams to check (an implementation of
BitstreamDispatcher) and an object to collect
the results (an implementation of ChecksumResultsCollector),
and then call CheckerCommand.process()
to begin the processing. CheckerCommand handles the calculation of bitstream
checksums and iteration between bitstreams.
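The division of labour described above can be sketched in plain Java. This is a minimal illustration, not the DSpace API: IdDispatcher, ResultCollector and FixityCommand are hypothetical stand-ins for BitstreamDispatcher, ChecksumResultsCollector and CheckerCommand, the in-memory maps stand in for the bitstream store and the database, and MD5 is assumed as the checksum algorithm.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;

// Hypothetical stand-in for a dispatcher: hands out ids until it returns SENTINEL.
interface IdDispatcher {
    int SENTINEL = -1;
    int next();
}

// Hypothetical stand-in for a results collector.
interface ResultCollector {
    void collect(int id, boolean checksumMatches);
}

// Hypothetical stand-in for the command object: it owns checksum
// calculation and the iteration between bitstreams.
class FixityCommand {
    private final IdDispatcher dispatcher;
    private final ResultCollector collector;
    private final Map<Integer, byte[]> content;   // id -> stored bytes (assumed in-memory store)
    private final Map<Integer, String> expected;  // id -> previously recorded checksum

    FixityCommand(IdDispatcher dispatcher, ResultCollector collector,
                  Map<Integer, byte[]> content, Map<Integer, String> expected) {
        this.dispatcher = dispatcher;
        this.collector = collector;
        this.content = content;
        this.expected = expected;
    }

    // Computes a lowercase hex MD5 digest of the given bytes.
    static String md5Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(data)) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Asks the dispatcher for ids until it is exhausted, recomputes each
    // checksum and reports the comparison to the collector.
    void process() {
        for (int id = dispatcher.next(); id != IdDispatcher.SENTINEL; id = dispatcher.next()) {
            String actual = md5Hex(content.get(id));
            collector.collect(id, actual.equals(expected.get(id)));
        }
    }
}
```

Because the command only talks to the two interfaces, the checking order and the reporting destination can each be swapped without touching the checksum logic.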
BitstreamDispatcher
The order in which bitstreams are checked, and the point at which a checking run terminates, are controlled by implementations of BitstreamDispatcher. You can extend the package by writing your own implementation of this simple interface, although the implementations included in the package will probably suffice in most cases.
Dispatchers that generate bitstream ordering: HandleDispatcher, IteratorDispatcher, SimpleDispatcher.
Dispatchers that modify the behaviour of other Dispatchers: LimitedCountDispatcher, LimitedDurationDispatcher.
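The two kinds of dispatcher compose as a decorator around a source. The sketch below shows the idea with hypothetical types (IdSource, ListSource, CountLimitedSource); it does not reproduce the signatures of the real BitstreamDispatcher implementations, and the sentinel value of -1 is an assumption made for the example.

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical dispatcher contract: returns SENTINEL when there is nothing left to check.
interface IdSource {
    int SENTINEL = -1;
    int next();
}

// Generates an ordering: walks a pre-defined list of ids.
class ListSource implements IdSource {
    private final Iterator<Integer> ids;

    ListSource(List<Integer> ids) {
        this.ids = ids.iterator();
    }

    @Override
    public int next() {
        return ids.hasNext() ? ids.next() : SENTINEL;
    }
}

// Modifies another dispatcher: stops after a fixed number of ids,
// regardless of how many the delegate could still supply.
class CountLimitedSource implements IdSource {
    private final IdSource delegate;
    private int remaining;

    CountLimitedSource(IdSource delegate, int limit) {
        this.delegate = delegate;
        this.remaining = limit;
    }

    @Override
    public int next() {
        if (remaining <= 0) {
            return SENTINEL;
        }
        remaining--;
        return delegate.next();
    }
}
```

For example, `new CountLimitedSource(new ListSource(ids), 2)` would end a run after two bitstreams even if the list holds more.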
ChecksumResultsCollector
The default implementation of ChecksumResultsCollector
(ResultsLogger) logs checksum checking to the db,
but it would be simple to write your own implementation to log to LOG4J logs,
text files, JMS queues etc.
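As an illustration of an alternative collector, the following sketch writes one tab-separated line per result to any java.io.Writer (a FileWriter would give a text file). CheckResultSink is a hypothetical simplification of ChecksumResultsCollector, which in the package receives richer result objects than an id and a code, and the result code strings used here are placeholders.

```java
import java.io.PrintWriter;
import java.io.Writer;

// Hypothetical, simplified collector contract.
interface CheckResultSink {
    void collect(int bitstreamId, String resultCode);
}

// Writes each result as "<id>\t<code>\n" to the supplied Writer.
class TextLineSink implements CheckResultSink {
    private final PrintWriter out;

    TextLineSink(Writer target) {
        this.out = new PrintWriter(target);
    }

    @Override
    public void collect(int bitstreamId, String resultCode) {
        // A fixed "\n" keeps the output identical across platforms.
        out.print(bitstreamId + "\t" + resultCode + "\n");
        out.flush();
    }
}
```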
Results Pruner
The results pruner is responsible for trimming the archived checksum logs, which can otherwise grow large. The retention period of stored check results can be configured per checksum result code. This allows you, for example, to retain records of all failures for auditing purposes while discarding records of successful checks. The pruner uses a default configuration from dspace.cfg, but can take alternative configurations from other properties files.
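Per-code retention can be pictured as a lookup from result code to a maximum age, with a fallback for codes that have no entry of their own. The sketch below is an assumption-laden illustration: CheckRecord and RetentionPruner are hypothetical, the code strings are placeholders, and it filters an in-memory list rather than deleting database rows as ResultsPruner would.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical history record: which result was logged, and when.
class CheckRecord {
    final String resultCode;
    final Instant checkedAt;

    CheckRecord(String resultCode, Instant checkedAt) {
        this.resultCode = resultCode;
        this.checkedAt = checkedAt;
    }
}

// Keeps a record only while it is younger than the retention period
// configured for its result code (or the default period if none is set).
class RetentionPruner {
    private final Map<String, Duration> retentionByCode;
    private final Duration defaultRetention;

    RetentionPruner(Map<String, Duration> retentionByCode, Duration defaultRetention) {
        this.retentionByCode = retentionByCode;
        this.defaultRetention = defaultRetention;
    }

    List<CheckRecord> prune(List<CheckRecord> history, Instant now) {
        List<CheckRecord> kept = new ArrayList<>();
        for (CheckRecord record : history) {
            Duration keepFor = retentionByCode.getOrDefault(record.resultCode, defaultRetention);
            Instant expiry = record.checkedAt.plus(keepFor);
            if (!expiry.isBefore(now)) {
                kept.add(record);
            }
        }
        return kept;
    }
}
```

With a short period for successful checks and a long one for failures, old successes drop out of the history while failures of the same age remain available for audit.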
Design notes
All interaction between the checker package and the database is abstracted behind DataAccessObjects. Where practicable, dependencies on DSpace code are minimized, the rationale being that errors in DSpace code may themselves have caused fixity problems.
Interface Summary
BitstreamDispatcher: BitstreamDispatchers are strategy objects that hand bitstream ids out to workers.
ChecksumResultsCollector: Component that receives BitstreamInfo results from a checker.
Class Summary
CheckerCommand: Main class for the checksum checker tool, which calculates checksums for each bitstream whose ID is in the most_recent_checksum table, and compares it against the last calculated checksum for that bitstream.
CheckerConsumer: Class for removing Checker data for Bitstreams based on deletion events.
ChecksumHistory: Represents a history record for the bitstream.
ChecksumHistory_
ChecksumHistoryServiceImpl: Service implementation for the ChecksumHistory object.
ChecksumResult: Database entity representation of the checksum_results table.
ChecksumResult_
ChecksumResultServiceImpl: Service implementation for the ChecksumResult object.
DailyReportEmailer: The email reporter creates and sends emails to an administrator.
HandleDispatcher: A BitstreamDispatcher that checks all the bitstreams contained within an item, collection or community referred to by Handle.
IteratorDispatcher: Really simple dispatcher that just iterates over a pre-defined list of ids.
LimitedCountDispatcher: Decorator that dispatches a specified number of bitstreams from a delegate dispatcher.
LimitedDurationDispatcher: A delegating dispatcher that puts a time limit on the operation of another dispatcher.
MostRecentChecksum: Database entity representation of the most_recent_checksum table.
MostRecentChecksum_
MostRecentChecksumServiceImpl: Service implementation for the MostRecentChecksum object.
ResultsLogger: Collects results from a Checksum process and outputs them to a Log4j Logger.
ResultsPruner: Manages the deletion of results from the checksum history.
SimpleDispatcher: An implementation of the selection strategy that selects bitstreams in the order that they were last checked, looping endlessly.
SimpleReporterServiceImpl: Simple Reporter implementation.
Enum Summary
ChecksumResultCode: Enumeration of ChecksumCheckResults containing constants for checksum comparison result that must correspond to values in checksum_result table.