The default compression codec to use for files written to HDFS. This may be overridden by setting the "mothra.rollupday.compression" Java property. Values typically supported by Hadoop include bzip2, gzip, lz4, lzo, lzop, snappy, and default. The empty string indicates no compression.
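As a minimal illustration of how this property lookup typically works (the class and method names here are hypothetical; only the property name and the empty-string convention come from the documentation above):

```java
public class CompressionConfig {
    // Illustrative stand-in for the class's DEFAULT_COMPRESSION constant.
    // The empty string indicates no compression.
    static final String DEFAULT_COMPRESSION = "";

    // Returns the codec name to use: the "mothra.rollupday.compression"
    // Java property if set, otherwise the default.
    static String compressionCodec() {
        return System.getProperty("mothra.rollupday.compression",
                                  DEFAULT_COMPRESSION);
    }

    public static void main(String[] args) {
        System.setProperty("mothra.rollupday.compression", "gzip");
        System.out.println(compressionCodec()); // prints "gzip"
    }
}
```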
The default number of threads to run for joining files when the
mothra.rollupday.maxThreads Java property is not set. (The scanning
task always runs in its own thread.)
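A sketch of how the thread-count setting can be resolved (the class name is illustrative; the constant stands in for DEFAULT_MAX_THREADS, whose documented default is 6):

```java
public class ThreadConfig {
    // Illustrative stand-in for the class's DEFAULT_MAX_THREADS constant.
    static final int DEFAULT_MAX_THREADS = 6;

    // Resolve the number of file-joining threads from the
    // "mothra.rollupday.maxThreads" Java property, falling back
    // to the default when the property is not set.
    static int maxThreads() {
        return Integer.getInteger("mothra.rollupday.maxThreads",
                                  DEFAULT_MAX_THREADS);
    }

    public static void main(String[] args) {
        System.out.println(maxThreads()); // prints 6 when the property is unset
    }
}
```

The scanning task runs in its own thread either way; this value only governs the pool that joins files.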
The compression codec used for files written to HDFS. This may be set via the "mothra.rollupday.compression" Java property. If that property is not set, DEFAULT_COMPRESSION is used.
The Hadoop configuration
The information model
How often to print log messages regarding the number of tasks, in seconds.
The maximum number of file-joiner threads to start. It defaults to the value DEFAULT_MAX_THREADS and may be modified at run time by setting the "mothra.rollupday.maxThreads" Java property.
The (approximate) maximum size file to create. The default is no maximum. When a file's size exceeds this value, the file is closed and a new file is started. Typically a file's size will not exceed this value by more than the maximum size of an IPFIX message, 64 KiB.
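The rotation rule above can be sketched as follows. This is a minimal illustration, not the actual implementation: the class, the in-memory stream standing in for an HDFS output stream, and the record-level write are all assumptions.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;

public class RollingWriter {
    private final long maxSize;   // approximate size limit per file
    private OutputStream current; // stream for the file being written
    private long written;         // bytes written to the current file
    final List<OutputStream> files = new ArrayList<>();

    RollingWriter(long maxSize) {
        this.maxSize = maxSize;
        openNext();
    }

    private void openNext() {
        current = new ByteArrayOutputStream(); // stand-in for an HDFS stream
        files.add(current);
        written = 0;
    }

    // Write one record; close the file and start a new one once the size
    // limit is exceeded. Because the check happens after the write, a file
    // may overshoot the limit by up to one record (for RollupDay, one
    // IPFIX message, at most 64 KiB).
    void writeRecord(byte[] record) throws IOException {
        current.write(record);
        written += record.length;
        if (written > maxSize) {
            current.close();
            openNext();
        }
    }
}
```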
Object to implement the RollupDay application.
Typical Usage in a Spark environment:
spark-submit --class org.cert.netsa.mothra.packer.tools.RollupDayMain mothra-tools.jar <s1> [<s2> <s3> ...]
where:
s1..sn: Directories to process, as Hadoop URIs
RollupDay reduces the number of data files in a Mothra repository. It may also be used to modify the files' compression.
RollupDay runs as a batch process, not as a daemon.
RollupDay makes a single recursive scan of the source directories <s1>, <s2>, ... for files whose names match the pattern "YYYYMMDD.HH." or "YYYYMMDD.HH-PTddH.". (It looks for files matching the regular expression ^\d{8}\.\d{2}(?:-PT\d\d?H)?\.) Files whose names match that pattern and reside in the same directory are processed by RollupDay to create a single new file (see next paragraph) in the same directory containing the records of all files in that directory.

RollupDay joins the files in a directory into a single file by default. The mothra.rollupday.maximumSize Java property may be used to limit the maximum file size. The size is that of the compressed file if compression is active. The value is approximate since it is only checked after the data appears on disk, which occurs in large blocks because of buffering by the Java stream code and the compression algorithm.

There is always a single thread that recursively scans the directories. The number of threads that join the files may be set by specifying the mothra.rollupday.maxThreads Java property. If not specified, the default is 6.

By default, RollupDay does not compress the files it writes. (NOTE: It should support writing the output using the same compression as the input.) To specify the compression codec that it should use, set the mothra.rollupday.compression Java property. Values typically supported by Hadoop include bzip2, gzip, lz4, lzo, lzop, snappy, and default. The empty string indicates no compression.
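The filename pattern above can be checked with a standard Java regular expression. A small illustration (the sample file names are hypothetical; the pattern itself is the one documented above):

```java
import java.util.regex.Pattern;

public class NamePattern {
    // The documented pattern: an eight-digit date, a dot, a two-digit hour,
    // an optional "-PT<h>H" duration (one or two digits), and a trailing dot.
    static final Pattern ROLLUP_NAME =
        Pattern.compile("^\\d{8}\\.\\d{2}(?:-PT\\d\\d?H)?\\.");

    // True when the name begins with the rollup filename pattern.
    static boolean matches(String name) {
        return ROLLUP_NAME.matcher(name).find();
    }

    public static void main(String[] args) {
        System.out.println(matches("20240115.03.data"));      // true
        System.out.println(matches("20240115.03-PT4H.data")); // true
        System.out.println(matches("summary.txt"));           // false
    }
}
```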