Default value for the size of the thread pool that determines the maximum number of input files that may be processed simultaneously. A larger value allows more files to be processed in parallel and can increase throughput. This value may be set at run time with the mothra.packer.maxPackJobs property.
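As an illustration of how a property such as mothra.packer.maxPackJobs can override a compiled-in default, here is a minimal sketch. The object name PropertyLookup and the helper intProperty are hypothetical, and falling back to the default on an unparsable value is an assumption; the Packer's own handling may differ.

```scala
import scala.util.Try

object PropertyLookup {
  // Default of 1 is the value the documentation gives for maxPackJobs.
  val DefaultMaxPackJobs: Int = 1

  /** Returns the integer value of the named Java system property, or
    * `default` when the property is unset or is not a valid integer
    * (the fallback on bad input is an assumption of this sketch). */
  def intProperty(name: String, default: Int): Int =
    sys.props.get(name)
      .flatMap(v => Try(v.trim.toInt).toOption)
      .getOrElse(default)
}
```

With this sketch, starting the JVM with -Dmothra.packer.maxPackJobs=4 would make PropertyLookup.intProperty("mothra.packer.maxPackJobs", PropertyLookup.DefaultMaxPackJobs) return 4.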
Default value for the number of times the packer attempts to process an incoming file. After this number of failed attempts, the file is left in the incoming directory but ignored by the Packer.
Default value for how long the main thread sleeps (in seconds) between scans (polls) of the source directory checking for IPFIX files to process. This value may be set at run time with the mothra.packer.pollingInterval property.
Default value for how often, in seconds, to check the sizes and ages of the files in the working directory.
When this interval is reached, the sizes and ages of the files in the working directory are checked. Files that meet ONE of the following criteria are closed and moved into the data repository. The criteria are:
Files that were created more than maximumAge seconds ago. Since files
are only checked at this interval, a file could potentially be one
interval older than the maximumAge.
Files whose size exceeds maximumSize. Since a file's size is not
continuously monitored, a file could be larger than this size, and the
user should set this value appropriately.
Files whose size is at least minimumSize AND that were created at
least minimumAge seconds ago.
This value may be set at run time with the mothra.packer.workDir.checkInterval property.
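The three criteria above can be read as a single predicate. The following is a minimal sketch of that rule, not the Packer's actual code; the names WorkFile, Limits, and shouldMove are hypothetical.

```scala
/** Age (seconds since creation) and size (octets) of a working file. */
final case class WorkFile(ageSeconds: Long, sizeOctets: Long)

/** The four thresholds described above. */
final case class Limits(
  minimumAge: Long,
  maximumAge: Long,
  minimumSize: Long,
  maximumSize: Long)

object WorkDirCheck {
  /** True when the file meets at least ONE of the three criteria. */
  def shouldMove(f: WorkFile, l: Limits): Boolean =
    f.ageSeconds > l.maximumAge ||       // older than maximumAge
    f.sizeOctets > l.maximumSize ||      // larger than maximumSize
    (f.sizeOctets >= l.minimumSize &&    // big enough AND old enough
      f.ageSeconds >= l.minimumAge)
}
```

For example, with Limits(600L, 1800L, 67108864L, 104857600L), a 10-octet file that is 2000 seconds old is moved because of its age alone, while a 70000000-octet file that is 100 seconds old stays in the working directory until it reaches minimumAge or exceeds maximumSize.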
Default value for the "maximum" age (maximumAge) of a file in the
working directory, as explained in the documentation of
DEFAULT_WORKDIR_CHECK_INTERVAL. This value may be set at run time
with the mothra.packer.workDir.maximumAge property.
Default value for the "maximum" size (maximumSize) of a file in the
working directory, as explained in the documentation of
DEFAULT_WORKDIR_CHECK_INTERVAL. This value may be set at run time
with the mothra.packer.workDir.maximumSize property.
Default value for the "minimum" age (minimumAge) of a file in the
working directory, as explained in the documentation of
DEFAULT_WORKDIR_CHECK_INTERVAL. This value may be set at run time
with the mothra.packer.workDir.minimumAge property.
Default value for the "minimum" size (minimumSize) of a file in the
working directory, as explained in the documentation of
DEFAULT_WORKDIR_CHECK_INTERVAL. This value may be set at run time
with the mothra.packer.workDir.minimumSize property.
The root of an additional directory to which working files are moved
after their content is copied into the rootDir. This value is determined
by the "mothra.packer.archiveDirectory" property. When the property is
not set, the working files are deleted.
The compression codec used for files written to HDFS. This value is determined by the "mothra.packer.compression" property, or Packer.DEFAULT_COMPRESSION when that property is not set.
The maximum number of open files maintained by the file cache. This value
is determined by the mothra.packer.fileCacheSize Java property, or by
Packer.DEFAULT_FILE_CACHE_SIZE when the property is not set. This
value must be no less than Packer.MINIMUM_FILE_CACHE_SIZE.
See Packer.DEFAULT_FILE_CACHE_SIZE for a full description of this value.
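A minimal sketch of resolving the cache size from an optional property value while enforcing the minimum. The object FileCacheSize and its members are hypothetical; the values 2000 and 128 are the default and minimum given in the property list for mothra.packer.fileCacheSize; rejecting a too-small value with an exception is an assumption, since this page does not say how the Packer reacts.

```scala
object FileCacheSize {
  val Default: Int = 2000 // documented default
  val Minimum: Int = 128  // documented minimum

  /** Resolves the cache size from an optional raw property value.
    * Throws NumberFormatException if the value is not an integer and
    * IllegalArgumentException if it is below Minimum (both behaviors
    * are assumptions of this sketch). */
  def resolve(raw: Option[String]): Int = {
    val size = raw.map(_.trim.toInt).getOrElse(Default)
    require(size >= Minimum,
      s"file cache size $size is below the minimum $Minimum")
    size
  }
}
```

A caller could pass sys.props.get("mothra.packer.fileCacheSize") as the argument.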
The number of hours covered by each file in the repository. This value is determined by the "mothra.packer.hoursPerFile" property, or Packer.DEFAULT_HOURS_PER_FILE when that property is not set.
The source (incoming) directory for IPFIX files to be processed.
The information model
How often to print log messages regarding the number of tasks and number of files waiting to be moved, in seconds.
The maximum number of PackFileJob instances to run simultaneously. This value is determined by the "mothra.packer.maxPackJobs" property, or DEFAULT_MAX_PACK_JOBS when that property is not set.
The size of the thread pool that closes the work files and moves them to
the destination directory. A task is potentially created every
workdirCheckInterval seconds if files are determined to have met the
limits. This value is determined by the "mothra.packer.numMoveThreads"
property, or Packer.DEFAULT_NUM_MOVE_THREADS when that property is not
set.
The number of times the packer attempts to process an incoming file. After this number of failed attempts, the file is ignored by this invocation of the packer. This value is determined by the "mothra.packer.packAttempts" property, or DEFAULT_PACK_ATTEMPTS when that property is not set.
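The bounded-retry behavior described above can be sketched as follows. The names PackRetry and attempt are hypothetical, and retrying immediately in a loop is a simplification; the Packer may instead retry on a later scan of the source directory.

```scala
import scala.util.{Failure, Success, Try}

object PackRetry {
  /** Runs `job` until it succeeds or `maxAttempts` attempts have failed.
    * Returns true on the first success, false if every attempt throws
    * a non-fatal exception. */
  @annotation.tailrec
  def attempt(maxAttempts: Int)(job: () => Unit): Boolean =
    if (maxAttempts <= 0) false
    else Try(job()) match {
      case Success(_) => true
      case Failure(_) => attempt(maxAttempts - 1)(job)
    }
}
```

A false result corresponds to the case where the file is ignored for the rest of this invocation.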
The packing logic (a .scala source file).
The object that categorizes the records and writes them.
How long to wait between polls of the incoming (source) directory, in seconds. This value is determined by the "mothra.packer.pollingInterval" property, or DEFAULT_POLL_INTERVAL when that property is not set.
The data repository (output directory).
The object that watches for incoming files.
A local directory used to first create the output files; this directory must allow files to be closed and re-opened.
How often to check whether the size and age of files in the working directory meet limits, in seconds. This value is determined by the "mothra.packer.workDir.checkInterval" property, or DEFAULT_WORKDIR_CHECK_INTERVAL when that property is not set.
Approximate maximum age for a file in the working directory. Files older than this are moved regardless of their size. This value is determined by the "mothra.packer.workDir.maximumAge" property, or DEFAULT_WORKDIR_MAXIMUM_AGE when that property is not set.
Approximate maximum size for a file in the working directory. Files larger than this are moved regardless of their age. This value is determined by the "mothra.packer.workDir.maximumSize" property, or DEFAULT_WORKDIR_MAXIMUM_SIZE when that property is not set.
Minimum age a file in the working directory must have before it is moved
to the repository, in seconds, unless its size exceeds
workdirMaximumSize. This value is determined by the
"mothra.packer.workDir.minimumAge" property, or
DEFAULT_WORKDIR_MINIMUM_AGE when that property is not set.
Minimum size a file in the working directory must have before it is moved
to the repository, in octets, unless its age exceeds
workdirMaximumAge. This value is determined by the
"mothra.packer.workDir.minimumSize" property, or
DEFAULT_WORKDIR_MINIMUM_SIZE when that property is not set.
Object that implements the Packer application.
Typical usage in a Spark environment:
spark-submit --class org.cert.netsa.mothra.packer.tools.PackerMain mothra-tools.jar [--one-shot] <srcDir> <destDir> <workDir> <partitioner>
where:
srcDir: Source (incoming) directory as Hadoop URI
destDir: Destination directory as Hadoop URI
workDir: Working directory on the local disk (not file://)
partitioner: Partitioning file as Hadoop URI
Packer scans the source directory (srcDir) for IPFIX files. It splits the IPFIX records in each file into output file(s) in a time-based directory structure based on the partitioning rules in the partitioning file (partitioner). The output files are initially created in the working directory (workDir), and when they meet size and/or age thresholds, they are moved to the destination directory (destDir).
If "--one-shot" is included on the command line, the srcDir is only scanned one time. Once all files in srcDir have been packed (or they fail to be packed after some number of attempts), the packer exits.
The Java property values that are used by Packer are:
mothra.packer.compression -- The compression to use for files written to HDFS. Values typically supported by Hadoop include bzip2, gzip, lz4, lzo, lzop, snappy, and default. The empty string indicates no compression. The default is no compression.
mothra.packer.maxPackJobs -- The size of the thread pool that determines the maximum number of input files that may be processed simultaneously. A larger value provides more throughput. The default is 1.
mothra.packer.hoursPerFile -- The number of hours covered by each file in the repository. The valid range is 1 (a file for each hour) to 24 (one file per day). The default is 1.
mothra.packer.pollingInterval -- How long the main thread sleeps (in seconds) between scans (polls) of the source directory checking for IPFIX files to process. The default is 30.
mothra.packer.workDir.checkInterval -- The value for how often, in seconds, to check the sizes and ages of the files in the working directory. The default is 60. When the checkInterval is reached, the sizes and ages of the files in the working directory are checked. Files that meet ONE of the following criteria are closed and moved into the data repository. The criteria are:
--- Files that were created more than maximumAge seconds ago. Since files are only checked at this interval, a file could potentially be one interval older than the maximumAge.
--- Files whose size exceeds maximumSize. Since a file's size is not continuously monitored, a file could be larger than this size, and the user should set this value appropriately.
--- Files whose size is at least minimumSize AND that were created at least minimumAge seconds ago.
mothra.packer.workDir.maximumAge -- Files in the working directory that were created over this number of seconds ago are always moved into the repository, regardless of their size. The default value is 1800 seconds (30 minutes).
mothra.packer.workDir.maximumSize -- Files in the working directory whose size, in octets, is greater than this value are always moved into the repository, regardless of their age. The default value is 104857600 bytes (100 MiB).
mothra.packer.workDir.minimumAge -- Files in the working directory are NOT eligible to be moved into the repository if they are younger than this age (were created less than this number of seconds ago) unless their size exceeds maximumSize. The default is 600 seconds (5 minutes).
mothra.packer.workDir.minimumSize -- Files in the working directory are NOT eligible to be moved into the repository if they are smaller than this size (in octets) unless their age exceeds maximumAge. The default is 67108864 bytes (64 MiB).
mothra.packer.numMoveThreads -- The size of the thread pool that closes the work files and moves them to the destination directory. A task is potentially created every workdirCheckInterval seconds if files are determined to have met the limits. The default is 4.
mothra.packer.archiveDirectory -- The root directory into which working files are moved after the packer copies their content to the repository, as a Hadoop URI. If not specified, the working files are deleted.
mothra.packer.packAttempts -- The number of times the packer attempts to process a file found in the srcDir. After this number of failed attempts, the file is ignored by this invocation of the packer. The default is 3.
mothra.packer.fileCacheSize -- The maximum size of the open file cache. This is the maximum number of open files maintained by the file cache for writing to files in the work directory. The packer does not limit the number of files in the work directory; this only limits the number of open files. Once the cache reaches this number of open files and the packer needs to (re-)open a file, the packer closes the least-recently-used file. This value does not include the file handles required when reading incoming files or when copying files from the work directory to the data directory. The default is 2000; the minimum permitted is 128.
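The least-recently-used eviction described for the open file cache can be sketched with an access-ordered java.util.LinkedHashMap. This is only an illustration of the policy: the class LruCache, its methods, and the onEvict callback (which stands in for closing a file) are hypothetical, and the Packer's real cache may be built differently.

```scala
import java.util.{LinkedHashMap => JLinkedHashMap, Map => JMap}

/** A bounded cache that hands the least-recently-used value to
  * `onEvict` (for example, to close a file) when a new entry would
  * push the number of entries above `maxOpen`. */
class LruCache[K, V](maxOpen: Int)(onEvict: V => Unit) {
  // accessOrder = true: iteration order runs from least- to
  // most-recently accessed entry.
  private val entries = new JLinkedHashMap[K, V](16, 0.75f, true) {
    override def removeEldestEntry(eldest: JMap.Entry[K, V]): Boolean = {
      val evict = size() > maxOpen
      if (evict) onEvict(eldest.getValue)
      evict
    }
  }

  /** Returns the cached value for `key`, or evaluates `open`, caches
    * the result, and returns it. */
  def getOrOpen(key: K)(open: => V): V =
    if (entries.containsKey(key)) entries.get(key) // get() refreshes recency
    else {
      val value = open
      entries.put(key, value) // may trigger eviction of the eldest entry
      value
    }

  /** Number of entries currently held. */
  def openCount: Int = entries.size()
}
```

With maxOpen set to 2, opening "a", then "b", touching "a" again, and then opening "c" evicts "b", because "b" is the entry that was used least recently.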