object PackerMain extends App with StrictLogging
Object to implement the Packer application
Typical usage in a Spark environment:
spark-submit --class org.cert.netsa.mothra.packer.tools.PackerMain mothra-tools.jar [--one-shot] <srcDir> <destDir> <workDir> <partitioner>
where:
srcDir: Source (incoming) directory as a Hadoop URI
destDir: Destination directory as a Hadoop URI
workDir: Working directory on the local disk (not file://)
partitioner: Partitioning file as a Hadoop URI
Packer scans the source directory (srcDir) for IPFIX files. It splits
the IPFIX records in each file into output file(s) in a time-based
directory structure based on the partitioning rules in the partitioning
file (partitioner). The output files are initially created in the
working directory (workDir), and when they meet size and/or age
thresholds, they are moved to the destination directory (destDir).
If "--one-shot" is included on the command line, the srcDir is only
scanned one time. Once all files in srcDir have been packed (or they
fail to be packed after some number of attempts), the packer exits.
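The command-line layout described above can be sketched as follows. This is a minimal illustration, not the code PackerMain uses: the `PackerArgs` case class and the `parse` function are hypothetical names, and the sketch assumes that `--one-shot` is the only switch (the real application may accept others).

```scala
object PackerArgsSketch {
  // Hypothetical container for the parsed command line; not part of the Mothra API.
  final case class PackerArgs(
      oneShot: Boolean,
      srcDir: String,
      destDir: String,
      workDir: String,
      partitioner: String)

  // Separate "--" switches from positional arguments, then require exactly
  // four positional arguments in the documented order.
  def parse(args: Array[String]): Either[String, PackerArgs] = {
    val (switches, positional) = args.partition(_.startsWith("--"))
    // Assumption: any switch other than --one-shot is rejected by this sketch.
    val unknown = switches.filterNot(_ == "--one-shot")
    if (unknown.nonEmpty)
      Left(s"Unknown switch: ${unknown.mkString(", ")}")
    else positional match {
      case Array(src, dest, work, part) =>
        Right(PackerArgs(switches.contains("--one-shot"), src, dest, work, part))
      case _ =>
        Left(s"Expected 4 positional arguments, found ${positional.length}")
    }
  }
}
```

Returning an `Either` keeps the usage error as a value, so a caller can print a usage message and exit instead of catching an exception.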
The Java property values that are used by Packer are:
mothra.packer.compression -- The compression to use for files written to
HDFS. Values typically supported by Hadoop include bzip2, gzip,
lz4, lzo, lzop, snappy, and default. The empty string indicates
no compression. The default is no compression.
mothra.packer.maxPackJobs -- The size of the thread pool that determines
the maximum number of input files that may be processed simultaneously. A
larger value provides more throughput. The default is 1.
mothra.packer.hoursPerFile -- The number of hours covered by each file
in the repository. The valid range is 1 (a file for each hour) to 24 (one
file per day). The default is 1.
mothra.packer.pollingInterval -- How long the main thread sleeps (in
seconds) between scans (polls) of the source directory checking for IPFIX
files to process. The default is 30.
mothra.packer.workDir.checkInterval -- The value for how often, in
seconds, to check the sizes and ages of the files in the working
directory. The default is 60. When the checkInterval is reached,
the sizes and ages of the files in the working directory are checked.
Files that meet ONE of the following criteria are closed and moved into
the data repository. The criteria are:
--- Files that were created more than maximumAge seconds ago. Since
files are only checked at this interval, a file could potentially be one
interval older than the maximumAge.
--- Files whose size exceeds maximumSize. Since a file's size is not
continuously monitored, a file could be larger than this size, and the
user should set this value appropriately.
--- Files whose size is at least minimumSize AND that were created at
least minimumAge seconds ago.
mothra.packer.workDir.maximumAge -- Files in the working directory that
were created over this number of seconds ago are always moved into the
repository, regardless of their size. The default value is 1800 seconds
(30 minutes).
mothra.packer.workDir.maximumSize -- Files in the working directory
whose size, in octets, is greater than this value are always moved into
the repository, regardless of their age. The default value is 104857600
bytes (100 MiB).
mothra.packer.workDir.minimumAge -- Files in the working directory are
NOT eligible to be moved into the repository if they are younger than this age
(were created less than this number of seconds ago) unless their size exceeds
maximumSize. The default is 600 seconds (10 minutes).
mothra.packer.workDir.minimumSize -- Files in the working directory are
NOT eligible to be moved into the repository if they are smaller than
this size (in octets) unless their age exceeds maximumAge. The default
is 67108864 bytes (64 MiB).
mothra.packer.numMoveThreads -- The size of the thread pool that closes
the work files and moves them to the destination directory. A task is
potentially created every workdirCheckInterval seconds if files are
determined to have met the limits. The default is 4.
mothra.packer.archiveDirectory -- The root directory into which working
files are moved after the packer copies their content to the repository,
as a Hadoop URI. If not specified, the working files are deleted.
mothra.packer.packAttempts -- The number of times the packer attempts to
process a file found in the srcDir. After this number of failed attempts,
the file is ignored by this invocation of the packer. The default is 3.
mothra.packer.fileCacheSize -- The maximum size of the open file cache.
This is the maximum number of open files maintained by the file cache for
writing to files in the work directory. The packer does not limit the
number of files in the work directory; this only limits the number of open
files. Once the cache reaches this number of open files and the packer
needs to (re-)open a file, the packer closes the least-recently-used file.
This value does not include the file handles required when reading
incoming files or when copying files from the work directory to the data
directory. The default is 2000; the minimum permitted is 128.
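As an illustration of how Java properties like these can be read with a fallback default, here is a minimal sketch. The object and function names are hypothetical, the defaults and limits are copied from the descriptions above, and rejecting an out-of-range value with `require` is an assumption: the text does not say how Packer reacts to invalid values.

```scala
object PackerPropsSketch {
  // Read an integer Java property, falling back to `default` when it is unset.
  // `props` defaults to the JVM system properties; a malformed value raises
  // NumberFormatException in this sketch.
  def intProp(
      name: String,
      default: Int,
      props: scala.collection.Map[String, String] = sys.props): Int =
    props.get(name).map(_.trim.toInt).getOrElse(default)

  // Hours per repository file; the text gives 1..24 as the valid range.
  def hoursPerFile(props: scala.collection.Map[String, String] = sys.props): Int = {
    val hours = intProp("mothra.packer.hoursPerFile", 1, props)
    require(hours >= 1 && hours <= 24, s"hoursPerFile must be in 1..24, got $hours")
    hours
  }

  // Open-file cache size; the text gives 128 as the minimum permitted value.
  def fileCacheSize(props: scala.collection.Map[String, String] = sys.props): Int = {
    val size = intProp("mothra.packer.fileCacheSize", 2000, props)
    require(size >= 128, s"fileCacheSize must be at least 128, got $size")
    size
  }
}
```

Passing the property map as a parameter makes the lookup easy to exercise without modifying the system properties of the running JVM.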
Value Members
- final def !=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- final def ##: Int
- Definition Classes
- AnyRef → Any
- final def ==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- val DEFAULT_MAX_PACK_JOBS: Int
Default value for the size of the thread pool that determines the maximum number of input files that may be processed simultaneously. A larger value provides more throughput. This value may be specified at run-time by specifying the following property: mothra.packer.maxPackJobs
- val DEFAULT_PACK_ATTEMPTS: Int
Default value for the number of times the packer attempts to process an incoming file. After this number of failed attempts, the file is left in the incoming directory but ignored by the Packer.
- val DEFAULT_POLL_INTERVAL: Int
Default value for how long the main thread sleeps (in seconds) between scans (polls) of the source directory checking for IPFIX files to process. This value may be specified at run-time by specifying the following property: mothra.packer.pollingInterval
- val DEFAULT_WORKDIR_CHECK_INTERVAL: Int
Default value for how often, in seconds, to check the sizes and ages of the files in the working directory.
When this interval is reached, the sizes and ages of the files in the working directory are checked. Files that meet ONE of the following criteria are closed and moved into the data repository. The criteria are:
--- Files that were created more than maximumAge seconds ago. Since files are only checked at this interval, a file could potentially be one interval older than the maximumAge.
--- Files whose size exceeds maximumSize. Since a file's size is not continuously monitored, a file could be larger than this size, and the user should set this value appropriately.
--- Files whose size is at least minimumSize AND that were created at least minimumAge seconds ago.
This value may be specified at run-time by specifying the following property: mothra.packer.workDir.checkInterval
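The three criteria can be expressed as a single predicate. The sketch below is an illustration under stated assumptions, not Packer's implementation: `Limits` and `shouldMove` are hypothetical names, and the default values are the numeric defaults quoted in the property descriptions.

```scala
object MoveCriteriaSketch {
  // Hypothetical bundle of the four workDir thresholds (seconds and octets).
  final case class Limits(
      maximumAge: Long = 1800L,
      maximumSize: Long = 104857600L,
      minimumAge: Long = 600L,
      minimumSize: Long = 67108864L)

  // A file is moved when it meets ANY ONE of the documented criteria:
  //   1. it is older than maximumAge,
  //   2. it is larger than maximumSize,
  //   3. it is at least minimumSize AND at least minimumAge old.
  def shouldMove(ageSeconds: Long, sizeOctets: Long, limits: Limits = Limits()): Boolean =
    ageSeconds > limits.maximumAge ||
      sizeOctets > limits.maximumSize ||
      (sizeOctets >= limits.minimumSize && ageSeconds >= limits.minimumAge)
}
```

Because the predicate is only evaluated once per check interval, a file can overshoot maximumAge by up to one interval, and can overshoot maximumSize by however much is written between checks.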
- val DEFAULT_WORKDIR_MAXIMUM_AGE: Int
Default value for the "maximum" age (maximumAge) of a file in the working directory, as explained in the documentation of DEFAULT_WORKDIR_CHECK_INTERVAL. This value may be specified at run-time by specifying the following property: mothra.packer.workDir.maximumAge
- val DEFAULT_WORKDIR_MAXIMUM_SIZE: Int
Default value for the "maximum" size (maximumSize) of a file in the working directory, as explained in the documentation of DEFAULT_WORKDIR_CHECK_INTERVAL. This value may be specified at run-time by specifying the following property: mothra.packer.workDir.maximumSize
- val DEFAULT_WORKDIR_MINIMUM_AGE: Int
Default value for the "minimum" age (minimumAge) of a file in the working directory, as explained in the documentation of DEFAULT_WORKDIR_CHECK_INTERVAL. This value may be specified at run-time by specifying the following property: mothra.packer.workDir.minimumAge
- val DEFAULT_WORKDIR_MINIMUM_SIZE: Int
Default value for the "minimum" size (minimumSize) of a file in the working directory, as explained in the documentation of DEFAULT_WORKDIR_CHECK_INTERVAL. This value may be specified at run-time by specifying the following property: mothra.packer.workDir.minimumSize
- val archiveDir: Option[Path]
The root of an additional directory to which working files are copied after being copied into the rootDir. This value is determined by the "mothra.packer.archiveDirectory" property. When the property is not set, the files are removed.
- final def args: Array[String]
- Attributes
- protected
- Definition Classes
- App
- final def asInstanceOf[T0]: T0
- Definition Classes
- Any
- def clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.CloneNotSupportedException]) @native()
- val compressCodec: Option[CompressionCodec]
The compression codec used for files written to HDFS. This value is determined by the "mothra.packer.compression" property, or Packer.DEFAULT_COMPRESSION when that property is not set.
- implicit val conf: Configuration
- final def eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- def equals(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef → Any
- final val executionStart: Long
- Definition Classes
- App
- val fileCacheSize: Int
The maximum number of open files maintained by the file cache. This value is determined by the mothra.packer.fileCacheSize Java property, or by Packer.DEFAULT_FILE_CACHE_SIZE when the property is not set. This value must be no less than Packer.MINIMUM_FILE_CACHE_SIZE.
- See also
Packer.DEFAULT_FILE_CACHE_SIZE for a full description of this value.
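The least-recently-used eviction described for the open file cache can be illustrated with a small, generic sketch built on `java.util.LinkedHashMap` in access order. This is not the cache Packer uses: `LruCache` and its `open`/`close` callbacks are hypothetical, and a real cache would hold writable file handles rather than arbitrary values.

```scala
import java.util.{LinkedHashMap => JLinkedHashMap, Map => JMap}

object OpenFileCacheSketch {
  // Hypothetical LRU cache keyed by file path. `open` creates a handle on a
  // cache miss; `close` is called on the handle that gets evicted.
  final class LruCache[H](maxOpen: Int, open: String => H, close: H => Unit) {
    private val entries =
      new JLinkedHashMap[String, H](16, 0.75f, true) { // true = access order
        // Called by put(); evicts the least-recently-used entry when full.
        override def removeEldestEntry(eldest: JMap.Entry[String, H]): Boolean = {
          val evict = size() > maxOpen
          if (evict) close(eldest.getValue)
          evict
        }
      }

    // Return the cached handle, (re-)opening the file on a miss. A hit marks
    // the entry as most recently used.
    def get(path: String): H = synchronized {
      Option(entries.get(path)).getOrElse {
        val handle = open(path)
        entries.put(path, handle)
        handle
      }
    }

    def openCount: Int = synchronized(entries.size())
  }
}
```

The cache bounds only the number of simultaneously open handles; evicted files remain in the work directory and are re-opened on the next `get`, which matches the requirement that the work directory allow files to be closed and re-opened.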
- def finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.Throwable])
- final def getClass(): Class[_ <: AnyRef]
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
- def hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
- val hoursPerFile: Int
The number of hours covered by each file in the repository. This value is determined by the "mothra.packer.hoursPerFile" property, or Packer.DEFAULT_HOURS_PER_FILE when that property is not set.
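To make the effect of this setting concrete, the sketch below maps an hour of the day to the first hour of the file that would cover it. It assumes files are aligned to midnight and cover consecutive blocks of hoursPerFile hours; the text does not state the alignment Packer uses, so treat `bucketStartHour` as a hypothetical helper.

```scala
object HourBucketSketch {
  // Floor the hour of day (0..23) to the start of its hoursPerFile-sized block.
  // Assumption: blocks start at hour 0 of each day.
  def bucketStartHour(hourOfDay: Int, hoursPerFile: Int): Int = {
    require(hoursPerFile >= 1 && hoursPerFile <= 24, "hoursPerFile must be in 1..24")
    require(hourOfDay >= 0 && hourOfDay <= 23, "hourOfDay must be in 0..23")
    (hourOfDay / hoursPerFile) * hoursPerFile
  }
}
```

Under this assumption, a value of 1 gives one file per hour, 6 gives four files per day, and 24 gives one file per day.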
- val incomingDir: Path
The source (incoming) directory for IPFIX files to be processed.
- val infoModel: InfoModel
The information model
- final def isInstanceOf[T0]: Boolean
- Definition Classes
- Any
- val logTaskCountInterval: Int
How often to print log messages regarding the number of tasks and number of files waiting to be moved, in seconds.
- val logger: Logger
- Attributes
- protected
- Definition Classes
- StrictLogging
- final def main(args: Array[String]): Unit
- Definition Classes
- App
- val maxPackJobs: Int
The maximum number of PackFileJob instances to run simultaneously. This value is determined by the "mothra.packer.maxPackJobs" property, or DEFAULT_MAX_PACK_JOBS when that property is not set.
- final def ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- final def notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
- final def notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
- val numMoveThreads: Int
The size of the thread pool that closes the work files and moves them to the destination directory. A task is potentially created every workdirCheckInterval seconds if files are determined to have met the limits. This value is determined by the "mothra.packer.numMoveThreads" property, or Packer.DEFAULT_NUM_MOVE_THREADS when that property is not set.
- var oneShot: Boolean
- val packAttempts: Int
The number of times the packer attempts to process an incoming file. After this number of failed attempts, the file is ignored by this invocation of the packer. This value is determined by the "mothra.packer.packAttempts" property, or DEFAULT_PACK_ATTEMPTS when that property is not set.
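The bounded-attempts behaviour can be sketched as below. This is an illustration only: `packWithRetries` and the `pack` callback are hypothetical, and the sketch retries immediately, whereas the text does not say when Packer schedules its repeated attempts.

```scala
import scala.util.Try

object PackAttemptsSketch {
  // Try to pack `file` up to `maxAttempts` times. Returns true on the first
  // success, or false once every attempt has failed (the file is then ignored).
  def packWithRetries(file: String, maxAttempts: Int = 3)(pack: String => Try[Unit]): Boolean =
    // Iterator is lazy, so no further attempts are made after a success.
    Iterator.range(0, maxAttempts).exists(_ => pack(file).isSuccess)
}
```

A `false` result corresponds to the documented outcome where the file is ignored for the rest of this invocation.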
- val packConf: PackerConfig
- val packLogicPath: Path
The packing logic (a .scala source file).
- val packer: CorePacker
The object that categorizes the records and writes them.
- val pollingInterval: Int
How long to wait between polls of the incoming (source) directory, in seconds. This value is determined by the "mothra.packer.pollingInterval" property, or DEFAULT_POLL_INTERVAL when that property is not set.
- val positionalArgs: Array[String]
- val reqArgs: Int
- val rootDir: Path
The data repository (output directory).
- val switches: Array[String]
- final def synchronized[T0](arg0: => T0): T0
- Definition Classes
- AnyRef
- def toString(): String
- Definition Classes
- AnyRef → Any
- def usage(full: Boolean = false): Unit
- def version(): Unit
- final def wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
- final def wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
- final def wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException]) @native()
- val watcher: DirMapping
The object that watches for incoming files.
- val workDir: Path
A local directory used to first create the output files; this directory must allow files to be closed and re-opened.
- val workdirCheckInterval: Int
How often to check whether the size and age of files in the working directory meet limits, in seconds. This value is determined by the "mothra.packer.workDir.checkInterval" property, or DEFAULT_WORKDIR_CHECK_INTERVAL when that property is not set.
- val workdirMaximumAge: Int
Approximate maximum age for a file in the working directory. Files older than this are moved regardless of their size. This value is determined by the "mothra.packer.workDir.maximumAge" property, or DEFAULT_WORKDIR_MAXIMUM_AGE when that property is not set.
- val workdirMaximumSize: Long
Approximate maximum size for a file in the working directory. Files larger than this are moved regardless of their age. This value is determined by the "mothra.packer.workDir.maximumSize" property, or DEFAULT_WORKDIR_MAXIMUM_SIZE when that property is not set.
- val workdirMinimumAge: Int
Minimum age a file in the working directory must have before it is moved to the repository, in seconds, unless its size exceeds workdirMaximumSize. This value is determined by the "mothra.packer.workDir.minimumAge" property, or DEFAULT_WORKDIR_MINIMUM_AGE when that property is not set.
- val workdirMinimumSize: Long
Minimum size a file in the working directory must have before it is moved to the repository, in octets, unless its age exceeds workdirMaximumAge. This value is determined by the "mothra.packer.workDir.minimumSize" property, or DEFAULT_WORKDIR_MINIMUM_SIZE when that property is not set.