object PackerMain extends App with StrictLogging
Object to implement the Packer application
Typical usage in a Spark environment:
spark-submit --class org.cert.netsa.mothra.packer.tools.PackerMain mothra-tools.jar
[--one-shot] <srcDir> <destDir> <workDir> <partitioner>
where:
srcDir: Source (incoming) directory as Hadoop URI
destDir: Destination directory as Hadoop URI
workDir: Working directory on the local disk (not file://)
partitioner: Partitioning file as Hadoop URI
Packer scans the source directory (srcDir) for IPFIX files. It splits the IPFIX records in
each file into output file(s) in a time-based directory structure based on the partitioning
rules in the partitioning file (partitioner). The output files are initially created in the
working directory (workDir), and when they meet size and/or age thresholds, they are moved to
the destination directory (destDir).
If "--one-shot" is included on the command line, the srcDir is only scanned one time. Once all
files in srcDir have been packed (or they fail to be packed after some number of attempts),
the packer exits.
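The command-line handling described above can be sketched as follows. This is a minimal illustration rather than PackerMain's actual parsing code: the `ArgSplit` object and its `split` and `parse` helpers are hypothetical, and the value 4 for `reqArgs` is an assumption inferred from the four positional arguments in the usage line.

```scala
// Hypothetical sketch; not the parsing code used by PackerMain.
object ArgSplit {
  /** Assumed number of required positional arguments:
    * srcDir, destDir, workDir, partitioner. */
  val reqArgs: Int = 4

  /** Splits raw arguments into switches ("--...") and positional arguments. */
  def split(args: Array[String]): (Array[String], Array[String]) =
    args.partition(_.startsWith("--"))

  /** Returns Some((oneShot, positionalArgs)) when exactly `reqArgs`
    * positional arguments are present, otherwise None. */
  def parse(args: Array[String]): Option[(Boolean, Array[String])] = {
    val (switches, positional) = split(args)
    if (positional.length == reqArgs)
      Some((switches.contains("--one-shot"), positional))
    else
      None
  }
}
```

A caller would typically print the usage message and exit when `parse` returns `None`.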
The Java property values that are used by Packer are:
mothra.packer.compression -- The compression to use for files written to HDFS. Values
typically supported by Hadoop include bzip2, gzip, lz4, lzo, lzop, snappy, and
default. The empty string indicates no compression. The default is no compression.
mothra.packer.maxPackJobs -- The size of the thread pool that determines the maximum number of
input files that may be processed simultaneously. A larger value provides more throughput. The
default is 1.
mothra.packer.hoursPerFile -- The number of hours covered by each file in the repository. The
valid range is 1 (a file for each hour) to 24 (one file per day). The default is 1.
mothra.packer.pollingInterval -- How long the main thread sleeps (in seconds) between scans
(polls) of the source directory checking for IPFIX files to process. The default is 30.
mothra.packer.workDir.checkInterval -- The value for how often, in seconds, to check the sizes
and ages of the files in the working directory. The default is 60. When the checkInterval is
reached, the sizes and ages of the files in the working directory are checked. Files that meet
ONE of the following criteria are closed and moved into the data repository. The criteria are:
--- Files that were created more than maximumAge seconds ago. Since files are only checked at
this interval, a file could potentially be one interval older than the maximumAge.
--- Files whose size exceeds maximumSize. Since a file's size is not continuously monitored, a
file could be larger than this size, and the user should set this value appropriately.
--- Files whose size is at least minimumSize AND that were created at least minimumAge
seconds ago.
mothra.packer.workDir.maximumAge -- Files in the working directory that were created over this
number of seconds ago are always moved into the repository, regardless of their size. The
default value is 1800 seconds (30 minutes).
mothra.packer.workDir.maximumSize -- Files in the working directory whose size, in octets, is
greater than this value are always moved into the repository, regardless of their age. The
default value is 104857600 bytes (100 MiB).
mothra.packer.workDir.minimumAge -- Files in the working directory are NOT eligible to be
moved into the repository if they are younger than this age (were created less than this number
of seconds ago) unless their size exceeds maximumSize. The default is 600 seconds (5 minutes).
mothra.packer.workDir.minimumSize -- Files in the working directory are NOT eligible to be
moved into the repository if they are smaller than this size (in octets) unless their age
exceeds maximumAge. The default is 67108864 bytes (64 MiB).
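The three close-and-move criteria, combined with the four thresholds just described, can be written as a single predicate. The following is a sketch under stated assumptions, not the packer's own implementation: `WorkFileCheck`, `Limits`, and `readyToMove` are hypothetical names, and the thresholds are passed in explicitly rather than read from the Java properties.

```scala
// Hypothetical sketch of the working-directory check described above.
object WorkFileCheck {
  /** Thresholds for files in the working directory.
    * Ages are in seconds, sizes in octets. */
  final case class Limits(
      maximumAge: Int,
      maximumSize: Long,
      minimumAge: Int,
      minimumSize: Long)

  /** True when the file meets at least ONE of the three criteria:
    * older than maximumAge, larger than maximumSize, or
    * both at least minimumSize and at least minimumAge. */
  def readyToMove(ageSeconds: Long, sizeOctets: Long, limits: Limits): Boolean =
    ageSeconds > limits.maximumAge ||
      sizeOctets > limits.maximumSize ||
      (sizeOctets >= limits.minimumSize && ageSeconds >= limits.minimumAge)
}
```

Because the predicate is only evaluated once per checkInterval, a file that crosses a threshold just after a check stays in the working directory until the next check, which is why both maximum values are approximate.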
mothra.packer.numMoveThreads -- The size of the thread pool that closes the work files and
moves them to the destination directory. A task is potentially created every
workdirCheckInterval seconds if files are determined to have met the limits. The default is 4.
mothra.packer.archiveDirectory -- The root directory into which working files are moved after
the packer copies their content to the repository, as a Hadoop URI. If not specified, the
working files are deleted.
mothra.packer.packAttempts -- The number of times the packer attempts to process a file found
in the srcDir. After this number of failed attempts, the file is ignored by this invocation of
the packer. The default is 3.
mothra.packer.fileCacheSize -- The maximum size of the open file cache. This is the maximum
number of open files maintained by the file cache for writing to files in the work directory.
The packer does not limit the number of files in the work directory; this only limits the number
of open files. Once the cache reaches this number of open files and the packer needs to
(re-)open a file, the packer closes the least-recently-used file. This value does not include
the file handles required when reading incoming files or when copying files from the work
directory to the data directory. The default is 2000; the minimum permitted is 128.
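The least-recently-used behavior described for the open file cache can be illustrated with a small cache built on `java.util.LinkedHashMap` in access order. This is only a sketch of the general technique: `LruFileCache`, `getOrOpen`, `openCount`, and the `onEvict` callback are hypothetical names, the real cache may be implemented differently, and the sketch does not enforce the documented minimum of 128.

```scala
import java.util.{LinkedHashMap => JLinkedHashMap, Map => JMap}

/** Hypothetical LRU cache of open handles.
  * `onEvict` is called with the evicted handle so the caller can close it. */
final class LruFileCache[K, V <: AnyRef](maxOpen: Int, onEvict: V => Unit) {
  require(maxOpen >= 1, "cache must hold at least one entry")

  // accessOrder = true keeps entries ordered from least to most recently used.
  private val entries = new JLinkedHashMap[K, V](16, 0.75f, true) {
    // Called by LinkedHashMap after each insertion with the eldest entry.
    override def removeEldestEntry(eldest: JMap.Entry[K, V]): Boolean = {
      val evict = this.size() > maxOpen
      if (evict) onEvict(eldest.getValue)
      evict
    }
  }

  /** Returns the cached handle for `key`, or opens and caches a new one.
    * Inserting into a full cache evicts the least-recently-used handle. */
  def getOrOpen(key: K)(open: => V): V = synchronized {
    Option(entries.get(key)).getOrElse {
      val handle = open
      entries.put(key, handle)
      handle
    }
  }

  /** Number of handles currently held open by the cache. */
  def openCount: Int = synchronized(entries.size())
}
```

As in the description above, eviction only closes a handle; the file itself remains in the working directory and can be re-opened later.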
Value Members
-
final
def
!=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
final
def
##(): Int
- Definition Classes
- AnyRef → Any
-
final
def
==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
val
DEFAULT_MAX_PACK_JOBS: Int
Default value for the size of the thread pool that determines the maximum number of input files that may be processed simultaneously. A larger value provides more throughput. This value may be specified at run-time by specifying the following property: mothra.packer.maxPackJobs
-
val
DEFAULT_PACK_ATTEMPTS: Int
Default value for the number of times the packer attempts to process an incoming file. After this number of failed attempts, the file is left in the incoming directory but ignored by the Packer.
-
val
DEFAULT_POLL_INTERVAL: Int
Default value for how long the main thread sleeps (in seconds) between scans (polls) of the source directory checking for IPFIX files to process. This value may be specified at run-time by specifying the following property: mothra.packer.pollingInterval
-
val
DEFAULT_WORKDIR_CHECK_INTERVAL: Int
Default value for how often, in seconds, to check the sizes and ages of the files in the working directory.
When this interval is reached, the sizes and ages of the files in the working directory are checked. Files that meet ONE of the following criteria are closed and moved into the data repository. The criteria are:
--- Files that were created more than maximumAge seconds ago. Since files are only checked at this interval, a file could potentially be one interval older than the maximumAge.
--- Files whose size exceeds maximumSize. Since a file's size is not continuously monitored, a file could be larger than this size, and the user should set this value appropriately.
--- Files whose size is at least minimumSize AND that were created at least minimumAge seconds ago.
This value may be specified at run-time by specifying the following property: mothra.packer.workDir.checkInterval
-
val
DEFAULT_WORKDIR_MAXIMUM_AGE: Int
Default value for the "maximum" age (maximumAge) of a file in the working directory, as explained in the documentation of DEFAULT_WORKDIR_CHECK_INTERVAL. This value may be specified at run-time by specifying the following property: mothra.packer.workDir.maximumAge
-
val
DEFAULT_WORKDIR_MAXIMUM_SIZE: Int
Default value for the "maximum" size (maximumSize) of a file in the working directory, as explained in the documentation of DEFAULT_WORKDIR_CHECK_INTERVAL. This value may be specified at run-time by specifying the following property: mothra.packer.workDir.maximumSize
-
val
DEFAULT_WORKDIR_MINIMUM_AGE: Int
Default value for the "minimum" age (minimumAge) of a file in the working directory, as explained in the documentation of DEFAULT_WORKDIR_CHECK_INTERVAL. This value may be specified at run-time by specifying the following property: mothra.packer.workDir.minimumAge
-
val
DEFAULT_WORKDIR_MINIMUM_SIZE: Int
Default value for the "minimum" size (minimumSize) of a file in the working directory, as explained in the documentation of DEFAULT_WORKDIR_CHECK_INTERVAL. This value may be specified at run-time by specifying the following property: mothra.packer.workDir.minimumSize
-
val
archiveDir: Option[Path]
The root of an additional directory to which working files are copied after being copied into the rootDir. This value is determined by the "mothra.packer.archiveDirectory" property. When the property is not set, the files are removed.
-
def
args: Array[String]
- Attributes
- protected
- Definition Classes
- App
- Annotations
- @deprecatedOverriding( "args should not be overridden" , "2.11.0" )
-
final
def
asInstanceOf[T0]: T0
- Definition Classes
- Any
-
def
clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()
-
val
compressCodec: Option[CompressionCodec]
The compression codec used for files written to HDFS. This value is determined by the "mothra.packer.compression" property, or Packer.DEFAULT_COMPRESSION when that property is not set.
- implicit val conf: Configuration
-
final
def
eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
-
def
equals(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
val
executionStart: Long
- Definition Classes
- App
- Annotations
- @deprecatedOverriding( ... , "2.11.0" )
-
val
fileCacheSize: Int
The maximum number of open files maintained by the file cache. This value is determined by the mothra.packer.fileCacheSize Java property, or by Packer.DEFAULT_FILE_CACHE_SIZE when the property is not set. This value must be no less than Packer.MINIMUM_FILE_CACHE_SIZE.
- See also
Packer.DEFAULT_FILE_CACHE_SIZE for a full description of this value.
-
def
finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( classOf[java.lang.Throwable] )
-
final
def
getClass(): Class[_]
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
-
def
hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
-
val
hoursPerFile: Int
The number of hours covered by each file in the repository. This value is determined by the "mothra.packer.hoursPerFile" property, or Packer.DEFAULT_HOURS_PER_FILE when that property is not set.
-
val
incomingDir: Path
The source (incoming) directory for IPFIX files to be processed.
-
val
infoModel: InfoModel
The information model
-
final
def
isInstanceOf[T0]: Boolean
- Definition Classes
- Any
-
val
logTaskCountInterval: Int
How often to print log messages regarding the number of tasks and number of files waiting to be moved, in seconds.
-
val
logger: Logger
- Attributes
- protected
- Definition Classes
- StrictLogging
-
def
main(args: Array[String]): Unit
- Definition Classes
- App
- Annotations
- @deprecatedOverriding( "main should not be overridden" , "2.11.0" )
-
val
maxPackJobs: Int
The maximum number of PackFileJob instances to run simultaneously. This value is determined by the "mothra.packer.maxPackJobs" property, or DEFAULT_MAX_PACK_JOBS when that property is not set.
-
final
def
ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
-
final
def
notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
-
final
def
notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
-
val
numMoveThreads: Int
The size of the thread pool that closes the work files and moves them to the destination directory. A task is potentially created every workdirCheckInterval seconds if files are determined to have met the limits. This value is determined by the "mothra.packer.numMoveThreads" property, or Packer.DEFAULT_NUM_MOVE_THREADS when that property is not set.
- var oneShot: Boolean
-
val
packAttempts: Int
The number of times the packer attempts to process an incoming file. After this number of failed attempts, the file is ignored by this invocation of the packer. This value is determined by the "mothra.packer.packAttempts" property, or DEFAULT_PACK_ATTEMPTS when that property is not set.
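One way to picture the attempt limit is a per-file failure counter that is consulted on each scan. The sketch below is an assumption about how such bookkeeping could look, not the packer's actual mechanism: `AttemptTracker`, `recordFailure`, and `isIgnored` are hypothetical names, and files are identified by a plain string.

```scala
import scala.collection.mutable

/** Hypothetical per-invocation bookkeeping of failed pack attempts. */
final class AttemptTracker(maxAttempts: Int) {
  require(maxAttempts >= 1, "at least one attempt is required")

  // Failure count per file; files never seen before count as zero.
  private val failures = mutable.Map.empty[String, Int].withDefaultValue(0)

  /** Records one failed attempt to pack `file`. */
  def recordFailure(file: String): Unit =
    failures(file) += 1

  /** True once `file` has failed `maxAttempts` times; such a file is
    * skipped for the rest of this invocation. */
  def isIgnored(file: String): Boolean =
    failures(file) >= maxAttempts
}
```

Since the counter lives only in memory, restarting the packer resets it, which matches the statement that the file is ignored "by this invocation".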
- val packConf: PackerConfig
-
val
packLogicPath: Path
The packing logic (a .scala source file).
-
val
packer: CorePacker
The object that categorizes the records and writes them.
-
val
pollingInterval: Int
How long to wait between polls of the incoming (source) directory, in seconds. This value is determined by the "mothra.packer.pollingInterval" property, or DEFAULT_POLL_INTERVAL when that property is not set.
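The "property, or default when that property is not set" pattern used by this and the other configuration values can be sketched with `sys.props` from the Scala standard library. The `PropLookup` object and `intProp` helper are hypothetical and are not taken from the Mothra source; the default of 30 is the value documented for the polling interval.

```scala
// Hypothetical helper; shows the lookup pattern only.
object PropLookup {
  /** Reads an integer Java property, returning `default` when it is unset.
    * A value that is set but not a valid integer throws NumberFormatException. */
  def intProp(name: String, default: Int): Int =
    sys.props.get(name).map(_.trim.toInt).getOrElse(default)
}

object PropLookupDemo {
  def main(args: Array[String]): Unit = {
    // 30 is the documented default polling interval, in seconds.
    val pollingInterval = PropLookup.intProp("mothra.packer.pollingInterval", 30)
    println(s"pollingInterval = $pollingInterval")
  }
}
```

Java properties are normally supplied to the JVM with `-Dname=value`; how that option reaches the driver depends on how the job is launched.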
- val positionalArgs: Array[String]
- val reqArgs: Int
-
val
rootDir: Path
The data repository (output directory).
- val switches: Array[String]
-
final
def
synchronized[T0](arg0: ⇒ T0): T0
- Definition Classes
- AnyRef
-
def
toString(): String
- Definition Classes
- AnyRef → Any
- def usage(full: Boolean = false): Unit
- def version(): Unit
-
final
def
wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
-
final
def
wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
-
final
def
wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()
-
val
watcher: DirMapping
The object that watches for incoming files.
-
val
workDir: Path
A local directory used to first create the output files; this directory must allow files to be closed and re-opened.
-
val
workdirCheckInterval: Int
How often to check whether the size and age of files in the working directory meet limits, in seconds. This value is determined by the "mothra.packer.workDir.checkInterval" property, or DEFAULT_WORKDIR_CHECK_INTERVAL when that property is not set.
-
val
workdirMaximumAge: Int
Approximate maximum age for a file in the working directory. Files older than this are moved regardless of their size. This value is determined by the "mothra.packer.workDir.maximumAge" property, or DEFAULT_WORKDIR_MAXIMUM_AGE when that property is not set.
-
val
workdirMaximumSize: Long
Approximate maximum size for a file in the working directory. Files larger than this are moved regardless of their age. This value is determined by the "mothra.packer.workDir.maximumSize" property, or DEFAULT_WORKDIR_MAXIMUM_SIZE when that property is not set.
-
val
workdirMinimumAge: Int
Minimum age a file in the working directory must have before it is moved to the repository, in seconds, unless its size exceeds workdirMaximumSize. This value is determined by the "mothra.packer.workDir.minimumAge" property, or DEFAULT_WORKDIR_MINIMUM_AGE when that property is not set.
-
val
workdirMinimumSize: Long
Minimum size a file in the working directory must have before it is moved to the repository, in octets, unless its age exceeds workdirMaximumAge. This value is determined by the "mothra.packer.workDir.minimumSize" property, or DEFAULT_WORKDIR_MINIMUM_SIZE when that property is not set.