object PackerMain extends App with StrictLogging

Object that implements the Packer application.

Typical usage in a Spark environment:

spark-submit --class org.cert.netsa.mothra.packer.tools.PackerMain mothra-tools.jar [--one-shot] <srcDir> <destDir> <workDir> <partitioner>

where:

srcDir: Source (incoming) directory, as a Hadoop URI
destDir: Destination directory, as a Hadoop URI
workDir: Working directory on the local disk (not file://)
partitioner: Partitioning file, as a Hadoop URI

Packer scans the source directory (srcDir) for IPFIX files. It splits the IPFIX records in each file into output file(s) in a time-based directory structure based on the partitioning rules in the partitioning file (partitioner). The output files are initially created in the working directory (workDir), and when they meet size and/or age thresholds, they are moved to the destination directory (destDir).

If "--one-shot" is included on the command line, the srcDir is only scanned one time. Once all files in srcDir have been packed (or they fail to be packed after some number of attempts), the packer exits.
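The command-line handling described above can be sketched as follows. This is a minimal illustration, not PackerMain's actual code: it borrows the names of the documented members switches, positionalArgs, reqArgs, and oneShot, but the PackerArgsSketch object, the PackerArgs case class, and the parsing rules (any argument starting with "--" is a switch, and only "--one-shot" is recognized) are hypothetical.

```scala
// Hypothetical sketch of PackerMain's argument handling; not the real code.
// Names mirror the documented members switches, positionalArgs, reqArgs, oneShot.
object PackerArgsSketch {
  final case class PackerArgs(
      oneShot: Boolean,
      srcDir: String,      // Hadoop URI
      destDir: String,     // Hadoop URI
      workDir: String,     // local directory, not file://
      partitioner: String) // Hadoop URI

  // Required positional arguments: srcDir, destDir, workDir, partitioner.
  val reqArgs: Int = 4

  // Assumption: only --one-shot is recognized; any other switch is an error.
  private val knownSwitches = Set("--one-shot")

  def parse(args: Array[String]): Either[String, PackerArgs] = {
    // Split the command line into "--" switches and positional arguments,
    // preserving the order of the positional arguments.
    val (switches, positionalArgs) = args.partition(_.startsWith("--"))
    val unknown = switches.filterNot(knownSwitches.contains)
    if (unknown.nonEmpty)
      Left(s"Unknown switch(es): ${unknown.mkString(", ")}")
    else if (positionalArgs.length != reqArgs)
      Left(s"Expected $reqArgs arguments but found ${positionalArgs.length}")
    else
      Right(PackerArgs(
        oneShot = switches.contains("--one-shot"),
        srcDir = positionalArgs(0),
        destDir = positionalArgs(1),
        workDir = positionalArgs(2),
        partitioner = positionalArgs(3)))
  }
}
```

Returning an Either keeps the sketch free of side effects; a real entry point would print a usage message and exit on a Left.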

The Java property values that are used by Packer are:

mothra.packer.compression -- The compression to use for files written to HDFS. Values typically supported by Hadoop include bzip2, gzip, lz4, lzo, lzop, snappy, and default. The empty string indicates no compression. The default is no compression.

mothra.packer.maxPackJobs -- The size of the thread pool that determines the maximum number of input files that may be processed simultaneously. A larger value provides more throughput. The default is 1.

mothra.packer.hoursPerFile -- The number of hours covered by each file in the repository. The valid range is 1 (a file for each hour) to 24 (one file per day). The default is 1.

mothra.packer.pollingInterval -- How long the main thread sleeps (in seconds) between scans (polls) of the source directory checking for IPFIX files to process. The default is 30.

mothra.packer.workDir.checkInterval -- The value for how often, in seconds, to check the sizes and ages of the files in the working directory. The default is 60. When the checkInterval is reached, the sizes and ages of the files in the working directory are checked. Files that meet ONE of the following criteria are closed and moved into the data repository. The criteria are:

--- Files that were created more than maximumAge seconds ago. Since files are only checked at this interval, a file could be up to one interval older than maximumAge when it is moved.

--- Files whose size exceeds maximumSize. Since a file's size is not continuously monitored, a file could be larger than this size, and the user should set this value appropriately.

--- Files whose size is at least minimumSize AND that were created at least minimumAge seconds ago.

mothra.packer.workDir.maximumAge -- Files in the working directory that were created over this number of seconds ago are always moved into the repository, regardless of their size. The default value is 1800 seconds (30 minutes).

mothra.packer.workDir.maximumSize -- Files in the working directory whose size, in octets, is greater than this value are always moved into the repository, regardless of their age. The default value is 104857600 bytes (100MiB).

mothra.packer.workDir.minimumAge -- Files in the working directory are NOT eligible to be moved into the repository if they are younger than this age (were created less than this number of seconds ago) unless their size exceeds maximumSize. The default is 600 seconds (10 minutes).

mothra.packer.workDir.minimumSize -- Files in the working directory are NOT eligible to be moved into the repository if they are smaller than this size (in octets) unless their age exceeds maximumAge. The default is 67108864 bytes (64 MiB).

mothra.packer.numMoveThreads -- The size of the thread pool that closes the work files and moves them to the destination directory. A task is potentially created every workdirCheckInterval seconds if files are determined to have met the limits. The default is 4.

mothra.packer.archiveDirectory -- The root directory into which working files are moved after the packer copies their content to the repository, as a Hadoop URI. If not specified, the working files are deleted.

mothra.packer.packAttempts -- The number of times the packer attempts to process a file found in the srcDir. After this number of failed attempts, the file is ignored by this invocation of the packer. The default is 3.

mothra.packer.fileCacheSize -- The maximum size of the open file cache. This is the maximum number of open files maintained by the file cache for writing to files in the work directory. The packer does not limit the number of files in the work directory; this only limits the number of open files. Once the cache reaches this number of open files and the packer needs to (re-)open a file, the packer closes the least-recently-used file. This value does not include the file handles required when reading incoming files or when copying files from the work directory to the data directory. The default is 2000; the minimum permitted is 128.
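The least-recently-used policy described for mothra.packer.fileCacheSize can be illustrated with java.util.LinkedHashMap in access order. This is a sketch under assumptions: LruHandleCache and its methods are hypothetical names, a generic AutoCloseable stands in for an open work file, and an evicted entry is simply closed so that a later get for the same key re-opens it through the supplied function. The packer's own minimum of 128 is not enforced here.

```scala
import java.util.{LinkedHashMap => JLinkedHashMap, Map => JMap}

// Hypothetical LRU cache of open handles illustrating the eviction policy
// described for mothra.packer.fileCacheSize; not the packer's actual class.
final class LruHandleCache[K, H <: AutoCloseable](maxOpen: Int) {
  require(maxOpen >= 1, "maxOpen must be at least 1")

  // accessOrder = true: iteration runs from least- to most-recently used,
  // so the "eldest" entry is the least-recently-used handle.
  private val handles = new JLinkedHashMap[K, H](16, 0.75f, true) {
    override def removeEldestEntry(eldest: JMap.Entry[K, H]): Boolean = {
      val evict = size() > maxOpen
      if (evict) eldest.getValue.close() // close the LRU handle before dropping it
      evict
    }
  }

  /** Returns the open handle for key, (re-)opening it with open if needed. */
  def get(key: K)(open: K => H): H = {
    val existing = handles.get(key) // get() also marks the entry as most recent
    if (existing != null) existing
    else {
      val handle = open(key)
      handles.put(key, handle) // may evict and close the LRU handle
      handle
    }
  }

  def openCount: Int = handles.size()

  /** Closes every open handle and empties the cache. */
  def closeAll(): Unit = {
    val it = handles.values().iterator()
    while (it.hasNext) it.next().close()
    handles.clear()
  }
}
```

As in the description above, the cache bounds only the number of open handles; the number of files that exist in the work directory is unaffected.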

Linear Supertypes
StrictLogging, App, DelayedInit, AnyRef, Any

Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##: Int
    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  4. val DEFAULT_MAX_PACK_JOBS: Int

    Default value for the size of the thread pool that determines the maximum number of input files that may be processed simultaneously. A larger value provides more throughput. This value may be specified at run-time by specifying the following property: mothra.packer.maxPackJobs

  5. val DEFAULT_PACK_ATTEMPTS: Int

    Default value for the number of times the packer attempts to process an incoming file. After this number of failed attempts, the file is left in the incoming directory but ignored by the Packer.

  6. val DEFAULT_POLL_INTERVAL: Int

    Default value for how long the main thread sleeps (in seconds) between scans (polls) of the source directory checking for IPFIX files to process. This value may be specified at run-time by specifying the following property: mothra.packer.pollingInterval

  7. val DEFAULT_WORKDIR_CHECK_INTERVAL: Int

    Default value for how often, in seconds, to check the sizes and ages of the files in the working directory.

    When this interval is reached, the sizes and ages of the files in the working directory are checked. Files that meet ONE of the following criteria are closed and moved into the data repository. The criteria are:

    Files that were created more than maximumAge seconds ago. Since files are only checked at this interval, a file could be up to one interval older than maximumAge when it is moved.

    Files whose size exceeds maximumSize. Since a file's size is not continuously monitored, a file could be larger than this size, and the user should set this value appropriately.

    Files whose size is at least minimumSize AND that were created at least minimumAge seconds ago.

    This value may be specified at run-time by specifying the following property: mothra.packer.workDir.checkInterval

  8. val DEFAULT_WORKDIR_MAXIMUM_AGE: Int

    Default value for the "maximum" age (maximumAge) of a file in the working directory, as explained in the documentation of DEFAULT_WORKDIR_CHECK_INTERVAL. This value may be specified at run-time by specifying the following property: mothra.packer.workDir.maximumAge

  9. val DEFAULT_WORKDIR_MAXIMUM_SIZE: Int

    Default value for the "maximum" size (maximumSize) of a file in the working directory, as explained in the documentation of DEFAULT_WORKDIR_CHECK_INTERVAL. This value may be specified at run-time by specifying the following property: mothra.packer.workDir.maximumSize

  10. val DEFAULT_WORKDIR_MINIMUM_AGE: Int

    Default value for the "minimum" age (minimumAge) of a file in the working directory, as explained in the documentation of DEFAULT_WORKDIR_CHECK_INTERVAL. This value may be specified at run-time by specifying the following property: mothra.packer.workDir.minimumAge

  11. val DEFAULT_WORKDIR_MINIMUM_SIZE: Int

    Default value for the "minimum" size (minimumSize) of a file in the working directory, as explained in the documentation of DEFAULT_WORKDIR_CHECK_INTERVAL. This value may be specified at run-time by specifying the following property: mothra.packer.workDir.minimumSize

  12. val archiveDir: Option[Path]

    The root of an additional directory to which working files are copied after being copied into the rootDir. This value is determined by the "mothra.packer.archiveDirectory" property. When the property is not set, the files are removed.

  13. final def args: Array[String]
    Attributes
    protected
    Definition Classes
    App
  14. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  15. def clone(): AnyRef
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.CloneNotSupportedException]) @native()
  16. val compressCodec: Option[CompressionCodec]

    The compression codec used for files written to HDFS. This value is determined by the "mothra.packer.compression" property, or Packer.DEFAULT_COMPRESSION when that property is not set.

  17. implicit val conf: Configuration
  18. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  19. def equals(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef → Any
  20. final val executionStart: Long
    Definition Classes
    App
  21. val fileCacheSize: Int

    The maximum number of open files maintained by the file cache. This value is determined by the mothra.packer.fileCacheSize Java property, or by Packer.DEFAULT_FILE_CACHE_SIZE when the property is not set. This value must be no less than Packer.MINIMUM_FILE_CACHE_SIZE.

    See also

    Packer.DEFAULT_FILE_CACHE_SIZE for a full description of this value.

  22. def finalize(): Unit
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.Throwable])
  23. final def getClass(): Class[_ <: AnyRef]
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  24. def hashCode(): Int
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  25. val hoursPerFile: Int

    The number of hours covered by each file in the repository. This value is determined by the "mothra.packer.hoursPerFile" property, or Packer.DEFAULT_HOURS_PER_FILE when that property is not set.

  26. val incomingDir: Path

    The source (incoming) directory for IPFIX files to be processed.

  27. val infoModel: InfoModel

    The information model.

  28. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  29. val logTaskCountInterval: Int

    How often to print log messages regarding the number of tasks and number of files waiting to be moved, in seconds.

  30. val logger: Logger
    Attributes
    protected
    Definition Classes
    StrictLogging
  31. final def main(args: Array[String]): Unit
    Definition Classes
    App
  32. val maxPackJobs: Int

    The maximum number of PackFileJob instances to run simultaneously. This value is determined by the "mothra.packer.maxPackJobs" property, or DEFAULT_MAX_PACK_JOBS when that property is not set.

  33. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  34. final def notify(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  35. final def notifyAll(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  36. val numMoveThreads: Int

    The size of the thread pool that closes the work files and moves them to the destination directory. A task is potentially created every workdirCheckInterval seconds if files are determined to have met the limits. This value is determined by the "mothra.packer.numMoveThreads" property, or Packer.DEFAULT_NUM_MOVE_THREADS when that property is not set.

  37. var oneShot: Boolean
  38. val packAttempts: Int

    The number of times the packer attempts to process an incoming file. After this number of failed attempts, the file is ignored by this invocation of the packer. This value is determined by the "mothra.packer.packAttempts" property, or DEFAULT_PACK_ATTEMPTS when that property is not set.

  39. val packConf: PackerConfig
  40. val packLogicPath: Path

    The packing logic (a .scala source file).

  41. val packer: CorePacker

    The object that categorizes the records and writes them.

  42. val pollingInterval: Int

    How long to wait between polls of the incoming (source) directory, in seconds. This value is determined by the "mothra.packer.pollingInterval" property, or DEFAULT_POLL_INTERVAL when that property is not set.

  43. val positionalArgs: Array[String]
  44. val reqArgs: Int
  45. val rootDir: Path

    The data repository (output directory).

  46. val switches: Array[String]
  47. final def synchronized[T0](arg0: => T0): T0
    Definition Classes
    AnyRef
  48. def toString(): String
    Definition Classes
    AnyRef → Any
  49. def usage(full: Boolean = false): Unit
  50. def version(): Unit
  51. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException])
  52. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException])
  53. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException]) @native()
  54. val watcher: DirMapping

    The object that watches for incoming files.

  55. val workDir: Path

    A local directory used to first create the output files; this directory must allow files to be closed and re-opened.

  56. val workdirCheckInterval: Int

    How often to check whether the size and age of files in the working directory meet limits, in seconds. This value is determined by the "mothra.packer.workDir.checkInterval" property, or DEFAULT_WORKDIR_CHECK_INTERVAL when that property is not set.

  57. val workdirMaximumAge: Int

    Approximate maximum age for a file in the working directory. Files older than this are moved regardless of their size. This value is determined by the "mothra.packer.workDir.maximumAge" property, or DEFAULT_WORKDIR_MAXIMUM_AGE when that property is not set.

  58. val workdirMaximumSize: Long

    Approximate maximum size for a file in the working directory. Files larger than this are moved regardless of their age. This value is determined by the "mothra.packer.workDir.maximumSize" property, or DEFAULT_WORKDIR_MAXIMUM_SIZE when that property is not set.

  59. val workdirMinimumAge: Int

    Minimum age a file in the working directory must have before it is moved to the repository, in seconds, unless its size exceeds workdirMaximumSize. This value is determined by the "mothra.packer.workDir.minimumAge" property, or DEFAULT_WORKDIR_MINIMUM_AGE when that property is not set.

  60. val workdirMinimumSize: Long

    Minimum size a file in the working directory must have before it is moved to the repository, in octets, unless its age exceeds workdirMaximumAge. This value is determined by the "mothra.packer.workDir.minimumSize" property, or DEFAULT_WORKDIR_MINIMUM_SIZE when that property is not set.
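The three move criteria, together with the mothra.packer.workDir.* properties and their documented defaults, can be expressed as a small predicate. This is a sketch under assumptions: the WorkDirLimits case class, shouldMove, and fromSystemProperties are hypothetical names, all limits are held as Long, and a property that is set to a non-integer value simply throws NumberFormatException rather than being reported the way the packer might.

```scala
// Hypothetical sketch of the working-directory move criteria; the names are
// illustrative, but the default values are the documented ones.
final case class WorkDirLimits(
    maximumAge: Long = 1800L,       // seconds (30 minutes)
    maximumSize: Long = 104857600L, // octets (100 MiB)
    minimumAge: Long = 600L,        // seconds (10 minutes)
    minimumSize: Long = 67108864L   // octets (64 MiB)
) {
  /** True when a work file meets ONE of the documented criteria for moving. */
  def shouldMove(ageSeconds: Long, sizeOctets: Long): Boolean =
    ageSeconds > maximumAge ||                                // old enough on its own
      sizeOctets > maximumSize ||                             // large enough on its own
      (sizeOctets >= minimumSize && ageSeconds >= minimumAge) // both minimums met
}

object WorkDirLimits {
  /** Reads the mothra.packer.workDir.* properties, keeping defaults for unset ones. */
  def fromSystemProperties(): WorkDirLimits = {
    val defaults = WorkDirLimits()
    def prop(name: String, default: Long): Long =
      sys.props.get(s"mothra.packer.workDir.$name").map(_.trim.toLong).getOrElse(default)
    WorkDirLimits(
      maximumAge = prop("maximumAge", defaults.maximumAge),
      maximumSize = prop("maximumSize", defaults.maximumSize),
      minimumAge = prop("minimumAge", defaults.minimumAge),
      minimumSize = prop("minimumSize", defaults.minimumSize))
  }
}
```

Because the predicate is only evaluated every workdirCheckInterval seconds, a file can exceed maximumAge or maximumSize by up to one interval's worth of age or growth before it is moved.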

Deprecated Value Members

  1. def delayedInit(body: => Unit): Unit
    Definition Classes
    App → DelayedInit
    Annotations
    @deprecated
    Deprecated

    (Since version 2.11.0) the delayedInit mechanism will disappear
