object RepackerMain extends App with StrictLogging

Object that implements the Repacker application.

Typical Usage in a Spark environment:

spark-submit --class org.cert.netsa.mothra.packer.tools.RepackerMain mothra-tools.jar <partition-conf> <dest-dir> <work-dir> <s1> [<s2> <s3> ...]

where:

partition-conf: Partitioning configuration file as Hadoop URI

dest-dir: Root destination directory as Hadoop URI

work-dir: Working directory on the local disk (not file://)

s1..sn: Source directories as Hadoop URIs
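
The argument roles above can be illustrated with a small Scala sketch that binds the positional arguments to named fields. The names `RepackArgs` and `ArgSketch` are hypothetical. Treating every argument that starts with `--` as a switch is also an assumption; the real split into `switches` and `positionalArgs` in RepackerMain may differ.

```scala
// Hypothetical names: RepackArgs and ArgSketch are illustrative, not part of Mothra.
final case class RepackArgs(
    partitionConf: String,   // Hadoop URI of the partitioning configuration file
    destDir: String,         // Hadoop URI of the root destination directory
    workDir: String,         // plain local path (not a file:// URI)
    sourceDirs: List[String] // one or more Hadoop URIs
)

object ArgSketch {
  /** Bind positional arguments to their roles. Anything starting with "--" is
    * ASSUMED to be a switch and is skipped here. */
  def parseArgs(args: Array[String]): Either[String, RepackArgs] = {
    val positional = args.filterNot(_.startsWith("--"))
    if (positional.length < 4)
      Left(s"expected <partition-conf> <dest-dir> <work-dir> <s1> [<s2> ...], got ${positional.length} positional argument(s)")
    else if (positional(2).startsWith("file:"))
      Left("work-dir must be a local path, not a file:// URI")
    else
      Right(RepackArgs(positional(0), positional(1), positional(2), positional.drop(3).toList))
  }
}
```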

Makes a single recursive scan of the source directories <s1>, <s2>, ... for IPFIX files. Splits the IPFIX records in the source files into output file(s) in a time-based directory structure, following the partitioning rules in the partitioning configuration file <partition-conf>. The output files are initially created in the working directory <work-dir>; once ALL input files have been read, the output files are moved to the destination directory and the original source files are removed. The dest-dir may also be one of the source directories.
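
The ordering described above (read everything, publish, then delete the originals) can be modeled on a local filesystem with `java.nio.file`. This is a hypothetical sketch, not the Repacker's code: `TwoPhaseSketch` is an invented name, plain byte concatenation stands in for record splitting and partitioning, and local paths replace Hadoop URIs.

```scala
import java.nio.file.{Files, Path, StandardCopyOption}

object TwoPhaseSketch {
  /** Illustrative ordering only: read every source into one work file, publish it
    * to destDir, and delete the sources last. Concatenating bytes stands in for
    * the real record splitting and partitioning. */
  def run(sources: Seq[Path], workDir: Path, destDir: Path, outName: String): Path = {
    // Phase 1: read ALL inputs, writing only inside the working directory.
    val workFile = workDir.resolve(outName)
    val out = Files.newOutputStream(workFile)
    try sources.foreach(src => out.write(Files.readAllBytes(src)))
    finally out.close()

    // Phase 2: publish the finished file into the destination directory.
    val published = destDir.resolve(outName)
    Files.move(workFile, published, StandardCopyOption.REPLACE_EXISTING)

    // Phase 3: only now remove the originals. dest-dir may also be a source
    // directory, so never delete the file that was just published.
    sources.filterNot(_ == published).foreach(src => Files.deleteIfExists(src))
    published
  }
}
```

Because deletion happens only after publishing, an interruption before phase 3 leaves the original inputs in place.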

Repacker runs as a batch process, not as a daemon.

Example/Intended uses for the Repacker include:

(1) Changing how the records are packed; for example, packing by the silkAppLabel instead of the protocolIdentifier.

(2) Combining multiple files for an hour into a single file for that hour, merging hourly files into a file that covers a longer duration, or splitting a longer-duration file into smaller files.

(3) Changing the compression algorithm used on the IPFIX files.

Currently the repacker does NOT support modifying the records; it only moves the records into different files.

Repacker uses multiple threads. By default, each source directory specified on the command line gets one thread dedicated to scanning that directory and its subdirectories recursively for IPFIX files, and another thread dedicated to reading those files and repacking them. The repacker does not support having multiple threads scan a directory, but it does allow multiple threads to process a single directory's files.
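
The scanner/reader arrangement can be sketched with a `java.util.concurrent` thread pool and one queue per source directory. `ScanReadSketch`, `listFiles`, and `process` are hypothetical stand-ins for the recursive IPFIX scan and the per-file repacking; the queue-plus-end-marker hand-off is an assumption for illustration, not a description of Mothra internals.

```scala
import java.util.concurrent.{Executors, LinkedBlockingQueue, TimeUnit}

object ScanReadSketch {
  /** Illustrative threading model: one scanner thread per source directory feeds
    * a queue that `readersPerScanner` reader threads drain. */
  def run(sourceDirs: Seq[String], readersPerScanner: Int,
          listFiles: String => Seq[String], process: String => Unit): Unit = {
    val pool = Executors.newFixedThreadPool(math.max(1, sourceDirs.size * (1 + readersPerScanner)))
    for (dir <- sourceDirs) {
      // None is the end-of-scan marker; Some(file) is work for a reader.
      val queue = new LinkedBlockingQueue[Option[String]]()
      // Exactly one scanner per directory: a directory is never scanned by two threads.
      pool.execute(() => {
        try listFiles(dir).foreach(f => queue.put(Some(f)))
        finally (1 to readersPerScanner).foreach(_ => queue.put(None)) // one marker per reader
      })
      // Several readers may share one directory's queue.
      for (_ <- 1 to readersPerScanner) pool.execute(() => {
        var next = queue.take()
        while (next.isDefined) { process(next.get); next = queue.take() }
      })
    }
    pool.shutdown()
    pool.awaitTermination(1, TimeUnit.MINUTES)
  }
}
```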

The <work-dir> must NOT be a source directory or a subdirectory of a source directory. To repack the files in an existing working directory, use a different working directory. The repacker ignores any files in the <work-dir> that exist when the repacker is started, and it ignores files placed there by other programs.
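
A check for the work-dir restriction might look like the following sketch. It assumes local `java.nio.file` paths rather than Hadoop URIs, and the name `WorkDirCheck.insideSource` is hypothetical. `Path.startsWith` compares whole path components, so `/data/src2` is not treated as being inside `/data/src`; symbolic links are not resolved.

```scala
import java.nio.file.{Path, Paths}

object WorkDirCheck {
  /** True when `workDir` equals, or lies beneath, any source directory.
    * Paths are made absolute and normalized so "other/../src/work" is caught. */
  def insideSource(workDir: Path, sourceDirs: Seq[Path]): Boolean = {
    val w = workDir.toAbsolutePath.normalize
    sourceDirs.exists(s => w.startsWith(s.toAbsolutePath.normalize))
  }
}
```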

The property values that are used by the repacker are:

mothra.repacker.compression -- the compression algorithm used for the new IPFIX files. Values typically supported by Hadoop include bzip2, gzip, lz4, lzo, lzop, snappy, and default. The empty string indicates no compression.
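
The rule for the compression property (unset means the default, empty string means no compression) can be sketched as follows. `CompressionSetting` is a hypothetical name, and `fallback` stands in for `CorePacker.DEFAULT_COMPRESSION`, whose value is not stated here. Turning the resulting name into a codec would be left to Hadoop, for example through `CompressionCodecFactory.getCodecByName`.

```scala
object CompressionSetting {
  val Property = "mothra.repacker.compression"

  /** None means "write uncompressed"; Some(name) is a codec name still to be
    * resolved by Hadoop. `fallback` is a stand-in for CorePacker.DEFAULT_COMPRESSION. */
  def codecName(props: Map[String, String], fallback: String): Option[String] = {
    val value = props.getOrElse(Property, fallback).trim
    if (value.isEmpty) None else Some(value)
  }
}
```

In a running JVM the map could come from `sys.props.toMap`.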

mothra.repacker.hoursPerFile -- The number of hours covered by each file in the repository. The valid range is 1 (a file for each hour) to 24 (one file per day). The default is 1.
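
The sketch below validates `hoursPerFile` against the documented range and computes which file a record would fall into. The bucketing rule, with buckets aligned to midnight UTC so that `hoursPerFile = 6` gives files starting at 00:00, 06:00, 12:00, and 18:00, is an assumption made for illustration; the actual layout is governed by the partitioning configuration. `HourBucket` is a hypothetical name.

```scala
import java.time.{Instant, ZoneOffset}
import java.time.temporal.ChronoUnit

object HourBucket {
  /** Parse mothra.repacker.hoursPerFile: default 1, valid range 1 to 24. */
  def hoursPerFile(raw: Option[String]): Int = {
    val h = raw.map(_.trim.toInt).getOrElse(1)
    require(h >= 1 && h <= 24, s"hoursPerFile must be between 1 and 24, got $h")
    h
  }

  /** ASSUMED bucketing: start of the file that would hold a record timestamped `t`,
    * with buckets aligned to midnight UTC. */
  def bucketStart(t: Instant, hoursPerFile: Int): Instant = {
    val hourStart = t.truncatedTo(ChronoUnit.HOURS)
    val hourOfDay = hourStart.atZone(ZoneOffset.UTC).getHour
    hourStart.minus((hourOfDay % hoursPerFile).toLong, ChronoUnit.HOURS)
  }
}
```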

mothra.repacker.maxScanJobs -- the maximum number of threads dedicated to scanning the source directories. The default (and maximum) value is the number of source directories.

mothra.repacker.readersPerScanner -- the number of reader/repacker threads to create for each source directory. The default is 1.

mothra.repacker.maxThreads -- the maximum number of worker (scanner and repacker) threads to create. The default value is computed using the formula: (maxScanJobs * (1 + readersPerScanner)).
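
The defaults for the three thread-related properties combine as follows. This sketch (with the hypothetical names `ThreadDefaults` and `Threads`) reads the values from a plain `Map` rather than Java system properties and performs no validation beyond the documented caps.

```scala
object ThreadDefaults {
  final case class Threads(maxScanJobs: Int, readersPerScanner: Int, maxThreads: Int)

  def resolve(numSourceDirs: Int, props: Map[String, String]): Threads = {
    def intProp(name: String): Option[Int] =
      props.get(s"mothra.repacker.$name").map(_.trim.toInt)

    // maxScanJobs: default, and ceiling, is the number of source directories.
    val scanners = intProp("maxScanJobs").fold(numSourceDirs)(n => math.min(n, numSourceDirs))
    // readersPerScanner: default 1.
    val readers = intProp("readersPerScanner").getOrElse(1)
    // maxThreads: default maxScanJobs * (1 + readersPerScanner); larger values have no effect.
    val ceiling = scanners * (1 + readers)
    val threads = intProp("maxThreads").fold(ceiling)(n => math.min(n, ceiling))
    Threads(scanners, readers, threads)
  }
}
```

For example, three source directories with `readersPerScanner` set to 2 give 3 * (1 + 2) = 9 threads.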

mothra.repacker.maximumSize -- the (approximate) maximum file size to create. When specified, a work file that exceeds this size is closed and moved into the repository. NOTES: (1) This value uses the uncompressed file size and does not consider any compression that may occur when the file is moved from the workDir to the tgtDir. In addition, a file's size tends to grow in large steps because of buffering by the Java stream code. (2) Specifying a maximumSize may temporarily cause duplicate records to appear in the repository, since some records exist both in the original files and in the newly written file. Once Repacker finishes scanning all files, the original files are removed and only the newly packed files remain. This issue of temporarily having duplicate records in the repository will be resolved in a future release.
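
The size-based rollover can be modeled as a running byte count that closes the current file once the count exceeds the limit, so a file overshoots by at most one IPFIX message. This is a hypothetical sketch (`SizeRollover` is an invented name) that counts uncompressed bytes per message and ignores the stream buffering mentioned above.

```scala
object SizeRollover {
  /** Returns the byte sizes of the files produced when messages of the given sizes
    * are appended in order. A file is closed as soon as its size EXCEEDS the limit;
    * with no limit everything lands in one file. */
  def split(messageSizes: Seq[Int], maximumSize: Option[Long]): Vector[Long] = {
    var closed = Vector.empty[Long]
    var current = 0L
    for (size <- messageSizes) {
      current += size
      if (maximumSize.exists(current > _)) { closed :+= current; current = 0L }
    }
    if (current > 0) closed :+ current else closed
  }
}
```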

mothra.repacker.archiveDirectory -- the root directory into which working files are moved after the repacker has finished running, as a Hadoop URI. If not specified, the working files are deleted.
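
What happens to a finished working file can be sketched as a move-or-delete decision on `archiveDirectory`. Assumptions: local `java.nio.file` paths stand in for Hadoop URIs, the archive mirrors each file's path relative to the work directory, and `WorkFileDisposal` is a hypothetical name.

```scala
import java.nio.file.{Files, Path, StandardCopyOption}

object WorkFileDisposal {
  /** Move `workFile` under the archive root (keeping its path relative to
    * `workDir`), or delete it when no archive directory is configured.
    * Returns the archived location, if any. */
  def dispose(workDir: Path, workFile: Path, archiveDir: Option[Path]): Option[Path] =
    archiveDir match {
      case Some(root) =>
        val target = root.resolve(workDir.relativize(workFile))
        Files.createDirectories(target.getParent)
        Some(Files.move(workFile, target, StandardCopyOption.REPLACE_EXISTING))
      case None =>
        Files.deleteIfExists(workFile)
        None
    }
}
```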

mothra.repacker.fileCacheSize -- The maximum size of the open file cache. This is the maximum number of open files maintained by the file cache for writing to files in the work directory. The repacker does not limit the number of files in the work directory; this only limits the number of open files. Once the cache reaches this number of open files and the packer needs to (re-)open a file, the packer closes the least-recently-used file. This value does not include the file handles required when reading incoming files or when copying files from the work directory to the data directory. The default is 2000; the minimum permitted is 128.
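
An open-file cache that closes the least-recently-used file when full is commonly built on `java.util.LinkedHashMap` in access order with an overridden `removeEldestEntry`. The sketch below uses that technique; `OpenFileCache` and the `open`/`close` callbacks are hypothetical stand-ins for the packer's output streams, and only the minimum of 128 is taken from the description above.

```scala
import java.util.{LinkedHashMap => JLinkedHashMap, Map => JMap}

/** Hypothetical LRU cache of open writers, bounded by fileCacheSize. */
final class OpenFileCache[W <: AnyRef](maxOpen: Int, open: String => W, close: W => Unit) {
  require(maxOpen >= 128, s"fileCacheSize must be at least 128, got $maxOpen")

  // accessOrder = true keeps the least-recently-used entry at the head.
  private val cache = new JLinkedHashMap[String, W](16, 0.75f, true) {
    override protected def removeEldestEntry(eldest: JMap.Entry[String, W]): Boolean = {
      val evict = size() > maxOpen
      if (evict) close(eldest.getValue) // close the least-recently-used file
      evict
    }
  }

  /** Return the writer for `path`, (re-)opening it if it was evicted or never opened. */
  def get(path: String): W =
    Option(cache.get(path)).getOrElse { val w = open(path); cache.put(path, w); w }

  def openCount: Int = cache.size()
}
```

Evicting only closes the stream; the file itself stays in the work directory and is reopened on its next use.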

Linear Supertypes
StrictLogging, App, DelayedInit, AnyRef, Any

Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##: Int
    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  4. val archiveDir: Option[Path]
  5. final def args: Array[String]
    Attributes
    protected
    Definition Classes
    App
  6. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  7. def clone(): AnyRef
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.CloneNotSupportedException]) @native()
  8. val compressCodec: Option[CompressionCodec]

    The compression codec used for files written to HDFS. This may be set via the "mothra.repacker.compression" property. If that property is not set, CorePacker.DEFAULT_COMPRESSION is used.

  9. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  10. def equals(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef → Any
  11. final val executionStart: Long
    Definition Classes
    App
  12. val fileCacheSize: Int

    The maximum number of open files maintained by the file cache. This is determined by the mothra.repacker.fileCacheSize Java property, or by CorePacker.DEFAULT_FILE_CACHE_SIZE when the property is not set. This value must be no less than CorePacker.MINIMUM_FILE_CACHE_SIZE.

    See also

    CorePacker.DEFAULT_FILE_CACHE_SIZE for a full description of this value.

  13. def finalize(): Unit
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.Throwable])
  14. final def getClass(): Class[_ <: AnyRef]
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  15. implicit val hadoopConf: Configuration
  16. def hashCode(): Int
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  17. val hoursPerFile: Int

    The number of hours covered by each file in the repository. This is determined by the "mothra.repacker.hoursPerFile" property, or CorePacker.DEFAULT_HOURS_PER_FILE when that property is not set.

  18. val infoModel: InfoModel
  19. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  20. val logTaskCountInterval: Int

    How often to print log messages regarding the number of tasks, in seconds.

  21. val logger: Logger
    Attributes
    protected
    Definition Classes
    StrictLogging
  22. final def main(args: Array[String]): Unit
    Definition Classes
    App
  23. val maxScanJobs: Int

    maxScanJobs specifies the maximum number of scanning threads to start. Since at most one thread can scan a directory, the default is to create 1 scanner per srcDir. Setting this to a value larger than the number of source directories has no effect. This may be modified by setting the mothra.repacker.maxScanJobs property.

  24. val maxThreads: Int

    maxThreads specifies the maximum number of scanning and reader/repacker threads to start. By default this is

    (maxScanJobs * (1 + readersPerScanner))

    Setting it to a value larger than that has no effect.

    This may be modified by setting the mothra.repacker.maxThreads property.

  25. val maximumSize: Option[Long]

    The (approximate) maximum file size to create. Typically a file's size will not exceed this value by more than the maximum size of an IPFIX message, 64k. The default is no maximum. When a file's size exceeds this value, the file is closed and a new file is started.

  26. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  27. final def notify(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  28. final def notifyAll(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  29. val packConf: PackerConfig
  30. val packLogic: PackingLogic
  31. val packer: CorePacker
  32. val positionalArgs: Array[String]
  33. var readersPerScanner: Int

    readersPerScanner specifies the number of file reader/repacker threads that are invoked per scanning thread. The default is 1. This may be modified by setting the mothra.repacker.readersPerScanner property.

  34. val removeList: ConcurrentLinkedQueue[Path]
  35. val rootDir: Path
  36. val runTimePackConf: Path
  37. var running: Boolean
  38. val sourceDirs: Array[Path]
  39. val sourceFileSystem: FileSystem
  40. val switches: Array[String]
  41. final def synchronized[T0](arg0: => T0): T0
    Definition Classes
    AnyRef
  42. def toString(): String
    Definition Classes
    AnyRef → Any
  43. def usage(full: Boolean = false): Unit
  44. def version(): Unit
  45. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException])
  46. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException])
  47. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException]) @native()
  48. val workDir: Path

Deprecated Value Members

  1. def delayedInit(body: => Unit): Unit
    Definition Classes
    App → DelayedInit
    Annotations
    @deprecated
    Deprecated

    (Since version 2.11.0) the delayedInit mechanism will disappear
