Object

org.cert.netsa.mothra.tools

RepackerMain

Related Doc: package tools


object RepackerMain extends App with StrictLogging

Object to implement the Repacker application.

Typical Usage in a Spark environment:

spark-submit --class org.cert.netsa.mothra.tools.RepackerMain mothra-tools.jar <partition-conf> <dest-dir> <work-dir> <s1> [<s2> <s3> ...]

where:

partition-conf: Partitioning configuration file as Hadoop URI

dest-dir: Root destination directory as Hadoop URI

work-dir: Working directory on the local disk, given as a plain path (not a file:// URI)

s1..sn: Source directories as Hadoop URIs

Makes a single recursive scan of the source directories <s1>, <s2>, ... for IPFIX files. Splits the IPFIX records in the source files into output file(s) in a time-based directory structure based on the partitioning rules in the partitioning configuration file <partition-conf>. The output files are initially created in the working directory <work-dir> and, once ALL input files have been read, are moved to the destination directory, and the initial source files are removed. The dest-dir may be a source directory.
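The order of the positional arguments can be illustrated with a minimal Scala sketch. It is not the Repacker's actual argument handling: RepackerArgs, RepackerArgsExample, and parse are hypothetical names introduced here, and the sketch only assumes that the first three arguments are fixed and that at least one source directory must follow them.

```scala
object RepackerArgsExample {
  // Hypothetical holder for the positional arguments; not part of the Mothra API.
  final case class RepackerArgs(
      partitionConf: String,
      destDir: String,
      workDir: String,
      sourceDirs: List[String])

  // Splits the arguments in the order given in the usage line:
  // <partition-conf> <dest-dir> <work-dir> <s1> [<s2> <s3> ...]
  def parse(args: Seq[String]): Either[String, RepackerArgs] = args match {
    case Seq(conf, dest, work, sources @ _*) if sources.nonEmpty =>
      Right(RepackerArgs(conf, dest, work, sources.toList))
    case _ =>
      Left("expected <partition-conf> <dest-dir> <work-dir> <s1> [<s2> <s3> ...]")
  }
}
```

Calling RepackerArgsExample.parse with fewer than four arguments yields a Left carrying the usage text, mirroring the requirement that at least one source directory is named.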

Repacker runs as a batch process, not as a daemon.

Intended uses for the Repacker include:

(1) Changing how the records are packed, for example packing by the silkAppLabel instead of the protocolIdentifier.

(2) Combining multiple files for an hour into a single file for that hour, merging hourly files into a file that covers a longer duration, or splitting a longer duration file into smaller files.

(3) Changing the compression algorithm used on the IPFIX files.

Currently the repacker does NOT support modifying the records; it only moves the records into different files.

Repacker uses multiple threads. By default, each source directory specified on the command line gets one thread dedicated to scanning that directory and its subdirectories recursively for IPFIX files, and another thread dedicated to reading those files and repacking them. The repacker does not support having multiple threads scan a directory, but it does allow multiple threads to process a single directory's files.
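The default thread counts follow from simple arithmetic, sketched below in Scala. This is an illustration, not the Repacker's code: ThreadBudgetExample and its method names are hypothetical, and the formula is the one given in the description of the mothra.repacker.maxThreads property.

```scala
object ThreadBudgetExample {
  // Scanner threads: one per source directory, unless a lower cap is requested.
  // A cap above the number of source directories has no effect.
  def scanJobs(sourceDirs: Int, maxScanJobs: Option[Int]): Int =
    maxScanJobs.fold(sourceDirs)(cap => math.min(cap, sourceDirs))

  // Default worker-thread total: each scanner plus its reader/repacker threads.
  def defaultMaxThreads(scanJobs: Int, readersPerScanner: Int): Int =
    scanJobs * (1 + readersPerScanner)
}
```

With three source directories and the default of one reader per scanner, this gives three scanner threads and six worker threads in total.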

The <work-dir> must NOT be a source directory or a subdirectory of a source directory. To repack the files in an existing working directory, use a different working directory. The repacker ignores any files in the <work-dir> that exist when the repacker is started, and it ignores files placed there by other programs.
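The restriction on <work-dir> can be checked before launching a job. The following Scala sketch is one way to do that and is not taken from the Repacker itself: WorkDirCheckExample and its methods are hypothetical, and it assumes that a source given without a URI scheme refers to the local disk (in a real Hadoop setup that depends on the configured default file system).

```scala
import java.net.URI
import java.nio.file.{Path, Paths}

object WorkDirCheckExample {
  // Local path of a source URI, or None when the source is not on the local disk.
  // Assumption: a source without a scheme is treated as a local path.
  def localPath(source: String): Option[Path] = {
    val uri = URI.create(source)
    Option(uri.getScheme) match {
      case None         => Some(Paths.get(source))
      case Some("file") => Some(Paths.get(uri.getPath))
      case Some(_)      => None
    }
  }

  // True when workDir equals a local source directory or lies beneath one.
  def workDirInsideSource(workDir: String, sources: Seq[String]): Boolean = {
    val work = Paths.get(workDir).toAbsolutePath.normalize
    sources.flatMap(localPath).exists { src =>
      // Path.startsWith compares whole path components, so /data/in2
      // is not considered to be inside /data/in.
      work.startsWith(src.toAbsolutePath.normalize)
    }
  }
}
```

A wrapper script could refuse to start the Repacker whenever workDirInsideSource returns true.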

The property values that are used by the repacker are:

mothra.repacker.compression -- the compression algorithm used for the new IPFIX files. Values typically supported by Hadoop include bzip2, gzip, lz4, lzo, lzop, snappy, and default. The empty string indicates no compression.

mothra.repacker.hoursPerFile -- The number of hours covered by each file in the repository. The valid range is 1 (a file for each hour) to 24 (one file per day). The default is 1.

mothra.repacker.maxScanJobs -- the maximum number of threads dedicated to scanning the source directories. The default (and maximum) value is the number of source directories.

mothra.repacker.readersPerScanner -- the number of reader/repacker threads to create for each source directory. The default is 1.

mothra.repacker.maxThreads -- the maximum number of worker (scanner and repacker) threads to create. The default value is computed using the formula: (maxScanJobs * (1 + readersPerScanner)).

mothra.repacker.maximumSize -- the (approximate) maximum file size to create. When specified, a work-file that exceeds this size is closed and moved into the repository. NOTES: (1) This value uses the uncompressed file size and does not consider any compression that may occur when the file is moved from the workDir to the tgtDir. In addition, a file's size tends to grow in large steps because of buffering by the Java stream code. (2) Specifying a maximumSize may temporarily cause duplicate records to appear in the repository, because some records are present both in the original files and in the new file. Once Repacker finishes scanning all files, the original files are removed and only the newly packed files are left. This issue of temporarily having duplicate records in the repository will be resolved in a future release.

mothra.repacker.archiveDirectory -- the root directory into which working files are moved after the repacker has finished running, as a Hadoop URI. If not specified, the working files are deleted.

mothra.repacker.fileCacheSize -- The maximum size of the open file cache. This is the maximum number of open files maintained by the file cache for writing to files in the work directory. The repacker does not limit the number of files in the work directory; this only limits the number of open files. Once the cache reaches this number of open files and the packer needs to (re-)open a file, the packer closes the least-recently-used file. This value does not include the file handles required when reading incoming files or when copying files from the work directory to the data directory. The default is 2000; the minimum permitted is 128.
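The documented defaults and ranges for two of these properties can be expressed as a short Scala sketch. It is only an illustration: RepackerPropsExample and its methods are hypothetical, and whether the real Repacker rejects or adjusts an out-of-range value is not stated above, so rejecting it with require is an assumption.

```scala
object RepackerPropsExample {
  // Reads an integer property, falling back to `default` when it is unset.
  def intProp(props: Map[String, String], name: String, default: Int): Int =
    props.get(name).map(_.trim.toInt).getOrElse(default)

  // Hours per file: documented range is 1 to 24, default 1.
  def hoursPerFile(props: Map[String, String]): Int = {
    val hours = intProp(props, "mothra.repacker.hoursPerFile", 1)
    require(hours >= 1 && hours <= 24, s"hoursPerFile must be in 1..24, got $hours")
    hours
  }

  // Open file cache size: documented default is 2000, minimum 128.
  def fileCacheSize(props: Map[String, String]): Int = {
    val size = intProp(props, "mothra.repacker.fileCacheSize", 2000)
    require(size >= 128, s"fileCacheSize must be at least 128, got $size")
    size
  }
}
```

To apply the sketch to the running JVM's properties, pass sys.props.toMap, for example RepackerPropsExample.hoursPerFile(sys.props.toMap).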

Linear Supertypes
StrictLogging, App, DelayedInit, AnyRef, Any

Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  4. val archiveDir: Option[Path]

  5. def args: Array[String]

    Attributes
    protected
    Definition Classes
    App
    Annotations
    @deprecatedOverriding( "args should not be overridden" , "2.11.0" )
  6. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  7. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  8. val compressCodec: Option[CompressionCodec]


    The compression codec used for files written to HDFS. This may be set by setting the "mothra.repacker.compression" property. If that property is not set, CorePacker.DEFAULT_COMPRESSION is used.

  9. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  10. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  11. val executionStart: Long

    Definition Classes
    App
  12. val fileCacheSize: Int


    The maximum number of open files maintained by the file cache. This is determined by the mothra.repacker.fileCacheSize Java property, or by CorePacker.DEFAULT_FILE_CACHE_SIZE when the property is not set. This value must be no less than CorePacker.MINIMUM_FILE_CACHE_SIZE.

    See also

    CorePacker.DEFAULT_FILE_CACHE_SIZE for a full description of this value.

  13. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  14. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  15. implicit val hadoopConf: Configuration

  16. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  17. val hoursPerFile: Int


    The number of hours covered by each file in the repository. This is determined by the "mothra.repacker.hoursPerFile" property, or CorePacker.DEFAULT_HOURS_PER_FILE when that property is not set.

  18. val infoModel: InfoModel

  19. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  20. val logTaskCountInterval: Int


    How often to print log messages regarding the number of tasks, in seconds.

  21. val logger: Logger

    Attributes
    protected
    Definition Classes
    StrictLogging
  22. def main(args: Array[String]): Unit

    Definition Classes
    App
    Annotations
    @deprecatedOverriding( "main should not be overridden" , "2.11.0" )
  23. val maxScanJobs: Int


    maxScanJobs specifies the maximum number of scanning threads to start. Since at most one thread can scan a directory, the default is to create 1 scanner per srcDir. Setting this to a value larger than the number of source directories has no effect. This may be modified by setting the mothra.repacker.maxScanJobs property.

  24. val maxThreads: Int


    maxThreads specifies the maximum number of scanning and reader/repacker threads to start. By default this is

    (maxScanJobs * (1 + readersPerScanner))

    Setting it to a value larger than that has no effect.

    This may be modified by setting the mothra.repacker.maxThreads property.

  25. val maximumSize: Option[Long]


    The (approximate) maximum size of a file to create. Typically a file's size will not exceed this value by more than the maximum size of an IPFIX message, 64 KiB. The default is no maximum. When a file's size exceeds this value, the file is closed and a new file is started.

  26. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  27. final def notify(): Unit

    Definition Classes
    AnyRef
  28. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  29. val packConf: PackerConfig

  30. val packLogic: PackingLogic

  31. val packer: CorePacker

  32. val positionalArgs: Array[String]

  33. var readersPerScanner: Int


    readersPerScanner specifies the number of file reader/repacker threads that are invoked per scanning thread. The default is 1. This may be modified by setting the mothra.repacker.readersPerScanner property.

  34. val removeList: ConcurrentLinkedQueue[Path]

  35. val rootDir: Path

  36. val runTimePackConf: Path

  37. var running: Boolean

  38. val sourceDirs: Array[Path]

  39. val sourceFileSystem: FileSystem

  40. val switches: Array[String]

  41. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  42. def toString(): String

    Definition Classes
    AnyRef → Any
  43. def usage(full: Boolean = false): Unit

  44. def version(): Unit

  45. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  46. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  47. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  48. val workDir: Path


Deprecated Value Members

  1. def delayedInit(body: ⇒ Unit): Unit

    Definition Classes
    App → DelayedInit
    Annotations
    @deprecated
    Deprecated

    (Since version 2.11.0) The delayedInit mechanism will disappear.
