object RepackerMain extends App with StrictLogging
Object to implement the Repacker application.
Typical Usage in a Spark environment:
spark-submit --class org.cert.netsa.mothra.packer.tools.RepackerMain mothra-tools.jar
<partition-conf> <dest-dir> <work-dir> <s1> [<s2> <s3> ...]
where:
partition-conf: Partitioning configuration file as Hadoop URI
dest-dir: Root destination directory as Hadoop URI
work-dir: Working directory on the local disk (not file://)
s1..sn: Source directories as Hadoop URIs
Makes a single recursive scan of the source directories <s1>,<s2>,... for IPFIX files. Splits the IPFIX records in the source files into output file(s) in a time-based directory structure based on the partitioning rules in the partitioning configuration file <partition-conf>. The output files are initially created in the working directory <work-dir>, and, once ALL input files have been read, are moved to the destination directory and the initial source files removed. The dest-dir may be a source directory.
Repacker runs as a batch process, not as a daemon.
Example/Intended uses for the Repacker include:
(1) Changing how the records are packed, for example packing by the silkAppLabel instead of the protocolIdentifier.
(2) Combining multiple files for an hour into a single file for that hour, merging hourly files into a file that covers a longer duration, or splitting a longer-duration file into smaller files.
(3) Changing the compression algorithm used on the IPFIX files.
Currently the repacker does NOT support modifying the records; it only moves the records into different files.
Repacker uses multiple threads. By default, each source directory specified on the command line gets one thread dedicated to scanning that directory and its subdirectories recursively for IPFIX files, and another thread dedicated to reading those files and repacking them. The repacker does not support having multiple threads scan a directory, but it does allow multiple threads to process a single directory's files.
The <work-dir> must NOT be a source directory or a subdirectory of a source directory. To repack the files in an existing working directory, use a different working directory. The repacker ignores any files in the <work-dir> that exist when the repacker is started, and it ignores files placed there by other programs.
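The working-directory restriction can be checked before a run. The following is a minimal sketch, not the repacker's own validation: the object WorkDirCheck and its methods are hypothetical names, and the check only applies when the source directories live on the same local filesystem as the working directory (it compares paths lexically and does not resolve symbolic links).

```scala
import java.nio.file.{Path, Paths}

// Hypothetical helper; not part of the Mothra API.
object WorkDirCheck {
  /** True when `candidate` equals `root` or lies beneath it, compared
    * component by component after normalization.
    */
  def isInside(candidate: Path, root: Path): Boolean =
    candidate.toAbsolutePath().normalize()
      .startsWith(root.toAbsolutePath().normalize())

  /** Returns the source directories that contain the proposed working directory. */
  def conflicts(workDir: Path, sourceDirs: Seq[Path]): Seq[Path] =
    sourceDirs.filter(src => isInside(workDir, src))
}
```

An empty result from conflicts means the working directory is neither a source directory nor nested inside one. Because Path.startsWith compares whole path components, a directory such as /data/in2 is not treated as being inside /data/in.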
The property values that are used by the repacker are:
mothra.repacker.compression -- the compression algorithm used for the new IPFIX files. Values
typically supported by Hadoop include bzip2, gzip, lz4, lzo, lzop, snappy, and
default. The empty string indicates no compression.
mothra.repacker.hoursPerFile -- The number of hours covered by each file in the repository.
The valid range is 1 (a file for each hour) to 24 (one file per day). The default is 1.
mothra.repacker.maxScanJobs -- the maximum number of threads dedicated to scanning the source
directories. The default (and maximum) value is the number of source directories.
mothra.repacker.readersPerScanner -- the number of reader/repacker threads to create for each
source directory. The default is 1.
mothra.repacker.maxThreads -- the maximum number of worker (scanner and repacker) threads to
create. The default value is computed using the formula: (maxScanJobs * (1 +
readersPerScanner)).
mothra.repacker.maximumSize -- the (approximate) maximum file size to create. When specified,
a work-file that exceeds this size is closed and moved into the repository. NOTES: (1) This value
uses the uncompressed file size, and does not consider any compression that may occur when the
file is moved from the workDir to the tgtDir. In addition, a file's size tends to grow in large
steps because of buffering by the Java stream code. (2) Specifying a maximumSize may
temporarily cause duplicate records to appear in the repository, because the same records exist
in both the original files and the new file. Once Repacker finishes scanning all files, the original
files are removed and only the newly packed files are left. This issue of temporarily having
duplicate records in the repository will be resolved in a future release.
mothra.repacker.archiveDirectory -- the root directory into which working files are moved
after the repacker has finished running, as a Hadoop URI. If not specified, the working files
are deleted.
mothra.repacker.fileCacheSize -- The maximum size of the open file cache. This is the maximum
number of open files maintained by the file cache for writing to files in the work directory.
The repacker does not limit the number of files in the work directory; this only limits the
number of open files. Once the cache reaches this number of open files and the packer needs to
(re-)open a file, the packer closes the least-recently-used file. This value does not include
the file handles required when reading incoming files or when copying files from the work
directory to the data directory. The default is 2000; the minimum permitted is 128.
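The least-recently-used behavior described for mothra.repacker.fileCacheSize can be illustrated with a small sketch. This is not the repacker's implementation: the class LruOpenFileCache, its parameters, and the close callback are hypothetical, and the sketch relies on java.util.LinkedHashMap in access order to track which handle was used least recently.

```scala
import java.util.{LinkedHashMap => JLinkedHashMap, Map => JMap}

/** Hypothetical LRU cache of open handles; `close` is called on each evicted handle. */
class LruOpenFileCache[K, V](maxOpen: Int, close: V => Unit) {
  require(maxOpen >= 1, "maxOpen must be positive")

  // accessOrder = true keeps entries ordered from least to most recently used.
  private val entries: JLinkedHashMap[K, V] =
    new JLinkedHashMap[K, V](16, 0.75f, true) {
      override def removeEldestEntry(eldest: JMap.Entry[K, V]): Boolean = {
        val evict = size() > maxOpen
        if (evict) close(eldest.getValue) // close the least-recently-used handle
        evict
      }
    }

  /** Returns the cached handle for `key`; otherwise opens one, which may
    * evict (and close) the least-recently-used handle.
    */
  def getOrOpen(key: K)(open: => V): V = synchronized {
    if (entries.containsKey(key)) entries.get(key) // get() marks the entry as recently used
    else {
      val handle = open
      entries.put(key, handle)
      handle
    }
  }

  /** Number of handles currently held open. */
  def openCount: Int = synchronized(entries.size())
}
```

With maxOpen set to 2, opening keys a and b, touching a again, and then opening c closes the handle for b, since b is then the least recently used.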
Value Members
-
final
def
!=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
final
def
##(): Int
- Definition Classes
- AnyRef → Any
-
final
def
==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- val archiveDir: Option[Path]
-
def
args: Array[String]
- Attributes
- protected
- Definition Classes
- App
- Annotations
- @deprecatedOverriding( "args should not be overridden" , "2.11.0" )
-
final
def
asInstanceOf[T0]: T0
- Definition Classes
- Any
-
def
clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()
-
val
compressCodec: Option[CompressionCodec]
The compression codec used for files written to HDFS. This may be set by setting the "mothra.repacker.compression" property. If that property is not set, CorePacker.DEFAULT_COMPRESSION is used.
-
final
def
eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
-
def
equals(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
val
executionStart: Long
- Definition Classes
- App
- Annotations
- @deprecatedOverriding( ... , "2.11.0" )
-
val
fileCacheSize: Int
The maximum number of open files maintained by the file cache. This is determined by the mothra.repacker.fileCacheSize Java property, or by CorePacker.DEFAULT_FILE_CACHE_SIZE when the property is not set. This value must be no less than CorePacker.MINIMUM_FILE_CACHE_SIZE.
- See also
CorePacker.DEFAULT_FILE_CACHE_SIZE for a full description of this value.
-
def
finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( classOf[java.lang.Throwable] )
-
final
def
getClass(): Class[_]
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
- implicit val hadoopConf: Configuration
-
def
hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
-
val
hoursPerFile: Int
The number of hours covered by each file in the repository. This is determined by the "mothra.repacker.hoursPerFile" property, or CorePacker.DEFAULT_HOURS_PER_FILE when that property is not set.
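As an illustration of how such a setting might be read, the sketch below parses the property and enforces the documented range of 1 to 24. It is only a sketch: the object HoursPerFileSetting is hypothetical, the constant Default stands in for CorePacker.DEFAULT_HOURS_PER_FILE using the documented default of 1, and how the repacker itself reports an invalid value is not stated in this documentation.

```scala
import scala.util.Try

// Hypothetical helper; not part of the Mothra API.
object HoursPerFileSetting {
  val PropertyName = "mothra.repacker.hoursPerFile"
  val Default = 1 // assumed stand-in for CorePacker.DEFAULT_HOURS_PER_FILE

  /** Parses a raw property value: absent means the default, and anything
    * that is not an integer in 1..24 is rejected with a message.
    */
  def parse(raw: Option[String]): Either[String, Int] = raw match {
    case None => Right(Default)
    case Some(text) =>
      Try(text.trim.toInt).toOption match {
        case Some(n) if n >= 1 && n <= 24 => Right(n)
        case _ =>
          Left(s"$PropertyName must be an integer from 1 to 24, got '$text'")
      }
  }

  /** Reads the value from the JVM system properties. */
  def fromSystemProperties(): Either[String, Int] =
    parse(sys.props.get(PropertyName))
}
```

Returning an Either keeps the decision about an invalid value (abort, warn, or fall back to the default) with the caller.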
- val infoModel: InfoModel
-
final
def
isInstanceOf[T0]: Boolean
- Definition Classes
- Any
-
val
logTaskCountInterval: Int
How often to print log messages regarding the number of tasks, in seconds.
-
val
logger: Logger
- Attributes
- protected
- Definition Classes
- StrictLogging
-
def
main(args: Array[String]): Unit
- Definition Classes
- App
- Annotations
- @deprecatedOverriding( "main should not be overridden" , "2.11.0" )
-
val
maxScanJobs: Int
maxScanJobs specifies the maximum number of scanning threads to start. Since at most one thread can scan a directory, the default is to create 1 scanner per srcDir. Setting this to a value larger than the number of source directories has no effect. This may be modified by setting the mothra.repacker.maxScanJobs property.
-
val
maxThreads: Int
maxThreads specifies the maximum number of scanning and reader/repacker threads to start. By default this is
(maxScanJobs * (1 + readersPerScanner))
Setting it to a value larger than that has no effect.
This may be modified by setting the mothra.repacker.maxThreads property.
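The arithmetic behind the thread limits can be sketched as follows, using the formula given in the property list for mothra.repacker.maxThreads. The object ThreadDefaults and its method names are hypothetical; the sketch only models the documented rules that at most one scanner runs per source directory and that a requested value above the default has no effect.

```scala
// Hypothetical helper; not part of the Mothra API.
object ThreadDefaults {
  /** Scanner count: one per source directory unless a smaller limit is requested. */
  def effectiveScanJobs(sourceDirCount: Int, requested: Option[Int]): Int =
    requested.fold(sourceDirCount)(n => math.min(n, sourceDirCount))

  /** Default worker count: maxScanJobs * (1 + readersPerScanner). */
  def defaultMaxThreads(maxScanJobs: Int, readersPerScanner: Int): Int =
    maxScanJobs * (1 + readersPerScanner)

  /** A requested limit larger than the default is capped at the default. */
  def effectiveMaxThreads(
      maxScanJobs: Int,
      readersPerScanner: Int,
      requested: Option[Int]
  ): Int = {
    val default = defaultMaxThreads(maxScanJobs, readersPerScanner)
    requested.fold(default)(n => math.min(n, default))
  }
}
```

For three source directories and one reader per scanner, the default is 3 * (1 + 1) = 6 worker threads; requesting 100 still yields 6, while requesting 4 yields 4.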
-
val
maximumSize: Option[Long]
The (approximate) maximum size file to create. Typically a file's size will not exceed this value by more than the maximum size of an IPFIX message, 64k. The default is no maximum. When a file's size exceeds this value, the file is closed and a new file is started.
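The size-based rollover can be pictured with a short sketch. This is an illustration under stated assumptions rather than the repacker's code: the class RollingWriter and the newOutput factory are hypothetical, and each call to write is assumed to carry one complete message that is never split across files, which is why a file can exceed the limit by at most one message.

```scala
import java.io.OutputStream

/** Hypothetical writer that starts a new output once the current one has
  * grown past `maximumSize`. `None` means no limit.
  */
class RollingWriter(maximumSize: Option[Long], newOutput: () => OutputStream) {
  private var current: OutputStream = newOutput()
  private var written: Long = 0L
  private var filesStarted: Int = 1

  /** Writes one whole message, first rolling over if the current output
    * already exceeds the limit.
    */
  def write(message: Array[Byte]): Unit = {
    if (maximumSize.exists(limit => written > limit)) {
      current.close()
      current = newOutput()
      written = 0L
      filesStarted += 1
    }
    current.write(message)
    written += message.length
  }

  /** Number of outputs opened so far. */
  def fileCount: Int = filesStarted

  def close(): Unit = current.close()
}
```

With a limit of 10 bytes and three 6-byte messages, the first output receives 12 bytes (it only exceeds the limit after the second message) and the second output receives 6.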
-
final
def
ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
-
final
def
notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
-
final
def
notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
- val packConf: PackerConfig
- val packLogic: PackingLogic
- val packer: CorePacker
- val positionalArgs: Array[String]
-
var
readersPerScanner: Int
readersPerScanner specifies the number of file reader/repacker threads that are invoked per scanning thread. The default is 1. This may be modified by setting the mothra.repacker.readersPerScanner property.
- val removeList: ConcurrentLinkedQueue[Path]
- val rootDir: Path
- val runTimePackConf: Path
- var running: Boolean
- val sourceDirs: Array[Path]
- val sourceFileSystem: FileSystem
- val switches: Array[String]
-
final
def
synchronized[T0](arg0: ⇒ T0): T0
- Definition Classes
- AnyRef
-
def
toString(): String
- Definition Classes
- AnyRef → Any
- def usage(full: Boolean = false): Unit
- def version(): Unit
-
final
def
wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
-
final
def
wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
-
final
def
wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()
- val workDir: Path