Packages

package tools


Package Members

  1. package collector
  2. package silk_appender

Type Members

  1. class CollectorOptions extends ToolOptions
  2. class FileJoinerOptions extends ToolOptions
  3. class FileSanitizerOptions extends ToolOptions
  4. class InvariantPackerOptions extends ToolOptions
  5. class PackerOptions extends ToolOptions
  6. class RepackerOptions extends ToolOptions
  7. class RollupDayOptions extends ToolOptions
  8. class SilkAppenderOptions extends ToolOptions
  9. abstract class Tool extends AnyRef
  10. abstract class ToolOptions extends OptionParser[Unit]

Value Members

  1. object Collector extends Tool

    Wrapper to provide a better command-line argument experience over the top of the main packer class. Things should be folded together in the future.

  2. object FileJoiner extends Tool

    Wrapper to provide a better command-line argument experience over the top of the main packer class. Things should be folded together in the future.

  3. object FileJoinerMain extends App with StrictLogging

    Object to implement the FileJoiner application.

    Typical Usage in a Spark environment:

    spark-submit --class org.cert.netsa.mothra.packer.tools.FileJoinerMain mothra-tools.jar <s1> [<s2> <s3> ...]

    where:

    s1..sn: Directories to process, as Hadoop URIs

    FileJoiner reduces the number of data files in a Mothra repository. It may also be used to modify the files' compression.

    FileJoiner runs as a batch process, not as a daemon.

    FileJoiner makes a single recursive scan of the source directories <s1>, <s2>, ... for files whose names match the pattern "YYYYMMDD.HH." or "YYYYMMDD.HH-PTddH." (It looks for files matching the regular expression ^\d{8}\.\d{2}(?:-PT\d\d?H)?\.) Files whose names match that pattern are processed by FileJoiner to create a single new file in the same directory that has the same prefix as the originals, and then the original file(s) are removed.
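    The prefix pattern above can be checked with a few lines of Java. This is an illustrative sketch; the file names shown are invented for the example.

```java
import java.util.regex.Pattern;

public class FileNameCheck {
    // Regex from the FileJoiner documentation: files whose names begin with
    // "YYYYMMDD.HH." or "YYYYMMDD.HH-PTddH." are candidates for joining.
    static final Pattern PREFIX =
        Pattern.compile("^\\d{8}\\.\\d{2}(?:-PT\\d\\d?H)?\\.");

    static boolean isCandidate(String name) {
        return PREFIX.matcher(name).find();
    }

    public static void main(String[] args) {
        System.out.println(isCandidate("20230115.07.abcd1234"));      // hourly file: matches
        System.out.println(isCandidate("20230115.07-PT4H.abcd1234")); // 4-hour file: matches
        System.out.println(isCandidate("notes.txt"));                 // does not match
    }
}
```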

    By default, files that share the same prefix are only processed when there are two or more files. To force re-writing when there is a single file, set the Java property mothra.filejoiner.minCountToJoin to a value less than 2. The property may also be used to create a new file only when an "excessive" number of files share the same prefix.

    There is always a single thread that recursively scans the directories. The number of threads that join the files may be set by specifying the mothra.filejoiner.maxThreads Java property. If not specified, the default is 6.

    FileJoiner may be run so that it spawns either a thread for every directory that contains files to be joined or a thread for each set of files in a directory that share the same prefix. The behavior is controlled by whether the mothra.filejoiner.spawnThread Java property is set to by-prefix or by-directory. The default is by-directory. (For backwards compatibility, by-hour is an alias for by-prefix.)

    By default, FileJoiner does not compress the files it writes. (NOTE: It should support writing the output using the same compression as the input.) To specify the compression codec that it should use, specify the mothra.filejoiner.compression Java property. Values typically supported by Hadoop include bzip2, gzip, lz4, lzo, lzop, snappy, and default. The empty string indicates no compression.

    FileJoiner joins files sharing the same prefix into a single file by default. The mothra.filejoiner.maximumSize Java property may be used to limit the maximum file size. The size is for the compressed file if compression is active. The value is approximate since it is only checked after the data appears on disk, which occurs in large blocks because of buffering by the Java stream code and the compression algorithm. (By setting mothra.filejoiner.maximumSize and setting mothra.filejoiner.minCountToJoin to 1, you can force large files to be split into smaller ones, turning FileJoiner into a file-splitter.)
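    The file-splitter trick described above can be combined with the other properties in a single invocation. The paths, jar location, and property values below are hypothetical; adjust them for your site.

```shell
# Rewrite even single files (minCountToJoin=1), cap each output at
# roughly 1 GiB, gzip-compress the results, and use 6 joiner threads.
spark-submit \
  --class org.cert.netsa.mothra.packer.tools.FileJoinerMain \
  --driver-java-options "-Dmothra.filejoiner.minCountToJoin=1 \
    -Dmothra.filejoiner.maximumSize=1073741824 \
    -Dmothra.filejoiner.compression=gzip \
    -Dmothra.filejoiner.maxThreads=6" \
  mothra-tools.jar hdfs:///mothra/repo/flow
```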

  4. object FileSanitizer extends Tool

    Wrapper to provide a better command-line argument experience over the top of the main packer class. Things should be folded together in the future.

  5. object FileSanitizerMain extends App with StrictLogging

    Object to implement the FileSanitizer application.

    Typical Usage in a Spark environment:

    spark-submit --class org.cert.netsa.mothra.packer.tools.FileSanitizerMain mothra-tools.jar <f1>[,<f2>[,<f3>...]] <s1> [<s2> <s3> ...]

    where:

    f1..fn: Names of InfoElements to be removed from the files

    s1..sn: Directories to process, as Hadoop URIs

    FileSanitizer removes Information Element fields from the data files in a Mothra repository. In addition, when multiple files share the same name except for the UUID, FileSanitizer combines those files together.

    The IE fields to be removed must be specified in a single argument, as a comma-separated list of names, such as sourceTransportPort,destinationTransportPort.

    Each remaining argument is a single directory to process.

    FileSanitizer runs as a batch process, not as a daemon.

    FileSanitizer makes a single recursive scan of the source directories <s1>, <s2>, ... for files whose names match the pattern "YYYYMMDD.HH." or "YYYYMMDD.HH-PTddH." (It looks for files matching the regular expression ^\d{8}\.\d{2}(?:-PT\d\d?H)?\.) Files whose names match that pattern are processed by FileSanitizer to remove the named Information Elements. All files where the regular expression matched the same string are joined into a single file, similar to the FileJoiner. Finally, the original files are removed.

    There is always a single thread that recursively scans the directories. The number of threads that sanitize and join the files may be set by specifying the mothra.filesanitizer.maxThreads Java property. If not specified, the default is 6.

    FileSanitizer may be run so that it spawns either a thread for every directory that contains files to process or a thread for each set of files in a directory that share the same prefix. The behavior is controlled by whether the mothra.filesanitizer.spawnThread Java property is set to by-prefix or by-directory. The default is by-directory. (For backwards compatibility, by-hour is an alias for by-prefix.)

    By default, FileSanitizer does not compress the files it writes. (NOTE: It should support writing the output using the same compression as the input.) To specify the compression codec that it should use, specify the mothra.filesanitizer.compression Java property. Values typically supported by Hadoop include bzip2, gzip, lz4, lzo, lzop, snappy, and default. The empty string indicates no compression.

    FileSanitizer joins the files sharing the same prefix into a single file by default. The mothra.filesanitizer.maximumSize Java property may be used to limit the maximum file size. The size is for the compressed file if compression is active. The value is approximate since it is only checked after the data appears on disk, which occurs in large blocks because of buffering by the Java stream code and the compression algorithm.

  6. object InvariantPacker extends Tool

    Wrapper to provide a better command-line argument experience over the top of the main packer class. Things should be folded together in the future.

  7. object InvariantPackerMain extends App with StrictLogging

    Object to implement the InvariantPacker application.

    Typical Usage in a Spark environment:

    spark-submit --class org.cert.netsa.mothra.packer.tools.InvariantPackerMain mothra-tools.jar [--one-shot] <sourceDir> <destinationDir> <partitionerFile>

    Processes files created by super_mediator running in invariant mode and writes them into HDFS.

  8. object Packer extends Tool

    Wrapper to provide a better command-line argument experience over the top of the main packer class. Things should be folded together in the future.

  9. object PackerMain extends App with StrictLogging

    Object to implement the Packer application.

    Typical usage in a Spark environment:

    spark-submit --class org.cert.netsa.mothra.packer.tools.PackerMain mothra-tools.jar [--one-shot] <srcDir> <destDir> <workDir> <partitioner>

    where:

    srcDir: Source (incoming) directory as Hadoop URI

    destDir: Destination directory as Hadoop URI

    workDir: Working directory on the local disk (not file://)

    partitioner: Partitioning file as Hadoop URI

    Packer scans the source directory (srcDir) for IPFIX files. It splits the IPFIX records in each file into output file(s) in a time-based directory structure based on the partitioning rules in the partitioning file (partitioner). The output files are initially created in the working directory (workDir), and when they meet size and/or age thresholds, they are moved to the destination directory (destDir).

    If "--one-shot" is included on the command line, the srcDir is scanned only once. Once all files in srcDir have been packed (or have failed to be packed after the configured number of attempts), the packer exits.

    The Java property values that are used by Packer are:

    mothra.packer.compression -- The compression to use for files written to HDFS. Values typically supported by Hadoop include bzip2, gzip, lz4, lzo, lzop, snappy, and default. The empty string indicates no compression. The default is no compression.

    mothra.packer.maxPackJobs -- The size of the thread pool that determines the maximum number of input files that may be processed simultaneously. A larger value provides more throughput. The default is 1.

    mothra.packer.hoursPerFile -- The number of hours covered by each file in the repository. The valid range is 1 (a file for each hour) to 24 (one file per day). The default is 1.

    mothra.packer.pollingInterval -- How long the main thread sleeps (in seconds) between scans (polls) of the source directory checking for IPFIX files to process. The default is 30.

    mothra.packer.workDir.checkInterval -- How often, in seconds, to check the sizes and ages of the files in the working directory. The default is 60. At each check, files that meet ONE of the following criteria are closed and moved into the data repository:

    --- Files that were created more than maximumAge seconds ago. Since files are only checked at this interval, a file could potentially be one interval older than the maximumAge.

    --- Files whose size exceeds maximumSize. Since a file's size is not continuously monitored, a file could be larger than this size, and the user should set this value appropriately.

    --- Files whose size is at least minimumSize AND that were created at least minimumAge seconds ago.
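    The three criteria above can be sketched as a single predicate. This is an illustrative sketch using the documented default thresholds, not the packer's actual code.

```java
public class WorkFilePolicy {
    // Documented defaults for the workDir properties.
    static final long MAX_AGE_SECS = 1800L;       // mothra.packer.workDir.maximumAge
    static final long MAX_SIZE     = 104857600L;  // mothra.packer.workDir.maximumSize
    static final long MIN_AGE_SECS = 600L;        // mothra.packer.workDir.minimumAge
    static final long MIN_SIZE     = 67108864L;   // mothra.packer.workDir.minimumSize

    // A work file is closed and moved when ANY one criterion holds:
    // it is too old, it is too big, or it meets both minimums.
    static boolean shouldMove(long ageSecs, long sizeBytes) {
        return ageSecs > MAX_AGE_SECS
            || sizeBytes > MAX_SIZE
            || (sizeBytes >= MIN_SIZE && ageSecs >= MIN_AGE_SECS);
    }
}
```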

    mothra.packer.workDir.maximumAge -- Files in the working directory that were created over this number of seconds ago are always moved into the repository, regardless of their size. The default value is 1800 seconds (30 minutes).

    mothra.packer.workDir.maximumSize -- Files in the working directory whose size, in octets, is greater than this value are always moved into the repository, regardless of their age. The default value is 104857600 bytes (100MiB).

    mothra.packer.workDir.minimumAge -- Files in the working directory are NOT eligible to be moved into the repository if they are younger than this age (were created fewer than this number of seconds ago) unless their size exceeds maximumSize. The default is 600 seconds (10 minutes).

    mothra.packer.workDir.minimumSize -- Files in the working directory are NOT eligible to be moved into the repository if they are smaller than this size (in octets) unless their age exceeds maximumAge. The default is 67108864 bytes (64 MiB).

    mothra.packer.numMoveThreads -- The size of the thread pool that closes the work files and moves them to the destination directory. A task is potentially created every checkInterval seconds if files are determined to have met the limits. The default is 4.

    mothra.packer.archiveDirectory -- The root directory into which working files are moved after the packer copies their content to the repository, as a Hadoop URI. If not specified, the working files are deleted.

    mothra.packer.packAttempts -- The number of times the packer attempts to process a file found in the srcDir. After this number of failed attempts, the file is ignored by this invocation of the packer. The default is 3.

    mothra.packer.fileCacheSize -- The maximum size of the open file cache. This is the maximum number of open files maintained by the file cache for writing to files in the work directory. The packer does not limit the number of files in the work directory; this only limits the number of open files. Once the cache reaches this number of open files and the packer needs to (re-)open a file, the packer closes the least-recently-used file. This value does not include the file handles required when reading incoming files or when copying files from the work directory to the data directory. The default is 2000; the minimum permitted is 128.
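    Putting several of these properties together, a one-shot packer run might look like the following. The paths and property values are hypothetical; adjust them for your site.

```shell
# Pack an incoming directory once, gzip-compressing output files and
# archiving the processed work files instead of deleting them.
spark-submit \
  --class org.cert.netsa.mothra.packer.tools.PackerMain \
  --driver-java-options "-Dmothra.packer.compression=gzip \
    -Dmothra.packer.maxPackJobs=4 \
    -Dmothra.packer.archiveDirectory=hdfs:///mothra/archive" \
  mothra-tools.jar --one-shot \
  hdfs:///mothra/incoming hdfs:///mothra/repo /var/tmp/packer-work \
  hdfs:///mothra/conf/partitioner.conf
```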

  10. object Repacker extends Tool

    Wrapper to provide a better command-line argument experience over the top of the main packer class. Things should be folded together in the future.

  11. object RepackerMain extends App with StrictLogging

    Object to implement the Repacker application.

    Typical Usage in a Spark environment:

    spark-submit --class org.cert.netsa.mothra.packer.tools.RepackerMain mothra-tools.jar <partition-conf> <dest-dir> <work-dir> <s1> [<s2> <s3> ...]

    where:

    partition-conf: Partitioning configuration file as Hadoop URI

    dest-dir: Root destination directory as Hadoop URI

    work-dir: Working directory on the local disk (not file://)

    s1..sn: Source directories as Hadoop URIs

    Makes a single recursive scan of the source directories <s1>, <s2>, ... for IPFIX files. Splits the IPFIX records in the source files into output file(s) in a time-based directory structure based on the partitioning rules in the partitioning configuration file <partition-conf>. The output files are initially created in the working directory <work-dir> and, once ALL input files have been read, are moved to the destination directory; the original source files are then removed. The dest-dir may be a source directory.

    Repacker runs as a batch process, not as a daemon.

    Example/Intended uses for the Repacker include:

    (1) Changing how the records are packed, for example packing by the silkAppLabel instead of the protocolIdentifier.

    (2) Combining multiple files for an hour into a single file for that hour, merging hourly files into a file that covers a longer duration, or splitting a longer-duration file into smaller files.

    (3) Changing the compression algorithm used on the IPFIX files.

    Currently the repacker does NOT support modifying the records; it only moves them into different files.

    Repacker uses multiple threads. By default, each source directory specified on the command line gets a thread dedicated to scanning that directory and its subdirectories recursively for IPFIX files, and another thread dedicated to reading those files and repacking them. The repacker does not support having multiple threads scan a directory, but it does allow multiple threads to process a single directory's files.

    The <work-dir> must NOT be a source directory or a subdirectory of a source directory. To repack the files in an existing working directory, use a different working directory. The repacker ignores any files in the <work-dir> that exist when the repacker is started, and it ignores files placed there by other programs.

    The property values that are used by the repacker are:

    mothra.repacker.compression -- the compression algorithm used for the new IPFIX files. Values typically supported by Hadoop include bzip2, gzip, lz4, lzo, lzop, snappy, and default. The empty string indicates no compression.

    mothra.repacker.hoursPerFile -- The number of hours covered by each file in the repository. The valid range is 1 (a file for each hour) to 24 (one file per day). The default is 1.

    mothra.repacker.maxScanJobs -- the maximum number of threads dedicated to scanning the source directories. The default (and maximum) value is the number of source directories.

    mothra.repacker.readersPerScanner -- the number of reader/repacker threads to create for each source directory. The default is 1.

    mothra.repacker.maxThreads -- the maximum number of worker (scanner and repacker) threads to create. The default value is computed using the formula: (maxScanJobs * (1 + readersPerScanner)).
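    The default can be computed directly from that formula. This is an illustrative helper, not part of the tool.

```java
public class RepackerThreads {
    // Default worker-thread count from the documented formula:
    // maxThreads = maxScanJobs * (1 + readersPerScanner),
    // i.e. one scanner thread plus its readers, per source directory.
    static int defaultMaxThreads(int maxScanJobs, int readersPerScanner) {
        return maxScanJobs * (1 + readersPerScanner);
    }

    public static void main(String[] args) {
        // Three source directories, one reader each: 3 * (1 + 1) = 6 threads.
        System.out.println(defaultMaxThreads(3, 1));
    }
}
```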

    mothra.repacker.maximumSize -- the (approximate) maximum file size to create. When specified, a work-file that exceeds this size is closed and moved into the repository. NOTES: (1) This value uses the uncompressed file size and does not consider any compression that may occur when the file is moved from the workDir to the tgtDir. In addition, a file's size tends to grow in large steps because of buffering by the Java stream code. (2) Specifying a maximumSize may temporarily cause duplicate records to appear in the repository because some records are in the original files and some are in the new file. Once Repacker finishes scanning all files, the original files are removed and only the newly packed files are left. This issue of temporarily having duplicate records in the repository will be resolved in a future release.

    mothra.repacker.archiveDirectory -- the root directory into which working files are moved after the repacker has finished running, as a Hadoop URI. If not specified, the working files are deleted.

    mothra.repacker.fileCacheSize -- The maximum size of the open file cache. This is the maximum number of open files maintained by the file cache for writing to files in the work directory. The repacker does not limit the number of files in the work directory; this only limits the number of open files. Once the cache reaches this number of open files and the repacker needs to (re-)open a file, the repacker closes the least-recently-used file. This value does not include the file handles required when reading incoming files or when copying files from the work directory to the data directory. The default is 2000; the minimum permitted is 128.
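    A typical batch repacking run, combining the properties above, might look like the following. The paths and property values are hypothetical; adjust them for your site.

```shell
# Repack an existing repository tree in one batch, merging hourly files
# into daily files (hoursPerFile=24) and snappy-compressing the output.
# Note that the destination directory may also be a source directory.
spark-submit \
  --class org.cert.netsa.mothra.packer.tools.RepackerMain \
  --driver-java-options "-Dmothra.repacker.compression=snappy \
    -Dmothra.repacker.hoursPerFile=24" \
  mothra-tools.jar \
  hdfs:///mothra/conf/partitioner.conf hdfs:///mothra/repo \
  /var/tmp/repacker-work hdfs:///mothra/repo
```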

  12. object RollupDay extends Tool

    Wrapper to provide a better command-line argument experience over the top of the main packer class. Things should be folded together in the future.

  13. object RollupDayMain extends App with StrictLogging

    Object to implement the RollupDay application.

    Typical Usage in a Spark environment:

    spark-submit --class org.cert.netsa.mothra.packer.tools.RollupDayMain mothra-tools.jar <s1> [<s2> <s3> ...]

    where:

    s1..sn: Directories to process, as Hadoop URIs

    RollupDay reduces the number of data files in a Mothra repository. It may also be used to modify the files' compression.

    RollupDay runs as a batch process, not as a daemon.

    RollupDay makes a single recursive scan of the source directories <s1>, <s2>, ... for files whose names match the pattern "YYYYMMDD.HH." or "YYYYMMDD.HH-PTddH." (It looks for files matching the regular expression ^\d{8}\.\d{2}(?:-PT\d\d?H)?\.) Files whose names match that pattern and reside in the same directory are processed by RollupDay to create a single new file (see next paragraph) in the same directory containing the records in all files in that directory.

    RollupDay joins the files in a directory into a single file by default. The mothra.rollupday.maximumSize Java property may be used to limit the maximum file size. The size is for the compressed file if compression is active. The value is approximate since it is only checked after the data appears on disk, which occurs in large blocks because of buffering by the Java stream code and the compression algorithm.

    There is always a single thread that recursively scans the directories. The number of threads that join the files may be set by specifying the mothra.rollupday.maxThreads Java property. If not specified, the default is 6.

    By default, RollupDay does not compress the files it writes. (NOTE: It should support writing the output using the same compression as the input.) To specify the compression codec that it should use, specify the mothra.rollupday.compression Java property. Values typically supported by Hadoop include bzip2, gzip, lz4, lzo, lzop, snappy, and default. The empty string indicates no compression.

  14. object SilkAppender extends Tool

    Wrapper to provide a better command-line argument experience over the top of the main packer class. Things should be folded together in the future.
