package sparkmeasure
Type Members
- class FlightRecorderStageMetrics extends StageInfoRecorderListener
FlightRecorderStageMetrics - Use Spark Listeners defined in stagemetrics.scala to record task metrics data aggregated at the Stage level, without changing the application code. The resulting data can be saved to a file and/or printed to stdout.
Usage: add the following to the spark-submit (or SparkSession) configuration: --conf spark.extraListeners=ch.cern.sparkmeasure.FlightRecorderStageMetrics
Additional configuration parameters:
  - --conf spark.sparkmeasure.outputFormat=<format>: valid values are java, json, and json_to_hadoop; default "json". The json and java serialization formats write to the driver's local filesystem; json_to_hadoop writes JSON-serialized metrics to HDFS or to a Hadoop-compliant filesystem, such as s3a.
  - --conf spark.sparkmeasure.outputFilename=<output file>: default "/tmp/stageMetrics_flightRecorder".
  - --conf spark.sparkmeasure.printToStdout=<true|false>: default false. Set to true to print JSON-serialized metrics to stdout.
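As a sketch of how these options fit together, the plain-Scala snippet below assembles the --conf arguments for a spark-submit command line. The object `FlightRecorderConfExample` and the helper `flightRecorderConfArgs` are hypothetical and not part of sparkMeasure; only the configuration keys, the valid formats, and the defaults come from the text above.

```scala
object FlightRecorderConfExample {
  // Formats documented for spark.sparkmeasure.outputFormat.
  val validFormats: Set[String] = Set("java", "json", "json_to_hadoop")

  // Hypothetical helper: builds spark-submit arguments that enable the
  // stage-level flight recorder. Default values mirror the documented defaults.
  def flightRecorderConfArgs(
      outputFormat: String = "json",
      outputFilename: String = "/tmp/stageMetrics_flightRecorder",
      printToStdout: Boolean = false): Seq[String] = {
    require(validFormats.contains(outputFormat), s"invalid outputFormat: $outputFormat")
    val settings = Seq(
      "spark.extraListeners" -> "ch.cern.sparkmeasure.FlightRecorderStageMetrics",
      "spark.sparkmeasure.outputFormat" -> outputFormat,
      "spark.sparkmeasure.outputFilename" -> outputFilename,
      "spark.sparkmeasure.printToStdout" -> printToStdout.toString
    )
    // Each setting becomes a "--conf key=value" pair of arguments.
    settings.flatMap { case (k, v) => Seq("--conf", s"$k=$v") }
  }
}
```

The same key/value pairs can instead be set programmatically with SparkSession.builder().config(key, value) before getOrCreate(). Because spark.extraListeners is read when the SparkContext starts, it has to be set before the session is created.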
- class FlightRecorderTaskMetrics extends TaskInfoRecorderListener
FlightRecorderTaskMetrics - Use a Spark Listener to record task metrics data and save them in a file
Usage: add the following to the spark-submit (or SparkSession) configuration: --conf spark.extraListeners=ch.cern.sparkmeasure.FlightRecorderTaskMetrics
Additional configuration parameters:
  - --conf spark.sparkmeasure.outputFormat=<format>: valid values are java, json, and json_to_hadoop; default "json". The json and java serialization formats write to the driver's local filesystem; json_to_hadoop writes JSON-serialized metrics to HDFS or to a Hadoop-compliant filesystem, such as s3a.
  - --conf spark.sparkmeasure.outputFilename=<output file>: default "/tmp/taskMetrics_flightRecorder".
  - --conf spark.sparkmeasure.printToStdout=<true|false>: default false. Set to true to print JSON-serialized metrics to stdout.
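The defaults listed for the task-level recorder imply a simple lookup rule: use the configured value if present, otherwise fall back to the default. A minimal plain-Scala sketch of that rule, assuming the settings arrive as a Map[String, String] (in Spark they would come from SparkConf). The `RecorderSettings` class and its `resolve` and `writesToHadoop` methods are hypothetical; they illustrate the documented defaults, not sparkMeasure's internal code.

```scala
// Hypothetical container for the flight-recorder settings.
final case class RecorderSettings(outputFormat: String, outputFilename: String, printToStdout: Boolean)

object RecorderSettings {
  private val ValidFormats = Set("java", "json", "json_to_hadoop")

  // Resolves each documented key, falling back to the documented default.
  def resolve(conf: Map[String, String]): RecorderSettings = {
    val format = conf.getOrElse("spark.sparkmeasure.outputFormat", "json")
    require(ValidFormats(format), s"unsupported outputFormat: $format")
    RecorderSettings(
      outputFormat = format,
      outputFilename = conf.getOrElse("spark.sparkmeasure.outputFilename", "/tmp/taskMetrics_flightRecorder"),
      // toBoolean accepts "true"/"false" (case-insensitive) and throws otherwise.
      printToStdout = conf.getOrElse("spark.sparkmeasure.printToStdout", "false").toBoolean
    )
  }

  // Only json_to_hadoop targets a Hadoop-compliant filesystem;
  // java and json write to the driver's local filesystem.
  def writesToHadoop(s: RecorderSettings): Boolean = s.outputFormat == "json_to_hadoop"
}
```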
- case class PushGateway(serverIPnPort: String, metricsJob: String) extends Product with Serializable
serverIPnPort: the Prometheus Pushgateway address, given as hostIP:port; metricsJob: the job name under which metrics are pushed.
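The Prometheus Pushgateway accepts pushed metrics over HTTP at a path of the form /metrics/job/<job name> (its default port is 9091). The sketch below shows how the two PushGateway parameters could map onto such a URL. `PushGatewayTarget` and `pushUrl` are hypothetical names, and restricting job names to URL-safe characters is a simplifying assumption; a real client would percent-encode the job name instead.

```scala
// Hypothetical mirror of the PushGateway(serverIPnPort, metricsJob) parameters.
final case class PushGatewayTarget(serverIPnPort: String, metricsJob: String) {
  // serverIPnPort must look like host:port, e.g. "10.0.0.5:9091".
  require(serverIPnPort.matches("""[^:/\s]+:\d{1,5}"""), s"expected host:port, got '$serverIPnPort'")
  // Simplifying assumption: job names use only URL-safe characters.
  require(metricsJob.matches("""[A-Za-z0-9_.\-]+"""), s"job name is not URL-safe: '$metricsJob'")

  // URL of the job's metric group in the Pushgateway HTTP API.
  def pushUrl: String = s"http://$serverIPnPort/metrics/job/$metricsJob"
}
```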
- case class StageAccumulablesInfo(jobId: Int, stageId: Int, submissionTime: Long, completionTime: Long, accId: Long, name: String, value: Long) extends Product with Serializable
- class StageInfoRecorderListener extends SparkListener
- case class StageMetrics(sparkSession: SparkSession) extends Product with Serializable
- case class StageVals(jobId: Int, jobGroup: String, stageId: Int, name: String, submissionTime: Long, completionTime: Long, stageDuration: Long, numTasks: Int, executorRunTime: Long, executorCpuTime: Long, executorDeserializeTime: Long, executorDeserializeCpuTime: Long, resultSerializationTime: Long, jvmGCTime: Long, resultSize: Long, numUpdatedBlockStatuses: Int, diskBytesSpilled: Long, memoryBytesSpilled: Long, peakExecutionMemory: Long, recordsRead: Long, bytesRead: Long, recordsWritten: Long, bytesWritten: Long, shuffleFetchWaitTime: Long, shuffleTotalBytesRead: Long, shuffleTotalBlocksFetched: Long, shuffleLocalBlocksFetched: Long, shuffleRemoteBlocksFetched: Long, shuffleWriteTime: Long, shuffleBytesWritten: Long, shuffleRecordsWritten: Long) extends Product with Serializable
Stage Metrics: collects and aggregates metrics at the end of each stage. Task Metrics: collects data at task granularity.
Example usage for stage metrics:
  val stageMetrics = ch.cern.sparkmeasure.StageMetrics(spark)
  stageMetrics.runAndMeasure(spark.sql("select count(*) from range(1000) cross join range(1000) cross join range(1000)").show())
The tool uses Spark Listeners as its data source and collects metrics in a ListBuffer of a case class that encapsulates Spark task metrics. The ListBuffer is then transformed into a DataFrame for ease of reporting and analysis.
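The collection pattern for stage metrics, where a listener appends one record per completed stage to a ListBuffer and the buffer is later summarized, can be sketched in plain Scala without a Spark dependency. `StageRecord` is a hypothetical four-field subset of StageVals, `StageCollector.onStageCompleted` is a hypothetical stand-in for the listener callback, and the summary is computed directly on the collection rather than through a DataFrame.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical subset of the StageVals fields.
final case class StageRecord(stageId: Int, numTasks: Int, executorRunTime: Long, executorCpuTime: Long)

// Stand-in for a stage-level listener that buffers one record per stage.
class StageCollector {
  val stageMetricsData: ListBuffer[StageRecord] = ListBuffer.empty[StageRecord]

  // In sparkMeasure a SparkListener callback plays this role; here it is a plain method.
  def onStageCompleted(record: StageRecord): Unit = stageMetricsData += record

  // Sums the metrics across all recorded stages.
  def summary: Map[String, Long] = Map(
    "numStages" -> stageMetricsData.size.toLong,
    "numTasks" -> stageMetricsData.map(_.numTasks.toLong).sum,
    "executorRunTime" -> stageMetricsData.map(_.executorRunTime).sum,
    "executorCpuTime" -> stageMetricsData.map(_.executorCpuTime).sum
  )
}
```

With Spark available, the buffer could instead be turned into a DataFrame, for example with spark.createDataFrame(stageMetricsData.toList), and aggregated with SQL.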
- case class TaskAccumulablesInfo(jobId: Int, stageId: Int, taskId: Long, submissionTime: Long, finishTime: Long, accId: Long, name: String, value: Long) extends Product with Serializable
- class TaskInfoRecorderListener extends SparkListener
- case class TaskMetrics(sparkSession: SparkSession, gatherAccumulables: Boolean = false) extends Product with Serializable
- case class TaskVals(jobId: Int, jobGroup: String, stageId: Int, index: Long, launchTime: Long, finishTime: Long, duration: Long, schedulerDelay: Long, executorId: String, host: String, taskLocality: Int, speculative: Boolean, gettingResultTime: Long, successful: Boolean, executorRunTime: Long, executorCpuTime: Long, executorDeserializeTime: Long, executorDeserializeCpuTime: Long, resultSerializationTime: Long, jvmGCTime: Long, resultSize: Long, numUpdatedBlockStatuses: Int, diskBytesSpilled: Long, memoryBytesSpilled: Long, peakExecutionMemory: Long, recordsRead: Long, bytesRead: Long, recordsWritten: Long, bytesWritten: Long, shuffleFetchWaitTime: Long, shuffleTotalBytesRead: Long, shuffleTotalBlocksFetched: Long, shuffleLocalBlocksFetched: Long, shuffleRemoteBlocksFetched: Long, shuffleWriteTime: Long, shuffleBytesWritten: Long, shuffleRecordsWritten: Long) extends Product with Serializable
Stage Metrics: collects and aggregates metrics at the end of each stage. Task Metrics: collects data at task granularity.
Example usage for task metrics:
  val taskMetrics = ch.cern.sparkmeasure.TaskMetrics(spark)
  taskMetrics.runAndMeasure(spark.sql("select count(*) from range(1000) cross join range(1000) cross join range(1000)").show())
The tool uses Spark Listeners as its data source and collects metrics in a ListBuffer of a case class that encapsulates Spark task metrics. The ListBuffer is then transformed into a DataFrame for ease of reporting and analysis.
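Task-level records make it possible to look inside a stage, for example to spot straggler tasks. A minimal sketch, assuming a hypothetical `TaskRecord` that keeps only the stageId, executorId, and duration fields of TaskVals: `TaskSkew.summarize` (also hypothetical) groups the buffered tasks by stage and reports the slowest task duration alongside the mean.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical subset of the TaskVals fields.
final case class TaskRecord(stageId: Int, executorId: String, duration: Long)

// Per-stage summary: task count, mean task duration, slowest task duration.
final case class StageTaskSummary(stageId: Int, numTasks: Int, meanDuration: Double, maxDuration: Long)

object TaskSkew {
  // Groups tasks by stage and returns one summary per stage, ordered by stageId.
  def summarize(tasks: ListBuffer[TaskRecord]): Seq[StageTaskSummary] =
    tasks.groupBy(_.stageId).toSeq.sortBy(_._1).map { case (stageId, ts) =>
      val durations = ts.map(_.duration)
      StageTaskSummary(stageId, ts.size, durations.sum.toDouble / ts.size, durations.max)
    }
}
```

A maxDuration far above meanDuration within one stage suggests skewed partitions or a slow executor, which stage-level aggregates alone would hide.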
Value Members
- object IOUtils
The object IOUtils contains helper code for the sparkMeasure package. The methods readSerializedStageMetrics and readSerializedTaskMetrics read data that was serialized to files in "flight recorder" mode. Two serialization modes are currently supported: Java serialization and JSON serialization with the Jackson library.
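Of the two serialization modes, Java serialization needs nothing beyond the JDK, because Scala case classes and ListBuffer are Serializable. The sketch below round-trips a buffer of records through a file with ObjectOutputStream and ObjectInputStream. `MetricRecord`, `writeRecords`, and `readRecords` are hypothetical and are not the signatures of IOUtils.readSerializedStageMetrics or readSerializedTaskMetrics.

```scala
import java.io.{FileInputStream, FileOutputStream, ObjectInputStream, ObjectOutputStream}
import java.nio.file.Path
import scala.collection.mutable.ListBuffer

// Hypothetical record type; case classes are Serializable by default.
final case class MetricRecord(stageId: Int, name: String, value: Long)

object JavaSerializationSketch {
  // Writes the whole buffer as a single serialized object.
  def writeRecords(path: Path, records: ListBuffer[MetricRecord]): Unit = {
    val out = new ObjectOutputStream(new FileOutputStream(path.toFile))
    try out.writeObject(records) finally out.close()
  }

  // Reads the buffer back. The cast is unchecked, so the file
  // must have been produced by writeRecords.
  def readRecords(path: Path): ListBuffer[MetricRecord] = {
    val in = new ObjectInputStream(new FileInputStream(path.toFile))
    try in.readObject().asInstanceOf[ListBuffer[MetricRecord]] finally in.close()
  }
}
```

The JSON mode follows the same write-then-read shape, with a Jackson ObjectMapper (typically with its Scala module registered so case classes map cleanly) taking the place of the object streams.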
- object Utils
The object Utils contains helper code for the sparkMeasure package. The methods formatDuration and formatBytes are used when printing stage metrics reports.
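For illustration only, here is one way to write duration and byte formatters of the kind described. The exact output of sparkMeasure's Utils.formatDuration and Utils.formatBytes may differ; the object name `FormatSketch`, the unit thresholds, and the 1024-based byte units are assumptions.

```scala
import java.util.Locale

object FormatSketch {
  // Renders a duration in milliseconds with a human-readable unit.
  // Locale.ROOT keeps "." as the decimal separator regardless of the JVM locale.
  def formatDuration(millis: Long): String =
    if (millis < 1000L) s"$millis ms"
    else if (millis < 60L * 1000L) "%.1f s".formatLocal(Locale.ROOT, millis / 1000.0)
    else if (millis < 3600L * 1000L) "%.1f min".formatLocal(Locale.ROOT, millis / 60000.0)
    else "%.1f h".formatLocal(Locale.ROOT, millis / 3600000.0)

  // Renders a byte count with 1024-based units.
  def formatBytes(bytes: Long): String = {
    val units = Vector("Bytes", "KB", "MB", "GB", "TB", "PB")
    var value = bytes.toDouble
    var i = 0
    // Divide by 1024 until the value fits the current unit or the units run out.
    while (value >= 1024.0 && i < units.length - 1) { value /= 1024.0; i += 1 }
    if (i == 0) s"$bytes Bytes" else "%.1f %s".formatLocal(Locale.ROOT, value, units(i))
  }
}
```

Dividing in a loop, rather than computing the unit exponent with logarithms, avoids floating-point rounding at exact powers of 1024.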