Package ch.cern.sparkmeasure

package sparkmeasure

Type Members

  1. class FlightRecorderStageMetrics extends StageInfoRecorderListener

    Spark Measure package: a proof-of-concept tool for measuring Spark performance metrics. It uses Spark listeners as the data source and collects the metrics in a ListBuffer, which is then transformed into a DataFrame for analysis.

    Stage metrics: collects and aggregates metrics at the end of each stage. Task metrics: collects data at task granularity.

    Use modes: interactive mode from the REPL, or flight recorder mode, which records data and saves it for later processing.

    Supported languages: the tool is written in Scala, but it can be used from both Scala and Python.

    Example usage for stage metrics:

        val stageMetrics = ch.cern.sparkmeasure.StageMetrics(spark)
        stageMetrics.runAndMeasure(spark.sql(
          "select count(*) from range(1000) cross join range(1000) cross join range(1000)").show)

    For task metrics:

        val taskMetrics = ch.cern.sparkmeasure.TaskMetrics(spark)
        spark.sql("select count(*) from range(1000) cross join range(1000) cross join range(1000)").show()
        val df = taskMetrics.createTaskMetricsDF()

    To use flight recorder mode, add:

        --conf spark.extraListeners=ch.cern.sparkmeasure.FlightRecorderStageMetrics

    Created by Luca.Canali@cern.ch, March 2017
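    The collection pattern described above (a listener appending each event's metrics to a ListBuffer, which is later transformed into rows for analysis) can be sketched in plain Scala. The types below are toy stand-ins for illustration only, not the actual Spark or sparkMeasure classes:

    ```scala
    import scala.collection.mutable.ListBuffer

    // Toy stand-in for a stage-end event (illustration only; the real
    // listener extends org.apache.spark.scheduler.SparkListener and
    // receives SparkListenerStageCompleted events).
    case class StageCompleted(stageId: Int, durationMs: Long)

    class RecorderListener {
      // Metrics are appended to a ListBuffer as stage-end events arrive...
      val stageMetricsData: ListBuffer[StageCompleted] = ListBuffer.empty

      def onStageCompleted(event: StageCompleted): Unit =
        stageMetricsData += event

      // ...and later converted to rows (a DataFrame in the real tool).
      def toRows: Seq[(Int, Long)] =
        stageMetricsData.map(e => (e.stageId, e.durationMs)).toSeq
    }
    ```

    In the real tool the buffer is turned into a DataFrame through the SparkSession passed to StageMetrics or TaskMetrics, rather than a plain Seq.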

  2. class FlightRecorderTaskMetrics extends TaskInfoRecorderListener

  3. case class StageAccumulablesInfo(jobId: Int, stageId: Int, submissionTime: Long, completionTime: Long, accId: Long, name: String, value: Long) extends Product with Serializable

  4. class StageInfoRecorderListener extends SparkListener

  5. case class StageMetrics(sparkSession: SparkSession) extends Product with Serializable

  6. case class StageVals(jobId: Int, stageId: Int, name: String, submissionTime: Long, completionTime: Long, stageDuration: Long, numTasks: Int, executorRunTime: Long, executorCpuTime: Long, executorDeserializeTime: Long, executorDeserializeCpuTime: Long, resultSerializationTime: Long, jvmGCTime: Long, resultSize: Long, numUpdatedBlockStatuses: Int, diskBytesSpilled: Long, memoryBytesSpilled: Long, peakExecutionMemory: Long, recordsRead: Long, bytesRead: Long, recordsWritten: Long, bytesWritten: Long, shuffleFetchWaitTime: Long, shuffleTotalBytesRead: Long, shuffleTotalBlocksFetched: Long, shuffleLocalBlocksFetched: Long, shuffleRemoteBlocksFetched: Long, shuffleWriteTime: Long, shuffleBytesWritten: Long, shuffleRecordsWritten: Long) extends Product with Serializable

  7. case class TaskAccumulablesInfo(jobId: Int, stageId: Int, taskId: Long, submissionTime: Long, finishTime: Long, accId: Long, name: String, value: Long) extends Product with Serializable

  8. class TaskInfoRecorderListener extends SparkListener

  9. case class TaskMetrics(sparkSession: SparkSession, gatherAccumulables: Boolean = false) extends Product with Serializable

  10. case class TaskVals(jobId: Int, stageId: Int, index: Long, launchTime: Long, finishTime: Long, duration: Long, schedulerDelay: Long, executorId: String, host: String, taskLocality: Int, speculative: Boolean, gettingResultTime: Long, successful: Boolean, executorRunTime: Long, executorCpuTime: Long, executorDeserializeTime: Long, executorDeserializeCpuTime: Long, resultSerializationTime: Long, jvmGCTime: Long, resultSize: Long, numUpdatedBlockStatuses: Int, diskBytesSpilled: Long, memoryBytesSpilled: Long, peakExecutionMemory: Long, recordsRead: Long, bytesRead: Long, recordsWritten: Long, bytesWritten: Long, shuffleFetchWaitTime: Long, shuffleTotalBytesRead: Long, shuffleTotalBlocksFetched: Long, shuffleLocalBlocksFetched: Long, shuffleRemoteBlocksFetched: Long, shuffleWriteTime: Long, shuffleBytesWritten: Long, shuffleRecordsWritten: Long) extends Product with Serializable

Value Members

  1. object Utils

    The object Utils contains helper code for the sparkMeasure package. The methods formatDuration and formatBytes are used for printing stage metrics reports; the methods readSerializedStageMetrics and readSerializedTaskMetrics read data serialized into files by the flight recorder mode.
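    As an illustration of what a byte-formatting helper like Utils.formatBytes does, here is a self-contained sketch; the unit names and rounding below are assumptions, and the real library's output may differ in detail:

    ```scala
    // Illustrative sketch of a formatBytes-style helper (not the actual
    // sparkMeasure implementation): scale the value down by powers of 1024
    // and pick the matching unit, rounding to one decimal place.
    def formatBytes(bytes: Long): String = {
      val units = Seq("Bytes", "KB", "MB", "GB", "TB")
      var value = bytes.toDouble
      var i = 0
      while (value >= 1024.0 && i < units.size - 1) {
        value /= 1024.0
        i += 1
      }
      if (i == 0) s"$bytes ${units(0)}"
      else s"${math.round(value * 10) / 10.0} ${units(i)}"
    }
    ```

    For example, formatBytes(1536L) yields "1.5 KB" under these assumptions.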
