Spark Measure package: a proof-of-concept tool for measuring Spark performance metrics. It uses Spark Listeners as the data source, collecting metrics into a ListBuffer; the list buffer is then transformed into a DataFrame for analysis.
Stage Metrics: collects and aggregates metrics at the end of each stage. Task Metrics: collects data at task granularity.
Use modes: interactive mode from the REPL, or flight recorder mode, which records data and saves it for later processing.
Supported languages: the tool is written in Scala, but it can be used from both Scala and Python.
Example usage for stage metrics:
  val stageMetrics = ch.cern.sparkmeasure.StageMetrics(spark)
  stageMetrics.runAndMeasure(spark.sql("select count(*) from range(1000) cross join range(1000) cross join range(1000)").show)
Example usage for task metrics:
  val taskMetrics = ch.cern.sparkmeasure.TaskMetrics(spark)
  spark.sql("select count(*) from range(1000) cross join range(1000) cross join range(1000)").show()
  val df = taskMetrics.createTaskMetricsDF()
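The runAndMeasure helper above wraps the measured code in a single call. An equivalent explicit workflow, shown here as a sketch assuming StageMetrics exposes begin(), end(), and printReport() methods for delimiting the measured section, would be:

```scala
// Sketch: explicit measurement of a code section with StageMetrics,
// assuming the begin()/end()/printReport() methods (an alternative to runAndMeasure).
val stageMetrics = ch.cern.sparkmeasure.StageMetrics(spark)
stageMetrics.begin()                       // start collecting stage metrics
spark.sql("select count(*) from range(1000) cross join range(1000)").show()
stageMetrics.end()                         // stop collecting
stageMetrics.printReport()                 // print aggregated stage metrics
```

This form is useful when the code to measure spans several statements rather than a single expression.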
To use in flight recorder mode, add: --conf spark.extraListeners=ch.cern.sparkmeasure.FlightRecorderStageMetrics
Created by Luca.Canali@cern.ch, March 2017
The object Utils contains helper code for the sparkMeasure package. The methods formatDuration and formatBytes are used for printing stage metrics reports. The methods readSerializedStageMetrics and readSerializedTaskMetrics are used to read data serialized into files by "flight recorder" mode.
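Data written by flight recorder mode can be read back for analysis with the Utils methods named above. A minimal sketch, in which the file paths are hypothetical examples and the conversion to a DataFrame assumes the methods return a collection of metric records that spark.createDataFrame can consume:

```scala
// Sketch: read metrics serialized by flight recorder mode.
// The paths "/tmp/stageMetrics.serialized" and "/tmp/taskMetrics.serialized"
// are hypothetical, used here only for illustration.
val stageVals = ch.cern.sparkmeasure.Utils.readSerializedStageMetrics("/tmp/stageMetrics.serialized")
val taskVals  = ch.cern.sparkmeasure.Utils.readSerializedTaskMetrics("/tmp/taskMetrics.serialized")

// Assuming the returned values are collections of case-class instances,
// they can be turned into DataFrames for analysis:
val stageDf = spark.createDataFrame(stageVals)
stageDf.show()
```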