Class

ch.cern.sparkmeasure

FlightRecorderStageMetrics

class FlightRecorderStageMetrics extends StageInfoRecorderListener

Spark Measure package: a proof-of-concept tool for measuring Spark performance metrics. It uses Spark Listeners as the data source and collects the metrics in a ListBuffer, which is then transformed into a DataFrame for analysis.

Stage Metrics: collects and aggregates metrics at the end of each stage.
Task Metrics: collects data at task granularity.
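To make the difference in granularity concrete, the Spark-free sketch below rolls task-level records (one per task) up into one summary per stage, which is roughly the view that stage metrics provide. The types `TaskRecord` and `StageSummary` and their fields are hypothetical illustrations, not sparkMeasure classes.

```scala
// Hypothetical task-level record: one entry per completed task (not a sparkMeasure class).
final case class TaskRecord(stageId: Int, executorRunTimeMs: Long, recordsRead: Long)

// Hypothetical stage-level summary: task values summed per stage.
final case class StageSummary(stageId: Int, numTasks: Int, executorRunTimeMs: Long, recordsRead: Long)

object GranularityDemo {
  // Roll task-granularity data up into one summary per stage, ordered by stage id.
  def aggregateByStage(tasks: Seq[TaskRecord]): Seq[StageSummary] =
    tasks.groupBy(_.stageId).toSeq.sortBy(_._1).map { case (stageId, ts) =>
      StageSummary(stageId, ts.size, ts.map(_.executorRunTimeMs).sum, ts.map(_.recordsRead).sum)
    }

  def main(args: Array[String]): Unit = {
    val tasks = Seq(
      TaskRecord(0, 120L, 500L),
      TaskRecord(0, 80L, 500L),
      TaskRecord(1, 300L, 1000L)
    )
    aggregateByStage(tasks).foreach(println)
    // prints StageSummary(0,2,200,1000) then StageSummary(1,1,300,1000)
  }
}
```

Stage-level data is compact and cheap to collect; task-level data keeps per-task detail (useful for spotting skew) at the cost of many more records.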

Use modes:
  Interactive mode: use the tool from the REPL.
  Flight recorder mode: record data and save it for later processing.

Supported languages: the tool is written in Scala, but it can be used from both Scala and Python.

Example usage for stage metrics:

    val stageMetrics = ch.cern.sparkmeasure.StageMetrics(spark)
    stageMetrics.runAndMeasure(spark.sql("select count(*) from range(1000) cross join range(1000) cross join range(1000)").show())

Example usage for task metrics:

    val taskMetrics = ch.cern.sparkmeasure.TaskMetrics(spark)
    spark.sql("select count(*) from range(1000) cross join range(1000) cross join range(1000)").show()
    val df = taskMetrics.createTaskMetricsDF()

To use in flight recorder mode, add the following to the spark-submit or spark-shell command line:

    --conf spark.extraListeners=ch.cern.sparkmeasure.FlightRecorderStageMetrics
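In flight recorder mode the collected stage metrics are written to a file on the driver when the application ends (onApplicationEnd), so that they can be analyzed later. The sketch below shows one way such a save-and-reload round trip can work using plain Java serialization of case-class records. `StageRecord`, `LoaderAwareInputStream`, the temporary file name and the choice of Java serialization are all assumptions for illustration, not a description of sparkMeasure's actual file format.

```scala
import java.io.{File, FileInputStream, FileOutputStream, InputStream, ObjectInputStream, ObjectOutputStream, ObjectStreamClass}
import scala.collection.mutable.ListBuffer

// Hypothetical flattened per-stage record (case classes are Serializable by default).
final case class StageRecord(jobId: Int, stageId: Int, executorRunTimeMs: Long, executorCpuTimeMs: Long)

// Resolves classes with this code's class loader first, which avoids lookup problems
// when the reader runs under a non-default class loader (e.g. inside a build tool).
final class LoaderAwareInputStream(in: InputStream) extends ObjectInputStream(in) {
  override protected def resolveClass(desc: ObjectStreamClass): Class[_] =
    try Class.forName(desc.getName, false, getClass.getClassLoader)
    catch { case _: ClassNotFoundException => super.resolveClass(desc) }
}

object FlightRecorderFileDemo {
  // Serialize an immutable snapshot of the collected records to a file (e.g. at application end).
  def save(records: ListBuffer[StageRecord], path: String): Unit = {
    val out = new ObjectOutputStream(new FileOutputStream(path))
    try out.writeObject(records.toList)
    finally out.close()
  }

  // Deserialize the records later for offline analysis.
  def load(path: String): List[StageRecord] = {
    val in = new LoaderAwareInputStream(new FileInputStream(path))
    try in.readObject().asInstanceOf[List[StageRecord]]
    finally in.close()
  }

  def main(args: Array[String]): Unit = {
    val data = ListBuffer(StageRecord(0, 0, 200L, 150L), StageRecord(0, 1, 300L, 280L))
    val file = File.createTempFile("stageMetrics", ".serialized") // hypothetical file name
    file.deleteOnExit()
    save(data, file.getPath)
    println(load(file.getPath))
  }
}
```

Writing a List snapshot rather than the mutable ListBuffer itself keeps the reader independent of the recorder's internal buffer type.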

Created by Luca.Canali@cern.ch, March 2017

Linear Supertypes
StageInfoRecorderListener, SparkListener, SparkListenerInterface, AnyRef, Any

Instance Constructors

  1. new FlightRecorderStageMetrics(conf: SparkConf)

Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  4. val StageIdtoJobId: HashMap[Int, Int]

    Definition Classes
    StageInfoRecorderListener
  5. val accumulablesMetricsData: ListBuffer[StageAccumulablesInfo]

    Definition Classes
    StageInfoRecorderListener
  6. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  7. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  8. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  9. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  10. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  11. val fullPath: String

  12. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  13. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  14. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  15. lazy val logger: Logger

  16. val metricsFileName: String

  17. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  18. final def notify(): Unit

    Definition Classes
    AnyRef
  19. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  20. def onApplicationEnd(applicationEnd: SparkListenerApplicationEnd): Unit

    When the application stops, this method serializes the content of stageMetricsData into a file on the driver's filesystem.

    Definition Classes
    FlightRecorderStageMetrics → SparkListener → SparkListenerInterface
  21. def onApplicationStart(applicationStart: SparkListenerApplicationStart): Unit

    Definition Classes
    SparkListener → SparkListenerInterface
  22. def onBlockManagerAdded(blockManagerAdded: SparkListenerBlockManagerAdded): Unit

    Definition Classes
    SparkListener → SparkListenerInterface
  23. def onBlockManagerRemoved(blockManagerRemoved: SparkListenerBlockManagerRemoved): Unit

    Definition Classes
    SparkListener → SparkListenerInterface
  24. def onBlockUpdated(blockUpdated: SparkListenerBlockUpdated): Unit

    Definition Classes
    SparkListener → SparkListenerInterface
  25. def onEnvironmentUpdate(environmentUpdate: SparkListenerEnvironmentUpdate): Unit

    Definition Classes
    SparkListener → SparkListenerInterface
  26. def onExecutorAdded(executorAdded: SparkListenerExecutorAdded): Unit

    Definition Classes
    SparkListener → SparkListenerInterface
  27. def onExecutorMetricsUpdate(executorMetricsUpdate: SparkListenerExecutorMetricsUpdate): Unit

    Definition Classes
    SparkListener → SparkListenerInterface
  28. def onExecutorRemoved(executorRemoved: SparkListenerExecutorRemoved): Unit

    Definition Classes
    SparkListener → SparkListenerInterface
  29. def onJobEnd(jobEnd: SparkListenerJobEnd): Unit

    Definition Classes
    SparkListener → SparkListenerInterface
  30. def onJobStart(jobStart: SparkListenerJobStart): Unit

    Definition Classes
    StageInfoRecorderListener → SparkListener → SparkListenerInterface
  31. def onOtherEvent(event: SparkListenerEvent): Unit

    Definition Classes
    SparkListener → SparkListenerInterface
  32. def onStageCompleted(stageCompleted: SparkListenerStageCompleted): Unit

    This method fires at the end of each stage and collects the stage metrics, flattened, into the stageMetricsData ListBuffer. Note that all times are in ms: CPU time and shuffle write time are originally in nanoseconds and are therefore divided by 1e6 in the code.

    Definition Classes
    StageInfoRecorderListener → SparkListener → SparkListenerInterface
  33. def onStageSubmitted(stageSubmitted: SparkListenerStageSubmitted): Unit

    Definition Classes
    SparkListener → SparkListenerInterface
  34. def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit

    Definition Classes
    SparkListener → SparkListenerInterface
  35. def onTaskGettingResult(taskGettingResult: SparkListenerTaskGettingResult): Unit

    Definition Classes
    SparkListener → SparkListenerInterface
  36. def onTaskStart(taskStart: SparkListenerTaskStart): Unit

    Definition Classes
    SparkListener → SparkListenerInterface
  37. def onUnpersistRDD(unpersistRDD: SparkListenerUnpersistRDD): Unit

    Definition Classes
    SparkListener → SparkListenerInterface
  38. val stageMetricsData: ListBuffer[StageVals]

    Definition Classes
    StageInfoRecorderListener
  39. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  40. def toString(): String

    Definition Classes
    AnyRef → Any
  41. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  42. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  43. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
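The members StageIdtoJobId, onJobStart, onStageCompleted and stageMetricsData work together: presumably the job start event records which job each stage belongs to, and each completed stage is then flattened into one record with all times in milliseconds. The Spark-free sketch below models that flow. The event classes `JobStarted` and `StageCompleted` are hypothetical stand-ins for Spark's `SparkListenerJobStart` and `SparkListenerStageCompleted`, and the field names, the `-1` default for an unknown job and the truncation to whole milliseconds are assumptions.

```scala
import scala.collection.mutable.{HashMap, ListBuffer}

// Hypothetical stand-ins for Spark's SparkListenerJobStart / SparkListenerStageCompleted.
final case class JobStarted(jobId: Int, stageIds: Seq[Int])
final case class StageCompleted(stageId: Int, executorRunTimeMs: Long,
                                executorCpuTimeNs: Long, shuffleWriteTimeNs: Long)

// Hypothetical flattened record, with all times in milliseconds.
final case class FlatStage(jobId: Int, stageId: Int, executorRunTimeMs: Long,
                           executorCpuTimeMs: Long, shuffleWriteTimeMs: Long)

class StageRecorderSketch {
  val stageIdToJobId = HashMap.empty[Int, Int]
  val stageData = ListBuffer.empty[FlatStage]

  // On job start, remember which job each of its stages belongs to.
  def onJobStart(e: JobStarted): Unit =
    e.stageIds.foreach(stageId => stageIdToJobId(stageId) = e.jobId)

  // On stage completion, flatten the metrics and convert nanoseconds to milliseconds.
  def onStageCompleted(e: StageCompleted): Unit =
    stageData += FlatStage(
      jobId = stageIdToJobId.getOrElse(e.stageId, -1), // -1 if the job is unknown (assumption)
      stageId = e.stageId,
      executorRunTimeMs = e.executorRunTimeMs,                  // already in ms
      executorCpuTimeMs = (e.executorCpuTimeNs / 1e6).toLong,   // ns -> ms
      shuffleWriteTimeMs = (e.shuffleWriteTimeNs / 1e6).toLong  // ns -> ms
    )
}

object StageRecorderDemo {
  def main(args: Array[String]): Unit = {
    val rec = new StageRecorderSketch
    rec.onJobStart(JobStarted(jobId = 0, stageIds = Seq(0, 1)))
    rec.onStageCompleted(StageCompleted(1, 2600L, 2500000000L, 40000000L))
    rec.stageData.foreach(println) // prints FlatStage(0,1,2600,2500,40)
  }
}
```

For example, a CPU time of 2,500,000,000 ns divided by 1e6 gives 2500 ms.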
