This method fires at the end of each stage and collects the stage metrics, flattened, into the stageMetricsData ListBuffer. Note: all reported times are in milliseconds. CPU time and shuffle write time are originally reported in nanoseconds, so the code divides them by 1 million to normalize them to milliseconds.
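The unit handling described above can be sketched in plain Scala. This is a minimal, hypothetical model, not sparkMeasure's code: RawStageMetrics, StageVals and nsToMs are illustrative names, and only stageMetricsData comes from the text. In a real listener the raw values would come from Spark's stageInfo.taskMetrics (for example executorCpuTime and shuffleWriteMetrics.writeTime, both in nanoseconds). The sketch shows the two steps: normalize nanosecond fields to milliseconds, then append one flattened record per completed stage.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical stand-in for the raw values Spark reports for a completed stage.
// executorRunTime is already in milliseconds; CPU time and shuffle write time are in nanoseconds.
final case class RawStageMetrics(
    stageId: Int,
    executorRunTimeMs: Long,
    executorCpuTimeNs: Long,
    shuffleWriteTimeNs: Long)

// Hypothetical flattened record: every time field is in milliseconds.
final case class StageVals(
    stageId: Int,
    executorRunTime: Long,
    executorCpuTime: Long,
    shuffleWriteTime: Long)

object StageMetricsSketch {
  // Accumulates one flattened record per completed stage.
  val stageMetricsData: ListBuffer[StageVals] = ListBuffer.empty[StageVals]

  // Nanoseconds -> milliseconds: divide by 1 million (Long division truncates).
  def nsToMs(ns: Long): Long = ns / 1000000L

  // Models the work done at stage completion: normalize units, then append.
  def onStageCompleted(raw: RawStageMetrics): Unit = {
    stageMetricsData += StageVals(
      raw.stageId,
      raw.executorRunTimeMs,
      nsToMs(raw.executorCpuTimeNs),
      nsToMs(raw.shuffleWriteTimeNs))
  }

  def main(args: Array[String]): Unit = {
    // Illustrative input values, not real measurements.
    onStageCompleted(RawStageMetrics(0, 1500L, 1234000000L, 56000000L))
    stageMetricsData.foreach(println)
  }
}
```

Because the division is on Long values, sub-millisecond amounts round down to 0; dividing as Double would keep the fraction at the cost of a non-integer field.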
FlightRecorderStageMetrics uses the Spark listeners defined in stagemetrics.scala to record task metrics data aggregated at the stage level, without changing the application code. The resulting data can be saved to a file and/or printed to stdout.
Usage: add the following configuration to spark-submit (or to the SparkSession configuration): --conf spark.extraListeners=ch.cern.sparkmeasure.FlightRecorderStageMetrics
Additional configuration parameters:
--conf spark.sparkmeasure.outputFormat=<format>, valid values: java, json, json_to_hadoop; default: "json". Note: the json and java serialization formats write to the driver's local filesystem, while json_to_hadoop writes JSON-serialized metrics to HDFS or to a Hadoop-compliant filesystem, such as s3a.
--conf spark.sparkmeasure.outputFilename=<output file>, default: "/tmp/stageMetrics_flightRecorder"
--conf spark.sparkmeasure.printToStdout=<true|false>, default: false. Set to true to print JSON-serialized metrics to stdout.
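To illustrate the parameters above, the hypothetical helper below (FlightRecorderConf and submitArgs are not part of sparkMeasure) assembles the documented --conf flags with their documented defaults and rejects an outputFormat other than java, json or json_to_hadoop. The s3a bucket used in main is a placeholder.

```scala
// Hypothetical helper: builds the spark-submit --conf flags documented above.
object FlightRecorderConf {
  val ListenerClass = "ch.cern.sparkmeasure.FlightRecorderStageMetrics"
  val ValidFormats: Set[String] = Set("java", "json", "json_to_hadoop")

  // Defaults mirror the documented ones; fails fast on an unsupported format.
  def submitArgs(
      outputFormat: String = "json",
      outputFilename: String = "/tmp/stageMetrics_flightRecorder",
      printToStdout: Boolean = false): Seq[String] = {
    require(ValidFormats.contains(outputFormat), s"invalid outputFormat: $outputFormat")
    Seq(
      s"spark.extraListeners=$ListenerClass",
      s"spark.sparkmeasure.outputFormat=$outputFormat",
      s"spark.sparkmeasure.outputFilename=$outputFilename",
      s"spark.sparkmeasure.printToStdout=$printToStdout"
    ).flatMap(kv => Seq("--conf", kv)) // each key=value pair is preceded by --conf
  }

  def main(args: Array[String]): Unit = {
    // json_to_hadoop targets a Hadoop-compliant filesystem; the bucket name is a placeholder.
    val flags = submitArgs(
      outputFormat = "json_to_hadoop",
      outputFilename = "s3a://my-bucket/stageMetrics_flightRecorder")
    println(flags.mkString(" "))
  }
}
```

The same key/value pairs can instead be set programmatically with SparkSession.builder().config(key, value) before getOrCreate(). In either case the sparkMeasure jar must be on the driver's classpath so that Spark can instantiate the listener.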