InfluxDBSink: write Spark metrics and application info in near real-time to InfluxDB
Use this mode to monitor Spark execution workloads.
Use it to feed Grafana dashboards and analytics of job execution.
How to use: attach the InfluxDBSink to a Spark Context using the extra listener infrastructure.
Example:
--conf spark.extraListeners=ch.cern.sparkmeasure.InfluxDBSink
Configuration for InfluxDBSink is handled with Spark conf parameters:
spark.sparkmeasure.influxdbURL (default "http://localhost:8086")
spark.sparkmeasure.influxdbUsername (default ""; note: this can be empty if InfluxDB is configured with no authentication)
spark.sparkmeasure.influxdbPassword (default "")
spark.sparkmeasure.influxdbName (default "sparkmeasure")
spark.sparkmeasure.influxdbStagemetrics (boolean, default is false)
spark.sparkmeasure.influxdbEnableBatch (boolean, default is true)
Note: batching improves write performance,
but it requires explicitly stopping the Spark Session for a clean exit: spark.stop()
Consider setting it to false if this is an issue.
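Putting the configuration together, a launch command could look like the sketch below. The endpoint, credentials, database name, and application file are illustrative placeholders, not values prescribed by sparkmeasure; adapt them to your deployment.

```shell
# Sketch: attach InfluxDBSink and set its configuration parameters.
# The InfluxDB URL, username, password, and application name below are
# illustrative placeholders, not defaults.
spark-submit \
  --conf spark.extraListeners=ch.cern.sparkmeasure.InfluxDBSink \
  --conf spark.sparkmeasure.influxdbURL="http://influxdb.example.com:8086" \
  --conf spark.sparkmeasure.influxdbUsername=spark \
  --conf spark.sparkmeasure.influxdbPassword=secret \
  --conf spark.sparkmeasure.influxdbName=sparkmeasure \
  --conf spark.sparkmeasure.influxdbStagemetrics=true \
  --conf spark.sparkmeasure.influxdbEnableBatch=false \
  my_app.py
```

This example disables batching for simplicity; with influxdbEnableBatch left at its default of true, remember to call spark.stop() at the end of the application so buffered points are flushed.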
This code depends on "influxdb-java", so you may need to add the dependency:
--packages org.influxdb:influxdb-java:2.14
Note: currently version 2.14 is required, as newer versions generate jar conflicts (tested up to Spark 3.3.0).
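A minimal interactive session combining the dependency with the listener might look like this sketch (spark-shell shown; the same flags apply to spark-submit, and the URL is the documented default):

```shell
# Sketch: pull in the influxdb-java client (version 2.14, per the note
# above) and register the sink, writing to a local InfluxDB instance.
spark-shell \
  --packages org.influxdb:influxdb-java:2.14 \
  --conf spark.extraListeners=ch.cern.sparkmeasure.InfluxDBSink \
  --conf spark.sparkmeasure.influxdbURL="http://localhost:8086"
```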
InfluxDBSinkExtended: provides additional, verbose info on task execution
use: --conf spark.extraListeners=ch.cern.sparkmeasure.InfluxDBSinkExtended
InfluxDBSink: the amount of data generated is relatively small in most applications: O(number_of_stages)
InfluxDBSinkExtended can generate a large amount of data, O(number_of_tasks); use with care
Linear Supertypes
SparkListener, SparkListenerInterface, AnyRef, Any