KafkaSink: write Spark metrics and application info in near real-time to a Kafka stream.
Use this mode to monitor Spark execution workloads,
for example to feed a Grafana dashboard and analytics of job execution.
How to use: attach the KafkaSink to a Spark Context using the extra listener infrastructure.
Example:
--conf spark.extraListeners=ch.cern.sparkmeasure.KafkaSink
Configuration for KafkaSink is handled with Spark conf parameters:
spark.sparkmeasure.kafkaBroker = Kafka broker endpoint URL, example: --conf spark.sparkmeasure.kafkaBroker=kafka.your-site.com:9092
spark.sparkmeasure.kafkaTopic = Kafka topic, example: --conf spark.sparkmeasure.kafkaTopic=sparkmeasure-stageinfo
This code depends on the "kafka-clients" library; you may need to add the dependency:
--packages org.apache.kafka:kafka-clients:3.2.1
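Putting the pieces above together, a spark-submit invocation could look like the sketch below. The broker address, topic name, application jar, and package versions are illustrative placeholders, not defaults:

```shell
# Sketch: launch a Spark application with the KafkaSink listener attached.
# Broker, topic, jar name, and versions below are placeholders to adapt.
spark-submit \
  --conf spark.extraListeners=ch.cern.sparkmeasure.KafkaSink \
  --conf spark.sparkmeasure.kafkaBroker=kafka.your-site.com:9092 \
  --conf spark.sparkmeasure.kafkaTopic=sparkmeasure-stageinfo \
  --packages ch.cern.sparkmeasure:spark-measure_2.12:0.23,org.apache.kafka:kafka-clients:3.2.1 \
  your-app.jar
```

The sparkMeasure jar itself must also be on the driver classpath (here pulled in via --packages) for the listener class to resolve.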
Output: each message contains a name field, which is also used as the metrics name.
Note: the amount of data generated is relatively small in most applications: O(number_of_stages)
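Since the data volume is small, a quick way to verify that metrics messages are arriving is to tail the topic with the standard Kafka console consumer (a sketch using the broker and topic values from the examples above; not part of sparkMeasure itself):

```shell
# Read the metrics messages written by KafkaSink from the beginning of the topic.
kafka-console-consumer.sh \
  --bootstrap-server kafka.your-site.com:9092 \
  --topic sparkmeasure-stageinfo \
  --from-beginning
```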
Linear Supertypes
SparkListener, SparkListenerInterface, AnyRef, Any