What is Connect FilePulse?
What is it?
Connect FilePulse is a multipurpose, scalable, and reliable Apache Kafka Connect plugin that makes it easy to parse, transform, and stream any file, in any format, into Apache Kafka™.
About Connect
Kafka Connect is a tool for scalably and reliably streaming data between Apache Kafka and other systems (source: Apache documentation).
Connect FilePulse provides a set of built-in features for streaming local files into Kafka. This includes, among other things:
- Support for recursive scanning of local directories.
- Reading and writing files into Kafka line by line.
- Support for multiple input file formats (e.g., CSV, Avro, XML).
- Parsing and transforming data using built-in or custom processing filters.
- Defining error handlers.
- Monitoring files while they are being written into Kafka.
- Support for pluggable strategies to clean up completed files.
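To make the first few features more concrete, here is a minimal conceptual sketch of recursive scanning, line-by-line reading, and a chain of processing filters. It uses only the JDK and is not Connect FilePulse's implementation or API: the class name `FilePipelineSketch`, its methods, and the sample files are all hypothetical, and records are printed instead of being sent to Kafka.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical illustration only; none of these names come from Connect FilePulse.
public class FilePipelineSketch {

    /** Recursively lists the regular files under root, sorted for a stable order. */
    static List<Path> scan(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
    }

    /** Reads every file line by line and applies the filters to each line, in order. */
    static List<String> process(Path root, List<Function<String, String>> filters) throws IOException {
        List<String> records = new ArrayList<>();
        for (Path file : scan(root)) {
            try (Stream<String> lines = Files.lines(file, StandardCharsets.UTF_8)) {
                lines.forEach(line -> {
                    String value = line;
                    for (Function<String, String> filter : filters) {
                        value = filter.apply(value);
                    }
                    // A real connector would produce a Kafka record here.
                    records.add(file.getFileName() + ": " + value);
                });
            }
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        // Build a small directory tree with one nested file (sample data is made up).
        Path root = Files.createTempDirectory("filepulse-sketch");
        Path nested = Files.createDirectories(root.resolve("nested"));
        Files.write(root.resolve("a.csv"), List.of(" id,name ", "1,alice"));
        Files.write(nested.resolve("b.csv"), List.of("2,bob"));

        // Two simple filters applied in sequence: trim, then upper-case.
        Function<String, String> trim = String::trim;
        Function<String, String> upper = s -> s.toUpperCase();
        process(root, List.of(trim, upper)).forEach(System.out::println);
    }
}
```

In the plugin itself, these steps are driven by the connector configuration and its built-in or custom filters rather than by hand-written code like this.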
Why do I want it?
Connect FilePulse helps you stream local files into Apache Kafka.
What is it good for? Connect FilePulse lets you define complex pipelines to transform and structure your data before integration into Kafka.
What is it not good for? Connect FilePulse is not intended to be used for streaming files from remote storage (AWS S3, HDFS, etc.).
Where should I go next?