In organizations, data is frequently exported, shared and integrated from legacy systems through files in a wide variety of formats (e.g. CSV, XML, JSON, Avro). Dealing with all of these formats can quickly become a real challenge for enterprises, which usually end up with a complex and hard-to-maintain data integration mess.

A modern approach consists of building a scalable data streaming platform that acts as a central nervous system and decouples applications from each other. Apache Kafka™ is one of the most widely used technologies for building such a system. The Apache Kafka project ships with Kafka Connect, a distributed, fault-tolerant and scalable framework for connecting Kafka with external systems.

The Connect File Pulse project aims to provide an easy-to-use solution, based on Kafka Connect, for streaming any type of data file with the Apache Kafka™ platform.

Connect File Pulse is inspired by the features provided by Elasticsearch and Logstash.

Connect FilePulse Features Overview

Easy to use
Connect FilePulse is based on the Apache Kafka Connect framework and is packaged as a standard source connector plugin that you can easily install using a tool such as the Confluent Hub CLI.
Connect FilePulse allows you to stream files in various formats (e.g. CSV, JSON, Avro, XML) from different storage systems directly into Apache Kafka.
Powerful Processing Filters
Connect FilePulse comes with a rich collection of processing filters that you can combine into complex pipelines for transforming and structuring your data.
From its first release, Connect FilePulse has been designed to be extensible, so you can easily develop new capabilities to match your project's needs.
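
To make the idea of a filter pipeline more concrete, here is a minimal conceptual sketch in Python. It does not use the Connect FilePulse API (real filters are declared in the connector configuration); the names `split_csv`, `drop_if`, `uppercase` and `apply_pipeline` are hypothetical and only illustrate how small, single-purpose filters can be chained, and how adding a new capability amounts to writing one more filter.

```python
from typing import Callable, Dict, Iterable, List

# Assumption: a record is modeled as a plain dict, and a filter maps
# one record to zero or more records. This is an illustration only,
# not how Connect FilePulse represents records internally.
Record = Dict[str, str]
Filter = Callable[[Record], Iterable[Record]]


def split_csv(columns: List[str], source: str = "message") -> Filter:
    """Hypothetical filter: split a comma-delimited line into named fields."""
    def apply(record: Record) -> Iterable[Record]:
        values = [v.strip() for v in record[source].split(",")]
        out = {k: v for k, v in record.items() if k != source}
        out.update(zip(columns, values))
        yield out
    return apply


def drop_if(predicate: Callable[[Record], bool]) -> Filter:
    """Hypothetical filter: discard records matching the predicate."""
    def apply(record: Record) -> Iterable[Record]:
        if not predicate(record):
            yield record
    return apply


def uppercase(field: str) -> Filter:
    """Hypothetical custom filter: upper-case a single field."""
    def apply(record: Record) -> Iterable[Record]:
        yield {**record, field: record[field].upper()}
    return apply


def apply_pipeline(filters: List[Filter], records: List[Record]) -> List[Record]:
    """Run every record through each filter, in order."""
    for f in filters:
        records = [out for r in records for out in f(r)]
    return records


lines = [{"message": "alice, paris"}, {"message": "bob, "}]
pipeline = [
    split_csv(["name", "city"]),
    drop_if(lambda r: r["city"] == ""),  # skip rows with no city
    uppercase("name"),
]
result = apply_pipeline(pipeline, lines)
print(result)
```

Each filter stays independent of the others, so the pipeline can be reordered or extended without touching existing filters.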
Open source
The project is released under the Apache License 2.0. Anyone can contribute to the Connect File Pulse project by opening an issue or a pull request.

