Motivation
In organizations, data is frequently exported, shared, and integrated from legacy systems through files in a wide variety of formats (e.g. CSV, XML, JSON, AVRO). Dealing with all of these formats can quickly become a real challenge for enterprises, which usually end up with a complex and hard-to-maintain data integration mess.
A modern approach consists of building a scalable data streaming platform as a central nervous system to decouple applications from each other. Apache Kafka™ is one of the most widely used technologies for building such a system. The Apache Kafka project ships with Kafka Connect, a distributed, fault-tolerant, and scalable framework for connecting Kafka with external systems.
The Connect File Pulse project aims to provide an easy-to-use solution, based on Kafka Connect,
for streaming any type of data file with the Apache Kafka™ platform.
Connect File Pulse is inspired by the features provided by Elasticsearch and Logstash.