The common configurations for deploying a File Pulse connector.

Common configuration

Whatever kind of files you are processing, a connector should always be configured with the properties below. These configurations are described in detail in subsequent chapters.

Common Kafka Connect properties

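| Configuration | Description | Type | Default | Importance |
|---|---|---|---|---|
| topic | The default output topic to write to. | string | - | HIGH |
| tasks.max | The maximum number of tasks that should be created for this connector. | string | - | HIGH |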
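
As a rough illustration, the two properties above might appear at the top of a connector configuration as sketched below. The `name` and `connector.class` keys are standard Kafka Connect settings that this section does not list. The connector class name, the connector name and the topic name are assumptions and should be checked against the installed File Pulse version.

```properties
# Standard Kafka Connect keys (not listed in this section).
# The connector name is a hypothetical example.
name=file-pulse-example
# Assumed class name of the File Pulse source connector; verify for your version.
connector.class=io.streamthoughts.kafka.connect.filepulse.source.FilePulseSourceConnector

# Properties from the table above.
# Hypothetical topic name.
topic=file-pulse-output
tasks.max=1
```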

Properties for listing and cleaning object files (FileSystemListing)

| Configuration | Description | Type | Default | Importance |
|---|---|---|---|---|
| fs.listing.class | Class which is used to list eligible files from the scanned file system. | class | - | MEDIUM |
| fs.listing.filters | Filters used to list eligible input files. | list | - | MEDIUM |
| fs.listing.interval.ms | Time interval (in milliseconds) at which to scan the input directory. | long | 10000 | HIGH |
| fs.cleanup.policy.class | The fully qualified name of the class which is used to clean up files. | class | - | HIGH |
| max.scheduled.files | Maximum number of files that can be scheduled to tasks. | long | 1000 | HIGH |

Properties for transforming object file records (Filters Chain Definition)

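| Configuration | Description | Type | Default | Importance |
|---|---|---|---|---|
| filters | List of filter aliases to apply to each record (order is important). | list | - | MEDIUM |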

Properties for reading object file records (FileReaders)

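| Configuration | Description | Type | Default | Importance |
|---|---|---|---|---|
| tasks.reader.class | The fully qualified name of the class which is used by tasks to read input files. | class | - | HIGH |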

Properties for uniquely identifying object files and records (FileReaders)

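| Configuration | Description | Type | Default | Importance |
|---|---|---|---|---|
| offset.policy.class | Class which is used to determine the source partition and offset that uniquely identify an input record. | class | io.streamthoughts.kafka.connect.filepulse.offset.DefaultSourceOffsetPolicy | HIGH |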
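
The file-handling properties listed so far could be combined along the following lines. This is only a sketch: every value in angle brackets is a hypothetical placeholder to be replaced with a real class name or filter alias, and the numeric values simply restate the documented defaults.

```properties
# Listing and cleanup (placeholders are hypothetical, not real class names)
fs.listing.class=<fully.qualified.ListingClass>
fs.listing.filters=<fully.qualified.ListingFilterClass>
# Documented default: scan every 10000 ms
fs.listing.interval.ms=10000
fs.cleanup.policy.class=<fully.qualified.CleanupPolicyClass>
# Documented default: at most 1000 files scheduled to tasks
max.scheduled.files=1000

# Filter chain: aliases are applied in the order given (hypothetical aliases)
filters=<firstAlias>,<secondAlias>

# Reader used by tasks (hypothetical placeholder)
tasks.reader.class=<fully.qualified.ReaderClass>

# Offset policy: the documented default, written out explicitly
offset.policy.class=io.streamthoughts.kafka.connect.filepulse.offset.DefaultSourceOffsetPolicy
```

Properties that have a documented default, such as `fs.listing.interval.ms`, `max.scheduled.files` and `offset.policy.class`, can be left out unless a different value is needed.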

Properties for synchronizing Connector and Tasks

| Configuration | Description | Type | Default | Importance |
|---|---|---|---|---|
| … | The FileObjectStateBackingStore class to be used for storing status state of file objects. | Class | io.streamthoughts.kafka.connect.filepulse.state.KafkaFileObjectStateBackingStore | HIGH |

Available implementations are:

  • io.streamthoughts.kafka.connect.filepulse.state.InMemoryFileObjectStateBackingStore
  • io.streamthoughts.kafka.connect.filepulse.state.KafkaFileObjectStateBackingStore

Properties for configuring the KafkaFileObjectStateBackingStore class

| Configuration | Description | Type | Default | Importance |
|---|---|---|---|---|
| … | Name of the internal topic used by tasks and connector to report and monitor file progression. | class | connect-file-pulse-status | HIGH |
| … | A list of host/port pairs used by the reporter for establishing the initial connection to the Kafka cluster. | string | - | HIGH |
| … | The number of partitions to be used for the status storage topic. | … | … | … |
| … | The replication factor to be used for the status storage topic. | float | - | LOW |


Some configuration examples are available here.