Configuration

The common configurations for deploying a File Pulse connector.

Common configuration

Whatever kind of files you are processing, a connector should always be configured with the properties below. These configurations are described in detail in subsequent chapters.

Common Kafka Connect properties

| Configuration | Description | Type | Default | Importance |
|---|---|---|---|---|
| topic | The default output topic to write to. | string | - | HIGH |
| tasks.max | The maximum number of tasks that should be created for this connector. | int | - | HIGH |
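
As a minimal sketch, the two properties above might appear in a connector properties file as follows. The topic name and task count are illustrative values chosen for this example, not defaults taken from this page.

```properties
# Illustrative values only; pick a topic name and task count that suit your deployment.
topic=my-output-topic
tasks.max=1
```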

Properties for listing and cleaning object files (FileSystemListing)

| Configuration | Description | Type | Default | Importance |
|---|---|---|---|---|
| fs.listing.class | Class which is used to list eligible files from the scanned file system. | class | - | MEDIUM |
| fs.listing.filters | Filters used to list eligible input files. | list | - | MEDIUM |
| fs.listing.interval.ms | Time interval (in milliseconds) at which to scan the input directory. | long | 10000 | HIGH |
| fs.cleanup.policy.class | The fully qualified name of the class which is used to clean up files. | class | - | HIGH |
| max.scheduled.files | Maximum number of files that can be scheduled to tasks. | long | 1000 | HIGH |
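
The listing and cleanup properties could be combined roughly as in the sketch below. The two class values are placeholders, not real class names: replace them with the listing and cleanup policy implementations shipped with your connector version. The numeric values simply restate the defaults from the table.

```properties
# Placeholders (hypothetical): substitute the fully qualified class names you actually use.
fs.listing.class=<fully.qualified.FileSystemListingClass>
fs.cleanup.policy.class=<fully.qualified.CleanupPolicyClass>

# Defaults from the table above, written out explicitly.
fs.listing.interval.ms=10000
max.scheduled.files=1000
```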

Properties for transforming object file records (Filters Chain Definition)

| Configuration | Description | Type | Default | Importance |
|---|---|---|---|---|
| filters | List of filter aliases to apply to each record (order is important). | list | - | MEDIUM |

Properties for reading object file records (FileReaders)

| Configuration | Description | Type | Default | Importance |
|---|---|---|---|---|
| tasks.reader.class | The fully qualified name of the class which is used by tasks to read input files. | class | - | HIGH |

Properties for uniquely identifying object files and records (FileReaders)

| Configuration | Description | Type | Default | Importance |
|---|---|---|---|---|
| offset.policy.class | Class which is used to determine the source partition and offset that uniquely identify an input record. | class | io.streamthoughts.kafka.connect.filepulse.offset.DefaultSourceOffsetPolicy | HIGH |

Properties for synchronizing Connector and Tasks

| Configuration | Description | Type | Default | Importance |
|---|---|---|---|---|
| tasks.file.status.storage.class | The FileObjectStateBackingStore class to be used for storing the status of file objects. | class | io.streamthoughts.kafka.connect.filepulse.state.KafkaFileObjectStateBackingStore | HIGH |

Available implementations are:

  • io.streamthoughts.kafka.connect.filepulse.state.InMemoryFileObjectStateBackingStore
  • io.streamthoughts.kafka.connect.filepulse.state.KafkaFileObjectStateBackingStore
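
For instance, switching from the default Kafka-backed store to the in-memory implementation listed above would presumably look like the following. That the in-memory store keeps state only inside the running worker is an assumption inferred from its name, so treat it as a candidate for local testing rather than a confirmed recommendation.

```properties
# Override the default (KafkaFileObjectStateBackingStore) with the in-memory implementation.
tasks.file.status.storage.class=io.streamthoughts.kafka.connect.filepulse.state.InMemoryFileObjectStateBackingStore
```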

Properties for configuring the KafkaFileObjectStateBackingStore class

| Configuration | Description | Type | Default | Importance |
|---|---|---|---|---|
| tasks.file.status.storage.topic | Name of the internal topic used by tasks and connector to report and monitor file progression. | string | connect-file-pulse-status | HIGH |
| tasks.file.status.storage.bootstrap.servers | A list of host/port pairs used by the reporter for establishing the initial connection to the Kafka cluster. | string | - | HIGH |
| tasks.file.status.storage.topic.partitions | The number of partitions to be used for the status storage topic. | int | - | LOW |
| tasks.file.status.storage.topic.replication.factor | The replication factor to be used for the status storage topic. | float | - | LOW |
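
A Kafka-backed status store might be configured along these lines. The class and topic name are the defaults from the tables above. The broker address, partition count, and replication factor are assumptions picked for a single-broker development setup and should be adjusted for a real cluster.

```properties
# Defaults from the tables above, written out explicitly.
tasks.file.status.storage.class=io.streamthoughts.kafka.connect.filepulse.state.KafkaFileObjectStateBackingStore
tasks.file.status.storage.topic=connect-file-pulse-status

# Assumed broker address for a local development cluster.
tasks.file.status.storage.bootstrap.servers=localhost:9092

# Illustrative sizing for a single-broker setup.
tasks.file.status.storage.topic.partitions=1
tasks.file.status.storage.topic.replication.factor=1
```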

Examples

Some configuration examples are available here.