Identifying Files

Learn how Kafka Connect FilePulse uniquely identifies files.

Kafka Connect FilePulse uses a pluggable interface called SourceOffsetPolicy to uniquely identify files. The implementation set in the connector's configuration computes a unique identifier for each file. Kafka Connect then uses this identifier as the key under which it persists the connector's position in that file (i.e., the offsets saved in the connect-offsets topic).
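To illustrate why the identifier matters, the following minimal sketch models a pluggable identification policy and an offset store keyed by its output. All names here (`IdentifierPolicy`, `FileInfo`, `BY_PATH`, `OFFSETS`) are hypothetical and are not the FilePulse `SourceOffsetPolicy` API. The in-memory map only stands in for the connect-offsets topic.

```java
import java.util.HashMap;
import java.util.Map;

public class OffsetPolicySketch {

    // Hypothetical file metadata; not a FilePulse class.
    record FileInfo(String path, String name, long size) {}

    // Hypothetical stand-in for a pluggable policy that maps a file to an identifier.
    interface IdentifierPolicy {
        String identify(FileInfo file);
    }

    // Example policy: identify a file by its directory path plus its name.
    static final IdentifierPolicy BY_PATH = file -> file.path() + "/" + file.name();

    // In-memory offset store keyed by file identifier (stands in for connect-offsets).
    static final Map<String, Long> OFFSETS = new HashMap<>();

    // Records the last processed position for the file's identifier.
    static void commit(IdentifierPolicy policy, FileInfo file, long position) {
        OFFSETS.put(policy.identify(file), position);
    }

    // Looks up the stored position; an unknown identifier starts from 0.
    static long resume(IdentifierPolicy policy, FileInfo file) {
        return OFFSETS.getOrDefault(policy.identify(file), 0L);
    }

    public static void main(String[] args) {
        commit(BY_PATH, new FileInfo("/data/in", "orders.csv", 1024), 42L);
        // Same path and name, so same identifier: resumes at 42 even though the size changed.
        System.out.println(resume(BY_PATH, new FileInfo("/data/in", "orders.csv", 2048))); // 42
        // Different name, so a new identifier: starts from 0.
        System.out.println(resume(BY_PATH, new FileInfo("/data/in", "orders-2.csv", 1024))); // 0
    }
}
```

The design point is that whatever the policy does *not* include in the identifier (here, the file size) cannot distinguish two files. Choosing which attributes to include therefore decides when a file is treated as "already seen".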

By default, Kafka Connect FilePulse uses the DefaultSourceOffsetPolicy implementation, which accepts the following configuration:

| Configuration | Description | Type | Default | Importance |
|---|---|---|---|---|
| offset.attributes.string | A list of attributes, separated by the '+' character, used to uniquely identify a file object. Each attribute must be one of [name, path, lastModified, inode, hash, uri] (e.g., name+hash). The order of the attributes does not matter. | string | path+name | HIGH |
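The attribute list in `offset.attributes.string` can be understood as selecting which pieces of file metadata go into the identifier. The sketch below parses a '+'-separated list, rejects attributes outside the documented set, and sorts the names so that order does not affect the result. It is a hypothetical illustration, not the source of DefaultSourceOffsetPolicy. The `metadata` map and the `name=value;...` identifier format are assumptions made for the example.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

public class AttributeIdentifierSketch {

    // Attribute names accepted by offset.attributes.string, per the table above.
    static final Set<String> ALLOWED =
        Set.of("name", "path", "lastModified", "inode", "hash", "uri");

    // Parses a '+'-separated list such as "name+hash".
    // The TreeSet sorts the names, so "hash+name" and "name+hash" give the same set.
    static TreeSet<String> parse(String spec) {
        TreeSet<String> attrs = new TreeSet<>();
        for (String raw : spec.split("\\+")) {
            String attr = raw.trim();
            if (!ALLOWED.contains(attr)) {
                throw new IllegalArgumentException("Unsupported attribute: " + attr);
            }
            attrs.add(attr);
        }
        return attrs;
    }

    // Builds an identifier from the selected attributes.
    // 'metadata' is a hypothetical attribute-name -> value map for one file.
    static String identify(String spec, Map<String, String> metadata) {
        return parse(spec).stream()
            .map(attr -> attr + "=" + metadata.get(attr))
            .collect(Collectors.joining(";"));
    }

    public static void main(String[] args) {
        Map<String, String> file = Map.of(
            "name", "orders.csv",
            "path", "/data/in",
            "hash", "1a2b3c");
        System.out.println(identify("path+name", file)); // name=orders.csv;path=/data/in
        System.out.println(identify("name+path", file).equals(identify("path+name", file))); // true
    }
}
```

With the default `path+name`, a file that is rewritten in place keeps the same identifier. Adding `hash` (e.g., `name+hash`) would make modified content look like a new file, under the assumption that the hash reflects the file's content.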