File Readers

Learn how to configure Connect FilePulse for a specific file format.

The FilePulseSourceTask reads object files using the FileInputReader set in the connector’s configuration (i.e., tasks.reader.class).

Currently, Connect FilePulse provides the following FileInputReader implementations:

Amazon S3

package: io.streamthoughts.kafka.connect.filepulse.fs.reader

  • AmazonS3AvroFileInputReader
  • AmazonS3BytesArrayInputReader
  • AmazonS3RowFileInputReader
  • AmazonS3XMLFileInputReader
  • AmazonS3MetadataFileInputReader

Azure Blob Storage

package: io.streamthoughts.kafka.connect.filepulse.fs.reader

  • AzureBlobStorageAvroFileInputReader
  • AzureBlobStorageBytesArrayInputReader
  • AzureBlobStorageRowFileInputReader
  • AzureBlobStorageXMLFileInputReader
  • AzureBlobStorageMetadataFileInputReader

Google Cloud Storage

package: io.streamthoughts.kafka.connect.filepulse.fs.reader

  • GcsAvroFileInputReader
  • GcsBytesArrayInputReader
  • GcsRowFileInputReader
  • GcsXMLFileInputReader
  • GcsMetadataFileInputReader

Local Filesystem

package: io.streamthoughts.kafka.connect.filepulse.fs.reader

  • LocalAvroFileInputReader
  • LocalBytesArrayInputReader
  • LocalRowFileInputReader
  • LocalXMLFileInputReader
  • LocalMetadataFileInputReader
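All readers above share one package, and each class name combines a storage prefix (AmazonS3, AzureBlobStorage, Gcs, Local) with a format suffix. The following is a minimal sketch of that naming convention, producing the fully-qualified class name to put in tasks.reader.class. The Storage and Format enums and the readerClass helper are hypothetical and exist only for illustration; they are not part of the FilePulse API.

```java
import java.util.Map;

public class ReaderClassNames {

    // Package shared by every reader implementation listed above.
    static final String PACKAGE = "io.streamthoughts.kafka.connect.filepulse.fs.reader";

    // Hypothetical enums used only to illustrate the naming convention.
    enum Storage { AMAZON_S3, AZURE_BLOB_STORAGE, GCS, LOCAL }
    enum Format { ROW, BYTES_ARRAY, AVRO, XML, METADATA }

    static final Map<Storage, String> PREFIXES = Map.of(
            Storage.AMAZON_S3, "AmazonS3",
            Storage.AZURE_BLOB_STORAGE, "AzureBlobStorage",
            Storage.GCS, "Gcs",
            Storage.LOCAL, "Local");

    // Note the irregular suffix for the bytes-array readers ("...InputReader", no "File").
    static final Map<Format, String> SUFFIXES = Map.of(
            Format.ROW, "RowFileInputReader",
            Format.BYTES_ARRAY, "BytesArrayInputReader",
            Format.AVRO, "AvroFileInputReader",
            Format.XML, "XMLFileInputReader",
            Format.METADATA, "MetadataFileInputReader");

    // Returns the fully-qualified class name to use as the tasks.reader.class value.
    static String readerClass(Storage storage, Format format) {
        return PACKAGE + "." + PREFIXES.get(storage) + SUFFIXES.get(format);
    }

    public static void main(String[] args) {
        System.out.println("tasks.reader.class=" + readerClass(Storage.LOCAL, Format.ROW));
    }
}
```

Running main prints tasks.reader.class=io.streamthoughts.kafka.connect.filepulse.fs.reader.LocalRowFileInputReader, which is the form the connector expects for this property.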

XxxRowFileInputReader (default)

The <PREFIX>RowFileInputReaders can be used to read files line by line. This reader creates one record per row. It should be used for reading delimited text files, application log files, etc.
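To illustrate the "one record per row" behavior together with the skip.headers and skip.footers options from the configuration table, here is a simplified model, not the FilePulse implementation: it holds the whole content in memory, whereas a real reader streams the file. The class and method names are hypothetical.

```java
import java.util.List;
import java.util.stream.Collectors;

public class RowReaderSketch {

    // Simplified model: every line becomes one record, after dropping
    // `skipHeaders` leading lines and `skipFooters` trailing lines.
    static List<String> readRows(String content, int skipHeaders, int skipFooters) {
        List<String> lines = content.lines().collect(Collectors.toList());
        int from = Math.min(Math.max(skipHeaders, 0), lines.size());
        int to = Math.max(from, lines.size() - Math.max(skipFooters, 0));
        return List.copyOf(lines.subList(from, to));
    }

    public static void main(String[] args) {
        String csv = "id,name\n1,alice\n2,bob\n# end of export\n";
        // skip.headers=1 drops the CSV header, skip.footers=1 drops the trailer line.
        readRows(csv, 1, 1).forEach(System.out::println);
    }
}
```

With one header and one footer skipped, the sample yields two records, "1,alice" and "2,bob".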

Configuration

Configuration | Description | Type | Default | Importance
file.encoding | The text file encoding to use. | String | UTF_8 | High
buffer.initial.bytes.size | The initial buffer size used to read input files. | String | 4096 | Medium
min.read.records | The minimum number of records to read from the file before returning to the task. | Integer | 1 | Medium
skip.headers | The number of rows to skip at the beginning of the file. | Integer | 0 | Medium
skip.footers | The number of rows to skip at the end of the file. | Integer | 0 | Medium
read.max.wait.ms | The maximum time, in milliseconds, to wait for more bytes after reaching the end of the file. | Long | 0 | Medium
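As an example, the properties from the table might be combined with tasks.reader.class as shown below. This sketch assumes the reader properties are set directly in the connector configuration alongside tasks.reader.class; the values are illustrative, not recommendations, and file.encoding is left at its default.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class RowReaderConfigExample {

    // Illustrative connector properties for a row-based reader. Every key except
    // tasks.reader.class comes from the configuration table; the values are examples.
    static Map<String, String> rowReaderConfig() {
        Map<String, String> config = new LinkedHashMap<>();
        config.put("tasks.reader.class",
                "io.streamthoughts.kafka.connect.filepulse.fs.reader.LocalRowFileInputReader");
        config.put("buffer.initial.bytes.size", "4096");
        config.put("min.read.records", "1");
        config.put("skip.headers", "1");     // drop one header row, e.g. a CSV header
        config.put("skip.footers", "0");
        config.put("read.max.wait.ms", "0");
        return config;
    }

    public static void main(String[] args) {
        rowReaderConfig().forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

Running main prints the properties in insertion order, one key=value pair per line.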

XxxBytesArrayInputReader

The <PREFIX>BytesArrayInputReaders create a single byte array record from a source file.
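The difference from the row readers is that the whole file yields exactly one record, regardless of its line breaks. A minimal sketch of that idea using only the JDK follows; the class and method names are hypothetical, and the real readers wrap the bytes in a Connect record whose field layout is not shown here.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class BytesArraySketch {

    // Reads the entire stream into memory: the whole file becomes a single
    // byte[] value, whatever line breaks it contains.
    static byte[] readAsSingleRecord(InputStream in) {
        try (in) {
            return in.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: BytesArraySketch <file>");
            return;
        }
        byte[] record = readAsSingleRecord(Files.newInputStream(Path.of(args[0])));
        System.out.println("1 record, " + record.length + " bytes");
    }
}
```

Because the whole file is held in memory as one value, this approach suits small files better than very large ones.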

XxxAvroFileInputReader

The <PREFIX>AvroFileInputReaders can be used to read Avro files.

XxxXMLFileInputReader

The <PREFIX>XMLFileInputReaders can be used to read XML files.

Configuration

Configuration | Since | Description | Type | Default | Importance
reader.xpath.expression | | The XPath expression used to extract data from XML input files. | String | / | High
reader.xpath.result.type | | The expected result type of the XPath expression, one of [NODESET, STRING]. | String | NODESET | High
reader.xml.force.array.on.fields | | The comma-separated list of fields for which an array type must be forced. | List | - | High
reader.xml.parser.validating.enabled | 2.2.0 | Specifies that the parser will validate documents as they are parsed. | Boolean | false | Low
reader.xml.parser.namespace.aware.enabled | 2.2.0 | Specifies that the XML parser will provide support for XML namespaces. | Boolean | false | Low
reader.xml.exclude.empty.elements | 2.2.0 | Specifies that the reader should exclude elements having no field. | Boolean | false | Low
reader.xml.exclude.node.attributes | 2.4.0 | Specifies that the reader should exclude all node attributes. | Boolean | false | Low
reader.xml.exclude.node.attributes.in.namespaces | 2.4.0 | Specifies that the reader should only exclude node attributes in the defined list of namespaces. | List | false | Low
reader.xml.data.type.inference.enabled | 2.3.0 | Specifies that the reader should try to infer the type of data nodes. | Boolean | false | High
reader.xml.attribute.prefix | 2.4.0 | If set, attribute names are prepended with the specified prefix when they are added to a record. | String | "" | Low
reader.xml.content.field.name | 2.5.4 | Specifies the name of the field that will contain the value of a TextNode element having attributes. | String | value | Low
reader.xml.field.name.characters.regex.pattern | 2.5.4 | Specifies the regex pattern used to match the characters in XML element names that are replaced when converting a document to a struct. | String | [.\-] | Low
reader.xml.field.name.characters.string.replacement | 2.5.4 | Specifies the replacement string used for the matched characters when converting a document to a struct. | String | _ | Low
reader.xml.force.content.field.for.paths | 2.5.4 | The comma-separated list of fields for which a content field must be forced. | List | - | Low
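The two values of reader.xpath.result.type correspond to the standard XPath result types NODESET and STRING. The sketch below shows the difference with the JDK's javax.xml.xpath API, assuming that with NODESET each matched node becomes one record. It is not FilePulse code: the class, the SAMPLE document, and the helper methods are hypothetical, and the parser flags only mirror the defaults of the two parser options in the table.

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathResultTypeSketch {

    static final String SAMPLE =
            "<playlists><playlist name=\"chill\">"
            + "<track><title>A</title></track>"
            + "<track><title>B</title></track>"
            + "</playlist></playlists>";

    // Parses the document and evaluates the expression with the requested result type.
    static Object evaluate(String xml, String expression, QName resultType) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(false); // mirrors reader.xml.parser.namespace.aware.enabled=false
            factory.setValidating(false);     // mirrors reader.xml.parser.validating.enabled=false
            Document doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
            return XPathFactory.newInstance().newXPath().evaluate(expression, doc, resultType);
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot evaluate " + expression, e);
        }
    }

    // NODESET: assumed to yield one record per matched node.
    static int countNodes(String xml, String expression) {
        return ((NodeList) evaluate(xml, expression, XPathConstants.NODESET)).getLength();
    }

    // STRING: the expression yields a single string value.
    static String evaluateString(String xml, String expression) {
        return (String) evaluate(xml, expression, XPathConstants.STRING);
    }

    public static void main(String[] args) {
        System.out.println(countNodes(SAMPLE, "/playlists/playlist/track"));
        System.out.println(evaluateString(SAMPLE, "/playlists/playlist/@name"));
    }
}
```

Here the expression /playlists/playlist/track matches two nodes under NODESET, while /playlists/playlist/@name evaluated as STRING returns the single value "chill".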

XxxMetadataFileInputReader

The <PREFIX>MetadataFileInputReaders can be used to send a single record per file containing metadata about that file, e.g., name, path, hash, lastModified, and size.
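A minimal sketch of what such a metadata record could contain for a local file is shown below, using only java.nio. The field names follow the list above, but the exact layout produced by the real readers may differ, and computing the hash as a CRC32 checksum is an assumption made for this illustration. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.CRC32;

public class FileMetadataSketch {

    // Checksum used for the "hash" field; CRC32 is an assumption for this sketch.
    static long crc32(byte[] content) {
        CRC32 crc = new CRC32();
        crc.update(content);
        return crc.getValue();
    }

    // Builds one metadata "record" per file; the file content itself is not emitted.
    static Map<String, Object> metadata(Path file) {
        try {
            Map<String, Object> meta = new LinkedHashMap<>();
            meta.put("name", file.getFileName().toString());
            meta.put("path", file.toAbsolutePath().toString());
            meta.put("hash", crc32(Files.readAllBytes(file)));
            meta.put("lastModified", Files.getLastModifiedTime(file).toMillis());
            meta.put("size", Files.size(file));
            return meta;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        if (args.length == 0) {
            System.out.println("usage: FileMetadataSketch <file>");
            return;
        }
        System.out.println(metadata(Path.of(args[0])));
    }
}
```

Only the metadata map would be sent downstream; the bytes are read solely to compute the checksum.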

Last modified March 2, 2022: docs: fix missing archive (ffdb9ab7)