gitea_admin edited this page 2026-03-11 14:42:49 +00:00

Tika

Parse documents and extract metadata and text using Apache Tika.

Metadata

| Property      | Value                   |
|---------------|-------------------------|
| Scheme        | tika                    |
| Support Level | Stable                  |
| Labels        | document,transformation |
| Version       | 4.10.2                  |

Maven Dependency

<dependency>
    <groupId>org.apache.camel</groupId>
    <artifactId>camel-tika</artifactId>
    <version>4.10.2</version>
</dependency>

Endpoint Properties

| Name | Type | Required | Default | Description |
|------|------|----------|---------|-------------|
| operation | object | | | Operation type. |
| tikaParseOutputEncoding | string | | | Tika parse output encoding. |
| tikaParseOutputFormat | object | | xml | Tika output format. Supported values: xml (returns parsed content as XML), html (returns parsed content as HTML), text (returns parsed content as text), textMain (uses the boilerpipe library to automatically extract the main content from a web page). |
| lazyStartProducer | boolean | | false | Whether to start the producer lazily (on the first message). Lazy start lets the CamelContext and routes start up even when the producer would otherwise fail during startup and prevent the route from starting. The deferred startup failure can then be handled during message routing by Camel's routing error handlers. Note that creating and starting the producer while the first message is processed may take some time and prolong the total processing time of that message. |
| tikaConfig | object | | | Tika config. |
| tikaConfigUri | string | | | Tika config URI. |
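
In Camel, endpoint properties like the ones above are usually supplied as query parameters on the endpoint URI. The following is a minimal sketch of how such a URI could be assembled, assuming the common Camel form `tika:operation?name=value`. The `TikaUris` helper is hypothetical and not part of Camel, and the operation value `parse` and the encoding `UTF-8` are illustrative assumptions rather than values taken from this page.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper (not part of Camel): assembles a tika endpoint URI
// from an operation and a set of endpoint properties.
final class TikaUris {
    private TikaUris() {}

    static String build(String operation, Map<String, String> options) {
        StringJoiner query = new StringJoiner("&");
        // Each endpoint property becomes one name=value query parameter.
        options.forEach((name, value) -> query.add(name + "=" + value));
        String base = "tika:" + operation;
        return options.isEmpty() ? base : base + "?" + query;
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the options in insertion order.
        Map<String, String> options = new LinkedHashMap<>();
        options.put("tikaParseOutputFormat", "text");   // default would be xml
        options.put("tikaParseOutputEncoding", "UTF-8"); // assumed example value
        System.out.println(build("parse", options));     // "parse" is an assumed operation
    }
}
```

The resulting string would typically be passed to a route's producer step, for example `to(...)` in the Java DSL. The helper does not URL-encode values, so it is only suitable for simple option values like those shown.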