Software Alternatives & Reviews

Apache Tika

The Apache Tika toolkit detects and extracts metadata and text from different file types.

Apache Tika Reviews and details

Videos

Evaluating Text Extraction: Apache Tika's™ New Tika-Eval Module - Tim Allison, The MITRE Corporation

Lightning talk - Broadway + Sqs + Apache Tika - Dave Lee - ElixirConf EU 2019

Social recommendations and mentions

We have tracked the following product recommendations or mentions on various public social media platforms and blogs. They can help you see what people think about Apache Tika and what they use it for.
  • Reading SEC filings using LLMs
    Apache Tika has worked well for me in the past, ended up running it on an AWS Lambda https://tika.apache.org/. - Source: Hacker News / 9 months ago
  • Demystifying Text Data with the Unstructured Python Library
    If you accept running Java, the Apache Tika is extremely good at parsing content (https://tika.apache.org/). - Source: Hacker News / 10 months ago
  • How do you manage and find large amount of files?
    Apache Tika can spit out text from lots of formats. I've used it with grep (or rg) to make a small scale searching of local folders. Tika does a really good job at OCR for finding if text is in a file. Source: about 1 year ago
  • 40 Containers & Counting...
    https://tika.apache.org Metadata from things. Source: about 1 year ago
  • Document Parsing - an unsolved problem?
    At my previous job we had the same problem which we solved by using Tika. We called it on the server along with other stuff, but there is also a Python binding. Source: almost 2 years ago
  • Hey y'all back again w/ the personal, self-hosted search engine
    For document content I've heard good things about Apache Tika. Spyglass could leverage it via the rest api. Source: about 2 years ago
  • Tool for locating keywords in 16,000+ PDFs
    How about you batch convert it to text with Tika and then run Python (or even grep or awk) on it? Source: about 2 years ago
  • Fun with File Formats
    There is also Apache Tika (https://tika.apache.org/) - file format detection & content extraction library. - Source: Hacker News / over 2 years ago
  • Selfhosted File Management Solution? - tags, searching, etc
    I installed FileRun recently and that might get you close. It's fast and the search is pretty good as it can integrate Apache Tika, I like the OnlyOffice integration as well. It's closed-source, which isn't great for me, but you get 3 accounts without having to pay. Source: over 2 years ago
  • Encoding detection
    Any native or FFI callable thing like the java tools such as Apache Tika? A quick duckduckgo search didn't turn up anything for me. Tika has served me well in the past, but I have no idea what I'd use with CL. Source: over 2 years ago
  • APSE – A Personal Search Engine
    This is just a simple bash script and Apache Tika (https://tika.apache.org/). You could script this together in minutes. Try https://github.com/flameshot-org/flameshot and feed the results through Tika to OCR the results. - Source: Hacker News / almost 3 years ago
  • Looking for a tool.
    For the first step you can check out Tika: https://tika.apache.org/. Source: almost 3 years ago
  • Any file indexer you can recommend?
    'FindTextInDocuments' uses 'Apache's Tika' and conversion of Pdf documents to text takes longer than any other. Source: about 3 years ago
  • How to Build Java Applications Today: April 5, 2021
    I use Apache software every day: Mostly Commons, but also POI, PDFBox, and Tika. They were pioneers for enterprise-friendly open-source libraries at a time when the GPL struck fear into the hearts of development managers everywhere. - Source: dev.to / about 3 years ago
  • Processing Fixed Width and Complex Files
    https://tika.apache.org/ - Apache Tika can be integrated as a custom processor or called via REST and run as a separate server/service. - Source: dev.to / about 3 years ago
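
Several of the mentions above describe calling Tika over REST and then searching the extracted text. The following is a minimal Python sketch of that workflow, not a definitive implementation. It assumes a Tika server is already running at http://localhost:9998 and uses the server's /tika endpoint with a plain-text Accept header. The helper names build_tika_request, extract_text, and lines_matching are hypothetical and chosen only for illustration.

```python
import urllib.request

# Assumption: a Tika server is running locally on port 9998.
TIKA_URL = "http://localhost:9998"


def build_tika_request(data: bytes, base_url: str = TIKA_URL) -> urllib.request.Request:
    """Build a PUT request to the server's /tika endpoint asking for plain text."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/tika",
        data=data,
        method="PUT",
        headers={"Accept": "text/plain"},
    )


def extract_text(path: str, base_url: str = TIKA_URL) -> str:
    """Send one file to the Tika server and return the extracted text."""
    with open(path, "rb") as f:
        request = build_tika_request(f.read(), base_url)
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8", errors="replace")


def lines_matching(text: str, keyword: str) -> list[str]:
    """Return the lines of extracted text that contain the keyword (case-insensitive)."""
    needle = keyword.lower()
    return [line for line in text.splitlines() if needle in line.lower()]
```

With a server running, lines_matching(extract_text("report.pdf"), "invoice") would list the lines of a hypothetical report.pdf that mention the keyword, which mirrors the "convert to text, then grep" approach suggested in the mentions.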

This is an informative page about Apache Tika. You can review and discuss the product here. The primary details have not been verified within the last quarter, and they might be outdated. If you think we are missing something, please use the means on this page to comment or suggest changes. All reviews and comments are highly encouraged and appreciated as they help everyone in the community to make an informed choice. Please always be kind and objective when evaluating a product and sharing your opinion.