As for self-hosted web apps, Tabula (https://tabula.technology) is a great tool to extract tables from PDF files. - Source: Hacker News / 4 months ago
For extracting tables I've been using http://tabula.technology/ for a couple of years. It seems to do a pretty good job even with some fairly complex tables, and I've not had any problems with it. - Source: Hacker News / 6 months ago
To extract tables from PDFs, you can use the following tools: 1. Tabula (https://tabula.technology): a free and open-source tool. 2. Parsio (https://parsio.io): uses pre-trained AI models for data extraction from PDFs, emails, and other formats. 3. Airparser (https://airparser.com): uses a GPT-based approach similar to ChatGPT for data extraction from PDFs, emails, and other formats. - Source: Hacker News / 8 months ago
You might want to look at https://tabula.technology. Source: 10 months ago
Seconding the recommendation for Tabula. It's a great tool, and is free and open source. Source: 11 months ago
I use Tabula to convert PDF tables to Excel. It needs a little manual work, but it works decently well for converting all of my PDF statements to a spreadsheet. I've been using it for at least 6-8 years. Source: 12 months ago
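Much of the "little manual work" after exporting statements is normalizing the amount column. A minimal sketch, assuming amounts arrive as text with thousands separators, an optional `$` sign, and negatives shown in parentheses; these conventions vary by bank and are assumptions, not something Tabula handles for you:

```python
from decimal import Decimal


def parse_amount(cell: str) -> Decimal:
    """Convert a statement amount such as '1,234.56', '$12.00' or '(45.10)' to a Decimal."""
    # Drop surrounding whitespace, thousands separators and the (assumed) '$' symbol.
    text = cell.strip().replace(",", "").replace("$", "")
    # Accounting-style negatives are written in parentheses: '(45.10)' means -45.10.
    negative = text.startswith("(") and text.endswith(")")
    if negative:
        text = text[1:-1]
    value = Decimal(text)  # Decimal avoids float rounding on currency values
    return -value if negative else value


print(parse_amount("($1,045.10)"))
```

Using `Decimal` rather than `float` keeps sums of many statement lines exact to the cent.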
https://tabula.technology/ will usually do a pretty good job of extracting data from PDF tables, as long as the PDF has actual text in it (rather than scanned images). Source: 12 months ago
I came here to say the same: if the data are in tables, Tabula works pretty well. https://tabula.technology/. Source: about 1 year ago
I found a tool called Tabula that you download and run locally. It lets you import a PDF and then visually select the tables you want to extract. Source: about 1 year ago
There are several tools available today that can help you extract tables from PDF files (such as Tabula), or even parse PDFs into structured JSON using AI (like Parsio; disclosure: I'm the founder) or without AI (like Docparser). Source: about 1 year ago
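Whatever tool does the extraction, the last step toward structured JSON is mapping table rows onto column names. This is not a description of how Parsio or Docparser work internally; it is a minimal sketch assuming the table is already a list of rows whose first row is the header (the sample data is made up):

```python
import json


def rows_to_records(rows: list[list[str]]) -> list[dict[str, str]]:
    """Treat the first row as the header and map every following row onto it."""
    header, *body = rows
    records = []
    for row in body:
        # Pad short rows (common when trailing cells are empty) so every header key gets a value.
        padded = row + [""] * (len(header) - len(row))
        records.append(dict(zip(header, padded)))
    return records


table = [
    ["Date", "Description", "Amount"],
    ["2023-01-05", "Coffee", "3.50"],
    ["2023-01-06", "Books"],  # trailing cell missing
]
print(json.dumps(rows_to_records(table), indent=2))
```

Extra cells beyond the header length are silently dropped by `zip`, which is a deliberate simplification here.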
Also look into Tabula for dealing with tables in PDFs. Talk to your IT security folks; depending on their policies toward software, you might need to get a standalone machine. Source: about 1 year ago
It's old, and sometimes things don't come out right, but this is one way out of this mess: https://tabula.technology. If that doesn't do it, there's always the brute-force option of scripting in your language of choice to pull the data out. - Source: Hacker News / about 1 year ago
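One common brute-force route is to dump the PDF to text with column alignment preserved (for example with Poppler's `pdftotext -layout`) and then split each line on wide gaps. A minimal sketch, assuming columns are separated by at least two spaces; it will break on cells that contain wide gaps or wrap across lines:

```python
import re

# Two or more whitespace characters usually separate columns in layout-preserving
# text dumps; single spaces are kept because they appear inside cell values.
COLUMN_GAP = re.compile(r"\s{2,}")


def parse_layout_text(text: str) -> list[list[str]]:
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines between table rows
        rows.append(COLUMN_GAP.split(line.strip()))
    return rows


sample = """\
Date        Description          Amount
2023-01-05  Coffee beans           3.50
2023-01-06  Book                  12.00
"""
for row in parse_layout_text(sample):
    print(row)
```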
The most widely used tool for this in my line of work is Tabula. Source: about 1 year ago
Tabula is old but works OK, depending on the PDF. This is a common, if misguided, objection from agencies. They're misinterpreting the statute: if you give me an Excel file and I change it, I've in no way compromised the original record (i.e., the one you have). I've compromised only my copy of it. Source: over 1 year ago
I would add Tabula [0] to this list. I’ve used it to extract tabular data from PDFs, especially when acquiring Covid data at the height of the pandemic. It’s got an MIT license and extracts table data from PDFs really well. 0: https://tabula.technology/. - Source: Hacker News / over 1 year ago
/u/TJF0617 this can be helpful if you can go this route: https://tabula.technology/. Source: over 1 year ago
If you use Tabula (https://tabula.technology/) or a similar tool, the data will be structured the way you want. Alternatively, use Text to Columns: split by the first number from the left, and then by the space character from the right. Source: over 1 year ago
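The Text to Columns fallback can also be scripted. A minimal sketch, assuming each line is a free-text label followed by a known number of whitespace-separated numeric columns (`numeric_cols` is a hypothetical parameter for illustration); a label that itself contains a digit would be cut too early:

```python
import re


def split_row(line: str, numeric_cols: int) -> list[str]:
    """Split at the first digit from the left, then on whitespace from the right."""
    match = re.search(r"\d", line)
    if match is None:
        return [line.strip()]  # no number: keep the whole line as one cell
    label = line[: match.start()].strip()
    # rsplit with maxsplit keeps any surplus tokens together in the leftmost value cell.
    values = line[match.start():].strip().rsplit(None, numeric_cols - 1)
    return [label, *values]


print(split_row("Blue widget, large 12 3.50 42.00", 3))
```

Splitting from the right with a fixed count mirrors the manual approach: the rightmost columns are reliably numeric, while anything left over stays in one cell for inspection.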
Additionally, have you tried scanning the printouts, running OCR on them, and converting the result to Excel from there? I used to get decent results with https://tabula.technology/, but the output may need some cleanup afterwards. Source: almost 2 years ago
You may still have to do a lot of manual work, but a good option if your PDF contains the tables directly is Tabula. You can also use its Python wrapper, tabula-py. Source: almost 2 years ago
If your order forms are in PDF format, you could use a tool like Tabula (https://tabula.technology/) to extract the data from the PDF and then import it into Google Sheets. Source: almost 2 years ago
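If the extracted order data passes through a script before reaching Google Sheets, CSV is the simplest import format. A minimal sketch using only Python's standard `csv` module, assuming the rows are already lists of strings (the sample rows are made up):

```python
import csv
import io


def rows_to_csv(rows: list[list[str]]) -> str:
    """Serialize extracted table rows as CSV text that a spreadsheet can import."""
    buffer = io.StringIO()
    # csv.writer quotes cells containing commas, quotes or newlines automatically.
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


print(rows_to_csv([["Item", "Qty"], ["Widget, blue", "3"]]), end="")
```

Letting the `csv` module handle quoting avoids the classic bug where a comma inside a product name shifts every following column.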
There are a number of software options that can extract PDF tables to CSV format, with widely varying results. Tabula works okay; there is also pdftotext. Source: about 2 years ago