pdf2html is a module that converts PDF files to HTML pages using Apache Tika. It can also generate thumbnail images of PDF files using Apache PDFBox.
Full list of issues covering all changes in this release: Add the new remove duplicates API. Add the new extract text API. The pivot filter could not be created successfully.
When a free PDF editor is equipped with the best OCR software, it can scan and convert paper documents into ... as well as tables imported from Microsoft Excel. The aim was to push each software ...