People who work frequently and intensively with PDF files have probably asked themselves the following question: how can I get at a document’s information in the simplest way, so that I can work with it?
The PDF Extractor, developed at the Know-Center as part of the EU project CODE, now makes possible what was not possible before. The tool extracts structural data such as charts, headlines and graphic elements from PDFs and makes them available for further analysis or visualization. In addition, PDFs are automatically categorized in a hierarchical way, which makes it easier to work with a document’s content. The extracted and structured data can also be stored in the LOD (Linked Open Data) cloud, where it is available as usable information.
The PDF Extractor can be adapted as a tool to different sectors and areas. Based on the semantic conditions of the respective requirements, the system learns semi-automatically to handle the characteristics of the PDFs.
The best example of this is Mendeley in London, where the PDF Extractor is already in successful use: http://blog.mendeley.com/progress-update/desktop-contents-tables-and-figures/