Extract xml from pdf form



I have a PDF file containing form fields and need to export the data to an XML file AUTOMATICALLY. Doing it by hand works: Export Form Data, then choose the XML extension for the output file. However, I need to automate it, e.g. with a Java implementation or some command-line tool.

Any ideas which libraries or tools I could use to export form field data to XML? The tool or library should be open source, so that I can integrate it into my workflow. EDIT: Feel free to download the sample.

It is open source and could fit your needs, since the website says: “Extract forms data from PDF forms or prefill a PDF form.”

I tried extracting all form fields via the command line and it works. I will work on the Java source code example tomorrow, but from what I see it’s exactly what I was looking for.

I’m glad it helped a little bit.
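Since the command-line extraction already works, one way to automate it from Java is to run the extractor with `ProcessBuilder` and parse its text output into a name/value map. The tool referred to above is not named here, so this is a minimal sketch that assumes pdftk is installed and on the PATH, and that its `dump_data_fields_utf8` output uses one record per field, each starting with a `---` line and containing `FieldName:` / `FieldValue:` entries. The class name `FormFieldDump` is hypothetical; adapt `parse` to whatever your tool actually prints.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: turns a pdftk-style field dump into name -> value pairs.
final class FormFieldDump {

    // Parses "FieldName:" / "FieldValue:" entries; a field with no value maps to "".
    // Surrounding whitespace in names and values is trimmed.
    static Map<String, String> parse(String dump) {
        Map<String, String> fields = new LinkedHashMap<>(); // keeps the tool's field order
        String name = null;
        String value = "";
        for (String line : dump.split("\\R")) {
            if (line.equals("---")) {
                // "---" starts a new record, so flush the previous one
                if (name != null) {
                    fields.put(name, value);
                }
                name = null;
                value = "";
            } else if (line.startsWith("FieldName:")) {
                // the colon keeps "FieldNameAlt:" (tooltip text) from matching
                name = line.substring("FieldName:".length()).trim();
            } else if (line.startsWith("FieldValue:")) {
                value = line.substring("FieldValue:".length()).trim();
            }
            // all other keys (FieldType, FieldFlags, ...) are ignored
        }
        if (name != null) {
            fields.put(name, value); // the last record is not followed by "---"
        }
        return fields;
    }

    // Runs pdftk (assumed to be on the PATH) on the given PDF and parses its output.
    static Map<String, String> extract(String pdfPath) throws IOException, InterruptedException {
        Process p = new ProcessBuilder("pdftk", pdfPath, "dump_data_fields_utf8")
                .redirectErrorStream(true) // merge stderr so error messages end up in 'out'
                .start();
        String out = new String(p.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
        int exit = p.waitFor();
        if (exit != 0) {
            throw new IOException("pdftk exited with code " + exit + ": " + out);
        }
        return parse(out);
    }
}
```

Calling `FormFieldDump.extract("form.pdf")` gives the fields in the order the tool listed them; keeping the parser separate from the process call makes it easy to test against a saved dump.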

I forgot to say that the JDOM library might be a good way to convert objects to XML. In Java there are a few libraries for working with PDF, but in general it’s hard to get formatted information out of a PDF.
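JDOM is one option for the XML step. As an alternative that needs no extra dependency, the sketch below uses the JDK’s built-in `javax.xml` DOM and `Transformer` APIs. It assumes the field names and values have already been extracted from the PDF into a `Map` by whichever library or tool you use; the `<form>` / `<field name="...">` layout and the class name `FieldsToXml` are hypothetical, not the format Acrobat’s Export Form Data produces.

```java
import java.io.StringWriter;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper: serializes form-field name/value pairs to an XML string.
final class FieldsToXml {

    static String toXml(Map<String, String> fields) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .newDocument();
            Element root = doc.createElement("form");
            doc.appendChild(root);
            for (Map.Entry<String, String> e : fields.entrySet()) {
                Element field = doc.createElement("field");
                // The name goes in an attribute because PDF field names
                // (e.g. "address.street[0]") are often not valid XML element names.
                field.setAttribute("name", e.getKey());
                field.setTextContent(e.getValue()); // the serializer escapes &, < etc.
                root.appendChild(field);
            }
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.INDENT, "yes");
            t.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | TransformerException ex) {
            // With the JDK's default factories these indicate a broken XML setup.
            throw new IllegalStateException("XML serialization failed", ex);
        }
    }
}
```

To save the result, `Files.writeString(Path.of("form.xml"), xml)` (Java 11+) writes the string as UTF-8, matching the encoding declared in the XML header.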

I have never implemented this myself, but Qoppa looks good and seems fairly advanced; however, it’s not free. It includes jPDFFields, which should be useful for extracting values from form fields.

There is also a similar thread with some information about the command-line tool. I hope it will be helpful for you.

Thanks for taking the time. Actually, I was looking for an open-source library or tool.

Sorry, I hadn’t mentioned that yet. jPDFFields would do the job.

