We can parse PDF documents in the cloud and extract specific data, such as fields and tables, using a user-defined template. In this article, we will learn how to extract data from PDF programmatically using a REST API in Node.js.
The following topics are covered in this article:
- REST API and Node.js SDK to Extract Data from PDF
- Extract Data using JSON based Template File in Node.js
- Extract Information From PDF using Template Object in Node.js
- Parse Document Inside Container using Template in Node.js
REST API and Node.js SDK to Extract Data from PDF
To parse PDF documents and extract data based on a template, we will use the Node.js SDK of GroupDocs.Parser Cloud API. The SDK also supports parsing other document types and extracting text and images. Please install it using the following command in the console:
npm install groupdocs-parser-cloud
Before following the steps below, get your Client ID and Client Secret from the dashboard. Once you have them, add them to your code as shown below:
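Rather than hard-coding the credentials, you can read them from environment variables. The sketch below assumes the hypothetical variable names `GROUPDOCS_CLIENT_ID` and `GROUPDOCS_CLIENT_SECRET`. The returned values would then be passed to the SDK's configuration or `ParseApi` setup, whose exact form should be taken from the SDK documentation.

```javascript
// Minimal sketch: load the Client ID and Client Secret from the environment.
// The variable names GROUPDOCS_CLIENT_ID / GROUPDOCS_CLIENT_SECRET are hypothetical.
function loadCredentials(env = process.env) {
  const clientId = env.GROUPDOCS_CLIENT_ID;
  const clientSecret = env.GROUPDOCS_CLIENT_SECRET;
  if (!clientId || !clientSecret) {
    // Fail early with a clear message instead of sending an unauthenticated request.
    throw new Error(
      "Set GROUPDOCS_CLIENT_ID and GROUPDOCS_CLIENT_SECRET before running."
    );
  }
  return { clientId, clientSecret };
}
```

Keeping secrets out of source files also keeps them out of version control.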
Extract Data using JSON based Template File in Node.js
We can extract data from PDF documents using a template by following the simple steps given below:
- Upload the PDF file to the cloud
- Extract Data from PDF using JSON based Template File
Upload the Document
First, we will upload the PDF document to the cloud using the code sample given below:
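The local half of the upload step can be sketched with Node's built-in `fs` and `path` modules. This is a minimal sketch under stated assumptions: `prepareUpload`, the `parser` cloud folder, and the returned `{ cloudPath, contents, storageName }` descriptor are hypothetical. The actual upload is performed through the SDK's file API, whose request class and method should be taken from the SDK reference.

```javascript
const fs = require("fs");
const path = require("path");

// Hypothetical helper: gathers what an upload call needs for one local file.
// The "parser" folder and the descriptor shape are assumptions, not SDK names.
function prepareUpload(localFile, cloudFolder = "parser", storageName = "") {
  const contents = fs.readFileSync(localFile); // raw PDF bytes as a Buffer
  // Cloud storage paths use forward slashes regardless of the local OS.
  const cloudPath = path.posix.join(cloudFolder, path.basename(localFile));
  return { cloudPath, contents, storageName };
}
```

An empty `storageName` is intended to mean the default storage of your account; pass a name only if you configured additional storages.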
As a result, the uploaded PDF file will be available in the files section of the dashboard on the cloud.
Extract Data from PDF using JSON based Template File
We can parse the PDF document and extract data using a JSON-based template file by following the steps given below:
- Create an instance of the ParseApi.
- Provide the uploaded PDF file path.
- Set the path to the template JSON file.
- Finally, parse the document and extract the data.
The following code sample shows how to extract data according to the template provided in the JSON file using a REST API.
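The steps above can be sketched with plain JavaScript objects. This is a minimal sketch, not the SDK's own code: the property names `fileInfo.filePath` and `templatePath`, and the result shape read by `extractFieldValues` (a `fields` array whose items carry `name` and `pageArea.pageTextArea.text`), are assumptions modeled on the parse options and results described in the API reference, so verify them before use. The options object would be passed to the parse call made through the `ParseApi` instance.

```javascript
// Hypothetical helper: parse options for the JSON-template flow.
// Property names are assumed to mirror the SDK's parse options (unverified).
function buildTemplatePathOptions(filePath, templatePath, storageName = "") {
  if (!filePath || !templatePath) {
    throw new Error("Both the PDF path and the template path are required.");
  }
  return {
    fileInfo: { filePath, storageName }, // the uploaded PDF in cloud storage
    templatePath,                        // the uploaded template JSON file
  };
}

// Hypothetical helper: turns an assumed parse result into a name -> text map.
// Entries without a text area (for example, table areas) are skipped.
function extractFieldValues(result) {
  const values = {};
  for (const field of result.fields || []) {
    const text = field.pageArea?.pageTextArea?.text;
    if (text !== undefined) {
      values[field.name] = text;
    }
  }
  return values;
}
```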
Please find below the template in JSON format.
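As an illustration only, the sketch below writes a template with two regex-positioned fields and one table to a local JSON file, which you would then upload alongside the PDF. The property names (`Fields`, `FieldName`, `FieldPosition`, `FieldPositionType`, `Regex`, `Tables`, `TableName`, `DetectorParameters`, `Rectangle`) follow the template format as we understand it and should be checked against the template documentation. The field names, regular expressions, coordinates, and the `writeTemplateFile` helper are hypothetical.

```javascript
const fs = require("fs");

// Hypothetical template: field names, regexes, and coordinates are placeholders.
const template = {
  Fields: [
    {
      FieldName: "Address",
      FieldPosition: { FieldPositionType: "Regex", Regex: "Company address:" },
    },
    {
      FieldName: "CompanyName",
      FieldPosition: { FieldPositionType: "Regex", Regex: "Company name:" },
    },
  ],
  Tables: [
    {
      TableName: "Companies",
      DetectorParameters: {
        // Page area where the table is expected (hypothetical coordinates).
        Rectangle: {
          Position: { X: 77, Y: 279 },
          Size: { Width: 480, Height: 165 },
        },
      },
    },
  ],
};

// Writes the template as pretty-printed JSON and returns the written text.
function writeTemplateFile(filePath, tpl = template) {
  const json = JSON.stringify(tpl, null, 2);
  fs.writeFileSync(filePath, json, "utf8");
  return json;
}
```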
Extract Information From PDF using Template Object in Node.js
We can extract data from a PDF file based on the template defined as an object by following the steps given below:
- Create an instance of the ParseApi.
- Provide the uploaded PDF file path.
- Initialize a Template as an object.
- Finally, parse the document and extract the data.
The following code sample shows how to extract data according to the defined template from a PDF document using a REST API. Please follow the steps mentioned earlier to upload the file.
Please find below the template object created to match the layout of the PDF document.
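A minimal sketch of the same idea with the template defined in code: the template is built as an object and placed directly in the parse options instead of a `templatePath`. The camelCase property names (`template`, `fields`, `fieldName`, `fieldPosition`, `tables`, `detectorParameters`, `rectangle`) are assumed to correspond to the SDK's template model, and the helpers `regexField` and `buildInlineTemplateOptions` are hypothetical, so verify the exact names in the API reference.

```javascript
// Hypothetical helper: a field located by a regular expression.
function regexField(fieldName, regex) {
  return { fieldName, fieldPosition: { fieldPositionType: "Regex", regex } };
}

// Hypothetical helper: parse options that carry the template inline.
function buildInlineTemplateOptions(filePath, template) {
  if (!template || !Array.isArray(template.fields)) {
    throw new Error("The template must contain a fields array.");
  }
  return { fileInfo: { filePath }, template };
}

const template = {
  fields: [
    regexField("Address", "Company address:"),
    regexField("CompanyName", "Company name:"),
  ],
  tables: [
    {
      tableName: "Companies",
      // Hypothetical coordinates of the page area that holds the table.
      detectorParameters: {
        rectangle: { position: { x: 77, y: 279 }, size: { width: 480, height: 165 } },
      },
    },
  ],
};

const options = buildInlineTemplateOptions("parser/companies.pdf", template);
```

Defining the template in code avoids uploading a separate template file, at the cost of redeploying code whenever the document layout changes.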
Parse Document Inside Container using Template in Node.js
We can also parse a PDF document stored inside a container, such as a ZIP archive, and extract data using the template object. Please follow the steps mentioned below to parse a PDF document inside a container and extract its data.
- Create an instance of the ParseApi.
- Provide the uploaded archive file path.
- Initialize a Template as an object.
- Provide the container item.
- Finally, parse the document and extract the data.
The following code sample shows how to parse a PDF document inside a ZIP archive using a REST API. Please follow the steps mentioned earlier to upload the files.
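For a document inside an archive, the parse options differ mainly in that `fileInfo` points to the uploaded archive and a container item names the PDF within it. In this sketch, `containerItemInfo.relativePath` is assumed to mirror the SDK's container item option, and `buildContainerOptions`, `archive.zip`, and `companies.pdf` are hypothetical names.

```javascript
// Hypothetical helper: parse options for a PDF stored inside an archive.
// containerItemInfo.relativePath is an assumed property name (unverified).
function buildContainerOptions(archivePath, itemPath, template) {
  if (!archivePath || !itemPath) {
    throw new Error("Both the archive path and the item path are required.");
  }
  return {
    fileInfo: { filePath: archivePath },           // the uploaded archive
    containerItemInfo: { relativePath: itemPath }, // the PDF inside the archive
    template,                                      // the template object
  };
}

const template = {
  fields: [
    { fieldName: "Address", fieldPosition: { fieldPositionType: "Regex", regex: "Company address:" } },
  ],
};

const options = buildContainerOptions("parser/archive.zip", "companies.pdf", template);
```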
Try Online
Please try the following free online PDF parsing tool, which is developed using the above API: https://products.groupdocs.app/parser/pdf
Conclusion
In this article, we have learned how to extract data from PDF documents in the cloud according to a provided template. We have also seen how to define a template as an object or in JSON format, and how to parse a PDF document stored inside a container. You can also try the free online PDF parsing tool built on the same API. Besides, you can learn more about GroupDocs.Parser Cloud API using the documentation. We also provide an API Reference section that lets you visualize and interact with our APIs directly through the browser. In case of any ambiguity, please feel free to contact us on the forum.