Extract Data from PDF using REST API in Node.js

We can parse PDF documents on the cloud and programmatically extract specific data, such as field values and table data, using a user-defined template. In this article, we will learn how to extract data from PDF using REST API in Node.js.

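The following topics are covered in this article:

  1. REST API and Node.js SDK to Extract Data from PDF
  2. Extract Data using JSON based Template File in Node.js
  3. Extract Data using Template Object in Node.js
  4. Parse Document Inside Container using Template in Node.js
  5. Try Online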

REST API and Node.js SDK to Extract Data from PDF

For parsing PDF documents and extracting data based on a template, we will be using the Node.js SDK of GroupDocs.Parser Cloud API. It also allows parsing of other supported document types and the extraction of text, images, and specific data using a template. Please install it using the following command in the console:

npm install groupdocs-parser-cloud

Please get your Client ID and Client Secret from the dashboard before following the steps below. Once you have them, add them to your code before making any API calls.
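
One way to supply the credentials is to read them from environment variables instead of hard-coding them. The following is a minimal sketch; the variable names GROUPDOCS_CLIENT_ID and GROUPDOCS_CLIENT_SECRET and the helper loadCredentials are hypothetical, and the SDK call shown in the comment is assumed from the SDK's published samples, so check it against the API reference.

```javascript
// Minimal sketch: read the Client ID and Client Secret from environment
// variables. The variable names are hypothetical; use whatever suits you.
function loadCredentials(env = process.env) {
  const clientId = env.GROUPDOCS_CLIENT_ID;
  const clientSecret = env.GROUPDOCS_CLIENT_SECRET;
  // Fail early with a clear message instead of sending empty credentials.
  if (!clientId || !clientSecret) {
    throw new Error(
      "Set GROUPDOCS_CLIENT_ID and GROUPDOCS_CLIENT_SECRET before calling the API"
    );
  }
  return { clientId, clientSecret };
}

// With the SDK installed, the values would then be handed to it, e.g.
// (assumed from the SDK samples; verify against the API reference):
//   const parser_cloud = require("groupdocs-parser-cloud");
//   const { clientId, clientSecret } = loadCredentials();
//   const config = new parser_cloud.Configuration(clientId, clientSecret);
```

Keeping the secret out of the source file also keeps it out of version control.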

Extract Data using JSON based Template File in Node.js

We can extract data from PDF documents using a template by following the simple steps given below:

Upload the Document

First, we will upload the PDF document to the cloud storage so that the API can access it.
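
As an illustration, the upload step might look like the sketch below. To keep it runnable without network access, the SDK module and a FileApi instance are passed in as parameters. The names FileApi, uploadFile and UploadFileRequest(path, file, storageName) are assumptions based on the SDK's samples, and uploadPdf is a hypothetical helper. As an extra safeguard, it rejects files that do not begin with the %PDF- signature.

```javascript
const fs = require("fs");

// Minimal sketch of the upload step. `sdk` would normally be
// require("groupdocs-parser-cloud") and `fileApi` an SDK FileApi instance;
// both the request class and uploadFile() are assumed names.
async function uploadPdf(sdk, fileApi, localPath, cloudPath, storageName = "") {
  const content = fs.readFileSync(localPath);
  // Refuse to upload anything that does not start with the PDF signature.
  if (content.subarray(0, 5).toString("latin1") !== "%PDF-") {
    throw new Error(`${localPath} does not look like a PDF file`);
  }
  // Build the upload request: target path in cloud storage, file bytes, storage name.
  const request = new sdk.UploadFileRequest(cloudPath, content, storageName);
  return fileApi.uploadFile(request);
}
```

In real use, `sdk` would be `require("groupdocs-parser-cloud")` and an empty storage name would target the default storage.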

As a result, the uploaded PDF file will be available in the files section of the dashboard on the cloud.

Extract Data from PDF using JSON based Template File

We can parse the PDF document and extract data using a JSON-based template file by following the steps given below:

  1. Create an instance of the ParseApi.
  2. Provide the uploaded PDF file path.
  3. Set the path to the template JSON file.
  4. Finally, parse the document and extract the data.

The API then extracts the data according to the template provided in the JSON file.
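
The four steps above can be sketched as follows. The SDK module and the ParseApi instance are passed in as parameters so that the request-building logic can be checked without calling the service. The class and property names (ParseOptions, FileInfo, filePath, templatePath, ParseRequest, parse, fieldsData, pageArea.pageTextArea) are assumptions drawn from the SDK's samples, and the helper functions are hypothetical.

```javascript
// Step 2 and 3: point the options at the uploaded PDF and the template file.
// All SDK class and property names here are assumed; verify them against
// the API reference.
function buildTemplateFileRequest(sdk, filePath, templatePath) {
  const options = new sdk.ParseOptions();
  options.fileInfo = new sdk.FileInfo();
  options.fileInfo.filePath = filePath;   // the uploaded PDF
  options.templatePath = templatePath;    // the template JSON in cloud storage
  return new sdk.ParseRequest(options);
}

// Flatten the response into { fieldName: text } for fields that carry text.
function collectFieldText(result) {
  const values = {};
  for (const data of result.fieldsData || []) {
    const textArea = data.pageArea && data.pageArea.pageTextArea;
    if (textArea) {
      values[data.name] = textArea.text;
    }
  }
  return values;
}

// Step 1 happens outside (creating `parseApi`); step 4 is the parse call.
async function extractWithTemplateFile(sdk, parseApi, filePath, templatePath) {
  const request = buildTemplateFileRequest(sdk, filePath, templatePath);
  const result = await parseApi.parse(request);
  return collectFieldText(result);
}
```

In real use, `sdk` would be `require("groupdocs-parser-cloud")` and `parseApi` would be created from the Client ID and Client Secret (for example with `ParseApi.fromKeys`, an assumed factory). Table results arrive in a separate property that this sketch ignores.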

The template itself is written in JSON format and describes the fields and tables to extract.
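
For illustration, the JavaScript sketch below builds a template with one text field and one table and writes it to a JSON file, which can then be uploaded like any other file. The property names follow the general shape of GroupDocs.Parser templates but are assumptions here; the field name, regular expression, and coordinates are hypothetical placeholders that would have to be measured from your own document.

```javascript
const fs = require("fs");

// Hypothetical template: one field located by a regular expression and one
// table located by a rectangle with column separators. Property names are
// assumed; check them against the GroupDocs.Parser Cloud documentation.
function buildTemplate() {
  return {
    Fields: [
      {
        FieldName: "Address",
        FieldPosition: { FieldPositionType: "Regex", Regex: "Company address:" },
      },
    ],
    Tables: [
      {
        TableName: "Companies",
        DetectorParameters: {
          Rectangle: {
            Position: { X: 77, Y: 279 },   // placeholder coordinates
            Size: { Width: 480, Height: 41 },
          },
          ColumnSeparators: [185, 370, 425],
        },
      },
    ],
  };
}

// Serialize the template with indentation so it stays readable on disk.
function writeTemplate(filePath) {
  fs.writeFileSync(filePath, JSON.stringify(buildTemplate(), null, 2));
}
```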

Extract Data using Template Object in Node.js

We can extract data from a PDF file based on the template defined as an object by following the steps given below:

  1. Create an instance of the ParseApi.
  2. Provide the uploaded PDF file path.
  3. Initialize a Template as an object.
  4. Finally, parse the document and extract the data.

Please upload the PDF file first, as described earlier. The API then extracts data from the PDF document according to the template defined as an object.

The template object must be created according to the layout of the PDF document, so that its fields and tables match the document's content.
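
The steps above can be sketched as follows, with the template built from SDK model classes instead of being read from a file. The class and property names (Template, Field, FieldPosition, fieldName, fieldPositionType, regex, ParseOptions.template) are assumptions based on the SDK's samples, and the field name and regular expression are hypothetical.

```javascript
// Step 3: initialize the template as an object. `sdk` would normally be
// require("groupdocs-parser-cloud"); all class names here are assumed.
function buildTemplateObject(sdk) {
  const position = new sdk.FieldPosition();
  position.fieldPositionType = "Regex";   // locate the field by a regular expression
  position.regex = "Company address:";    // hypothetical label text

  const field = new sdk.Field();
  field.fieldName = "Address";            // hypothetical field name
  field.fieldPosition = position;

  const template = new sdk.Template();
  template.fields = [field];
  return template;
}

// Steps 2 and 3 combined: the uploaded PDF plus the inline template.
function buildTemplateObjectRequest(sdk, filePath) {
  const options = new sdk.ParseOptions();
  options.fileInfo = new sdk.FileInfo();
  options.fileInfo.filePath = filePath;          // the uploaded PDF
  options.template = buildTemplateObject(sdk);   // template passed inline
  return new sdk.ParseRequest(options);
}
```

The request is then sent with `await parseApi.parse(buildTemplateObjectRequest(sdk, "sample.pdf"))`, where `sample.pdf` is a hypothetical file name. Defining the template in code avoids uploading a separate template file.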

Parse Document Inside Container using Template in Node.js

We can also parse a PDF document stored inside a container, such as a ZIP archive, and extract data using the template object. Please follow the steps mentioned below to parse a document inside a container:

  1. Create an instance of the ParseApi.
  2. Provide the uploaded archive file path.
  3. Initialize a Template as an object.
  4. Provide the container item.
  5. Finally, parse the document and extract the data.

Please upload the ZIP archive first, as described earlier. The API then parses the PDF document inside the archive and extracts the data.
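
The steps above can be sketched as follows. Compared with parsing a standalone PDF, the file path points at the archive and a container item identifies the PDF inside it. ContainerItemInfo and relativePath are assumed names taken from the SDK's samples, and the archive and file names used are hypothetical.

```javascript
// `sdk` would normally be require("groupdocs-parser-cloud"); `template` is a
// template object such as the one described in the previous section.
// Class and property names are assumed; verify them against the API reference.
function buildContainerRequest(sdk, archivePath, itemPath, template) {
  const options = new sdk.ParseOptions();
  options.fileInfo = new sdk.FileInfo();
  options.fileInfo.filePath = archivePath;             // the uploaded archive
  options.containerItemInfo = new sdk.ContainerItemInfo();
  options.containerItemInfo.relativePath = itemPath;   // PDF path inside the archive
  options.template = template;
  return new sdk.ParseRequest(options);
}
```

For example, `buildContainerRequest(sdk, "archive.zip", "sample.pdf", template)` would target `sample.pdf` inside `archive.zip`; the request is then passed to `parseApi.parse` as before.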

Try Online

Please try the following free online PDF parsing tool, which is developed using the above API: https://products.groupdocs.app/parser/pdf

Conclusion

In this article, we have learned how to extract specific data from PDF documents on the cloud according to a provided template. We have seen how to define the template as an object or supply it as a JSON file, and how to parse a PDF document inside a ZIP archive. Besides, you can learn more about GroupDocs.Parser Cloud API using the documentation. We also provide an API Reference section that lets you visualize and interact with our APIs directly through the browser. In case of any ambiguity, please feel free to contact us on the forum.
