Extract Text from PDF using REST API in Node.js

You can easily parse your PDF documents and extract all the text programmatically on the cloud. In this article, you will learn how to extract text from PDF documents using a REST API in Node.js.

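The following topics are covered in this article:

  • PDF Parser REST API and Node.js SDK to Extract Text
  • Extract Text from PDF using a REST API in Node.js
  • Get Text by Page Numbers from PDF Documents using Node.js
  • Extract Text from Documents Attached with PDF using Node.js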

PDF Parser REST API and Node.js SDK to Extract Text

To parse PDF documents, I will be using the Node.js SDK of the GroupDocs.Parser Cloud API. It allows you to parse data from over 50 supported document formats. It also supports parsing containers such as ZIP archives, OST/PST mail data files, e-books, markup files, and PDF portfolios in your Node.js applications. Using the SDK, you can extract text and images, and parse data by a template. SDKs for .NET, Java, PHP, Ruby, and Python are also available as members of the same document parser family for the Cloud API.

You can install GroupDocs.Parser Cloud in your Node.js application by running the following command in the console:

npm install groupdocs-parser-cloud

Get your Client ID and Client Secret from the dashboard before following the steps in this article. Once you have them, add them to your code.
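
A minimal sketch of that setup is shown here. It assumes the SDK exports a `Configuration` class that takes the Client ID and Client Secret and has an `apiBaseUrl` property; the `createConfiguration` helper and the environment variable names are illustrative and not part of the SDK.

```javascript
// Build an SDK configuration from the dashboard credentials.
// Assumption: the SDK exports a Configuration class whose constructor
// takes (clientId, clientSecret) and which has an apiBaseUrl property.
function createConfiguration(sdk, clientId, clientSecret) {
  if (!clientId || !clientSecret) {
    throw new Error("Client ID and Client Secret are required");
  }
  const configuration = new sdk.Configuration(clientId, clientSecret);
  configuration.apiBaseUrl = "https://api.groupdocs.cloud";
  return configuration;
}

// Runs only when the (hypothetical) environment variables are set,
// which keeps the credentials out of the source code.
if (process.env.GROUPDOCS_CLIENT_ID && process.env.GROUPDOCS_CLIENT_SECRET) {
  const sdk = require("groupdocs-parser-cloud");
  const configuration = createConfiguration(
    sdk,
    process.env.GROUPDOCS_CLIENT_ID,
    process.env.GROUPDOCS_CLIENT_SECRET
  );
  console.log("Configured for", configuration.apiBaseUrl);
}
```

Passing the SDK module in as a parameter is only a design choice for this sketch: it lets the helper be exercised without network access or real credentials.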

Extract Text from PDF using a REST API in Node.js

You can extract text from PDF documents by following the simple steps mentioned below:

Upload the Document

First, upload the PDF document to the cloud.
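
The upload can be sketched roughly as follows. This assumes the SDK exports `FileApi.fromConfig()` and an `UploadFileRequest` that takes the remote path, the file content, and a storage name; the `uploadPdf` helper, the file names, and the storage name are hypothetical.

```javascript
const fs = require("fs");

// Upload a local PDF to cloud storage and return the SDK's promise.
// Assumptions: sdk.FileApi.fromConfig(configuration) creates the client and
// sdk.UploadFileRequest takes (remotePath, fileContent, storageName).
function uploadPdf(sdk, configuration, localPath, remotePath, storageName) {
  const fileApi = sdk.FileApi.fromConfig(configuration);
  const fileContent = fs.readFileSync(localPath); // Buffer with the PDF bytes
  const request = new sdk.UploadFileRequest(remotePath, fileContent, storageName);
  return fileApi.uploadFile(request);
}

// Example call (hypothetical file and storage names):
// uploadPdf(sdk, configuration, "./sample.pdf", "sample.pdf", "my-storage")
//   .then(() => console.log("Uploaded sample.pdf"))
//   .catch((err) => console.error("Upload failed:", err.message));
```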

As a result, the uploaded PDF file will be available in the files section of your dashboard on the cloud.

Extract Text from PDF Documents using Node.js

You can easily extract all the text from the PDF documents programmatically by following the steps given below:

  • Create an instance of the ParseApi.
  • Create an instance of the FileInfo.
  • Then, set the path to the PDF file.
  • Create an instance of the TextOptions.
  • Then, assign FileInfo to the TextOptions.
  • Now, create an instance of the TextRequest with TextOptions.
  • Finally, get results by calling the ParseApi.text() method with the TextRequest.

The following code sample shows how to extract all the text from a PDF document using a REST API in Node.js.
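
A minimal sketch of these steps, assuming the SDK exports `ParseApi`, `FileInfo`, `TextOptions`, and `TextRequest` under the property names used here, and that the response exposes the extracted text as `text`. The helper names, the environment variable names, and `sample.pdf` are illustrative.

```javascript
// Build a TextRequest for a file that is already in cloud storage.
// Assumption: FileInfo has a filePath property and TextOptions a fileInfo property.
function buildTextRequest(sdk, filePath) {
  const fileInfo = new sdk.FileInfo();
  fileInfo.filePath = filePath;

  const options = new sdk.TextOptions();
  options.fileInfo = fileInfo;

  return new sdk.TextRequest(options);
}

// Extract all text from the document.
// Assumptions: ParseApi.fromKeys(clientId, clientSecret) creates the client
// and the response carries the text in a `text` property.
async function extractAllText(sdk, clientId, clientSecret, filePath) {
  const parseApi = sdk.ParseApi.fromKeys(clientId, clientSecret);
  const result = await parseApi.text(buildTextRequest(sdk, filePath));
  return result.text;
}

// Runs only when the (hypothetical) credential variables are set.
if (process.env.GROUPDOCS_CLIENT_ID && process.env.GROUPDOCS_CLIENT_SECRET) {
  extractAllText(
    require("groupdocs-parser-cloud"),
    process.env.GROUPDOCS_CLIENT_ID,
    process.env.GROUPDOCS_CLIENT_SECRET,
    "sample.pdf"
  )
    .then((text) => console.log(text))
    .catch((err) => console.error("Text extraction failed:", err.message));
}
```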

Get Text by Page Numbers from PDF Documents using Node.js

You can extract the text from specific pages of a PDF file programmatically by following the steps given below:

  • Create an instance of the ParseApi.
  • Create an instance of the FileInfo.
  • Then, set the path to the PDF file.
  • Create an instance of the TextOptions.
  • Then, assign FileInfo to the TextOptions.
  • Set the start page number and the total number of pages to extract.
  • Now, create an instance of the TextRequest with TextOptions.
  • Finally, get results by calling the ParseApi.text() method with the TextRequest.

The following code sample shows how to extract the text by page numbers from a PDF document using a REST API.
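
A minimal sketch of these steps, assuming `TextOptions` exposes `startPageNumber` and `countPagesToExtract`, and that a paged response carries a `pages` array whose items have `pageIndex` and `text`. The helper names are illustrative. Whether the start page is counted from zero or one is not stated here, so check the API reference before relying on it.

```javascript
// Build a TextRequest limited to a page range.
// Assumption: TextOptions exposes startPageNumber and countPagesToExtract.
function buildPagedTextRequest(sdk, filePath, startPageNumber, countPagesToExtract) {
  if (!Number.isInteger(startPageNumber) || startPageNumber < 0) {
    throw new RangeError("startPageNumber must be a non-negative integer");
  }
  if (!Number.isInteger(countPagesToExtract) || countPagesToExtract < 1) {
    throw new RangeError("countPagesToExtract must be a positive integer");
  }

  const fileInfo = new sdk.FileInfo();
  fileInfo.filePath = filePath;

  const options = new sdk.TextOptions();
  options.fileInfo = fileInfo;
  options.startPageNumber = startPageNumber;
  options.countPagesToExtract = countPagesToExtract;

  return new sdk.TextRequest(options);
}

// Turn a paged response into printable lines.
// Assumption: result.pages is an array of { pageIndex, text } objects.
function formatPages(result) {
  return (result.pages || []).map((page) => `Page ${page.pageIndex}: ${page.text}`);
}

// Assumption: ParseApi.fromKeys(clientId, clientSecret) creates the client.
async function extractPages(sdk, clientId, clientSecret, filePath, start, count) {
  const parseApi = sdk.ParseApi.fromKeys(clientId, clientSecret);
  const result = await parseApi.text(buildPagedTextRequest(sdk, filePath, start, count));
  return formatPages(result);
}
```

For example, `extractPages(require("groupdocs-parser-cloud"), clientId, clientSecret, "sample.pdf", 1, 2)` would request two pages beginning at start page 1 of a hypothetical `sample.pdf`.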

Extract Text from Documents Attached with PDF using Node.js

A PDF file can act as a container for other documents attached to it. You can programmatically extract the text from such an attached document by following the steps mentioned below:

  • Create an instance of the ParseApi.
  • Create an instance of the FileInfo.
  • Then, set the path to the PDF file.
  • Optionally, provide the file password.
  • Now, create an instance of the ContainerItemInfo.
  • Then, set the relative path for the attached file.
  • Create an instance of the TextOptions.
  • Then, assign FileInfo and ContainerItemInfo to the TextOptions.
  • Now, create an instance of the TextRequest with TextOptions.
  • Finally, get results by calling the ParseApi.text() method with the TextRequest.

The following code sample shows how to extract the text from a document inside a PDF document using a REST API.
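
A minimal sketch of these steps, assuming `FileInfo` has `filePath` and `password` properties, `ContainerItemInfo` has a `relativePath` property, and `TextOptions` accepts both through `fileInfo` and `containerItemInfo`. The helper name and the file names in the usage comment are hypothetical.

```javascript
// Build a TextRequest that targets a document attached to a PDF.
// Assumptions: FileInfo exposes filePath and password, ContainerItemInfo
// exposes relativePath, and TextOptions exposes fileInfo and containerItemInfo.
function buildAttachmentTextRequest(sdk, pdfPath, attachmentPath, password) {
  const fileInfo = new sdk.FileInfo();
  fileInfo.filePath = pdfPath;
  if (password) {
    fileInfo.password = password; // only needed for protected files
  }

  const containerItemInfo = new sdk.ContainerItemInfo();
  containerItemInfo.relativePath = attachmentPath;

  const options = new sdk.TextOptions();
  options.fileInfo = fileInfo;
  options.containerItemInfo = containerItemInfo;

  return new sdk.TextRequest(options);
}

// Example call (hypothetical names; ParseApi.fromKeys is assumed to exist):
// const sdk = require("groupdocs-parser-cloud");
// const parseApi = sdk.ParseApi.fromKeys(clientId, clientSecret);
// parseApi
//   .text(buildAttachmentTextRequest(sdk, "portfolio.pdf", "attached.pdf"))
//   .then((result) => console.log(result.text))
//   .catch((err) => console.error("Text extraction failed:", err.message));
```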

Try Online

Please try the following free online PDF parsing tool, which was developed using the above API: https://products.groupdocs.app/parser/pdf

Conclusion

In this article, you have learned how to parse PDF documents on the cloud. You have also seen how to extract text by page numbers and from container items of PDF files using a REST API in Node.js, and how to programmatically upload a PDF file to the cloud. You can learn more about the GroupDocs.Parser Cloud API from its documentation. An API Reference section is also provided that lets you visualize and interact with the APIs directly in the browser. If you have any questions, please feel free to contact us on the forum.