Parse Word Documents using REST API in Node.js

You may need to extract text or images from your Word documents for various purposes. As a Node.js developer, you can parse DOCX files and extract all of their text and images programmatically on the cloud. This article focuses on how to parse Word documents using a REST API in Node.js.

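The following topics are covered in this article:

  • Document Parser REST API and Node.js SDK
  • Parse Word Documents and Extract Images using REST API in Node.js
  • Extract Text from Word Documents using Node.js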

Document Parser REST API and Node.js SDK

To parse DOCX documents, I will be using the Node.js SDK of the GroupDocs.Parser Cloud API. It allows you to parse data from over 50 document types. It also supports parsing containers such as ZIP archives, OST/PST mail data files, eBooks, markups, and PDF portfolios in your Node.js applications. With the SDK you can extract text, extract images, and parse data by a template. SDKs for .NET, Java, PHP, Ruby, and Python are also available as members of the same document parser family for the Cloud API.

You can install GroupDocs.Parser Cloud in your Node.js application by running the following command in the console:

npm install groupdocs-parser-cloud 

Please get your Client ID and Client Secret from the dashboard before following the steps in this article. Once you have them, add them to your code so that the SDK can authenticate your requests.
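The original code sample for this step is not included here, so the following is only a minimal sketch of how the credentials might be wired in. It assumes the SDK exports a `Configuration` class that takes the Client ID and Client Secret; the environment variable names and the `createConfig` helper are hypothetical and not part of the SDK.

```javascript
// Minimal sketch: read credentials from the environment and build the
// SDK configuration. `sdk` is the object returned by
// require("groupdocs-parser-cloud").
function createConfig(sdk, env) {
  // Hypothetical variable names; use whatever your deployment provides.
  const clientId = env.GROUPDOCS_CLIENT_ID;
  const clientSecret = env.GROUPDOCS_CLIENT_SECRET;
  if (!clientId || !clientSecret) {
    throw new Error("Set GROUPDOCS_CLIENT_ID and GROUPDOCS_CLIENT_SECRET");
  }
  // Assumption: the SDK exports Configuration(clientId, clientSecret).
  return new sdk.Configuration(clientId, clientSecret);
}

// Usage, with the package installed and real credentials set:
//   const sdk = require("groupdocs-parser-cloud");
//   const config = createConfig(sdk, process.env);
```

Passing the SDK module in as a parameter keeps the credentials out of the source file and lets the helper be exercised without network access.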

Parse Word Documents and Extract Images using REST API in Node.js

You can parse your Word documents and extract images programmatically by following the steps mentioned below:

Upload the Document

First, upload the Word document (DOCX) to the cloud.
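A minimal sketch of the upload step might look like the following. It assumes the SDK exposes `FileApi.fromConfig`, `UploadFileRequest`, and `uploadFile`; the `uploadDocument` helper and the file and storage names in the usage comment are hypothetical.

```javascript
const fs = require("fs");

// Minimal sketch: upload a local DOCX file to cloud storage.
// `sdk` is require("groupdocs-parser-cloud"); `config` is the SDK
// configuration built from your Client ID and Secret.
function uploadDocument(sdk, config, localPath, remotePath, storageName) {
  // Assumption: the SDK exports FileApi.fromConfig and UploadFileRequest.
  const fileApi = sdk.FileApi.fromConfig(config);
  const fileContent = fs.readFileSync(localPath); // Buffer with the DOCX bytes
  const request = new sdk.UploadFileRequest(remotePath, fileContent, storageName);
  // Returns whatever the SDK call returns (assumed to be a promise).
  return fileApi.uploadFile(request);
}

// Usage (hypothetical file and storage names):
//   uploadDocument(sdk, config, "./sample.docx", "sample.docx", "MyStorage")
//     .then(() => console.log("File uploaded"));
```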

As a result, the uploaded DOCX file will be available in the files section of your dashboard on the cloud.

Extract Images from Word Documents using Node.js

You can easily extract all the images from the Word documents by following the steps mentioned below.

  • Create an instance of the ParseApi
  • Create an instance of the FileInfo
  • Set path to the DOCX file
  • Create an instance of the ImageOptions
  • Assign FileInfo to the ImageOptions
  • Create ImagesRequest
  • Get results by calling the ParseApi.images() method

The following code sample shows how to extract images from a DOCX file using a REST API.
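The steps above can be sketched roughly as follows. This is an illustration under assumptions rather than the article's original sample: it assumes the SDK exports `ParseApi.fromConfig`, `FileInfo`, and `ImagesRequest`, and it accepts either `ImagesOptions` or `ImageOptions` as the options class because the exact exported name is not confirmed here. The helper names `buildImagesRequest` and `extractImages` are hypothetical.

```javascript
// Minimal sketch: build the request described in the steps above.
// `sdk` is the object returned by require("groupdocs-parser-cloud").
function buildImagesRequest(sdk, filePath) {
  const fileInfo = new sdk.FileInfo();
  fileInfo.filePath = filePath; // path of the DOCX file in cloud storage

  // Assumption: the options class is exported under one of these names.
  const OptionsClass = sdk.ImagesOptions || sdk.ImageOptions;
  const options = new OptionsClass();
  options.fileInfo = fileInfo;

  return new sdk.ImagesRequest(options);
}

// Calls ParseApi.images() and returns the SDK's response as-is.
async function extractImages(sdk, config, filePath) {
  const parseApi = sdk.ParseApi.fromConfig(config); // assumed factory method
  return parseApi.images(buildImagesRequest(sdk, filePath));
}

// Usage (hypothetical file name):
//   extractImages(sdk, config, "sample.docx").then(console.log);
```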

Download Extracted Images

The above code sample will save the extracted images on the cloud. You can download these images using the code sample given below:
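As a rough sketch of the download step, the helper below fetches a list of remote image paths and writes each one to a local folder. It assumes the SDK exposes `FileApi.fromConfig`, `DownloadFileRequest`, and a `downloadFile` call that resolves to the file bytes; the `downloadFiles` helper and the paths in the usage comment are hypothetical.

```javascript
const fs = require("fs");
const path = require("path");

// Minimal sketch: download files from cloud storage into a local folder.
// `remotePaths` are storage paths of the extracted images.
async function downloadFiles(sdk, config, remotePaths, outputDir, storageName) {
  // Assumption: the SDK exports FileApi.fromConfig and DownloadFileRequest.
  const fileApi = sdk.FileApi.fromConfig(config);
  fs.mkdirSync(outputDir, { recursive: true });

  const saved = [];
  for (const remotePath of remotePaths) {
    const request = new sdk.DownloadFileRequest(remotePath, storageName);
    const data = await fileApi.downloadFile(request); // assumed to be a Buffer
    // Storage paths use forward slashes, so take the POSIX basename.
    const target = path.join(outputDir, path.posix.basename(remotePath));
    fs.writeFileSync(target, data);
    saved.push(target);
  }
  return saved; // local paths of the downloaded files
}

// Usage (hypothetical remote path and storage name):
//   downloadFiles(sdk, config, ["parser/images/image_0.jpeg"], "./images", "MyStorage");
```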

Extract Text from Word Documents using Node.js

You can easily extract all the text from the Word documents by following the steps mentioned below.

  • Create an instance of the ParseApi
  • Create an instance of the FileInfo
  • Set path to the DOCX file
  • Create an instance of the TextOptions
  • Assign FileInfo to the TextOptions
  • Set the start page number
  • Define FormattedTextOptions
  • Create TextRequest
  • Get results by calling the ParseApi.text() method

The following code sample shows how to extract text from a DOCX file using a REST API.
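The steps above can be sketched roughly as follows. This is an illustration under assumptions: it assumes the SDK exports `ParseApi.fromConfig`, `FileInfo`, `TextOptions`, `FormattedTextOptions`, and `TextRequest`, that the options use the property names `startPageNumber`, `countPagesToExtract`, and `formattedTextOptions.mode`, and that `"PlainText"` is an accepted mode value. The helper names are hypothetical.

```javascript
// Minimal sketch: build the text request described in the steps above.
// `sdk` is the object returned by require("groupdocs-parser-cloud").
function buildTextRequest(sdk, filePath, startPageNumber, countPagesToExtract) {
  const fileInfo = new sdk.FileInfo();
  fileInfo.filePath = filePath; // path of the DOCX file in cloud storage

  const options = new sdk.TextOptions();
  options.fileInfo = fileInfo;
  options.startPageNumber = startPageNumber;
  // Assumption: an optional page count limits how many pages are read.
  if (countPagesToExtract !== undefined) {
    options.countPagesToExtract = countPagesToExtract;
  }

  // Assumption: "PlainText" is one of the supported formatting modes.
  options.formattedTextOptions = new sdk.FormattedTextOptions();
  options.formattedTextOptions.mode = "PlainText";

  return new sdk.TextRequest(options);
}

// Calls ParseApi.text() and returns the SDK's response as-is.
async function extractText(sdk, config, filePath, startPageNumber, countPagesToExtract) {
  const parseApi = sdk.ParseApi.fromConfig(config); // assumed factory method
  const request = buildTextRequest(sdk, filePath, startPageNumber, countPagesToExtract);
  return parseApi.text(request);
}

// Usage (hypothetical file name and page range):
//   extractText(sdk, config, "sample.docx", 1, 2).then(console.log);
```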

Try Online

Please try the following free online DOCX parsing tool, which was developed using the above API: https://products.groupdocs.app/parser/docx

Conclusion

In this article, you have learned how to parse Word documents on the cloud and how to extract images and text from DOCX files using a REST API in Node.js. The article also explained how to programmatically upload a DOCX file to the cloud and download the extracted image files from it. You can learn more about the GroupDocs.Parser Cloud API from the documentation. We also provide an API Reference section that lets you visualize and interact with our APIs directly through the browser. If anything is unclear, please feel free to contact us on the forum.
