You may need to extract text or images from Word documents for various purposes. As a Node.js developer, you can parse DOCX files and extract all of their text and images programmatically in the cloud. This article focuses on how to parse Word documents using a REST API in Node.js.
This article covers the following topics:
- Document Parser REST API and Node.js SDK
- Parse Word Documents and Extract Images using REST API in Node.js
- Extract Text from Word Documents using Node.js
Document Parser REST API and Node.js SDK
To parse DOCX documents, I will be using the Node.js SDK of the GroupDocs.Parser Cloud API. It lets you parse data from over 50 document types. It also supports parsing containers such as ZIP archives, OST/PST mail data files, eBooks, markups, and PDF portfolios in your Node.js applications. With the SDK you can extract text, extract images, and parse data by a template. SDKs for .NET, Java, PHP, Ruby, and Python are also available for the same Cloud API.
You can install GroupDocs.Parser Cloud in your Node.js application by running the following command in the console:
npm install groupdocs-parser-cloud
Before following the steps in this article, get your Client ID and Client Secret from the dashboard. Once you have them, add them to your code so that the SDK can authenticate your requests.
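A minimal sketch of that configuration step follows. It assumes that `sdk` is the module returned by `require("groupdocs-parser-cloud")`, that the module exposes a `Configuration` class taking the Client ID and Client Secret, and that the API base URL is `https://api.groupdocs.cloud`. These names are recalled from the SDK's published examples, so check them against the API reference. The environment variable names are hypothetical.

```javascript
// Sketch only: `sdk` is assumed to be require("groupdocs-parser-cloud").
// `Configuration` and `apiBaseUrl` are assumptions taken from the SDK's
// published examples; verify them against the API reference.
function createConfig(sdk, env = process.env) {
  // Hypothetical variable names; keeping credentials out of source code
  // is the point, not these particular names.
  const clientId = env.GROUPDOCS_CLIENT_ID;
  const clientSecret = env.GROUPDOCS_CLIENT_SECRET;
  if (!clientId || !clientSecret) {
    throw new Error("Set GROUPDOCS_CLIENT_ID and GROUPDOCS_CLIENT_SECRET first");
  }

  const config = new sdk.Configuration(clientId, clientSecret);
  config.apiBaseUrl = "https://api.groupdocs.cloud"; // assumed base URL
  return config;
}

// Usage, once the package is installed:
// const config = createConfig(require("groupdocs-parser-cloud"));
```

Passing the SDK module in as a parameter keeps the function easy to test with a stub and avoids hard-coding credentials.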
Parse Word Documents and Extract Images using REST API in Node.js
You can parse your Word documents and extract images programmatically by following these steps:
- Upload the Word file to the Cloud
- Extract Images from Word Documents using Node.js
- Download the extracted images
Upload the Document
First, upload the Word document (DOCX) to the cloud.
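The upload could look roughly like the sketch below. It assumes that `sdk` is `require("groupdocs-parser-cloud")`, that `config` is the SDK configuration object built from your credentials, and that the SDK provides `FileApi.fromConfig`, `UploadFileRequest(path, file, storageName)`, and `uploadFile`. These names follow the SDK's published examples as recalled here and should be verified. The `storageName` value is whatever storage you have set up in your dashboard.

```javascript
const { readFile } = require("fs/promises");

// Sketch only: the FileApi, UploadFileRequest and uploadFile names are
// assumptions based on the SDK's published examples.
async function uploadDocument(sdk, config, localPath, remotePath, storageName) {
  const fileApi = sdk.FileApi.fromConfig(config);

  // Read the local DOCX file into a Buffer.
  const fileBuffer = await readFile(localPath);

  // remotePath is the path the file will have in cloud storage.
  const request = new sdk.UploadFileRequest(remotePath, fileBuffer, storageName);
  return fileApi.uploadFile(request);
}

// Usage (hypothetical paths and storage name):
// await uploadDocument(sdk, config, "./sample.docx", "sample.docx", "my-storage");
```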
As a result, the uploaded DOCX file will be available in the files section of your dashboard on the cloud.
Extract Images from Word Documents using Node.js
You can extract all the images from a Word document by following these steps:
- Create an instance of the ParseApi
- Create an instance of the FileInfo
- Set path to the DOCX file
- Create an instance of the ImagesOptions
- Assign FileInfo to the ImagesOptions
- Create ImagesRequest
- Get results by calling the ParseApi.images() method
Together, these steps extract the images from a DOCX file through the REST API.
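The steps above can be sketched as a single function. It assumes that `sdk` is `require("groupdocs-parser-cloud")`, that `config` is the SDK configuration object, and that the file has already been uploaded. The options class is assumed to be named `ImagesOptions` in the SDK, and the result is assumed to carry an `images` array. Both points are recalled from the SDK's published examples and should be checked against the API reference.

```javascript
// Sketch only: ParseApi.fromConfig, FileInfo, ImagesOptions, ImagesRequest
// and the `images` field of the result are assumptions to verify.
async function extractImages(sdk, config, filePath) {
  // Create an instance of the ParseApi.
  const parseApi = sdk.ParseApi.fromConfig(config);

  // Create FileInfo and set the path to the DOCX file in cloud storage.
  const fileInfo = new sdk.FileInfo();
  fileInfo.filePath = filePath;

  // Create the image options and assign FileInfo to them.
  const options = new sdk.ImagesOptions();
  options.fileInfo = fileInfo;

  // Create the request and call ParseApi.images().
  const request = new sdk.ImagesRequest(options);
  const result = await parseApi.images(request);

  // Assumed result shape: { images: [...] }.
  return result.images || [];
}

// Usage:
// const images = await extractImages(sdk, config, "sample.docx");
// images.forEach((image) => console.log(image));
```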
Download Extracted Images
The images extracted in the previous step are saved in the cloud. You can then download them to your local machine.
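A download could be sketched as follows. It assumes that `sdk` is `require("groupdocs-parser-cloud")`, that `config` is the SDK configuration object, and that the SDK provides `FileApi.fromConfig`, `DownloadFileRequest(path, storageName)`, and a `downloadFile` method that resolves to the file's binary content. These are recalled from the SDK's published examples and are not confirmed here. The remote path of each image comes from the extraction result.

```javascript
const { writeFile } = require("fs/promises");

// Sketch only: FileApi, DownloadFileRequest and downloadFile are
// assumptions based on the SDK's published examples.
async function downloadImage(sdk, config, remotePath, localPath, storageName) {
  const fileApi = sdk.FileApi.fromConfig(config);

  // remotePath is the image's path in cloud storage.
  const request = new sdk.DownloadFileRequest(remotePath, storageName);

  // Assumed to resolve to a Buffer with the file's contents.
  const data = await fileApi.downloadFile(request);

  // Save the image in the local working directory.
  await writeFile(localPath, data);
  return localPath;
}

// Usage (hypothetical paths and storage name):
// await downloadImage(sdk, config, remoteImagePath, "./image_1.jpeg", "my-storage");
```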
Extract Text from Word Documents using Node.js
You can extract all the text from a Word document by following these steps:
- Create an instance of the ParseApi
- Create an instance of the FileInfo
- Set path to the DOCX file
- Create an instance of the TextOptions
- Assign FileInfo to the TextOptions
- Set the start page number
- Define FormattedTextOptions
- Create TextRequest
- Get results by calling the ParseApi.text() method
Together, these steps extract the text from a DOCX file through the REST API.
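The steps above can be sketched as a single function. It assumes that `sdk` is `require("groupdocs-parser-cloud")`, that `config` is the SDK configuration object, and that the file has already been uploaded. The property names `startPageNumber`, `countPagesToExtract`, and `formattedTextOptions`, and the `"PlainText"` mode value, are recalled from the SDK's published examples rather than confirmed here. The page numbering convention is not stated in this article, so check it in the API reference.

```javascript
// Sketch only: ParseApi.fromConfig, FileInfo, TextOptions,
// FormattedTextOptions, TextRequest and the property names used below
// are assumptions to verify against the API reference.
async function extractText(sdk, config, filePath, startPageNumber, countPagesToExtract) {
  // Create an instance of the ParseApi.
  const parseApi = sdk.ParseApi.fromConfig(config);

  // Create FileInfo and set the path to the DOCX file in cloud storage.
  const fileInfo = new sdk.FileInfo();
  fileInfo.filePath = filePath;

  // Create the text options and assign FileInfo to them.
  const options = new sdk.TextOptions();
  options.fileInfo = fileInfo;

  // Set the start page number; the page count is optional in this sketch.
  options.startPageNumber = startPageNumber;
  if (countPagesToExtract !== undefined) {
    options.countPagesToExtract = countPagesToExtract; // assumed property
  }

  // Define the formatted text options.
  const formattedTextOptions = new sdk.FormattedTextOptions();
  formattedTextOptions.mode = "PlainText"; // assumed mode value
  options.formattedTextOptions = formattedTextOptions;

  // Create the request and call ParseApi.text().
  const request = new sdk.TextRequest(options);
  return parseApi.text(request);
}

// Usage:
// const result = await extractText(sdk, config, "sample.docx", 1, 2);
// console.log(result);
```

The function returns the raw result because its exact shape depends on the options used. Inspect it once before relying on specific fields.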
Try Online
You can try the free online DOCX parsing tool, which is built on the above API: https://products.groupdocs.app/parser/docx
Conclusion
In this article, you have learned how to parse Word documents in the cloud and how to extract images and text from DOCX files using a REST API in Node.js. The article also explained how to upload a DOCX file to the cloud and download the extracted image files programmatically. You can learn more about the GroupDocs.Parser Cloud API in the documentation. We also provide an API Reference section that lets you explore and interact with our APIs directly in the browser. If you have any questions, please feel free to contact us on the forum.