Extract Text from PowerPoint in Node.js - Text Extraction API

Extract Text from PowerPoint in Node.js

In some scenarios, text is extracted from documents for further processing, such as text analysis or classification. PowerPoint presentations, like PDF and Word files, are a common source for text extraction. This article therefore demonstrates how to extract text from PowerPoint PPT/PPTX files in Node.js. Using the Text extraction API, you can parse PowerPoint presentations programmatically in the cloud and extract either all of the text or only the text from specific slides.

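The following topics will be discussed in this article:

  • Node.js Library to Extract Text from PowerPoint PPT
  • Extract All Text from PowerPoint PPT/PPTX in Node.js using REST API
  • Extract Text from PowerPoint PPT by Page Number Range in Node.js
  • Online PowerPoint Parsing Tool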

Node.js Library to Extract Text from PowerPoint PPT

For parsing PowerPoint documents, I will be using the Node.js SDK of the GroupDocs.Parser Cloud API. It allows you to parse data from over 50 supported document formats. It also supports parsing containers such as ZIP archives, OST mail data files, e-books, markups, PowerPoint, and PDF portfolios in your Node.js applications. With the SDK, you can extract text and images, and parse data by a template. .NET, Java, PHP, Ruby, and Python SDKs are also available as members of the document parser family for the Cloud API.

You can install GroupDocs.Parser Cloud in your Node.js application by running the following command in the console:

npm install groupdocs-parser-cloud

Please get your Client ID and Client Secret from the dashboard before following the steps in this article. Once you have them, add them to your code as shown below:
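The original snippet is not reproduced here, so the following is only a minimal sketch of how the credentials might be wired in. It assumes the Client ID and Secret are kept in environment variables (the names GROUPDOCS_CLIENT_ID and GROUPDOCS_CLIENT_SECRET are hypothetical) and that the SDK exposes a Configuration class with an apiBaseUrl property, as its published samples suggest. The helper names loadCredentials and createConfiguration are made up for this illustration.

```javascript
// Sketch only: reads the credentials from environment variables.
// GROUPDOCS_CLIENT_ID and GROUPDOCS_CLIENT_SECRET are hypothetical names.
function loadCredentials(env = process.env) {
  const clientId = env.GROUPDOCS_CLIENT_ID;
  const clientSecret = env.GROUPDOCS_CLIENT_SECRET;
  if (!clientId || !clientSecret) {
    throw new Error("Set GROUPDOCS_CLIENT_ID and GROUPDOCS_CLIENT_SECRET first.");
  }
  return { clientId, clientSecret };
}

// `sdk` is the module returned by require("groupdocs-parser-cloud").
// Assumption: it exposes Configuration(clientId, clientSecret) with an
// apiBaseUrl property, as in the vendor's published samples.
function createConfiguration(sdk, credentials) {
  const configuration = new sdk.Configuration(
    credentials.clientId,
    credentials.clientSecret
  );
  configuration.apiBaseUrl = "https://api.groupdocs.cloud";
  return configuration;
}

// Usage (requires the installed package and real credentials):
//   const sdk = require("groupdocs-parser-cloud");
//   const configuration = createConfiguration(sdk, loadCredentials());
```

Passing the SDK module in as a parameter is a design choice of this sketch, not something the SDK requires; it keeps the helpers easy to exercise without network access.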

Extract All Text from PowerPoint PPT/PPTX in Node.js using REST API

You can extract text from PowerPoint Presentations by following the simple steps mentioned below:

  • Upload the PowerPoint file to the Cloud using this Text extraction API
  • Extract Text from PowerPoint presentation using Node.js

Upload the File

Firstly, upload the PowerPoint document to the Cloud using the code example given below:
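The original upload sample is not reproduced here. As a rough sketch of what the upload step might look like, the helper below reads a local file and sends it to cloud storage. It assumes the SDK provides FileApi.fromConfig, an UploadFileRequest(path, file, storageName) constructor, and a fileApi.uploadFile(request) method that returns a promise, as in the vendor's published samples; the function name uploadPresentation and its parameters are hypothetical.

```javascript
const fs = require("fs");

// Sketch only: upload a local presentation to GroupDocs cloud storage.
// `sdk` is require("groupdocs-parser-cloud"); `configuration` is an
// sdk.Configuration built from your Client ID and Secret.
// Assumptions: FileApi.fromConfig, UploadFileRequest(path, file, storageName)
// and uploadFile(request) exist with these shapes.
async function uploadPresentation(sdk, configuration, localPath, remotePath, storageName) {
  const fileApi = sdk.FileApi.fromConfig(configuration);
  // Read the whole file into a Buffer before sending it.
  const fileBuffer = await fs.promises.readFile(localPath);
  const request = new sdk.UploadFileRequest(remotePath, fileBuffer, storageName);
  return fileApi.uploadFile(request);
}

// Usage (requires the installed package, real credentials and a storage name):
//   const sdk = require("groupdocs-parser-cloud");
//   const configuration = new sdk.Configuration(clientId, clientSecret);
//   uploadPresentation(sdk, configuration, "./sample.pptx", "sample.pptx", myStorage)
//     .then(() => console.log("Upload finished"))
//     .catch((error) => console.error("Upload failed: " + error.message));
```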

As a result, the uploaded PowerPoint file will be available in the files section of your dashboard on the cloud.

Extract Text from PowerPoint Presentation using Node.js

You can easily extract all the text from PowerPoint files using the GroupDocs.Parser Cloud SDK for Node.js.

The steps are given below:

  • Firstly, create an instance of the ParseApi.
  • Secondly, create an instance of the FileInfo.
  • Then, set path to the PowerPoint file.
  • Create an instance of the TextOptions.
  • Then, assign FileInfo to the TextOptions.
  • Create an instance of the FormattedTextOptions.
  • Set the FormattedTextOptions mode to PlainText.
  • Next, assign the FormattedTextOptions to the TextOptions.
  • Now, create an instance of the TextRequest with TextOptions.
  • Finally, get results by calling the ParseApi.text() method with the TextRequest.

The following code sample shows how to extract all the text from PowerPoint file using a REST API in Node.js:
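The original sample is not reproduced here; the sketch below follows the steps listed above. The class names (FileInfo, TextOptions, FormattedTextOptions, TextRequest) and the ParseApi.text() method come from the steps. The property names filePath, fileInfo, formattedTextOptions and mode, and the text field on the response, are assumptions based on the SDK's published samples. The helper names buildTextOptions and extractAllText are hypothetical.

```javascript
// Sketch only: build TextOptions for plain-text extraction of a whole file.
// `sdk` is the module returned by require("groupdocs-parser-cloud").
function buildTextOptions(sdk, filePath) {
  const fileInfo = new sdk.FileInfo();
  fileInfo.filePath = filePath; // path of the file in cloud storage

  const options = new sdk.TextOptions();
  options.fileInfo = fileInfo;

  const formattedTextOptions = new sdk.FormattedTextOptions();
  formattedTextOptions.mode = "PlainText";
  options.formattedTextOptions = formattedTextOptions;
  return options;
}

// Call ParseApi.text() and return the extracted text.
// Assumption: the response object carries the full text in `text`.
async function extractAllText(sdk, parseApi, filePath) {
  const request = new sdk.TextRequest(buildTextOptions(sdk, filePath));
  const result = await parseApi.text(request);
  return result.text;
}

// Usage (requires the installed package and real credentials):
//   const sdk = require("groupdocs-parser-cloud");
//   const parseApi = sdk.ParseApi.fromKeys(clientId, clientSecret);
//   extractAllText(sdk, parseApi, "sample.pptx")
//     .then((text) => console.log(text))
//     .catch((error) => console.error(error.message));
```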

Extract Text from PowerPoint PPT by Page Number Range in Node.js

You can extract the text from specific pages (slides) of a PowerPoint presentation programmatically by following the steps given below:

  • Firstly, create an instance of the ParseApi.
  • Next, create an instance of the FileInfo.
  • Then, set the path to the PowerPoint PPTX file.
  • Create an instance of the TextOptions.
  • Then, assign FileInfo to the TextOptions.
  • Set the startPageNumber and countPagesToExtract values, that is, the start page number and the total number of pages to extract.
  • Create an instance of the FormattedTextOptions.
  • Set the FormattedTextOptions mode to PlainText.
  • Next, assign the FormattedTextOptions to the TextOptions.
  • Now, create an instance of the TextRequest with TextOptions.
  • Lastly, get results by calling the ParseApi.text() method with the TextRequest.

The following code sample shows how to extract text from a PowerPoint PPTX file by page numbers in Node.js using the Text extraction API:
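The original sample is not reproduced here; the sketch below follows the steps listed above. The startPageNumber and countPagesToExtract names come from the steps, while the pages array on the response (with pageIndex and text fields) is an assumption based on the SDK's published samples. Whether startPageNumber is zero-based should be checked against the API reference. The helper names and the argument checks are hypothetical additions for this illustration.

```javascript
// Sketch only: build TextOptions limited to a range of pages (slides).
// `sdk` is the module returned by require("groupdocs-parser-cloud").
function buildPageRangeOptions(sdk, filePath, startPageNumber, countPagesToExtract) {
  // Basic sanity checks added for this sketch; they are not part of the SDK.
  if (!Number.isInteger(startPageNumber) || startPageNumber < 0) {
    throw new RangeError("startPageNumber must be a non-negative integer");
  }
  if (!Number.isInteger(countPagesToExtract) || countPagesToExtract < 1) {
    throw new RangeError("countPagesToExtract must be a positive integer");
  }

  const fileInfo = new sdk.FileInfo();
  fileInfo.filePath = filePath; // path of the file in cloud storage

  const options = new sdk.TextOptions();
  options.fileInfo = fileInfo;
  options.startPageNumber = startPageNumber;
  options.countPagesToExtract = countPagesToExtract;

  const formattedTextOptions = new sdk.FormattedTextOptions();
  formattedTextOptions.mode = "PlainText";
  options.formattedTextOptions = formattedTextOptions;
  return options;
}

// Call ParseApi.text() and return one entry per extracted page.
// Assumption: the response lists pages as { pageIndex, text } objects.
async function extractTextByPages(sdk, parseApi, filePath, startPageNumber, countPagesToExtract) {
  const options = buildPageRangeOptions(sdk, filePath, startPageNumber, countPagesToExtract);
  const result = await parseApi.text(new sdk.TextRequest(options));
  return (result.pages || []).map((page) => ({
    pageIndex: page.pageIndex,
    text: page.text,
  }));
}

// Usage (requires the installed package and real credentials):
//   const sdk = require("groupdocs-parser-cloud");
//   const parseApi = sdk.ParseApi.fromKeys(clientId, clientSecret);
//   extractTextByPages(sdk, parseApi, "sample.pptx", 1, 2)
//     .then((pages) => pages.forEach((p) => console.log(p.pageIndex, p.text)));
```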

Online PowerPoint Parsing Tool

You can also try the free online PowerPoint parsing tool, which is built using the above API.

Conclusion

In this article, you have learned how to parse and extract text from PowerPoint PPT/PPTX files in Node.js. You have seen:

  • how to extract text from specific slides in Node.js using the Text extraction API;
  • how to extract text from all slides of a PowerPoint presentation in Node.js;
  • how to programmatically upload a PowerPoint file to the cloud.

You can learn more about the GroupDocs.Parser Cloud API in the documentation. We also provide an API Reference section that lets you visualize and interact with our APIs directly in the browser. If anything is unclear, please feel free to contact us on the forum.

Ask a question

If you have any questions about the online text extractor, please let us know via our forum.

FAQs

How do I extract text from PowerPoint in Node.js?

Follow the steps described in this article to learn how to extract text from PPT/PPTX files in Node.js.

How to extract text from PPT documents online using REST API?

First, create an instance of ParseApi. Next, set the values of the TextOptions. Then call the ParseApi.text() method with a TextRequest to extract the selected text from PPT files online.

How do I install the PPT text extractor library?

You can install the PPT text extractor Node.js library by running npm install groupdocs-parser-cloud, and then use it to extract text from PPT files programmatically.

How do I extract text from PPT offline in Windows?

Please visit this link to download the offline text extractor tool for Windows. It extracts text on Windows with a single click.
