Extract Images from PDF, Spreadsheets, Presentations & Word Documents using Python

If you are a Python developer who needs to extract data from documents, this article shows how to extract images from word processing documents, spreadsheets, presentations, and PDF documents using simple Python examples.

The following topics are covered today:

Image Extraction REST API and Python SDK

Document Parsing Python SDK

This time, we will use the GroupDocs.Parser Cloud API to extract images from different types of documents. The examples use only its Python SDK, although .NET, Java, PHP, Ruby, and Node.js SDKs are also available as members of the same document parsing family.

In addition to image extraction, the API supports text and metadata extraction from many kinds of documents, including word processing documents, spreadsheets, presentations, emails, archives, markup, and PDF documents.

Before you start following the steps and code examples, get your APP KEY and APP SID from the dashboard.
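One way to keep the two credentials out of your source code is to read them from environment variables. The sketch below is only an illustration: the variable names `GROUPDOCS_APP_SID` and `GROUPDOCS_APP_KEY` are hypothetical and are not prescribed by the API.

```python
import os


def load_credentials(env=None):
    """Return (app_sid, app_key) read from environment-style variables.

    The variable names below are hypothetical; pick whatever suits your setup.
    """
    env = os.environ if env is None else env
    app_sid = env.get("GROUPDOCS_APP_SID", "").strip()
    app_key = env.get("GROUPDOCS_APP_KEY", "").strip()

    # Report every missing value at once instead of failing on the first one.
    missing = [
        name
        for name, value in (
            ("GROUPDOCS_APP_SID", app_sid),
            ("GROUPDOCS_APP_KEY", app_key),
        )
        if not value
    ]
    if missing:
        raise RuntimeError("Missing credentials: " + ", ".join(missing))
    return app_sid, app_key
```

Calling `load_credentials()` without arguments reads from the process environment; passing a plain dictionary is convenient for trying the function out.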

Extract Images from PDF Document using Python

PDF Document to Extract Images

As a first example, we will extract the images from a PDF document. All of them can be extracted by following three simple steps:

  • Upload the document to the Cloud.
  • Extract the images from the uploaded document.
  • Download the extracted images.

Upload the Document

First, upload the PDF document to the Cloud, either through the dashboard or programmatically.

As a result, the PDF file will be available in Cloud Storage.

Uploaded PDF file at dashboard.groupdocs.cloud/#/files
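A programmatic upload can be sketched as follows. This is a minimal sketch, assuming the `groupdocs_parser_cloud` package is installed and that `FileApi.from_keys`, `UploadFileRequest`, and `upload_file` behave as in the SDK's published examples; please verify the names against your installed version. The function name `upload_document` and the `sdk` parameter are our own additions, the latter only so the function can be tried with a stand-in module.

```python
def upload_document(app_sid, app_key, local_path, remote_path, sdk=None):
    """Upload the file at local_path to Cloud Storage as remote_path.

    Assumption: the class and method names follow the SDK's published
    examples and may differ between SDK versions.
    """
    if sdk is None:
        # Third-party package, imported lazily: pip install groupdocs-parser-cloud
        import groupdocs_parser_cloud as sdk

    file_api = sdk.FileApi.from_keys(app_sid, app_key)

    # The request takes the storage path first, then the local file path.
    request = sdk.UploadFileRequest(remote_path, local_path)
    return file_api.upload_file(request)
```

A call such as `upload_document(app_sid, app_key, "document.pdf", "document.pdf")` would place the file at the root of the default storage; both file names here are placeholders.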

Extract Images from the Uploaded Document

The difficult part is now done. A few lines of Python code are enough to extract all the images from the uploaded PDF document.
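The extraction call can be sketched roughly as below. It assumes the `groupdocs_parser_cloud` package and the names `ParseApi`, `ImagesOptions`, `FileInfo`, and `ImagesRequest` from the SDK's published examples. The shape of the result (an `images` list whose items carry a `path`) is likewise an assumption to check against your SDK version. The wrapper name `extract_images` and its `sdk` parameter are hypothetical helpers for illustration.

```python
def extract_images(app_sid, app_key, remote_path, sdk=None):
    """Extract all images from a document that is already in Cloud Storage.

    Returns the storage paths of the extracted images.
    """
    if sdk is None:
        # Third-party package, imported lazily: pip install groupdocs-parser-cloud
        import groupdocs_parser_cloud as sdk

    parse_api = sdk.ParseApi.from_keys(app_sid, app_key)

    # Point the request at the uploaded document.
    options = sdk.ImagesOptions()
    options.file_info = sdk.FileInfo()
    options.file_info.file_path = remote_path

    result = parse_api.images(sdk.ImagesRequest(options))

    # Assumption: each extracted image exposes its storage location as `path`.
    return [image.path for image in result.images]
```

The returned list can then be printed, or handed to whatever code downloads the images.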

Download the Extracted Images

Once the images are extracted, you can download them from the Cloud either from the dashboard or programmatically. The images shown here were extracted from the PDF document shown above.

Images extracted from the PDF document
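Downloading programmatically might look like the sketch below. It assumes that `FileApi.download_file` takes a `DownloadFileRequest` and returns the path of a local temporary file, as in the SDK's published examples; please treat that return type as an assumption. The function name `download_images`, the `target_dir` argument, and the `sdk` parameter are hypothetical.

```python
import os
import posixpath
import shutil


def download_images(app_sid, app_key, image_paths, target_dir, sdk=None):
    """Download each storage path in image_paths into target_dir.

    Returns the list of local file paths that were written.
    """
    if sdk is None:
        # Third-party package, imported lazily: pip install groupdocs-parser-cloud
        import groupdocs_parser_cloud as sdk

    file_api = sdk.FileApi.from_keys(app_sid, app_key)
    os.makedirs(target_dir, exist_ok=True)

    saved = []
    for remote_path in image_paths:
        # Assumption: download_file returns the path of a temporary local file.
        temp_path = file_api.download_file(sdk.DownloadFileRequest(remote_path))

        # Storage paths use forward slashes, so take the name with posixpath.
        local_path = os.path.join(target_dir, posixpath.basename(remote_path))
        shutil.move(temp_path, local_path)
        saved.append(local_path)
    return saved
```

Images that share a file name would overwrite each other in `target_dir`; the sketch leaves that case out for brevity.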

Image Extraction from Excel, PPT, or Word Docs using Python

Similarly, you can extract all the images from Word documents, spreadsheets, and presentations with exactly the same Python code used for the PDF document. You only have to change the file path to the correct document name, including its extension.
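Because only the path changes, several documents can be processed in one loop. The helper below is a hypothetical sketch: `extract` stands for whatever function performs the extraction call for a single storage path, and the file names in the usage note are placeholders.

```python
def extract_from_documents(extract, remote_paths):
    """Run the same extraction callable over several documents.

    `extract` takes one storage path and returns the extracted image paths.
    Failures are recorded per document instead of aborting the whole run.
    """
    results, errors = {}, {}
    for path in remote_paths:
        try:
            results[path] = extract(path)
        except Exception as exc:  # keep going and report the failure later
            errors[path] = str(exc)
    return results, errors
```

For example, `extract_from_documents(my_extract, ["report.docx", "sales.xlsx", "slides.pptx"])` would return one dictionary of image paths per document and one dictionary of error messages for any document that failed.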

Conclusion

We are now familiar with how to programmatically extract images from Word, Excel, PowerPoint, PDF, and other documents using Python. The code is the same in every case; we just have to change the source document path and type.

To learn more about the document parsing API and its other features, visit the documentation, whose articles also contain examples. The best way to try the highlighted features is to run the open-source examples available on GitHub. If anything is unclear, the GroupDocs Support Team will be glad to help. Thanks.