If you are a Python developer who needs to extract data from documents, this article shows how to extract images from word processing documents, spreadsheets, presentations, and PDF documents using simple Python examples.
The following topics are covered:
- Image Extraction REST API and Python SDK
- Extract Images from PDF Document using Python
- Image Extraction from Excel, PPT, or Word Docs using Python
Image Extraction REST API and Python SDK
We will use the GroupDocs.Parser Cloud API to extract images from different types of documents. This article uses only its Python SDK, but the document parsing family also provides .NET, Java, PHP, Ruby, and Node.js SDKs for the same Cloud API.
In addition to images, the API extracts text and metadata from many kinds of documents, such as word processing documents, spreadsheets, presentations, emails, archives, markup, and PDF documents.
Before you start, get your APP KEY and APP SID from the dashboard. You will need them for the steps and code examples.
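One way to keep the APP SID and APP KEY out of your source code is to read them from environment variables. The sketch below is only an illustration: the variable names GROUPDOCS_APP_SID and GROUPDOCS_APP_KEY are hypothetical and not defined by the API.

```python
import os


def load_credentials(env=os.environ):
    """Return (app_sid, app_key) read from environment variables.

    GROUPDOCS_APP_SID and GROUPDOCS_APP_KEY are hypothetical names chosen
    for this sketch; use whatever names fit your deployment.
    """
    try:
        return env["GROUPDOCS_APP_SID"], env["GROUPDOCS_APP_KEY"]
    except KeyError as missing:
        # Report which variable is missing without printing any secret value.
        raise RuntimeError(
            f"Set {missing.args[0]} before running the examples"
        ) from None
```

Calling load_credentials() with no argument reads the real environment. Passing a plain dictionary instead makes the function easy to test.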
Extract Images from PDF Document using Python
As a first example, let's extract the images from a PDF document. All the images can be extracted in three simple steps:
- Upload the document to the Cloud.
- Extract the images from the uploaded document.
- Download the extracted images.
Upload the Document
First, upload the PDF document to the Cloud using any of the following methods:
- Using the dashboard.
- Using Upload File API from the browser.
- Programmatically as mentioned in the documentation.
As a result, the PDF file will be available in Cloud Storage.
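The programmatic upload could look roughly like the sketch below. It assumes the Python SDK is installed as the groupdocs-parser-cloud package and exposes Configuration, FileApi.from_config, and UploadFileRequest(remote_path, local_path), as recalled from the SDK's published examples. Please verify these names against the documentation. The file paths in the usage note are hypothetical.

```python
def make_file_api(app_sid, app_key):
    """Return (sdk_module, file_api). Requires the SDK to be installed."""
    # Assumption: installed with `pip install groupdocs-parser-cloud`.
    import groupdocs_parser_cloud as sdk

    configuration = sdk.Configuration(app_sid, app_key)
    return sdk, sdk.FileApi.from_config(configuration)


def upload_document(file_api, sdk, local_path, remote_path):
    """Upload local_path to Cloud Storage under remote_path."""
    # Assumption: the request takes the storage path first, then the local file.
    request = sdk.UploadFileRequest(remote_path, local_path)
    return file_api.upload_file(request)
```

A call might look like upload_document(file_api, sdk, "sample.pdf", "docs/sample.pdf"), where both paths are placeholders. Passing the SDK module and client in as arguments keeps the helper easy to exercise with stand-in objects.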
Extract Images from the Uploaded Document
With the document uploaded, the difficult part is done. A few lines of Python are enough to extract all the images from the uploaded PDF document.
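A minimal sketch of the extraction step follows. It assumes the SDK provides ParseApi, ImagesOptions, FileInfo, and ImagesRequest, and that each entry in the response's images list carries a path attribute. These names are recalled from the SDK's published examples and should be checked against the documentation.

```python
def extract_images(parse_api, sdk, remote_path):
    """Ask the API to extract images and return their Cloud Storage paths."""
    options = sdk.ImagesOptions()
    options.file_info = sdk.FileInfo()
    # Path of the document as it was uploaded to Cloud Storage.
    options.file_info.file_path = remote_path

    result = parse_api.images(sdk.ImagesRequest(options))
    # Assumption: result.images is a list of objects with a `path` attribute.
    return [image.path for image in result.images]


# Hypothetical usage, assuming the SDK is installed and credentials are set:
#   import groupdocs_parser_cloud as sdk
#   configuration = sdk.Configuration(app_sid, app_key)
#   parse_api = sdk.ParseApi.from_config(configuration)
#   paths = extract_images(parse_api, sdk, "docs/sample.pdf")
```

The returned list holds the storage paths of the extracted images, which is what the download step needs.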
Download the Extracted Images
Once the images are extracted, you can download them from the Cloud, either from the dashboard or programmatically. The images shown here were extracted from the PDF document shown above.
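Downloading programmatically could be sketched as follows. Here file_api is assumed to be a FileApi client built with FileApi.from_config(configuration). The sketch also assumes that DownloadFileRequest takes the storage path as its first argument and that download_file returns the path of a temporary local file. Both details are recalled from the SDK's published examples and should be verified against the documentation.

```python
import os
import shutil


def download_images(file_api, sdk, image_paths, target_dir):
    """Download each extracted image into target_dir and return the local paths."""
    saved = []
    for remote_path in image_paths:
        # Assumption: download_file returns the path of a temporary local file.
        temp_file = file_api.download_file(sdk.DownloadFileRequest(remote_path))

        # Mirror the storage folders locally so equally named images
        # from different folders do not overwrite each other.
        parts = [part for part in remote_path.split("/") if part]
        destination = os.path.join(target_dir, *parts)
        os.makedirs(os.path.dirname(destination), exist_ok=True)

        shutil.move(temp_file, destination)
        saved.append(destination)
    return saved
```

Keeping the storage folder structure is a design choice of this sketch. Flattening everything into one folder would also work if the image names are known to be unique.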
Image Extraction from Excel, PPT, or Word Docs using Python
Similarly, you can extract all the images from Word documents, spreadsheets, and presentations with exactly the same Python code used for the PDF document. You only have to change the file path to the correct document name, including its extension.
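Because only the file path changes, the same extraction routine can be run over several documents in a loop. The sketch below is an illustration only: extract stands for whatever function you use to extract images from one uploaded document, and the file names in the usage note are hypothetical.

```python
def extract_from_documents(extract, document_paths):
    """Run the same extraction function on every document path.

    `extract` is any callable that takes a storage path and returns the
    list of extracted image paths for that document.
    """
    return {path: extract(path) for path in document_paths}
```

For example, extract_from_documents(extract, ["docs/report.docx", "docs/budget.xlsx", "docs/slides.pptx"]) would return a dictionary that maps each document to its list of extracted images.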
We have now seen how to programmatically extract images from Word, Excel, PowerPoint, PDF, and other documents using Python. The code stays the same; only the source document path and type change.
To learn more about the document parsing API and its other features, visit the documentation, which includes articles with examples. The best way to try the highlighted features is to run the open-source examples available on GitHub. If you have any questions, the GroupDocs Support Team will be glad to help. Thanks.