Extract Images from PDF Documents

You may need to extract images from your PDF or Word documents so that you can reuse them elsewhere. You can extract images from PDF documents programmatically on the cloud. This article explains how to extract images from PDF documents using a REST API in Python.

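This article covers the following topics:

  • Document Parser REST API and Python SDK
  • Extract Images from PDF using a REST API in Python
  • Extract All Images from PDF Document using Python
  • Save Images by Page Numbers from PDF Documents using Python
  • Get Images From Document Attached with PDF using Python
  • Try Online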

Document Parser REST API and Python SDK

To extract images from a PDF document, I will use the Python SDK of the GroupDocs.Parser Cloud API. The API parses data from all popular document types: with the SDK you can extract text, extract images, and parse data by a template. SDKs for .NET, Java, PHP, Ruby, and Node.js are also available as members of the same document parser family.

You can install GroupDocs.Parser Cloud in your Python project with pip (the package installer for Python) by running the following command in the console:

pip install groupdocs_parser_cloud

Before you follow the steps and code examples, get your Client ID and Client Secret from the dashboard. Once you have them, add them to your code as shown below:
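
The original code sample for this step is not reproduced here. As a minimal sketch, assuming the SDK exposes a `Configuration` class that accepts the Client ID and Client Secret, the credentials could be wired in as follows. The environment variable names and both helper functions are hypothetical and chosen only for illustration.

```python
import os


def load_credentials(env=os.environ):
    """Read the Client ID and Client Secret from environment variables.

    The variable names are hypothetical; use whatever your project prefers.
    """
    client_id = env.get("GROUPDOCS_CLIENT_ID", "")
    client_secret = env.get("GROUPDOCS_CLIENT_SECRET", "")
    if not client_id or not client_secret:
        raise ValueError("Set GROUPDOCS_CLIENT_ID and GROUPDOCS_CLIENT_SECRET first")
    return client_id, client_secret


def make_configuration(client_id, client_secret):
    """Build the configuration object that the API classes are created from."""
    # Imported lazily so load_credentials() works before the SDK is installed.
    import groupdocs_parser_cloud

    # Assumption: Configuration takes the two credentials positionally.
    configuration = groupdocs_parser_cloud.Configuration(client_id, client_secret)
    configuration.api_base_url = "https://api.groupdocs.cloud"
    return configuration
```

Reading the credentials from the environment keeps them out of source control; hard-coding them as string literals works equally well for a quick experiment.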

Extract Images from PDF using a REST API in Python

You can extract images from PDF documents by following the simple steps mentioned below:

Upload the Document

First of all, upload the PDF document to the Cloud using the code example given below:
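
The original upload sample is not reproduced here. The sketch below assumes the SDK provides `FileApi` and `UploadFileRequest`, and that `configuration` is a configuration object built from your Client ID and Client Secret. The function name, its parameters, and the empty default storage name are illustrative choices.

```python
def upload_pdf(configuration, local_path, remote_path="sample.pdf", storage_name=""):
    """Upload a local PDF to cloud storage and return the SDK response."""
    # Imported lazily so this module can be loaded without the SDK present.
    import groupdocs_parser_cloud

    file_api = groupdocs_parser_cloud.FileApi.from_config(configuration)
    # Assumption: the request takes (path in storage, local file path, storage name).
    request = groupdocs_parser_cloud.UploadFileRequest(
        remote_path, local_path, storage_name
    )
    return file_api.upload_file(request)
```

A call such as `upload_pdf(configuration, "C:\\Files\\sample.pdf")` would store the file as sample.pdf in the default storage.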

As a result, the uploaded PDF file (sample.pdf) will be available in the files section of your dashboard on the cloud.

Extract All Images from PDF Document using Python

You can easily extract all the images from the PDF file programmatically by following the steps mentioned below.

  • Create an instance of ParseApi
  • Define ImagesOptions
  • Set path to the PDF file
  • Create ImagesRequest
  • Get results by calling the ParseApi.images() method

The following code sample shows how to extract all the images from a PDF document using a REST API.
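
The original sample is not reproduced here. The steps listed above can be sketched as follows, assuming the SDK class names `ParseApi`, `ImagesOptions`, `FileInfo`, and `ImagesRequest`, and assuming the result exposes an `images` list whose items carry a `path` attribute. The function name and its default argument are illustrative.

```python
def extract_all_images(configuration, file_path="sample.pdf"):
    """Extract every image from a PDF in cloud storage; return their cloud paths."""
    # Imported lazily so this module can be loaded without the SDK present.
    import groupdocs_parser_cloud

    # Create an instance of ParseApi.
    parse_api = groupdocs_parser_cloud.ParseApi.from_config(configuration)

    # Define the options and set the path to the PDF file.
    options = groupdocs_parser_cloud.ImagesOptions()
    options.file_info = groupdocs_parser_cloud.FileInfo()
    options.file_info.file_path = file_path

    # Create the request and call ParseApi.images().
    request = groupdocs_parser_cloud.ImagesRequest(options)
    result = parse_api.images(request)

    # Assumption: each item in result.images has a `path` in cloud storage.
    return [image.path for image in (result.images or [])]
```

The returned paths point to files in cloud storage, so they can be passed on to a download step.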

Extract all images from PDF document.

Download Extracted Images

The above code sample will save the extracted images on the cloud. You can download these images using the code sample given below:
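
The original download sample is not reproduced here. The sketch below assumes the SDK provides `FileApi` and `DownloadFileRequest`, and that `download_file` returns the path of a temporary local file; please verify that return type against the API reference. The `local_name` helper and the example cloud path in its docstring are hypothetical.

```python
import os
import shutil


def local_name(remote_path):
    """Flatten a cloud path such as 'x/page_0/image_0.jpeg' to 'page_0_image_0.jpeg'."""
    parts = [part for part in remote_path.split("/") if part]
    # Keep the parent folder in the name so images from different pages do not clash.
    return "_".join(parts[-2:])


def download_image(configuration, remote_path, local_dir=".", storage_name=""):
    """Download one extracted image from cloud storage into local_dir."""
    # Imported lazily so local_name() can be used without the SDK present.
    import groupdocs_parser_cloud

    file_api = groupdocs_parser_cloud.FileApi.from_config(configuration)
    request = groupdocs_parser_cloud.DownloadFileRequest(remote_path, storage_name)
    # Assumption: the SDK returns the path of a downloaded temporary file.
    temp_file = file_api.download_file(request)

    os.makedirs(local_dir, exist_ok=True)
    target = os.path.join(local_dir, local_name(remote_path))
    shutil.move(temp_file, target)
    return target
```

Calling `download_image` once for each path reported by the extraction step copies all of the extracted images to the local folder.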

Save Images by Page Numbers from PDF Documents using Python

You can easily extract the images from specific pages of a PDF file programmatically by following the steps mentioned below.

  • Create an instance of ParseApi
  • Define ImagesOptions
  • Provide the path to the PDF file
  • Set the start page number
  • Set the count of pages to extract
  • Create ImagesRequest
  • Get results by calling the ParseApi.images() method

The following code sample shows how to extract images from a range of pages in a PDF document using a REST API. Please follow the steps mentioned earlier to download the extracted images.
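
The original sample is not reproduced here. As a sketch of the steps listed above, the function below assumes the SDK class names `ParseApi`, `ImagesOptions`, `FileInfo`, and `ImagesRequest`, and the option attributes `start_page_number` and `count_pages_to_extract`. Whether page numbering starts at 0 or 1 is not stated in this article, so check the API reference before relying on a particular value. The function name and the validation are illustrative.

```python
def extract_images_by_pages(configuration, file_path, start_page_number,
                            count_pages_to_extract):
    """Extract images from a range of pages of a PDF stored on the cloud."""
    if start_page_number < 0 or count_pages_to_extract < 1:
        raise ValueError("start page must be >= 0 and page count must be >= 1")

    # Imported lazily so the argument check runs even without the SDK present.
    import groupdocs_parser_cloud

    parse_api = groupdocs_parser_cloud.ParseApi.from_config(configuration)

    options = groupdocs_parser_cloud.ImagesOptions()
    options.file_info = groupdocs_parser_cloud.FileInfo()
    options.file_info.file_path = file_path
    # Assumed attribute names for the page range.
    options.start_page_number = start_page_number
    options.count_pages_to_extract = count_pages_to_extract

    request = groupdocs_parser_cloud.ImagesRequest(options)
    return parse_api.images(request)
```

For example, `extract_images_by_pages(configuration, "sample.pdf", 1, 2)` requests images from two consecutive pages starting at page number 1.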

Extract images by page number range from PDF document.

Get Images From Document Attached with PDF using Python

You can programmatically extract images from a document that sits inside a container, such as a document attached to a PDF file, by following the steps mentioned below.

  • Create an instance of ParseApi
  • Define ImagesOptions
  • Set path to the PDF file
  • Define ContainerItemInfo
  • Provide the relative path of the inside document
  • Set the start page number
  • Set the count of pages to extract
  • Create ImagesRequest
  • Get results by calling the ParseApi.images() method

The following code sample shows how to extract images from a document attached to a PDF document using a REST API. Please follow the steps mentioned earlier to download the extracted images.
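
The original sample is not reproduced here. The sketch below follows the steps listed above and assumes the SDK class names `ParseApi`, `ImagesOptions`, `FileInfo`, `ContainerItemInfo`, and `ImagesRequest`, along with the attribute names `container_item_info`, `relative_path`, `start_page_number`, and `count_pages_to_extract`. The function name is illustrative, and making the page range optional is a design choice of this sketch rather than something the API requires.

```python
def extract_images_from_attachment(configuration, file_path, relative_path,
                                   start_page_number=None,
                                   count_pages_to_extract=None):
    """Extract images from a document attached to a PDF stored on the cloud."""
    # Imported lazily so this module can be loaded without the SDK present.
    import groupdocs_parser_cloud

    parse_api = groupdocs_parser_cloud.ParseApi.from_config(configuration)

    options = groupdocs_parser_cloud.ImagesOptions()
    options.file_info = groupdocs_parser_cloud.FileInfo()
    options.file_info.file_path = file_path

    # Point the parser at the attached document inside the container PDF.
    options.container_item_info = groupdocs_parser_cloud.ContainerItemInfo()
    options.container_item_info.relative_path = relative_path

    # Restrict the page range only when the caller asked for one.
    if start_page_number is not None:
        options.start_page_number = start_page_number
    if count_pages_to_extract is not None:
        options.count_pages_to_extract = count_pages_to_extract

    request = groupdocs_parser_cloud.ImagesRequest(options)
    return parse_api.images(request)
```

Here `relative_path` is the name of the attached document as it appears inside the PDF container, not a path in cloud storage.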

Extract images from document attached in PDF document

Try Online

Please try the following free online PDF Parsing tool, which is developed using the above API. https://products.groupdocs.app/parser/pdf

Conclusion

In this article, you have learned how to extract images from PDF documents on the cloud. This article also explained how to programmatically upload a PDF file to the cloud and how to download the extracted images using the SDK. You can learn more about GroupDocs.Parser Cloud API using the documentation. We also provide an API Reference section that lets you visualize and interact with our APIs directly through the browser. If you have any questions, please feel free to contact us on the forum.
