Why Extract Images from HTML? (Benefits & Use Cases)

HTML files often include several types of images: standard <img> tag images, Base64 inline images (data:image/...), images defined in CSS (e.g., background-image), SVG icons and graphics, externally linked images, and embedded image resources. Extracting images from HTML documents is useful for:

  • Content migration
  • Extracting media from HTML emails
  • Web scraping and analysis
  • Preparing training datasets for machine learning
  • Converting HTML into PDF/Word while preserving media
  • Archiving webpages with original assets
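
Of the image types listed above, Base64 inline images are the least obvious, because the image bytes sit directly in the HTML as a data URI. As a rough illustration that is independent of the GroupDocs API, the following C# sketch splits such a URI into its media type and decoded bytes. The helper name DecodeDataUri and the sample string are made up for this example, and the sample payload is only the six-byte GIF signature, not a complete image.

```csharp
using System;

// Splits a data URI such as "data:image/png;base64,...." into media type and raw bytes.
// Only the Base64 form is handled; other data URI encodings are rejected.
static (string MediaType, byte[] Bytes) DecodeDataUri(string dataUri)
{
    const string scheme = "data:";
    if (!dataUri.StartsWith(scheme, StringComparison.OrdinalIgnoreCase))
        throw new FormatException("Not a data URI.");

    int comma = dataUri.IndexOf(',');
    if (comma < 0)
        throw new FormatException("Data URI has no payload separator.");

    // The header part looks like "image/png;base64".
    string header = dataUri.Substring(scheme.Length, comma - scheme.Length);
    const string base64Marker = ";base64";
    if (!header.EndsWith(base64Marker, StringComparison.OrdinalIgnoreCase))
        throw new FormatException("Only Base64-encoded data URIs are supported.");

    string mediaType = header.Substring(0, header.Length - base64Marker.Length);
    byte[] bytes = Convert.FromBase64String(dataUri.Substring(comma + 1));
    return (mediaType, bytes);
}

// Hypothetical sample: "R0lGODlh" is the Base64 form of the ASCII text "GIF89a".
var decoded = DecodeDataUri("data:image/gif;base64,R0lGODlh");
Console.WriteLine($"{decoded.MediaType}, {decoded.Bytes.Length} bytes"); // prints "image/gif, 6 bytes"
```

With a real payload, the decoded bytes could be written to disk with File.WriteAllBytes, using a file extension derived from the media type.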

HTML Processing API

GroupDocs.Parser Cloud SDK for .NET is a REST-based API that can process all popular file formats, including HTML. You can use it to:

  • Parse HTML documents
  • Extract embedded and inline images
  • Extract Base64-encoded images
  • Detect external image references
  • Retrieve metadata (size, type, path)
  • Download extracted images locally
  • Automate HTML parsing workflows

Supported Image Formats

The API supports JPG, PNG, BMP, GIF, TIFF, and other common image types.

Prerequisites

  • A GroupDocs Cloud account (Client ID & Client Secret).
  • .NET 6.0+ installed.
  • Visual Studio or compatible IDE.
  • NuGet package: GroupDocs.Parser-Cloud

Install via NuGet

dotnet add package GroupDocs.Parser-Cloud --version 25.7.0

Extract Images from HTML using C#

The following C# example demonstrates HTML image extraction using the Cloud API, one step at a time.

Step 1 — Initialize the API:

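using GroupDocs.Parser.Cloud.Sdk.Api;
using GroupDocs.Parser.Cloud.Sdk.Client;
using GroupDocs.Parser.Cloud.Sdk.Model.Requests;
var config = new Configuration("YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET");
var parserApi = new ParserApi(config);
var fileApi = new FileApi(config);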

Step 2 — Provide HTML File Information for Parsing:

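// FileInfo is the SDK's model type, not System.IO.FileInfo. It is fully qualified
// here because System.IO is also in scope for the download step.
var fileInfo = new GroupDocs.Parser.Cloud.Sdk.Model.FileInfo { FilePath = "input.html" };
var options = new ImagesOptions { FileInfo = fileInfo };
var request = new ImagesRequest(options);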

Step 3 — Extract Images from HTML (Embedded, Inline & Base64):

var response = parserApi.Images(request);

foreach (var image in response.Images)
{
    Console.WriteLine($"Source: {image.Path}, Type: {image.MediaType}, Size: {image.Size}");
}

Step 4 — Download Extracted Images (macOS & Windows Compatible):

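// Build the output folder from the user's profile directory so the path resolves on both macOS and Windows.
var outputDirectory = Path.Combine(
    Environment.GetFolderPath(Environment.SpecialFolder.UserProfile), "Downloads", "html-images");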
Directory.CreateDirectory(outputDirectory);

foreach (var img in response.Images)
{
    var cloudImagePath = img.Path.Replace("\\", "/");

    var downloadRequest = new DownloadFileRequest(path: cloudImagePath);
    using (var stream = fileApi.DownloadFile(downloadRequest))
    {
        var localPath = Path.Combine(outputDirectory, Path.GetFileName(cloudImagePath));

        using (var fileStream = File.Create(localPath))
        {
            stream.CopyTo(fileStream);
        }

        Console.WriteLine($"Downloaded: {localPath}");
    }
}

Download HTML Images using cURL

Besides the C# code snippet, you can also use cURL commands to extract images from HTML files.

Step 1 — Generate Access Token:
The first step in this approach is to generate a JWT access token based on client credentials.

curl -v -X POST "https://api.groupdocs.cloud/connect/token" \
-d "grant_type=client_credentials&client_id=YOUR_CLIENT_ID&client_secret=YOUR_CLIENT_SECRET" \
-H "Content-Type: application/x-www-form-urlencoded"
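
The token endpoint responds with a JSON document, and the token value has to be copied into the authorization header of the next request. As a minimal sketch, assuming the response follows the usual OAuth 2.0 shape with an access_token property, the C# snippet below reads the token with System.Text.Json. The ReadAccessToken helper and the sample response are illustrative only and are not part of the SDK.

```csharp
using System;
using System.Text.Json;

// Reads the "access_token" property from a token response body.
static string ReadAccessToken(string responseJson)
{
    using JsonDocument doc = JsonDocument.Parse(responseJson);
    if (doc.RootElement.TryGetProperty("access_token", out JsonElement token)
        && token.ValueKind == JsonValueKind.String)
    {
        return token.GetString() ?? string.Empty;
    }
    throw new InvalidOperationException("Response does not contain an access_token.");
}

// Made-up sample response; a real token is a much longer JWT string.
string sampleResponse =
    "{\"access_token\":\"SAMPLE_TOKEN\",\"expires_in\":3600,\"token_type\":\"Bearer\"}";
string accessToken = ReadAccessToken(sampleResponse);

// This is the header value that replaces {ACCESS_TOKEN} in later requests.
Console.WriteLine($"authorization: Bearer {accessToken}"); // prints "authorization: Bearer SAMPLE_TOKEN"
```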

Step 2 — Extract Images:

curl -v -X POST "https://api.groupdocs.cloud/v1.0/parser/images" \
-H  "accept: application/json" \
-H  "authorization: Bearer {ACCESS_TOKEN}" \
-H  "Content-Type: application/json" \
-d '{ "FileInfo": { "FilePath": "inbox/input.html", "StorageName": "internal" }, "OutputPath": "extracted-images" }'
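
Hand-writing JSON inside a shell string is easy to get wrong, because the quotes of the JSON and the quotes of the shell interfere with each other. If the request is assembled from C# instead, one option is to let System.Text.Json produce the body. The sketch below only builds and prints the JSON text; the property names and values are taken from the cURL command, while the variable names are arbitrary.

```csharp
using System;
using System.Text.Json;

// Property names and values mirror the JSON body of the cURL command.
var requestBody = new
{
    FileInfo = new { FilePath = "inbox/input.html", StorageName = "internal" },
    OutputPath = "extracted-images"
};

// The serializer takes care of quoting and escaping.
string requestJson = JsonSerializer.Serialize(requestBody);
Console.WriteLine(requestJson);
```

The resulting string can be saved to a file and passed to cURL with -d @body.json, which avoids shell quoting altogether.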

Try Free Online HTML Image Extractor

We offer a free Online HTML Image Extractor built on top of the GroupDocs.Parser Cloud API. No software installation or download is required, so you can evaluate the capabilities of the REST API directly in your web browser.

Conclusion

In this article, we have learned how to extract images from HTML using the .NET REST API. With the help of this API, you can:

  • Extract embedded and inline images
  • Parse Base64-encoded images
  • Extract CSS background images
  • Retrieve metadata for linked images
  • Download all images programmatically

This makes it a practical choice for automating HTML media extraction in C# applications.

Frequently Asked Questions (FAQs)

1. Can I extract Base64 embedded images from HTML?
Yes, the API extracts Base64-encoded and inline HTML images automatically.

2. Does the API extract externally linked images?
The API extracts metadata for linked images; downloading them is optional.

3. Can I extract images referenced in CSS?
Yes, images referenced through inline or embedded CSS are supported.

4. What image formats are supported?
JPG, PNG, BMP, GIF, TIFF, and other common image types.

5. Is there a free trial?
Yes. You can create a free account and get 150 free API calls per month.