Extract Text from a PDF Document with Python using GroupDocs.Conversion Cloud

PDF (Portable Document Format) is one of the most important and widely used file formats for presenting and exchanging documents. As a Python developer, you will often need to extract text from a PDF document and export it to a different format, for example for text analytics. In this post, we will show you how to extract text from a PDF document accurately using GroupDocs.Conversion Cloud SDK for Python.

GroupDocs.Conversion Cloud is a platform-independent REST API solution for document and image conversion that does not depend on any third-party application. It converts 50+ types of documents from one format to another. It offers SDKs for all popular programming languages, including Python, so developers can use the API directly in their applications without worrying about the underlying REST API calls.

Let us get started with the code:

Install GroupDocs.Conversion Cloud Package

First, install the groupdocs-conversion-cloud package from PyPI with the following command.

>pip install groupdocs-conversion-cloud

Python PDF Text Extraction Example

We will follow these steps to extract text from a PDF Document:

  • Sign up for free with groupdocs.cloud to get your AppSID and AppKey.
  • Create a Python module and paste the text extraction code into it. We have used the default options to extract the text of the PDF document. You can also extract the text of specific pages using the Convert Options of the text format.
  • Run the code in your favorite IDE to get the extracted text, and that’s it. Task accomplished!
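The steps above can be sketched as follows. This is a minimal sketch under stated assumptions, not the original sample: it only builds the settings for a PDF-to-text conversion with the standard library and sends nothing to the API. The keys `FilePath`, `Format`, and `OutputPath`, the helper `build_text_extraction_settings`, and the sample paths are all assumptions for illustration; check the SDK documentation for the exact names.

```python
import json
from pathlib import PurePosixPath


def build_text_extraction_settings(pdf_path, output_dir="converted"):
    """Build a hypothetical settings payload for a PDF-to-TXT conversion.

    The key names are assumed for illustration and are not confirmed
    against the GroupDocs.Conversion Cloud API.
    """
    source = PurePosixPath(pdf_path)
    if source.suffix.lower() != ".pdf":
        raise ValueError(f"expected a .pdf file, got {pdf_path!r}")
    return {
        "FilePath": str(source),  # document assumed to be in cloud storage
        "Format": "txt",          # target format: plain text
        "OutputPath": str(PurePosixPath(output_dir) / (source.stem + ".txt")),
    }


settings = build_text_extraction_settings("docs/sample.pdf")
print(json.dumps(settings, indent=2))
```

Keeping the settings in a small helper like this makes it easy to validate the input path before any request is made.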

Feel free to drop us a comment at the support forum sharing your thoughts about GroupDocs.Conversion Cloud API. Or let us know if you have any suggestions or if you need any particular features which you expect our REST API to have.

Posted in GroupDocs.Comparison Cloud Product Family

A Reliable RESTful API Solution to Optimize PDF Document

GroupDocs.Conversion Cloud is a document and image conversion solution. It empowers developers to add a document conversion feature to their applications on any platform, with complete control, using standard REST API calls. In this post we will discuss how to optimize a PDF document. You can visit GroupDocs.Conversion Cloud for a complete list of features.

A PDF document may sometimes contain additional data that it does not need. Reducing the size of a PDF file helps you optimize network transfer and storage. This is especially handy for publishing on web pages, sharing on social networks, sending by e-mail, or archiving in storage. Let me show you how easily you can use GroupDocs.Conversion Cloud to optimize a PDF document for the web or to reduce its file size. I will be using cURL in the following examples. You can also use the SDK of your favorite programming language, without worrying about the underlying REST API calls.

Optimize PDF Document for Web

Optimization, or linearization, for the web refers to the process of making a PDF file suitable for online browsing in a web browser. A linearized PDF file loads faster over the Internet because it contains information that allows a byte-streaming server to download the PDF file one page at a time. If byte-streaming is disabled on the server or if the PDF file is not linearized, the entire PDF file must be downloaded before it can be viewed. Check the cURL API command to optimize a PDF file for web display:

Optimize PDF for web

Optimize PDF File Size

Several techniques can be used to reduce the size of a PDF file. GroupDocs.Conversion Cloud provides the following properties to optimize the file size:

  • compressImages
  • imageQuality
  • linkDuplicateStreams
  • unembedFonts
  • removeUnusedObjects
  • removeUnusedStreams
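As a rough illustration, the properties above can be collected into one options object. This is a sketch only: the property names come from the list above, but the helper `build_pdf_optimize_options`, the 0 to 100 range for `imageQuality`, and the idea that all flags are plain booleans are assumptions, and where these options sit inside the real request body is not shown here.

```python
import json


def build_pdf_optimize_options(image_quality=50):
    """Return a dict of PDF size-optimization options.

    Property names are taken from the list above. Value types and the
    0-100 quality range are assumptions made for this sketch.
    """
    if not 0 <= image_quality <= 100:
        raise ValueError("image_quality must be between 0 and 100")
    return {
        "compressImages": True,         # recompress embedded images
        "imageQuality": image_quality,  # assumed percentage scale
        "linkDuplicateStreams": True,   # store identical streams once
        "unembedFonts": True,           # drop embedded font data
        "removeUnusedObjects": True,    # discard unreferenced objects
        "removeUnusedStreams": True,    # discard unreferenced streams
    }


options = build_pdf_optimize_options(image_quality=60)
print(json.dumps(options, indent=2))
```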

Feel free to drop us a comment at the support forum sharing your thoughts about GroupDocs.Conversion Cloud API. Or let us know if you have any suggestions or if you need any particular features which you expect our REST API to have.

And if you’ve not already had a chance to try our REST API, simply start a free trial today. All you need is to sign up with groupdocs.cloud. Once you’ve signed up, you’re ready to try the powerful file processing features offered by groupdocs.cloud.

Posted in GroupDocs.Conversion Cloud Product Family

Classify raw text in MS Office, PDF and many other documents using cURL

What is Text Classification?

Text classification is the process of assigning tags or categories to text according to its content with broad applications such as sentiment analysis, topic labeling, spam detection, and intent detection.

Unstructured data in the form of raw text is everywhere: emails, chats, web pages, social media, support tickets, survey responses, and more. Text can be an extremely rich source of information, but extracting insights from it can be hard and time-consuming due to its unstructured nature. Businesses are turning to text classification for structuring text in a fast and cost-efficient way to enhance decision-making and automate processes.

What is Taxonomy?

Taxonomy is the practice and science of classification. The word is also used for a taxonomic scheme, that is, a particular classification. In a wider, more general sense, it may refer to a classification of things or concepts, as well as to the principles underlying such a classification.

GroupDocs.Classification Cloud API

GroupDocs.Classification Cloud API classifies raw text against the IAB-2 taxonomy or the Documents taxonomy. It returns an object that contains the best class and its probability, along with the probabilities of the other classes.
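To make the shape of such a result concrete, here is a minimal sketch of picking the best class from a set of class probabilities. The function `best_class` and the sample numbers are made up for illustration; they are not real API output, and the actual response field names are not shown here.

```python
def best_class(probabilities):
    """Return (class_name, probability) for the most likely class."""
    if not probabilities:
        raise ValueError("no class probabilities given")
    name = max(probabilities, key=probabilities.get)
    return name, probabilities[name]


# Illustrative values only, using class names from the IAB-2 taxonomy.
sample = {
    "Automotive": 0.07,
    "Business_and_Finance": 0.81,
    "Careers": 0.12,
}

name, probability = best_class(sample)
print(f"best class: {name} ({probability:.2f})")
```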

IAB-2 Taxonomy Example

GroupDocs.Classification Cloud API supports the IAB-2 taxonomy scheme. Some of its classes are listed below:

  • ‘Automotive’,
  • ‘Books_and_Literature’,
  • ‘Business_and_Finance’,
  • ‘Careers’,
  • ‘Education’,
  • ‘Events_and_Attractions’,
  • ‘Family_and_Relationships’,
  • ‘Fine_Art’,
  • ‘Food_&_Drink’,
  • ‘Healthy_Living’,
  • ‘Hobbies_&_Interests’,
  • ‘Home_&_Garden’,
  • ‘Medical_Health’,
  • ‘Movies’,
  • ‘Music_and_Audio’,
  • ‘News_and_Politics’,
  • ‘Personal_Finance’
  • etc.

cURL Request

Response

.NET Example

Documents Taxonomy Example

The Documents taxonomy in GroupDocs.Classification Cloud API includes the following classes:

  • ADVE – advertisements, brochures.
  • Email
  • Form
  • Letter
  • Memo – memorandums.
  • News – articles, including news articles.
  • Invoice
  • Report
  • Resume 
  • Scientific – scientific papers.
  • Other – the other classes of documents or cases where the classifier is not sure.

cURL Request

Response

.NET Example

Start a Free Trial Today

Start a free trial today – all you need is to sign up with the GroupDocs Cloud service. Once you have signed up, you are ready to try the powerful file processing features offered by GroupDocs Cloud.

Posted in GroupDocs.Classification Cloud Product Family

A REST API Solution to Redact PDF Text

GroupDocs.Annotation Cloud API is a platform-independent document and image annotation solution that empowers developers to add an annotation feature to their applications with minimum effort. The API supports a range of annotation types, but in this post I will focus on the Text Redaction annotation to demonstrate how to redact PDF text.

Text redaction is the process of permanently removing content from a document. Before you publish a document, you need to remove sensitive and private data from it. GroupDocs.Annotation Cloud provides the Text Redaction annotation to redact the text in a certain page region. Text redaction covers part of the text with a black rectangle to hide the underlying word or phrase.

Let me give you a quick overview of how to redact text with GroupDocs.Annotation Cloud using a simple set of HTTP requests. I will be using cURL to redact text in a PDF document in this example. The API is not limited to the PDF file format; you can check the complete list of supported file formats. It also provides SDKs for all popular programming languages. You can find the available SDKs, with working examples, in the GitHub repository and use them directly in your application.

We will follow these steps to redact text in a PDF document:

  • Generate access token for authentication
  • Upload source document to storage
  • Add Annotation to document
  • Download annotated document

Generate Access Token

Upload source document to storage

Annotate source document

The SvgPath property is used to add the text redaction annotation. The coordinates of the SvgPath property start from the bottom of the document page and increase toward the top.
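Because many tools report positions from the top of the page, a small conversion helps when preparing these coordinates. The sketch below is an illustration only: the helper `rect_to_bottom_origin`, the page height of 842 units, and the sample rectangle are assumptions, and it stops at the corner points without building the actual SvgPath string, whose exact syntax is not covered here.

```python
def rect_to_bottom_origin(x, y_top, width, height, page_height):
    """Convert a rectangle measured from the top of the page into corner
    points whose y values are measured from the bottom of the page.

    Returns the corners in order: upper-left, upper-right, lower-right,
    lower-left.
    """
    if y_top < 0 or y_top + height > page_height:
        raise ValueError("rectangle does not fit on the page")
    y_upper = page_height - y_top  # top edge, measured from the bottom
    y_lower = y_upper - height     # bottom edge, measured from the bottom
    return [
        (x, y_upper),
        (x + width, y_upper),
        (x + width, y_lower),
        (x, y_lower),
    ]


# Hypothetical region: 200 x 20 units, 50 units below the top of the page.
corners = rect_to_bottom_origin(x=100, y_top=50, width=200, height=20,
                                page_height=842)
print(corners)
```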

Download Annotated document

If you’ve not already tried our REST API, we encourage you to try GroupDocs.Annotation Cloud with a free trial today. All you need is to sign up with groupdocs.cloud. Once you’ve signed up, you may go through the following useful resources of GroupDocs.Annotation Cloud.

Feedback

Your feedback is very important for us. If you’ve any suggestions or if you need any particular features which you expect our REST API to have, please feel free to drop us a comment below or at the support forum.

Posted in GroupDocs.Annotation Cloud Product Family

Add Barcode Signature to your Documents

What is an Electronic Signature?

An electronic signature or e-signature refers to data in electronic form which is logically associated with other data in electronic form and which is used by the signatory to sign. This type of signature provides the same legal standing as a handwritten signature as long as it adheres to the requirements of the specific regulation. E-Signatures can be in the form of digital text, images, barcode, QR codes, etc.

Electronic signatures are a legal concept distinct from digital signatures, a cryptographic mechanism often used to implement electronic signatures. While an electronic signature can be as simple as a name entered in an electronic document, digital signatures are increasingly used in e-commerce and in regulatory filings to implement electronic signatures in a cryptographically protected way.

What is GroupDocs.Signature Cloud API?

GroupDocs.Signature Cloud is a REST API to create, verify and search different types of signature objects in documents in the cloud. There are six major types of supported signature you can operate with:

  • Text Signature
  • Barcode Signature
  • QR Code Signature
  • Digital Signature
  • Image Signature
  • Stamp Signature

How to work with barcode signatures using the e-signing API?

This API is intended to add electronic signatures to documents, based on the parameters passed as an array of signature options. The resulting document can be downloaded using the output URLs or paths.

Here are the steps to work with document signature:

  1. Upload File to a Storage.
  2. Create a Signature.
  3. Verify a Signature.
  4. Search a Signature.
  5. Download File.

1. Upload File to a Storage

The following code demonstrates how to upload files to a storage.

Now our file “one-page.docx” is available under folder “signaturedocs” on storage.

2. Creating a Barcode Signature

GroupDocs.Signature Cloud REST API supports signing a document with a barcode. It provides methods to create a Barcode Signature in document pages with different options for barcode type, location, alignment, font, margins, and appearance, passed as Signature Options objects in the request body.

The following code demonstrates how to create a Barcode Signature.


3. Verify a Signature

GroupDocs.Signature Cloud REST API provides methods to verify a Barcode Signature in document pages with different options for page number, text and search criteria, passed as Verification Options objects in the request body.

The following code demonstrates how to verify a Barcode Signature.

4. Search a Signature

GroupDocs.Signature Cloud API provides a method to search for a Barcode Signature in document pages with different options for barcode type, name, text, match type, and other search features, passed as Search Options objects in the request body.

The following code demonstrates how to search for a Barcode Signature.

5. Download File

The following code demonstrates how to download a specific file.

That’s it.

Start a Free Trial Today

Start a free trial today – all you need is to sign up with the GroupDocs Cloud service. Once you have signed up, you are ready to try the powerful file processing features offered by GroupDocs Cloud.

Posted in GroupDocs.Signature Cloud Product Family

Rendering CAD File Formats was Never So Easy Before

CAD (Computer Aided Design)

A CAD file is a digital file format of an object generated and used by CAD software. It may contain 2D or 3D designs, such as a technical drawing, blueprint, schematic, or 3D rendering of an object.

GroupDocs.Viewer Cloud API

GroupDocs.Viewer Cloud API is a flexible document rendering and viewing solution for programmers and professionals to render and display widely used file formats anywhere.

Supported CAD File Formats

File Extension    File Format
DGN               MicroStation Design File
DWF               Design Web Format
DWG               Autodesk Design Data Formats
DXF               Autodesk Drawing Exchange File Format
IFC               Industry Foundation Classes File
STL               Stereolithography File

How to Render HTML View of CAD File Formats

Our document rendering and viewing solution lets developers render CAD file formats in their applications with a few lines of code. It includes options such as enlarging the output and setting the height and width of the output file.

Here are the steps to render the HTML view of a CAD file:

  1. Upload File to a Storage.
  2. Create HTML View.
  3. Download HTML File.

1. Upload File to a Storage

The following code demonstrates how to upload files to a storage.

Now our file “three-layouts.dwf” is available under folder “viewerdocs” on storage.

2. Create HTML View

GroupDocs.Viewer Cloud API performs this CAD to HTML rendering in the cloud using stored files. When CAD documents are rendered, the API automatically adjusts the size of the rendering result depending on the size of the initial document.

However, we can also control the size of the output by providing the CadOptions available in GroupDocs.Viewer Cloud API, such as:

ScaleFactor

The scale factor changes the size of the output document. Values higher than 1 enlarge the output result, and values between 0 and 1 make it smaller.
This option is ignored when either the Height or the Width option is set.

Width

The width of the output result in pixels.

Height

The height of the output result in pixels.
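The interaction between these three options can be sketched as a small function. This is an illustration of the rule described above, not the API's own code: the helper `resolve_output_size` is hypothetical, and keeping the aspect ratio when only one of Width or Height is given is an assumption that the text above does not confirm.

```python
def resolve_output_size(source_width, source_height,
                        scale_factor=1.0, width=None, height=None):
    """Return the (width, height) of the rendered output in pixels.

    ScaleFactor is ignored when Width or Height is set. When only one of
    them is set, the other is derived from the source aspect ratio
    (an assumption made for this sketch).
    """
    if scale_factor <= 0:
        raise ValueError("scale_factor must be positive")
    if width is None and height is None:
        # No explicit size: scale both dimensions by the same factor.
        return (round(source_width * scale_factor),
                round(source_height * scale_factor))
    if width is None:
        width = round(source_width * height / source_height)
    elif height is None:
        height = round(source_height * width / source_width)
    return width, height


print(resolve_output_size(1000, 500, scale_factor=2.0))
print(resolve_output_size(1000, 500, scale_factor=2.0, width=800))
```

In the second call the scale factor of 2.0 has no effect, because an explicit width is given.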

The following code demonstrates how to create an HTML view of a CAD file.

The HTML view of the CAD file is created, and the output HTML is available in storage to download.

3. Download HTML File

The following code demonstrates how to download a specific file.

That’s it.

GroupDocs.Viewer Cloud Resources

Following are the links to some useful resources you may need to accomplish your tasks.

Start a Free Trial Today

Start a free trial today – all you need is to sign up with the GroupDocs Cloud service. Once you have signed up, you are ready to try the powerful file processing features offered by GroupDocs Cloud.

Posted in GroupDocs.Viewer Cloud Product Family

Introducing GroupDocs.Comparison Cloud 19.5!

A new version of GroupDocs.Comparison Cloud, 19.5, is here. Your feedback and interest in our Document Comparison Cloud API keep us motivated to implement new features. This is an all-new API: in this version we have implemented the API as a microservice, which improves its performance and stability. The new API has fewer methods and options. We have also introduced new methods for cloud storage operations in GroupDocs.Comparison Cloud API. I will give you an overview of some of the new features in the following sections. For complete details of the new features and enhancements, please check the release notes of this version.

New API Version

We have introduced the V2 API version in release 19.5, and it uses the following base URL. Please note that V1 will remain available.

https://api.groupdocs.cloud/v2.0/comparison/

Authentication

For improved security, we have introduced JWT (JSON Web Token) authentication in this release. The OAuth2 and URL signing authentication methods are no longer supported by the V2 API version. Let us show you how to get a JWT access token.
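The token request can be sketched roughly as below. Treat every specific here as an assumption rather than the documented API: the token URL `https://api.groupdocs.cloud/connect/token`, the `client_credentials` grant, and the mapping of AppSID and AppKey to `client_id` and `client_secret` should all be checked against the official documentation. The sketch only builds the request with the standard library and does not send it.

```python
import urllib.parse
import urllib.request

# Assumed token endpoint; verify against the official documentation.
TOKEN_URL = "https://api.groupdocs.cloud/connect/token"


def build_token_request(client_id, client_secret):
    """Build (but do not send) a form-encoded request for a JWT token."""
    body = urllib.parse.urlencode({
        "grant_type": "client_credentials",  # assumed grant type
        "client_id": client_id,              # assumed to be the AppSID
        "client_secret": client_secret,      # assumed to be the AppKey
    }).encode("ascii")
    return urllib.request.Request(
        TOKEN_URL,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def bearer_header(access_token):
    """Header to attach to later API calls once a token is obtained."""
    return {"Authorization": f"Bearer {access_token}"}


request = build_token_request("my-app-sid", "my-app-key")
print(request.get_method(), request.full_url)
```

Sending the request with `urllib.request.urlopen(request)` and reading the token from the JSON response is left out, since the response layout is not described here.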

Storage APIs

From now on, you do not need to use the GroupDocs.Storage Cloud REST API for storage operations. GroupDocs.Comparison Cloud API has introduced the following API methods for this purpose.

File API – Introduced methods to upload, download, copy, move, and delete files (input documents and rendering results) in the cloud storage

Upload file to Storage

Download file from Storage

Folder API – Introduced methods to create, copy, move, and delete folders in the cloud storage

Storage API – Introduced methods to get storage information and file information

Comparison API

In this digital era, document comparison is a basic requirement of individuals and organizations for their legal and financial tasks. GroupDocs.Comparison Cloud REST API is a proven API that lets developers add this feature to their applications without worrying about platform dependence. It can be used on any platform without any third-party software. It provides simple methods to compare the most popular business file formats (Word, Excel, PowerPoint, PDF, Images, Email, HTML, Note). The following API methods are available:

Comparisons – Compares source and target documents and returns a link to saved result

POST /comparison/comparisons

Changes – Retrieves a list of changes between source and target documents

POST /comparison/changes

Update – Accepts or rejects changes to the resultant document and returns a link to saved result

PUT /comparison/updates
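The three methods above can be kept in a small route table so that client code refers to them by name. This is a sketch under one assumption: that the listed paths are resolved against the host and version part of the base URL given earlier, that is `https://api.groupdocs.cloud/v2.0`. The names `ROUTES` and `endpoint` are hypothetical helpers, not part of any SDK.

```python
# Host and version taken from the V2 base URL; how the listed paths join
# onto it is an assumption of this sketch.
BASE_URL = "https://api.groupdocs.cloud/v2.0"

# HTTP method and path for each Comparison API operation listed above.
ROUTES = {
    "comparisons": ("POST", "/comparison/comparisons"),
    "changes": ("POST", "/comparison/changes"),
    "updates": ("PUT", "/comparison/updates"),
}


def endpoint(name):
    """Return (http_method, full_url) for a named Comparison operation."""
    method, path = ROUTES[name]
    return method, BASE_URL + path


for name in ROUTES:
    method, url = endpoint(name)
    print(f"{method:4} {url}")
```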

Here we will show you how easily you can compare two versions of the same document for changes with the default comparison options and get the path of the result document. You just need to upload the source and target documents to storage and call the comparisons API method. This is how you can achieve this task using cURL; however, you can refer to the complete list of available SDKs to use GroupDocs.Comparison Cloud API directly on your favorite platform.

Got a question or found a bug? Please feel free to drop us a comment below or post a question in the support forum. It helps us to continually improve and refine our API.

Still haven’t tried GroupDocs.Comparison Cloud? The free trial is right here waiting for you to give it a try and explore the power of the Comparison REST API. All you need is to sign up with groupdocs.cloud.

Posted in GroupDocs.Comparison Cloud Product Family

GroupDocs.Annotation Cloud 19.5

Great news for developers! The all-new GroupDocs.Annotation Cloud 19.5 has been released. We are committed to evolving the GroupDocs.Annotation Cloud REST API to make it simpler and easier to use. With this in mind, we’ve made the necessary changes in this version. The new API is more optimized, with fewer methods and options. Its internal architecture has been revamped for fast and reliable processing, so you can build document and image annotation tools with support for text-based and figure-based annotation operations. The API now also includes methods for working with cloud storage, so you can perform storage operations using the GroupDocs.Annotation Cloud REST API directly instead of using a separate API.

Please check the detailed release notes of this version to get an idea about all the new features/enhancements made in this release.

Breaking Changes

New API Version

Release 19.5 introduces API version V2. V1 will remain available.

Authentication

JWT (JSON Web Token) authentication is introduced in this release. The OAuth2 and URL signing authentication methods are now obsolete.

Annotation API

Simplified API methods to apply Text and Figure based annotations to documents & images of all popular formats.

Storage APIs

File API – Introduced methods to upload, download, copy, move, and delete files (input documents and rendering results) in the cloud storage

Folder API – Introduced methods to create, copy, move, and delete folders in the cloud storage

Storage API – Introduced methods to get storage information and file information

Add Annotations to Document

Here we will show you how the GroupDocs.Annotation Cloud V2 API version works and how it differs from V1. We will add an annotation to a Word document using GroupDocs.Annotation Cloud SDK for .NET by following these steps:

  • Upload source document to Storage
  • Add Annotation to source document

We need to upload the source document to cloud storage, because in this example we will process the document from cloud storage. In this release we introduced the File API for file storage operations. We will use the UploadFile method of the Annotation V2 API version, instead of the GroupDocs.Storage Cloud API method, to upload the file to storage.

In version 19.5, the PUT method for importing annotations has been changed to a POST method, as shown in the following sample code.

V1.1 Example

V2.0 Example

Share Your Feedback

Your feedback is important! Please feel free to drop us a comment sharing your thoughts about the new version of GroupDocs.Annotation Cloud REST API. It helps us to continually improve and refine our API.

Posted in GroupDocs.Annotation Cloud Product Family

All New GroupDocs.Signature Cloud V2!

The tedious old days of signing a document, when you had to print, sign, scan and post it, are gone. In this digital era, online document signing options have made life easier. GroupDocs.Signature Cloud REST API is a tested and reliable e-signature REST API that adds the power of electronic signatures to your applications without installing any third-party software. It helps you electronically secure your documents by applying Text, Stamp, QR-Code, Barcode, Image and Digital Signatures. A new version, GroupDocs.Signature Cloud 19.5, has been released.

Please check the detailed release notes of this version to get an idea about all the new features/enhancements made in this release.

What’s New

API Version – Introduced API version V2

Authentication – JWT (JSON Web Token) authentication

Signature API – Simplified API methods to create, verify and search for signatures, as well as to get additional information about documents

File API – Introduced methods to upload, download, copy, move, and delete files (input documents and rendering results) in the cloud storage

Folder API – Introduced methods to create, copy, move, and delete folders in the cloud storage

Storage API – Introduced methods to get storage information and file information

How it Works

The major change in this release is the introduction of the all-new V2 API version. It is a simpler API with fewer methods and options, and it has a more optimized and refined internal architecture. In this version, the API includes methods for working with cloud storage, so you can perform storage operations using the GroupDocs.Signature Cloud REST API instead of using a separate API.

Here we will show you how the GroupDocs.Signature Cloud V2 API version works and how it differs from V1. We will add a Barcode signature to a Word document using V1 and V2 by following these steps:

  • Retrieve Access Token
  • Upload source document to Storage
  • Add BarCode Signature to source document

You can see from the following cURL examples that we have used JWT authentication in the V2 example. Please note that the OAuth 2.0 and URL signing request authentication methods of the V1 API version are no longer supported in V2. The V2 API version supports JWT (JSON Web Token) authentication.

In the new release, we used the File method of the V2 API to upload the file to storage instead of the GroupDocs.Storage Cloud method. The last, but not least, difference in the following examples is that in V2 a single create API is used for all supported signature types, with the signature details passed as a parameter. In V1, by contrast, we had to call a different API for each signature type.

V1.1 Example

V2.0 Example

Provide Feedback

Feel free to drop us a comment below sharing your thoughts about the new version of GroupDocs.Signature Cloud 19.5. Or visit our Support Forum and let us know if you have any suggestions or if you need any particular features/improvement which you expect our API to have.

And if you have not already had a chance to try our REST API, simply start a free trial today. All you need is to sign up with the groupdocs.cloud. Once you have signed up, you are ready to try the powerful file processing features offered by groupdocs.cloud.

Posted in GroupDocs.Signature Cloud Product Family

Introducing GroupDocs.Conversion Cloud V2

GroupDocs Cloud is pleased to announce the V2 version of the GroupDocs.Conversion Cloud REST API. It is an all-new API version, with a simpler and more intuitive approach than V1. The new API has fewer methods and options for document conversion tasks, along with an improved architecture. Importantly, in this version the API includes methods for working with cloud storage.

What’s New

The API methods of GroupDocs.Conversion Cloud V2 are divided into four sections. The Conversion API section includes methods for document conversion. The File API section has methods to upload, download, copy, move and delete files. The Folder API section adds methods to create, copy, move, and delete folders in the cloud storage. The Storage API section includes methods to get storage information and file information.

In the next few paragraphs we will go over the features and functionality of GroupDocs.Conversion Cloud V2.

How it Works

In this digital era, document conversion is becoming an essential need of organizations. Business processes require the same document in different formats for different purposes. So, as a developer, you may be searching for an efficient and reliable solution to develop a document conversion tool. GroupDocs.Conversion Cloud REST API supports conversion of 65+ document formats. It allows seamless integration of a document conversion feature into your application.

We will show you how easily and quickly you can convert a document to another document format and add a watermark at the same time. We are using cURL for the REST API requests in this example. Please check the complete list of available SDKs to use GroupDocs.Conversion Cloud API directly on your favorite platform.

Here we go. We will convert a DOCX file to PDF and add watermark text as follows:

Get authentication code

GroupDocs.Conversion Cloud REST API supports JWT (JSON Web Token) authentication.

Upload source file to storage

We are uploading the source DOCX file to the default storage of groupdocs.cloud. However, you can use third-party storage with the groupdocs.cloud APIs as well.

Convert and add Watermark

Use the ConversionSettings and ConvertOptions input parameters that match your required file format.
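As a rough picture of what such input might look like for the DOCX to PDF case, the sketch below nests watermark options inside the convert options. All key names (`FilePath`, `Format`, `ConvertOptions`, `WatermarkOptions`, `Text`, `OutputPath`) and the helper `build_docx_to_pdf_settings` are assumptions for illustration; the real parameter names and nesting should be taken from the API reference.

```python
import json


def build_docx_to_pdf_settings(file_path, watermark_text, output_path):
    """Build a hypothetical conversion payload for DOCX to PDF with a
    text watermark. Key names are assumed, not confirmed."""
    if not file_path.lower().endswith(".docx"):
        raise ValueError(f"expected a .docx file, got {file_path!r}")
    if not watermark_text:
        raise ValueError("watermark_text must not be empty")
    return {
        "FilePath": file_path,  # source file assumed to be in storage
        "Format": "pdf",        # target format
        "ConvertOptions": {
            # Options specific to the target format go here.
            "WatermarkOptions": {"Text": watermark_text},
        },
        "OutputPath": output_path,
    }


payload = build_docx_to_pdf_settings(
    "conversions/sample.docx", "Sample watermark", "converted/sample.pdf")
print(json.dumps(payload, indent=2))
```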

Download PDF file

The File API is used to download files from storage.

Share Your Feedback

Feel free to drop us a comment sharing your thoughts about the new version of GroupDocs.Conversion Cloud REST API. Or let us know if you have any suggestions or if you need any particular features which you expect our REST API to have.

And if you have not already had a chance to try our REST API, simply start a free trial today. All you need is to sign up with the GroupDocs Cloud service. Once you have signed up, you are ready to try the powerful file processing features offered by GroupDocs Cloud.

Posted in GroupDocs.Conversion Cloud Product Family