As a Python developer, you can annotate any Word (.doc or .docx) file programmatically on the cloud. You can also use Python to extract or remove all the annotations from Word files. Annotations include comments, pop-ups, and other graphical objects that add information to a document. This article focuses on how to extract or remove annotations from Word documents in Python.
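The following topics are covered in this article:
- Document Annotation REST API and Python SDK
- Extract or Remove Annotations from Word in Python
- Try Online - Online Annotation Remover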
Document Annotation REST API and Python SDK
To extract or remove annotations from Word (DOCX) files, I will use the Python SDK of GroupDocs.Annotation Cloud API. It allows you to programmatically build online document and image annotation tools. Such tools can add annotations, watermark overlays, text replacements, redactions, sticky notes, and text markups to business documents in all popular formats. The Cloud API's document annotation family also includes .NET, Java, PHP, Ruby, and Node.js SDKs.
You can install GroupDocs.Annotation Cloud in your Python project using the following command in the console:
pip install groupdocs_annotation_cloud
Before following the steps and code examples, please get your Client ID and Client Secret from the dashboard. Once you have them, add them to your code so that the SDK can authenticate with the API.
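One way to supply the credentials is sketched below. It is a minimal sketch, not the SDK's documented setup: it assumes the credentials live in environment variables with the hypothetical names GROUPDOCS_CLIENT_ID and GROUPDOCS_CLIENT_SECRET, and it assumes the SDK exposes a Configuration class taking the ID and secret. The helper names load_credentials and make_configuration are also hypothetical.

```python
import os


def load_credentials(env=None):
    """Read the Client ID and Client Secret from environment variables.

    GROUPDOCS_CLIENT_ID and GROUPDOCS_CLIENT_SECRET are hypothetical names.
    """
    env = os.environ if env is None else env
    client_id = env.get("GROUPDOCS_CLIENT_ID")
    client_secret = env.get("GROUPDOCS_CLIENT_SECRET")
    if not client_id or not client_secret:
        raise RuntimeError("Set both GROUPDOCS_CLIENT_ID and GROUPDOCS_CLIENT_SECRET")
    return client_id, client_secret


def make_configuration(client_id, client_secret, sdk=None):
    """Build an SDK configuration object from the credentials.

    Assumption: the SDK exposes Configuration(client_id, client_secret);
    verify against the SDK documentation.
    """
    if sdk is None:
        # Imported lazily so load_credentials() works without the SDK installed.
        import groupdocs_annotation_cloud as sdk
    return sdk.Configuration(client_id, client_secret)
```

In an application, `configuration = make_configuration(*load_credentials())` keeps the secret out of the source code. The optional `sdk` parameter only exists so the helper can be exercised without network access.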
Extract or Remove Annotations from Word in Python
You can extract or remove annotations from DOCX files by following these steps:
- Upload the DOCX file to the Cloud
- Extract Annotations from DOCX Files in Python
- Remove Annotations from DOCX Files in Python
- Download the updated file
Upload the Document
First, upload the DOCX file to the cloud.
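The upload step might look like the sketch below. It checks the local file first and then hands it to the SDK. The SDK calls are assumptions that do not appear in this article: FileApi.from_config(), UploadFileRequest(cloud_path, local_path), and FileApi.upload_file(). The helper names validate_word_file and upload_word_file are hypothetical, and `configuration` is the configuration object built from your Client ID and Client Secret.

```python
from pathlib import Path

WORD_EXTENSIONS = {".doc", ".docx"}


def validate_word_file(local_path):
    """Check a local file before upload and return the name to use in the cloud."""
    path = Path(local_path)
    if path.suffix.lower() not in WORD_EXTENSIONS:
        raise ValueError(f"Not a Word file: {path.name}")
    if not path.is_file():
        raise FileNotFoundError(f"No such file: {path}")
    return path.name


def upload_word_file(configuration, local_path, sdk=None):
    """Upload a local Word file to cloud storage under its own file name.

    Assumption: FileApi.from_config(), UploadFileRequest(cloud_path, local_path)
    and FileApi.upload_file() exist with these signatures; check the SDK docs.
    """
    cloud_path = validate_word_file(local_path)
    if sdk is None:
        import groupdocs_annotation_cloud as sdk
    file_api = sdk.FileApi.from_config(configuration)
    request = sdk.UploadFileRequest(cloud_path, str(local_path))
    file_api.upload_file(request)
    return cloud_path  # e.g. "input.docx"
```

Validating locally means a wrong extension or a typo in the path fails fast, before any request is sent.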
As a result, the uploaded DOCX file (input.docx) will be available in the files section of your dashboard on the cloud.
Extract Annotations from DOCX Files in Python
Please follow the steps mentioned below to extract annotations from the Word document programmatically.
- Create an instance of AnnotateApi
- Create a FileInfo instance
- Set the file path
- Create an ExtractRequest instance with the FileInfo
- Get results by calling the AnnotateApi.extract() method
The extraction runs on the cloud, against the copy of the Word document uploaded in the previous step.
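The extraction steps listed above can be sketched as one function. It follows the listed steps (FileInfo, file path, ExtractRequest, AnnotateApi.extract()). The attribute name `file_path` and the single-argument ExtractRequest constructor are assumptions, as is creating the `api` object with something like `AnnotateApi.from_config(configuration)`. The function name extract_annotations is hypothetical.

```python
def extract_annotations(api, cloud_path="input.docx", sdk=None):
    """Return all annotations of a document that is already in cloud storage.

    `api` is an AnnotateApi instance. Assumption: FileInfo has a file_path
    attribute and ExtractRequest takes the FileInfo as its only argument.
    """
    if sdk is None:
        import groupdocs_annotation_cloud as sdk
    file_info = sdk.FileInfo()          # describes the document to read
    file_info.file_path = cloud_path    # path of the file in cloud storage
    request = sdk.ExtractRequest(file_info)
    return api.extract(request)         # one entry per annotation
```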
The extract() method returns an array of all the annotations in the document in JSON format.
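Because the removal step needs annotation IDs, it helps to pull them out of the extraction result. The sketch below works on plain dictionaries such as those produced by `json.loads`. The "id" and "type" key names and the sample payload are hypothetical; adjust them to the actual response. The helper name annotation_ids is also hypothetical.

```python
import json


def annotation_ids(annotations, types=None):
    """Collect annotation IDs, optionally keeping only the given types.

    Assumption: each annotation is a mapping with "id" and "type" keys.
    Type matching is case-insensitive.
    """
    wanted = {t.lower() for t in types} if types else None
    ids = []
    for item in annotations:
        if wanted is None or str(item.get("type", "")).lower() in wanted:
            ids.append(item["id"])
    return ids


# Hypothetical payload, for illustration only.
sample = json.loads(
    '[{"id": 1, "type": "Area"}, {"id": 2, "type": "TextField"}, {"id": 3, "type": "Area"}]'
)
print(annotation_ids(sample))                  # [1, 2, 3]
print(annotation_ids(sample, types=["area"]))  # [1, 3]
```

Filtering by type lets you remove, for example, only area annotations while keeping comments.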
Remove Annotations from DOCX Files in Python
Please follow the steps mentioned below to delete annotations from the Word document programmatically.
- Create an instance of AnnotateApi
- Create a FileInfo instance
- Set the file path
- Define RemoveOptions
- Set the file info on the RemoveOptions
- Provide annotation IDs to remove
- Set output file path
- Create a RemoveAnnotationsRequest instance with the RemoveOptions
- Get results by calling the AnnotateApi.remove_annotations() method
To remove annotations, you must specify the IDs of the annotations to be removed from the document.
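The removal steps listed above can be sketched as follows. The class and method names (FileInfo, RemoveOptions, RemoveAnnotationsRequest, remove_annotations) come from the steps; the attribute names `file_path`, `file_info`, `annotation_ids`, and `output_path` are assumptions based on the step descriptions. The function name remove_annotations and the default paths are hypothetical.

```python
def remove_annotations(api, annotation_ids, cloud_path="input.docx",
                       output_path="output.docx", sdk=None):
    """Remove the given annotations and save the result to output_path in the cloud.

    `api` is an AnnotateApi instance. Assumption: RemoveOptions exposes
    file_info, annotation_ids and output_path attributes.
    """
    if not annotation_ids:
        raise ValueError("Specify at least one annotation ID to remove")
    if sdk is None:
        import groupdocs_annotation_cloud as sdk
    file_info = sdk.FileInfo()
    file_info.file_path = cloud_path                # source document in the cloud
    options = sdk.RemoveOptions()
    options.file_info = file_info
    options.annotation_ids = list(annotation_ids)   # e.g. IDs from the extraction step
    options.output_path = output_path               # where the cleaned file is saved
    request = sdk.RemoveAnnotationsRequest(options)
    return api.remove_annotations(request)
```

Rejecting an empty ID list up front avoids a round trip that would produce an unchanged copy.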
Download the Output File
The remove_annotations() call saves the output DOCX file (output.docx), without the removed annotations, to cloud storage. You can then download it to your local machine.
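Downloading the result could look like the sketch below. DownloadFileRequest(path) and FileApi.download_file() are assumptions not shown in this article, as is the behavior that download_file() returns the path of a temporary local file; verify both against the SDK documentation. The helper names local_target and download_output are hypothetical.

```python
import shutil
from pathlib import Path, PurePosixPath


def local_target(cloud_path, local_dir="."):
    """Local path for a cloud file; cloud paths are assumed to use '/' separators."""
    return Path(local_dir) / PurePosixPath(cloud_path).name


def download_output(file_api, cloud_path="output.docx", local_dir=".", sdk=None):
    """Download a file from cloud storage into local_dir and return its path.

    `file_api` is a FileApi instance. Assumption: download_file() returns the
    path of a temporary local file holding the downloaded bytes.
    """
    if sdk is None:
        import groupdocs_annotation_cloud as sdk
    request = sdk.DownloadFileRequest(cloud_path)
    temp_file = file_api.download_file(request)
    target = local_target(cloud_path, local_dir)
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(temp_file), str(target))  # move the temp file into place
    return target
```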
Try Online - Online Annotation Remover
Please try the free online DOCX annotation tool, which is built with the above API.
Conclusion
In this article, you have learned how to extract or remove annotations from Word documents on the cloud using Python. You also learned how to programmatically upload a DOCX file to the cloud and download the resulting file from the cloud. You can learn more about GroupDocs.Annotation Cloud API from the documentation, which will help you develop your own annotation remover. We also provide an API Reference section that lets you visualize and interact with our APIs directly through the browser. If you have any questions, please feel free to contact us on the forum.