Classify raw text in MS Office, PDF and many other documents using cURL

What is Text Classification?

Text classification is the process of assigning tags or categories to text according to its content. It has broad applications, such as sentiment analysis, topic labeling, spam detection, and intent detection.

GroupDocs.Classification Cloud

Unstructured data in the form of raw text is everywhere: emails, chats, web pages, social media, support tickets, survey responses, and more. Text can be an extremely rich source of information, but extracting insights from it can be hard and time-consuming due to its unstructured nature. Businesses are turning to text classification for structuring text in a fast and cost-efficient way to enhance decision-making and automate processes.

What is Taxonomy?

Taxonomy is the practice and science of classification. The word also refers to a particular classification scheme. In a wider, more general sense, it may refer to a classification of things or concepts, as well as to the principles underlying such a classification.

The GroupDocs.Classification Cloud API classifies raw text against either the IAB-2 taxonomy or the Documents taxonomy. It returns an object containing the best class and its probability, along with the probabilities of the other classes.
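
To make the shape of such a result concrete, here is a minimal Python sketch that ranks the classes in a response-like object by probability. The field names (bestClassName, bestClassProbability, otherClasses, className, classProbability) and the sample values are assumptions made for illustration only. They are not taken from the API reference, so check the actual response schema before relying on them.

```python
from typing import List, Tuple

# Hypothetical response object. Field names and probabilities are made up
# for illustration and may not match the real API response schema.
sample_response = {
    "bestClassName": "Business_and_Finance",
    "bestClassProbability": 0.72,
    "otherClasses": [
        {"className": "Careers", "classProbability": 0.06},
        {"className": "Personal_Finance", "classProbability": 0.18},
    ],
}


def ranked_classes(response: dict) -> List[Tuple[str, float]]:
    """Return (class name, probability) pairs, most probable first."""
    # Start with the best class, then append every other reported class.
    pairs = [(response["bestClassName"], response["bestClassProbability"])]
    pairs += [
        (item["className"], item["classProbability"])
        for item in response.get("otherClasses", [])
    ]
    # Sort by probability, highest first.
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for name, probability in ranked_classes(sample_response):
        print(f"{name}: {probability:.2f}")
```

Keeping the other classes alongside the best one is useful when the top probability is low, since the runner-up categories show how ambiguous the text was.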

IAB-2 Taxonomy Example

The GroupDocs.Classification Cloud API supports the IAB-2 taxonomy scheme. Some of its categories are listed below:

  • Automotive
  • Books_and_Literature
  • Business_and_Finance
  • Careers
  • Education
  • Events_and_Attractions
  • Family_and_Relationships
  • Fine_Art
  • Food_&_Drink
  • Healthy_Living
  • Hobbies_&_Interests
  • Home_&_Garden
  • Medical_Health
  • Movies
  • Music_and_Audio
  • News_and_Politics
  • Personal_Finance
  • etc.

Documents Taxonomy Example

In the GroupDocs.Classification Cloud API, the Documents taxonomy includes the following classes:

  • ADVE – advertisements, brochures.
  • Email
  • Form
  • Letter
  • Memo – memorandums.
  • News – articles, including news articles.
  • Invoice
  • Report
  • Resume 
  • Scientific – scientific papers.
  • Other – all other document classes, and cases where the classifier is not sure.

