Text Mining from Unstructured Documents

Client

Manufacturing and E-commerce company

Technologies used

Python

TensorFlow

Darknet

OpenCV

Challenges

Text mining from documents that are not machine-readable

The documents are image or PDF files received from hundreds of companies

All the documents are dumped at a shared location on the server

The documents can be scanned or generated by software

Solutions

Schedule pickup of incoming documents from a shared location on the server

Configure the template of each company’s document for its region of interest

Check the input documents for type and quality

Reject documents that do not meet the quality requirements

Perform document classification using an AI model

Perform template matching on a classified document

Automatically detect boundaries on each document

Perform Smart OCR on each document
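
The intake steps above (pickup from the shared location, type and quality checks, rejection, and the per-company template lookup) could be sketched roughly as follows. This is an illustration only, not the project's actual code: the accepted file types, the size threshold used as a stand-in for a real quality check, the `acme_invoice` label, and every function and field name are assumptions. Classification, boundary detection, and OCR are left out.

```python
from dataclasses import dataclass
from pathlib import Path

# Assumed values: the case study does not list accepted types or thresholds.
ACCEPTED_SUFFIXES = {".pdf", ".png", ".jpg", ".jpeg", ".tif", ".tiff"}
MIN_FILE_BYTES = 1024


@dataclass(frozen=True)
class Region:
    """A named region of interest, in pixels, on a company's template."""
    name: str
    x: int
    y: int
    width: int
    height: int


# Hypothetical template registry, keyed by the label a classifier would return.
TEMPLATES = {
    "acme_invoice": [
        Region("invoice_number", x=420, y=60, width=180, height=30),
        Region("total_amount", x=420, y=700, width=180, height=30),
    ],
}


def check_document(path):
    """Return (accepted, reason) based on file type and a crude size check."""
    if path.suffix.lower() not in ACCEPTED_SUFFIXES:
        return False, "unsupported file type"
    if path.stat().st_size < MIN_FILE_BYTES:
        return False, "file too small to be a usable scan"
    return True, "ok"


def pick_up(shared_dir):
    """Split the files in the shared folder into accepted and rejected."""
    accepted, rejected = [], {}
    for path in sorted(Path(shared_dir).iterdir()):
        if not path.is_file():
            continue
        ok, reason = check_document(path)
        if ok:
            accepted.append(path)
        else:
            rejected[path] = reason
    return accepted, rejected


def regions_for(label):
    """Look up the regions of interest for a classified document."""
    return TEMPLATES.get(label, [])
```

A scheduler (cron, for example) would call `pick_up` on the shared folder at a fixed interval and hand each accepted file to the classifier, then use `regions_for` to decide which areas to pass to OCR.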

Benefits

100% information availability in near real-time

Quality check each document

Generate the results

  • Store in database
  • Store in Excel format
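
As a rough sketch of the two storage targets, the snippet below writes extracted fields to a SQLite table and to a CSV file that Excel can open. SQLite and CSV are stand-ins chosen so the example needs only the Python standard library; the case study does not say which database or spreadsheet writer was used, and the field names are hypothetical.

```python
import csv
import sqlite3

# Hypothetical field names; the case study does not list the extracted fields.
FIELDS = ["document", "company", "invoice_number", "total_amount"]


def store_in_database(conn, rows):
    """Create the results table if needed and upsert one row per document."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS extracted ("
        "document TEXT PRIMARY KEY, company TEXT, "
        "invoice_number TEXT, total_amount TEXT)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO extracted VALUES "
        "(:document, :company, :invoice_number, :total_amount)",
        rows,
    )
    conn.commit()


def store_as_csv(path, rows):
    """Write the same rows to a CSV file that Excel can open."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
```

Keying the table on the document name means a reprocessed document replaces its earlier row instead of creating a duplicate.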

Generate MIS Charts and reports