
Text Mining from Unstructured Documents

Darknet
Python
TensorFlow
Challenges
  • Extracting text from documents that are not machine-readable
  • The documents are images or PDFs received from hundreds of companies
  • All documents are dumped into a shared location on a server
  • The documents may be scanned from paper or generated by software
Solutions
  • Schedule pickup of incoming documents from the shared location on the server
  • Configure a template for each company’s documents that defines their regions of interest
  • Check the input documents for type and quality
  • Reject documents that do not meet the quality requirements
  • Classify each document using an AI model
  • Perform template matching on each classified document
  • Automatically detect boundaries on each document
  • Perform Smart OCR on each document
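The solution steps above can be sketched as a small pipeline. This is a minimal, pure-Python sketch under several assumptions not stated in the source: pages are already rasterized into grayscale pixel grids, the accepted file types and the `MIN_DPI` threshold are hypothetical, `classify` and `ocr` are placeholder callables standing in for the AI classification model and the Smart OCR engine, and each template's regions of interest are defined relative to the top-left corner of the detected content. Scheduled pickup itself would be handled by a scheduler (for example cron) and is left out.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Dict, List, Tuple

Image = List[List[int]]          # grayscale pixels, 0 = black, 255 = white
Box = Tuple[int, int, int, int]  # (left, top, right, bottom); right/bottom exclusive

# Hypothetical acceptance rules; the real quality requirements are not given.
ALLOWED_SUFFIXES = {".pdf", ".png", ".jpg", ".jpeg", ".tif", ".tiff"}
MIN_DPI = 200


@dataclass
class Template:
    """Per-company template: named regions of interest, relative to the content origin."""
    company: str
    regions: Dict[str, Box]


def check_document(path: Path, dpi: int) -> Tuple[bool, str]:
    """Type and quality gate; returns (accepted, reason)."""
    if path.suffix.lower() not in ALLOWED_SUFFIXES:
        return False, f"unsupported type {path.suffix}"
    if dpi < MIN_DPI:
        return False, f"resolution {dpi} dpi is below {MIN_DPI}"
    return True, "ok"


def detect_boundaries(image: Image, threshold: int = 128) -> Box:
    """Bounding box of all pixels darker than `threshold` (the printed content)."""
    ys = [y for y, row in enumerate(image) if any(p < threshold for p in row)]
    xs = [x for row in image for x, p in enumerate(row) if p < threshold]
    if not ys:
        return (0, 0, 0, 0)  # blank page
    return (min(xs), min(ys), max(xs) + 1, max(ys) + 1)


def crop(image: Image, box: Box) -> Image:
    """Cut a rectangular region out of the pixel grid."""
    left, top, right, bottom = box
    return [row[left:right] for row in image[top:bottom]]


def process_document(
    path: Path,
    dpi: int,
    image: Image,
    templates: Dict[str, Template],
    classify: Callable[[Image], str],  # placeholder for the AI classification model
    ocr: Callable[[Image], str],       # placeholder for the OCR engine
) -> Dict:
    """Check, classify, match a template, locate content, and OCR each region."""
    accepted, reason = check_document(path, dpi)
    if not accepted:
        return {"file": path.name, "status": "rejected", "reason": reason}
    company = classify(image)
    template = templates.get(company)  # template matching by predicted class
    if template is None:
        return {"file": path.name, "status": "rejected",
                "reason": f"no template for {company}"}
    left, top, _, _ = detect_boundaries(image)
    # Shift each configured region by the detected content origin before OCR.
    fields = {
        name: ocr(crop(image, (l + left, t + top, r + left, b + top)))
        for name, (l, t, r, b) in template.regions.items()
    }
    return {"file": path.name, "status": "ok", "company": company, "fields": fields}
```

Shifting regions by the detected content origin is just one simple way to tolerate scanning offsets; it is an illustrative choice, not necessarily how the described system aligns templates.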
Benefits
  • 100% information availability in near real-time
  • Quality check each document
  • Store the extracted results in a database or Excel
  • Generate MIS charts and reports
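Storing results and producing report figures can be sketched with the Python standard library alone. This sketch assumes each processed document yields a dictionary with `file`, `company`, `status`, and an optional `fields` mapping, as a pipeline like the one described under Solutions might produce; the `extracted` table and its columns are hypothetical, and CSV output stands in for a native Excel workbook (writing `.xlsx` directly would need a third-party library).

```python
import csv
import io
import sqlite3
from typing import Dict, Iterable, Iterator, List, Optional, Tuple

Row = Tuple[str, Optional[str], str, Optional[str], Optional[str]]


def flatten(results: Iterable[Dict]) -> Iterator[Row]:
    """One row per extracted field; a rejected document yields one row with no field."""
    for r in results:
        fields = r.get("fields") or {None: None}
        for name, value in fields.items():
            yield (r["file"], r.get("company"), r["status"], name, value)


def store_results(conn: sqlite3.Connection, results: Iterable[Dict]) -> int:
    """Create the hypothetical 'extracted' table if needed and insert all rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS extracted "
        "(file TEXT, company TEXT, status TEXT, field TEXT, value TEXT)"
    )
    rows = list(flatten(results))
    conn.executemany("INSERT INTO extracted VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


def export_csv(conn: sqlite3.Connection) -> str:
    """Return all stored rows as CSV text, which Excel can open directly."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["file", "company", "status", "field", "value"])
    writer.writerows(conn.execute("SELECT * FROM extracted ORDER BY rowid"))
    return buf.getvalue()


def status_summary(conn: sqlite3.Connection) -> List[Tuple[str, int]]:
    """Number of distinct documents per status, the kind of figure an MIS chart plots."""
    return conn.execute(
        "SELECT status, COUNT(DISTINCT file) FROM extracted "
        "GROUP BY status ORDER BY status"
    ).fetchall()
```

To save the export, write the returned text to a file opened with `open(path, "w", newline="")` so the CSV line endings are not altered.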