A Company Is Converting a Large Number of Unstructured Paper

Question 60

Multiple Choice

A company is converting a large number of unstructured paper receipts into images. The company wants to create a model based on natural language processing (NLP) to find relevant entities such as date, location, and notes, as well as some custom entities such as receipt numbers. The company is using optical character recognition (OCR) to extract text for data labeling. However, documents are in different structures and formats, and the company is facing challenges with setting up the manual workflows for each document type. Additionally, the company trained a named entity recognition (NER) model for custom entity detection using a small sample size. This model has a very low confidence score and will require retraining with a large dataset. Which solution for text extraction and entity detection will require the LEAST amount of effort?


