With the Intelligent PDF processing approach, documents are processed and values are extracted based on a dynamic repository. The dynamic repository consists of a generic data model, which defines the list of fields to be extracted, and a data dictionary, which contains all the pseudo names for the fields listed in the data model. Jiffy provides a pre-packaged solution comprising the document templates, a finite data dictionary, and a built-in Jiffy User Interface application for viewing and confirming the data extracted from the PDF.
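The relationship between the data model and the data dictionary can be illustrated with a small sketch. This is not Jiffy's internal implementation: the class name `DynamicRepository`, the method `resolve`, and the sample field names and pseudo names are all hypothetical, chosen only to show how a label found in a PDF could be mapped back to a field in the data model.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DynamicRepository:
    """Hypothetical stand-in for a dynamic repository (not Jiffy's own API)."""

    # Data model: the list of fields to be extracted.
    fields: list[str]
    # Data dictionary: pseudo names that may label each field inside a PDF.
    pseudo_names: dict[str, list[str]] = field(default_factory=dict)

    def resolve(self, label: str) -> str | None:
        """Return the data-model field that a PDF label refers to, if any."""
        wanted = label.strip().rstrip(":").strip().lower()
        for name in self.fields:
            candidates = [name, *self.pseudo_names.get(name, [])]
            if wanted in (c.lower() for c in candidates):
                return name
        return None


# Sample data (illustrative only).
repo = DynamicRepository(
    fields=["invoice_number", "invoice_date"],
    pseudo_names={
        "invoice_number": ["Invoice No", "Invoice #", "Inv. Number"],
        "invoice_date": ["Date of Invoice", "Invoice Date"],
    },
)

print(repo.resolve("Invoice No:"))  # invoice_number
print(repo.resolve("Due Date"))     # None
```

In this sketch, a label that matches no pseudo name resolves to nothing, so it would only be picked up after a new pseudo name is added to the data dictionary.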
Dynamic Repository
The fields that must be extracted from the PDF should be defined in the dynamic repository:
Document Template Creation
The pre-packaged template is available for the user to upload to the Document Templates section (Templates -> Document Templates). The pre-packaged template must be modified so that its repository name and tag names match the names created in the previous step. Once the template is uploaded, add a template tag as well.
Task Design
Create a task under Jiffy Core (Task Design – Task) using the built-in nodes, including the PDF node and the REST API node. The REST API node passes the data to the Jiffy UI Portal (JDI). The task is also created and made available in the repository for the user to copy. For detailed steps on copying a task from the repository, refer to Setting up automation environment -> Copy task from repository. To set the configurations, follow the steps below under the Properties tab:
Viewing Output in Jiffy UI Portal
The user can view the output in the Jiffy UI Portal. All data captured from the PDF based on the data dictionary and the positions is made available in the portal. The operations user or bot designer needs to confirm that the correct data is mapped to the respective fields by clicking the Manually approve & Save button. This is a one-time activity that needs to be performed for each template.
The video below demonstrates the Intelligent Document Processing PDF extraction process.
Reading Scanned Images
Jiffy can process both digital (text) PDFs and scanned (image) PDFs. For scanned PDFs, users need a licensed OCR engine integrated with Jiffy. During processing (PDF reader node), Jiffy automatically checks whether the input file is scanned or digital. If it is a scanned copy, Jiffy invokes the OCR engine to convert the scanned document to a text PDF and then continues with data extraction. Jiffy uses ABBYY FineReader OCR to perform this action.
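The scanned-versus-digital check described above can be sketched as follows. How Jiffy actually performs this check is not documented here, so the heuristic (a page counts as scanned when it has almost no text layer), the threshold `min_chars_per_page`, and the function names `looks_scanned` and `extract_pages` are assumptions made for illustration. The OCR engine is passed in as a plain callable so that the sketch does not depend on any particular OCR product.

```python
from typing import Callable, List


def looks_scanned(page_texts: List[str], min_chars_per_page: int = 20) -> bool:
    """Assumed heuristic: the PDF is scanned when no page has a usable text layer."""
    return all(len(text.strip()) < min_chars_per_page for text in page_texts)


def extract_pages(page_texts: List[str], run_ocr: Callable[[], List[str]]) -> List[str]:
    """Return the text of each page, invoking OCR only for scanned input."""
    if looks_scanned(page_texts):
        # Scanned copy: let the OCR engine produce the text.
        return run_ocr()
    # Digital PDF: the embedded text layer is used directly.
    return page_texts


def fake_ocr() -> List[str]:
    """Placeholder for a real OCR call (hypothetical)."""
    return ["Invoice No: 1234, Invoice Date: 2020-01-31"]


digital = ["Invoice No: 9876, Invoice Date: 2020-02-15"]
scanned = ["", "  "]

print(extract_pages(digital, fake_ocr))  # text layer returned unchanged
print(extract_pages(scanned, fake_ocr))  # text produced by the OCR placeholder
```

Data extraction then works on the returned text in the same way for both kinds of input.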
Integrating Google Vision API as OCR
Google Vision API can be integrated as an OCR engine using Jiffy’s REST API node or a custom expression. Because it is a paid service, the user needs a valid license key. Once the input and output formats of the API are understood, the user can configure Jiffy to use Google Vision API instead of ABBYY FineReader.
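As an aid to understanding the input and output formats, the sketch below builds the JSON body for the Vision API `images:annotate` REST method and reads the recognised text from a response. It makes no network call and does not show Jiffy's REST API node configuration. The helper names `build_request_body` and `parse_text` and the sample bytes and response are hypothetical. It assumes that each scanned page is sent as an image and that the key is supplied as the `key` query parameter on the URL.

```python
import base64
import json

# Public REST endpoint of the Vision API; the key is appended as ?key=YOUR_KEY.
VISION_URL = "https://vision.googleapis.com/v1/images:annotate"


def build_request_body(image_bytes: bytes) -> str:
    """Build the JSON body of an images:annotate call that requests document OCR."""
    body = {
        "requests": [
            {
                # Image content is sent as a base64-encoded string.
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                "features": [{"type": "DOCUMENT_TEXT_DETECTION"}],
            }
        ]
    }
    return json.dumps(body)


def parse_text(response_json: str) -> str:
    """Return the recognised text from an images:annotate response, or '' if none."""
    response = json.loads(response_json)["responses"][0]
    return response.get("fullTextAnnotation", {}).get("text", "")


# Illustrative input and a hand-written sample response (not real API output).
request_body = build_request_body(b"fake image bytes")
sample_response = json.dumps(
    {"responses": [{"fullTextAnnotation": {"text": "Invoice No: 1234"}}]}
)

print(json.loads(request_body)["requests"][0]["features"])
print(parse_text(sample_response))  # Invoice No: 1234
```

The request body is what the REST API node or custom expression would need to send, and the text returned by `parse_text` corresponds to the OCR output that the data extraction step then consumes.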