“`html
Amazon Textract is a machine learning (ML) service that automatically extracts text, handwriting, and data from scanned documents, going beyond traditional optical character recognition (OCR). It can identify, understand, and extract data from tables and forms with high accuracy. Today, many companies rely on manual extraction or basic OCR software, which is tedious and time-consuming and requires manual configuration that must be updated whenever a form changes. Amazon Textract addresses these challenges by using ML to process different document types and extract information accurately with minimal manual intervention. This enables you to automate document processing and use the extracted data for purposes such as automating loan processing or gathering information from invoices and receipts.
As travel resumes post-pandemic, verifying a traveler’s vaccination status may be required in many cases. Hotels and travel agencies often need to review vaccination cards to gather important details like whether the traveler is fully vaccinated, vaccine dates, and the traveler’s name. Some agencies do this through manual verification of cards, which can be time-consuming for staff and leaves room for human error. Others have built custom solutions, but these can be costly and difficult to scale, and take significant time to implement. Moving forward, there may be opportunities to streamline the vaccination status verification process in a way that is efficient for businesses while respecting travelers’ privacy and convenience.
Amazon Textract Queries helps address these challenges. With Queries, you specify the pieces of information you need as natural language questions, and Amazon Textract extracts only those answers from the document instead of returning every field it detects.
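To make this concrete, the following is a minimal sketch of how a Queries request might be built with the AWS SDK for Python (boto3). The helper names `build_queries_config` and `analyze_with_queries`, the alias values, and the bucket and key arguments are assumptions for illustration; they are not taken from the solution's deployment code.

```python
def build_queries_config(questions):
    """Turn (question text, alias) pairs into a QueriesConfig structure."""
    return {
        "Queries": [{"Text": text, "Alias": alias} for text, alias in questions]
    }


def analyze_with_queries(bucket, key, questions):
    """Call the synchronous AnalyzeDocument API with the QUERIES feature.

    bucket and key are hypothetical arguments naming an image stored in Amazon S3.
    """
    import boto3  # imported here so build_queries_config works without the SDK

    client = boto3.client("textract")
    return client.analyze_document(
        Document={"S3Object": {"Bucket": bucket, "Name": key}},
        FeatureTypes=["QUERIES"],
        QueriesConfig=build_queries_config(questions),
    )


# Example questions; the aliases are made up for this sketch.
QUESTIONS = [
    ("What is Vaccination Status?", "VACCINATION_STATUS"),
    ("What is Name?", "NAME"),
]
```

An alias is optional; it gives each answer a short, stable label that is easier to look up than the full question text.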
In this post, we walk you through a step-by-step implementation guide to build a vaccination status verification solution using Amazon Textract Queries. The solution showcases how to process vaccination cards using an Amazon Textract query, verify the vaccination status, and store the information for future use.
Solution overview
The following diagram illustrates the solution architecture.
The workflow includes the following steps:
The user takes a photo of a vaccination card.
The image is uploaded to an Amazon Simple Storage Service (Amazon S3) bucket.
When the image gets saved in the S3 bucket, it invokes an AWS Step Functions workflow:
The Queries-Decider AWS Lambda function examines the document passed in and adds information about the MIME type, the number of pages, and the number of queries to the Step Functions workflow (for our example, we have four queries).
NumberQueriesAndPagesChoice is a Choice state that adds conditional logic to the workflow. If the document has more than 15 queries (up to 30) or more than one page (up to 3,000), Amazon Textract asynchronous processing is the only option, because the synchronous APIs support only single-page documents with up to 15 queries. In all other cases, the workflow randomly selects synchronous or asynchronous processing.
The TextractSync Lambda function sends a request to Amazon Textract to analyze the document based on the following Amazon Textract queries:
What is Vaccination Status?
What is Name?
What is Date of Birth?
What is Document Number?
Amazon Textract analyzes the image and returns the answers to these queries to the Lambda function.
The Lambda function verifies the customer’s vaccination status and stores the final result in CSV format in the same S3 bucket (demoqueries-textractxxx) in the csv-output folder.
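Two of the steps above, the routing decision and the final verification, can be sketched in plain Python. This is an illustrative sketch rather than the solution's actual Lambda code: it assumes a document needs asynchronous processing as soon as it exceeds either synchronous limit, the asynchronous upper bounds are assumptions, and the function names, the accepted status value, and the CSV column layout are all hypothetical.

```python
import csv
import io

# The synchronous limits come from the workflow description; the asynchronous
# upper bounds are assumptions made for this sketch.
SYNC_MAX_QUERIES = 15
ASYNC_MAX_QUERIES = 30
ASYNC_MAX_PAGES = 3000


def choose_processing(number_of_queries, number_of_pages):
    """Return 'async' when only asynchronous processing fits, else 'either'."""
    if not 1 <= number_of_queries <= ASYNC_MAX_QUERIES:
        raise ValueError("unsupported number of queries")
    if not 1 <= number_of_pages <= ASYNC_MAX_PAGES:
        raise ValueError("unsupported number of pages")
    if number_of_queries > SYNC_MAX_QUERIES or number_of_pages > 1:
        return "async"
    return "either"


# Hypothetical rule: the status answers that count as verified.
ACCEPTED_STATUSES = {"fully vaccinated"}


def is_verified(status_answer):
    """Compare the normalized status answer against the accepted values."""
    return " ".join(status_answer.lower().split()) in ACCEPTED_STATUSES


def to_csv(answers):
    """Render the query answers plus the verification result as CSV text."""
    fields = ["Name", "Date of Birth", "Document Number", "Vaccination Status"]
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(fields + ["Verified"])
    row = [answers.get(field, "") for field in fields]
    row.append(is_verified(answers.get("Vaccination Status", "")))
    writer.writerow(row)
    return buffer.getvalue()
```

Matching the normalized answer exactly, rather than searching for a substring, avoids treating an answer such as "Not fully vaccinated" as verified.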
Prerequisites
To complete this solution, you should have an AWS account and the appropriate permissions to create the resources required as part of the solution.
Download the deployment code and sample vaccination card from GitHub.
Use the Queries feature on the Amazon Textract console
Before you build the vaccination verification solution, let’s explore how you can use Amazon Textract Queries to extract vaccination status via the Amazon Textract console. You can use the vaccination card sample you downloaded from the GitHub repo.
On the Amazon Textract console, choose Analyze Document in the navigation pane.
Under Upload document, choose Choose document to upload the vaccination card from your local drive.
After you upload the document, select Queries in the Configure Document section.
You can then add queries in the form of natural language questions. Let’s add the following:
What is Vaccination Status?
What is Name?
What is Date of Birth?
What is Document Number?
After you add all your queries, choose Apply configuration.
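When the same queries are submitted through the API instead of the console, the answers come back as blocks in the response: each QUERY block points to its QUERY_RESULT blocks through an ANSWER relationship. The following sketch shows one way to pair them up. The helper name `extract_query_answers` is hypothetical, and the sample response is trimmed and uses made-up values rather than real output from the sample vaccination card.

```python
def extract_query_answers(response):
    """Map each query's alias (or question text) to its best answer text."""
    blocks = {block["Id"]: block for block in response.get("Blocks", [])}
    answers = {}
    for block in blocks.values():
        if block.get("BlockType") != "QUERY":
            continue
        query = block["Query"]
        label = query.get("Alias") or query["Text"]
        # Collect every QUERY_RESULT block linked through an ANSWER relationship.
        candidates = [
            blocks[answer_id]
            for relationship in block.get("Relationships", [])
            if relationship["Type"] == "ANSWER"
            for answer_id in relationship["Ids"]
            if answer_id in blocks
        ]
        # Keep the highest-confidence answer; None means no answer was found.
        best = max(candidates, key=lambda b: b.get("Confidence", 0.0), default=None)
        answers[label] = best["Text"] if best is not None else None
    return answers


# Trimmed, made-up response for illustration only.
sample_response = {
    "Blocks": [
        {
            "Id": "q1",
            "BlockType": "QUERY",
            "Query": {"Text": "What is Name?", "Alias": "NAME"},
            "Relationships": [{"Type": "ANSWER", "Ids": ["a1"]}],
        },
        {"Id": "a1", "BlockType": "QUERY_RESULT", "Text": "Jane Doe", "Confidence": 98.0},
        {
            "Id": "q2",
            "BlockType": "QUERY",
            "Query": {"Text": "What is Document Number?"},
        },
    ]
}

answers = extract_query_answers(sample_response)
```

A query with no ANSWER relationship maps to `None`, so downstream code can tell "no answer found" apart from an empty string.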
Dhiraj Thakur is a Solutions Architect with Amazon Web Services. He works with AWS customers and partners to provide guidance on enterprise cloud adoption, migration, and strategy. He is passionate about technology and enjoys building and experimenting in the analytics and AI/ML space.
Rishabh Yadav is a Partner Solutions Architect at AWS with an extensive background in DevOps and security offerings at AWS. He works with ASEAN partners to provide guidance on enterprise cloud adoption and architecture reviews, and helps them build AWS practices through the implementation of the Well-Architected Framework. Outside of work, he likes to spend his time on the sports field and playing FPS games.
“`