How to Create an Optical Character Recognition (OCR) Application?
In this tutorial, you'll learn to create an Optical Character Recognition (OCR) application. The program takes in images that contain text, scans them with the Tesseract OCR engine, and reads the text out loud in your choice of English, Spanish, or Chinese. I'll break down the functionality of the frontend and backend, and by the end of the tutorial you'll have your own URL to share so that others can run the application.
Before I tell you what to do, let's go over what you won't have to do as a developer. You won't be uploading your frontend static assets (HTML, CSS, and JavaScript) to a CDN (content delivery network), running your backend on a server, or making sure your database is secure so people can't steal important user data. You won't have to do those things because that functionality is handled for you when you build in the Nimbella Cloud.
As a developer, you just write your frontend and backend logic, run a single command in your terminal, and get your own URL that you can use to view and share your project. Plus, Nimbella Cloud's built-in security eliminates the effort of protecting your customers' financial information.
Here's the roadmap for this tutorial:
- OCR architecture
- Project organization
- The OCR UI (frontend)
- Serverless API Implementation (backend)
- Code to Cloud in 30 Seconds
1: OCR Architecture
There are five main actions that can be run in this application:
- Accepting an image
- Credential
- Image to Text
- Progress
- Text to Speech
Each of these actions runs depending on how the user interacts with the application. We can easily build stateful applications like this because the Progress action stores its information in a key-value cache.
2: Project Organization
Here is how we build an application (e.g., frontend + backend) using Nimbella:
The application has a web component that will host the HTML, CSS, JavaScript, and other static assets for the application. For the backend, we will implement the APIs using Node.js although any one of these other languages could be used: TypeScript, Python, PHP, Java, Go, or even Swift.
3: The OCR UI (frontend)
In this example, the OCR frontend is built with React, which we store in the web folder. But you could use any framework, such as Vue, Angular, jQuery, ASP.NET, Ruby on Rails, etc.
When you first load your application, you have three options to interact with the UI. You can 1) drag and drop an image in the JPG, GIF, or PNG format, 2) click the Upload Image button to browse the folders on your computer, or 3) select the language you'd like the application to read out loud in.
Once you decide which images you want to run through the OCR, you can watch the image processing in real time. Here's an example of three images being loaded at the same time.
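The format restriction above could be enforced in the browser before anything is uploaded. The following is only a sketch: the function name filterImages and the choice to check MIME types are assumptions, not taken from the OCR source code.

```javascript
// Assumed MIME types for the three formats the UI accepts (JPG, GIF, PNG).
const SUPPORTED_TYPES = new Set(['image/jpeg', 'image/gif', 'image/png']);

// Split dropped or selected files into accepted and rejected lists.
// `files` can be any iterable of objects with a `type` property,
// such as the FileList from a drop event or a file input.
function filterImages(files) {
  const accepted = [];
  const rejected = [];
  for (const file of files) {
    (SUPPORTED_TYPES.has(file.type) ? accepted : rejected).push(file);
  }
  return { accepted, rejected };
}
```

Only the accepted list would then be passed on to the upload step, while the rejected list can be used to show a message to the user.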
Once your images are loaded, you can click on the picture and select the speak option.
You can even flip an image while one of the images is being read out loud.
Following is a chart illustrating the paths of these actions.
4: Commands found in the backend
Now let's look at the packages folder, which contains the backend logic that runs the commands. First, we'll consider the five backend functions and how they operate.
Accept an Image
ocr/acceptImage gets called when you select an image for the application to process. Once it starts, ocr/imageToText runs to convert the image to text, and the ocr/progress function drives the loading bar that shows how far along the processing is. We simplify this flow with the built-in OpenWhisk functionality, using the retain and sequence commands.
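To give a feel for what sequence and retain do, here is a minimal sketch in plain JavaScript. These helpers are stand-ins written for illustration, not the OpenWhisk API itself, and the two steps (imageToText and respond) are hypothetical placeholders for the real actions.

```javascript
// sequence: feed the output of each step into the next one.
function sequence(...steps) {
  return async (params) => {
    let value = params;
    for (const step of steps) {
      value = await step(value);
    }
    return value;
  };
}

// retain: run a step, but keep the original input next to its result.
function retain(step) {
  return async (params) => ({ params, result: await step(params) });
}

// Hypothetical steps: a fake OCR step and a step that builds the response.
const imageToText = async ({ id }) => ({ text: `text for ${id}` });
const respond = async ({ params, result }) => ({ id: params.id, text: result.text });

// The image id survives the OCR step because retain carries the input forward.
const acceptImage = sequence(retain(imageToText), respond);
```

Calling acceptImage({ id: 'img-1' }) resolves to an object that holds both the original id and the recognized text. Without retain, the second step would only see the output of the first.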
Check your credential
In ocr/credential, we specify the secure get and put operations between the web page and the storage bucket. You can use the built-in storage functionality by calling nimbella.storage() in your code!
Image to Text
ocr/imageToText is where we will be incorporating the Tesseract Engine to convert the image to text. Tesseract scans the image and returns text, which we store and display. We're also using the built-in Key-Value functionality to store/update our text.
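The flow of this step (recognize, then store) can be sketched as follows. The recognizer and the store are passed in as parameters so the example runs without Tesseract or a real key-value service. The names recognize, store, memoryStore, and the key format are all assumptions made for illustration.

```javascript
// Sketch of an image-to-text step with injected dependencies:
//   recognize(image) -> Promise<string>   (Tesseract in the real application)
//   store.set(key, value) / store.get(key) (the key-value service in the real application)
async function imageToText({ id, image }, { recognize, store }) {
  await store.set(`${id}:status`, 'processing');
  try {
    const text = (await recognize(image)).trim();
    await store.set(`${id}:text`, text);
    await store.set(`${id}:status`, 'done');
    return { id, text };
  } catch (err) {
    // Record the failure so the UI can stop showing a loading bar.
    await store.set(`${id}:status`, 'failed');
    return { id, error: err.message };
  }
}

// In-memory stand-in for the key-value store, for local experiments only.
function memoryStore() {
  const data = new Map();
  return {
    set: async (key, value) => { data.set(key, value); },
    get: async (key) => (data.has(key) ? data.get(key) : null),
  };
}
```

Injecting the recognizer keeps the storage logic easy to try out locally. In a deployed action, the same function body would call the OCR engine and the platform's key-value client directly.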
Progress
We're able to see the image loading in real-time because we're using the functionality in ocr/progress. For this, we are using key-value storage that's connected to your Nimbella namespace to track the progress, status, and the text that will be displayed in the UI.
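One way to picture the progress tracking is a single record per image that holds the progress, status, and text. The sketch below assumes a record layout, a key prefix, and helper names (updateProgress, progress, memoryKv) that are hypothetical, and uses an in-memory stand-in rather than the real key-value service.

```javascript
// Hypothetical key layout: one JSON string per image id.
const key = (id) => `ocr:${id}`;

// Merge a partial update into the stored record for an image.
async function updateProgress(kv, id, patch) {
  const current = JSON.parse((await kv.get(key(id))) || '{}');
  const next = { progress: 0, status: 'queued', text: '', ...current, ...patch };
  await kv.set(key(id), JSON.stringify(next));
  return next;
}

// What a progress action could return to the UI each time it is polled.
async function progress(kv, { id }) {
  const raw = await kv.get(key(id));
  return raw ? JSON.parse(raw) : { progress: 0, status: 'unknown', text: '' };
}

// In-memory stand-in with async get/set, similar in shape to a Redis-style client.
function memoryKv() {
  const data = new Map();
  return {
    get: async (k) => (data.has(k) ? data.get(k) : null),
    set: async (k, v) => { data.set(k, v); },
  };
}
```

The backend would call updateProgress as processing advances, and the UI would poll progress with the image id to redraw the loading bar and, once finished, display the text.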
Text to Speech
And finally, the ocr/textToSpeech function lets us hear the text read out loud. After the app has successfully processed the image and stored the associated text, clicking the speak button runs the text through Google Translate and plays it through your speakers.
Here's a layout of all the actions stored in your packages folder:
5: Code to cloud in 30 seconds
The Nimbella CLI is called nim. It helps you organize and deploy your applications to the Nimbella cloud, in a secure domain that is unique to your projects. You will need to download the CLI and log in with an access key to get started. Please visit the signup page to configure your CLI if you haven't previously done this. As described earlier, deploying your functionality and UI takes a single command:
nim project:deploy <project_name>
Within seconds, Nimbella will tell you how many files and actions it deployed. Navigate to the link that the command prints to access your finished OCR application.
This tutorial detailed the steps involved in creating an OCR application using the Nimbella Platform. Nimbella is evolving and expanding. Check out our website for more information on our upcoming products and news.
Contact us at info@nimbella.com or on our community Slack if you have any further questions. You can also check out the OCR source code on GitHub.