Optical Character Recognition (OCR) Demo Tutorial using Nimbella Platform
This tutorial explains the Optical Character Recognition (OCR) demo available on GitHub and shows you how to deploy it to the Nimbella Cloud.
The OCR application displays a web page where visitors select a language (English, Spanish, or Chinese) and then drag a JPG, GIF, or PNG image onto the page, where it is automatically converted to text. If the conversion is successful, users can read the text and click a Speak button to hear the text. You can try out the application here.
The OCR demo has the following code and configuration components, described in more detail later:
- Application logic: The packages directory contains several Nimbella actions, grouped into packages, which are logical collections of actions.
- Slack application: A Nimbella Commander project folder called commander, containing an ocr.js file that implements an app that can be installed into Slack.
Project structure and logic
Nimbella relies on directory and file structure to intelligently deploy projects, so the GitHub project directory structure is organized that way and is described below. For more information about creating and deploying Nimbella Cloud projects, see the Nimbella Command Line Tool (nim) Guide.
In the following diagrams, files that contain code or web content are shown with file icons and larger font labels, while project configuration and build files are displayed in smaller fonts without icons.
In any Nimbella project, the deployer looks for one or both of the following top-level directories:
- The packages directory, which contains the back end logic of the project.
- The web directory, which contains static web content to be published to the front end of the application.
The packages and web directory contents are described in the following sections.
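The directory check described above can be illustrated with a short Node.js sketch. This is not the Nimbella deployer's code; the function name findProjectParts and the temporary demo directory are assumptions made only for illustration.

```javascript
// Illustrative only: mimics the top-level directory check described above.
// findProjectParts is a hypothetical helper, not part of the nim tool.
const fs = require("fs");
const os = require("os");
const path = require("path");

// Reports which of the two recognized top-level directories exist in projectDir.
function findProjectParts(projectDir) {
  const parts = { packages: false, web: false };
  for (const name of Object.keys(parts)) {
    const candidate = path.join(projectDir, name);
    parts[name] = fs.existsSync(candidate) && fs.statSync(candidate).isDirectory();
  }
  return parts;
}

// Usage: a throwaway project that has only a packages directory.
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), "ocr-demo-"));
fs.mkdirSync(path.join(demoDir, "packages"));
console.log(findProjectParts(demoDir)); // { packages: true, web: false }
```

A project with only one of the two directories is still valid, which is why the sketch reports each directory separately rather than failing when one is missing.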
There are also several configuration files. The project.yml file in the OCR demo configures the object store bucket provided with your Nimbella Cloud namespace for web content and database instances.
Actions in the packages directory
In this case, the packages directory contains two packages with six total actions, shown in this diagram and described below:
The ocr package contains five actions:
- acceptImage Provides a workflow that sequences other actions, invoking the ocr/imageToText action at the proper point and invoking the utils/slack action to send a notification and the OCR results to Slack (assuming the Commander app is installed into Slack).
- credential Specifies secure get and put operations between the web page and the storage bucket.
- imageToText Provides logic for the Tesseract conversion of the image to text.
- progress Uses the Redis key-value store provided with your Nimbella namespace to track progress, status, and the OCR text to be displayed.
- textToSpeech Routes the text through Google Translate to be synthesized to speech.
The utils package contains one action:
- slack Logic for Slack notifications for the Commander app.
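To make the sequencing idea concrete, here is a minimal sketch of a workflow that calls the image-to-text step and then the Slack step. It is not the demo's acceptImage code: the function acceptImageSketch, the injected invoke parameter, the stub, and the argument names (image, language, text) are all hypothetical, chosen so the example runs without a cloud connection.

```javascript
// Hypothetical sketch of a sequencing workflow; not the demo's actual code.
// invoke(name, params) stands in for whatever mechanism calls another action.
async function acceptImageSketch(args, invoke) {
  // Step 1: convert the image to text.
  const ocr = await invoke("ocr/imageToText", {
    image: args.image,
    language: args.language,
  });
  // Step 2: send a notification with the OCR result to Slack.
  await invoke("utils/slack", { text: ocr.text });
  return { text: ocr.text };
}

// Usage with a stub that records the call order instead of invoking real actions.
const calls = [];
async function stubInvoke(name, params) {
  calls.push(name);
  return name === "ocr/imageToText" ? { text: "hello" } : { ok: true };
}

const workflowResult = acceptImageSketch(
  { image: "scan.png", language: "eng" },
  stubInvoke
);
workflowResult.then((result) => console.log(calls, result));
```

Injecting the invoke function keeps the ordering logic testable in isolation; a deployed action would replace the stub with a real invocation call.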
Packages are used as qualifiers in action names, so the full action names are ocr/imageToText, utils/slack, and so on.
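As a small illustration of the naming rule, the sketch below builds qualified names from the package and action names listed in this section. The helper qualifiedActionNames and the demoPackages object are hypothetical and exist only for this example.

```javascript
// Builds "package/action" names from a map of package name to action names.
// qualifiedActionNames is a hypothetical helper for illustration.
function qualifiedActionNames(packages) {
  return Object.entries(packages).flatMap(([pkg, actions]) =>
    actions.map((action) => `${pkg}/${action}`)
  );
}

// Package and action names as listed in this tutorial.
const demoPackages = {
  ocr: ["acceptImage", "credential", "imageToText", "progress", "textToSpeech"],
  utils: ["slack"],
};

console.log(qualifiedActionNames(demoPackages));
```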
Some of the action directories have build files, which trigger an automatic build within the directory in which they are placed any time a code file is modified. In the case of the acceptImage directory, there's a build.sh file, which contains shell commands to run npm install and npm run build in that directory. The package.json file specifies the common Node.js dependencies of the code.
The Nimbella deployer looks for a directory called web for static web content. The OCR demo contains the web directory structure shown in the following diagram.
Top-level web directory structure
The build.sh file in the web directory runs npm install and npm run build to generate content from the src directory's React components into the index.html file. This happens automatically every time a file is modified.
src directory structure
The starting point of the React logic is index.js in the src directory. It imports various React components and CSS, and it imports and renders App.js from the components subdirectory.
App.js contains the sequencing of the other components. It imports Header, FileUpload, and Result. It also adds handlers for various components (language, browsing, camera, file upload) and creates some of the HTML markup. The Result code imports the TextDisplay components that control how the image and text are displayed after OCR occurs. If you've tried the demo, it's easy to see what these components refer to.
Deploy this project to the Nimbella Cloud
If you have the Nimbella command line tool called nim installed, you can deploy this project directly from GitHub, either online or from the local repository cloned to your disk.
- Run the following command in your terminal:
nim project deploy /path/to/ocr
The output of this command will include a link to where the application is running in the cloud.