One of the transformative powers of multimodal AI is the ability to translate text in documents into machine-encoded text. Clarifai makes it easy to do this with the Optical Character Recognition (OCR) - Document workflow.
With the OCR - Document workflow, the text in this image is detected and then transformed into standard text format that can be read by any computer.
Getting Started
First things first. You will need to set up a Clarifai account and create an application. This how-to article shows you how to detect text with the Clarifai API and through Portal. If you would like to detect text via API, you will also need to generate an API key.
Recognize image-text in Portal
You can do almost anything that Clarifai can do with Clarifai Portal, and we work hard to make Portal the world's easiest interface for using AI. Detecting text with Portal is as simple as uploading your data, and choosing the right base workflow.
Create your application and choose your base workflow
Simply log in to Clarifai Portal and create a new application. To use the OCR-Document model, select "General Detection" as your base workflow.
Navigate to Model Mode and create a custom "Tesseract Operator" Model
First we will need to set up a custom Tesseract Operator to use as the main node in our workflow. Just navigate to Model Mode on the left hand sidebar and click "Create New Model" in the upper righthand corner of the screen. Then select "Tesseract Operator".
Now name your model "OCR-Document" and click "CREATE NEW MODEL".
Navigate to Model Mode and create a new workflow
Next, we will want to create a new workflow that uses our custom model. Just navigate to Model Mode on the left hand sidebar and click "Create New Workflow" in the upper righthand corner of the screen.
Add the custom "OCR-Document" to your workflow
Now we will add just one model to the workflow: our custom OCR-Document model. Select the model, click "ADD" to add the model to your workflow, and then click "CREATE WORKFLOW".
View your image in Explorer view
Upload an image to your application and view predictions in the righthand sidebar under the tab that says "App Workflow".
Select your new "OCR-Document" workflow as the app workflow
Now navigate to view your image in Explorer. In the righthand sidebar, click the "APP WORKFLOW" tab, and then click the gear icon. Finally, select your new workflow and view your predictions.
Detect and analyze text in your documents via API
Here is an example of how to detect and analyze text in an image that is hosted on a URL. This snippet is in Python, but we offer support for many other client languages. Please refer to our API documentation for additional information.
First Create Your New OCR-Document Workflow
```python
from clarifai_grpc.channel.clarifai_channel import ClarifaiChannel
from clarifai_grpc.grpc.api import service_pb2, resources_pb2, service_pb2_grpc
from clarifai_grpc.grpc.api.status import status_code_pb2

# Set up the gRPC channel and stub used for every request.
channel = ClarifaiChannel.get_grpc_channel()
stub = service_pb2_grpc.V2Stub(channel)

# This is how you authenticate.
metadata = (('authorization', 'Key {{YOUR_CLARIFAI_API_KEY}}'),)

post_workflows_response = stub.PostWorkflows(
    service_pb2.PostWorkflowsRequest(
        workflows=[
            resources_pb2.Workflow(
                id="my-OCR-Document-workflow",
                nodes=[
                    resources_pb2.WorkflowNode(
                        id="ocr-document",
                        model=resources_pb2.Model(
                            id="OCR-Document",
                            model_version=resources_pb2.ModelVersion(
                                id="{{YOUR CUSTOM MODEL ID}}"
                            )
                        )
                    ),
                ]
            )
        ]
    ),
    metadata=metadata
)

if post_workflows_response.status.code != status_code_pb2.SUCCESS:
    raise Exception("Post workflows failed, status: " +
                    post_workflows_response.status.description)
```
Now Use Your Workflow to Make Predictions on Images
```python
from clarifai_grpc.channel.clarifai_channel import ClarifaiChannel
from clarifai_grpc.grpc.api import service_pb2, resources_pb2, service_pb2_grpc
from clarifai_grpc.grpc.api.status import status_code_pb2

# Set up the gRPC channel and stub used for every request.
channel = ClarifaiChannel.get_grpc_channel()
stub = service_pb2_grpc.V2Stub(channel)

# This is how you authenticate.
metadata = (('authorization', 'Key {{YOUR_CLARIFAI_API_KEY}}'),)

post_workflow_results_response = stub.PostWorkflowResults(
    service_pb2.PostWorkflowResultsRequest(
        workflow_id="my-OCR-Document-workflow",
        inputs=[
            resources_pb2.Input(
                data=resources_pb2.Data(
                    image=resources_pb2.Image(
                        url="https://samples.clarifai.com/test.jpg"
                    )
                )
            )
        ]
    ),
    metadata=metadata
)

if post_workflow_results_response.status.code != status_code_pb2.SUCCESS:
    raise Exception("Post workflow results failed, status: " +
                    post_workflow_results_response.status.description)

# We get one WorkflowResult for each input we sent above. Since we sent a
# single input, there is exactly one WorkflowResult here.
results = post_workflow_results_response.results[0]

# Each model in the workflow produces one output.
for output in results.outputs:
    model = output.model
    print("Predicted concepts for the model `%s`" % model.name)
    for concept in output.data.concepts:
        print("\t%s %.2f" % (concept.name, concept.value))
```
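For OCR models, the recognized text typically arrives per region of the image rather than as concepts, under `output.data.regions[*].data.text.raw` in the response protobuf. The sketch below simulates that response shape with plain Python objects purely for illustration (the real object is a `clarifai_grpc` protobuf, and you should verify the field names against your model's actual output), and shows how to collect the region texts into one string.

```python
from types import SimpleNamespace

# Simulated workflow-results response. This mimics the nested field layout of
# the Clarifai OCR response (outputs -> data.regions -> data.text.raw); the
# values here are stand-ins, not real model output.
results = SimpleNamespace(outputs=[
    SimpleNamespace(
        model=SimpleNamespace(name="OCR-Document"),
        data=SimpleNamespace(regions=[
            SimpleNamespace(data=SimpleNamespace(text=SimpleNamespace(raw="Hello"))),
            SimpleNamespace(data=SimpleNamespace(text=SimpleNamespace(raw="world"))),
        ]),
    )
])

# Collect each region's raw text and join it into a single document string.
lines = [region.data.text.raw
         for output in results.outputs
         for region in output.data.regions]
document_text = " ".join(lines)
print(document_text)  # → Hello world
```

The same comprehension works on the real `results` object returned by `PostWorkflowResults` above.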