This how-to guide shows you how to tokenize text with Clarifai's text token classifier. Tokenization splits a passage of text into small, meaningful chunks called tokens. A token can be an individual word or a longer span of text.
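To make the idea of a token concrete, here is a minimal sketch in plain Python that splits a sentence into word and punctuation tokens and records where each one sits in the original string. It only illustrates the concept and is not how Clarifai's model works: the `Token` type, the `tokenize` function, and the sample sentence are all hypothetical, and a trained text token classifier would also assign a label to each token.

```python
import re
from typing import List, NamedTuple


class Token(NamedTuple):
    text: str
    start: int  # index of the token's first character in the source string
    end: int    # index one past the token's last character


def tokenize(text: str) -> List[Token]:
    """Split text into word and punctuation tokens, keeping character offsets."""
    # \w+ matches a run of word characters; [^\w\s] matches one punctuation mark.
    pattern = r"\w+|[^\w\s]"
    return [Token(m.group(), m.start(), m.end()) for m in re.finditer(pattern, text)]


# Illustrative sentence only.
for token in tokenize("Alice moved to Paris."):
    print(token)
```

Keeping the character offsets alongside each token makes it possible to map a prediction back to the exact span of text it refers to.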
Getting Started
First, you will need to set up a Clarifai account and create an application. This article shows how to tokenize text both through Portal and through the Clarifai API. If you want to tokenize text via the API, you will also need to generate an API key.
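The Python snippets later in this article pass the API key as an `authorization` metadata entry of the form `Key <your key>`. As a minimal sketch, assuming you store the key in an environment variable rather than pasting it into your source, a small helper can build that metadata. The function name `build_auth_metadata` and the variable name `CLARIFAI_API_KEY` are hypothetical choices for this example, not part of the Clarifai client.

```python
import os
from typing import Tuple


def build_auth_metadata(env_var: str = "CLARIFAI_API_KEY") -> Tuple[Tuple[str, str], ...]:
    """Build the authorization metadata tuple from an environment variable."""
    # The environment variable name is an assumption made for this sketch.
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        raise RuntimeError(
            "Set the %s environment variable to your Clarifai API key." % env_var
        )
    # Same shape as the metadata used in the API examples in this article.
    return (("authorization", "Key " + api_key),)
```

You could then write `metadata = build_auth_metadata()` in place of the hard-coded tuple, which keeps the key out of version control.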
Tokenizing text with Clarifai Portal
You can do almost anything that Clarifai can do with Clarifai Portal, and we work hard to make Portal the world's easiest interface for using AI. Tokenizing text with Portal is as simple as uploading your data and setting up the right workflow.
Create your application and choose your base workflow
Simply log in to Clarifai Portal and create a new application. Select "Text" as your base workflow.
Navigate to Model Mode and create a new workflow
Next, create a new workflow that uses the "ner_english" model. Navigate to Model Mode in the right-hand sidebar and click "Create New Workflow" in the upper right-hand corner of the screen.
Add the "text-token-classifier" to your workflow
Now add just one model to the workflow: the ner_english "text-token-classifier". Be sure to select "clarifai" as the user in the left-hand dropdown menu. You can then filter the results by model type: select "text-token-classifier". Click "ADD" to add the model to your workflow, and then click "CREATE WORKFLOW".
Select your new text token classifier workflow as the app workflow
Now navigate to Explorer to view your text input. In the right-hand sidebar, click the "APP WORKFLOW" tab and then click the gear icon. Finally, select your new workflow and view your predictions.
Classify text tokens via API
Here is an example of how to classify the tokens in text that is hosted at a URL. This snippet is in Python, but we offer support for many other client languages. Please refer to our API documentation for additional information.
First Create Your New Text Token Classifier Workflow
from clarifai_grpc.channel.clarifai_channel import ClarifaiChannel
from clarifai_grpc.grpc.api import service_pb2, service_pb2_grpc, resources_pb2
from clarifai_grpc.grpc.api.status import status_code_pb2

# Create the gRPC stub used to call the Clarifai API.
stub = service_pb2_grpc.V2Stub(ClarifaiChannel.get_grpc_channel())

# This is how you authenticate.
metadata = (('authorization', 'Key {{YOUR_CLARIFAI_API_KEY}}'),)

post_workflows_response = stub.PostWorkflows(
    service_pb2.PostWorkflowsRequest(
        workflows=[
            resources_pb2.Workflow(
                id="my-text-token-workflow",
                nodes=[
                    resources_pb2.WorkflowNode(
                        # The node ID is a label of your choosing.
                        id="text-token-classifier",
                        model=resources_pb2.Model(
                            id="3a4dd3157b18d37f3402cdaca8091ddd",
                            model_version=resources_pb2.ModelVersion(
                                id="af8720ac1d244b6db632612f7548c3d3"
                            )
                        )
                    ),
                ]
            )
        ]
    ),
    metadata=metadata
)

# Stop early if the workflow could not be created.
if post_workflows_response.status.code != status_code_pb2.SUCCESS:
    raise Exception("Post workflows failed, status: " + post_workflows_response.status.description)
Now Use Your Workflow to Make Predictions on Text
from clarifai_grpc.channel.clarifai_channel import ClarifaiChannel
from clarifai_grpc.grpc.api import service_pb2, service_pb2_grpc, resources_pb2
from clarifai_grpc.grpc.api.status import status_code_pb2

# Create the gRPC stub used to call the Clarifai API.
stub = service_pb2_grpc.V2Stub(ClarifaiChannel.get_grpc_channel())

# This is how you authenticate.
metadata = (('authorization', 'Key {{YOUR_CLARIFAI_API_KEY}}'),)

post_workflow_results_response = stub.PostWorkflowResults(
    service_pb2.PostWorkflowResultsRequest(
        workflow_id="my-text-token-workflow",
        inputs=[
            resources_pb2.Input(
                data=resources_pb2.Data(
                    # The input is a text file, so send it as text rather than as an image.
                    text=resources_pb2.Text(
                        url="https://samples.clarifai.com/sample.txt"
                    )
                )
            )
        ]
    ),
    metadata=metadata
)

if post_workflow_results_response.status.code != status_code_pb2.SUCCESS:
    raise Exception("Post workflow results failed, status: " + post_workflow_results_response.status.description)

# We get one WorkflowResult for each input. Because we sent one input,
# there is one WorkflowResult.
results = post_workflow_results_response.results[0]

# Each model in the workflow produces one output.
for output in results.outputs:
    model = output.model
    print("Predicted concepts for the model `%s`" % model.name)
    for concept in output.data.concepts:
        print("\t%s %.2f" % (concept.name, concept.value))
    # Assumption: token-level predictions, if any, are returned as regions.
    for region in output.data.regions:
        for concept in region.data.concepts:
            print("\t%s %.2f" % (concept.name, concept.value))
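The loop above prints every concept the model returns, whatever its score. If you only care about confident predictions, one option is to filter and sort the (name, value) pairs first. The following is a minimal sketch under stated assumptions: `top_predictions` and its `min_value` parameter are hypothetical names, the 0.5 default is an arbitrary threshold, and the sample values are made up for illustration rather than taken from a real response.

```python
from typing import Iterable, List, Tuple


def top_predictions(
    predictions: Iterable[Tuple[str, float]], min_value: float = 0.5
) -> List[Tuple[str, float]]:
    """Keep predictions scoring at least min_value, highest score first."""
    kept = [(name, value) for name, value in predictions if value >= min_value]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Made-up values for illustration only; they are not real model output.
sample = [("PER", 0.97), ("LOC", 0.42), ("ORG", 0.81)]
for name, value in top_predictions(sample):
    print("\t%s %.2f" % (name, value))
```

With a real response, you could pass `((c.name, c.value) for c in output.data.concepts)` as the `predictions` argument.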