What are Collectors and why are they useful
Collectors are available with Professional and Enterprise plans to help you manage data ingestion at scale.
An AI model is only as good as the data that it is trained on. Collectors help you gather data from models in production so that you can learn from real-world data and produce high-performing models.
Collectors let you pipe data into your applications automatically so that you can iterate on existing models or train entirely new ones. You can create app-level collectors to monitor specific models and specify sampling rules that trigger data ingestion. Collectors can only collect data from apps where you are the app owner.
Creating a new collector
Collectors help you feed your models with real-world training data. This data can be taken from models that you have already deployed to production. Create a collector within your app and set it up to ingest data from another model whenever new inputs are posted to that model.
Step 1: Create a sampler model
Begin by creating a Random Sampler model. The purpose of this model is to collect (sample) data based on the fraction that you set. This is a custom model, and you can find it under the Create Custom Model tab of Model Mode. The model type is Random Sampler.
Give this model a name and choose what fraction of the data you want to collect. The range runs from 0 to 1.0: 0 means that no data will be collected, and 1.0 means that every image sent for prediction to your custom model in production will be collected.
- Pick the Random Sampler Model Type.
- Give it the fraction of images to collect (0 to 1.0).
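To make the effect of the sampling fraction concrete, here is a minimal Python sketch of random sampling. It is an illustration only, not the Random Sampler model's actual implementation; the function name should_collect and the seeded generator are assumptions made for this example.

```python
import random


def should_collect(fraction: float, rng: random.Random) -> bool:
    """Decide whether a single input is collected (hypothetical helper).

    `fraction` is the sampling rate, from 0 to 1.0.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1.0")
    # rng.random() returns a float in [0.0, 1.0), so a fraction of 0
    # never collects and a fraction of 1.0 always collects.
    return rng.random() < fraction


# Seeded only so that the demo is repeatable.
rng = random.Random(42)
collected = sum(should_collect(0.25, rng) for _ in range(10_000))
print(f"collected {collected} of 10000 inputs")
```

With a fraction of 0.25, roughly a quarter of the inputs are collected; the exact count varies from run to run unless the generator is seeded.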
Step 2: Put the sampler in a workflow
Once the Random Sampler is created, add this model to a workflow. In the Model Mode screen, use the Create Workflow tab. Select yourself as the "User", and you should be able to see the sampler model you created. Add that model to the workflow, give the workflow a descriptive name, and click Create Workflow.
Step 3: Create a Collector
Under Data Mode, at the bottom of the screen, you should see the option to create new collectors. When you are creating a new collector, you will see a screen with the following fields:
Collector ID
Give your collector a useful and descriptive name.
Description
Provide additional details about your collector.
Pre-queue workflow
Use the workflow created in Step 2 as the pre-queue workflow.
Post Inputs key
Select the API key that the collector should use to post new inputs to your app. This API key must have the PostInputs scope, since it grants the collector the authority to POST inputs to your app. The post-queue workflow ID identifies the workflow that will run after the collector has processed the queued input.
The pre-queue workflow takes the original input to the model as its own input, so you can run additional models on that input to decide whether to queue the input or not. If any field of the workflow output is non-empty, the input is passed on to POST /inputs in the destination app. At least one workflow ID (pre-queue or post-queue) is required.
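The "any non-empty field" rule can be sketched in a few lines of Python. This only illustrates the decision described above; the function names and the dictionary shape of the workflow output are assumptions for this example, not the platform's internal representation.

```python
from typing import Any, Mapping


def is_empty(value: Any) -> bool:
    """Treat None and zero-length values ("", [], {}) as empty."""
    if value is None:
        return True
    return hasattr(value, "__len__") and len(value) == 0


def should_queue(workflow_output: Mapping[str, Any]) -> bool:
    """Return True if at least one field of the workflow output is non-empty."""
    return any(not is_empty(value) for value in workflow_output.values())


# Hypothetical workflow outputs: the field names are made up for illustration.
print(should_queue({"sampled": [], "note": ""}))
print(should_queue({"sampled": ["input-1"], "note": ""}))
```

The first call reports that nothing should be queued because every field is empty; the second reports that the input should be queued because one field has content.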
Caller
Any Caller means that API calls made to your model by any user will be collected. You can also specify a single user ID so that only that user's calls are collected.
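As a rough sketch of how the Caller setting narrows collection, the check below treats None as a stand-in for Any Caller. The function and parameter names are hypothetical and chosen only for this illustration.

```python
from typing import Optional


def caller_matches(collector_caller: Optional[str], request_user_id: str) -> bool:
    """Return True if a prediction request should be considered for collection.

    `collector_caller` is None for "Any Caller", or a specific user ID.
    """
    return collector_caller is None or collector_caller == request_user_id


print(caller_matches(None, "some-user"))         # Any Caller: every user matches
print(caller_matches("alice", "some-user"))      # specific caller: other users do not match
```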
Source
Select the model that you would like to collect from, either by entering its model name or model ID or by choosing it from the drop-down menu. Whenever a user predicts an input against this model, the input is collected and the collector automatically posts it to your app.
The app ID and user ID identify where the model is located. If you are using a publicly available model, the model user ID and app ID should be clarifai and main, respectively. Otherwise, the IDs should belong to the user who created the model. Use the API key ID for the application where you would like the inputs to be added.
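The collector fields described in this step can be summarized as a small Python data structure. This is a sketch for checking your own settings before filling in the form, not the platform's API; the class name, field names, and example values are all hypothetical. The only rules it encodes come from the text: public models live under user clarifai and app main, and at least one workflow ID is required.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CollectorSettings:
    """Hypothetical container mirroring the collector form fields."""

    collector_id: str
    description: str
    post_inputs_key_id: str                        # API key with the PostInputs scope
    source_model_id: str
    source_user_id: str = "clarifai"               # default for publicly available models
    source_app_id: str = "main"                    # default for publicly available models
    pre_queue_workflow_id: Optional[str] = None
    post_queue_workflow_id: Optional[str] = None
    caller_user_id: Optional[str] = None           # None stands in for "Any Caller"

    def __post_init__(self) -> None:
        # At least one (pre-queue or post-queue) workflow ID is required.
        if not (self.pre_queue_workflow_id or self.post_queue_workflow_id):
            raise ValueError("set a pre-queue or post-queue workflow ID")


# Example values are placeholders, not real IDs.
settings = CollectorSettings(
    collector_id="production-sampler",
    description="Collects a sample of production inputs",
    post_inputs_key_id="my-post-inputs-key",
    source_model_id="my-source-model",
    pre_queue_workflow_id="random-sampler-workflow",
)
print(settings.source_user_id, settings.source_app_id)
```

For a model that is not public, override source_user_id and source_app_id with the IDs of the user and app that own the model.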