Why?

Custom Training is about teaching computers to see the world in a way that is specific to your own content and context. 

"Specific to your own content" can mean a variety of things:

  • Specificity: 🍎 Clarifai APIs return 'fruit' and 'apple' - Acme Corp needs a model to predict 'Pink Lady' or 'Red Delicious'. (Fine-grained classification of objects)
  • Taxonomy: 🚆 Clarifai APIs return 'train' - Acme Corp has a long-running taxonomy containing 'locomotive'. (Ensure there is no unnecessary mapping or confusion)
  • Subjective: 🍂 Clarifai APIs return 'foliage' - Acme Corp needs a model to classify images that fit their 2017 Fall brand style guide. (Content recommendations, preferences and filtering, identification applied to style, brand guidelines, and user behavior)

With your own training data, taxonomy and API endpoint, you will have a precise understanding and organization of your visual content. 

This visual content may be user generated photos, existing content within an internal DAM, untagged partner imagery, or scraped data from the web. With Clarifai Custom Training APIs and web products, that visual content becomes actionable in whatever way serves your business, app or workflow best. 

You can think of Custom Training as a series of inputs where you ultimately teach a neural network what concept_1 is and is not.

Background

Since Clarifai's founding in early 2013, we have been focused on enabling developers to understand any image or video in the world. To do this, we develop and expose advanced prediction APIs that abstract all the complexity away from neural network training, data aggregation, hosting, evaluation and re-training.

These prediction focused APIs (General, Food, Travel, NSFW, Apparel, etc.) are currently available and live here. These are pre-trained with our data and research teams over the course of months with millions of images. We define the training data, taxonomy (the model's 'concepts') and other performance characteristics.

Now, with Custom Training, some of those important training and building decisions belong to you while being able to leverage our infrastructure, expertise and complementary products. 

We are particularly excited to democratize access to this package of technologies with the same renewed focus on speed, ease of use and straightforward commercial terms. Over the last few years we have seen an increasingly complex and niche set of recognition requests. While we would love to service all of our inbound requests, it would be too daunting a task to execute all of it in-house. Thanks to Custom Training, every app developer and product leader has powerful recognition solutions in their own hands.

All Custom Training users end up with a private API endpoint, so it's really up to your imagination as to what to do with it. Several real world examples are included below to get you thinking.

Whether it's best-in-class pre-trained off-the-shelf APIs or your own Custom Training model, we are here to extend the product, tech and data resources needed to support you.

Methods

Train, build and predict with new tools.

Web Interface

Walkthrough Guide

The V2 API

Documentation

4 Official Clients (JavaScript, Python, Java, iOS)

Quick Start Code Examples

What you should know

If your content is visually distinct, 25-50 positive examples per concept will start to provide robust and accurate predictions. A 'concept' is synonymous with 'tag', 'category' or 'keyword'. Concepts are your business' world view as to what an object, visual pattern, or style may represent. 

For reference, our General model contains 11,000 concepts. Custom models are typically much smaller, resembling taxonomies like moderation guides, apparel sorting and classifieds categorization. 

See Glossary

The most important step in Custom Training is having a focus on building robust and accurate concepts, as the performance of your custom model is only as good as your underlying concepts are. Training isn't a magical step; it should be seen as a necessary technical process that combines raw ingredients (concepts) in a powerful way. 

What makes for a well built concept?

  • Accurate labels. Mis-labeled images introduce noise into your model and can lead to weak or confusing predictions.
  • Balanced training data. Skewed training sets where several concepts have 5-20x as many positive training images as others may affect model performance.
  • Matching training and prediction context. It's crucial that your training images for your concepts resemble the conditions and context of imagery you'll be making predictions on.

As an example, training a flower identification model solely with stock photography and then attempting to predict on user generated smartphone photos will not be ideal. 
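A quick count of positive examples per concept before training is one practical guard against a skewed training set. The helper below is a hypothetical sketch, not part of the Clarifai API; the 5x ratio is an arbitrary illustrative threshold:

```python
from collections import Counter

def check_balance(labels, max_ratio=5):
    """Flag concepts whose positive-example counts are badly skewed.

    `labels` is a list of (image_id, concept) pairs. Concepts with more
    than `max_ratio` times the examples of the rarest concept are
    returned so you can trim or augment the set before training.
    """
    counts = Counter(concept for _, concept in labels)
    floor = min(counts.values())
    return {c: n for c, n in counts.items() if n > max_ratio * floor}

# 100 'rose' examples vs. 10 'tulip' examples -> 'rose' is flagged
labels = [("img_a", "rose")] * 100 + [("img_b", "tulip")] * 10
print(check_balance(labels))  # {'rose': 100}
```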

Custom Training works on a variety of content types.

Successful implementations span a wide range of use cases: document categorization, plant and flower identification, apparel classification, user generated content filtering, industrial abnormality detection, ad listing moderation, etc. 

A training image can have multiple concepts. So a picture of a yacht in the harbor could be trained as

'yacht' for simple categorization

or

'yacht' 'boat' 'ocean' 'water' 'shoreline' 'sunset' 'fishing' 'recreation' 

for more general search purposes.

An API prediction response from your custom model will return all concepts and their respective confidence scores, ranging from 0 to 100%.
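A response of that shape can be filtered client-side. The structure below mirrors the v2 `outputs` format, but the concept names, scores and 0.5 threshold are made up for illustration (the API expresses confidence as a 0.0-1.0 value):

```python
# Hypothetical custom-model response in the shape of a v2 predict call.
response = {
    "outputs": [{"data": {"concepts": [
        {"name": "yacht", "value": 0.98},
        {"name": "ocean", "value": 0.91},
        {"name": "fishing", "value": 0.12},
    ]}}]
}

def concepts_above(response, threshold=0.5):
    """Return concept names whose confidence clears the threshold."""
    concepts = response["outputs"][0]["data"]["concepts"]
    return [c["name"] for c in concepts if c["value"] >= threshold]

print(concepts_above(response))  # ['yacht', 'ocean']
```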

Custom models currently support several hundred concepts.

It is important to help distinguish two concepts from each other by labeling a training image as a negative example of what it is not.
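In the v2 API, positive and negative labels on an input are both expressed as concepts, with a `value` of 1 or 0. The helper below is a hypothetical sketch of that record shape; the URL and concept names are placeholders:

```python
def labeled_input(url, positives, negatives=()):
    """Build an input record in the v2 API shape:
    value 1 marks a positive example, value 0 a negative one."""
    return {
        "data": {
            "image": {"url": url},
            "concepts": (
                [{"id": c, "value": 1} for c in positives]
                + [{"id": c, "value": 0} for c in negatives]
            ),
        }
    }

# A yacht photo trained as a positive 'yacht' and an explicit
# negative 'cargo_ship', to help the model separate the two.
inp = labeled_input("https://example.com/yacht.jpg",
                    ["yacht"], ["cargo_ship"])
```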

Re-training time can vary with your total number of labeled images, but it typically takes just a few seconds even with tens of thousands of labeled images in your application. 

There is no monthly cost associated with hosting 10,000 images in an application, predicting up to 5,000 times with your API and building one Model that contains up to 10 concepts. Additional pricing details can be found here.

Many successful implementations involve using both the General domain model (11,000 concepts) and a client's own Custom Model. The General model provides a broad and first layer of intelligence as to what is in the respective media. Given a General model response, certain relevant concepts returned can then trigger a Custom model prediction in your internal workflow.

General: "dog", "canine", "pet"

Custom dog breed model: "German Shepherd"
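The routing logic for that two-stage workflow can be sketched in a few lines. The concept name, threshold and model call below are illustrative assumptions, not part of the Clarifai API:

```python
def route_prediction(general_concepts, trigger="dog", threshold=0.85):
    """Decide whether a General-model response should trigger the
    Custom model. `general_concepts` maps concept name -> confidence
    (0.0-1.0); the trigger concept and threshold are illustrative."""
    return general_concepts.get(trigger, 0.0) >= threshold

general = {"dog": 0.97, "canine": 0.95, "pet": 0.91}
if route_prediction(general):
    # Only now call the custom breed model in your internal workflow,
    # e.g. breed_model.predict(image_url) -> "German Shepherd"
    pass
```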

The training images and the subsequent private model API you build are private to you, and no other Clarifai users or customers can access your training images, concept names, model spec, etc.

Real world solutions

Below you can find a sampling of real world problems and features clients are currently using our Custom Training for.

Use Case: Categorization

An analytics platform dedicated to serving brands and social media influencers wanted to identify 'flat lay' photos within the Instagram accounts of consumer brands. 

Use Case: Identification

A mobile consumer app focused on being able to recognize common plants and flowers.

Use Case: Categorization

A large real estate listings platform focused on blocking imagery that is prohibited within their terms of service.

Use Case: Categorization

An insurance company removed the need for human moderation of user submitted content within their fitness rewards app by doing real-time verification. 

Use Case: Recommendations

An online apparel marketplace used Custom Training to predict how closely newly uploaded products matched their brand guidelines for the featured homepage section.  

Use Case: Categorization

A national home improvement retailer needed to accurately categorize incoming product imagery from ad agency partners that contained no metadata, description or labels.

Use Case: Identification

An aerial data collection firm used Custom Training to identify key physical landmarks and was thus able to significantly reduce its reliance on human-annotated drone videos.

Use Case: Categorization

An international mobile classifieds platform built a 40-concept model for real-time understanding and categorization of user uploaded images. 

Use Case: Categorization

Real-time filtering and categorization of user submitted content provide instant reports for this digital market research firm and their international CPG clients. 

Use Case: Categorization

A home insurance firm ensured their mobile app claims submission could immediately identify certain damage conditions. 

Train your first model

Using the Web UI at http://clarifai.com/explorer

Using the Python client 

1. Create an Application by signing into clarifai.com/developer

An Application is where you will index your 'Inputs' (images) for training and testing purposes

2. Add labeled Inputs with their respective Concepts.

3. Create a Model

A Model is defined by having a number of Concepts. You can have multiple Models per application, and you define the number and names of each Concept. Concepts are synonymous with 'tags', 'keywords' and 'categories', and can be as straightforward as objects or as nuanced and subjective as style. A concept can be user specific, such as 'ryan_d_likes', or vague, like 'streetwear'.

Each model you create has a Model ID and any number of Model Versions.   
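The three steps above can be sketched as raw v2 REST payloads. This is a hypothetical outline: the endpoint paths follow the public v2 API, but the model ID, concept name and image URL are placeholders, and the official clients wrap these same calls for you.

```python
# Sketch of the train-your-first-model flow as v2 REST payloads.
API = "https://api.clarifai.com/v2"

# 2. Add a labeled input (value 1 marks a positive example).
add_inputs = {
    "url": API + "/inputs",
    "body": {"inputs": [{"data": {
        "image": {"url": "https://example.com/pink_lady.jpg"},
        "concepts": [{"id": "pink_lady", "value": 1}],
    }}]},
}

# 3. Create a model that predicts those concepts.
create_model = {
    "url": API + "/models",
    "body": {"model": {
        "id": "apple-varieties",
        "output_info": {"data": {"concepts": [{"id": "pink_lady"}]}},
    }},
}

# Training creates a new Model Version for the model.
train_model = {"url": API + "/models/apple-varieties/versions", "body": {}}
```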

What's next?

Advanced features such as counting, custom brand logo training and custom face recognition are in the research phases. Let us know if that interests you. You can count on even more publicly available domain models to be exposed in the coming months.

Workflow-type solutions, where users can make multiple predict operations in parallel, are also in the works. 


Ideas, brainstorm and questions? Email support@clarifai.com

Additional resources

Community Page
Learn from fellow users, get access to alpha products, discover other use cases

sales@clarifai.com
Commercial questions and discussions

support@clarifai.com
Technical questions, suggestions and clarification 

Pricing

Twitter and Instagram

Media

Custom Training and Visual Search with a user's Instagram images

CEO Matt Zeiler explaining Deep Neural Networks:

Clarifai 2016 Year in Review

Images as the Universal Inputs [blog]

Teaching Machines to See Beyond the Hashtag [blog]
