Welcome aboard! We've listed some of the most common questions we get on a daily basis, along with our answers. Hopefully they'll set you on your path to image and video recognition greatness!

What does Clarifai actually do?

Great question! We use machine learning and artificial intelligence to analyze images and videos and return what's in them. We also offer tools like Custom Training and Visual Search, so you can train your own recognition models (on anything) and search through your images. Check out our tagging demo here and our public model gallery here.

What file types do you support?


Videos: MP4, MKV, AVI, MOV, OGG

(GIFs get treated as videos, but if they aren’t animated they will only generate one API call)

Do you support real time recognition? 

Our API typically responds within 300-500 ms for a single image when called from the US, since we are currently hosted on AWS on the East Coast. Response times vary with the size of the images and videos you send us.

Total Latency = Image/video download + recognition + request/response overhead
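To make the formula a bit more concrete, here's a minimal Python sketch that estimates each term. The function names, the bandwidth figure and the sample timings are all hypothetical and only there for illustration; they are not measurements from our API.

```python
def download_ms(size_bytes, bandwidth_mbps):
    """Estimate how long it takes to transfer a file, in milliseconds.

    bandwidth_mbps is an assumed link speed in megabits per second.
    """
    return size_bytes * 8 * 1000 / (bandwidth_mbps * 1_000_000)


def total_latency_ms(size_bytes, bandwidth_mbps, recognition_ms, overhead_ms):
    """Total latency = download + recognition + request/response overhead."""
    return download_ms(size_bytes, bandwidth_mbps) + recognition_ms + overhead_ms


# Hypothetical numbers: a 500 KB image versus a 3 MB image on a 20 Mbps link,
# with made-up recognition and overhead timings.
small = total_latency_ms(500_000, 20, recognition_ms=300, overhead_ms=50)
large = total_latency_ms(3_000_000, 20, recognition_ms=300, overhead_ms=50)
print(f"small image: {small:.0f} ms, large image: {large:.0f} ms")
```

In this sketch only the download term changes with file size, which is why smaller files tend to come back faster.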

Note that we also downscale all images to 512 pixels in width, so the closer your images are to that width, the better.
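If you'd like to shrink images yourself before sending them, the sketch below works out the target dimensions for a 512-pixel width while keeping the aspect ratio. It only does the arithmetic: scaled_size is a hypothetical helper, not part of our API, and you'd still need an image library of your choice to do the actual resizing.

```python
def scaled_size(width, height, target_width=512):
    """Return (width, height) scaled down to target_width, keeping the aspect ratio.

    Images that are already at or below target_width are returned unchanged.
    """
    if width <= target_width:
        return width, height
    scale = target_width / width
    # Never let the height collapse to zero for extremely wide images.
    return target_width, max(1, round(height * scale))


# A 2048x1536 photo would be resized to 512 pixels wide.
print(scaled_size(2048, 1536))
# An image that is already narrow enough is left alone.
print(scaled_size(400, 300))
```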

Do you support training of models with custom vocabularies? 

We sure do! Our Custom Training Module is robust and ready to rock your AI world.


Can you recognize specific products?

We currently don’t offer a public model which can recognize specific products but you can certainly try to create a custom model to do so!

Do you recognize logos?

We are currently performing maintenance on our Logo Model and should have it up and running again soon!

Do you support finding the location or bounding boxes of objects in an image? 

Yes and no. At the moment we can do this with our Celebrity, Demographic and Faces models, but we don't offer general object detection yet. That's definitely on our roadmap though!

What about facial recognition? Can you find the name of a person for me?

Our Faces Model (see above) can tell you exactly where faces are in a picture by returning their bounding box coordinates. Unfortunately, finding information on specific people is outside of our capabilities.

Do you support counting objects?

Currently we don’t offer the ability to count all objects, though our General Model can sometimes return the number of people in an image.

Is it possible to use your platform to read text and do Optical Character Recognition (OCR)? Or can we train it to do that?

Unfortunately we can’t read text yet, though we are currently researching it. However, our General Model can tell you whether an image contains text via the tag “Text”.

You can also potentially train our system on certain types of handwriting via the Custom Training module, if that's what your use case calls for.

Can I try this out for free?

Indeed! When you sign up you'll be automatically placed on our Community Tier, and if you verify your email address you'll get the highest limits in that tier (5,000 operations, 10,000 inputs, 10 custom concepts). If you don't verify your email you'll only get 100 operations and that tends to run out pretty fast.

Can we process files locally without sending the images to your API? 

Most of our platform runs in our Cloud, but our Mobile SDK works entirely offline! If you'd like to give it a test run, check it out here.

Can I upload YouTube videos?

Unfortunately not. We need to be able to download the files that are sent to our platform, and we can't do that with YouTube videos.

What is the maximum image size that I can upload?

We typically recommend compressing images as much as possible, since our platform can identify items at the pixel level, but around 3.6 MB is the maximum we can accept from local files. Pixel-wise, 512x512 is optimal, but your images don't have to match that exactly.
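A quick pre-upload check can catch oversized files before you send them. Here's a minimal sketch in Python. The helper names are hypothetical, and it assumes "3.6 MB" means 3,600,000 bytes (the stricter reading); since the limit is approximate, treat the threshold as a rough guide rather than an exact cutoff.

```python
import os

# Assumption: 3.6 MB interpreted as 3.6 * 1000 * 1000 bytes.
MAX_UPLOAD_BYTES = 3_600_000


def within_upload_limit(size_bytes, limit=MAX_UPLOAD_BYTES):
    """Return True if a non-empty file of size_bytes fits under the limit."""
    return 0 < size_bytes <= limit


def file_within_upload_limit(path, limit=MAX_UPLOAD_BYTES):
    """Check the size of a local file on disk against the limit."""
    return within_upload_limit(os.path.getsize(path), limit)


print(within_upload_limit(1_200_000))  # a 1.2 MB file fits
print(within_upload_limit(5_000_000))  # a 5 MB file should be compressed first
```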

Do you have special pricing for students?

We don't have a formal policy on this yet, but we do love working with students! If you have a project that you're working on, feel free to tell us about it here. Credits are awarded on a case-by-case basis.

Which languages do you support?

Our General Model has the ability to return tags in Arabic, Bengali, Danish, Dutch, German, English, Finnish, French, Hindi, Italian, Japanese, Norwegian, Punjabi, Polish, Portuguese, Russian, Spanish, Simplified Chinese, Traditional Chinese and even Swedish!

Do you store images on your server? What happens to them?

Any files that are sent to us are stored on our secure servers for the sole purpose of improving our models, and we never release them to third parties. For more information check out our full privacy policy here.

We also comply with the GDPR! 😃

Do I have to verify my email address when I sign up?

Technically no. But if you don't then you'll only get 100 operations to use on our Community Tier instead of 5,000, which is WAY less! So click that link in your inbox for some more free Clarifai goodness.
