Howdy there! You are probably tired of uploading your entire dataset to Clarifai one image or file at a time. I can see that being stressful, but I'm here to show you that there is an easier way: the programmatic way! This article covers only the code you need for batching, and assumes that you have already set up authorization and have an application ready.

A few notes about batching:

  • The optimal batch size is 32, but the actual limit is 128
  • Calls are asynchronous, meaning that they'll all get run at once.
  • This is much faster and more efficient than loading one file at a time. The Server Gods will be smiling down upon you for it.
  • If your files are fairly large (perhaps in the 1+ MB range), you may need to lower the batch size to avoid broken connection errors.
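
To make the batch-size notes above concrete, here is a minimal sketch of splitting a list into batches. The `chunked` helper and its `max_batch_size` parameter are hypothetical names, not part of the Clarifai client; the default of 32 and the cap of 128 come from the notes above. It only slices a list and makes no API calls.

```python
def chunked(items, batch_size=32, max_batch_size=128):
    """Yield consecutive slices of `items`, each at most `batch_size` long.

    Hypothetical helper for illustration; it is not part of any Clarifai library.
    """
    if not 1 <= batch_size <= max_batch_size:
        raise ValueError("batch_size must be between 1 and %d" % max_batch_size)
    for start in range(0, len(items), batch_size):
        # Slicing never raises IndexError, so the last batch is simply shorter.
        yield items[start:start + batch_size]


if __name__ == "__main__":
    files = ["img_%03d.jpg" % i for i in range(70)]  # placeholder file names
    for number, batch in enumerate(chunked(files), start=1):
        print("Batch %d: %d files" % (number, len(batch)))
```

Each batch could then be handed to whichever upload or predict call you use. Because the helper slices instead of indexing, no item is skipped at a batch boundary.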

In this article we'll show how to batch-add inputs with tags (concepts) and/or custom metadata, or just run predictions if that's what floats your boat. The scripts below are written for Python 2, as the print statements show.

IMPORTANT: For uploads via a JSON file, we're assuming that your data is structured like this:

[
   {
      "url": "url",
      "metadata": {
         "item 1": "",
         "item 2": "",
         "item 3": ""
      }
   },
   {
      "url": "url",
      "metadata": {
         "item 1": "",
         "item 2": "",
         "item 3": ""
      }
   }
]

It certainly doesn't have to be, but if it's not then you'll just need to adjust the code below to reflect that.
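
If you want to check a file against that structure before uploading, something like the sketch below may help. The `load_records` function and the example.com URLs are hypothetical placeholders; the only assumption taken from the article is a top-level array of objects with a "url" key and an optional "metadata" object. Note that JSON requires keys to be double-quoted, otherwise `json.loads` raises an error.

```python
import json

# Placeholder data in the structure described above.
SAMPLE = """
[
  {"url": "https://example.com/a.jpg", "metadata": {"item 1": "x"}},
  {"url": "https://example.com/b.jpg"}
]
"""


def load_records(text):
    """Parse JSON text and return a list of {"url", "metadata"} dicts."""
    records = json.loads(text)
    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON array")
    cleaned = []
    for position, record in enumerate(records):
        if "url" not in record:
            raise ValueError("record %d has no 'url' key" % position)
        # Missing metadata becomes an empty dict so later code can loop over it.
        cleaned.append({"url": record["url"],
                        "metadata": record.get("metadata", {})})
    return cleaned


if __name__ == "__main__":
    for record in load_records(SAMPLE):
        print(record["url"], record["metadata"])
```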

Also, if you're just doing predictions on a public model, you won't need to add the images to a collection first. You can simply call model.predict on the imageList variable.


Local files directly from a folder (on your computer)

import glob
from clarifai.rest import ClarifaiApp
from clarifai.rest import Image as ClImage

# ADD CREDENTIALS HERE
api_key = 'YOUR_API_KEY'

# For printing unicode characters to the console
def encode(text):
  if type(text) is list or type(text) is tuple:
    return [x.encode('utf-8') for x in text]
  elif type(text) is not int:
    return text.encode('utf-8')
  else:
    return text

# Counter variables
index = 0
counter = 0
batch_size = 32

# Credentials
app = ClarifaiApp(api_key=api_key)

# Image Directory
path = '/where/your/images/are/*'
files = glob.glob(path)

# Total file amount
total_files = len(files)

while (counter < total_files):
  print "Processing batch " + str(index+1)

  # Batch Image List
  imageList=[]

  for x in range(counter, counter + batch_size):
    try:
      # ONLY USE ONE OF THESE 3 METHODS
      # Remove the other two

      # Method 1: No custom concepts
      imageList.append(ClImage(filename=files[x]))

      # Method 2: With custom concepts
      imageList.append(ClImage(filename=files[x], concepts=['cat', 'chunky']))

      # Method 3: With custom concepts AND metadata
      custom_metadata = { "Breed": "Domestic Shorthair", "Name": "Juliet", "Size": "Chunky" }
      imageList.append(ClImage(filename=files[x], concepts=['cat', 'chunky'], metadata=custom_metadata))

    except IndexError:
      break

  # Upload Images to your collection
  # Needed for Visual Search and Custom Training
  app.inputs.bulk_create_images(imageList)

  # And/or get predictions from these images, if desired
  model = app.models.get('food-items-v1.0')
  model.predict(imageList)

  counter=counter+batch_size
  index=index+1
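
If your local files are large, a fixed batch of 32 may still be too much for one request. One possible approach, sketched below, is to cap each batch by total bytes as well as by file count. The `batches_by_size` name, the `size_of` parameter, and the 16 MB default budget are all assumptions made for illustration; tune the budget to whatever your connection tolerates. The sketch is written for Python 3 and makes no API calls.

```python
import os


def batches_by_size(paths, max_files=32, max_bytes=16 * 1024 * 1024,
                    size_of=os.path.getsize):
    """Yield lists of paths limited by both file count and total bytes.

    Hypothetical helper: `max_bytes` is an arbitrary budget, not a Clarifai limit.
    """
    batch, batch_bytes = [], 0
    for path in paths:
        size = size_of(path)
        # Start a new batch when adding this file would exceed either limit.
        if batch and (len(batch) >= max_files or batch_bytes + size > max_bytes):
            yield batch
            batch, batch_bytes = [], 0
        batch.append(path)
        batch_bytes += size
    if batch:
        # Emit whatever is left over as the final, possibly smaller, batch.
        yield batch
```

A file larger than the budget still gets uploaded, just in a batch of its own. Passing a custom `size_of` (for example a dictionary's `get` method) makes the grouping easy to try out without touching the disk.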


URLs or File Paths from an external file

(Note: If your file lists local file paths instead of URLs, change the url= argument to filename= in the ClImage calls below.)

JSON

import json
from clarifai.rest import ClarifaiApp
from clarifai.rest import Image as ClImage

# ADD CREDENTIALS HERE
api_key = 'YOUR_API_KEY'

# For printing unicode characters to the console
def encode(text):
  if type(text) is list or type(text) is tuple:
    return [x.encode('utf-8') for x in text]
  elif type(text) is not int:
    return text.encode('utf-8')
  else:
    return text

# Counter variables
index = 0
counter = 0
batch_size = 32

# Credentials
app = ClarifaiApp(api_key=api_key)

with open('/path/of/your/file.json') as data_file:
  data = json.load(data_file)
  print "Number of images to process: " + str(len(data))

while (counter < len(data)):
  print "Processing batch " + str(index+1)

  # Batch Image List
  imageList=[]

  for x in range(counter, counter + batch_size):
    custom_metadata = {}

    try:
      # ONLY USE ONE OF THESE 3 METHODS
      # Remove the other two

      # Method 1: No custom concepts
      imageList.append(ClImage(url=data[x]["url"]))

      # Method 2: With custom concepts
      imageList.append(ClImage(url=data[x]["url"], concepts=['cat', 'chunky']))

      # Method 3: With custom concepts + metadata
      for items in data[x]["metadata"]:
        custom_metadata[items] = encode(data[x]["metadata"][items])
      imageList.append(ClImage(url=data[x]["url"], concepts=['cat', 'chunky'], metadata=custom_metadata))

    except IndexError:
      break

  # Upload Images to your collection
  # Needed for Visual Search and Custom Training
  app.inputs.bulk_create_images(imageList)

  # And/or get predictions from these images, if desired
  model = app.models.get('food-items-v1.0')
  model.predict(imageList)

  counter=counter+batch_size
  index=index+1

CSV

Note: Assumes a comma delimiter and header names of "URL" and "Category". Both can be edited.

import csv
from clarifai.rest import ClarifaiApp
from clarifai.rest import Image as ClImage

# ADD CREDENTIALS HERE
api_key = 'YOUR_API_KEY'

# For printing unicode characters to the console
def encode(text):
  if type(text) is list or type(text) is tuple:
    return [x.encode('utf-8') for x in text]
  elif type(text) is not int:
    return text.encode('utf-8')
  else:
    return text

# Counter variables
index = 0
counter = 0
batch_size = 32

# Credentials
app = ClarifaiApp(api_key=api_key)

with open('your-file.csv', 'rb') as csv_file:
  data = list(csv.DictReader(csv_file, delimiter=','))
  row_count = len(data)
 
  print "Number of images to process: " + str(len(data))

while (counter < row_count):
  print "Processing batch " + str(index+1)

  # Batch Image List
  imageList=[]

  for x in range(counter, counter + batch_size):
    custom_metadata = {}

    try:
      # ONLY USE ONE OF THESE 3 METHODS
      # Remove the other two

      # Method 1: No custom concepts
      imageList.append(ClImage(url=data[x]["URL"]))

      # Method 2: With custom concepts
      imageList.append(ClImage(url=data[x]["URL"], concepts=['cat', 'chunky']))

      # Method 3: With custom concepts + metadata
      custom_metadata["Category"] = encode(data[x]["Category"])
      imageList.append(ClImage(url=data[x]["URL"], concepts=['cat', 'chunky'], metadata=custom_metadata))

    except IndexError:
      break

  # Upload Images to your collection
  # Needed for Visual Search and Custom Training
  app.inputs.bulk_create_images(imageList)

  # And/or get predictions from these images, if desired
  model = app.models.get('food-items-v1.0')
  model.predict(imageList)

  counter=counter+batch_size
  index=index+1
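
The CSV note says the delimiter and header names can be edited. As a rough sketch of what that might look like, the snippet below reads a semicolon-delimited file with different column names. The `read_rows` function, the "link" and "label" headers, and the example.com URLs are made up for illustration. Unlike the script above, this sketch targets Python 3 and reads from a string so it can run without a file.

```python
import csv
import io


def read_rows(csv_text, delimiter=",", url_field="URL", category_field="Category"):
    """Return one {"url", "metadata"} dict per CSV row (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=delimiter)
    rows = []
    for row in reader:
        # Map the file's own header names onto the keys the upload loop expects.
        rows.append({"url": row[url_field],
                     "metadata": {"Category": row[category_field]}})
    return rows


# Placeholder data with a ';' delimiter and non-default header names.
SAMPLE = ("link;label\n"
          "https://example.com/a.jpg;cats\n"
          "https://example.com/b.jpg;dogs\n")

rows = read_rows(SAMPLE, delimiter=";", url_field="link", category_field="label")
for row in rows:
    print(row["url"], row["metadata"])
```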