Related Article:
Batch Processing in Python
Code Used:
Bulk-Upload-Images-And-Save-Predictions.py
Howdy! If you've been running predictions against one of our public models or your own private, custom model(s), you might be wondering how to save the outputs locally after you process them. By default we return all of these results as JSON, but only once at run time, so if you need to store them for later use rather than acting on them immediately, you'll want to save them to something like a CSV file.
Let's take a look at how to do just that.
Step 1: Get your list of URLs together
The first thing we need here is a list of the URLs you want to predict on, so that the script can read them line by line and predict on them in batches. For the purposes of this walkthrough, we'll assume you have all of these in a CSV file with a single header of "url".
Like this:
url
https://i.imgur.com/G94F9PH.jpg
https://i.redd.it/5uyrc8opy9uy.jpg
https://cdn.thedailymash.co.uk/wp-content/uploads/20190324205638/random667.jpg
...etc...
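If you want to sanity-check that your file parses the way the script expects, a quick sketch like the following prints the first few URLs it finds under the "url" header. It reuses the same placeholder path you'll later set as input_file; point it at wherever your file actually lives.
import csv

# Placeholder path from the main script - change it to your real file
input_file = '/where/your/urls/are.csv'

with open(input_file, 'rb') as f:
    rows = list(csv.DictReader(f))

print("Found " + str(len(rows)) + " URLs")
for row in rows[:5]:
    print(row["url"])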
Step 2: Create a blank CSV file to write your predictions to
This file can have any name, and it can be located anywhere. No need to worry about the specifics; we just need somewhere to send all of this lovely output.
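If you'd rather create that blank file from code than by hand, a minimal sketch (using the same placeholder path as the script's output_file) looks like this:
# Placeholder path from the main script - change it to wherever you want the results
output_file = '/where/the/results/should/go.csv'

# Opening in write mode creates the file (or truncates it if it already exists)
open(output_file, 'w').close()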
Step 3: Run predictions on the URLs and save the data!
The following Python script can be run to iterate through your URLs, predict on them in batches of 32 (the default), and save the results line by line to the CSV file from step 2.
This particular script uses the General model, but essentially any model can be used here, public or custom. The variables you may want to edit are described in detail below the script.
import json
import csv
import types
import sys
from clarifai.rest import ClarifaiApp
from clarifai.rest import Image as ClImage
from clarifai.rest import UserError
from clarifai.rest import ModelOutputConfig, ModelOutputInfo
# ADD CREDENTIALS HERE
api_key = 'YOUR_API_KEY'
# Helper for safely encoding unicode text before it gets written to the CSV.
def encode(text):
    if type(text) is list or type(text) is tuple:
        return [x.encode('utf-8') for x in text]
    elif type(text) is not int and type(text) is not float:
        return text.encode('utf-8')
    else:
        return text
# Counter variables (Leave as is)
index = 0
counter = 0
# Batch size. 32 is optimal.
batch_size = 32
# Editable Options for Concept Models (e.g. General, Food, Moderation, NSFW, etc.) and Custom Models
# Note that the language can only be edited for the General Model
max_concepts = 20
min_prediction_value = 0.75
language = "en"
# Concept Filter - use if you want to filter by ONLY these. Blank by default, and can only be used on concept and custom models
# e.g. concept_filter = ['word1', 'word2']
concept_filter = []
# Input and Output Files - Edit these
input_file = '/where/your/urls/are.csv'
output_file = '/where/the/results/should/go.csv'
# Instantiate the app
app = ClarifaiApp(api_key=api_key)
# MODEL - (Edit this)
# PUBLIC
model = app.public_models.general_model
# CUSTOM
# model = app.models.get("model_name")
with open(input_file, 'rb') as file_with_urls:
    data = list(csv.DictReader(file_with_urls, delimiter=','))
    row_count = len(data)

print "Number of images to process: " + str(row_count)

with open(output_file, mode='w') as file_to_write_to:
    data_writer = csv.writer(file_to_write_to, delimiter=',', quoting=csv.QUOTE_MINIMAL)

    while counter < row_count:
        print "Processing batch " + str(index + 1)
        print "Images Predicted So Far = " + str(counter)

        # Batch Image List
        imageList = []

        # Don't read past the end of the list on the final (possibly partial) batch
        if row_count > counter + batch_size:
            range_limit = counter + batch_size
        else:
            range_limit = row_count

        for x in range(counter, range_limit):
            try:
                imageList.append(ClImage(url=data[x]["url"]))
            except IndexError:
                print "Ran past the end of the URL list at row " + str(x)
                break
            except UserError as err:
                print err
            except AttributeError as err:
                print err
                continue

        # Get predictions for those images
        # The Color, Demographics, Face Detection and Celebrity models are predicted without an output config
        if model.model_id in ["eeed0b6733a644cea07cf4c60f87ebb7", "c0c0ac362b03416da06ab3fa36fb58e3", "a403429f2ddf4b49b307e318f00e528b", "e466caa0619f444ab97497640cefc4dc"]:
            batch_predict = model.predict(imageList)
        else:
            model_output_info = ModelOutputInfo(
                output_config=ModelOutputConfig(
                    language=language,
                    min_value=min_prediction_value,
                    max_concepts=max_concepts,
                    select_concepts=concept_filter))
            batch_predict = model.predict(imageList, model_output_info)

        # One output row per image
        for item in batch_predict["outputs"]:
            row_output = []
            row_output.append(encode(item["input"]["data"]["image"]["url"]))

            # Check for Model Type (Concept, Colors or Detection)
            # CONCEPT MODELS
            if "concepts" in item["data"]:
                for prediction in item["data"]["concepts"]:
                    row_output.append(encode(prediction["name"]))
                    row_output.append(encode(prediction["value"]))
            # COLOR MODEL
            elif "colors" in item["data"]:
                for prediction in item["data"]["colors"]:
                    row_output.append(encode(prediction["w3c"]["name"]))
                    row_output.append(encode(prediction["value"]))
            # DETECTION MODELS
            elif "regions" in item["data"]:
                # Demographics
                if model.model_id == "c0c0ac362b03416da06ab3fa36fb58e3":
                    for region in item["data"]["regions"]:
                        for ethnicity in region["data"]["face"]["multicultural_appearance"]["concepts"]:
                            row_output.append(encode(ethnicity["name"]))
                            row_output.append(encode(ethnicity["value"]))
                        row_output.append(encode(region["data"]["face"]["gender_appearance"]["concepts"][0]["name"]))
                        row_output.append(encode(region["data"]["face"]["gender_appearance"]["concepts"][0]["value"]))
                        for age in region["data"]["face"]["age_appearance"]["concepts"]:
                            row_output.append("Age:" + encode(age["name"]))
                            row_output.append(encode(age["value"]))
                # Celebrity
                if model.model_id == "e466caa0619f444ab97497640cefc4dc":
                    for region in item["data"]["regions"]:
                        for celeb in region["data"]["face"]["identity"]["concepts"]:
                            row_output.append(encode(celeb["name"]))
                            row_output.append(encode(celeb["value"]))
                # Face Detection
                if model.model_id == "a403429f2ddf4b49b307e318f00e528b":
                    for region in item["data"]["regions"]:
                        row_output.append(encode("Top Row:" + str(region["region_info"]["bounding_box"]["top_row"])))
                        row_output.append(encode("Left Col:" + str(region["region_info"]["bounding_box"]["left_col"])))
                        row_output.append(encode("Bottom Row:" + str(region["region_info"]["bounding_box"]["bottom_row"])))
                        row_output.append(encode("Right Col:" + str(region["region_info"]["bounding_box"]["right_col"])))

            data_writer.writerow(row_output)

        counter = counter + batch_size
        index = index + 1
Variables
While it may seem that there is a ton of stuff going on here, it's really not as crazy as it looks! The only variables that you'll need to edit are the following:
Required
api_key - Your API key
input_file - The path of the file from step 1
output_file - The path of the file from step 2
model - The model that you want to use for predictions. This is set to our general model above but any model can be used.
Optional
Note: The following variables can only be used on Concept Models (general, food, nsfw, moderation, wedding, apparel) and Custom Models:
max_concepts - the maximum number of concepts to output per prediction. Default is 20.
min_prediction_value - the minimum value a prediction needs in order to be included in the output. It's set to 0.75 in the script above.
concept_filter - restricts the output to only these concepts, regardless of their values. Nothing else will be returned.
And the language variable can only be used with our General Model:
language - sets the language that the concepts should appear in. The full list can be found here.
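As an illustration (the values below are just examples, not recommendations), tightening those options near the top of the script might look like this:
# Only keep up to 10 concepts per image, and only those scoring 0.9 or higher
max_concepts = 10
min_prediction_value = 0.9

# Return concept names in Japanese (General Model only)
language = "ja"

# Only report these concepts if they appear - leave the list empty to disable filtering
concept_filter = ['dog', 'cat']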
Everything Else
Running the script is easy since there are no arguments to pass in:
python Bulk-Upload-Images-And-Save-Predictions.py
The output file will contain one row per image, no matter how many items were detected in that image. This includes our detection models (demographics, face detection, celebrity), which may return several sets of bounding box coordinates per image.
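Because different images can produce different numbers of columns, a plain csv.reader (rather than csv.DictReader) is the simplest way to load the results back in later. Here's a minimal sketch, assuming the same placeholder output path as above:
import csv

# The same path you used for output_file in the script
output_file = '/where/the/results/should/go.csv'

with open(output_file, 'rb') as f:
    for row in csv.reader(f):
        url = row[0]           # the first column is always the image URL
        predictions = row[1:]  # for concept models these alternate: name, value, name, value, ...
        print(url + ": " + str(len(predictions)) + " prediction columns")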