Auto-generate metadata triggered by S3 events with Clarifai and AWS Lambda
Introduction
Follow this step-by-step guide to leverage Clarifai’s API to automatically add metadata to .jpg images added to your AWS S3 bucket. Once set up, any new image added to the S3 bucket will trigger a Lambda function call that:
- Gets the model predictions from Clarifai
- Puts the metadata JSON file back into the same bucket
This enables any user to leverage Clarifai’s powerful AI engine to add value to their data in storage.
Note: Clarifai supports several different formats; we've limited this example to .jpg files for simplicity: https://docs.clarifai.com/api-guide/data/supported-formats
Prerequisites
- AWS account
- Clarifai account. If you don’t have one, sign up here!
Create S3 Bucket
This will be the bucket that the lambda function will be monitoring for uploads.
Create bucket: https://s3.console.aws.amazon.com/s3/
- Make note of the bucket's region. The lambda functions will need to be set to the same region as the bucket.
- Regarding access, any images uploaded to the bucket will need to be publicly readable, so that the Clarifai API can fetch and process them.
Edit Bucket Policy
Under the bucket's Permissions tab, add the following bucket policy, which makes any newly uploaded items in the S3 bucket publicly readable (replace `yourbucketname` with your bucket's name):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "PublicRead",
      "Effect": "Allow",
      "Principal": "*",
      "Action": [
        "s3:GetObject",
        "s3:GetObjectVersion"
      ],
      "Resource": "arn:aws:s3:::yourbucketname/*"
    }
  ]
}
```
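If you script your setup, the same policy can be generated programmatically so the bucket name is substituted in one place. `public_read_policy` below is a hypothetical helper for illustration, not part of the tutorial's code:

```python
import json

def public_read_policy(bucket_name):
    """Build the PublicRead bucket policy JSON for the given bucket name."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "PublicRead",
            "Effect": "Allow",
            "Principal": "*",
            "Action": ["s3:GetObject", "s3:GetObjectVersion"],
            # Grant read access to every object in the bucket
            "Resource": "arn:aws:s3:::{}/*".format(bucket_name),
        }],
    }
    return json.dumps(policy, indent=2)

print(public_read_policy("yourbucketname"))
```

The resulting string can be pasted into the Permissions tab, or applied with the AWS SDK/CLI if you automate bucket creation.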
Create Roles
Create a new role (select Lambda under the "Common use cases").
For permissions, the role will need to read objects from the monitored S3 bucket and write objects back to it, since the Lambda function will be writing the predictions into the same bucket as a .json file.
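As a sketch, an identity policy with the minimum S3 permissions might look like the following; the bucket name and the CloudWatch Logs statement (which lets the function write its own logs) are assumptions to adapt to your setup:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject"],
      "Resource": "arn:aws:s3:::yourbucketname/*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "logs:CreateLogGroup",
        "logs:CreateLogStream",
        "logs:PutLogEvents"
      ],
      "Resource": "*"
    }
  ]
}
```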
Create Lambda Function
https://console.aws.amazon.com/lambda/home
- Select the appropriate role created in the previous section.
- Runtime will be set to Python 3.x, since our lambda function will be written in Python.
Afterwards, go into the Lambda function's configuration and set an environment variable named CLARIFAI_API_KEY, which is needed to make predictions through the Clarifai API.
[Example code]: https://github.com/michael-gormish/clarifai-community/blob/master/integrations/aws_s3_lambda/clarifai_predict_lambda_url.py
Create a lambda_handler function to be called when new data is uploaded:

```python
import json
import os
import urllib.parse

import boto3

s3 = boto3.client('s3')

def lambda_handler(event, context):
    """Entry point called by AWS Lambda on each S3 upload event.

    The function must be named lambda_handler to match the
    default handler configuration.
    """
    print("Received event: " + json.dumps(event, indent=2))
    api_key = os.environ['CLARIFAI_API_KEY']
    bucket_url = "https://yourbucketname.s3.amazonaws.com/"
    model_id = 'aaa03c23b3724a16a56b629203edc62c'  # general concept model
    # Get info about the uploaded object
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = urllib.parse.unquote_plus(
        event['Records'][0]['s3']['object']['key'],
        encoding='utf-8'
    )
    # Convert the key to a public URL for the bucket to pass to Clarifai
    image_url = bucket_url + key
    concepts = cl_predict(model_id, api_key, None, image_url=image_url)
    # Write the predictions next to the image, swapping .jpg for .json
    writekey = key[:-4] + ".json"
    response = s3.put_object(
        Bucket=bucket,
        Key=writekey,
        Body=json.dumps(concepts).encode('utf-8')
    )
```
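The handler depends on the shape of the S3 event record. A minimal sketch of that structure and the key decoding it performs (the bucket and file names here are made up):

```python
import urllib.parse

# Trimmed-down S3 put event, keeping only the fields the handler reads
event = {
    "Records": [{
        "s3": {
            "bucket": {"name": "yourbucketname"},
            "object": {"key": "photos/cat+1.jpg"},  # '+' encodes a space
        }
    }]
}

bucket = event['Records'][0]['s3']['bucket']['name']
# unquote_plus restores spaces and percent-encoded characters in the key
key = urllib.parse.unquote_plus(event['Records'][0]['s3']['object']['key'],
                                encoding='utf-8')
print(bucket, key)  # yourbucketname photos/cat 1.jpg
```

This decoding step matters because S3 URL-encodes object keys in event notifications; without it, keys containing spaces would produce broken URLs.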
Call the Clarifai Predict function to make predictions on your images:
```python
import json

import urllib3

def cl_predict(model_id, api_key, image_bytes=None, image_url=None):
    """Call Clarifai predict for the given model and return a dict of
    concept names to scores."""
    # Format the request. (Clarifai's gRPC client library is easier to use,
    # but plain HTTP keeps the Lambda deployment package small.)
    headers = {
        "Authorization": "Key {}".format(api_key),
        "Content-Type": "application/json",
    }
    url = "https://api.clarifai.com/v2/models/{}/outputs".format(model_id)
    oneinput = {"data": {"image": {"url": image_url}}}
    data = {"inputs": [oneinput]}
    encoded_data = json.dumps(data).encode('utf-8')
    http = urllib3.PoolManager()
    r = http.request('POST', url, headers=headers, body=encoded_data)
    print("predict http status: " + str(r.status))
    response = json.loads(r.data.decode('utf-8'))
    # Unpack the concept list into a {name: score} dict
    clconcepts = response['outputs'][0]['data']['concepts']
    concepts = {concept['name']: concept['value'] for concept in clconcepts}
    return concepts
```
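To see what the unpacking step produces, here is the same dict comprehension applied to a hand-written response in the shape Clarifai's outputs endpoint returns (the concept values are illustrative, not real API output):

```python
# Abbreviated response in the shape of Clarifai's /v2/models/{id}/outputs reply
response = {
    "outputs": [{
        "data": {
            "concepts": [
                {"name": "cat", "value": 0.9959365},
                {"name": "cute", "value": 0.9924247},
            ]
        }
    }]
}

# Flatten the concept list into a simple {name: score} mapping
clconcepts = response['outputs'][0]['data']['concepts']
concepts = {concept['name']: concept['value'] for concept in clconcepts}
print(concepts)  # {'cat': 0.9959365, 'cute': 0.9924247}
```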
See output files
After uploading a file (`cat1.jpg` in our case), the Lambda function kicks off, runs the image through Clarifai's general model, and saves the predictions as a new JSON file (`cat1.json`).
So, uploading the following image...
... will return a .json file with the following predictions:
```json
{"cat": 0.9959365, "cute": 0.9924247, "fur": 0.9870249, "eye": 0.9835881, "whisker": 0.98305607, "animal": 0.97318846, "downy": 0.97252816, "hair": 0.9687242, "kitten": 0.95943654, "young": 0.952587, "portrait": 0.9471475, "pet": 0.939095, "funny": 0.933573, "little": 0.924665, "looking": 0.9176315, "domestic": 0.9146316, "nature": 0.9092645, "staring": 0.89858574, "mammal": 0.8921827, "tabby": 0.8610473}
```
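Downstream consumers of the .json file can filter the predictions by score. A small sketch of reading such a file's contents and keeping only high-confidence concepts (the 0.95 threshold is an arbitrary choice; tune it for your use case):

```python
import json

# Stand-in for the contents of the generated .json file, abbreviated
raw = '{"cat": 0.9959365, "cute": 0.9924247, "tabby": 0.8610473}'
predictions = json.loads(raw)

# Keep only concepts scored at or above the threshold, highest first
confident = sorted(
    ((name, score) for name, score in predictions.items() if score >= 0.95),
    key=lambda item: item[1],
    reverse=True,
)
print(confident)  # [('cat', 0.9959365), ('cute', 0.9924247)]
```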
Create SQS (bonus)
Tutorial on how to connect SQS to Lambda and save the result to S3: https://www.youtube.com/watch?v=8_ydaEp_LrU