You can use image classification to assign one or more labels to an image. Classification has many applications in manufacturing, logistics, transportation, and retail. In manufacturing, you can use classification to determine whether an image contains a defect. In retail, you could classify whether a package belongs to a given product SKU.

Here is an example of an image that was classified as “scratched car door”:

In this guide, we are going to discuss how to use the Roboflow image classification API to assign categories to images. This API, powered by OpenAI’s open source CLIP model, accepts arbitrary categories. You can process the results from the API to assign the label, or set of labels, most likely to represent an image. We will also discuss the steps to train your own classification model.

To use the Roboflow image classification API, you will need a free Roboflow account. This account will allow you to retrieve an API key you can use to access the API.

Once you have a Roboflow account, you can start working with the API.

Without further ado, let’s get started!

Classify an Image

The Roboflow image classification API uses a zero-shot classification model called CLIP. CLIP requires no training on your own data, and there is no fixed list of labels the model can classify. Instead, you can provide any label you like.

CLIP, and thus the API, works well on general categories, such as vehicle types, signs, or whether an image contains NSFW material.

To classify an image, we need two things:

  1. A list of categories to use in classification, and
  2. An image to classify.

With this information, you can make a request to the API.
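These two inputs map onto two fields of the JSON body that the script in this section sends: the image goes into "subject" as a base64 string, and the categories go into "prompt". The helper below is a minimal sketch that only builds that body from raw image bytes; the function name build_compare_payload is hypothetical, and the field names are copied from the script rather than from any official API reference.

```python
import base64
from typing import Dict, List


def build_compare_payload(image_bytes: bytes, tags: List[str]) -> Dict:
    """Build the request body used in this guide (hypothetical helper).

    The image is base64-encoded into "subject" and the candidate
    categories are passed as text prompts in "prompt".
    """
    if not tags:
        raise ValueError("at least one category is required")
    return {
        "subject": {
            "type": "base64",
            "value": base64.b64encode(image_bytes).decode("utf-8"),
        },
        "subject_type": "image",
        "prompt": list(tags),
        "prompt_type": "text",
    }
```

Separating payload construction from the HTTP call makes it easy to inspect or test the request body without contacting the API.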

Create a new Python file and add the following code:

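import base64

import requests

tags = ["scratched car door", "car door"]  # categories to compare against the image
API_KEY = "api_key"  # replace with your Roboflow API key
image = "image.jpeg"  # path to the image you want to classify

# Read the image and encode it as base64 so it can be sent in a JSON body
with open(image, "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

infer_clip_payload = {
    "subject": {
        "type": "base64",
        "value": image_b64,
    },
    "subject_type": "image",
    "prompt": tags,
    "prompt_type": "text",
}

res = requests.post(
    f"http://infer.roboflow.com/clip/compare?api_key={API_KEY}",
    json=infer_clip_payload,
)
res.raise_for_status()  # stop on an HTTP error instead of failing later on a missing key

# One similarity score per tag, in the same order as the tags list
similarity = res.json()["similarity"]
idx = similarity.index(max(similarity))
tag = tags[idx]

print(f"Most similar tag: {tag}")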

In the code above, we make an HTTP request to the Roboflow image classification API, sending our image along with a list of tags. The API returns a list of similarity scores, one per tag and in the same order as the tags, that indicate how closely each tag matches the image. Finally, we choose the tag with the highest score as the classification result.
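The script keeps only the single best tag. If you want to see how every tag compared, you could pair each tag with its score and sort the pairs. This is a minimal sketch that assumes, as the script does, that the scores come back in the same order as the tags; the scores in the example are made up, not real API output.

```python
from typing import List, Tuple


def rank_tags(tags: List[str], similarity: List[float]) -> List[Tuple[str, float]]:
    """Pair each tag with its similarity score, sorted from most to least similar."""
    if len(tags) != len(similarity):
        raise ValueError("expected one similarity score per tag")
    return sorted(zip(tags, similarity), key=lambda pair: pair[1], reverse=True)


# Illustrative scores only; in practice use res.json()["similarity"]
ranking = rank_tags(["scratched car door", "car door"], [0.21, 0.27])
for tag, score in ranking:
    print(f"{tag}: {score:.2f}")
```

Looking at the full ranking is useful when two tags score almost equally, since a small gap suggests the image is ambiguous for the tags you chose.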

You can specify any tags you want; CLIP does not have a list of accepted tags. CLIP works well on generic tags (e.g. "car door" and "scratch on car door") as opposed to very specific tags (e.g. "scratch on red toyota camry door").

Replace:

  1. The value of the tags variable with the categories you want to send to the API.
  2. api_key with your Roboflow API key, which you can retrieve from your Roboflow account.
  3. image.jpeg with the path to the image you want to classify.
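Rather than pasting the API key into the script, you could read it from an environment variable so the key does not end up in version control. This is a sketch; the variable name ROBOFLOW_API_KEY and the helper get_api_key are hypothetical choices for this example, not names Roboflow requires.

```python
import os


def get_api_key(var_name: str = "ROBOFLOW_API_KEY") -> str:
    """Read the Roboflow API key from an environment variable (hypothetical name)."""
    key = os.environ.get(var_name)
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable to your Roboflow API key")
    return key
```

In the script, you would then write API_KEY = get_api_key() instead of hard-coding the value.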

Then, run the script.

Let’s run the script on the following two images of car doors with the tags “scratched car door” and “car door”:

Our model returns:

Most similar tag: car door
Most similar tag: scratched car door

Our model successfully identified that the first image contains a car door and the second contains a scratched car door.
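To classify several images, like the two above, in one run, you could wrap the request in a function and loop over the image paths. This sketch reuses the endpoint and request body from the script in this guide, but sends the request with Python's standard-library urllib instead of requests; the function names post_json and classify_images are hypothetical. The post parameter lets you swap in a different transport, for example a fake one for testing.

```python
import base64
import json
import urllib.request
from typing import Callable, Dict, List

# Endpoint copied from the script in this guide
COMPARE_URL = "http://infer.roboflow.com/clip/compare"


def post_json(url: str, payload: dict) -> dict:
    """POST a JSON body using only the standard library and decode the JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def classify_images(
    paths: List[str],
    tags: List[str],
    api_key: str,
    post: Callable[[str, dict], dict] = post_json,
) -> Dict[str, str]:
    """Return the most similar tag for each image path (hypothetical helper)."""
    results = {}
    for path in paths:
        with open(path, "rb") as f:
            encoded = base64.b64encode(f.read()).decode("utf-8")
        payload = {
            "subject": {"type": "base64", "value": encoded},
            "subject_type": "image",
            "prompt": tags,
            "prompt_type": "text",
        }
        similarity = post(f"{COMPARE_URL}?api_key={api_key}", payload)["similarity"]
        # Scores follow the order of tags, so the best index maps back to a tag
        best = max(range(len(tags)), key=lambda i: similarity[i])
        results[path] = tags[best]
    return results
```

For example, classify_images(["door1.jpeg", "door2.jpeg"], ["scratched car door", "car door"], API_KEY), with hypothetical file names, returns a dictionary mapping each path to its most similar tag.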

If you need to run the classification model on your own hardware, you can do so with Roboflow Inference. Inference is a high-performance inference server that can run fine-tuned models, such as YOLOv8 object detection and classification models, as well as foundation models such as CLIP, the model that powers the Roboflow image classification API.

Next Steps: Train a Custom Classification Model

While CLIP and the Roboflow image classification API address a number of use cases, for more specific classification problems we recommend training a custom classification model. A custom classification model can be trained using your own data and taxonomy, allowing you to achieve higher accuracy than you would with a generic model.

There are two main types of classification model you can train: a single-label model that assigns exactly one label to an image, and a multi-label model that can assign one or more labels to an image.
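The difference between a model that assigns one label and a model that assigns one or more labels comes down to how per-class scores are turned into labels. This sketch shows the two decision rules side by side; the scores are made up for illustration, and the 0.5 threshold is an arbitrary assumption you would tune on your own validation data.

```python
from typing import List, Sequence


def single_label(tags: Sequence[str], scores: Sequence[float]) -> str:
    """Single-label decision: return exactly one tag, the highest-scoring one."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    return tags[best]


def multi_label(tags: Sequence[str], scores: Sequence[float], threshold: float = 0.5) -> List[str]:
    """Multi-label decision: return every tag whose score meets the threshold (possibly none)."""
    return [tag for tag, score in zip(tags, scores) if score >= threshold]


# Made-up per-class scores for one image, for illustration only
tags = ["scratch", "dent", "rust"]
scores = [0.82, 0.64, 0.07]
print(single_label(tags, scores))
print(multi_label(tags, scores))
```

A single-label rule always returns something, while a multi-label rule can return several labels or none, which is what you want when an image may show several defects at once or no defect at all.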

No matter what type of classification system you want to create, you need to follow these steps:

  1. Collect data representative of your use case.
  2. Label data with one or more categories.
  3. Train a model.
  4. Deploy the model.
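Before training, labeled data is usually divided into training, validation, and test sets so the model is evaluated on images it has not seen. The sketch below does a seeded random split of (image path, label) pairs; the 70/20/10 ratios and the helper name split_dataset are assumptions for illustration, not Roboflow defaults.

```python
import random
from typing import Dict, List, Sequence, Tuple

Item = Tuple[str, str]  # (image path, label)


def split_dataset(
    items: Sequence[Item], train: float = 0.7, valid: float = 0.2, seed: int = 0
) -> Dict[str, List[Item]]:
    """Shuffle items reproducibly and split them into train/valid/test lists."""
    if train < 0 or valid < 0 or train + valid > 1:
        raise ValueError("train and valid fractions must be non-negative and sum to at most 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split repeatable
    n_train = int(len(shuffled) * train)
    n_valid = int(len(shuffled) * valid)
    return {
        "train": shuffled[:n_train],
        "valid": shuffled[n_train:n_train + n_valid],
        "test": shuffled[n_train + n_valid:],  # whatever remains goes to test
    }
```

A fixed seed keeps the split stable across runs, so changes in accuracy reflect changes to the model rather than a different test set.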

You can train a classification model on Roboflow.

To get started training your own model, go to your Roboflow dashboard, click “Create a Project”, and select “Classification” on the project creation page. You can then upload your images and label them for a classification model. When your dataset is ready, you can train your model and deploy it with a cloud API or on your own hardware.

Conclusion

The Roboflow image classification API, powered by CLIP, allows you to assign arbitrary tags to an image. You can use this API for tasks such as classifying whether a photo contains a person, whether an image contains a defect, or whether an image contains NSFW material.

In this guide, we walked through how to use the Roboflow image classification API. We made a request to the API to classify whether a car door was damaged. We then discussed, at a high level, what you need to do to train your own image classification model, which is ideal for more specialized use cases.