Exploring Adversarial Attacks on Deep Learning Models — 1

Vishal Thakur

Attacking an Image Classifier on CIFAR-10 using PGD (Projected Gradient Descent)

In this series, we’ll explore and demonstrate (with examples) how to generate adversarial attacks against models using various techniques. To kick things off, we’ll start with an example that uses the CIFAR-10 dataset and crafts a PGD attack with Foolbox.

Setting Up the Environment

These libraries will help us load datasets, build or use pretrained models, and visualize the results.

!pip install torch torchvision matplotlib numpy foolbox

Loading the CIFAR-10 Dataset

The CIFAR-10 dataset is widely used in computer vision tasks. It consists of 60,000 32x32 color images across 10 classes, with 50,000 images for training and 10,000 reserved for testing.

Import all the required libraries:

import torch
import torchvision
import torchvision.transforms as transforms
import matplotlib.pyplot as plt
import numpy as np

import foolbox as fb
from foolbox.attacks import LinfPGD

We can load the dataset using PyTorch’s torchvision module:


# Load CIFAR-10 dataset
transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])
testset = torchvision.datasets.CIFAR10(root='./data', train=False, download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=1, shuffle=True)

Here, we normalize the images with a mean of 0.5 and a standard deviation of 0.5 for all three color channels (RGB), which maps pixel values from [0, 1] to [-1, 1].
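
For intuition (a quick sketch, not part of the original walkthrough), the transform simply applies (x - mean) / std per channel, sending 0 to -1 and 1 to +1:

# Illustrative only: what Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)) does to pixel values
x = torch.tensor([0.0, 0.5, 1.0])  # raw pixel values after ToTensor()
print((x - 0.5) / 0.5)             # tensor([-1., 0., 1.])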

Using a Pretrained Model

For simplicity, we’ll use a pretrained ResNet-18 model from torchvision.models, a convolutional neural network architecture widely used for image classification. Note that torchvision’s pretrained weights come from ImageNet rather than CIFAR-10, so the model here serves purely as a demonstration target for the attack; in practice you would fine-tune it on CIFAR-10 first.

# Define a simple model (using pretrained ResNet18 for demonstration)
model = torchvision.models.resnet18(pretrained=True)
model.eval()

Calling model.eval() switches the model to evaluation mode, so layers such as batch normalization use their running statistics. This keeps forward passes deterministic, which matters for consistency when crafting and evaluating adversarial examples.
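
As an optional sanity check (my addition, not required for the attack), we can confirm the forward pass works on CIFAR-sized inputs. Because the pretrained weights are ImageNet weights, the model outputs 1,000 logits per image:

# Optional check: ResNet-18's adaptive average pooling lets it accept 32x32 inputs,
# and the ImageNet-pretrained head produces 1000 logits per image
with torch.no_grad():
    dummy = torch.randn(1, 3, 32, 32)  # a CIFAR-sized dummy input
    print(model(dummy).shape)          # torch.Size([1, 1000])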

Visualizing the Original Image

Before diving into adversarial attacks, let’s pick a random sample from the test set and display it.

# Select a sample image from the testset
dataiter = iter(testloader)
images, labels = next(dataiter)

# Display the original image
def imshow(img, title):
    if img.ndim == 4:
        # Display the first image in the batch
        img = img[0]

    img = img / 2 + 0.5  # unnormalize
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))
    plt.title(title)
    plt.show()

imshow(images[0], title='Original Image')

This snippet un-normalizes the image back to its original [0, 1] pixel range and displays it with its title, so you see the CIFAR-10 image in its original colors.

Generating an Adversarial Example

We now use Foolbox to craft adversarial examples. Here’s how:

  1. Wrap the model for Foolbox.
  2. Use the LinfPGD attack to generate perturbations within a small L-infinity budget (epsilon).
  3. Visualize the adversarial image.

# Create the Foolbox model.
# The bounds must match the normalized input range, which is (-1, 1) here.
fmodel = fb.PyTorchModel(model, bounds=(-1, 1))

# Apply the attack: L-infinity-bounded PGD with a small perturbation budget
attack = LinfPGD()
epsilons = [0.03]

# Foolbox returns (raw adversarials, clipped adversarials, success flags),
# each as a list with one entry per epsilon
raw_advs, clipped_advs, success = attack(fmodel, images, labels, epsilons=epsilons)

# Take the adversarial batch for epsilon = 0.03 and move it to the CPU
adversarial_image = clipped_advs[0].cpu()

# Display the adversarial image (imshow shows the first image in the batch)
imshow(adversarial_image, title='Adversarial Image')

The adversarial image retains a high degree of similarity to the original but introduces subtle perturbations designed to confuse the model.
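
If you want to verify that the perturbation really is small (an optional check, not part of the original flow), you can measure its L-infinity norm, which should stay within the epsilon of 0.03 we set above:

# Optional check: the largest per-pixel change should not exceed the epsilon budget
perturbation = adversarial_image - images
print('Max |perturbation|:', perturbation.abs().max().item())  # <= 0.03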

Visualizing the Difference

To better understand how the adversarial example differs from the original, we compute and display the difference image:

# Both tensors live in the same normalized space, so subtract them directly;
# imshow then shifts the values so that zero perturbation maps to mid-gray
diff = adversarial_image[0] - images[0]
imshow(diff, title='Difference Image')
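
Because the perturbation is capped at 0.03, the raw difference image looks almost uniformly gray. Purely for visualization (an extra step I’m adding here), you can amplify it to make the pattern easier to see:

# Amplify the perturbation for display only; the model sees the unscaled version
imshow(diff * 10, title='Difference Image (amplified 10x)')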

Conclusion

This walkthrough demonstrates how to:

  • Load a CIFAR-10 dataset.
  • Use a pretrained ResNet-18 model.
  • Generate and visualize adversarial examples with Foolbox.

Adversarial attacks like LinfPGD emphasize the importance of developing robust deep learning models. While the adversarial changes may seem minor, they can significantly impact model predictions.

In future posts, we’ll explore:

  • Comparing the predictions of original vs. adversarial images.
  • Testing other adversarial attack methods.
  • Strategies for improving model robustness.

I’ll post more attack examples in this series, so stay tuned!
