PyTorch CNN Guide: Basics, Transfer Learning & Deployment

Table of Contents

1. Introduction: Overview of PyTorch and CNN

What is PyTorch?

PyTorch is an open-source machine learning library developed by Facebook (now Meta). It is Python-first and makes building, training, and evaluating neural networks straightforward, and its intuitive coding style has made it very popular among researchers and developers.

What is CNN (Convolutional Neural Network)?

A CNN (Convolutional Neural Network) is a type of neural network specialized for image and video recognition. Loosely inspired by the human visual system, it automatically extracts features from data and is widely used in fields such as image classification and object detection.

Basic Structure of CNN

A CNN consists of the following main layers; a minimal shape-tracing sketch follows the list.
  1. Convolutional Layer extracts local image features (edges, colors, etc.). It performs convolution operations using small matrices called filters.
  2. Pooling Layer reduces the feature map size, lowering computational cost. A common method is Max Pooling, which retains the strongest parts of the features.
  3. Fully Connected Layer uses the extracted features to perform final classification or prediction.
  4. Activation Function applies a non-linear transformation, enabling the network to learn complex patterns. A common function is ReLU (Rectified Linear Unit).
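To make the flow concrete, here is a minimal, self-contained sketch (layer sizes are illustrative, not taken from any particular model) showing how each layer type transforms a batch of images:
import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.randn(1, 3, 32, 32)                      # one dummy RGB image: (N, C, H, W)
conv = nn.Conv2d(3, 16, kernel_size=3, padding=1)  # convolutional layer
pool = nn.MaxPool2d(kernel_size=2, stride=2)       # pooling layer

x = F.relu(conv(x))        # convolution + ReLU activation -> (1, 16, 32, 32)
x = pool(x)                # max pooling halves H and W    -> (1, 16, 16, 16)
x = x.view(x.size(0), -1)  # flatten for the classifier    -> (1, 4096)
out = nn.Linear(16 * 16 * 16, 10)(x)  # fully connected layer -> (1, 10) class scores
print(out.shape)           # torch.Size([1, 10])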

Why the Combination of PyTorch and CNN Is Powerful

PyTorch adopts dynamic computation graphs, which allow flexible code. This makes CNN models easy to build and debug, and is ideal for experimental research and projects. PyTorch also supports fast GPU processing, so it can handle large-scale data.
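As a small illustration of the dynamic graph (a hypothetical module, shown only to demonstrate the idea), ordinary Python control flow can appear directly inside forward and is re-traced on every call:
import torch
import torch.nn as nn

class DynamicNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(8, 8)

    def forward(self, x):
        # Plain Python control flow: the graph is rebuilt on each call,
        # so the number of applied layers can depend on the input itself.
        steps = 1 + int(x.abs().mean().item() * 3)
        for _ in range(steps):
            x = torch.relu(self.fc(x))
        return x

print(DynamicNet()(torch.randn(2, 8)).shape)  # torch.Size([2, 8])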

Real-World Use Cases

PyTorch and CNN are used in the following areas.
  • Image classification (e.g., cat and dog identification)
  • Face recognition systems
  • Image processing for autonomous vehicles
  • Medical image diagnosis (MRI and X-ray image analysis)
  • Style transfer and image enhancement

Summary

This section explained the basic concepts of PyTorch and CNN and the strengths of their combination.

2. Preparing PyTorch and CNN: Environment Setup and Installation

How to Install PyTorch and Initial Setup

1. Preparing the Development Environment

PyTorch requires Python to be installed. In addition, a development environment such as Visual Studio Code, Jupyter Notebook, or Google Colab is convenient.

2. PyTorch Installation Steps

Below are the steps to install PyTorch in a local environment.
  1. Python Installation
  • Download and install the latest Python from the official Python website (https://www.python.org/).
  2. Creating a Virtual Environment
   python -m venv pytorch_env
   source pytorch_env/bin/activate   # Mac/Linux
   pytorch_env\Scripts\activate     # Windows
  3. PyTorch Installation
  • You can generate an appropriate installation command for your environment on the official PyTorch website (https://pytorch.org/). Below is an example of installing the GPU‑enabled version.
   pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
  4. Installation Verification
   import torch
   print(torch.__version__)         # display version
   print(torch.cuda.is_available()) # check if GPU is available

Setting Up the Environment on Google Colab

1. Log In with a Google Account

Open Google Colab (https://colab.research.google.com/) in your browser and log in with your Google account.

2. Runtime Settings

From the menu, select “Runtime” → “Change runtime type” and choose “GPU” as the hardware accelerator.

3. Verify PyTorch Version

import torch
print(torch.__version__)
You can install the latest version if needed.
!pip install torch torchvision torchaudio

Preparing and Preprocessing the Dataset

1. Downloading the Dataset

PyTorch provides the “torchvision” library, which makes it easy to work with a wide variety of datasets. Here we use the popular CIFAR‑10 dataset as an example.
import torch
import torchvision
import torchvision.transforms as transforms

transform = transforms.Compose([
    transforms.ToTensor(),                # convert a PIL image to a tensor in [0, 1]
    transforms.Normalize((0.5,), (0.5,))  # shift and scale to roughly [-1, 1]
])

trainset = torchvision.datasets.CIFAR10(
    root='./data', train=True, download=True, transform=transform)

trainloader = torch.utils.data.DataLoader(
    trainset, batch_size=32, shuffle=True)

2. Data Preprocessing

  • Normalization: Scaling pixel values to a consistent range stabilizes training (here, ToTensor maps them to [0, 1] and Normalize shifts them to roughly [-1, 1]).
  • Data Augmentation: Applying random flips and crops effectively increases the data and helps prevent overfitting.
transform = transforms.Compose([
    transforms.RandomHorizontalFlip(),     # random left-right flip
    transforms.RandomCrop(32, padding=4),  # random crop after 4-pixel padding
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,))
])

3. Configuring the Data Loader

The data loader streamlines batch processing and supplies data to the model in mini‑batch units.
trainloader = torch.utils.data.DataLoader(
    trainset, batch_size=32, shuffle=True, num_workers=2)
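As a quick sanity check (assuming the trainloader defined above), you can pull a single mini-batch and inspect its shape:
images, labels = next(iter(trainloader))
print(images.shape)  # torch.Size([32, 3, 32, 32]): (batch, channels, height, width)
print(labels[:8])    # class indices of the first eight samples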

Summary

In this section, we covered the PyTorch installation steps and how to set up the environment using Google Colab. We also presented concrete examples of preparing and preprocessing a dataset for CNNs.

3. Building a CNN Model with PyTorch [with Code Examples]

Basic Structure of CNN Models and Customization Examples

1. Basic Structure of a CNN Model

CNN extracts features from image data and uses them to perform classification. The basic architecture is as follows.
  1. Convolutional Layer – extracts image features.
  2. Pooling Layer – reduces the feature map size to lower computational cost.
  3. Fully Connected Layer – performs the final classification.
  4. Activation Function – applies non-linear transformations so the model can learn complex patterns.
In this section, we introduce how to build a simple CNN model in PyTorch by combining these layers.

Steps to Implement a CNN with PyTorch

1. Import Required Libraries

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms

2. Prepare the Dataset

transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])

trainset = datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=32, shuffle=True)

testset = datasets.CIFAR10(root='./data', train=False, download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=32, shuffle=False)

3. Build the CNN Model

class SimpleCNN(nn.Module):
    def __init__(self):
        super(SimpleCNN, self).__init__()
        # Convolutional Layer 1
        self.conv1 = nn.Conv2d(3, 32, kernel_size=3, padding=1)
        # Convolutional Layer 2
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, padding=1)
        # Convolutional Layer 3
        self.conv3 = nn.Conv2d(64, 128, kernel_size=3, padding=1)
        # Pooling Layer
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)
        # Fully Connected Layer
        self.fc1 = nn.Linear(128 * 4 * 4, 256)
        self.fc2 = nn.Linear(256, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))  # Convolutional Layer 1 → ReLU → Pooling
        x = self.pool(F.relu(self.conv2(x)))  # Convolutional Layer 2 → ReLU → Pooling
        x = self.pool(F.relu(self.conv3(x)))  # Convolutional Layer 3 → ReLU → Pooling
        x = x.view(-1, 128 * 4 * 4)           # Convert feature map to 1D
        x = F.relu(self.fc1(x))               # Fully Connected Layer 1 → ReLU
        x = self.fc2(x)                       # Fully Connected Layer 2 → Output
        return x

4. Instantiate and Inspect the Model

model = SimpleCNN()
print(model)

5. Set Loss Function and Optimizer

criterion = nn.CrossEntropyLoss()  # Loss function: Cross-entropy
optimizer = optim.Adam(model.parameters(), lr=0.001)  # Optimization method: Adam
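Before training, it is worth passing a dummy batch through the model to confirm that the output shape matches the number of classes; a quick check using the model defined above:
dummy = torch.randn(4, 3, 32, 32)  # four fake CIFAR-10 sized images
out = model(dummy)
print(out.shape)                   # expected: torch.Size([4, 10])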

Summary

In this section, we explained in detail how to build a simple CNN model using PyTorch. This should give you a solid understanding of the basic CNN architecture and how to implement it.

4. Training and Evaluation of CNN Models (Learning with Concrete Examples)

Steps to Train a CNN Model with PyTorch

1. Preparing the Model Training Process

In model training, data is processed using the following steps.
  1. Forward Propagation: Pass the input data through the model and calculate the output.
  2. Loss Calculation: Compute the error between predictions and true labels.
  3. Backward Propagation: Update each layer’s parameters based on the error.
  4. Optimizer Update: Adjust parameters based on the learning rate.
Below is a concrete code example implementing these steps.
# Model, loss function, and optimizer setup
model = SimpleCNN()
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Execute training
n_epochs = 10  # number of epochs
for epoch in range(n_epochs):
    running_loss = 0.0
    for inputs, labels in trainloader:
        # Zero gradients
        optimizer.zero_grad()
        # Forward propagation
        outputs = model(inputs)
        # Compute loss
        loss = criterion(outputs, labels)
        # Backward propagation
        loss.backward()
        # Update weights
        optimizer.step()
        # Record loss
        running_loss += loss.item()

    # Display loss per epoch
    print(f"Epoch {epoch+1}/{n_epochs}, Loss: {running_loss / len(trainloader):.4f}")

Evaluation and Result Analysis Using Test Data

1. Model Performance Evaluation

We evaluate the model’s accuracy using test data. Below is a code example for evaluation.
correct = 0
total = 0

# Switch to evaluation mode
model.eval()
with torch.no_grad():  # Disable gradient computation
    for inputs, labels in testloader:
        outputs = model(inputs)
        _, predicted = torch.max(outputs, 1)  # Predict the class with the highest probability
        total += labels.size(0)
        correct += (predicted == labels).sum().item()  # count correct predictions

accuracy = 100 * correct / total
print(f'Accuracy: {accuracy:.2f}%')

2. Detailed Explanation of Evaluation Metrics

  • Accuracy: The proportion of samples correctly classified.
  • Loss: A metric indicating model error; lower values are better.
  • Confusion Matrix: Visualizes classification results for each class, helping to identify misclassification patterns.
Below is an example of a confusion matrix.
from sklearn.metrics import confusion_matrix
import seaborn as sns
import matplotlib.pyplot as plt

# Generate confusion matrix
all_labels = []
all_preds = []

with torch.no_grad():
    for inputs, labels in testloader:
        outputs = model(inputs)
        _, preds = torch.max(outputs, 1)
        all_labels.extend(labels.numpy())
        all_preds.extend(preds.numpy())

cm = confusion_matrix(all_labels, all_preds)

# Visualize confusion matrix
plt.figure(figsize=(8, 6))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues')
plt.xlabel('Predicted Label')
plt.ylabel('True Label')
plt.title('Confusion Matrix')
plt.show()
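If you also want per-class precision, recall, and F1 scores, scikit-learn's classification_report works directly on the labels and predictions collected above:
from sklearn.metrics import classification_report

# Per-class precision / recall / F1 from the predictions gathered above
print(classification_report(all_labels, all_preds))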

Summary

In this section, we explained how to train and evaluate CNN models using PyTorch. The training process leverages loss functions and optimizers to improve model accuracy.

5. Application Example: How to Improve Performance with Transfer Learning

What is Transfer Learning?

Transfer learning is a method that reuses already trained models for new tasks. In particular, for image recognition tasks, you can fine‑tune models trained on large datasets (e.g., VGG16 or ResNet) to build high‑accuracy models in a short time.

Benefits of Transfer Learning

  1. Reduced computational cost: No need to train a model from scratch, which eases GPU load.
  2. Learnable with small datasets: Even with limited data, you can achieve high accuracy by leveraging the feature extraction capabilities of pre‑trained models.
  3. Rapid implementation: Pre‑trained models are easy to adapt, so you can build working models quickly.

Example of Transfer Learning Implementation in PyTorch

1. Import Required Libraries

import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, models, transforms

2. Data Preprocessing and Loading

transform = transforms.Compose([
    transforms.Resize(224),                  # resize the shorter side to 224
    transforms.CenterCrop(224),              # then crop the center 224x224
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])

trainset = datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=32, shuffle=True)

testset = datasets.CIFAR10(root='./data', train=False, download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=32, shuffle=False)

3. Load Pre‑trained Model

model = models.resnet18(pretrained=True)

4. Freeze Model and Fine‑tune

# Freeze all pre-trained parameters first
for param in model.parameters():
    param.requires_grad = False

# Then replace the output layer (CIFAR-10 has 10 classes); the new layer's
# parameters have requires_grad=True by default, so only it is fine-tuned
model.fc = nn.Linear(512, 10)
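After freezing, you can confirm that only the new output layer will receive gradient updates; a quick check on the model above:
# Only the newly created fc layer should remain trainable
for name, param in model.named_parameters():
    if param.requires_grad:
        print(name)  # expected: fc.weight, fc.bias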

5. Set Loss Function and Optimizer

criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.fc.parameters(), lr=0.001)

6. Train the Model

n_epochs = 10
for epoch in range(n_epochs):
    running_loss = 0.0
    for inputs, labels in trainloader:
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
    print(f"Epoch {epoch+1}/{n_epochs}, Loss: {running_loss / len(trainloader):.4f}")

Beyond Image Classification! CNN Applications

1. Object Detection

  • Use case: Object detection for autonomous vehicles and security camera video analysis, etc.
  • Technology: Architectures such as YOLO and Faster R-CNN are used (a minimal loading sketch follows).
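torchvision ships pre-trained detection models, so trying one takes only a few lines. A minimal sketch (the input tensor here is random and only demonstrates the expected interface):
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn

det_model = fasterrcnn_resnet50_fpn(pretrained=True)  # COCO-pretrained
det_model.eval()

with torch.no_grad():
    # Detection models take a list of CHW image tensors
    predictions = det_model([torch.rand(3, 300, 400)])
print(predictions[0]['boxes'].shape)  # bounding boxes for the detections
print(predictions[0]['labels'][:5])   # predicted COCO class indices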

2. Segmentation

  • Use case: Used in medical image analysis to identify tumors and abnormal regions.
  • Technology: U-Net and Mask R-CNN are common.

3. Style Transfer

  • Use case: Technology for changing style in artworks and photo‑editing apps.
  • Technology: Uses CNNs to extract and transform image features.

4. Anomaly Detection

  • Use case: Used for quality control and anomaly detection in manufacturing.
  • Technology: Leverages CNN feature extraction to classify normal and abnormal data.

Conclusion

In this section, we explained the basic concepts and implementation methods of transfer learning in detail. We also presented examples of applying CNNs to object detection, anomaly detection, and other use cases.

6. Troubleshooting: Error Handling and Debugging Tips

Common PyTorch Errors and Their Solutions

1. Module or Package Import Errors

Error Message:
ModuleNotFoundError: No module named 'torch'
Cause: PyTorch is not installed, or the virtual environment is not set up correctly.
Solution:
source pytorch_env/bin/activate  # Linux/Mac
pytorch_env\Scripts\activate     # Windows

pip install torch torchvision torchaudio

2. GPU Not Recognized Error

Error Message:
AssertionError: Torch not compiled with CUDA enabled
Cause: The installed PyTorch build has no GPU support, or the CUDA version does not match.
Solution:
import torch
print(torch.__version__)         # PyTorch version
print(torch.cuda.is_available()) # Whether GPU is available
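If is_available() returns False, also check which CUDA build (if any) your PyTorch binary was compiled with, and fall back to the CPU explicitly:
print(torch.version.cuda)  # e.g. '11.8', or None for a CPU-only build
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(device)              # the device your code should use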

3. Dimension Mismatch Error

Error Message:
RuntimeError: shape '[N, C, H, W]' is invalid for input of size X
Cause: The dimensions (size) of the input data do not match the model.
Solution:
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor()
])

Debugging Techniques and Log Utilization

1. Logging the Training Process

for epoch in range(n_epochs):
    running_loss = 0.0
    for i, (inputs, labels) in enumerate(trainloader):
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        if i % 100 == 99:
            print(f"[Epoch {epoch+1}, Batch {i+1}] Loss: {loss.item():.4f}")

2. Visualization with TensorBoard

Installation:
pip install tensorboard
Code Example:
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter()

for epoch in range(n_epochs):
    running_loss = 0.0
    for i, (inputs, labels) in enumerate(trainloader):
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        writer.add_scalar('Loss/train', loss.item(), epoch * len(trainloader) + i)

writer.close()
Launching TensorBoard:
tensorboard --logdir=runs

Error Handling Checklist

| Issue | Checkpoints | Solution |
| --- | --- | --- |
| Module import error | Package installation and virtual environment activation | Reinstall required modules |
| GPU not recognized | Verify CUDA version compatibility with PyTorch | Update CUDA driver and reinstall PyTorch |
| Data dimension mismatch error | Check input data shape against the model's expected size | Resize input data and adjust the model |
| Accuracy not improving | Adjust learning rate and batch size; verify normalization | Tune hyperparameters and add data augmentation |
| Overfitting occurs | Monitor loss trends and test data accuracy | Add dropout layers or apply regularization |
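For the last two rows of the checklist, here is a hedged sketch of how dropout and L2 regularization could be added to the SimpleCNN from section 3 (the subclass name and rates are illustrative):
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

class SimpleCNNWithDropout(SimpleCNN):  # reuses SimpleCNN from section 3
    def __init__(self, p=0.5):
        super().__init__()
        self.dropout = nn.Dropout(p=p)  # randomly zeroes activations during training

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = self.pool(F.relu(self.conv3(x)))
        x = x.view(-1, 128 * 4 * 4)
        x = self.dropout(F.relu(self.fc1(x)))  # dropout between fc1 and fc2
        return self.fc2(x)

model = SimpleCNNWithDropout()
# weight_decay adds L2 regularization on top of dropout
optimizer = optim.Adam(model.parameters(), lr=0.001, weight_decay=1e-4)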

Conclusion

This section covered error handling and debugging techniques in PyTorch. Errors frequently arise during model building and training, so use logging and TensorBoard visualizations for early detection and resolution.

7. Saving and Deploying Models: Practical Application Methods

How to Save Trained Models

1. How to Save a State Dictionary (State Dict)

torch.save(model.state_dict(), 'cnn_model.pth')
Advantages:
  • Only the parameters are saved, so they can be loaded into a freshly defined model, which makes reuse flexible.
  • The file size is small, allowing efficient management.
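If you intend to resume training rather than just run inference, it is common to save the optimizer state and epoch counter alongside the weights; a sketch (the checkpoint filename is arbitrary):
# Save a full training checkpoint
checkpoint = {
    'epoch': epoch,
    'model_state_dict': model.state_dict(),
    'optimizer_state_dict': optimizer.state_dict(),
}
torch.save(checkpoint, 'cnn_checkpoint.pth')

# Restore it later to continue training where you left off
checkpoint = torch.load('cnn_checkpoint.pth')
model.load_state_dict(checkpoint['model_state_dict'])
optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
start_epoch = checkpoint['epoch'] + 1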

2. How to Save the Entire Model

torch.save(model, 'cnn_complete_model.pth')
Advantages:
  • No need to reconstruct the model; it can be loaded directly, making it simple.
Disadvantages:
  • Loading depends on PyTorch version compatibility, and the model's class definition must be importable when the file is loaded.

Reloading Saved Models and Using Them for Inference

1. How to Load a Model from a State Dictionary

model = SimpleCNN()
model.load_state_dict(torch.load('cnn_model.pth'))
model.eval()

2. How to Load the Entire Model

model = torch.load('cnn_complete_model.pth')
model.eval()

3. Running Inference

import numpy as np
from PIL import Image
import torchvision.transforms as transforms

transform = transforms.Compose([
    transforms.Resize(224),
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,))
])

image = Image.open('sample_image.jpg').convert('RGB')  # ensure 3 channels
image = transform(image).unsqueeze(0)                  # add a batch dimension

output = model(image)
_, predicted = torch.max(output, 1)
print(f'Predicted class: {predicted.item()}')
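For CIFAR-10, the predicted index can be mapped to a human-readable name using the dataset's standard class ordering:
# CIFAR-10 class names, in the dataset's index order
classes = ['airplane', 'automobile', 'bird', 'cat', 'deer',
           'dog', 'frog', 'horse', 'ship', 'truck']
print(f'Predicted class: {classes[predicted.item()]}')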

Deploying Models on Cloud and Web Applications

1. API Deployment Using Flask

Installing Required Libraries:
pip install flask
Example Code:
from flask import Flask, request, jsonify
import torch
from torchvision import transforms
from PIL import Image

app = Flask(__name__)

model = torch.load('cnn_complete_model.pth')
model.eval()

def preprocess_image(image):
    transform = transforms.Compose([
        transforms.Resize(224),
        transforms.ToTensor(),
        transforms.Normalize((0.5,), (0.5,))
    ])
    image = transform(image).unsqueeze(0)
    return image

@app.route('/predict', methods=['POST'])
def predict():
    file = request.files['file']
    image = Image.open(file.stream).convert('RGB')
    image = preprocess_image(image)

    output = model(image)
    _, predicted = torch.max(output, 1)

    return jsonify({'prediction': predicted.item()})

if __name__ == '__main__':
    app.run(debug=True)
How to Use the API:
curl -X POST -F "file=@sample_image.jpg" http://127.0.0.1:5000/predict
Example Result:
{"prediction": 3}

Key Points for Model Deployment

  1. Lightweighting: Apply quantization and pruning to reduce model size (a quantization sketch follows this list).
  2. Cloud Integration: Leverage AWS Lambda and Google Cloud Functions to achieve scalable deployment.
  3. Real-time Processing: Use WebSocket to integrate real-time processing into applications.
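As a starting point for point 1, dynamic quantization of the fully connected layers is nearly a one-liner; a sketch (convolutional layers require static quantization with calibration, which is not shown here):
import torch
import torch.nn as nn

# Quantize Linear layers to int8 weights; activations are quantized on the fly
quantized_model = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8)
torch.save(quantized_model.state_dict(), 'cnn_model_quantized.pth')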

Conclusion

In this section, we explained how to save and deploy models using PyTorch. We covered both state dictionary and full model saving methods, and learned the steps for reuse and deployment via APIs.

8. Summary

Take the First Step in Machine Learning with PyTorch CNN!

In the previous sections, we covered building, training, and evaluating CNN models using PyTorch, as well as their applications and deployment. Below is a summary of the key points of this article.

1. Overview of PyTorch and CNN

  • CNNs are neural networks that excel at image recognition, and PyTorch is a framework well-suited for implementing them.
  • PyTorch is widely used in research and development thanks to its intuitive code syntax and GPU support.

2. Setting Up the Environment and Installation

  • Installing PyTorch is straightforward, and you can quickly set up an environment using Google Colab.
  • We also learned that dataset preparation and preprocessing can be efficiently handled with torchvision.

3. Building and Training CNN Models

  • We explained how to construct a model that combines convolutional, pooling, and fully connected layers, and how to set loss functions and optimizers for training.
  • By logging the training process and using evaluation metrics, we could effectively analyze model performance.

4. Use Cases and Transfer Learning

  • We introduced how to use pre‑trained models (e.g., ResNet18) for transfer learning to create high‑accuracy models with limited data and time.
  • We also confirmed that CNNs have a broad range of applications beyond image classification, such as object detection and style transfer.

5. Error Handling and Debugging Techniques

  • We presented common errors encountered during model building and how to address them.
  • We learned efficient debugging methods through visualization and logging with TensorBoard.

6. Saving and Deploying Models

  • We explained how to save and reuse trained models, as well as how to deploy them to web apps or APIs.
  • The simple API example using Flask can be readily applied to real projects.

Next Steps

1. Learning Advanced Models

  • Learn more advanced models (e.g., YOLO, Faster R-CNN) and tackle object detection and segmentation.

2. Hyperparameter Optimization

  • Try improving your model by adjusting learning rates, batch sizes, and adding dropout or regularization techniques.

3. Applying to Real‑World Projects

  • Working on projects with real image data (e.g., medical image analysis, face recognition systems) will strengthen your practical skills.

4. Leveraging Cloud Platforms

  • Use cloud services like AWS or GCP to build scalable applications.

5. Continuous Learning and Community Involvement

  • Collaborate with other developers on GitHub or Kaggle and keep learning the latest models and techniques.

Conclusion

PyTorch and CNNs are a powerful combination for machine learning and deep learning. Through this article, you should now understand the workflow from fundamentals to applications and have gained knowledge you can apply to your own projects and research. Going forward, build your own models based on what you learned here and venture into deeper application areas.