Neuroevolution for machine learning#

EvoX provides support for neuroevolution-based supervised learning, with key modules including SupervisedLearningProblem and ParamsAndVector. Taking the MNIST classification task as an example, this section illustrates the neuroevolution process for supervised learning using these EvoX modules.

Basic Setup#

Basic component imports and device configuration serve as the essential starting steps for the neuroevolution process.

Here, a random seed can optionally be set to ensure reproducibility of the results.

import torch
import torch.nn as nn

from evox.utils import ParamsAndVector
from evox.core import Algorithm, Mutable, Parameter, jit_class
from evox.problems.neuroevolution.supervised_learning import SupervisedLearningProblem
from evox.algorithms import PSO
from evox.workflows import EvalMonitor, StdWorkflow


# Set device
device = "cuda:0" if torch.cuda.is_available() else "cpu"

# Set random seed
seed = 0
torch.manual_seed(seed)
torch.cuda.manual_seed_all(seed)
torch.backends.cudnn.benchmark = False
torch.backends.cudnn.deterministic = True

In this step, a sample convolutional neural network (CNN) model is defined directly with the PyTorch framework and then moved to the device.

class SampleCNN(nn.Module):
    def __init__(self):
        super(SampleCNN, self).__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 3, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(3, 3, kernel_size=3),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(3, 3, kernel_size=3),
            nn.ReLU(),
            nn.Conv2d(3, 3, kernel_size=3),
            nn.ReLU(),
        )
        self.classifier = nn.Sequential(nn.Flatten(), nn.Linear(12, 10))

    def forward(self, x):
        x = self.features(x)
        x = self.classifier(x)
        return x


model = SampleCNN().to(device)

total_params = sum(p.numel() for p in model.parameters())
print(f"Total number of model parameters: {total_params}")
Total number of model parameters: 412
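
As a quick, optional sanity check (not part of the original tutorial), a dummy input with the MNIST image shape can be passed through the model to confirm that it produces 10 class logits; the dummy tensor below is purely illustrative.

# Optional shape check (illustrative): a single 28x28 grayscale image should
# be mapped to 10 class logits.
with torch.no_grad():
    dummy_input = torch.randn(1, 1, 28, 28, device=device)
    print(model(dummy_input).shape)  # expected: torch.Size([1, 10])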

Choosing the dataset determines the task. The data loaders are now initialized using PyTorch’s built-in support. The torchvision package, in a version compatible with your PyTorch installation, must be installed in advance if it is not already available.

If the MNIST dataset is not already present in the data_root directory, the download=True flag ensures that it will be downloaded automatically, so the setup may take some time during the first run.

import os
import torchvision


data_root = "./data"  # Choose a path to save dataset
os.makedirs(data_root, exist_ok=True)
train_dataset = torchvision.datasets.MNIST(
    root=data_root,
    train=True,
    download=True,
    transform=torchvision.transforms.ToTensor(),
)
test_dataset = torchvision.datasets.MNIST(
    root=data_root,
    train=False,
    download=True,
    transform=torchvision.transforms.ToTensor(),
)


BATCH_SIZE = 100
train_loader = torch.utils.data.DataLoader(
    train_dataset,
    batch_size=BATCH_SIZE,
    shuffle=True,
    collate_fn=None,
)
test_loader = torch.utils.data.DataLoader(
    test_dataset,
    batch_size=BATCH_SIZE,
    shuffle=False,
    collate_fn=None,
)

To accelerate subsequent processes, all MNIST data are pre-loaded onto the device for faster execution. Below, three data loaders are pre-built for different stages – gradient descent training, neuroevolution fine-tuning, and model testing.

Note that this is an optional step that trades memory for speed. Whether to adopt it depends on your GPU memory capacity, and the preparation itself takes some time.

# Used for gradient descent training process
pre_gd_train_loader = tuple([(inputs.to(device), labels.to(device)) for inputs, labels in train_loader])

# Used for neuroevolution fine-tuning process
pre_ne_train_loader = tuple(
    [
        (
            inputs.to(device),
            labels.type(torch.float).unsqueeze(1).repeat(1, 10).to(device),
        )
        for inputs, labels in train_loader
    ]
)

# Used for model testing process
pre_test_loader = tuple([(inputs.to(device), labels.to(device)) for inputs, labels in test_loader])

Here, a model_test function is pre-defined to simplify the evaluation of the model’s prediction accuracy on the test dataset during subsequent stages.

def model_test(model: nn.Module, data_loader: torch.utils.data.DataLoader, device: torch.device) -> float:
    model.eval()
    with torch.no_grad():
        total = 0
        correct = 0
        for inputs, labels in data_loader:
            inputs: torch.Tensor = inputs.to(device=device, non_blocking=True)
            labels: torch.Tensor = labels.to(device=device, non_blocking=True)

            logits = model(inputs)
            _, predicted = torch.max(logits.data, dim=1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
        acc = 100 * correct / total
    return acc
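
Optionally, model_test can be called on the untrained model as a baseline; on MNIST, an untrained classifier should score around chance level (roughly 10 %). This check is purely illustrative and not part of the original workflow.

# Optional baseline check (illustrative): an untrained model scores near chance.
baseline_acc = model_test(model, pre_test_loader, device)
print(f"Accuracy before training: {baseline_acc:.4f} %.")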

Gradient Descent Training (Optional)#

Gradient-descent-based model training is performed first. In this example, it is used to initialize the model, preparing it for the subsequent neuroevolution process.

The model training process in PyTorch is compatible with neuroevolution in EvoX, making it convenient to reuse the same model implementation for further steps.

def model_train(
    model: nn.Module,
    data_loader: torch.utils.data.DataLoader,
    criterion: nn.Module,
    optimizer: torch.optim.Optimizer,
    max_epoch: int,
    device: torch.device,
    print_frequent: int = -1,
) -> nn.Module:
    model.train()
    for epoch in range(max_epoch):
        running_loss = 0.0
        for step, (inputs, labels) in enumerate(data_loader, start=1):
            inputs: torch.Tensor = inputs.to(device=device, non_blocking=True)
            labels: torch.Tensor = labels.to(device=device, non_blocking=True)

            optimizer.zero_grad()
            logits = model(inputs)
            loss = criterion(logits, labels)
            loss.backward()
            optimizer.step()

            running_loss += loss.item()
            if print_frequent > 0 and step % print_frequent == 0:
                print(f"[Epoch {epoch:2d}, step {step:4d}] running loss: {running_loss:.4f} ")
                running_loss = 0.0
    return model


model_train(
    model,
    data_loader=pre_gd_train_loader,
    criterion=nn.CrossEntropyLoss(),
    optimizer=torch.optim.Adam(model.parameters(), lr=1e-2),
    max_epoch=3,
    device=device,
    print_frequent=500,
)

gd_acc = model_test(model, pre_test_loader, device)
print(f"Accuracy after gradient descent training: {gd_acc:.4f} %.")
[Epoch  0, step  500] running loss: 394.9020 
[Epoch  1, step  500] running loss: 231.2396 
[Epoch  2, step  500] running loss: 206.0878 
Accuracy after gradient descent training: 89.1500 %.

Neuroevolution Fine-Tuning#

Based on the pre-trained model from the previous gradient descent process, neuroevolution is progressively applied to fine-tune the model.

First, the ParamsAndVector component is used to flatten the weights of the pre-trained model into a vector, which serves as the initial center individual for the subsequent neuroevolution process.

adapter = ParamsAndVector(dummy_model=model)
model_params = dict(model.named_parameters())
pop_center = adapter.to_vector(model_params)
lower_bound = pop_center - 0.01
upper_bound = pop_center + 0.01
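
As a quick check (optional and illustrative), the flattened vector can be mapped back to a parameter dictionary with to_params; the restored tensors should match the shapes of the model's own parameters.

# Optional round-trip check (illustrative): to_params should invert to_vector.
restored_params = adapter.to_params(pop_center)
for name, param in model.named_parameters():
    assert restored_params[name].shape == param.shape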

For algorithms specifically designed for neuroevolution, which can directly accept a dictionary of batched parameters as input, ParamsAndVector may be unnecessary.
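
For illustration only, and assuming such an algorithm expects each parameter tensor to carry an extra leading population dimension, a batched parameter dictionary could be sketched as follows; the exact format depends on the specific algorithm.

# Illustrative sketch (assumption): each parameter tensor gains a leading
# population dimension, so a population of 4 identical networks is one dict.
example_pop_size = 4
batched_params = {
    name: param.detach().unsqueeze(0).expand(example_pop_size, *param.shape).clone()
    for name, param in model.named_parameters()
}
print({name: tuple(t.shape) for name, t in batched_params.items()})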

Additionally, a sample criterion is defined. Here, both the loss and the accuracy of each individual model are weighted and combined to serve as the fitness function in the neuroevolution process. This step can be customized to suit the desired optimization objective.

class AccuracyCriterion(nn.Module):
    def __init__(self, data_loader):
        super().__init__()
        self.data_loader = data_loader

    def forward(self, logits, labels):
        # The first column of the repeated labels holds the original class index.
        _, predicted = torch.max(logits, dim=1)
        correct = (predicted == labels[:, 0]).sum()
        fitness = -correct
        return fitness


acc_criterion = AccuracyCriterion(pre_ne_train_loader)
loss_criterion = nn.MSELoss()


class WeightedCriterion(nn.Module):
    def __init__(self, loss_weight, loss_criterion, acc_weight, acc_criterion):
        super().__init__()
        self.loss_weight = loss_weight
        self.loss_criterion = loss_criterion
        self.acc_weight = acc_weight
        self.acc_criterion = acc_criterion

    def forward(self, logits, labels):
        weighted_loss = self.loss_weight * self.loss_criterion(logits, labels)
        weighted_acc = self.acc_weight * self.acc_criterion(logits, labels)
        return weighted_loss + weighted_acc


weighted_criterion = WeightedCriterion(
    loss_weight=0.5,
    loss_criterion=loss_criterion,
    acc_weight=0.5,
    acc_criterion=acc_criterion,
)
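
As an optional sanity check (illustrative, not part of the original workflow), the weighted criterion can be evaluated on a single pre-loaded batch with the current model to confirm that it returns a scalar fitness value; in this setup, lower fitness indicates a fitter individual.

# Optional check (illustrative): fitness of the pre-trained model on one batch.
with torch.no_grad():
    sample_inputs, sample_labels = pre_ne_train_loader[0]
    sample_fitness = weighted_criterion(model(sample_inputs), sample_labels)
print(f"Fitness on one batch: {sample_fitness.item():.4f}")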

Similar to the gradient descent training and model testing processes, the neuroevolution fine-tuning process is likewise encapsulated into a function for convenient reuse in subsequent stages.

import time


def neuroevolution_process(
    workflow: StdWorkflow,
    adapter: ParamsAndVector,
    model: nn.Module,
    test_loader: torch.utils.data.DataLoader,
    device: torch.device,
    best_acc: float,
    max_generation: int = 2,
) -> None:
    for index in range(max_generation):
        print(f"In generation {index}:")
        t = time.time()
        workflow.step()
        print(f"\tTime elapsed: {time.time() - t: .4f}(s).")

        monitor = workflow.get_submodule("monitor")
        print(f"\tTop fitness: {monitor.topk_fitness}")
        best_params = adapter.to_params(monitor.topk_solutions[0])
        model.load_state_dict(best_params)
        acc = model_test(model, test_loader, device)
        if acc > best_acc:
            best_acc = acc
        print(f"\tBest accuracy: {best_acc:.4f} %.")

Population-Based Neuroevolution Test#

In this example, a population-based algorithm for neuroevolution is tested first, using Particle Swarm Optimization (PSO) as a representative. The configuration for neuroevolution is similar to that of other optimization tasks – we need to define the problem, algorithm, monitor, and workflow, and call their respective setup() functions to complete the initialization.

A key point to note here is that the population size (POP_SIZE in this case) needs to be initialized in both the problem and the algorithm to avoid potential errors.

POP_SIZE = 100
vmapped_problem = SupervisedLearningProblem(
    model=model,
    data_loader=pre_ne_train_loader,
    criterion=weighted_criterion,
    pop_size=POP_SIZE,
    device=device,
)
vmapped_problem.setup()

pop_algorithm = PSO(
    pop_size=POP_SIZE,
    lb=lower_bound,
    ub=upper_bound,
    device=device,
)
pop_algorithm.setup()

monitor = EvalMonitor(
    topk=3,
    device=device,
)
monitor.setup()

pop_workflow = StdWorkflow()
pop_workflow.setup(
    algorithm=pop_algorithm,
    problem=vmapped_problem,
    solution_transform=adapter,
    monitor=monitor,
    device=device,
)
print("Upon gradient descent, the population-based neuroevolution process start. ")
neuroevolution_process(
    workflow=pop_workflow,
    adapter=adapter,
    model=model,
    test_loader=pre_test_loader,
    device=device,
    best_acc=gd_acc,
    max_generation=10,
)
Upon gradient descent, the population-based neuroevolution process starts.
In generation 0:
	Time elapsed:  3.7248(s).
	Top fitness: tensor([0.8886, 1.4018, 1.6117], device='cuda:0')
	Best accuracy: 89.3000 %.
In generation 1:
	Time elapsed:  4.2092(s).
	Top fitness: tensor([0.4963, 0.6073, 0.6423], device='cuda:0')
	Best accuracy: 89.3000 %.
In generation 2:
	Time elapsed:  3.8874(s).
	Top fitness: tensor([0.4963, 0.6073, 0.6423], device='cuda:0')
	Best accuracy: 89.3000 %.
In generation 3:
	Time elapsed:  3.5654(s).
	Top fitness: tensor([0.4963, 0.6073, 0.6423], device='cuda:0')
	Best accuracy: 89.3000 %.
In generation 4:
	Time elapsed:  3.3940(s).
	Top fitness: tensor([0.4963, 0.6073, 0.6423], device='cuda:0')
	Best accuracy: 89.3000 %.
In generation 5:
	Time elapsed:  3.4152(s).
	Top fitness: tensor([0.4963, 0.6073, 0.6423], device='cuda:0')
	Best accuracy: 89.3000 %.
In generation 6:
	Time elapsed:  3.2818(s).
	Top fitness: tensor([0.4963, 0.6073, 0.6423], device='cuda:0')
	Best accuracy: 89.3000 %.
In generation 7:
	Time elapsed:  3.0669(s).
	Top fitness: tensor([0.4963, 0.6073, 0.6423], device='cuda:0')
	Best accuracy: 89.3000 %.
In generation 8:
	Time elapsed:  3.1275(s).
	Top fitness: tensor([0.4963, 0.6073, 0.6423], device='cuda:0')
	Best accuracy: 89.3000 %.
In generation 9:
	Time elapsed:  3.1362(s).
	Top fitness: tensor([0.4963, 0.6073, 0.6423], device='cuda:0')
	Best accuracy: 89.3000 %.
pop_workflow.get_submodule("monitor").plot()