Neuroevolution for machine learning#
EvoX provides solutions for supervised learning tasks based on neuroevolution, with key modules including SupervisedLearningProblem and ParamsAndVector. Taking the MNIST classification task as an example, this section illustrates the neuroevolution process for supervised learning using these EvoX modules.
Basic Setup#
Basic component imports and device configuration serve as the essential starting steps for the neuroevolution process.
Here, to ensure the reproducibility of results, a random seed can be optionally set.
import torch
import torch.nn as nn
from evox.utils import ParamsAndVector
from evox.core import Algorithm, Mutable, Parameter, jit_class
from evox.problems.neuroevolution.supervised_learning import SupervisedLearningProblem
from evox.algorithms import PSO
from evox.workflows import EvalMonitor, StdWorkflow
# Set device
device = "cuda:0" if torch.cuda.is_available() else "cpu"
# Set random seed
seed = 0
torch.manual_seed(seed)
torch.cuda.manual_seed_all(seed)
torch.backends.cudnn.benchmark = False
torch.backends.cudnn.deterministic = True
In this step, a sample convolutional neural network (CNN) model is defined directly with PyTorch and then loaded onto the device.
class SampleCNN(nn.Module):
    def __init__(self):
        super(SampleCNN, self).__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 3, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(3, 3, kernel_size=3),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(3, 3, kernel_size=3),
            nn.ReLU(),
            nn.Conv2d(3, 3, kernel_size=3),
            nn.ReLU(),
        )
        self.classifier = nn.Sequential(nn.Flatten(), nn.Linear(12, 10))

    def forward(self, x):
        x = self.features(x)
        x = self.classifier(x)
        return x
model = SampleCNN().to(device)
total_params = sum(p.numel() for p in model.parameters())
print(f"Total number of model parameters: {total_params}")
Total number of model parameters: 412
Choosing the dataset determines the task. Here, the data loaders are initialized with PyTorch's built-in support. The torchvision package, matching your PyTorch version, must be installed in advance if it is not already available.
If the MNIST dataset is not present in the data_root directory, the download=True flag ensures it is downloaded automatically, so the setup may take some time during the first run.
import os
import torchvision
data_root = "./data" # Choose a path to save dataset
os.makedirs(data_root, exist_ok=True)
train_dataset = torchvision.datasets.MNIST(
    root=data_root,
    train=True,
    download=True,
    transform=torchvision.transforms.ToTensor(),
)
test_dataset = torchvision.datasets.MNIST(
    root=data_root,
    train=False,
    download=True,
    transform=torchvision.transforms.ToTensor(),
)

BATCH_SIZE = 100

train_loader = torch.utils.data.DataLoader(
    train_dataset,
    batch_size=BATCH_SIZE,
    shuffle=True,
    collate_fn=None,
)
test_loader = torch.utils.data.DataLoader(
    test_dataset,
    batch_size=BATCH_SIZE,
    shuffle=False,
    collate_fn=None,
)
To accelerate the subsequent steps, all MNIST data are pre-loaded onto the device. Below, three pre-loaded loaders are prepared for different stages – gradient descent training, neuroevolution fine-tuning, and model testing.
Note that this is an optional step that trades memory for speed: whether to adopt it depends on your GPU capacity, and the preparation itself takes some time.
# Used for the gradient descent training process
pre_gd_train_loader = tuple([(inputs.to(device), labels.to(device)) for inputs, labels in train_loader])

# Used for the neuroevolution fine-tuning process.
# The integer labels are cast to float and broadcast to shape [batch_size, 10],
# matching the 10-dimensional logits expected by the MSE criterion defined later.
pre_ne_train_loader = tuple(
    [
        (
            inputs.to(device),
            labels.type(torch.float).unsqueeze(1).repeat(1, 10).to(device),
        )
        for inputs, labels in train_loader
    ]
)

# Used for the model testing process
pre_test_loader = tuple([(inputs.to(device), labels.to(device)) for inputs, labels in test_loader])
Here, a model_test function is pre-defined to simplify the evaluation of the model's prediction accuracy on the test dataset during subsequent stages.
def model_test(model: nn.Module, data_loader: torch.utils.data.DataLoader, device: torch.device) -> float:
    model.eval()
    with torch.no_grad():
        total = 0
        correct = 0
        for inputs, labels in data_loader:
            inputs: torch.Tensor = inputs.to(device=device, non_blocking=True)
            labels: torch.Tensor = labels.to(device=device, non_blocking=True)

            logits = model(inputs)
            _, predicted = torch.max(logits.data, dim=1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
    acc = 100 * correct / total
    return acc
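As an optional sanity check (not part of the original run; init_acc is just an illustrative name), model_test can be applied to the freshly initialized model, where an accuracy of roughly 10 % is expected for random predictions over the 10 MNIST classes.
# Optional: baseline accuracy of the untrained model on the pre-loaded test set
# (about 10 % is expected for random guesses over 10 classes).
init_acc = model_test(model, pre_test_loader, device)
print(f"Accuracy before training: {init_acc:.4f} %.")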
Gradient Descent Training (Optional)#
Gradient-descent-based model training is performed first. In this example, it is used to initialize the model, preparing it for the subsequent neuroevolution process.
The PyTorch model training process is fully compatible with neuroevolution in EvoX, making it convenient to reuse the same model implementation in the later steps.
def model_train(
    model: nn.Module,
    data_loader: torch.utils.data.DataLoader,
    criterion: nn.Module,
    optimizer: torch.optim.Optimizer,
    max_epoch: int,
    device: torch.device,
    print_frequent: int = -1,
) -> nn.Module:
    model.train()
    for epoch in range(max_epoch):
        running_loss = 0.0
        for step, (inputs, labels) in enumerate(data_loader, start=1):
            inputs: torch.Tensor = inputs.to(device=device, non_blocking=True)
            labels: torch.Tensor = labels.to(device=device, non_blocking=True)

            optimizer.zero_grad()
            logits = model(inputs)
            loss = criterion(logits, labels)
            loss.backward()
            optimizer.step()

            running_loss += loss.item()
            if print_frequent > 0 and step % print_frequent == 0:
                print(f"[Epoch {epoch:2d}, step {step:4d}] running loss: {running_loss:.4f} ")
                running_loss = 0.0
    return model

model_train(
    model,
    data_loader=pre_gd_train_loader,
    criterion=nn.CrossEntropyLoss(),
    optimizer=torch.optim.Adam(model.parameters(), lr=1e-2),
    max_epoch=3,
    device=device,
    print_frequent=500,
)
gd_acc = model_test(model, pre_test_loader, device)
print(f"Accuracy after gradient descent training: {gd_acc:.4f} %.")
[Epoch 0, step 500] running loss: 394.9020
[Epoch 1, step 500] running loss: 231.2396
[Epoch 2, step 500] running loss: 206.0878
Accuracy after gradient descent training: 89.1500 %.
Neuroevolution Fine-Tuning#
Starting from the model pre-trained by gradient descent, neuroevolution is applied to progressively fine-tune it.
First, the ParamsAndVector component is used to flatten the weights of the pre-trained model into a vector, which serves as the initial center individual for the subsequent neuroevolution process.
adapter = ParamsAndVector(dummy_model=model)
model_params = dict(model.named_parameters())
pop_center = adapter.to_vector(model_params)
lower_bound = pop_center - 0.01
upper_bound = pop_center + 0.01
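As an optional round-trip check (a small sketch using only adapter.to_params, model.load_state_dict, and the model_test helper defined above; restored_params is an illustrative name), the flattened vector can be converted back into a parameter dictionary and reloaded, which should reproduce the gradient descent accuracy.
# Optional: verify that to_params inverts to_vector by reloading the
# reconstructed parameters; the test accuracy should be unchanged.
restored_params = adapter.to_params(pop_center)
model.load_state_dict(restored_params)
print(f"Accuracy after parameter round-trip: {model_test(model, pre_test_loader, device):.4f} %.")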
For algorithms specifically designed for neuroevolution, which can directly accept a dictionary of batched parameters as input, ParamsAndVector may be unnecessary.
Additionally, a sample criterion is defined. Here, both the loss and the accuracy of each individual model are weighted and combined to serve as the fitness function in the neuroevolution process. This step can be customized to suit the desired optimization objective.
class AccuracyCriterion(nn.Module):
    def __init__(self, data_loader):
        super().__init__()
        self.data_loader = data_loader

    def forward(self, logits, labels):
        _, predicted = torch.max(logits, dim=1)
        correct = (predicted == labels[:, 0]).sum()
        fitness = -correct
        return fitness

acc_criterion = AccuracyCriterion(pre_ne_train_loader)
loss_criterion = nn.MSELoss()

class WeightedCriterion(nn.Module):
    def __init__(self, loss_weight, loss_criterion, acc_weight, acc_criterion):
        super().__init__()
        self.loss_weight = loss_weight
        self.loss_criterion = loss_criterion
        self.acc_weight = acc_weight
        self.acc_criterion = acc_criterion

    def forward(self, logits, labels):
        weighted_loss = self.loss_weight * self.loss_criterion(logits, labels)
        weighted_acc = self.acc_weight * self.acc_criterion(logits, labels)
        return weighted_loss + weighted_acc

weighted_criterion = WeightedCriterion(
    loss_weight=0.5,
    loss_criterion=loss_criterion,
    acc_weight=0.5,
    acc_criterion=acc_criterion,
)
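Before launching the evolution, the combined criterion can optionally be probed on a single pre-loaded batch with the current model (a small sketch; batch_fitness is an illustrative name). Since the accuracy term in this criterion is the negated number of correct predictions, lower values correspond to more accurate models on the batch.
# Optional: evaluate the weighted criterion on one pre-loaded batch
# using the current (pre-trained) model.
sample_inputs, sample_labels = pre_ne_train_loader[0]
with torch.no_grad():
    batch_fitness = weighted_criterion(model(sample_inputs), sample_labels)
print(f"Weighted fitness of the current model on one batch: {batch_fitness.item():.4f}")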
Similar to the gradient descent training and model testing processes, the neuroevolution fine-tuning process is also encapsulated in a function for convenient reuse in subsequent stages.
import time

def neuroevolution_process(
    workflow: StdWorkflow,
    adapter: ParamsAndVector,
    model: nn.Module,
    test_loader: torch.utils.data.DataLoader,
    device: torch.device,
    best_acc: float,
    max_generation: int = 2,
) -> None:
    for index in range(max_generation):
        print(f"In generation {index}:")
        t = time.time()
        workflow.step()
        print(f"\tTime elapsed: {time.time() - t: .4f}(s).")

        monitor = workflow.get_submodule("monitor")
        print(f"\tTop fitness: {monitor.topk_fitness}")
        best_params = adapter.to_params(monitor.topk_solutions[0])
        model.load_state_dict(best_params)
        acc = model_test(model, test_loader, device)
        if acc > best_acc:
            best_acc = acc
        print(f"\tBest accuracy: {best_acc:.4f} %.")
Population-Based Neuroevolution Test#
In this example, the population-based approach to neuroevolution is tested first, using Particle Swarm Optimization (PSO) as a representative. The configuration for neuroevolution is similar to that of other optimization tasks – we need to define the problem, algorithm, monitor, and workflow, and call their respective setup() functions to complete the initialization.
A key point to note here is that the population size (POP_SIZE in this case) needs to be passed to both the problem and the algorithm to avoid potential errors.
POP_SIZE = 100

vmapped_problem = SupervisedLearningProblem(
    model=model,
    data_loader=pre_ne_train_loader,
    criterion=weighted_criterion,
    pop_size=POP_SIZE,
    device=device,
)
vmapped_problem.setup()

pop_algorithm = PSO(
    pop_size=POP_SIZE,
    lb=lower_bound,
    ub=upper_bound,
    device=device,
)
pop_algorithm.setup()

monitor = EvalMonitor(
    topk=3,
    device=device,
)
monitor.setup()

pop_workflow = StdWorkflow()
pop_workflow.setup(
    algorithm=pop_algorithm,
    problem=vmapped_problem,
    solution_transform=adapter,
    monitor=monitor,
    device=device,
)
print("Upon gradient descent, the population-based neuroevolution process start. ")
neuroevolution_process(
workflow=pop_workflow,
adapter=adapter,
model=model,
test_loader=pre_test_loader,
device=device,
best_acc=gd_acc,
max_generation=10,
)
Upon gradient descent, the population-based neuroevolution process starts.
In generation 0:
Time elapsed: 3.7248(s).
Top fitness: tensor([0.8886, 1.4018, 1.6117], device='cuda:0')
Best accuracy: 89.3000 %.
In generation 1:
Time elapsed: 4.2092(s).
Top fitness: tensor([0.4963, 0.6073, 0.6423], device='cuda:0')
Best accuracy: 89.3000 %.
In generation 2:
Time elapsed: 3.8874(s).
Top fitness: tensor([0.4963, 0.6073, 0.6423], device='cuda:0')
Best accuracy: 89.3000 %.
In generation 3:
Time elapsed: 3.5654(s).
Top fitness: tensor([0.4963, 0.6073, 0.6423], device='cuda:0')
Best accuracy: 89.3000 %.
In generation 4:
Time elapsed: 3.3940(s).
Top fitness: tensor([0.4963, 0.6073, 0.6423], device='cuda:0')
Best accuracy: 89.3000 %.
In generation 5:
Time elapsed: 3.4152(s).
Top fitness: tensor([0.4963, 0.6073, 0.6423], device='cuda:0')
Best accuracy: 89.3000 %.
In generation 6:
Time elapsed: 3.2818(s).
Top fitness: tensor([0.4963, 0.6073, 0.6423], device='cuda:0')
Best accuracy: 89.3000 %.
In generation 7:
Time elapsed: 3.0669(s).
Top fitness: tensor([0.4963, 0.6073, 0.6423], device='cuda:0')
Best accuracy: 89.3000 %.
In generation 8:
Time elapsed: 3.1275(s).
Top fitness: tensor([0.4963, 0.6073, 0.6423], device='cuda:0')
Best accuracy: 89.3000 %.
In generation 9:
Time elapsed: 3.1362(s).
Top fitness: tensor([0.4963, 0.6073, 0.6423], device='cuda:0')
Best accuracy: 89.3000 %.
pop_workflow.get_submodule("monitor").plot()