Neuroevolution for Machine Learning#
EvoX offers a neuroevolution-based solution for supervised learning tasks, built mainly around the SupervisedLearningProblem
and ParamsAndVector
modules. Taking the MNIST classification task as an example, this section uses these EvoX modules to illustrate the neuroevolution process for supervised learning.
Basic Setup#
Importing the basic components for the neuroevolution process and configuring the device are essential first steps.
Here, a random seed can optionally be set to make the results reproducible.
import torch
import torch.nn as nn
from evox.utils import ParamsAndVector
from evox.core import Algorithm, Mutable, Parameter, jit_class
from evox.problems.neuroevolution.supervised_learning import SupervisedLearningProblem
from evox.algorithms import PSO
from evox.workflows import EvalMonitor, StdWorkflow
# Set device
device = "cuda:0" if torch.cuda.is_available() else "cpu"
# Set random seed
seed = 0
torch.manual_seed(seed)
torch.cuda.manual_seed_all(seed)
torch.backends.cudnn.benchmark = False
torch.backends.cudnn.deterministic = True
In this step, a sample convolutional neural network (CNN) model is defined directly with PyTorch and then moved to the device.
class SampleCNN(nn.Module):
    def __init__(self):
        super(SampleCNN, self).__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 3, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(3, 3, kernel_size=3),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(3, 3, kernel_size=3),
            nn.ReLU(),
            nn.Conv2d(3, 3, kernel_size=3),
            nn.ReLU(),
        )
        self.classifier = nn.Sequential(nn.Flatten(), nn.Linear(12, 10))

    def forward(self, x):
        x = self.features(x)
        x = self.classifier(x)
        return x
model = SampleCNN().to(device)
total_params = sum(p.numel() for p in model.parameters())
print(f"Total number of model parameters: {total_params}")
Total number of model parameters: 412
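As a quick sanity check on this number: the first convolutional layer contributes 1·3·3·3 + 3 = 30 parameters, the other three convolutional layers contribute 3 × (3·3·3·3 + 3) = 252, and the final linear layer contributes 12·10 + 10 = 130, giving 30 + 252 + 130 = 412 in total.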
Setting up the dataset amounts to choosing the task. The data loaders are initialized here using PyTorch's built-in support, so the torchvision
package (matching your PyTorch version) must be installed in advance if it is not already.
If the MNIST dataset is not yet present in the data_root
directory, setting the download=True
flag ensures it will be downloaded automatically, so the first-time setup may take a while.
import os
import torchvision
data_root = "./data" # Choose a path to save dataset
os.makedirs(data_root, exist_ok=True)
train_dataset = torchvision.datasets.MNIST(
    root=data_root,
    train=True,
    download=True,
    transform=torchvision.transforms.ToTensor(),
)
test_dataset = torchvision.datasets.MNIST(
    root=data_root,
    train=False,
    download=True,
    transform=torchvision.transforms.ToTensor(),
)
BATCH_SIZE = 100
train_loader = torch.utils.data.DataLoader(
    train_dataset,
    batch_size=BATCH_SIZE,
    shuffle=True,
    collate_fn=None,
)
test_loader = torch.utils.data.DataLoader(
    test_dataset,
    batch_size=BATCH_SIZE,
    shuffle=False,
    collate_fn=None,
)
To speed up the subsequent steps, all MNIST data is preloaded for faster execution. Below, three loaders are preloaded for different stages: gradient-descent training, neuroevolution fine-tuning, and model testing.
Note that this is an optional step that trades memory for time. Whether to use it depends on your GPU capacity, and the preloading itself always takes some time.
# Used for gradient descent training process
pre_gd_train_loader = tuple([(inputs.to(device), labels.to(device)) for inputs, labels in train_loader])

# Used for neuroevolution fine-tuning process
pre_ne_train_loader = tuple(
    [
        (
            inputs.to(device),
            labels.type(torch.float).unsqueeze(1).repeat(1, 10).to(device),
        )
        for inputs, labels in train_loader
    ]
)

# Used for model testing process
pre_test_loader = tuple([(inputs.to(device), labels.to(device)) for inputs, labels in test_loader])
Here, a model_test
function is defined in advance to simplify evaluating the model's prediction accuracy on the test dataset in later stages (a brief usage example follows the definition).
def model_test(model: nn.Module, data_loader: torch.utils.data.DataLoader, device: torch.device) -> float:
    model.eval()
    with torch.no_grad():
        total = 0
        correct = 0
        for inputs, labels in data_loader:
            inputs: torch.Tensor = inputs.to(device=device, non_blocking=True)
            labels: torch.Tensor = labels.to(device=device, non_blocking=True)
            logits = model(inputs)
            _, predicted = torch.max(logits.data, dim=1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
    acc = 100 * correct / total
    return acc
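As an optional quick check (not part of the original flow), this helper can be called at any point to evaluate the current model, for example:
model_test(model, pre_test_loader, device)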
Gradient Descent Training (Optional)#
Gradient-descent-based model training is performed first. In this example it is used to initialize the model, preparing it for the subsequent neuroevolution process.
In EvoX, the PyTorch model training process is compatible with neuroevolution, which makes it convenient to reuse the same model implementation in the following steps.
def model_train(
    model: nn.Module,
    data_loader: torch.utils.data.DataLoader,
    criterion: nn.Module,
    optimizer: torch.optim.Optimizer,
    max_epoch: int,
    device: torch.device,
    print_frequent: int = -1,
) -> nn.Module:
    model.train()
    for epoch in range(max_epoch):
        running_loss = 0.0
        for step, (inputs, labels) in enumerate(data_loader, start=1):
            inputs: torch.Tensor = inputs.to(device=device, non_blocking=True)
            labels: torch.Tensor = labels.to(device=device, non_blocking=True)
            optimizer.zero_grad()
            logits = model(inputs)
            loss = criterion(logits, labels)
            loss.backward()
            optimizer.step()
            running_loss += loss.item()
            if print_frequent > 0 and step % print_frequent == 0:
                print(f"[Epoch {epoch:2d}, step {step:4d}] running loss: {running_loss:.4f} ")
                running_loss = 0.0
    return model

model_train(
    model,
    data_loader=pre_gd_train_loader,
    criterion=nn.CrossEntropyLoss(),
    optimizer=torch.optim.Adam(model.parameters(), lr=1e-2),
    max_epoch=3,
    device=device,
    print_frequent=500,
)
gd_acc = model_test(model, pre_test_loader, device)
print(f"Accuracy after gradient descent training: {gd_acc:.4f} %.")
[Epoch 0, step 500] running loss: 394.9020
[Epoch 1, step 500] running loss: 231.2396
[Epoch 2, step 500] running loss: 206.0878
Accuracy after gradient descent training: 89.1500 %.
Neuroevolution Fine-Tuning#
Starting from the model pre-trained in the preceding gradient-descent step, neuroevolution is applied to fine-tune it further.
First, the ParamsAndVector
component is used to flatten the pre-trained model's weights into a vector, which serves as the initial center individual for the subsequent neuroevolution process (a brief round-trip check follows the code below).
adapter = ParamsAndVector(dummy_model=model)
model_params = dict(model.named_parameters())
pop_center = adapter.to_vector(model_params)
lower_bound = pop_center - 0.01
upper_bound = pop_center + 0.01
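As a brief sanity check (a hypothetical snippet, not part of the original tutorial), the adapter's to_vector and to_params calls form a round trip, so the flattened center individual can be converted back into the model's parameter dictionary:
# Hypothetical round-trip check: to_params should invert to_vector.
restored_params = adapter.to_params(pop_center)
assert all(torch.allclose(model_params[k], restored_params[k]) for k in model_params)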
Note that algorithms designed specifically for neuroevolution can directly accept a dictionary of batched parameters as input, in which case using
ParamsAndVector
may be unnecessary.
In addition, a sample criterion is defined. Here, both the loss and the accuracy of each individual model are weighted and combined to serve as the fitness function in the neuroevolution process; a short usage check follows the code below. This step can be customized to suit the optimization objective.
class AccuracyCriterion(nn.Module):
    def __init__(self, data_loader):
        super().__init__()
        self.data_loader = data_loader

    def forward(self, logits, labels):
        _, predicted = torch.max(logits, dim=1)
        correct = (predicted == labels[:, 0]).sum()
        fitness = -correct
        return fitness

acc_criterion = AccuracyCriterion(pre_ne_train_loader)
loss_criterion = nn.MSELoss()

class WeightedCriterion(nn.Module):
    def __init__(self, loss_weight, loss_criterion, acc_weight, acc_criterion):
        super().__init__()
        self.loss_weight = loss_weight
        self.loss_criterion = loss_criterion
        self.acc_weight = acc_weight
        self.acc_criterion = acc_criterion

    def forward(self, logits, labels):
        weighted_loss = self.loss_weight * self.loss_criterion(logits, labels)
        weighted_acc = self.acc_weight * self.acc_criterion(logits, labels)
        return weighted_loss + weighted_acc

weighted_criterion = WeightedCriterion(
    loss_weight=0.5,
    loss_criterion=loss_criterion,
    acc_weight=0.5,
    acc_criterion=acc_criterion,
)
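To make the weighted fitness concrete, here is a minimal illustrative check (a hypothetical snippet, assuming the preloaded pre_ne_train_loader defined above); lower values correspond to fitter individuals:
# Evaluate the weighted criterion on a single preloaded batch.
sample_inputs, sample_labels = pre_ne_train_loader[0]
with torch.no_grad():
    print(weighted_criterion(model(sample_inputs), sample_labels))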
Meanwhile, similar to the gradient-descent training and model-testing processes, the neuroevolution fine-tuning process is also wrapped in a function for convenient reuse in later stages.
import time

def neuroevolution_process(
    workflow: StdWorkflow,
    adapter: ParamsAndVector,
    model: nn.Module,
    test_loader: torch.utils.data.DataLoader,
    device: torch.device,
    best_acc: float,
    max_generation: int = 2,
) -> None:
    for index in range(max_generation):
        print(f"In generation {index}:")
        t = time.time()
        workflow.step()
        print(f"\tTime elapsed: {time.time() - t: .4f}(s).")

        monitor = workflow.get_submodule("monitor")
        print(f"\tTop fitness: {monitor.topk_fitness}")
        best_params = adapter.to_params(monitor.topk_solutions[0])
        model.load_state_dict(best_params)
        acc = model_test(model, test_loader, device)
        if acc > best_acc:
            best_acc = acc
        print(f"\tBest accuracy: {best_acc:.4f} %.")
Population-Based Neuroevolution Test#
In this example, a population-based neuroevolution algorithm is tested first, with Particle Swarm Optimization (PSO) as the representative. Configuring neuroevolution is similar to other optimization tasks: we define the problem, algorithm, monitor, and workflow, and call their respective setup()
functions to complete initialization.
One key point to note is that the population size (POP_SIZE
in this case) needs to be set in both the problem and the algorithm to avoid potential errors.
POP_SIZE = 100
vmapped_problem = SupervisedLearningProblem(
    model=model,
    data_loader=pre_ne_train_loader,
    criterion=weighted_criterion,
    pop_size=POP_SIZE,
    device=device,
)
vmapped_problem.setup()

pop_algorithm = PSO(
    pop_size=POP_SIZE,
    lb=lower_bound,
    ub=upper_bound,
    device=device,
)
pop_algorithm.setup()

monitor = EvalMonitor(
    topk=3,
    device=device,
)
monitor.setup()

pop_workflow = StdWorkflow()
pop_workflow.setup(
    algorithm=pop_algorithm,
    problem=vmapped_problem,
    solution_transform=adapter,
    monitor=monitor,
    device=device,
)
print("Upon gradient descent, the population-based neuroevolution process start. ")
neuroevolution_process(
workflow=pop_workflow,
adapter=adapter,
model=model,
test_loader=pre_test_loader,
device=device,
best_acc=gd_acc,
max_generation=10,
)
Upon gradient descent, the population-based neuroevolution process starts.
In generation 0:
Time elapsed: 3.7248(s).
Top fitness: tensor([0.8886, 1.4018, 1.6117], device='cuda:0')
Best accuracy: 89.3000 %.
In generation 1:
Time elapsed: 4.2092(s).
Top fitness: tensor([0.4963, 0.6073, 0.6423], device='cuda:0')
Best accuracy: 89.3000 %.
In generation 2:
Time elapsed: 3.8874(s).
Top fitness: tensor([0.4963, 0.6073, 0.6423], device='cuda:0')
Best accuracy: 89.3000 %.
In generation 3:
Time elapsed: 3.5654(s).
Top fitness: tensor([0.4963, 0.6073, 0.6423], device='cuda:0')
Best accuracy: 89.3000 %.
In generation 4:
Time elapsed: 3.3940(s).
Top fitness: tensor([0.4963, 0.6073, 0.6423], device='cuda:0')
Best accuracy: 89.3000 %.
In generation 5:
Time elapsed: 3.4152(s).
Top fitness: tensor([0.4963, 0.6073, 0.6423], device='cuda:0')
Best accuracy: 89.3000 %.
In generation 6:
Time elapsed: 3.2818(s).
Top fitness: tensor([0.4963, 0.6073, 0.6423], device='cuda:0')
Best accuracy: 89.3000 %.
In generation 7:
Time elapsed: 3.0669(s).
Top fitness: tensor([0.4963, 0.6073, 0.6423], device='cuda:0')
Best accuracy: 89.3000 %.
In generation 8:
Time elapsed: 3.1275(s).
Top fitness: tensor([0.4963, 0.6073, 0.6423], device='cuda:0')
Best accuracy: 89.3000 %.
In generation 9:
Time elapsed: 3.1362(s).
Top fitness: tensor([0.4963, 0.6073, 0.6423], device='cuda:0')
Best accuracy: 89.3000 %.
pop_workflow.get_submodule("monitor").plot()