evox.problems.neuroevolution.brax#

Module Contents#

Classes#

BraxProblem

A wrapper for Brax problems.

Functions#

Data#

API#

evox.problems.neuroevolution.brax.__all__#

['BraxProblem']

evox.problems.neuroevolution.brax.to_jax_array(x: torch.Tensor) -> jax.Array#
evox.problems.neuroevolution.brax.from_jax_array(x: jax.Array, device: Optional[torch.device] = None) -> torch.Tensor#
evox.problems.neuroevolution.brax.__brax_data__: Dict[int, Tuple[Callable[[jax.Array], brax.envs.State], Callable[[brax.envs.State, jax.Array], brax.envs.State], Callable[[Dict[str, torch.Tensor], torch.Tensor], Tuple[Dict[str, torch.Tensor], torch.Tensor]], List[str]]]#


evox.problems.neuroevolution.brax._evaluate_brax_main(env_id: int, pop_size: int, rotate_key: bool, num_episodes: int, max_episode_length: int, key: torch.Tensor, model_state: List[torch.Tensor]) -> Tuple[torch.Tensor, List[torch.Tensor], torch.Tensor]#
evox.problems.neuroevolution.brax._evaluate_brax(env_id: int, pop_size: int, rotate_key: bool, num_episodes: int, max_episode_length: int, key: torch.Tensor, model_state: List[torch.Tensor]) -> Tuple[torch.Tensor, List[torch.Tensor], torch.Tensor]#
evox.problems.neuroevolution.brax._fake_evaluate_brax(env_id: int, pop_size: int, rotate_key: bool, num_episodes: int, max_episode_length: int, key: torch.Tensor, model_state: List[torch.Tensor]) -> Tuple[torch.Tensor, List[torch.Tensor], torch.Tensor]#
evox.problems.neuroevolution.brax._evaluate_brax_vmap_main(batch_size: int, in_dim: List[int], env_id: int, pop_size: int, rotate_key: bool, num_episodes: int, max_episode_length: int, key: torch.Tensor, model_state: List[torch.Tensor]) -> Tuple[torch.Tensor, List[torch.Tensor], torch.Tensor]#
evox.problems.neuroevolution.brax._evaluate_brax_vmap(vmap_info: evox.utils.VmapInfo, in_dims: Tuple[int | None | List[int], ...], env_id: int, pop_size: int, rotate_key: bool, num_episodes: int, max_episode_length: int, key: torch.Tensor, model_state: List[torch.Tensor]) -> Tuple[Tuple[torch.Tensor, List[torch.Tensor], torch.Tensor], Tuple[int | None, List[int], int]]#
evox.problems.neuroevolution.brax._fake_evaluate_brax_vmap(batch_size: int, in_dim: List[int], env_id: int, pop_size: int, rotate_key: bool, num_episodes: int, max_episode_length: int, key: torch.Tensor, model_state: List[torch.Tensor]) -> Tuple[torch.Tensor, List[torch.Tensor], torch.Tensor]#
class evox.problems.neuroevolution.brax.BraxProblem(policy: torch.nn.Module, env_name: str, max_episode_length: int, num_episodes: int, seed: int = None, pop_size: int | None = None, rotate_key: bool = True, reduce_fn: Callable[[torch.Tensor, int], torch.Tensor] = torch.mean, backend: str | None = None, device: torch.device | None = None)#

Bases: evox.core.Problem

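A wrapper for Brax problems.

Initialization

Construct a Brax-based problem. First, define a policy model. Then set the environment name (see https://github.com/google/brax/tree/main/brax/envs), the maximum episode length, and the number of episodes used to evaluate each individual. For each individual, the problem runs the policy in the environment num_episodes times, each time with a different seed, and reduces the resulting rewards with reduce_fn (the mean by default). Within each iteration, all individuals share the same set of random keys.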
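The evaluation scheme described above can be sketched in plain Python. This is an illustration only, not the EvoX implementation: `toy_episode`, `evaluate_population`, and the integer episode seeds are hypothetical stand-ins for a Brax rollout and its PRNG keys, and `statistics.mean` plays the role of the default reduce_fn.

```python
import random
import statistics
from typing import Callable, List, Sequence


def toy_episode(weight: float, seed: int, max_episode_length: int) -> float:
    """Hypothetical stand-in for one Brax rollout; returns the total reward."""
    rng = random.Random(seed)  # the seed fixes the noise of this episode
    total = 0.0
    for _ in range(max_episode_length):
        total += weight + rng.uniform(-0.1, 0.1)
    return total


def evaluate_population(
    weights: Sequence[float],
    episode_seeds: Sequence[int],
    max_episode_length: int,
    reduce_fn: Callable[[List[float]], float] = statistics.mean,
) -> List[float]:
    """Run every individual once per seed, then reduce its episode rewards."""
    fitness = []
    for w in weights:
        # Every individual sees the same seeds, so all face the same noise.
        rewards = [toy_episode(w, s, max_episode_length) for s in episode_seeds]
        fitness.append(reduce_fn(rewards))
    return fitness


population = [0.0, 0.5, 1.0]
seeds = [11, 22, 33]  # num_episodes = 3, one distinct seed per episode
fitness = evaluate_population(population, seeds, max_episode_length=10)
```

Because the seeds are shared, the difference between two fitness values in this toy setup reflects only the difference between the individuals, not a difference in noise.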

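Parameters:
  • policy -- The policy model, whose forward function has the form `forward(batched_obs) -> action`.

  • env_name -- The name of the environment.

  • max_episode_length -- The maximum number of time steps in each episode.

  • num_episodes -- The number of episodes used to evaluate each individual.

  • seed -- The seed used to create a PRNGKey for the Brax environment. If None, a seed is chosen at random. Defaults to None.

  • pop_size -- The size of the population to evaluate. If None, the input population is expected to have size 1.

  • rotate_key -- Whether to rotate the random key on each iteration (defaults to True).
    If True, the random key is rotated after each iteration, which makes fitness evaluation non-deterministic and potentially noisy. The same policy weights may therefore yield different fitness values in different iterations.
    If False, the random key stays the same across all iterations, which keeps fitness evaluation consistent.

  • reduce_fn -- The function used to reduce the rewards of multiple episodes. Defaults to torch.mean.

  • backend -- The Brax backend. If None, the environment's default backend is used. Defaults to None.

  • device -- The device on which to run the computation. Defaults to the current default device.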
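The effect of rotate_key can be illustrated with a small toy evaluator. This is a sketch under stated assumptions, not how EvoX manages keys internally: `ToyKeyedEvaluator` is hypothetical, and Python's `random` module stands in for the JAX PRNG key.

```python
import random
from typing import List


class ToyKeyedEvaluator:
    """Hypothetical evaluator that mimics the rotate_key switch."""

    def __init__(self, seed: int, rotate_key: bool = True) -> None:
        self.key = seed
        self.rotate_key = rotate_key

    def evaluate(self, weights: List[float]) -> List[float]:
        rng = random.Random(self.key)
        noise = rng.uniform(-1.0, 1.0)  # shared by the whole population
        if self.rotate_key:
            # Derive the next key from the current one, so the next call
            # to evaluate() sees different noise.
            self.key = rng.getrandbits(32)
        return [w + noise for w in weights]


fixed = ToyKeyedEvaluator(seed=0, rotate_key=False)
a = fixed.evaluate([1.0, 2.0])
b = fixed.evaluate([1.0, 2.0])  # same key, so the same fitness as `a`

rotating = ToyKeyedEvaluator(seed=0, rotate_key=True)
c = rotating.evaluate([1.0, 2.0])
d = rotating.evaluate([1.0, 2.0])  # key was rotated, so the noise differs
```

With a fixed key, repeated evaluations of the same weights agree; with rotation, they agree only within a single call.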

Note

The initial key is obtained from torch.random.get_rng_state().

Warning

This problem does NOT support the HPO wrapper (problems.hpo_wrapper.HPOProblemWrapper) out of the box; that is, a workflow containing this problem CANNOT be vmapped. However, by setting pop_size to the product of the inner population size and the outer population size, you can still use this problem in an HPO workflow.
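The pop_size arithmetic in the warning can be sketched as follows. This only illustrates the flatten-and-regroup bookkeeping; it does not use HPOProblemWrapper, and the nested population, the `(o, i)` candidate tuples, and the stand-in fitness values are all hypothetical.

```python
outer_size, inner_size = 4, 25
pop_size = outer_size * inner_size  # the value to pass as pop_size

# Hypothetical nested population: outer_size groups of inner_size candidates.
nested = [[(o, i) for i in range(inner_size)] for o in range(outer_size)]

# Flatten into one batch of pop_size candidates for a single evaluation.
flat = [cand for group in nested for cand in group]

# Stand-in for the evaluation: one fitness value per flattened candidate.
flat_fitness = [float(o * 100 + i) for (o, i) in flat]

# Regroup the fitness into outer_size rows of inner_size values.
fitness = [
    flat_fitness[o * inner_size:(o + 1) * inner_size]
    for o in range(outer_size)
]
```

Row `o` of `fitness` then holds the results for the inner population that belongs to outer candidate `o`.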

Examples

>>> from evox import problems
>>> problem = problems.neuroevolution.BraxProblem(
...     env_name="swimmer",
...     policy=model,
...     max_episode_length=1000,
...     num_episodes=3,
...     pop_size=100,
...     rotate_key=False,
... )

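_evaluate_brax_record(model_state: Dict[str, torch.Tensor]) -> Tuple[Dict[str, torch.Tensor], torch.Tensor, List[Any]]#
evaluate(pop_params: Dict[str, torch.nn.Parameter]) -> torch.Tensor#

Evaluate the final rewards of a population (batch) of model parameters.

Parameters:

pop_params -- A dictionary of parameters in which each key is a parameter name and each value is a tensor of shape (batch_size, *param_shape) holding that parameter for every model in the batch.

Returns:

A tensor of shape (batch_size,) containing the reward of each individual in the population.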
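The batched layout expected for pop_params can be sketched with nested lists standing in for tensors, so that the example runs without torch. The helpers `shape`, `stack_params`, and `make_individual`, as well as the parameter names "weight" and "bias", are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple


def shape(x) -> Tuple[int, ...]:
    """Shape of a non-empty rectangular nested list."""
    dims = []
    while isinstance(x, list):
        dims.append(len(x))
        x = x[0]
    return tuple(dims)


def stack_params(individuals: List[Dict[str, list]]) -> Dict[str, list]:
    """Stack per-individual parameters along a new leading batch axis."""
    names = individuals[0].keys()
    return {name: [ind[name] for ind in individuals] for name in names}


def make_individual(value: float) -> Dict[str, list]:
    """Hypothetical policy with a (2, 3) weight matrix and a (2,) bias."""
    return {
        "weight": [[value] * 3 for _ in range(2)],
        "bias": [value] * 2,
    }


# Five individuals become one dictionary with a leading batch axis of 5.
pop_params = stack_params([make_individual(float(i)) for i in range(5)])
shapes = {name: shape(p) for name, p in pop_params.items()}
```

Here each entry gains a leading axis of length 5, matching the (batch_size, *param_shape) convention described above.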

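visualize(weights: Dict[str, torch.nn.Parameter], seed: int = 0, output_type: str = 'HTML', *args, **kwargs) -> str | torch.Tensor#

Visualize the Brax environment with the given policy and weights.

Parameters:
  • weights -- The weights of the policy model, given as a dictionary of parameters.

  • output_type -- The output type of the visualization, either "HTML" or "rgb_array". Defaults to "HTML".

Returns:

The visualization output.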
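Assuming the "HTML" output type yields a string, as the `str | torch.Tensor` return annotation suggests, the result could be written to a file and opened in a browser. The helper below is a hypothetical convenience and is not part of EvoX.

```python
from pathlib import Path


def save_visualization(html: str, path: str) -> Path:
    """Write an HTML string to disk and return the resulting path."""
    out = Path(path)
    out.write_text(html, encoding="utf-8")
    return out


# Hypothetical usage, assuming `problem` and `weights` already exist:
# save_visualization(problem.visualize(weights), "rollout.html")
```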