ECo-MoE: Embodiment-Conditioned Mixture of Experts Increases the Evolvability of Robots

ICML 2026

Yibin Wang, Muhan Li, Zihan Guo, Sam Kriegman

Northwestern University

Abstract

In this paper, we introduce a model of evolution and learning in robots that co-optimizes a distribution of latent design vectors (genotypes) and a mixture of control experts (neural modules), which are gated by the latent coordinates of each decoded design (phenotype). This provides a scalable alternative to co-design algorithms that either train an individual policy for every robot, which is inefficient, or a monolithic universal controller for all robots, which results in overly conservative structures and behaviors. Our approach lies somewhere between these two extremes, preserving ancestral knowledge in a unified yet modular framework in which different body plans activate and deactivate different combinations of learned sensorimotor circuits for goal-directed behavior. This allows one part of the controller to be overhauled to better suit new species of designs as they emerge without disrupting the hard-earned knowledge contained within other expert modules. It also allows pretrained expert policies to be directly plugged into the mixture, which can steer evolution into otherwise unexplored areas of latent space containing desired morphological traits. We refer to this process as "evo by demo" and explore how it may be used to guide freeform evolution toward canonical structures defined by the pretrained model.

Method

Overview of the ECo-MoE method
Embodiment-conditioned mixture of experts. Designs were sampled from an evolving distribution within a latent space of possible genotypes (A and B). The distribution was initialized randomly for the main experiment (blue region in B); for "evo by demo", it was regularized by a predesigned demo (orange region in B). The latent genotype of each endoskeletal phenotype (C and D) was fed as input to a gating network that produced a weighted policy output π(a_t | s_t, z) by mixing expert actions (E). In evo by demo, a policy was pretrained for the predesigned demo and injected as a frozen expert into the mixture. Designs with latent genes similar to those of the predesigned demo were routed with greater weight to the pretrained expert (orange bar in D), steering evolution toward the desired phenotypic traits.
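The routing described in this caption can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the softmax gate, the linear experts, the class name `TinyMoEPolicy`, and all dimensions are assumptions chosen for brevity. The only property taken from the text is that the gate depends on the latent genotype z, while the experts map the state s_t to actions that are then mixed.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    x = x - np.max(x)
    e = np.exp(x)
    return e / e.sum()

class TinyMoEPolicy:
    """Hypothetical embodiment-conditioned mixture of linear experts."""

    def __init__(self, latent_dim, state_dim, action_dim, num_experts, seed=0):
        rng = np.random.default_rng(seed)
        # Gate parameters: one row of logit weights per expert (assumed linear gate).
        self.W_gate = rng.normal(scale=0.1, size=(num_experts, latent_dim))
        # Expert parameters: one linear state-to-action map per expert (assumption).
        self.W_experts = rng.normal(scale=0.1, size=(num_experts, action_dim, state_dim))

    def gate(self, z):
        """Routing weights computed from the latent genotype z only."""
        return softmax(self.W_gate @ z)

    def act(self, s_t, z):
        """Mixed action: routing weights times per-expert actions."""
        weights = self.gate(z)                 # shape (num_experts,)
        expert_actions = self.W_experts @ s_t  # shape (num_experts, action_dim)
        return weights @ expert_actions        # shape (action_dim,)

policy = TinyMoEPolicy(latent_dim=8, state_dim=5, action_dim=3, num_experts=4)
z = np.linspace(-1.0, 1.0, 8)   # made-up latent genotype
s_t = np.ones(5)                # made-up observation
print(policy.gate(z), policy.act(s_t, z))
```

Because the gate sees only z, the routing weights are fixed for a given body throughout an episode in this sketch. A frozen pretrained expert would correspond to one slice of `W_experts` that is excluded from training.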

Results

ECo-MoE results across task environments
Task environments. We considered three task environments: Flat Ground (A), Upright Locomotion (on flat ground; B), and Potholes (C). In each one, five independent evolutionary trials were conducted, and the peak fitness achieved by each design was averaged across the population before plotting the cumulative max (higher is better; D-F). Evolution with an ECo-MoE controller (blue curves) was compared against evolution with the non-modular (single expert) baseline controller (red); shaded regions indicate 95% bootstrapped confidence intervals. In all three environments, net displacement is rewarded. Upright Locomotion adds a second component to the reward function that tracks the proportion of body voxels that fall below a prespecified height threshold during behavior. These results show that, although ECo-MoE fell into the same local optimum as the baseline on Flat Ground, it significantly increased evolvability in both Upright Locomotion and Potholes.
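The plotted quantity can be sketched as follows: average peak fitness over the population at each generation, take the running maximum, and bootstrap across trials. The function names, array layouts, and the use of a percentile bootstrap over trial curves are assumptions; the caption only states that the intervals are 95% bootstrapped confidence intervals.

```python
import numpy as np

def cumulative_max_curve(peak_fitness):
    """peak_fitness: array of shape (generations, population).

    Returns the running maximum of the population-mean peak fitness.
    """
    mean_per_generation = peak_fitness.mean(axis=1)
    return np.maximum.accumulate(mean_per_generation)

def bootstrap_ci(curves, n_boot=2000, alpha=0.05, seed=0):
    """curves: array of shape (trials, generations).

    Percentile bootstrap of the across-trial mean (an assumed procedure).
    Returns (lower, upper), each of shape (generations,).
    """
    rng = np.random.default_rng(seed)
    n_trials = curves.shape[0]
    # Resample trial indices with replacement, n_boot times.
    idx = rng.integers(0, n_trials, size=(n_boot, n_trials))
    boot_means = curves[idx].mean(axis=1)  # shape (n_boot, generations)
    lower, upper = np.percentile(
        boot_means, [100 * alpha / 2, 100 * (1 - alpha / 2)], axis=0
    )
    return lower, upper

# Made-up data: 3 generations, 2 designs per generation.
peak = np.array([[1.0, 3.0], [0.0, 2.0], [4.0, 6.0]])
print(cumulative_max_curve(peak))
```

With only five trials per condition, as in the experiments described here, bootstrap intervals are coarse, so they are best read as a rough indication of variability across trials.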
Top robot designs across evolutionary time
Top five designs at different points in evolutionary time. The best designs from a randomly initialized population (A-E), from early (F-J) and late (K-O) in evolution, and from the final population (P-T) are shown for a representative Upright Locomotion trial. Each body plan contains an internal jointed skeleton (multicolored segments) surrounded by soft tissue (magenta line). In the top right-hand corner of each design panel, an inset bar plot shows how weight was routed across the four experts (blue rectangles) to control the given body, with taller bars corresponding to more weight and shorter bars to less. Below each design, part of the genotype (every 16th element of the vector) is visualized by upward (dark pink) and downward (light pink) bars representing positive and negative values, respectively, with bar length proportional to the absolute value of the component.
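The genotype strip below each design can be reproduced in outline by subsampling the latent vector. This is a small sketch under assumptions: the helper name `genotype_bars`, the starting offset of the subsampling (index 0), and the treatment of exact zeros are illustrative choices not specified in the caption.

```python
import numpy as np

def genotype_bars(z, stride=16):
    """Return (direction, length) pairs for every `stride`-th element of z.

    Positive values map to "up" bars and negative values to "down" bars;
    bar length is the absolute value. Exact zeros are labeled "up" with
    length 0 (an arbitrary convention for this sketch).
    """
    sub = np.asarray(z, dtype=float)[::stride]
    return [("up" if v >= 0 else "down", abs(v)) for v in sub]

# Made-up 32-dimensional genotype: elements 0 and 16 are kept.
z = np.arange(-16, 16)
print(genotype_bars(z))
```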
Evo-by-demo results and evolved morphologies
Evo by demo. A predesigned body plan (the demo; A) and its pretrained expert policy were used to initialize and guide evolution toward morphologies with similar phenotypic traits. A representative evolved morphology is shown for each of the three tested variants of evo by demo: without a predesigned latent initialization (B; PretrainOnly), without a frozen pretrained expert (C; PredesignOnly), and with both predesigning and pretraining (D; CoSteering). Five independent evolutionary trials of each variant were conducted for Upright Locomotion and compared against each other as well as against the basic, randomly initialized ECo-MoE and its non-modular baseline. Two morphological metrics were used to track evolution relative to the demo: the effective limb count (E) and the mass bias vector magnitude, which quantifies asymmetry in the robot's mass distribution relative to its root segment (F). Fitness was also tracked (G). Under the tested conditions, CoSteering (purple lines in E-G) was the most effective way to guide evolution toward the demo's radially symmetrical quadrupedal form (orange dashed lines in E and F), but the highest fitness was achieved without pretraining (yellow line in G; PredesignOnly). Overall, a good demo such as this one can significantly increase the fitness of evolved designs, but only if the population is initialized around the predesigned latent (G). Shaded regions indicate 95% bootstrapped confidence intervals of the mean.
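The caption does not give a formula for the mass bias vector magnitude. One plausible reading, offered only as an assumption, is the distance between the body's center of mass and the position of its root segment: a radially symmetric body centered on its root would score near zero. The function name and input layout below are hypothetical.

```python
import numpy as np

def mass_bias_magnitude(positions, masses, root_position):
    """Distance from the center of mass to the root segment (assumed definition).

    positions: (n, 3) element positions; masses: (n,) element masses;
    root_position: (3,) position of the root segment.
    """
    positions = np.asarray(positions, dtype=float)
    masses = np.asarray(masses, dtype=float)
    root = np.asarray(root_position, dtype=float)
    center_of_mass = (masses[:, None] * positions).sum(axis=0) / masses.sum()
    return float(np.linalg.norm(center_of_mass - root))

# Made-up example: four equal masses placed symmetrically around the root.
limbs = [[1, 0, 0], [-1, 0, 0], [0, 1, 0], [0, -1, 0]]
print(mass_bias_magnitude(limbs, [1, 1, 1, 1], [0, 0, 0]))
```

Under this reading, making one limb heavier than the others shifts the center of mass toward it and raises the metric, which matches the caption's description of the metric as a measure of asymmetry.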

Citation

Citation will be added later.