Organizations

@diambra

alexpalms/README.md

👋 Hi, I'm Alex

Senior AI / ML Research Engineer
Specializing in LLMs / VLMs / VLAs, Reinforcement Learning, and Embodied / Physical AI


🧠 About Me

I’m a senior AI/ML research engineer with 15+ years of experience building intelligent agents and AI systems that understand and reason about the world. My work spans foundation models, reinforcement learning, and physics-based simulation, continuously expanding into VLMs, VLAs, and embodied multimodal AI.

Currently part of the core research and engineering team at LawZero, a non-profit AI lab in Montreal led by Yoshua Bengio, I focus on advancing truthful, transparent, and safe-by-design AI through next-generation foundation models and reasoning architectures.

My passion lies at the intersection of AI, simulation, and decision-making. I focus on:

  • 🧠 Foundation Models & Alignment: LLM / VLM / VLA fine-tuning and alignment (SFT, DPO, RLHF/RLAIF) with a focus on interpretable, safe, and truthful reasoning.
  • 🧭 Reinforcement Learning: Online/offline, adversarial, multi-agent, imitation learning, and human-in-the-loop optimization.
  • 🛰️ Embodied AI & High-Fidelity Simulation: Isaac Lab, Genesis, and physics-based virtual environments for grounded RL and world modeling.

🔗 Connect with Me

I’m always happy to chat with people who share my interests and passions. Feel free to reach out!


🚀 Featured Projects

Here are a few of my public repositories that reflect my work and interests:

  • 🔀 Extending GPU-Native RSL-RL Library: Customized fork of the RSL-RL library that supports multi-discrete action spaces.
  • 🛡️ RL Drone Swarm Defense: Counter-kamikaze drone swarm with multi-agent reinforcement learning in a realistic simulation.
  • 🕹️ DIAMBRA Arena: A platform to train reinforcement learning agents in classic retro fighting games.
  • 🤖 DIAMBRA Agents: A library of reinforcement learning algorithms tailored for DIAMBRA Arena environments.

✨ What I’m Working On Now

  • Experimenting with adversarial RL for LLM alignment fine-tuning
  • Exploring speculative decoding for fast LLM inference
  • Learning Triton to optimize GPU workloads for small-scale LLMs
  • Testing agentic frameworks for multi-step coordination
  • Moving toward Vision-Language-Action models

Pinned

  1. diambra/arena Public

    DIAMBRA Arena: a New Reinforcement Learning Platform for Research and Experimentation

    Python 349 26

  2. diambra/agents Public

    Example Agents for DIAMBRA Arena Environments

    Python 17 6

  3. discrete_rsl_rl Public

    Forked from leggedrobotics/rsl_rl

    Customized version of the original RSL RL project that additionally supports multi-discrete action spaces, providing a fast and simple implementation of the PPO algorithm, designed to run fully on GPU.

    Python 1

  4. deeprl-counter-uav-swarm Public

    This repository contains a reinforcement learning framework for decision-level interception prioritization of drone swarms. It evaluates RL agent performance against heuristic methods in simulation…

    Python 8

  5. DIAMBRA-Arena-MARL-TLeague Public

    A customization of Tencent’s TLeague for self-play in DIAMBRA Arena. Enables population-based adversarial training in multi-agent environments using league-style training loops and dynamic opponent…

    1

  6. diambra-game-painter Public

    This project is an experiment that applies the style of famous paintings in real time to popular retro fighting games, which are provided as Reinforcement Learning environments by DIAMBRA. It is ba…

    Python 1