Capstone Project: Autonomous Humanoid
This Capstone Project challenges you to integrate the concepts learned throughout this module and previous modules to design an end-to-end autonomous humanoid system in a simulated environment. The goal is to enable a humanoid robot to respond to high-level voice commands, plan its actions, navigate complex environments, perceive objects, and manipulate them.
Project Overview: The "Clean the Room" Challenge
Imagine a humanoid robot in a cluttered virtual room. Your task is to enable it to understand a voice command like "Clean the room" or "Put the books on the shelf" and execute it autonomously.
Core Components to Integrate:
- Voice Command Reception: The system should be able to receive a voice command from a human user.
  - Technology: Leverage concepts from the Voice-to-Action section (e.g., OpenAI Whisper for speech-to-text).
- LLM-Based Task Planning: An LLM will be central to interpreting the voice command, decomposing it into a series of sub-tasks, and generating a high-level action plan.
  - Technology: Apply principles from LLM-Based Cognitive Planning. The LLM acts as the central reasoning engine.
- Simulated Environment: The project will be conducted in a simulated environment (e.g., in NVIDIA Isaac Sim, integrated with ROS 2) where the humanoid robot operates.
  - Technology: Concepts from Module 2 (Digital Twins) and Module 3 (Isaac Sim) will be relevant.
- Autonomous Navigation: The robot must be able to navigate within the simulated environment, avoiding static and dynamic obstacles to reach target locations.
  - Technology: Integrate principles of Nav2 path planning (Module 3) adapted for humanoid locomotion.
- Object Identification (Computer Vision): The robot needs to identify specific objects within the environment. For "Clean the room," it would need to identify items to be picked up.
  - Technology: Apply AI-based perception pipelines (Module 3) and potentially visual perception data from Isaac Sim.
- Object Manipulation: Once an object is identified and located, the robot must be able to manipulate it (e.g., grasp, lift, place).
  - Technology: This involves the robot's end-effector control and inverse kinematics (concepts from previous modules).
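As a concrete illustration of what the LLM planner might produce, you could prompt the model to emit a structured plan. The schema below is a hypothetical example, not a fixed interface; your design document should define your own action vocabulary:

```json
{
  "task": "put the books on the shelf",
  "steps": [
    {"action": "navigate", "target": "table"},
    {"action": "detect",   "target": "book"},
    {"action": "pick",     "target": "book"},
    {"action": "navigate", "target": "shelf"},
    {"action": "place",    "target": "shelf"}
  ]
}
```

A structured format like this makes the plan machine-checkable before any action reaches the robot.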
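Nav2 itself runs as a ROS 2 stack, but the grid-based path search it builds on can be sketched standalone. The toy occupancy grid below is an illustrative stand-in for a real costmap, not part of the Nav2 API:

```python
import heapq

def astar(grid, start, goal):
    """A* search on a 2D occupancy grid (0 = free, 1 = obstacle).

    A standalone stand-in for the costmap search a Nav2 global planner
    performs; returns a list of (row, col) cells, or None if unreachable.
    """
    rows, cols = len(grid), len(grid[0])

    def h(cell):  # Manhattan-distance heuristic (admissible on a 4-connected grid)
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    frontier = [(h(start), 0, start, None)]  # (f, g, cell, parent)
    came_from = {}
    while frontier:
        _, g, cell, parent = heapq.heappop(frontier)
        if cell in came_from:
            continue  # already expanded via a cheaper route
        came_from[cell] = parent
        if cell == goal:  # walk parent links back to the start
            path = []
            while cell is not None:
                path.append(cell)
                cell = came_from[cell]
            return path[::-1]
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (0 <= nxt[0] < rows and 0 <= nxt[1] < cols
                    and grid[nxt[0]][nxt[1]] == 0 and nxt not in came_from):
                heapq.heappush(frontier, (g + 1 + h(nxt), g + 1, nxt, cell))
    return None

# Toy 3x3 room: a wall forces the robot around the right side
grid = [[0, 0, 0],
        [1, 1, 0],
        [0, 0, 0]]
path = astar(grid, (0, 0), (2, 0))
```

In the project, the same role is filled by sending goal poses to the navigation stack; this sketch only shows the underlying search idea.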
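Your perception pipeline would consume simulated camera frames and likely a trained detector, but the basic pattern (mask an image, locate the object's centroid in pixel coordinates) can be sketched with plain NumPy. The frame and color thresholds here are synthetic, a toy stand-in for a real detection model:

```python
import numpy as np

def find_object_centroid(rgb, lo, hi):
    """Return the (row, col) pixel centroid of pixels whose RGB values
    fall inside [lo, hi] per channel, or None if nothing matches.

    A toy stand-in for a detector: real pipelines would run a trained
    model on camera frames published from the simulator.
    """
    mask = np.all((rgb >= lo) & (rgb <= hi), axis=-1)
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return None
    return float(ys.mean()), float(xs.mean())

# Synthetic 100x100 frame with a red 10x10 "book" at rows/cols 40..49
frame = np.zeros((100, 100, 3), dtype=np.uint8)
frame[40:50, 40:50] = (200, 20, 20)
centroid = find_object_centroid(frame, lo=(150, 0, 0), hi=(255, 60, 60))
```

The pixel centroid would then be combined with depth data to obtain a 3D target for navigation and grasping.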
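Humanoid manipulation involves full whole-body IK solvers, but the inverse-kinematics idea itself can be shown in closed form on a 2-link planar arm. The link lengths and target below are arbitrary illustration values:

```python
import math

def two_link_ik(x, y, l1, l2):
    """Closed-form inverse kinematics for a 2-link planar arm.

    Given a target (x, y) for the end-effector and link lengths l1, l2,
    return joint angles (theta1, theta2) in radians (one elbow
    configuration), or None if the target is out of reach.
    """
    d2 = x * x + y * y
    # Law of cosines gives the elbow angle
    c2 = (d2 - l1 * l1 - l2 * l2) / (2 * l1 * l2)
    if not -1.0 <= c2 <= 1.0:
        return None  # target outside the reachable workspace
    theta2 = math.acos(c2)
    theta1 = math.atan2(y, x) - math.atan2(l2 * math.sin(theta2),
                                           l1 + l2 * math.cos(theta2))
    return theta1, theta2

def fk(theta1, theta2, l1, l2):
    """Forward kinematics, used to verify the IK solution."""
    x = l1 * math.cos(theta1) + l2 * math.cos(theta1 + theta2)
    y = l1 * math.sin(theta1) + l2 * math.sin(theta1 + theta2)
    return x, y
```

Checking an IK solution by running it back through forward kinematics, as `fk` allows here, is a good habit for any manipulation code in the project.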
End-to-End Autonomous Behavior Loop
The Capstone Project will involve implementing the full perception → planning → navigation → manipulation loop: the robot continuously perceives the environment with its sensors, feeds that state to the LLM-based planner, receives updated sub-tasks, navigates to target locations, and manipulates objects.
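The loop can be structured as a thin executive that repeatedly re-perceives the world, asks the planner for sub-tasks, and dispatches each one to the matching subsystem. Every function below is a stub standing in for the real Whisper/LLM/navigation/vision/manipulation interfaces; only the loop structure is the point:

```python
def perceive(world):
    """Stub for the vision pipeline: report visible objects."""
    return sorted(world["objects_on_floor"])

def plan(command, observations):
    """Stub for the LLM planner: expand a command into sub-tasks."""
    steps = []
    for obj in observations:
        steps += [("navigate", obj), ("pick", obj),
                  ("navigate", "bin"), ("place", "bin")]
    return steps

def execute(step, world):
    """Stub dispatcher for the navigation/manipulation subsystems."""
    action, target = step
    if action == "pick":
        world["objects_on_floor"].remove(target)
        world["held"] = target
    elif action == "place":
        world["held"] = None
    # "navigate" is a no-op in this toy world

def run(command, world):
    """Perception -> planning -> navigation -> manipulation loop."""
    log = []
    while True:
        observations = perceive(world)
        if not observations:
            return log  # nothing left to tidy: the room is clean
        for step in plan(command, observations):
            execute(step, world)
            log.append(step)

world = {"objects_on_floor": ["book", "cup"], "held": None}
history = run("clean the room", world)
```

Re-perceiving at the top of each cycle is what makes the loop closed: if a manipulation fails or an object moves, the next planning round sees the updated state.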
Project Deliverables
- System Design Document: Outline the architecture, including how each component (Whisper, LLM planner, navigation stack, vision system, manipulation control) communicates.
- Simulated Environment: A simple cluttered room environment with a humanoid robot model capable of basic navigation and manipulation.
- Voice Command Interface: A method to input voice commands (e.g., using a microphone and a Whisper API/model).
- LLM Integration: Code demonstrating how the transcribed voice command is fed to an LLM, and how the LLM's response is parsed into executable robot actions.
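For the LLM integration deliverable, one common pattern is to ask the model for structured JSON and validate it before execution, rejecting anything outside the robot's action vocabulary. The schema and action names below are illustrative, and `llm_response` stands in for the text a real LLM API call would return:

```python
import json

# Hypothetical action vocabulary the executive is allowed to run
ALLOWED_ACTIONS = {"navigate", "detect", "pick", "place"}

def parse_plan(llm_response: str):
    """Validate an LLM's JSON plan and return a list of (action, target).

    Raises ValueError on malformed JSON or unknown actions, so the
    executive never runs an unvetted command on the robot.
    """
    try:
        plan = json.loads(llm_response)
    except json.JSONDecodeError as exc:
        raise ValueError(f"planner returned invalid JSON: {exc}") from exc
    steps = []
    for step in plan.get("steps", []):
        action = step.get("action")
        if action not in ALLOWED_ACTIONS:
            raise ValueError(f"unknown action: {action!r}")
        steps.append((action, step.get("target")))
    return steps

# Example response (in the real system this comes from the LLM API)
llm_response = json.dumps({
    "steps": [
        {"action": "navigate", "target": "table"},
        {"action": "pick",     "target": "book"},
        {"action": "navigate", "target": "shelf"},
        {"action": "place",    "target": "shelf"},
    ]
})
steps = parse_plan(llm_response)
```

Validating at this boundary keeps hallucinated or malformed planner output from ever reaching the motion stack.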
- Demonstration: A video or live simulation demonstrating the humanoid robot performing a high-level task (e.g., "Clean the table" by picking up objects and placing them in a bin) based on a voice command.
This project aims to provide a practical understanding of how Vision-Language-Action (VLA) systems bring autonomous, language-driven behavior to humanoid robots.