
Commit e2ca2a5

王昱凯 authored and HanqingWangAI committed
!44 Update Nav Doc
* update nav doc
1 parent 981f509 commit e2ca2a5

File tree: 5 files changed (+237, -34 lines changed)
Lines changed: 122 additions & 0 deletions
@@ -0,0 +1,122 @@
# Create Your Model and Agent

## Development Overview

The main architecture of the evaluation code adopts a client-server model. In the client, we specify the corresponding configuration (*.cfg), which includes settings such as the scenarios to be evaluated, robots, models, and parallelization parameters. The client sends requests to the server, which runs model inference and returns the predictions to the client.

The InternNav project adopts a modular design, allowing developers to easily add new navigation algorithms.
The main components include:

- **Model**: Implements the specific neural network architecture and inference logic
- **Agent**: Serves as a wrapper for the Model, handling environment interaction and data preprocessing
- **Config**: Defines configuration parameters for the model and training
## Custom Model

A Model is the concrete implementation of your algorithm. Implement your model under `baselines/models`. A model should ideally inherit from the base model and implement the following key methods (a minimal sketch is shown below):

- `forward(train_batch) -> dict(output, loss)`
- `inference(obs_batch, state) -> output_for_agent`
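
Below is a minimal, self-contained sketch of what such a model could look like. The class name, tensor shapes, and feature dimensions are illustrative assumptions, not the actual InternNav base-model API:

```python
import torch
import torch.nn as nn


class MyNavModel(nn.Module):
    """Illustrative navigation model: encodes an RGB frame and an instruction
    embedding, then scores the 4 discrete actions (stop/forward/left/right)."""

    def __init__(self, num_actions: int = 4, instr_dim: int = 50, hidden_dim: int = 256):
        super().__init__()
        self.rgb_encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=8, stride=4),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(32, hidden_dim),
        )
        self.instr_encoder = nn.GRU(instr_dim, hidden_dim, batch_first=True)
        self.action_head = nn.Linear(2 * hidden_dim, num_actions)

    def _logits(self, rgb: torch.Tensor, instruction: torch.Tensor) -> torch.Tensor:
        rgb_feat = self.rgb_encoder(rgb)                 # (B, hidden_dim)
        _, instr_feat = self.instr_encoder(instruction)  # (1, B, hidden_dim)
        return self.action_head(torch.cat([rgb_feat, instr_feat[-1]], dim=-1))

    def forward(self, train_batch: dict) -> dict:
        """Training pass: returns the raw output together with the imitation loss."""
        logits = self._logits(train_batch['rgb'], train_batch['instruction'])
        loss = nn.functional.cross_entropy(logits, train_batch['gt_action'])
        return {'output': logits, 'loss': loss}

    @torch.no_grad()
    def inference(self, obs_batch: dict, state=None):
        """Inference pass called by the Agent: returns action logits per environment."""
        return self._logits(obs_batch['rgb'], obs_batch['instruction'])
```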

## Create a Custom Config Class

In the model file, define a `Config` class that inherits from `PretrainedConfig`.
A reference implementation is `CMAModelConfig` in [`cma_model.py`](../internnav/model/cma/cma_policy.py).
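
A hedged sketch of such a config class (the field names are illustrative; mirror `CMAModelConfig` for the fields your model actually needs):

```python
from transformers import PretrainedConfig


class MyNavModelConfig(PretrainedConfig):
    """Illustrative config for the sketch model above."""

    model_type = 'my_nav_model'

    def __init__(self, hidden_dim: int = 256, num_actions: int = 4, **kwargs):
        super().__init__(**kwargs)
        self.hidden_dim = hidden_dim
        self.num_actions = num_actions
```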

## Registration and Integration

In [`internnav/model/__init__.py`](../internnav/model/__init__.py) (a sketch of both hooks follows this list):
- Add the new model to `get_policy`.
- Add the new model's configuration to `get_config`.
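
Conceptually, the registration is just an extra branch keyed by your model name. The real `get_policy`/`get_config` in InternNav may be structured differently (e.g., as a registry dict), so treat this as a sketch only:

```python
# internnav/model/__init__.py (sketch; the actual functions may differ)
from .my_nav_model import MyNavModel, MyNavModelConfig  # hypothetical module


def get_policy(policy_name: str):
    if policy_name == 'MyNavModel':  # must match `model_name` in your configs
        return MyNavModel
    raise ValueError(f'Unknown policy: {policy_name}')


def get_config(policy_name: str):
    if policy_name == 'MyNavModel':
        return MyNavModelConfig
    raise ValueError(f'Unknown config: {policy_name}')
```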

## Create a Custom Agent

The Agent handles interaction with the environment, data preprocessing/postprocessing, and calls the Model for inference.
A custom Agent usually inherits from [`Agent`](../internnav/agent/base.py) and implements the following key methods:

- `reset()`: Resets the Agent's internal state (e.g., RNN states, action history). Called at the start of each episode.
- `inference(obs)`: Receives environment observations `obs`, performs preprocessing (e.g., tokenizing instructions, padding), calls the model for inference, and returns an action.
- `step(obs)`: The external interface; usually calls `inference` and can include logging or timing.

Example: [`CMAAgent`](../internnav/agent/cma_agent.py)
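
A minimal Agent sketch following this interface (it wraps the illustrative model above; the actual base `Agent` in `internnav/agent/base.py` may expect additional arguments):

```python
import torch


class MyNavAgent:
    """Illustrative Agent: preprocesses observations and queries MyNavModel."""

    def __init__(self, model, device: str = 'cuda'):
        self.model = model.to(device).eval()
        self.device = device
        self.rnn_state = None  # recurrent state, if the model keeps one

    def reset(self):
        """Clear per-episode state; called at the start of every episode."""
        self.rnn_state = None

    def inference(self, obs: list) -> list:
        """Preprocess raw observations, run the model, return one action per env."""
        rgb = torch.stack([
            torch.from_numpy(o['rgb']).permute(2, 0, 1).float() / 255.0 for o in obs
        ]).to(self.device)
        # Placeholder instruction encoding; a real agent would tokenize/pad the instruction here.
        instruction = torch.zeros(len(obs), 1, 50, device=self.device)
        logits = self.model.inference({'rgb': rgb, 'instruction': instruction}, self.rnn_state)
        return logits.argmax(dim=-1).cpu().tolist()  # 0: stop, 1: forward, 2: left, 3: right

    def step(self, obs: list) -> list:
        """External interface used by the evaluation loop."""
        return self.inference(obs)
```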

At each step, the agent receives an observation from the environment.

For the VLN benchmark under InternUtopia:

```python
action = self.agent.step(obs)
```
**obs** has the format:
```python
obs = [{
    'globalgps': [X, Y, Z],          # robot location
    'globalrotation': [X, Y, Z, W],  # robot orientation as a quaternion
    'rgb': np.array(256, 256, 3),    # RGB camera image, shape (256, 256, 3)
    'depth': np.array(256, 256, 1),  # depth image, shape (256, 256, 1)
}]
```
**action** has the format:
```python
action = List[int]  # one action per environment
# 0: stop
# 1: move forward
# 2: turn left
# 3: turn right
```
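
Putting the two formats together, a single evaluation episode roughly follows the loop below. The `env.reset()`/`env.step()` signatures shown here are illustrative, not the exact benchmark API:

```python
# Sketch of one evaluation episode (assumed env interface)
obs = env.reset()                  # list of per-env observation dicts as described above
agent.reset()
done = [False] * len(obs)
while not all(done):
    action = agent.step(obs)       # List[int], one discrete action per environment
    obs, done = env.step(action)   # advance the simulation by one step
```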

## Create a Trainer

The Trainer manages the training loop, including data loading, the forward pass, loss calculation, and backpropagation.
A custom trainer usually inherits from the [`Base Trainer`](../internnav/trainer/base.py) and implements:

- `train_epoch()`: Runs one training epoch (batch iteration, forward pass, loss calculation, parameter update).
- `eval_epoch()`: Evaluates the model on the validation set and records metrics.
- `save_checkpoint()`: Saves model weights, optimizer state, and training progress.
- `load_checkpoint()`: Loads pretrained models or resumes training.

Example: [`CMATrainer`](../internnav/trainer/cma_trainer.py) shows how to handle sequence data, compute the action loss, and implement imitation learning.
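
A bare-bones trainer sketch implementing these four hooks (it assumes batches are dicts of tensors matching the illustrative model above; the real `Base Trainer` exposes more machinery):

```python
import torch


class MyNavTrainer:
    """Illustrative trainer: one optimizer step per batch of observation/action pairs."""

    def __init__(self, model, train_loader, val_loader, lr: float = 1e-4, device: str = 'cuda'):
        self.model = model.to(device)
        self.train_loader, self.val_loader = train_loader, val_loader
        self.optimizer = torch.optim.Adam(model.parameters(), lr=lr)
        self.device = device

    def train_epoch(self) -> float:
        self.model.train()
        total = 0.0
        for batch in self.train_loader:
            batch = {k: v.to(self.device) for k, v in batch.items()}
            out = self.model(batch)            # {'output': ..., 'loss': ...}
            self.optimizer.zero_grad()
            out['loss'].backward()
            self.optimizer.step()
            total += out['loss'].item()
        return total / max(len(self.train_loader), 1)

    @torch.no_grad()
    def eval_epoch(self) -> float:
        self.model.eval()
        losses = [self.model({k: v.to(self.device) for k, v in b.items()})['loss'].item()
                  for b in self.val_loader]
        return sum(losses) / max(len(losses), 1)

    def save_checkpoint(self, path: str):
        torch.save({'model': self.model.state_dict(),
                    'optimizer': self.optimizer.state_dict()}, path)

    def load_checkpoint(self, path: str):
        ckpt = torch.load(path, map_location=self.device)
        self.model.load_state_dict(ckpt['model'])
        self.optimizer.load_state_dict(ckpt['optimizer'])
```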

## Training Data

The training data is under `data/vln_pe/traj_data`. Our dataset provides trajectory data collected from the H1 robot as it navigates through the task environment.
Each observation in the trajectory is paired with its corresponding action.

You may also incorporate external datasets to improve model generalization.

## Evaluation Data

In `raw_data/val`, each task requires the model to guide the robot from its start position and rotation to the target position, following the language instruction.

## Set the Corresponding Configuration

Refer to existing **training** configuration files for customization:

- **CMA Model Config**: [`cma_exp_cfg`](../scripts/train/configs/cma.py)

Configuration files should define:
- `ExpCfg` (experiment config)
- `EvalCfg` (evaluation config)
- `IlCfg` (imitation learning config)

Ensure your configuration is imported and registered in [`__init__.py`](../scripts/train/configs/__init__.py).

Key parameters include (an illustrative config sketch follows this list):
- `name`: Experiment name
- `model_name`: Must match the name used during model registration
- `batch_size`: Batch size
- `lr`: Learning rate
- `epochs`: Number of training epochs
- `dataset_*_root_dir`: Dataset paths
- `lmdb_features_dir`: Feature storage path
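
An illustrative training config using these parameters is sketched below. The import path and the exact `ExpCfg`/`IlCfg` fields are assumptions; copy the structure of [`cma_exp_cfg`](../scripts/train/configs/cma.py) for the authoritative layout:

```python
# scripts/train/configs/my_nav_model.py — sketch only; field names mirror the list above
from internnav.configs import ExpCfg, IlCfg  # assumed import path

my_nav_exp_cfg = ExpCfg(
    name='my_nav_model_train',   # experiment name
    model_name='MyNavModel',     # must match the name used at model registration
    il=IlCfg(
        batch_size=32,
        lr=1e-4,
        epochs=30,
        lmdb_features_dir='data/vln_pe/lmdb_features',  # feature storage path (illustrative)
    ),
)
```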

Refer to existing **evaluation** config files for customization:

- **CMA Model Evaluation Config**: [`h1_cma_cfg.py`](../scripts/eval/configs/h1_cma_cfg.py)

Main fields:
- `name`: Evaluation experiment name
- `model_name`: Must match the name used during training
- `ckpt_to_load`: Path to the model checkpoint
- `task`: Task settings: number of environments, scene, and robots
- `dataset`: Dataset to load (`r2r` or `interiornav`)
- `split`: Dataset split (`val_seen`, `val_unseen`, `test`, etc.)

source/en/user_guide/internnav/quick_start/index.md

Lines changed: 2 additions & 0 deletions
@@ -14,4 +14,6 @@ myst:
installation
train_eval
vln_evaluation
create_model
```

source/en/user_guide/internnav/quick_start/installation.md

Lines changed: 83 additions & 20 deletions
@@ -192,12 +192,43 @@ Choose the environment that best fits your specific needs to optimize your exper

Before proceeding with the installation, ensure that you have [Isaac Sim 4.5.0](https://docs.isaacsim.omniverse.nvidia.com/4.5.0/installation/install_workstation.html) and [Conda](https://docs.conda.io/projects/conda/en/latest/user-guide/install/index.html) installed.

**Pull our latest Docker image with everything you need**
```bash
$ docker pull crpi-mdum1jboc8276vb5.cn-beijing.personal.cr.aliyuncs.com/iros-challenge/internnav:v1.2
```

Run the container:
```bash
$ xhost +local:root # Allow the container to access the display

$ cd PATH/TO/INTERNNAV/

$ docker run --name internnav -it --rm --gpus all --network host \
    -e "ACCEPT_EULA=Y" \
    -e "PRIVACY_CONSENT=Y" \
    -e "DISPLAY=${DISPLAY}" \
    --entrypoint /bin/bash \
    -w /root/InternNav \
    -v /tmp/.X11-unix/:/tmp/.X11-unix \
    -v ${PWD}:/root/InternNav \
    -v ${HOME}/docker/isaac-sim/cache/kit:/isaac-sim/kit/cache:rw \
    -v ${HOME}/docker/isaac-sim/cache/ov:/root/.cache/ov:rw \
    -v ${HOME}/docker/isaac-sim/cache/pip:/root/.cache/pip:rw \
    -v ${HOME}/docker/isaac-sim/cache/glcache:/root/.cache/nvidia/GLCache:rw \
    -v ${HOME}/docker/isaac-sim/cache/computecache:/root/.nv/ComputeCache:rw \
    -v ${HOME}/docker/isaac-sim/logs:/root/.nvidia-omniverse/logs:rw \
    -v ${HOME}/docker/isaac-sim/data:/root/.local/share/ov/data:rw \
    -v ${HOME}/docker/isaac-sim/documents:/root/Documents:rw \
    -v ${PWD}/data/scene_data/mp3d_pe:/isaac-sim/Matterport3D/data/v1/scans:rw \
    crpi-mdum1jboc8276vb5.cn-beijing.personal.cr.aliyuncs.com/iros-challenge/internnav:v1.0
```

<!-- To help you get started quickly, we've prepared a Docker image pre-configured with Isaac Sim 4.5 and InternUtopia. You can pull the image and run evaluations in the container using the following command:
```bash
docker pull registry.cn-hangzhou.aliyuncs.com/internutopia/internutopia:2.2.0
docker run -it --name internutopia-container registry.cn-hangzhou.aliyuncs.com/internutopia/internutopia:2.2.0
``` -->
#### Conda installation from scratch
```bash
conda create -n <env> python=3.10 libxcb=1.14

@@ -253,19 +284,42 @@ pip install -r requirements/habitat_requirements.txt
### Data/Checkpoints Preparation
To get started, we need to prepare the data and checkpoints.
1. **InternVLA-N1 pretrained Checkpoints**
   - Download our latest pretrained [checkpoint](https://huggingface.co/InternRobotics/InternVLA-N1) of InternVLA-N1 and run the following script to perform inference with visualization results. Move the checkpoint to the `checkpoints` directory.
2. **DepthAnything v2 Checkpoints**
   - Download the DepthAnything v2 pretrained [checkpoint](https://huggingface.co/Ashoka74/Placement/resolve/main/depth_anything_v2_vits.pth). Move the checkpoint to the `checkpoints` directory.
3. **InternData-N1 Dataset Episodes**
   - Download the [InternData-N1](https://huggingface.co/datasets/InternRobotics/InternData-N1) dataset. Extract it into the `data/vln_ce/` and `data/vln_pe/` directories.
4. **Scene-N1**
   - Download the [SceneData-N1](https://huggingface.co/datasets/InternRobotics/Scene-N1) for `mp3d_ce`. Extract it into the `data/scene_data/` directory.
5. **Embodiments**
   - Download the [Embodiments](https://huggingface.co/datasets/InternRobotics/Embodiments) into the `Embodiments/` directory.
6. **Baseline models**
```bash
# ddppo-models
$ mkdir -p checkpoints/ddppo-models
$ wget -P checkpoints/ddppo-models https://dl.fbaipublicfiles.com/habitat/data/baselines/v1/ddppo/ddppo-models/gibson-4plus-mp3d-train-val-test-resnet50.pth
# longclip-B
$ huggingface-cli download --include 'longclip-B.pt' --local-dir-use-symlinks False --resume-download Beichenzhang/LongCLIP-B --local-dir checkpoints/clip-long
# download r2r finetuned baseline checkpoints
$ git clone https://huggingface.co/InternRobotics/VLN-PE && mv VLN-PE/r2r checkpoints/
```

The final folder structure should look like this:

```bash
InternNav/
├── data/
│   ├── scene_data/
│   │   ├── mp3d_ce/
│   │   │   └── mp3d/
│   │   │       ├── 17DRP5sb8fy/
│   │   │       ├── 1LXtFkjw3qL/
│   │   │       └── ...
│   │   └── mp3d_pe/
│   │       ├── 17DRP5sb8fy/
│   │       ├── 1LXtFkjw3qL/
│   │       └── ...
│   ├── vln_ce/
│   │   ├── raw_data/
│   │   │   └── r2r/
│   │   │       ├── ...
│   │   │       └── val_unseen/
│   │   │           └── val_unseen.json.gz
│   │   └── traj_data/
│   └── vln_pe/
│       ├── raw_data/          # JSON files defining tasks, navigation goals, and dataset splits
│       │   └── r2r/
│       │       ├── train/
│       │       ├── val_seen/
│       │       │   └── val_seen.json.gz
│       │       └── val_unseen/
│       └── traj_data/         # training sample data for two types of scenes
│           ├── interiornav/
│           │   └── kujiale_xxxx.tar.gz
│           └── r2r/
│               └── trajectory_0/
│                   ├── data/
│                   ├── meta/
│                   └── videos/
├── checkpoints/
│   ├── InternVLA-N1/
│   │   ├── model-00001-of-00004.safetensors
│   │   ├── config.json
│   │   └── ...
│   ├── InternVLA-N1-S2/
│   │   ├── model-00001-of-00004.safetensors
│   │   ├── config.json
│   │   └── ...
│   ├── depth_anything_v2_vits.pth
│   └── r2r/
│       ├── fine_tuned/
│       └── zero_shot/
├── internnav/
│   └── ...
```
### Gradio demo

@@ -373,4 +437,3 @@ data/
└── vln_n1/
    └── traj_data/
```

source/en/user_guide/internnav/quick_start/train_eval.md

Lines changed: 13 additions & 4 deletions
@@ -30,6 +30,15 @@ INTERNUTOPIA_ASSETS_PATH=/path/to/InternUTopiaAssets MESA_GL_VERSION_OVERRIDE=4.

The evaluation results will be saved in the `eval_results.log` file in the `output_dir` of the config file. The whole evaluation process takes about 10 hours on an RTX 4090 GPU.

The baselines can also be run directly:
```bash
# seq2seq model
./scripts/eval/start_eval.sh --config scripts/eval/configs/h1_seq2seq_cfg.py
# cma model
./scripts/eval/start_eval.sh --config scripts/eval/configs/h1_cma_cfg.py
# rdp model
./scripts/eval/start_eval.sh --config scripts/eval/configs/h1_rdp_cfg.py
```

#### Evaluation on Habitat
Evaluate on a single GPU:
@@ -217,11 +226,11 @@ python -m internnav.agent.utils.server --config scripts/eval/configs/h1_xxx_cfg.
Start Evaluation:
```bash
# seq2seq model
./scripts/eval/start_eval.sh --config scripts/eval/configs/h1_seq2seq_cfg.py
# cma model
./scripts/eval/start_eval.sh --config scripts/eval/configs/h1_cma_cfg.py
# rdp model
./scripts/eval/start_eval.sh --config scripts/eval/configs/h1_rdp_cfg.py
```

The evaluation results will be saved in the `eval_results.log` file in the `output_dir` of the config file.

source/en/user_guide/internnav/quick_start/vln_evaluation.md

Lines changed: 17 additions & 10 deletions
@@ -7,6 +7,7 @@ The main architecture of the evaluation code adopts a client-server model. In th
## Supported baselines
- InternVLA-N1
- CMA (Cross-Modal Attention)
- RDP (Recurrent Diffusion Policy)
- Navid (RSS2023)
- Seq2Seq Policy

@@ -15,14 +16,15 @@ The main architecture of the evaluation code adopts a client-server model. In th
- Matterport3D

## Evaluation Metrics
The project provides comprehensive evaluation metrics:

- **Success Rate (SR)**: Proportion of episodes where the agent reaches the goal location within 3 m
- **Success weighted by Path Length (SPL)**: Success rate weighted by the ratio of shortest-path length to actual path length
- **Trajectory Length (TL)**: Total length of the trajectory (m)
- **Navigation Error (NE)**: Euclidean distance between the agent's final position and the goal (m)
- **Oracle Success Rate (OSR)**: Whether any point along the predicted trajectory reaches the goal within 3 m
- **Fall Rate (FR)**: Frequency of the agent falling during navigation
- **Stuck Rate (StR)**: Frequency of the agent becoming stuck during navigation


# Quick Start for Evaluation
@@ -47,15 +49,15 @@ eval_cfg = EvalCfg(
    env=EnvCfg(
        env_type='vln_multi',
        env_settings={
            'use_fabric': True,  # improves simulation efficiency
            'headless': True,    # display option: set to False to open the Isaac Sim interactive window
        },
    ),
    task=TaskCfg(
        task_name='test',
        task_settings={
            'env_num': 1,              # number of environments in one Isaac Sim instance
            'use_distributed': False,  # Ray distributed framework
            'proc_num': 1,
        },
        scene=SceneCfg(
@@ -68,11 +70,16 @@ eval_cfg = EvalCfg(
            camera_resolution=[640, 480]  # (W, H)
        ),
    dataset=EvalDatasetCfg(
        dataset_type="mp3d",
        dataset_settings={
            'base_data_dir': '/path/to/R2R_VLNCE_v1-3',
            'split_data_types': ['val_unseen'],
            'filter_stairs': True,
        },
        eval_settings={
            'save_to_json': False,  # save the evaluation result to a separate JSON file
            'vis_output': True,     # save the simulation progress as a video under logs/
        }
    ),
```

## 3. Launch the server
