
Commit e2ca2a5

王昱凯 authored and HanqingWangAI committed
!44 Update Nav Doc
* update nav doc
1 parent 981f509 commit e2ca2a5

File tree: 5 files changed (+237, -34 lines changed)
Lines changed: 122 additions & 0 deletions
@@ -0,0 +1,122 @@
# Create Your Model and Agent

## Development Overview

The main architecture of the evaluation code adopts a client-server model. In the client, we specify the corresponding configuration (*.cfg), which includes settings such as the scenarios to be evaluated, robots, models, and parallelization parameters. The client sends requests to the server, which runs model inference and returns the predictions to the client.

The InternNav project adopts a modular design, allowing developers to easily add new navigation algorithms.
The main components include:

- **Model**: Implements the specific neural network architecture and inference logic
- **Agent**: Serves as a wrapper for the Model, handling environment interaction and data preprocessing
- **Config**: Defines configuration parameters for the model and training
## Custom Model

A Model is the concrete implementation of your algorithm. Implement your model under `baselines/models`. A model should ideally inherit from the base model and implement the following key methods (a minimal sketch is shown below):

- `forward(train_batch) -> dict(output, loss)`
- `inference(obs_batch, state) -> output_for_agent`
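
Below is a minimal, self-contained sketch of what such a model could look like. The class name, tensor shapes, and feature dimensions are illustrative assumptions, not the actual InternNav base-model API:

```python
import torch
import torch.nn as nn


class MyNavModel(nn.Module):
    """Illustrative navigation model: encodes an RGB frame and an instruction
    embedding, then scores the 4 discrete actions (stop/forward/left/right)."""

    def __init__(self, num_actions: int = 4, instr_dim: int = 50, hidden_dim: int = 256):
        super().__init__()
        self.rgb_encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=8, stride=4),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(32, hidden_dim),
        )
        self.instr_encoder = nn.GRU(instr_dim, hidden_dim, batch_first=True)
        self.action_head = nn.Linear(2 * hidden_dim, num_actions)

    def _logits(self, rgb: torch.Tensor, instruction: torch.Tensor) -> torch.Tensor:
        rgb_feat = self.rgb_encoder(rgb)                 # (B, hidden_dim)
        _, instr_feat = self.instr_encoder(instruction)  # (1, B, hidden_dim)
        return self.action_head(torch.cat([rgb_feat, instr_feat[-1]], dim=-1))

    def forward(self, train_batch: dict) -> dict:
        """Training pass: returns the raw output together with the imitation loss."""
        logits = self._logits(train_batch['rgb'], train_batch['instruction'])
        loss = nn.functional.cross_entropy(logits, train_batch['gt_action'])
        return {'output': logits, 'loss': loss}

    @torch.no_grad()
    def inference(self, obs_batch: dict, state=None):
        """Inference pass called by the Agent: returns action logits per environment."""
        return self._logits(obs_batch['rgb'], obs_batch['instruction'])
```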

## Create a Custom Config Class

In the model file, define a `Config` class that inherits from `PretrainedConfig`.
A reference implementation is `CMAModelConfig` in [`cma_model.py`](../internnav/model/cma/cma_policy.py).
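
A hedged sketch of such a config class (the field names are illustrative; mirror `CMAModelConfig` for the fields your model actually needs):

```python
from transformers import PretrainedConfig


class MyNavModelConfig(PretrainedConfig):
    """Illustrative config for the sketch model above."""

    model_type = 'my_nav_model'

    def __init__(self, hidden_dim: int = 256, num_actions: int = 4, **kwargs):
        super().__init__(**kwargs)
        self.hidden_dim = hidden_dim
        self.num_actions = num_actions
```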

## Registration and Integration

In [`internnav/model/__init__.py`](../internnav/model/__init__.py) (a sketch of both hooks follows this list):
- Add the new model to `get_policy`.
- Add the new model's configuration to `get_config`.
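
Conceptually, the registration is just an extra branch keyed by your model name. The real `get_policy`/`get_config` in InternNav may be structured differently (e.g., as a registry dict), so treat this as a sketch only:

```python
# internnav/model/__init__.py (sketch; the actual functions may differ)
from .my_nav_model import MyNavModel, MyNavModelConfig  # hypothetical module


def get_policy(policy_name: str):
    if policy_name == 'MyNavModel':  # must match `model_name` in your configs
        return MyNavModel
    raise ValueError(f'Unknown policy: {policy_name}')


def get_config(policy_name: str):
    if policy_name == 'MyNavModel':
        return MyNavModelConfig
    raise ValueError(f'Unknown config: {policy_name}')
```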

## Create a Custom Agent

The Agent handles interaction with the environment, data preprocessing/postprocessing, and calls the Model for inference.
A custom Agent usually inherits from [`Agent`](../internnav/agent/base.py) and implements the following key methods:

- `reset()`: Resets the Agent's internal state (e.g., RNN states, action history). Called at the start of each episode.
- `inference(obs)`: Receives environment observations `obs`, performs preprocessing (e.g., tokenizing instructions, padding), calls the model for inference, and returns an action.
- `step(obs)`: The external interface; usually calls `inference` and can include logging or timing.

Example: [`CMAAgent`](../internnav/agent/cma_agent.py)
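
A minimal Agent sketch following this interface (it wraps the illustrative model above; the actual base `Agent` in `internnav/agent/base.py` may expect additional arguments):

```python
import torch


class MyNavAgent:
    """Illustrative Agent: preprocesses observations and queries MyNavModel."""

    def __init__(self, model, device: str = 'cuda'):
        self.model = model.to(device).eval()
        self.device = device
        self.rnn_state = None  # recurrent state, if the model keeps one

    def reset(self):
        """Clear per-episode state; called at the start of every episode."""
        self.rnn_state = None

    def inference(self, obs: list) -> list:
        """Preprocess raw observations, run the model, return one action per env."""
        rgb = torch.stack([
            torch.from_numpy(o['rgb']).permute(2, 0, 1).float() / 255.0 for o in obs
        ]).to(self.device)
        # Placeholder instruction encoding; a real agent would tokenize/pad the instruction here.
        instruction = torch.zeros(len(obs), 1, 50, device=self.device)
        logits = self.model.inference({'rgb': rgb, 'instruction': instruction}, self.rnn_state)
        return logits.argmax(dim=-1).cpu().tolist()  # 0: stop, 1: forward, 2: left, 3: right

    def step(self, obs: list) -> list:
        """External interface used by the evaluation loop."""
        return self.inference(obs)
```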

At each step, the agent receives an observation from the environment.

For the VLN benchmark under InternUtopia:

```python
action = self.agent.step(obs)
```
**obs** has the format:
```python
obs = [{
    'globalgps': [X, Y, Z],          # robot location
    'globalrotation': [X, Y, Z, W],  # robot orientation as a quaternion
    'rgb': np.array(256, 256, 3),    # RGB camera image, shape (256, 256, 3)
    'depth': np.array(256, 256, 1),  # depth image, shape (256, 256, 1)
}]
```
**action** has the format:
```python
action = List[int]  # one action per environment
# 0: stop
# 1: move forward
# 2: turn left
# 3: turn right
```
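
Putting the two formats together, a single evaluation episode roughly follows the loop below. The `env.reset()`/`env.step()` signatures shown here are illustrative, not the exact benchmark API:

```python
# Sketch of one evaluation episode (assumed env interface)
obs = env.reset()                  # list of per-env observation dicts as described above
agent.reset()
done = [False] * len(obs)
while not all(done):
    action = agent.step(obs)       # List[int], one discrete action per environment
    obs, done = env.step(action)   # advance the simulation by one step
```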

## Create a Trainer

The Trainer manages the training loop, including data loading, the forward pass, loss calculation, and backpropagation.
A custom trainer usually inherits from the [`Base Trainer`](../internnav/trainer/base.py) and implements:

- `train_epoch()`: Runs one training epoch (batch iteration, forward pass, loss calculation, parameter update).
- `eval_epoch()`: Evaluates the model on the validation set and records metrics.
- `save_checkpoint()`: Saves model weights, optimizer state, and training progress.
- `load_checkpoint()`: Loads pretrained models or resumes training.

Example: [`CMATrainer`](../internnav/trainer/cma_trainer.py) shows how to handle sequence data, compute the action loss, and implement imitation learning.
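
A bare-bones trainer sketch implementing these four hooks (it assumes batches are dicts of tensors matching the illustrative model above; the real `Base Trainer` exposes more machinery):

```python
import torch


class MyNavTrainer:
    """Illustrative trainer: one optimizer step per batch of observation/action pairs."""

    def __init__(self, model, train_loader, val_loader, lr: float = 1e-4, device: str = 'cuda'):
        self.model = model.to(device)
        self.train_loader, self.val_loader = train_loader, val_loader
        self.optimizer = torch.optim.Adam(model.parameters(), lr=lr)
        self.device = device

    def train_epoch(self) -> float:
        self.model.train()
        total = 0.0
        for batch in self.train_loader:
            batch = {k: v.to(self.device) for k, v in batch.items()}
            out = self.model(batch)            # {'output': ..., 'loss': ...}
            self.optimizer.zero_grad()
            out['loss'].backward()
            self.optimizer.step()
            total += out['loss'].item()
        return total / max(len(self.train_loader), 1)

    @torch.no_grad()
    def eval_epoch(self) -> float:
        self.model.eval()
        losses = [self.model({k: v.to(self.device) for k, v in b.items()})['loss'].item()
                  for b in self.val_loader]
        return sum(losses) / max(len(losses), 1)

    def save_checkpoint(self, path: str):
        torch.save({'model': self.model.state_dict(),
                    'optimizer': self.optimizer.state_dict()}, path)

    def load_checkpoint(self, path: str):
        ckpt = torch.load(path, map_location=self.device)
        self.model.load_state_dict(ckpt['model'])
        self.optimizer.load_state_dict(ckpt['optimizer'])
```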

## Training Data

The training data is under `data/vln_pe/traj_data`. Our dataset provides trajectory data collected from the H1 robot as it navigates through the task environment.
Each observation in the trajectory is paired with its corresponding action.

You may also incorporate external datasets to improve model generalization.

## Evaluation Data

In `raw_data/val`, each task requires the model to guide the robot from its start position and rotation to the target position, following the language instruction.

## Set the Corresponding Configuration

Refer to existing **training** configuration files for customization:

- **CMA Model Config**: [`cma_exp_cfg`](../scripts/train/configs/cma.py)

Configuration files should define:
- `ExpCfg` (experiment config)
- `EvalCfg` (evaluation config)
- `IlCfg` (imitation learning config)

Ensure your configuration is imported and registered in [`__init__.py`](../scripts/train/configs/__init__.py).

Key parameters include (an illustrative config sketch follows this list):
- `name`: Experiment name
- `model_name`: Must match the name used during model registration
- `batch_size`: Batch size
- `lr`: Learning rate
- `epochs`: Number of training epochs
- `dataset_*_root_dir`: Dataset paths
- `lmdb_features_dir`: Feature storage path
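
An illustrative training config using these parameters is sketched below. The import path and the exact `ExpCfg`/`IlCfg` fields are assumptions; copy the structure of [`cma_exp_cfg`](../scripts/train/configs/cma.py) for the authoritative layout:

```python
# scripts/train/configs/my_nav_model.py — sketch only; field names mirror the list above
from internnav.configs import ExpCfg, IlCfg  # assumed import path

my_nav_exp_cfg = ExpCfg(
    name='my_nav_model_train',   # experiment name
    model_name='MyNavModel',     # must match the name used at model registration
    il=IlCfg(
        batch_size=32,
        lr=1e-4,
        epochs=30,
        lmdb_features_dir='data/vln_pe/lmdb_features',  # feature storage path (illustrative)
    ),
)
```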

Refer to existing **evaluation** config files for customization:

- **CMA Model Evaluation Config**: [`h1_cma_cfg.py`](../scripts/eval/configs/h1_cma_cfg.py)

Main fields:
- `name`: Evaluation experiment name
- `model_name`: Must match the name used during training
- `ckpt_to_load`: Path to the model checkpoint
- `task`: Task settings: number of environments, scene, and robots
- `dataset`: Dataset to load (`r2r` or `interiornav`)
- `split`: Dataset split (`val_seen`, `val_unseen`, `test`, etc.)

source/en/user_guide/internnav/quick_start/index.md

Lines changed: 2 additions & 0 deletions
@@ -14,4 +14,6 @@ myst:
installation
train_eval
vln_evaluation
create_model
```

source/en/user_guide/internnav/quick_start/installation.md

Lines changed: 83 additions & 20 deletions
@@ -192,12 +192,43 @@ Choose the environment that best fits your specific needs to optimize your exper

Before proceeding with the installation, ensure that you have [Isaac Sim 4.5.0](https://docs.isaacsim.omniverse.nvidia.com/4.5.0/installation/install_workstation.html) and [Conda](https://docs.conda.io/projects/conda/en/latest/user-guide/install/index.html) installed.

**Pull our latest Docker image with everything you need**
```bash
$ docker pull crpi-mdum1jboc8276vb5.cn-beijing.personal.cr.aliyuncs.com/iros-challenge/internnav:v1.2
```

Run the container:
```bash
$ xhost +local:root # Allow the container to access the display

$ cd PATH/TO/INTERNNAV/

$ docker run --name internnav -it --rm --gpus all --network host \
    -e "ACCEPT_EULA=Y" \
    -e "PRIVACY_CONSENT=Y" \
    -e "DISPLAY=${DISPLAY}" \
    --entrypoint /bin/bash \
    -w /root/InternNav \
    -v /tmp/.X11-unix/:/tmp/.X11-unix \
    -v ${PWD}:/root/InternNav \
    -v ${HOME}/docker/isaac-sim/cache/kit:/isaac-sim/kit/cache:rw \
    -v ${HOME}/docker/isaac-sim/cache/ov:/root/.cache/ov:rw \
    -v ${HOME}/docker/isaac-sim/cache/pip:/root/.cache/pip:rw \
    -v ${HOME}/docker/isaac-sim/cache/glcache:/root/.cache/nvidia/GLCache:rw \
    -v ${HOME}/docker/isaac-sim/cache/computecache:/root/.nv/ComputeCache:rw \
    -v ${HOME}/docker/isaac-sim/logs:/root/.nvidia-omniverse/logs:rw \
    -v ${HOME}/docker/isaac-sim/data:/root/.local/share/ov/data:rw \
    -v ${HOME}/docker/isaac-sim/documents:/root/Documents:rw \
    -v ${PWD}/data/scene_data/mp3d_pe:/isaac-sim/Matterport3D/data/v1/scans:rw \
    crpi-mdum1jboc8276vb5.cn-beijing.personal.cr.aliyuncs.com/iros-challenge/internnav:v1.0
```

<!-- To help you get started quickly, we've prepared a Docker image pre-configured with Isaac Sim 4.5 and InternUtopia. You can pull the image and run evaluations in the container using the following command:
```bash
docker pull registry.cn-hangzhou.aliyuncs.com/internutopia/internutopia:2.2.0
docker run -it --name internutopia-container registry.cn-hangzhou.aliyuncs.com/internutopia/internutopia:2.2.0
``` -->
#### Conda installation from scratch
```bash
conda create -n <env> python=3.10 libxcb=1.14

@@ -253,19 +284,42 @@ pip install -r requirements/habitat_requirements.txt
### Data/Checkpoints Preparation
To get started, we need to prepare the data and checkpoints.
1. **InternVLA-N1 pretrained Checkpoints**
   - Download our latest pretrained [checkpoint](https://huggingface.co/InternRobotics/InternVLA-N1) of InternVLA-N1 and run the following script to perform inference with visualization results. Move the checkpoint to the `checkpoints` directory.
2. **DepthAnything v2 Checkpoints**
   - Download the DepthAnything v2 pretrained [checkpoint](https://huggingface.co/Ashoka74/Placement/resolve/main/depth_anything_v2_vits.pth). Move the checkpoint to the `checkpoints` directory.
3. **InternData-N1 Dataset Episodes**
   - Download the [InternData-N1](https://huggingface.co/datasets/InternRobotics/InternData-N1) dataset. Extract it into the `data/vln_ce/` and `data/vln_pe/` directories.
4. **Scene-N1**
   - Download the [SceneData-N1](https://huggingface.co/datasets/InternRobotics/Scene-N1) for `mp3d_ce`. Extract it into the `data/scene_data/` directory.
5. **Embodiments**
   - Download the [Embodiments](https://huggingface.co/datasets/InternRobotics/Embodiments) into the `Embodiments/` directory.
6. **Baseline models**
```bash
# ddppo-models
$ mkdir -p checkpoints/ddppo-models
$ wget -P checkpoints/ddppo-models https://dl.fbaipublicfiles.com/habitat/data/baselines/v1/ddppo/ddppo-models/gibson-4plus-mp3d-train-val-test-resnet50.pth
# longclip-B
$ huggingface-cli download --include 'longclip-B.pt' --local-dir-use-symlinks False --resume-download Beichenzhang/LongCLIP-B --local-dir checkpoints/clip-long
# download r2r finetuned baseline checkpoints
$ git clone https://huggingface.co/InternRobotics/VLN-PE && mv VLN-PE/r2r checkpoints/
```

The final folder structure should look like this:

```bash
InternNav/
├── data/
│   ├── scene_data/
│   │   ├── mp3d_ce/
│   │   │   └── mp3d/
│   │   │       ├── 17DRP5sb8fy/
│   │   │       ├── 1LXtFkjw3qL/
│   │   │       └── ...
│   │   └── mp3d_pe/
│   │       ├── 17DRP5sb8fy/
│   │       ├── 1LXtFkjw3qL/
│   │       └── ...
│   ├── vln_ce/
│   │   ├── raw_data/
│   │   │   └── r2r/
│   │   │       ├── ...
│   │   │       └── val_unseen/
│   │   │           └── val_unseen.json.gz
│   │   └── traj_data/
│   └── vln_pe/
│       ├── raw_data/          # JSON files defining tasks, navigation goals, and dataset splits
│       │   └── r2r/
│       │       ├── train/
│       │       ├── val_seen/
│       │       │   └── val_seen.json.gz
│       │       └── val_unseen/
│       └── traj_data/         # training sample data for two types of scenes
│           ├── interiornav/
│           │   └── kujiale_xxxx.tar.gz
│           └── r2r/
│               └── trajectory_0/
│                   ├── data/
│                   ├── meta/
│                   └── videos/
├── checkpoints/
│   ├── InternVLA-N1/
│   │   ├── model-00001-of-00004.safetensors
│   │   ├── config.json
│   │   └── ...
│   ├── InternVLA-N1-S2/
│   │   ├── model-00001-of-00004.safetensors
│   │   ├── config.json
│   │   └── ...
│   ├── depth_anything_v2_vits.pth
│   └── r2r/
│       ├── fine_tuned/
│       └── zero_shot/
├── internnav/
│   └── ...
```
### Gradio demo

@@ -373,4 +437,3 @@ data/
└── vln_n1/
    └── traj_data/
```

source/en/user_guide/internnav/quick_start/train_eval.md

Lines changed: 13 additions & 4 deletions
@@ -30,6 +30,15 @@ INTERNUTOPIA_ASSETS_PATH=/path/to/InternUTopiaAssets MESA_GL_VERSION_OVERRIDE=4.

The evaluation results will be saved in the `eval_results.log` file in the `output_dir` of the config file. The whole evaluation process takes about 10 hours on an RTX 4090 GPU.

The baselines can also be run directly:
```bash
# seq2seq model
./scripts/eval/start_eval.sh --config scripts/eval/configs/h1_seq2seq_cfg.py
# cma model
./scripts/eval/start_eval.sh --config scripts/eval/configs/h1_cma_cfg.py
# rdp model
./scripts/eval/start_eval.sh --config scripts/eval/configs/h1_rdp_cfg.py
```

#### Evaluation on Habitat
Evaluate on a single GPU:
@@ -217,11 +226,11 @@ python -m internnav.agent.utils.server --config scripts/eval/configs/h1_xxx_cfg.
Start Evaluation:
```bash
# seq2seq model
./scripts/eval/start_eval.sh --config scripts/eval/configs/h1_seq2seq_cfg.py
# cma model
./scripts/eval/start_eval.sh --config scripts/eval/configs/h1_cma_cfg.py
# rdp model
./scripts/eval/start_eval.sh --config scripts/eval/configs/h1_rdp_cfg.py
```

The evaluation results will be saved in the `eval_results.log` file in the `output_dir` of the config file.

source/en/user_guide/internnav/quick_start/vln_evaluation.md

Lines changed: 17 additions & 10 deletions
@@ -7,6 +7,7 @@ The main architecture of the evaluation code adopts a client-server model. In th
## Supported baselines
- InternVLA-N1
- CMA (Cross-Modal Attention)
- RDP (Recurrent Diffusion Policy)
- Navid (RSS2023)
- Seq2Seq Policy

@@ -15,14 +16,15 @@ The main architecture of the evaluation code adopts a client-server model. In th
- Matterport3D

## Evaluation Metrics
The project provides comprehensive evaluation metrics:

- **Success Rate (SR)**: Proportion of episodes where the agent reaches the goal location within 3 m
- **Success weighted by Path Length (SPL)**: Success rate weighted by the ratio of shortest-path length to actual path length
- **Trajectory Length (TL)**: Total length of the trajectory (m)
- **Navigation Error (NE)**: Euclidean distance between the agent's final position and the goal (m)
- **Oracle Success Rate (OSR)**: Whether any point along the predicted trajectory reaches the goal within 3 m
- **Fall Rate (FR)**: Frequency of the agent falling during navigation
- **Stuck Rate (StR)**: Frequency of the agent becoming stuck during navigation


# Quick Start for Evaluation
@@ -47,15 +49,15 @@ eval_cfg = EvalCfg(
    env=EnvCfg(
        env_type='vln_multi',
        env_settings={
            'use_fabric': True,  # improves simulation efficiency
            'headless': True,    # display option: set to False to open the Isaac Sim interactive window
        },
    ),
    task=TaskCfg(
        task_name='test',
        task_settings={
            'env_num': 1,              # number of environments in one Isaac Sim instance
            'use_distributed': False,  # Ray distributed framework
            'proc_num': 1,
        },
        scene=SceneCfg(
@@ -68,11 +70,16 @@ eval_cfg = EvalCfg(
            camera_resolution=[640, 480]  # (W, H)
        ),
    dataset=EvalDatasetCfg(
        dataset_type="mp3d",
        dataset_settings={
            'base_data_dir': '/path/to/R2R_VLNCE_v1-3',
            'split_data_types': ['val_unseen'],
            'filter_stairs': True,
        },
        eval_settings={
            'save_to_json': False,  # save the evaluation result to a separate JSON file
            'vis_output': True,     # save the simulation progress as a video under logs/
        }
    ),
```

## 3. Launch the server
