A comprehensive metrics collection and monitoring solution for vLLM deployments using Fluent Bit and Parseable with Prometheus-format compatibility.
This repo provides a complete observability stack for vLLM services by:
- Proxying vLLM metrics with Prometheus-format compatibility fixes
- Collecting metrics using Fluent Bit
- Storing metrics in Parseable for analysis and visualization
- Running the whole stack as containers with Podman Compose
AI inference is becoming a core computational workload. While open models like GPT-OSS-20B deployed on high-performance GPUs (for example, via RunPod) deliver exceptional capabilities, metrics reveal what happens under the hood and provide:
- Performance optimization - Identify bottlenecks and resource utilization patterns
- Cost control - Monitor GPU usage and request patterns for efficient scaling
- Reliability insights - Track error rates, response times, and system health
- Capacity planning - Understand throughput limits and scaling requirements
Metrics-driven inference operations transform black-box model serving into a transparent, controllable, and optimizable system.
```
vLLM Service  →  Metrics Proxy  →  Fluent Bit  →  Parseable
     ↓                ↓                ↓             ↓
  Metrics        Sanitization      Collection     Storage
```
Metrics Proxy (`proxy.py`):
- Flask-based HTTP proxy service
- Sanitizes vLLM metric names by replacing colons with underscores
- Ensures Prometheus-format compatibility
- Runs on port 9090
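The repository's `proxy.py` is the authoritative implementation; as a rough sketch of the idea, a Flask proxy that fetches the upstream metrics, rewrites colons in metric names, and re-serves them might look like this (route, regex, and TLS handling here are assumptions, not the repo's actual code):

```python
# Minimal sketch of the metrics proxy idea - the repo's proxy.py may differ in details.
import os
import re

import requests
from flask import Flask, Response

app = Flask(__name__)

# Upstream vLLM metrics endpoint, e.g. https://your-vllm-endpoint/metrics
VLLM_METRICS_URL = os.environ["VLLM_METRICS_URL"]


@app.route("/metrics")
def metrics():
    # verify=False mirrors the "SSL verification disabled" note elsewhere in this README;
    # turn verification back on for production endpoints.
    upstream = requests.get(VLLM_METRICS_URL, timeout=10, verify=False)
    upstream.raise_for_status()

    sanitized = []
    for line in upstream.text.splitlines():
        if line.startswith("#"):
            # HELP/TYPE comment lines also carry the metric name.
            sanitized.append(line.replace("vllm:", "vllm_"))
        else:
            # Replace colons only in the metric name, leaving labels and values alone.
            sanitized.append(
                re.sub(r"^[^\s{]+", lambda m: m.group(0).replace(":", "_"), line)
            )
    return Response("\n".join(sanitized) + "\n", mimetype="text/plain")


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=9090)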
Fluent Bit:
- Scrapes metrics from the proxy every 2 seconds
- Forwards metrics to Parseable via the OpenTelemetry protocol
- Configured via `fluent-bit.conf`
Parseable:
- Time-series data storage and analysis platform
- Web UI available on port 8080
- Stores metrics in the `vllmmetrics` stream
- Podman with Podman Compose (or Docker with Docker Compose)
- Open ports: `9090` (proxy), `8080` (Parseable UI)
- vLLM metrics endpoint reachable from the host running the stack
- Clone the repository

  ```bash
  git clone https://github.com/opensourceops/vllm-inference-metrics.git
  cd vllm-inference-metrics
  ```
- Configure vLLM endpoint

  Replace the `VLLM_METRICS_URL` in `compose.yml` with your vLLM deployment endpoint:

  ```yaml
  environment:
    - VLLM_METRICS_URL=https://your-vllm-endpoint/metrics
  ```

  For local vLLM deployments:

  ```yaml
  environment:
    - VLLM_METRICS_URL=http://localhost:8000/metrics
  ```
- Start the stack

  ```bash
  podman compose up -d
  ```

  Using Docker instead of Podman:

  ```bash
  docker compose up -d
  ```
- Access services
  - Parseable UI: http://localhost:8080 (admin/admin)
  - Metrics endpoint: http://localhost:9090/metrics
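Once the stack is up, a quick way to confirm metrics are flowing is to hit the proxy directly and glance at the Fluent Bit logs (ports and service name as used in this README):

```bash
# Spot-check that metrics are flowing
curl -s http://localhost:9090/metrics | head    # proxy should return sanitized vLLM metrics
podman compose logs fluentbit                   # look for successful scrape/flush messages
```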
| Variable | Description | Default |
|---|---|---|
| `VLLM_METRICS_URL` | vLLM metrics endpoint URL | Required |
| `P_USERNAME` | Parseable username | admin |
| `P_PASSWORD` | Parseable password | admin |
| `P_ADDR` | Parseable listen address | 0.0.0.0:8000 |
| `P_STAGING_DIR` | Parseable staging dir (volume) | /staging |
Note: Parseable-related environment variables are defined in `parseable.env` and loaded via `env_file` in `compose.yml`.
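Based on the defaults in the table above, a minimal `parseable.env` might look like this (illustrative; the file shipped with the repo is authoritative):

```
P_USERNAME=admin
P_PASSWORD=admin
P_ADDR=0.0.0.0:8000
P_STAGING_DIR=/staging
```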
| Service | Container Port | Host Port |
|---|---|---|
| Proxy | 9090 | 9090 |
| Parseable UI | 8000 | 8080 |
Key settings in `fluent-bit.conf`:
- Scrape Interval: 2 seconds
- Target: proxy:9090/metrics
- Output: Parseable OpenTelemetry endpoint
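A Fluent Bit configuration matching these settings might look roughly like the sketch below; the repo's `fluent-bit.conf` is authoritative, and the Parseable ingestion path, stream header, and credentials shown here are assumptions:

```
# Illustrative fluent-bit.conf sketch - check the repo's file for the real values.
[SERVICE]
    Flush            1
    Log_Level        info

[INPUT]
    Name             prometheus_scrape
    Host             proxy
    Port             9090
    Metrics_Path     /metrics
    Scrape_Interval  2s

[OUTPUT]
    Name             opentelemetry
    Match            *
    Host             parseable
    Port             8000
    Metrics_uri      /v1/metrics
    Http_user        admin
    Http_passwd      admin
    Header           X-P-Stream vllmmetrics
    Tls              off
```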
- Proxy service includes HTTP health check
- Services have dependency management and restart policies
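As an illustration of how the health check and dependency ordering can be wired together, a `compose.yml` excerpt might look like this (image names, intervals, and the exact health-check command are assumptions; refer to the repo's `compose.yml`):

```yaml
# Illustrative excerpt, not the repo's actual compose.yml.
services:
  proxy:
    build: .
    environment:
      - VLLM_METRICS_URL=https://your-vllm-endpoint/metrics
    ports:
      - "9090:9090"
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:9090/metrics"]
      interval: 30s
      timeout: 5s
      retries: 3
    restart: unless-stopped

  fluentbit:
    image: fluent/fluent-bit:latest
    depends_on:
      proxy:
        condition: service_healthy
    restart: unless-stopped
```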
View service logs:

```bash
podman compose logs -f [service-name]
```

The proxy transforms vLLM metrics from:

```
vllm:num_requests_running 5
```

To Prometheus-compatible format:

```
vllm_num_requests_running 5
```
- Connection refused to vLLM
  - Verify `VLLM_METRICS_URL` is accessible
  - Check network connectivity
- Parseable not receiving data
  - Check Fluent Bit logs: `podman compose logs -f fluentbit`
  - Verify proxy health: `curl http://localhost:9090/metrics`
  - Query the stream directly to confirm ingestion (see the example below)
- Proxy errors
  - Check SSL/TLS settings for the vLLM endpoint
  - Verify the vLLM metrics endpoint responds
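If you prefer the command line over the web UI, Parseable exposes a SQL query API; a sanity check against the metrics stream could look roughly like this (the endpoint path, JSON field names, and stream name are assumptions for illustration, so adjust them to your Parseable version and stream):

```bash
# Hypothetical query against Parseable to confirm metrics are being stored.
curl -s -u admin:admin http://localhost:8080/api/v1/query \
  -H 'Content-Type: application/json' \
  -d '{
        "query": "SELECT * FROM vllmmetrics LIMIT 10",
        "startTime": "2024-01-01T00:00:00Z",
        "endTime": "2024-01-01T01:00:00Z"
      }'
```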
Services start in order:
- Parseable
- Proxy (with health check)
- Fluent Bit
Test the proxy standalone:

```bash
export VLLM_METRICS_URL=https://your-vllm-endpoint/metrics
pip install flask requests
python proxy.py
```

Stop and remove the stack:

```bash
podman compose down
```

After modifying configurations:

```bash
podman compose down
podman compose up -d
```

- Default credentials are `admin`/`admin` - change them in production
- The proxy disables SSL verification - configure it properly for production
- Consider network security for metric endpoints
This is a demo/development setup designed to get you started quickly. For production deployments, consider:
- Security: Replace default credentials, implement secrets management, enable SSL/TLS
- Images: Pin specific versions instead of `edge`/`latest` tags
- Resources: Add memory/CPU limits and proper resource allocation
- Monitoring: Implement logging, alerting, and backup strategies for the metrics stack itself
- Networking: Configure proper network security and access controls
The `compose.yml` provides a solid foundation - customize it based on your production requirements.
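As a starting point, a hardened service definition might pin images and add resource limits along these lines (the tag and limit values are illustrative assumptions, and resource-limit support varies between Podman Compose and Docker Compose):

```yaml
# Hardening sketch - not the repo's actual compose.yml values.
services:
  parseable:
    image: parseable/parseable:v1.0.0   # hypothetical pinned tag; pick a tested release instead of edge/latest
    env_file:
      - parseable.env                   # keep real credentials here and out of version control
    mem_limit: 2g                       # resource limits; adjust to your workload
    cpus: "1.0"
```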
MIT License