A high-performance, stateless OpenRouter proxy service built with Node.js, TypeScript, and Express. It provides a REST API and WebSocket streaming for LLM inference, with no authentication or user tracking.
- Dual Interface: REST API for standard requests and WebSocket for streaming
- Comprehensive Parameter Support: System prompts, model/provider selection, temperature, tools, etc.
- Multi-modal Support: Text, audio, and image generation capabilities
- Robust Error Handling: Graceful failure recovery and informative error responses
- High Performance: Optimized for speed and low latency
- IP-based Rate Limiting: Protection against abuse while maintaining simplicity
- Node.js 20+
- npm or yarn
- OpenRouter API key
- Clone the repository:
git clone <repository-url>
cd llm-proxy
- Install dependencies:
npm install
- Set up environment variables:
cp .env.example .env
# Edit .env with your OpenRouter API key
- Build the project:
npm run build
- Start the server:
npm start
For development:
npm run dev
The service uses environment variables for configuration. See .env.example for all available options (a sample .env follows the list below):
- OPENROUTER_API_KEY: Your OpenRouter API key
- PORT: Server port (default: 3000)
- HOST: Server host (default: 0.0.0.0)
- NODE_ENV: Environment (development/production/test)
- LOG_LEVEL: Logging level (debug/info/warn/error)
- RATE_LIMIT_WINDOW_MS: Rate limit window in milliseconds (default: 900000)
- RATE_LIMIT_MAX_REQUESTS: Max requests per window (default: 100)
- WS_MAX_CONNECTIONS: Max WebSocket connections (default: 1000)
- WS_HEARTBEAT_INTERVAL: WebSocket heartbeat interval (default: 30000)
- MAX_CONCURRENT_REQUESTS: Max concurrent requests (default: 100)
- REQUEST_TIMEOUT: Request timeout in milliseconds (default: 30000)
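A minimal .env for local development might look like this (illustrative values; replace the API key placeholder with your own):
OPENROUTER_API_KEY=sk-or-xxxxxxxx
PORT=3000
HOST=0.0.0.0
NODE_ENV=development
LOG_LEVEL=debug
RATE_LIMIT_WINDOW_MS=900000
RATE_LIMIT_MAX_REQUESTS=100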
GET /health
Check service health status.
Response:
{
  "status": "healthy",
  "timestamp": "2024-01-01T00:00:00.000Z",
  "uptime": 123.45,
  "version": "1.0.0",
  "environment": "production"
}
POST /api/v1/inference
Create a completion using the specified model.
Request Body:
{
  "model": "openai/gpt-4o",
  "messages": [
    {
      "role": "user",
      "content": "Hello, world!"
    }
  ],
  "temperature": 0.7,
  "max_tokens": 100,
  "stream": false
}
Response:
{
  "id": "chatcmpl-123",
  "choices": [
    {
      "finish_reason": "stop",
      "message": {
        "content": "Hello! How can I help you today?",
        "role": "assistant"
      }
    }
  ],
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 15,
    "total_tokens": 25
  },
  "model": "openai/gpt-4o",
  "created": 1704067200,
  "object": "chat.completion"
}
POST /api/v1/inference/stream
Create a streaming completion using the specified model.
Request Body:
{
  "model": "openai/gpt-4o",
  "messages": [
    {
      "role": "user",
      "content": "Tell me a story"
    }
  ],
  "temperature": 0.7,
  "max_tokens": 500,
  "stream": true
}
Response: Server-Sent Events stream
data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1704067200,"model":"openai/gpt-4o","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}
data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1704067200,"model":"openai/gpt-4o","choices":[{"index":0,"delta":{"content":"Once"},"finish_reason":null}]}
data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1704067200,"model":"openai/gpt-4o","choices":[{"index":0,"delta":{"content":" upon"},"finish_reason":null}]}
data: [DONE]
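The inference endpoints accept the same OpenAI-style body shown above, so other documented parameters (for example a system prompt, top_p, or stop sequences) can be included as well. An illustrative request body (a sketch, not an exhaustive list of parameters):
{
  "model": "openai/gpt-4o",
  "messages": [
    {
      "role": "system",
      "content": "You are a concise assistant."
    },
    {
      "role": "user",
      "content": "Summarize the plot of Hamlet in two sentences."
    }
  ],
  "temperature": 0.2,
  "top_p": 0.9,
  "max_tokens": 200,
  "stop": ["\n\n"],
  "stream": false
}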
List all available models with optional filtering and pagination.
Query Parameters:
- provider (optional): Filter by provider (e.g., "openai", "anthropic")
- search (optional): Search in model name or description
- limit (optional): Number of models to return (default: 50, max: 100)
- offset (optional): Number of models to skip (default: 0)
Example:
GET /api/v1/models?provider=openai&search=gpt&limit=10&offset=0
Response:
{
  "data": [
    {
      "id": "openai/gpt-4o",
      "name": "GPT-4o",
      "description": "Most advanced GPT-4 model",
      "context_length": 128000,
      "pricing": {
        "prompt": "0.005",
        "completion": "0.015"
      },
      "supported_parameters": ["temperature", "max_tokens", "top_p"],
      "is_moderated": true,
      "max_completion_tokens": 4096
    }
  ],
  "pagination": {
    "total": 150,
    "limit": 10,
    "offset": 0,
    "hasMore": true
  }
}
Get detailed information about a specific model.
Example:
GET /api/v1/models/openai/gpt-4o
Response:
{
  "data": {
    "id": "openai/gpt-4o",
    "name": "GPT-4o",
    "description": "Most advanced GPT-4 model",
    "context_length": 128000,
    "pricing": {
      "prompt": "0.005",
      "completion": "0.015"
    },
    "supported_parameters": ["temperature", "max_tokens", "top_p", "frequency_penalty", "presence_penalty"],
    "is_moderated": true,
    "max_completion_tokens": 4096
  }
}
Get supported parameters for a specific model.
Example:
GET /api/v1/models/openai/gpt-4o/parameters
Response:
{
  "data": {
    "model": "openai/gpt-4o",
    "supported_parameters": [
      "temperature",
      "max_tokens",
      "top_p",
      "frequency_penalty",
      "presence_penalty",
      "stop",
      "stream"
    ]
  }
}
Get pricing information for a specific model.
Example:
GET /api/v1/models/openai/gpt-4o/pricing
Response:
{
  "data": {
    "model": "openai/gpt-4o",
    "pricing": {
      "prompt": "0.005",
      "completion": "0.015"
    }
  }
}
Get top models by context length.
Query Parameters:
- limit (optional): Number of models to return (default: 10)
Example:
GET /api/v1/models/top?limit=5
Response:
{
  "data": [
    {
      "id": "anthropic/claude-3-5-sonnet-20241022",
      "name": "Claude 3.5 Sonnet",
      "context_length": 200000,
      "pricing": {
        "prompt": "0.003",
        "completion": "0.015"
      }
    }
  ]
}
Search models by query.
Query Parameters:
- q (required): Search query
- limit (optional): Number of results to return (default: 20)
Example:
GET /api/v1/models/search?q=code&limit=5
Response:
{
  "data": [
    {
      "id": "openai/gpt-4o",
      "name": "GPT-4o",
      "description": "Most advanced GPT-4 model with code capabilities"
    }
  ],
  "query": "code",
  "total": 25
}
Get all available providers.
Response:
{
  "data": [
    "openai",
    "anthropic",
    "google",
    "meta",
    "mistral"
  ]
}
Get models by provider.
Example:
GET /api/v1/models/providers/openai
Response:
{
  "data": [
    {
      "id": "openai/gpt-4o",
      "name": "GPT-4o",
      "context_length": 128000
    }
  ],
  "provider": "openai"
}
Connect to the WebSocket endpoint:
ws://localhost:3000/ws
Inference request message:
{
  "type": "inference_request",
  "id": "req-123",
  "data": {
    "model": "openai/gpt-4o",
    "messages": [
      {
        "role": "user",
        "content": "Hello, world!"
      }
    ],
    "temperature": 0.7,
    "max_tokens": 100
  }
}
Inference response message:
{
  "type": "inference_response",
  "id": "req-123",
  "data": {
    "content": "Hello! How can I help you today?",
    "finish_reason": "stop",
    "usage": {
      "prompt_tokens": 10,
      "completion_tokens": 15,
      "total_tokens": 25
    },
    "model": "openai/gpt-4o",
    "created": 1704067200
  }
}
Heartbeat message:
{
  "type": "heartbeat",
  "timestamp": 1704067200000
}
Error message:
{
  "type": "error",
  "id": "req-123",
  "error": {
    "code": 400,
    "message": "Invalid model",
    "type": "validation"
  }
}
Close message:
{
  "type": "close",
  "reason": "Client requested close",
  "code": 1000
}
JavaScript:
// Standard completion
const response = await fetch('http://localhost:3000/api/v1/inference', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    model: 'openai/gpt-4o',
    messages: [
      { role: 'user', content: 'Hello, world!' }
    ],
    temperature: 0.7,
    max_tokens: 100
  })
});
const data = await response.json();
console.log(data.choices[0].message.content);
// Streaming completion
const response = await fetch('http://localhost:3000/api/v1/inference/stream', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    model: 'openai/gpt-4o',
    messages: [
      { role: 'user', content: 'Tell me a story' }
    ],
    temperature: 0.7,
    max_tokens: 500,
    stream: true
  })
});
const reader = response.body.getReader();
const decoder = new TextDecoder();
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
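  // Note: for simplicity this assumes each chunk ends on a line boundary;
  // production code should buffer partial SSE lines across reads.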
  const chunk = decoder.decode(value);
  const lines = chunk.split('\n');
  for (const line of lines) {
    if (line.startsWith('data: ')) {
      const data = line.slice(6);
      if (data === '[DONE]') break;
      
      try {
        const parsed = JSON.parse(data);
        if (parsed.choices?.[0]?.delta?.content) {
          console.log(parsed.choices[0].delta.content);
        }
      } catch (e) {
        // Ignore invalid JSON
      }
    }
  }
}
// WebSocket streaming
const ws = new WebSocket('ws://localhost:3000/ws');
ws.onopen = () => {
  // Send inference request
  ws.send(JSON.stringify({
    type: 'inference_request',
    id: 'req-123',
    data: {
      model: 'openai/gpt-4o',
      messages: [
        { role: 'user', content: 'Hello, world!' }
      ],
      temperature: 0.7
    }
  }));
};
ws.onmessage = (event) => {
  const message = JSON.parse(event.data);
  
  switch (message.type) {
    case 'inference_response':
      if (message.data.content) {
        console.log(message.data.content);
      }
      if (message.data.finish_reason) {
        console.log('Finished:', message.data.finish_reason);
      }
      break;
      
    case 'error':
      console.error('Error:', message.error.message);
      break;
      
    case 'heartbeat':
      console.log('Heartbeat received');
      break;
  }
};
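ws.onerror = (event) => {
  // Transport-level failure (distinct from the 'error' messages handled above)
  console.error('WebSocket error:', event);
};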
ws.onclose = () => {
  console.log('WebSocket connection closed');
};
Python:
import requests
import json
# Standard completion
response = requests.post('http://localhost:3000/api/v1/inference', 
    json={
        'model': 'openai/gpt-4o',
        'messages': [
            {'role': 'user', 'content': 'Hello, world!'}
        ],
        'temperature': 0.7,
        'max_tokens': 100
    }
)
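response.raise_for_status()  # fail fast on non-2xx responses (error format documented below)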
data = response.json()
print(data['choices'][0]['message']['content'])
cURL:
# Standard completion
curl -X POST http://localhost:3000/api/v1/inference \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o",
    "messages": [
      {"role": "user", "content": "Hello, world!"}
    ],
    "temperature": 0.7,
    "max_tokens": 100
  }'
# List models
curl http://localhost:3000/api/v1/models
# Get model details
curl http://localhost:3000/api/v1/models/openai/gpt-4o
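# Check service health
curl http://localhost:3000/health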
# Search models
curl "http://localhost:3000/api/v1/models/search?q=gpt&limit=5"All errors follow this format:
{
  "error": {
    "code": 400,
    "message": "Validation error",
    "type": "validation",
    "details": {
      "field": "model",
      "message": "Model is required"
    }
  }
}
Error types:
- validation: Request validation failed
- rate_limit: Rate limit exceeded
- openrouter: OpenRouter API error
- internal: Internal server error
HTTP status codes:
- 400: Bad Request - Invalid request data
- 404: Not Found - Model or endpoint not found
- 429: Too Many Requests - Rate limit exceeded
- 500: Internal Server Error - Server error
- 502: Bad Gateway - OpenRouter API error
- 503: Service Unavailable - Service temporarily unavailable
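A minimal client-side handling sketch against this error format, reusing the fetch-based request shown in the usage examples:
const res = await fetch('http://localhost:3000/api/v1/inference', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    model: 'openai/gpt-4o',
    messages: [{ role: 'user', content: 'Hello, world!' }]
  })
});
if (!res.ok) {
  // Body follows the documented { error: { code, message, type, details } } shape
  const { error } = await res.json();
  console.error(`Request failed (${error.code} ${error.type}): ${error.message}`);
  if (error.details) console.error('Details:', error.details);
} else {
  const data = await res.json();
  console.log(data.choices[0].message.content);
}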
The service implements IP-based rate limiting:
- Default: 100 requests per 15 minutes per IP
- Inference endpoints: 50 requests per 15 minutes per IP
- WebSocket: 5 connections per minute per IP
Rate limit headers are included in responses (see the sketch after this list):
- X-RateLimit-Limit: Maximum requests allowed
- X-RateLimit-Remaining: Requests remaining in current window
- X-RateLimit-Reset: Time when the rate limit resets
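For example, a client can inspect these headers to back off before the limit is hit (a sketch; the reset value is echoed as returned by the server):
const res = await fetch('http://localhost:3000/api/v1/models');
const remaining = Number(res.headers.get('X-RateLimit-Remaining'));
const reset = res.headers.get('X-RateLimit-Reset');
if (res.status === 429 || remaining === 0) {
  console.warn(`Rate limit reached; wait until the window resets (${reset}).`);
}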
Available npm scripts:
- npm run dev - Start development server with hot reload
- npm run build - Build the project
- npm start - Start production server
- npm test - Run tests
- npm run test:watch - Run tests in watch mode
- npm run test:coverage - Run tests with coverage
- npm run lint - Run ESLint
- npm run lint:fix - Fix ESLint errors
src/
├── controllers/          # Request handlers
├── services/            # Business logic
├── middleware/          # Express middleware
├── routes/              # API routes
├── types/               # TypeScript definitions
├── utils/               # Utility functions
├── app.ts               # Express app setup
└── server.ts            # Server entry point
The project includes comprehensive tests:
- Unit tests: Test individual functions and classes
- Integration tests: Test complete request/response cycles
- Load tests: Test performance under load
Run tests:
npm test
Docker:
# Build the image
docker build -f docker/Dockerfile -t llm-proxy .
# Run the container
docker run -p 3000:3000 -e OPENROUTER_API_KEY=your-key llm-proxy
Docker Compose:
# Start all services
docker-compose -f docker/docker-compose.yml up -d
# Stop all services
docker-compose -f docker/docker-compose.yml down
The service provides monitoring endpoints:
- GET /health - Health check with uptime and version info
- IP-based rate limiting
- Input validation and sanitization
- CORS protection
- Security headers (Helmet)
- No authentication required (stateless design)
- Connection pooling for OpenRouter API
- Efficient WebSocket handling
- Memory-optimized streaming
- Request/response compression
- Caching for model information
- Stateless design for horizontal scaling
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests
- Submit a pull request
MIT License - see LICENSE file for details.
For issues and questions, please open an issue on GitHub.