Background and Value
You can use a unified entry point to call multiple model formats (OpenAI / Claude / Gemini), achieving a consistent experience without modifying existing clients. The platform actively validates quotas, records critical logs, and supports proxies and multipliers, helping you operate securely in a controllable and observable manner.
Use Cases
- Want to quickly switch to a unified entry point while continuing to use existing OpenAI / Claude / Gemini clients.
- Need to set multipliers or model wildcards across different channels to balance cost and compatibility.
- Care about call visibility: want to see records of models, token names, channels, latency, token usage, etc.
- Need to configure proxies in restricted networks or use notification capabilities to receive results promptly.
Core Principles
- Compatibility First: Follows the request formats of OpenAI (/v1), Claude (/claude/v1), and Gemini (/gemini/:version). You only need to replace the Base URL and token to make calls.
- Controllable Quotas: Each call validates and pre-deducts quota; insufficient quota results in immediate rejection. Successful calls confirm the deduction, failed calls roll back the pre-deduction, ensuring predictable consumption.
- Flexible Routing: Models support wildcards, channels can set multipliers or proxies. When needed, you can specify routes via token suffixes or specific headers, enabling both automatic distribution and precise control.
- Transparent and Traceable: Logs record model names, token names, channels, request latency (request_time), prompt and completion tokens, etc. Statistics aggregate call counts, quotas, and latency by date, facilitating reconciliation and troubleshooting.
- Special Capabilities Pass-through: Supports official parameter pass-through for o1/o3 ReasoningEffort selection, Claude thinking, and Gemini search and code execution, maintaining consistency with the respective clients.
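The compatibility-first principle above can be sketched as follows. The gateway host and token are placeholders, and the header conventions mirror each vendor's official client; treat the exact paths and header names as assumptions to verify against your deployment.

```python
# Sketch: pointing existing OpenAI / Claude / Gemini clients at one
# unified gateway. Only the Base URL and token change; the request
# bodies keep each vendor's official fields.
# GATEWAY and TOKEN are placeholders, not real endpoints/credentials.

GATEWAY = "https://gateway.example.com"  # hypothetical unified entry point
TOKEN = "sk-your-gateway-token"          # placeholder token

def openai_request(model: str) -> dict:
    """OpenAI-format call routed through the gateway's /v1 path."""
    return {
        "url": f"{GATEWAY}/v1/chat/completions",
        "headers": {"Authorization": f"Bearer {TOKEN}"},
        "json": {"model": model,
                 "messages": [{"role": "user", "content": "hi"}]},
    }

def claude_request(model: str) -> dict:
    """Claude-format call routed through /claude/v1."""
    return {
        "url": f"{GATEWAY}/claude/v1/messages",
        "headers": {"x-api-key": TOKEN, "anthropic-version": "2023-06-01"},
        "json": {"model": model, "max_tokens": 256,
                 "messages": [{"role": "user", "content": "hi"}]},
    }

def gemini_request(model: str, version: str = "v1beta") -> dict:
    """Gemini-format call routed through /gemini/:version."""
    return {
        "url": f"{GATEWAY}/gemini/{version}/models/{model}:generateContent",
        "headers": {"x-goog-api-key": TOKEN},
        "json": {"contents": [{"parts": [{"text": "hi"}]}]},
    }
```

Each dict can be passed directly to an HTTP client (e.g. `requests.post(**openai_request("gpt-4o"))`); the point is that nothing besides the Base URL and token differs from a direct vendor call.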
Key Usage Points
- Before the first call, retrieve the model list to confirm available models, then use a test token to verify multiplier and wildcard configurations.
- When sending requests, keep official fields intact and only replace the Base URL and token. Decide whether to specify a route based on needs; otherwise, use default automatic routing.
- Monitor quota status: insufficient quota results in immediate rejection. Replenish in advance when continuous calls are needed.
- After calls, review logs and aggregated statistics to verify models, tokens, latency, and consumption, identifying anomalies or reconciling accounts.
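The first usage point (fetch the model list, then verify before calling) can be sketched like this. The model-list response shape follows the OpenAI-compatible `/v1/models` convention, which is an assumption here; `send` is a stand-in for whatever HTTP client you use.

```python
# Sketch: confirm a model is actually exposed by the gateway before
# spending quota on a request. The response format assumed below is
# the OpenAI-style model list: {"data": [{"id": "..."}, ...]}.

def available_models(models_response: dict) -> set:
    """Extract model ids from an OpenAI-format model-list response."""
    return {m["id"] for m in models_response.get("data", [])}

def safe_call(model: str, models_response: dict, send) -> dict:
    """Reject locally before the gateway has to, saving a failed call.

    `send` is a hypothetical callable that performs the real request.
    """
    if model not in available_models(models_response):
        raise ValueError(f"model {model!r} is not available on this gateway")
    return send(model)
```

Checking locally first keeps test-token runs cheap: wildcard and multiplier misconfigurations surface as a missing id in the list rather than as rejected (and logged) requests.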
Limitations and Notes
- Only enabled models and routes are served; requests for disabled models, or requests with insufficient quota, are rejected.
- Proxies and multipliers are configured at the channel level; whether they take effect depends on the current route settings.
- Special capabilities require passing official parameters; when corresponding parameters are not provided, default behavior applies.
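Since rejections come from several distinct causes (disabled model, insufficient quota, transient failure), a small classifier on the error response helps decide the next action. The status codes below are assumptions modeled on common OpenAI-compatible gateways; check your gateway's actual error responses before relying on them.

```python
# Sketch: map a rejected call to a suggested operator action.
# Status-code conventions are assumed (429 for quota/rate problems,
# 404 for unknown or disabled models) and should be verified against
# the gateway's real error responses.

QUOTA_HINTS = ("quota", "insufficient")

def classify_rejection(status: int, message: str) -> str:
    msg = message.lower()
    if status == 429 and any(hint in msg for hint in QUOTA_HINTS):
        return "replenish-quota"      # top up before continuing batch calls
    if status == 404:
        return "check-model-enabled"  # model or route may be disabled
    return "inspect-logs"             # use request logs to diagnose
```

Because quota is pre-deducted and rolled back on failure, a "replenish-quota" outcome means the balance is genuinely short, not merely held by in-flight requests.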