Predictable GPU inference spend — without surprise bills.
Darktree provides OpenAI‑compatible inference endpoints backed by prepaid compute credits.
Credits are consumed by token usage (prompt_tokens + completion_tokens) as returned in API responses.
Daily caps provide budget control, and an append-only audit ledger supports reconciliation.
- /v1 endpoints
- Token usage in every response
- Daily caps (HTTP 429 on limit)
- API key auth
- Append-only usage ledger
1) Buy prepaid credits
Purchase a bundle via Stripe. Credits fund token-metered inference and lock in your spend up front.
2) Send requests
Use OpenAI‑compatible endpoints with standard headers. Most clients work with minimal changes.
3) Cap, meter, audit
Token usage is logged per request. Daily caps protect against runaway usage; the ledger supports reconciliation.
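The `usage` object in each response is what metering reads. The payload below is a hand-written illustration of the OpenAI-style response shape, not a captured Darktree response:

```python
# Illustrative OpenAI-style response body (hand-written example,
# not a captured Darktree response). Billing reads the `usage` object.
response = {
    "choices": [{"message": {"role": "assistant", "content": "Hi!"}}],
    "usage": {
        "prompt_tokens": 12,
        "completion_tokens": 20,
        "total_tokens": 32,
    },
}

usage = response["usage"]
# prompt_tokens + completion_tokens is the billed amount.
billed_tokens = usage["prompt_tokens"] + usage["completion_tokens"]
print(billed_tokens)  # 32
```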
What you get
- Cost certainty: prepaid bundles + customer‑level daily caps
- Auditability: append‑only per‑request usage ledger
- Compatibility: OpenAI‑style endpoints and usage fields
- Support: direct operator help during onboarding
What credits are (and aren’t)
- Credits are a prepaid balance consumed by token usage
- Credits are not GPU hours, server rentals, or reserved hardware
- Token usage returned in API responses is the billing source of truth
Larger models and heavier workloads may consume credits faster. The API’s token counts remain authoritative.
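Using the $0.75 per 1,000 tokens figure from the plan table below, a minimal sketch of how a request's token usage maps to a credit deduction (the helper name is ours, not part of the API):

```python
RATE_PER_1K_TOKENS = 0.75  # dollars per 1,000 tokens (from the plan table)

def credit_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Dollar cost of one request at the flat per-token rate."""
    return (prompt_tokens + completion_tokens) * RATE_PER_1K_TOKENS / 1000

# An ~800-token request (the table's example request size) costs $0.60:
print(credit_cost(600, 200))  # 0.6
```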
Plans & Credits
Prepaid bundles fund token‑metered inference. Each plan includes a conservative daily cap to prevent runaway spend. Caps reset daily and can be raised or lowered on request.
| | Solo | Team | Scale |
|---|---|---|---|
| Bundle price | $50 | $150 | $300 |
| Tokens included | ~66,667 | ~200,000 | ~400,000 |
| Default daily cap | 2,000 tokens/day | 7,000 tokens/day | 15,000 tokens/day |
| ~Requests/day (assumes ~800 tokens/request) | ~2–3 | ~8–10 | ~18–20 |
| Max spend/day at cap ($0.75 / 1k tokens) | $1.50/day | $5.25/day | $11.25/day |
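Every figure in the table follows from two inputs, the bundle price (or daily cap) and the $0.75 per 1k rate; a quick check of the arithmetic:

```python
RATE = 0.75  # dollars per 1,000 tokens

def tokens_for(bundle_price: float) -> int:
    """Tokens included in a prepaid bundle at the flat rate."""
    return round(bundle_price / RATE * 1000)

def max_spend_per_day(cap_tokens: int) -> float:
    """Hard-stop daily spend implied by the token cap."""
    return cap_tokens * RATE / 1000

print(tokens_for(50))            # 66667  (Solo)
print(max_spend_per_day(15000))  # 11.25  (Scale)
```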
Token totals assume $0.75 per 1,000 tokens. Daily caps reset at 00:00 UTC and act as a hard stop
(requests return HTTP 429 when exceeded) until the next reset. Caps are set per customer by Darktree and can be adjusted on request.
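Because caps reset at 00:00 UTC, a client that receives HTTP 429 can compute how long to back off before retrying; a minimal sketch (the function name is ours, not part of the API):

```python
from datetime import datetime, timedelta, timezone

def seconds_until_cap_reset(now: datetime) -> float:
    """Seconds until the next 00:00 UTC, when daily caps reset."""
    next_reset = (now + timedelta(days=1)).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    return (next_reset - now).total_seconds()

# e.g. a 429 received at 22:00 UTC means waiting two hours:
example_now = datetime(2024, 1, 1, 22, 0, tzinfo=timezone.utc)
print(seconds_until_cap_reset(example_now))  # 7200.0
```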
Latency note: Premium models are typically higher‑latency than Standard models.
In steady state, qwen25-14b-awq is commonly ~100 tokens/sec and qwen25-32b-awq ~45 tokens/sec
(typical medians; depends on prompt length, max_tokens, concurrency, and warm vs cold starts).
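Those throughput medians give a rough generation-time estimate; this sketch covers steady state only and ignores queueing, prompt processing, and cold starts:

```python
# Typical median throughputs from the latency note (tokens/sec, steady state).
THROUGHPUT_TPS = {"qwen25-14b-awq": 100, "qwen25-32b-awq": 45}

def est_generation_seconds(model: str, completion_tokens: int) -> float:
    """Rough time to generate `completion_tokens`, ignoring queueing,
    prompt processing, and cold starts."""
    return completion_tokens / THROUGHPUT_TPS[model]

print(est_generation_seconds("qwen25-14b-awq", 500))  # 5.0
print(est_generation_seconds("qwen25-32b-awq", 450))  # 10.0
```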
Budgeting Guide · Usage & Billing PDF
Note: credits are prepaid and non‑refundable. Usage is measured in tokens, not time. Token usage returned by API responses is the authoritative record for credit deduction.