DeepInfra arrives on Hugging Face as an Inference Provider
DeepInfra is now a supported Inference Provider on the Hugging Face Hub. What does that mean for you? More serverless inference options, plus direct integration on model pages and in the Hugging Face SDKs for Python and JavaScript, so you can plug models into your apps without much fuss.
What the integration brings
DeepInfra is a serverless inference platform that promises one of the most competitive per-token prices on the market. Its catalog exceeds 100 models and covers everything from LLMs to text-to-image, text-to-video and embeddings.
In this initial phase the integration covers conversational and text-generation tasks, including open-weight models like DeepSeek V4, Kimi-K2.6 and GLM-5.1. So what does that look like in practice? On model pages you'll see DeepInfra listed as a compatible provider, and you can select it from widgets and code snippets without setting up any infrastructure.
Modes of use and billing
Hugging Face offers two modes for calling Inference Providers. Which one fits you? It depends on how much control you want and how you want billing handled:
Custom key: you use the provider’s key (for example your DeepInfra API key). Calls go directly to the provider and the provider bills you.
Routed by HF: you use your Hugging Face token. The request is routed to DeepInfra but the charge is applied to your Hugging Face account. There’s no markup from HF; right now HF passes the provider cost through as-is.
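As a minimal sketch of the difference using huggingface_hub's InferenceClient (the environment variable names are illustrative; provider and api_key are the arguments recent huggingface_hub releases accept):

import os
from huggingface_hub import InferenceClient

# Routed by HF: authenticate with your Hugging Face token;
# the call goes through HF's router and is billed to your HF account.
routed = InferenceClient(provider="deepinfra", api_key=os.environ["HF_TOKEN"])

# Custom key: pass your own DeepInfra API key instead;
# DeepInfra bills you directly for these calls.
direct = InferenceClient(provider="deepinfra", api_key=os.environ["DEEPINFRA_API_KEY"])

# Either client exposes the same chat API, e.g.:
resp = routed.chat_completion(
    model="deepseek-ai/DeepSeek-V4-Pro",
    messages=[{"role": "user", "content": "Hello!"}],
)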
In your account you can also:
Configure your own keys for each provider you use.
Order providers by preference so widgets and snippets show your favorites first.
SDKs, examples and model format
The integration is already available through the Hugging Face SDKs: huggingface_hub (>= 1.11.2) for Python and @huggingface/inference for JavaScript. Models are called with a suffix that indicates the provider, for example deepseek-ai/DeepSeek-V4-Pro:deepinfra.
Python example (use your HF_TOKEN and the request will be routed to DeepInfra automatically):
import os
from openai import OpenAI

# The HF router exposes an OpenAI-compatible API, so the standard
# OpenAI client works; authenticate with your Hugging Face token.
client = OpenAI(
    base_url="https://router.huggingface.co/v1",
    api_key=os.environ["HF_TOKEN"],
)

# The :deepinfra suffix pins the request to DeepInfra.
completion = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V4-Pro:deepinfra",
    messages=[
        {
            "role": "user",
            "content": "Write a Python function that returns the nth Fibonacci number using memoization."
        }
    ],
)

print(completion.choices[0].message)
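The response follows the OpenAI chat-completions schema, so the generated text itself lives in completion.choices[0].message.content.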
JavaScript example:
import { OpenAI } from "openai";

// Same router and token as the Python example; only the client changes.
const client = new OpenAI({
  baseURL: "https://router.huggingface.co/v1",
  apiKey: process.env.HF_TOKEN,
});

const chatCompletion = await client.chat.completions.create({
  model: "deepseek-ai/DeepSeek-V4-Pro:deepinfra",
  messages: [
    {
      role: "user",
      content: "Write a Python function that returns the nth Fibonacci number using memoization.",
    },
  ],
});

console.log(chatCompletion.choices[0].message);
If instead you use DeepInfra's direct key, you point at their endpoint and authenticate with your DeepInfra API key; in that case DeepInfra bills you directly.
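As a sketch of that direct route, assuming DeepInfra's OpenAI-compatible base URL (check their docs to confirm it); note that the :deepinfra suffix is HF-router syntax and doesn't apply here:

import os
from openai import OpenAI

# Direct access: DeepInfra's own OpenAI-compatible endpoint.
# The base URL is an assumption; verify it against DeepInfra's docs.
client = OpenAI(
    base_url="https://api.deepinfra.com/v1/openai",
    api_key=os.environ["DEEPINFRA_API_KEY"],
)

# Plain model ID; no :deepinfra suffix outside the HF router.
completion = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V4-Pro",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(completion.choices[0].message.content)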
Integrations and ecosystem
Hugging Face's Inference Providers are integrated into several agent harnesses, for example Pi, OpenCode, Hermes Agents and OpenClaw. What does that mean for you? You can plug models hosted on DeepInfra directly into those agents without writing extra glue code.
You can also check the full list of models supported by DeepInfra and their dedicated documentation to see which tasks are already available and which are coming soon (text-to-image, text-to-video, embeddings, etc.).
Costs, PRO credits and recommendations
Hugging Face PRO users receive 2 USD in inference credits each month, and those credits can be spent across the different providers.
Hugging Face offers a small free quota for registered users, but if you need more capacity or credits it’s worth evaluating the PRO plan.
To control costs I recommend: using your own keys if you prefer direct billing to the provider, ordering providers by preference, and monitoring usage per model.
Good technical practices
If you want maximum transparency in billing and limits, use the provider’s custom key.
If you prefer simplicity and managing everything from Hugging Face, use the routed by HF mode.
Keep in mind the minimum SDK version (huggingface_hub >= 1.11.2) when you automate deployments.
When you test open models, confirm the :deepinfra suffix in the name to pin the request to DeepInfra's implementation (see the sketch below).
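A minimal sketch of pinning a provider versus letting the router choose (without a suffix the router selects a provider automatically; your provider-preference order influences that choice):

import os
from openai import OpenAI

client = OpenAI(
    base_url="https://router.huggingface.co/v1",
    api_key=os.environ["HF_TOKEN"],
)

# With the suffix, the request is pinned to DeepInfra.
pinned = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V4-Pro:deepinfra",
    messages=[{"role": "user", "content": "ping"}],
)

# Without a suffix, the router picks a provider for you.
auto = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V4-Pro",
    messages=[{"role": "user", "content": "ping"}],
)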
The news is good for developers and teams who want serverless inference alternatives with competitive per-token costs. Interested in trying an end-to-end flow? Below is a starting point for a script that compares latency and token usage between DeepInfra and other providers.
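A rough sketch (the second provider name is a placeholder; substitute any provider listed on the model's page, and combine the token counts with each provider's per-token prices to estimate cost):

import os
import time
from openai import OpenAI

client = OpenAI(
    base_url="https://router.huggingface.co/v1",
    api_key=os.environ["HF_TOKEN"],
)

MODEL = "deepseek-ai/DeepSeek-V4-Pro"
PROMPT = [{"role": "user", "content": "Reply with one word."}]

# Compare wall-clock latency and token usage per provider.
for provider in ("deepinfra", "another-provider"):  # second name is a placeholder
    start = time.perf_counter()
    resp = client.chat.completions.create(model=f"{MODEL}:{provider}", messages=PROMPT)
    elapsed = time.perf_counter() - start
    print(f"{provider}: {elapsed:.2f}s, {resp.usage.total_tokens} tokens")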