Hugging Face announced that Scaleway is now an Inference Provider supported on the Hub, an integration that lets you run open and third‑party models directly from model pages and the official SDKs. The official post was published on September 19, 2025 and explains how this partnership expands options for serverless inference with both open-weight and commercial models. (huggingface.co)
Why does this matter for you?
Curious to try large models without spinning up your own infrastructure? Now you can pick Scaleway as an inference destination from the Hugging Face interface or via the official client libraries in Python and JavaScript. That simplifies the flow: you start on the Hub, pick the provider, and you’re ready, without rewriting much of your code. (huggingface.co)
Also, Scaleway brings a Europe‑focused option: datacenters in Paris that help ensure data sovereignty and lower latency for European users. If your app needs data residency or you want faster responses, that geographic proximity matters. (scaleway.com)
How it works in practice
On the Hub you can configure your own API keys per provider or let Hugging Face route requests for you using your HF token. You have two modes:
- Use your own provider key (calls go directly to the provider and they bill you).
- Route through Hugging Face with your token, in which case charges show up on your HF account. (huggingface.co)
From the developer side, Hugging Face includes examples with huggingface_hub in Python and @huggingface/inference in JS showing how to set provider="scaleway" or pass your api_key. With a few tweaks to the client you can swap backends without rewriting your app logic. (huggingface.co)
Short example (conceptual)
import os
from huggingface_hub import InferenceClient

client = InferenceClient(provider="scaleway", api_key=os.environ["HF_TOKEN"])
This kind of call lets you use models like gpt-oss, Qwen3, DeepSeek R1 or Gemma 3 from the Hub and experiment with them without deploying local infrastructure. (huggingface.co)
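For instance, a small helper along these lines could wrap a one-shot chat completion (the model id is just an example; any Scaleway-served model from the Hub would work):

```python
import os

from huggingface_hub import InferenceClient


def ask(prompt: str, model: str = "Qwen/Qwen3-32B") -> str:
    """One-shot chat completion routed to Scaleway. The model id is illustrative."""
    client = InferenceClient(provider="scaleway", api_key=os.environ["HF_TOKEN"])
    out = client.chat_completion(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        max_tokens=256,
    )
    return out.choices[0].message.content


# Only call the endpoint when a real token is available.
if __name__ == "__main__" and os.environ.get("HF_TOKEN"):
    print(ask("Summarize Retrieval Augmented Generation in one sentence."))
```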
Pricing, performance and capabilities
Scaleway offers a pay‑per‑token model with competitive per‑million‑token rates, starting at 0.20 euros per million tokens for some models, and advertises an initial free tier of 1,000,000 tokens for new customers. Those numbers make testing and prototyping affordable before you scale. (scaleway.com)
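To sanity-check a budget, a tiny back-of-envelope helper is enough; the defaults below use the rate and free tier quoted above, but treat them as example figures to override with current pricing:

```python
def estimate_cost_eur(tokens: int,
                      rate_per_million: float = 0.20,
                      free_tokens: int = 1_000_000) -> float:
    """Rough serverless inference cost in euros, after the free tier."""
    billable = max(0, tokens - free_tokens)
    return billable / 1_000_000 * rate_per_million


# 5M tokens at 0.20 EUR per million, with 1M free -> 4M billable tokens
print(estimate_cost_eur(5_000_000))  # 0.8
```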
On latency and interactive experience, Scaleway announces sub‑200 ms times for the first token on their serverless endpoints, which helps for chatbots, assistants and flows that need immediate responses. They also support multimodal generation and structured outputs (JSON), plus compatibility with techniques like RAG (Retrieval Augmented Generation). (scaleway.com)
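Claims like sub‑200 ms to first token are easy to check yourself with streaming. A sketch of one way to measure it (model id illustrative; the call only runs when HF_TOKEN is set):

```python
import os
import time

from huggingface_hub import InferenceClient


def time_to_first_token(model: str, prompt: str) -> float:
    """Seconds until the first streamed chunk arrives from the endpoint."""
    client = InferenceClient(provider="scaleway", api_key=os.environ["HF_TOKEN"])
    start = time.perf_counter()
    stream = client.chat_completion(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        max_tokens=32,
        stream=True,
    )
    next(iter(stream))  # block until the first chunk comes back
    return time.perf_counter() - start


if __name__ == "__main__" and os.environ.get("HF_TOKEN"):
    ttft = time_to_first_token("google/gemma-3-27b-it", "Hello!")
    print(f"TTFT: {ttft * 1000:.0f} ms")
```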
Billing, credits and limits
If you use your own Scaleway key you pay Scaleway directly. If you route through Hugging Face, charges appear on your HF account with no extra fee today. Hugging Face PRO also offers monthly inference credits (for example, $2 in Inference credit for PRO users) that you can use on supported providers. Keep in mind pricing and credits can change; check the official pages before a production deployment. (huggingface.co)
What this means for small projects and companies
For a founder or small team this lowers the barrier to entry: you can experiment with state‑of‑the‑art models without hiring a dedicated GPU or managing clusters. For companies, it becomes another option in a multi‑cloud or digital‑sovereignty strategy—especially if your user base is in Europe. (scaleway.com)
Limitations and next steps
It’s not magic: you still need to evaluate cost per token, privacy terms and how the model behaves on your domain. Test, measure latency and compare outputs across providers before committing. Hugging Face asks the community for feedback to improve the experience, and the list of supported providers will keep growing. (huggingface.co)
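One way to run that comparison is to send the same prompt through several providers and diff the answers. A sketch under the assumption that you have routed access to each provider (the provider and model ids are examples; adjust to what you can actually use):

```python
import os

from huggingface_hub import InferenceClient

# Provider ids are examples; extend with any providers you have access to.
PROVIDERS = ["scaleway", "hf-inference"]


def compare(model: str, prompt: str) -> dict[str, str]:
    """Collect one answer per provider so you can eyeball the differences."""
    answers: dict[str, str] = {}
    for provider in PROVIDERS:
        client = InferenceClient(provider=provider, api_key=os.environ["HF_TOKEN"])
        out = client.chat_completion(
            model=model,
            messages=[{"role": "user", "content": prompt}],
            max_tokens=128,
        )
        answers[provider] = out.choices[0].message.content
    return answers


if __name__ == "__main__" and os.environ.get("HF_TOKEN"):
    for name, text in compare("Qwen/Qwen3-32B", "Define RAG in one line.").items():
        print(f"--- {name} ---\n{text}\n")
```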
Final reflection
This integration is practical: it reduces friction between discovering a model and putting it into production, and it adds a European alternative focused on latency and data sovereignty. If you’re building prototypes, it’s worth trying Scaleway from the Hub and comparing results and costs with your current options. Isn’t it better to experiment first and pay later than to design full infrastructure from scratch?
