Hugging Face Spaces and agents.md transform multimedia

Jun 9, 2026 · Keryc Díaz · 3 minutes

Imagine you ask an agent to build a site that shows Paris monuments in 3D from images. You don't open an image generator. You don't touch a 3D reconstruction tool. The agent calls two Hugging Face Spaces and assembles everything: images, reconstructions as Gaussian splats, compression, a viewer and static deployment. Sound like magic? It's block-based engineering, and it's already here.

What `agents.md` does and why it matters

Until now, the hard part wasn't training a good image, video, TTS or 3D model. The real problem was integration: SDKs, weights, GPUs, input formats, polling. What if each model were a documented, easily invocable block? Could an agent just glue them together like npm packages?

That's exactly what agents.md delivers in a Gradio Space: the minimal recipe an agent needs to invoke that service. Running `curl https://huggingface.co/spaces/VAST-AI/TripoSplat/agents.md` returns everything in one go: the schema URL, the call and poll templates, how to upload files and the auth hint. With that, an agent can use the Space end-to-end.
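The same fetch can be done from code. The following is a minimal sketch, assuming that the agents.md URL always follows the pattern shown in the curl command above and that a Bearer token is optional for public Spaces. The helper names `agents_md_url` and `fetch_agents_md` are hypothetical, chosen only for illustration.

```python
import urllib.request
from typing import Optional

HF_SPACES_BASE = "https://huggingface.co/spaces"


def agents_md_url(space_id: str) -> str:
    """Build the agents.md URL for a Space id such as 'VAST-AI/TripoSplat'."""
    owner, _, name = space_id.partition("/")
    if not owner or not name:
        raise ValueError(f"expected 'owner/name', got {space_id!r}")
    return f"{HF_SPACES_BASE}/{owner}/{name}/agents.md"


def fetch_agents_md(space_id: str, token: Optional[str] = None,
                    timeout: float = 30.0) -> str:
    """Download agents.md as text; the Bearer header is sent only if a token is given."""
    request = urllib.request.Request(agents_md_url(space_id))
    if token:
        # Assumption: the same HF_TOKEN used for calls is accepted here.
        request.add_header("Authorization", f"Bearer {token}")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling `fetch_agents_md("VAST-AI/TripoSplat")` performs a network request, so the result depends on what the Space serves at that moment.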

Practical example: the gallery with TripoSplat

The author put an agent to work that chained two Spaces: one to generate images and another to reconstruct 3D from a single view.

Image generation: an image Space (for example ideogram4) produces isolated views on a black background, ready for reconstruction.
3D reconstruction: VAST-AI/TripoSplat takes each image and generates a Gaussian splat in .ply format.

From there the agent did the automatic "glue":

Detected that the outputs were Y-down and rotated them upright.
Auto-framed each monument and cropped according to composition.
Compressed the .ply files to .ksplat (about 3x smaller) for fast browser loads.
Built a viewer with Three.js: scroll to change model, drag to rotate, cinematic transitions.
Deployed everything as a static Space.
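The first two glue steps are plain geometry. Here is a rough sketch of how they might look, not the agent's actual code: it assumes "Y-down" is fixed by a 180 degree rotation about the X axis, and it frames a model by fitting its bounding sphere into the camera's field of view. It handles point positions only. A real Gaussian splat also stores a rotation per Gaussian, which would need the same transform (or the rotation can be applied to the whole scene node in the viewer). The function names and the default field of view are illustrative assumptions.

```python
import math
from typing import Iterable, List, Tuple

Point = Tuple[float, float, float]


def flip_y_down_to_y_up(points: Iterable[Point]) -> List[Point]:
    """Rotate 180 degrees about the X axis: (x, y, z) -> (x, -y, -z)."""
    return [(x, -y, -z) for x, y, z in points]


def frame_camera(points: List[Point], fov_degrees: float = 50.0) -> Tuple[Point, float]:
    """Return (look-at center, camera distance) so the bounding sphere fits the view."""
    if not points:
        raise ValueError("need at least one point")
    # Center of the axis-aligned bounding box.
    mins = [min(p[i] for p in points) for i in range(3)]
    maxs = [max(p[i] for p in points) for i in range(3)]
    center = tuple((lo + hi) / 2 for lo, hi in zip(mins, maxs))
    # Radius of a sphere around that center that contains every point.
    radius = max(math.dist(center, p) for p in points)
    # A sphere of this radius touches the view cone at this distance.
    distance = radius / math.sin(math.radians(fov_degrees) / 2)
    return center, distance
```

Negating both Y and Z keeps the transform a proper rotation, so the model is turned upright rather than mirrored.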

The only human decisions were matters of taste: 'more zoom', 'swap the obelisk for another shape', 'shorten the transition'. The rest was automatic iteration: the agent reacted when a glass pyramid splatted badly or when the reconstruction inferred the back from a single view.

Which endpoints and formats does an agent use?

The pattern is simple and repeatable. An agents.md describes things like:

GET .../gradio_api/info (schema)
POST .../gradio_api/call/v2/{endpoint} with a body like {param_name: value, ...} for calls
GET .../gradio_api/call/{endpoint}/{event_id} to poll for results
POST .../gradio_api/upload -F 'files=@file.ext' to upload files
Auth: Bearer $HF_TOKEN

You don't need a client library or hardcoded integrations. An agent reads the agents.md, inserts its HF_TOKEN, and can orchestrate flows.
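To make the templates concrete, here is a minimal sketch that turns them into request descriptions without sending anything. The paths are copied from the list above. The base URL, endpoint name, parameter names and helper functions are hypothetical placeholders, since the real values come from each Space's agents.md and schema.

```python
import json
from typing import Any, Dict


def auth_headers(token: str) -> Dict[str, str]:
    """Bearer auth header, as hinted by agents.md."""
    return {"Authorization": f"Bearer {token}"}


def build_call_request(base_url: str, endpoint: str,
                       params: Dict[str, Any], token: str) -> Dict[str, Any]:
    """Describe the POST that starts a call; params are sent as a JSON object."""
    return {
        "method": "POST",
        "url": f"{base_url.rstrip('/')}/gradio_api/call/v2/{endpoint}",
        "headers": {**auth_headers(token), "Content-Type": "application/json"},
        "body": json.dumps(params),
    }


def build_poll_request(base_url: str, endpoint: str,
                       event_id: str, token: str) -> Dict[str, Any]:
    """Describe the GET that polls for the result of a started call."""
    return {
        "method": "GET",
        "url": f"{base_url.rstrip('/')}/gradio_api/call/{endpoint}/{event_id}",
        "headers": auth_headers(token),
    }


# Placeholder values only; read the real ones from the Space's agents.md.
call = build_call_request("https://example.invalid", "generate",
                          {"prompt": "Eiffel Tower"}, "hf_xxx")
poll = build_poll_request("https://example.invalid", "generate", "abc123", "hf_xxx")
```

Keeping the request description separate from the code that sends it makes the agent's calls easy to log and replay.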

Technical and product implications

This is not just a neat trick. It follows the "building block economy" logic Mitchell Hashimoto described: the most effective way to build software today is to orchestrate small, well-documented components, not reinvent polished monoliths.

In multimedia this changes several things:

Lower technical barrier: assembling image → 3D → streaming pipelines no longer requires installing and adapting each model.
Faster iteration: an agent can try combinations, detect failures (formats, orientations, artifacts) and fix them without constant human intervention.
Reuse and composition: a Space's outputs become another Space's inputs with minimal friction.

Technically, this pushes architectures toward modular orchestration: agents that interpret descriptions (agents.md), trigger endpoints, handle polling and transformations (rotate, recompress, convert formats), and deploy results.
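One way to picture this modular orchestration is as a list of stages that each read a shared state and add their output to it. The sketch below is illustrative only: the three stages are stand-ins that rename files instead of calling Spaces, and every name in it is hypothetical.

```python
from typing import Callable, Dict, List

State = Dict[str, str]
Stage = Callable[[State], State]


def run_pipeline(stages: List[Stage], initial: State) -> State:
    """Run stages in order; each stage's output is merged into the shared state."""
    state = dict(initial)
    for stage in stages:
        state = {**state, **stage(state)}
    return state


# Stand-in stages: a real pipeline would call a Space in each of these.
def generate_image(state: State) -> State:
    return {"image": f"{state['prompt']}.png"}


def reconstruct_splat(state: State) -> State:
    return {"splat": state["image"].replace(".png", ".ply")}


def compress_splat(state: State) -> State:
    return {"splat": state["splat"].replace(".ply", ".ksplat")}


result = run_pipeline([generate_image, reconstruct_splat, compress_splat],
                      {"prompt": "eiffel"})
```

Because each stage only depends on keys in the state, swapping one Space for another means replacing a single function.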

How to try it yourself

Copy the agents.md from a Space that interests you: `curl https://huggingface.co/spaces/ideogram-ai/ideogram4/agents.md`
Copy the agents.md from TripoSplat: `curl https://huggingface.co/spaces/VAST-AI/TripoSplat/agents.md`
Paste the links into your preferred code agent (Claude Code, etc.), add your HF_TOKEN and ask it to assemble a pipeline: image → TripoSplat → web viewer.

The Space repository contains reproducible scripts that show exactly the calls the agent made. It's a great template to experiment with.

In the end, building multimedia software stops being about setting up infra and becomes orchestration design: define the blocks, their transformations and the rules that connect them.

Next time you see an impressive demo on the web, ask yourself: was it a polished monolith or an orchestra of well-documented blocks? I bet on the latter.

Original source

https://huggingface.co/blog/mishig/spaces-agents-md
