DiScoFormer: single transformer for density and score

DiScoFormer proposes a simple but powerful idea: a single transformer that, given a set of points, simultaneously estimates the density of the underlying distribution and its score (the gradient of the log density). Why does this matter? Because the score tells you in which direction to move a point to reach more probable regions, and it appears in generative models, Bayesian sampling, and scientific simulations.
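To make "the score is the direction toward more probable regions" concrete, here is a minimal sketch on a 1-D Gaussian, where the score has a closed form. This is only an illustration of the definition, not DiScoFormer's method; the function names (`log_density`, `score`, `ascend`) and the step size are assumptions chosen for the example.

```python
import math

def log_density(x, mu=0.0, sigma=1.0):
    """Log density of a 1-D Gaussian N(mu, sigma^2)."""
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))

def score(x, mu=0.0, sigma=1.0):
    """Score = d/dx log p(x); for a Gaussian this is -(x - mu) / sigma^2."""
    return -(x - mu) / sigma ** 2

def ascend(x, step=0.1, n_steps=50):
    """Move x in small steps along the score (gradient ascent on log p)."""
    for _ in range(n_steps):
        x = x + step * score(x)
    return x

x0 = 3.0          # start in a low-probability region
x1 = ascend(x0)   # follow the score toward the mode at 0
```

Starting far from the mode, each step along the score raises the log density, so `x1` ends up much closer to 0 than `x0`. Samplers such as Langevin dynamics use the same direction but add noise at every step, so that points spread over the distribution instead of collapsing onto the mode.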

What problem DiScoFormer solves

Many problems in machine learning and science boil down to recovering the distribution that generated a sample of data. Traditionally there are two families of solutions:

KDE (kernel density estimation): it needs no training and works on any distribution, but its accuracy degrades quickly as dimensionality grows.
Score models trained with neural networks: they work in high dimensions, but they must be trained from scratch for every new distribution.
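For reference, the first family can be sketched in a few lines. The example below is a 1-D Gaussian KDE that returns both a density estimate and the corresponding score directly from the samples, with no training step. The bandwidth `h`, the sample values, and the function names are assumptions made for illustration, and the sketch does not address the high-dimensional case where KDE struggles.

```python
import math

def kde_density(x, samples, h):
    """Gaussian KDE: average of Gaussian bumps of width h centered on each sample."""
    norm = 1.0 / (len(samples) * h * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples)

def kde_score(x, samples, h):
    """Score of the KDE estimate: d/dx log p_hat(x).

    Equals a kernel-weighted average of (s - x) / h^2 over the samples.
    Assumes x is close enough to the data that the weights do not all underflow to 0.
    """
    weights = [math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples]
    total = sum(weights)
    return sum(w * (s - x) for w, s in zip(weights, samples)) / (h ** 2 * total)

samples = [-1.2, -0.4, 0.0, 0.3, 0.9, 1.5]  # hypothetical data
h = 0.5                                      # hypothetical bandwidth

p_hat = kde_density(0.2, samples, h)
s_hat = kde_score(0.2, samples, h)
```

Both quantities come from the same set of points, which is the same input a set-based model would consume. The difference is that KDE has nothing to learn and nothing to reuse across distributions, and its estimate is only as good as the bandwidth choice and the sample coverage allow.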
