#ai


I ran into this quite interesting paper when exploring embeddings of time series.

In the past, the manifold hypothesis has always worked quite well for me on physical-world data. You just take a model, compress your data into a latent space, do something there, decode it, and damn, it works so well. The latent space is so magical.
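A minimal sketch of that workflow, with an untrained toy autoencoder and arbitrary sizes (none of it from the paper): compress into the latent space, edit there, decode back.

```python
# A minimal sketch, not any specific model: compress to a latent space,
# edit there, decode back. Sizes are arbitrary; a real use case would
# train the autoencoder on the data first.
import torch
import torch.nn as nn

input_dim, latent_dim = 256, 16  # hypothetical sizes

encoder = nn.Sequential(nn.Linear(input_dim, 64), nn.ReLU(), nn.Linear(64, latent_dim))
decoder = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(), nn.Linear(64, input_dim))

x_a, x_b = torch.randn(1, input_dim), torch.randn(1, input_dim)  # two "signals"

with torch.no_grad():
    z_a, z_b = encoder(x_a), encoder(x_b)
    z_mid = 0.5 * (z_a + z_b)   # "do something in the latent space":
                                # here, interpolate between two samples
    x_mid = decoder(z_mid)      # decode back to the data space

print(x_mid.shape)              # torch.Size([1, 256])
```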

To me, choosing the hyperparameters of the latent space has always been some kind of battle between the curse of dimensionality and the Whitney embedding theorem.
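To spell out the two sides of that battle (standard statements, nothing specific to the linked paper): Whitney puts a ceiling on how many latent dimensions you could ever need, while the curse of dimensionality punishes every extra one.

```latex
% Whitney (strong form): any smooth d-dimensional manifold embeds in R^{2d},
% so a latent dimension m >= 2d is, in principle, enough to preserve the structure.
M^{d} \hookrightarrow \mathbb{R}^{2d}
% Meanwhile the sample size needed for nonparametric estimation at accuracy
% \varepsilon grows roughly exponentially in m, pushing m as low as possible.
n \sim \varepsilon^{-\Theta(m)}
```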

Then came the language models. The ancient word2vec was already amazing. It raises the question of why embeddings work so unbelievably well in language models, and it bugs me a lot. If you think about it, embeddings have worked well regardless of the model, which hints that language embeddings might be universal. There is the linear representation hypothesis, but it feels incomplete because it misses the global structure. This paper provides a bit more clarity. The authors rely on a lot of assumptions, but the proposal is interesting: the cosine similarity we use is likely a proxy that depends on distances on the manifold of the continuous features operating behind the scenes.
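A toy illustration of that last point (my own sketch, not the paper's construction): when unit-norm embeddings lie along a smooth curve on the sphere, cosine similarity is a monotone function of arc length, so ranking neighbours by cosine similarity is the same as ranking them by distance along the manifold.

```python
# Toy check: on a great-circle arc, cosine similarity and geodesic
# (arc-length) distance induce the same ranking of neighbours.
import numpy as np

rng = np.random.default_rng(0)
D = 128  # ambient embedding dimension (arbitrary)

# Pick a random 2-D plane in R^D and sample unit vectors along a great-circle arc.
basis, _ = np.linalg.qr(rng.normal(size=(D, 2)))
angles = np.sort(rng.uniform(0.0, np.pi / 2, size=50))  # arc-length parameter
points = np.cos(angles)[:, None] * basis[:, 0] + np.sin(angles)[:, None] * basis[:, 1]

anchor = points[0]
cos_sim = points @ anchor      # all vectors already have unit norm
arc_dist = angles - angles[0]  # geodesic distance to the anchor along the arc

# Most similar by cosine == nearest along the manifold, all the way down the ranking.
print(np.array_equal(np.argsort(-cos_sim), np.argsort(arc_dist)))  # True
```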



https://arxiv.org/abs/2505.18235v1