Machine learning and other gibberish
See also: https://sharing.leima.is
Archives: https://datumorphism.leima.is/amneumarkt/
#dl
Introducing more symmetries in attention
https://github.com/NVIDIA/torch-harmonics
https://neurips.cc/virtual/2025/loc/san-diego/poster/117783
#dl
Park, Chanwook, Sourav Saha, Jiachen Guo, Hantao Zhang, Xiaoyu Xie, Miguel A. Bessa, Dong Qian, et al. 2025. “Unifying Machine Learning and Interpolation Theory via Interpolating Neural Networks.” Nature Communications 16 (1): 1–12.
https://www.nature.com/articles/s41467-025-63790-8
#dl
A few cool ideas in this model.
Introducing Gemma 3n: The developer guide - Google Developers Blog
https://developers.googleblog.com/en/introducing-gemma-3n-developer-guide/
#dl
So TensorFlow and JAX support are deprecated in the Hugging Face transformers package.
https://github.com/huggingface/transformers/pull/38758
#dl
PyTorch Native Architecture Optimization: torchao | PyTorch
https://pytorch.org/blog/pytorch-native-architecture-optimization/
#dl
There is a new toolkit called SCALE that compiles CUDA code to run on AMD GPUs.
https://docs.scale-lang.com/manual/how-to-use/
I don't know who is more pissed off, NVIDIA or AMD.
#dl
This repo is really nice.
yuanchenyang/smalldiffusion: Simple and readable code for training and sampling from diffusion models
https://github.com/yuanchenyang/smalldiffusion
Google & USC benchmarked a prompt-based forecasting method, and the results are amazing.
Cao D, Jia F, Arik SO, Pfister T, Zheng Y, Ye W, et al. TEMPO: Prompt-based Generative Pre-trained Transformer for time series forecasting. arXiv [cs.LG]. 2023. Available: http://arxiv.org/abs/2310.04948
#dl
I am experimenting with PyTorch 2.0 and looking for ways to cut training time in Lightning. The following article gives a very good introduction.
https://lightning.ai/pages/community/tutorial/how-to-speed-up-pytorch-model-training/
#dl
https://github.com/Lightning-AI/lightning/releases/tag/2.0.0
You can now compile a LightningModule with torch.compile (PyTorch 2.0).
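import torch
import lightning as L
from lightning.pytorch.demos.boring_classes import BoringModel

# BoringModel is Lightning's demo LightningModule; it stands in for your own LitModel
model = BoringModel()
# This will compile forward and {training,validation,test,predict}_step
compiled_model = torch.compile(model)
trainer = L.Trainer(max_epochs=1)  # keep the demo run short
trainer.fit(compiled_model)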