#data
In physics, people say that more is different. In the data world, more is very different. I'm no expert in big data, but I only ran into the scaling problem when I started working for corporations.
I like this line from the author:
> data sizes increase much faster than compute sizes.
In deep learning, many models follow a scaling law relating performance to dataset size. More data does bring better performance, but the gains shrink quickly: each extra chunk of data buys less improvement than the one before. Businesses don't need a perfect model, and computation costs money. At some point, we simply have to cut the dataset, even if we have all the data in the world.
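To make the diminishing returns concrete, here is a minimal Python sketch. It assumes the loss follows a power law of the form `loss(D) = E + A / D**ALPHA`, a shape commonly used in scaling-law work. The constants `E`, `A`, `ALPHA` and the threshold `min_gain` are hypothetical numbers picked for illustration, not fitted to any real model. Under this assumption, each doubling of the data buys only `2**-ALPHA` (about 81% here) of the improvement the previous doubling bought.

```python
# Assumed power-law scaling curve (hypothetical constants, not fitted to any model):
#   loss(D) = E + A / D**ALPHA
# E is an irreducible loss floor; A and ALPHA set how fast loss falls with data.
E = 1.7
A = 30.0
ALPHA = 0.3


def loss(num_examples: float) -> float:
    """Loss predicted by the assumed power law for a dataset of num_examples."""
    return E + A / num_examples ** ALPHA


def gain_from_doubling(num_examples: float) -> float:
    """How much the predicted loss drops if we double the dataset."""
    return loss(num_examples) - loss(2 * num_examples)


def smallest_worthwhile_size(start: float, min_gain: float) -> float:
    """Keep doubling the dataset until one more doubling would improve the
    loss by less than min_gain, then return the current size."""
    if min_gain <= 0:
        # With ALPHA > 0 the gain shrinks toward zero but never reaches it,
        # so a non-positive threshold would never stop the loop.
        raise ValueError("min_gain must be positive")
    size = start
    while gain_from_doubling(size) >= min_gain:
        size *= 2
    return size


if __name__ == "__main__":
    size = 1e6
    for _ in range(5):
        print(f"{size:>14,.0f} examples: loss={loss(size):.4f}, "
              f"gain from doubling={gain_from_doubling(size):.4f}")
        size *= 2
    print(f"stop at {smallest_worthwhile_size(1e6, min_gain=0.01):,.0f} examples")
```

Here `min_gain` stands in for the business judgment: once another doubling of the data, and of the compute to train on it, improves the loss by less than it is worth to us, we stop collecting.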
So... data hoarding is probably fine, but our models might not need all of it.
https://motherduck.com/blog/big-data-is-dead/