Machine learning and other gibberish
See also: https://sharing.leima.is
Archives: https://datumorphism.leima.is/amneumarkt/
#data
In physics, people claim that more is different. In the data world, more is very different. I'm no expert in big data; I only came to appreciate the scaling problem when I started working in the corporate world.
I like the following line from the author:
> data sizes increase much faster than compute sizes.
In deep learning, many models follow a scaling law relating performance to dataset size. More data does bring better performance, but the gains slow down dramatically. A business doesn't need a perfect model, and computation costs money. At some point we simply have to cap the dataset, even if we have all the data in the world.
So..., data hoarding is probably fine, but our models might not need that much data.
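As a toy illustration of those diminishing returns (the power-law form is the usual assumption in the scaling-law literature, but the constants below are made up):

```python
# Toy power-law scaling curve: error ~ a * N^(-b).
# The constants a and b are made up for illustration; real exponents
# depend on the task, the model family, and the training setup.
a, b = 5.0, 0.1

for n in [10**6, 10**7, 10**8, 10**9]:
    error = a * n ** (-b)
    print(f"{n:>13,d} samples -> error {error:.3f}")
```

With these numbers, every 10x increase in data only shaves about 20% off the error, which is exactly the kind of curve that makes cutting the dataset reasonable at some point.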
https://motherduck.com/blog/big-data-is-dead/
#ml
google-research/tuning_playbook: A playbook for systematically maximizing the performance of deep learning models.
https://github.com/google-research/tuning_playbook
#ml
Haha, IceCube.
IceCube - Neutrinos in Deep Ice | Kaggle
https://www.kaggle.com/competitions/icecube-neutrinos-in-deep-ice?utm_medium=email&utm_source=gamma&utm_campaign=comp-icecube-2023
#data
Just got my ticket.
I have been reviewing proposals for PyData this year. I saw some really cool ones, so I finally decided to attend the conference.
https://2023.pycon.de/blog/pyconde-pydata-berlin-tickets/
#misc
LastPass was hacked, and the attacker has already obtained a copy of customer vault data.
https://blog.lastpass.com/2022/12/notice-of-recent-security-incident/
#ml
GPT writing papers... Both fancy and scary.
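If you want to poke at it yourself, a minimal sketch with the transformers text-generation pipeline might look like this (the model id comes from the Hugging Face link below; it is a 2.7B-parameter GPT-style model, so expect a large download and a serious GPU):

```python
from transformers import pipeline

# Model id from the Hugging Face page linked in this post; assuming it works
# with the standard text-generation pipeline (it is a GPT-style model).
generator = pipeline("text-generation", model="stanford-crfm/pubmedgpt")

prompt = "Neuroplasticity is"
print(generator(prompt, max_new_tokens=50)[0]["generated_text"])
```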
https://huggingface.co/stanford-crfm/pubmedgpt?text=Neuroplasticity
#data
https://evidence.dev/
I like the idea. The last dashboarding tool I used at work was Streamlit. Streamlit is lightweight and fast, but it requires Python code and a Python server.
Evidence is mostly markdown and SQL. For many lightweight dashboarding tasks, this is just sweet.
Evidence is built on Node. I could run a server for live updates, but I can also build a static website by running npm run build.
Played with it a bit. Nothing to complain about at this point.
#visualization
Visualizations of energy consumption and prices in Germany. Given the low temperatures at the moment, it may be interesting to watch them evolve.
https://www.zeit.de/wirtschaft/energiemonitor-deutschland-gaspreis-spritpreis-energieversorgung
#fun
Denmark...
I thought French was complicated; now we all know Danish leads the race.
https://www.reddit.com/r/europe/comments/zo258s/how_to_say_number_92_in_european_countries/
In his MinT paper, Hyndman said he confused these two quantities in his previous paper. 😂
MinT is a simple method for making forecasts over a hierarchical structure coherent. Here, coherent means that the lower-level forecasts sum to the higher-level forecasts.
For example, suppose our time series have a structure like sales of coca cola + sales of spirit = sales of beverages. If this relation also holds for our forecasts, the forecasts are coherent.
This may sound trivial, but the problem is in fact hard. There are naive approaches, such as forecasting only the lower levels (coca cola, spirit) and using their sum as the higher level (sales of beverages), but these are usually too naive to be effective.
MinT is a reconciliation method that combines the higher-level and lower-level forecasts to find an optimal combination/reconciliation.
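A tiny numpy sketch of the idea on the beverage example (numbers made up; W = I below is the OLS special case, whereas MinT itself plugs in the estimated covariance of the base forecast errors):

```python
import numpy as np

# Summing matrix S for the toy hierarchy, rows ordered as
# (beverages, coca cola, spirit): beverages = coca cola + spirit.
S = np.array([[1, 1],
              [1, 0],
              [0, 1]])

# Incoherent base forecasts: 100 != 55 + 40.
y_hat = np.array([100.0, 55.0, 40.0])

# Reconciled forecasts: y_tilde = S (S' W^-1 S)^-1 S' W^-1 y_hat.
# W = I gives OLS reconciliation; MinT uses the base-forecast error covariance.
W_inv = np.eye(3)
G = np.linalg.inv(S.T @ W_inv @ S) @ S.T @ W_inv
y_tilde = S @ G @ y_hat

print(y_tilde)                               # [98.33, 56.67, 41.67]
print(y_tilde[0], y_tilde[1] + y_tilde[2])   # coherent: totals now match
```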
https://robjhyndman.com/papers/MinT.pdf
#ml
You spent 10k euros on GPUs, then realized the statistical baseline model is better. 🤣
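For reference, the classical side of that comparison is cheap to run. A minimal sketch with statsforecast (API from my memory of the 1.x docs; the CSV name is hypothetical, using the usual unique_id/ds/y long format):

```python
import pandas as pd
from statsforecast import StatsForecast
from statsforecast.models import AutoARIMA

# Hypothetical long-format data with the columns statsforecast expects:
# unique_id (series id), ds (timestamp), y (value).
df = pd.read_csv("monthly_sales.csv", parse_dates=["ds"])

sf = StatsForecast(models=[AutoARIMA(season_length=12)], freq="M", n_jobs=-1)
forecasts = sf.forecast(df=df, h=12)  # 12 steps ahead for every series
print(forecasts.head())
```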
https://github.com/Nixtla/statsforecast/tree/main/experiments/m3