Am Neumarkt 😱

Machine learning and other gibberish
See also: https://sharing.leima.is
Archives: https://datumorphism.leima.is/amneumarkt/

12:23 · May 21, 2022 · Sat

#ml

Parsimony with cognitive resource limitations 🤔

https://www.nature.com/articles/s41586-022-04743-9

Nature

People construct simplified mental representations to plan

Nature - Strategically perceiving and conceiving problems facilitates the effective use of limited cognitive resources.

06:31 · May 21, 2022 · Sat

#github

I have been following an issue on math support for GitHub Markdown (github/markup/issues/274).

One thousand years later ...

Math support in Markdown | The GitHub Blog
https://github.blog/2022-05-19-math-support-in-markdown/

The GitHub Blog

Math support in Markdown

We are pleased to announce that math expressions can now be rendered natively in Markdown on GitHub

github

02:19 · May 21, 2022 · Sat

#misc

Quote from this article:

"It doesn’t transmit from person to person as readily, and because it is related to the smallpox virus, there are already treatments and vaccines on hand for curbing its spread. So while scientists are concerned, because any new viral behaviour is worrying — they are not panicked."

https://www.nature.com/articles/d41586-022-01421-8

Nature

Monkeypox goes global: why scientists are on alert

Nature - Scientists are trying to understand why the virus, a less-lethal relative of smallpox, has cropped up in so many populations around the world.

misc

16:27 · May 18, 2022 · Wed

#ml

Finally... We can now utilize the real power of M1 chips.

Introducing Accelerated PyTorch Training on Mac | PyTorch
https://pytorch.org/blog/introducing-accelerated-pytorch-training-on-mac/

I have been following this issue: https://github.com/pytorch/pytorch/issues/47702#issuecomment-1130162835
There were even some fights. 😂
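
A minimal sketch of opting into the new backend, assuming a PyTorch build that ships MPS support (the blog post points to v1.12 and the nightlies). `pick_device` is a hypothetical helper name, and the guarded import is only there so the snippet falls back to CPU when PyTorch or the MPS backend is missing.

```python
def pick_device() -> str:
    """Return "mps" when the Apple-silicon backend is usable, else "cpu"."""
    try:
        import torch
    except ImportError:
        return "cpu"  # PyTorch is not installed, so there is nothing to accelerate
    mps = getattr(torch.backends, "mps", None)  # attribute is absent in older releases
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


device = pick_device()
print(device)
```

With PyTorch installed, `model.to(device)` and `tensor.to(device)` then move the work onto the selected device.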

07:49 · May 15, 2022 · Sun

Very detailed and in-depth.
https://zhenzhongxu.com/the-four-innovation-phases-of-netflixs-trillions-scale-real-time-data-infrastructure-2370938d7f01

Medium

The Four Innovation Phases of Netflix’s Trillions Scale Real-time Data Infrastructure

The blog post will share the four phases of Real-time Data Infrastructure’s iterative journey in Netflix (2015-2021). For each phase, we will go over the evolving business motivations, the team’s unique challenges, the strategy bets, and the use case patterns…

07:13 · May 15, 2022 · Sun

#visualization

https://anvaka.github.io/map-of-reddit/?x=175273.66777410256&y=370576.01346498774&z=217281.8913341138

anvaka.github.io

Map of Reddit

This website shows a map of reddit. Each dot is a subreddit. Two dots within the same cluster are usually close to each other if multiple users frequently leave comments on both subreddits

visualization

11:46 · May 14, 2022 · Sat

#python

This post is a retro on how I learned Python.

Disclaimer: I cannot claim to be a master of Python. This post is a retrospective on how I learned Python in different stages.

I started using Python back in 2012. Before this, I was mostly a Matlab/C user.

Python is easy to get started with, yet hard to master. People coming from other languages can easily make it work but will write some "disgusting" Python code. This is because Python people talk about being "pythonic" all the time. Instead of being an actual style guide, it is rather a philosophy of style.
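
To make the contrast concrete, here is a small made-up example (the variable names and data are only for illustration). Both versions produce the same result; the first reads like translated C, the second is what most reviewers would call "pythonic".

```python
values = [3, 1, 4, 1, 5]

# Works, but reads like translated C: manual index bookkeeping.
squares_of_odds_c_style = []
i = 0
while i < len(values):
    if values[i] % 2 == 1:
        squares_of_odds_c_style.append(values[i] * values[i])
    i += 1

# Iterate over the items directly and let a comprehension build the list.
squares_of_odds = [v * v for v in values if v % 2 == 1]

print(squares_of_odds_c_style == squares_of_odds)
```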

When we get started, we are most likely not interested in [PEP8](https://peps.python.org/pep-0008/) and [PEP257](https://peps.python.org/pep-0257/). Instead, we focus on making things work. After some lectures at university (or from whatever source), we start to get some sense of style. Following these lectures, we will probably write code and use Python in some projects. Then we begin to realize that Python is strange and sometimes doesn't even make sense, so we start learning about the philosophy behind it. At some point, we get some peer reviews and probably fight with each other over the philosophies we have accumulated throughout the years.

The attached drawing (in comments) somehow captures this path that I went through. It is not a monotonic path of any sort. This path is most likely to be permutation invariant and cyclic. But the bottom line is that mastering Python requires a lot of struggle, fights, and relearning. And one of the most effective methods is peer review, just as in any other learning task in our life.

Peer review makes us think, and it is very important to find some good reviewers. Don't just stay in a silo and admire our own code. To me, the whole journey helped me build one of the most important philosophies of my life: embrace open source and collaborate.

python

07:39 · May 14, 2022 · Sat

#health

https://www.scientificamerican.com/article/to-better-understand-women-rsquo-s-health-we-need-to-destigmatize-menstrual-blood/

Scientific American

To Better Understand Women’s Health, We Need to Destigmatize Menstrual Blood

Diseases such as endometriosis would have a cure if we could talk about them and study them without shame

health

19:01 · May 11, 2022 · Wed

#fun

Could use this

How to Lie with Statistics - Wikipedia
https://en.wikipedia.org/wiki/How_to_Lie_with_Statistics

fun

07:25 · May 11, 2022 · Wed

#data

Stop squandering data: make units of measurement machine-readable
https://www.nature.com/articles/d41586-022-01233-w

Nature

Stop squandering data: make units of measurement machine-readable

Nature - In the age of big data, it is time to ensure that units are routinely documented for easy, unambiguous exchange of information.

data

19:48 · May 5, 2022 · Thu

#ml

https://ts.gluon.ai/

Highly recommended! If you are working on deep learning for forecasting, gluonts is a great package.
It simplifies all the tedious data preprocessing, slicing, and backtesting. We can then spend time on implementing the models themselves (there are a lot of ready-to-use models).
What's even better, we can use pytorch lightning!
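
For a sense of the slicing work meant here, below is a hand-rolled sketch of rolling-window backtest splits in plain Python. This is not the gluonts API; `rolling_backtest_splits` and its parameters are hypothetical names chosen for illustration.

```python
from typing import Iterator, List, Sequence, Tuple


def rolling_backtest_splits(
    series: Sequence[float], prediction_length: int, num_windows: int
) -> Iterator[Tuple[List[float], List[float]]]:
    """Yield (history, target) pairs, oldest evaluation window first."""
    needed = prediction_length * num_windows
    if prediction_length <= 0 or num_windows <= 0 or needed >= len(series):
        raise ValueError("series too short for the requested windows")
    for k in range(num_windows, 0, -1):
        # Everything before the cut is history; the next block is the target.
        cut = len(series) - k * prediction_length
        yield list(series[:cut]), list(series[cut:cut + prediction_length])


series = list(range(10))
splits = list(rolling_backtest_splits(series, prediction_length=2, num_windows=3))
for history, target in splits:
    print(len(history), target)
```

Each model would be fit on `history` and scored on `target`. A forecasting library takes care of this bookkeeping, along with the preprocessing and batching around it.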

See this repository for a list of transformer based forecasting models.
https://github.com/kashif/pytorch-transformer-ts

GitHub

GitHub - kashif/pytorch-transformer-ts: Repository of Transformer based PyTorch Time Series Models

Repository of Transformer based PyTorch Time Series Models - kashif/pytorch-transformer-ts

05:34 · May 5, 2022 · Thu

#ml

Came across this post this morning. I realized the reason I am not writing a lot in Julia is simply because I don't know how to write quality code in Julia.

When we build a model in Python, we know all the details that go into making it quality code. For a new language, I'm just terrified by the number of details I need to be aware of.

Ah I'm getting older.

JAX vs Julia (vs PyTorch) · Patrick Kidger
https://kidger.site/thoughts/jax-vs-julia/

Patrick Kidger

Personal Website. Math, SciML, scuba diving!

06:12 · May 3, 2022 · Tue

#python

Anaconda open sourced this...

I have no idea what this is for...

https://github.com/pyscript/pyscript

GitHub

GitHub - pyscript/pyscript: PyScript is an open source platform for Python in the browser. Try PyScript: https://pyscript.com …

PyScript is an open source platform for Python in the browser. Try PyScript: https://pyscript.com Examples: https://tinyurl.com/pyscript-examples Community: https://discord.gg/HxvBtukrg2 - pyscri...

python

06:10 · May 3, 2022 · Tue

#ml

I have heard about the information bottleneck so many times but never really went back to read the original papers.

I spent some time on it and found it quite interesting. It is philosophically based on what was described in Vapnik's The Nature of Statistical Learning Theory, where he discussed how generalization works by enforcing parsimony.
Here in this information bottleneck paper, the most interesting thing is the quantified generalization gap and complexity gap. With these, we know where to go on the information plane.
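
For context, the information bottleneck objective as I read it from the paper (notation is mine to the extent it differs: $X$ is the input, $Y$ the target, $\hat{X}$ the compressed representation, and $\beta$ the trade-off parameter):

```latex
\min_{p(\hat{x} \mid x)} \; I(X; \hat{X}) - \beta \, I(\hat{X}; Y)
```

The first term rewards compression (parsimony), the second rewards keeping what is relevant for predicting $Y$. The two mutual information terms are the axes of the information plane.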

It's a good read.

Tishby N, Zaslavsky N. Deep Learning and the Information Bottleneck Principle. arXiv [cs.LG]. 2015. Available: http://arxiv.org/abs/1503.02406

00:46 · May 3, 2022 · Tue

#visualization

Hmm... Interesting pattern

https://www.reddit.com/r/dataisbeautiful/comments/uh02hd/uk_covid_deaths_in_2021_assorted_by_age_group_oc/

UK COVID deaths in 2021 assorted by age group [OC]

Posted in r/dataisbeautiful by u/Material-Banana8953 • 633 points and 111 comments

visualization

13:16 · Apr 24, 2022 · Sun

#work

I realized something interesting about time management.

If I open my calendar now, I see these “tiles” of meetings filling up most of my working hours. It looks bad, but it was even worse in the past. The thing is, if I do meetings during my working hours, I will have to work extra hours to do some thinking and analysis. It is rather cruel.

So what changed? I think I realized the power of Google Docs. Instead of many people talking and nobody listening, someone should write up a draft first and send it out to the colleagues. Then, once people get the link to the docs, everyone can add comments.

This doesn’t seem to be very different from meetings. Oh, it is very different. The workflow can be async. We are not forced to use our precious focus time to attend meetings. We can read and comment on the document whenever we like: when we are commuting, when we are taking a dump, when we are on a phone/tablet, just, any, time.

Apart from the async workflow, I also like the "think, comment and forget" idea. I feel people deliver better ideas when we think first, comment next, and forget about it unless there are replies to our comments. No pressure, no useless debates.

work

08:45 · Apr 20, 2022 · Wed

#ml #statistics

I read about conformal prediction a while ago and realized that I need to understand more about the theories of hypothesis testing. As someone from the natural sciences, I mostly work within the Neyman-Pearson framework.
So I explored it a bit and found two nice papers. See the list below. If you have other papers on similar topics, I would appreciate some comments.

1. Perezgonzalez JD. Fisher, Neyman-Pearson or NHST? A tutorial for teaching data testing. Front Psychol. 2015;6: 223. doi:10.3389/fpsyg.2015.00223 https://www.frontiersin.org/articles/10.3389/fpsyg.2015.00223/full
2. Lehmann EL. The Fisher, Neyman-Pearson Theories of Testing Hypotheses: One Theory or Two? J Am Stat Assoc. 1993;88: 1242–1249. doi:10.2307/2291263
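
A toy contrast between the two readings of the same test, as a heavily simplified sketch: a two-sided z-test with known sigma, made-up data, and hypothetical function names. It leaves out the alternative hypothesis and the power analysis that the Neyman-Pearson approach also involves.

```python
from math import sqrt
from statistics import NormalDist, mean


def z_statistic(sample, mu0, sigma):
    """Standardized distance of the sample mean from the null value mu0."""
    return (mean(sample) - mu0) / (sigma / sqrt(len(sample)))


def fisher_p_value(z):
    """Two-sided p-value, read as a graded measure of evidence against H0."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def neyman_pearson_reject(z, alpha=0.05):
    """Accept/reject decision at a significance level fixed in advance."""
    critical = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return abs(z) > critical


sample = [5.1, 4.9, 5.6, 5.8, 5.3, 5.7, 5.4, 5.5]  # made-up measurements
z = z_statistic(sample, mu0=5.0, sigma=0.5)
p = fisher_p_value(z)
reject = neyman_pearson_reject(z, alpha=0.05)
print(round(z, 3), round(p, 4), reject)
```

The arithmetic is shared. The difference is in what gets reported: a p-value to be weighed, or a binary decision with a pre-committed error rate.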

Frontiers

Fisher, Neyman-Pearson or NHST? A tutorial for teaching data testing

Despite frequent calls for the overhaul of null hypothesis significance testing (NHST), this controversial procedure remains ubiquitous in behavioral, social and biomedical teaching and research. Little change seems possible once the procedure becomes well…

ml statistics

06:05 · Apr 15, 2022 · Fri

Which language(s) do you prefer for reading?

Anonymous Poll

06:03 · Apr 15, 2022 · Fri