#ML

“Everyone wants to do the model work, not the data work”: Data Cascades in High-Stakes AI

TL;DR:
- Data quality is crucial in any AI system, especially in high-stakes domains.
- Much data work is easily overlooked. Examples include politics (some data entries are never recorded, or are misrecorded) and upstream data creation: human-in-the-loop data quality interventions tend to target cleaning and wrangling, but how the data is created upstream must be controlled well too.
- Data Cascades: it should be clear how issues cascade from upstream to downstream.

> Data Cascades: compounding events causing negative, downstream effects from data issues, resulting in technical debt over time.

https://research.google/pubs/pub49953/
 
 