AirFlow and Prefect are probably the most popular schedulers in 2021. They are both more data-aware than the traditional orchestration softwares. …


(* originally posted in LinkedIn in 2018 )

Columnar file formats have become the primary storage choice for big data systems, but when I Googled related topics this weekend, I just found that most articles were talking about the simple query benchmark and storage footprint comparisons between a particular columnar…


This week (2020–11–10) was really big for System on a Chip: first Apple M1, and then followed by MediaTek MT8195/MT8192. But why on earth these have anything to do with lego, microservices and even data model? It is a topic I have put off for a few years.

Probably, all…


Data Lake” has become a buzz word since 2016, and we even invented the phrase “Lakehouse” lately. Did we simply upload the web logs and dump tables from MPP database into HDFS or cloud storage, so we call that Data Lake? …

Eric Sun

Advocate best practice of big data technologies. Challenge the conventional wisdom. Peel off the flashy promise in architecture and scalability.

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store