Pinned · Eric Sun · Improve Ingest Latency and Query Efficiency of Data Lake — Partition and Index · Data Lake continues to offer better and better Cost Performance with the new features and integration patterns. With the summits of both… · Jul 3
Pinned · Eric Sun · Data Dependency Driven Orchestration · Airflow and Prefect are probably the most popular schedulers in 2021. They are both more data-aware than the traditional orchestration… · Jan 11, 2021
Pinned · Eric Sun · Are We Taking Only Half Of The Advantage Of Columnar File Format? · Sorting the records in a columnar data format is a critical design consideration that many of us have not paid attention to. Let’s leverage it. · Mar 16, 2020
Eric Sun · Lego vs SoC, Apple M1 + MT8195, Microservices and Big Data Model · This week (2020-11-10) was really big for System on a Chip: first Apple M1, then MediaTek MT8195/MT8192. But why on earth… · Nov 22, 2020
Eric Sun · Reshape Data Lake: Delta, Iceberg, Hudi, or Hive · The huge success of Spark in the ETL area also showed that many paradigms of the traditional data warehouse are indeed critical and useful. · Mar 16, 2020