Pinned
Improve Ingest Latency and Query Efficiency of Data Lake — Partition and Index
Data Lake continues to offer better and better Cost Performance with the new features and integration patterns. With the summits of both…
Jul 3, 2024 · 4 responses

Pinned
Data Dependency Driven Orchestration
Airflow and Prefect are probably the most popular schedulers in 2021. They are both more data-aware than the traditional orchestration…
Jan 11, 2021 · 2 responses

Pinned
Are We Taking Only Half Of The Advantage Of Columnar File Format?
Sorting the records in a columnar data format is a critical design consideration that many of us have not paid attention to. Let’s leverage it.
Mar 16, 2020 · 2 responses

Lego vs SoC, Apple M1 + MT8195, Microservices and Big Data Model
This week (2020-11-10) was really big for System on a Chip: first the Apple M1, then the MediaTek MT8195/MT8192. But why on earth…
Nov 22, 2020 · 1 response

Reshape Data Lake: Delta, Iceberg, Hudi, or Hive
The great success of Spark in the ETL area also showed that many paradigms of the traditional data warehouse are indeed critical and useful.
Mar 16, 2020 · 4 responses