Airflow and Prefect are probably the most popular schedulers in 2021. Both are more data-aware than traditional orchestration software. This article describes an additional service architecture that makes data dependency the enabling pattern for effective and efficient orchestration in complex big data environments, which may span multiple data centers, cloud vendors, and hybrid topologies.

It’s common to schedule a data flow/DAG with a cron expression such as
# "0 0 2,14 ? * *" : every day at 2 AM and 2 PM
# "0 0 */6 ? * *" : every 6 hours, at 00:00, 06:00, 12:00, and 18:00
Yet the flow…
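To make the two schedules above concrete, here is a minimal Python sketch that expands the hour field of each expression into the clock hours it matches. It assumes the Quartz-style six-field layout (second, minute, hour, day-of-month, month, day-of-week) that the `?` in the expressions suggests; the helper names `expand_hour_field` and `firing_hours` are hypothetical and not part of Airflow, Prefect, or any cron library. It only handles the forms shown here (`*`, comma lists, and `*/n` steps).

```python
from typing import List


def expand_hour_field(field: str) -> List[int]:
    """Expand a cron hour field into the sorted hours (0-23) it matches.

    Hypothetical helper for illustration. Supports only '*', comma lists
    such as '2,14', and steps such as '*/6'; ranges are out of scope.
    """
    hours = set()
    for part in field.split(","):
        if part == "*":
            hours.update(range(24))
        elif part.startswith("*/"):
            step = int(part[2:])
            # A step counts from hour 0, not from the first execution.
            hours.update(range(0, 24, step))
        else:
            hours.add(int(part))
    return sorted(hours)


def firing_hours(expression: str) -> List[int]:
    """Return the hours of the day at which the expression fires."""
    # Assumed layout: second minute hour day-of-month month day-of-week
    fields = expression.split()
    if len(fields) != 6:
        raise ValueError("expected a six-field cron expression")
    return expand_hour_field(fields[2])


print(firing_hours("0 0 2,14 ? * *"))  # [2, 14]
print(firing_hours("0 0 */6 ? * *"))   # [0, 6, 12, 18]
```

Either way, the schedule is tied purely to the clock: nothing in these expressions says whether the input data for a run has actually arrived.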

Eric Sun

Advocate best practice of big data technologies. Challenge the conventional wisdom. Peel off the flashy promise in architecture and scalability.
