Properly configuring and tuning your Airflow deployment to be generally performant can be a nontrivial endeavor. You can easily shoot yourself in the foot if you don’t get it right at first. You’ll discover and experience sporadic bottlenecks in Airflow-wide and DAG-level execution. You’ll apply solutions to those as they arise. You run the risk of developing a fragmented and mystical understanding of Airflow’s goings-on.

You’ll have many questions and, frustratingly, fewer answers. Why are more tasks not being run, even after I add workers? Why is my task scheduling latency increasing? Ultimately, if my Airflow DAG performance is being negatively affected, what do I actually need to look at and fix? If you’re reading this, maybe you’ve had and answered questions like these for yourself before.

The goal of this article is to connect questions like the above to the various control levers we have in Airflow. Generally, problems of this nature are a result of either improper or insufficient Airflow configuration. We mean that in a sense beyond Airflow’s own notion of configuration: the specs of the machines running our Airflow processes can also matter. We will frame the types of problems that can be identified and tuned for by discussing several broad examples.

To that end, we’ll begin by identifying a subset of Airflow’s configuration options that are relevant to the Airflow processes experiencing a bottleneck. For example, we’ll call out the appropriate Airflow scheduler settings in the event of a scheduler bottleneck. A walkthrough of several theoretical Airflow bottlenecks, and the role those parameters play in fixing them, follows. The discussion concludes with additional Airflow functionality that supports further tuning and monitoring for these situations. Ultimately, we hope to build an understanding of ‘which’ Airflow control levers to pull and ‘when’. Further, we’ll be equipping our mental models with the ‘why’.

Finally, before we dive in, we’ll make two assumptions. First, we will assume you’ve already got a distributed Airflow deployment using the CeleryExecutor. Much of what we discuss regarding Airflow worker settings is grounded in that context. Second, the discussion is based on Airflow 1.10.x versions, as Airflow 2.0 will introduce significant changes.

Outlining the Relevant Airflow Configuration Settings

Airflow has a number of configuration settings.
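As a rough illustration of what these control levers look like in practice, the sketch below parses a hypothetical airflow.cfg fragment for a CeleryExecutor deployment on Airflow 1.10.x and applies an environment-variable override of the form AIRFLOW__SECTION__KEY. The particular settings and values are assumptions chosen for illustration, not recommendations from this article, and the helper load_settings is a made-up name rather than part of Airflow.

```python
import configparser

# Hypothetical airflow.cfg fragment (Airflow 1.10.x style).
# The settings and values are illustrative assumptions, not tuning advice.
SAMPLE_CFG = """
[core]
executor = CeleryExecutor
parallelism = 32
dag_concurrency = 16

[scheduler]
max_threads = 2

[celery]
worker_concurrency = 16
"""


def load_settings(cfg_text, env=None):
    """Parse config text, letting AIRFLOW__SECTION__KEY entries in env win."""
    env = env or {}
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    settings = {}
    for section in parser.sections():
        for key, value in parser.items(section):
            env_name = "AIRFLOW__{}__{}".format(section.upper(), key.upper())
            settings[(section, key)] = env.get(env_name, value)
    return settings


# Simulate overriding one worker setting through the environment.
settings = load_settings(
    SAMPLE_CFG,
    env={"AIRFLOW__CELERY__WORKER_CONCURRENCY": "8"},
)
for (section, key), value in sorted(settings.items()):
    print("[{}] {} = {}".format(section, key, value))
```

Grouping the values by section mirrors how the bottleneck discussion is framed: scheduler problems point to the scheduler section, worker problems to the celery section, and deployment-wide limits to core.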