In the fast-paced world of data management, efficiency and cost-effectiveness are key. Apache Airflow has become a vital tool for orchestrating workflows, and when we first adopted Amazon Managed Workflows for Apache Airflow (MWAA), we were drawn to the promise of a managed service. However, as our data operations grew, so did the cost of MWAA. We realized it was time for a change.
In this blog post, we’ll share our journey of transitioning from MWAA to self-hosting Apache Airflow on Kubernetes (K8s) in AWS via Plural. We aimed to reduce costs while gaining control, scalability, and customization. This story covers our experiences, the challenges we hit, and the cost savings we achieved, offering insights for those considering a similar move or looking to optimize their Airflow deployments.
Our Breaking Point with MWAA
MWAA promised convenience, but we soon reached a breaking point, driven by several concerns that affected our efficiency, flexibility, and costs.
Hidden Costs of CloudWatch Logging
One of the first challenges we encountered was the hidden cost of CloudWatch logging. While MWAA’s managed service handled various operational aspects, our CloudWatch logging charges grew unexpectedly high as our data operations expanded. It was a financial burden we had not fully anticipated. MWAA prominently promotes its standard Small, Medium, and Large environment pricing, but it conveniently omits CloudWatch from the discussion because logging isn’t enabled by default. You could run tasks without task logs, but when they fail, how would you know what went wrong?
Missing Features and Limitations
MWAA came with some notable limitations. Features like Deferrable Operators and the Airflow Stable REST API were absent, limiting our ability to design and execute workflows according to our needs. These missing features added complexity to our operations.
Delayed Airflow Updates
MWAA’s dependency on AWS infrastructure meant we couldn’t use the latest Airflow version until AWS officially released it for MWAA. This delay restricted our access to new features.
As we reached a breaking point with MWAA, we began seeking an alternative that could offer us greater control, cost-efficiency, and flexibility. This quest led us to explore self-hosting, and in that exploration, we discovered Plural.
Why Plural
The Plural platform builds on the premise that there is a better way to manage infrastructure and deploy applications, one that doesn’t require giving up control, portability, privacy, and cost-effectiveness for the sake of convenience. Plural’s website hosts a marketplace of self-hosted applications that become available once you use the platform.
After reviewing the applications in Plural’s marketplace, we determined that we could replace not only MWAA with OSS Airflow but also Talend Stitch with OSS Airbyte. Stitch bills by the row (similar to Fivetran), and we found it increasingly expensive as our data grew. Lastly, we had been looking for a Data Cataloging solution and were in active talks with Monte Carlo. Instead of paying a premium for their managed service, we concluded that we could use DataHub (another app available in the Plural marketplace).
Everything seemed too good to be true because we were killing three birds with one stone (Data Ingestion, Data Orchestration, and Data Cataloging). So, we put together a plan that would allow us to test out Plural without entirely deprecating MWAA & Stitch.
Planning the Transition
To start the transition, we created a 1:1 copy of our MWAA Airflow in Plural. This required spinning up an EKS cluster in our AWS account (we couldn’t install an application without the underlying infrastructure). Creating the EKS cluster was straightforward and took only an hour to initialize. I did it independently by following Plural’s CLI quickstart guide.
Installing Airflow to the cluster took a Plural CLI command and about 20 minutes. After we had the empty Airflow environment in our Plural Cluster, we completed the following steps:
- We created our own custom Airflow image and stored it in AWS ECR. The custom image allowed us to install the pip dependencies we needed in Airflow and to initialize a Python virtual environment that keeps our dbt workloads isolated from Airflow.
- We copied our MWAA DAGs and supporting files to the repo that Plural was syncing to the newly created Airflow environment.
- Lastly, we fine-tuned the Plural Airflow configuration settings so that:
  - The cluster could authenticate to AWS Secrets Manager to pull secrets for our Airflow Connections/Variables.
  - Airflow used the KubernetesExecutor with proper requests/limits for Airflow Tasks.
  - Airflow Tasks ran on a custom node group of Spot instances to reduce our AWS costs.
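As a rough illustration of the three settings above, here is a minimal Python sketch that assembles the relevant Airflow environment variables and the kind of Pod manifest a KubernetesExecutor pod template file contains. The Airflow option names and the Secrets Manager backend class are standard Airflow, but the secret prefixes, image name, and resource sizes are hypothetical, and the Spot node selector assumes an EKS managed node group. How Plural itself exposes these settings is not shown here.

```python
import json

# Hypothetical secret prefixes; use whatever naming your Secrets Manager entries follow.
SECRETS_BACKEND_KWARGS = {
    "connections_prefix": "airflow/connections",
    "variables_prefix": "airflow/variables",
}


def airflow_env() -> dict:
    """Environment variables that select the executor and the secrets backend."""
    return {
        "AIRFLOW__CORE__EXECUTOR": "KubernetesExecutor",
        "AIRFLOW__SECRETS__BACKEND": (
            "airflow.providers.amazon.aws.secrets.secrets_manager.SecretsManagerBackend"
        ),
        "AIRFLOW__SECRETS__BACKEND_KWARGS": json.dumps(SECRETS_BACKEND_KWARGS),
    }


def worker_pod_template(image: str, cpu: str = "500m", memory: str = "1Gi") -> dict:
    """Pod manifest for task pods: requests/limits plus a Spot node selector."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": "airflow-task"},
        "spec": {
            "restartPolicy": "Never",
            # Assumption: an EKS managed node group, which labels nodes by capacity type.
            "nodeSelector": {"eks.amazonaws.com/capacityType": "SPOT"},
            "containers": [
                {
                    "name": "base",  # Airflow expects the task container to be named "base"
                    "image": image,
                    "resources": {
                        "requests": {"cpu": cpu, "memory": memory},
                        "limits": {"cpu": cpu, "memory": memory},
                    },
                }
            ],
        },
    }


if __name__ == "__main__":
    for key, value in airflow_env().items():
        print(f"{key}={value}")
    # JSON is valid YAML, so this output could be saved as a pod template file.
    print(json.dumps(worker_pod_template("example.registry/airflow:custom"), indent=2))
```

Setting requests equal to limits, as this sketch does, is only one possible choice; it trades some bin-packing flexibility for predictable scheduling of task pods.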
The above steps took a little over a day to complete. Documentation exists for most of them on Plural’s site, and where it didn’t, the Plural Discord community was very helpful in getting us started.
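The dbt isolation mentioned in the first step can be pictured as follows. This is a minimal sketch, not our actual task code: the virtual environment path and the helper names are hypothetical, and it assumes the custom image ships a separate virtual environment with dbt installed. Calling that environment's dbt executable in a subprocess keeps dbt's dependencies out of the interpreter that runs Airflow.

```python
import subprocess
from pathlib import PurePosixPath

# Hypothetical location of the dbt virtual environment baked into the custom image.
DBT_VENV = PurePosixPath("/opt/dbt-venv")


def dbt_command(args, venv=DBT_VENV):
    """Build an argv list that invokes dbt from the isolated virtual environment."""
    return [str(venv / "bin" / "dbt"), *args]


def run_dbt(args, project_dir, venv=DBT_VENV):
    """Run dbt in a subprocess so its packages never load into Airflow's interpreter."""
    cmd = dbt_command([*args, "--project-dir", project_dir], venv)
    # check=True raises CalledProcessError on a non-zero exit, which fails the Airflow task.
    subprocess.run(cmd, check=True)
```

An Airflow task could then call something like `run_dbt(["run"], "/opt/airflow/dbt")` from a Python callable, where the project directory is again a placeholder.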
Cost Analysis
After initializing our 1:1 copy of Airflow in Plural, we tested each Airflow DAG individually to ensure it worked. Upon successfully testing all of our DAGs, we paused the DAGs in MWAA and unpaused them in our Plural Airflow.
After pausing the MWAA DAGs on 10/10/23, you can see that our CloudWatch costs disappeared from AWS (because the tasks were no longer running and generating logs). After about a week of successful DAG Runs in Plural, we completely deleted our MWAA environment on 10/17/23.
As you can see from the AWS Cost Explorer graph, our daily costs were cut almost in half. So, would it be fair to say that Plural helped us run our Airflow workloads at half the cost of MWAA? Well, yes, but remember, we also installed Airbyte and DataHub! So it is more accurate to say that for half of what we were paying to run MWAA alone, we can run Airflow, eliminate Stitch costs (approximately $300/mo), and avoid paying Monte Carlo for a managed Data Cataloging service.
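To make that comparison concrete, here is a small sketch of the arithmetic. Only the roughly $300/mo Stitch figure comes from this post; the daily AWS amounts are placeholders chosen to show a bill cut in half, not actual figures, and the function name is hypothetical.

```python
def monthly_savings(daily_before, daily_after, avoided_subscriptions, days=30):
    """Estimate monthly savings: the drop in daily AWS spend plus subscriptions no longer paid."""
    infrastructure = (daily_before - daily_after) * days
    return infrastructure + sum(avoided_subscriptions)


if __name__ == "__main__":
    # Placeholder daily costs (100 -> 50) plus the approximate Stitch bill from the post.
    print(monthly_savings(daily_before=100.0, daily_after=50.0, avoided_subscriptions=[300.0]))
    # prints 1800.0
```

The point is that the savings have two parts: the lower daily AWS spend, and the third-party subscriptions that no longer appear on any bill.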
Conclusion
Our transition from MWAA to self-hosted Airflow on K8s via Plural has been a significant milestone, not only in cost savings but also in regaining control over our data management solutions and the freedom to customize them. We hope our experience helps other organizations chart a similar course toward more efficient, cost-effective, and flexible data workflows.
References and Resources
If you are looking to implement a Modern Data Stack for your business but need help figuring out where to start, please reach out to us on Discord or submit a service request here.