Airflow tutorial 3: Set up airflow environment using Google Cloud Composer


We will learn how to set up an Airflow environment using Google Cloud Composer.

Overview of Cloud Composer

  • A fully managed Apache Airflow service that makes workflow creation and management easy, powerful, and consistent.
  • Cloud Composer helps you create Airflow environments quickly and easily, so you can focus on your workflows and not your infrastructure.

Hosting Airflow on-premises

Let’s say you want to host Airflow on-premises. In other words, you host Airflow on your own local server. There are a lot of problems with this approach:

  • You will need to spend a lot of time on DevOps work: creating a new server, managing the Airflow installation, taking care of dependency and package management, keeping the server up and running, and then dealing with scaling and security issues…
  • If you don’t want to deal with all of those DevOps problems, and instead just want to focus on your workflows, then Google Cloud Composer is a great solution for you.

Google Cloud Composer benefits

  • The nice thing about Google Cloud Composer is that, as a Data Engineer or Data Scientist, you don’t have to spend much time on DevOps.
  • You just focus on your workflows (writing code), and let Composer manage the infrastructure.
  • Of course you have to pay for the hosting service, but the cost is low compared to hosting a production Airflow server on your own. This is an ideal solution if you are a startup that needs Airflow and doesn’t have a lot of DevOps folks in-house.

Key Cloud Composer features

  • Simplicity:
    • One-click creation of a new Airflow environment
    • Client tooling, including the Google Cloud SDK and Google Developers Console
    • Easy and controlled access to the Airflow web UI
  • Security:
    • Identity and Access Management (IAM): manage credentials, permissions, and access policies.
  • Scalability:
    • Easy to scale with Google infrastructure.
  • Production monitoring:
    • Stackdriver logging and monitoring:
      • Provides logging and monitoring metrics, and alerts you when your workflow is not running.
  • Simplified DAG (workflow) management
  • Python package management
  • Comprehensive GCP integration:
    • Integrates with all of Google Cloud’s services: Big Data, Machine Learning…
    • Run jobs elsewhere: on other cloud providers, or on-premises.

Releases

Google Cloud Composer is a new product from Google. With this push from Google behind it, you can be confident that Apache Airflow is cutting-edge technology in the software industry.

  • First beta release: May 1, 2018 (6 months ago)

  • Latest release: October 24, 2018
    • Supports Python 3 and Airflow 1.10.0

Set up Google Cloud Composer environment

  • It’s extremely easy to set up. If you have a Google Cloud account, it’s really just a few clicks away.

Composer environment

  • You can create multiple environments within a project.
  • Each environment is a separate Kubernetes cluster with multiple nodes, so environments are completely isolated from each other.
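
You can inspect these environments from the command line. A minimal sketch, assuming the gcloud SDK is installed and authenticated, and that us-central1 is where your environments live (a hypothetical choice):

```bash
# List all Composer environments in this location for the current
# project; each one is backed by its own GKE cluster.
gcloud composer environments list --locations us-central1
```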

Create an environment

  • Choose the number of nodes and the disk size
  • Choose the Airflow and Python versions
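
If you prefer the command line to the Cloud Console, the same choices map to flags on gcloud composer environments create. A minimal sketch; the environment name, location, zone, and sizing values below are hypothetical:

```bash
# Create a 3-node Composer environment running Python 3 and Airflow 1.10.0
# (all values here are just an example).
gcloud composer environments create my-composer-env \
    --location us-central1 \
    --zone us-central1-a \
    --node-count 3 \
    --disk-size 100GB \
    --python-version 3 \
    --airflow-version 1.10.0
```

Creating an environment spins up a GKE cluster behind the scenes, so expect to wait a while before it becomes available.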

A complete Composer environment

Installing Python dependencies

  • Install a Python dependency from the Python Package Index (PyPI)
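
You can do this in the Cloud Console, or with gcloud. A minimal sketch of the command-line route; the environment name, location, and packages are hypothetical:

```bash
# Add a single PyPI package to the environment...
gcloud composer environments update my-composer-env \
    --location us-central1 \
    --update-pypi-package "pandas>=0.23.0"

# ...or install everything listed in a requirements.txt file.
gcloud composer environments update my-composer-env \
    --location us-central1 \
    --update-pypi-packages-from-file requirements.txt
```

Either command triggers an environment update, so the new packages end up installed on every Airflow worker.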

Deployment

Deployment is simple. Google Cloud Composer uses Cloud Storage to store Apache Airflow DAGs, so you can easily add, update, and delete DAGs in your environment.

  • Manual deployment:
    • You can drag and drop the Python .py file for your DAG into the Composer environment’s dags folder in Cloud Storage to deploy a new DAG. Within seconds, the DAG appears in the Airflow UI.
    • Use a gcloud SDK command to deploy a new DAG (see the sketch after this list).
  • Auto deployment:
    • Your DAG files are stored in a Git repository. You can set up a continuous integration pipeline that deploys automatically every time a merge request is merged into the master branch.
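
A minimal sketch of the gcloud deployment command mentioned above; the environment name, location, and DAG filename are hypothetical:

```bash
# Manual deployment: copy a DAG file into the environment's dags/ folder
# in Cloud Storage. Airflow picks it up automatically.
gcloud composer environments storage dags import \
    --environment my-composer-env \
    --location us-central1 \
    --source my_dag.py
```

For auto deployment, the same command can run as the last step of your CI pipeline, so every merge into master pushes the updated DAG files to Cloud Storage.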

More information
