Run Apache Airflow on Windows 10

May 8, 2021 · 4 minute read

Apache Airflow is a great tool for managing and scheduling all the steps of a data pipeline. However, running it on Windows 10 can be challenging. Airflow’s official Quick Start promises a smooth start, but only for Linux users. What about those of us on Windows 10 who want to avoid Docker? These steps worked for me and hopefully will work for you, too.

After struggling with incorrect configurations, I eventually found a way to install and launch my first Airflow instance. In high spirits, I applied it to a data pipeline with Spark EMR clusters. I am happy to share my insights and list the steps that worked for me. If they work for you as well, all the better!

TL;DR

Install and run Airflow locally with the Windows Subsystem for Linux (WSL) in these steps:

  1. Open the Microsoft Store, search for Ubuntu, install it, then restart

  2. Open cmd and type wsl

  3. Update everything: sudo apt update && sudo apt upgrade

  4. Install pip3 like this

    sudo apt-get install software-properties-common
    sudo apt-add-repository universe
    sudo apt-get update
    sudo apt-get install python3-pip
    
  5. Install Airflow: pip3 install apache-airflow

  6. Run sudo nano /etc/wsl.conf, insert the block below, save and exit with ctrl+s ctrl+x

      [automount]
      root = /
      options = "metadata"
    
  7. Run nano ~/.bashrc, insert the block below, save and exit with ctrl+s ctrl+x

    export AIRFLOW_HOME=/c/users/YOURNAME/airflowhome

  8. Restart the terminal, activate wsl, run airflow info

    1. Everything is fine if you see something like Apache Airflow [1.10.12]
    2. If you get errors due to missing packages, install them with pip3 install [package-name]
    3. Try airflow info again
    4. If it still does not work, follow the instructions in the error message. You might want to fall back to Docker.

Airflow on Windows WSL

I managed to make it work with the Windows Subsystem for Linux (WSL), which was recommended on blogs and Stack Overflow. However, even these resources led me into dead ends.

After a lot of trial and error, I want to share the approach that worked for me. Follow these steps. If you get stuck, try to resolve the error by installing missing dependencies, restarting the terminal, or carefully re-reading the instructions.

  1. Open the Microsoft Store, search for Ubuntu, install it, then restart

Run the following commands in the terminal:

  1. Bring everything up to date with sudo apt update && sudo apt upgrade

  2. Install pip3 by running

    sudo apt-get install software-properties-common
    sudo apt-add-repository universe
    sudo apt-get update
    sudo apt-get install python3-pip  
    
  3. Install Airflow: pip3 install apache-airflow

  4. Type sudo nano /etc/wsl.conf

  5. To access directories like /c/users/philipp instead of /mnt/c/users/philipp, insert the code block below, then save and exit with ctrl+s ctrl+x

    [automount]
    root = /
    options = "metadata"
    
  6. Type nano ~/.bashrc

  7. Define the environment variable AIRFLOW_HOME by adding the code below, then save and exit with ctrl+s, ctrl+x

    export AIRFLOW_HOME=/c/Users/philipp/AirflowHome
    
  8. Close the terminal, open cmd again, and type wsl

  9. Install missing packages with pip3 install [package-name]

  10. Restart terminal, activate wsl, run airflow info

    1. Everything is fine if you see something like Apache Airflow [1.10.12]
    2. If you get errors due to missing packages, install them with pip3 install [package-name]
    3. Try airflow info again
    4. If it still does not work, follow the instructions in the error message. You might want to fall back to Docker.
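To see at a glance which of the two usual culprits applies to you (an unset AIRFLOW_HOME or an airflow executable that is not on the PATH), a small shell check can help. This is just a sketch; the hint messages are mine, not Airflow's:

```shell
# Sanity check for the setup so far (sketch). Each check prints either the
# current value or a hint, so the script is safe to run at any stage.
check_setup() {
  if [ -n "${AIRFLOW_HOME:-}" ]; then
    echo "AIRFLOW_HOME=$AIRFLOW_HOME"
  else
    echo "hint: AIRFLOW_HOME is not set; add the export line to ~/.bashrc"
  fi
  if command -v airflow >/dev/null 2>&1; then
    echo "airflow found at $(command -v airflow)"
  else
    echo "hint: airflow is not on the PATH; try pip3 install apache-airflow"
  fi
}
check_setup
```

If both lines print values instead of hints, airflow info should work.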

Other ways to install Airflow

Docker offers a controlled environment (a container) in which to run applications. Since Airflow only runs on Linux, it is a great candidate for a Docker container. However, Docker can be hard to debug, clunky, and may add another layer of confusion. If you want to run Airflow with Docker, see this tutorial.

How to run an Airflow instance

Now it is time to have a look at Airflow! Is AIRFLOW_HOME where you expect it to be? Open two cmd windows, activate wsl in each, and run:

# check whether AIRFLOW_HOME was set correctly
env | grep AIRFLOW_HOME

# initialize database in AIRFLOW_HOME
airflow initdb 

# initialize scheduler
airflow scheduler

# use the second cmd window to run
airflow webserver
# access the UI on localhost:8080 in your browser

Unfortunately, WSL does not support running background tasks (daemons). This is why we have to open one terminal for airflow webserver and another for airflow scheduler.
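A quick way to check from another terminal whether the webserver actually came up is to probe the port. This is a sketch and assumes curl is available; the messages are only illustrative:

```shell
# Probe the Airflow UI port (sketch). The guard keeps this safe to run even
# when the webserver is not up yet; it only reports, it never fails.
if curl -s -o /dev/null http://localhost:8080; then
  MSG="webserver is up; open http://localhost:8080 in your browser"
else
  MSG="webserver is not reachable yet"
fi
echo "$MSG"
```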

Set up Airflow in a project setting

Copying your DAGs back and forth between a project folder and the Airflow home directory is cumbersome. Fortunately, we can automate this with a bash script. For example, my project root directory is /c/users/philipp/projects/project_name/ and contains one folder with all scripts related to data collection and processing, named ./src/data/, and one folder for all Airflow-related files, ./src/airflow/. Have a look at my project Run Spark EMR clusters with Airflow on Github to see the project structure. You will find the script deploy.sh in ./src/airflow.
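The core of such a deploy script boils down to copying the DAG files from the project into $AIRFLOW_HOME/dags. Here is a minimal sketch of the idea, demonstrated against temporary directories so it is safe to run anywhere; the actual deploy.sh in the repository may differ, and the file name my_dag.py is only an illustrative placeholder:

```shell
# Minimal sketch of a DAG deploy step: copy every *.py file from the project's
# DAG folder into $AIRFLOW_HOME/dags. Temporary directories stand in for the
# real paths (./src/airflow/ and your actual AIRFLOW_HOME).
SRC_DIR=$(mktemp -d)       # stands in for ./src/airflow/
AIRFLOW_HOME=$(mktemp -d)  # stands in for /c/Users/philipp/AirflowHome

# pretend the project contains one DAG definition
echo "# my_dag.py (illustrative placeholder)" > "$SRC_DIR/my_dag.py"

mkdir -p "$AIRFLOW_HOME/dags"
cp "$SRC_DIR"/*.py "$AIRFLOW_HOME/dags/"
ls "$AIRFLOW_HOME/dags"
```

With the real paths substituted in, running the script after every change keeps the Airflow home directory in sync with the project.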

I am thankful to Cookiecutter Data Science for the inspiration on project structure.