- Docker Community Edition
- Docker Compose
- pyenv (you can also use pyenv-virtualenv or virtualenvwrapper)
- Installing required packages for Docker and setting up docker repo
$ sudo apt-get update
$ sudo apt-get install \
apt-transport-https \
ca-certificates \
curl \
gnupg-agent \
software-properties-common
$ curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
$ sudo add-apt-repository \
"deb [arch=amd64] https://download.docker.com/linux/ubuntu \
$(lsb_release -cs) \
stable"
- Install Docker
$ sudo apt-get update
$ sudo apt-get install docker-ce docker-ce-cli containerd.io
- Create the docker group and add the current user to it.
$ sudo groupadd docker
$ sudo usermod -aG docker $USER
Note: After adding the user to the docker group, log out and log in again so that group membership is re-evaluated.
- Test Docker installation
$ docker run hello-world
- Installing latest version of Docker Compose
$ COMPOSE_VERSION="$(curl -s https://api.github.com/repos/docker/compose/releases/latest | grep '"tag_name":'\
| cut -d '"' -f 4)"
$ COMPOSE_URL="https://github.com/docker/compose/releases/download/${COMPOSE_VERSION}/\
docker-compose-$(uname -s)-$(uname -m)"
$ sudo curl -L "${COMPOSE_URL}" -o /usr/local/bin/docker-compose
$ sudo chmod +x /usr/local/bin/docker-compose
- Verifying installation
$ docker-compose --version
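The download URL built in the shell commands above can be sketched in Python as well. This is only an illustration: the function name compose_url is hypothetical, the tag "1.27.4" is just an example value, and platform.system() / platform.machine() are assumed to return the same strings as uname -s / uname -m, which is usually the case on Linux.

```python
import platform

# Base URL copied from the shell commands above.
RELEASES = "https://github.com/docker/compose/releases/download"


def compose_url(version, system=None, machine=None):
    """Build the docker-compose download URL for one release tag."""
    # platform.system()/machine() stand in for `uname -s`/`uname -m`.
    system = system or platform.system()
    machine = machine or platform.machine()
    return f"{RELEASES}/{version}/docker-compose-{system}-{machine}"


if __name__ == "__main__":
    # "1.27.4" is only an example tag, not necessarily the latest release.
    print(compose_url("1.27.4", "Linux", "x86_64"))
```

Passing system and machine explicitly makes it easy to see which file would be downloaded on another platform.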
- Checking required packages
$ sudo apt-get install -y build-essential libssl-dev zlib1g-dev libbz2-dev \
libreadline-dev libsqlite3-dev wget curl llvm libncurses5-dev libncursesw5-dev \
xz-utils tk-dev libffi-dev liblzma-dev python-openssl git
$ sudo apt install build-essential python3.6-dev python3.7-dev python3.8-dev python-dev openssl \
sqlite libsqlite3-dev default-libmysqlclient-dev libmysqld-dev postgresql
- Install pyenv
$ curl https://pyenv.run | bash
- Add the lines suggested at the end of installation to ~/.bashrc
- Restart your shell so the path changes take effect, then verify the installation
$ exec $SHELL
$ pyenv --version
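Before moving on, it can help to confirm that the tools installed so far are actually on the PATH. The following is a minimal sketch using only the standard library; the helper name missing_tools and the REQUIRED_TOOLS list are illustrative, not part of Airflow or Breeze.

```python
import shutil

# Command names installed in the steps above.
REQUIRED_TOOLS = ["docker", "docker-compose", "pyenv"]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools that cannot be found on the current PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    print("missing:", ", ".join(missing) if missing else "none")
```

If pyenv is reported missing right after installation, the lines added to ~/.bashrc have probably not been loaded by the current shell yet.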
- List the available versions, install the required Python version with pyenv, and verify it
$ pyenv install --list
$ pyenv install 3.8.5
$ pyenv versions
- Create a new virtual environment named airflow-env for the installed Python version. The airflow-env virtual environment is used in the next chapter to install Airflow.
$ pyenv virtualenv 3.8.5 airflow-env
- Enter the airflow-env virtual environment
$ pyenv activate airflow-env
Note
In November 2020, a new version of pip (20.3) was released with a new 2020 resolver. This resolver might work with Apache Airflow as of pip 20.3.3, but it might still lead to installation errors, depending on your choice of extras. To install Airflow you might need to either downgrade pip to version 20.2.4 with pip install --upgrade pip==20.2.4 or, if you use pip 20.3, add the option --use-deprecated legacy-resolver to your pip install command.
While pip 20.3.3 solved most of the teething problems of 20.3, this note will remain here until we set pip 20.3 as the official version in our CI pipeline, where we test the installation as well. Because of these constraints, only pip installation is currently officially supported.
While there are some successes with other tools such as poetry or pip-tools, they do not share the same workflow as pip, especially when it comes to constraint vs. requirements management. Installing via Poetry or pip-tools is not currently supported.
If you wish to install Airflow using those tools, use the constraint files and convert them to the format and workflow that your tool requires.
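The pip rule in the note can be written down as a small decision function. This is a sketch under simplifying assumptions: the helper names are hypothetical, version strings are assumed to be purely numeric (such as 20.3.3), and every pip from 20.3 onwards is treated as needing the legacy resolver, which is more conservative than the note requires for 20.3.3.

```python
def parse_version(text):
    """Turn '20.3.3' into (20, 3, 3); assumes purely numeric parts."""
    return tuple(int(part) for part in text.split("."))


def resolver_args(pip_version):
    """Extra `pip install` arguments suggested by the note above."""
    # pip 20.3 introduced the new resolver; older versions need no flag.
    if parse_version(pip_version) >= (20, 3):
        return ["--use-deprecated", "legacy-resolver"]
    return []


if __name__ == "__main__":
    for version in ("20.2.4", "20.3", "20.3.3"):
        print(version, resolver_args(version))
```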
Go to https://github.com/apache/airflow/ and fork the project.
Go to your GitHub account's fork of airflow, click Code, and copy the clone link.
Open PyCharm and click Get from Version Control.
Paste the copied clone link into URL and click Clone.
- Open a terminal, enter the airflow-env virtual environment, and go to the project directory
$ pyenv activate airflow-env
$ cd ~/PycharmProjects/airflow/
- Initializing breeze autocomplete
$ ./breeze setup-autocomplete
$ source ~/.bash_completion.d/breeze-complete
- Initialize the Breeze environment with the required Python version and backend. This may take a while the first time.
$ ./breeze --python 3.8 --backend mysql
- Create the Airflow tables and users. airflow db reset must be executed at least once for Airflow Breeze to create the database and tables.
$ airflow db reset
$ airflow users create --role Admin --username admin --password admin --email [email protected] --firstname foo \
--lastname bar
- Close the Breeze environment. After the above command finishes successfully you are left inside the container; type exit to leave it.
root@b76fcb399bb6:/opt/airflow#
root@b76fcb399bb6:/opt/airflow# exit
$ ./breeze stop
Install Airflow in the local airflow-env virtual environment with Breeze. This may require some packages to be installed; watch the output of the command to see which ones are missing.
$ sudo apt-get install sqlite libsqlite3-dev default-libmysqlclient-dev postgresql
$ ./breeze initialize-local-virtualenv --python 3.8
- Add the following line to ~/.bashrc so that the breeze command can be called from anywhere, then reload the file.
export PATH=${PATH}:"/home/${USER}/PycharmProjects/airflow"
$ source ~/.bashrc
- Start the Breeze environment with breeze start-airflow. This starts Breeze with the last configuration used (in this case Python and backend are picked up from the last execution, ./breeze --python 3.8 --backend mysql). It also automatically starts the webserver, backend and scheduler, and drops you into tmux with the scheduler in the bottom left and the webserver in the bottom right. Use [Ctrl + B] and the arrow keys to navigate.
$ breeze start-airflow
Use CI image.
Branch name: master
Docker image: apache/airflow:master-python3.8-ci
Airflow source version: 2.0.0b2
Python version: 3.8
DockerHub user: apache
DockerHub repo: airflow
Backend: mysql 5.7
Port forwarding:
Ports are forwarded to the running docker containers for webserver and database
* 28080 -> forwarded to Airflow webserver -> airflow:8080
* 25555 -> forwarded to Flower dashboard -> airflow:5555
* 25433 -> forwarded to Postgres database -> postgres:5432
* 23306 -> forwarded to MySQL database -> mysql:3306
* 26379 -> forwarded to Redis broker -> redis:6379
Here are links to those services that you can use on host:
* Webserver: http://127.0.0.1:28080
* Flower: http://127.0.0.1:25555
* Postgres: jdbc:postgresql://127.0.0.1:25433/airflow?user=postgres&password=airflow
* Mysql: jdbc:mysql://127.0.0.1:23306/airflow?user=root
* Redis: redis://127.0.0.1:26379/0
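Once Breeze reports the forwarded ports, a quick way to see which services are reachable from the host is to try a TCP connection to each port. The sketch below uses only the standard library; the port numbers are copied from the output above, while the names FORWARDED_PORTS and port_open are illustrative.

```python
import socket

# Host ports copied from the Breeze output above.
FORWARDED_PORTS = {
    "webserver": 28080,
    "flower": 25555,
    "postgres": 25433,
    "mysql": 23306,
    "redis": 26379,
}


def port_open(port, host="127.0.0.1", timeout=0.5):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    for name, port in FORWARDED_PORTS.items():
        state = "open" if port_open(port) else "closed"
        print(f"{name:10} {port}: {state}")
```

A closed port does not necessarily indicate a problem: only the services that belong to the selected backend and integrations are expected to be running.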
Alternatively, you can start the same setup with the following commands:
- Start Breeze
$ breeze --python 3.8 --backend mysql
- Open tmux
root@0c6e4ff0ab3d:/opt/airflow# tmux
- Press Ctrl + B and "
root@0c6e4ff0ab3d:/opt/airflow# airflow scheduler
- Press Ctrl + B and %
root@0c6e4ff0ab3d:/opt/airflow# airflow webserver
Now you can access the Airflow web interface on your local machine at http://127.0.0.1:28080 with user name admin and password admin.
Set up the MySQL database in MySQL Workbench with host 127.0.0.1, port 23306, user root, a blank password (leave empty), and default schema airflow.
Stopping Breeze
root@f3619b74c59a:/opt/airflow# ./stop_airflow.sh
root@f3619b74c59a:/opt/airflow# exit
$ breeze stop
- Knowing more about Breeze
$ breeze --help
For more information, see the Breeze documentation.
- Configuring the Airflow database connection
Airflow is configured by default to use an SQLite database. The configuration can be seen on the local machine in ~/airflow/airflow.cfg under sql_alchemy_conn.
Install the dependency required for the MySQL connection in airflow-env on the local machine.
$ pyenv activate airflow-env
$ pip install PyMySQL
Now set sql_alchemy_conn = mysql+pymysql://root:@127.0.0.1:23306/airflow?charset=utf8mb4 in the file ~/airflow/airflow.cfg on the local machine.
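To see how the pieces of this connection string relate to the MySQL Workbench settings (host, port, user, blank password, schema), the string can be assembled and taken apart again with the standard library. This is a sketch only: mysql_conn is a hypothetical helper, and it assumes the user name and password contain no characters that need URL escaping.

```python
from urllib.parse import urlsplit


def mysql_conn(user, password, host, port, schema):
    """Assemble a connection URL in the same shape as the one above."""
    return (
        f"mysql+pymysql://{user}:{password}@{host}:{port}/{schema}"
        "?charset=utf8mb4"
    )


# Values taken from the Breeze MySQL backend described above.
conn = mysql_conn("root", "", "127.0.0.1", 23306, "airflow")
parts = urlsplit(conn)

if __name__ == "__main__":
    print(conn)
    print(parts.scheme, parts.username, parts.hostname, parts.port, parts.path)
```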
- Debugging an example DAG in PyCharm
Add an interpreter to PyCharm with the interpreter path pointing to ~/.pyenv/versions/airflow-env/bin/python, which is the airflow-env virtual environment created with pyenv earlier. To add an interpreter, go to File -> Settings -> Project: airflow -> Python Interpreter.
In the PyCharm IDE, open the airflow project. The files/dags directory of the local machine is mounted by default into the Docker container when Breeze Airflow is started, so any DAG file in this directory is picked up automatically by the scheduler running in the container and can be seen at http://127.0.0.1:28080.
Copy any example DAG from the airflow project's airflow/example_dags directory to files/dags/.
Add a main block at the end of your DAG file to make it runnable. It will run a backfill job:
if __name__ == '__main__':
    from airflow.utils.state import State
    dag.clear(dag_run_state=State.NONE)
    dag.run()
Add AIRFLOW__CORE__EXECUTOR=DebugExecutor to the environment variables of the Run Configuration.
Click on Add configuration.
Add the Script Path and Environment Variable to the new Python configuration.
Now debug an example DAG and view the entries in tables such as dag_run and xcom in MySQL Workbench.
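The variable AIRFLOW__CORE__EXECUTOR follows Airflow's naming scheme for configuration overrides: AIRFLOW__ followed by the section and the key in upper case, separated by double underscores. The sketch below builds such names and sets the executor override from Python; airflow_env_var is a hypothetical helper, and setting the variable this way should only have an effect if it happens before Airflow reads its configuration.

```python
import os


def airflow_env_var(section, key):
    """Name of the environment variable that overrides [section] key."""
    return f"AIRFLOW__{section.upper()}__{key.upper()}"


# Intended to mirror the variable added to the PyCharm Run Configuration.
os.environ.setdefault(airflow_env_var("core", "executor"), "DebugExecutor")

if __name__ == "__main__":
    print(airflow_env_var("core", "executor"))
    print(airflow_env_var("core", "sql_alchemy_conn"))
```

The same scheme applies to sql_alchemy_conn, so the database connection can also be overridden per run configuration instead of editing airflow.cfg.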
Click on the branch symbol in the bottom right corner of PyCharm.
Give the branch a name and check it out.
All tests are inside the ./tests directory.
Running unit tests inside the Breeze environment.
Just run pytest filepath+filename to run the tests.
root@63528318c8b1:/opt/airflow# pytest tests/utils/test_decorators.py
======================================= test session starts =======================================
platform linux -- Python 3.8.6, pytest-6.0.1, py-1.9.0, pluggy-0.13.1 -- /usr/local/bin/python
cachedir: .pytest_cache
rootdir: /opt/airflow, configfile: pytest.ini
plugins: celery-4.4.7, requests-mock-1.8.0, xdist-1.34.0, flaky-3.7.0, rerunfailures-9.0, instafail-0.4.2, forked-1.3.0, timeouts-1.2.1, cov-2.10.0
setup timeout: 0.0s, execution timeout: 0.0s, teardown timeout: 0.0s
collected 3 items
tests/utils/test_decorators.py::TestApplyDefault::test_apply PASSED [ 33%]
tests/utils/test_decorators.py::TestApplyDefault::test_default_args PASSED [ 66%]
tests/utils/test_decorators.py::TestApplyDefault::test_incorrect_default_args PASSED [100%]
======================================== 3 passed in 1.49s ========================================
- Running all the tests with Breeze by specifying the required Python version, backend, and backend version
$ breeze --backend mysql --mysql-version 5.7 --python 3.8 --db-reset --test-type All tests
- Running specific tests in the container using shell scripts. The in-container test scripts are located in the ./scripts/in_container directory.
root@df8927308887:/opt/airflow# ./scripts/in_container/
bin/ run_flake8.sh*
check_environment.sh* run_generate_constraints.sh*
entrypoint_ci.sh* run_init_script.sh*
entrypoint_exec.sh* run_install_and_test_provider_packages.sh*
_in_container_script_init.sh* run_mypy.sh*
prod/ run_prepare_provider_packages.sh*
refresh_pylint_todo.sh* run_prepare_provider_documentation.sh*
run_ci_tests.sh* run_pylint.sh*
run_clear_tmp.sh* run_system_tests.sh*
run_docs_build.sh* run_tmux_welcome.sh*
run_extract_tests.sh* stop_tmux_airflow.sh*
run_fix_ownership.sh* update_quarantined_test_status.py*
root@df8927308887:/opt/airflow# ./scripts/in_container/run_docs_build.sh
Running a specific type of test
- Types of tests (listed by autocompletion after --test-type)
$ breeze --backend mysql --mysql-version 5.7 --python 3.8 --db-reset --test-type
All CLI Heisentests Integration Other Providers API Core Helm MySQL Postgres WWW
- Running a specific type of test
$ breeze --backend mysql --mysql-version 5.7 --python 3.8 --db-reset --test-type Core
Running integration tests for a specific test type
- Types of integration tests (listed by autocompletion after --integration)
$ breeze --backend mysql --mysql-version 5.7 --python 3.8 --db-reset --test-type Core --integration
all kerberos openldap presto redis cassandra mongo pinot rabbitmq
- Running an integration test
$ breeze --backend mysql --mysql-version 5.7 --python 3.8 --db-reset --test-type All --integration mongo
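The combinations above can be generated programmatically, which is handy when scripting several test runs. The following is a sketch, not part of Breeze: breeze_test_args is a hypothetical helper, and the allowed values are simply the test types and integrations listed above, which may differ in other Breeze versions.

```python
# Values copied from the lists above.
TEST_TYPES = {"All", "CLI", "Heisentests", "Integration", "Other", "Providers",
              "API", "Core", "Helm", "MySQL", "Postgres", "WWW"}
INTEGRATIONS = {"all", "kerberos", "openldap", "presto", "redis",
                "cassandra", "mongo", "pinot", "rabbitmq"}


def breeze_test_args(test_type, integration=None, python="3.8",
                     backend="mysql", mysql_version="5.7"):
    """Build the argument list for one Breeze test run."""
    if test_type not in TEST_TYPES:
        raise ValueError(f"unknown test type: {test_type}")
    args = ["breeze", "--backend", backend, "--mysql-version", mysql_version,
            "--python", python, "--db-reset", "--test-type", test_type]
    if integration is not None:
        if integration not in INTEGRATIONS:
            raise ValueError(f"unknown integration: {integration}")
        args += ["--integration", integration]
    return args


if __name__ == "__main__":
    print(" ".join(breeze_test_args("All", integration="mongo")))
```

Validating the values up front catches typos before a long-running Breeze session is started.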
For more information on testing, see TESTING.rst.
Before committing changes to GitHub or raising a pull request, the code needs to be checked against certain quality standards such as spell check, code syntax, code formatting, and compatibility with Apache License requirements. This set of checks is applied when you commit your code.
To avoid burdening the CI infrastructure and to save time, pre-commit hooks can be run locally before committing changes.
- Installing required packages
$ sudo apt install libxml2-utils
- Installing required Python packages
$ pyenv activate airflow-env
$ pip install pre-commit
- Go to your project directory
$ cd ~/PycharmProjects/airflow
- Running pre-commit hooks
$ pre-commit run --all-files
No-tabs checker......................................................Passed
Add license for all SQL files........................................Passed
Add license for all other files......................................Passed
Add license for all rst files........................................Passed
Add license for all JS/CSS/PUML files................................Passed
Add license for all JINJA template files.............................Passed
Add license for all shell files......................................Passed
Add license for all python files.....................................Passed
Add license for all XML files........................................Passed
Add license for all yaml files.......................................Passed
Add license for all md files.........................................Passed
Add license for all mermaid files....................................Passed
Add TOC for md files.................................................Passed
Add TOC for upgrade documentation....................................Passed
Check hooks apply to the repository..................................Passed
black................................................................Passed
Check for merge conflicts............................................Passed
Debug Statements (Python)............................................Passed
Check builtin type constructor use...................................Passed
Detect Private Key...................................................Passed
Fix End of Files.....................................................Passed
...........................................................................
- Running pre-commit for selected files
$ pre-commit run --files airflow/decorators.py tests/utils/test_task_group.py
- Running specific hook for selected files
$ pre-commit run black --files airflow/decorators.py tests/utils/test_task_group.py
black...............................................................Passed
$ pre-commit run flake8 --files airflow/decorators.py tests/utils/test_task_group.py
Run flake8..........................................................Passed
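The three invocation styles shown above (all files, selected files, one hook on selected files) differ only in their arguments. A minimal sketch of that mapping, with pre_commit_args as a hypothetical helper name:

```python
import shlex


def pre_commit_args(files=(), hook=None):
    """Build a `pre-commit run` command for the cases shown above."""
    args = ["pre-commit", "run"]
    if hook:
        # A hook id such as "black" or "flake8" limits the run to that hook.
        args.append(hook)
    if files:
        args += ["--files", *files]
    else:
        args.append("--all-files")
    return args


if __name__ == "__main__":
    files = ["airflow/decorators.py", "tests/utils/test_task_group.py"]
    print(shlex.join(pre_commit_args()))
    print(shlex.join(pre_commit_args(files)))
    print(shlex.join(pre_commit_args(files, hook="black")))
```

The resulting list can be passed to subprocess.run from the project directory if the checks need to be driven from a script.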
- Running specific checks in the container using shell scripts. The scripts are located in the ./scripts/in_container directory.
root@df8927308887:/opt/airflow# ./scripts/in_container/
bin/ run_flake8.sh*
check_environment.sh* run_generate_constraints.sh*
entrypoint_ci.sh* run_init_script.sh*
entrypoint_exec.sh* run_install_and_test_provider_packages.sh*
_in_container_script_init.sh* run_mypy.sh*
prod/ run_prepare_provider_packages.sh*
refresh_pylint_todo.sh* run_prepare_provider_documentation.sh*
run_ci_tests.sh* run_pylint.sh*
run_clear_tmp.sh* run_system_tests.sh*
run_docs_build.sh* run_tmux_welcome.sh*
run_extract_tests.sh* stop_tmux_airflow.sh*
run_fix_ownership.sh* update_quarantined_test_status.py*
root@df8927308887:/opt/airflow# ./scripts/in_container/run_docs_build.sh
- Enabling pre-commit checks. Once installed, pre-commit runs automatically before each commit and stops the commit if a check fails.
$ cd ~/PycharmProjects/airflow
$ pre-commit install
$ git commit -m "Added xyz"
- To disable Pre-commit
$ cd ~/PycharmProjects/airflow
$ pre-commit uninstall
- For more information, see STATIC_CODE_CHECKS.rst
- To know how to contribute to the project visit CONTRIBUTING.rst
Go to your GitHub account, open your fork of the project, and click on Branches.
Click the New pull request button on the branch from which you want to raise a pull request.
Add a title and description as per the contributing guidelines and click Create pull request.
It often takes several days or weeks to discuss and iterate on a PR until it is ready to merge. In the meantime new commits are merged, and you might run into conflicts. You should therefore periodically synchronize master in your fork with the apache/airflow master and rebase your PR on top of it. The following describes how to do it.
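As a rough orientation, one common git sequence for this kind of synchronization is sketched below as a list of commands. It is an assumption-laden illustration, not necessarily the exact procedure of this guide: the remote names upstream (apache/airflow) and origin (your fork), the branch name my-feature, and the helper sync_and_rebase_commands are all hypothetical.

```python
import shlex


def sync_and_rebase_commands(pr_branch, upstream="upstream", fork="origin"):
    """List the git commands for syncing master and rebasing a PR branch."""
    return [
        ["git", "fetch", upstream],
        ["git", "checkout", "master"],
        # Fast-forward only, so local master never diverges from upstream.
        ["git", "merge", "--ff-only", f"{upstream}/master"],
        ["git", "push", fork, "master"],
        ["git", "checkout", pr_branch],
        ["git", "rebase", "master"],
        # A rebase rewrites history, so the PR branch must be force-pushed.
        ["git", "push", "--force-with-lease", fork, pr_branch],
    ]


if __name__ == "__main__":
    for command in sync_and_rebase_commands("my-feature"):
        print("$", shlex.join(command))
```

The sketch only prints the commands; running them one by one makes it easier to stop and resolve conflicts if the rebase reports any.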