Solving Python dependency hell
If I run a “python dependency hell” query on Google, I get 1,940,000 results. That's a lot! It is clearly a common problem. However, in the past two years or so I have never experienced it when deploying Python services in production. Well, at least not when I have followed the process described below.
Tools
I use three tools to manage environments: pyenv, pip-tools and make. Let's take a closer look at each one and how it contributes to hassle-free environment management.
pyenv: simple Python version management. It is an easy way to install and switch between different Python versions. I use the pyenv-virtualenv plugin as well to create and manage my environments. Both are easily installable with brew:
```sh
brew update
brew install pyenv pyenv-virtualenv
```
After installation we only need to add a couple of lines to our `.zshrc` (if you use bash, these might be slightly different; please check the READMEs):
```sh
eval "$(pyenv init -)"
eval "$(pyenv virtualenv-init -)"
```
pip-tools: this Python package is our saviour. It is a simple tool that takes a list of the packages our project depends on and compiles it into a package list in which every version is pinned and compatible with the others.
We need to install pip-tools in every virtual environment, just like any other package:
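```sh
pip install pip-tools
```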
While writing this post I found a Medium post about pip-compile-multi, an iteration of pip-tools with some nice additional benefits.
GNU Make: the automation tool. It helps you forget all the commands you need to run to set up the environment, run the tests and so on. The good thing is that your `Makefile` is always there to refresh your memory.
Workflow
The above three tools keep me free of dependency worries. Let's walk through the process.
1. Set up the environment
```sh
pyenv install -s 3.11
pyenv virtualenv 3.11 demo-venv
pyenv local demo-venv
```
The above commands will install the Python version that is needed (and will skip this step if it is already available). Given that version, they will then create a virtual environment named `demo-venv`. The last command will create a `.python-version` file, which lets your shell know which Python to run from this directory.
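To check that everything is wired up, you can inspect the file and the active interpreter (the exact patch version depends on what pyenv installed):

```sh
$ cat .python-version
demo-venv
$ python --version
Python 3.11.2
```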
2. Next, we will create the `requirements.in` and `dev-requirements.in` files:
```
# requirements.in
pandas
scikit-learn
```

```
# dev-requirements.in
-r requirements.txt
pytest
```
As you can see, the `dev-requirements.in` file depends on a `requirements.txt` file which is not available yet. To get it, we run `pip-compile requirements.in`. This command produces a `requirements.txt` file:
```
#
# This file is autogenerated by pip-compile with Python 3.11
# by the following command:
#
#    pip-compile requirements.in
#
joblib==1.2.0
    # via scikit-learn
numpy==1.24.2
    # via
    #   pandas
    #   scikit-learn
    #   scipy
pandas==1.5.3
    # via -r requirements.in
python-dateutil==2.8.2
    # via pandas
pytz==2022.7.1
    # via pandas
scikit-learn==1.2.1
    # via -r requirements.in
scipy==1.10.1
    # via scikit-learn
six==1.16.0
    # via python-dateutil
threadpoolctl==3.1.0
    # via scikit-learn
```
And if you then run `pip-compile dev-requirements.in`, you will end up with a file that looks like this:
```
#
# This file is autogenerated by pip-compile with Python 3.11
# by the following command:
#
#    pip-compile dev-requirements.in
#
attrs==22.2.0
    # via pytest
exceptiongroup==1.1.0
    # via pytest
iniconfig==2.0.0
    # via pytest
joblib==1.2.0
    # via
    #   -r requirements.txt
    #   scikit-learn
numpy==1.24.2
    # via
    #   -r requirements.txt
    #   pandas
    #   scikit-learn
    #   scipy
packaging==23.0
    # via pytest
pandas==1.5.3
    # via -r requirements.txt
pluggy==1.0.0
    # via pytest
pytest==7.2.1
    # via -r dev-requirements.in
python-dateutil==2.8.2
    # via
    #   -r requirements.txt
    #   pandas
pytz==2022.7.1
    # via
    #   -r requirements.txt
    #   pandas
scikit-learn==1.2.1
    # via -r requirements.txt
scipy==1.10.1
    # via
    #   -r requirements.txt
    #   scikit-learn
six==1.16.0
    # via
    #   -r requirements.txt
    #   python-dateutil
threadpoolctl==3.1.0
    # via
    #   -r requirements.txt
    #   scikit-learn
tomli==2.0.1
    # via pytest
```
3. Run `pip-sync dev-requirements.txt` to install the updated dependencies in your development environment. Unlike plain `pip install`, `pip-sync` also uninstalls anything that is not listed, so the environment ends up matching the file exactly.
4. I usually add the contents of the `requirements.txt` file to `setup.py` via the `install_requires` argument, as sketched below.
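A minimal sketch of what that can look like; the package name and version are placeholders, and the parsing simply skips pip-compile's comment lines:

```python
# setup.py (sketch): name and version are placeholders.
from pathlib import Path

from setuptools import find_packages, setup

# Read the pinned dependencies produced by pip-compile,
# skipping blank lines and "# via ..." annotation lines.
pinned = [
    line
    for line in Path("requirements.txt").read_text().splitlines()
    if line.strip() and not line.strip().startswith("#")
]

setup(
    name="demo",
    version="0.1.0",
    packages=find_packages(),
    install_requires=pinned,
)
```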
5. Automate the process with `make`; a sketch follows.
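Here is what such a `Makefile` can look like; the target names are my own choice, not a convention (recipes must be indented with tabs):

```makefile
.PHONY: compile sync test

compile:  ## re-pin dependencies after editing the .in files
	pip-compile requirements.in
	pip-compile dev-requirements.in

sync:  ## make the virtualenv match the pinned files exactly
	pip-sync dev-requirements.txt

test: sync  ## run the test suite against a synced environment
	pytest
```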
6. If you are really fired up about automation, add a cookiecutter template.
We create a repository for each unit of functionality we develop (for example, an endpoint that exposes a model), so quite a few files had to be copy-pasted from one repo to another. Inevitably we made mistakes and wasted time debugging them. We found that having an automated way to create a repository with all the main files (including environment management) saved time; see the example below.
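Generating a new repository is then a single command; the template URL here is hypothetical, substitute your own:

```sh
# The template repository below is a placeholder; point this at your own template.
cookiecutter gh:your-org/python-service-template
```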
And that is it! Every time you update your packages, pip-compile will resolve the dependencies for you, and if there are conflicting package versions you will know before running `pip install` in production.
Note: there is also a pre-commit hook available to make sure that the packages are always compiled.
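A minimal `.pre-commit-config.yaml` using the hook that pip-tools ships; pin `rev` to whichever pip-tools release you use (by default the hook compiles `requirements.in`):

```yaml
repos:
  - repo: https://github.com/jazzband/pip-tools
    rev: 6.12.2  # pin to the pip-tools release you use
    hooks:
      - id: pip-compile
```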
Conclusion
After adopting the tools described above, I have not had a problem with mismatched dependencies. Well, once: when I tried to install an additional package while building a container, after all the resolved dependencies had already been installed. It was a good reminder to keep my dependencies in one place!
How do you deal with Python dependencies? Have you found other tools or processes helpful?