Basic Setup
The documentation for setup.py is not very clear, and I think the quickest way to learn it is through an example, so here is a setup.py with the minimum things you need to configure:
If your folder structure is the following (see "Why do we choose the “src” structure?" below for why this structure is preferred):
We can use the following setup.py:
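from setuptools import setup, find_packages

# Placeholder metadata (hypothetical values): replace with your own project's.
PACKAGE_NAME = 'mypackage'
AUTHOR = 'Your Name'
PACKAGE_VERSION = '0.1.0'
DESCRIPTION = 'A short description of mypackage'

setup(name=PACKAGE_NAME,
      maintainer=AUTHOR,
      version=PACKAGE_VERSION,
      install_requires=['pandas', 'numpy'],
      package_dir={'': 'src'},                  # the root package is the src/ folder
      packages=find_packages('src'),            # every package found under src/
      package_data={PACKAGE_NAME: ['data/*']},  # files in src/<PACKAGE_NAME>/data/
      include_package_data=True,
      description=DESCRIPTION,
      python_requires='>=3.6')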
The name, maintainer, and version are self-explanatory.
install_requires
specifies the Python packages your package depends on, so that when your package is installed, any of them that are not already on the user’s system are downloaded and installed as well. You can also specify version requirements for each dependency here (for example, 'numpy>=1.17').
packages
specifies all packages to include[1]. They should be given as a list of dotted package names, such as ['foo', 'foo.bar', 'foo.tar', 'foo.bar.tool', ...], where foo.bar, foo.tar and foo.bar.tool are sub-packages. Every sub-package needs to be listed explicitly. If you put all packages under one folder, you can use the find_packages function to find them automatically: find_packages('src') returns the dotted names of all packages (directories containing an __init__.py) under src.
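To make the discovery rule concrete, here is a minimal sketch, assuming a hypothetical layout in which mypackage stands in for PACKAGE_NAME. find_packages_sketch is a simplified, stdlib-only approximation of what setuptools' find_packages does (a directory is a package if it contains __init__.py, and only package directories are searched further); it is not the library's actual implementation.

```python
import os
import tempfile
from pathlib import Path


def find_packages_sketch(where):
    """Simplified approximation of setuptools' find_packages(where).

    A directory counts as a package if it contains __init__.py and its name
    has no dot. Only package directories are descended into, so a folder
    without __init__.py hides everything beneath it.
    """
    found = []
    for root, dirs, _files in os.walk(where):
        candidates = list(dirs)
        dirs[:] = []  # os.walk will only descend into what we add back below
        for d in candidates:
            full = os.path.join(root, d)
            if "." in d or not os.path.isfile(os.path.join(full, "__init__.py")):
                continue
            rel = os.path.relpath(full, where)
            found.append(rel.replace(os.sep, "."))
            dirs.append(d)
    return sorted(found)


def build_demo_layout(root):
    """Create a hypothetical src layout; 'mypackage' stands in for PACKAGE_NAME."""
    for rel in [
        "setup.py",
        "src/mypackage/__init__.py",
        "src/mypackage/bar/__init__.py",
        "src/mypackage/bar/tool/__init__.py",
        "src/mypackage/tar/__init__.py",
        "src/mypackage/data/sample.csv",  # data folder: no __init__.py
        "tests/test_mypackage.py",
    ]:
        path = Path(root, rel)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        build_demo_layout(tmp)
        print(find_packages_sketch(os.path.join(tmp, "src")))
        # ['mypackage', 'mypackage.bar', 'mypackage.bar.tool', 'mypackage.tar']
```

With setuptools installed, find_packages('src') on the same tree should report the same names, possibly in a different order. src/mypackage/data is not returned because it has no __init__.py; its files have to be added through package_data instead.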
package_dir
specifies where your packages reside. The keys of this dictionary are package names, and the empty name '' stands for the root package. In the above example, package_dir={'': 'src'} makes the src folder the root package, so your packages reside directly below src.
If we have package_dir={'foo': 'lib'}, then the package foo is the folder lib. If we then have packages = ['foo', 'foo.bar'], Distutils will look for lib/__init__.py and lib/bar/__init__.py for the foo and foo.bar packages respectively.
package_data
specifies data files to be included for each package. The keys are package names and the values are lists of paths to the data files (glob patterns such as 'data/*' are allowed), relative to the directory containing the package (the package_dir mapping is used to locate that directory if appropriate).
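The two lookups above can be modelled in a few lines. This is a simplified sketch, not setuptools' code: package_dir_for follows the package_dir rule (try the full dotted name, then shorter prefixes, with '' as the fallback for the root package), and resolve_package_data expands each package_data pattern relative to that directory, keeping only files. The layout and the name mypackage are hypothetical, and features such as include_package_data and MANIFEST.in are ignored.

```python
import glob
import os
import posixpath
import tempfile
from pathlib import Path


def package_dir_for(package, package_dir):
    """Return the source directory for a dotted package name.

    Tries the full name, then ever shorter dotted prefixes, in package_dir;
    components stripped off are appended to the matched directory. The ''
    key (root package) is the final fallback.
    """
    parts = package.split(".")
    tail = []
    while parts:
        key = ".".join(parts)
        if key in package_dir:
            return posixpath.join(package_dir[key], *tail)
        tail.insert(0, parts.pop())
    return posixpath.join(package_dir.get("", ""), *tail)


def resolve_package_data(project_root, package_data, package_dir):
    """Expand package_data patterns relative to each package's directory."""
    result = {}
    for package, patterns in package_data.items():
        pkg_path = Path(project_root, package_dir_for(package, package_dir))
        files = []
        for pattern in patterns:
            for hit in glob.glob(str(pkg_path / pattern)):
                if os.path.isfile(hit):  # directories matched by '*' are skipped
                    files.append(Path(hit).relative_to(pkg_path).as_posix())
        result[package] = sorted(files)
    return result


def build_demo_layout(root):
    """Hypothetical layout; 'mypackage' stands in for PACKAGE_NAME."""
    for rel in [
        "src/mypackage/__init__.py",
        "src/mypackage/data/a.csv",
        "src/mypackage/data/b.json",
        "src/mypackage/data/raw/c.csv",  # nested one level deeper
    ]:
        path = Path(root, rel)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch()


if __name__ == "__main__":
    print(package_dir_for("foo.bar", {"foo": "lib"}))  # lib/bar
    print(package_dir_for("mypackage", {"": "src"}))   # src/mypackage
    with tempfile.TemporaryDirectory() as tmp:
        build_demo_layout(tmp)
        print(resolve_package_data(tmp, {"mypackage": ["data/*"]}, {"": "src"}))
        # {'mypackage': ['data/a.csv', 'data/b.json']}
```

In this model, 'data/*' does not pick up data/raw/c.csv, because '*' only matches one level; a nested folder would need its own pattern such as 'data/raw/*'.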
Why do we choose the “src” structure?
We could’ve put our package directly under the project folder:
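But then, when we run our tests from the project folder, that folder is usually on sys.path (for example, with python -m pytest), so importing PACKAGE_NAME picks up the local package with the same name instead of the one installed on our system. The tests would then not exercise what users actually install.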
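The shadowing effect can be reproduced without installing anything. The sketch below, with hypothetical names, models sys.path as the project folder followed by a stand-in for site-packages (roughly the situation when tests are run from the project root) and uses importlib.machinery.PathFinder.find_spec to ask which copy of the package an import would load.

```python
import importlib
import importlib.machinery
import tempfile
from pathlib import Path


def make_package(parent, name, marker):
    """Create parent/name/__init__.py whose contents record which copy it is."""
    pkg = Path(parent, name)
    pkg.mkdir(parents=True, exist_ok=True)
    (pkg / "__init__.py").write_text(f"WHICH = {marker!r}\n")
    return pkg


def which_copy(name, search_path):
    """Report which copy `import name` would load when searching search_path.

    PathFinder scans the directories in order, like import does with sys.path,
    but only locates the module; nothing is executed or added to sys.modules.
    """
    importlib.invalidate_caches()
    spec = importlib.machinery.PathFinder.find_spec(name, search_path)
    if spec is None:
        return None
    return Path(spec.origin).read_text().strip()


def compare_layouts(tmp):
    name = "mypackage_demo"  # hypothetical stand-in for PACKAGE_NAME
    site_packages = Path(tmp, "site-packages")  # stand-in for the installed copy
    make_package(site_packages, name, "installed")

    flat_root = Path(tmp, "flat_project")  # package directly under the project folder
    make_package(flat_root, name, "local")

    src_root = Path(tmp, "src_project")  # package under src/ (the recommended layout)
    make_package(src_root / "src", name, "local")

    # Model sys.path as [project folder, site-packages], as when tests run from the root.
    flat = which_copy(name, [str(flat_root), str(site_packages)])
    src = which_copy(name, [str(src_root), str(site_packages)])
    return flat, src


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(compare_layouts(tmp))
        # ("WHICH = 'local'", "WHICH = 'installed'")
```

For the flat layout the local copy wins; for the src layout the project folder only contains src, so the search falls through to the installed copy.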
We could’ve put the contents of our package directly under the src folder instead of having another folder with our package name:
This way, when we import PACKAGE_NAME, it won’t import the local package, since its files are under a folder named src. However, if we had done so, we would not only need to specify package_dir={PACKAGE_NAME: 'src'}, but, because the package name itself wouldn’t be picked up by the find_packages function (find_packages('src') only returns the sub-packages, named relative to src, e.g. 'bar' rather than 'PACKAGE_NAME.bar'), we would also need to list all packages in packages manually.
We could’ve put the tests inside the package:
But then we would be distributing the tests as part of the package, which ships files that users do not need.
In comparison, the structure described at the very beginning avoids all of these problems, which is why I consider it the best solution I’ve found so far.