Having worked with and built various configuration methods, from YAML-based libraries to simple text files, I've noticed that argparse from Python's standard library is often enough for many projects. The main pain point with argparse, in my opinion, is how it scales to multiple files. We don't want a monster configuration file that describes every parameter; that approach is very difficult to maintain. Instead, it would be nicer to have each module or file keep track of its own set of arguments. Another requirement is to avoid duplicate arguments: if the data module declares a batch size argument, we should be able to use it from both the training script and the data module itself. Finally, I'm a big fan of small tools and libraries that you can fully understand and easily extend based on your project's requirements.

Let's take a simple two-file project consisting of a data.py that generates / loads some data and a train.py that utilises the data module. For the full code, including the library itself, have a look at the full GitHub Gist.

"""Some data loading module."""
from typing import List
import random

import configlib
from configlib import config as C

# Configuration arguments
parser = configlib.add_parser("Dataset config")
parser.add_argument("--data_length", default=4, type=int, help="Length of random list.")


def load_data() -> List[int]:
    """Load some random data."""
    data = list(range(C["data_length"]))
    random.shuffle(data)
    return data

We will look at configlib shortly, but first note that the data module both declares its arguments and uses them. The way we declare arguments is identical to vanilla argparse, because it is argparse! We are just exposing it in a more manageable way so that we can keep track of all the arguments declared across files. The train.py script can now utilise the data module and declare its own arguments on top:

"""Some training script that uses data module."""
import configlib
from configlib import config as C

import data


# Configuration arguments
parser = configlib.add_parser("Train config")
parser.add_argument("--debug", action="store_true", help="Enable debug mode.")


def train():
    """Main training function."""
    if C["debug"]:
        print("Debugging mode enabled.")
    print("Example dataset:")
    print(data.load_data())


if __name__ == "__main__":
    configlib.parse(save_fname="last_arguments.txt")
    print("Running with configuration:")
    configlib.print_config()
    train()

Only in the main script, in the bottom block of code, do we ask configlib to parse the arguments and optionally save them to a file called last_arguments.txt for later use. Once we call parse, all the arguments passed to our program are parsed and exposed through the global configlib.config variable. Running python3 train.py -h, we can see all the arguments collected across the files:

python3 train.py -h
usage: train.py [-h] [--data_length DATA_LENGTH] [--debug]

Configuration library for experiments.

optional arguments:
    -h, --help            show this help message and exit

Dataset config:

    --data_length DATA_LENGTH
                        Length of random list.

Train config:

    --debug               Enable debug mode.

We can run our training script by passing arguments from the command line or from the saved arguments file. In most of my projects it is very important to be able to rerun a specific configuration, and we can easily handle that with a text file that describes all the arguments; argparse already provides a mechanism for parsing such a file:

python3 train.py --data_length 10
python3 train.py @last_arguments.txt
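To see why the @file trick works, here is a minimal, self-contained sketch (independent of configlib, with a temporary file standing in for last_arguments.txt): because parse writes sys.argv[1:] joined by newlines, the saved file contains one token per line, which is exactly the format argparse's fromfile_prefix_chars expects by default.

```python
import argparse
import os
import tempfile

# Stand-alone parser mirroring the arguments declared across data.py and train.py.
parser = argparse.ArgumentParser(fromfile_prefix_chars="@")
parser.add_argument("--data_length", default=4, type=int)
parser.add_argument("--debug", action="store_true")

# Simulate a saved arguments file: one token per line, as parse() writes it.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as fout:
    fout.write("--data_length\n10\n--debug")
    fname = fout.name

# Passing "@<file>" makes argparse read the arguments from that file.
args = parser.parse_args(["@" + fname])
print(args.data_length, args.debug)  # 10 True
os.remove(fname)
```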

And without further ado, here is the small configlib that puts this use of argparse together:

"""Configuration library for experiments."""
from typing import Dict, Any
import logging
import pprint
import sys
import argparse

# Logging for config library
logger = logging.getLogger(__name__)

# Our global parser that we will collect arguments into
parser = argparse.ArgumentParser(description=__doc__, fromfile_prefix_chars="@")

# Global configuration dictionary that will contain parsed arguments
# It is also this variable that modules use to access parsed arguments
config: Dict[str, Any] = {}


def add_parser(title: str, description: str = ""):
    """Create a new context for arguments and return a handle."""
    return parser.add_argument_group(title, description)


def parse(save_fname: str = "") -> Dict[str, Any]:
    """Parse given arguments."""
    config.update(vars(parser.parse_args()))
    logger.info("Parsed %i arguments.", len(config))
    # Optionally save passed arguments
    if save_fname:
        with open(save_fname, "w") as fout:
            fout.write("\n".join(sys.argv[1:]))
        logger.info("Saved arguments to %s.", save_fname)
    return config


def print_config():
    """Print the current config to stdout."""
    pprint.pprint(config)

As you can see, it is a very small, extendable library that maintains the global config variable and handles argparse groups through the add_argument_group method. Essentially, it acts as glue between different files and provides a single, uniform way of accessing configuration in our projects. We can see all of our configuration in one place, save it, and load it later. My opinions regarding this library:
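To illustrate how the pattern scales, here is a sketch of how a hypothetical third file, say model.py, would slot in. The stand-ins below mirror configlib's globals so the example is self-contained and runnable on its own.

```python
import argparse
from typing import Any, Dict

# Stand-ins for configlib's module-level globals.
parser = argparse.ArgumentParser(description="Configuration library for experiments.",
                                 fromfile_prefix_chars="@")
config: Dict[str, Any] = {}

def add_parser(title: str, description: str = ""):
    """Create a new argument group, just like configlib.add_parser."""
    return parser.add_argument_group(title, description)

# A hypothetical model.py would declare its own group,
# exactly as data.py and train.py do:
model_parser = add_parser("Model config")
model_parser.add_argument("--hidden_size", default=32, type=int,
                          help="Size of the hidden layer.")

# Parsing collects the new group's arguments into the shared dictionary.
config.update(vars(parser.parse_args(["--hidden_size", "64"])))
print(config["hidden_size"])  # 64
```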

  • The library is small and easy to understand. It is contained in a single file and wouldn’t scare new users away from adopting it. Installing it is as simple as taking the idea and implementing it in an existing project.
  • It does the job and nothing more. Do we have a simple argparse-based multi-file configuration handler in Python? Yes. If users need more, they can extend it and implement their own requirements, such as saving to JSON (which would be just json.dumps(config) inside the parsing function).
  • It is extendable, as mentioned above, with no external dependencies, so we can take it off the shelf and add our own requirements. What I’ve noticed over the years is that there is no single library that fits all my requirements. We are better off building small tools that share some common ground but are specialised for our projects. Yes, I do end up re-implementing similar things, but I get the most out of the libraries I implement for each project, rather than trying to fit the project to a specific library.
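As one concrete example of such an extension, here is a hedged sketch of the JSON-saving idea mentioned above. The `config` dictionary and both helper names are hypothetical stand-ins for configlib's global and a pair of functions one might add to it.

```python
import json
import os
import tempfile

# Stand-in for configlib's global config dictionary after parsing.
config = {"data_length": 10, "debug": True}

def save_config_json(fname: str) -> None:
    """Serialise the current configuration to a JSON file."""
    with open(fname, "w") as fout:
        json.dump(config, fout, indent=2)

def load_config_json(fname: str) -> dict:
    """Restore a previously saved configuration."""
    with open(fname) as fin:
        return json.load(fin)

# Round-trip the configuration through a JSON file.
fname = os.path.join(tempfile.gettempdir(), "last_arguments.json")
save_config_json(fname)
restored = load_config_json(fname)
os.remove(fname)
```

A JSON file is a little friendlier to tooling than the raw argv dump, at the cost of no longer being directly loadable via the @file mechanism.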