Yes, AI is all the hype for now. But behind the scenes, a web of individually simple yet tightly interconnected code is fuelling the push. This “spaghetti code” phenomenon is increasingly prevalent in large AI libraries like LangChain and llama_index, raising concerns about maintainability and the long-term sustainability of development.

Rise of the Spaghetti Code

The term “spaghetti code” is not new. It has been around since the early days of programming. It refers to code that is tangled and difficult to understand. The code is often poorly structured, with many dependencies and interconnections between different parts of the codebase.

Every iteration brings a new abstraction that neatly fits the current landscape of models, use cases, and data. But relatively soon a new model with new parameters comes in, and the whole thing starts to disintegrate.

We end up in this situation mainly because of the pressure to deliver results quickly and the rapid pace of development in the field. You cannot afford to spend too much time on design when the competition is fierce. I think it comes down to these four main factors:

  • Rapid Development: The AI field is evolving at breakneck speed, leading to code that’s often hacked together rather than designed. The term hack is used here in a negative sense, meaning a quick and dirty solution that gets the job done but is not sustainable in the long run.
  • Large Libraries: Libraries like LangChain are powerful, but their size and the interconnected nature of their components can lead to convoluted code structures. It is effectively a massive web of interconnected code that can be difficult to navigate and understand.
  • Python Language: It works and it is flexible. But that flexibility also pours gasoline on the fire: create a class for this, another class for that, then a function to glue them together, all with no type checking.
  • Gluing Code: A lot of AI development involves “gluing” together different pre-existing components. We basically call something, get its output, and pass it to the next component, which requires a slightly different input. A sketch of what this looks like follows the list.
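
As a hypothetical illustration of those last two points (every name and data shape here is invented), this is roughly what untyped glue code looks like: each component wants a slightly different shape, so ad-hoc conversion logic piles up in between.

def call_retriever(query):
    # Pretend retriever: returns a list of dicts with a "text" field.
    return [{"text": "doc one"}, {"text": "doc two"}]


def call_model(prompt, docs):
    # Pretend model: wants the documents as one newline-joined string.
    context = "\n".join(d["text"] for d in docs)
    return {"completion": f"Answer based on: {context}"}


def call_ranker(results):
    # Pretend ranker: wants a plain list of strings, not dicts.
    return sorted(results)


# The "glue": massage each output into whatever the next component happens to want.
docs = call_retriever("what is spaghetti code?")
answer = call_model("what is spaghetti code?", docs)
ranked = call_ranker([answer["completion"]])
print(ranked)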

Dawn of the Spaghetti Code

With several libraries and frameworks now gathering a huge user base, this problem is not going to disappear anytime soon. The ironic situation is:

  1. One starts with a wrapper library to build something. It is marketed as a way to simplify the process, a one-stop solution.
  2. One pip installs and sudo apt installs a bunch of dependencies.
  3. One ends up with a wrapper library to manage the wrapper library.

Consider this example from the human chat model module in LangChain:

from typing import TYPE_CHECKING, Any

from langchain._api import create_importer

if TYPE_CHECKING:
    from langchain_community.chat_models.human import HumanInputChatModel

# Create a way to dynamically look up deprecated imports.
# Used to consolidate logic for raising deprecation warnings and
# handling optional imports.
DEPRECATED_LOOKUP = {"HumanInputChatModel": "langchain_community.chat_models.human"}

_import_attribute = create_importer(__package__, deprecated_lookups=DEPRECATED_LOOKUP)


def __getattr__(name: str) -> Any:
    """Look up attributes dynamically."""
    return _import_attribute(name)


__all__ = [
    "HumanInputChatModel",
]

This one module is importing, exporting, deprecating, and dynamically looking up imports. It does not give the impression of code working with the programming language, but rather against it.
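
For contrast, here is a rough sketch of the plain alternative, assuming the goal is only to keep the old import path alive while pointing users at the new one. It gives up the lazy, on-access import that the dynamic lookup provides, which is exactly the complexity the original module is paying for:

import warnings

from langchain_community.chat_models.human import HumanInputChatModel

warnings.warn(
    "Import HumanInputChatModel from langchain_community.chat_models.human instead.",
    DeprecationWarning,
    stacklevel=2,
)

__all__ = ["HumanInputChatModel"]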

War of the Spaghetti Code

So how can we help prevent the rise of spaghetti code in AI development? Here are a few strategies:

  • Avoid wrapper libraries: Try to get straight to the core of the problem and use the tools and libraries built for your model, use case, or data. If you are calling a model, training a network, or doing some data processing, use the core libraries directly.
  • Build small tools: It is better to build smaller, more modular tools that work well for a specific model or use case than to try to cover everything in one wrapper library.
  • Only you should glue: If you are gluing together different components, make sure you are the one doing it, not a wrapper on top of a wrapper. This way you can keep the code well-structured and maintainable; see the sketch after this list.
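
As a hypothetical sketch of those last two points (every name here is invented), owning the glue can be as small as one typed adapter between the pieces you actually use, while the retriever and the model are still called through their core libraries:

from dataclasses import dataclass


@dataclass
class Document:
    """The one shared shape we own between retrieval and generation."""

    text: str
    score: float = 0.0


def to_prompt(question: str, docs: list[Document]) -> str:
    """Our only piece of glue: turn retrieved documents into a model prompt."""
    context = "\n\n".join(doc.text for doc in docs)
    return f"Context:\n{context}\n\nQuestion: {question}"


# Usage: call the retriever and the model libraries directly,
# and keep only this small, typed adapter in between.
docs = [Document("Spaghetti code is tangled, hard-to-follow code.", score=0.9)]
print(to_prompt("What is spaghetti code?", docs))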

Would you not just duplicate work across the community? Maybe, but most things in the AI landscape are slightly different: models take different parameters, accept different inputs, and produce different outputs. Gluing together small components is not that much work in general and can save you a lot of headache in the long run.