Visualizing software dependencies

The web interface for Crow, the corpus and repository of writing, depends on a complex amalgam of interdependent bits of code built by thousands of people.

Five minute read.

A few years ago Thomas Thwaites decided to make a toaster. From scratch. Armed with the breadth of extant human knowledge (thanks, internet!), after a bit of petroleum refining here, a bit of iron smelting there, Thwaites would assemble some things into a bigger thing that would be capable of toasting bread. Surely a simple task. It wasn’t. Thwaites’ exploration into the institutional knowledge and global dependencies that go into a seemingly trivial kitchen appliance makes visible the complexities of something we take for granted. Thwaites shows the real cost of a $20 toaster.

Most web applications I build can’t make your breakfast, but like your average toaster, they have a deep system of prerequisites and dependencies. And even though I do software development full-time—even though my job is to understand what’s going on behind the scenes—I take for granted how much my software relies on the work of so many others.

Like practically all software today, Crow’s online corpus and repository of writing leverages many other software packages — bundles of code that provide discrete services such as sending and receiving data over the web, reading from and writing to a database, rendering tables and forms and charts. Caching. Authenticating. Validating. In the case of Crow’s software, there are also corpus-specific libraries for normalizing, tokenizing, lemmatizing, indexing, querying, filtering, excerpting, and highlighting.

Those parts of the Crow code were created (and are actively maintained) by other developers. Not me. The software I build just sits on top of it, connecting the dots to make purpose out of possibility. And part of my development time is simply keeping Crow code updated with changes in those dependencies, changes which can take the form of bug fixes, security patches, and new features.

To help me with those updates, I built a tool to visualize my code’s dependencies: the Composer Dependency Tree Generator.

Package Management

A bit of background: most contemporary software uses package management tools to define and retrieve all of the bits of those external software libraries that are needed. In the case of the PHP programming language, that package management tool is Composer. Package managers also help developers identify when those packages have available updates (though testing the updates before applying them is still the work of the developer). Package requirements are stored in a single file which contains all of the building blocks of the application — its DNA. The Composer Dependency Tree Generator takes that DNA file and renders it as a collapsible tree.

As visualized below, the Crow software backend requires 36 packages in order for me to write the “real” code of the application (click image to interact).

Maybe 36 dependencies for a single project seems reasonable, even manageable. But as you probably already guessed, those packages depend on other packages. Zoom in on one small but critical part of the application: user authentication. Crow depends on Simple OAuth to send user credentials through the internet securely and verify that the user is who they say they are. OAuth depends on other package (for instance, league/oauth_server implements the OAuth 2.0 standard for handling access tokens and refresh tokens), which in turn depend on other libraries like defuse/php-encryption to encrypt data using the OpenSSL protocol, which is, itself, a dependency. The fractal nature of dependencies quickly comes into focus (select to enlarge):

Because of these interrelationships, Crow’s DNA file — not the actual code, mind you, just the list of which building blocks the code needs — is 8,700 lines long. Behold, the fully articulated Crow dependency tree (select to interact):

So: a lot of building blocks. A lot of moving parts. A lot of humans writing code. When you think about it this way, that simple blog you pay $20 a month for was built — and is actively maintained — by thousands of people.

So what?

For me, three main insights come to mind. First, software developers simply cannot build applications without relying on a preponderance of other people’s work. One open source project I built for the Drupal content management system, Layout Builder Restrictions, is in active use on 12,000 websites. Let’s say it’s taken me 100 hours to write and maintain that code. If the developers of each of those 12,000 websites had to build the same functionality as Layout Builder Restrictions, individually, just that tiny bit of functionality would have added weeks of work to the timeline of each of those sites: 1.2 million hours of total developer time.

Second: put that first point the other way: the fact that so many developers’ work is provided open source means that we can quickly build many applications that do many different things: 12,000 websites can be built in a fraction of the time it would take developers working in isolation. (And if you’ll forgive the pretentious meta moment, I would point out that my Composer Dependency Tree Generator, itself, is simply an implementation of the open source visualization tool, D3JS. It took me only hundreds of lines of code to write because someone else had already written thousands.)

And third (the inescapable and scary implication of the previous points): given the amount of code it takes to build a modern web application, and given that code is necessarily built by many different people, a single development team on a single project simply cannot review and vet every line of code. Scanning the Crow dependency tree above, I’ve read far less than 1 percent of its total code. There is no other way to say it: the way we build software — the only way we can build software — carries with it inherent vulnerability.

But there’s good news. I may not have looked at the code personally but many other open source contributors have. And when problems are discovered — bugs or security flaws — developers collaborate to fix the problems and make those fixes available to the rest of us. And that’s why I think it’s so important not just to use open source software but to contribute. Return to that project I built, Layout Builder Restrictions. If I made that software proprietary and charged those 12,000 websites $20 a month, I’d be rich enough to retire. But then: if I had to pay $20 a month for the myriad packages my applications depend on, I’d be filing for bankruptcy.

So: the open source methodology works because of deep interdependencies, and open-sourcing creates a virtuous cycle through participation and proliferation. When I can share my labor back to the community, all things considered, it’s a bargain.