The Ultimate Debian Database, and why I co-host a public mirror of it

In the summer of 2013, I worked with a lovely fellow named David Lu to do some research on how Debian welcomes new volunteers. To support him, I set up a public mirror of a database he needed called the Ultimate Debian Database. I’m still hosting that public database because it has come into wide use in the Debian world.

I want to write about its history and how people use it, and to thank the people who have hosted it in the past and will host it in the future.

The point of the Ultimate Debian Database (UDD) is to capture information about activity within Debian in a SQL database, mainly to analyze where to apply volunteer energy to improve Debian’s quality. Volunteers participate in Debian in a few ways, and very few of those ways actually involve a SQL database. They file bugs in a bug tracking system that operates by email, not via web forms. They upload new versions of software (packages) by FTP. UDD gathers data from those systems via a series of scripts and stores the data in a Postgres database. You can read more on its page on the Debian wiki or in the 2009 DebConf presentation about it.

The database itself is only accessible from Debian servers. Since David wasn’t a Debian project member, this presented a roadblock to our research. I set up a system that runs a Postgres database with public read access based on periodic snapshots that UDD publishes. Also, honestly, I just love hosting public mirrors of other people’s data. I remember doing that as far back as 2003.

The first project to use the public UDD mirror was the Debian reproducible builds project. Lunar and I kicked off the Debian binary reproducibility project by writing a Debian wiki page and having a discussion at DebConf in the summer of 2013. I stepped away due to lack of time that year, and the project rolled forward with great force. The public status pages showing how much of Debian could be reproducibly built relied on the public UDD mirror. Mattia Rizzolo got involved in reproducible builds work in 2014, and he became a co-maintainer of the UDD mirror, too.

Over the years, the UDD mirror has come into wide use. Various Debian quality assurance tools rely on it, such as lintian, janitor, and a tool that lists Rust security advisories that affect Rust packages in Debian. It also made its way into at least one person’s master’s thesis! You can find quite a few other places it’s used in a GitHub code search or Debian code search.

Thanks to XVM for hosting it all these years! They’re a group of volunteers at MIT who provide free virtual machines to the MIT community. I had to ask them for extra RAM and CPU, and they were able to help out.

Today, we’re moving it to Fosshost to take advantage of faster CPUs and more disk space, as well as the ability to share console access with Mattia. I’m relieved to stop being a single point of failure for it. I’m grateful to Fosshost for hosting, to Mattia for co-maintaining it, and to all the people who have used the mirror to improve Debian over the years!