Why do we need two types of repositories?

Rishi Yadav
2 min read · Aug 30, 2022

Modern applications in general, and containerized applications in particular, rely on two types of repositories: one for source code and another for binary artifacts. We use them every day. But have we ever asked why these two types, and only these two, exist?

The tug of war between human-readability and machine-readability

While storing binary artifacts in a source code control system (SCCS) is not a crime, it is generally considered a poor architectural decision. Before Maven, it was common practice. I am sure some companies still do it to avoid depending on external sources and to contain the risk of open-source libraries.

Fast-forward to 2022, and we have a separate category of solutions for each need. For source code control, Git-based platforms like GitHub, GitLab, and Bitbucket have become the gold standard. JFrog Artifactory dominates the market for binaries, with Sonatype Nexus being the other leading choice.

What is the primary difference between these two types of repositories? A noticeable difference is the file types and underlying formats each one supports, but I would consider that secondary. The primary difference is the functionality and life cycle of the content these repositories host.

Versioning is about metadata

The real power of versioning is not just the numbers assigned to a major, minor, and revision release. The real power is the provenance embedded in the associated metadata. Each system needs to make this provenance tamper-proof.

Git-based SCCSs store and version content as blobs. Commit and tree objects link those blobs together, and that is how version information is maintained and extended. Tamper-proof provenance enables features like rollback and the measurement of developer productivity.
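To make the blob idea concrete, here is a minimal Python sketch of how Git derives a blob's identifier from its content, assuming the classic SHA-1 object format (newer Git versions can also be configured to use SHA-256). The function name git_blob_id and the sample byte strings are illustrative only, not part of any Git API.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the ID Git's SHA-1 object format assigns to a blob with this content."""
    # Git hashes a small header ("blob <size>\0") followed by the raw bytes.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# Hypothetical file contents: changing a single byte yields a different ID.
original = b"hello world\n"
tampered = b"hello w0rld\n"

print(git_blob_id(original))
print(git_blob_id(tampered))
```

Because trees reference blob IDs and commits reference tree and parent commit IDs, altering any stored content changes every identifier above it in the chain. That is what makes the recorded history hard to tamper with silently.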

Binary repositories have a different type of challenge to meet. To start with, the dependency tree of binary libraries is arbitrarily complex in any organization that is more than a few days old. The first goal is to enable seamless retrieval of packages and binaries. The second goal is to ensure an artifact is what it claims to be and has not been subject to a supply-chain attack. This is especially important for cloud-native applications, where repeated builds of the same source are not guaranteed to produce the same checksum.
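The second goal can be illustrated with a small Python sketch that checks an artifact against a published SHA-256 digest. This is only an assumption about how such a check might look, not how any particular registry implements it. The names sha256_hex and verify_artifact and the sample bytes are hypothetical, and in practice the expected digest would come from trusted registry metadata rather than being computed locally.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a lowercase hex string."""
    return hashlib.sha256(data).hexdigest()


def verify_artifact(data: bytes, expected_sha256: str) -> bool:
    """Return True only if the artifact bytes match the expected digest."""
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(sha256_hex(data), expected_sha256.lower())


# Stand-in bytes for a real package or image layer.
artifact = b"pretend these bytes are a library jar or an image layer"
# Assumption: computed here for the demo; normally read from trusted metadata.
published_digest = sha256_hex(artifact)

# A rebuild that embeds something different (for example, a timestamp)
# no longer matches the published digest.
rebuilt = artifact + b"\nbuilt-at: another time"

print(verify_artifact(artifact, published_digest))
print(verify_artifact(rebuilt, published_digest))
```

The rebuilt bytes fail the check even though they may come from identical source, which is why a checksum alone cannot distinguish a harmless rebuild from a tampered artifact, and why provenance metadata matters.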

Summary

The two types of repositories, Git-based source control and container/package registries, exist to store and retrieve source code and binaries, respectively. There is also an in-between category of configuration artifacts that can be stored in either. It remains to be seen whether these two categories stay separate or evolve into one all-encompassing category.

Originally published at https://www.linkedin.com.
