Last month GitHub announced its Arctic Code Vault – a data repository deep inside an old mine in the remote archipelago of Svalbard.
Have you ever heard of the Arctic World Archive (AWA, for short)? Just a mile or so away from the world-famous Global Seed Vault – an archived collection of seeds from around the world – the AWA is “a very-long-term archival facility” deep within an Arctic mountain. It sits 250 meters deep, to be exact, in a decommissioned coal mine in the Svalbard archipelago, Norway. Not only is it above the Arctic Circle; by GitHub’s description, it is closer to the North Pole than to the Arctic Circle.
Now, GitHub wants to use it to store code from millions of repositories, captured in a snapshot on February 2, 2020. All active public GitHub repos will be archived, as will “significant” dormant ones. An advisory panel will select the dormant ones according to their importance, based on signals such as stars and dependencies.
As the company explains in a recent announcement:
“The snapshot will consist of the HEAD of the default branch of each repository, minus any binaries larger than 100KB in size. Each repository will be packaged as a single TAR file. For greater data density and integrity, most of the data will be stored QR-encoded. A human-readable index and guide will itemize the location of each repository and explain how to recover the data.”
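To make the packaging rule in that quote concrete, here is a minimal Python sketch of how a checked-out repository could be turned into a single TAR file while leaving out large binaries. It is an illustration only, not GitHub's actual tooling: the function names, the reading of “100KB” as 100 × 1024 bytes, and the NUL-byte test for “binary” are all assumptions made for this example.

```python
import os
import tarfile

SIZE_LIMIT = 100 * 1024  # assumption: "100KB" read as 100 * 1024 bytes


def looks_binary(path, probe_bytes=8192):
    """Heuristic (an assumption): a NUL byte near the start means binary."""
    with open(path, "rb") as fh:
        return b"\0" in fh.read(probe_bytes)


def snapshot_to_tar(checkout_dir, tar_path):
    """Pack a checked-out tree into one TAR file.

    Skips the .git directory and any binary file larger than SIZE_LIMIT.
    Returns the relative paths of the files that were left out.
    """
    skipped = []
    with tarfile.open(tar_path, "w") as tar:
        for root, dirs, files in os.walk(checkout_dir):
            # Prune .git in place so os.walk never descends into it.
            dirs[:] = sorted(d for d in dirs if d != ".git")
            for name in sorted(files):
                full = os.path.join(root, name)
                if not os.path.isfile(full):
                    continue  # e.g. a broken symlink
                rel = os.path.relpath(full, checkout_dir)
                if os.path.getsize(full) > SIZE_LIMIT and looks_binary(full):
                    skipped.append(rel)
                    continue
                tar.add(full, arcname=rel)
    return skipped
```

Calling `snapshot_to_tar("my-repo", "my-repo.tar")` on a working copy of the default branch (both paths are hypothetical) would produce one archive per repository, in the spirit of the announcement. The QR encoding and the human-readable index would be separate later steps and are not covered here.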
Every few years, the company will create further backups. The plan is to store the code for at least 1,000 years in the cold. According to the AWA, it already preserves significant historical and cultural data from countries including Italy, Brazil, Norway, and the Vatican, among others. In fact, the entire collection of Edvard Munch’s artworks is stored there in digital format.
If you’re concerned about global warming affecting the storage, you probably shouldn’t be. Yes, even places like Svalbard are affected by climate change, but it is likely to affect only the outer few meters of permafrost, far above an archive that sits 250 meters deep.
The project is a partnership between GitHub, the Long Now Foundation, the Internet Archive, Software Heritage, Arctic World Archive, Microsoft Research, the Bodleian Library, and Stanford Libraries – an effort to “ensure the long-term preservation of the world’s open-source software.”