DIY Archiving – A Primer
This article was originally written for and published at The New Tech on July 8th, 2012. It has been posted here for safekeeping.
So maybe you want to be an archivist but don’t know where to start. I don’t claim to be an expert, but I have learned a few things going down this path that I can share. Let’s break this up into two main sections: digital media and physical media. No matter what you are archiving, you should first pick out what you’re going to save. You don’t have to put too much consideration into this step. You can be passionate about a subject you wish to preserve, be helping a group of people, or be doing it for the hell of it. Archiving is, at its base, both a way to ensure survival and a way to fill a hole left by what has been forgotten.
In the digital world, you’re going to be focused on downloading, storing, and uploading. Let’s start with downloading, as you’re going to want to get your hands on some media. I started on a Windows computer, downloading directly with a browser. You can get your hands on a lot simply with the DownThemAll plugin and plenty of free time. For streaming content I turn to the video downloader plugin for Firefox or Replay Media Catcher. These let you pull videos from sites like Vimeo or YouTube. Sometimes things aren’t as easy as clicking a file to download it. You may have to use download sites, FTP servers, BitTorrent, or something less standard. You never know. Get to know how to use JDownloader, an FTP client like FileZilla, and a BitTorrent client like uTorrent. You might find a whole slew of content on some obscure site, and you need to have the tools to dig it out. Moving on to more Unix-like systems, learn all you can about wget, curl, grep, and bash scripting. I’m not going to cover how to use all of these tools, but with a little practice there are few things you can’t get when you use them in tandem. You would be surprised how simple it is to whip up automated processes that do everything you want with just a few well-chosen commands. Also, be on the lookout for more specialized tools. For example, I found a fantastic tool for downloading YouTube videos called youtube-dl. If there is something out there to be downloaded, there is usually a tool for the job.
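To give a feel for how little code this kind of automation takes, here is a minimal sketch in Python that does roughly what a DownThemAll-style link filter does: it reads a page’s HTML and pulls out every link pointing at a file type you care about. The function name `find_media_links` and the default extension list are my own placeholders for illustration, not part of any tool mentioned above, and a real site may need more handling than this.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def find_media_links(html, base_url, extensions=(".mp3", ".pdf", ".zip")):
    """Return absolute URLs whose path ends in one of the wanted extensions.

    The extension list is only an example; pass whatever you are hunting for.
    """
    collector = LinkCollector()
    collector.feed(html)
    wanted = []
    for href in collector.hrefs:
        url = urljoin(base_url, href)          # resolve relative links
        path = urlparse(url).path.lower()      # ignore query strings and case
        if path.endswith(tuple(extensions)) and url not in wanted:
            wanted.append(url)                 # keep order, drop duplicates
    return wanted
```

If you write the resulting URLs to a text file, one per line, you can hand that file to `wget -i` and let it do the actual downloading.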
When you get the data, you’re going to need somewhere to hold it. I was originally downloading everything locally, and I still like to keep local copies of the data I retrieve. Always keep your data on at least two drives, and preferably buy your drives in pairs so you can easily stick with this rule. You can never have too much storage. I currently have 15TB locally just for storing archived media. When it comes to other storage media, I’m not easily swayed. Data tape is expensive to adopt, and cloud storage lacks stability. In earlier days, I had only an 80GB drive, so I would back up to DVD+Rs. A lot of people will tell you that your burned media will go bad after about 6-10 years, but I have yet to have a disc become unreadable. I will say that I’ve had a lot of luck with Verbatim discs. I turned many discs from other brands into coasters when burning, but have only ever had one Verbatim fail on me. Stick with what your budget allows. The price of hard drives is only going down, but if you’re a kid on a budget, a spindle of DVDs can help in a pinch. Also, keep an eye on solid state drives. While I have yet to adopt them, they are the new thing in mass storage, though the price is still a bit steep.
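The two-drive rule only helps if the second copy actually matches the first. As a rough sketch of one way to check that, the Python below walks one directory, hashes every file, and reports anything that is missing or different in the mirror. The names `sha256_of` and `compare_copies` are placeholders I made up for this example, and hashing with SHA-256 is my own choice here rather than anything the rule requires.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large captures do not have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_copies(primary_dir, mirror_dir):
    """Return (relative path, problem) pairs for files the mirror gets wrong.

    A file is "missing" if the mirror has no copy, and a "mismatch" if the
    copy exists but its contents hash differently.
    """
    primary = Path(primary_dir)
    mirror = Path(mirror_dir)
    problems = []
    for source in sorted(primary.rglob("*")):
        if not source.is_file():
            continue
        relative = source.relative_to(primary)
        copy = mirror / relative
        if not copy.is_file():
            problems.append((str(relative), "missing"))
        elif sha256_of(source) != sha256_of(copy):
            problems.append((str(relative), "mismatch"))
    return problems
```

Running something like this after every big copy, and again every so often, is a cheap way to catch a bad copy before the good one is gone.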
So now you want to share your data. You have many options to consider. Most recently I’ve been using The Internet Archive to host files which should be saved. Depending on your content, this may not be the best option for you. A few of the methods I mentioned earlier for downloading can also be good options for getting data out quickly. For example, setting up a torrent for your files can be done in minutes, and fast FTP/HTTP servers can be rented for however much money you want to spend. The main points here, though, are longevity and redundancy. You want your files up for a long time, and you want them to stay online somewhere if one server takes a tumble. While torrents alone are terrible for longevity, they are great at getting data out fast. Combine a torrent with a server or a data hosting/streaming service and you have some type of redundancy. Always make sure your data is accessible.
So now you might also want to save physical media. This can be as easy or as difficult as you want it to be. Saving your physical media is best done by making it digital. The shelf life of digital data is only going up these days, while physical media like paper or tape only degrades. While I’m a fan of my physical media, transferring it to a digital format is the best way to share it and keep it alive.
When dealing with publications or photographs, you can generally get good results digitizing them with a decent scanner. I happen to be a fan of Epsons, but anything around $100 these days should be able to give you decent quality scans. As always, read the reviews just to make sure. If you’re feeling a bit more crafty, you can try your hand at building a book scanning set-up, which lets you work through your publications quickly. After you make your scans, you can apply whatever compression you wish, and even assemble the resulting image files into a PDF.
Video can also be challenging to save. Whether it be VHS, LaserDisc, or even DVDs, things can get messy. When dealing with your older analog media, you can find devices to digitize them. For example, you can get a capture device like a Canopus ADVC, or any number of DVD recorders on the market. There are also a slew of other little gadgets to clean up the video during the digitizing process. After you get the video converted, you can compress the raw capture down with a codec such as H.264 to make the file more manageable.
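For the compression step, here is a small Python sketch that builds and runs an H.264 encode. It assumes you have ffmpeg installed with the libx264 encoder, which is my own pick for the example since I haven’t named an encoding tool above. The function names, the file names, and the quality settings (a CRF of 18, the slow preset, 192k AAC audio) are all just illustrative starting points, not recommendations for every capture.

```python
import subprocess


def h264_command(source, destination, crf=18, preset="slow"):
    """Build the ffmpeg argument list for an H.264 encode of one file.

    Lower CRF values mean higher quality and bigger files; the defaults
    here are assumptions for illustration only.
    """
    return [
        "ffmpeg",
        "-i", str(source),      # the raw capture
        "-c:v", "libx264",      # H.264 video via the x264 encoder
        "-crf", str(crf),       # constant-quality setting
        "-preset", preset,      # slower presets compress more efficiently
        "-c:a", "aac",          # compress the audio track as well
        "-b:a", "192k",
        str(destination),
    ]


def encode(source, destination, **options):
    """Run the encode and raise an error if ffmpeg reports a failure."""
    subprocess.run(h264_command(source, destination, **options), check=True)
```

Calling `encode("capture.avi", "capture.mp4")` would produce the smaller file. Keep the raw capture around until you have watched the result and are happy with it.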
Audio can be approached in much the same way as video. You can pipe a tape deck or record player into a receiver and feed the output to a decent sound card. From there, the audio can be captured using a program like Audacity and saved as a lossless file or compressed with something like Vorbis or MP3.
With something to drive you, a little know-how, and a lot of time, you can start archiving the media in your world. Though it may be daunting at first, you can build as you go. Start small, and end up saving big.