Monday, August 6th, 2012
This article was originally written for and published at The New Tech on July 8th, 2012. It has been posted here for safe keeping.
So maybe you want to be an archivist but don’t know where to start. I don’t claim to be an expert, but I have learned a few things going down this path that I can share. Let’s break this up into two main sections: digital media and physical media. No matter what you are archiving, you should first pick out what you’re going to save. You don’t have to put too much consideration into this step. You can be passionate about a subject you wish to preserve, be helping a group of people, or be doing it for the hell of it. Archiving is at its base both a way to ensure survival and a way to fill a hole left forgotten.
In the digital world, you’re going to be focused on downloading, storing, and uploading. Let’s start with downloading, as you’re going to want to get your hands on some media. I started on a Windows computer, downloading directly with a browser. You can get your hands on some stuff simply with the DownThemAll plugin and a lot of free time. For streaming content I turn to the video downloader plugin for Firefox or Replay Media Catcher. These let you pull videos from sites like Vimeo or YouTube. Sometimes things aren’t as easy as clicking a file to download it. You may have to use download sites, FTP servers, BitTorrent, or something less standard. You never know. Get to know how to use JDownloader, an FTP client like FileZilla, and a BitTorrent client like uTorrent. You might find a whole slew of content on some obscure site, and you need to have the tools to dig it out. Moving on to more Unix-like systems, learn all you can about wget, curl, grep, and bash scripting. I’m not going to cover how to use all of these tools, but with a little practice there are few things you can’t get when you use them in tandem. You would be surprised how simple it is to whip up automated processes that do everything you want with just a few sophisticated commands. Also, be on the lookout for more specialized tools. For example, I found a fantastic tool for downloading YouTube videos called youtube-dl. If there is something out there to be downloaded, there is usually a tool for the job.
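As a rough sketch of what one of those automated processes might look like, here is a small batch downloader using only Python's standard library. It is not the workflow described above (which leans on wget, curl, and bash); the function name `download_all` and the `.part` naming convention are assumptions made for illustration.

```python
import os
import shutil
import urllib.request
from urllib.parse import urlparse


def download_all(urls, dest_dir):
    """Fetch each URL into dest_dir, skipping files that already exist.

    Returns the list of paths that were newly written on this run.
    """
    os.makedirs(dest_dir, exist_ok=True)
    saved = []
    for url in urls:
        # Name the local file after the last path segment of the URL.
        name = os.path.basename(urlparse(url).path) or "index.html"
        target = os.path.join(dest_dir, name)
        if os.path.exists(target):
            continue  # already fetched on an earlier run
        # Write to a temporary name first so an interrupted transfer
        # never leaves a half-finished file under the final name.
        partial = target + ".part"
        with urllib.request.urlopen(url) as response, open(partial, "wb") as out:
            shutil.copyfileobj(response, out)
        os.replace(partial, target)
        saved.append(target)
    return saved
```

Calling `download_all(list_of_urls, "archive")` twice is safe: the second run only fetches whatever is still missing, which helps when a long job gets interrupted partway through.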
When you get the data, you’re going to need to hold it. I was originally downloading everything locally, and still like to keep local copies of data I retrieve. Always keep your data on at least two drives, and preferably buy your drives in pairs so you can easily stick with this rule. You can never have too much storage. I currently have 15TB locally just for storing archived media. When it comes to other storage mediums, I’m not easily swayed. Data tape is expensive to adopt, and cloud storage lacks stability. In earlier days, I had only an 80GB drive, so I would back up to DVD+Rs. A lot of people will tell you that your burned media will go bad after about 6-10 years, but I have yet to have a disc become unreadable. I will say that I’ve had a lot of luck with Verbatim discs. I would coaster many discs by other brands when burning, but have only ever had one Verbatim fail on me. Stick with what your budget allows. The price of hard drives is only going down, but if you’re a kid on a budget, a spindle of DVDs can help in a pinch. Also, keep an eye on solid state drives. While I have yet to adopt them, they are the new thing in mass storage, though the price is still a bit steep.
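The two-drive rule only helps if both copies actually match. One way to check, sketched here under the assumption that each drive holds the same folder layout, is to hash every file on both sides and compare. The function names below are made up for this example.

```python
import hashlib
import os


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large video files never sit fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 checksum."""
    manifest = {}
    for folder, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(folder, name)
            manifest[os.path.relpath(full, root)] = sha256_of(full)
    return manifest


def compare_copies(primary, mirror):
    """Report files that are missing from, extra on, or different on the mirror."""
    a = build_manifest(primary)
    b = build_manifest(mirror)
    return {
        "missing": sorted(set(a) - set(b)),
        "extra": sorted(set(b) - set(a)),
        "changed": sorted(k for k in set(a) & set(b) if a[k] != b[k]),
    }
```

An empty result in all three lists suggests the mirror is a faithful copy. Anything under "changed" is worth investigating before the good copy is the only one left.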
So now you want to share your data. You have many options to consider. I’ve been using The Internet Archive most recently to place files which should be saved. Depending on your content, this may not be the best option for you. A few of the methods I mentioned earlier for downloading can also be good options for quick data dispersal. For example, setting up a torrent for your files can be done in minutes, and fast FTP/HTTP servers can be rented for however much money you want to spend. The main points here though are longevity and redundancy. You want your files up for a long time, and you want them to stay online somewhere if one server takes a tumble. While torrents alone are terrible for longevity, they are great at getting data out fast. Combine this with a server, or a data hosting/streaming service, and you have some type of redundancy. Always make sure your data is accessible.
So now you might also want to save physical media. This can be as easy or as difficult as you want it to be. Saving your physical media is best done by making it digital. The shelf life of digital data is only going up these days while physical media like paper or tape only degrades. While I’m a fan of my physical media, transferring it to a digital format is the best way to share it and keep it alive.
When dealing with publications or photographs, you can generally get good results digitizing them with a decent scanner. I happen to be a fan of Epsons, but anything around $100 these days should be able to give you decent quality scans. As always, read the reviews just to make sure. If you’re feeling a bit more crafty, you can try your hand at creating a book scanning set-up. This can easily copy all of your publications quickly. After you make your scans, you can perform any type of compression you wish, and even take the resulting image files to assemble a PDF.
Video can also be challenging to save. Whether it be VHS, Laserdisc, or even DVDs, things can get messy. When dealing with your older analog media, you can find devices to digitize them. For example, you can get a capture card like a Canopus ADVC, or any number of DVD recorders on the market. There is also a slew of other little gadgets to clean up the video along the digitizing process. After you get the video converted, you can compress the raw capture down with a codec such as H.264 to make the file more manageable.
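For the compression step, one common route is ffmpeg with its libx264 encoder. The post does not name a tool, so treat ffmpeg as an assumption here, along with the helper names and the default settings (CRF 18, deinterlacing on), which are only a reasonable starting point for interlaced tape captures.

```python
import subprocess


def h264_command(source, target, crf=18, deinterlace=True):
    """Build an ffmpeg argument list that re-encodes a capture to H.264.

    Lower crf values mean higher quality and larger files.
    """
    command = ["ffmpeg", "-i", source]
    if deinterlace:
        # Analog captures are usually interlaced; yadif is ffmpeg's
        # standard deinterlacing filter.
        command += ["-vf", "yadif"]
    command += [
        "-c:v", "libx264", "-preset", "slow", "-crf", str(crf),
        "-c:a", "aac", "-b:a", "192k",
        target,
    ]
    return command


def encode(source, target, **options):
    """Run the encode; requires ffmpeg to be installed and on the PATH."""
    subprocess.run(h264_command(source, target, **options), check=True)
```

Keeping the command builder separate from the function that runs it makes it easy to print and inspect the exact command before committing hours of encoding time. It is also worth holding on to the raw capture until the compressed file has been checked.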
Audio can be viewed in a similar fashion to video. You can easily pipe a tape or record player into a receiver and feed the output to a nicer sound card. Here, audio can be captured using a program like Audacity and saved as a lossless file or compressed with something like Vorbis or MP3.
With something to drive you, a little know-how, and a lot of time, you can easily start archiving the media in your world. Though it may be daunting at first, you can easily build as you go. Start small, and end up saving big.
Friday, July 6th, 2012
This article was originally written for and published at The New Tech on June 8th, 2012. It was a collaboration between Moonlit and myself. Enjoy.
– Famicoman –
I think I’ve always been an archivist. A vital ally in the digital world. I’m the guy that saves a file from six years ago and pulls it up when people wonder whatever happened to it. I’m the guy who is going to make sure you can still find The New Tech episodes in 20 years, whether anyone would want to or not.
Some might call me a hoarder. Technically, by definition, they are correct. But just like how the word “hacker” has been usurped and manipulated by mass media, so has this term. The word conjures up television-tinted images of people living in trash and debris. It isn’t always like that. Things I save are organized, studied, and shared with the world, not rotting away in some closed off building. Not sealed from the world. If anything, I save because these items may be important to someone else. I’m not always part of the equation.
One could argue that you’re born with an archivist instinct. My philosophy has always been that to be able to look forward, we must look back. Besides digital data, I collect physical artifacts of our technological past. You can learn a lot about Blu-ray by looking at Betamax. This resonates in all archiving. There will always be someone wanting to know how we got to where we are, and hopefully they aren’t met with puzzled faces.
My digital archiving habits started with the world of internet video. In the beginning, I was maxing out my DSL connection and throwing videos up on to Google Video. That later evolved to the IPTV Archive and ultimately my current efforts with archiving Revision3 and a wider range of digital content.
Archiving isn’t an easy task. It isn’t just plucking files off of a download page. It’s mastering wget. It’s manipulating URLs. It’s fighting tooth and nail with a server for weeks, months. It’s talking to people, some of whom don’t want to be talked to. It really stops becoming a hobby and starts being a mindset. You begin to look at things differently, communicate differently, prioritize differently.
When I started out with the IPTV Archive, things were simpler. I could just go download episodes from show sites and be on my way. Now, I get to sites that don’t want to be downloaded in their entirety, and are definitely not set up to be. For example, last year I worked on backing up portions of good.net. After a while, they’d lock me out of their servers and the only way to keep downloading was to get a new IP address or wait the block out. This year with Revision3, their CDN throttles me, which ultimately just means I’m going to be waiting longer for their files. For whatever reason, corporations are not fans of someone downloading their entire library of material. Some entities are set up with commercial content, meaning eyeballs are numbers. If you mirror their content, they don’t get as many viewers, and fewer viewers mean less money. In this light, I’m an enemy. I’m a thief. More importantly, I’m a necessity. Without me and those like me, entire cultures could be snuffed out like a flame. Many already have. It’s a strange feeling when you’re contacted by a show creator asking if he could download his episodes from you.
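Waiting a block out can be automated. The sketch below retries a download with a wait that doubles after each refusal. It is a generic backoff pattern, not the approach actually used against either site, and the `Blocked` exception, the `fetch` callable, and the default timings are all placeholders for illustration.

```python
import time


class Blocked(Exception):
    """Raised by a fetch function when the server refuses or throttles us."""


def fetch_with_backoff(fetch, url, attempts=5, first_wait=60.0, sleep=time.sleep):
    """Call fetch(url), waiting longer after each refusal.

    fetch is any callable that returns the data or raises Blocked.
    The wait doubles every time: 60s, 120s, 240s, and so on.
    """
    wait = first_wait
    for attempt in range(1, attempts + 1):
        try:
            return fetch(url)
        except Blocked:
            if attempt == attempts:
                raise  # out of patience; let the caller decide what to do
            sleep(wait)
            wait *= 2
```

Passing `sleep` in as a parameter keeps the function easy to try out without actually waiting. In practice, a generous first wait is also the politer choice toward the server on the other end.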
Archiving someone’s digital work is a weird concept to get your head around. Think if you were approached and someone wanted a copy of your entire website. Every little detail becomes theirs to thumb through, spread to others, and replicate for years after you’ve brought the original down. It’s weird, but it’s necessary. When someone years down the road says, “Man, I wish I could watch some old Revision3,” I’ll be there to say, “Here is a copy of all their content. Ever. Enjoy.”
It would be wonderful if it was all as easy as hitting a button and someone’s site downloads for you, but it’s never that simple. Most websites are not designed to be cloned so readily. They lack internal organization. When you peel back the layers, you’d be surprised to see how clumsily some large sites are maintained and held together with rubber bands and paper clips. Out of convenience, we can pull up the Revision3 example again. So many episodes are mislabeled, so many links are dead, the formats for each episode can vary at will, and there are so many episodes and full shows that are just outright gone to the point that if you had no prior knowledge, you wouldn’t know them to have ever existed. It feels like someone ripping pages out of a book and passing it off as if nothing happened.
You have to be one part resources, one part nice guy, one part detective, one part historian and one part hacker. You have to learn about the missing files, you have to track them down, you have to communicate with others who may have them, you have to have the storage and bandwidth to get them, and you have to do it all no matter what is trying to stop you. You have to do all these things, be all these people, at the same time. Sometimes, you have to do it as quickly as possible.
After you gather everything, there is always the question of how to preserve it and disperse it. You have to keep the files up, and make sure they’ll stay up. More importantly, you need to make sure that people can get to them without jumping through hoops. I’ve tried everything on this front: torrent sites, FTP drops, streaming services, and so on, but I have ultimately settled on archive.org as the core of my toolbox. For the uninitiated, the Internet Archive is a non-profit digital library offering permanent data storage. It’s big, and it’s growing every day. Anyone can upload content provided it’s licensed to be distributed openly. It makes things easy when I can be bringing things in through the front door, and flipping them right out the back to archive.org.
Digital archiving is a brutal but rewarding process that most people don’t see on the front lines. The next time you’re going to put something up online, take a minute to think about it. Your files are going to live much longer than you could imagine. You might as well make it easy for them to.
– Moonlit –
I’ve been a wannabe archivist for some time, but through a mixture of altruistic and less altruistic means, which just so happen to coincide.
On one hand I can’t bear the thought that there is so much recent history that may be, or in some cases already is, needlessly lost forever. Whether it be hardware, software or media, much of what is produced today has no vision for the future: it’s created, it’s used and, ultimately, it’s destined to be lost to whatever forces may eventually whittle its existence down to extinction. Failed storage media, the thought that “if I delete it, somebody else will still have it” or even just plain old waning interest in a flash in the pan which is no longer relevant tomorrow.
On the other hand I find it somewhat distressing that the content I grew up with, much of which came from TV rather than the internet, is very difficult to find. It’s just that little bit too old to have been swept up by a thousand torrent sites or archived to the ever expanding YouTube. It appears to me to exist in a narrow void between content old and popular enough to have made its way to public release via VHS or DVD as a nostalgia trip for the previous generations and the modern piracy scene, who will capture and upload almost anything as pristine digital clones of the broadcast content we enjoy.
Luckily, the two often overlap, so one can be the driving inspiration to accomplish both. But as long as the end result is shared, I don’t view the selfishness of the latter to be a problem. In fact it could very well be a boon, because if everybody was selfish enough to demand copies of the content they thought they’d lost, it means that content still exists, and given that everybody likes different things, meshing all that together would create a patchwork of content from that point in time.
Now, I’ve erred somewhat on the side of piracy so far, but I don’t mean to imply that I’m only interested in commercial media, or indeed in breaking the law. Before moving on though, I’d like to say that I think it’s a colossal shame that in order to capture and preserve certain parts of our lives, we often have to resort to methods which might seem unsavoury to those who disseminate that content. I don’t think it’s unreasonable to suggest that there are indeed large archives these days maintained by large media producers and broadcasters, yet those of us the content was created to be viewed by have no access. Whether that be through music or video clip copyright and licensing issues, laziness or cost, it’s still a great loss to us, and will continue to be until such a time that the content is opened up. This history should not suffer for the sake of a few contracts and a slew of many-digit bank balances. Please, somehow, let this content see the light of day again.
Whew. Got a little bit heavy there. User-created content, there’s a good place to jump to. Podcasts and video podcasts exploded in the mid-2000s along with the proliferation of high speed broadband and cheap consumer cameras. The trouble is, many of those shows had small numbers of fans who, along with the creators themselves, have moved on and left behind their content. This is an important chunk of internet history to me, it got me involved in a large percentage of what I do and who I speak to every day. That’s why I tried my best to help Famicoman build the IPTV Archive when we originally began trying to preserve this stuff. With my pitiful upload speeds and meagre hard drive space, which was frustrating enough, I helped transcode and re-host piles of videos. Those videos were then uploaded to DivX’s Stage6 video hosting site, all neatly encoded in DivX format, with their own special DivX player plugin. Then they took the service down. After countless weeks of pulling down videos, transcoding where necessary, uploading back to Stage6, straining my resources as I went, it was all for naught. Once bitten, twice shy, as they say, and since then I’ve been very wary of trying to do it again, but I’m slowly getting back on the horse. Lesson learned: redundancy. Redundancy and backups. Everywhere. Never rely on any single service to host this kind of stuff, it might be gone tomorrow.
Things get a bit weird somewhere in the middle of those two areas of content, though, with companies like Revision3. They began as a show, or later a couple of shows, which very much fit into the user created content model, a couple of guys with a camera drinking and talking out of their arses for 20-30 minutes. But then it changed. It became the Revision3 we have today, the corporate ad-driven sludge that could very well have been taken direct from the TV and uploaded wholesale to the internet. I’m not against making a profit on content, but stop sucking the soul out of it, it feels like it’s hurting the product. But I’m not here to rag on content creators, my point here is that no matter how poor, tasteless or boring I believe the content or its presentation to be, it still deserves to be archived. What’s crap for me might be gold to somebody else, and it’s not my job to curate history in the making. If I even began to try I would doubtlessly decide that something which later turned out to be pivotal in the future was actually the naffest thing to ever grace a visual display. I believe Jason Scott made a similar point about the preservation of GeoCities. Yes, it might be full of weakly written, poorly laid out, eye-damaging animated horribleness, but it’s historical weakly written, poorly laid out, eye-damaging animated horribleness. It’s a snapshot of what the internet was at that time, and as such it should not be forgotten. So go forth and grab it, grab it all, because as hard as it might be to believe, one day it will all be gone.
Saturday, December 10th, 2011
I’m pretty sure I hinted that I like to digitize old formats. I’m that guy you see digging through bins of VHS tapes at yard sales, looking to find that one piece of gold that I haven’t seen and probably won’t find any other way.
You might not know just how into this stuff I am. I didn’t really know I was, but after years of accumulating relevant equipment, it starts to add up. I used to have everything sitting around in various piles. I’d keep some stuff up by the television in my bedroom to do simple transfers, and some of the bigger stuff downstairs where I couldn’t trip over it.
I recently decided to consolidate more, and bring most things to a dedicated area where I could do transfers.
I present the wall.
To briefly go over what we see here: The top row has some video enhancers, an audio enhancer, a power station, and a DVD recorder. The next row down has some more professional video enhancers, a detailer, some boxes for stabilization and a full frame time base corrector / freeze. The next row has two Commodore 1702 monitors. The fourth row has an SVHS deck and a laserdisc player on the left, and two editing VCRs with a Betamax deck on the right. The next row has 5 laserdisc players and a VHS duplicator. The last row has all my CEDs.
This doesn’t represent all of my gear (I’ve got much more of this stuff on the other side of the room), but there is a pretty nice portion here for both storage and transferring. I can quickly wire up any format and start converting in a matter of minutes.
It has become something of a monolith to antiquated technology. I’m quite happy with it.