Thursday, December 9, 2010

Unit 14: Perpetual Access

Perpetual access, not to be confused with digital preservation (Stemper and Barribeau), is the right to keep accessing the content of a licensed or subscription resource after the license or subscription has ended. The reading for this week addresses the problems and concerns relating to perpetual access, as well as digital preservation practices. The issues stem from one question: once a library cancels online access to a journal, what happens to the content for which it already paid? Perpetual access is a key component of the preservation of online/electronic resources. This issue ties directly into the access versus ownership debate. As more institutions move towards a pay-per-view model, fewer institutions are concerned with perpetual access.

There are several reasons for this trend, including patron pressure to have the article now, the fact that an online licensed issue may be available earlier than the print version, and, of course, financial considerations. Many libraries, as we are all aware, are canceling print versions of journals that are available online due to cost, which is not just the price of the journals themselves but also the cost of maintaining adequate shelf space and staff. The staffing concern also applies to license negotiations, which take more time and effort than many ER librarians have. Furthermore, what happens when the licensed content is purchased by a new publisher or vendor? Frequently, the terms of agreement may change, possibly rendering any perpetual access clause useless. Plus, few aggregators offer perpetual access, which is too bad as they are usually more affordable.

Some of the key issues, as outlined by Watson and others, include changing formats and data migration, along with a few solutions. Outdated software and hardware, as well as pricing, are big concerns. While it seems easy enough to say, hey, let's just preserve that content digitally, in actuality this is a big undertaking, and not just in the technical arena. There is the initial cost, including staff training, followed by the need to transfer the content to new media every ten years or so in order to avoid digital decay. And that is just for the content that is still accessible and not on outdated equipment, media, or software.

What about emails, blogs, and other website-type information? While there is the Wayback Machine, wikis and blogs, just to name a couple, are not archived well because they change so frequently. Surprisingly, online reference resources are also in this “frequently changing and therefore difficult to archive” category. Not everything can be preserved, so tools, like those from the Digital Preservation Coalition, have been developed in order to help librarians decide which types of electronic resources they should preserve.

Certain third-party organizations are key in helping to preserve electronic content. These include JSTOR, LOCKSS, and Portico. JSTOR is a subscription service that provides access to back issues of journals. Subscriptions may be purchased on a collection basis. Each journal has a rolling embargo on the content, which varies based on the publisher. JSTOR is also considered to be very trustworthy and reliable among librarians and patrons alike. Since it only provides access to back issues, it is like a digital archive of journals, but not the most recent years. LOCKSS (Lots Of Copies Keep Stuff Safe) provides an archiving service where participating libraries maintain a LOCKSS server that preserves copies of free and subscribed materials. Each server also serves to back up the others in the event of catastrophe. There is a concern about the data format becoming obsolete, but so far, LOCKSS has proven itself to be a reliable service.
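To make the "each server backs up the others" idea a little more concrete, here is a minimal Python sketch. It is only an illustration of the general lots-of-copies principle, not the actual LOCKSS software or protocol: the function names, the site names, and the simple majority rule are all my own assumptions. Each copy gets a checksum, and any copy that disagrees with the majority is replaced with a good one.

```python
import hashlib
from collections import Counter


def checksum(data: bytes) -> str:
    """Return the SHA-256 digest of a stored copy as a hex string."""
    return hashlib.sha256(data).hexdigest()


def repair_copies(copies: dict[str, bytes]) -> dict[str, bytes]:
    """Replace any copy that disagrees with the majority checksum.

    Hypothetical helper: `copies` maps a site name to the bytes it holds.
    """
    digests = {site: checksum(data) for site, data in copies.items()}
    winner, votes = Counter(digests.values()).most_common(1)[0]
    if votes <= len(copies) / 2:
        # Without a strict majority we cannot tell which copy is the good one.
        raise ValueError("no majority; cannot tell which copy is good")
    good = next(data for site, data in copies.items() if digests[site] == winner)
    return {site: good for site in copies}


# Three made-up libraries; the third copy has been damaged.
copies = {
    "library_a": b"article text",
    "library_b": b"article text",
    "library_c": b"article t3xt",
}
repaired = repair_copies(copies)
print(repaired["library_c"])
```

The more independent copies there are, the less likely it is that a majority of them are damaged in the same way, which is the whole point of keeping lots of copies.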

Portico, also funded through the Mellon Foundation, archives content, but not at the library itself, assuming that libraries do not wish to house or maintain the appropriate infrastructure. The archive is “dark”, meaning that participating libraries do not have access to the content unless the original source (publisher) stops providing its content. Furthermore, the data is normalized in order to help with migration to new, and possibly more stable, formats in the event that file formats change. The point of normalization is not to change the content, but to keep it accessible. According to Fenton, Portico is more concerned with preserving the intellectual content, not the original webpage or digital source, as no one (apparently) is interested in preservation for preservation’s sake. (Did Fenton really need to roll her eyes when she mentioned librarians and preservation while she discussed the pricing model?) PubMed and Google Book Search (formerly Google Print) are also providing digitization services, although some may disagree as to the success of Google Books in particular.

Libraries are also banding together in order to help preserve and archive content. For example, consortia like the CIC may go in together on content, such as all of the Springer Verlag and Wiley publications, and house it in an off-site, environmentally controlled storage facility for preservation's sake. Institutional repositories are also a viable, yet underused, source for archiving content. Steering committees have formed in order to determine some possible solutions to the preservation of electronic content and the print versions, with national libraries taking the lead role in providing those services. In addition, due to pressure from libraries, publishers are taking a more active role in digital preservation by signing on with the three main third-party systems or by depositing into a national archive or repository.

Seadle looks to address the issue of “what is a trusted repository in a digital environment?” While the same preservation vocabulary is used for digital and hard copies of materials, the process and needs are quite different. The initial and primary focus is usually on technology, especially in the case of electronic journals. But according to Seadle, this may be to the detriment of social organization, the idea behind keeping rare books in vaults and locked cases. Nothing can damage a physical object like an unrestrained reader. Therefore, we need trust for successful archiving, whether it be physical or digital. While it may be easy to trust a museum to keep a rare manuscript safe, what about digital content? Perhaps we need to have a level of distrust in order to ensure a safe copy of the content. If we distrust the media, for example, we may create multiple copies in multiple formats housed in different locations, including off-site servers. Not bad for an untrustworthy format.

We should also distrust proprietary software and use more open source software and systems, as they tend to not be in it for the money. Furthermore, Seadle addresses integrity and authenticity in relation to digital objects. If a digital object is marked up or the format changes, it may lose integrity, which somehow devalues the item. Thus, preserving digital content as a bitstream is a more stable method, at least in terms of format, of maintaining an object's integrity. Authenticity is trickier. There usually is no one genuine copy of a digital work. However, using bitstreams with timestamps and checksums may help determine the authenticity of a digital object.
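For anyone wondering what "timestamps and checksums" look like in practice, here is a small Python sketch. It is my own illustration, not anything taken from Seadle or from a particular archive: the function names, the record layout, and the choice of SHA-256 are assumptions. The idea is to record a checksum and a timestamp when the bitstream is first stored, then recompute the checksum later to see whether a single bit has changed.

```python
import hashlib
from datetime import datetime, timezone


def make_fixity_record(data: bytes) -> dict:
    """Record a SHA-256 checksum and a UTC timestamp for a bitstream.

    Hypothetical record layout, chosen only for this example.
    """
    return {
        "sha256": hashlib.sha256(data).hexdigest(),
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }


def is_unchanged(data: bytes, record: dict) -> bool:
    """Return True if the bitstream still matches the recorded checksum."""
    return hashlib.sha256(data).hexdigest() == record["sha256"]


# Made-up content standing in for an archived article.
original = b"Volume 12, Issue 3: full text of the article"
record = make_fixity_record(original)

print(is_unchanged(original, record))         # same bytes as when recorded
print(is_unchanged(original + b" ", record))  # one extra byte added
```

A checksum only shows that the bits match what was recorded. The timestamp is what lets someone argue that a given copy is the earlier one, and it is only as trustworthy as whoever recorded it.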

So, should we trust LOCKSS? It is a community-supported project, as opposed to a corporate entity. It is a network of collaborators using open source software, and Seadle compares it to the apparent trustworthiness of Linux, on which the majority of library servers are based. LOCKSS archives content as a bitstream, without normalizing it. Even though normalization may help with possible future formatting issues, that process does inherently lose some content through compression. Each bitstream has a timestamp that aids in determining the first, and therefore authentic, version of the content. LOCKSS also gets permission from publishers and content providers prior to crawling their websites for content. Basically, according to Seadle, we really should trust LOCKSS, as it is the best system around.
