Tales from the Open Content Alliance

San Francisco is a truly miserable town in which to recuperate from a sore throat.  Not that anyone would consider a place nicknamed ‘Fog City’ a good spot to convalesce, especially when the city’s charms make walking around in the wet and cold so tempting.  And so I found myself last week (and my first week at Talis), slogging through the precipitation to the Open Content Alliance’s Annual Meeting at the Officers’ Club at the Presidio.

Festivities got a late start and began with everyone in the room (all hundred-something of us) standing up and introducing ourselves.  While this helped put names to faces (otherwise I never would have met John Mignault) and identify some of the OCA’s initiatives, it had the added effect of pushing us completely off schedule.

Brewster Kahle talked for a while about the need to focus on texts published between 1923 and 1963 (or thereabouts); the vast majority of these works are likely out of copyright, it just takes some research, or contacting the rights holder, to confirm it.  We cannot just keep digitizing pre-1923 works; that has diminishing returns.  Post-1963 materials will likely never be available, so this middle group needs to be exploited.

Next, we got a report on some of the activities the OCA has undertaken over the last year: microfilm digitization at UIUC; the Biodiversity Heritage Library; printing on demand; and, briefly, the Open Library.  The microfilm digitization project is interesting: they can scan something like a roll every hour.  I had a similar idea when I was at Tech, although my plan had been to use the existing, public microform scanners.  Obviously that would have been a lot slower, with possibly mixed results, but a lot cheaper.

The Biodiversity Heritage Library is a consortium of ten natural history museums and botanical gardens (including the Smithsonian, the New York Botanical Garden, and Kew Gardens) working to create a subject-specific portal to their collections.  If many details were given about this, I must have tuned them out; the same goes for the Open Library project.  I’m pretty sure both of these updates were rather light on specifics.  Printing on demand was presented by a vendor; the simple message was that it is becoming affordable.  The cool part is that the economics are exactly the same for printing one copy of a book as for printing 1,000 copies: around $0.01 per page.

After a break (where I talked to John Mignault the whole time), we had breakout sessions.  I attended “Sharing and Integration of Bibliographic Records” and intended to sit and listen.  Instead, I wound up talking.  A lot.  One of the main topics of conversation was a proposal by Bowker (the ISBN issuing agency for North America) to supply ISBNs for digitized copies of works that would not originally have had an ISBN (which, in the context of the Open Content Alliance, would be nearly all of them).  Bowker has offered 3 million ISBNs (250,000 per library) as a gift to the non-profit sources in the OCA.  After a library burns through its 250,000, the ISBNs are $0.10 each.

Superficially, you could either love or hate this proposal, but as the debate wore on, it became easy to both love and hate it.  I think I argued for both sides during the course of the session.  While the payoffs seem logical (it would be nice to discover these materials by related ISBN, certainly), there are some pitfalls as well.  For instance, the OCA isn’t terribly effective at discouraging multiple institutions from digitizing the same edition/run of a particular book.  This means that two different scans of essentially the same book could potentially have two different ISBNs.  This actually complies with the ISBN specification, but it certainly wouldn’t comply with people’s expectations.

Also, it is unclear how these records should be treated in services like OCLC.  If they get an ISBN, does that mean a new record should be added?  If multiple scans are created, what is the appropriate 856 (the MARC field for electronic location and access) to add to an existing record?  What is the ‘authoritative’ URL?

We also talked a bit about the static nature of the Internet Archive’s metadata.  It is assumed that the metadata will not change after a scan is loaded into the archive, but this is hardly the reality.  How, then, does the IA learn of updates made at the owning institution?  This seems like the perfect application of RDF: the IA would just point at the owning institution’s record, though that would obviously require some infrastructure.  The notion of a pingback, like weblogs use, was raised.
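To make the “just point at the owning institution’s record” idea a little more concrete, here is a minimal sketch in Python using rdflib.  This is purely my own illustration, not anything proposed at the meeting; all of the URIs and the choice of Dublin Core predicates are assumptions.  The point is simply that the archive’s copy asserts where the authoritative description lives, rather than carrying a frozen copy of the metadata; the pingback would then just be the owning institution notifying the archive when the record behind that URI changes.

```python
from rdflib import Graph, Namespace, URIRef

DCTERMS = Namespace("http://purl.org/dc/terms/")

g = Graph()
g.bind("dcterms", DCTERMS)

# Hypothetical identifiers: the scanned item in the archive and the
# authoritative bibliographic record held by the owning institution.
scan = URIRef("http://archive.example.org/details/exampleitem00inst")
owning_record = URIRef("http://catalog.example.edu/record/123456")

# Instead of freezing a copy of the metadata at load time, the archive's
# description simply points at the institution's record; harvesters (or the
# archive itself) would dereference that URI whenever current metadata is needed.
g.add((scan, DCTERMS.source, owning_record))
g.add((scan, DCTERMS.isVersionOf, owning_record))

print(g.serialize(format="turtle"))
```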

After lunch, we got reports from the breakouts.  The ILL/Scan-on-demand group came up with a process for sharing still-in-copyright items, and they also recommended that no ILL charges be made on these requests.  This is really quite an interesting development and I’m awfully impressed they made the progress they did.  It’s obviously because I wasn’t there to argue about every little point.

Brewster Kahle then had those interested in the ISBN deal go back into a room and work things out.  The same issues came up, but this time I feel they were largely ignored.  Kahle wants to implement ISBNs and really wouldn’t take any other answer, so the plan is to figure out how to make this work.  Since he’s willing to pony up a lot of the cash to make it happen, it’s certainly his call.  His argument was, basically, “it will work or it won’t”.  Sure, but how will it affect the use of ISBNs in the meantime?

We broke up to listen to Carl Lagoze speak about life after we’ve digitized everything.  His thesis, in a nutshell: what are we going to do after we’ve aggregated everything into repositories?  It’s imperative that we build value on top of this data we’re accumulating; it’s not enough just to collect it.  We need to create associations, content, and the means to allow our researchers to leverage the objects we have.  He opened the floor to discussion on this topic and the comments varied.  I think I was still hung up on the ISBN thing and wasn’t in the mood to argue anymore.

I skipped the reception to meet Ian and Paul and get my MacBook Pro.  Unfortunately, I had to fly out the next morning, but all in all it was a good trip!


Comments

2 responses to “Tales from the Open Content Alliance”

  1. Stuart Weibel

    “This means that two different scans of essentially the same book could potentially have two different ISBNs. This actually complies with the ISBN specification, but it certainly wouldn’t comply with people’s expectations.”

    sort of reminds me of the sentiment… “been in debt all my life, so deficit spending is fine”

    Equivalence judgments, like the (indebted) poor, will always be with us, but encouraging this through uncoordinated assignment of identifiers is a waste of users’ time and everyone’s efforts.  Not to say simply unnecessary (nice business opportunity for Bowker, though).

    Canonical identification of resources is increasingly important if we are to see the Web presence of library assets increase.

  2. Ross

    Peter, OCLC providing an xIdentifier service does little to alleviate my concerns, unless they make a dramatic shift in business direction and make it totally open to everyone for free.

    I agree that books have multiple ISBNs now, but I think, historically, an ISBN has signified a particular edition of a book. Under this plan, two different ISBNs could be assigned to the same edition/same run; the only differences would be the quality of the scan, the digitization metadata, and the marginalia. So now, instead of ISBN:XXXXXXXXXXXXX meaning “1979 Penguin Books edition”, it could be functionally identical to ISBN:YYYYYYYYYYYYY except for the physical location it currently occupies.

    I’m not arguing against these items having a standard identifier, far from it! I want them to have something, and there’s a case to be made for piggybacking on an infrastructure that is fairly ubiquitous. I also think there may be ramifications that we aren’t considering.

    “Canonical identification of resources is increasingly important if we are to see the Web presence of library assets increase.”

    Stuart, certainly no argument here… but why not DOIs? Why not handles? For the ISBN to be ‘actionable’, Bowker has to enable a DOI (which is not part of the ISBN ‘gift’, I might add) anyway, so why mess with the ISBN at all?

    I don’t know; as I mentioned in the posting, I think I argued on both sides of this debate. I know I want an identifier; I’m just not sure I want that identifier.
