Objectifying OpenURL

Sometime in November, I came to the realization that I had horribly misinterpreted the NISO Z39.88/OpenURL 1.0 spec. I’m on the NISO Advisory Committee for OpenURL (which makes this even more embarrassing) and was reviewing the proposal for the Request Transfer Message Community Profile and its associated metadata formats when it dawned on me that my mental model was completely wrong. For those of you who have primarily dealt with KEV-based OpenURLs (which account for 99% of all the OpenURLs in the wild), I would wager that your mental model is probably wrong, too.

A quick primer on OpenURL:

OpenURL is a standard for transporting ContextObjects (basically a reference to something; in practice, mostly bibliographic citations).
A ContextObject (CTX, for short, from now on) is composed of Entities that help define what it is. Entities can be one of six kinds:

Referent – this is the meat of the CTX: what it’s about, what you’re trying to get context about. A CTX must have one referent and only one.
ReferringEntity – defines the resource that cited the referent. This is optional and can only appear once.
Referrer – the source the CTX came from (e.g. the A&I database). This is optional and can only appear once.
Requester – this is information about who is making the request (e.g. the user’s IP address). This is optional and can only appear once.
ServiceType – this defines what sorts of services are being requested about the referent (e.g. getFullText or document delivery services). There can be zero or more ServiceType entities defined in the CTX.
Resolver – these are messages specifically to the resolver about the request. There can be zero or more Resolver entities defined in the CTX.

All entities are basically the same in what they can hold:

Identifiers (such as DOI or IP Address)
By-Value Metadata (the metadata is included in the Entity)
By-Reference Metadata (the Entity has a pointer to a URL where you can retrieve the metadata, rather than including it in the CTX itself)
Private Data (presumably data, possibly confidential, between the entity and the resolver)

A CTX can also contain administrative data, which defines the version of the ContextObject, a timestamp and an identifier for the CTX (all optional).
Community Profiles define valid configurations and constraints for a given use case (for instance, scholarly search services are defined differently than document delivery). Context objects don’t actually specify any community profile they conform to. This is a rather loose agreement between the resolver and the context object source: if you provide me with a SAP1, SAP2 or Dublin Core compliant OpenURL, I can return something sensible.
There are currently two registered serializations for OpenURL: Key/Encoded Values (KEV), where all of the values are output in a single string, formatted as key=value and delimited by ampersands (this is what the majority of all OpenURLs that currently exist look like), and XML (which is much rarer, but also much more powerful).
There is no standard OpenURL ‘response’ format. Given the nature of OpenURL, it’s highly unlikely that one could be created that would meet all expected needs. A better alternative would be for a particular community profile to define a response format since the scope would be more realistic and focused.
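To make the KEV serialization from the primer a bit more concrete, here is a minimal Ruby sketch that builds one from key/value pairs with the standard library's URI.encode_www_form. The constant and method names are my own, purely for illustration; the keys and values are borrowed from the referent in the example that follows.

```ruby
require 'uri'

# Key/value pairs for a referent described by value in the KEV book format.
# An array of pairs (rather than a Hash) keeps order and allows repeated keys.
REFERENT_PAIRS = [
  ['rft_val_fmt', 'info:ofi/fmt:kev:mtx:book'],
  ['rft.genre', 'book'],
  ['rft.aulast', 'Vergnaud'],
  ['rft.place', 'Amsterdam, Philadelphia']
].freeze

# to_kev is a hypothetical helper name. URI.encode_www_form percent-encodes
# each key and value and joins the pairs with ampersands.
def to_kev(pairs)
  URI.encode_www_form(pairs)
end

puts to_kev(REFERENT_PAIRS)
```

Everything ends up in one flat string, which is exactly why the structure of the context object is so hard to see from a KEV alone.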

Looking back on this, I’m not sure how “quick” this is, but hopefully it can bootstrap those of you that have only cursory knowledge of OpenURL (or less). Another interesting way to look at OpenURL is Jeff Young’s 6 questions approach, which breaks OpenURL down to “who”, “what”, “where”, “when”, “why” and “how”.

One of the great failings of OpenURL (in my mind, at least) is the complete and utter lack of documentation, examples, dialog or tutorials about its use or potential. In fact, outside of COinS, maybe, there is no notion of “community” to help promote OpenURL or cultivate awareness or adoption. To be fair, I am as guilty as anybody of this failure: I had proposed making a community site for OpenURL, but a shift in job responsibilities, then a wholesale change in employers, coupled with the hacking of the server it was to live on, left this by the wayside. I’m putting this back on my to do list.

What this lack of direction leads to is that would-be implementors wind up making a lot of assumptions about OpenURL. The official spec published at NISO is a tough read and is generally discouraged by the “inner core” of the OpenURL universe (the Herbert van de Sompels, the Eric Hellmans, the Karen Coyles, etc.) in favor of the “Implementation Guidelines” documents. However, only the KEV Guidelines are actually posted there. The only other real avenue for trying to come to grips with OpenURL is to dissect the behavior of link resolvers. Again, in almost every instance this means you’re working with KEVs and the downside of KEVs is that they give you a very naive view of OpenURL.

KEVs, by their very nature, are flat and expose next to nothing about the structure of the model of the context object they represent. Take the following, for example:

url_ver=Z39.88-2004&url_tim=2003-04-11T10%3A09%3A15TZD
&url_ctx_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Actx&ctx_ver=Z39.88-2004
&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&ctx_id=10_8&ctx_tim=2003-04-11T10%3A08%3A30TZD
&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=book&rft.aulast=Vergnaud
&rft.auinit=J.-R&rft.btitle=D%C3%A9pendances+et+niveaux+de+repr%C3%A9sentation+en+syntaxe
&rft.date=1985&rft.pub=Benjamins&rft.place=Amsterdam%2C+Philadelphia
&rfe_id=urn%3Aisbn%3A0262531283&rfe_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook
&rfe.genre=book&rfe.aulast=Chomsky&rfe.auinit=N&rfe.btitle=Minimalist+Program
&rfe.isbn=0262531283&rfe.date=1995&rfe.pub=The+MIT+Press&rfe.place=Cambridge%2C+Mass
&svc_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Asch_svc&svc.abstract=yes
&rfr_id=info%3Asid%2Febookco.com%3Abookreader

Ugly, I know, but bear with me for a moment. From this example, let’s focus on the Referent:

rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=book&rft.aulast=Vergnaud
&rft.auinit=J.-R&rft.btitle=D%C3%A9pendances+et+niveaux+de+repr%C3%A9sentation+en+syntaxe
&rft.date=1985&rft.pub=Benjamins&rft.place=Amsterdam%2C+Philadelphia

and then let’s make this a little more human readable:

rft_val_fmt: info:ofi/fmt:kev:mtx:book
rft.genre: book
rft.aulast: Vergnaud
rft.auinit: J.-R
rft.btitle: Dépendances et niveaux de représentation en syntaxe
rft.date: 1985
rft.pub: Benjamins
rft.place: Amsterdam, Philadelphia
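The "human readable" listing above can be produced mechanically. This is a minimal Ruby sketch, assuming nothing beyond the standard library: URI.decode_www_form does the percent-decoding and turns "+" into spaces. The decode_kev name and the REFERENT_KEV constant are hypothetical.

```ruby
require 'uri'

# The referent portion of the KEV example, joined into one query string.
REFERENT_KEV = 'rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=book' \
  '&rft.aulast=Vergnaud&rft.auinit=J.-R' \
  '&rft.btitle=D%C3%A9pendances+et+niveaux+de+repr%C3%A9sentation+en+syntaxe' \
  '&rft.date=1985&rft.pub=Benjamins&rft.place=Amsterdam%2C+Philadelphia'

# decode_kev is a hypothetical helper name. The result is an array of
# [key, value] pairs, decoded as UTF-8, so repeated keys are not lost.
def decode_kev(query)
  URI.decode_www_form(query, Encoding::UTF_8)
end

decode_kev(REFERENT_KEV).each { |key, value| puts "#{key}: #{value}" }
```

Note that this is still just a flat list of pairs; nothing in it says which keys belong to which entity.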

Looking at this example, it’s certainly easy to draw some conclusions about the referent, the most obvious being that it’s a book.

Actually (and this is where it gets complicated and I begin to look pedantic), it’s really only telling you, “I am sending some by-value metadata in the info:ofi/fmt:kev:mtx:book format,” not that the thing is actually a book. (The info:ofi/fmt:kev:mtx:book metadata values do state that, but ignore that for a minute, since genre is optional.)

The way this actually should be thought of:

ContextObject:
    Referent:
       Metadata by Value:
          Format: info:ofi/fmt:kev:mtx:book
          Metadata:
             Genre: book
             Btitle: Dépendances et niveaux de représentation en syntaxe
             …
    ReferringEntity:
       Identifier: urn:isbn:0262531283
       Metadata by Value:
           Format: info:ofi/fmt:kev:mtx:book
           Metadata:
               Genre: book
               Isbn: 0262531283
               Btitle: Minimalist Program
               …
    Referrer:
       Identifier: info:sid/ebookco.com:bookreader
    ServiceType:
       Metadata By Value:
           Format: info:ofi/fmt:kev:mtx:sch_svc
           Metadata:
               Abstract: yes

So, this should still seem fairly straightforward, but the hierarchy certainly isn’t evident in the KEV. It’s a good starting point to begin talking about the complexity of working with OpenURL, though, especially if you’re trying to create a service that consumes OpenURL context objects.

Back to the referent metadata. The context object didn’t have to send the data in the “metadata by value” stanza. It could have just sent the identifier “urn:isbn:9027231141” (and note in the above example, it didn’t have an identifier at all). It could also have sent metadata in the Dublin Core format, MARC21, MODS, ONIX or all of the above (the Metadata By Value element is repeatable) if you wanted to make sure your referent could be parsed by the widest range of resolvers. While all of these are bibliographic formats, in Request Transfer Message context objects (which would be used for document delivery, which got me started down this whole path), you would conceivably have one or more of the aforementioned metadata types plus a Request Transfer Profile Referent type that describes the sorts of interlibrary loan-ish types of data that accompany the referent as well as an ISO Holdings Schema metadata element carrying the actual items a library has, their locations and status.

If you have only run across KEVs describing journal articles or books, this may come as a bit of a surprise. Instead of saying the above referent is a book, it becomes important to say that the referent contains a metadata package (as Jonathan Rochkind calls it) that is in this (OpenURL-specific) book format. In this regard, OpenURL is similar to METS. It wraps other metadata documents and defines the relationships between them. It is completely agnostic about the data it is transporting and makes no attempt to define it or format it in any way. The Journal, Book, Patent and Dissertation formats were basically contrived to make compatibility with OpenURL 0.1 easier, but they are not directly associated with OpenURL and could have just as easily been replaced with, say, BibTeX or RIS (although the fact that they were created alongside Z39.88 and are maintained by the same community makes the distinction difficult to see).

What this means, then, is that in order to know anything about a given entity, you also need to know about the metadata format that is being sent about it. And since that metadata could literally be in any format, it means there are a lot of variables that need to be addressed just to know what a thing is.

For the Umlaut, I wrote an OpenURL library for Ruby as a means to parse and create OpenURLs. Needless to say, it was originally written with that naive, KEV-based, mental model (plus some other just completely errant assumptions about how context objects worked) and, because of this, I decided to completely rewrite it. I am still in the process of this, but am struggling with some core architectural concepts and am throwing this out to the larger world as an appeal for ideas or advice.

Overall the design is pretty simple: there is a ContextObject object that contains a hash of the administrative metadata and then attributes (referent, referrer, requester, etc.) that contain Entity objects.

The Entity object has arrays of identifiers, private data and metadata.
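In Ruby, that design might look roughly like the following. This is a sketch of the shape only, not the actual library code; the attribute names are illustrative.

```ruby
# An entity holds three collections, whatever kind of entity it is.
class Entity
  attr_reader :identifiers, :private_data, :metadata

  def initialize
    @identifiers = []
    @private_data = []
    @metadata = []   # one element per metadata package
  end
end

# The context object: administrative data plus the six kinds of entity.
class ContextObject
  attr_reader :admin, :referent, :service_types, :resolvers
  attr_accessor :referring_entity, :referrer, :requester

  def initialize
    @admin = {}               # version, timestamp, identifier (all optional)
    @referent = Entity.new    # exactly one referent is required
    @service_types = []       # zero or more
    @resolvers = []           # zero or more
  end
end

ctx = ContextObject.new
ctx.admin['ctx_ver'] = 'Z39.88-2004'
ctx.referent.identifiers << 'urn:isbn:0262531283'
ctx.referrer = Entity.new
ctx.referrer.identifiers << 'info:sid/ebookco.com:bookreader'
puts ctx.referent.identifiers.inspect
```

The open question is what goes into each entity's metadata array.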

And then this is where I start to run aground.

The original (and current) plan was to populate the metadata array with native metadata objects that are generated by registering metadata classes in a MetadataFactory class. The problem, you see, is that I don’t want to get into the business of having to create classes to parse and access every kind of metadata format that gets approved for Z39.88. For example, Ed Summers’ ruby-marc has already solved the problem of effectively working with MARC in Ruby, so why do I want to reinvent that wheel? The counter argument is, by delegating these responsibilities to third party libraries, there is no consistency of APIs between “metadata packages”. A method used in format A may very well raise an exception (or, worse, overwrite data) in format B.
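To show what I mean by registering metadata classes, here is a minimal sketch of such a factory. The register and build method names, the KevBook struct and the fallback behavior for unknown formats are all assumptions made for illustration, not a settled design.

```ruby
# Hypothetical registry: maps a Z39.88 format identifier to a block that
# builds a native metadata object from the raw payload.
class MetadataFactory
  @builders = {}

  class << self
    def register(format_id, &builder)
      @builders[format_id] = builder
    end

    def build(format_id, raw)
      builder = @builders[format_id]
      # Unknown format: keep the raw payload rather than guess at its shape.
      return { format: format_id, raw: raw } unless builder

      builder.call(raw)
    end
  end
end

# A stand-in "native" class for the KEV book format.
KevBook = Struct.new(:fields)

MetadataFactory.register('info:ofi/fmt:kev:mtx:book') do |raw|
  KevBook.new(raw)
end

book = MetadataFactory.build('info:ofi/fmt:kev:mtx:book', 'genre' => 'book')
puts book.fields['genre']
```

The sketch also shows the consistency problem: build can hand back a KevBook, a plain hash or whatever a third-party library returns, and the caller has to know which.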

There is a secondary problem: third party libraries aren’t going to have any idea that they’re in an OpenURL context object or even know what that is. This means there would have to be some class that handles functionality like XML serialization (since ruby-marc doesn’t know that Z39.88 refers to it as info:ofi/fmt:xml:xsd:MARC21), although this can be handled by the specific metadata factory class. This would also be necessary when parsing an incoming OpenURL since, theoretically, every library could have a different syntax for importing XML, KEVs or whatever other serialization is devised in the future.
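One way to picture that extra class is a thin wrapper that pairs the native object with the format identifier and a serializer. This is only a sketch under my own assumptions: the MetadataPackage name and its interface are hypothetical, and FakeMarcRecord is a stand-in so the example runs on its own (it is not ruby-marc, and the output is not real MARCXML).

```ruby
# Hypothetical wrapper: carries what the third-party object cannot know
# about itself, namely its Z39.88 format identifier and how to serialize it.
class MetadataPackage
  attr_reader :format_id, :native

  def initialize(format_id, native, serializer:)
    @format_id = format_id
    @native = native
    @serializer = serializer   # callable: native object -> XML text
  end

  # The one method every package shares, whatever library sits underneath.
  def to_xml
    @serializer.call(@native)
  end
end

# Stand-in for a record object from some third-party library.
FakeMarcRecord = Struct.new(:title)

PACKAGE = MetadataPackage.new(
  'info:ofi/fmt:xml:xsd:MARC21',
  FakeMarcRecord.new('Minimalist Program'),
  serializer: ->(record) { "<title>#{record.title}</title>" }
)

puts PACKAGE.format_id
puts PACKAGE.to_xml
```

This keeps the serialization consistent, but anything beyond to_xml still means reaching into native and using that library's own API.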

So I’m looking for advice on how to proceed. All ideas welcome.

Posted January 9, 2008 by Ross in coding, OpenURL, ruby

Comments

2 responses to “Objectifying OpenURL”

Jonathan Rochkind

January 9, 2008

“Context objects don’t actually specify any community profile they conform to.”

Is it just me, or is that a mistake in the spec? Oughtn’t they to? Doesn’t it make things that much more confusing that they don’t?

And I didn’t realize that the serializations were OpenURL-wide. I thought each given format had to specify its own serialization. But KEV and XML apply to all formats? Still confused about this. I thought each format specified its serialization(s) somehow, although exactly how I’m still confused about.

I am confused about where and how a serialization is specified, and at what level (format? Community profile? both?).

I have to admit I find this stuff incredibly confusing, despite putting quite a bit of effort toward understanding it. I don’t know if that says good things about adoptability of OpenURL. If nobody can understand it…

Karen Coyle

January 25, 2008

“Context objects don’t actually specify any community profile they conform to.”

I’m also confused, but I often was during the OpenURL discussions. I see no reason to have profiles if you don’t communicate them in the OpenURL. What I understood in the committee was that you define a profile to constrain the context object to a particular set of components (transports, serializations, metadata formats) that a community has agreed on. Why constrain the CO if the recipient isn’t aware of these constraints? Also, why give the community profile an OpenURL identifier if you aren’t going to ever include the identifier in an OpenURL? No, I don’t remember any particular discussion about this, so I hope someone else from the committee chimes in. I doubt if this was an oversight, so someone must have thought it makes sense.

Jonathan, serialization is specified in the registry as a “core” component of the OpenURL “framework.” The only way I make sense of it is to look at the front page of the OpenURL registry and see the list there:

Namespaces
Character Encodings
Serializations
Constraint Languages
ContextObject Formats
Metadata Formats
Transports
Community Profiles

Essentially you need one of each to make up an OpenURL. (You can have multiples of metadata formats.) Think of it as building blocks. Serialization is just one of the building blocks, and if you come up with a new serialization you could potentially combine it with a variety of different components.

What confuses me is the serialization/constraint language combo, since for the two that we have (KEV and XML) those two elements do not seem separable. I would need to see examples of having a serialization that works with more than one constraint language. I suspect that the concept is simple, it’s just the expression of it that is confusing me.
