
Welcome to PIDapalooza 2018...where anything goes...as long as it goes on forever.   

Wednesday, January 24 • 3:30pm - 4:00pm
Unsolved problems with PIDs and PID systems


The PID community has consolidated around a few key concepts: that PIDs are short, URI-compatible strings explicitly registered with a PID system, which maintains a database of such registrations; that PIDs carry some type and amount of associated citation metadata; and that PIDs resolve to URLs. So, are PIDs a solved problem? No! Here are four significant problems that remain unsolved. Solving them may require the PID community to collaborate on new forms of interoperability across PID types and systems, and to provide services beyond simple registration and resolution.

1. Are PIDs being used? How can we tell? That PIDs are being registered and assigned to resources is clear enough, but if a resource is cited or accessed via a local URL or other non-persistent identifier, then any PID it might have serves no purpose. The problem is exacerbated by HTTP redirection, which exposes impermanent URLs to browsers and users and so makes it inevitable that those URLs will be bookmarked and subsequently used. Solving this problem may require work outside PID systems proper, including web crawling and repository log analysis to detect references to resources that could have been made through PIDs but weren't. Within PID systems themselves, it may require additional services, such as reverse lookups (given a resource or URL, what PID(s) have ever been assigned to it?) and a history of URL assignments (what has this PID ever identified?).
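The two in-system services suggested above, reverse lookup and URL-assignment history, amount to keeping two indexes that are only ever appended to. Below is a minimal Python sketch under that assumption; the `PidRegistry` class and its method names are hypothetical and do not describe any existing PID system's API, and a real system would persist these indexes rather than hold them in memory.

```python
class PidRegistry:
    """Hypothetical in-memory PID registry that never forgets a URL assignment."""

    def __init__(self) -> None:
        # PID -> every URL it has ever pointed to, oldest first
        self._history: dict[str, list[str]] = {}
        # URL -> every PID that has ever pointed to it
        self._reverse: dict[str, set[str]] = {}

    def assign(self, pid: str, url: str) -> None:
        """Point `pid` at `url`, appending to its history instead of overwriting."""
        self._history.setdefault(pid, []).append(url)
        self._reverse.setdefault(url, set()).add(pid)

    def resolve(self, pid: str) -> str:
        """Return the current target: the most recent assignment."""
        try:
            return self._history[pid][-1]
        except KeyError:
            raise KeyError(f"unregistered PID: {pid}") from None

    def history(self, pid: str) -> list[str]:
        """What has this PID ever identified?"""
        return list(self._history.get(pid, []))

    def reverse_lookup(self, url: str) -> set[str]:
        """Given a URL, which PID(s) have ever been assigned to it?"""
        return set(self._reverse.get(url, set()))
```

Because `assign` appends instead of overwriting, `reverse_lookup` still finds the PID behind a bookmarked URL that the PID no longer resolves to, which is exactly the case where a reference could have gone through the PID but didn't. The identifiers and URLs used to exercise it are made up.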

2. Who owns an identifier? Who may modify it? When a repository system registers a PID for a resource it manages, the repository is likely the "owner" in the sense that, should the repository move, it is the repository's responsibility to update the PID. But is it the sole owner? What if the resource's author decides to move or copy the resource to a different repository? Who has rights to the identifier then: the old repository, the new one, or the creator? Use cases involving institutions, libraries, departments, journals, and journal editors only add complexity. Some internet services, notably Wikipedia, have eschewed formal ownership models for resources, emphasizing instead the maintenance of history and the ability to undo changes. Should PID systems take the same approach?
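One way to read the Wikipedia-style alternative is a PID record with no ownership check at all: any party may edit it, every edit is appended to a log, and "undo" is itself a new edit rather than a deletion. The sketch below assumes exactly that model; `OpenPidRecord`, `Change`, and the editor names are hypothetical, and a real system would still need authentication to record who made each change.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Change:
    editor: str              # whoever made the edit; no ownership check is applied
    old_url: Optional[str]   # None for the very first assignment
    new_url: str


class OpenPidRecord:
    """Hypothetical ownerless PID record: anyone may edit, nothing is ever erased."""

    def __init__(self, pid: str) -> None:
        self.pid = pid
        self.log: list[Change] = []

    @property
    def url(self) -> Optional[str]:
        """Current target is simply the result of the latest change."""
        return self.log[-1].new_url if self.log else None

    def edit(self, editor: str, new_url: str) -> None:
        self.log.append(Change(editor, self.url, new_url))

    def undo(self, editor: str) -> None:
        """Revert the latest change by appending a new one, so history stays intact."""
        if not self.log or self.log[-1].old_url is None:
            raise ValueError("nothing to undo")
        self.edit(editor, self.log[-1].old_url)
```

In this model the question "who owns the PID?" is replaced by "who changed it, and can the change be reverted?", which the log answers for every party involved.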

3. How can identifier aliasing, if not avoided altogether, at least be better handled? It is common de facto practice for repository systems to assign new PIDs to newly ingested resources regardless of any previously assigned PIDs. The problem is particularly rife in the life sciences, where resources are often co-registered in multiple databases and receive an identifier from each. Having multiple, equivalent PIDs for a single resource sows confusion and dilutes citation metrics. At best, current PID systems record additional identifiers as "alternative identifiers," but this is far from sufficient and far from universal practice. Solving this problem may require that PID systems maintain better and more comprehensive records of identifier aliasing (including aliasing across PID types and PID systems) and support operations across whole "equivalence sets" of identifiers.
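Recording aliasing as "equivalence sets" maps naturally onto a union-find (disjoint-set) structure: declaring two identifiers equivalent merges their sets, and any member can be used to retrieve the whole set. The following is a minimal sketch of that idea, not a description of how any current PID system works; `EquivalenceSets`, `total_citations`, the identifiers, and the citation counts are all hypothetical.

```python
class EquivalenceSets:
    """Hypothetical alias registry: union-find over identifiers of any PID type."""

    def __init__(self) -> None:
        self._parent: dict[str, str] = {}

    def _find(self, pid: str) -> str:
        """Return the set representative, registering unseen identifiers on the fly."""
        self._parent.setdefault(pid, pid)
        root = pid
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: point every node on the path directly at the root.
        while pid != root:
            nxt = self._parent[pid]
            self._parent[pid] = root
            pid = nxt
        return root

    def declare_alias(self, a: str, b: str) -> None:
        """Record that `a` and `b` identify the same resource (merges their sets)."""
        ra, rb = self._find(a), self._find(b)
        if ra != rb:
            self._parent[rb] = ra

    def equivalents(self, pid: str) -> set[str]:
        """Every identifier known to be equivalent to `pid`, including itself."""
        root = self._find(pid)
        return {p for p in list(self._parent) if self._find(p) == root}


def total_citations(sets: EquivalenceSets, counts: dict[str, int], pid: str) -> int:
    """An operation over a whole equivalence set: citations summed across aliases."""
    return sum(counts.get(p, 0) for p in sets.equivalents(pid))
```

Merging is transitive, so aliases declared separately (say, a DOI with an ARK, and later the ARK with a Handle) end up in one set even if no single system ever saw all three together. `equivalents` scans every known identifier, which is acceptable for a sketch but would need a per-set index at scale.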

4. When a PID system itself moves or fails, what needs to persist? The awkward but ultimately successful handover of the PURL system from OCLC to the Internet Archive should serve as a wake-up call to the PID community. While PID systems provide well-defined means of accommodating the movement of individual PID-identified resources over time, moving an entire PID system remains very much an ad hoc process. Must all the services and concepts of the old PID system be preserved? If not, which can safely be discarded? Note that the PURL system had some unique characteristics in terms of user roles and resolution options. Solving this problem may require adopting standard models of PIDs and new forms of interoperability.


Greg Janée

University of California at Santa Barbara

Wednesday January 24, 2018 3:30pm - 4:00pm CET
Stage 2