Linked data in libraries: global standards and local practices – interview with Esther Guggenheim and Ido Ivri

Ido Ivri and Esther Guggenheim

Esther Guggenheim and Ido Ivri work for the National Library of Israel (NLI). During 2013 they collaborated with LOD2 members on a joint PUBLINK project that focused on converting the NLI’s extension of
Library of Congress Subject Headings to RDF. At the end of the year, as this effort was coming to a close, the LOD2 project's Jindřich Mynarz had a chance to ask Esther and Ido about their experiences with the project and to discuss library linked open data in general.

Jindřich Mynarz: There is a widely established practice of sharing catalog records among libraries and especially in library networks. Collaborative cataloging enables libraries to reuse and improve either bibliographic or authority data provided to them by partner institutions. Given that this practice is already widely spread, do you think that it lowers the barriers in transition to publishing library data as open data on the Web?

Esther Guggenheim: Library networks have been around for a long time. They were established to allow users to check the availability of books and other items in multiple libraries in one place. They also allow libraries to download records created by other institutions, but each library maintains its own database, where it has full control over its records.

On the other hand, the idea of sharing authority data, i.e. thesauri of subject headings or information on personal and institutional names, as linked data and using it to enrich local data is relatively new. We are aware of several libraries involved in publishing their own data or cooperating with various data providers to work on publishing library vocabularies as linked open data. There are also cooperative efforts such as the Virtual International Authority File (VIAF).

However, there are hardly any use cases of enrichment of local data from external sources. In addition to technical limitations, this may be due to reluctance to rely on vocabularies maintained by other institutions. Libraries are very conscious of data control, and even though there are international standards for cataloging rules and metadata, there is a lot of localization. It is possible that use cases demonstrating successful reuse of national authority data and national bibliographies across institutions within a country could lower this barrier, especially if it could be shown that this lowers maintenance costs for participating institutions.

A major prerequisite for making this work efficiently is to improve the integration of external sources into library software via linked data technology. During the past year, engaged people from various libraries worldwide have initiated cooperation with the industry on this issue. On our side, we are very pleased to participate in the LOD user group working with our software provider.

Jindřich Mynarz: Library of Congress Subject Headings (LCSH) is a well-known instance of a library dataset that is available as linked open data. It can also be considered a prime example of a pre-coordinated vocabulary, in which the possible combinations of individual concepts are pre-defined by the vocabulary authors. The semantic web, on the other hand, builds on the open-world assumption, which presumes that no single agent has the complete knowledge necessary to exhaust the possible combinations of concepts in a pre-coordinated vocabulary. Which of these approaches do you foresee prevailing over time, now that libraries are adopting much of the semantic web stack?

Esther Guggenheim: The concept of the semantic web implemented as linked data would gain a lot if there were more cross-references among the published vocabularies. For example, LCSH is used beyond the Anglo-Saxon domain. Adaptations and translations of parts of the scheme into various languages have been developed, usually on an as-needed basis. Other vocabularies have been linked to LCSH.

The nature of linked data is to work with "building blocks", and obviously using the elements from which subject headings are assembled, rather than pre-coordinated headings, would allow for more flexibility. In fact, pre-coordination seems somewhat contrary to the concepts used in the linked data world.

The National Library of Israel did not simply translate LCSH into Hebrew for use in our catalog. For Judaica and Israeliana, our subject heading systems, some headings were adapted from the English version of LCSH to suit our needs. This results in semantic difficulties when attempting to link back to LCSH. It would be most useful if we could link elements instead of pre-coordinated headings.
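The difference between linking whole pre-coordinated headings and linking their elements can be sketched in a few lines of Python. This is an illustrative sketch only: the heading, the element-level mapping table and all URIs below are hypothetical placeholders, not real NLI or LCSH authority records.

```python
# A hypothetical element-to-concept mapping table: each building block
# of a heading points at its own (placeholder) concept URI. In a real
# system these would be LCSH or NLI authority URIs.
ELEMENT_MAP = {
    "Jerusalem": "http://example.org/lcsh/jerusalem",
    "History": "http://example.org/lcsh/history",
}

def map_heading(precoordinated):
    """Split a pre-coordinated heading into its elements and look up
    a target concept for each; None marks an element with no match."""
    elements = precoordinated.split("--")
    return [(e, ELEMENT_MAP.get(e)) for e in elements]

# The whole heading "Jerusalem--History" may have no exact external
# counterpart, but each of its elements can still be linked.
for element, target in map_heading("Jerusalem--History"):
    print(element, "->", target)
```

In SKOS terms, a locally adapted heading as a whole often supports only a loose mapping (e.g. skos:closeMatch), while element-level links can be made precise, which is why linking elements is more attractive for a translated and adapted vocabulary.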

LCSH was designed for card cataloging, and at the time pre-coordination was useful. With the introduction of computerized cataloging and retrieval there is no longer a need for it, and the disadvantages of pre-coordination have been discussed in the library community for years.

Apparently, there was a discussion about using elements rather than full headings before the publication of the RDF version of LCSH. We would support revisiting this discussion.

Jindřich Mynarz: Using many of the tools developed in libraries, such as the aforementioned LCSH, can be a daunting task without previous training and expertise. These tools seem to empower power users (i.e. librarians) rather than lay users.
Is the National Library of Israel now focused more on developing advanced search tools for experts or approachable tools for laymen? Would you agree that encapsulating complex tools in easy-to-use interfaces is a responsibility of software, such as semantic web technologies?

Ido Ivri: In order for libraries, and even more so national libraries, to remain a core service of an informed society, they have to be aware of the breadth of possible users, and not aim exclusively at "power users" — be it librarians or experienced researchers. A national library has to provide access to cultural heritage materials to teachers, students, parents and senior citizens. For that to happen, the National Library of Israel faces perhaps its biggest challenge: condensing all its richness of resources of many types, sizes and eras into one search box. To achieve this, we have to rely on encapsulation of our metadata. We believe that offering intuitive interfaces (such as timelines, maps, tag clouds and others) is crucial for empowering a multitude of users to browse, search and discover our collections. It is therefore the responsibility of software providers, but also of the library itself, to strive to find effective ways of providing access to our collections in all their richness.

Moreover, we assume that power users would like to stay as close to the original data and data structures as possible, and there is a bit less need to mediate the display and ordering of data for them. This use case requires less effort on interfacing, but an emphasis on providing uninterrupted, reliable services that may be reused by some of the more advanced users.

Jindřich Mynarz: As libraries open up their data, opportunities for involving members of the public appear. One of them is maintaining data collaboratively, in a manner similar to Wikipedia. Do you think that outsourcing authority data to sources such as DBpedia or Wikidata is a reasonable option?

Esther Guggenheim: Provided there is an efficient review mechanism, there is certainly room to engage readers in enriching library and archival metadata. Nevertheless, authority files are used by a large community as controlled vocabularies, so this data is less suitable for such crowdsourcing.

However, as cultural institutions are digitizing large parts of their collections and making them available online, one of their challenges is to keep up with cataloging and describing the digitized objects to make them more accessible. This is a place where the public could help by assigning subject headings, preferably by using terms from a published controlled vocabulary.
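One way to keep such crowd contributions usable, as suggested above, is to validate submitted tags against a published controlled vocabulary before they enter the catalog. The sketch below is a minimal illustration under assumed names: the vocabulary, its URIs and the `accept_tags` helper are hypothetical, and a real system would load the vocabulary from a published SKOS file or an authority service.

```python
# A hypothetical controlled vocabulary, loaded here as a simple
# label-to-URI dictionary. The labels and URIs are placeholders.
VOCABULARY = {
    "photographs": "http://example.org/vocab/photographs",
    "jerusalem": "http://example.org/vocab/jerusalem",
}

def accept_tags(submitted):
    """Keep only tags whose label appears in the controlled
    vocabulary; return (accepted URIs, rejected labels)."""
    accepted, rejected = [], []
    for tag in submitted:
        uri = VOCABULARY.get(tag.strip().lower())
        if uri:
            accepted.append(uri)
        else:
            rejected.append(tag)
    return accepted, rejected

accepted, rejected = accept_tags(["Jerusalem", "old postcards"])
print(accepted)   # matched vocabulary URIs
print(rejected)   # free-text labels left for librarian review
```

Rejected labels need not be discarded: routing them to library staff for review is exactly the kind of "efficient review mechanism" mentioned above.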

Moreover, for some types of collections, such as historic photographs, members of the public may recognize places, events and persons, which the library staff may not know. Another model is to work on special collections with target groups.

Ido Ivri: In the context of authority data, we suspect it is more beneficial to employ crowd wisdom (as well as automated tools) to point out where the existing authority data may be inaccurate or incorrect and allow dedicated staff to fix it.

Jindřich Mynarz: The collaboration between the NLI and the LOD2 project started after your application for the PUBLINK consultancy programme. What convinced you to use linked open data in a library setting? After your initial experience, what would you say to people from other libraries considering opening their data using semantic web technologies?

Esther Guggenheim: Linked open data and other semantic web technologies are not sufficiently known in the local library community. The National Library of Israel sees it as its responsibility to play a pioneering role in the community.

This project was one of the first "hands-on" experiences we were involved in. Although we were aware that projects like this are prone to reveal gaps in the local data, it was eye-opening to experience the consequences. We hope we will be able to refine the results, pursue implementation of additional features and be involved in further aspects of data enrichment.

Ido Ivri: Our message to other librarians who are enthusiastic about linked open data would be to think of publishing their metadata using semantic web technologies as an extension of their mission of providing access to knowledge in a reusable manner for the benefit of their users. We assume that the more published data is linked to, used and reused, the better and more up-to-date it would become.

Short bios of the interviewees

Esther Guggenheim is Europeana and Metadata Coordinator at the National Library of Israel. She engages in various projects aiming to share library resources. Esther holds a Master of Library and Information Science (MLS) degree from the Hebrew University of Jerusalem.

Ido Ivri is manager of business development and innovation under the Library's Executive Director. Ido holds a BSc in computer science from the Hebrew University of Jerusalem, and is studying for an MBA in the Recanati Executive program at Tel Aviv University.
