Serbian statistical data as Linked Open Data

National statistical offices across the world already possess an abundance of structured data, both in databases and in files of various formats, but lack the means for exposing, sharing, and interlinking this data on the Semantic Web. Statistical data underpins many of the mash-ups and visualizations we see on the Web, and forms the foundation for policy prediction, planning, and adjustment. Making this data available as Linked Open Data would allow for third-party annotation and linking, flexible combination of data across statistical and non-statistical datasets, new ways of manipulating the data, and a machine-readable means of publication that supports an out-of-the-box web API for programmatic access.

The Statistical Office of the Republic of Serbia (SORS) has shown strong interest in publishing statistical data in a web-friendly format so that it can be linked and combined with related information. The SORS is harmonizing its standards, classifications, and methodologies with the system of official statistics of the EU and therefore coordinates its activities in accordance with the European Statistics Code of Practice. The Pupin team, recognizing the need to align its efforts with those of the other European statistical offices, opted for the solution that is both the most beneficial and the most future-proof. The Serbian statistical data is represented using the Data Cube vocabulary, which is compatible with the cube model that underlies SDMX (Statistical Data and Metadata eXchange), an ISO standard used by the U.S. Federal Reserve Board, the European Central Bank, Eurostat, the WHO, the IMF, and the World Bank. The United Nations and the Organisation for Economic Co-operation and Development expect national statistical offices across the world to use SDMX to allow aggregation across national boundaries. The domain semantics, dataset metadata, and other crucial information needed for statistical data exchange are described using SDMX-RDF, an extension vocabulary that provides a layer on top of Data Cube.
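To make the Data Cube representation more concrete, the following minimal sketch serializes a single observation as N-Triples. The `qb:`, `sdmx-dimension:`, and `sdmx-measure:` namespaces are the published vocabulary namespaces; everything under `http://example.org/stat/`, the function name `observation_triples`, and the sample identifiers are hypothetical placeholders rather than the URIs or code actually used for the SORS data.

```python
# Sketch only: one statistical observation expressed with the Data Cube
# vocabulary. All http://example.org/ URIs are hypothetical placeholders.

QB = "http://purl.org/linked-data/cube#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SDMX_DIM = "http://purl.org/linked-data/sdmx/2009/dimension#"
SDMX_MEASURE = "http://purl.org/linked-data/sdmx/2009/measure#"
XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal"
EX = "http://example.org/stat/"  # assumed base namespace


def iri(value):
    """Wrap a URI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def observation_triples(dataset_id, obs_id, area_code, period, value):
    """Return N-Triples lines describing one qb:Observation."""
    obs = iri(f"{EX}observation/{obs_id}")
    dataset = iri(f"{EX}dataset/{dataset_id}")
    area = iri(f"{EX}code/geo/{area_code}")
    time = iri(f"{EX}code/time/{period}")
    literal = f'"{value}"^^{iri(XSD_DECIMAL)}'
    return [
        f"{obs} {iri(RDF_TYPE)} {iri(QB + 'Observation')} .",
        f"{obs} {iri(QB + 'dataSet')} {dataset} .",
        f"{obs} {iri(SDMX_DIM + 'refArea')} {area} .",
        f"{obs} {iri(SDMX_DIM + 'refPeriod')} {time} .",
        f"{obs} {iri(SDMX_MEASURE + 'obsValue')} {literal} .",
    ]


if __name__ == "__main__":
    # Made-up sample values, for illustration only.
    for line in observation_triples("population", "o1", "A1", "2011", "10.5"):
        print(line)
```

Each observation links to its dataset and carries one value per dimension (here, area and period) plus a measured value, which is the shape the cube model expects.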

Eurostat requires the use of standard classification schemes such as NACE, COICOP, and PRODCOM. However, because they are obliged to follow national legislation, most national statistical offices also use local coding systems applicable in their own countries. Since well-defined code lists pave the way for easy dataset comparison, interlinking, discovery, and merging, a number of concept schemes were designed to represent the domains involved, ranging from time and geographical concepts to currencies and statistical indicators. The SORS statistical data in XML form is passed as input to an XSLT processor and transformed into RDF using the aforementioned vocabularies and concept schemes.
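The XML-to-RDF step can be illustrated with a small sketch. Note that it swaps the XSLT processor described above for Python's standard `xml.etree.ElementTree`, purely to show the kind of mapping involved. The input layout (`dataset` and `obs` elements with `area`, `period`, and `value` attributes), the `ex:` namespace, and the function name `xml_to_turtle` are assumptions made for illustration and do not reflect the actual SORS XML format or stylesheet.

```python
import xml.etree.ElementTree as ET

EX = "http://example.org/stat/"  # hypothetical base namespace

PREFIXES = (
    "@prefix qb: <http://purl.org/linked-data/cube#> .\n"
    "@prefix sdmx-dimension: <http://purl.org/linked-data/sdmx/2009/dimension#> .\n"
    "@prefix sdmx-measure: <http://purl.org/linked-data/sdmx/2009/measure#> .\n"
    f"@prefix ex: <{EX}> .\n"
)

# Made-up sample input; the element and attribute names are assumptions.
SAMPLE_XML = """<dataset id="population">
  <obs area="A1" period="2011" value="10.5"/>
  <obs area="A2" period="2011" value="42"/>
</dataset>"""


def xml_to_turtle(xml_text):
    """Map each <obs> element to a qb:Observation written in Turtle."""
    root = ET.fromstring(xml_text)
    dataset_id = root.get("id")
    blocks = [PREFIXES]
    for index, obs in enumerate(root.iter("obs"), start=1):
        # Local codes become references into (hypothetical) concept schemes.
        blocks.append(
            f"ex:{dataset_id}-obs{index} a qb:Observation ;\n"
            f"    qb:dataSet ex:{dataset_id} ;\n"
            f"    sdmx-dimension:refArea ex:geo-{obs.get('area')} ;\n"
            f"    sdmx-dimension:refPeriod ex:time-{obs.get('period')} ;\n"
            f"    sdmx-measure:obsValue {obs.get('value')} .\n"
        )
    return "\n".join(blocks)


if __name__ == "__main__":
    print(xml_to_turtle(SAMPLE_XML))
```

The point of the sketch is the code-list lookup: a local code in the source file is not copied as a string but turned into a URI that identifies a concept, so two datasets that use the same concept scheme can be joined on it.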

A Serbian CKAN open government data catalog instance was set up as part of the PublicData.eu data hub network, which immediately increased the instance's visibility and reachability. Over 80 RDF datasets have been cataloged in the local metadata repository, and periodic “harvesting” at an international level has been scheduled, increasing transparency and improving public service delivery while enriching the Linked Data cloud.
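As a rough illustration of how a client or harvester might read such a catalog, the sketch below builds a `package_search` request URL and picks RDF resources out of a response. It assumes the instance exposes the CKAN Action API (version 3), which may differ from the API version the catalog originally ran; the base URL, the `RDF_FORMATS` set, and the sample response are hypothetical, and no network request is made.

```python
from urllib.parse import urlencode

# Format labels treated as RDF here; an assumption, since CKAN stores
# resource formats as free text.
RDF_FORMATS = {"RDF", "RDF/XML", "TURTLE", "TTL", "N-TRIPLES"}


def package_search_url(base_url, query, rows=10):
    """Build a CKAN Action API package_search URL."""
    params = urlencode({"q": query, "rows": rows})
    return f"{base_url.rstrip('/')}/api/3/action/package_search?{params}"


def rdf_resources(response):
    """Return (dataset name, resource URL) pairs for RDF resources."""
    if not response.get("success"):
        return []
    found = []
    for package in response["result"]["results"]:
        for resource in package.get("resources", []):
            fmt = (resource.get("format") or "").upper()
            if fmt in RDF_FORMATS:
                found.append((package["name"], resource["url"]))
    return found


if __name__ == "__main__":
    # Hypothetical catalog address and a hand-written sample response.
    print(package_search_url("http://example.org/", "statistics"))
    sample = {
        "success": True,
        "result": {
            "count": 1,
            "results": [
                {
                    "name": "sample-dataset",
                    "resources": [
                        {"format": "rdf", "url": "http://example.org/a.rdf"},
                        {"format": "CSV", "url": "http://example.org/a.csv"},
                    ],
                }
            ],
        },
    }
    print(rdf_resources(sample))
```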

The results of the successful cooperation between the Mihajlo Pupin Institute and the Statistical Office of the Republic of Serbia have also been announced on the SORS official website. Moreover, a paper titled “Publishing Statistical Data As Linked Open Data” was presented on the 1st of March at Kopaonik, Serbia, at the 2nd International Conference on Information Society Technology (ICIST 2012).

