Linked data sparked strong interest in the data mining community

The first Workshop on Data Mining from Linked Data, DMoLD’13, was held on September 23, 2013, in Prague, co-located with the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML/PKDD 2013), a premier scientific event in the data mining field. The workshop was co-organized by LOD2 researchers from UEP (Prague) and I2G (Poznań), together with Claudia d’Amato from the University of Bari.

Despite its modest format, a half-day (afternoon) workshop with only six technical presentations, the interest of ECML/PKDD participants surpassed all expectations. The number of pre-registered participants was the highest of all twelve workshops, and during the sessions the large room was consistently at least half full, with over 40 listeners and discussants.

A highlight of the afternoon was the invited talk "Exploiting Linked Open Data as Background Knowledge in Data Mining" by Heiko Paulheim from the University of Mannheim. He presented the recently released RapidMiner LOD Extension and demonstrated how such technology makes it possible to exploit rich background knowledge (for both horizontal and vertical enrichment of the original data) while outsourcing its maintenance to the LOD infrastructure, for example, to the regular DBpedia updates extracted from Wikipedia. The audience was interested, among other things, in applying the presented approach in bioinformatics and in comparing or combining it with more traditional relational data mining methods such as Inductive Logic Programming.

Nearly half of the workshop was devoted to the outcomes of the Linked Data Mining Challenge. Its participants tackled data mining tasks, such as predicting the number of bidders, on linked data describing public contracts (the LOD2 project has an entire work package devoted to managing this kind of data).

A clear lesson from the workshop was that the data mining community is eager to apply its tools to novel, richly structured types of data such as linked data. However, its pre-processing components do not yet sufficiently handle the syntactic peculiarities of RDF, which currently limits active engagement. Efforts from the Semantic Web side, such as the RapidMiner extension project mentioned above, could help remove this obstacle and bring the two communities even closer together.

