Aman SHAKYA
DOCTOR OF PHILOSOPHY
Creating and Sharing
Structured Semantic Web Contents
through the Social Web
Department of Informatics,
School of Multidisciplinary Sciences,
The Graduate University for Advanced Studies (SOKENDAI)
2009 (School Year)
A dissertation submitted to
The Department of Informatics,
School of Multidisciplinary Sciences,
The Graduate University for Advanced Studies (SOKENDAI)
in partial fulfillment of the requirements for
the degree of Doctor of Philosophy
Hideaki Takeda, National Institute of Informatics, SOKENDAI
Nigel Collier, National Institute of Informatics, SOKENDAI
Kenro Aihara, National Institute of Informatics, SOKENDAI
Asanobu Kitamoto, National Institute of Informatics, SOKENDAI
Takahira Yamaguchi, Keio University
Acknowledgements
I would like to thank my advisor, Prof. Hideaki Takeda, for his constant guidance and support throughout my studies and research activities and in producing this thesis. I would also like to convey special thanks to my sub-advisors and all members of the committee for their help in improving and enhancing the thesis by providing constructive suggestions. I would like to convey special thanks to Prof. Vilas Wuwongse for constantly supporting my study by providing valuable suggestions and co-authoring several papers with me. I am also grateful to Dr. Ikki Ohmukai for his academic and technical assistance, server machine set-ups and co-authoring papers with me. I am especially thankful for his major role in the SocioBiblog project. I should also acknowledge Dr. Hendry Muljadi for the useful discussions and guidance, especially in increasing my knowledge about semantic wikis. I also thank Dr. Hideyuki Tan for his technical assistance with my work. Special thanks go to Assoc. Prof. Ryutaro Ichise for his constructive advice on improving the research work. My sincere thanks also go to Assoc. Prof. Yutaka Matsuo from the University of Tokyo for his interest in our work and for providing the opportunity to present our work at the Biz-model conference.
I would like to express sincere gratitude to the Semantic Web Company, Vienna and CEO Andreas Blumauer for recognizing our work with an award in the Linked Data Vision competition and for offering helpful advice. I am also grateful to IADIS for honoring our work with the best paper award in the area of Web 2.0. I also acknowledge Harry Halpin from the University of Edinburgh for showing special interest in our work and helping us with constructive discussions.
I should acknowledge the efforts of Dr. Kei Kurakawa for bringing my research work into real practical application for Japanese universities. I would like to thank Mr. Sanjil Shrestha from the Asian Institute of Technology, Thailand for using our work in a significant real-world project. I also thank Dr. Yessy Arvelyna and friends from the Tokyo International Exchange Center for accepting our experimental system for real use and providing us with useful feedback. I would like to thank all friends in different parts of the world who used our online systems and provided us with vital feedback and suggestions.
I would like to convey special thanks to all the participants in my experiments for contributing their precious time and providing valuable feedback. I would like to thank Karina Shakya for her special help in my experiments. I am also grateful to all my colleagues and friends for their continuous support and fruitful discussions on my research and related areas. I would also like to express my sincere gratitude to the National Institute of Informatics for providing the environment, resources and funding indispensable for carrying out research studies, disseminating our research results and connecting us with researchers worldwide.
Finally, I would like to thank my family and friends for their understanding, patience and support for my years of study abroad.
Preface
Information must be shared to be used to its full potential. It should be published with understandable semantics so that it can be used by others. It should also be accessible and properly disseminated. The Semantic Web provides structure and semantics to data, making it machine-understandable. The social web has made it easy for people to publish information online. It also enables collaboration and facilitates information dissemination by connecting people. These two areas complement each other to form a social Semantic Web.
This is a highly promising direction but poses some major challenges.
The first challenge is to have people publish structured data on the social Semantic Web.
The specific problems here are as follows. Systems for publishing structured data on the Semantic Web are complex and have a considerable learning curve. It is also difficult for people to contribute because of the strict constraints such systems impose. The second challenge is to form the models, so-called ontologies, required to structure data with understandable semantics. People have a wide variety of data to share, but there are few ontologies, and creating ontologies is difficult. The specific problems addressed by the thesis are as follows. It is difficult to create perfect concept definitions to model things. It is not easy to cover the evolving requirements of all people. Moreover, different people may have multiple conceptualizations for the same thing due to different perspectives and contexts.
It is not always possible to reach consensus over conceptualizations, and the collaborative process is itself difficult. Finally, proper dissemination of structured data on the web is also challenging. Information dissemination mostly happens in a centralized and static way.
There is a lack of flow of relevant structured information among people.
The thesis proposes solutions to these specific problems. It proposes enabling people to contribute structured data by providing an easy-to-use social platform. It proposes allowing users to define their own concepts and freely contribute various types of data through a flexible and relaxed interface. Concepts contributed by people are partial definitions from their own perspective, and multiple conceptualizations are allowed. These can be consolidated to form a rich unified conceptualization. This is made possible by semi-automatic techniques for data integration and schema alignment supported by the community. A formalization of concept consolidation is also presented in the thesis. This serves as a loose collaborative approach that does not enforce consensus or direct interaction. Further, concepts can be semi-automatically grouped and organized by similarity. As a result of consolidation and grouping, informal lightweight ontologies gradually emerge in a bottom-up way. A system called StYLiD has been implemented to realize the proposed approach.
The thesis also proposes a decentralized approach for disseminating structured data in communities. Relevant information can be aggregated through socially linked sources. This has been demonstrated experimentally. By combining the capabilities of publishing and aggregating, proper flow of information can be maintained in the community. A semantic blogging system called SocioBiblog has been implemented to demonstrate this for the bibliographic domain.
Experimental evaluations have been done to test the usability of StYLiD. Experimental studies have also been done to observe the multiple conceptualizations made by people and to verify that such conceptualizations can be consolidated. Methods used for concept consolidation and grouping have also been experimentally tested with some real data. The applicability and significance of the proposed approach have also been demonstrated by some real practical applications.
List of Figures
Figure 1. Level of expressiveness of ontologies.
Figure 2. The Semantic Web stack.
Figure 3. A more recent version of the Semantic Web stack.
Figure 4. Classification of works on structured content creation in the social Semantic Web.
Figure 5. Linking blog posts and ontology by semantic annotation.
Figure 6. Long tail of information domains.
Figure 7. Single global ontology.
Figure 8. Multiple local ontologies.
Figure 9. Hybrid approach with shared vocabulary.
Figure 10. Existing collaborative knowledge creation approaches.
Figure 11. Proposed collaborative knowledge creation approach.
Figure 12. Block diagram of the proposed approach.
Figure 13. Concept consolidation.
Figure 14. Formalization of concept consolidation.
Figure 15. Information sharing social platform scenario.
Figure 16. Integrated semantic portal scenario.
Figure 17. StYLiD screenshot.
Figure 18. Interface to create a new concept.
Figure 19. Interface to modify and reuse an existing concept.
Figure 20. Interface shown when defining a concept that already exists.
Figure 21. Importing attributes from existing concept.
Figure 22. Concept Cloud in StYLiD.
Figure 23. Personal concept collection.
Figure 24. Selecting concept to input instance data.
Figure 25. Interface to enter instance data.
Figure 26. Pop-up list of suggested values.
Figure 27. Backlinks to a data instance in StYLiD.
Figure 28. Annotation with Wikipedia contents using DBpedia linked data.
Figure 29. Consolidated concept cloud.
Figure 30. Aligning the attributes of multiple concepts.
Figure 31. Unified table view of instances.
Figure 32. Interface for semi-automatic grouping and consolidation of concepts.
Figure 33. Named concept groups.
Figure 34. Interface for browsing grouped concepts.
Figure 35. Visualization of similar concept groupings using Cytoscape.
Figure 36. Structured search interface.
Figure 37. SPARQL query interface.
Figure 38. Providing operations on embedded data using custom Operator script.
Figure 39. Implementation architecture.
Figure 40. Decentralized publishing and aggregation with SocioBiblog.
Figure 41. Aggregation of information through social links.
Figure 42. Integration and mixing of information feeds.
Figure 43. Average co-author similarity (AvgSim1).
Figure 44. Max. co-author similarity (MaxSim1).
Figure 45. Average co-authors' co-author similarity (AvgSim2).
Figure 46. Maximum co-authors' co-author similarity (MaxSim2).
Figure 47. Difference between co-author similarity and keyword similarity (AvgSim1 - Sim0).
Figure 48. Comparison of co-author similarity (AvgSim1) and keyword search baseline (Sim0) (N = 5).
Figure 49. Comparison of co-author similarity (AvgSim1) and keyword similarity (Sim0) (N = 10).
Figure 50. Example scenario for SocioBiblog.
Figure 51. System architecture of SocioBiblog.
Figure 52. Publishing and aggregation on the current web with SocioBiblog.
Figure 53. SocioBiblog interface.
Figure 54. Blog this interface.
Figure 55. Searching aggregated publications.
Figure 56. Histogram of the number of users who have defined concepts.
Figure 57. Histogram of instance counts.
Figure 58. A data instance from Osaka University.
Figure 59. A data instance from Nagoya University.
Figure 60. Alignment of concepts from two universities.
Figure 61. Uniform table view of integrated data from the university directories.
Figure 62. The TIEC musical community website.
Figure 63. View showing list of artists covered.
Figure 64. Screenshot of www.
Figure 65. A screenshot of the DMS system at AIT.
Figure 66. The concept explorer/selector interface.
Figure 67. Structured data input interface for the DMS.
Figure 68. Auto-complete to select the staff.
Figure 69. Country selector widget.
Figure 70. Date selector widget.
Figure 71. Example semantic annotation of blog entries.
Figure 72. Example scenario for OntoBlog.
Figure 73. A part of a computer department ontology.
Figure 74. Semantic navigation.
Figure 75. Semantic aggregation.
List of Tables
Table 1. Analysis of existing collaborative knowledge base creation systems.
Table 2. Concept consolidation example.
Table 3. Statistics about randomly chosen authors.
Table 4. Total SUS scores given by participants.