
Exploiting Knowledge Representation for Pattern


Mariângela Vanzin, Karin Becker

Pontifícia Universidade Católica do Rio Grande do Sul – PUCRS

Av. Ipiranga, 6681, Porto Alegre, Brazil

{mvanzin, kbecker}@inf.pucrs.br

Abstract. Web Usage Mining (WUM) is the application of data mining techniques over web server logs in order to extract navigation usage patterns. Semantic Web Usage Mining aims at combining the Semantic Web and WUM. The main goal of Semantic WUM is to improve the process and the results of WUM by exploiting the new semantic structure in the Web. Pattern analysis is a critical phase in WUM, for two main reasons: a) mining algorithms yield a huge number of patterns; b) there is a significant semantic gap between URLs and events performed by users. This paper discusses the use of ontologies available on the Semantic Web to support the interpretation of web usage sequential patterns. The functionality is targeted at supporting the comprehension of patterns, as well as the identification of potentially interesting ones through interactive pattern rummaging.

1 Introduction

Web Mining aims at discovering insights about Web resources and their usage [1][2]. Web Usage Mining (WUM) is the application of data mining techniques to extract navigation usage patterns from records of page requests made by visitors of a Web site. Access patterns mined from Web logs can represent useful knowledge in practice. They can help in improving the design of Web sites, analyzing users' reactions and motivations, building adaptive Web sites, and improving site content, among others. The comprehension of mined patterns is difficult due to the primarily syntactical nature of web data [3]. Thus, the formalization of the semantics of Web resources and navigation behavior is increasingly required.

The Semantic Web is a proposal to enrich the Web with machine-processable information to better support users in their tasks [4]. Semantic Web Mining aims at combining these two research areas, the Semantic Web and Web Mining [3, 5, 6]. The main goal is, on the one hand, to improve the results of Web Mining by exploiting the new semantic structures available in the Web and, on the other hand, to make use of Web Mining for building up the Semantic Web. Recently, many approaches have started exploiting the semantic structures stored in the ontology layer [7] of the Semantic Web architecture.

The WUM process is divided into three generic phases [1]: preprocessing, pattern discovery and pattern analysis. Pattern analysis remains a key issue in the area of WUM. Typically, mining techniques (e.g. association, sequence) yield a huge number of patterns, and most of them are useless, incomprehensible or uninteresting to users [8]. Due to the large number of patterns, users have difficulty identifying the ones that are interesting with regard to the domain.

This paper discusses the use of ontologies, possibly available on the Semantic Web, to support pattern interpretation. Ontologies are exploited for addressing three interrelated problems: a) to represent patterns in a more intuitive form, b) to identify patterns related to some subject of interest, and c) to identify potentially interesting patterns through concept-oriented, interactive pattern rummaging. Other features complement this approach, such as pattern grouping and visual pattern representation.

The remainder of this paper is structured as follows. Section 2 presents the proposed ontology-based functionality targeted at supporting the analysis phase. It describes the ontology properties and their use for conceptual pattern representation, pattern rummaging, pattern retrieval and concept merging. Section 3 describes a usage scenario. Section 4 compares related work with the proposed approach. Conclusions and future work are addressed in Section 5.

2 An Ontology-based Approach for Pattern Analysis

Given the output of the pattern discovery phase, the goal of the pattern analysis phase is to eliminate irrelevant patterns and to extract the interesting ones, i.e. those that constitute knowledge. Pattern analysis is not an easy task, however, because: a) the number of patterns yielded by mining algorithms can easily exceed a human user's capacity to identify interesting results; b) the output of Web mining algorithms is not suitable for human interpretation; and c) in a WUM process the user frequently does not know what he is looking for, i.e. in most cases the search for interesting patterns is exploratory and does not involve hypothesis verification.

Our approach makes use of ontologies, possibly available in the Ontology Layer of the Semantic Web, to support the interpretation of web usage sequential patterns. Ontologies are exploited for addressing three interrelated problems: a) to represent patterns in a more intuitive form, thus reducing the gap between URLs and site events, b) to identify patterns that are related to some subject of interest, and c) to identify potentially interesting patterns through concept-oriented interactive pattern rummaging. Other features complement this approach, such as the grouping of patterns by different similarity criteria and visual pattern representation and manipulation. The remainder of this section describes the underlying assumptions for developing the pattern analysis, the ontology structure, as well as the functionality proposed to support the pattern analysis activity. The next section illustrates the use of the functionality using the prototype currently under implementation.

3.1 WUM Process Assumptions

Our approach is targeted at the pattern analysis phase. The preprocessing phase takes a set of URLs as its data source and processes them using typical activities such as data cleaning, user and session identification, and path completion [1]. Preprocessing does not assume any particular data enrichment. If available, a semantic log composed of records with formal semantics based on an ontology underlying the site could be used as well (e.g. [6]).

Because we are interested in usage patterns, we assume the application of the sequence technique in the pattern discovery phase, using the algorithm of [9]. As in [10], we assume that the mining algorithm is run with a low minimum support threshold. Higher values can make a mining algorithm run faster, but at the risk of reducing the usefulness of the data mining results. The basic idea is to accept the execution time required for mining, as well as the huge number of patterns returned. The pattern analysis functionality described in the remainder of this section is then used to focus on a subset of patterns, to interpret their meaning, and to identify the potentially interesting ones.
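To make the role of the support threshold concrete, the following minimal sketch computes the support of a sequential pattern as the fraction of sessions that contain its items in order, with gaps allowed. It is not the algorithm of [9]; the session data, URL paths and function names are hypothetical and serve only as an illustration.

```python
from typing import List, Sequence


def contains_in_order(session: Sequence[str], pattern: Sequence[str]) -> bool:
    """True if the pattern's items occur in the session in the same order (gaps allowed)."""
    remaining = iter(session)
    # `item in remaining` advances the iterator, so order is enforced.
    return all(item in remaining for item in pattern)


def support(sessions: List[Sequence[str]], pattern: Sequence[str]) -> float:
    """Fraction of sessions that contain the pattern as a subsequence."""
    if not sessions:
        return 0.0
    hits = sum(contains_in_order(s, pattern) for s in sessions)
    return hits / len(sessions)


# Hypothetical preprocessed sessions (one list of URLs per user session).
sessions = [
    ["/env.html", "/glossary.html", "/load.php", "/send.php"],
    ["/env.html", "/glossary.html", "/load.php", "/cancel.php"],
    ["/glossary.html", "/env.html"],
    ["/env.html", "/news.html", "/glossary.html"],
]

pattern = ["/env.html", "/glossary.html"]
min_support = 0.5  # hypothetical threshold; lower values keep more patterns

s = support(sessions, pattern)
print(s, s >= min_support)  # 0.75 True
```

With a lower `min_support`, more candidate patterns pass the test, which is why the analysis phase must then cope with a much larger rule set.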

3.2 Ontology Representation

Ontologies available on the Semantic Web can be used to represent the events of a web site, which can be roughly categorized as service (e.g. buying, finding) and content (e.g. Hamlet) [5]. Thus, they can be used to associate meaning with web pages and with user actions over pages. Our approach exploits the semantics of the pages visited along users' paths, where meaningful application events are mapped into domain knowledge. The domain events are represented at two levels: conceptual and physical. The conceptual level is composed of an ontology that specifies concepts and the relationships among these concepts. At the physical level, events are represented by URLs. The conceptual level corresponds to the ontology layer in the Semantic Web architecture.

Ontologies represent relationships among concepts, which give those concepts meaning. Three types of relationship are considered in this work: generalization/specialization, a powerful abstraction for sharing similarities among classes while preserving their differences; aggregation (part-whole and part-of relationships), in which classes representing the components are associated with the class representing the entire assembly; and binary relationships, representing any other type of relationship that connects two concepts.

URLs are then mapped into ontology concepts according to two dimensions: service and content. A URL can be mapped into one service, one content or both. If a URL is mapped into both a service and a content, the predominant dimension must be defined. The same ontology concept can be used in the mapping of various URLs. Not all URLs need to be mapped (e.g. auxiliary pages [1]). Figure 1 describes the structure of the ontology using a UML class diagram.



The task of mapping URLs into ontology concepts can be laborious, but it pays off by greatly simplifying the interpretation activity, as described in the remainder of this section. The future Semantic Web will certainly contribute to reducing this effort [11], in that the creation of the respective ontology layer will be part of any site design.
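The mapping rules of this section (one service, one content, or both, with a predominant dimension required in the last case) can be sketched as a small data structure. This is only an illustrative sketch: the class name `UrlMapping`, the URL paths and the dictionary `URL_MAP` are hypothetical, and the concept names are borrowed from the examples used later in the paper.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class UrlMapping:
    """Concepts associated with one URL along the service and content dimensions."""
    service: Optional[str] = None
    content: Optional[str] = None
    predominant: Optional[str] = None  # "service" or "content"

    def __post_init__(self) -> None:
        if self.service is None and self.content is None:
            raise ValueError("a mapped URL needs a service, a content, or both")
        if (self.service is not None and self.content is not None
                and self.predominant not in ("service", "content")):
            raise ValueError("predominant dimension required when both are set")

    def primary_concept(self) -> str:
        """Concept of the predominant dimension, or the only concept available."""
        if self.service is not None and self.content is not None:
            return self.service if self.predominant == "service" else self.content
        return self.service if self.service is not None else self.content


# Hypothetical mapping; URLs that are absent (e.g. auxiliary pages) stay unmapped.
URL_MAP: Dict[str, UrlMapping] = {
    "/tasks/send.php": UrlMapping(service="Send-File"),
    "/glossary.html": UrlMapping(service="Visualize-Information",
                                 content="Glossary", predominant="content"),
    "/env.html": UrlMapping(content="Virtual-Environment"),
    # The same concept may be reused for several URLs.
    "/env2.html": UrlMapping(content="Virtual-Environment"),
}

print(URL_MAP["/glossary.html"].primary_concept())  # Glossary
print(URL_MAP.get("/banner.gif"))                   # None (unmapped URL)
```

Validating the predominant dimension at construction time keeps incomplete mappings from reaching the analysis phase, which is one possible design choice among others.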

3.3 Pattern Interpretation Functionalities

Visual Conceptual Pattern Representation.

Patterns yielded by the sequential mining algorithm are sequences of URLs, which are often hard to interpret. In order to reduce the semantic gap between URLs and the events performed by users in the Web site, our approach exploits the semantics of the pages visited by users. Thus, the sequential patterns presented to the analyst are not composed of URLs, but rather of the primitive concepts of the ontology into which they were mapped.

Considering the ontology illustrated in Figure 2, a pattern of the form URL1 → URL2 is displayed using the concepts that represent the corresponding primitive events in the site, such as Send-File → Glossary. This pattern representation provides the analyst with a more intuitive meaning of the pattern. By exploring the dimensions, the analyst can interpret the patterns according to his interests. For instance, the pattern URL1 → URL2 can be represented as Send-File → Visualize-Information if the analyst is interested in the service dimension, or as Send-File → Glossary if both dimensions are of interest. According to the content dimension, the pattern URL2 → URL3 can be interpreted as Glossary → Virtual-Environment.
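The dimension-based reading of a pattern can be sketched as follows. The mapping below is hypothetical and merely chosen to be consistent with the examples in the text. Two behaviours are assumptions of this sketch rather than statements from the paper: when both dimensions are of interest the predominant one is shown, and a URL with no concept in the requested dimension falls back to the concept it does have.

```python
from typing import Dict, List

# Hypothetical mapping consistent with the examples in the text.
MAPPING: Dict[str, Dict[str, str]] = {
    "URL1": {"service": "Send-File"},
    "URL2": {"service": "Visualize-Information", "content": "Glossary",
             "predominant": "content"},
    "URL3": {"content": "Virtual-Environment"},
}


def concept_for(url: str, dimension: str) -> str:
    """Concept shown for a URL under 'service', 'content' or 'both'."""
    if dimension not in ("service", "content", "both"):
        raise ValueError(f"unknown dimension: {dimension}")
    entry = MAPPING.get(url)
    if entry is None:
        return url  # assumption: unmapped URLs are displayed unchanged
    if dimension == "both":
        # assumption: show the predominant dimension, else the only one present
        dimension = entry.get("predominant") or (
            "service" if "service" in entry else "content")
    if dimension in entry:
        return entry[dimension]
    # assumption: fall back to the only concept available for this URL
    return entry.get("service") or entry["content"]


def represent(pattern: List[str], dimension: str = "both") -> str:
    """Render a URL pattern as a sequence of ontology concepts."""
    return " → ".join(concept_for(url, dimension) for url in pattern)


print(represent(["URL1", "URL2"], "service"))  # Send-File → Visualize-Information
print(represent(["URL1", "URL2"], "both"))     # Send-File → Glossary
print(represent(["URL2", "URL3"], "content"))  # Glossary → Virtual-Environment
```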

The generalization/specialization and aggregation relationships can be explored to provide various abstraction levels over the same pattern. For instance, the pattern Send-File → Virtual-Environment can also be represented as Task-Submission → Virtual-Environment, Task-Submission → Distance-Education and so on.

Interactive Pattern Rummaging.

The interactive pattern rummaging functionality allows the ontology to be exploited in different ways to identify relevant patterns. The analyst can visualize the patterns at different abstraction levels, exploring the generalization and aggregation relationships through operations similar to "roll-up" and "drill-down" in OLAP (On-line Analytical Processing). The roll-up operation represents a concept by the more abstract concept to which it is linked through a generalization or aggregation relationship. The drill-down operation explores these relationships in the opposite direction.
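A minimal sketch of the two operations over a concept hierarchy is given below. The hierarchy is hypothetical: the concept names come from the examples in this paper, but the relationship type recorded for each edge (in particular whether the file operations are specializations of Task-Submission) is an assumption made for illustration.

```python
from typing import Dict, List, Tuple

# child concept -> (more abstract concept, relationship type); hypothetical edges
PARENT: Dict[str, Tuple[str, str]] = {
    "Send-File": ("Task-Submission", "generalization"),
    "Load-File": ("Task-Submission", "generalization"),
    "Cancel": ("Task-Submission", "generalization"),
    "Virtual-Environment": ("Distance-Education", "aggregation"),
}


def roll_up(concept: str) -> str:
    """Replace a concept by its more abstract concept, if it has one."""
    parent = PARENT.get(concept)
    return parent[0] if parent else concept


def drill_down(concept: str) -> List[str]:
    """List the concepts directly below the given concept."""
    return sorted(child for child, (parent, _) in PARENT.items()
                  if parent == concept)


def roll_up_element(pattern: List[str], index: int) -> List[str]:
    """Roll up a single element of a pattern, leaving the others untouched."""
    rolled = list(pattern)
    rolled[index] = roll_up(rolled[index])
    return rolled


pattern = ["Virtual-Environment", "Glossary", "Load-File", "Send-File"]
print(roll_up_element(pattern, 0))
# ['Distance-Education', 'Glossary', 'Load-File', 'Send-File']
print(drill_down("Task-Submission"))
# ['Cancel', 'Load-File', 'Send-File']
```

Rolling up one element at a time mirrors the interactive use described in the text, where the analyst picks the concept to abstract.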

Roll-up and drill-down operations can be used for two different purposes: to better understand the events represented by a pattern, and to obtain abstract patterns, i.e. patterns that actually represent a set of patterns. Figure 3 illustrates the use of roll-up operations over individual elements of a pattern for understanding their meaning through more abstract concepts. This task is called pattern comprehension. In this example, the original pattern reveals that users access some page about the subject "virtual environment", access the glossary and then load and send a specific file. By rolling up the concept Virtual-Environment, the user understands that it is part of the distance education content, which possibly motivates the users to look for other definitions available in the glossary. He also understands that loading and sending are activities related to the submission of an assignment.

With the same purpose of pattern comprehension, binary relationships can be used to complement the information about the pattern events by showing other related concepts on demand. The user selects a concept and asks for the relationships in which it participates. For instance, the Glossary concept has a binary relationship with the concept Learning-Process, as represented in Figure 2. Thus, the analyst can understand that the glossary contains terms about the learning process.


Another use of the roll-up operation is to obtain an abstract pattern, i.e. a pattern that actually represents a set of patterns. For that purpose, the user substitutes one or more pattern elements with their corresponding abstract concepts, as depicted in Figure 4. In this example, the user is interested in patterns where a group of users access a page about virtual environment, then the glossary, followed by the use of two task submission activities (e.g. load, visualize, cancel and send a file, according to the ontology). Notice that in doing so, the support of the abstract pattern must be recalculated. For instance, the abstract pattern may match both Virtual-Environment → Glossary → Load-File → Send-File and Virtual-Environment → Glossary → Load-File → Cancel, which are found in the rule set. Our approach for recalculating the support is inspired by [10], and it is not discussed here due to space limitations.
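The matching of an abstract pattern against the rule set can be sketched as below. Since the recalculation method is not described here, the support function shown is only a naive stand-in and not the approach of this paper or of [10]: it recounts over hypothetical concept-level sessions instead of adding up the supports of the matched patterns, because a single session may support several of them and would otherwise be counted more than once. All names and data are hypothetical.

```python
from typing import Dict, List, Optional

# child concept -> more abstract concept (hypothetical hierarchy)
PARENT: Dict[str, str] = {
    "Send-File": "Task-Submission",
    "Load-File": "Task-Submission",
    "Cancel": "Task-Submission",
    "Virtual-Environment": "Distance-Education",
}


def generalizes(abstract: str, concept: Optional[str]) -> bool:
    """True if `abstract` equals `concept` or is one of its ancestors."""
    while concept is not None:
        if concept == abstract:
            return True
        concept = PARENT.get(concept)
    return False


def matches(abstract_pattern: List[str], concrete_pattern: List[str]) -> bool:
    """Element-wise match of a concrete pattern against an abstract one."""
    return (len(abstract_pattern) == len(concrete_pattern)
            and all(generalizes(a, c)
                    for a, c in zip(abstract_pattern, concrete_pattern)))


def session_supports(session: List[str], abstract_pattern: List[str]) -> bool:
    """True if the session contains, in order, a concept for each abstract element."""
    remaining = iter(session)
    return all(any(generalizes(a, c) for c in remaining)
               for a in abstract_pattern)


def naive_support(sessions: List[List[str]], abstract_pattern: List[str]) -> float:
    """Fraction of sessions supporting the abstract pattern (naive recount)."""
    if not sessions:
        return 0.0
    return sum(session_supports(s, abstract_pattern) for s in sessions) / len(sessions)


abstract = ["Virtual-Environment", "Glossary", "Task-Submission", "Task-Submission"]
rule_set = [
    ["Virtual-Environment", "Glossary", "Load-File", "Send-File"],
    ["Virtual-Environment", "Glossary", "Load-File", "Cancel"],
    ["Glossary", "Virtual-Environment"],
]
sessions = [
    ["Virtual-Environment", "Glossary", "Load-File", "Send-File"],
    ["Virtual-Environment", "Glossary", "Load-File", "Cancel"],
    # supports both concrete patterns, but is counted once
    ["Virtual-Environment", "Glossary", "Load-File", "Cancel", "Load-File", "Send-File"],
    ["Glossary", "Virtual-Environment"],
]

print([p for p in rule_set if matches(abstract, p)])  # the first two patterns
print(naive_support(sessions, abstract))              # 0.75
```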


The roll-up and drill-down operations allow users to analyze the rule set provided by the mining algorithm in an exploratory manner, based on the events captured by the ontology. For instance, the user starts with the pattern illustrated in Figure 3 and, after having the insight that loading and sending a file are tasks related to the submission of assignments, he rolls up to the abstract pattern of Figure 4, such that all patterns that match this abstraction can be found in the rule set. Then, by drilling down, he becomes aware of the use of other distinct task submission services that support the abstract pattern. By analyzing the support of these rules, he may realize that the number of students who canceled the submission after loading the file is greater than the number of students who actually sent their assignments. He may then conclude that the assignment submission service of the site is not intuitive for the students, and that it should be redesigned or a help/tutorial should be provided.

Pattern Retrieval.
