Tsinghua Science and Technology

Towards Efficient SPARQL Query Processing on RDF Data

Tsinghua Science and Technology. 2010, (6): 613-622. DOI: 10.1016/S1007-0214(10)70108-5

Efficient support for querying large-scale Resource Description Framework (RDF) triples plays an important role in Semantic Web data management. This paper presents an efficient RDF query engine for evaluating SPARQL queries, in which an inverted index structure is used to index the RDF triples. A set of operators on the inverted index was developed for query optimization and evaluation. A main-tree-shaped optimization algorithm then transforms a SPARQL query graph into the optimal query plan, effectively reducing the search space for determining the optimal join order. The optimizer collects a set of RDF statistics to estimate the execution cost of a query plan. Finally, the optimal query plan is evaluated with the defined operators to answer the given SPARQL query. Extensive tests on both synthetic and real datasets containing up to 100 million triples show that this approach answers most queries within 1 s and is extremely efficient and scalable compared with the best previous state-of-the-art RDF stores.
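To make the idea of answering triple patterns from an inverted index concrete, here is a minimal sketch under stated assumptions. The class and method names (`InvertedTripleIndex`, `match`, `query`) are hypothetical, the index layout is a plain posting list per (position, term) pair, and patterns are joined in the order given. The paper's actual operators and its cost-based, main-tree-shaped join ordering are not reproduced here.

```python
from collections import defaultdict


class InvertedTripleIndex:
    """Toy inverted index over RDF triples (hypothetical, not the paper's design).

    Each (position, term) pair, with position 0/1/2 = subject/predicate/object,
    maps to the set of ids of the triples containing that term in that position.
    """

    def __init__(self):
        self.triples = []
        self.postings = defaultdict(set)

    def add(self, s, p, o):
        tid = len(self.triples)
        self.triples.append((s, p, o))
        for pos, term in enumerate((s, p, o)):
            self.postings[(pos, term)].add(tid)

    @staticmethod
    def is_var(term):
        return term.startswith("?")

    def match(self, pattern, binding):
        """Yield bindings extending `binding` that satisfy one triple pattern."""
        # Replace variables already bound by earlier patterns with their values.
        resolved = [binding.get(t, t) if self.is_var(t) else t for t in pattern]
        # Intersect the posting lists of all constant positions.
        ids = None
        for pos, term in enumerate(resolved):
            if not self.is_var(term):
                plist = self.postings.get((pos, term), set())
                ids = plist if ids is None else ids & plist
        if ids is None:  # all positions are variables: scan every triple
            ids = range(len(self.triples))
        for tid in sorted(ids):
            new = dict(binding)
            consistent = True
            for term, value in zip(resolved, self.triples[tid]):
                if self.is_var(term) and new.setdefault(term, value) != value:
                    consistent = False  # same variable would get two values
                    break
            if consistent:
                yield new

    def query(self, patterns):
        """Evaluate a basic graph pattern by joining patterns in the given order."""
        results = [{}]
        for pattern in patterns:
            results = [b2 for b in results for b2 in self.match(pattern, b)]
        return results
```

In this sketch the join order is fixed by the caller; choosing that order from collected statistics is exactly the part the paper's optimizer addresses.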

Finding Data Tractable Description Logics for Computing a Minimum Cost Diagnosis Based on ABox Decomposition

Tsinghua Science and Technology. 2010, (6): 623-632. DOI: 10.1016/S1007-0214(10)70109-7

Ontology diagnosis, a well-known approach for handling inconsistencies in a description logic (DL) based ontology, computes a diagnosis of the ontology, i.e., a minimal subset of axioms whose removal restores consistency. However, ontology diagnosis is computationally hard, especially computing a minimum cost diagnosis (MCD), i.e., a diagnosis whose axioms have the minimum total removal cost. This paper addresses this problem by finding DLs that are data tractable for computing an MCD, i.e., DLs in which an MCD can be computed in time polynomial in the size of the ABox of the given ontology. ABox decomposition is used to find a necessary and sufficient condition for identifying such data tractable DLs under the unique name assumption (UNA) among all fragments of SHIN that are at least as expressive as DL-Lite_core without inverse roles. The most expressive data tractable DL identified is SHIN without inverse roles or qualified existential restrictions.
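The definition of an MCD can be illustrated with a brute-force search. This sketch assumes the classical consistency-based view in which the minimal inconsistent subsets of axioms (conflicts) are already known, so a diagnosis is a minimal hitting set of those conflicts and an MCD is a cheapest one. The function name and inputs are hypothetical, and the search is exponential in the number of axioms; it shows what is being computed, not the polynomial-time method the paper identifies for data tractable DLs.

```python
from itertools import combinations


def minimum_cost_diagnosis(axioms, conflicts):
    """Brute-force minimum cost diagnosis over known conflicts.

    axioms: dict axiom name -> positive removal cost.
    conflicts: list of sets, each a minimal inconsistent subset of axioms.
    A candidate must remove at least one axiom from every conflict (a hitting
    set). Subsets are tried in order of increasing size and only strictly
    cheaper ones replace the current best, so the result is also minimal.
    """
    names = sorted(axioms)
    best, best_cost = None, float("inf")
    for k in range(len(names) + 1):
        for subset in combinations(names, k):
            chosen = set(subset)
            if all(chosen & conflict for conflict in conflicts):
                cost = sum(axioms[a] for a in chosen)
                if cost < best_cost:
                    best, best_cost = chosen, cost
    return best, best_cost
```

For example, with costs A=3, B=1, C=1 and conflicts {A, B} and {A, C}, removing A alone costs 3, while removing B and C costs 2, so {B, C} is the MCD.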

Semantic-Oriented Knowledge Transfer for Review Rating

Tsinghua Science and Technology. 2010, (6): 633-641. DOI: 10.1016/S1007-0214(10)70110-3

With the rapid development of Web 2.0, more and more people are sharing their opinions about online products, producing a large amount of product review data. However, it is difficult to compare products directly by their ratings, because ratings are often on different scales or missing altogether. This paper addresses the following question: given textual reviews, how can we automatically determine the semantic orientations of reviewers and then rank different items? Because many reviews lack ratings, it is difficult to collect sufficient rating data for certain product categories (e.g., movies), whereas rating data are easier to find in a different but related category (e.g., books). We refer to this problem as transfer rating and aim to train a better ranking model for items in the target category with the help of rating data from the related category. Specifically, we developed a ranking-oriented method called TRate that determines semantic orientations and ranks items, formulated as a regularized algorithm for rating knowledge transfer that bridges the two related categories via a shared latent semantic space. Tests on the Epinion dataset verified its effectiveness.

A Novel Ranking Framework for Linked Data from Relational Databases

Tsinghua Science and Technology. 2010, (6): 642-649. DOI: 10.1016/S1007-0214(10)70111-5

This paper investigates the problem of ranking linked data from relational databases using a ranking framework. The core idea is to group relationships by their types, then rank the types, and finally rank the instances attached to each type. The ranking criteria at each step consider the mapping rules and the heterogeneous graph structure of the data web. Tests on a social network dataset show that the linked data ranking is effective and easier for people to understand. The approach benefits from using relationships deduced from mapping rules based on table schemas and from distinguishing relationship types, which results in better ranking and visualization of the linked data.
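The three-step structure (group by relationship type, rank types, rank instances within each type) can be sketched as follows. The scoring criteria here are placeholders chosen for illustration only: a type scores by how many links of that type the focus resource has, and an instance scores by its degree in the whole graph. The paper's criteria, which draw on mapping rules and the heterogeneous graph structure, are not reproduced, and `rank_linked_data` is a hypothetical name.

```python
from collections import Counter, defaultdict


def rank_linked_data(edges, focus):
    """Group a resource's relationships by type, rank the types, then instances.

    edges: iterable of (subject, relation_type, object) triples.
    focus: the resource whose links are being ranked.
    Placeholder criteria: types by number of links to the focus resource,
    instances by global degree; ties are broken alphabetically.
    """
    degree = Counter()
    by_type = defaultdict(set)
    for s, r, o in edges:
        degree[s] += 1
        degree[o] += 1
        if s == focus:
            by_type[r].add(o)
        elif o == focus:
            by_type[r].add(s)
    ranked = []
    for rtype in sorted(by_type, key=lambda t: (-len(by_type[t]), t)):
        instances = sorted(by_type[rtype], key=lambda x: (-degree[x], x))
        ranked.append((rtype, instances))
    return ranked
```

Because the output is grouped by type first, a viewer can show one ranked list per relationship type, which matches the abstract's point that typed grouping makes the ranking easier to read.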

DLP Learning from Uncertain Data

Tsinghua Science and Technology. 2010, (6): 650-656. DOI: 10.1016/S1007-0214(10)70112-7

Description logic programs (DLP) are an expressive but tractable subset of OWL. This paper analyzes the important but under-researched problem of learning DLP from uncertain data. Current studies have rarely exploited the plentiful uncertain data populating the Semantic Web. The proposed algorithm handles uncertain data in an inductive logic programming framework by modifying the performance evaluation criteria: a pseudo-log-likelihood based measure is used to evaluate the performance of different literals under uncertainty. Experiments on two datasets demonstrate that the approach can automatically learn a rule set from uncertain data with acceptable accuracy.

Ontology-Driven Mashup Auto-Completion on a Data API Network

Tsinghua Science and Technology. 2010, (6): 657-667. DOI: 10.1016/S1007-0214(10)70113-9

Building data mashups is complicated and error-prone, because it requires not only finding suitable APIs but also combining them appropriately to obtain the desired result. This paper describes an ontology-driven mashup auto-completion approach for a data API network to facilitate this task. First, a microformats-based ontology was defined to describe the attributes and activities of the data APIs. A semantic Bayesian network (sBN) and a semantic graph template were used to predict links on the Semantic Web and to construct a data API network, denoted N_p. Performance is further improved by a semi-supervised learning method that uses both labeled and unlabeled data. This network is then used to build an ontology-driven mashup auto-completion system that helps users build mashups by providing three kinds of recommendations. Tests show that the approach achieves a precision_p of about 80%, a recall_p of about 60%, and an F_0.5 of about 70% for predicting links between APIs. Compared with the API network N_e composed of links that already exist on the current Web, N_p contains more links, including links that should exist but do not. The ontology-driven mashup auto-completion system gives a much better recall_r and discounted cumulative gain (DCG) on N_p than on N_e. The tests suggest that the approach gives users more room for creativity by constructing the API network through predicting mashup APIs rather than relying only on existing links on the Web.
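The evaluation reports F_0.5 and DCG; the standard textbook definitions are sketched below for reference. The abstract does not state which DCG variant was used, so this sketch assumes the common form in which the item at rank i contributes rel_i / log2(i + 1).

```python
import math


def f_beta(precision, recall, beta=0.5):
    """F-measure; beta < 1 (e.g. F_0.5) weights precision more than recall."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def dcg(relevances):
    """Discounted cumulative gain: sum of rel_i / log2(i + 1), ranks i from 1."""
    return sum(rel / math.log2(i + 1) for i, rel in enumerate(relevances, start=1))
```

Ranking relevant recommendations earlier raises DCG: `dcg([1, 0, 1])` is 1.5, whereas moving the first relevant item down to rank 2 gives a smaller value.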

Disambiguating Authors by Pairwise Classification

Tsinghua Science and Technology. 2010, (6): 668-677. DOI: 10.1016/S1007-0214(10)70114-0

Name ambiguity is a critical problem in many applications, particularly in online bibliography systems such as DBLP, ACM, and CiteSeerX. Despite many studies, the problem remains unresolved and is becoming even more serious, especially with the increasing popularity of Web 2.0. This paper addresses the problem in the academic researcher social network ArnetMiner with a supervised method that exploits all available side information, including co-authors, organizations, paper citations, title similarity, authors' homepages, web constraints, and user feedback. The method automatically determines the number of distinct persons, k. Tests on the researcher social network with up to 100 different names show that the method significantly outperforms a baseline method that uses an unsupervised attribute-augmented graph clustering algorithm.
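One common way to turn pairwise "same author?" decisions into clusters, and thereby obtain k without fixing it in advance, is to take the connected components of the predicted same-person graph. This sketch assumes that approach and uses a union-find structure. The pairwise classifier is passed in as a function and is a hypothetical stand-in for a trained model; the toy rule in the usage below (papers sharing a co-author) is for illustration only and is not the paper's feature set.

```python
def cluster_authors(papers, same_person):
    """Group papers that share an ambiguous author name into persons.

    papers: list of paper records (any objects).
    same_person: hypothetical pairwise classifier, returns True when two papers
    are predicted to be written by the same person.
    Returns (k, clusters), where clusters are the connected components of the
    predicted same-person graph, given as lists of paper indices.
    """
    parent = list(range(len(papers)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for i in range(len(papers)):
        for j in range(i + 1, len(papers)):
            if same_person(papers[i], papers[j]):
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(papers)):
        groups.setdefault(find(i), []).append(i)
    clusters = sorted(groups.values())
    return len(clusters), clusters
```

A caveat of transitive closure is that a single false-positive pair can merge two different people, which is one reason richer features and user feedback matter.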

Efficient Composition of Semantic Web Services with End-to-End QoS Optimization

Tsinghua Science and Technology. 2010, (6): 678-686. DOI: 10.1016/S1007-0214(10)70115-2

The efficiency of QoS-aware service composition is important, since most service composition problems are known to be NP-hard. With the growing number of web services, service composition can be viewed as a decision problem of selecting services and/or execution plans that satisfy the user's end-to-end QoS requirements (e.g., response time and throughput). Composite services with the same functionality may have different execution plans, which may yield different end-to-end QoS. This paper presents a model combining semantic data-links and QoS, which leads to an efficient approach for automatically constructing a composite service with optimal end-to-end QoS. The approach uses a greedy algorithm to select both the services and the execution plans of composite services. Empirical and theoretical analyses show that its time complexity is O(mn²) for a repository with n services and an ontology with m concepts. Moreover, the running time grows linearly when an index is used to search services in the repository. Tests with a repository of 20 000 services and an ontology of 300 000 concepts show that the algorithm significantly outperforms existing algorithms in composition efficiency while achieving optimal end-to-end QoS.
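A much simplified greedy selection helps show how end-to-end QoS aggregates. This sketch assumes a purely sequential plan of abstract tasks, where response times add up and end-to-end throughput is the minimum over the chosen services. It ignores the semantic data-links and the choice among execution plans that the paper's model handles, and all task and service names are hypothetical.

```python
def greedy_compose(tasks, candidates):
    """Greedy service selection for a purely sequential composite service.

    tasks: ordered abstract tasks, e.g. ["geocode", "weather"] (hypothetical).
    candidates: task -> list of (service_name, response_time_ms, throughput).
    Assumed aggregation for a sequence: response times add up, and the
    end-to-end throughput is the minimum throughput of the chosen services.
    """
    plan, total_rt, min_tp = [], 0, float("inf")
    for task in tasks:
        # Fastest service first; ties broken by higher throughput, then by name.
        name, rt, tp = min(candidates[task], key=lambda c: (c[1], -c[2], c[0]))
        plan.append(name)
        total_rt += rt
        min_tp = min(min_tp, tp)
    return plan, total_rt, min_tp
```

Because response time is additive along a sequence, choosing the fastest service per task minimizes it; the same choice can lower throughput (in the usage below, the faster geocoder has lower throughput), which is the kind of trade-off a full QoS model must weigh.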

Reasoning with Inconsistent Ontologies

Tsinghua Science and Technology. 2010, (6): 687-691. DOI: 10.1016/S1007-0214(10)70116-4

Reasoning with inconsistent ontologies involves using an inconsistency reasoner to obtain meaningful answers from inconsistent ontologies. This paper introduces an improved inconsistency reasoner that selects consistent subsets using minimal inconsistent sets and a resolution method, to improve the run-time performance of the reasoning process. A minimal inconsistent set contains a minimal explanation for the inconsistency of a given ontology, so minimal inconsistent sets can replace the consistency checks that existing approaches execute frequently. When subsets of the inconsistent ontology are selected, only formulas that can be directly or indirectly resolved with the negation of the query formula are chosen, because only those formulas affect the reasoner's conclusions. This significantly reduces the complexity of the reasoning process. Tests show that the run-time performance of the inconsistency reasoner is significantly improved.
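The selection criterion (keep formulas that can be resolved, directly or indirectly, with the negated query) can be sketched on propositional clauses. This is an assumption for illustration: real ontology axioms are DL formulas, and the paper's combination with minimal inconsistent sets is not shown. Literals are strings, with a leading `~` marking negation, and the function names are hypothetical.

```python
def negate(literal):
    """Complement of a propositional literal written as 'p' or '~p'."""
    return literal[1:] if literal.startswith("~") else "~" + literal


def select_relevant(clauses, query):
    """Select clauses linked to the negated query by chains of resolution steps.

    clauses: iterable of frozensets of literals (a propositional stand-in for
    the ontology's formulas). A clause is selected when it contains the
    complement of a literal already reachable from the negated query, i.e. it
    could be resolved with it directly or indirectly. Clauses that never share
    a complementary literal with this chain are left out.
    """
    reachable = {negate(query)}
    remaining = list(clauses)
    selected = []
    changed = True
    while changed:
        changed = False
        for clause in list(remaining):
            if any(negate(lit) in reachable for lit in clause):
                selected.append(clause)
                remaining.remove(clause)
                reachable |= clause
                changed = True
    return selected
```

In the usage below, the knowledge base is itself inconsistent (birds fly, penguins do not, and Tweety is both a bird and a penguin); for the query `flies`, the unrelated clause `rainy` is never selected.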

Closed World Reasoning for OWL2 with NBox

Tsinghua Science and Technology. 2010, (6): 692-701. DOI: 10.1016/S1007-0214(10)70117-6

This paper describes the problem of description logic (DL) reasoning with a partially closed world. The issue is addressed by extending the syntax of the DL SROIQ with an NBox, which specifies the predicates to be closed; extending the semantics with negation as failure; reducing closed world reasoning to incremental reasoning on classical DL ontologies; and applying syntactic approximation to improve reasoning performance. Compared with the existing DBox approach, which corresponds to a relational database, the NBox approach supports deduction on closed concepts and roles. In addition, approximate reasoning can reduce the reasoning complexity from N2EXPTIME-complete to PTIME-complete while preserving the correctness of reasoning for ontologies with certain properties.
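The difference between closed and open predicates can be shown on a toy knowledge base of atomic concept assertions and atomic subsumptions. In this sketch, an assumption rather than the paper's reduction to incremental DL reasoning, a concept listed in the NBox is queried with negation as failure, while an open concept keeps the open-world reading. The toy KB has no negative assertions, so an open concept can only yield False or unknown (`None`). All names are hypothetical.

```python
def saturate(facts, subsumptions):
    """Forward-chain atomic subsumptions (sub, sup) over (concept, individual) facts."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for sub, sup in subsumptions:
            for concept, ind in list(derived):
                if concept == sub and (sup, ind) not in derived:
                    derived.add((sup, ind))
                    changed = True
    return derived


def ask_not(concept, ind, facts, subsumptions, nbox):
    """Answer the query 'not concept(ind)'.

    Returns False if concept(ind) is entailed. Otherwise, for a concept closed
    by the NBox, negation as failure returns True; for an open concept the
    answer is None (unknown), since this toy KB has no negative facts.
    """
    if (concept, ind) in saturate(facts, subsumptions):
        return False
    return True if concept in nbox else None
```

Note that closing `Employee` still lets deduction apply: a fact derived through a subsumption counts as entailed before negation as failure is consulted, which mirrors the abstract's point that the NBox supports deduction on closed predicates.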

An Empirical Study of Unsupervised Sentiment Classification of Chinese Reviews

Tsinghua Science and Technology. 2010, (6): 702-708. DOI: 10.1016/S1007-0214(10)70118-8

This paper is an empirical study of unsupervised sentiment classification of Chinese reviews. The focus is on ways to improve unsupervised sentiment classification performance given the limited sentiment resources available in Chinese. On the one hand, all available Chinese sentiment lexicons, both individually and in combination, are evaluated under the proposed framework. On the other hand, domain-dependent sentiment noise words are identified using unlabeled data and removed to improve classification performance. To the best of our knowledge, this is the first such attempt. Experiments on three open datasets in two domains show that the proposed algorithm for removing sentiment noise words can significantly improve classification performance.
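A lexicon-based classifier with a noise-word filter can be sketched in a few lines. The noise detection rule here is a hypothetical heuristic, not the paper's algorithm: a lexicon word that appears in more than a given fraction of unlabeled in-domain reviews is treated as domain noise. English placeholder tokens stand in for segmented Chinese words.

```python
from collections import Counter


def find_noise_words(unlabeled_docs, lexicon, max_df=0.5):
    """Hypothetical heuristic: lexicon words occurring in more than `max_df`
    of the unlabeled in-domain reviews are treated as domain noise words."""
    if not unlabeled_docs:
        return set()
    df = Counter(w for doc in unlabeled_docs for w in set(doc) if w in lexicon)
    n = len(unlabeled_docs)
    return {w for w, c in df.items() if c / n > max_df}


def classify(tokens, positive, negative, noise=frozenset()):
    """Lexicon-based polarity: +1 per positive word, -1 per negative word."""
    score = sum((t in positive) - (t in negative) for t in tokens if t not in noise)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

In the usage below, "small" is listed as negative in a general lexicon but occurs in most reviews of the domain; filtering it flips the classification of a review that is otherwise positive.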

Normalized MEDLINE Distance in Context-Aware Life Science Literature Searches

Tsinghua Science and Technology. 2010, (6): 709-715. DOI: 10.1016/S1007-0214(10)70119-X

Literature searches on the Web return large volumes of results. This paper presents a model that refines the search process using user interests: the interests are analyzed to calculate the semantic similarity among interest terms, which is then used to refine the query. Traditional general-purpose similarity measures may not fit a domain-specific context. The paper therefore presents a similarity measure for medical literature searches based on the biomedical literature knowledge source MEDLINE, the normalized MEDLINE distance, which more reasonably reflects the relevance between medical terms. By calculating the similarities of user interest terms to rerank the interest term list, the measure gives more accurate descriptions of user interests. These descriptions can be used for query refinement in keyword searches to give the user more personalized results. The measure also improves personalized search results by controlling the number of results returned for each topic of interest.
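The abstract does not give the formula for the normalized MEDLINE distance. Assuming it mirrors the Normalized Google Distance, with MEDLINE citation counts in place of web hit counts, it can be sketched as:

NMD(x, y) = (max(log f(x), log f(y)) - log f(x, y)) / (log N - min(log f(x), log f(y)))

where f counts citations containing the given term(s) and N is the collection size. The function name and the handling of zero counts are assumptions.

```python
import math


def normalized_medline_distance(fx, fy, fxy, n):
    """NGD-style distance computed from MEDLINE citation counts (assumed form).

    fx, fy: number of citations containing term x / term y.
    fxy: number of citations containing both terms.
    n: total number of citations in the collection (counts must not exceed n).
    Smaller values mean the terms are more closely related.
    """
    if fx == 0 or fy == 0 or fxy == 0:
        return math.inf  # the terms never co-occur: treat as unrelated
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    denominator = math.log(n) - min(lx, ly)
    if denominator == 0:
        return 0.0  # both terms occur in every citation
    return (max(lx, ly) - lxy) / denominator
```

To rerank an interest term list under this assumption, one would sort the terms by their distance to a seed term, smallest first.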

Automatic Approach to Ontology Evolution Based on Change Impact Comparisons

Tsinghua Science and Technology. 2010, (6): 716-723. DOI: 10.1016/S1007-0214(10)70120-6

Ontology evolution is the timely adaptation of ontologies to changing requirements, which is becoming increasingly important as ontologies are widely used in different fields. This paper shows how to address the problem of evolving ontologies with less reliance on manual case-based reasoning by using an automatic selection mechanism. An automatic ontology evolution strategy selection framework that automates the evolution is presented, together with a minimal change impact algorithm developed for the framework. A case study shows that the method is effective.
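Selecting the evolution strategy with minimal change impact can be sketched by counting how many axioms each candidate strategy affects. This is a hypothetical model: each strategy is reduced to the set of axioms it changes directly, impact propagates to every axiom that depends on a changed one, and the strategy with the smallest impact wins. The paper's actual impact measure is not described in the abstract.

```python
def change_impact(changes, depends_on):
    """Number of axioms affected by a change set.

    changes: axioms directly modified or removed by an evolution strategy.
    depends_on: axiom -> set of axioms it depends on (hypothetical structure).
    Impact propagates transitively to every axiom depending on a changed one.
    """
    affected = set(changes)
    stack = list(changes)
    while stack:
        changed = stack.pop()
        for axiom, deps in depends_on.items():
            if changed in deps and axiom not in affected:
                affected.add(axiom)
                stack.append(axiom)
    return len(affected)


def select_strategy(strategies, depends_on):
    """Pick the strategy with minimal change impact (ties broken by name)."""
    return min(strategies, key=lambda s: (change_impact(strategies[s], depends_on), s))
```

In the usage below, changing C1 ripples to C2 and then C3 (impact 3), while changing C4 only reaches C5 (impact 2), so the second strategy is chosen.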

Extracting Semantic Subgraphs to Capture the Real Meanings of Ontology Elements

Tsinghua Science and Technology. 2010, (6): 724-733. DOI: 10.1016/S1007-0214(10)70121-8

An element may have heterogeneous semantic interpretations in different ontologies. Understanding the real local meanings of elements is therefore very useful for ontology operations such as querying and reasoning, which underlie many applications including semantic search, ontology matching, and linked data analysis. However, because different ontologies describe their elements in different ways, obtaining the semantic context of an element remains an open problem. This paper proposes semantic subgraphs to capture the real meanings of ontology elements. To extract them, a hybrid ontology graph represents the semantic relations between elements, and an extraction algorithm based on an electrical circuit model, with new conductivity calculation rules, improves the quality of the semantic subgraphs. Evaluation results show that the semantic subgraphs properly capture the local meanings of elements, and ontology matching based on semantic subgraphs further demonstrates that the technique is promising for ontology applications.
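The electrical circuit view of subgraph extraction treats each edge as a resistor with some conductance. Holding a source node at 1 V and a sink at 0 V, one keeps the edges that carry noticeable current. This sketch assumes that generic model: conductance values are hypothetical inputs (the paper's conductivity calculation rules are not reproduced), and node voltages are found by plain Gauss-Seidel relaxation rather than by a linear solver.

```python
def build_graph(weighted_edges):
    """Symmetric adjacency map node -> {neighbor: conductance}."""
    g = {}
    for x, y, c in weighted_edges:
        g.setdefault(x, {})[y] = c
        g.setdefault(y, {})[x] = c
    return g


def node_voltages(conductance, source, sink, sweeps=500):
    """Solve Kirchhoff's equations by Gauss-Seidel relaxation.

    The source is held at 1 V and the sink at 0 V; every other node converges
    to the conductance-weighted average of its neighbors' voltages.
    """
    v = {node: 0.0 for node in conductance}
    v[source] = 1.0
    for _ in range(sweeps):
        for node, nbrs in conductance.items():
            if node in (source, sink):
                continue
            total = sum(nbrs.values())
            if total > 0:
                v[node] = sum(c * v[m] for m, c in nbrs.items()) / total
    return v


def extract_subgraph(conductance, source, sink, min_current=1e-6):
    """Keep directed edges carrying at least `min_current` from source to sink."""
    v = node_voltages(conductance, source, sink)
    return {
        (a, b)
        for a, nbrs in conductance.items()
        for b, c in nbrs.items()
        if c * (v[a] - v[b]) >= min_current  # current flows from high to low voltage
    }
```

A dangling neighbor such as `c` in the usage below sits at the same voltage as its only neighbor, carries no current, and is therefore excluded, which is the property that makes current flow a useful relevance filter.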
