Complete Publication Listing (as of 16th March 2019)

  1. P.J. McBrien and A. Poulovassilis,
    A Conceptual Modelling Approach to Visualising Linked Data, OTM Conferences 2019, Pages 227-245 (Presentation)

    Abstract: Increasing numbers of Linked Open Datasets are being published, and many possible data visualisations may be appropriate for a user's given exploration or analysis task over a dataset. Users may therefore find it difficult to identify visualisations that meet their data exploration or analysis needs. We propose an approach that creates conceptual models of groups of commonly used data visualisations, which can be used to analyse the data and users' queries so as to automatically generate recommendations of possible visualisations. To our knowledge, this is the first work to propose a conceptual modelling approach to recommending visualisations for Linked Data.

  2. P.J. McBrien and A. Poulovassilis,
    Towards Data Visualisation Based on Conceptual Modelling, ER 2018, Pages 91-99 (Longer Version) (Presentation)

    Abstract: Selecting data, transformations and visual encodings in current data visualisation tools is undertaken at a relatively low level of abstraction - namely, on tables of data - and ignores the conceptual model of the data. Domain experts, who are likely to be familiar with the conceptual model of their data, may find it hard to understand tabular data representations, and hence hard to select appropriate data transformations and visualisations to meet their exploration or question-answering needs. We propose an approach that addresses these problems by defining a set of visualisation schema patterns that each characterise a group of commonly-used data visualisations, and by using knowledge of the conceptual schema of the underlying data source to create mappings between it and the visualisation schema patterns. To our knowledge, this is the first work to propose a conceptual modelling approach to matching data and visualisations.

  3. Y. Liu and P.J. McBrien,
    SPOWL: Spark-based OWL 2 reasoning materialisation
    Proceedings of the 4th ACM SIGMOD Workshop on Algorithms and Systems for MapReduce and Beyond, 2017 (Presentation)

    Abstract: This paper presents SPOWL, which uses Spark to perform OWL reasoning over large ontologies. SPOWL acts as a compiler, mapping the axioms in the T-Box of an ontology to Spark programs which are executed iteratively to compute and materialise the closure of reasoning results entailed by the ontology. This closure is then available to queries which retrieve information from the ontology. Compared to MapReduce, adopting Spark enables SPOWL to cache data in distributed memory, reducing the amount of I/O used, and to parallelise jobs in a more flexible manner. We further analyse the dependencies among the Spark programs, and propose an optimised execution order following the T-Box hierarchy, which makes the materialisation process terminate in the minimum number of iterations. Moreover, SPOWL uses a tableaux reasoner to classify the T-Box, and the classified axioms are compiled into Spark programs which relate directly to the ontological data under reasoning. This not only makes SPOWL's reasoning more complete, but also avoids processing unnecessary rules, compared to evaluating the fixed rulesets adopted by most state-of-the-art reasoners. Finally, since SPOWL materialises the reasoning closure for large ontologies, it answers queries over the ontology faster than systems that compute the query answers at query time.
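
    As an illustration only (not code from the paper), the sketch below shows the general idea of materialising one kind of T-Box axiom, subClassOf, as an iterative Spark job: class-membership facts are propagated along a subclass hierarchy until a fixpoint is reached. All names and data here are invented.

      # Minimal sketch (not SPOWL itself): materialise the closure of rdf:type facts
      # under a set of subClassOf axioms using Spark RDDs, iterating to a fixpoint.
      from pyspark import SparkContext

      sc = SparkContext(appName="subclass-materialisation-sketch")

      # A-Box: (individual, class) pairs; T-Box: (subclass, superclass) pairs.
      type_facts = sc.parallelize([("alice", "Student"), ("bob", "PhDStudent")])
      sub_class_of = sc.parallelize([("PhDStudent", "Student"), ("Student", "Person")])

      current = type_facts.distinct().cache()
      while True:
          # (individual, class) joined with (class, superclass) yields (individual, superclass).
          derived = (current.map(lambda p: (p[1], p[0]))        # key by class
                            .join(sub_class_of)                 # (class, (individual, superclass))
                            .map(lambda p: (p[1][0], p[1][1]))) # (individual, superclass)
          new = current.union(derived).distinct().cache()
          if new.count() == current.count():                    # no new facts: fixpoint reached
              break
          current = new

      print(sorted(current.collect()))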

  4. Y. Liu and P.J. McBrien,
    Transactional and Incremental Type Inference from Data Updates
    The Computer Journal, Vol. 60(3), Pages 347-368, 2017

    Abstract: A distinctive property of relational database systems is the ability to perform data updates and queries in atomic blocks called transactions, with the well-known atomicity, consistency, isolation and durability (ACID) properties. To date, the ability of systems performing reasoning to maintain the ACID properties, even over data held within a relational database, has been largely ignored. This article studies an approach to reasoning over data from web ontology language (OWL) 2 RL ontologies held in a relational database, where the ACID properties of transactions are maintained. Taking an incremental approach to maintaining materialised views of the result of reasoning, the approach is demonstrated to support a query and reasoning performance comparable to or better than other OWL reasoning systems, yet adding the important benefit of supporting transactions.

  5. L. Al Khuzayem and P.J. McBrien,
    OWLRel: Learning Rich Ontologies from Relational Databases
    Baltic J. Modern Computing, Vol. 4, No. 3, Pages 466-482, 2016

    Abstract: Mapping between ontologies and relational databases is a necessity for realising the Semantic Web vision. Most of the work concerning this topic has either (1) extracted an OWL schema, using a limited range of OWL modelling constructs, from a relational schema, or (2) extracted from an OWL schema a relational schema that represents the OWL schema as closely as possible. By contrast, we propose a general framework that maps between relational databases and schemas expressed in OWL 2. In particular, we regard the transformation from databases to ontologies as a two-phase process. Firstly, we convert the relational schema into an OWL schema, and secondly we enrich the OWL schema with highly expressive axioms based on analysing the schema and the data in the database. Testing our data analysis heuristics on a number of databases shows that they produce an OWL schema that includes more semantic information than found in the relational schema.

  6. L. Al Khuzayem and P.J. McBrien,
    Extracting OWL Ontologies from Relational Databases using Data Analysis and Machine Learning
    Proc. DB&IS, Databases and Information Systems IX, Pages 43-56, IOS Press, 2016

    Abstract: Extracting OWL ontologies from relational databases is extremely helpful for realising the Semantic Web vision. However, most of the approaches in this context drop many of the expressive features of OWL. This is because highly expressive axioms cannot be detected from the database schema alone, but instead require a combined analysis of the database schema and data. In this paper, we present an approach that transforms a relational schema to a basic OWL schema, and then enhances it with rich OWL 2 constructs using schema and data analysis techniques. We then rely on the user for the verification of these features. Furthermore, we apply machine learning algorithms to help rank the resulting features based on user-supplied relevance scores. Testing our tool on a number of databases demonstrates that our proposed approach is feasible and effective.

  7. Y. Liu and P.J. McBrien,
    Transactional and Incremental Type Inference from Data Updates
    Data Science: Proceedings of BICOD'15, Pages 205-219

    Abstract: Knowledge in the Semantic Web is subject to frequent updates. A key challenge for many reasoning systems is to perform reasoning efficiently over such updates. Most reasoners apply a non-incremental approach, by which reasoning is executed over the complete ontology even if only a few facts in the ontology have changed. In this paper, we present and implement a trigger-based approach as an extension of SQOWL, which was the first RDBMS-based system supporting incremental type inference. In particular, our extension performs incremental reasoning over updates, which the previous SQOWL approach did not. Moreover, our approach performs so-called transactional type inference (with full ACID properties), since the results of reasoning are available within the same transaction in which the base data of the reasoning is inserted or deleted. As our evaluation shows, our system gives more complete answers and processes queries more efficiently than comparable reasoners.

  8. Y. Liu and P.J. McBrien,
    SQOWL2: Transactional Type Inference for OWL 2 DL in an RDBMS.
    Proceedings of Description Logics, Pages 779-790, 2013

    Abstract: SQOWL2 is a compiler which allows an RDBMS to support sound reasoning in the SROIQ(D) description logic, by implementing ontologies expressed in the OWL 2 DL language as a combination of tables and triggers in the RDBMS. The reasoning process is divided into two phases of classification of the T-Box and type inference of the A-Box. SQOWL2 establishes a relational schema based on classification completed using the Pellet reasoner, and performs type inference by using SQL triggers. SQOWL2 supports type inference over all OWL 2 DL constructs, and supports more conventional relational schemas, rather than naively mapping OWL classes and properties to relational tables with one and two columns. Moreover, SQOWL2 is a transactional reasoning system (with full ACID properties), since the results of reasoning are available within the same transaction as that in which the base data of the reasoning was inserted.
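
    The general flavour of trigger-based type inference can be sketched as follows. This is an invented, minimal example using SQLite rather than the SQOWL2 implementation itself; the tables and the single subclass axiom are hypothetical.

      # Minimal sketch (not SQOWL2): a trigger asserts membership of a superclass
      # whenever an individual is inserted into a subclass table, so the inferred
      # fact is visible within the same transaction as the base insertion.
      import sqlite3

      conn = sqlite3.connect(":memory:")
      conn.executescript("""
          CREATE TABLE Student    (id TEXT PRIMARY KEY);
          CREATE TABLE PhDStudent (id TEXT PRIMARY KEY);

          -- Type inference for the hypothetical axiom: PhDStudent subClassOf Student.
          CREATE TRIGGER phd_is_student AFTER INSERT ON PhDStudent
          BEGIN
              INSERT OR IGNORE INTO Student(id) VALUES (NEW.id);
          END;
      """)

      with conn:                                                    # one transaction
          conn.execute("INSERT INTO PhDStudent(id) VALUES ('bob')")
          # The derived Student fact is already visible inside the same transaction.
          print(conn.execute("SELECT id FROM Student").fetchall())  # [('bob',)]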

  9. B. Jakobus and P.J. McBrien,
    Pig vs Hive: Benchmarking High Level Query Languages
    Technical Report, 2013

    Abstract: This article presents the results of two benchmarking sets (run on small clusters of 6 and 9 nodes) applied to Hive and Pig running on Hadoop 0.14.1. The first set of results was obtained by replicating the Apache Pig benchmark published by the Apache Foundation on 11/07/07 (which served as a baseline for comparing major Pig Latin releases). The second set was obtained by applying the TPC-H benchmarks.
    The two benchmarks showed conflicting results: the first benchmark indicated that Pig outperformed Hive on most operations, yet the TPC-H results provide evidence that Hive is significantly faster than Pig. The article analyses the two benchmarks, concluding with a set of differences and a justification of the results.

  10. L. Al Khuzayem and P.J. McBrien,
    Knowledge Transformation using a Hypergraph Data Model.
    Proceedings of ICCSW, Pages 1-7, 2012

    Abstract: In the Semantic Web, knowledge integration is frequently performed between heterogeneous knowledge bases. Such knowledge integration often requires the schema expressed in one knowledge modelling language be translated into an equivalent schema in another knowledge modelling language. This paper defines how schemas expressed in OWL-DL (the Web Ontology Language using Description Logic) can be translated into equivalent schemas in the Hypergraph Data Model (HDM). The HDM is used in the AutoMed data integration (DI) system. It allows constraints found in data modelling languages to be represented by a small set of primitive constraint operators. By mapping into the AutoMed HDM language, we are then able to further map the OWL-DL schemas into any of the existing modelling languages supported by AutoMed. We show how previously defined transformation rules between relational and HDM schemas, and our newly defined rules between OWL-DL and HDM schemas, can be composed to give a bidirectional mapping between OWL-DL and relational schemas through the use of the both-as-view approach in AutoMed.

  11. P.J. McBrien, N. Rizopoulos, and A.C. Smith,
    Type inference methods and performance for data in an RDBMS
    Proceedings of SWIM'12

    Abstract: In this paper we survey and measure the performance of methods for reasoning using OWL-DL rules over data stored in an RDBMS. OWL-DL Reasoning may be broken down into two processes of classification and type inference. In the context of databases, classification is the process of deriving additional schema constructs from existing schema constructs in a database, while type inference is the process of inferring values for tables/columns from values in other tables/columns. Thus it is the process of type inference that is the focus of this paper, since as data values are inserted into a database, there is the need to use the inserted data to derive new facts.

    The contribution of this paper is that we place the existing methods for type inference over relational data into a new general framework, and classify the methods into three different types: Application Based Reasoning uses reasoners outside of the DBMS to perform type inference, View Based Reasoning uses DBMS views to perform type inference, and Trigger Based Reasoning uses DBMS active rules to perform type inference. We discuss the advantages of each of the three methods, and identify a list of properties that each method might be expected to meet. One key property we identify is transactional reasoning, where the result of reasoning is made available within a database transaction, and we show that most reasoners today fail to have this property. We also present the results of experimental analysis of representative implementations of each of the three methods, and use the results of the experiments to justify conclusions as to when each of the methods discussed is best deployed for particular classes of application.
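
    For illustration only (not reproduced from the paper), the sketch below complements the trigger-based sketch given under the SQOWL2 entry above by handling the same kind of subclass axiom with view-based reasoning, where inferred class membership is computed at query time rather than materialised. All table and view names are hypothetical.

      # Minimal sketch of view-based reasoning: the extent of a class is defined as a
      # view over the asserted tables, so inference happens when the view is queried.
      import sqlite3

      conn = sqlite3.connect(":memory:")
      conn.executescript("""
          CREATE TABLE Student_asserted    (id TEXT PRIMARY KEY);
          CREATE TABLE PhDStudent_asserted (id TEXT PRIMARY KEY);

          -- Hypothetical axiom PhDStudent subClassOf Student, encoded as a view.
          CREATE VIEW Student AS
              SELECT id FROM Student_asserted
              UNION
              SELECT id FROM PhDStudent_asserted;
      """)

      conn.execute("INSERT INTO PhDStudent_asserted(id) VALUES ('bob')")
      print(conn.execute("SELECT id FROM Student").fetchall())   # [('bob',)]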

  12. P.J. McBrien, N. Rizopoulos, and A.C. Smith,
    SQOWL: Type Inference in an RDBMS (PostScript)
    Proceedings of ER10

    Abstract: In this paper we describe a method to perform type inference over data stored in an RDBMS, where rules over the data are specified using OWL-DL. Since OWL-DL is an implementation of the Description Logic (DL) called SHOIN(D), we are in effect implementing a method for SHOIN(D) reasoning in relational databases. Reasoning may be broken down into two processes of classification and type inference. Classification may be performed efficiently by a number of existing reasoners, and since classification alters the schema, it need only be performed once for any given relational schema as a preprocessor of the schema before creation of a database schema. However, type inference needs to be performed for each data value added to the database, and hence needs to be more tightly coupled with the database system. We propose a technique to meet this requirement based on the use of triggers, which is the first technique to fully implement SHOIN(D) as part of normal transaction processing.

  13. P.J. McBrien and N. Rizopoulos,
    Schema Merging Based on Semantic Mappings, (PostScript)
    In Proceedings of BNCOD 2009, pages 192-198

    Abstract: In model management, the Merge operator takes as input a pair of schemas, together with a set of mappings between their objects, and returns an integrated schema. In this paper we present a new approach to implementing the Merge operator based on semantic mappings between objects. Our approach improves upon previous work by (1) using formal low-level transformation rules that can be translated into higher-level rules and (2) examining a much wider range of semantic mappings between schema objects. Our precise mappings and rules enable us to automate Merge and provide a sound and complete framework where schemas are merged without any information loss or gain.

  14. P.J. McBrien, N. Rizopoulos and A.C. Smith,
    RoKEx: Robust Application-layer Knowledge Exchange, (PostScript)
    In Proceedings of SEAS DTC Conference 2009

    Abstract Decisions in a dynamic environment may be based on information coming from a number of different knowledge sources described using different knowledge representation languages. This paper describes a common framework in which the data and rules, both static and dynamic, that may exist in disparate knowledge sources may be represented. We focus on two commonly used knowledge representation languages, namely SQL and OWL-DL. We have chosen these as examples because they make different assumptions about the knowledge they hold and we use them to show that our framework can represent knowledge under these different assumptions.

  15. Z. Bellahsene and P.J. McBrien (Editors)
    Proceedings of the CAiSE 2008 Doctoral Consortium

  16. P.J. McBrien,
    Temporal Constraints in Non-Temporal Data Modelling Languages (PostScript)
    Proceedings of ER08

    Abstract It is common to find that the definition or common usage of a data modelling language causes there to be restrictions placed on the evolution of data values that are associated with schemas expressed in that modelling language. This paper terms these restrictions temporal constraints, and defines three types of temporal constraint which are argued to be useful modelling concepts, capturing important real-world semantics about objects and their relationships. By reviewing how these temporal constraints are implied by either the definition or usage of UML and the relational modelling languages, this paper will use the temporal constraints to give precise definitions of modelling concepts that to date have been left only vaguely and partially understood. It will also consider the implementation of these constraints in SQL.

  17. A.C. Smith, N. Rizopoulos and P.J. McBrien,
    AutoMed Model Management (PostScript)
    In Proceedings of ER08

    Abstract Model Management (MM) is a way of raising the level of abstraction in metadata-intensive application areas. The key idea behind Model Management is to develop a set of generic algorithmic operators that work on schemas and mappings between schemas, rather than on individual schema elements. In this demonstration we present a new approach to the implementation of MM operators based on schema transformation that provides some important advantages over existing methods.

  18. D.M. Le, A.C. Smith and P.J. McBrien,
    Robust Data Exchange for Unreliable P2P Networks (PostScript)
    In Proceedings of GREP08, co-located with DEXA08

    Abstract The aim of this work is to provide a robust way for peers with heterogeneous data sources to exchange information in an unreliable network. We address this problem in two ways. Firstly, we define a set of application-layer data exchange protocols to facilitate the discovery of, and communication between, peers. Secondly, we provide a query processing component with a cache-driven query processor that allows nodes on the network to cache queries and their results on demand, and to use the data caches to give partial or complete answers to a query if the original data sources are unavailable.

  19. A.C. Smith and P.J. McBrien,
    A Generic Data Level Implementation of ModelGen (PostScript)
    In Proceedings of BNCOD08,
    Pages 63-74, 2008, Springer-Verlag, LNCS 5071, ISBN 978-3-540-70503-1

    Abstract The model management operator ModelGen translates a schema expressed in one modelling language into an equivalent schema expressed in another modelling language, and in addition produces a mapping between those two schemas. This paper presents an implementation of ModelGen which in addition allows for the translation of data instances from the source to the target schema, and vice versa. The translation mechanism is distinctive from others in that it takes a generic approach that can be applied to any modelling language.

  20. A.C. Smith and P.J. McBrien,
    AutoModelGen: A Generic Data Level Implementation of ModelGen
    In Proceedings of the Forum at the CAiSE'08 Conference,
    Pages 65-68

    Abstract The model management operator ModelGen translates a schema expressed in one modelling language into an equivalent schema expressed in another modelling language, and in addition produces a mapping between those two schemas. AutoModelGen is a generic data level implementation of ModelGen that meets these desiderata. Our approach is distinctive in that (i) it takes a generic approach that can be applied to any modelling language, and (ii) it does not rely on knowing the modelling language in which the source schema is expressed.

  21. P.J. McBrien,
    Translating Schemas between Data Modelling Languages (Postscript)
    Chapter in Information Systems Engineering: From Data Analysis to Process Networks,
    Pages 1-15, IGI Publishing, 2008

    Abstract Data held in information systems is modelled using a variety of languages, where the choice of language may be decided by functional concerns (such as using a language suited to a particular database system, or using a language with modelling constructs suited to modelling a particular domain) or non-technical concerns (such as following organisation or national standards, or simply reusing a model from some other application).

    This chapter focuses on data modelling languages, and the challenges faced in mapping schemas in one data modelling language into another data modelling language. In model management (Bernstein 2003), this mapping process is called ModelGen, and a mapping process that restructures schemas within one modelling language is called Mapping. To illustrate the issues faced in implementing ModelGen, consider the ER schema in Figure 1(a), which describes details of students and the departments in which they study. The cardinality constraints in the ER model, which in our version of the ER model use look-here semantics (Song, Evans & Park 1995), state that each student studies in exactly one department, and that each department must have at least one student.

  22. D.M. Le, A.C. Smith and P.J. McBrien,
    Inter Model Data Integration in a P2P Environment, (pdf)
    In Proceedings of DBISP2P 2007

    Abstract The wide range of data sources available today means that the integration of heterogeneous data sources is now a common and important problem. It is even more challenging in a P2P environment where peers often do not know in advance which schemas of other peers will suit their information needs and there is potentially a greater diversity of data modelling languages in use. In this paper, we propose a new approach to P2P inter model data integration which supports multiple data models whilst allowing peers the flexibility of choosing how to integrate their schemas.

  23. D.M. Le, P.J. McBrien and A.C. Smith,
    RoDEx: Robust Data Exchange between UAVs and Sensors, (pdf)
    In Proceedings of SEAS DTC Conference 2007

    Abstract This paper describes the approach of the RoDEx project to the problem of data exchange and data integration in networks of unmanned autonomous vehicles (UAVs). It illustrates the approach by describing how RoDEx may be used to handle the problems associated with reliably collecting data from a network of sensors, which forms a key foundation of the UUV vignette.

  24. D.M. Le, P.J. McBrien and A.C. Smith,
    Robust Application-layer data exchange protocols for networks of semi-autonomous vehicles, (pdf)
    In Proceedings of SEAS DTC Conference 2006

    Abstract Data integration is the process of building a single view over a collection of data sources. Although well established as a research topic, and supported by some commercial tools, current approaches assume that data exchange between data sources occurs in a standard business environment, with relatively stable data sources and communication channels. In the context of unmanned autonomous vehicles (UAVs), reliable data exchange presents a number of challenges. Firstly, there may be the requirement to rapidly integrate a new UAV with different information requirements into an existing network. Secondly, the process of data exchange needs to cope with temporary and/or permanent loss of data sources and communication channels. The RoDeX project deals with these challenges, building upon peer-to-peer (P2P) networking techniques, and the AutoMed data integration and exchange toolkit. We briefly describe the functionality of the current RoDeX system, and present an example to show how it could be used in one of the demonstration vignettes.

  25. P.J. McBrien and A. Poulovassilis,
    P2P query reformulation over Both-as-View data transformation rules, (pdf)
    In Proceedings of DBISP2P 2006
    Springer Verlag LNCS

    Abstract The both-as-view (BAV) approach to data integration has the advantage of specifying mappings between schemas in a bidirectional manner, so that once a BAV mapping has been established between two schemas, queries may be exchanged in either direction between the schemas. By defining public schemas shared between peers, this allows peers to exchange queries via a public schema without the requirement for any one peer to hold the public schema data.

    In this paper we discuss the reformulation of queries over BAV transformation pathways, and demonstrate the use of this reformulation in two modes of query processing. In the first mode, public schemas are shared between peers and queries posed on the public schema can be reformulated into queries over any data sources that have been mapped to the public schema. In the second, queries are posed on the schema of a data source, and are reformulated into queries on another data source via any public schema to which both data sources have been mapped.

  26. A.C. Smith and P. McBrien,
    Comparing and Transforming Between Data Models via an Intermediate Hypergraph Data Model (pdf),
    Proceedings of DISWeb06,
    Pages 307-321, 2006, Presses Universitaires de Namur, ISBN-13 978-2-87037-525-9

    Abstract Data exchange between heterogeneous schemas is a difficult problem that becomes more acute if the source and target schemas are from different data models. The data type of the objects to be exchanged can be useful information that should be exploited to help the data exchange process. So far little has been done to take advantage of this in inter model data exchange. Using a common data model has been shown to be effective in data exchange in general. This work aims to show how the common data model approach can be useful specifically in exchanging type information by use of a common type hierarchy.

  27. Z. Bellahsène, C. Lazanitis, P.J. McBrien and N. Rizopoulos,
    iXPeer: Implementing layers of abstraction in P2P Schema Mapping using AutoMed (pdf),
    Proceedings of IWI2006,
    Pages XX-XX, 2006

    Abstract The task of model based data integration becomes more complicated when the data sources to be integrated are distributed, heterogeneous, and high in number. One recent solution to the issues of distribution and scale is to perform data integration using peer-to-peer (P2P) networks. Current P2P data integration architectures have mostly been flat, only specifying mappings directly between peers. Some do form the schemas into hierarchies, but none provide any abstraction of the schemas. This paper describes a set of general purpose P2P meta-data and data exchange primitives provided by an extended version of the AutoMed toolkit, and uses the primitives to implement a new architecture called iXPeer. iXPeer deals with integration on several levels of abstraction, where the lower levels define precise mappings between data source schemas, but the higher levels are looser associations based on keywords.

  28. M. Boyd and P. McBrien,
    Comparing and Transforming Between Data Models via an Intermediate Hypergraph Data Model (pdf),
    Journal on Data Semantics IV,
    Pages 69-109, Springer-Verlag, 2005, ISBN-13 978-3-540-31001-3, ISSN 0302-9743

    Abstract Data integration is frequently performed between heterogeneous data sources, requiring that not only a schema, but also the data modelling language in which that schema is represented must be transformed between one data source and another.

    This paper describes an extension to the hypergraph data model (HDM), used in the AutoMed data integration approach, that allows constraint constructs found in static data modelling languages to be represented by a small set of primitive constraint operators in the HDM. In addition, a set of five equivalence preserving transformation rules are defined that operate over this extended HDM. These transformation rules are shown to allow a bidirectional mapping to be defined between equivalent relational, ER, UML and ORM schemas.

    The approach we propose provides a precise framework in which to compare data modelling languages, and precisely identifies what semantics of a particular domain one data model may express that another data model may not express. The approach also forms the platform for further work in automating the process of transforming between different data modelling languages. The use of the both-as-view approach to data integration means that a bidirectional association is produced between schemas in the data modelling language. Hence a further advantage of the approach is that composition of data mappings may be performed such that mapping two schemas to one common schema will produce a bidirectional mapping between the original two data sources.

  29. M. Magnani, N. Rizopoulos, P.J. McBrien and D. Montesi,
    Schema Integration based on Uncertain Semantic Mappings (pdf),
    In Proceedings of ER05,
    LNCS Vol 3716, Pages 31-46, 2005, ISSN 0302-9743, ISBN-10 3-540-29389-2

    Abstract Schema integration is the activity of providing a unified representation of multiple data sources. The core problems in schema integration are: schema matching, i.e. the identification of correspondences, or mappings, between schema objects, and schema merging, i.e. the creation of a unified schema based on the identified mappings. Existing schema matching approaches attempt to identify, between each pair of objects, a single mapping of whose correctness they are 100% certain. However, this is impossible in general, so a human expert always has to validate or modify it. In this paper, we propose a new schema integration approach in which the uncertainty in the identified mappings that is inherent in the schema matching process is explicitly represented; that uncertainty is propagated through the schema merging process, and is finally depicted in the resulting integrated schema.

  30. N. Rizopoulos, M. Magnani, P.J. McBrien and D. Montesi,
    Uncertainty in Semantic Schema Integration (pdf),
    In Proceedings of BNCOD05,Vol 2,
    Pages 13-16, Univ. Sunderland Press, 2005, ISBN 1-873757-55-7

    Abstract In this paper we present a new method of semantic schema integration, based on uncertain semantic mappings. The purpose of semantic schema integration is to produce a unified representation of multiple data sources. First, schema matching is performed to identify the semantic mappings between the schema objects. Then, an integrated schema is produced during the schema merging process based on the identified mappings. If all semantic mappings are known, schema merging can be performed (semi-)automatically.

  31. S. Kittivoravitkul and P.J. McBrien,
    Integrating Unnormalised Semi-Structured Data Sources (pdf),
    In Proceedings of CAiSE05,
    Springer Verlag LNCS Vol 3520, Pages 460-474, 2005, ISBN-10 3-540-26095-1, ISBN-13 978-3-540-26095-0

    Abstract Semi-structured data sources, such as XML, HTML or CSV files, present special problems when performing data integration. In addition to the hierarchical structure of the semi-structured data, the data integration must deal with the redundancy in semi-structured data, where the same fact may be repeated in a data source, but should map into a single fact in a global integrated schema. We term semi-structured data containing such redundancy an unnormalised data source, and we define a normal form for semi-structured data that may be used when defining global schemas. We introduce special functions to relate object identifiers used in the global data model to object identifiers in unnormalised data sources, and demonstrate how to use these functions in query processing, update processing and integration of these data sources.
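
    As a small invented illustration of the kind of redundancy meant above, consider a flat data source in which the head of a department is repeated on every student row; under integration each repeated fact should map to a single fact in the global schema.

      # Hypothetical unnormalised rows: the dept_head fact is repeated per student.
      rows = [
          {"student": "alice", "dept": "Computing", "dept_head": "Prof. X"},
          {"student": "bob",   "dept": "Computing", "dept_head": "Prof. X"},
      ]

      studies_in = {(r["student"], r["dept"]) for r in rows}    # one fact per student
      dept_head  = {(r["dept"], r["dept_head"]) for r in rows}  # repeated fact collapses to one

      print(sorted(studies_in))   # [('alice', 'Computing'), ('bob', 'Computing')]
      print(sorted(dept_head))    # [('Computing', 'Prof. X')]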

  32. N. Rizopoulos and P.J. McBrien,
    A General Approach to the Generation of Conceptual Model Transformations (pdf),
    In Proceedings of CAiSE05,
    Springer Verlag LNCS Vol 3520, Pages 326-341, 2005, ISBN-10 3-540-26095-1, ISBN-13 978-3-540-26095-0

    Abstract In data integration, a Merge operator takes as input a pair of schemas in some conceptual modelling language, together with a set of correspondences between their constructs, and produces as an output a single integrated schema. In this paper we present a new approach to implementing the Merge operator that improves upon previous work by considering a wider range of correspondences between schema constructs and defining a generic and formal framework for the generation of schema transformations. This is used as a basis for deriving transformations over high level models. The approach is demonstrated in this paper to generate transformations for ER and relational models.

  33. M. Boyd, S. Kittivoravitkul, C. Lazanitis, P.J. McBrien and N. Rizopoulos,
    AutoMed: A BAV Data Integration System for Heterogeneous Data Sources (pdf),
    In Proceedings of CAiSE04,
    Springer Verlag LNCS, Vol 3084, Pages 82-97, 2004, ISBN 3-540-22151-4

    Abstract This paper describes the AutoMed repository and some associated tools, which provide the first implementation of the both as view (BAV) approach to data integration. Apart from being a highly expressive data integration approach, BAV in addition provides a method to support a wide range of data modelling languages, and describes transformations between those data modelling languages. This paper documents how BAV has been implemented in the AutoMed repository, and how several practical problems in data integration between heterogeneous data sources have been solved. We illustrate the implementation with examples in the relational, ER, and semi-structured data models.

  34. Z. Bellahsene and P. McBrien,
    Preface to DIWeb 2004 (pdf),
    In Proceedings of DIWeb'04,
    Volume 3, CAiSE Workshop Proceedings, Pages 3-4, Riga Technical University, 2004, ISBN 9984-9767-3-4

  35. M. Boyd and P. McBrien,
    Towards a Semi-Automated Approach to Intermodel Transformation (pdf),
    In Proceedings of EMMSAD'04,
    Volume 1, CAiSE Workshop Proceedings, Pages 175-188, Riga Technical University, 2004, ISBN 9984-9767-1-8

    Abstract This paper introduces an extension to the hypergraph data model used in the AutoMed data integration approach that allows constraints common in static data modelling languages to be represented by a small set of primitive constraint operators. A set of equivalence rules are defined for this set of primitive constraint operators, and demonstrated to allow a mapping between relational, ER or UML models to be defined. The approach both provides a precise framework in which to compare data modelling languages, and forms the platform for further work in automating the process of transforming between different data modelling languages.

  36. E. Jasper, N. Tong, P. McBrien, and A. Poulovassilis,
    View generation and optimisation in the AutoMed data integration framework (pdf),
    In Proceedings of DBIS'04
    Scientific Papers of the University of Latvia Vol 972, Pages 13-30, ISSN 1407-2157, ISBN 9984-770-11-7

    Abstract This paper describes view generation and view optimisation in the AutoMed heterogeneous data integration framework. In AutoMed, schema integration is based on the use of reversible schema transformation sequences. We show how views can be generated from such sequences, for global-as-view (GAV), local-as-view (LAV) and GLAV query processing. We also present techniques for optimising these generated views, firstly by optimising the transformation sequences, and secondly by optimising the view definitions generated from them.

  37. P.J. McBrien and A. Poulovassilis,
    Defining Peer-to-Peer Data Integration using Both as View Rules, (pdf)
    In Proceedings of DBISP2P 2003 Revised Papers,
    Springer Verlag LNCS, Volume 2944, Pages 91-107, 2004, ISBN 3-540-20968-9

    Abstract The loose and dynamic association between peers in a peer-to-peer integration has meant that, to date, peer-to-peer systems have been based on exchange of files identified with a very limited set of attributes, and no schema is used to describe the data within those files. This paper extends an existing approach to data integration, called both-as-view, to be an efficient mechanism for defining peer-to-peer integration at the schema level, and demonstrates how the data integration can be used for the exchange of messages and queries between peers.

  38. E. Jasper, N. Tong, P.J. McBrien and A. Poulovassilis,
    View Generation and Optimisation in the AutoMed Data Integration Framework (pdf),
    In Proceedings of CAiSE03 Forum,
    Editors: J. Eder and T. Welzer, Univ. of Maribor Press, Pages 29-32, 2003, ISBN 86-435-0549-8

    Abstract In AutoMed, data integration is based on the use of reversible sequences of schema transformations. We discuss how views can be generated from these sequences. We also discuss techniques for optimising the views, firstly by simplifying the transformation sequences, and secondly by optimising the view definitions generated from them.

  39. P.J. McBrien and A. Poulovassilis,
    Data Integration by Bi-Directional Schema Transformation Rules (pdf),
    In Proceedings of ICDE03,
    IEEE, Pages 227-238, 2003, ISBN 0-7803-7665-X

    Abstract In this paper we describe a new approach to data integration which subsumes the previous approaches of local as view (LAV) and global as view (GAV). Our method, which we term both as view (BAV), is based on the use of reversible schema transformation sequences. We show how LAV and GAV view definitions can be fully derived from BAV schema transformation sequences, and how BAV transformation sequences may be partially derived from LAV or GAV view definitions. We also show how BAV supports the evolution of both global and local schemas, and we discuss ongoing implementation of the BAV approach within the AutoMed project.
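
    The shape of a BAV pathway can be sketched schematically as below. This is an illustration only, not the AutoMed API: the primitive step names, schemas and queries are simplified and hypothetical. Because each step pairs the construct it adds or removes with a query defining its extent, the pathway can be read in either direction, which is what allows both GAV-style and LAV-style views to be derived from it.

      # Schematic sketch of a BAV pathway as a reversible sequence of primitive steps.
      from dataclasses import dataclass

      @dataclass
      class Step:
          kind: str        # "add", "delete" or "rename"
          construct: str   # construct being introduced or removed
          query: str       # definition of its extent over the remaining constructs

      # Hypothetical pathway from S1 {ug_student, pg_student} to S2 {student}.
      pathway = [
          Step("add",    "student(name, level)",
               "SELECT name,'UG' FROM ug_student UNION SELECT name,'PG' FROM pg_student"),
          Step("delete", "ug_student(name)", "SELECT name FROM student WHERE level='UG'"),
          Step("delete", "pg_student(name)", "SELECT name FROM student WHERE level='PG'"),
      ]

      def reverse(pathway):
          # The reverse pathway swaps add and delete and runs the steps backwards.
          flip = {"add": "delete", "delete": "add", "rename": "rename"}
          return [Step(flip[s.kind], s.construct, s.query) for s in reversed(pathway)]

      for step in reverse(pathway):
          print(step.kind, step.construct)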

  40. M. Boyd, P.J. McBrien and N. Tong,
    The AutoMed Schema Integration Repository (pdf),
    In Proceedings of BNCOD02,
    Springer Verlag LNCS, Volume 2405, Pages 42-45, 2002, ISSN: 0302-9743, ISBN 3-540-43905-6

    Abstract In this paper we describe the first version of the repository of the AutoMed toolkit. This is a Java API that uses an RDBMS to provide persistent storage for data modelling language descriptions in the HDM, database schemas, and transformations between those schemas. The repository also provides some of the shared functionality that tools accessing the repository may require.

  41. P.J. McBrien and A. Poulovassilis,
    Schema Evolution in Heterogeneous Database Architectures, A Schema Transformation Approach, (pdf)
    In Proceedings of CAiSE02,
    Springer Verlag LNCS, Volume 2348, Pages 484-499, 2002, ISSN: 0302-9743, ISBN 3-540-43738-X

    Abstract This paper presents a new approach to schema evolution, which combines the activities of schema integration and schema evolution into one framework. In previous work we have developed a general framework to support schema transformation and integration in heterogeneous database architectures. Here we show how this framework also readily supports evolution of source schemas, allowing the global schema and the query translation pathways to be easily repaired, as opposed to having to be regenerated, after changes to source schemas.

  42. P.J. McBrien and A. Poulovassilis,
    A Semantic Approach to Integrating XML and Structured Data Sources,
    In Proceedings of CAiSE01,
    Springer Verlag LNCS, Volume 2068, Pages 330-345, 2001, ISSN: 0302-9743, ISBN 3-540-42215-3

    Abstract XML is fast becoming the standard for information exchange on the WWW. As such, information expressed in XML will need to be integrated with existing information systems, which are mostly based on structured data models such as relational, object-oriented or object/relational data models. This paper shows how our previous framework for integrating heterogeneous structured data sources can also be used for integrating XML data sources with each other and/or with other structured data sources. Our framework allows constructs from multiple modelling languages to co-exist within the same intermediate schema, and allows automatic translation of data, queries and updates between semantically equivalent or overlapping heterogeneous schemas.

  43. P.J. McBrien and A. Poulovassilis,
    A Semantic Approach to Integrating XML and Structured Data Sources,
    Birkbeck/Imperial Technical Report, 2000

    Abstract A longer version of the paper published in CAiSE01, which includes details of how to use information in DTDs to further automate the integration process.

  44. P.J. McBrien and A. Poulovassilis,
    Distributed Databases,
    Chapter 9, Pages 291-328 in Advanced Database Technology and Design, Editors M. Piattini and O. Diaz, Artech House, 2000, ISBN 0-89006-395-8

    Abstract The now widespread use of computers for data-processing in large distributed organisations means that such organisations will often store their data at different sites of a computer network, possibly in a variety of forms, ranging from flat files, to hierarchical or relational databases, through to object-oriented or object-relational databases. The rapid growth of the Internet is causing an even greater explosion in the availability of distributed information sources. Distributed database technology aims to provide uniform access to physically distributed, but logically related, information sources.

    Before introducing the main concepts of distributed databases, we first review some necessary concepts and terminology from centralised databases.

  45. P.J. McBrien and A. Poulovassilis,
    Automatic migration and wrapping of database applications - a schema transformation approach,
    In Proceedings of ER99, Springer Verlag LNCS

    Abstract Integration of heterogeneous databases requires that semantic differences between schemas are resolved by a process of schema transformation. Previously, we have developed a general framework to support the schema transformation process, consisting of a hypergraph-based common data model and a set of primitive schema transformations defined for this model. Higher-level common data models and primitive schema transformations for them can be defined in terms of this lower-level model.

    In this paper, we show that a key feature of our framework is that both primitive and composite schema transformations are automatically reversible. We show how these transformations can be used to automatically migrate or wrap data, queries and updates between semantically equivalent schemas. We also show how to handle transformations between non-equivalent but overlapping schemas. We describe a prototype schema integration tool that supports this functionality. Finally, we briefly discuss how our approach can be extended to more sophisticated application logic such as constraints, deductive rules, and active rules.

  46. P.J. McBrien and A. Poulovassilis,
    A Uniform Approach to Inter-Model Transformations,
    In Advanced Information Systems Engineering, 11th International Conference CAiSE'99
    Springer Verlag LNCS 1626, pages 333-348

    Abstract Whilst it is a common task in systems integration to have to transform between different semantic data models, such inter-model transformations are often specified in an ad hoc manner. Further, they are usually based on transforming all data into one common data model, which may not contain suitable data constructs to model directly all aspects of the data models being integrated. Our approach is to define each of these data models in terms of a lower-level hypergraph-based data model. We show how such definitions can be used to automatically derive schema transformation operators for the higher-level data models. We also show how these higher-level transformations can be used to perform inter-model transformations, and to define inter-model links.

  47. P.J. McBrien and A. Poulovassilis,
    A Formalisation of Semantic Schema Integration,
    Information Systems 23(5) 307-334, 1998

    Abstract Several methodologies for the semantic integration of databases have been proposed in the literature.  These often use a variant of the Entity-Relationship (ER) model as the common data model.  To aid the schema conforming, merging and restructuring phases of the semantic integration process, various transformations have been defined that map between ER representations which are in some sense equivalent. Our work aims to formalise the notion of schema equivalence and to provide a formal underpinning for the schema integration process.
    We show how transformational, mapping and behavioural schema equivalence are all variants of a more general definition of schema equivalence. We propose a semantically sound set of primitive transformations and show how they can be used to express the transformations commonly used during the schema integration process and to define new transformations. We differentiate between transformations which apply to any instance of a schema and those which require knowledge-based reasoning since they apply only for certain instances; this distinction could serve to enhance the performance of transformation tools since it identifies which transformations must be verified by inspection of the schema extension; it also serves to identify when intelligent reasoning is required during the schema integration process.

  48. A. Hunter and P.J. McBrien,
    Default Databases: Extending the Approach of Deductive Databases Using Default Logic,
    Data and Knowledge Engineering 26(2) 135-160, 1998

    Abstract Extending the relational data model using classical logic to give deductive databases has some significant benefits. In particular, classical logic rules offer an efficient representation: a universally quantified rule can represent many facts. However, classical logic does not support the representation of general rules, or, synonymously, defaults. General rules are rules that are usually valid, but occasionally have exceptions. They are useful in a database since they can allow for the derivation of relations on the basis of incomplete information. The need for incorporating general rules into a database is reinforced when considering that participants in the development process may naturally describe rules for a deductive database in the form of general rules. In order to meet this need for using general rules in databases, we extend the notion of deductive databases. In particular, we use default logic, an extension of classical logic that has been developed for representing and reasoning with default knowledge, to formalize the use of general rules in deductive databases, to give what we call default databases. In this paper, we provide an overview of default logic, motivate its applicability to capturing general rules in databases, and then develop a framework for default databases. In particular, we propose a methodology for developing default databases that is based on entity-relationship modelling.

  49. A. Poulovassilis and P.J. McBrien,
    A General Formal Framework for Schema Transformation (pdf),
    Data and Knowledge Engineering 28(1) 47-71, 1998.

    Abstract Several methodologies for integrating database schemas have been proposed in the literature, using various common data models (CDMs). As part of these methodologies transformations have been defined that map between schemas which are in some sense equivalent. This paper describes a general framework for formally underpinning the schema transformation process. Our formalism clearly identifies which transformations apply for any instance of the schema and which only for certain instances. We illustrate the applicability of the framework by showing how to define a set of primitive transformations for an extended ER model and by defining some of the common schema transformations as sequences of these primitive transformations. The same approach could be used to formally define transformations on other CDMs.

  50. P.J. McBrien:
    Design of Distributed Applications based on the OSI Model.
    CAiSE 1997, Springer-Verlag LNCS

    Abstract This paper demonstrates how the fundamental concepts of a layered design found in the OSI model may be realised in the dataflow diagram (DFD) type approach to application development. The work describes a coupling between DFD and the OSI model, and describes a methodology using that coupling to design implementations of applications which conform to the OSI model. The work serves to complement existing approaches to the conceptual modelling of distributed information systems.

  51. M. Finger and P.J. McBrien,
    Concurrency Control for Perceivedly Instantaneous Transactions in Valid-Time Databases,
    In Proceedings of the 4th Workshop on Temporal Representation and Reasoning (TIME'97), IEEE, 1997.

    Abstract Although temporal databases have received considerable attention as a topic for research, little work in the area has paid attention to the concurrency control mechanisms that might be employed in temporal databases. This paper describes how the notion of the current time - also called now - in valid-time databases can cause standard serialisation theory to give what are at least unintuitive results, if not actually incorrect results. The paper then describes two modifications to standard serialisation theory which correct the behaviour to give what we term perceivedly instantaneous transactions: transactions where serialising T1 and T2 as [T1,T2] always implies that the current time seen by T1 is less than or equal to the current time seen by T2.

  52. P.J. McBrien and A. Poulovassilis,
    A Formal Framework for ER Schema Transformation, In Proceedings of ER'1997, Springer-Verlag LNCS

    Abstract Several methodologies for semantic schema integration have been proposed in the literature, often using some variant of the ER model as the common data model. As part of these methodologies, various transformations have been defined that map between ER schemas which are in some sense equivalent. This paper gives a unifying formalisation of the ER schema transformation process and shows how some common schema transformations can be expressed within this single framework. Our formalism clearly identifies which transformations apply for any instance of the schema and which only for certain instances.

  53. M. Finger and P.J. McBrien,
    On the Semantics of 'Current-Time' In Temporal Databases,
    In XI Brazilian Symposium on Databases (SBBD'96), pages 324-337, October 1996

    Abstract The notion of the current-time is frequently found in work on temporal databases, and is usually accessed via a special interpreted variable called now. Whilst this variable has intuitive and simple semantics with respect to the processing of queries in temporal databases, the semantics of the variable inside transactions has not been studied. With the growing maturity of temporal databases, and the need to construct production quality temporal DBMS to demonstrate the usefulness of the technology, it is necessary that a complete study be made of transaction processing in temporal databases. This work aims to meet this objective by (1) detailing the alternatives that exist in the evaluation of now, (2) studying the problems of using now in current transaction processing systems, and (3) providing a formal framework for the processing of temporal transactions, and giving rules in the framework for avoiding problems in evaluating now.

  54. H. Barringer, D. Brough, M. Fisher, A. Hunter, R. Owens, D. Gabbay, G. Gough, I. Hodkinson, P.J. McBrien, and M. Reynolds,
    Languages, Meta-Languages and MetateM, A Discussion Paper,
    Journal of the IGPL, 4(2), ISSN 0945-9103, March 1996.

  55. P.J. McBrien and A.H. Seltveit, Coupling Process Models and Business Rules, In Proceedings of the IFIP WG8.1 Working Conference: Information System Development for Decentralised Organisations, Trondheim, Norway, Chapman & Hall, 1995.

    Abstract Two techniques commonly used in the conceptual modelling of information systems are process modelling and business rule modelling. In this paper we propose a technique for associating certain types of business rules with structures in a process modelling language. This coupling of the two models allows them to be used as complementary languages in conceptual modelling: the process language is suitable when modelling how activities interact, whilst the business rule model is suitable when we need to make precise statements about a particular activity. The ability to model certain aspects of business rules within the process model is particularly important in distributed organisations, where the process model may be used as a means of communication between different parts of the organisation. The coupling also serves (1) to make apparent what effect re-engineering of one model has on the structure of the other model, and (2) to indicate how the process model may be used to drive the creation of business rules.

  56. P.J. McBrien, A.H. Seltveit, and B. Wangler: Rule Based Specification of Information Systems. Proceedings of CISMOD 1994, Madras, India, pages 212-228

    Abstract This paper describes the TEMPORA approach to capturing the functions and policies of a business as business rules, and the way in which those rules may be used to build a working information system. It discusses the nature of such rules, and presents some techniques for rule elicitation.

  57. P.J. McBrien,
    Principles of Implementing Historical Databases in RDBMS.
    Proceedings of BNCOD11, Springer-Verlag LNCS, pages 220-237, 1993

    Abstract The issue of query languages for historical databases has received considerable interest in the database literature over the past decade. Recently temporal relational algebras (TRA) have been described which provide a theoretical foundation for these languages in the same manner that the relational algebra provides for the SQL language. In this paper the issue of algorithms for the querying and updating of information for one such temporal algebra based on US Logic is discussed, in the specific context of implementing such algorithms on conventional database management systems (DBMS) based on the relational algebra. In so doing, we make apparent the extensions needed to make an RDBMS support any historical database query language with the expressive power of the temporal relational algebra.

  58. P.J. McBrien, Implementing Logical Variables and Disjunctions in Graph Rewrite Systems, In Term Graph Rewriting: Theory and Practice, Editors M.R. Sleep, M.J. Plasmeijer, and M.C.J.D. van Eekelen, Chapter 23, pages 333-346, Wiley, 1993.

    Abstract Graph Rewriting Systems (GRS) have been widely studied and used as the implementation vehicle for functional programming languages, some example implementations of GRS being DACTL and the G-Machine. To date, the application of graph rewriting to the implementation of logic programming has met with somewhat less success, in part due to a concentration on the Prolog language which already has a successful implementation vehicle in the Warren Abstract Machine (usually referred to as the WAM). In this chapter the issues relating to the use of graph rewriting for implementing logic programming languages in the wider sense will be discussed.

  59. P.J. McBrien, A.H. Seltveit and B. Wangler:
    An Entity-Relationship Model Extended to Describe Historical Information.
    CISMOD 1992, Bangalore, India: pages 244-260

    Abstract The entity-relationship (ER) model has proved a successful conceptual capture and modelling tool for the relational data model. Much effort has recently been made to extend the relational data model to describe historical information, but as yet little corresponding development in ER modelling has been made. We describe in detail the various temporal behaviours that entities and relationships may exhibit, and apply the results to enhance a binary ER model with temporal semantics. The resultant ERT may be used to fully model historical relational database schemata.

  60. P.J. McBrien,
    Implementing Logic Languages by Graph Rewriting,
    PhD Thesis, Imperial College, 1992.

    Abstract The use of graph rewriting as a computational model for computer programming languages has received considerable attention in the scientific literature over the past decade. Whilst the implementation of functional programming languages by graph rewriting is simple and intuitive, the implementation of logic programming languages is less direct and consequently has been more limited in practice.

    After describing some of the problems associated with a `direct' approach to implementing logic programming by graph rewriting, and comparing the situation to that found when using term rewriting, this thesis sets out to address the problems by proposing two innovations. The first is a modified form of graph rewriting, which supports some features of term rewriting, and directly supports the notion of the logical variable and backtracking on variables. The second is a compiler target language and corresponding abstract machine, which efficiently implements such modified graph rewriting on conventional Von Neumann computer architectures. A proof that the abstract machine correctly implements the compiler target language is given.

    The approach is illustrated by demonstrating how it may be used to implement both Prolog and a more novel logic programming language called the PLL. The graph rewriting language and its associated abstract machine have been the subject of an implementation in the C language, and thus the results described within this thesis have been realized in practice. Issues relating to the use of the techniques described on a parallel computer architecture are also briefly dealt with, but no practical work has been conducted in this direction.

  61. P.J. McBrien, M. Niézette, D. Pantazis, A.H. Seltveit, U. Sundin, B. Theodoulidis, G. Tziallas and R. Wohed:
    A Rule Language to Capture and Model Business Policy Specifications.
    CAiSE 1991: Springer-Verlag LNCS 498, pages 307-318, 1991.

    Abstract The TEMPORA paradigm for the development of large data intensive, transaction oriented information systems explicitly recognises the role of organisational policy within an information system, and visibly maintains this policy throughout the software development process, from requirements specifications through to an executable implementation.

    This paper introduces the External Rule Language (ERL) of the TEMPORA conceptual modelling formalism, and describes how it is used to capture and model business organisational policy. The syntax and semantics of the language are presented, together with a number of examples drawn from a realistic case study.

  62. J. Krogstie, P.J. McBrien, R.P. Owens and A.H. Seltveit: Information Systems Development Using a Combination of Process and Rule Based Approaches. CAiSE 1991: Springer-Verlag LNCS 498, pages 319-335, 1991

  63. P. Loucopoulos, P.J. McBrien, F. Schumacker, B. Theodoulidis, V. Kopanas and B. Wangler, TEMPORA - Integrating Database Technology, Rule-based Systems and Temporal Reasoning for Effective Software, Journal of Information Systems 1(2), pages 388-411, 1991

  64. D.M. Gabbay and P.J. McBrien,
    Temporal Logic & Historical Databases.
    Proceedings of the 17th Conference on VLDB, Barcelona, pages 423-430, 1991

    Abstract We review attempts at defining a general extension to the relational algebra to include temporal semantics, and define two temporal operators to achieve a temporal relational algebra with a close correspondence to temporal logic using since and until (US Logic). We then demonstrate how this temporal relational algebra (TRA) may to a limited extent be encoded in standard relational algebra, and in turn show how an extended temporal SQL (TSQL) may be encoded in standard SQL.
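
    For background only (a standard textbook formulation, not reproduced from the paper), the since and until connectives on which US Logic is based can be given the semantics:

      M, t \models \phi \,\mathcal{S}\, \psi \;\iff\; \exists t' < t .\; M, t' \models \psi \;\wedge\; \forall t''\, (t' < t'' < t \Rightarrow M, t'' \models \phi)

      M, t \models \phi \,\mathcal{U}\, \psi \;\iff\; \exists t' > t .\; M, t' \models \psi \;\wedge\; \forall t''\, (t < t'' < t' \Rightarrow M, t'' \models \phi)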

  65. P.J. McBrien, Implementing Logic Languages by Graph Rewriting, In Logic Programming: Expanding the Horizons, Editors A. Dodd, R. Owens and S. Torrance, Intellect, 1991, pages 164-188

  66. P. Loucopoulos, P.J. McBrien, U. Persson, F. Schumacker and P. Vasey, TEMPORA - Integrating Database Technology, Rule-based Systems and Temporal Reasoning for Effective Software, ESPRIT'90 Conference Proceedings, Kluwer Academic Publishers, pages 388-411, 1990

  67. P.J. McBrien, R.P. Owens, D.M. Gabbay, M. Niezette and P. Wolper, TEMPORA: A Temporal Database Transaction System, In IEE Colloquium on Temporal Reasoning, January 1990.

Selected Publications By Students I Have Supervised

  1. A. Dey,
    Data Integration System based on both GAV and LAV query processing approaches (pdf),
    MSc Thesis, Dept of Computing, Imperial College, 2004

    Abstract Introduces the first implementation of LAV query processing in a BAV system, and briefly discusses how the existing GAV query processing in AutoMed may be combined with the new LAV query processing implemented as part of the work of this MSc thesis.

  2. N. Rizopoulos,
    Automatic discovery of semantic relationships between schema elements
    In Proc. of 6th ICEIS 2004

    Abstract The identification of semantic relationships between schema elements, or schema matching, is the initial step in the integration of data sources. Existing approaches in automatic schema matching have mainly been concerned with discovering equivalence relationships between elements. In this paper, we present an approach to automatically discover richer and more expressive semantic relationships based on a bidirectional comparison of the elements' data and metadata. The experiments that we have performed on real-world data sources from several domains show promising results, considering the fact that we do not rely on any user or external knowledge.

  3. N. Tong,
    Database Schema Transformation Optimisation Techniques for the AutoMed System (pdf),
    In Proceedings of BNCOD20, Springer Verlag LNCS, Volume 2712, Pages 157-171, 2003, ISSN: 0302-9743, ISBN 3-540-40536-4

    Abstract AutoMed is a database integration system that is designed to support the integration of schemas expressed in a variety of high-level conceptual modelling languages. It is based on the idea of expressing transformations of schemas as a sequence of primitive transformation steps, each of which is a bi-directional mapping between schemas. For AutoMed to be an efficient schema integration system in practice, where the number and size of schemas involved in the integration may be very large, the amount of time spent on the evaluation of transformations must be kept to a minimum. It is also important that the integrity of a set of transformations is maintained during the process of transformation optimisation. This paper discusses a new representation of schema transformations which facilitates the verification of the well-formedness of transformation sequences, and the optimisation of transformation sequences.