# 2014 - Technical Reports

Number Report Title 2014/1 Memory-Aware Sizing for In-Memory Databases

Karsten Molka, Giuliano Casale, Thomas Molka, and Laura Moore, 10pp2014/2 Control in the Pi-Calculus

Kohei Honda, Queen Mary, Nobuko Yoshida, Imperial College London, Martin Berger, University of Sussex, 54pp2014/3 Timed Multiparty Session Types*

Laura Bocchi, Weizhen Yang and Nobuko Yoshida, 52pp2014/4 Synthesis of Graphical Choreographies

Julien Lange, Nobuko Yoshida and Emilio Tuosto, 38pp2014/5 Multiparty Session Nets

Luca Fossati, Raymond Hu and Nobuko Yoshida, 50pp2014/6 Using an holistic method based on prior information to represent global and local variations on face images

Carlos E Thomaz, Vagner do Amaral Department of Electrical Engineering, FEI Sao Bernardo do Campo, Sao Paulo, Brazil Gilson A. Giraldi National Laboratory for Scientific Computing Petr'polis, Rio de Janeiro, Brazil Duncan Gillies Department of Computing, Imperial College., 13pp2014/7 TaDA: A Logic for Time and Data Abstraction

Pedro da Rocha Pinto, Thomas Dinsdale-Young and Philippa Gardner, 44pp2014/8 Improving classification accuracy of response in leukaemia treatment using feature selection over pathway segmentation

Zena M. Hira, Duncan F. Gillies and Edward Curry , 10pp2014/9 Compositional Reliability Analysis using Probabilistic Component Automata

Pedro Rodrigues, Emil Lupu and Jeff Kramer, 25pp2014/10 Quantitative threat analysis via a logical service

Michael Huth and Jim Huan-Pu Kuo, 26pp2014/11 Efficient Partial-Pairs SimRank Search on Large Graphs

Weiren Yu and Julie A. McCann, 14pp2014/12 Dynamic deadlock verification for general barrier synchronisation

Tiago Cogumbreiro, Raymond Hu, Francisco Martins, Nobuko Yoshida, 13pp

## Memory-Aware Sizing for In-Memory Databases

Karsten Molka, Giuliano Casale, Thomas Molka, and Laura Moore, 10ppIn-memory database systems are among the technological drivers of big data processing. In this paper we apply analytical modeling to enable efficient sizing of in-memory databases. We present novel response time approximations under online analytical processing workloads to model thread-level forkjoin and per-class memory occupation.We combine these approximations with a non-linear optimization program to minimize memory swapping in in-memory database clusters. We compare our approach with state-of-the-art response time approximations and trace-driven simulation using real data from an SAP HANA in-memory system and show that our optimization model is significantly more accurate than existing approaches at similar computational costs.

## Control in the Pi-Calculus

Kohei Honda, Queen Mary, Nobuko Yoshida, Imperial College London, Martin Berger, University of Sussex, 54ppThis paper presents a type-preserving translation from the call-by-value lambda-mu-calculus (lambda-mu-v-calculus) into a typed pi-calculus, and shows full abstraction up to maximally consistent observational congruences in both calculi. The lambda-mu-calculus has a particularly simple representation as typed mobile processes where a unique stateless replicated input is associated to each name. The corresponding pi-calculus is a proper subset of the linear pi-calculus, the latter being able to embed the simply-typed lambda-calculus fully abstractly. Strong normalisability of the lambda-mu-v-calculus is an immediate consequence of this correspondence and the strong normalisability of the linear pi-calculus, using the standard argument based on simulation between the lambda-mu-v-calculus and its translation. Full abstraction, our main result, is proved via an inverse transformation from the typed pi-terms which inhabit the encoded lambda-mu-v-types into the lambda-mu-v-calculus (the so-called definability argument), using proof techniques from games semantics and process calculi. A tight operational correspondence assisted by the definability result opens a possibility to use typed pi-calculi as a tool to investigate and analyse behaviours of various control operators and associated calculi in a uniform setting, possibly integrated with other language primitives and operational structures.

## Timed Multiparty Session Types*

Laura Bocchi, Weizhen Yang and Nobuko Yoshida, 52ppWe propose a typing theory, based on multiparty session types, for modular verification of real-time choreographic interactions. To model real-time implementations,we introduce a simple calculus with delays and a decidable static proof system.The proof system with time constraints ensures type safety and time-error freedom, namely processes respect the prescribed timing and causalities between interactions. A decidable condition, enforceable on timed global types, guarantees global time-progress for validated processes with delays, and gives a sound and complete characterisation of a new class of CTAs with general topologies that enjoys global progress and liveness.

## Synthesis of Graphical Choreographies

Julien Lange, Nobuko Yoshida and Emilio Tuosto, 38ppGraphical choreographies, or global graphs, are general multiparty session specifications featuring expressive constructs such as forking, merging, and joining for representing application-level protocols. Global graphs can be directly translated into modelling notations such as BPMN and UML. This paper presents an algorithm whereby a global graph can be synthesised from asynchronous buffered behaviours represented by communicating finite state machines (CFSMs). Our results include: a sound and complete characterisation of a subset of safe CFSMs from which global graphs can be synthesised; a synthesis algorithm to translate CFSMs to global graphs; a time complexity analysis; and an implementation of our theory, as well as an experimental evaluation.

## Multiparty Session Nets

Luca Fossati, Raymond Hu and Nobuko Yoshida, 50ppThis paper introduces global session nets, an integration of multiparty session types (MPST) and Petri nets, for role-based choreographic specifications to verify distributed multiparty systems. The graphical representation of session nets enables more liberal combinations of branch, merge, fork and join patterns than the standard syntactic MPST. We use session net token dynamics to verify a flexible conformance between the graphical global net and syntactic endpoint types, and apply the conformance to ensure type-safety and progress of endpoint processes with channel mobility. We have implemented Java APIs for validating global session graph well-formedness and endpoint type conformance.

## Using an holistic method based on prior information to represent global and local variations on face images

Carlos E Thomaz, Vagner do Amaral Department of Electrical Engineering, FEI Sao Bernardo do Campo, Sao Paulo, Brazil Gilson A. Giraldi National Laboratory for Scientific Computing Petr'polis, Rio de Janeiro, Brazil Duncan Gillies Department of Computing, Imperial College., 13ppFaces are familiar objects that can be easily perceived and recognized by ourselves. However, the computational modeling of such apparently natural human ability remains challenging. Recent studies in the literature have suggested that face processing is a cognition task composed of configural (or global) and featural (or local) sources of information, but with controversial arguments about the combination of these two types of information. In this work, we describe an holistic method that combines variance used in Principal Component Analysis (PCA) with some prior knowledge about the underlying visual perception task, including systematically the global and local information in the common multivariate representation of face images. We have showed that, with prior information, important local variations represented by principal components with small eigenvalues may not be discarded augmenting the classification accuracy of the first orthogonal basis vectors. Most interestingly, PCA with prior knowledge provides a specialized feature selection procedure, where the mapping of high-dimensional data into a lower-dimensional space has been able to handle local variations and capture not only the entire facial appearance but also some sample group facial features.

## TaDA: A Logic for Time and Data Abstraction

Pedro da Rocha Pinto, Thomas Dinsdale-Young and Philippa Gardner, 44ppTo avoid data races, concurrent operations should either be at distinct times or on distinct data. Atomicity is the abstraction that an operation takes effect at a single, discrete instant in time, with linearisability being a well known correctness condition which asserts that concurrent operations appear to behave atomically. Disjointness is the abstraction that operations act on distinct data resource, with concurrent separation logics enabling reasoning about threads that appear to operate independently on disjoint resources. We present TaDA, a program logic that combines the benefits of abstract atomicity and abstract disjointness. Our key contribution is the introduction of atomic triples, which offer an expressive approach to specifying program modules. By building up examples, we show that TaDA supports elegant modular reasoning in a way that was not previously possible.

## Improving classification accuracy of response in leukaemia treatment using feature selection over pathway segmentation

Zena M. Hira, Duncan F. Gillies and Edward Curry , 10ppMotivation: Many people die every year from leukaemia. Some of them respond to treatment, and some of them not. This study investigates whether there is any relationship between response to treatment and features drawn from the measured methylation profiles of a set of patients. Such features could potentially be used to predict the outcome of a putative treatment regime. Results: Using AdaBoost with decision trees as weak classifiers, we managed to identify two pathways that affect classification of response and progression in blood cancer with 0.988 accuracy. We also identified a gene whose presence or absence from the dataset can drop classification accuracy from 0.988 to random. Conclusion: We identified one gene that with 99% accuracy can predict response to treatment. We were also able to identify a list of genes from the same dataset that can predict response with 0.94% accuracy.

## Compositional Reliability Analysis using Probabilistic Component Automata

Pedro Rodrigues, Emil Lupu and Jeff Kramer, 25ppCompositionality is a key property in the development and analysis of component-based systems. In non-probabilistic formalisms such as Labelled Transition Systems (LTS) the functional behaviour of a system can be readily constructed from the behaviours of its parts. However, this is not true for probabilistic extensions of LTS, which are necessary to analyse non-functional properties such as reliability. We propose Probabilistic Component Automata (PCA) as a probabilistic extension to Interface Automata to automatically construct a system model by composing models of its sub-components. In particular, we focus on modelling failure scenarios, failure handling and failure propagation. Additionally, we propose a novel algorithm based on Compositional Reachability Analysis to mitigate the well-known state-explosion problem associated with composable models. Both Probabilistic Component Automata and the reduction algorithm have been implemented in the LTSA tool.

## Quantitative threat analysis via a logical service

Michael Huth and Jim Huan-Pu Kuo, 26ppIt is increasingly important to analyze system security quantitatively using concepts such as trust, reputation, cost, and risk. This requires a thorough understanding of how such concepts should interact so that we can validate the assessment of threats, the choice of adopted risk management, etc.. To this end, we propose a declarative language Peal+ in which the interaction of such concepts can be rigorously described and analysed. Peal+ has been implemented in PEALT using the SMT solver Z3 as analysis back-end. PEALT's code generators target complex back-ends and evolve with optimisations or new back-ends. Thus we can neither trust the tool chain nor feasibly prove correctness of all involved artefacts. We eliminate the need to trust that tool chain by independently certifying scenarios found by back-ends in a manner agnostic of code generation and choice of back-end. This scenario validation is compositional, courtesy of Kleene's 3-valued logic and potential refinement of scenarios. We prove the correctness of this validation, discuss how PEALT presents scenarios to further users' understanding, and demonstrate the utility of this approach by showing how it can express attack-countermeasure trees so that the interaction of attack success probability, attack cost, and attack impact can be analysed.

## Efficient Partial-Pairs SimRank Search on Large Graphs

Weiren Yu and Julie A. McCann, 14ppThe assessment of node-to-node similarities based on graph topology arises in a myriad of applications, e.g., web search. SimRank is a notable measure of this type, with the intuition that "two nodes are similar if their in-neighbors are similar". While most existing work retrieving SimRank only considers all-pairs SimRank s(*,*) and single-source SimRank s(*,j) (scores between every node and query j), there are appealing applications for partial-pairs SimRank, e.g., similarity join. Given two node subsets A and B in a graph, partial-pairs SimRank assessment aims to retrieve only {s(a,b)} for all a in A, and b in B. However, the best-known solution is not self-contained since it hinges on the premise that the SimRank scores with node-pairs in an h-go cover set must be given beforehand.

This paper focuses on efficient assessment of partial-pairs SimRank in a self-contained manner. (1) We devise a novel "seed germination" model that computes partial-pairs SimRank in O(k|E|min{|A|,|B|}) time and O(|E|+k|V|) memory for k iterations on a graph of |V| nodes and |E| edges. (2) We further eliminate unnecessary edge access to improve the time of partial-pairs SimRank to O(m*min{|A|,|B|}), where m<=min{k|E|, ∆^{2k}}, and ∆ is the maximum degree. (3) We show that our artial-pairs SimRank model also can handle the computations of all-pairs and single-source SimRanks, as well as partial-pairs SimRank* (a related notion of SimRank). (4) We empirically verify that our algorithms are (a) 38x faster than the best-known competitors, and (b) memory-efficient, allowing scores to be assessed accurately on graphs with tens of millions of links.

## Dynamic deadlock verification for general barrier synchronisation

Tiago Cogumbreiro, Raymond Hu, Francisco Martins, Nobuko Yoshida, 13ppWe present Armus, a dynamic verification tool for deadlock detection and avoidance specialised in barrier synchronisation. Barriers are used to coordinate the execution of groups of tasks, and serve as a building block of parallel computing. Our tool verifies more barrier synchronisation patterns than current state-of-the-art. To improve the scalability of verification, we introduce a novel event-based representation of concurrency constraints, and a graph-based technique for deadlock analysis. The implementation is distributed and fault-tolerant, and can verify X10 and Java programs.

To formalise the notion of barrier deadlock, we introduce a core language expressive enough to represent the three most widespread barrier synchronisation patterns: group, split-phase, and dynamic membership. We propose a graph analysis technique that selects from two alternative graph representations: the Wait-For Graph, that favours programs with more tasks than barriers; and the State Graph, optimised for programs with more barriers than tasks. We prove that finding a deadlock in either representation is equivalent, and that the verification algorithm is sound and complete with respect to the notion of deadlock in our core language.

Armus is evaluated with three benchmark suites in local and distributed scenarios. The benchmarks show that graph analysis with automatic graph-representation selection can record a 7-fold execution increase versus the traditional fixed graph representation. The performance measurements for distributed deadlock detection between 64 processes show negligible overheads.