Student Theses and Dissertations Available on the Web

This page contains abstracts of selected theses and dissertations published by students supervised or co-supervised by Alexander L. Wolf. The abstracts are listed in reverse order of publication date.

Full text of the theses and dissertations is available in PDF or compressed Postscript. Retrieve a publication by clicking on the available format of your choice. Clicking also indicates your unconditional agreement to abide by all applicable copyright agreements.

Compression of the Postscript files was performed by gzip using a format interpretable by most archiving tools, including WinZip.

This page was last updated on 30 June 2014.


Fault Localization in Service-Based Systems Hosted in Mobile Ad Hoc Networks

Petr Novotny, Ph.D., Department of Computing, Imperial College London, January 2014.

Fault localization in general refers to a technique for identifying the likely root causes of failures observed in systems formed from components. Fault localization in systems deployed on mobile ad hoc networks (MANETs) is a particularly challenging task because those systems are subject to a wider variety and higher incidence of faults than those deployed in fixed networks, the resources available to track fault symptoms are severely limited, and many of the sources of faults in MANETs are by their nature transient.

We present a suite of three methods, each responsible for part of the overall task of localizing the faults occurring in service-based systems hosted on MANETs. First, we describe a dependence discovery method, designed specifically for this environment, yielding dynamic snapshots of dependence relationships discovered through decentralized observations of service interactions. Next, we present a fault localization method that employs both Bayesian and timing-based reasoning techniques to analyze the dependence data produced by the dependence discovery method in the context of a specific fault propagation model, deriving a ranked list of candidate fault locations. Third, we present an epidemic protocol designed for transferring the dependence and symptom data between nodes of MANETs with low connectivity. The protocol creates a network-wide synchronization overlay and transfers the data over intermediate nodes in periodic synchronization cycles.
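
To illustrate the flavor of the ranking step, the following sketch scores candidate fault locations from dependence snapshots and observed symptoms. It is a deliberately simplified stand-in for the Bayesian and timing-based reasoning described above, not the thesis's actual model; the data structures and names (Snapshot, rank_candidates) are hypothetical.

```python
# Illustrative sketch only: ranks candidate fault locations from dependence
# snapshots and failure symptoms. The scoring rule below is a simplified
# stand-in for the Bayesian/timing-based reasoning described in the thesis;
# all names (Snapshot, rank_candidates, ...) are hypothetical.

from collections import Counter
from dataclasses import dataclass

@dataclass
class Snapshot:
    """Dependence set observed for one client request."""
    services: frozenset   # services the request transitively depended on
    failed: bool          # whether the request exhibited a failure symptom

def rank_candidates(snapshots):
    """Score each service by how strongly it co-occurs with failures."""
    failed_hits = Counter()
    ok_hits = Counter()
    for snap in snapshots:
        bucket = failed_hits if snap.failed else ok_hits
        bucket.update(snap.services)

    scores = {}
    for service in set(failed_hits) | set(ok_hits):
        f = failed_hits[service]
        s = ok_hits[service]
        # Simple conditional-probability-like score with add-one smoothing.
        scores[service] = (f + 1) / (f + s + 2)
    # Return a ranked list of (service, score), highest suspicion first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

if __name__ == "__main__":
    observations = [
        Snapshot(frozenset({"auth", "db"}), failed=True),
        Snapshot(frozenset({"auth", "cache"}), failed=False),
        Snapshot(frozenset({"db", "cache"}), failed=True),
    ]
    for service, score in rank_candidates(observations):
        print(f"{service}: {score:.2f}")
```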

We introduce a new tool for simulating service-based systems hosted on MANETs and use it to evaluate several operational aspects of the methods. Next, we present an implementation of the methods in Java EE and use an emulation environment to evaluate them. We present the results of an extensive set of experiments exploring a wide range of operational conditions to evaluate the accuracy and performance of our methods.

retrieve PDF

Read the abstracts of some related publications: [ 1, 2 ].


Congestion Avoidance in Overlay Networks through Multipath Routing

Victor Faion, Ph.D., Department of Computing, Imperial College London, October 2013.

Overlay networks relying on traditional multicast routing approaches use only a single path between a sender and a receiver. This path is selected based on latency, with the goal of achieving fast delivery. Content is routed through links with low latency, ignoring slower links of the network, which remain unused. With the increasing size of content on the Internet, this leads to congestion: messages are dropped and have to be retransmitted.

A multicast multipath congestion-avoidance routing scheme which uses multiple bottleneck-disjoint paths between senders and receivers was developed, as was a linear programming model of the network to distribute messages intelligently across these paths according to two goals: minimum network usage and load-balancing. The former aims to use as few links as possible to perform routing, while the latter spreads messages across as many links as possible, evenly distributing the traffic. Another technique, called message splitting, was also used. This allows nodes to send a single copy of a message with multiple receivers, which will then be duplicated by a node closer to the receivers and sent along separate paths only when required.

The model considers all of the messages in the network and is a global optimisation. Nevertheless, it can be solved quickly for large networks and workloads, with the cost of routing remaining almost entirely the cost of finding multiple paths between senders and receivers. The Gurobi linear programming solver was used to find solutions to the model. This routing approach was implemented in the NS-3 network simulator. The work is presented as a messaging middleware scheme, which can be applied to any overlay messaging network.
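
As a rough illustration of the load-balancing objective, the sketch below solves a toy version of the problem for a single sender-receiver pair: split a traffic demand across bottleneck-disjoint paths so that the worst path utilisation is minimised. It uses SciPy's linprog rather than Gurobi, models nothing of the full network-wide formulation, and all capacities and demands are invented.

```python
# Illustrative sketch only: a tiny load-balancing linear program that splits
# one sender's traffic across bottleneck-disjoint paths so the worst link
# utilisation is minimised. The thesis uses the Gurobi solver and a global,
# network-wide model; this toy uses SciPy and made-up numbers.

from scipy.optimize import linprog

demand = 90.0                    # messages/s to deliver
capacities = [40.0, 60.0, 80.0]  # bottleneck capacity of each disjoint path

n = len(capacities)
# Variables: x_0..x_{n-1} = flow per path, plus t = max utilisation.
c = [0.0] * n + [1.0]            # objective: minimise t

# x_i - cap_i * t <= 0  (utilisation of each path bounded by t)
A_ub = []
for i, cap in enumerate(capacities):
    row = [0.0] * (n + 1)
    row[i] = 1.0
    row[n] = -cap
    A_ub.append(row)
b_ub = [0.0] * n

# Total flow must equal the demand.
A_eq = [[1.0] * n + [0.0]]
b_eq = [demand]

bounds = [(0.0, None)] * (n + 1)
res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)

print("per-path flows:", [round(x, 1) for x in res.x[:n]])
print("max utilisation:", round(res.x[n], 3))   # 0.5 for these numbers
```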

retrieve PDF


Inferring Useful Static Types for Duck Typed Languages

Alexander Lamaison, Ph.D., Department of Computing, Imperial College London, September 2012.

Complete and precise identification of types is essential to the effectiveness of programming aids such as refactoring or code completion. Existing approaches that target dynamically typed languages infer types using flow analysis, but flow analysis does not cope well with heavily used features such as heterogeneous containers and implicit interfaces. Our solution makes the assumption that programs that are known to work do not encounter run-time type errors, which allows us to derive extra type information from the way values are used, rather than simply where those values originate. This is in keeping with the "duck typing" philosophy of many dynamically typed languages. The information we derive must be conservative, so we describe and formalise a technique to "freeze" the duck type of a variable using the features, such as named methods, that are provably present on any run of the program. Development environments can use these sets of features to provide code-completion suggestions and API documentation, amongst other things. We show that these sets of features can be used to refine imprecise flow analysis results by using the frozen duck type to perform a structural type-cast. We first formalise this for an idealised duck-typed language semantics and then show to what extent the technique would work for a real-world language, Python. We demonstrate its effectiveness by performing an analysis of several real-world Python programs which shows that we can infer the types of method-call receivers more precisely than can flow analysis alone.
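
As a rough illustration of the idea of a frozen duck type, the sketch below collects the attribute names that are accessed on a function parameter by walking the Python AST. It is far cruder than the analysis described in the thesis (it ignores control flow and provability on every run), and all names in it are hypothetical.

```python
# Illustrative sketch only: recovers a crude "frozen duck type" for a function
# parameter by collecting the attribute names accessed on it in the function
# body. The thesis's analysis is far more careful; this just walks one
# function's AST and is purely hypothetical.

import ast
import textwrap

def frozen_duck_type(source: str, func_name: str, param: str) -> set[str]:
    """Return attribute names accessed on `param` inside `func_name`."""
    tree = ast.parse(textwrap.dedent(source))
    required = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef) and node.name == func_name:
            for sub in ast.walk(node):
                if (isinstance(sub, ast.Attribute)
                        and isinstance(sub.value, ast.Name)
                        and sub.value.id == param):
                    required.add(sub.attr)
    return required

example = """
def report(stream):
    stream.write("# report")
    for line in stream.readlines():
        stream.write(line.upper())
    stream.flush()
"""

# -> {'write', 'readlines', 'flush'}: any object offering these methods fits.
print(frozen_duck_type(example, "report", "stream"))
```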

retrieve PDF


Adequate System-Level Testing of Distributed Systems

Matthew J. Rutherford, Ph.D., Department of Computer Science, University of Colorado at Boulder, August 2006.

Software testing is about risk management. Typically, engineers use test adequacy criteria to balance the cost and efficacy of the testing activity. Test adequacy criteria are rules that provide an objective stopping condition on test input creation by defining a finite set of test requirements that must be satisfied. While adequacy criteria have been a focus of research activity for many years, existing testing criteria do not address the unique features of distributed applications. The contributions of this dissertation are: (1) a study of reported failure scenarios of seven distributed applications; (2) a novel testing technique based on discrete-event simulations that serves as a basis for adequacy criteria for distributed systems; (3) a fault-based analysis technique that allows testers to address the fundamental risks associated with using adequacy criteria; and (4) a case-study evaluation of the simulation-based and fault-based techniques.

The failure study involves the categorization of test inputs and observations needed to replicate failures reported by users. The results show that failure-producing scenarios are amenable to categorization, that simple system topologies are likely to be quite effective, and that a significant proportion of failure-producing scenarios involve distributed inputs. Thus, the results confirm our intuition that distributed systems need their own class of adequacy criteria.

Rather than inventing a new specification formalism, we instead adapt to testing the common practice of using discrete-event simulations for the design and understanding of distributed systems. Our key observation is that these simulations can be viewed as specifications of the expected behavior of the system. Using simulations to test the implementation of a system is therefore a matter of selecting inputs to cover the simulation according to some criterion, and then mapping the inputs into the implementation domain. As simulations are sequential programs themselves, virtually all black- and white-box criteria can be used with a simulation-based technique.

When using any adequacy criterion for testing, there is generally no way for engineers to know a priori how effective a test suite or the criterion itself is going to be on their system. To mitigate this risk within the context of simulation-based testing, we propose a fault-based analysis technique that uses code mutation operators to create a set of incorrect simulations. Candidate test cases are executed against each of these specification mutants and the number killed is used as a surrogate measure of effectiveness.
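
The sketch below illustrates the surrogate measure in miniature: a handful of hand-written "mutants" of a trivial reference simulation are executed against candidate test inputs, and the fraction killed is reported. The real technique applies code mutation operators to discrete-event simulations; everything here is invented for illustration.

```python
# Illustrative sketch only: measures how well a candidate test suite
# distinguishes a reference simulation from deliberately broken ("mutant")
# variants, using the killed-mutant count as a surrogate for effectiveness.
# The simulations here are trivial stand-ins; everything is hypothetical.

def reference_sim(n):
    # "Correct" specification: number of acknowledgements for n messages.
    return n

# Mutants produced by small code changes (as a mutation operator might).
mutants = [
    lambda n: n - 1,                  # off-by-one
    lambda n: 0,                      # drops all acknowledgements
    lambda n: n if n < 5 else n + 1,  # boundary fault
]

def mutation_score(test_inputs):
    """Fraction of mutants killed by at least one test input."""
    killed = 0
    for mutant in mutants:
        if any(mutant(x) != reference_sim(x) for x in test_inputs):
            killed += 1
    return killed / len(mutants)

suite_a = [1, 2, 3]        # small inputs only
suite_b = [0, 3, 7]        # includes the boundary region

print("suite A score:", mutation_score(suite_a))   # kills 2 of 3 mutants
print("suite B score:", mutation_score(suite_b))   # kills all 3
```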

We evaluate the simulation-based technique and the companion fault-based analysis method on 28 implementations of three distributed systems. The results of these experiments are striking. First, we confirm that discrete-event simulations can indeed be used in testing, and that white-box techniques based on the simulation are both effective and cost-effective when compared to randomly selected test suites of the same size. Second, in a significant advancement of the state of the art, we demonstrate the power of the fault-based analyses by improving the effectiveness of adequate suites within each criterion, and by predicting exactly the overall relationships between different criteria.

retrieve PDF

Read the abstracts of some related publications: [ 1, 2 ].


Automating Experimentation with Distributed Systems Using Generative Techniques

Yanyan Wang, Ph.D., Department of Computer Science, University of Colorado at Boulder, August 2006.

Engineering distributed systems is a challenging activity. This is partly due to their intrinsic complexity, and partly due to the practical obstacles that developers face when evaluating and adjusting their design and implementation decisions. This thesis addresses the latter aspect by providing a framework to automate experiments. The experiment automation framework is designed in a generic and programmable way to be used with different types of distributed systems for wide-ranging experimental goals. It covers three key steps for each experiment: (1) workload generation, (2) experiment deployment and execution, and (3) post-processing. We designed an approach to workload generation, the simulation-based approach, in which the stimuli of the subject system are modeled by simulating its user behaviors and its execution environment variations. The execution trace of the simulation programs constitutes a workload. We automate the next two steps with a model-based generative approach. It is founded on workloads and a suite of configuration models that characterize the distributed system under experimentation, the testbed on which the experiment is to be carried out, and their mappings. The models are used by generative techniques to automate construction of a control system for deploying, executing, and post-processing the specific experiment. We have validated our approaches by performing experiments with a variety of distributed systems on different testbeds to achieve wide-ranging experimental goals. Our experience shows that this framework can be readily applied to different kinds of distributed system architectures and distributed testbeds, and that using it for meaningful experimentation, especially in large-scale network environments, is advantageous.

retrieve PDF

Read the abstracts of some related publications: [ 1, 2 ].


A Planning-Based Approach to Failure Recovery in Distributed Systems

Naveed Arshad, Ph.D., Department of Computer Science, University of Colorado at Boulder, May 2006.

Automated failure recovery in distributed systems poses a tough challenge because of the myriad requirements and dependencies among their components. Moreover, failure scenarios are usually unpredictable, so they cannot easily be foreseen. Therefore, it is not practical to enumerate all possible failure scenarios and a way to recover a distributed system for each of them. For this reason, present failure recovery techniques are highly manual and have considerable downtime associated with them. In this dissertation, we have developed a planning-based approach to automated failure recovery in distributed component-based systems. This approach automates failure recovery through continuous monitoring of the system. Therefore, an exact system state is always available to the failure monitor. When a failure is detected, the monitor performs various checks to ensure that it is not a false positive or false negative. A dependency analyzer then checks the effects of the failure on other parts of the system. After this, an online planning procedure is performed to take the system from a failed state to a working state. This planning is performed using an artificial intelligence (AI) planner. By using planning, this approach can be used to recover from a variety of failed states and reach any of several acceptable states: from minimal functionality to complete recovery. When a plan is calculated, it is executed on the system to bring it back to a working state. We have evaluated this technique through various online and synthetic experiments performed on various distributed applications. Our results have shown that this is indeed an effective technique to automatically recover component-based distributed systems from a failure. Our results have also shown that this technique can scale to large-scale distributed systems.
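
To give a feel for the planning step, the sketch below runs a toy state-space search over recovery actions, taking a failed configuration to one satisfying the goal facts. The thesis uses a full AI planner and much richer system models; the states, actions, and names here are hypothetical.

```python
# Illustrative sketch only: a toy state-space planner that searches for a
# sequence of recovery actions taking a component-based system from a failed
# configuration back to a working one. The thesis uses a full AI planner and
# richer models; the states and actions here are hypothetical.

from collections import deque

# A state is a frozenset of facts about the system.
FAILED  = frozenset({"db:down", "app:degraded"})
WORKING = {"db:up", "app:ok"}   # goal facts that must all hold

# Each action: (name, preconditions, facts removed, facts added).
ACTIONS = [
    ("restart_db",  {"db:down"},               {"db:down"},      {"db:up"}),
    ("rebind_app",  {"db:up", "app:degraded"}, {"app:degraded"}, {"app:ok"}),
    ("flush_cache", {"cache:stale"},           {"cache:stale"},  {"cache:fresh"}),  # irrelevant here
]

def plan(start, goal):
    """Breadth-first search for a shortest recovery plan."""
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        state, steps = frontier.popleft()
        if goal <= state:
            return steps
        for name, pre, rem, add in ACTIONS:
            if pre <= state:
                nxt = frozenset((state - rem) | add)
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, steps + [name]))
    return None

print(plan(FAILED, WORKING))   # -> ['restart_db', 'rebind_app']
```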

retrieve PDF

Read the abstracts of some related publications: [ 1, 2 ].


Using Event-Based Translation to Support Dynamic Protocol Evolution

Nathan D. Ryan, Ph.D., Department of Computer Science, University of Colorado at Boulder, December 2004.

All systems built from distributed components involve the use of one or more protocols for inter-component communication. Whether these protocols are based on a broadly used "standard" or are specially designed for a particular application, they are likely to evolve. The goal of the work described here is to contribute techniques that can support protocol evolution. We are concerned not with how or why a protocol might evolve, or even whether that evolution is in some sense correct. Rather, our concern is with making it possible for applications to accommodate protocol changes dynamically. Our approach is based on a method for isolating the syntactic details of a protocol from the semantic concepts manipulated within components. Protocol syntax is formally specified in terms of tokens, message structures, and message sequences. Event-based translation techniques are used in a novel way to present to the application the semantic concepts embodied by these syntactic elements.
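
As a loose illustration of separating protocol syntax from component semantics, the sketch below drives a parser from a small declarative message-structure description and delivers the result to the application as a semantic event. The message format, event names, and API are invented and far simpler than the formal specifications used in this work.

```python
# Illustrative sketch only: separating protocol syntax from application
# semantics. A declarative description of a message structure drives a parser
# that emits semantic events to the component, so the component never touches
# the wire format. The grammar, message, and event names are made up.

# Syntax: field name -> converter, in wire order, fields separated by '|'.
MESSAGE_STRUCTURE = [("kind", str), ("item", str), ("quantity", int)]

def parse(raw: str) -> dict:
    tokens = raw.strip().split("|")
    return {name: conv(tok)
            for (name, conv), tok in zip(MESSAGE_STRUCTURE, tokens)}

def dispatch(message: dict, handlers: dict) -> None:
    """Translate a parsed message into a semantic event for the component."""
    handlers[message["kind"]](message)

# The application registers handlers for semantic concepts, not wire syntax.
handlers = {
    "ORDER":  lambda m: print(f"order placed: {m['quantity']} x {m['item']}"),
    "CANCEL": lambda m: print(f"order cancelled: {m['item']}"),
}

dispatch(parse("ORDER|widget|3"), handlers)   # order placed: 3 x widget
dispatch(parse("CANCEL|widget|0"), handlers)  # order cancelled: widget
```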

retrieve PDF

Read the abstract of a related publication here.


A Content-Based Networking Protocol for Sensor Networks

Cyrus P. Hall, M.S., Department of Computer Science, University of Colorado at Boulder, December 2004.

An ideal sensor network would minimize communication by routing information only to those nodes requiring the information. This thesis explores the use of a content-based network for this purpose, where messages containing sensor readings and associated metadata are relayed from source nodes to destination nodes based solely on the fact that the destination nodes have expressed interest in specific message content. Routing uses a distance vector protocol augmented to construct both primary and alternate routes. Forwarding uses content-based matching together with a special process called dynamic receiver partitioning. The protocol, called DV/DRP, is specifically designed for wireless sensor networks or other similarly constrained network configurations. We present simulations showing that the protocol scales to large networks while minimizing the resource consumption of individual nodes. It is also shown that the protocol is robust with respect to both transient and permanent node and communication failures. Finally, a preliminary implementation of the protocol on the MOS/Mica2 platform is presented.
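
The following sketch shows the content-based forwarding decision in isolation: a node relays a sensor reading only toward neighbours whose subscribers have expressed interest in its content. Route construction and dynamic receiver partitioning, the heart of DV/DRP, are not modelled; the predicates and node names are made up.

```python
# Illustrative sketch only: the core of content-based forwarding, where a node
# relays a sensor message only toward neighbours whose subscribers have
# expressed interest in its content. DV/DRP's route construction and dynamic
# receiver partitioning are not modelled; predicates and names are made up.

def matches(message: dict, predicate: dict) -> bool:
    """A predicate is a set of (attribute, (op, value)) constraints."""
    ops = {"=": lambda a, b: a == b,
           ">": lambda a, b: a > b,
           "<": lambda a, b: a < b}
    return all(attr in message and ops[op](message[attr], val)
               for attr, (op, val) in predicate.items())

# Interests known at this node, keyed by the next-hop neighbour that leads
# toward the interested receivers.
interests_by_next_hop = {
    "node-3": {"type": ("=", "temperature"), "value": (">", 30)},
    "node-7": {"type": ("=", "humidity")},
}

def forward(message: dict) -> list[str]:
    """Return the neighbours this message should be relayed to."""
    return [hop for hop, pred in interests_by_next_hop.items()
            if matches(message, pred)]

reading = {"type": "temperature", "value": 34, "node": 12}
print(forward(reading))   # -> ['node-3']
```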

retrieve PDF


Dynamic Reconfiguration of Software Systems Using Temporal Planning

Naveed Arshad, M.S., Department of Computer Science, University of Colorado at Boulder, May 2003.

Dynamic reconfiguration of software systems can be divided into three phases: sensing the need for reconfiguration, planning it, and carrying it out. Most research has focused on the sensing and carrying out of reconfiguration; not much attention has been paid to the planning part. The planning of the reconfiguration process is considered difficult because few automated and intelligent tools are available that can carry out this process efficiently. This thesis describes a novel technique for carrying out the planning process using temporal planners. The temporal planner computes the plan for dynamic reconfiguration under tight time and resource constraints. In this thesis we have demonstrated two aspects of dynamic reconfiguration: the first is the optimum deployment of the initial system, and the second is the reconfiguration of the system while it is running. We have developed a tool called Planit that acts as a mediator between the temporal planner and the actual system. This tool is capable of detecting any change in the system, planning for the reconfiguration, and disseminating the new reconfiguration. We have tested the temporal planner on systems with large numbers of reconfigurable artifacts, with good results.

retrieve PDF

Read the abstract of a related publication here.


Using the Structure of XML to Generate Notifications and Subscriptions in Siena

Christopher R. Corliss, M.S., Department of Computer Science, University of Colorado at Boulder, May 2002.

This thesis focuses on how to allow XML clients to interface with Siena, an event notification architecture. There are two challenges in making this happen. The first is how to take a client's XML notification and translate it into a Siena notification, which has a different data model from XML. The other is how to translate an XPath expression into a Siena subscription. The approach taken to overcome these challenges is to use the structure of the XML notification to guide the translation process. The structure of the XML notification is specified in a file using the XSchema language. A set of generic rules has been developed that can be applied to any XSchema, resulting in a mapping from XML notification tags to Siena notification attribute names. This same mapping is also used to translate an XPath expression into the set of constraints used in the creation of the Siena subscription.
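
As a rough illustration, the sketch below flattens a small XML notification into flat attribute/value pairs of the kind a Siena notification carries, and turns a bare path into a single equality constraint. The actual translation is driven by rules generated from the XSchema, and the Siena API is not reproduced; plain dictionaries and invented examples stand in for both.

```python
# Illustrative sketch only: flattens a simple XML notification into flat
# attribute/value pairs and turns a very restricted XPath-like path into a
# matching constraint. The real translation is schema-driven and targets the
# Siena API, which is not reproduced here; plain dicts stand in.

import xml.etree.ElementTree as ET

def xml_to_attributes(xml_text: str) -> dict:
    """Map leaf elements to dotted attribute names, e.g. stock.price."""
    root = ET.fromstring(xml_text)
    attrs = {}

    def visit(elem, prefix):
        name = f"{prefix}.{elem.tag}" if prefix else elem.tag
        children = list(elem)
        if children:
            for child in children:
                visit(child, name)
        elif elem.text and elem.text.strip():
            attrs[name] = elem.text.strip()

    visit(root, "")
    return attrs

def path_to_constraint(path: str, value: str) -> dict:
    """Translate a bare path like /stock/price into one equality constraint."""
    attr = ".".join(p for p in path.split("/") if p)
    return {attr: ("=", value)}

notification = xml_to_attributes(
    "<stock><symbol>ACME</symbol><price>12.5</price></stock>")
subscription = path_to_constraint("/stock/symbol", "ACME")

print(notification)   # {'stock.symbol': 'ACME', 'stock.price': '12.5'}
print(subscription)   # {'stock.symbol': ('=', 'ACME')}
```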

retrieve PDF

Read the abstract of a related thesis here.


Mobility Support in the Siena Publish/Subscribe Middleware

Mauro Caporuscio, Laurea, Dipartimento di Informatica, Università dell'Aquila, March 2002.

This thesis is concerned with mobile applications that use a publish/subscribe infrastructure. In particular, this work consists of (1) a case study on the deployment of a publish/subscribe middleware on top of a wireless communication service, where mobility is supported at the network level, and (2) a design and initial implementation of a mobility support service realized within the publish/subscribe middleware.

retrieve compressed Postscript

Read the abstract of a related publication here.

Read the abstract of a related thesis here.


EJB-ARK: Enterprise JavaBean Automatic Reconfiguration frameworK

Matthew J. Rutherford, M.S., Department of Computer Science, University of Colorado at Boulder, December 2001.

This project examined the development of EJB-ARK, a management framework for the software deployment life cycle of Enterprise JavaBeans. The overarching goals of the framework were to allow reconfigurations of EJB systems to proceed with a minimum of disruption, and to minimize the impact on the typical software producer activities. To understand the ways EJB systems evolve, six classifications of system reconfigurations were created. Additionally, enhancements to the existing EJB component life cycle were identified. These enhancements were aimed at making the deployment life cycle of EJBs easier to manage. A prototype EJB-ARK system was developed to integrate with JBoss, an open-source EJB server. A sample EJB system, Dirsync, was developed to validate the EJB-ARK architecture and design. Dirsync was designed to have many features that are commonly found in distributed applications, and as a final demonstration the Dirsync application was automatically evolved through seven different releases. This study showed that a framework could be developed that would allow a system to be automatically reconfigured with a relatively small impact on the activities of the software producers.

retrieve compressed Postscript

Read the abstract of a related publication here.

Read the abstract of a related thesis here.


A Formal, Language-Independent, and Compositional Approach to Control Dependence Analysis

Judith A. Stafford, Ph.D., Department of Computer Science, University of Colorado at Boulder, August 2000.

Dependence relationships among the statements of a program are important to understand for various software development and maintenance purposes. The program's dependence graph is used as a base for various types of program analyses. A dependence graph represents the potential for one statement in a program to affect another in terms of the control and data dependencies among a program's statements. A dependence graph is a directed multi-graph; the vertices of the graph represent the statements in a program and the arcs represent control and data dependencies separately. During the past two decades the value of a dependence graph as a program representation has been recognized by a wide audience and the definition has been extended in various ways in order to incorporate dependence relationships in various types of programs.

This dissertation concentrates on the control dependence aspect of program dependence and describes a new approach to identifying control dependencies in sequential, imperative, multi-procedure programs. The approach is formal, compositional, and language independent. It addresses previously identified pitfalls associated with identifying control dependencies in programs that contain procedure calls. Additionally, because it is rigorously defined, it provides a foundation for reasoning about its potential use as a base for formal extension to other types of programs.

Models of control dependencies for uni-procedure programs are typically based on composing two program representations: the control flow graph and the forward dominance tree. The key observation underlying the work described in this dissertation is that the notion of forward dominance has not been carried forward into approaches to computing control dependencies in more complex types of programs. The forward dominance relationship, as previously defined, is not effective for use in identifying control dependencies in non-inlined control flow representations of multi-procedure programs.

In this thesis we extend the definitions of control flow, forward dominance, and control dependence for application to multi-procedure programs. We describe structures that represent the interprocedural relationships in a program. We describe and define interprocedural forward dominance and a related program representation called the forward dominance forest and its use to identify control dependencies in multi-procedure programs.
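
For concreteness, the sketch below shows the standard uni-procedure construction that this work generalises: forward dominators (postdominators) of a small control flow graph are computed by iterative data-flow analysis, and control dependencies are then read off in the usual way. The example CFG is invented, and none of the interprocedural machinery of the dissertation appears here.

```python
# Illustrative sketch only: the classical uni-procedure construction that the
# dissertation generalises. Forward dominators (postdominators) are computed
# by iterative data-flow analysis over a tiny CFG, and node N is deemed
# control dependent on M if N postdominates a successor of M but does not
# strictly postdominate M itself. The example CFG is made up.

# CFG successors; "exit" is the unique exit node.
succ = {
    "entry": ["if"],
    "if":    ["then", "else"],
    "then":  ["join"],
    "else":  ["join"],
    "join":  ["exit"],
    "exit":  [],
}

def postdominators(succ, exit_node):
    nodes = list(succ)
    pdom = {n: set(nodes) for n in nodes}
    pdom[exit_node] = {exit_node}
    changed = True
    while changed:
        changed = False
        for n in nodes:
            if n == exit_node:
                continue
            new = set(nodes)
            for s in succ[n]:
                new &= pdom[s]
            new |= {n}
            if new != pdom[n]:
                pdom[n] = new
                changed = True
    return pdom

def control_dependencies(succ, pdom):
    deps = set()
    for m in succ:
        for s in succ[m]:
            for n in pdom[s]:
                if n != m and n not in (pdom[m] - {m}):
                    deps.add((n, m))   # n is control dependent on m
    return sorted(deps)

pdom = postdominators(succ, "exit")
print(control_dependencies(succ, pdom))
# -> [('else', 'if'), ('then', 'if')]
```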

retrieve compressed Postscript

Read the abstracts of some related publications: [ 1, 2 ].


An Infrastructure to Generate Experimental Workloads for Persistent Object System Performance Evaluation

Thorna O. Humphries, Ph.D., Department of Computer Science, University of Colorado at Boulder, August 2000.

Performance evaluation of persistent object system implementations requires the use and evaluation of experimental workloads. Such workloads include a schema describing how the data are related, and application behaviors that capture how the data are manipulated over time. Currently, these experimental workloads generate data in a manner that does not support sharing of applications or data among researchers, either because the data are specific to a particular hardware platform or because they are specific to a particular persistent object system. Using trace-driven simulation as a technique for analyzing the performance of persistent object systems, an infrastructure for generating experimental workloads and capturing their behavior is designed and implemented in this dissertation. This infrastructure consists of a common trace format that allows data to be shared among researchers and a modeling toolkit that reduces the effort to model, implement, and instrument applications. It also includes a new technique to generate multi-user workloads.

PTF (POSSE Trace Format) is a general-purpose trace format that specifies a set of events characterizing application operations on persistent object stores. PTF is novel in that the semantics of the higher-level application is maintained through the trace events (e.g., the notion of an object is captured in the trace events). It also captures information about an application that is not specific to a particular persistent object system implementation.

AMPS (Application Modeling for Persistent Systems) is a toolkit that consists of a set of C++ classes and a TCL interface to ease the creation of self-tracing applications. The set of classes provides mechanisms for specifying a schema, coding application operations on the schema, and transparently instrumenting an application to record trace events. The TCL interface provides an interactive mechanism for specifying the workload of an application.

Several benefits can be derived from the use of the infrastructure of this dissertation. These benefits are as follows: the process of building new experiments for analysis is made easier; experiments to evaluate the performance of implementations can be conducted and reproduced with less effort; and pertinent information can be gathered in a cost-effective manner.

retrieve PDF | retrieve compressed Postscript

Read the abstract of a related publication here.


A Reusable, Distributed Repository for Configuration Management Policy Programming

André van der Hoek, Ph.D., Department of Computer Science, University of Colorado at Boulder, May 2000.

Although the number and variety of available configuration management (CM) systems has grown rapidly in the past few years, the need to construct new CM systems remains. The desire to manage different kinds of artifacts other than source code, situations demanding highly specialized solutions, and the exploration of new research questions all may require the construction of a novel CM system. Unfortunately, in the face of today's move towards distributed projects, this is becoming an increasingly daunting task for which existing CM technology provides little to no support.

This dissertation contributes a novel reusable testbed that supports the rapid development of potentially distributed prototype CM systems. The testbed separates CM repositories from CM policies by providing a generic model of a distributed repository and an associated programmatic interface. Together, the repository model and programmatic interface stipulate a precisely defined abstraction layer upon which specific CM policies are built. In particular, CM policies are programmed as unique extensions to the interface, while the underlying distributed repository is reused across different policies. Within the abstraction layer, distribution is isolated. Low-level details of distributed programming are placed within the implementation of the repository model whereas distribution aspects that are controlled at the policy programming level are placed in a separate, orthogonal functional category within the programmatic interface.

Two tangible benefits result from the use of the reusable testbed. First, the effort required in constructing prototype CM systems is reduced significantly because the generic repository is reused and the CM policy is easily implemented. Second, the rapid exploration of new CM policies is enabled, leading to the creation of unique CM policies that are tailored to specific situations.

The testbed is evaluated abstractly, by mapping ten CM policies onto the repository model and programmatic interface. Additionally, it is evaluated concretely through the use of a prototype, called NUCM, upon which three novel CM policies are implemented. Demonstrating the expressiveness, feasibility, utility, and validity of the testbed, these policies are characterized by their rapid development, ease of change, incremental evolution, and seamless distributed operation.

retrieve PDF | retrieve compressed Postscript

Read the abstracts of some related publications: [ 1, 2, 3, 4 ].


Analysis and Design for a Next Generation Software Release Management System

Robert A. Smith, M.S., Department of Computer Science, University of Colorado at Boulder, December 1999.

SRM is a software release management system that helps developers in releasing software and users in obtaining software. Two key characteristics of SRM are its inherent support for dependencies and its support for distributed operation. Despite these advantages, two important disadvantages exist in the current version of SRM. First, it is inflexible in terms of its distribution, which needs to be specified a priori. Second, it is inflexible in terms of its appearance, which cannot be configured and is based on a standard set of descriptive metadata. This thesis contributes a solution to these two problems. Based on a comprehensive survey of existing release management systems, a core set of capabilities is defined. These capabilities are further refined into a detailed design of both a novel repository structure and a new XML-based method of defining the descriptive metadata of a repository. The advantages of the design are demonstrated through a prototype, which shows how the distribution topology can change over time and how the same repository can be configured as either a software or a document release management system.

retrieve PDF

Read the abstract of a related publication here.

Read the abstract of a related thesis here.


Agent-Based Software Configuration and Deployment

Richard S. Hall, Ph.D., Department of Computer Science, University of Colorado at Boulder, May 1999.

Software deployment is an evolving collection of interrelated processes such as release, install, adapt, reconfigure, update, activate, deactivate, remove, and retire. The connectivity of large networks, such as the Internet, is affecting how software deployment is performed, and it is necessary to introduce new software deployment technologies that leverage this connectivity. The Software Dock, presented in this thesis, is a distributed, agent-based deployment framework that supports ongoing cooperation and negotiation among software producers themselves and between software producers and software consumers. The framework is enabled by a standardized deployment schema for describing software systems, called the Deployable Software Description (DSD) format. The Software Dock employs agents that travel between software producers and consumers to perform software deployment activities by interpreting the deployment descriptions of the software systems. The Software Dock infrastructure allows software producers to offer their customers high-level deployment services that were previously not possible.

retrieve PDF

Read the abstracts of some related publications: [ 1, 2, 3, 4 ].

Read the abstract of a related thesis here.


Feature Engineering of Software Systems

C. Reid Turner, Ph.D., Department of Computer Science, University of Colorado at Boulder, May 1999.

Software users and developers have different perspectives. Users are focused on the problem domain, where a system's features are the primary concern. Developers are focused on the solution domain, where a system's life-cycle artifacts are key. This difference leads to difficulties producing successful software systems. At present, there is little understanding of how to narrow the gulf between the two perspectives. This dissertation argues for establishing an organizing viewpoint, which we term feature engineering, to incorporate features into software engineering as first class objects. Features, acting as a bridge between the problem and solution domains, can begin to narrow the gulf.

This dissertation explores three aspects of features in software engineering. First, it establishes a framework for understanding software features. In this framework, we define the concept of feature as separate from feature implementation, its realization in the developed system. The framework also includes a model of features in software engineering. This model identifies relationships between a system's features and the major development artifacts. The final part of the framework is an evaluation of the potential use of feature information in the various software development activities. The second and third aspects considered are the roles for features and feature relationships in configuration management and testing. In this dissertation, we argue that configuration management is pivotal for taking advantage of the various relationships that features structure. The ability of commercial configuration management systems to use information about features is evaluated, and our experiences using a prototype feature-based configuration management system are reported. Feature testing is identified as a useful extension to traditional testing. Feature tests are used to confirm that a feature implementation faithfully produces the desired behavior. Feature tests are also used to uncover feature relationships and interactions in the solution domain.

retrieve PDF | retrieve compressed Postscript

Read the abstracts of some related publications: [ 1, 2 ].


Architectures for an Event Notification Service Scalable to Wide-Area Networks

Antonio Carzaniga, Ph.D., Dipartimento di Elettronica e Informazione, Politecnico di Milano, December 1998.

A wide range of software systems are designed to operate in a reactive manner. In such systems, the high-level control flow is not explicitly programmed; instead, it is driven by the occurrence of events. These systems realize their functionality by performing some actions in response to events, possibly using the information associated with the stimulating events.

A common logical component of every event-based application is what we call an event observation and notification service, or more concisely an event service. The event service observes the occurrence of events or the occurrence of combinations of events and consequently notifies all the applications or components that have declared their interest in reacting to such events. Because the semantics of event-based applications is substantially independent of the mechanisms that are used to capture and dispatch events of interest, it is convenient to separate the event service from all the applications that make use of it. In accordance with many standardization efforts that have been proposed recently (e.g., the CORBA Event Service), and according to strategic plans and research in network technology, we envision a unified event service implemented as a common "middleware" to support the event-based interaction among software components.

This work presents Siena, a project directed towards the design and implementation of a scalable general-purpose event service. In summary, we see two main challenges in the area of event-based infrastructures that Siena proposes to address: expressiveness and scalability.

The contributions of this work are a formal definition of an event service that combines expressiveness with scalability together with the design and implementation of the architectures and algorithms that realize this event service as a distributed infrastructure. One obvious issue that we must face in this research is the validation and verification of the solutions that we propose. To this end, we used a simulation environment by which we performed systematic simulations of our architectures and algorithms in several network scenarios. Here we present the framework that we built and the modeling abstraction that we adopted to perform this analysis. We also discuss some initial results that clearly differentiate the Siena event service from traditional ones. Further simulations will help clarify the trade-offs and differentiators between the alternative solutions that we propose. In addition to the simulation environment, we implemented a real prototype of Siena, consisting of a server that realizes one of our distributed architectures plus a client-side interface with a mapping to the Java language. This prototype has been used in distributed settings of limited scale to support an event-based software deployment system.
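
The sketch below captures only the essential subscribe/publish interface of a content-based event service: subscribers register predicates over event attributes and are called back when a matching event is published. Siena's distributed architectures, routing algorithms, and actual API are not modelled; the class and method names are invented.

```python
# Illustrative sketch only: the essential interface of a content-based event
# notification service, in which subscribers declare interest via predicates
# over event attributes and are called back when a matching event is
# published. Siena's distributed architectures and wire protocol are not
# modelled; the API below is made up.

class EventService:
    def __init__(self):
        self._subscriptions = []   # list of (predicate, callback)

    def subscribe(self, predicate, callback):
        """predicate: dict of attribute -> (operator, value) constraints."""
        self._subscriptions.append((predicate, callback))

    def publish(self, event):
        """Notify every subscriber whose predicate matches the event."""
        for predicate, callback in self._subscriptions:
            if all(attr in event and self._test(op, event[attr], val)
                   for attr, (op, val) in predicate.items()):
                callback(event)

    @staticmethod
    def _test(op, actual, expected):
        if op == "=":
            return actual == expected
        if op == ">":
            return actual > expected
        return actual < expected

service = EventService()
service.subscribe({"type": ("=", "release"), "version": (">", 2)},
                  lambda e: print("notified:", e))
service.publish({"type": "release", "version": 3, "name": "demo"})  # notifies
service.publish({"type": "release", "version": 1})                  # ignored
```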

retrieve compressed Postscript

Read the abstracts of some related publications: [ 1, 2, 3 ].

Read the abstracts of some related theses: [ 1, 2 ].


Adding Licensing and Access Control to an Internet Accessible Software Release Management Tool

Michael W. Hollis, M.S., Department of Computer Science, University of Colorado at Boulder, August 1998.

This thesis describes the design and implementation of a Licensing System and an Access Control System for a Software Release Management Tool. The Software Release Management Tool provides a process through which software can be made available to and be obtained by its users. This tool is designed to automate and optimize the retrieval of a software system's components. Because the tool allows users to retrieve software systems over the Internet, some regulation is in order. A Licensing System is used to ensure that each user agrees to certain terms and conditions prior to retrieving a software system. An Access Control System ensures that each user has been given permission to retrieve a software system. These systems work together to regulate the software retrieval process.

retrieve compressed Postscript

Read the abstract of a related publication here.

Read the abstract of a related thesis here.


Process Discovery and Validation through Event Data Analysis

Jonathan E. Cook, Ph.D., Department of Computer Science, University of Colorado at Boulder, December 1996.

Software process is how an organization goes about developing or maintaining a software system. It is the methodology employed when people use machines, tools, and artifacts to create a product. Recent work has applied formal modeling to software process, with the hope of reaping the benefits of unambiguous and analyzable formalisms. Yet industry has been slow to adopt formal modeling technologies. Two reasons are that it is costly to develop a formal model and, once developed, there are no methods to ensure that the model indeed reflects reality.

This thesis develops techniques for process event data analysis that help solve these two problems, which are termed process discovery and process validation.

For process discovery, event data captured from an on-going process is used to generate a formal model of process behavior. To do this, results from the field of grammar inference are applied, and a new method is also developed. The methods are shown to be efficient and practical to use in an interactive tool that is developed in the course of this work.

For process validation, event data is used to measure the correspondence between existing process models and the actual process, while allowing discrepancies to exist. A paradigm based on string distance metrics is developed, and several validation metrics in this paradigm are described. How these metrics can be calculated is then shown, and a tool set for doing process validation is provided.
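
As a concrete example of a string-distance style measure, the sketch below compares an event sequence predicted by a process model with an observed event stream using plain Levenshtein edit distance, normalised into a correspondence score. The thesis's validation metrics differ in detail; the event names and traces here are made up.

```python
# Illustrative sketch only: a string-distance style validation measure that
# compares an event stream predicted by a process model with the event stream
# actually observed, tolerating some discrepancy. The concrete metrics in the
# thesis differ; this uses plain Levenshtein edit distance and made-up events.

def edit_distance(a, b):
    """Levenshtein distance between two event sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute
        prev = cur
    return prev[-1]

def correspondence(model_trace, observed_trace):
    """1.0 = perfect match, 0.0 = completely different."""
    d = edit_distance(model_trace, observed_trace)
    return 1.0 - d / max(len(model_trace), len(observed_trace), 1)

model    = ["edit", "review", "checkin", "build", "test"]
observed = ["edit", "checkin", "review", "build", "test", "release"]

print(round(correspondence(model, observed), 2))   # -> 0.5
```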

In implementing these methods, a framework is developed, called Balboa, for managing process data and facilitating the construction of analysis tools. This framework serves to unite the variety of collection mechanisms and tools by providing consistent data manipulation, management, and access services, and assistance in tool construction.

Finally, the techniques developed in this thesis are applied in an industrial study. This study provides concrete results showing that one can relate the quality of a process as prescribed by a model to the quality of the product. In doing so, it also shows that the discovery and validation techniques are able to capture important aspects about software process, and can be applied in the real world.

retrieve compressed Postscript

Read the abstracts of some related publications: [ 1, 2, 3, 4 ].


Software Process Modeling and Execution within Virtual Environments

John C. Doppke, M.S., Department of Computer Science, University of Colorado at Boulder, August 1996.

While multi-user virtual environments have been developed in the past as venues for entertainment and social interaction, recent research in virtual environments has focused on their utility in carrying out work in the real world. This recent research has identified the importance of a mapping between real and virtual that permits the representation of tasks in the virtual environment. In this paper we investigate the use of virtual environments -- in particular, MUDs (Multi-User Dimensions) -- in the domain of software process. In so doing, we define a mapping, or metaphor, that permits the representation of software process within a MUD called LambdaMOO. The system resulting from this mapping, called Promo, permits the modeling and execution of software processes.

retrieve compressed Postscript

Read the abstract of a related publication here.

