
BioANS: Lightweight Autonomicity for Wireless Sensor Networks through Bio-Inspired Communication and Quality-of-Context

Michael Breza, Department of Computing, Imperial College, London email: mjb04@doc.ic.ac.uk

Richard Anthony, Department of Computer Science, University of Greenwich, London

email: R.J.Anthony@gre.ac.uk

Julie McCann, Department of Computing, Imperial College, London email: jamm@doc.ic.ac.uk

Abstract:

This paper describes the Bio-ANS (Bio-inspired Autonomic Networked Services) protocol, which uses a novel utility-based service selection mechanism to drive autonomicity in sensor networks. Due to the increasing complexity of sensor network applications, self-configuration abilities, in terms of service discovery and automatic negotiation, have become core requirements. Further, as such systems are highly dynamic due to mobility and/or unreliability, runtime self-optimisation and self-healing are required. However, the mechanism to implement this must be lightweight, because sensor nodes have scarce resources, and scalable, as some applications can require thousands of nodes. Bio-ANS incorporates some characteristics of natural emergent systems, and these contribute to its overall stability whilst it remains simple and efficient. We show not only that the Bio-ANS protocol implements autonomicity, allowing a dynamic network of sensors to continue to function under demanding circumstances, but also that the overheads incurred are reasonable. Moreover, state-flapping between requester and provider, message loss and randomness are not only tolerated but turned to advantage in the new protocol.

Introduction

A highly efficient distributed bidding protocol is described. The protocol has been specifically designed to limit the amount of communication that actually occurs during the negotiation of service provision in large-scale self-configuring systems with limited resources (communication bandwidth and battery power).

In particular, this paper is concerned with applications which gather context and/or environmental information from networks of wireless sensors. Applications in this domain have some common requirements, which include: robustness; the ability to reconfigure dynamically; scalable deployment platforms; stability despite configuration change; efficiency in the use of system resources, especially in relation to the number of messages transmitted as scale increases; and low communication latency, because the applications can have a real-time aspect.

The road to calm

Weiser's vision of ubiquitous or calm computing is one in which technology is integrated or embedded into our environment, and is able to adapt to changes in that environment and its users' demands [19]. Introducing autonomicity to sensor networking is therefore key to the achievement of calm computing. An example small-scale application is medical monitoring in the home, where the system must fully self-manage because the patient cannot carry out technical support. At the other extreme, environmental monitoring, e.g. temperature and movement sensing of glaciers, involves potentially tens of thousands of sensor nodes that must self-configure, self-optimise (to route data in a way that preserves performance and battery life while the glacier moves), and self-heal or degrade gracefully as some nodes inevitably die.

To achieve application-level self-management in sensor networks we have developed a decentralised, lightweight, scalable yet powerful protocol called ANS. ANS (Autonomic Networked Services) executes on each of the sensor nodes. We assume a degree of redundancy whereby a given node can have many functions: a video sensor might be relaying patient location as its primary function, but may also be capable of analysing the gait of the patient should that be required. This redundancy is key to the function of ANS and is based on the well-established principle that sensing is cheap while communication is expensive. Location and gait are known as contexts in ANS. If an application requires a location service it will initiate a request identifying the level of Quality of Context (QoC) required, and thus accept service from the best-suited device. When a given device is no longer providing the appropriate context quality (in terms of, say, resolution or precision), the application running ANS automatically reconfigures to another context provider that is closest to the appropriate QoC. This self-optimisation can include a new context or service that was not available when the initial provider was selected. The ANS node using a given context is kept aware of its quality/accuracy through a periodic message, piggy-backed on top of the requested sensor data, that describes the providing device's current quality of context. When the QoC falls outside the range suited to the requester, it issues a re-tender request; a re-tender message can also be issued periodically. The re-tender is broadcast to the network, the sensors that match the context supply their QoC, and a binding is made between the requester and the selected node's service. Through this simple mechanism ANS allows us to build self-configuring, self-optimising and self-repairing applications for sensor networks and other service-oriented systems that require autonomicity.

As ANS is bio-inspired and exhibits engineered emergence, we describe what we mean by this in section three. Quality is expressed as quality of context (QoC) rather than quality of service, so context and context awareness are described in section four. Section five covers autonomicity and section six emergence. Section seven describes the experiments carried out and their respective assumptions. The detailed experimental results are presented in sections eight, nine and ten, related work is discussed in section eleven, and then we conclude.

Applicability of Emergence to ANS

Though the ANS was not initially designed with emergence in mind, it has become evident that it exhibits, and can further exploit, emergent properties. Emergence describes higher-level states, patterns or other behaviours that arise in systems of numerous lower-level components that have local autonomy to interact with their neighbours. The individual components are typically quite simple and operate with only a local view of the system, and yet there are many examples in nature where highly optimised global behaviour results [9]. The higher-level behaviour cannot be predicted by examining the individual components or their behaviour in isolation. The science of emergence is described in [12] [6] [10].

The term 'engineered emergence' describes the purposeful design of interaction protocols so that a predictable, desired outcome is achieved at a higher level (i.e. emerges), although at lower levels the specific behaviour of individual components at any moment cannot be predicted. See for example [4]. Emergence is employed in ANS to achieve simultaneously scalable and robust negotiation in sensor network applications. The negotiation protocol needs to be stable and predictable in terms of its higher-level behaviour (i.e. a suitable context provider needs to be located within a reasonable time-frame), although the low-level behaviour (such as the actual interactions with and between sensor nodes, and the ordering of events such as message transmission) has elements of randomness and can thus not be precisely predicted.

Engineered emergence is a general approach to building systems that benefit from these characteristics (scale, robustness and stability) but do not require precise knowledge of lower-level activity. Sensor networks, which contain numerous sensors each having slightly different QoC characteristics (different locations, different accuracy, different levels of battery life remaining, etc.) but fundamentally serving as redundant spares for one another, are a highly suitable domain in which applications can take advantage of engineered emergence.

Traditional design of distributed applications focuses on strict protocols, message acknowledgements and event ordering. Each message and event is considered important, and randomness is generally undesirable, so sequenced or synchronised behaviour is imposed and the result is largely deterministic. Such a design paradigm can lead to inefficiency, for example through large numbers of transmitted messages and additional communication latency, especially when some of these messages do not directly contribute to correct application behaviour at higher levels [5].

Natural biological systems, however, are fundamentally non-deterministic, and there are many examples of large-scale systems that are stable and robust at a global level, the most commonly cited examples being drawn from cellular systems and insect colonies. ANS requires that a small number of appropriate-quality bids be elicited from sensor nodes (service providers) in potentially very large systems. In this application domain it is important to minimise the total amount of communication and the latency of service negotiation, and also to preserve the battery power at each sensor node.

The delayed-bid mechanism (described in detail later) purposely introduces randomness into the basic ANS protocol to spread, in time, the large number of bids sent in response to a QoC request in large systems. The randomness makes the system non-deterministic: the time at which a particular sensor node sends its reply, the order in which replies are received, and thus the actual choice of context provider, are not predictable. Yet this non-determinism can be shown to enhance stability and efficiency whilst simultaneously reducing resource usage. Results presented in sections nine and ten demonstrate that these benefits are achieved without adversely affecting robustness or latency.

Context Awareness

The perception of context and its quality drives the autonomic behaviour of the ANS. Traditionally, context is defined broadly as the circumstances or situations in which a computing task takes place [17]. One of the most common contexts is the location of the user or objects of interest. In smart homes, location can be obtained using a variety of alternative sensor types, including ultrasonic badges and RFID tags [11]. The quality (which is quantitatively application-specific) of the location information acquired by different sensors will differ: for instance, ultrasonic badges can determine location with a precision of up to 3 cm, while RF lateration is limited to 1-3 m precision.

We can define properties, which we call 'quality of context' (QoC) attributes, which characterise the quality of the context data received. QoC is essential to the ANS for choosing the best service among those available when delivering a specific type of context to an application. While different types of contexts will have QoC attributes specific to them, there are certain attributes that will be common to most contexts. Based on [8], we identify the following common attributes: Precision, Probability of correctness, Resolution, Up-to-datedness and Refresh rate. In ANS context providers need to specify QoC attributes for the context information they deliver. These attributes may vary over time and therefore must be updated regularly.

Autonomic Behaviour Using Quality of Context

Services and quality of context attributes are what enable the ANS to be autonomic. Following [13], an autonomic system can have several key abilities, including self-configuration, self-optimisation and self-healing.


Figure 1: A message diagram of the ANS protocol.

Tendering and Utility Functions

The ANS uses constructs called 'services' to tie QoC information together with the sensors or actuators that it relates to. A service is a named list of 'commands' and 'events'. Figure 1 is a message diagram showing how the protocol is implemented in terms of numbers and types of messages.

A process called 'tendering' is used to select a service to use. When an application wishes to use a service it must broadcast a 'request' command containing the name of the service (such as temperature) and its preferences for the QoC attributes. Any devices within range which support the service will use a 'utility function' to calculate how closely they are able to match the requested QoC attributes. Regardless of the number of QoC attributes the result of the utility function will be a single signed integer called 'closeness'. The closeness value is sent back to the requesting device so that it may choose the best device to use. By regularly 're-tendering', applications ensure that they are always using the most appropriate device and so take advantage of any new devices joining the network. When a service is defined, the QoC attributes that apply to it are scaled and translated so that they are all of the same order of magnitude [15].

The utility function treats the available and requested QoC attributes on a device as two points in an n-dimensional space and returns the distance from the requested point to the available point. The application requesting the service will choose the device with the lowest positive closeness, since that device will be at least as good as the application wants. If no positive closenesses are returned then the least negative closeness will be chosen since, while it will not be optimal, it will be the least bad. By not choosing the highest value, applications avoid using sensors that are more accurate than they really need, leaving those sensors available to applications that do need their higher quality and thereby aiming to produce a more optimal network configuration.
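To make the selection rule concrete, the following Python sketch shows one possible realisation of the utility function and the selection rule. The sign convention (positive closeness when the offer meets or exceeds the request on every attribute) and the assumption that all attributes are scaled so that higher values are better are ours; the paper specifies only that closeness is a single signed integer.

import math

def closeness(offered, requested):
    # Both dicts map QoC attribute names to values already scaled to a
    # common order of magnitude, as required at service-definition time.
    # Magnitude: Euclidean distance in n-dimensional QoC space.
    # Sign (our assumption): positive iff the offer meets or exceeds the
    # request on every attribute.
    dist = math.sqrt(sum((offered[k] - requested[k]) ** 2 for k in requested))
    meets = all(offered[k] >= requested[k] for k in requested)
    return round(dist) if meets else -round(dist)

def choose_provider(bids):
    # bids maps device id -> closeness.  Prefer the lowest positive
    # closeness (good enough, without wasting a better sensor); if no
    # bid is positive, take the least negative one (the least bad).
    positive = {d: c for d, c in bids.items() if c >= 0}
    if positive:
        return min(positive, key=positive.get)
    return max(bids, key=bids.get)

For example, choose_provider({'A': 12, 'B': -3, 'C': 40}) returns 'A': device C would be wastefully good, and device B falls short of the request.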

Variable QoC and State Flapping

Certain services provided by sensor nodes can have a variable QoC over time. For example, suppose we have two nodes; node A, a fast, powerful node delivering a QoC of 100 which can serve many requesters, and node B, a slow node that can only service a single requester at a time, advertising a QoC of 60. The request may be for a data aggregation context, requiring the processing of data on the sensor node before sending it back to the requester node. Two requesters wishing to have aggregated data at a QoC of 90 will both select node A, which will then apportion its resources between them and give each an actual QoC of 50. At the next re-tender, one requester will see a better QoC at node B and switch to that node, causing the QoC at A to return to 100. During the following re-tender the same requester will see that node A can again provide the higher QoC and return to it. Potentially this requester will flap between nodes A and B continuously. This is an example of state-flapping in ANS, and in highly dynamic systems it can be destabilising.

ANS as an Emergent System

This paper is primarily concerned with the selection of context provider(s) based on the quality of information that they can offer, which is determined by a utility function expressing the application's preferences amongst characteristics such as accuracy and up-to-datedness. Externally deterministic behaviour is required in the sense that applications must be served with context information of the appropriate quality. It is not important that the lower-level behaviour be deterministic: which sensor provides the information at any moment, and how the sensor(s) are selected, need not be predictable. This relaxation provides an opportunity to take advantage of cheaper (less communication-intensive, less synchronous and self-regulating) non-deterministic communication strategies inspired by biological systems such as insect colonies.

Natural systems have evolved techniques that have common characteristics which are part of their success. Examples include randomness (such as in timing mechanisms) and attributing low-value to individual events, actors and messages (the protocols are robust with respect to the loss of some messages, or if some events go unobserved or are unordered). The delayed-bid mechanism [2] introduces a random timing component into the ANS protocol. This breaks the symmetry of behaviour at sensor nodes, spreading out in time the responses to QoC requests.

On receiving a QoC request, the sensor nodes each compute their suitability based on the requested utility function. This is important because the sensors may serve several applications simultaneously. For example, one application may rank precision above resolution and another application may consider resolution to be more significant. Once the node has determined its QoC it transmits a reply to the requester. The use of a broadcast request is efficient with respect to the simplicity of the protocol and the total number of messages, but introduces a synchronisation point (the receipt of a request implicitly invokes a certain response at each node). In large systems, near-simultaneous reply-message generation behaviour presents a problem as the communication channel is temporarily congested. This may possibly deny communication service to another, maybe higher-priority, application. In addition, typically only one or a small number of sensors are required to provide information to a particular application, so much of the communication, the battery power consumed at sensor nodes, and the processing of replies at the requester, is wasted.

However, the injection of a random delay (locally determined at each sensor) spreads out the replies and directly reduces the network congestion problem. It also provides an opportunity for significant reduction of messages. This is because the response messages are dispersed in time and the requester node can process some messages before others have been sent. Once sufficient responses with the appropriate QoC parameters have been received, a Stop-Bids message is sent. This has the effect of canceling all unsent responses at sensor nodes. Some unwanted messages may already be in transmission, but if the system is tuned appropriately, the large majority of unnecessary messages can be saved.
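A minimal discrete-event sketch of one delayed-bid round follows, reusing the closeness and choose_provider sketches above; the uniform delay distribution, the MAX_DELAY bound and the test for a 'sufficient' reply (closeness >= 0) are our assumptions, and message loss is not modelled here.

import heapq
import random

MAX_DELAY = 2.0  # upper bound of the locally chosen random bid delay

def delayed_bid_round(sensors, requested_qoc, needed=1):
    # sensors maps node id -> offered QoC dict.  Each node schedules its
    # reply at a locally drawn random time.
    pending = [(random.uniform(0.0, MAX_DELAY), node) for node in sensors]
    heapq.heapify(pending)

    received = {}
    while pending:
        _, node = heapq.heappop(pending)  # next reply to reach the requester
        received[node] = closeness(sensors[node], requested_qoc)
        # Once enough sufficient replies have arrived, broadcast Stop-Bids,
        # quenching every reply still pending at the sensors.
        if sum(1 for c in received.values() if c >= 0) >= needed:
            break
    cancelled = len(pending)  # replies saved by the Stop-Bids message
    best = choose_provider(received) if received else None
    return best, len(received), cancelled

The wider the replies are spread relative to the time the requester needs to process each one, the larger the cancelled fraction: this is the tuning trade-off described above.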

An optimisation can be applied if sensor nodes can eavesdrop on the replies of their neighbours. When transmissions are dispersed there is the opportunity for the node to analyse its own operating context; i.e. is it a poor quality sensor with better quality neighbours, or perhaps it can offer the best QoC? In this way, if it 'hears' a relatively high quality response from a neighbouring node it need not transmit its own reply, see results in section ten part four.
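A sketch of this eavesdropping rule, under the same assumed sign convention as before: while waiting out its random delay, a node suppresses its own reply if any bid it has overheard would be preferred by the requester's selection rule.

def rank(bid):
    # Lower rank wins: positive bids ascending, then negative bids by
    # magnitude (least negative first), mirroring choose_provider above.
    return (0, bid) if bid >= 0 else (1, -bid)

def should_transmit(my_bid, overheard_bids):
    # Transmit only if our bid beats every reply heard so far.
    return all(rank(my_bid) < rank(b) for b in overheard_bids)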

Engineered emergence applications share many of the beneficial characteristics of the natural systems which inspire them. In the delayed-bids enhanced ANS mechanism all messages are deemed to have low value, and the protocol tolerates the loss of any individual message: if a request receives no replies (within the maximum random time delay for replies) the request message is deemed lost and is repeated; if individual sensor nodes do not receive the request message they simply do not participate in the bidding. This tolerance actually becomes more beneficial as the scale grows, since diluting the pool of respondents in such a cost-free way can be helpful. Likewise, individual response messages are of low value. If the Stop-Bids message is lost the protocol still functions completely correctly; it merely loses efficiency, as the reply-quenching savings are lost.

The experimental results presented in the paper are built up in a number of stages, evaluating the algorithm's performance as the various stages of operation are incrementally added and tuned.

Communication Complexity

Let $n$ represent the total number of sensor nodes in the system, and $m$ represent the number of messages transmitted in a re-tender. All calculations assume a single service requester and no message loss. In 'large' systems the Stop-Bids message reduces responses in a way which can only be treated probabilistically, as it depends on the specific tuning of the application (e.g. how many responses are required), the tuning of the protocol (e.g. the range of random timeout values), and the actual random timeouts chosen at each sensor. Let the number of responses cancelled by the Stop-Bids message be $c$, where

\begin{displaymath}
0 \leq c \leq n-1
\end{displaymath} (1)


\begin{displaymath}
m = 1 + 1 + (n - c), \quad \textrm{hence} \quad 3 \leq m \leq 2 + n
\end{displaymath} (2)

The overall communication complexity depends on the frequency of re-tenders, which is application specific. Let $f$ be the re-tender frequency and let $M$ be the number of messages generated per second:
\begin{displaymath}
3f \leq M \leq (2 + n)f
\end{displaymath} (3)

This is linear in $n$ in the worst case. So in a system of 100 sensor nodes in which a request message is generated every 10 seconds ($f = 0.1$), the mean number of messages generated per second by the ANS protocol would be between 0.3 (best case) and 10.2 (worst case), depending on the actual tuning.
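These bounds are easy to check numerically; the fragment below (function name ours) reproduces the worked example.

def message_rate_bounds(n, period):
    # One request + one Stop-Bids + (n - c) replies per re-tender, at
    # f = 1 / period re-tenders per second.  Best case c = n - 1 (a
    # single reply), worst case c = 0 (every sensor replies).
    f = 1.0 / period
    return 3 * f, (2 + n) * f

best, worst = message_rate_bounds(100, 10)
print(f"{best:.1f} to {worst:.1f} messages per second")  # 0.3 to 10.2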

Simulation of ANS

The ANS protocol has been implemented on sensor networks for two applications: patient monitoring in the home, and large-scale building usage monitoring. To further understand ANS we modelled it as a discrete event simulation, using observed performance data. We want to examine the trade-off between protocol overhead and performance. By overhead we refer to all communication not concerned with performing work; by performance we mean the ability of a requester to receive its desired QoC, and for what percentage of the overall run time.

The first set of experiments assumes that every sensor node can hear all requests and all can service the request; this simulates environmental monitoring with high-powered radios or multi-hop functionality. It allows us to determine whether state-flapping will destabilise the system, and to what extent overhead impacts negatively on performance under extreme conditions. The second set of simulations includes location and reception radius, limiting the number of sensors that can hear a given requester; these add a realistic set of constraints that the system will face in deployment.

Assumptions

The duty-cycle between re-tenders is an important consideration in ANS: a large duty-cycle between re-tenders lowers protocol overhead at the cost of resilience to sensor failure. In these simulations the duty-cycle was 10 queries. Packets sent to the sensor for readings and packets carrying sensor data were counted as work packets; all others were considered protocol admin packets, i.e. overhead.

To faithfully simulate the dynamic nature of the sensor network, sensor failure and replacement are built into the experiments. Sensor failures are exponentially distributed with a mean of one failure every 5000 time units, and a failure triggers the arrival of one or more new sensors after a replacement lag that is exponentially distributed with a mean of 10,000 time units. When sensors are replaced, the number of new sensors is geometrically distributed with a mean of 1.6. When a sensor used by a requester fails, the requester immediately starts the re-tendering process. The QoC of a new sensor is completely random, and it has a ten percent chance of having a variable QoC that decreases by 33% for each user. The advertised QoC of the sensors is assumed to be correct. All sensors serve data that is of interest to all of the requesters, but different requesters want different QoC. Each requester requires only one sensor at a time.
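For illustration, this churn model could be generated as below; the parameter names and the inversion method for the geometric variate are our choices, not taken from the simulator itself.

import math
import random

def geometric(p):
    # Geometric variate on {1, 2, ...} with mean 1/p, by inversion.
    return 1 + int(math.log(1.0 - random.random()) / math.log(1.0 - p))

def churn_events(horizon, mtbf=5000.0, lag_mean=10000.0, batch_mean=1.6):
    # Failure gaps are exponential (mean mtbf); each failure schedules a
    # replacement batch after an exponential lag, with a geometrically
    # distributed batch size (mean batch_mean, so p = 1 / batch_mean).
    events, t = [], 0.0
    while True:
        t += random.expovariate(1.0 / mtbf)
        if t >= horizon:
            break
        events.append((t, 'fail', 1))
        lag = random.expovariate(1.0 / lag_mean)
        if t + lag < horizon:
            events.append((t + lag, 'replace', geometric(1.0 / batch_mean)))
    events.sort()
    return events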

The time between packet arrivals is affected by the random back-off algorithm used by BNET [1], the radio link layer that ANS was built on. The arrival times of responses to requests (non-random arrivals) were normally distributed, with the means and standard deviations taken from the packet traces in [14]. Packet loss, collisions and traffic management problems were not modelled, because we assume these to be handled by BNET.

Three metrics were measured in these experiments: the average percentage of the simulation run for which the requesters got their requested QoC (broken into bands: $>$80%, between 60 and 80%, $>$0%, and no sensor); the average ratio of work-related packets sent and received by the requesters (the inverse of the protocol's overhead); and the negotiation time, from the request packet being sent out to the sending of the select-sensor packet. The simulations were run for 10,100,000 time units, measured at equilibrium with a restart at 100,000 time units. Results are generated as an average over all requesters, each over ten runs.

The results are presented as two experiments. Experiment one does not take into account sensor location and distance from the requester: first we look at the effects of state flapping on the basic ANS, and the second part compares the basic ANS to two versions modified to increase the ratio of work packets. Experiment two takes location and reception distance into account; it compares the optimisations of ANS and adds the delayed-bids version.

Experiment 1: ANS without location

This experiment sets out to determine whether or not the state flapping observed in [15] affects the ratio of work packets to overall network traffic, and for what percentage of time the requested QoC is received. State-flapping is only observed when sensors can have a variable QoC and are serving more than one requester. The assumption is that as a higher percentage of the sensors in the network have a variable QoC, there will be more state-flapping among the requesters, and that this will adversely affect our performance measures.

Effects of state-flapping

This was run with 10 requesters and a pool of 100 sensors. The percentage of sensors whose QoC changed was varied, from 0% to 90% in increments of 10%, to test the potential effects of state flapping on the number of packets sent and received by an individual requester and on the amount of time the QoC requirement is met.

We observed that as the potential for state-flapping increases, the ratio of work packets to overall packets stays fairly constant at about 15%. Figure 2 shows that the average time requesters received their desired QoC decreased as the percentage of sensors with variable QoC increased. The majority still get no less than 80% of the desired QoC; the proportion of time at 60% to 79% QoC increases, but the time requesters spend at 59% and below is always very low.

Figure 2: Average time requesters received desired QoC in the presence of state flapping.

These two results show that the adaptation mechanism in ANS provides resilience to state-flapping without changing the overhead. There is an associated degradation of the QoC received by the requesters, but it is not severe. The low ratio of work-related messages to overall communication, however, does have room for improvement.

Effects of increasing network size

This experiment looked at how ANS scales with respect to the number of packets sent and received and the amount of time QoC was met. The number of sensors was increased from 100 to 10,000. The number of requesters was also scaled, always at 10% of the number of sensors. The number of sensors with variable QoC was fixed at 10%. Failure and new-sensor rates were fixed at one every 5000 (failure) and 10,000 (new sensor) time units respectively.

Figure 3: Ratio of work packets to overall packets.


Figure 3 shows the effects of increasing the network size on the ratio of work packets to overall packets sent and received by a requester. In the previous experiment we saw that only 100 sensors and 10 requesters gave us an average of 15% work packets sent and received per requester. That ratio becomes 2% by 1000 sensors and 1% by 2500 sensors. However, at all system sizes the ANS provided the desired QoC to the requesters a minimum of 94% of the time.

The results of these experiments show that ANS is certainly adaptive in maintaining a quality of context, but at a high network-traffic overhead. The next set of experiments aims to reduce the number of unnecessary responses. The first mechanism is the stop-bid: we introduce and exploit a random delay before sensor nodes send their QoC replies. This provides an opportunity for the requester, once it has a suitable response from one or more sensors, to effectively cancel outstanding replies, potentially cutting out a large fraction of the total reply messages. When a sensor receives a stop-bid message it stops waiting and aborts its reply.

Optimising ANS

This experiment was done in three parts, and sought to improve the basic ANS's communication overhead. In each case the work-packet ratio and the average time from the request being sent until the requester receives the last sensor reply were measured. First, we re-ran the basic ANS, from 100 to 1000 nodes. Second, we added a timeout of 2.0 time units to the basic ANS, reducing the number of responding sensors to a fixed number. Third, ANS was optimised by examining each sensor response as it was received: if the requester received a sensor response packet with the requested QoC or greater, it sent a select message (which acts as a stop-bid message) and started using that sensor; all sensors unnamed in the select message cease to send responses. If no sufficient response is received within 2 time units, the requester chooses the best from all the responses already received, using the basic ANS selection mechanism; a select message is then sent and the requester begins to use the selected sensor.
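A sketch of the First Sufficient rule from the third part above, reusing choose_provider from earlier; representing the incoming replies as a time-ordered list of (arrival_time, node, closeness) tuples is a simplification of ours.

def first_sufficient(responses, timeout=2.0):
    # responses: (arrival_time, node, closeness) tuples in arrival order.
    arrived = {}
    for t, node, c in responses:
        if t > timeout:
            break
        arrived[node] = c
        if c >= 0:
            return node  # the select message doubles as stop-bids
    # Timeout: fall back to the basic ANS choice over what has arrived.
    return choose_provider(arrived) if arrived else None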

Basic ANS

First, the original ANS is tested with sensor populations from 100 to 1000 sensors, in increments of 100. The requester population is always 10% of the number of sensors.

As in the previous experiments, the ratio of work packets to total sent and received packets is very small, ranging from 15% at 100 sensors to 2% at 1000 sensors (see figure 3). As expected, the time a requester waits for all the responses in the basic ANS increases linearly with the number of sensors in the network, even allowing for the underlying random back-off algorithm preventing packet collisions, and given that all sensors offer the service requested. The times observed ranged from an average of 7.2 time units for 100 sensors to 72 time units for 1000 sensors.

Basic ANS with a time-out

Here, a timeout is added to the re-tender process. Once a request is broadcast, the requester waits for a period of time, then chooses from the responses received; the select packet it broadcasts acts as the stop-bids message to the unnamed sensors (and a select to the named sensor). The number of responses was fixed at 28 sensors for all network sizes, and the timeout was a constant 2.0 time units. The work-packet to overall-packet ratio, averaged per requester, was observed to be consistently 34% for all network sizes. This improves over the basic ANS, which at best delivered a 15% work-packet ratio. The average amount of time requesters received their requested QoC was also above 98%.

First Sufficient ANS

The final experiment of this section optimised ANS further by adding immediate processing of responses: the first response received from a sensor that met or exceeded the requester's QoC needs is selected. We call this version 'First Sufficient' or FS. This gave us more information about how many sensor responses were needed, on average, to satisfy the QoC requirements of the requesters. The results show that the average number of sensors responding before a suitable one was found ranged from 1.7 for 100 sensors (standard deviation 1.1) down to 1.55 for 1000 sensors (standard deviation 0.83). For each request the number of sensors needed was recorded. In all cases the most frequent number of sensors needed was 1, and the second most frequent was 2. The largest recorded number of sensors responding before a suitable one was found was 39.

The results are summarised in figures 4 and 5.

Figure 4: Ratio of work packets to all packets sent and received.

Figure 4 shows how the ratio of work-related packets to all packets sent and received by a requester improves. By reviewing each response as it is received, we managed to get 75% of all packets sent and received to be work-related, thus reducing the overhead of ANS to 25% of the packets sent. This is achieved while providing the requesters with their required QoC 99% of the time.

Figure 5: Average negotiation time. Note: basic ANS increases linearly off of the graph

Figure 5 shows the reduction in average negotiation time after a requester has made a request. The time needed for the basic ANS increases linearly, right off the top of the graph; this representation was needed to show the difference between ANS with timeout and our First Sufficient ANS.

An interesting consequence of the random back-off communication scheme that ANS is built upon is that, in the basic ANS, if the current sensor is still the best sensor then no change of sensor is made, whereas in ANS with timeouts and optimised ANS, if the current sensor does not respond before another suitable sensor, a change of sensor will occur. In the data we observed an average of 95% of re-tenders resulting in sensor changes. A simple extension of the protocol, setting the delay of the currently used sensor to 0, resolves this by ensuring that the previously used sensor replies immediately; this helps reduce the number of unnecessary sensor changes that occur. As state flapping has not had a negative impact on performance we did not wish to examine this further.

Experiment 2: Optimizing ANS taking into account location

Location information is now added to the model, constraining the number of sensors that can respond to a (re-)tender request because of limited communication range. All of the sensors and requesters are randomly located on a grid whose sides are the square root of the sum of all of the sensors and requesters multiplied by a density factor. The density factor used makes the total population of sensors and requesters fill five percent of the grid. The sensors and requesters all have a fixed range radius of 20 grid units.


To facilitate exploration of the effect of sensor node population density on the protocol's performance, a simple means of diluting the sensor nodes is used.

A 2-dimensional grid of cells is used to simulate the area of deployment. One or more nodes can reside in a given cell. Consequently, given a constant sensor population, the size of the grid determines the population density of the nodes: for a given node population, a larger grid will have a lower density than a smaller one. In each experiment (i.e. change in grid size) the nodes are distributed across the grid with a uniform random distribution.

The term 'density factor' is defined as the mean number of sensor nodes within a 100-cell area of the grid; i.e. a density factor of 1 implies that there is an average of one sensor node per 100 grid cells. Each type of node has a wireless range of 20 cells, with the assumption that there is no interference to limit range; thus its communication range covers $400\pi \approx 1257$ cells. Figure 6 illustrates a typical sensor node distribution with density factor 0.4 (for clarity, the lines in the diagram are drawn at a distance of 10 cells apart). On average this density factor equates to 5 sensor nodes being in communication range of a requester node.

Figure 6: A random sensor node distribution with a density factor of 0.4. Sensor and requestor nodes are represented by the symbols 'S' and 'R' respectively.
The simulation is initialised such that at the beginning each requester can hear at least one sensor. Failures and the constrained number of sensors available can leave a requester in a state with no sensors available.
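Under the density-factor definition above, node placement can be sketched as follows; the uniform placement and the interference-free range test follow the text, while the function names are ours.

import math
import random

RANGE = 20  # wireless range in grid cells

def place_nodes(n_nodes, density_factor):
    # density_factor = mean nodes per 100 cells, so the square grid
    # needs n_nodes * 100 / density_factor cells in total.
    side = math.sqrt(n_nodes * 100.0 / density_factor)
    return [(random.uniform(0, side), random.uniform(0, side))
            for _ in range(n_nodes)]

def in_range(a, b):
    # True when two nodes can hear each other (no interference modelled).
    return math.dist(a, b) <= RANGE

As a sanity check, a requester's $400\pi \approx 1257$-cell footprint at density factor 0.4 contains on average $1257 \times 0.4 / 100 \approx 5$ sensors, matching the figure quoted above.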

Once again we aim to improve the basic ANS's communication overhead. The work-packet ratio, average number of sensors responding per request, and average time from the request being sent until the requester receives the last sensor reply are again measured. We re-run the basic ANS, the basic ANS with a timeout of 2.0 units, and FS ANS. Delayed-bids are now added to FS ANS; here the sensors themselves choose whether to respond to a re-tender request. The criterion they use is whether they can provide at least 60% of the QoC asked for; if not, the sensor remains silent, further reducing protocol overhead. The experiments are run from 100 to 1000 nodes. The results are summarised in figures 7, 8, 9 and 10.
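This sensor-side silence criterion can be sketched as below; interpreting 'at least 60% of the QoC asked for' per attribute is our assumption.

QOC_THRESHOLD = 0.6  # respond only if we can offer 60% of the request

def will_respond(offered, requested):
    # A node bids only when it can provide at least the threshold
    # fraction of every requested QoC attribute; otherwise it is silent.
    return all(offered[k] >= QOC_THRESHOLD * requested[k] for k in requested)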

Figure 7: Average percentage of time requesters got QoC.

Figure 8: Average percentage of time requesters had no sensor.

Figure 9: Ratio of work packets to overall packets.

Figure 10: Average re-tender time.

Basic ANS with location

Figure 7 shows that the basic ANS consistently provides its requesters with their requested QoC for the highest percentage of time. In figure 8 we see requesters receiving no sensor. All of the versions of ANS we tested showed a similar trend: better average time with QoC and lower average time without a sensor as the number of sensors increased. For the basic ANS, the percentage of work packets to overall packets sent (see figure 9) is now 29% for 100 sensors, reducing to 22% for 1000 sensors. The time taken for a re-tender, shown in figure 10, now starts at an average of 3.3 time units for 100 sensors and only rises to 4.8 time units for 1000 sensors. Clearly, adding the constraint of location to the model reveals the true ability of the original ANS. Nevertheless, our optimisations show that the performance can still be improved.

Basic ANS with time-out and location

By stopping the bidding at 2.0 time units, we see the same results as in the first experiment. With location added, the timeout enhancement gives the lowest average percentage of time of requesters getting their requested QoC (figure 7), but in figure 8 it gives the least-bad average percentage of time with a requester getting no sensor. The percentage of work packets in figure 9 is 35% regardless of the number of sensors. The average negotiation time in figure 10 is likewise fixed at 2.0 time units, our specified timeout.

First Sufficient ANS with location

FS ANS shows the same trend as the other versions of ANS: the average time its requesters get their QoC increases (figure 7), and the average time requesters are without sensors decreases (figure 8), as the sensor population grows. Aside from this, the performance of FS ANS is almost the same as in the previous experiment. The results in figure 10 show very low re-tender times, and in figure 9 the work-packet ratio is similar to the previous experiments, again better than the basic ANS with or without timeouts.

First Sufficient ANS with delayed-bids and location

The delayed bids from the sensors improved the performance of ANS with regard to our metrics. The average time for which the requesters got their QoC showed the same trend as the other versions of ANS, as did the average time without a sensor (figures 7 and 8). The work-packet ratio shown in figure 9 shows a further improvement of 3% to 4% over FS ANS, with a slightly shorter re-tender time in figure 10.

These experiments all show that ANS is highly scalable, as revealed by figures 7 and 8. All of the versions of ANS perform better when the sensor density is increased; a threshold of about 500 sensors and above gives the best QoC and the least time with no sensors.


BioANS performance at various node densities

Given a range of 20 units, the density of the sensors in the deployment area will affect the performance of the network. The performance we choose to measure is QoC.

Figure 11: Relationship between average percentage of time requesters got QoC and density factor.
Figure 11 shows the average QoC received by 50 requesters in a network of 500 sensors. The x-axis shows the density factor of the monitored region occupied by the nodes (sensors and requesters); as the proportion of the region occupied by nodes decreases, so does the density factor.

The average QoC obtained by the requesters is almost 100% until the density factor falls below 1, at which point the average QoC begins to drop very quickly. At density factor 0.4 the requesters are only getting their QoC an average of 85% of the time; by density factor 0.2 that figure has dropped to 72%, and to 64% by density factor 0.1. A further look at the raw data (not graphically presented here) shows that with a density factor of 1 a requester can hear an average of 8 sensors; a density factor of 0.4 reduces that to an average of 2 sensors. These figures are below the expected mathematical average because of sensor failures in the network.

BioANS performance at a fixed node density

Given the results above, we decided to test the performance of BioANS as it scales in network size. The density factor was set to 0.4 to examine the performance as it begins to deteriorate. The results are summarised in figures 12, 13 and 14.

Figure 12: Average percentage of time requesters got QoC as network size increases with fixed density.

Figure 13: Ratio of work packets to overall packets.
Figure 14: Average re-tender time as network size increases with fixed density.
These experiments show that BioANS scales well up to 1000 sensors and 100 requesters (the maximum population tested in this experiment). Figure 12 shows that the average QoC received is the same for all network sizes. The average time requesters had no sensor fluctuated slightly but was consistently below 1%. The inverse of the overhead, represented here as the percentage of work-packet traffic to overall packet traffic, is consistently high (overheads consistently low); see figure 13. Latency, represented here as the average time a re-tender took to complete, is also low and consistent across all of the populations tested; see figure 14.

BioANS performance with failures

To test the robustness of BioANS in the face of a dynamic network, we ran experiments varying the failure and resumption rates of the network. We also watched the degradation of the network as all of the nodes fail without replacement, and its recovery from a state of no sensors to a full population of sensors.

All of the previous experiments were run with a failure rate of one failure every 5000 time units, exponentially distributed. The resumption rate is half that, with new nodes being added every 10,000 time units, in batches (one or more) following a geometric distribution with mean 1.6. The previous experiment showed that the network is stable as the size increases. In this experiment, we increase the failure and resumption rates, from one failure every 5000 time units up to one every 10 time units; the resumption rate is always half the failure rate, and the new-node batch mean is always 1.6. The population of the network is always 1000 sensors and 100 requesters, and the density factor is fixed at 0.4. The results are summarised in figures 15, 16 and 17.

Figure 15: Average percentage of time requesters had QoC in relation to mean time between failures.
Figure 16: Subset of x axis from 0 to 1000 of average percentage of time requesters had QoC in relation to mean time between failures.
Figure 17: Ratio of work packets to overall packets in relation to mean time between failures.
Figure 15 shows the average received QoC remaining consistent as the failure rate increases. Figure 16 gives a close-up of the graph at the more frequent failure rates of one every 100, 50 and 10 time units; even at these high frequencies, the received QoC remains consistent. Similar behaviour is observed for the percentage of time a requester has no sensor, the ratio of work packets to overall packets (the inverse of the overhead; figure 17), and the average time for a re-tender to complete. These results are not surprising, since BioANS uses a frequent re-tender method; at the same time, BioANS manages to keep overheads low (see figure 17).

Our final experiment looks at how the average QoC degrades as sensors fail without replacement, and recovers as new sensors are added. The experiments were run with a density factor of 0.4, varying the sensor node population from 0 to 1000. The failure and recovery rates are exponentially distributed with a rate of one every 5000 time units. Each experiment was run ten times, and all ten runs are plotted in figure 18 to show the deviation of results. The results show that BioANS degrades and recovers gracefully.

Figure 18: Average percentage QoC as network resumes.
The collective results of the above experiments show that the bio-inspired approach of low-cost messages works well to produce a stable, robust protocol with low overheads. The protocol is very resilient to highly dynamic network conditions.

Related Work

Utility-based service selection is gaining interest in the Autonomic Computing community. Some of the current work in this area assumes that the utility of services is per application and not shared between applications. Work that does allow the sharing of contextual information, however, assumes that bulky middleware will buffer this information and drive the self-management of the system.

Rajkumar et al. [16] propose a resource allocation model for QoS management within a single system. Resources include CPU utilisation, memory consumption, network bandwidth and latency. Each application delivers to the system the minimum resource requirements it has plus a utility function that returns the increase in performance given additional resources. The system then allocates resources to each application such that the total system utility is maximised.

The Context Toolkit [18] is a framework aimed at facilitating the development and deployment of context-aware applications. It was one of the first projects in this area and is often considered a reference framework; it has inspired this work as well as many other projects. The Context Toolkit abstracts context services, e.g. a location service, from the sensors that acquire the necessary data to deliver the service. Thus, applications are abstracted from the sensors that provide the raw data necessary for determining context, and access context data through a network API. Further, the Context Toolkit allows sharing of context data through a distributed infrastructure, and the collection and storage of data to create a history. However, unlike our work, this middleware infrastructure is quite bulky and thus not suitable for sensor applications where the infrastructure is deployed on the actual sensor nodes. Moreover, it does not provide any autonomicity in terms of allowing applications that enter the distributed environment to discover available services: the location of context services (IP address and port number) has to be known in advance. Also, there is no mechanism that allows context services to adapt and react to failure or degradation of the underlying sensor infrastructure, e.g. by switching to an alternative means of acquiring the same type of context.

Cohen et al. have proposed iQueue [7], a data-composition framework for pervasive data. iQueue allows applications to create data composers, specify a composer's data sources using functional data specifications, and specify a composer's computation. Similarly to our requester, the iQueue run-time system selects data sources satisfying the data specifications and dynamically reselects data sources as appropriate. The goal is very similar to ours, although our approach is somewhat different and, again, their middleware has not been designed in a lightweight fashion. They use a mechanism similar to our periodic re-tender request, in that a data source issues advertisements periodically, but also whenever properties of the data source, e.g. quality of information, change. It would appear that they use Boolean predicates over the values of the properties of the data source; instead, we present a mathematical model, based on the application's wishes, that evaluates each application's quantitative satisfaction with regard to any particular data source. These centralised solutions are not suitable for sensor networks, as many of the nodes are too small to carry this burden, and they introduce a central point of failure to the system. We therefore aimed to provide the same functionality in a more lightweight and decentralised way, hence our bio-inspired approach.

The methods used to optimise ANS are stylistically bio-inspired. In [3] an emergent leader election algorithm is given whose communication style is based on the mechanics of pheromone-based communication. Pheromone communication is essentially a broadcast, with no guarantee of delivery; the emergent leader election algorithm uses the inherent non-determinism of unreliable communication to build a very efficient algorithm for large-scale distributed systems. In [5] a similar use of the non-determinism of unreliable communication is made to recruit idle nodes for distributed computation. Because of the similarity between the style of communication used in these works and the type of communication we are restricted to in sensor networks, the optimised ANS is heavily inspired by these algorithms. The vast majority of WSN self-adaptation work has concentrated on the scalable and robust routing of packets from a source to a sink, whereas ANS is an application-level protocol.

Conclusions

The autonomic protocol, ANS, describes all services provided by the WSN as contexts, together with the quality (QoC) at which each context can be delivered. Applications are then composed of sets of calls to the context providers which provide the most appropriate QoC. This paper therefore seeks to measure the trade-off between performance, in terms of speed and quality of delivery, and the overheads that the addition of an autonomic protocol adds to a highly distributed, resource-scarce wireless sensor network application. To this end we took measurements obtained from a smaller-scale WSN running ANS and applied them to a simulation model to observe how the protocol would operate under extreme conditions such as failure or very large numbers of nodes; this further allowed us to carry out partial validation of the results. Our results show that under extremely dynamic circumstances, with 90% of sensors subject to state-flapping, all nodes get a QoC of 65% or above. Further extreme scaling to 10,000 nodes does not cause the QoC to drop below 94%, but at the cost of a very high overhead. This issue was addressed by adding the temporal (timeout), stop-bid and first-sufficient heuristics to the protocol's operation, reducing overheads from 85% to 25% without adversely affecting negotiation time or QoC. When wireless fading and neighbourhood clustering were taken into account (i.e. location), the redundancy was lessened, and with it the choice of nodes to adapt to. Interestingly, however, even at low node densities the QoC remained high whilst overheads lessened considerably. In conclusion, the experiments demonstrate that the bio-inspired optimisations of the basic ANS provide a stable, highly scalable and robust protocol that has general applicability to a wide range of applications in sensor networks and similar resource-constrained domains.



