DAISY is a demonstration platform developed within the DSoS European Project (Dependable Systems of Systems) that focuses on the use of COTS (Commercial Off-The-Shelf) technologies and reflection to build highly dependable systems. It uses CORBA portable interceptors, kernel-level reflection techniques, and system library interception to harden a distributed application built out of standard software components (CORBA, Java Virtual Machine (JVM), Linux).
COTS software is increasingly used in large and complex systems that have high dependability requirements (telecom, automotive, space, railways). Very frequently, COTS components serve as the foundation of such systems and form the executive layers on which value-adding applications are built.
This central role of COTS raises many problems regarding the robustness and reliability of the resulting platforms. Indeed, high dependability is usually not a primary target of COTS providers, and the behavior of COTS executive layers in the presence of faults is questionable. Based on our long experience in COTS characterization and fault containment (for instance in the FRIENDS and MAFALDA projects), we illustrate in DAISY how wrapping technologies and reflective features can be used to build flexible and robust distributed systems using market components.
To exemplify our approach, we implemented a mini client-server banking application. This application tolerates crash faults thanks to a classical Primary-Backup Replication scheme (also known as "passive replication") that we transparently integrated with the application using standard CORBA interceptors, thus achieving a high degree of separation of concerns.
Figure 1 shows the resulting architecture. PIS and PIC denote the portable interceptors of the servers and the client respectively. The three interceptors (one for the client, and two for the servers) monitor the passive replication: they synchronize the checkpoints exchanged between the primary and the backup with client requests, and they provide error detection.
One major weakness of such an architecture is that it does not tolerate any failures other than plain crashes. Because the failure modes of COTS components must be assumed unrestrained if one has no precise knowledge of their internal dependability, a plain primary-backup scheme alone cannot be trusted. Fortunately, using wrapping and reflective features, this limitation can be lifted to a large extent, as we show in the following.
Figure 2 shows the actual configuration of the system we used during the DSoS dissemination day. The two server processes run on a rack in Toulouse, France, while the client is installed on a laptop in the conference room in Vienna, Austria. This figure illustrates the diversity of the software and hardware components involved. This diversity is of high interest for fault tolerance because it tends to eliminate correlated failure modes (i.e. the primary and the backup failing because of the same single cause). It can be seen as a cheap and "degenerate" version of N-version programming. This diversity, however, comes at the price of increased complexity.
Among the various wrappers that could be implemented using the proposed framework, we chose to wrap thread synchronization and communication. CORBA and Java implementations usually make intensive use of multi-threading facilities. A fault affecting the behavior of synchronization facilities can have severe consequences on multi-threaded entities, and can thus impact various layers of the system. In particular, mutex locks must behave correctly.
Figure 3 illustrates the functioning of the mutex wrapper. This wrapper continuously checks an invariant property that must hold for each mutex used by the layers above. This invariant is derived from the definition of mutex semaphores and related operations, and is expressed using Dijkstra's terminology. #P(s) denotes the number of invocations of P(s) (lock operation on mutex s), #V(s) the number of release invocations on s, #Q(s) the number of threads blocked on a P operation for s, and #C(s) the number of threads that possess s. Mutual exclusion implies that at most one thread can possess s, i.e. #C(s) <= 1.
The formula is a simple balance equation on the threads interacting with mutex s (similar to those found in fluid mechanics, for instance). #P(s) - #V(s) is the number of threads interacting with the mutex at a given time. These threads are either in the queue (#Q(s)) or possess the semaphore (#C(s)), hence the resulting formula: #P(s) - #V(s) = #Q(s) + #C(s). To evaluate this expression, the platform must provide #P, #V, #C and #Q. The first two, #P and #V, can easily be obtained using library interposition techniques, a conventional approach to intercepting operating system calls. The last two, #Q and #C, require introspecting the operating system kernel, as this information is normally not available. We implemented this introspection by inserting into Linux a reflective kernel module that provides the functions Get_#Q() and Get_#C().
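The check itself is small once the four counters are available. The following Python sketch is only an illustration of the invariant described above, not the DAISY wrapper: the names `MutexCounters` and `invariant_holds` are hypothetical, and the counter values are supplied by hand rather than obtained through library interposition or kernel introspection.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MutexCounters:
    """Snapshot of the four counters for one mutex s (illustrative names)."""
    p: int  # #P(s): lock invocations on s
    v: int  # #V(s): release invocations on s
    q: int  # #Q(s): threads blocked on a P operation for s
    c: int  # #C(s): threads that possess s


def invariant_holds(m: MutexCounters) -> bool:
    """Check mutual exclusion (#C <= 1) and the balance #P - #V = #Q + #C."""
    return m.c <= 1 and (m.p - m.v) == (m.q + m.c)


# One holder and two waiters: four locks requested, one released so far.
healthy = MutexCounters(p=4, v=1, q=2, c=1)
# Two threads inside the critical section at the same time.
double_owner = MutexCounters(p=2, v=0, q=0, c=2)

print(invariant_holds(healthy))       # True
print(invariant_holds(double_owner))  # False
```

In the sketch the first snapshot balances (4 - 1 = 2 + 1) while the second violates #C(s) <= 1, so a wrapper evaluating the check on every mutex operation would flag it.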
Many situations may render this formula false, all of them breaking the mutual exclusion semantics. One of these situations is when the mutex gets "lost", i.e. a V (release) operation does not work properly and does not actually release the mutex. If this happens, all subsequent threads that ask for the mutex are blocked indefinitely, resulting in a partial hang of the application. Depending on which threads are blocked, this may result in the server blocking on client invocations indefinitely (server hang). As explained before, this kind of failure is not tolerated by a plain, wrapper-less primary-backup replication scheme. In the following part, we show step by step how, thanks to this mutex wrapper, this particular fault can now be tolerated by the DAISY platform.
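The "lost mutex" case can be replayed on a toy model to see how it unbalances the counters. This is a single-threaded simulation written for illustration only; the class `ModelMutex` and its `lost` flag are assumptions of this sketch and do not correspond to any DAISY or Linux interface.

```python
from collections import deque


class ModelMutex:
    """Single-threaded model of a mutex s with the counters used in the text."""

    def __init__(self):
        self.p = 0            # #P(s)
        self.v = 0            # #V(s)
        self.owner = None     # the thread that possesses s, if any
        self.queue = deque()  # threads blocked on P(s)

    def lock(self, thread):
        self.p += 1
        if self.owner is None:
            self.owner = thread
        else:
            self.queue.append(thread)

    def unlock(self, lost=False):
        self.v += 1
        if lost:
            return  # injected fault: the release is counted but has no effect
        self.owner = self.queue.popleft() if self.queue else None

    def invariant_holds(self):
        q = len(self.queue)
        c = 0 if self.owner is None else 1  # this model never has #C > 1
        return self.p - self.v == q + c


m = ModelMutex()
m.lock("t1")
m.unlock(lost=True)                # t1 believes it released s; s is still held
after_fault = m.invariant_holds()  # 1 - 1 = 0, but #Q + #C = 0 + 1
m.lock("t2")                       # t2 joins the queue and is never woken up
blocked = list(m.queue)

print(after_fault, blocked)        # False ['t2']
```

The imbalance is visible immediately after the faulty release, before any other thread has had a chance to block on the mutex.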
We show here, step by step, how the mutex wrapper works. We first launch the distributed application (backup, primary, client, plus a small demo monitoring facility); we run a series of actions without faults and observe the exchange of state information between the replicated servers. In a second phase, we activate the mutex wrapper, and inject a general mutex fault during the same series of actions. Without wrapping, this fault would freeze both client and primary, and go undetected by the primary/backup mechanisms. Thanks to the wrapper, the freeze is detected and the failure mode is converted into crash-fail behavior, leading to the transparent recovery of the whole application by the backup.
Step 1: Initialization
At initialization, both the primary and the backup are launched. We developed a small monitoring facility to dynamically toggle the wrapping facility of the platform and to inject faults artificially. In Figure 4, the backup is running on a machine called perth.laas.fr, while the primary runs on canberra.laas.fr. The demo monitoring facility is shown in front of the two windows.
Figure 5 gives a closer look at this monitoring facility. It combines fault-injection features with reflective capabilities to control and adapt the fault-tolerant behavior of the application. As such, it can be seen as a kind of meta-interface for the servers. This monitoring facility does not run on the same machine as the replicated servers, and communicates with them using a TCP/IP connection on port 7700.
The GUI of the client is shown in Figure 6. The client provides basic account management functions: creation, deletion, deposit, withdrawal, balance. This kind of application has a state structure that is both dynamic and non-trivial, which is quite interesting in our case.
We launch a series of operations without injected faults (Figures 7 and 8):
Creation of account 'marco'
Deposit of 50 on 'marco'
Balance of 'marco'
Deposit of 50 on 'marco'
Balance of 'marco'
Withdrawal of 100 from 'marco'
Balance of 'marco'
For each operation, the primary checkpoints its state and communicates it to its backup. The portable interceptor of the backup server ("PISb") receives this checkpoint information and modifies its own internal state accordingly.
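The checkpoint exchange for this series of operations can be sketched as follows. This is a deliberately simplified, in-process Python model: `Bank`, `BackupInterceptor` and `on_checkpoint` are hypothetical names, the whole state is copied on every operation, and no CORBA interceptor API is involved.

```python
import copy


class Bank:
    """Toy account store standing in for the banking server state."""

    def __init__(self):
        self.accounts = {}

    def apply(self, op, name, amount=0):
        if op == "create":
            self.accounts[name] = 0
        elif op == "deposit":
            self.accounts[name] += amount
        elif op == "withdraw":
            self.accounts[name] -= amount
        elif op != "balance":
            raise ValueError(f"unknown operation: {op}")
        return self.accounts[name]


class BackupInterceptor:
    """Plays the role of PISb: installs each received checkpoint in the backup."""

    def __init__(self, bank):
        self.bank = bank

    def on_checkpoint(self, state):
        self.bank.accounts = copy.deepcopy(state)


primary, backup = Bank(), Bank()
pisb = BackupInterceptor(backup)

requests = [
    ("create", "marco", 0), ("deposit", "marco", 50), ("balance", "marco", 0),
    ("deposit", "marco", 50), ("balance", "marco", 0),
    ("withdraw", "marco", 100), ("balance", "marco", 0),
]

replies = []
for op, name, amount in requests:
    result = primary.apply(op, name, amount)
    pisb.on_checkpoint(primary.accounts)  # checkpoint after every operation
    replies.append(result)

print(replies)          # [0, 50, 50, 100, 100, 0, 0]
print(backup.accounts)  # {'marco': 0}
```

After each request the backup holds the same account table as the primary, which is what allows it to take over without the client noticing a loss of state.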
We activate the mutex wrapper using the demo monitoring facility (Figure 9). This means that the invariant property #P(s) - #V(s) = #Q(s) + #C(s) is now checked for each mutex on all mutex operations.
We now launch the same series of actions as during Step 2, and inject a general mutex fault into the primary using the demo monitoring facility. This fault is immediately detected by the mutex wrapper (implemented in our example by the interception library libuspi), which crashes the primary (Figure 10), thus triggering the primary-backup switch.
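The conversion of a hang into a crash, followed by the switch to the backup, can be modeled in a few lines. The sketch below is an assumption-laden illustration rather than the libuspi mechanism: `Replica`, `ReplicaCrashed`, `mutex_fault` and `invoke` are invented names, and the "crash" is represented by an exception instead of a killed process.

```python
class ReplicaCrashed(Exception):
    """Stands in for the failure a client observes when a server process dies."""


class Replica:
    def __init__(self, name):
        self.name = name
        self.alive = True
        self.mutex_fault = False  # set by the (hypothetical) fault injector

    def check_invariant(self):
        # Model of the wrapper: a violated invariant stops the replica
        # instead of letting it hang on the lost mutex.
        if self.mutex_fault:
            self.alive = False

    def handle(self, request):
        self.check_invariant()
        if not self.alive:
            raise ReplicaCrashed(self.name)
        return f"{self.name} served {request}"


def invoke(replicas, request):
    """Client-side model: try the replicas in order, skipping crashed ones."""
    for replica in replicas:
        try:
            return replica.handle(request)
        except ReplicaCrashed:
            continue
    raise RuntimeError("no replica available")


primary, backup = Replica("primary"), Replica("backup")
first = invoke([primary, backup], "deposit 50")
primary.mutex_fault = True  # inject the mutex fault into the primary
second = invoke([primary, backup], "balance")

print(first)   # primary served deposit 50
print(second)  # backup served balance
```

The point of the model is the failure-mode conversion: a crashed replica produces an observable error that can be acted upon, whereas a hung one would simply never answer.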
At present, requests made to the servers are serialized, which greatly simplifies the checkpoint algorithms and the state capture mechanisms. Checkpoints are taken before any reply to a client, which ensures that a sent reply can always be recovered without requiring the client to roll back as well. (This avoids the classical domino effect, which is not acceptable in our case since we consider the client state and actions to be outside the sphere of control of our mechanisms.)
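The two rules stated above (serialized requests, checkpoint before reply) can be illustrated with a small Python sketch. It is a hypothetical in-process model: a single `threading.Lock` stands in for request serialization, a dictionary stands in for the primary and backup states, and the `events` list only records the order of the steps.

```python
import threading

lock = threading.Lock()  # serializes request processing
state = {"balance": 0, "backup_balance": 0}
events = []              # (request_id, step) in global order


def handle(request_id, amount):
    with lock:
        state["balance"] += amount
        events.append((request_id, "execute"))
        state["backup_balance"] = state["balance"]  # checkpoint to the backup
        events.append((request_id, "checkpoint"))
        events.append((request_id, "reply"))        # reply only after checkpoint
        return state["balance"]


threads = [threading.Thread(target=handle, args=(i, 10)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Each request must appear as one contiguous execute/checkpoint/reply group.
groups = [events[i:i + 3] for i in range(0, len(events), 3)]
well_ordered = all(
    [step for _, step in g] == ["execute", "checkpoint", "reply"]
    and len({rid for rid, _ in g}) == 1
    for g in groups
)

print(state["balance"], state["backup_balance"], well_ordered)  # 40 40 True
```

Because every reply is preceded by its checkpoint, the backup never lags behind a state that a client has already observed.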
Serializing request processing is straightforward but forfeits any advantage of multi-threading, notably with respect to availability and throughput. As already acknowledged by many research works, multi-threading raises two main challenges: non-determinism (of prime importance for active replication schemes like Triple Modular Redundancy), and state restoration (problematic because of the opacity of the layered executive platform, and the entanglement of state dependencies between the layers).
We've addressed those challenges with a new approach we termed "Multi-Level Reflection" (or Multi-Layer Reflection) in the following articles:
Principles of Multi-Level Reflection for Fault-Tolerant Architectures
François Taïani, Jean-Charles Fabre, Marc-Olivier Killijian, The 2002 Pacific Rim International Symposium on Dependable Computing (PRDC'2002), Tsukuba, Japan, December 16-18, 2002, pp. 59-66 (8 p.), doi: http://doi.ieeecomputersociety.org/10.1109/PRDC.2002.1185619.
Towards Implementing Multi-Layer Reflection for Fault-Tolerance
François Taïani, Jean-Charles Fabre, Marc-Olivier Killijian, The International Conference on Dependable Systems and Networks (DSN'2003), San Francisco, CA, June 22-25, 2003, pp. 435-444 (10 p.), doi: http://doi.ieeecomputersociety.org/10.1109/DSN.2003.1209954.
This work allowed the specification of a multi-layer meta-interface targeting those problems, and we are now working on a prototype implementation within DAISY of the ideas we have developed.