A Javaspace-Based Framework for Efficient Fault-Tolerant Master-Worker Distributed Applications

Author(s):  
Virginie Galtier ◽  
Constantinos Makassikis ◽  
Stephane Vialle
2021 ◽  
Vol 20 (5s) ◽  
pp. 1-22
Author(s):  
Haoran Li ◽  
Chenyang Lu ◽  
Christopher D. Gill

Fault-tolerant coordination services have been widely used in distributed applications in cloud environments. Recent years have witnessed the emergence of time-sensitive applications deployed in edge computing environments, which introduces both challenges and opportunities for coordination services. On one hand, coordination services must recover from failures in a timely manner. On the other hand, edge computing employs local networked platforms that can be exploited to achieve timely recovery. In this work, we first identify the limitations of the leader election and recovery protocols underlying Apache ZooKeeper, the prevailing open-source coordination service. To reduce recovery latency from leader failures, we then design RT-ZooKeeper with a set of novel features including a fast-convergence election protocol, a quorum channel notification mechanism, and a distributed epoch persistence protocol. We have implemented RT-ZooKeeper based on ZooKeeper version 3.5.8. Empirical evaluation shows that RT-ZooKeeper achieves a 91% reduction in maximum recovery latency compared with ZooKeeper. Furthermore, a case study demonstrates that fast failure recovery in RT-ZooKeeper can benefit a common messaging service like Kafka in terms of message latency.


2012 ◽  
Vol 157-158 ◽  
pp. 839-842 ◽  
Author(s):  
Ya Li ◽  
Hai Rui Wang ◽  
Xiong Tong ◽  
Li Zhang

The paper addresses the problem of flexible Workflow Management Systems (WFMS) in distributed environments. Given the serious lack of flexibility in current workflow systems, we describe how our workflow system meets the requirements of interoperability, scalability, flexibility, dependability and adaptability. With an additional route engine, the execution path is adjusted dynamically according to the execution conditions so as to improve the flexibility and dependability of the system. A dynamic registration mechanism for domain engines is introduced to improve the scalability and adaptability of the system. The system is general purpose and open: it has been designed and implemented as a set of CORBA services. The system serves as an example of the use of middleware technologies to provide a fault-tolerant execution environment for long-running distributed applications. The system also provides a mechanism for communication among distributed components in order to support inter-organizational WFMS.

