Hardware support for load sharing in parallel systems
AVVENUTI, MARCO; RIZZO, LUIGI
1996-01-01
Abstract
Providing a tightly-coupled parallel system with support for load sharing poses problems related to the nature of inter-processor communication and task granularity. In recent work, the authors proposed a hybrid adaptive load sharing algorithm for distributed-memory systems based on a centralized component, the broker. Simulations have shown that the algorithm performs remarkably well and does not suffer from scalability problems over a wide range of operating conditions. To make the hybrid algorithm efficient on a shared-memory parallel system, where faster communication makes it feasible to implement task migration and to use a finer task granularity, we have devised a hardware implementation of the broker. The hardware broker, which appears to the system as a low-cost additional peripheral, improves performance by at least two orders of magnitude with respect to a software implementation. This makes it possible to run the centralized part of the load sharing algorithm in one bus cycle and to handle task granularities in the millisecond range on systems with 50 to 100 nodes. In this paper we present two different architectures for the broker and discuss their simulated performance when running our load sharing algorithm on multiprocessor systems.