Probabilistic resource allocation in heterogeneous distributed systems with random failures

Academic Article


  • The problem of finding efficient workload distribution techniques is becoming increasingly important for heterogeneous distributed systems in which the availability of compute nodes may change spontaneously over time. Resource-allocation policies designed for such systems should maximize performance and, at the same time, be robust against the failure and recovery of compute nodes. Such a policy, based on the concepts of the Derman-Lieberman-Ross theorem, is proposed in this work and applied to a simulated model of a dedicated system composed of a set of heterogeneous image processing servers. Assuming that each image yields a "reward" if its processing is completed before a certain deadline, the goal of the resource allocation policy is to maximize the expected cumulative reward. An extensive analysis was conducted to study the performance of the proposed policy and to compare it with that of several existing policies adapted to this environment. Our experiments, conducted for various types of task-machine heterogeneity, illustrate the potential of our method for solving resource allocation problems in a broad spectrum of distributed systems that experience high failure rates. © 2012 Elsevier Inc. All rights reserved.

  • Author List: Shestak V; Chong EKP; Maciejewski AA; Siegel HJ
  • Start Page: 1186
  • End Page: 1194
  • Volume: 72
  • Issue: 10
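The abstract names the Derman-Lieberman-Ross (DLR) theorem but does not state it. As a hedged illustration, the sketch below implements the classic DLR sequential stochastic assignment rule, not the adapted policy proposed in this paper. In the classic setting, n workers with known values p_1 ≤ ... ≤ p_n are assigned one per job to n jobs whose values x arrive sequentially, i.i.d. from a known distribution F. Assigning worker i to a job of value x yields reward p_i * x. The optimal rule compares x to thresholds a_{1,n} ≤ ... ≤ a_{n-1,n} that depend only on F. The Uniform(0, 1) job-value distribution and the function names `dlr_thresholds` and `assign` are assumptions made for illustration. The abstract does not say how the paper maps images, deadlines, and failure-prone servers onto these quantities.

```python
import bisect
import math


def uniform_partial_mean(lo: float, hi: float) -> float:
    """E[X; lo < X <= hi] for X ~ Uniform(0, 1) (assumed job-value distribution)."""
    lo, hi = max(lo, 0.0), min(hi, 1.0)
    return (hi * hi - lo * lo) / 2.0 if hi > lo else 0.0


def uniform_cdf(x: float) -> float:
    """F(x) for the assumed Uniform(0, 1) job-value distribution."""
    return min(max(x, 0.0), 1.0)


def dlr_thresholds(n: int) -> list[float]:
    """Interior thresholds a_{1,n} <= ... <= a_{n-1,n} for an n-stage problem.

    Classic DLR recursion:
      a_{i,n+1} = E[X; a_{i-1,n} < X <= a_{i,n}]
                  + a_{i-1,n} F(a_{i-1,n}) + a_{i,n} (1 - F(a_{i,n})),
    with a_{0,n} = -inf, a_{n,n} = +inf and the convention (+/-inf) * 0 = 0.
    """
    inner: list[float] = []  # a 1-stage problem has no interior thresholds
    for _ in range(1, n):
        bounds = [-math.inf] + inner + [math.inf]
        nxt = []
        for lo, hi in zip(bounds, bounds[1:]):
            # Tail terms vanish at the infinite ends because F(-inf) = 0
            # and 1 - F(+inf) = 0; handle them explicitly to avoid inf * 0.
            below = lo * uniform_cdf(lo) if lo != -math.inf else 0.0
            above = hi * (1.0 - uniform_cdf(hi)) if hi != math.inf else 0.0
            nxt.append(uniform_partial_mean(lo, hi) + below + above)
        inner = nxt
    return inner


def assign(x: float, p_remaining: list[float]) -> float:
    """Return the worker value p that the DLR rule uses for a job of value x."""
    ps = sorted(p_remaining)
    a = dlr_thresholds(len(ps))
    # x in (a_{i-1,n}, a_{i,n}] selects the i-th smallest p (1-based);
    # bisect_left on the interior thresholds returns exactly i - 1.
    return ps[bisect.bisect_left(a, x)]


if __name__ == "__main__":
    print(dlr_thresholds(3))             # thresholds for three remaining jobs
    print(assign(0.7, [3.0, 1.0, 2.0]))  # a high-value job gets the largest p
```

With three remaining jobs and Uniform(0, 1) values, the thresholds are 0.375 and 0.625. A job of value 0.7 therefore goes to the worker with the largest p, and a job of value 0.2 goes to the one with the smallest. Because the thresholds depend only on F and not on the p values, they can be computed once per stage and reused, which is what makes this kind of rule cheap to apply online.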