GraphD: Distributed vertex-centric graph processing beyond the memory limit

Academic Article

Abstract

  • © 2017 IEEE. Personal use is permitted. We propose GraphD, an out-of-core Pregel-like system targeting efficient big graph processing with a small cluster of commodity PCs connected by Gigabit Ethernet, an environment affordable to most users. This is in contrast to some recent efforts for out-of-core graph computation with specialized hardware. In our setting, a vertex-centric program is often data-intensive, since the CPU cost of calculating a message value is negligible compared with the network cost of transmitting that message. As a result, network bandwidth is usually the bottleneck, and out-of-core execution would not sacrifice performance if disk IO overhead can be hidden by message transmission, which is achieved by GraphD through the parallelism of computation and communication. GraphD streams edge and message data on local disks, and thus consumes negligible memory space. For a broad class of Pregel algorithms where message combiner is applicable, GraphD completely eliminates the need of any expensive external-memory join or group-by, which is required by existing systems such as Pregelix and Chaos. Extensive experiments show that GraphD beats existing out-of-core systems by orders of magnitude, and achieves comparable performance to in-memory systems running with adequate memory.
  • Authors

    Digital Object Identifier (doi)

    Author List

  • Yan D; Huang Y; Liu M; Chen H; Cheng J; Wu H; Zhang C
  • Start Page

  • 99
  • End Page

  • 114
  • Volume

  • 29
  • Issue

  • 1