Abstract
Big-data computing is a new critical challenge for the ICT industry. Engineers and researchers are dealing with data sets of petabyte scale in the cloud computing paradigm. Thus, the demand for building a service stack to distribute, manage, and process massive data sets has risen drastically. In this paper, we investigate the Big Data Broadcasting problem for a single source node to broadcast a big chunk of data to a set of nodes with the objective of minimizing the maximum completion time. These nodes may locate in the same datacenter or across geo-distributed datacenters. This problem is one of the fundamental problems in distributed computing and is known to be NP-hard in heterogeneous environments. We model the Big-data broadcasting problem into a LockStep Broadcast Tree (LSBT) problem. The main idea of the LSBT model is to define a basic unit of upload bandwidth, r, such that a node with capacity c broadcasts data to a set of c/r children at the rate r. Note that r is a parameter to be optimized as part of the LSBT problem. We further divide the broadcast data into m chunks. These data chunks can then be broadcast down the LSBT in a pipeline manner. In a homogeneous network environment in which each node has the same upload capacity c, we show that the optimal uplink rate r∗ of LSBT is either c/2 or c/3, whichever gives the smaller maximum completion time. For heterogeneous environments, we present an O(nlog2n) algorithm to select an optimal uplink rate r∗ and to construct an optimal LSBT. Numerical results show that our approach performs well with less maximum completion time and lower computational complexity than other efficient solutions in literature.
| Original language | English |
|---|---|
| Article number | 7202863 |
| Pages (from-to) | 17-28 |
| Number of pages | 12 |
| Journal | IEEE Transactions on Knowledge and Data Engineering |
| Volume | 28 |
| Issue number | 1 |
| DOIs | |
| State | Published - 01 01 2016 |
| Externally published | Yes |
Bibliographical note
Publisher Copyright:© 2015 IEEE.
Keywords
- Big data computing
- big data management
- cloud computing
- data delivery algorithm
- distributed computing