Abstract
Hadoop-enabled cloud platforms are gradually becoming the preferred computational environment for executing scientific big data workloads periodically. However, the default data placement approach of such platforms is observed to be inefficient and often incurs significant data transfer overhead, which degrades the overall job completion time. In this article, a Resource and Network-aware Data Placement Algorithm (RENDA) is proposed to reduce non-local executions and thereby reduce the overall job completion time for periodic workloads in the cloud environment. The entire job execution is modeled as a two-stage execution consisting of data distribution and data processing. RENDA reduces the time of both stages by estimating the heterogeneous performance of the nodes in real time and then carefully allocating data to the participating nodes in several installments. The experimental results show that RENDA consistently outperforms recent state-of-the-art alternatives, with as much as a 28 percent reduction in data transfer overhead, leading to a 16 percent reduction in average job completion time and a 27 percent average speedup in job execution.
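The idea of allocating data in installments according to estimated node performance can be illustrated with a minimal sketch. This is not the RENDA algorithm itself, whose details are not given in the abstract. It only assumes that each node has a positive estimated processing rate and that each installment is split in proportion to those rates. The names `plan_installments`, `proportional_split`, and `node_rates` are hypothetical.

```python
from typing import Dict, List


def proportional_split(blocks: int, node_rates: Dict[str, float]) -> Dict[str, int]:
    """Split `blocks` among nodes in proportion to their estimated rates."""
    total_rate = sum(node_rates.values())
    # Floor each proportional share so the shares never exceed `blocks`.
    shares = {n: int(blocks * r / total_rate) for n, r in node_rates.items()}
    leftover = blocks - sum(shares.values())
    # Hand the remaining blocks to the fastest nodes first.
    ranked = sorted(node_rates, key=node_rates.get, reverse=True)
    for i in range(leftover):
        shares[ranked[i % len(ranked)]] += 1
    return shares


def plan_installments(
    total_blocks: int, node_rates: Dict[str, float], installments: int
) -> List[Dict[str, int]]:
    """Return one allocation per installment (an illustrative assumption only)."""
    if installments <= 0 or total_blocks < 0:
        raise ValueError("installments must be positive and total_blocks non-negative")
    if not node_rates or any(r <= 0 for r in node_rates.values()):
        raise ValueError("node_rates must be non-empty with positive rates")
    plans = []
    remaining = total_blocks
    for i in range(installments):
        # Spread what is left evenly over the remaining installments.
        batch = remaining // (installments - i)
        plans.append(proportional_split(batch, node_rates))
        remaining -= batch
    return plans


if __name__ == "__main__":
    # Hypothetical rates: node "a" is estimated to be three times faster than "b".
    for step, plan in enumerate(plan_installments(100, {"a": 3.0, "b": 1.0}, 2), 1):
        print(step, plan)
```

In this sketch the rates are fixed for all installments. A scheme that estimates performance in real time, as the abstract describes, would instead refresh `node_rates` before each installment.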
Original language | English |
---|---|
Article number | 9431731 |
Pages (from-to) | 2906-2920 |
Number of pages | 15 |
Journal | IEEE Transactions on Parallel and Distributed Systems |
Volume | 32 |
Issue number | 12 |
State | Published - 1 Dec 2021 |
Bibliographical note
Publisher Copyright: © 1990-2012 IEEE.
Keywords
- MapReduce
- cloud computing
- data placement
- periodic workloads