Abstract
In-memory techniques keep data in faster but more expensive storage media to improve the performance of big data processing. However, existing mechanisms do not consider how to expedite data processing applications that access their input datasets only once. Another problem is how to reclaim memory without affecting other running applications. In this paper, we provide scheduling-aware data prefetching and eviction mechanisms based on Spark, Alluxio, and Hadoop. The mechanisms prefetch data and release memory resources based on the scheduling information. A mathematical method is proposed for maximizing the reduction of data access time. To make the mechanisms applicable in large-scale environments, we propose a heuristic algorithm that reduces the computation time. Furthermore, an enhanced version of the heuristic algorithm is proposed to increase the amount of prefetched data. Finally, we perform real-testbed and simulation experiments to show the effectiveness of the proposed mechanisms.
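The abstract does not spell out the optimization model or the heuristic, so the following is only a minimal sketch of what a scheduling-aware prefetch and eviction step might look like. It is not the paper's algorithm. It assumes a simple greedy rule (most access time saved per megabyte first, ties broken by schedule order) and a read-once eviction rule. The names `Block`, `plan_prefetch`, and `evict_finished`, and all fields and numbers, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """Hypothetical input block of a task waiting in the scheduler queue."""
    name: str
    size_mb: int          # memory needed to hold the block
    saved_seconds: float  # assumed access-time reduction if served from memory
    start_order: int      # position of the owning task in the schedule


def plan_prefetch(blocks, free_memory_mb):
    """Greedily choose blocks to prefetch within the free memory budget.

    Assumed rule: rank by saved seconds per MB (highest first), breaking
    ties in favor of tasks that are scheduled to start earlier.
    """
    ranked = sorted(
        blocks,
        key=lambda b: (-b.saved_seconds / b.size_mb, b.start_order),
    )
    chosen, remaining = [], free_memory_mb
    for block in ranked:
        if block.size_mb <= remaining:
            chosen.append(block)
            remaining -= block.size_mb
    return chosen


def evict_finished(cached, finished_names):
    """Release blocks whose only reader has finished (read-once inputs)."""
    return [b for b in cached if b.name not in finished_names]


if __name__ == "__main__":
    queue = [
        Block("a", size_mb=100, saved_seconds=10.0, start_order=0),
        Block("b", size_mb=50, saved_seconds=8.0, start_order=1),
        Block("c", size_mb=80, saved_seconds=4.0, start_order=2),
    ]
    cached = plan_prefetch(queue, free_memory_mb=150)
    print([b.name for b in cached])
    print([b.name for b in evict_finished(cached, {"b"})])
```

A greedy ratio rule like this runs in O(n log n), which illustrates why a heuristic can scale to large clusters where an exact optimization would be too slow. It can leave memory unused when a large block does not fit, which is the kind of gap an enhanced variant aimed at prefetching more data would try to close.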
| Original language | English |
|---|---|
| Article number | 8611384 |
| Pages (from-to) | 1738-1752 |
| Number of pages | 15 |
| Journal | IEEE Transactions on Parallel and Distributed Systems |
| Volume | 30 |
| Issue number | 8 |
| DOIs | |
| State | Published - 1 Aug 2019 |
| Externally published | Yes |
Bibliographical note
Publisher Copyright: © 2019 IEEE.
Keywords
- Big data processing
- data eviction
- data prefetching
- in-memory systems
- scheduling information