Data Prefetching and Eviction Mechanisms of In-Memory Storage Systems Based on Scheduling for Big Data Processing

Chien Hung Chen*, Ting Yuan Hsia, Yennun Huang, Sy Yen Kuo

*Corresponding author for this work

Research output: Contribution to journal, Journal Article, peer-reviewed

9 Scopus citations

Abstract

In-memory techniques keep data in faster, more expensive storage media to improve the performance of big data processing. However, existing mechanisms do not consider how to expedite data processing applications that access their input datasets only once. Another problem is how to reclaim memory without affecting other running applications. In this paper, we provide scheduling-aware data prefetching and eviction mechanisms based on Spark, Alluxio, and Hadoop. The mechanisms prefetch data and release memory resources based on scheduling information. A mathematical method is proposed for maximizing the reduction in data access time. To make the mechanisms applicable in large-scale environments, we propose a heuristic algorithm that reduces the computation time. Furthermore, an enhanced version of the heuristic algorithm is proposed to increase the amount of prefetched data. Finally, we perform real-testbed and simulation experiments to show the effectiveness of the proposed mechanisms.

Original language: English
Article number: 8611384
Pages (from-to): 1738-1752
Number of pages: 15
Journal: IEEE Transactions on Parallel and Distributed Systems
Volume: 30
Issue number: 8
State: Published - 1 Aug 2019
Externally published: Yes

Bibliographical note

Publisher Copyright:
© 2019 IEEE.

Keywords

  • Big data processing
  • data eviction
  • data prefetching
  • in-memory systems
  • scheduling information
