Abstract
Panda is a high-performance library for accessing large multidimensional array data on the secondary storage of parallel platforms and networks of workstations. When using Panda as the I/O component of a scientific application, H3expresso, on the IBM SP2 at the Cornell Theory Center, we found that some nodes are more powerful with respect to I/O than others, so load balancing techniques are needed to maintain high performance. We expect heterogeneity to be a significant issue as well for DBMSs and parallel I/O libraries designed for scientific applications running on networks of workstations. The methods of allocating data to servers in these environments will need to take heterogeneity into account while still allowing users to exert control over data layout. We propose such an approach to load balancing: we respect the user's choice of high-level disk layout but introduce automatic subchunking. Subchunks let us divide the very large chunks typically specified by the user's disk layout into more manageably sized units that can be allocated to I/O nodes in a manner that distributes the load fairly. We also present two techniques for allocating subchunks to nodes, static and dynamic, and evaluate their performance on the SP2.
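To make the idea of subchunking concrete, the following is a minimal illustrative sketch in Python, not Panda's actual API or the paper's algorithms. All function names, the relative node speeds, and the greedy placement rule are assumptions introduced here for illustration. It splits a chunk into subchunks and then compares a static scheme, which plans the assignment up front from assumed-known node speeds, with a dynamic scheme, in which each node takes the next subchunk as soon as it becomes idle.

```python
import heapq


def split_into_subchunks(chunk_size, subchunk_size):
    """Split one chunk into subchunk sizes; the last one may be smaller."""
    full, rest = divmod(chunk_size, subchunk_size)
    return [subchunk_size] * full + ([rest] if rest else [])


def static_allocation(subchunks, speeds):
    """Plan all assignments in advance from assumed-known node speeds.

    Greedy rule (an assumption of this sketch): place each subchunk on the
    node whose projected finish time would be smallest.
    """
    finish = [0.0] * len(speeds)
    assignment = [[] for _ in speeds]
    for i, size in enumerate(subchunks):
        node = min(range(len(speeds)),
                   key=lambda n: finish[n] + size / speeds[n])
        assignment[node].append(i)
        finish[node] += size / speeds[node]
    return assignment, max(finish)


def dynamic_allocation(subchunks, speeds):
    """Simulate idle nodes pulling the next subchunk from a shared queue.

    The speeds are used only to simulate elapsed time; the allocation
    itself needs no advance knowledge of them.
    """
    idle_at = [(0.0, n) for n in range(len(speeds))]  # (time node is free, node)
    heapq.heapify(idle_at)
    assignment = [[] for _ in speeds]
    makespan = 0.0
    for i, size in enumerate(subchunks):
        t, node = heapq.heappop(idle_at)  # earliest node to become idle
        t += size / speeds[node]
        assignment[node].append(i)
        makespan = max(makespan, t)
        heapq.heappush(idle_at, (t, node))
    return assignment, makespan


if __name__ == "__main__":
    subchunks = split_into_subchunks(60, 10)  # six equal subchunks
    speeds = [2.0, 1.0]                       # hypothetical relative I/O speeds
    print(static_allocation(subchunks, speeds))
    print(dynamic_allocation(subchunks, speeds))
```

In this toy setting both schemes give the faster node twice as many subchunks as the slower one. The practical difference is that the static plan depends on speed estimates being accurate, whereas the dynamic scheme adapts to whatever speeds the nodes actually exhibit.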
Original language | English |
---|---|
Pages | 79-90 |
Number of pages | 12 |
State | Published - 1997 |
Externally published | Yes |
Event | Proceedings of the 1997 9th International Conference on Scientific and Statistical Database Management - Olympia, WA, USA; Duration: 11 Aug 1997 → 13 Aug 1997 |
Conference
Conference | Proceedings of the 1997 9th International Conference on Scientific and Statistical Database Management |
---|---|
City | Olympia, WA, USA |
Period | 11/08/97 → 13/08/97 |