Document title

An Efficient I/O Aggregator Assignment Scheme for Multi-Core Cluster Systems

Author(s)

CHA Kwangho (1)

Author(s) Affiliation(s)

(1) Supercomputing Center, Korea Institute of Science and Technology Information (KISTI), Daejeon, Republic of Korea

Abstract

As the number of nodes in high-performance computing (HPC) systems increases, parallel I/O becomes an important issue. Collective I/O is a specialized form of parallel I/O that supports parallel access to a single shared file. Collective I/O in most message passing interface (MPI) libraries follows a two-phase I/O scheme in which designated processes, called I/O aggregators, play a central role by carrying out the communication and I/O operations. This approach, however, assumes a single-core architecture. Because modern HPC systems use multi-core computational nodes, the roles of I/O aggregators need to be re-evaluated. Although many previous studies have focused on improving the performance of collective I/O, it is difficult to find one that addresses an assignment scheme for I/O aggregators on multi-core architectures. This research found that, when each node has multiple I/O aggregators, the communication costs in collective I/O differ according to the placement of those aggregators. Performance was measured under two processor affinity rules, and the results showed that the distributed affinity rule, which places the I/O aggregators in different sockets, is appropriate for collective I/O. Because some applications may be unable to use the distributed affinity rule, the collective I/O scheme was modified to guarantee appropriate placement of the I/O aggregators under the accumulated affinity rule. The proposed scheme was evaluated on two Linux cluster systems, and the performance improvements were more clearly evident when the computational nodes of a cluster had a complicated architecture. Under the accumulated affinity rule, the proposed scheme improved on the original MPI-IO by up to approximately 26.25% for the read operation and up to approximately 31.27% for the write operation.
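The placement idea in the abstract can be illustrated with a small sketch. It is not the paper's algorithm, which the abstract does not spell out. It assumes that the accumulated affinity rule packs consecutive node-local ranks onto one socket before moving to the next, and it picks aggregator ranks round-robin across sockets so that they do not share a socket when that can be avoided. The names `socket_of`, `pick_aggregators`, and all parameters are hypothetical.

```python
def socket_of(local_rank, cores_per_socket):
    # Assumption: under the accumulated rule, local ranks fill socket 0 first,
    # then socket 1, and so on.
    return local_rank // cores_per_socket


def pick_aggregators(ranks_per_node, cores_per_socket, aggregators_per_node):
    """Choose node-local ranks to act as I/O aggregators, spread across sockets.

    Hypothetical illustration only: ranks are grouped by socket and then
    taken round-robin, one socket at a time.
    """
    by_socket = {}
    for rank in range(ranks_per_node):
        by_socket.setdefault(socket_of(rank, cores_per_socket), []).append(rank)

    socket_ids = sorted(by_socket)
    chosen = []
    i = 0
    # Stop when enough aggregators are chosen or no ranks remain.
    while len(chosen) < aggregators_per_node and any(by_socket.values()):
        candidates = by_socket[socket_ids[i % len(socket_ids)]]
        if candidates:
            chosen.append(candidates.pop(0))
        i += 1
    return sorted(chosen)


if __name__ == "__main__":
    # 8 ranks on a node with two 4-core sockets, 2 aggregators per node.
    print(pick_aggregators(8, 4, 2))
```

With two 4-core sockets and two aggregators per node, the sketch selects local ranks 0 and 4, one per socket, whereas simply taking the first two ranks would put both aggregators on socket 0.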

Journal Title

IEICE Transactions on Information and Systems, ISSN 0916-8532

Source

2013, vol. 96, no. 2, pp. 259-269 [11 page(s) (article)] (34 ref.)

Language

English

Publisher

Oxford University Press, Oxford, United Kingdom (1992) (Journal)

English Keywords

Supercomputing; Integrated circuit; Socket; Processor; Positioning; Performance evaluation; Message passing

French Keywords

Calcul intensif; Circuit intégré; Connecteur logiciel; Processeur; Positionnement; Evaluation performance; Envoi message

Spanish Keywords

Circuito integrado; Conector software; Procesador; Posicionamiento; Evaluación prestación

Author Keywords

collective I/O; parallel I/O; processor affinity

Location

INIST-CNRS, INIST shelf mark: 7315 E4, 35400018253428.0090

RefDoc record no. (ud4): 26899730
