Hybrid Preemptive Scheduling of MPI applications

    • Aurélien Bouteiller, Hinde Lilia Bouziane, Thomas Hérault,
    • Pierre Lemarinier, Franck Cappello
    • MPICH-V team
    • INRIA Grand-Large
    • LRI, University Paris South

Problem definition

  • Context: clusters and Grids (made of clusters) shared by many users (fewer resources available than required at a given time)

  • In this study: finite sets of MPI applications

  • Time sharing of parallel applications is attractive because it increases fairness between users, compared to batch scheduling

  • It is very likely that several applications will reside in virtual memory at the same time, exceeding the total physical memory

  • Out-of-core scheduling of parallel applications on clusters! (scheduling parallel applications on a cluster under memory constraints)

  • Most proposed approaches try to avoid this situation by limiting job admission based on memory requirements, which delays some jobs unpredictably if job execution times are not known

  • Issue: is there a novel (out-of-core) approach that avoids delaying some jobs?

  • Constraint: no OS modification (no kernel patch)
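The contrast between the out-of-core situation and memory-based admission control can be illustrated with a minimal Python sketch. The `Job` class, the footprint figures, and the greedy admission rule are all hypothetical and chosen only for illustration; the slides do not specify how existing admission-control schedulers decide.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    mem_mb: int  # per-node memory footprint (hypothetical figure)


def is_out_of_core(jobs, physical_mem_mb):
    """True when the combined footprint exceeds a node's physical memory."""
    return sum(job.mem_mb for job in jobs) > physical_mem_mb


def admit(jobs, physical_mem_mb):
    """Greedy admission control (an assumed policy): admit jobs in
    submission order while they still fit in memory, delay the others."""
    admitted, delayed, used = [], [], 0
    for job in jobs:
        if used + job.mem_mb <= physical_mem_mb:
            admitted.append(job)
            used += job.mem_mb
        else:
            delayed.append(job)
    return admitted, delayed


jobs = [Job("A", 600), Job("B", 500), Job("C", 300)]
overcommitted = is_out_of_core(jobs, physical_mem_mb=1024)
admitted, delayed = admit(jobs, physical_mem_mb=1024)
```

With these made-up numbers, running all three jobs at once is out-of-core, and admission control keeps memory within bounds only by delaying job B, which is the effect the approach in these slides tries to avoid.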


Related work 1


  • Introduction (related work)

  • A Hybrid approach dedicated to out-of-core

  • Evaluation

  • Concluding remarks

Our approach 1/2: Hybrid

Our approach 2/2: Checkpointing

Implementation using MPICH-V Framework

Coordinated Checkpoint: 2 ways

MPICH-V/CL protocol

Implementation details


  • Introduction (related work)

  • A Hybrid approach dedicated to out-of-core

  • Evaluation

  • Concluding remarks


  • LRI cluster:

    • Athlon 1800+
    • 1 GB memory
    • IDE ATA100 disk
    • Ethernet 100 Mb/s
    • Linux 2.4.2
  • Benchmark (MPI):

    • NAS BT (computation bound)
    • NAS CG (communication bound)
  • Time measurement:

    • Homogeneous Applications
    • Simultaneous launch (scripts)
    • Time is measured between the first launch and the last termination
    • Fairness is measured by response time standard deviation
  • Gang scheduling time slice: 200 or 600 s

    • Gang scheduling is also implemented by checkpointing (not by OS signals)
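The two measurements described above (total time from first launch to last termination, and fairness as the standard deviation of response times) can be sketched in Python as follows. The function names and the sample timings are hypothetical, and the use of the population standard deviation is an assumption: the slides do not say whether the population or sample formula was used.

```python
import statistics


def total_time(launch_times, end_times):
    """Time between the first launch and the last termination."""
    return max(end_times) - min(launch_times)


def fairness(launch_times, end_times):
    """Standard deviation of per-application response times.
    Lower values mean applications finished after similar delays.
    Population standard deviation is an assumption of this sketch."""
    responses = [end - start for start, end in zip(launch_times, end_times)]
    return statistics.pstdev(responses)


# Hypothetical timings (seconds) for three applications launched together.
launches = [0.0, 0.0, 0.0]
ends = [100.0, 200.0, 300.0]
elapsed = total_time(launches, ends)
spread = fairness(launches, ends)
```

A perfectly fair schedule under this metric gives every application the same response time, so the standard deviation is zero.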

Context switch overlap policy

Co VS. Gang (Ckpt based)

Ckpt Gang VS. Ckpt Hybrid

Overhead comparison

Co-scheduling Fairness (Linux)


  • Introduction (related work)

  • A Hybrid approach dedicated to out-of-core

  • Evaluation

  • Concluding remarks

Concluding remarks

  • Checkpoint-based Gang scheduling outperforms Co-scheduling, and certainly classical (OS-signal-based) Gang scheduling, in out-of-core situations (thanks to better memory management)

  • Compared to known approaches based on job admission control, the benefit of checkpointing is that it avoids delaying some jobs

  • Hybrid scheduling, which combines the two approaches (Gang scheduling and Co-scheduling) with checkpointing, outperforms Gang scheduling on BT (presumably thanks to overlapping communications and computations)

  • More generally, Hybrid scheduling can take advantage of advanced Co-scheduling approaches within a gang subset

  • Work in progress:

    • Test with other applications and benchmarks
    • Compare with traditional Gang scheduling based on OS signals
    • Experiments with high-speed networks
    • Experiments on Hybrid scheduling with Co-scheduling optimizations
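The idea of co-scheduling within a gang subset can be sketched in Python under stated assumptions. The first-fit grouping by memory footprint, the function names, and the sample figures are hypothetical; the slides do not say how subsets are formed. Each switch between subsets stands for a checkpoint of the outgoing subset and a restart of the incoming one, and the applications inside a subset are left to the OS scheduler (co-scheduling).

```python
def gang_subsets(footprints, physical_mem_mb):
    """Group applications into subsets that fit in physical memory.
    First-fit on {app: mem_mb} is an assumed policy for this sketch."""
    subsets = []
    for app, mem in footprints.items():
        for subset in subsets:
            if subset["used"] + mem <= physical_mem_mb:
                subset["apps"].append(app)
                subset["used"] += mem
                break
        else:
            # No existing subset has room: open a new one.
            subsets.append({"apps": [app], "used": mem})
    return [subset["apps"] for subset in subsets]


def timeline(subsets, time_slice_s, rounds):
    """Round-robin over subsets as (start, end, apps) tuples.
    Each boundary models a checkpoint-based context switch."""
    schedule, t = [], 0
    for _ in range(rounds):
        for apps in subsets:
            schedule.append((t, t + time_slice_s, tuple(apps)))
            t += time_slice_s
    return schedule


# Hypothetical footprints (MB) on a node with 1024 MB of memory.
footprints = {"bt1": 400, "bt2": 400, "bt3": 400, "bt4": 400}
subsets = gang_subsets(footprints, physical_mem_mb=1024)
plan = timeline(subsets, time_slice_s=200, rounds=1)
```

With one application per subset this degenerates to plain Gang scheduling, and with a single subset holding every application it degenerates to plain Co-scheduling, which is why the hybrid scheme sits between the two.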

Meet us at the INRIA booth 2345!


Is the result for the in-core situation kernel dependent (Linux)?
