'Amico' - special cases

Original picture bitmap source:www.publicdomainpictures.net. License: CC0 Public domain

⇔ Program

So far we have presented ideas and recipes that may work with any type of workload. Today we'll focus on a few special cases that may not be of interest to everyone, so we present them in order of decreasing generality:

  1. Brief refresher - the case of the seasoned Fortran executable
  2. Docker, docker universe
  3. Common dependencies in 'Amico' and how to require them
  4. Challenges and opportunities for parallel execution in an opportunistic environment (HP-HTC).
  5. MPI job cases

⇔ How was last night? Sweet dreams?

Testing S3FS  

Too rapidly going out of fashion: the Standard Universe.

  • The issues with code dependencies and distributed I/O that we have been dealing with so far are, of course, nothing new. When the world was somewhat simpler (the era of commercial UNIXes), the 'standard' way of addressing them in Condor (slated to disappear in the next major version unless more people start crying) was based on:
    • Static linking.
    • Automatic redirection of I/O (normally to the submit node) via an interposition library.
    • Automatic (physical) checkpoint and resume, by dumping the entire virtual memory space of the process (remember the limits of checkpointing we covered yesterday).

Your first (?) Docker® container

Listing Docker® Hub image versions

Creating custom Docker® images

Structure of Docker® images

The local Docker® registry - dr.mi.infn.it

Docker® containers

Available 'Amico' dependencies

Dependency | Requirement | Available (partitionable) machines (*)
CVMFS mounted (from ATLAS experiment servers) | HasCVMFS | 44
Local installation of Java® (check JavaVendor, JavaVersion, JavaSpecificationVersion) | HasJava | 87
Local installation of Docker® with the ability to start jobs as the 'condor' user | HasDocker | 11
Ability to handle Parallel Universe jobs (regardless of any local installation of MPI!) | HasMPI | 92

(*) Computed by condor_status -pool superpool-cm -constraint HasXXXX as of February 14th, 2019.
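Requiring one of these dependencies comes down to a Requirements expression in the submit description. The following is a minimal sketch, assuming the attributes listed above are advertised as booleans by the machines; the executable and file names are hypothetical.

```
# Sketch only: my_job.sh and the output file names are placeholders.
universe     = vanilla
executable   = my_job.sh

# Match only machines that advertise CVMFS; swap in HasJava,
# HasDocker or HasMPI for the other dependencies.
requirements = (HasCVMFS =?= True)

output       = job.out
error        = job.err
log          = job.log
queue
```

The `=?=` operator evaluates to False, rather than Undefined, on machines that do not advertise the attribute at all, so those machines are simply never matched.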

Submitting a 'Docker Universe' job
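
A Docker Universe submit description might look like the minimal sketch below. It assumes that the public `debian` image can be pulled on the execute node; the image, the command and the file names are only illustrative and are not taken from the 'Amico' setup.

```
# Sketch only: image and file names are assumptions.
universe                = docker
docker_image            = debian

# The executable is looked up inside the container.
executable              = /bin/cat
arguments               = /etc/hosts

should_transfer_files   = YES
when_to_transfer_output = ON_EXIT
output                  = out.$(Process)
error                   = err.$(Process)
log                     = log.$(Process)
request_memory          = 100M
queue 1
```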


⇔ Parallel "Universe" - main ingredients

  • An MPI (or, more generally, parallel) job can be executed:
    • On N local CPUs/cores or slots
    • On a collection of co-scheduled CPUs/cores or slots (HPC)
  • 'HTC'ondor is not an HPC tool:
    • But it does support both styles of execution, the first one being preferred and simpler:
      Universe = Vanilla
      Executable = /path/to/mpirun
      Request_Cpus = 2
    • The second style ("Parallel Universe" or "HTHPC") requires one "Dedicated Scheduler", which makes the picture significantly more complex.
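
For comparison, the second style could be sketched as follows. This is a rough outline, not the 'Amico' recipe: `my_mpi_wrapper.sh` and `my_mpi_program` are hypothetical names, and a wrapper script of this kind is assumed to take care of starting the MPI processes on the claimed nodes.

```
# Sketch only: wrapper, program and file names are placeholders.
universe                = parallel
executable              = my_mpi_wrapper.sh
arguments               = my_mpi_program
transfer_input_files    = my_mpi_program

# Number of slots to be co-scheduled by the "Dedicated" scheduler.
machine_count           = 4
request_cpus            = 1

# Site attribute from the dependency table (assumed boolean).
requirements            = (HasMPI =?= True)

should_transfer_files   = YES
when_to_transfer_output = ON_EXIT
output                  = out.$(Node)
error                   = err.$(Node)
log                     = parallel.log
queue
```

The job starts only once all `machine_count` slots have been claimed, which is why this style depends on the "Dedicated" scheduler.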

There must be exactly one "Dedicated" scheduler for a given resource

  • The "Dedicated" scheduler collects records of available machines and tries to claim enough immediately available slots to service pending "Parallel" universe requests.
  • A second "Dedicated" scheduler would enter a lose-lose race with the first one in amassing the claims.
    ⇒ A given job starter can report to one, and only one "Dedicated" scheduler.
  • Within "Amico", we configure one "Dedicated" scheduler per independent cluster, so that "Parallel" Universe jobs have to be submitted locally within a given cluster and cannot migrate.

Try running an MPI job

The real Parallel Universe

'Vanilla' MPI jobs can (of course) run on Docker

Bingo. Eat road-runner!

19 February 2019 05:49:21 PM

  C version.
  Estimate a solution of the wave equation using MPI.

  Using 4 processes.
  Using a total of 1001 points.
  Using 20000 time steps of size 0.0004. 
  Computing final solution at time 8

    I      X     F(X)   Exact

    0   0.000   0.000   0.000
    1   0.001   0.006   0.006
    2   0.002   0.013   0.013
  (... etc. etc. etc. ...)

⇔ Thank you for staying with us!
