→ Conceptual introduction:
- Distributed computing and storage: features and expectation management
- Opportunistic computing: survival techniques
→ Practical introduction - available tools:
→ Job examples - in increasing order of complexity:
- Start with a rare specimen: hello, world.
- Assembly of the submit file
- Job submission and monitoring
- What to do when things go wrong
- Add file transfer via "sandbox".
- Multiple/parametric job submission and control.
- File access via Object Storage
- Script submission - Object Storage file staging
- Interactive jobs
→ More complex cases (tomorrow...):
- Let jobs be restarted from scratch:
- Make sure an interrupted job doesn't leave any state behind (in files or databases).
- Partition the job in small-enough units of execution.
- Logical checkpointing:
- Periodically save enough state (in a local file or external database) to safely resume the execution from a known rescue point (a toy sketch follows this list).
- This will help in reducing cycle waste.
- Physical checkpointing:
- Save (and restore) the entire job virtual memory state. Either via the (soon to be phased out) HTCondor standard universe, or via Docker+CRIU.
- Knowing that there are a few things that cannot be physically checkpointed:
- Data in transit on the network.
- Data cached by the OS on open files.
- OS Inter-Process Communications (IPC) structures.
- Kernel-level threads.
- File locks.
- Alarms, timers.
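As a toy illustration of the logical-checkpointing idea mentioned above (the loop, the threshold and the file name are made up for the example):

#!/bin/sh
# Toy logical checkpoint: resume a long loop from the last rescue point.
STATE=checkpoint.dat
i=$( [ -f "$STATE" ] && cat "$STATE" || echo 0 )
while [ "$i" -lt 1000000 ]; do
    i=$((i+1))
    # ... the real unit of work on item $i would go here ...
    [ $((i % 1000)) -eq 0 ] && echo "$i" > "$STATE"   # periodic rescue point
done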
The s3 command? Please grab a complimentary static x86_64 executable here. The tool is configurable via the appropriate environment variables:
Here are a few examples - let's try them:
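A minimal setup sketch (the gateway host matches the one used later in these notes; the keys are placeholders, and S3_HOSTNAME - the endpoint variable used by libs3-style builds of the tool - should be treated as an assumption):

export S3_ACCESS_KEY_ID=<your access key>
export S3_SECRET_ACCESS_KEY=<your secret key>
export S3_HOSTNAME=rgw.fisica.unimi.it

$ s3 create tut                                 # create a bucket
$ s3 put tut/inferno filename=dc_inferno.txt    # upload a file (as used below)
$ s3 get tut/inferno filename=inferno_copy.txt  # fetch it back
$ s3 list                                       # list the available buckets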
The s3 command is a bit picky about the ACL file format:
Tools that have been found useful:
Before entering into the details of how jobs are submitted and
controlled, let's focus on some terminology to describe the
computing resources available in the friendly 'Amico' clusters:
- HTCondor is based around a concept called “Fair Share”:
- Assumes users are competing for resources
- Aims for long-term fairness
- Available compute resources are “The Pie”
- Users, with their relative priorities, are each trying to get their “Pie Slice” (as long as job and machine requirements/"preferences" are satisfied)
- First, the Matchmaker takes some jobs from each user and finds resources for them.
- After all users have got their initial “Pie Slice”, if there are still more jobs and resources, the matchmaker continues “spinning the pie” and handing out resources until everything is matched.
- A user who didn't submit many jobs recently will get larger pie slices for some time.
- HTCondor tracks usage and has a formula for determining priority based on both current demand and prior usage - prior usage exponentially "decays" over time.
|proof-pool|proof[-XX].mi.infn.it (any node)|HEP - ATLAS|
List available 'Amico' resources: condor_status -pool superpool-cm.
The '-l' option returns all available machine attributes, all in 'classified ad' (or ClassAd) format:
The -af ('autoformat') option is useful to list specific attributes (-af:l shows the attribute names, -af:th formats a nice table):
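For instance (pool name as above; the attributes shown are standard machine-ad attributes):

# Full ClassAds, one long listing per slot:
$ condor_status -pool superpool-cm -l
# A compact table of selected attributes:
$ condor_status -pool superpool-cm -af:th Machine Cpus Memory OpSys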
The 'Requirements' expression is used to select the resources where jobs can match and execute.
The reference for the ClassAd language is found in the HTCondor manual, here
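To give a flavour, a Requirements expression could look like this (standard machine-ad attributes; the thresholds are purely illustrative):

Requirements = (OpSys == "LINUX") && (Arch == "X86_64") && (Memory > 1024)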
- Have your executable ready: runnable from the command line and with no need for interactive input. (What about scripts? We'll get there in the examples below.)
- Prepare a submit file - a minimal sketch is shown right after the submission step below.
- Note: we explore a few examples of submit files here, but there's a rich reference section on submit files in the HTCondor manual
- Submit the job:
$ condor_submit hello_world_submit
Submitting job(s).
1 job(s) submitted to cluster 6376.
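The minimal submit file used above could be sketched as follows (file names are illustrative; the log name matches the one queried below):

# hello_world_submit - a minimal sketch
Universe   = vanilla
Executable = hello_world.sh
Output     = hello_world.out
Error      = hello_world.err
Log        = tutorial_jobs.log
Queue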
- (optional) If watching the user log sounds too tedious (we could use that time better):
condor_wait tutorial_jobs.log [job ID]
Why is my job not starting? Check the Requirements expression.
Why did my job end up in a 'Held' state?
Just find the reason in the job ClassAd (the -l and -af options of condor_q work just as for machine ClassAds):
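For instance (the job ID echoes the earlier example; HoldReason is a standard job-ad attribute, and -better-analyze is the stock matchmaking-analysis option of condor_q):

# Ask the matchmaker why the job is still idle:
$ condor_q -better-analyze 6376.0
# Read the hold reason straight from the job ClassAd:
$ condor_q -af HoldReason 6376.0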
When the job reports errors on stderr, don't forget to fetch them back by setting in the submit file:
Error = /path/to/stderr_file
For read-only POSIX access, another option would be a web-based file system (e.g. CVMFS).
The limits shown in the diagram are back-of-the-envelope suggestions - many more-specific cases may require custom attention.
- Let's try another example requiring an input file.
- Note: all new files found in the job execution area (but not in any subdirectory) will be transferred back by default if transfer_output_files is not specified, as demonstrated in the next example (Executable, Source, Input 1, Input 2).
- When output files are written throughout the duration of the job (e.g. to save a logical checkpoint for later recovery), we also need to set:
When_To_Transfer_Output = On_Exit_Or_Evict
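Putting the pieces together, a sandbox-transfer submit file could be sketched as follows (the executable name process_text is made up for the example; the input file is the one staged onto object storage later in these notes):

Executable              = process_text
Arguments               = dc_inferno.txt
Transfer_Input_Files    = dc_inferno.txt
Should_Transfer_Files   = YES
When_To_Transfer_Output = ON_EXIT_OR_EVICT
Output = process_text.out
Error  = process_text.err
Log    = tutorial_jobs.log
Queue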
- Implies knowing the needs of our jobs:
- requesting too little: causes problems for your jobs and for other users' jobs; jobs might also end up being held by Condor
- requesting too much: jobs will match to fewer “slots”
- Condor measures and logs resource usage:
$ condor_history -l 6394 -af:l DiskUsage ResidentSetSize \
    ExecutableSize RemoteUserCpu BytesRecvd
ResidentSetSize = 0
ExecutableSize = 1750
RemoteUserCpu = 0.0
BytesRecvd = 2196498.0
DiskUsage = 2250
- Certain requests have to be specified in the Job Submit file:
request_cpus = 1
request_memory = 20MB
request_disk = 1MB
- Requests can also be made in the Requirements and Rank submit file expressions.
- They can be made against any attribute in the machine ad - but CPU, memory and disk requirements will fail in certain cases (e.g. with partitionable slots) when specified here:
# Select machines by name
Requirements = Regexp("node", Machine)
# Select machines by attribute(s)
Requirements = ClusterName == "magi-pool" && HasJava
# And prefer faster machines
Rank = Mips
- Remember: smaller execution units get better goodput on opportunistic resources, so do prefer running many independent jobs:
- to analyze multiple data files
- to sweep over various parameter or input combinations
- ClusterId and ProcId can be accessed inside the submit file using the $(ClusterId) and $(ProcId) macros.
- Sweep over input files: (we can at last add another input file).
- Submit many small jobs: (with a trick to change the input file list - one possible such trick is sketched below).
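One way the trick could look, combining the macros above with the 'queue ... in' list syntax of recent HTCondor versions (the executable name is made up; the input files are the ones staged below):

# One job per input file; output names are built from the macros
Executable           = process_text
Arguments            = $(BOOK)
Transfer_Input_Files = $(BOOK)
Output               = out_$(ClusterId)_$(ProcId).txt
Error                = err_$(ClusterId)_$(ProcId).txt
Log                  = tutorial_jobs.log
Queue BOOK in (dc_inferno.txt dc_purgatorio.txt dc_paradiso.txt)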
- Let's first stash our files onto the object storage (even if they
aren't that large):
$ s3 put tut/inferno filename=dc_inferno.txt
16270 bytes remaining (92% complete) ...
$ s3 put tut/purgatorio filename=dc_purgatorio.txt
$ s3 put tut/paradiso filename=dc_paradiso.txt
- Then our option of choice would be to adapt the code to use object storage natively:
- So that sending the resulting static executable (or otherwise self-contained executable) onto a distributed system becomes a piece of cake®:
- ROOT also provides native S3 read-only access (honoring, please note, the S3_ACCESS_ID and S3_ACCESS_KEY environment variables):

export S3_ACCESS_ID=$S3_ACCESS_KEY_ID
export S3_ACCESS_KEY=$S3_SECRET_ACCESS_KEY

TFile* f = TFile::Open("as3://rgw.fisica.unimi.it/tut/calib_file.root");
f->ls(); // etc. etc.
- Another option would be the CERN Davix library. It is also natively accessible from recent versions of ROOT, so it may be interesting for ROOT users who can resist the confusion of tongues:
davix-get --s3accesskey $S3_ACCESS_KEY_ID \
          --s3secretkey $S3_SECRET_ACCESS_KEY \
          s3://tut.rgw.fisica.unimi.it/inferno
- Suppose, just suppose, you don't like, don't prefer, or just cannot modify your code as in the previous example...
- We can then put together a nice shell script to stage the input files in, run the executable and stage the output files out.
- And here's the corresponding submit file - both the wrapper and a driving submit file are sketched below:
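A minimal sketch of such a wrapper (all names are illustrative; the s3 get/put syntax mirrors the commands shown above, and the S3 credential variables are assumed to reach the job environment):

#!/bin/sh
# stage_in_run_out.sh - stage in, run the payload, stage out
set -e
s3 get tut/inferno filename=inferno.txt    # stage the input in
./process_text inferno.txt > result.txt    # run the actual payload
s3 put tut/result filename=result.txt      # stage the output out

The driving submit file then only needs to ship the wrapper and the payload:

Executable           = stage_in_run_out.sh
Transfer_Input_Files = process_text
Output = stage.out
Error  = stage.err
Log    = tutorial_jobs.log
Queue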
- Most of the issues with executing scripts boil down to what we can reasonably assume to find installed on the remote execution machine. Pure POSIX facilities and /dev/null should be safe enough, but we cannot blindly assume to find everything we need.
- Another popular example: in order to submit
a Python script
we'd better use another nice shell script
to make sure we aren't missing anything or getting stuck in
the Python version peat bog.
Submission via wrapper script:
Note that the 'indirect' script uses Condor's ability to edit the attributes in the job queue to exclude machines that were found not to have the needed Python version - a technique that may be used to work around other (inevitable) black holes. One possible incantation is sketched below.
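By hand, that queue edit could look like this (job ID and node name are made up; condor_qedit and the raw -af:r autoformat option are standard tools):

# Exclude a machine found to lack the needed Python version
JOB=6376.0
BAD=badnode.mi.infn.it
REQS=$(condor_q -af:r Requirements $JOB)   # current expression, unevaluated
condor_qedit $JOB Requirements "($REQS) && (Machine != \"$BAD\")"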
- Start simple, if possible: 'Amico' includes execution nodes that mount CVMFS for ATLAS (add HasCVMS to the job requirements).
- Try packing all dependencies with the job: CDE may be a recommended option that allows carrying all needed dependencies with the job. Note that cde_exec itself is not a static executable and may hit incompatible glibc versions, and that there are a few catches both in a possible script to unwind CDE tarballs (1, 2) and in the corresponding two -
There are simpler ways to make an ELF executable self-contained - they are harder to make truly general, but may be applicable and useful in certain cases, if one knows what (s)he's doing. One such example will be shown tomorrow.
- Hopefully this makes the option to assemble and run a Docker container (on nodes advertising HasDocker) look much simpler than it seemed.
More details on this tomorrow as well - so please stay tuned!
condor_ssh_to_job is a tool to get a shell with the current working directory set to where a job is running (configuration allowing).
condor_submit -interactive results in a shell prompt issued on the execute machine where the job runs.
- A submit file can still be used: many options are ignored (Universe, Executable, Arguments...), but (notably) Requirements, Rank and sandbox Transfers are not.
- Do we now feel fluent enough to try them both out?
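For instance, a submit file for interactive use might carry only the matching-related options (a sketch; the attributes echo those used earlier in these notes):

# interactive.sub - only matching options (Requirements, Rank, transfers) matter
Requirements = HasDocker
Rank         = Mips
Queue

Passing it as 'condor_submit -interactive interactive.sub' constrains where the shell lands; with no submit file at all, the defaults apply, as in the run below.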
$ condor_submit -interactive
Submitting job(s).
1 job(s) submitted to cluster 115676.
Waiting for job to start...
For any problem with the 'Amico' infrastructure, please get in touch: email@example.com