

2.6 Submitting a Job to Condor

condor_submit is the program that submits jobs to Condor. It takes as its sole argument the name of a submit-description file, which contains commands and keywords that direct the queuing of jobs. In the submit-description file, you tell Condor everything it needs to know about the job: the name of the executable to run, the initial working directory, command-line arguments, and so on. condor_submit then creates a new job ClassAd from this information and ships it, along with the executable to run, to the condor_schedd daemon running on your machine. At that point your job has been submitted to Condor.
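As a rough illustration of that workflow, the following Python sketch writes a minimal submit-description file and builds the corresponding condor_submit command line. The file name ``foo.submit'' and the helper names are assumptions made for this sketch only. The command is printed rather than executed, so the sketch runs on a machine without Condor installed.

```python
import tempfile
from pathlib import Path

# Contents of a minimal submit-description file: one copy of program "foo".
SUBMIT_TEXT = """\
# Minimal submit-description file
Executable = foo
Queue
"""


def write_submit_file(directory):
    """Write the submit-description text to foo.submit (hypothetical name)."""
    path = Path(directory) / "foo.submit"
    path.write_text(SUBMIT_TEXT)
    return path


def submit_command(path):
    """condor_submit takes the submit-description file name as its sole argument."""
    return ["condor_submit", str(path)]


with tempfile.TemporaryDirectory() as tmp:
    submit_path = write_submit_file(tmp)
    command = submit_command(submit_path)
    # Print the command instead of running it; on a machine with Condor
    # installed, this is the command you would type at the shell.
    print(" ".join(command))
```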

Now please read the condor_submit manual page in the Command Reference chapter before you continue; it contains a complete description of how to use condor_submit.

2.6.1 Sample submit-description files

Now that you have read about condor_submit and have an idea of how it works, we'll follow up with a few additional examples of submit-description files.

Example 1

Example 1 below is about the simplest submit-description file possible. It queues up one copy of the program ``foo'' for execution by Condor. Condor will attempt to run the job on a machine which has the same architecture and operating system as the machine from which it was submitted. Since no input, output, or error commands were given, the files stdin, stdout, and stderr will all refer to /dev/null. (The program may produce output by explicitly opening a file and writing to it.)

  # Example 1
  # Simple condor job description file
  Executable     = foo
  Queue

Example 2

Example 2 below queues 2 copies of the program ``mathematica''. The first copy will run in directory ``run_1'', and the second will run in directory ``run_2''. In both cases the same names are given for the stdin, stdout, and stderr files (loop.out and loop.error for the latter two), but the actual files will be different because they are in different directories. This is often a convenient way to organize your data if you have a large group of Condor jobs to run. The example file submits ``mathematica'' as a Vanilla Universe job, perhaps because the source and/or object code for ``mathematica'' was not available and therefore the re-link step necessary for Standard Universe jobs could not be performed.

  # Example 2: demonstrate use of multiple
  # directories for data organization.
  Executable     = mathematica
  Universe       = vanilla
  input          =
  output         = loop.out
  error          = loop.error
  # One Queue command per directory, so that one copy runs in each.
  Initialdir     = run_1
  Queue
  Initialdir     = run_2
  Queue

Example 3

The submit-description file Example 3 below queues 150 runs of program ``foo'' which must have been compiled and linked for Silicon Graphics workstations running IRIX 6.x. Condor will not attempt to run the processes on machines which have less than 32 megabytes of physical memory, and will run them on machines which have at least 64 megabytes if such machines are available. Stdin, stdout, and stderr will refer to ``in.0'', ``out.0'', and ``err.0'' for the first run of this program (process 0). Stdin, stdout, and stderr will refer to ``in.1'', ``out.1'', and ``err.1'' for process 1, and so forth. A log file containing entries about where/when Condor runs, checkpoints, and migrates processes in this cluster will be written into file ``foo.log''.

  # Example 3: Show off some fancy features including
  # use of pre-defined macros and logging.
  Executable     = foo
  Requirements   = Memory >= 32 && OpSys == "IRIX6" && Arch == "SGI"
  Rank           = Memory >= 64
  Image_Size     = 28 Meg
  Error          = err.$(Process)
  Input          = in.$(Process)
  Output         = out.$(Process)
  Log            = foo.log
  Queue 150
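
To make the effect of the $(Process) macro concrete, the Python sketch below mimics the substitution for the three file commands of Example 3. It is only a model of the naming scheme described above, not Condor's own macro expansion code, and the function names are made up for this illustration.

```python
# File name templates taken from Example 3.
TEMPLATES = {
    "input": "in.$(Process)",
    "output": "out.$(Process)",
    "error": "err.$(Process)",
}


def expand_process(template, process):
    """Replace the $(Process) macro with the process number."""
    return template.replace("$(Process)", str(process))


def files_for_cluster(count):
    """Return one dict of file names per process, numbered from 0."""
    return [
        {command: expand_process(template, process)
         for command, template in TEMPLATES.items()}
        for process in range(count)
    ]


files = files_for_cluster(150)   # mirrors "Queue 150"
print(files[0])     # file names used by process 0
print(files[149])   # file names used by the last process
```

Because processes are numbered from 0, a cluster of 150 runs uses the suffixes 0 through 149.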

2.6.2 More about Requirements and Rank

There are a few more things you should know about the powerful Requirements and Rank commands in the submit-description file.

First of all, both of them need to be valid Condor ClassAd expressions. From the condor_submit manual page and the above examples, you can see that writing ClassAd expressions is quite intuitive (especially if you are familiar with the programming language C). However, there are some pretty nifty expressions you can write with ClassAds if you care to read more about them. The complete lowdown on ClassAds and their expressions can be found in section 4.1.
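
As a toy model of how the two commands divide the work, the Python sketch below applies the Requirements and Rank expressions from Example 3 to a handful of machine descriptions. The machine list is invented for illustration, and the sketch treats a true Rank as 1.0 and a false one as 0.0. Real Condor matchmaking also takes the machine's own requirements and user priorities into account, so this only shows the filter-then-prefer idea.

```python
# Hypothetical machine ads, reduced to the attributes Example 3 refers to.
machines = [
    {"Name": "small", "Memory": 16, "OpSys": "IRIX6", "Arch": "SGI"},
    {"Name": "medium", "Memory": 48, "OpSys": "IRIX6", "Arch": "SGI"},
    {"Name": "large", "Memory": 128, "OpSys": "IRIX6", "Arch": "SGI"},
    {"Name": "linux", "Memory": 256, "OpSys": "LINUX", "Arch": "INTEL"},
]


def requirements(machine):
    """Memory >= 32 && OpSys == "IRIX6" && Arch == "SGI" """
    return (machine["Memory"] >= 32
            and machine["OpSys"] == "IRIX6"
            and machine["Arch"] == "SGI")


def rank(machine):
    """Memory >= 64, with true counted as 1.0 and false as 0.0."""
    return 1.0 if machine["Memory"] >= 64 else 0.0


def choose(candidates):
    """Drop machines that fail Requirements, then prefer the highest Rank."""
    eligible = [m for m in candidates if requirements(m)]
    if not eligible:
        return None
    return max(eligible, key=rank)


eligible_names = [m["Name"] for m in machines if requirements(m)]
best = choose(machines)
print(eligible_names)
print(best["Name"])
```

Requirements decides whether a machine may be used at all; Rank only orders the machines that remain.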

All of the commands in the submit-description file are case insensitive, except for the ClassAd string values that appear in the ClassAd expressions that you write! ClassAd attribute names are case insensitive, but ClassAd string values are always case sensitive. If you accidentally say

        requirements = arch == "alpha"
instead of what you should have said, which is:
        requirements = arch == "ALPHA"
you will not get what you want.
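
The rule can be modeled in a few lines of Python. This is a sketch of the behavior described above, not of Condor's ClassAd evaluator, and the lookup helper is a hypothetical name used only here.

```python
def lookup(ad, name):
    """Find an attribute while ignoring the case of its name."""
    for key, value in ad.items():
        if key.lower() == name.lower():
            return value
    raise KeyError(name)


machine_ad = {"Arch": "ALPHA"}

# The attribute name "arch" matches "Arch" regardless of case ...
matches_upper = lookup(machine_ad, "arch") == "ALPHA"
# ... but the string value is compared exactly, so "alpha" does not match.
matches_lower = lookup(machine_ad, "arch") == "alpha"

print(matches_upper, matches_lower)
```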

So now that you know ClassAd string values are case sensitive, how do you know what the capitalization should be for a given value? For that matter, how do you know what attributes you can use? The answer is that you can use any attribute that appears in either a machine or a job ClassAd. To view all of the machine ClassAd attributes, simply run condor_status -l. The -l argument to condor_status means to display the complete machine ClassAd. Similarly, for job ClassAds, run condor_q -l (note: you'll have to submit some jobs before you can view a job ClassAd). This will show you all the available attributes you can play with, along with their proper capitalization.
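
If you want to collect the attribute names and values programmatically, a small parser is enough. The Python sketch below assumes the long listing prints one ``Name = value'' pair per line with a blank line between ads. The sample text is invented for illustration rather than captured from a real pool, so check it against the output of your own condor_status -l.

```python
# Invented sample in the assumed "Name = value" layout, two ads.
SAMPLE = '''\
Machine = "node1.example.org"
Arch = "INTEL"
OpSys = "LINUX"
Memory = 64

Machine = "node2.example.org"
Arch = "SUN4x"
OpSys = "SOLARIS251"
Memory = 128
'''


def parse_ads(text):
    """Split the listing into ads and each ad into name/value pairs."""
    ads = []
    for block in text.strip().split("\n\n"):
        ad = {}
        for line in block.splitlines():
            name, separator, value = line.partition(" = ")
            if separator:
                # Values are kept as raw text, including any quotes.
                ad[name.strip()] = value.strip()
        if ad:
            ads.append(ad)
    return ads


ads = parse_ads(SAMPLE)
attribute_names = sorted({name for ad in ads for name in ad})
arch_values = sorted({ad["Arch"] for ad in ads})
print(attribute_names)
print(arch_values)
```

Listing the distinct values this way shows the exact capitalization to copy into a requirements expression.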

To help you out with what these attributes signify, below we list descriptions of the attributes which are common by default to every machine ClassAd. Remember that because ClassAds are flexible, the machine ads in your pool may include additional attributes specific to your site's installation and policies.

Activity: String which describes Condor job activity on the machine. Can have one of the following values:
    ``Idle'': There is no job activity
    ``Busy'': A job is busy running
    ``Suspended'': A job is currently suspended
    ``Vacating'': A job is currently checkpointing
    ``Killing'': A job is currently being killed
    ``Benchmarking'': The startd is running benchmarks
AFSCell: If the machine is running AFS, this is a string containing the AFS cell name.
Arch: String with the architecture of the machine. Typically one of the following:
    ``INTEL'': Intel CPU (Pentium, Pentium II, etc.)
    ``ALPHA'': Digital Alpha CPU
    ``SGI'': Silicon Graphics MIPS CPU
    ``SUN4u'': Sun UltraSparc CPU
    ``SUN4x'': A Sun Sparc CPU other than an UltraSparc, i.e. the sun4m or sun4c CPU found in older Sparc workstations such as the Sparc 10, Sparc 20, IPC, IPX, etc.
    ``HPPA1'': Hewlett Packard PA-RISC 1.x CPU (i.e. PA-RISC 7000 series CPU) based workstation
    ``HPPA2'': Hewlett Packard PA-RISC 2.x CPU (i.e. PA-RISC 8000 series CPU) based workstation
ClockDay: The day of the week, where 0 = Sunday, 1 = Monday, ... , 6 = Saturday.
ClockMin: The number of minutes passed since midnight.
CondorLoadAvg: The load average generated by Condor (either from remote jobs or running benchmarks).
ConsoleIdle: The number of seconds since activity on the system console keyboard or console mouse was last detected.
Cpus: Number of CPUs in this machine, i.e. 1 = single CPU machine, 2 = dual CPUs, etc.
CurrentRank: A float which represents this machine owner's affinity for running the Condor job which it is currently hosting. If not currently hosting a Condor job, CurrentRank is -1.0.
Disk: The amount of disk space on this machine available for the job, in kbytes (e.g. 23000 = 23 megabytes). Specifically, this is the amount of disk space available in the directory specified in the Condor configuration files by the EXECUTE macro, minus any space reserved with the RESERVED_DISK macro.
EnteredCurrentActivity: Time at which the machine entered the current Activity (see the Activity entry above). Measured in seconds since the epoch (00:00:00 UTC, Jan 1, 1970).
FileSystemDomain: A domain name configured by the Condor administrator which describes a cluster of machines which all access the same networked filesystems, usually via NFS or AFS.
KeyboardIdle: The number of seconds since activity on any keyboard or mouse associated with this machine was last detected. Unlike ConsoleIdle, KeyboardIdle also takes activity on pseudo-terminals into account (i.e. virtual ``keyboard'' activity from telnet and rlogin sessions as well). Note that KeyboardIdle will always be equal to or less than ConsoleIdle.
KFlops: Relative floating point performance as determined via a linpack benchmark.
LastHeardFrom: Time when the Condor Central Manager last received a status update from this machine. Expressed as seconds since the epoch (integer value). Note: This attribute is only inserted by the Central Manager once it receives the ClassAd. It is not present in the startd's copy of the ClassAd. Therefore, you cannot use this attribute in defining startd expressions (which you wouldn't want to do anyway).
LoadAvg: A floating point number with the machine's current load average.
Machine: A string with the machine's fully qualified hostname.
Memory: The amount of RAM in megabytes.
Mips: Relative integer performance as determined via a dhrystone benchmark.
MyType: The ClassAd type; always set to the literal string ``Machine''.
Name: The name of this resource; typically the same value as the Machine attribute, but could be customized by the site administrator. On SMP machines, the startd will divide the CPUs up into separate virtual machines, each with a unique name. These names will be of the form ``vm#@full.hostname''; for example, a name beginning with ``vm1@'' signifies virtual machine 1 on the host named after the ``@''.
OpSys: String describing the operating system running on this machine. For Condor Version 6.1.2 typically one of the following:
    ``HPUX10'' (for HPUX 10.20)
    ``IRIX6'' (for IRIX 6.2, 6.3, or 6.4)
    ``LINUX'' (for LINUX 2.0.x kernel systems)
    ``LINUX-GLIBC'' (for LINUX systems, using GNU's libc)
    ``OSF1'' (for Digital Unix 4.x)
Requirements: A boolean which, when evaluated within the context of the Machine ClassAd and a Job ClassAd, must evaluate to TRUE before Condor will allow the job to use this machine.
StartdIpAddr: String with the IP and port address of the condor_startd daemon which is publishing this Machine ClassAd.
State: String which publishes the machine's Condor state, which can be:
    ``Owner'': The machine owner is using the machine, and it is unavailable to Condor.
    ``Unclaimed'': The machine is available to run Condor jobs, but a good match (i.e. job to run here) is either not available or not yet found.
    ``Matched'': The Condor Central Manager has found a good match for this resource, but a Condor scheduler has not yet claimed it.
    ``Claimed'': The machine is claimed by a remote condor_schedd and is probably running a job.
    ``Preempting'': A Condor job is being preempted (possibly via checkpointing) in order to clear the machine for either a higher priority job or because the machine owner wants the machine back.
TargetType: Describes what type of ClassAd to match with. Always set to the string literal ``Job'', because Machine ClassAds always want to be matched with Jobs, and vice-versa.
UidDomain: A domain name configured by the Condor administrator which describes a cluster of machines which all have the same "passwd" file entries, and therefore all have the same logins.
VirtualMemory: The amount of currently available virtual memory (swap space) expressed in kbytes.
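
Several of these attributes are plain numbers that need a little arithmetic before they are readable. The Python sketch below decodes a day of the week and minutes since midnight (the ClockDay and ClockMin attributes of a machine ad) and converts a seconds-since-the-epoch value such as EnteredCurrentActivity into a UTC timestamp. The function names and sample values are made up for this illustration.

```python
from datetime import datetime, timezone

# Index 0 is Sunday, matching the day-of-week encoding described above.
DAYS = ["Sunday", "Monday", "Tuesday", "Wednesday",
        "Thursday", "Friday", "Saturday"]


def describe_clock(clock_day, clock_min):
    """Turn a day number and minutes since midnight into 'Day HH:MM'."""
    hours, minutes = divmod(clock_min, 60)
    return f"{DAYS[clock_day]} {hours:02d}:{minutes:02d}"


def epoch_to_utc(seconds):
    """Convert seconds since 00:00:00 UTC, Jan 1, 1970 to a UTC datetime."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


print(describe_clock(1, 615))        # day 1, 615 minutes after midnight
print(epoch_to_utc(0).isoformat())   # the epoch itself
```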

2.6.3 Heterogeneous submit: submit to a different architecture

There are times when you would like to submit jobs across machine architectures. For instance, let's say you have an Intel machine running LINUX sitting on your desk. This is the machine where you do all your work and where all your files are stored. But perhaps the majority of machines in your pool are Sun SPARC machines running Solaris. You would want to submit jobs directly from your LINUX box that would run on the SPARC machines.

This is easily accomplished. You will need, of course, to create your executable on the same type of machine where you want your job to run; Condor will not convert machine instructions between architectures for you! The trick is simply what to specify for the requirements command in your submit-description file. By default, condor_submit inserts requirements that will make your job run on the same type of machine you are submitting from. To override this, simply state what you want. Returning to our example, you would put the following into your submit-description file:

        requirements = Arch == "SUN4x" && OpSys == "SOLARIS251"
To find the right values, just run condor_status, which displays the Arch and OpSys values for all machines in the pool.
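
Putting the pieces together, the Python sketch below generates a complete submit-description file for the LINUX-to-SPARC scenario. The executable name ``foo.sparc'' and the function name are assumptions for this illustration. Only the Executable, Requirements, and Queue commands shown earlier in this section are used.

```python
def heterogeneous_submit(executable, arch, opsys):
    """Build submit-description text that targets a given Arch and OpSys."""
    lines = [
        "# Submit from one platform, run on another",
        f"Executable     = {executable}",
        # Stating Arch and OpSys explicitly overrides the default of
        # matching the submitting machine's own platform.
        f'Requirements   = Arch == "{arch}" && OpSys == "{opsys}"',
        "Queue",
    ]
    return "\n".join(lines) + "\n"


# "foo.sparc" is a hypothetical executable built on a SPARC/Solaris machine.
submit_text = heterogeneous_submit("foo.sparc", "SUN4x", "SOLARIS251")
print(submit_text)
```

Remember that the string values are case sensitive, so copy the Arch and OpSys values exactly as condor_status reports them.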
