GridEngineOBSOLETE
THIS PAGE IS OBSOLETE - please use www.hpc.dtu.dk instead!!!
Sun Grid Engine (SGE), a distributed resource management and job queuing system, is installed on all G-bar computers, to get the optimal performance and throughput from the available resources.
Set up your account:
Before you can use Grid Engine (GE), you have to make the GE commands known to your shell. This can be done with:
module load gridengine
If you don't want to type this every time you log in, add the GridEngine module to the MODULES line in your ~/.gbarrc file:
MODULES=gridengine[,other_modules]
You can find more information about modules on the Modules page.
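To confirm that the module has been loaded, you can check whether the GE commands are on your PATH. The following is only a sketch: check_ge_commands is a hypothetical helper name, and the list of commands (qsub, qstat, qdel) is taken from the commands described on this page.

```shell
#!/bin/sh
# Report whether the Grid Engine client commands are on the PATH.
# check_ge_commands is a hypothetical helper, not part of GE.
check_ge_commands() {
    missing=0
    for cmd in qsub qstat qdel; do
        if command -v "$cmd" >/dev/null 2>&1; then
            echo "$cmd: found"
        else
            echo "$cmd: not found"
            missing=1
        fi
    done
    # Return non-zero if at least one command is missing.
    return $missing
}

check_ge_commands || echo "Try: module load gridengine"
```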
Prepare a job script
As with other batch queue systems, it is NOT possible to submit a binary executable to Grid Engine directly; you have to prepare a job script. Inside the job script you can define the resources your job needs in order to run under the control of GE.
Example script:
    #!/bin/sh
    # -- our name ---
    #$ -N Sleeper
    # -- request /bin/sh --
    #$ -S /bin/sh
    # -- run in the current working (submission) directory --
    #$ -cwd

    /bin/echo Here I am. Sleeping now at: `date`
    time=60
    if [ $# -ge 1 ]; then
        time=$1
    fi
    sleep $time
    echo Now it is: `date`
As you can see above, it is possible to add GE options to your script file as special comment lines starting with '#$'. The shell ignores these lines because they start with the comment character '#', so in principle you can use your script outside GE, too.
Explanation of the GE options used above:
-N
Name of the job in GE
-S
Use the specified shell (here: /bin/sh) to run the job. Note: this option is required in all GE scripts!
-cwd
Run in the current working directory
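Because the embedded options are ordinary comment lines with a fixed '#$' prefix, they are easy to inspect with standard tools. The sketch below prints the options embedded in a script. list_ge_options is a hypothetical helper, not a GE command, and the demo file is a shortened copy of the example script above.

```shell
#!/bin/sh
# Print the GE options embedded in a job script, one per line.
# list_ge_options is a hypothetical helper, not part of GE.
list_ge_options() {
    # Keep only lines that start with "#$" and strip that prefix.
    sed -n 's/^#\$[[:space:]]*//p' "$1"
}

# Demo: write a shortened copy of the example script to a temporary file.
demo=$(mktemp)
cat > "$demo" <<'EOF'
#!/bin/sh
# -- our name ---
#$ -N Sleeper
#$ -S /bin/sh
#$ -cwd
sleep 60
EOF

list_ge_options "$demo"
rm -f "$demo"
```

Ordinary comments such as "# -- our name ---" are skipped, because only lines beginning with '#$' are treated as option lines.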
Often used and useful GE options explained:
-N job_name
With this option you can specify the name of the job, as it should be used by GE. Without this option, the name of your job is the filename of the job script submitted to GE.
-cwd
Run the job in the current working directory. Current means at the time of job submission, i.e. the directory from which you ran qsub or qmon to submit the job to GE. If your job relies on a specific working directory, e.g. one where it can find its input files, it might be better to add a cd working-dir line to the job script.
If you do NOT use this option, the job's output files (standard output/error) will always be put in your home directory.
-o filename
Specify the filename where GE should save the messages from the standard output channel of your job. Without the -o option, the name of your job (usually the name of the job script, if not specified with -N) plus the extension .o## is used, where '##' is the number of the job as assigned by GE.
-e filename
Specify the filename where GE should save the messages from the standard error channel of your job. Without the -e option, the name of your job (usually the name of the job script, if not specified with -N) plus the extension .e## is used, where '##' is the number of the job as assigned by GE.
For a more detailed list of options see man qsub.
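The default naming rule for the output files described above can be written down in a few lines. This is only an illustration of the rule as stated on this page: default_output_names is a hypothetical helper, and the job number 4711 is a made-up placeholder.

```shell
#!/bin/sh
# default_output_names JOB_NAME JOB_ID
# Print the default standard output and standard error file names,
# following the rule "job name plus .o## / .e##" described above.
# Hypothetical helper for illustration; GE itself creates these files.
default_output_names() {
    echo "$1.o$2"
    echo "$1.e$2"
}

# Job "Sleeper" with the placeholder job number 4711.
default_output_names Sleeper 4711
```

With -cwd these files end up in the submission directory, otherwise in your home directory, as explained above.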
Job submission
After you have prepared your job script, you can submit it using the qsub command:
qsub [options] script_name
Almost all options to qsub can be either given on the command line or in the script file (see above). For more details see man qsub.
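As a sketch of a submission with options given on the command line: the example below assumes that the example script from above has been saved as sleeper.sh (a hypothetical file name) and that qsub confirms the submission with a message of the form 'Your job 4711 ("Sleeper") has been submitted'. extract_job_id is a hypothetical helper that pulls the job number out of that message, so that it can be reused with qstat or qdel.

```shell
#!/bin/sh
# Read qsub's confirmation message on stdin and print the job number.
# Assumed message format: Your job 4711 ("Sleeper") has been submitted
# extract_job_id is a hypothetical helper, not part of GE.
extract_job_id() {
    sed -n 's/^Your job \([0-9][0-9]*\) .*/\1/p'
}

if command -v qsub >/dev/null 2>&1; then
    # Options on the command line; the trailing 120 is passed to the
    # script as $1 (the sleep time in the example script).
    qsub -N Sleeper -cwd sleeper.sh 120 | extract_job_id
else
    echo "qsub not found; run 'module load gridengine' first" >&2
fi
```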
Job status
After you have submitted your job, you can check its status with the qstat command. Without extra arguments, qstat prints all your running and pending jobs to your screen. Here is a short list of useful options:
-u user_name1 [user_name2 ....]
Show only jobs belonging to user(s) user_name1 (user_name2 ...). Use a wildcard ('*', quoted so that the shell does not expand it) to show the jobs of all users in the queue.
-j job_id1 [job_id2 ...]
Show only jobs with the requested job_ids (long listing!). This lists (almost) everything GE knows about the jobs, and the output can be useful for finding out why your job does not start.
For more options see man qstat.
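When you pass a wildcard to qstat -u, quote it; otherwise the shell may replace the * with the file names in the current directory before qstat ever sees it. The sketch below first demonstrates the quoting with a hypothetical helper (show_args, which just prints its arguments) and then shows typical qstat calls. The job number 4711 is a placeholder.

```shell
#!/bin/sh
# show_args is a hypothetical helper that prints each argument in brackets.
show_args() {
    printf '[%s]\n' "$@"
}

# The quoted wildcard reaches the command unchanged: prints [-u] and [*].
show_args -u '*'

if command -v qstat >/dev/null 2>&1; then
    qstat -u '*'        # jobs of all users
    qstat -u "$USER"    # only your own jobs
    qstat -j 4711       # long listing for one job (placeholder job number)
else
    echo "qstat not found; run 'module load gridengine' first" >&2
fi
```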
Stop/Delete a job
Usually your jobs will run, finish and disappear from the GE system, but sometimes you might want to stop a job or remove a job from the queue that does not start due to wrong submission options:
qdel job_id
For more options see man qdel.
The qmon GUI
All the tasks described above can also be done via a Graphical User Interface (GUI): qmon. To run qmon, you need to have an X Window display! When you start qmon, you'll see the following screen:
The two most important buttons for GE users have been marked red (Job Control) and green (Job Submission). The 'Job Submission' interface is available via the 'Job Control' interface, too.
Running MPI jobs
NOTE: THIS SECTION DOES NOT REFLECT THE CURRENT SETUP!
To run MPI (SUN Cluster Tools implementation) jobs under Grid Engine, a special environment has been created to allow GE to control the MPI tasks. A simple job script for such a job is shown below.
Please note: This works only with SUN's MPI that comes with the HPC Cluster Tools installed on the HPC machines. Any other version of MPI is NOT SUPPORTED!
    #!/bin/sh
    # (c) 2000 Sun Microsystems, Inc.
    # ---------------------------
    # General options
    #
    #$ -N MPIdate
    #$ -S /bin/sh
    #$ -o $JOB_NAME.$JOB_ID.out
    #$ -e $JOB_NAME.$JOB_ID.err
    # -M User@Domain
    # -m es
    # ---------------------------
    # Execute the job from the current working directory
    #$ -cwd
    #
    # Parallel environment request
    # ---------------------------
    # do not change the following line
    #$ -l cre
    #
    # PE_name CPU_Numbers_requested
    #$ -pe HPC 12
    # ------------------------------- Program_name_and_options
    /appl/hgrid/current/bin/mprun -np $NSLOTS date
    # ---------------------------
You can download the script and make the changes to fit your MPI task.
There are at least three items you have to specify:
- The name of the Parallel Environment, HPC.
- The number of CPUs you request for the job.
- The name and options of your program.
Running OpenMP jobs
NOTE: THIS SECTION DOES NOT REFLECT THE CURRENT SETUP!
    #!/bin/sh
    # (c) 2000 Sun Microsystems, Inc.
    # ---------------------------
    # General options
    #
    #$ -N OpenMPjob
    #$ -S /bin/sh
    #$ -o $JOB_NAME.$JOB_ID.out
    #$ -e $JOB_NAME.$JOB_ID.err
    # -M User@Domain
    # -m es
    # ---------------------------
    # Execute the job from the current working directory
    #$ -cwd
    #
    # Parallel environment request
    # ---------------------------
    # do not change the following line
    #$ -l cre
    #
    # PE_name CPU_Numbers_requested
    #$ -pe HPC 12
    #
    OMP_NUM_THREADS=$NSLOTS
    export OMP_NUM_THREADS
    # ------------------------------- Program_name_and_options
    your_openmp_program [options]
    # ---------------------------
You can download the script and make the changes to fit your OpenMP task.
There are at least three items you have to specify:
- The name of the Parallel Environment, HPC.
- The number of CPUs you request for the job.
- The name and options of your program.
If you make use of special environment variables for your OpenMP program, remember to put them in your script (use the same syntax as the OMP_NUM_THREADS line in the script).
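As a sketch of how additional variables could be set in the body of the job script: the fragment below uses the same assign-then-export pattern as the OMP_NUM_THREADS line. OMP_SCHEDULE is a standard OpenMP variable chosen here purely as an example, and the fallback to 1 thread when NSLOTS is unset is an assumption added so that the fragment also runs outside GE.

```shell
#!/bin/sh
# Fragment for the body of an OpenMP job script.
# NSLOTS is set by GE for parallel-environment jobs; the fallback to 1
# is an assumption for running this fragment outside GE.
OMP_NUM_THREADS=${NSLOTS:-1}
export OMP_NUM_THREADS

# Further OpenMP settings follow the same pattern (example value only).
OMP_SCHEDULE=dynamic
export OMP_SCHEDULE

echo "threads=$OMP_NUM_THREADS schedule=$OMP_SCHEDULE"
```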


