stacpolly
Job Submission
Overview
Torque 2.3.5 is the queuing software. Folks familiar with PBS will be at home with Torque.
All parallel tasks must be submitted to the queue. Submit jobs with the qsub command, and query the queue with showq or qstat.
Setting up your environment
Commands such as showq, showbf, checkjob and diagnose belong to the Maui scheduler, so the Maui directories must be on your PATH.
Add to your .tcshrc (or .cshrc or .paths) file:
setenv PATH ${PATH}:/usr/local/maui/bin:/usr/local/maui/sbin
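If your login shell is bash rather than tcsh, the same setting could be sketched as follows. The directories are the ones given above; the helper name add_maui_path is made up for this example, and the check against duplicates is an added convenience rather than something Maui requires.

```shell
# Append the Maui directories to PATH unless they are already present.
# add_maui_path is an illustrative helper name, not part of Maui or Torque.
add_maui_path() {
    for dir in /usr/local/maui/bin /usr/local/maui/sbin; do
        case ":$PATH:" in
            *":$dir:"*) ;;              # already on PATH, nothing to do
            *) PATH="$PATH:$dir" ;;     # otherwise append it
        esac
    done
    export PATH
}

add_maui_path
```

Placed in ~/.bashrc, this can be sourced repeatedly without growing PATH.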
Usage
Submit jobs with
qsub job_script
Display the queue status with
qstat
or
showq
Cancel jobs with
qdel job_id
where job_id can be retrieved from showq or qstat.
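Because qsub prints the ID of the new job on standard output, a script can capture it instead of looking it up afterwards. The following is a minimal sketch for a Bourne-style shell (sh or bash), not the tcsh used elsewhere on this page. The helper names submit_job and job_number, and the example ID 1234.headnode, are hypothetical.

```shell
# Submit a job script and echo the job ID that qsub reports.
submit_job() {
    qsub "$1"
}

# Strip the server suffix from an ID such as 1234.headnode, leaving 1234.
job_number() {
    printf '%s\n' "${1%%.*}"
}

# Typical use on the cluster (commented out, since it submits a real job):
#   id=$(submit_job job_script)
#   qstat "$id"      # show the status of that job
#   qdel "$id"       # cancel it
```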
Other useful commands are:
- showbf: see how many nodes are available for use.
- diagnose: explain why jobs aren't running.
- checkjob: check the status of a particular job.
- pbsnodes: list nodes and their state.
- pbsnodes -l "up": list all nodes that are up.
- /home/ert/bin/CPU_usage: find the loads on each node.
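As a small illustration of combining these commands, the sketch below counts the nodes in a given state. It is written for a Bourne-style shell and assumes that pbsnodes -l prints one node per non-empty line; the helper name count_nodes is hypothetical.

```shell
# Count the nodes that pbsnodes reports in a given state (default: up).
# Assumes pbsnodes -l prints one node per non-empty line.
count_nodes() {
    pbsnodes -l "${1:-up}" | grep -c .
}
```

For example, count_nodes down would report how many nodes are currently down.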
Example Job Scripts
Serial job
# The shell
#PBS -S /bin/tcsh
# The number of nodes and the number of processes per node
#PBS -l nodes=1
# The wall time for the calculation in HH:MM:SS
#PBS -l walltime=720:00:00
# Where the errors go
#PBS -e /home/foo/070419a/log.error
# Where the messages go
#PBS -o /home/foo/070419a/log
set DIR = /home/foo/070419a
# The work directory
cd $DIR
# Run the executable
/home/foo/bin/my.executable < my.input > my.output
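The walltime of 720:00:00 in the script is 720 hours, that is, 30 days. When choosing a limit it can help to convert between the HH:MM:SS form and seconds. The following Bourne-style shell helper is only a sketch, and the name walltime_to_seconds is made up for this example.

```shell
# Convert a walltime in HH:MM:SS form to seconds.
# awk treats each field as a decimal number, so leading zeros are harmless.
walltime_to_seconds() {
    printf '%s\n' "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}
```

Calling walltime_to_seconds 720:00:00 gives 2592000, which is 30 days of 86400 seconds each.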
Parallel MPI job
The following will run PMRT on 16 cores spread over 2 nodes. Note that you cannot request more than 8 processors per node (ppn). There are 17 nodes: sixteen 8-way and one 6-way.
# The shell
#PBS -S /bin/tcsh
# The number of nodes and the number of processes per node
#PBS -l nodes=2:ppn=8
# The wall time for the calculation in HH:MM:SS
#PBS -l walltime=720:00:00
# Where the errors go
#PBS -e $HOME/PMRT/Runs/Polar/L=1e21/090107f/log.error
# Where the messages go
#PBS -o $HOME/PMRT/Runs/Polar/L=1e21/090107f/log
set DIR = $HOME/PMRT/Runs/Polar/L=1e21/090107f
set EXE = $HOME/bin/PMRT_Polar_HY_zi=8_zt=5.5
set IC = $HOME/PMRT/ICs/ic_256_z=009.bin.min
# The work directory
cd $DIR
# Run the executable
mpirun -np 16 $EXE $IC 1 1 >& PMRT.log
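In the script above, the value given to mpirun -np (16) matches nodes times ppn (2 x 8). Torque lists the processor slots allocated to a job, one per line, in the file named by $PBS_NODEFILE, so the count can be derived rather than typed by hand. The sketch below is for a Bourne-style shell, and the helper name count_slots is hypothetical.

```shell
# Count the processor slots Torque allocated to this job.
# Torque writes one line per slot to the file named by $PBS_NODEFILE.
count_slots() {
    # tr strips the padding that some wc implementations add
    wc -l < "${1:-$PBS_NODEFILE}" | tr -d ' '
}

# Inside a job script (commented out here because it needs a running job):
#   NP=$(count_slots)
#   mpirun -np "$NP" "$EXE" "$IC" 1 1 > PMRT.log 2>&1
```

Deriving the count this way keeps the mpirun line correct if the nodes or ppn request is changed later.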
OpenMP job
# The shell
#PBS -S /bin/tcsh
# The number of nodes and the number of processes per node
# OpenMP cannot request more than one node, and no more than 8 processes
#PBS -l nodes=1:ppn=8
# The wall time for the calculation in HH:MM:SS
#PBS -l walltime=720:00:00
# Where the errors go
#PBS -e log.error
# Where the messages go
#PBS -o log
set DIR = $HOME/PMRT/Runs/Polar/L=1e21/090107f
set EXE = $HOME/bin/PMRT_Polar_HY_zi=8_zt=5.5
set IC = $HOME/PMRT/ICs/ic_256_z=009.bin.min
# The work directory
cd $DIR
setenv OMP_NUM_THREADS 8
# Run the executable
$EXE $IC 1 1 >& PMRT.log
Memory
The amount of memory required can be set with the mem parameter. This is useful if you know you need an amount of memory that is available only on a selection of nodes.
# The memory requested per process. The largest requested by all processes.
#PBS -l mem=30gb
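To judge which nodes can satisfy a memory request, one option is to read the MemTotal line of /proc/meminfo while logged in to a node; Linux reports that value in kB. The helper below is a sketch for a Bourne-style shell. The name mem_total_gb is made up for this example, and the result is rounded down to whole GB.

```shell
# Print the total memory of this node in whole GB (rounded down).
# Reads the MemTotal line, which Linux reports in kB.
mem_total_gb() {
    awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 / 1024 }' "${1:-/proc/meminfo}"
}
```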
Reserving an entire node to one job
On occasion, you may want to start only a few processes on a node but reserve the whole node for your job, for example because you need all of the node's memory or want to mix threading with MPI.
The following will run the job on 2 nodes, using only 1 processor per node.
#PBS -l nodes=2:ppn=1
#PBS -W x=NACCESSPOLICY:SINGLEJOB
mpirun -n 2 MyMPIProg
To determine the number of processors available on the assigned node:
# Determine the number of cores available on this node
set NCores = `grep -c '^processor' /proc/cpuinfo`
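For a job that mixes MPI with threading, the core count can be used to set the number of threads per MPI process. The following is a minimal sketch for a Bourne-style shell; the helper name count_cores is hypothetical, and whether mpirun forwards OMP_NUM_THREADS to processes on other nodes depends on the MPI implementation.

```shell
# Count the logical processors listed in a cpuinfo-style file.
# Anchoring on ^processor avoids matching other lines that mention the word.
count_cores() {
    grep -c '^processor' "${1:-/proc/cpuinfo}"
}

# In a hybrid job script with one MPI process per node (commented out here):
#   OMP_NUM_THREADS=$(count_cores)
#   export OMP_NUM_THREADS
#   mpirun -n 2 MyMPIProg
```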
Requesting a particular node
On occasion, you may want to have a job placed on a particular node, for example because you already have a dataset on that node's local drive.
The following will run the job on worker004.
#PBS -l nodes=worker004.cluster.loc
