stacpolly

Job Submission

Overview

Torque 2.3.5 is the queuing software. Folks familiar with PBS will be at home with Torque.

All parallel tasks must be submitted to the queue. Jobs are submitted with the qsub command, and the queue can be queried with showq or qstat.

Setting up your environment

You will need to add the Maui directories to your PATH.

Add to your .tcshrc (or .cshrc or .paths) file:
setenv PATH ${PATH}:/usr/local/maui/bin:/usr/local/maui/sbin

Usage

Submit jobs with
qsub job_script

Display the queue status with
qstat
or
showq

Cancel jobs with
qdel job_id
where job_id can be retrieved from showq or qstat.
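
As a sketch of a typical session, the three commands above can be chained by capturing the job identifier that qsub prints when a job is accepted. The variable name JOBID is only illustrative, and job_script stands for your own script.

```tcsh
# Submit the job and keep the identifier that qsub prints
set JOBID = `qsub job_script`

# Show the full status of that job only
qstat -f $JOBID

# Cancel the job if it is no longer needed
qdel $JOBID
```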

Other useful commands are:

showbf
See how many nodes are available for use.
diagnose
Explain why jobs aren't running.
checkjob
Check the status of a particular job.
pbsnodes
List nodes and their state.
pbsnodes -l "up"
List all nodes that are up.
/home/ert/bin/CPU_usage
Find the loads on each node.

Example Job Scripts

Serial job


# The shell
#PBS -S /bin/tcsh

# The number of nodes (one process per node unless ppn is given)
#PBS -l nodes=1

# The wall time for the calculation in HH:MM:SS
#PBS -l walltime=720:00:00

# Where the errors go
#PBS -e /home/foo/070419a/log.error

# Where the messages go
#PBS -o /home/foo/070419a/log

set DIR = /home/foo/070419a

# The work directory
cd $DIR

# Run the executable
/home/foo/bin/my.executable < my.input > my.output

Parallel MPI job

The following will run PMRT on 16 cores spread over 2 nodes. Note that you cannot request more than 8 processors per node (ppn). There are 17 nodes: 16 8-way and 1 6-way.


# The shell
#PBS -S /bin/tcsh

# The number of nodes and the number of processes per node
#PBS -l nodes=2:ppn=8

# The wall time for the calculation in HH:MM:SS
#PBS -l walltime=720:00:00

# Where the errors go
#PBS -e $HOME/PMRT/Runs/Polar/L=1e21/090107f/log.error

# Where the messages go
#PBS -o $HOME/PMRT/Runs/Polar/L=1e21/090107f/log

set DIR = $HOME/PMRT/Runs/Polar/L=1e21/090107f
set EXE = $HOME/bin/PMRT_Polar_HY_zi=8_zt=5.5
set IC  = $HOME/PMRT/ICs/ic_256_z=009.bin.min

# The work directory
cd $DIR

# Run the executable
mpirun -np 16 $EXE $IC 1 1 >& PMRT.log
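
The process count passed to mpirun has to agree with the nodes and ppn request. One way to avoid keeping the two in step by hand is to count the lines of the file named by PBS_NODEFILE. This is a sketch that assumes Torque sets this standard variable for the job, pointing at a file with one line per allocated processor. NP is an illustrative name, and EXE and IC are the variables set in the script above.

```tcsh
# Count the processors Torque allocated to this job (one line each)
set NP = `wc -l < $PBS_NODEFILE`

# Run the executable on every allocated processor
mpirun -np $NP $EXE $IC 1 1 >& PMRT.log
```

With nodes=2:ppn=8 the file should hold 16 lines, so NP matches the hard-coded value used above.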

OpenMP job

# The shell
#PBS -S /bin/tcsh

# The number of nodes and the number of processes per node
# An OpenMP job cannot span more than one node, so it can use at most 8 processes
#PBS -l nodes=1:ppn=8

# The wall time for the calculation in HH:MM:SS
#PBS -l walltime=720:00:00

# Where the errors go
#PBS -e log.error

# Where the messages go
#PBS -o log

set DIR = $HOME/PMRT/Runs/Polar/L=1e21/090107f
set EXE = $HOME/bin/PMRT_Polar_HY_zi=8_zt=5.5
set IC  = $HOME/PMRT/ICs/ic_256_z=009.bin.min

# The work directory
cd $DIR

setenv OMP_NUM_THREADS 8

# Run the executable
$EXE $IC 1 1 >& PMRT.log
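
The script above gives relative paths for the log files, which qsub resolves against the directory it was run from. If the job should also work in that directory, a possible variation is to use PBS_O_WORKDIR in place of a hard-coded DIR. This is a sketch that assumes Torque sets this standard variable to the submission directory.

```tcsh
# Where the errors and messages go, relative to the submission directory
#PBS -e log.error
#PBS -o log

# Work in the directory that qsub was run from
cd $PBS_O_WORKDIR
```

This keeps the logs and the program output together, provided the job is submitted from the run directory.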

Memory

The amount of memory required can be set with the mem parameter. This is useful if you know you need an amount of memory that is available only on a selection of nodes.

# The memory requested per process: the largest amount needed by any one process.
#PBS -l mem=30gb
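
As an illustration, a serial job that needs a large-memory node could give the memory request together with the node request. Several resources may share one -l line, separated by commas. The 30gb figure is simply the value used above, and whether any node can satisfy it depends on the hardware.

```tcsh
# One node, placed only where 30 GB of memory is available
#PBS -l nodes=1,mem=30gb
```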

Reserving an entire node to one job

On occasion, you may want to start only a few processes on a node but reserve the whole node for your job, for example because you know you will need all the memory or you want to mix threading with MPI.

The following will run the job on 2 nodes, using only 1 processor per node.

#PBS -l nodes=2:ppn=1
#PBS -W x=NACCESSPOLICY:SINGLEJOB

mpirun -n 2 MyMPIProg

To determine the number of processors available on the assigned node:

# Determine the number of cores available on this node
set NCores = `grep -c '^processor' /proc/cpuinfo`
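
Putting these pieces together, a mixed MPI and threaded job might look like the sketch below. It assumes that MyMPIProg is threaded with OpenMP and reads OMP_NUM_THREADS, and that mpirun passes the environment on to the process on the second node. Neither is stated above, and the second depends on the MPI installation.

```tcsh
# The shell
#PBS -S /bin/tcsh

# Two nodes, one MPI process on each, with each node reserved for this job
#PBS -l nodes=2:ppn=1
#PBS -W x=NACCESSPOLICY:SINGLEJOB

# Determine the number of cores available on this node
set NCores = `grep -c '^processor' /proc/cpuinfo`

# Let each MPI process start one thread per core
setenv OMP_NUM_THREADS $NCores

mpirun -n 2 MyMPIProg
```

The core count is taken on the node that runs the script, so it will be wrong for the other node if the job lands on a mix of 8-way and 6-way nodes.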

Requesting a particular node

On occasion, you may want to have a job placed on a particular node. For example, you already have a dataset on a local drive.

The following will run the job on worker004.

#PBS -l nodes=worker004.cluster.loc
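
A host name can be combined with a ppn request in the same way as a node count. The sketch below asks for all 8 processors on that node, on the assumption that worker004 is one of the 8-way nodes.

```tcsh
# All 8 processors on worker004
#PBS -l nodes=worker004.cluster.loc:ppn=8
```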