After this tutorial you will be able to submit complex jobs such as parametric, collection and DAG jobs to the Grid.
Parametric jobs
In the past, running the same jdl with different parameters (e.g. different input files, command line arguments etc.) meant writing a script that dynamically builds the jdls and submits them.
In the WMS version of the middleware, a parametric jdl provides this feature directly. It is not supported for collection and DAG jobs.
The following jdl runs the blastpgp executable on multiple files:
|
JobType = "Parametric";
Executable = "blastpgp";
StdOutput = "blast_out_PARAM_.txt";
StdError = "blast_err_PARAM_.txt";
Parameters = 1000;
ParameterStart = 1;
ParameterStep = 1;
Arguments = "-i seq_PARAM_.fasta -j 2 -d nr";
InputSandbox = {"/usr/local/bin/blastpgp", "seq_PARAM_.fasta"};
OutputSandbox = {"blast_out_PARAM_.txt", "blast_err_PARAM_.txt"};
RetryCount = 3;
ShallowRetryCount = 8;
|
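The changes from a Normal job are the JobType value, the Parameters, ParameterStart and ParameterStep attributes, and the _PARAM_ placeholder inside the file names and arguments.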
The submission of the Parametric JDL will result in the generation of N jobs, where
N = (Parameters – ParameterStart)/ParameterStep
Each job is created by replacing the _PARAM_ string with one value from the parameter list. The list is built from ParameterStart as the first value, ParameterStep as the increment, and the Parameters attribute as the upper bound.
The Parameters attribute can also specify a list of values, each of which is substituted for _PARAM_ in turn:
|
…
Executable = "my_sim.exe";
Arguments = "_PARAM_";
Parameters = {alpha, beta, gamma};
…
|
If you wish to practice, you can download the files simple.c and simple_params.jdl.
The program takes two arguments: it sleeps for the time specified by the first and does some calculations with the second.
Compile the file to get the executable simple:
| % gcc -o simple simple.c |
Then submit the job and check its status using the saved job id file:
| % glite-wms-job-submit -a -o params.id simple_params.jdl |
% glite-wms-job-status -i params.id
*************************************************************
BOOKKEEPING INFORMATION:
Status info for the Job : https://wms.phy.bg.ac.yu:9000/F9vt4m7rzz0ui6eSDn7Vxg
Current Status: Running
Submitted: Sun Feb 17 10:16:53 2008 IST
*************************************************************
- Nodes information for:
Status info for the Job : https://wms.phy.bg.ac.yu:9000/4tQXEgJJLVbrgMhT1QofTg
Current Status: Running
Status Reason: Job successfully submitted to Globus
Destination: ce301.intercol.edu:2119/jobmanager-lcgpbs-see
Submitted: Sun Feb 17 10:16:53 2008 IST
*************************************************************
Status info for the Job : https://wms.phy.bg.ac.yu:9000/KNESTmkBi1vXYydMJN0Q8A
Current Status: Running
Status Reason: Job successfully submitted to Globus
Destination: ce01.athena.hellasgrid.gr:2119/jobmanager-pbs-long
Submitted: Sun Feb 17 10:16:53 2008 IST
*************************************************************
|
Dag JobId: https://wms.phy.bg.ac.yu:9000/F9vt4m7rzz0ui6eSDn7Vxg
- - -
Node Name: Node_50
JobId: https://wms.phy.bg.ac.yu:9000/4tQXEgJJLVbrgMhT1QofTg
Dir: /home/horn/assafgot/NA3/output/Node_50
- - -
Node Name: Node_90
JobId: https://wms.phy.bg.ac.yu:9000/KNESTmkBi1vXYydMJN0Q8A
Dir: /home/horn/assafgot/NA3/output/Node_90
- - -
...
|
Collection Jobs
A collection is a simple way to submit multiple jobs at once.
There are two ways to do it:
1. Specify on the command line the directory where all the individual jdls reside. That directory must contain only jdl files.
|
% glite-wms-job-submit -a --collection jdl_dir |
|
*************************************************************
- Nodes information for:
Status info for the Job : https://wms.phy.bg.ac.yu:9000/4xz8Wvy2gnCwrbxtwK408Q
Current Status: Submitted
Submitted: Sat Feb 16 23:21:34 2008 IST
*************************************************************
Status info for the Job : https://wms.phy.bg.ac.yu:9000/76qc0_UX0R_hffChX8FDXw
Current Status: Submitted
Submitted: Sat Feb 16 23:21:34 2008 IST
*************************************************************
...
|
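Since the directory passed to --collection should hold nothing but jdls, it can be worth checking it before submitting. The following Python sketch lists anything that does not look like a jdl file. It assumes the jdls use a .jdl extension, and the function name is hypothetical; the temporary directory only exists to make the example self-contained.

```python
import tempfile
from pathlib import Path


def unexpected_entries(jdl_dir, suffix=".jdl"):
    """List entries in jdl_dir that are not regular files ending in suffix.

    Assumption: every jdl file is named with the given suffix.
    """
    return sorted(
        entry.name
        for entry in Path(jdl_dir).iterdir()
        if not (entry.is_file() and entry.suffix == suffix)
    )


# Demonstration on a throwaway directory with one stray file.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("job1.jdl", "job2.jdl", "notes.txt"):
        (Path(tmp) / name).write_text("")
    leftovers = unexpected_entries(tmp)
    print(leftovers)
```

An empty result means the directory contains only files with the expected extension.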
When you retrieve the output of the jobs that have finished (there is no need to wait for all of them), each node gets a directory of its own named Node_<jdl name>.
A file named ids_nodes.map holds the mapping of job ids to output directories.
2. Write a single jdl of type Collection that lists the individual jobs as nodes:
[
  Type = "Collection";
  VirtualOrganisation = "MyVO";
  InputSandbox = {"myjob.exe", "fileA"};
  DefaultNodeShallowRetryCount = 5;
  Nodes = {
    [
      Executable = "myjob.exe";
      InputSandbox = {root.InputSandbox, "fileB"};
      OutputSandbox = {"myoutput1.txt"};
      Requirements = other.GlueCEPolicyMaxWallClockTime > 1440;
    ],
    [
      NodeName = "mysubjob";
      Executable = "myjob.exe";
      OutputSandbox = {"myoutput2.txt"};
      ShallowRetryCount = 3;
    ],
    [
      File = "/home/doe/test.jdl";
    ]
  };
]
|
The jdl defines default attributes common to all nodes, such as the InputSandbox containing myjob.exe and fileA, together with the attributes specific to each node.
A node's InputSandbox can extend the common one by referencing root.InputSandbox.
This saves transfer time, because the shared input files are transferred only once.
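The effect of root.InputSandbox in the collection jdl above can be pictured with a small Python sketch that works out which files each node ends up with. The function and the ROOT_REF marker are hypothetical, and the rule that a node without its own InputSandbox falls back to the common one is an assumption based on the description of common default attributes; this is not how the WMS itself is implemented.

```python
ROOT_REF = "root.InputSandbox"  # stands in for the reference used in the jdl


def effective_input_sandbox(root_sandbox, node_sandbox=None):
    """Expand references to the common sandbox in a node's InputSandbox."""
    if node_sandbox is None:
        # Assumption: a node with no InputSandbox uses the common default.
        return list(root_sandbox)
    files = []
    for entry in node_sandbox:
        if entry == ROOT_REF:
            files.extend(root_sandbox)  # splice in the shared files
        else:
            files.append(entry)
    return files


root = ["myjob.exe", "fileA"]
first_node = effective_input_sandbox(root, [ROOT_REF, "fileB"])
second_node = effective_input_sandbox(root)
print(first_node)
print(second_node)
```

Here the first node sees the two shared files plus fileB, while the second node sees only the shared files.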
DAG Jobs
A DAG (directed acyclic graph) represents a set of jobs where the input, output, or execution of one or more jobs depends on one or more other jobs.
We'll show an example of a DAG jdl in which nodeA must finish before nodeB, nodeC and mynode can start, and nodeD runs only after all three of them have finished.
This is the jdl:
[
  Type = "dag";
  VirtualOrganisation = "MyVO";
  InputSandbox = {
    "/tmp/foo/*.exe",
    "/home/gliteuser/bar",
    "/tmp/myconf"
  };
  max_nodes_running = 5;
  nodes = [
    nodeA = [
      description = [
        JobType = "Normal";
        Executable = "a.exe";
        InputSandbox = {
          "/home/data/myfile.txt",
          root.InputSandbox
        };
      ];
    ];
    mynode = [
      description = [
        JobType = "Normal";
        Executable = "b.exe";
        Arguments = "1 2 3";
        RetryCount = 3;
        Requirements = other.GlueCEInfoTotalCPUs > 2;
        Rank = other.GlueCEStateFreeCPUs;
        OutputSandbox = {
          "myoutput.txt",
          "myerror.txt"
        };
        OutputSandboxDestURI = "gsiftp://neo.datamat.it:5432/tmp";
      ];
    ];
    nodeD = [
      description = [
        JobType = "Checkpointable";
        Executable = "b.exe";
        Arguments = "1 2 3";
        RetryCount = 3;
        InputSandbox = {
          "/home/pippo",
          root.nodes.mynode.description.OutputSandbox[0]
        };
      ];
    ];
    nodeC = [
      file = "/home/test/c.jdl";
    ];
    nodeB = [
      file = "foo.jdl";
      node_retry_count = 2;
    ];
  ];
  dependencies = {
    { nodeA, nodeB },
    { nodeA, nodeC },
    { nodeA, mynode },
    { { nodeB, nodeC, mynode }, nodeD }
  };
];
|
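To see what the dependencies attribute in the jdl above implies for execution order, the following Python sketch groups the nodes into successive "waves" that could run once their parents are done. It uses graphlib from the Python standard library (3.9 or later). The helper names are hypothetical, and real WMS scheduling is more involved than this; for instance, max_nodes_running is ignored here.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Dependencies copied from the jdl above as (parents, child) pairs.
dependencies = [
    (["nodeA"], "nodeB"),
    (["nodeA"], "nodeC"),
    (["nodeA"], "mynode"),
    (["nodeB", "nodeC", "mynode"], "nodeD"),
]


def build_graph(deps):
    """Map each node to the set of nodes it must wait for."""
    graph = {}
    for parents, child in deps:
        graph.setdefault(child, set()).update(parents)
        for parent in parents:
            graph.setdefault(parent, set())
    return graph


def execution_waves(graph):
    """Group nodes into waves whose parents have all completed."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


waves = execution_waves(build_graph(dependencies))
print(waves)
```

For this jdl the sketch yields three waves: nodeA alone, then mynode, nodeB and nodeC together, then nodeD.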
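The new attributes relevant to DAG are max_nodes_running, nodes (where each node has either a description or a file entry and, optionally, node_retry_count) and dependencies.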
That's it. The Grid is ready for your jobs, be they simple or complex. Enjoy!