===========================================
ESM Runscripts - Using the Workflow Manager
===========================================

Introduction
------------

Starting with Release 6.0, ``esm_runscripts`` allows you to define additional :term:`jobs`, e.g. for data processing or coupling. Such jobs can be arranged into job clusters, and the order of execution can be set in a flexible and concise way from the runscript. This is applicable to both pre- and postprocessing, but is especially useful for iterative coupling jobs, e.g. coupling PISM to VILMA (see below). In this section we explain the basic concept, describe the keywords that have to be set in the runscript in order to make use of this feature, and give some examples of how to integrate pre- and postprocessing jobs and how to set up jobs for iterative coupling.

Default jobs of a general model simulation run
----------------------------------------------

The task of ``esm_runscripts`` is split into different subjobs, which are::

    newrun --> prepcompute --> compute --> observe_compute --> tidy (+ resubmit next run)

These standard jobs are all separate and independent. Each one is submitted (or started) by the previous job in one of three ways (see below). Here is what each of the standard jobs does:

.. |warn| replace:: ⚠️ **Warning:** It needs to be the first job of any :term:`experiment`.

================= ============================================================ ==========================================================
Job               Description                                                  Started by
================= ============================================================ ==========================================================
newrun            Initializes a new experiment. Does only very basic setup,
                  such as creating the (empty) folders needed by any of the
                  following subjobs/jobs. |warn|
prepcompute       Prepares the compute job. Covers all the (Python)            newrun
                  functionality that needs to be run, up to the job
                  submission. Includes copying files, editing namelists,
                  writing batch scripts, etc.
compute           Actual model integration, nothing else. No Python code is    prepcompute via ``sbatch`` or another batch
                  involved.                                                    system command
observe_compute   Python job running at the same time as compute. Checks       ``sbatch``, started by its own ``esm_runscripts`` call in
                  whether the compute job is still running and looks for       the ``.run`` script, after the ``compute`` job has been
                  known errors, for monitoring and job termination.            submitted with ``srun`` or another batch launcher
tidy              Sorts the produced outputs, restarts and log files into      observe_compute
                  the correct folders, checks for missing and unknown files,
                  and builds coupler restart files if not present.
================= ============================================================ ==========================================================

.. note::
   None of this has to be edited by the user. The workflow jobs described above form the default set of jobs needed to run any simulation. Changing any one of these jobs may cause `ESM-Tools` to fail. However, additional jobs can be added to extend the default workflow, as described below.

Inspect workflow jobs
---------------------

To inspect the workflow and the workflow jobs that are defined, e.g. by a chosen setup or in an already run simulation/experiment, you can run ``esm_runscripts`` with the ``-i`` (``--inspect``) option. This can be done in two different cases:

- To inspect the workflow before running a certain experiment. For example, if you want to add a new workflow job and need to know which jobs are already defined in a chosen setup or model configuration::

    esm_runscripts runscript.yaml -i workflow

- To inspect the workflow of an experiment that has already been carried out, or that was created during a check run (``-c``)::

    esm_runscripts runscript.yaml -e -i workflow

It will display the workflow configuration, showing the order of the workflow jobs, their attributes, and possible dependencies.
This output should help you find the correct keywords to set when integrating a new workflow job.

**Example output**::

    Workflow sequence (cluster [jobs])
    ----------------------------------
    prepcompute ['prepcompute'] -> compute ['compute'] -> tidy ['tidy'] -> prepcompute ['prepcompute'] and my_own_new_cluster ['my_new_last_job', 'my_second_new_job']

.. _def_workflow_jobs:

Defining additional workflow jobs
---------------------------------

If it is necessary to complement the default workflow with simulation-specific processing steps, the sequence of default workflow jobs can be extended by adapting the runscript or any component-specific configuration file. The workflow manager will evaluate these additional jobs and integrate them into the default sequence of the workflow. In order to integrate the additional jobs correctly, the following information about each job needs to be given in one of the YAML files:

* Name of the script to be run
* Name of the Python script used for setting up the environment
* Path to the directory in which both of the above scripts can be found
* Information on how often the job should be called
* Information on where in the workflow the new job needs to be inserted
* Which job should resubmit the next run, in case this is not clear

In general, a workflow can be defined in the runscript or in any component configuration file. However, some restrictions on the definition need to be taken into account:

* The name of each job needs to be unique. Otherwise, an exception will be raised.
* The names of the default jobs must not be used for any new jobs. This will also raise an exception at runtime.
* Settings in the runscript will overwrite settings in other config files. (See also :ref:`yaml_hierarchy:Hierarchy of YAML configuration files`.)

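As a rough illustration of the information listed above, a new job definition in a runscript might look like the following sketch. It uses only keywords described in the next section. The job name ``my_postprocessing``, the script name, and the script path are hypothetical placeholders, and placing the ``workflow`` chapter at the top level of the file is an assumption. The keywords for the environment script, the call frequency, and the resubmitting job are left out of this sketch::

    # Hypothetical runscript excerpt, not taken from a real setup.
    workflow:                                # assumed to sit at the top level
        subjobs:
            my_postprocessing:               # hypothetical name; unique, not a default job name
                run_after: tidy              # insert the new job after the default tidy job
                script: my_postprocessing.sh     # hypothetical script name
                script_dir: /path/to/scripts     # hypothetical directory containing the script
                submit_to_batch_system: false    # run as a shell script, not via the batch system
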

Keywords to define a new workflow job
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

To provide the information about a new job, the following keywords and mappings (key/value pairs) are available (keywords that are indicated with ``< >`` need to be adapted by the user):

================================== ========= ========================== ==========================================================
Keyword                            Mandatory (Default) values           Function
================================== ========= ========================== ==========================================================
``workflow``                       yes       --                         Chapter headline in a runscript or configuration section,
                                                                        indicating that alterations to the standard workflow are
                                                                        defined here.
``subjobs``                        yes       user defined string        Section within the ``workflow`` chapter that contains the
                                                                        additional workflow jobs.
``<job name>``                     yes       user defined string        Section within the ``subjobs`` section for each new job.
                                                                        The name of the new job needs to be unique. See also the
                                                                        explanation in :ref:`def_workflow_jobs`.
``run_after:`` or ``run_before:``  no        default: last job in       Key/value entry in each ``job`` section. Defines the
                                             (default) workflow         (default or user) job of the workflow after or before
                                             (e.g. tidy)                which the new job should be executed. Only one of the two
                                                                        should be specified.
``submit_to_batch_system:``        no        **false**, true            Key/value entry in each ``job`` section. Defines whether
                                                                        the (default or user) job is submitted to the batch system.
``run_on_queue:``                  no        None                       Key/value entry in each ``job`` section. Defines the name
                                                                        of the queue the job should be submitted to.
``batch_or_shell:``                no        **shell**, batch           Key/value entry in each ``job`` section. Defines whether
                                                                        the (default or user) job is submitted as a batch job or
                                                                        as a shell script. This attribute is overwritten
                                                                        depending on ``submit_to_batch_system``.
``cluster:``                       no        Job name                   Key/value entry in each ``job`` section. Jobs that have
                                                                        the same entry in ``cluster`` are run from the same batch
                                                                        script.
``order_in_cluster:``              no        **sequential**, concurrent Key/value entry in each ``job`` section. Defines how jobs
                                                                        in the same cluster are run: sequentially or concurrently.
``script:``                        yes       None                       Key/value entry in each ``job`` section. Defines the name
                                                                        of the script that is executed during the new workflow job.
``script_dir:``                    yes       None                       Key/value entry in each ``job`` section. Defines the path
                                                                        to the script set by the variable ``