Automated Benchmarking using JUBE


Tutorial
Title: Benchmarking & Scaling
Provider: HPC.NRW

Contact: tutorials@hpc.nrw
Type: Online
Topic Area: Performance Analysis
License: CC-BY-SA
Syllabus

1. Introduction & Theory
2. Interactive Manual Benchmarking
3. Automated Benchmarking using a Job Script
4. Automated Benchmarking using JUBE
5. Plotting & Interpreting Results

Introduction

The Jülich Benchmarking Environment (JUBE) is an application that helps you automate your workflow for system and application benchmarking.

JUBE allows you to define different steps of your workflow with dependencies between them.
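For example, a run step can be made to depend on a compile step via the depend attribute. The following is a minimal sketch with made-up step contents; step, do, and the depend attribute are standard JUBE syntax:

<step name="compile">
  <do>make</do>                     <!-- build the application -->
</step>
<step name="run" depend="compile">
  <do>./my_app</do>                 <!-- runs only after "compile" has finished -->
</step>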

One key advantage of using JUBE over manually running an application in different configurations from a job script is that the individual run configurations are automatically separated into distinct workpackages with their own run directories, while common files and directories (input files, preprocessing scripts, etc.) can easily be integrated into the workflow.

Furthermore, application output (such as the runtime of the application) can easily be parsed and presented in CSV or human-readable table format.

Writing a minimal configuration

As JUBE executes each workpackage (a step with one concrete configuration) in its own sandbox, the benchmark configuration must specify a fileset that either copies or links the required files into the run directory. Each parameter has a defined separator (default: ',') that is used to tokenize the parameter string, and each token yields a separate configuration. In this example, the comma-separated list of task counts results in the parameter tasks taking one specific value in each of 15 different workpackages.
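For illustration, the separator can be overridden per parameter, and a fileset can copy files instead of linking them (a sketch with made-up file and parameter names; separator, copy, and link are standard JUBE syntax):

<parameterset name="example_pset">
  <parameter name="solver" separator=";">cg;gmres;bicgstab</parameter> <!-- ';' instead of the default ',' -->
</parameterset>
<fileset name="example_files">
  <copy>params.in</copy>     <!-- copy: each workpackage gets its own writable copy -->
  <link>big_input.dat</link> <!-- link: share one read-only file across workpackages -->
</fileset>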

<?xml version="1.0" encoding="UTF-8"?>
<jube>
  <benchmark name="GROMACS" outpath="jube_run">
    <comment>A minimal JUBE config to run our GROMACS example</comment>

    <!-- Configuration -->
    <parameterset name="execute_pset">
      <parameter name="tasks">1,2,4,8,12,18,24,30,36,42,48,54,60,66,72</parameter>
      <!-- you could also compute the list using Python
         <parameter name="tasks" mode="python">
           ",".join([str(x) for x in range(1,73) if (x % 6 == 0 and x > 10) or (x % 4 == 0 and x < 10) or x == 2 or x == 1])
         </parameter>
      -->
    </parameterset>

    <!-- Input files -->
    <fileset name="gromacs_files">
      <link>MD_5NM_WATER.tpr</link> <!-- link input file -->
    </fileset>

    <!-- Operation -->
    <step name="run">
      <use>execute_pset</use>  <!-- use parameterset -->
      <use>gromacs_files</use> <!-- use fileset -->
      <do>srun -n $tasks gmx_mpi -quiet mdrun -deffnm MD_5NM_WATER -nsteps 10000 -ntomp 1 -pin on</do> <!-- start GROMACS -->
    </step>
  </benchmark>
</jube>

Running the workflow

$ jube run gromacs.xml
######################################################################
# benchmark: GROMACS
# id: 1
#
# A minimal JUBE config to run our GROMACS example
######################################################################

Running workpackages (#=done, 0=wait, E=error):
########################################################EEEE ( 14/ 15)

  | stepname | all | open | wait | error | done |
  |----------|-----|------|------|-------|------|
  |      run |  15 |    0 |    0 |     1 |   14 |

>>>> Benchmark information and further useful commands:
>>>>       id: 1
>>>>   handle: jube_run
>>>>      dir: jube_run/000001
>>>> continue: jube continue jube_run --id 1
>>>>  analyse: jube analyse jube_run --id 1
>>>>   result: jube result jube_run --id 1
>>>>     info: jube info jube_run --id 1
>>>>      log: jube log jube_run --id 1
######################################################################

From this output we can see the following: 15 distinct workpackages were processed in the "run" step; 14 of them completed successfully and one failed. This configuration already creates a separate directory for each measurement, which ensures that temporary files written by the application do not interfere across the different measurements. It also makes it easier to identify which execution failed.
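The layout of the benchmark run directory then looks roughly as follows (a sketch inferred from the paths used below; one numbered subdirectory per workpackage, plus JUBE's bookkeeping files, whose names may differ by JUBE version):

$ ls jube_run/000001
000000_run  000001_run  000002_run  ...  000013_run  000014_run  configuration.xml  workpackages.xml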

$ find jube_run/000001 -name error
jube_run/000001/000013_run/error

Looking at the output of that workpackage, we can see that GROMACS aborted the execution because the number of processes selected for this run did not fit the input. The error file, however, contains only the last few lines of the error output, so investigating the stderr file in the workpackage's work directory reveals the full error message.

$ less jube_run/000001/000013_run/work/stderr
...
Fatal error:
There is no domain decomposition for 50 ranks that is compatible with the
given box and a minimum cell size of 0.31625 nm
Change the number of ranks or mdrun option -rdd or -dds
Look in the log file for details on the domain decomposition
...

Parsing output

GROMACS outputs performance numbers to stderr.

Using 72 MPI processes
Using 1 OpenMP thread per MPI process
...
               Core t (s)   Wall t (s)        (%)
       Time:      116.555        1.619     7199.0
                 (ns/day)    (hour/ns)
Performance:     1067.403        0.022
...

As extracting such values from application output is a core part of benchmarking, JUBE provides infrastructure to parse the output, store specific values, and present them later in result tables.

<patternset name="gromacs_output_patterns">
    <pattern name="gromacs_num_procs">Using ${jube_pat_int} MPI proc.*</pattern>
    <pattern name="gromacs_num_threads">Using ${jube_pat_int} OpenMP thread.*</pattern>
    <pattern name="gromacs_core_time" unit="s">Time:\s*${jube_pat_fp}</pattern>
    <pattern name="gromacs_wall_time" unit="s">Time:\s*${jube_pat_nfp}\s*${jube_pat_fp}</pattern>
    <pattern name="gromacs_core_perf" unit="ns/day">Performance:\s*${jube_pat_fp}</pattern>
    <pattern name="gromacs_wall_perf" unit="hours/ns">Performance:\s*${jube_pat_nfp}\s*${jube_pat_fp}</pattern>
</patternset>

The pattern matching is done line-based with regular expressions. JUBE provides predefined variables, such as ${jube_pat_int} and ${jube_pat_fp}, that contain the regular-expression patterns to match an integer or a floating-point number, respectively (${jube_pat_nfp} is the non-capturing variant of ${jube_pat_fp}, used here to skip over the first number in a line). The defined patterns can then be used in a so-called analyser, which connects the patterns to the files they are applied to.

<analyser name="gromacs_analyser">
    <analyse step="run">
        <file use="gromacs_output_patterns">stderr</file>
    </analyse>
</analyser>
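An analyse block may also list several files; for instance, additional patterns could be applied to the GROMACS log file as well (a hypothetical sketch; the patternset gromacs_log_patterns is made up, and MD_5NM_WATER.log is the log file mdrun writes for -deffnm MD_5NM_WATER):

<analyser name="gromacs_analyser">
    <analyse step="run">
        <file use="gromacs_output_patterns">stderr</file>
        <file use="gromacs_log_patterns">MD_5NM_WATER.log</file> <!-- hypothetical second patternset -->
    </analyse>
</analyser>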

Finally, result tables can be defined with columns referencing any defined parameter or pattern.

<result>
    <use>gromacs_analyser</use>
    <table name="gromacs_run" style="pretty">
        <column title="wp">jube_wp_id</column>
        <column>tasks</column>
        <column>gromacs_core_time</column>
        <column>gromacs_wall_time</column>
        <column>gromacs_core_perf</column>
        <column>gromacs_wall_perf</column>
    </table>
</result>

Especially in the initial specification phase of a workflow, it may not be clear which patterns need to be applied to which files. JUBE therefore allows you to update the patterns, rerun the analysis, and reprint the result tables even after the workflow steps have finished: the -u <jube-specification> command-line flag updates the stored definition. Adding the definitions above to the workflow specification gromacs.xml and rerunning the analysis and report generation yields the following output.

$ jube result -a -u gromacs.xml jube_run --id 1
gromacs_run:
| wp | tasks | gromacs_core_time[s] | gromacs_wall_time[s] | gromacs_core_perf[ns/day] | gromacs_wall_perf[hours/ns] |
|----|-------|----------------------|----------------------|---------------------------|-----------------------------|
|  0 |     1 |               44.366 |               44.366 |                    38.952 |                       0.616 |
|  1 |     2 |               46.942 |               23.471 |                    73.630 |                       0.326 |
|  2 |     4 |               49.548 |               12.387 |                   139.513 |                       0.172 |
|  3 |     8 |               52.969 |                6.621 |                   261.002 |                       0.092 |
|  4 |    12 |               59.370 |                4.948 |                   349.291 |                       0.069 |
|  5 |    18 |               66.097 |                3.672 |                   470.607 |                       0.051 |
|  6 |    24 |               76.391 |                3.183 |                   542.905 |                       0.044 |
|  7 |    30 |               89.233 |                2.975 |                   580.965 |                       0.041 |
|  8 |    36 |               91.187 |                2.533 |                   682.175 |                       0.035 |
|  9 |    42 |               99.743 |                2.375 |                   727.597 |                       0.033 |
| 10 |    48 |              183.114 |                3.815 |                   452.978 |                       0.053 |
| 11 |    54 |              121.728 |                2.255 |                   766.524 |                       0.031 |
| 12 |    60 |              199.882 |                3.332 |                   518.714 |                       0.046 |
| 13 |    66 |                      |                      |                           |                             |
| 14 |    72 |              116.555 |                1.619 |                  1067.403 |                       0.022 |

Finally, we have used the pretty style of result output. By default, JUBE outputs a table as CSV, which is easy to plot. To use this, either explicitly select the style csv or omit the style attribute in the table definition entirely.
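For example, a second table could emit the key columns in CSV format (a sketch; the table name is made up and only the style attribute differs from the definition above), and the result output can then be redirected into a file for the plotting step:

<table name="gromacs_run_csv" style="csv">
    <column title="wp">jube_wp_id</column>
    <column>tasks</column>
    <column>gromacs_wall_time</column>
    <column>gromacs_wall_perf</column>
</table>

$ jube result jube_run --id 1 > gromacs_results.csv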

Next: Plotting and Interpreting Results

Previous: Automated Benchmarking using a Job Script