Benchmarking & Scaling Tutorial/Introduction
Tutorial
Title: Benchmarking & Scaling
Provider: HPC.NRW
Contact: tutorials@hpc.nrw
Type: Online
Topic Area: Performance Analysis
License: CC-BY-SA

Syllabus
1. Introduction & Theory
2. Interactive Manual Benchmarking
3. Automated Benchmarking using a Job Script
4. Automated Benchmarking using JUBE
5. Plotting & Interpreting Results
Scalability
Users who start running applications on an HPC system often assume that the more resources (compute nodes / cores) they use, the faster their code will run, i.e. they expect a linear behaviour. Unfortunately, this is not the case for the majority of applications. How fast a program runs with different amounts of resources is referred to as its scalability. For parallel programs with a fixed problem size, the limiting factor is described by Amdahl's law: it takes into account that only a certain amount of the work in your code is done in parallel, so the speedup is ultimately limited by the sequential part of the program. However, besides varying the amount of resources, we can also vary the problem size we want to solve; this behaviour is described by Gustafson's law. In this introduction we want to give you a brief overview of the two models and their consequences for running code on a cluster.
Strong Scaling
Amdahl's Law
We assume that the total execution time <math>T</math> of a program is comprised of
- <math>t_s</math>, a part of the code which can only run in serial
- <math>t_p</math>, a part of the code which can be parallelized
- <math>t_o</math>, parallel overheads due to, e.g., communication
The execution time of a serial code would then be
<math>T(1) = t_s + t_p</math>
The time for a parallel code, where the work would be perfectly divided among <math>P</math> processors, would be given by
<math>T(P) = t_s + \frac{t_p}{P} + t_o(P)</math>
The term <math>t_p/P</math> reflects the reduction in runtime of the parallel part due to the usage of multiple CPUs. The total speedup <math>S</math> is defined as the ratio of the sequential to the parallel runtime:
<math>S = \frac{T(1)}{T(P)}</math>
The efficiency <math>E</math> is the speedup per processor, i.e.
<math>E = \frac{S}{P} = \frac{T(1)}{P \cdot T(P)}</math>
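As a quick illustration (a sketch added to this text, using made-up runtimes rather than real measurements), both quantities can be computed directly from measured wall-clock times:

<syntaxhighlight lang="python">
# Minimal sketch: speedup S = T(1)/T(P) and efficiency E = S/P,
# computed from hypothetical wall-clock times in seconds.

def speedup(t_serial, t_parallel):
    return t_serial / t_parallel

def efficiency(t_serial, t_parallel, procs):
    return speedup(t_serial, t_parallel) / procs

T1 = 120.0   # assumed runtime on 1 processor
TP = 18.0    # assumed runtime on P processors
P = 8

print(f"S = {speedup(T1, TP):.2f}")        # S = 6.67
print(f"E = {efficiency(T1, TP, P):.2f}")  # E = 0.83
</syntaxhighlight>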
Speedup and Efficiency
Knowing that <math>t_o > 0</math>, and writing <math>F = \frac{t_s}{t_s + t_p}</math> as the fraction of serial code, we can rewrite this to
<math>S \leq \frac{1}{F+(1-F)/P}</math>
<math>E \leq \frac{1}{PF+(1-F)}</math>
This places an upper limit on the strong scalability, i.e. how quickly we can solve a problem of fixed size <math>N</math> by increasing <math>P</math>. It is known as Amdahl's Law.
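To get a feeling for this bound, the following short sketch (an illustration added here, not taken from the tutorial itself) evaluates Amdahl's limit for a few serial fractions; note how the speedup saturates at <math>1/F</math> regardless of how many processors are used:

<syntaxhighlight lang="python">
# Amdahl's law: upper bound on the speedup for a fixed problem size.

def amdahl_speedup(P, F):
    """F = serial fraction of the code, P = number of processors."""
    return 1.0 / (F + (1.0 - F) / P)

for F in (0.1, 0.01, 0.001):
    bounds = [amdahl_speedup(P, F) for P in (10, 100, 1000, 10000)]
    print(f"F = {F}: " + ", ".join(f"{s:.1f}" for s in bounds)
          + f"  (limit for large P: {1.0 / F:.0f})")
</syntaxhighlight>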
Consequences
The figure below shows the upper limits for speedup and efficiency (as derived above) for different fractions of serial code:
We can see that the speedup is never linear in <math>P</math>, and therefore the efficiency never reaches 100%:
- Some examples (reproduced by the short sketch after this list):
- For P=2 processors, to achieve E=0.9, you have to parallelize 89% of the code
- For P=10 processors, to achieve E=0.9, you have to parallelize 99% of the code
- For P=10 processors, to achieve E=0.5, you have to parallelize 89% of the code
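The following sketch (added for illustration; the formula is just the efficiency bound above solved for <math>F</math>) reproduces these numbers:

<syntaxhighlight lang="python">
# Solve E <= 1/(P*F + (1-F)) for F: the serial fraction may be at most
# (1/E - 1) / (P - 1), so at least 1 - F_max of the code must run in parallel.

def min_parallel_fraction(P, E):
    F_max = (1.0 / E - 1.0) / (P - 1)
    return 1.0 - F_max

for P, E in [(2, 0.9), (10, 0.9), (10, 0.5)]:
    print(f"P = {P:2d}, target E = {E}: "
          f"parallelize at least {min_parallel_fraction(P, E):.0%} of the code")

# P =  2, target E = 0.9: parallelize at least 89% of the code
# P = 10, target E = 0.9: parallelize at least 99% of the code
# P = 10, target E = 0.5: parallelize at least 89% of the code
</syntaxhighlight>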
In the second figure, terms for parallel overhead (e.g. communication between nodes) have been added. This yields an even worse, but more realistic, behavior when increasing the number of processors:
Weak Scaling
Fortunately for us, the amount of resources required usually scales well with the problem size. Therefore, instead of throwing large amounts of resources at small problem sizes, we can try to increase our problem size as we increase the amount of resources. This is referred to as weak scaling.
Gustafson's Law
Gustafson's law states that the parallel part of a program scales linearly with the amount of resources, while at the same time the serial part does not increase with increasing problem size. The speedup is then given by
<math>S = F + (1-F) \cdot P</math>
where <math>P</math> is the number of processes, <math>F</math> is again the fraction of the serial part of the code and therefore <math>(1-F)</math> the fraction of the parallel part. Here the speedup increases linearly with respect to the number of processors (with a slope smaller than one).
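As a rough comparison (a sketch added to this page, using an arbitrarily chosen serial fraction), the scaled speedup of Gustafson's law can be contrasted with Amdahl's bound for the same <math>F</math>:

<syntaxhighlight lang="python">
# Gustafson's law (problem size grows with P) versus Amdahl's law
# (fixed problem size) for the same serial fraction F.

def gustafson_speedup(P, F):
    return F + (1.0 - F) * P

def amdahl_speedup(P, F):
    return 1.0 / (F + (1.0 - F) / P)

F = 0.05  # assumed serial fraction
for P in (10, 100, 1000):
    print(f"P = {P:4d}: Gustafson S = {gustafson_speedup(P, F):7.1f}, "
          f"Amdahl S = {amdahl_speedup(P, F):5.1f}")

# Gustafson grows linearly with P, while Amdahl saturates near 1/F = 20.
</syntaxhighlight>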
Speedup and Efficiency
Consequences
Knowing about speedup and efficiency, we can now try to measure this ourselves.
Next: Interactive Manual Benchmarking
Previous: Overview