User contributions
- 15:20, 20 November 2024 diff hist +127 m Performance Monitoring current
- 15:18, 20 November 2024 diff hist +3,917 N Background Performance Monitoring Considerations Created page with "Category:HPC-Admin == Components == For background job Performance Monitoring, in general, a central server for data collection is necessary that receives the measur..." current
- 15:13, 20 November 2024 diff hist +311 N File:Performance background monitoring.png Schematic of background performance monitoring components. Compute nodes run collector services that monitor certain performance metrics and send them to a central server. The central server aggregates and archives the measurements. Users can inspect plots of the measurements via a web interface. current
- 20:29, 15 November 2024 diff hist +621 Performance metrics Add metric levels current
- 18:21, 1 March 2024 diff hist +520 Job efficiency Add load imbalance example
- 18:15, 1 March 2024 diff hist +466 N File:Job efficiency load imbalance.png The figure shows ClusterCockpit measurements for a job that executes on multiple nodes in shared mode. Three metrics are shown: node-level CPU load, averaged node-level CPU core load and averaged node-level CPU core time. While the CPU load metric may be influenced by other jobs running on the shared nodes, the CPU core load and CPU core time metrics use core-level measurements. These metrics show a difference in CPU utilization on different nodes. current
- 14:25, 29 February 2024 diff hist +754 Job efficiency Add filesystem access example
- 14:10, 29 February 2024 diff hist +237 N File:Job efficiency filesystem access.png The figure shows ClusterCockpit measurement results from a job that performs regular filesystem operations. These lead to increased network and filesystem utilization. The plots show a regular pattern in the application behavior. current
- 19:37, 28 February 2024 diff hist +441 Job efficiency Add performance expectation details
- 19:27, 28 February 2024 diff hist +1,331 Job efficiency Add details to scaling
- 19:24, 28 February 2024 diff hist +525 N File:Job efficiency scaling performance degradation.png The image shows ClusterCockpit measurement results for an application that is executed with multiple scaling configurations. It is executed with four different configurations on 1 core, 128 cores (1 node), 256 cores (2 nodes) and 512 cores (4 nodes). Three metrics are shown for every case: Flops per core, memory bandwidth per socket and transmitted network packets per node. The plots show how, with an increased number of involved cores and nodes, the performance degrades and the communication overh... current
- 18:58, 28 February 2024 diff hist +801 Job efficiency Add details about Filesystem access
- 18:45, 28 February 2024 diff hist +576 Job efficiency Add details to Load imbalance
- 17:41, 28 February 2024 diff hist +679 Job efficiency Add details to Resource oversubscription and Resource underutilization
- 18:14, 27 February 2024 diff hist +746 Job efficiency Add oversubscription example
- 17:54, 27 February 2024 diff hist +292 N File:Job efficiency oversubscription.png ClusterCockpit plots showing CPU load on the node and core level and CPU time metrics for two shared jobs, each running on 16 cores. The top plots show measurements for a job with 2 threads per core. The bottom plots show measurements for a job with 16 threads per core. current
- 14:31, 27 February 2024 diff hist +1,853 Job efficiency Add basic measurement considerations
- 18:21, 26 February 2024 diff hist +349 Job efficiency
- 17:42, 26 February 2024 diff hist +6,687 N Performance metrics Initial summary of performance metrics
- 12:50, 26 February 2024 diff hist +827 N Performance Monitoring Initial summary of cluster-level background performance monitoring
- 12:25, 26 February 2024 diff hist +896 N Job efficiency Initial structure for job efficiency guide
- 11:27, 19 September 2023 diff hist -14 Site-specific documentation Update PC2 Documentation Link current