User contributions

15:20, 20 November 2024 diff hist +127‎ m Performance Monitoring ‎ current
15:18, 20 November 2024 diff hist +3,917‎ N Background Performance Monitoring Considerations ‎ Created page with "Category:HPC-Admin == Components == For background job Performance Monitoring, in general, a central server for data collection is necessary that receives the measur..." current
15:13, 20 November 2024 diff hist +311‎ N File:Performance background monitoring.png ‎ Schematic of background performance monitoring components. Compute nodes run collector services that monitor certain performance metrics and send them to a central server. The central server aggregates and archives the measurements. Users can inspect plots of the measurements via a web interface. current
20:29, 15 November 2024 diff hist +621‎ Performance metrics ‎ Add metric levels current
18:21, 1 March 2024 diff hist +520‎ Job efficiency ‎ Add load imbalance example
18:15, 1 March 2024 diff hist +466‎ N File:Job efficiency load imbalance.png ‎ The figure shows ClusterCockpit measurements for a job that executes on multiple nodes in shared mode. Three metrics are shown: node-level CPU load, averaged node-level CPU core load and averaged node-level CPU core time. While the CPU load metric may be influenced by other jobs running on the shared nodes, the CPU core load and CPU core time metrics use core-level measurements. These metrics show a difference in CPU utilization on different nodes. current
14:25, 29 February 2024 diff hist +754‎ Job efficiency ‎ Add filesystem access example
14:10, 29 February 2024 diff hist +237‎ N File:Job efficiency filesystem access.png ‎ The figure shows ClusterCockpit measurement results from a job that performs regular filesystem operations. These lead to increased network and filesystem utilization. The plots show a regular pattern in the application behavior. current
19:37, 28 February 2024 diff hist +441‎ Job efficiency ‎ Add performance expectation details
19:27, 28 February 2024 diff hist +1,331‎ Job efficiency ‎ Add details to scaling
19:24, 28 February 2024 diff hist +525‎ N File:Job efficiency scaling performance degradation.png ‎ The image shows ClusterCockpit measurement results for an application that is executed with four different scaling configurations: 1 core, 128 cores (1 node), 256 cores (2 nodes) and 512 cores (4 nodes). Three metrics are shown for every case: Flops per core, memory bandwidth per socket and transmitted network packets per node. The plots show how, as the number of involved cores and nodes increases, the performance degrades and the communication overh... current
18:58, 28 February 2024 diff hist +801‎ Job efficiency ‎ Add details about Filesystem access
18:45, 28 February 2024 diff hist +576‎ Job efficiency ‎ Add details to Load imbalance
17:41, 28 February 2024 diff hist +679‎ Job efficiency ‎ Add details to Resource oversubscription and Resource underutilization
18:14, 27 February 2024 diff hist +746‎ Job efficiency ‎ Add oversubscription example
17:54, 27 February 2024 diff hist +292‎ N File:Job efficiency oversubscription.png ‎ ClusterCockpit plots showing CPU load on the node and core level and CPU time metrics for two shared jobs, each running on 16 cores. The top plots show measurements for a job with 2 threads per core. The bottom plots show measurements for a job with 16 threads per core. current
14:31, 27 February 2024 diff hist +1,853‎ Job efficiency ‎ Add basic measurement considerations
18:21, 26 February 2024 diff hist +349‎ Job efficiency ‎
17:42, 26 February 2024 diff hist +6,687‎ N Performance metrics ‎ Initial summary of performance metrics
12:50, 26 February 2024 diff hist +827‎ N Performance Monitoring ‎ Initial summary of cluster-level background performance monitoring
12:25, 26 February 2024 diff hist +896‎ N Job efficiency ‎ Initial structure for job efficiency guide
11:27, 19 September 2023 diff hist -14‎ Site-specific documentation ‎ Update PC2 Documentation Link current
