This article was presented at the International Conference on Supercomputing and appears in the Proceedings of the 19th annual international conference on Supercomputing. It provides an in-depth analysis of a technique for monitoring hardware performance counters in real time, which in turn supports further analysis of bottlenecks in the microarchitecture and in the software that meets the hardware at a high-level abstraction layer. Such real-time analysis can help optimize existing software systems and lead to more efficient platforms, including for applications in parallel computing. I found the article interesting because the technique can improve hardware literacy not only within the hardware engineering community but also among software developers, who can use it to study the performance of their code under real-world conditions.