Course Program

Title: Performance Optimization through Intel Compilers

N | Topic | Duration | Comments

1 | Intel® processor microarchitectures | 1 lecture + 1 practical work

Objectives: After completing this module, you will be able to describe:
∙ The components of an IA processor and the main factors that affect processor performance
∙ Cache utilization
∙ Instruction-level parallelism
∙ The overhead of branches
∙ Vectorization and parallelization
Agenda:
∙ Introduction
∙ Notable features
∙ Micro-architecture drill-down
∙ Advanced cache technology, data prefetching
∙ Pipelining and superscalar architecture
∙ Branch prediction
∙ Vectorization
∙ Parallelization
∙ Introduction to the Intel Optimizing Compiler

Practical work:
Launch the Intel® Compiler from the command line.
A short description of the main language constructs and tools that will be used for experiments during this course.
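
As an illustration of the cache utilization objective in this module, the following is a minimal sketch in C, not course material. The function names `sum_row_order` and `sum_column_order` are hypothetical. Both functions compute the same sum over a row-major matrix; only the memory access pattern differs.

```c
#include <assert.h>
#include <stddef.h>

/* Sum a rows x cols matrix stored in row-major order, walking along rows.
   Consecutive iterations touch adjacent memory, so every element of a
   fetched cache line is used before moving on. */
double sum_row_order(const double *m, size_t rows, size_t cols)
{
    assert(rows == 0 || cols == 0 || m != NULL);
    double sum = 0.0;
    for (size_t i = 0; i < rows; ++i)
        for (size_t j = 0; j < cols; ++j)
            sum += m[i * cols + j];
    return sum;
}

/* Same sum, walking along columns. Consecutive iterations are cols
   elements apart, so on a large matrix most accesses land on a
   different cache line. */
double sum_column_order(const double *m, size_t rows, size_t cols)
{
    assert(rows == 0 || cols == 0 || m != NULL);
    double sum = 0.0;
    for (size_t j = 0; j < cols; ++j)
        for (size_t i = 0; i < rows; ++i)
            sum += m[i * cols + j];
    return sum;
}
```

On matrices much larger than the cache, the column-order version would typically be expected to run slower, although an optimizing compiler may interchange the loops on its own, so any difference should be measured rather than assumed.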

2 | Tools Foundation I: Using the VTune™ Performance Analyzer | 1 lecture + 1 practical work

At the completion of this module, you will be able to:
∙ Understand the intended purpose and usage models supported by the VTune™ Performance Analyzer.
∙ Identify hotspots by drilling down through various sample views.
∙ Understand how sampling works.
∙ Use VTune to find hotspots and the key reasons why a hotspot carries a large performance weight.
∙ Examine command-line functionality.
Agenda:
∙ What is the VTune™ Performance Analyzer?
- Performance tuning concepts
- Using the sampling collector
- How sampling works
- Sampling over time
- Processor events corresponding to the key factors that affect processor performance

Practical work:
∙ Several tests with hotspots of different kinds (poor cache utilization, mispredicted branches, poor instruction-level parallelism)
∙ Locate the hotspots and identify the main reason for the slowdown
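
A test with a branch-related hotspot, as mentioned in the practical work above, could look like the following sketch in C. It is an assumption about what such a test might contain, and the function names are hypothetical. The first version has a data-dependent branch in the loop body; the second computes the same result with arithmetic on the comparison result.

```c
#include <assert.h>
#include <stddef.h>

/* Sum all elements that are >= threshold, using a data-dependent branch.
   On unsorted, random-looking input the branch outcome is hard to predict. */
long sum_above_branchy(const int *data, size_t n, int threshold)
{
    assert(n == 0 || data != NULL);
    long sum = 0;
    for (size_t i = 0; i < n; ++i) {
        if (data[i] >= threshold)
            sum += data[i];
    }
    return sum;
}

/* Same result without an explicit branch in the source: the comparison
   yields 0 or 1, which is used as a multiplier. */
long sum_above_branchless(const int *data, size_t n, int threshold)
{
    assert(n == 0 || data != NULL);
    long sum = 0;
    for (size_t i = 0; i < n; ++i)
        sum += (long)(data[i] >= threshold) * (long)data[i];
    return sum;
}
```

Whether the two versions compile to different machine code depends on the compiler and its switches, so a profiler's branch misprediction events are the way to confirm what actually happens.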

3 | Tools Foundation II: Using Intel® Compilers + VTune™ Performance Analyzer | 5 lectures + 4 practical works

At the successful completion of this module, you will be able to:
∙ Describe some basic optimizations and the main idea of why they improve performance
∙ Optimize software for the architecture
∙ Use some compiler optimization switches
∙ Enhance performance with vectorization, OpenMP, autoparallelization, and other techniques

Agenda:
∙ Introduction
∙ Compiler architecture
∙ Basic CFG optimizations
∙ Loop optimization (distribution/fusion, unrolling, loop interchange)
∙ Vectorization
∙ OpenMP
∙ Autoparallelization
∙ Software prefetching

Practical work:
∙ Several tests for applying optimizations in source code
∙ Apply an optimization and use VTune measurements to show the key factors behind the performance improvement
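
The vectorization and OpenMP topics of this module can be illustrated with a short sketch in C. The functions `saxpy` and `dot` are hypothetical examples, not taken from the course. `restrict` tells the compiler that the arrays do not overlap, which removes one common obstacle to vectorization. The OpenMP pragma is guarded so that the code also builds and runs serially when OpenMP is not enabled.

```c
#include <assert.h>
#include <stddef.h>

/* y[i] = a * x[i] + y[i].
   restrict promises that x and y do not alias, so the compiler is free
   to process several elements per iteration with vector instructions. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    assert(n == 0 || (x != NULL && y != NULL));
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

/* Dot product. With OpenMP enabled, iterations are divided among threads
   and the per-thread partial sums are combined by the reduction clause.
   Without OpenMP the pragma is skipped and the loop runs serially. */
double dot(size_t n, const double *x, const double *y)
{
    assert(n == 0 || (x != NULL && y != NULL));
    double sum = 0.0;
    long count = (long)n; /* signed loop index, accepted by all OpenMP versions */
#ifdef _OPENMP
#pragma omp parallel for reduction(+:sum)
#endif
    for (long i = 0; i < count; ++i)
        sum += x[i] * y[i];
    return sum;
}
```

A parallel floating-point reduction adds the terms in a different order than the serial loop, so results can differ in the last bits on general input. OpenMP has to be switched on with the compiler's OpenMP option for the pragma to take effect.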

4 | Tools Foundation III: Using Intel® Compilers; Intel® processor microarchitectures | 4 lectures + 4 practical works

At the successful completion of this module, you will be able to:
∙ Use key compiler optimization switches
∙ Optimize software for the architecture
∙ Enhance performance using interprocedural optimizations, the dynamic profiler, and other techniques

Agenda:
∙ Introduction
∙ Compiler architecture (two-pass compilation)
∙ Basic interprocedural optimizations and their purpose
∙ Interprocedural optimizations (inlining, data flow analysis overview)
∙ Some restructuring optimizations and their purpose
∙ Usage of the dynamic profiler (profile-guided optimizations)
∙ Compiler switches
∙ A step-by-step approach to improving performance with different compiler switches

Practical work:
∙ Several tests for experimenting with different compiler switches and source code modifications
∙ Check that stronger optimization switches demonstrate a positive effect
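
To make the inlining topic of this module concrete, here is a minimal sketch in C of a typical inlining candidate. The names `clamp_to_byte` and `sum_clamped` are hypothetical. The helper is so small that the cost of calling it is comparable to the cost of its body, and it is called once per element in a hot loop.

```c
#include <assert.h>
#include <stddef.h>

/* Tiny helper: limits a value to the range 0..255.
   static inline is a hint; the compiler decides whether to inline it. */
static inline int clamp_to_byte(int v)
{
    if (v < 0)
        return 0;
    if (v > 255)
        return 255;
    return v;
}

/* Hot loop that calls the helper for every element. Once the call is
   inlined, the loop body is plain comparisons and additions, which also
   opens the door to further loop optimizations. */
long sum_clamped(const int *data, size_t n)
{
    assert(n == 0 || data != NULL);
    long sum = 0;
    for (size_t i = 0; i < n; ++i)
        sum += clamp_to_byte(data[i]);
    return sum;
}
```

Here the helper and its caller live in the same source file. When they are in different files, the compiler generally needs interprocedural optimization across files to inline the call, which is where the compiler switches covered in this module come in.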

5 | Course Summary | 1 lecture + 2 practical works

At the successful completion of this module, you will be able to:
∙ Describe an approximate approach to improving application performance

Agenda:
Summarize all discussed topics into an approximate algorithm for improving application performance.

Practical work:
∙ Performance dungeon
Analyze a benchmark and suggest source code optimizations that can improve application performance.