3 ECTS credits
75 h study time

Offer 1 with catalog number 4022633FNR for all students in the 2nd semester at a (F) Master - specialised level.

Semester
2nd semester
Enrollment based on exam contract
Impossible
Grading method
Grading (scale from 0 to 20)
Can retake in second session
Yes
Taught in
English
Faculty
Faculteit Ingenieurswetenschappen
Department
Industriële ingenieurswetenschappen
Educational team
Jan Lemeire (course titular)
Activities and contact hours
12 contact hours Lecture
18 contact hours Seminar, Exercises or Practicals
Course Content

This is an advanced course on programming GPUs effectively and efficiently. Level by level, we will unravel the programming paradigm of modern GPUs, down to the level of warps and vector instructions. In parallel, the hardware architecture is demonstrated and analysed. After learning how to program GPUs, we will show how to identify performance bottlenecks and discuss possible remedies.

The programming paradigm and architecture of modern GPUs are unravelled level by level, each level digging deeper into the complexities of hardware and software. A sequence of examples and exercises has been set up so that the student follows an effective path towards acquiring the necessary skills of GPU computing. We will focus on the standard OpenCL language. Since Nvidia’s proprietary CUDA language is based on the same paradigm, a one-to-one mapping of the basic concepts exists. We will show which additional functions Nvidia provides on top of the standard.
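As an illustration of the one-to-one mapping between the basic OpenCL and CUDA concepts, the following minimal sketch emulates a one-dimensional kernel launch in plain Python, so no GPU is required. It is not course material: the function names opencl_global_id, cuda_global_index and vector_add are hypothetical, and a zero global offset is assumed on the OpenCL side.

```python
# Plain-Python emulation of a 1-D kernel launch (illustrative sketch only).
# OpenCL term      <->  CUDA term
# work-item        <->  thread
# work-group       <->  thread block
# local size       <->  blockDim.x
# get_group_id(0)  <->  blockIdx.x
# get_local_id(0)  <->  threadIdx.x


def opencl_global_id(group_id, local_size, local_id):
    """Value of get_global_id(0), assuming a zero global offset."""
    return group_id * local_size + local_id


def cuda_global_index(block_idx, block_dim, thread_idx):
    """The usual CUDA expression blockIdx.x * blockDim.x + threadIdx.x."""
    return block_idx * block_dim + thread_idx


def vector_add(a, b, local_size=4):
    """Emulate c[i] = a[i] + b[i], one work-item per element."""
    n = len(a)
    # Round the number of work-groups up so that every element is covered.
    num_groups = (n + local_size - 1) // local_size
    c = [0] * n
    for group_id in range(num_groups):        # work-groups / thread blocks
        for local_id in range(local_size):    # work-items / threads
            gid = opencl_global_id(group_id, local_size, local_id)
            if gid < n:                       # guard for the padded last group
                c[gid] = a[gid] + b[gid]
    return c


if __name__ == "__main__":
    print(vector_add([1, 2, 3, 4, 5], [10, 20, 30, 40, 50]))
```

On a real device the two loops disappear: the body of the inner loop becomes the kernel, and the hardware runs the work-items in parallel.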

The course consists of 5 theoretical lectures, a lab and a programming project.
All information can be found on http://parallel.vub.ac.be/education/gpu

1. The power of GPUs
2. Programming GPUs
3. The GPU architecture and strategy
4. The pipeline performance model
5. Performance limiters
Insight into peak performance will be given, as well as into the issues that make performance degrade. The relation with algorithmic and implementation aspects will be discussed, so that the student gets a first insight into which algorithms and implementations are well suited for GPUs and which are not.
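The gap between peak and actual performance can be made concrete with a small calculation. The sketch below uses a roofline-style bound, which is a common simple model and not necessarily the pipeline performance model taught in the course. The device figures (1000 GFLOP/s peak, 200 GB/s memory bandwidth) are hypothetical, as are the function names.

```python
# Roofline-style upper bound on kernel performance (illustrative sketch only;
# assumed model, hypothetical device numbers).


def attainable_gflops(peak_gflops, bandwidth_gbs, flops_per_byte):
    """Upper bound: limited by either compute or memory traffic."""
    return min(peak_gflops, bandwidth_gbs * flops_per_byte)


def fraction_of_peak(achieved_gflops, peak_gflops):
    """Share of the peak compute rate that is actually reached."""
    return achieved_gflops / peak_gflops


if __name__ == "__main__":
    peak = 1000.0       # GFLOP/s, hypothetical device
    bandwidth = 200.0   # GB/s, hypothetical device

    # Single-precision vector addition: 1 operation per element,
    # 12 bytes moved per element (two 4-byte reads, one 4-byte write).
    intensity = 1.0 / 12.0

    bound = attainable_gflops(peak, bandwidth, intensity)
    print(f"bound: {bound:.1f} GFLOP/s")
    print(f"fraction of peak: {fraction_of_peak(bound, peak):.1%}")
```

Under these assumptions, vector addition is bounded by memory bandwidth at a small fraction of the peak compute rate, which is one reason why algorithms with little computation per byte of data tend to be less well suited for GPUs.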

Each student will execute a set of related benchmarks and analyze the resulting performance. In a report, the student will demonstrate an understanding of peak and actual performance and of the reasons for the performance degradation.
After that, the student will port a sequential algorithm to OpenCL in order to reduce its run time. The programming project is carried out individually. The student will demonstrate the results and, based on the feedback of the professor and assistant, improve them into a final version.

 

Additional info

Prerequisites

  • good knowledge of computer systems (processor and network)
  • good (!) programming experience
  • knowledge of and experience in multithreading.

Additional questions can be directed to jan.lemeire@vub.be

All information can be found on http://parallel.vub.ac.be/education/gpu

Learning Outcomes

General Learning Outcomes

The student will understand how to implement algorithms on GPUs and which aspects to consider for efficient implementations. We will put GPUs into perspective, compare them with other technologies, and discuss their weaknesses and the challenges of putting the technology into practice.
The student will acquire a thorough understanding of writing GPU kernels, launching kernels, data transfer, kernel synchronization, vector operations, debugging, the available tools, and understanding and optimizing memory access. The student will be able to apply this understanding of low-level thread and hardware characteristics to devise high-performance, scalable solutions.
Through the practicals and the project, the student will have demonstrated the ability to make good judgments about complex situations and to communicate the conclusions. Specific or complex parallel solutions are possible, but these are difficult to maintain and less generic. Only simple, clever solutions are feasible. The student will be able to participate in discussions about exploiting parallelism and the proper use of modern GPU technology.

Grading

The final grade is composed of the following categories:
Other Exam determines 100% of the final mark.

Within the Other Exam category, the following assignments need to be completed:

  • Lab Report with a relative weight of 25 which comprises 25% of the final mark.
  • Programming Project with a relative weight of 75 which comprises 75% of the final mark.

Additional info regarding evaluation

The course is evaluated by a lab report (25%) and a programming project (75%).

Allowed unsatisfactory mark
The supplementary Teaching and Examination Regulations of your faculty stipulate whether an allowed unsatisfactory mark for this programme unit is permitted.

Academic context

This offer is part of the following study plans:
Master of Applied Sciences and Engineering: Applied Computer Science: Standard track
Master of Applied Sciences and Engineering: Computer Science: Artificial Intelligence
Master of Applied Sciences and Engineering: Computer Science: Multimedia
Master of Applied Sciences and Engineering: Computer Science: Software Languages and Software Engineering
Master of Applied Sciences and Engineering: Computer Science: Data Management and Analytics