4 ECTS credits
100 h study time
Offer 1 with catalog number 4016449EER for all students in the 1st semester at a (E) Master - advanced level.
Distributed computing refers to the collaboration of many networked processing units whose combined processing capacity is applied to a single large problem. Today, many systems and applications are distributed for a variety of reasons: fault tolerance, processing performance, security, and the geographical spread of the data or of the problem requirements.
This course examines the internals of distributed computing and storage architectures, with particular emphasis on the algorithms and techniques that underlie today’s distributed computing systems. Topics addressed in this course include: (1) an introduction to the cloud, including the components and features of today’s cloud; (2) an overview of MapReduce, including the MapReduce and Hadoop environment and fundamental operations in MapReduce (such as matrix-vector and matrix-matrix multiplication); (3) the Gossip protocol and its analysis; (4) the consensus problem in distributed system models and the Paxos algorithm; (5) the problem of fault detection and solutions such as membership and the optimal failure detector; (6) an overview of peer-to-peer systems (including Napster, Gnutella, FastTrack, BitTorrent, and Chord); (7) a description of key-value stores (including well-known solutions such as Cassandra and HBase); (8) the problem of time and synchronization in distributed systems (including Lamport and vector timestamps, and pulse-coupled oscillators).
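As an illustration of topic (2), matrix-vector multiplication can be phrased as a map step followed by a reduce step. The following is a minimal single-process sketch in plain Python, not Hadoop code; the function names `map_phase`, `reduce_phase`, and `matvec`, and the (row, column, value) triple representation of the matrix, are assumptions made for this example. It also assumes the vector fits in memory on every mapper.

```python
from collections import defaultdict


def map_phase(entries, v):
    """Map step: for each nonzero matrix entry (i, j, m_ij),
    emit the key-value pair (i, m_ij * v[j])."""
    for i, j, m_ij in entries:
        yield i, m_ij * v[j]


def reduce_phase(pairs):
    """Reduce step: group the emitted pairs by row index i
    and sum the partial products of each row."""
    sums = defaultdict(float)
    for i, partial in pairs:
        sums[i] += partial
    return dict(sums)


def matvec(entries, v):
    """Compute the product M * v, with M given as (row, column, value) triples."""
    return reduce_phase(map_phase(entries, v))


if __name__ == "__main__":
    # The 2x2 matrix [[1, 2], [3, 4]] as (row, column, value) triples.
    matrix = [(0, 0, 1), (0, 1, 2), (1, 0, 3), (1, 1, 4)]
    print(matvec(matrix, [1, 1]))
```

In a real MapReduce job the grouping by key is performed by the framework's shuffle stage between the two steps; here it is folded into `reduce_phase` for brevity.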
The goal of this course is to introduce the fundamental concepts, methods, and technologies relevant to the design of distributed computing and storage systems, and to present the specific challenges that arise from the distributed character of the processing task at hand. Distributed computing is an enabling technology for big data analytics, one of the fastest-growing disciplines today. Students follow a set of lectures and apply these concepts in practice in a project that uses Cloudera's hybrid open-source Apache Hadoop distribution, CDH (Cloudera Distribution Including Apache Hadoop).
At the end of this course, the student will have developed deep knowledge and understanding of state-of-the-art concepts and technologies in distributed computing. The student will be able to formulate, grasp, and analyse the fundamental designs of big data systems and to solve large-scale problems in the cloud using the MapReduce paradigm. The student will understand the basic technologies in distributed computing systems and will be able to apply the acquired knowledge in practice in a project. The student will be able to investigate how fundamental concepts and protocols such as Chord find application in today’s peer-to-peer systems and in distributed databases such as Apache Cassandra (originally developed at Facebook).
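To give a flavour of the Chord idea mentioned above: nodes and keys are hashed onto the same identifier ring, and each key is stored on its successor, the first node whose identifier is equal to or follows the key's identifier. The sketch below shows only this key-placement rule, not Chord's finger-table routing or node joins. The identifier width `M` and the helper names `ring_id` and `successor` are assumptions chosen for this example.

```python
import hashlib
from bisect import bisect_left

M = 16  # number of identifier bits (assumed; small for readability)


def ring_id(name: str) -> int:
    """Hash a node name or key onto the identifier ring [0, 2**M)."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** M)


def successor(node_ids, key_id):
    """Return the first node identifier >= key_id, wrapping around the ring.

    node_ids must be a non-empty list sorted in ascending order.
    """
    idx = bisect_left(node_ids, key_id)
    return node_ids[idx % len(node_ids)]


if __name__ == "__main__":
    # Hypothetical node names, hashed onto the ring.
    nodes = sorted(ring_id(f"node-{n}") for n in range(4))
    key = ring_id("some-key")
    print(f"key {key} is stored on node {successor(nodes, key)}")
```

Because only the keys between a departing or joining node and its neighbour change owner, this placement rule limits data movement when membership changes, which is one reason ring-based partitioning also appears in key-value stores.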
This course contributes to the following programme outcomes of the Master in Applied Computer Sciences:
MA_A: Knowledge oriented competence
3. The Master in Engineering Sciences has in-depth knowledge and understanding of the advanced methods and theories to schematize and model complex problems or processes
4. The Master in Engineering Sciences can reformulate complex engineering problems in order to solve them (simplifying assumptions, reducing complexity)
8. The Master in Engineering Sciences can collaborate in a (multidisciplinary) team
12. The Master in Engineering Sciences has a creative, problem-solving, result-driven and evidence-based attitude, aiming at innovation and applicability in industry and society
MA_C: Specific competence
18. The Master in Applied Computer Sciences is able to design and use systems for efficient storage, access and distribution of digital information
19. The Master in Applied Computer Sciences has knowledge of and is able to use advanced processing methods and tools for the analysis of (big) data in different application domains
The final grade is based on the following categories:
Other Exam determines 100% of the final mark.
Within the Other Exam category, the following assignments need to be completed:
The final exam is a written evaluation in which students answer theoretical questions and write pseudocode that solves specific distributed computing problems. The project assesses the students’ involvement in the seminar sessions, their in-depth understanding of distributed computing algorithms, and their practical coding skills.
The final grade is based on the following examinations: (1) the final exam, which determines 70% of the final mark; and (2) the project work, which determines 30% of the final mark.
This offer is part of the following study plans:
Master in Applied Sciences and Engineering: Applied Computer Science: Standard track