Biodata Mining

5 ECTS credits
125 h study time

Offer 1 with catalog number 4021495ENR for all students in the 2nd semester at a (E) Master - advanced level.

Semester

2nd semester

Enrollment based on exam contract

Impossible

Grading method

Grading (scale from 0 to 20)

Can retake in second session

Yes

Enrollment Requirements

Registration for this course is possible if the student is registered for 'Bioinformatics and Omics'.

Taught in

English

Partnership Agreement

Under interuniversity agreement for degree program

Faculty

Faculty of Sciences and Bioengineering Sciences

Department

Bio-Engineering Sciences

Educational team

Dominique Maes
Wim Vranken (course titular)

Activities and contact hours

26 contact hours Lecture
26 contact hours Seminar, Exercises or Practicals
26 contact hours Independent or External Form of Study

Course Content

This course introduces you to concepts and practical skills in relation to i) gathering, ii) organising, iii) integrating and iv) analysing biological data, in other words ‘data mining’. This is particularly relevant for molecular biology: there is an enormous increase in the availability of biological data, and knowing how to use and analyse these data will help you improve the quality of your future scientific studies. The course contains an introduction to programming in Python (for data handling) and R (for statistical analysis and plot generation). It focuses on statistical concepts, such as p-values, applied to genome data in relation to human health.

The content of the course is, in more detail:

1. Using protein sequence databases (e.g. UniProt) to gather sequence information and create multiple sequence alignments (MSA) with these sequences.

2. Gathering data on the effect of amino acid variants (mutations) on the organism phenotype (e.g. gnomAD for human health).

3. Curating and organising the gathered data.

4. Extracting derived information from the protein sequences (e.g. secondary structure or solvent accessibility predictions).

5. Integrating the (derived) data into a single data structure.

6. Analysing the data to find patterns and significant differences. This includes generating plots to visualise distributions, correlations, ...
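
As an illustration of steps 5 and 6, the following is a minimal Python sketch, not the method taught in the course. It assumes variant annotations and per-residue predictions have already been gathered, joins them on (protein, position) into a single table, and then uses a permutation test to estimate a p-value for the difference in mean solvent accessibility between two variant classes. All records, values, field names and the functions `integrate` and `permutation_p_value` are hypothetical and chosen only for this example.

```python
import random
from statistics import mean

# Hypothetical toy data; a real project would load exports from
# resources such as UniProt or gnomAD instead.
variants = [
    {"protein": "P1", "position": 10, "effect": "pathogenic"},
    {"protein": "P1", "position": 25, "effect": "pathogenic"},
    {"protein": "P2", "position": 7, "effect": "pathogenic"},
    {"protein": "P2", "position": 40, "effect": "pathogenic"},
    {"protein": "P1", "position": 3, "effect": "benign"},
    {"protein": "P1", "position": 58, "effect": "benign"},
    {"protein": "P2", "position": 12, "effect": "benign"},
    {"protein": "P2", "position": 33, "effect": "benign"},
    {"protein": "P3", "position": 5, "effect": "benign"},  # no prediction available
]

# Hypothetical predicted relative solvent accessibility per residue.
predictions = {
    ("P1", 10): 0.05, ("P1", 25): 0.10, ("P2", 7): 0.08, ("P2", 40): 0.12,
    ("P1", 3): 0.55, ("P1", 58): 0.70, ("P2", 12): 0.62, ("P2", 33): 0.48,
}


def integrate(variants, predictions):
    """Join variants with predictions on (protein, position).

    Variants without a matching prediction are dropped.
    """
    table = []
    for variant in variants:
        key = (variant["protein"], variant["position"])
        if key in predictions:
            table.append({**variant, "accessibility": predictions[key]})
    return table


def permutation_p_value(group_a, group_b, n_permutations=2000, seed=0):
    """Two-sided permutation test on the absolute difference in means."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        difference = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        # Small tolerance guards against floating-point rounding.
        if difference >= observed - 1e-12:
            hits += 1
    # Add-one correction avoids reporting a p-value of exactly zero.
    return (hits + 1) / (n_permutations + 1)


table = integrate(variants, predictions)
pathogenic = [row["accessibility"] for row in table if row["effect"] == "pathogenic"]
benign = [row["accessibility"] for row in table if row["effect"] == "benign"]
p_value = permutation_p_value(pathogenic, benign)
print(f"{len(table)} integrated records, p = {p_value:.4f}")
```

A permutation test is used here only because it needs no distributional assumptions and no external libraries. With groups this small the p-value is coarse, and the toy values say nothing about real proteins.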

Additional info

None

Learning Outcomes

General competences

1. Awareness
Recognition of key terms in relation to data handling and analysis, as well as programming.

2. Understanding
Understanding of programming and of statistical approaches relevant for solving biological or medical questions with large data sets.

3. Communication
Ability to communicate constructively with peers in a joint project.

4. Application
Implementation of a Python/R script to gather data from external sources (databases), organise these data, and integrate them with other, related data.

5. Analysis
Analysis of (integrated) data on a large scale using statistical approaches and data visualisation using graphs.

6. Capacity to evaluate
Evaluation of proposed analyses of biological data, based on what was learned during the course, with awareness of factors such as data quality, bias or overlap in the data used.

Grading

The final grade is composed of the following categories:
Other Exam determines 100% of the final mark.

Within the Other Exam category, the following assignments need to be completed:

Other exam with a relative weight of 1 which comprises 100% of the final mark.

Additional info regarding evaluation

You will be evaluated on a data analysis project, carried out in pairs during the course. The evaluation consists of a written report on the project (50%) and a final presentation (50%), in which you present your work and are assessed on your knowledge and understanding of data analysis.

Allowed unsatisfactory mark

The supplementary Teaching and Examination Regulations of your faculty stipulate whether an allowed unsatisfactory mark for this programme unit is permitted.

Academic context

This offer is part of the following study plans:
Master of Molecular Biology: Standard track
Master of Biology: Molecular and Cellular Life sciences