Syllabus
Welcome to the Data Analytics (INFOB2DA) course at Utrecht University. This course is an introduction to key principles and techniques for data analysis and interactive data visualization. The major goals of this course are to learn how to apply a data-driven approach to problem solving, gain insight into the core techniques used in the field, and understand how visual representations can complement the analysis and understanding of complex data.
Learning Objectives
Students are taught elementary theoretical knowledge and gain initial practical experience in the data analysis domain. They learn to assess the requirements and parameters for applying fundamental analysis algorithms. Beyond that, students will apply these algorithms in practice and assess the results independently.
In the visualization area, students are taught appropriate visual mappings for varying data types and will apply them to build useful interactive visualization systems. Students will be able to judge design decisions in light of the properties of human perception, and to develop and assess visualization solutions.
After completion of the course you will be able to:
- Evaluate different Data Analysis (DA) processes and their differentiating key aspects.
- Apply selected techniques and algorithms to a data set from a task-oriented perspective.
- Analyze semi-structured and unstructured data, for example using text analysis.
- Use external data sources in analyses to derive new insights.
- Relate the potential negative impact of data quality problems.
- Use principles of human perception and cognition in visualization design.
- Conceptualize ideas and interaction techniques using sketching and prototyping.
- Apply methods for visualization of data from a variety of fields.
- Create web-based interactive visualizations using D3 and Streamlit.
- Work constructively as a member of a team to carry out a complex project.
For an overview of the schedule with lecture topics and the assignments, please see the Schedule.
Enrollment and Material
This course requires an Osiris registration. Course material will be available online in MS Teams.
Prerequisites
Data Analytics is a level-2 bachelor course that assumes you have completed the Scientific Research Methods (INFOWO) and Imperative Programming (INFOIMP) or Programmeren met Python (BETA-B1PYT) courses, or similar (external) courses. If you do not yet have elementary experience in statistics or programming, be aware that you will need to put in significantly more than 20 hours per week to complete this course.
You are expected to have programming experience in Python, and you should be comfortable picking up new programming languages on your own. JavaScript and web development experience is a plus, but not required. However, please be aware that learning a new programming language and/or a library such as Streamlit, Flask, or D3 is a time-consuming process!
Course Components
In the 2021 edition of the course, the homework assignments have been restructured into two-week practical group assignments.
- Lectures
- Practical Assignments (60%)
- Final Exam (40%)
Lectures
The class meets weekly for lectures. Attending lectures is a crucial component of learning the material presented in this course. Please arrive on time, as we will start promptly with a recap and Q&A about the last lecture’s content. At the end of each lecture we will ask you to fill out and submit a one-minute reflection to collect feedback in Mentimeter.
Labs and Tutorials
For most of the semester, we will hold programming and assignment labs during our regular lab sessions (werkcolleges) on Monday afternoons. Labs are interactive tutorials with downloadable code that give you an introduction to Python, data science, and client-side web programming with HTML, CSS, JavaScript, and Dash/D3/Plotly.
Practical Assignments/Group Work
We will feature four practical assignments (worth 60% of the grade), which have to be completed in groups of three students. These practical assignments are mandatory and provide an opportunity to practice for the Final Exam. See the homework as an opportunity to learn, not to “earn points”. The homework will be graded carefully, and you can earn 100 points per assignment. You have to earn at least 50% of all points to be eligible for the final exam.
- Work in groups of up to three people
- Assignment duration: 2 weeks (first assignment 3 weeks)
- Deliverables: Code, presentation and reflection document
- The presentation counts toward the grade
- All students must be able to explain the code fragments
Final Exam
A significant part of the course is the final exam (40%), which will cover material from lectures, assigned readings, and labs/homework assignments. If you do not keep up with the course material, i.e., attend the lectures and complete the homework, you will be at a severe disadvantage during the final exam. The current plan is to have a closed-book (in-class) exam of about 120 minutes, if the COVID-19 situation permits.
The final exam might contain (among others):
- drawing, sketching, and annotation questions
- multiple choice questions with point deduction for wrong answers
- mathematical calculation questions (might require a calculator)
- free text reflective questions
Grading
The course grade comprises:
- Practical Assignments/Group Work (60%)
- Final Exam (40%)
Any concerns about grading errors must be noted in writing and submitted to your TA/TF within one week of receiving the grade.
Note that the minimum passing grade for this course is a 5.5, which will be rounded to 6.0 in Osiris.
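As a rough illustration of the arithmetic behind the 60/40 weighting and the 50% eligibility rule for the practical assignments, here is a minimal Python sketch. The weights, the 100 points per assignment, and the 50% threshold come from this syllabus; the function names and the representation of the exam result as a fraction between 0 and 1 are assumptions made for the example. The syllabus does not state how the weighted result is converted to a final grade, so the sketch stops at a weighted fraction and is not the official grade calculation.

```python
# Illustrative sketch only. Weights and thresholds are taken from the syllabus;
# function names and the 0..1 exam scale are assumptions for this example.

ASSIGNMENT_WEIGHT = 0.60         # Practical Assignments/Group Work
EXAM_WEIGHT = 0.40               # Final Exam
MAX_POINTS_PER_ASSIGNMENT = 100  # points available per assignment
ELIGIBILITY_THRESHOLD = 0.50     # share of all assignment points needed for the exam


def assignment_share(points):
    """Return the fraction of all available assignment points that were earned."""
    if not points:
        raise ValueError("at least one assignment score is required")
    return sum(points) / (MAX_POINTS_PER_ASSIGNMENT * len(points))


def eligible_for_exam(points):
    """True if at least 50% of all assignment points were earned."""
    return assignment_share(points) >= ELIGIBILITY_THRESHOLD


def weighted_score(points, exam_fraction):
    """Combine the assignment share and the exam fraction (both 0..1) with 60/40 weights."""
    return ASSIGNMENT_WEIGHT * assignment_share(points) + EXAM_WEIGHT * exam_fraction


# Hypothetical scores for the four assignments and a hypothetical exam result of 50%.
scores = [80, 70, 90, 60]
print(eligible_for_exam(scores))               # True (300 of 400 points)
print(round(weighted_score(scores, 0.5), 2))   # 0.65
```

In this hypothetical case the group earned 75% of the assignment points, which is above the 50% threshold, and the weighted result is 0.6 × 0.75 + 0.4 × 0.5 = 0.65.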
Course Policies
We will strictly examine all submissions for plagiarism. For code that has to be written in practical assignments, we expect you to write the code on your own. If you copy code from the internet, we will deduct a small number of points. If the University’s referencing, citing and linking policies for a code fragment are not obeyed, we will grade the subtask with 0 (zero) points. Repeated incidents will be escalated to the Board of Examiners.
Late Policy
No homework assignments or project milestones will be accepted after the deadline. Homework assignments will be posted on MS Teams on Monday (morning/afternoon) and will be due the following Sunday at 23:59; details are listed on the course schedule site. Solutions to homework assignments can be discussed in the weekly werkcolleges. We plan to give outstanding projects a place in our hall of fame.
If you have special circumstances, such as an illness, that interfere with your coursework, please let us know as soon as possible.
Collaboration Policy
We expect you to adhere to the Utrecht Code of Conduct and the UU Academic policies and procedures at all times. Failure to adhere to the code of conduct and our policies may result in penalties, up to and including automatic failure in the course and referral to the ad board.
You may discuss your group work and labs with other people, but you are expected to be intellectually honest and give credit where credit is due. In particular:
- you have to write your solutions entirely on your own;
- you cannot share written materials or code with anyone else;
- you should not view any written materials or code created by anyone else for the assignment;
- you should list all your collaborators (everyone you discussed the assignment with) in your submission;
- you may not submit the same or similar work to this course that you have submitted or will submit to another; and
- you may not provide or make available solutions to individuals who take or may take this course in the future.
If the assignment allows it, you may use third-party libraries and example code, so long as the material is available to all students in the class and you give proper attribution. Do not remove any original copyright notices and headers.
Devices in Class
We will use smartphones and laptops throughout the lecture to facilitate in-class activities and project work. However, research and student feedback clearly show that using devices for non-class-related activities harms not only your own learning but other students’ learning as well. Therefore, we only allow device usage during activities that require devices. At all other times, you should not be using your device. We will help you remember this by announcing when to bring devices out and when to put them away.
Accessibility
If you have any concerns about accessibility please contact the Head TF or the Instructor as soon as possible. Failure to do so may prevent us from making appropriate arrangements.
Responsiveness
The COVID-19 period has taught us that teaching can take place online. However, we lecturers have found that students’ expectations regarding response times cause a lot of stress. Within INFOB2DA we will try to answer questions (through Teams and email) within 1-2 working days. I do not expect my TAs to work on weekends, and neither should you. I will check Teams only infrequently on weekends.
Course Resources
Online Materials
All class activity handouts, slides, homeworks, labs, and additional readings will be posted on MS Teams.
We will not record the lectures live, but throughout the course we will upload separate Chapter Review Videos (one after each chapter) that stress individual points.
Textbooks
Course Textbooks
- Han J., Kamber M., Data Mining: Concepts and Techniques, 2006, Morgan Kaufmann Publishers, Second Edition
- Ward M., Grinstein G., Keim D. A., Interactive Data Visualization: Foundations, Techniques, and Applications, 2010, A.K. Peters, Ltd, ISBN: 978-1-56881-473-5, http://www.idvbook.com
Recommended Textbooks
- The Art of Data Science, Peng and Matsui (2016). A central reference, available as PDF, e-book, and paperback; you can also read the latest version online at https://bookdown.org/rdpeng/artofdatascience/.
- Visual Thinking for Design, Colin Ware, Morgan Kaufmann (2008)
- Interactive Data Visualization for the Web, Scott Murray, O’Reilly (2017) Second edition! (The 2nd edition teaches D3 Version 4, which we will be using in this course!)
- Visualization Analysis and Design, Tamara Munzner, CRC Press (2014)
- The Functional Art: An introduction to information graphics and visualization, Alberto Cairo, New Riders (2012)
- Design for Information, Isabel Meirelles, Rockport (2013)
- Berthold M., Borgelt C., Höppner F., Klawonn F., Guide to Intelligent Data Analysis: How to Intelligently Make Sense of Real Data: Making Practical Sense of Real Data (Texts in Computer Science), 2010 Springer
- Hand D.J., Mannila H., Smyth P., Principles of Data Mining, 2001, MIT Press
- Spence R., Information Visualization, 2007, ACM Press Books, Second Edition
Discussion Forum
We use MS Teams as our general discussion forum and for all announcements, so it is important that you sign up as soon as possible. MS Teams should always be your first resource for seeking answers to your questions. You can also post privately so that only the staff sees your message (use @tags to link people, and be aware of the Responsiveness rules).
Office Hours
Teaching fellows will provide office hours for individual questions that you might have about the lecture and practical group work, as well as general questions. As office hours are usually very heavily attended, please consult MS Teams as a first option to get help.