Course Information
Course title: Cloud Computing for High Dimensional Data
Semester: 112-2
Designated for: Master Program in Statistics of National Taiwan University
Instructor: CHEN, YAN-BIN
Curriculum Number: IMPS5010
Curriculum Identity Number: H41EU0120
Class: 01
Credits: 3.0
Full/Half Yr.: Half
Required/Elective: Elective
Time: Monday 7,8,9 (14:20~17:20)
Remarks: The upper limit on the number of students is 15.
 
Table of Core Capabilities and Curriculum Planning
Association has not been established
Course Syllabus
Please respect the intellectual property rights of others and do not copy any of the course information without permission
Course Description

This course offers practical training in data science, focusing on high-dimensional data computing and dimension reduction algorithms. Practical exercises will be conducted on high-speed GPU servers in the cloud, possibly using resources such as the National Center for High-Performance Computing (國家高速網路與計算中心) or Google Colab. In addition to the hands-on exercises, the course introduces statistical theory related to dimension reduction algorithms, data visualization, and data interpretation. Python programming will be covered during the first month, but please note that this segment is a condensed, quick recap rather than a full introductory course.

The course is taught in English, but discussions and Q&A sessions may be bilingual.

Teaching format each week:
50 mins: Lecture.
80 mins: Hands-on exercises and teamwork.
20 mins: Wrap-up of the hands-on exercises.

If you would like to take the course but were unable to enroll, please come to class in the first week. Authorization codes may be distributed then.

Course Objective
The students will learn the inherent characteristics of high-dimensional data and dimension reduction techniques. Additionally, they will gain hands-on experience in operating and accessing high-dimensional data on high-speed GPU servers. Students will be expected to complete projects that involve preprocessing, computing, and operating high-dimensional data on the high-speed GPU servers. 
Course Requirement
1. Students should have very basic programming skills in Python before taking the course.
2. Students should bring their laptops to each class session.
Student Workload (expected study time outside of class per week)
6 hours 
Office Hours
Appointment required. 
Designated reading
Month 1: Book 1, Chapters 3, 5, 9
Month 2: Book 2, Chapters 1, 2
Month 3: Book 2, Chapters 5, 6
Month 4: Paper study
References
Book 1: Python for Data Analysis: Data Wrangling with pandas, NumPy, and Jupyter, 3rd ed., 2022
By Wes McKinney

Book 2: Nonlinear Dimensionality Reduction Techniques: A Data Structure Preservation Approach, 2021
By Sylvain Lespinats, Benoit Colange, Denys Dutykh
Grading
 
No.  Item                                          %
1.   In class: exercises during the class session  20%
2.   Midterm: paper presentation                   30%
3.   Final: final project (peer evaluation 10%)    50%
 
 
Adjustment methods for students
 
Teaching methods: Students are given flexible ways of attending the course.
Assignment submission methods: Group reports may replace individual reports; students and the instructor may also mutually agree on other ways to present.
Exam methods: Written (or oral) reports replace exams.
Others: Negotiated between the instructor and students.
Progress
Week 1: Introduction
[Part 1: A Quick Recap of Python]
Week 2: Python Environment Setup
Week 3: Data Structures and Functions (Book 1, Chap. 3)
Week 4: Pandas (Book 1, Chap. 5)
Week 5: Plot and Visualization (Book 1, Chap. 9)
[Part 2: Dimensionality Reduction Techniques]
Week 6: Similarity Measure and Distance Function (Book 2, Chap. 1)
Week 7: Nearest Neighbors in Scikit-learn (Book 2, Chap. 2)
Week 8: Machine Learning for Artificial Intelligence; Paper Presentation 1/4
Week 9: Supervised Learning (Book 2, Chap. 5); Paper Presentation 2/4
Week 10: Unsupervised Dimensionality Reduction: PCA, t-SNE (Book 2, Chap. 5); Paper Presentation 3/4
Week 11: Deep Learning 1/2, CNN; Paper Presentation 4/4
Week 12: Deep Learning 2/2, High Dimensional Data
Week 13: Final Project Presentation I
Week 14: Final Project Presentation II
Week 15: Current Research Issues of High Dimensional Data: Chest X-Ray, Alzheimer's Disease MRI
Week 16: Real Case Discussion