Course title |
Cloud Computing for High Dimensional Data |
Semester |
112-2 |
Designated for |
Master Program in Statistics of National Taiwan University |
Instructor |
CHEN, YAN-BIN |
Curriculum Number |
IMPS5010 |
Curriculum Identity Number |
H41EU0120 |
Class |
01 |
Credits |
3.0 |
Full/Half Yr. |
Half |
Required/Elective |
Elective |
Time |
Monday 7, 8, 9 (14:20~17:20) |
Remarks |
Enrollment limit: 15 students.
|
|
Course Syllabus
|
Please respect the intellectual property rights of others and do not copy any of the course information without permission
|
Course Description |
This course offers practical training in data science, focusing on high-dimensional data computing and dimension reduction algorithms. Practical exercises will be conducted on high-speed GPU servers in the cloud, possibly using resources such as the National Center for High-Performance Computing (國家高速網路與計算中心) or Google Colab. In addition to the hands-on exercises, the course introduces the statistical theory behind dimension reduction algorithms, data visualization, and data interpretation. Python programming is covered during the first month, but please note that this segment is a condensed, quick recap.
The course is taught in English, but discussions and Q&A sessions may be bilingual.
Weekly teaching format:
50 mins: Lecture.
80 mins: Hands-on exercises and teamwork.
20 mins: Wrap-up of the hands-on exercises.
If you would like to take the course but were unable to enroll, please come to class in the first week. Authorization codes may be handed out. |
Course Objective |
Students will learn the inherent characteristics of high-dimensional data and the techniques used to reduce its dimension. They will also gain hands-on experience accessing and processing high-dimensional data on high-speed GPU servers, and are expected to complete projects that involve preprocessing and computing on such data on those servers. |
Course Requirement |
1. Students should have very basic Python programming skills before taking the course.
2. Students should bring their laptops to each class session. |
Student Workload (expected study time outside of class per week) |
6 hours |
Office Hours |
Appointment required. |
Designated reading |
Month 1: Book 1, Chapters 3, 5, 9
Month 2: Book 2, Chapters 1, 2
Month 3: Book 2, Chapters 5, 6
Month 4: Paper study |
References |
Book 1: Wes McKinney, Python for Data Analysis: Data Wrangling with pandas, NumPy, and Jupyter, 3rd ed., 2022.
Book 2: Sylvain Lespinats, Benoit Colange, and Denys Dutykh, Nonlinear Dimensionality Reduction Techniques: A Data Structure Preservation Approach, 2021. |
Grading |
No. | Item | % | Explanations for the conditions |
1. | In class: exercises during class sessions | 20% | |
2. | Midterm: paper presentation | 30% | |
3. | Final: final project (peer evaluation 10%) | 50% | |
|
Adjustment methods for students |
Teaching methods |
Provide students with flexible ways of attending courses |
Assignment submission methods |
Group reports in place of individual reports; presentation in other formats by mutual agreement between students and instructor |
Exam methods |
Written or oral reports in place of exams |
Others |
To be negotiated between instructor and students |
|
Week | Date | Topic |
Week 1 | | Introduction |
Week 2 | | [Part 1: A Quick Recap of Python] Python Environment Setup |
Week 3 | | Data Structures and Functions (Book 1, Chapter 3) |
Week 4 | | Pandas (Book 1, Chapter 5) |
Week 5 | | Plot and Visualization (Book 1, Chapter 9) |
Week 6 | | [Part 2: Dimensionality Reduction Techniques] Similarity Measure and Distance Function (Book 2, Chapter 1) |
Week 7 | | Nearest Neighbors in Scikit-learn (Book 2, Chapter 2) |
Week 8 | | Machine Learning for Artificial Intelligence; Paper Presentation 1/4 |
Week 9 | | Supervised Learning (Book 2, Chapter 5); Paper Presentation 2/4 |
Week 10 | | Unsupervised Dimensionality Reduction: PCA, t-SNE (Book 2, Chapter 5); Paper Presentation 3/4 |
Week 11 | | Deep Learning 1/2, CNN; Paper Presentation 4/4 |
Week 12 | | Deep Learning 2/2, High Dimensional Data |
Week 13 | | Final Project Presentation I |
Week 14 | | Final Project Presentation II |
Week 15 | | Current Research Issues of High Dimensional Data: Chest X-Ray, Alzheimer's Disease MRI |
Week 16 | | Real Case Discussion |
|