Course Syllabus

Course Information

Course Title

高效能巨量資料與人工智慧系統
High-Performance Big Data and Artificial Intelligence Systems

Semester

111-2

Intended Audience

Master's Program in Data Science, College of Electrical Engineering and Computer Science

Instructor

洪士灝

Course Number

CSIE5373

Course Identifier

922 U4620

Section

Credits

3.0

Full/Half Year

Half year

Required/Elective

Required

Class Time

Monday, periods 6, 7, 8 (13:20-16:20)

Classroom

資105

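Remarks

Requires a background in computer architecture and operating systems.
Restricted to third-year undergraduates and above.
Enrollment limit: 50 students.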

Course Syllabus

Course Description

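The rapid development of big data and artificial intelligence in recent years has created many emerging applications. To analyze ever larger volumes of data and to pursue more powerful AI, many national-level scientific research programs and even large-scale commercial applications have begun adopting high-performance computing (supercomputer) technology to improve their competitiveness, and today's high-performance computing platforms have in turn begun to support important big data and AI applications. High-performance computing has therefore become one of the locomotives driving frontier technology. However, high-performance computing platforms involve advanced technologies, including heterogeneous computing, parallel computing, distributed processing, and high-speed networking, and building high-performance, high-efficiency systems and applications often requires integrated hardware-software optimization. As a result, people who can make good use of high-performance computing platforms are rare. Students interested in this field, even after taking several related courses, may still not fully cover its basic knowledge and skills, and will find it even harder to integrate and apply what they learned across those courses. To address these needs and barriers, this course adopts Problem-Based Learning: it centers on practical problems in big data and AI, teaches the relevant high-performance computing knowledge and skills, and encourages small-group discussion, paper reading, and a final project, in order to cultivate students' abilities in active learning, critical thinking, and problem solving.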

Course Objectives

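Over the semester, we will examine the system issues commonly encountered in various types of big data and AI applications and explore how to build high-performance systems. Students will learn the key technologies hidden inside these systems, including system architecture, software frameworks, hardware-software integration and optimization, and the latest technology trends. The course covers: (1) principles of parallel and distributed computing; (2) hardware and software architectures for high-performance computing; (3) high-efficiency big data storage and analytics systems; (4) high-efficiency AI training and inference systems; (5) information security and privacy protection; (6) case studies in system performance evaluation and optimization. Each of these stages involves hardware-software integration and optimization. Beyond introducing the relevant system architectures and software frameworks, the course will also guide students in exploring the latest technology trends and application case studies.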

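Course Requirements

Class discussion; homework assignments; final project proposal; final project presentation.

Expected Weekly Study Hours Outside Class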

3 hours

Office Hours

By appointment. Note: Please contact the instructor or a TA. There are two TAs: 何明倩 (Ms. Ho): r11944009@ntu.edu.tw; 劉盛興 (Mr. Liu): r11922123@ntu.edu.tw

Required Reading

Slides, reference books, and papers

References

Provided in class

Grading
(For reference only)

No.	Item	Percentage	Description
1.	Class Attendance	40%	Attend at least 11 lectures and participate in classroom discussions. Homework may be assigned.
2.	Final Project Proposal	15%	Propose a final project that investigates performance issues related to HPC, big data, and/or AI applications. The proposal should be innovative and meaningful.
3.	Final Project Presentation	25%	Investigate the issues raised by the proposed project. Find and evaluate potential solutions. Present and discuss the results.
4.	Mid-term Exam	20%	Evaluate how well students have learned the material.

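Accommodations for Students with Difficulties

Class format	Recorded lectures are provided as a supplement, giving students flexible attendance options
Assignment submission	Written reports in place of oral presentations; alternative formats as agreed between the student and the instructor
Exam format	Postponed final exam date (time)
Other	As agreed between the instructor and the student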

Course Schedule

Week	Date	Topic
Week 1	2/20	Introduction to the course: What is high-performance computing (HPC)? Why do we want to build high-performance systems? Why is high performance important to big data analytics and AI applications? How do we design high-performance systems for big data analytics and AI applications?
Week 2	2/27	Holiday
Week 3	3/6	Overview of high-performance computing and basics of parallel computing: Why do we need parallel computing? What are the paradigms for parallel computing? How do we pursue high performance with parallel computing in practice? Where are the performance bottlenecks and how do we identify them? How do we practice performance analysis?
Week 4	3/13	(Recorded Video) Big-data systems: concepts and implementation issues. How do we store petabyte-scale big data in high-performance cluster filesystems such as HDFS? How do we process big data in a datacenter with Hadoop MapReduce? Beyond parallel computing, the key is data locality and the trick is colocation. How do we accelerate data processing with in-memory computing? Many open-source middleware projects are available for you to explore.
Week 5	3/20	AI systems: basics and implementation issues. Many AI applications contain a great deal of parallelism, and parallel computing can effectively accelerate them. Parallel algorithms were developed for search and expert systems before the last AI Winter. Datacenters and GPU clusters were key to opening the deep learning era. How do we train deep learning models with thousands of GPUs in a datacenter?
Week 6	3/27	Edge-cloud computing and system software: Cloud computing, mobile computing, Internet of Things (IoT), autonomous driving, robots... Everything is connected and needs better mechanisms (system software?) to work together over networks. How do things connect? How do they collaborate efficiently?
Week 7	4/3	Holiday
Week 8	4/10	Information security and data privacy: How do we protect data? There are security protocols and cryptographic methods for this purpose. The real new challenge today is to perform big data analytics and develop AI models while keeping data protected. How do we do this with techniques such as trusted computing hardware (SGX), federated learning, secure multiparty computation, and homomorphic encryption?
Week 9	4/17	Domain-specific accelerators and heterogeneous computing: How do we estimate the performance of neural networks with or without deep learning accelerators? How do we find good neural networks for an application with platform-aware neural architecture search (NAS)? How do we compress a neural network to reduce its resource consumption?
Week 10	4/24	Large language models: How do we train large language models such as GPT-3? What are the performance issues, and which frameworks address them? Can we compress an LLM to run on a PC?
Week 11	5/1	Midterm Exam. Post-Moore computing (neuromorphic computing and quantum computing): Growth in computing performance has depended on Moore's Law for the past 60 years, but Moore's Law is slowing down and will eventually end. How can we continue improving the capability of big data analytics and AI in the post-Moore era?
Week 12	5/8	Final Project Proposal
Week 13	5/15	Advanced Topics (HPC) - TBD
Week 14	5/22	Advanced Topics (Big Data) - TBD
Week 15	5/29	Advanced Topics (AI) - TBD
Week 16	6/5	Final Project Presentation