---
title: "Introducing Data Science with R"
author: "Shu-Kai Hsieh"
framework: minimal
github:
  branch: gh-pages
  repo: rlads2017
  user: loperntu
hitheme: solarized_light
logo: assets/img/lopen.jpg
mode: selfcontained
subtitle: Rlads
ext_widgets:
  rCharts: libraries/nvd3
---
# Introduction to the R Language and Data Science
<!--
<a href="http://prose.io/#{{site.github.user}}/{{site.github.repo}}/edit/gh-pages/index.Rmd" class="button icon edit">Edit Page</a>
-->
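```{r setup, cache = FALSE, echo = FALSE}
# Document hook: post-process the knitted output, replacing every '`` '
# (two backticks followed by a space) with a '```' code fence.
knitr::knit_hooks$set(document = function(doc) {
  gsub('`` ', '```', doc)
})
```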
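The work of a `data scientist` can be seen as an interactive process of exploring, predicting, and interpreting the meaning of data. `Linguistic analysis`, in turn, is key to understanding the meaning and sentiment expressed in textual data. This course combines current statistical programming and natural language processing techniques with a concise, beginner-friendly design and hands-on guidance, so that students with no prior programming background can reach the following learning goals:
- Understand the basics of the R language.
- Understand the characteristics and preprocessing of structured and unstructured data, especially methods for handling the linguistic features of Chinese text.
- <span style="color:blue; font-weight:bold">Understand the linguistic characteristics of Chinese and the basic concepts of text analytics.</span>
- Select appropriate variables and features and transform them sensibly, apply descriptive statistics and visual exploration, and find suitable graphical representations and statistical analyses for different problems and data types.
- Learn simple natural language processing and machine-learning predictive models, and apply them to a domain you care about.
- Learn to carry out a data science project and to communicate and present it.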
## Syllabus
[A more detailed version](dsR2017-8.pdf)
Week | Date | Topic | Lab
-----|:------:| --- | ---
1 | 09/14 | Orientation |
2 | 09/21 | Introduction to Data Science and Text Analytics | Installing R and RStudio
3 | 09/28 | Introduction to Data Science and Text Analytics | R overview; data types and structures
4 | 10/05 | Preparing / Obtaining Data | data structures; built-in plot; looping
5 | 10/12 | Scrubbing Data | Data wrangling, vectorization, tidyverse
6 | 10/19 | Exploratory Data Analysis and Graphics | encoding; string processing; regular expression
7 | 10/26 | Exploratory Data Analysis and Graphics | data manipulation (with regex)
8 | 11/02 | Corpus and Natural Language Processing | handling Chinese textual data
9 | 11/09 | **Mid-term exam** |
10 | 11/16 | Corpus and Statistics | web crawling: HTML parsing (rvest)
11 | 11/23 | Advanced Graphics | ggplot2, plot.ly
12 | 11/30 | Machine Learning Basics: Classification and Clustering |
13 | 12/07 | Regression | Shiny Web application [I]
14 | 12/14 | Sentiment Analysis | Shiny Web application [II]
15 | 12/21 | Current Topics in Text Analytics | Shiny Web application [III]
16 | 12/28 | Reporting and Presenting Data | Shiny Web application [IV]
17 | 01/04 | Term project competition/presentation |
18 | 01/11 | **Final term project and report due** |
## Coaching Team
```coffee
謝舒凱 <[email protected]>
曾昱翔 <[email protected]>
梁文宣 <[email protected]>
李智堯 <[email protected]>
吳小涵 <[email protected]>
```
## Course Slides
<!--
- <span style="color:blue; font-weight:bold"> Week.1 </span>: [Week 01](lectures/00/index.html)
- <span style="color:blue; font-weight:bold"> Week.2 </span>: [Week 02](lectures/01/index.html)
-->
- [Link](https://github.com/loperntu/rlads2017)
## TA Handouts, Exercises, and Assignments
- [Grading policy](http://lope.linguistics.ntu.edu.tw/courses/data_science/grading_policy2016.html)
- [Website](https://sites.google.com/ntu.edu.tw/rladsntu/home)
- [Code repository](https://github.com/RLadsNTU/RLadsLab)
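## Course Materials
The course slides cover the basic concepts. If you are interested in more advanced content, see the following online materials:
- [Linguistic Analysis and Data Science (語言分析與資料科學)](https://www.gitbook.com/book/loperntu/ladsbook/details)
- [Open Corpora: Construction and Analysis (開放語料庫:製程與分析)](https://www.gitbook.com/book/loperntu/copens/details)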
## Course-Related Activities
- [NTU CEIBA]()
- [Shared study notes]()
- [Facebook group](https://www.facebook.com/groups/652099794893097/)
## Course Spirit
1. Self-directed learning
2. Cross-disciplinary collaboration
## Assignment Score Distribution
## Group Assignment Showcase
## Capstone projects
- [Group list](https://docs.google.com/spreadsheets/d/19ggUvFCVbFxfWdmzPcjLmDoZfXnE8DJAaYzuNPNBrbQ/edit#gid=0)
- [pttR]()
<a href='lectures/00/index.html#9'>
<img style='border: 1px solid;' width=100% src='./assets/img/neocilin.png'></img>
</a>