# 2.3.1. Machine Learning

## Kinds of Machine Learning

### Supervised learning

Regression:

Questions that ask "how much?" or "how many?" are regression problems, for example:

• "How many hours will this surgery take?"
• "How many dogs are in this photo?"

L1 loss, where

$$l(y, y') = \sum_i |y_i - y'_i|$$

L2 loss, where

$$l(y, y') = \sum_i (y_i - y'_i)^2$$
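The two losses above can be sketched in plain Python. This is a minimal illustration, not a library implementation: the function names `l1_loss` and `l2_loss` and the sample values are assumptions made for this example.

```python
def l1_loss(y, y_pred):
    """Sum of absolute differences between targets and predictions."""
    return sum(abs(a - b) for a, b in zip(y, y_pred))


def l2_loss(y, y_pred):
    """Sum of squared differences between targets and predictions."""
    return sum((a - b) ** 2 for a, b in zip(y, y_pred))


# Hypothetical targets and predictions, chosen only for illustration.
y_true = [3.0, 5.0, 2.0]
y_hat = [2.0, 5.0, 4.0]

print(l1_loss(y_true, y_hat))  # 1 + 0 + 2 = 3.0
print(l2_loss(y_true, y_hat))  # 1 + 0 + 4 = 5.0
```

Because the L2 loss squares each error, the single error of 2 contributes 4 of the 5 units, so large errors dominate it more than they dominate the L1 loss.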


Classification:

A typical example is optical character recognition (OCR).
Classification is handled with a different set of algorithms than regression.
In classification, we want our model to look at a feature vector
(e.g., the pixel values in an image)
and then predict which category (formally called a class),
among some discrete set of options, an example belongs to.
The simplest form of classification has only two classes,
a problem we call binary classification.
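As a rough sketch of binary classification, the snippet below scores a feature vector with a linear model, squashes the score into a probability with a sigmoid, and thresholds it at 0.5. The weights, bias, and the helper name `predict_binary` are hypothetical values picked for illustration; a real model would learn them from labeled data.

```python
import math


def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_binary(features, weights, bias, threshold=0.5):
    """Return (label, probability) for one example under a linear model."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    prob = sigmoid(score)
    label = 1 if prob >= threshold else 0
    return label, prob


# Hypothetical parameters; in practice these come from training.
weights = [2.0, -1.0]
bias = 0.0

label_a, prob_a = predict_binary([1.0, 0.5], weights, bias)  # score = 1.5
label_b, prob_b = predict_binary([0.0, 1.0], weights, bias)  # score = -1.0
print(label_a, label_b)  # 1 0
```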


Tagging:

The problem of learning to predict classes that are not mutually exclusive
is called multi-label classification.

A typical article might carry 5-10 tags at once, because the concepts the tags describe are correlated.
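One common way to handle classes that are not mutually exclusive is to make an independent yes/no decision per tag. The sketch below assumes a model has already produced one raw score per tag; the tag names, scores, and the function `predict_tags` are made up for this example.

```python
import math


def predict_tags(tag_scores, threshold=0.5):
    """Keep every tag whose sigmoid probability reaches the threshold.

    Each tag is decided independently, so any number of tags can be returned.
    """
    selected = []
    for tag, score in tag_scores.items():
        prob = 1.0 / (1.0 + math.exp(-score))
        if prob >= threshold:
            selected.append(tag)
    return selected


# Hypothetical raw scores for a single article.
scores = {"machine learning": 2.1, "cloud computing": 0.3, "cooking": -3.0}
tags = predict_tags(scores)
print(tags)  # ['machine learning', 'cloud computing']
```

Unlike single-label classification, nothing here forces the probabilities to sum to one, which is what allows several tags to be selected together.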



Search and ranking:

Google Search: the ranked list of results returned for a keyword query.


Recommender systems:

Recommender systems suggest content based on your preferences, e.g. Toutiao recommending articles you are likely to enjoy.


Sequence Learning:

1. Automatic speech recognition
2. Text to speech
3. Machine translation


### Unsupervised learning

1. Clustering
2. Subspace estimation problems; when the dependence is linear, this is principal component analysis (PCA)
3. Representation learning (e.g., Rome − Italy + France = Paris)
4. Causality and probabilistic graphical models
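To make the linear subspace case concrete, here is a small sketch of PCA restricted to 2D points, using only the standard library. It uses the closed-form angle of the leading eigenvector of a 2×2 covariance matrix instead of a general eigen-solver. The function names and sample points are assumptions for illustration.

```python
import math


def pca_2d(points):
    """Return (mean, direction) of the first principal component of 2D points."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    # Entries of the 2x2 covariance matrix.
    cxx = sum((p[0] - mean_x) ** 2 for p in points) / n
    cyy = sum((p[1] - mean_y) ** 2 for p in points) / n
    cxy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in points) / n
    # Angle of the leading eigenvector of [[cxx, cxy], [cxy, cyy]].
    theta = 0.5 * math.atan2(2.0 * cxy, cxx - cyy)
    return (mean_x, mean_y), (math.cos(theta), math.sin(theta))


def project(points, mean, direction):
    """Reduce each 2D point to its 1D coordinate along the principal direction."""
    return [
        (p[0] - mean[0]) * direction[0] + (p[1] - mean[1]) * direction[1]
        for p in points
    ]


# Hypothetical data lying exactly on the line y = 2x.
points = [(-1.0, -2.0), (0.0, 0.0), (1.0, 2.0), (2.0, 4.0)]
mean, direction = pca_2d(points)
coords = project(points, mean, direction)
print(direction)  # roughly (0.447, 0.894), i.e. the direction of (1, 2)
```

Since the sample points lie on a line, the single 1D coordinate per point loses no information, which is the idea behind subspace estimation.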


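### Interacting with an Environment

Offline learning:

Everything discussed so far is offline learning: the data is collected up front, and the model is trained on it without interacting with the environment any further.

For agents that act in the world, like the robots of Asimov's Robot Series,
we need to think about choosing actions, not just making predictions.
Moreover, unlike predictions, actions actually impact the environment.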



### Reinforcement learning

Two examples:

1. The deep Q-network that beat humans at Atari games using only the visual input
2. The AlphaGo program that dethroned the world champion at the board game Go

• The goal of reinforcement learning is to produce a good policy.
• We can cast any supervised learning problem as an RL problem.

### MDPs, bandits, and friends

• When the environment is fully observed, we call the RL problem a Markov decision process (MDP).
• When the state does not depend on the previous actions, we call the problem a contextual bandit problem.
• When there is no state, just a set of available actions with initially unknown rewards, the problem is the classic multi-armed bandit problem.
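The multi-armed bandit setting can be sketched with an epsilon-greedy strategy, one simple approach among many. The arm reward probabilities, the step count, `epsilon`, and the function name `run_epsilon_greedy` are all assumptions chosen for this illustration.

```python
import random


def run_epsilon_greedy(arm_probs, steps=5000, epsilon=0.1, seed=0):
    """Play a Bernoulli bandit and return (pull counts, estimated values).

    With probability epsilon a random arm is explored; otherwise the arm
    with the highest current reward estimate is exploited.
    """
    rng = random.Random(seed)
    n_arms = len(arm_probs)
    counts = [0] * n_arms
    values = [0.0] * n_arms  # running mean reward per arm
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)
        else:
            arm = max(range(n_arms), key=lambda a: values[a])
        # The true probabilities are hidden from the agent; it only sees rewards.
        reward = 1.0 if rng.random() < arm_probs[arm] else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]
    return counts, values


# Hypothetical arms paying out with probability 0.1, 0.5, and 0.9.
counts, values = run_epsilon_greedy([0.1, 0.5, 0.9])
best_arm = max(range(len(values)), key=lambda a: values[a])
print(best_arm, counts)
```

There is no state here: each pull is independent of the previous ones, so the only thing to learn is which action pays best on average.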

## Examples

Autonomous driving: autopilot
NLP:

CV: