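ML Math Fundamentals

Probability and Statistics

Some interviews test statistics and math knowledge directly. Even when they do not, backing up your points with math during the ML round is very helpful.

  • Central limit theorem

    • Take a population with an arbitrary distribution (with finite variance). Draw n random samples from it at a time, m times in total, and compute the mean of each of the m groups. The distribution of these means is approximately normal, and the approximation improves as n grows.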
  • Hypothesis testing

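    • Use a sample to infer whether the population has a certain property
    • Similar to maximum likelihood? After stating a hypothesis, use the distribution it implies to compute the probability of observing the data under that distribution
  • z-test

    • The main hypothesis tests for comparing means are the z-test and the t-test. The z-test is for population-level data (known population standard deviation) and large samples, while the t-test suits small samples
  • t-test

    • The t-test is more widely applicable than the z-test. The z-test requires the population standard deviation, which is usually unknown in practice, so the t-test is the usual choice. When the sample size is large enough, the t distribution approaches the standard normal distribution and the t-test becomes almost identical to the z-test, so the z-test can be viewed as a limiting case of the t-test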
  • F-test

  • P-value

    • The probability, assuming the null hypothesis H0 is true, of observing evidence at least as extreme as the evidence at hand
  • confidence interval

  • correlation matrix

  • VIF

  • R2/ adjusted R2

  • ANOVA

  • Monte Carlo

  • Independent and identically distributed (IID)

    • An important assumption in machine learning
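
The central limit theorem item in the list above can be checked with a small simulation. The sketch below is only an illustration under assumed settings: the exponential population, the values of n and m, and the helper names `sample_means` and `draw_exponential` are choices made here, not part of the original notes.

```python
import random
import statistics

def sample_means(draw, n, m, seed=0):
    """Draw m groups of n samples each and return the m group means."""
    rng = random.Random(seed)
    return [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(m)]

# A clearly non-normal population: exponential with rate 1 (mean 1, std 1).
def draw_exponential(rng):
    return rng.expovariate(1.0)

n, m = 50, 5000
means = sample_means(draw_exponential, n, m)

# CLT prediction: the means are roughly normal with mean 1 and std 1 / sqrt(n).
se = 1 / n ** 0.5
print("mean of sample means:", round(statistics.fmean(means), 3))
print("std of sample means: ", round(statistics.stdev(means), 3))
print("predicted std:       ", round(se, 3))

# For a normal distribution about 95% of values lie within 1.96 stds of the mean.
inside = sum(abs(x - 1.0) <= 1.96 * se for x in means) / m
print("fraction within 1.96 predicted stds:", round(inside, 3))
```

Even though each draw is IID from a skewed distribution, the group means concentrate around the population mean with a spread close to the predicted standard error.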

A/B test

  • Sample size calculation
  • How increasing or decreasing each input (for example sample size, effect size, significance level) affects power
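
One common approximation for the sample size of a conversion-rate A/B test is the two-sided two-proportion z-test formula n = (z_{1-α/2} + z_{power})² · (p₁(1-p₁) + p₂(1-p₂)) / (p₂ - p₁)² per group. The sketch below assumes equal group sizes and an absolute minimum detectable effect; the function name `sample_size_per_group` and the example rates are hypothetical, and other formulas (for example with pooled variance) give slightly different numbers.

```python
import math
from statistics import NormalDist

def sample_size_per_group(p_base, mde, alpha=0.05, power=0.8):
    """Approximate users needed per arm for a two-sided two-proportion z-test.

    p_base: baseline conversion rate; mde: absolute lift to detect.
    """
    p_new = p_base + mde
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the significance level
    z_beta = NormalDist().inv_cdf(power)           # quantile matching the target power
    variance = p_base * (1 - p_base) + p_new * (1 - p_new)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / mde ** 2)

base = sample_size_per_group(0.10, 0.02)
smaller_effect = sample_size_per_group(0.10, 0.01)
higher_power = sample_size_per_group(0.10, 0.02, power=0.9)
stricter_alpha = sample_size_per_group(0.10, 0.02, alpha=0.01)

print("baseline:      ", base)
print("smaller effect:", smaller_effect)
print("higher power:  ", higher_power)
print("stricter alpha:", stricter_alpha)
```

Read in the other direction, this shows how each input moves power: at a fixed sample size, a smaller effect or a stricter significance level lowers power, while more users per group raises it.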

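Matrices

Eigenvalues and eigenvectors

Trace

  • The sum of the elements on the main diagonal
  • The trace of a matrix equals the sum of its eigenvalues (counted with multiplicity)
  • The trace of a covariance matrix is the sum of the variances of the individual variables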
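
The three trace facts above can be illustrated on a 2x2 covariance matrix. This is a minimal sketch with made-up toy data; it uses the closed-form eigenvalues of a symmetric 2x2 matrix so that it needs only the standard library, and the helper `sample_cov` is introduced here for illustration.

```python
import math
import statistics

# Hypothetical toy data: two features measured on five samples.
x = [2.0, 4.0, 4.0, 5.0, 7.0]
y = [1.0, 3.0, 2.0, 5.0, 4.0]

def sample_cov(a, b):
    """Unbiased sample covariance (divides by n - 1)."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / (len(a) - 1)

# Entries of the 2x2 covariance matrix [[sxx, sxy], [sxy, syy]].
sxx, syy, sxy = sample_cov(x, x), sample_cov(y, y), sample_cov(x, y)
trace = sxx + syy  # sum of the main-diagonal elements

# Eigenvalues are the roots of lambda^2 - trace * lambda + det = 0.
det = sxx * syy - sxy ** 2
root = math.sqrt(trace ** 2 - 4 * det)
eigenvalues = ((trace + root) / 2, (trace - root) / 2)

print("trace:             ", trace)
print("sum of eigenvalues:", sum(eigenvalues))
print("sum of variances:  ", statistics.variance(x) + statistics.variance(y))
```

All three printed values agree up to floating-point error.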

Calculus

In machine learning, calculus is used mainly for optimization.
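
As one illustration of calculus in optimization, the sketch below runs plain gradient descent on a toy one-dimensional objective. The objective, learning rate, step count, and function names are assumptions chosen for the example, not a recommended training setup.

```python
def gradient_descent(grad, x0, lr=0.1, steps=100):
    """Repeatedly step against the gradient, starting from x0."""
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x

# Toy objective f(x) = (x - 3)^2, whose derivative is f'(x) = 2 * (x - 3).
def grad_f(x):
    return 2 * (x - 3)

x_min = gradient_descent(grad_f, x0=0.0)
print(x_min)  # close to 3, the minimizer of f
```

The derivative tells the update which direction lowers the objective; here each step shrinks the distance to the minimizer by a constant factor of 0.8.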

Q&A

  • How do you determine the sample size for A/B testing?
  • What is p-value? What is confidence interval? Explain them to a product manager or non-technical person.
  • How do you understand the "Power" of a statistical test?
  • If a distribution is right-skewed, what's the relationship between median, mode, and mean?
  • When do you use T-test instead of Z-test? List some differences between these two.
  • Coin problem 1: How will you test whether a coin is fair or not? How will you design the process (sometimes you are asked to implement it in code)? What test would you use?
  • Coin problem 2: How do you simulate a fair coin with one unfair coin?
  • 3 door questions.
  • Bayes question: Tom takes a cancer test and the test is advertised as being 99% accurate: if you have cancer you will test positive 99% of the time, and if you don't have cancer, you will test negative 99% of the time. If 1% of all people have cancer and Tom tests positive, what is the probability that Tom has the disease? (A classic cancer-screening question; once you can do this one, the others will be no problem.)
  • How do you calculate the sample size for an A/B test?
  • If after running an A/B test you find that the desired metric (e.g., click-through rate) is going up while another metric (e.g., clicks) is decreasing, how would you make a decision?
  • Now assume your A/B test result is somewhat negative (e.g., p-value ~= 20%). How will you communicate this to the product manager? If, given the 20% p-value, the product manager still decides to launch the new feature, how would you present your suggestions and warnings?
  • Given some visitors and conversions, how do you compute significance?
  • What are type I and type II errors?
  • Pick three points at random on a circle. What is the probability that they form an acute triangle?
  • rejection sampling
  • Suppose you have a fair coin and toss it until heads has appeared twice. What is the expected number of tosses?
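
For the cancer-screening Bayes question in the list above, the numbers can be plugged straight into Bayes' rule: P(cancer | positive) = P(positive | cancer) · P(cancer) / P(positive). A minimal sketch follows; the function name `posterior_positive` and its parameter names are introduced here for illustration.

```python
def posterior_positive(prior, sensitivity, specificity):
    """P(disease | positive test) by Bayes' rule."""
    true_pos = sensitivity * prior                # has the disease and tests positive
    false_pos = (1 - specificity) * (1 - prior)   # healthy but tests positive
    return true_pos / (true_pos + false_pos)

p = posterior_positive(prior=0.01, sensitivity=0.99, specificity=0.99)
print(round(p, 4))  # 0.5
```

With a 1% base rate, true positives (0.99 × 0.01) and false positives (0.01 × 0.99) are equally likely, so a positive result from a "99% accurate" test leaves only a 50% chance that Tom has the disease.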

Reference