Build up a knowledge system over time, and keep adding to and refining it by reading papers and running experiments.
- Know common models thoroughly: principles, code, how to apply them in practice, pros and cons, common interview questions, etc.
- Interviews cover ML breadth, ML depth, ML application, and coding.
- Expect chains of "why" follow-ups, e.g. why does a particular trick work?
- Know the math behind each algorithm: write out its main formulas and be able to derive them on a whiteboard.
- For newer areas, interviewers may probe paper details.
- How each algorithm scales, and how to cast it in map-reduce form.
- Each algorithm's complexity, parameter count, and compute cost.
-
Hand-write basic algorithms, plus follow-ups on optimizing them
-
Extensions
- Given an LSTM network's structure, compute how many parameters it has.
- How is a convolution layer's output size computed? Write out the formula.
- Design a sparse matrix class (supporting addition, subtraction, multiplication, etc.); see the sketch after this list.
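A minimal Python sketch of all three items, under common assumptions (a vanilla LSTM with one bias per gate, and a dict-of-keys sparse layout); the function and class names are mine for illustration, not from any library:

```python
def lstm_param_count(input_size: int, hidden_size: int) -> int:
    """One LSTM layer: each of the 4 gates has a weight matrix over
    [x_t, h_{t-1}] plus a bias -> 4 * (h*(d+h) + h) = 4h(d+h+1).
    (PyTorch keeps two bias vectors per gate, adding 4*h more.)"""
    return 4 * hidden_size * (input_size + hidden_size + 1)

def conv_out_size(w: int, k: int, p: int = 0, s: int = 1) -> int:
    """Conv output size along one dimension: floor((W - K + 2P) / S) + 1."""
    return (w - k + 2 * p) // s + 1

class SparseMatrix:
    """Dict-of-keys sparse matrix: only nonzero entries are stored."""

    def __init__(self, shape, entries=None):
        self.shape = shape
        self.entries = dict(entries or {})  # (row, col) -> value

    def __add__(self, other):
        assert self.shape == other.shape
        out = dict(self.entries)
        for pos, v in other.entries.items():
            out[pos] = out.get(pos, 0) + v
        return SparseMatrix(self.shape, {p: v for p, v in out.items() if v})

    def __matmul__(self, other):
        assert self.shape[1] == other.shape[0]
        # Index other's entries by row so we only touch matching nonzeros.
        rows = {}
        for (k, j), v in other.entries.items():
            rows.setdefault(k, []).append((j, v))
        out = {}
        for (i, k), a in self.entries.items():
            for j, b in rows.get(k, ()):
                out[(i, j)] = out.get((i, j), 0) + a * b
        return SparseMatrix((self.shape[0], other.shape[1]),
                            {p: v for p, v in out.items() if v})
```

Sanity checks: `lstm_param_count(256, 512)` gives 4·512·769 = 1,574,912, and `conv_out_size(224, 7, p=3, s=2)` gives 112 (the ResNet stem).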
-
How to fix over-fitting / under-fitting in a neural network
- Over-fitting:
- From the data side: collect more training data; failing that, use data augmentation.
- Reduce model complexity: fewer/narrower layers in a neural network, shallower trees or pruning in tree models. Regularization, e.g. an L2 penalty. Ensemble methods such as bagging. (A PyTorch sketch of several of these follows the list.)
- Cross-validation to detect over-fitting.
- Train with more data.
- Data augmentation.
- Feature selection.
- Early stopping.
- Regularization.
- Ensemble methods.
- Pretrained models.
- Under-fitting:
- Add new features, increase model complexity, reduce the regularization coefficient.
- The first step when training a model is to confirm it can over-fit at all (e.g. drive training loss to near zero on a small subset).
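A minimal PyTorch sketch combining three of the remedies above (dropout, L2 via weight decay, early stopping); the data and architecture are synthetic stand-ins:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in data for a binary classification task.
X = torch.randn(1000, 20)
y = (X[:, 0] > 0).float()
train_loader = DataLoader(TensorDataset(X[:800], y[:800]), batch_size=32)
val_loader = DataLoader(TensorDataset(X[800:], y[800:]), batch_size=32)

model = nn.Sequential(
    nn.Linear(20, 64), nn.ReLU(),
    nn.Dropout(p=0.5),                  # dropout regularization
    nn.Linear(64, 1),
)
# weight_decay applies the L2 penalty mentioned above.
opt = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
loss_fn = nn.BCEWithLogitsLoss()

best_val, patience, bad = float("inf"), 5, 0
for epoch in range(100):
    model.train()
    for xb, yb in train_loader:
        opt.zero_grad()
        loss_fn(model(xb).squeeze(-1), yb).backward()
        opt.step()
    model.eval()
    with torch.no_grad():
        val = sum(loss_fn(model(xb).squeeze(-1), yb).item()
                  for xb, yb in val_loader) / len(val_loader)
    if val < best_val:                  # early stopping on validation loss
        best_val, bad = val, 0
    else:
        bad += 1
        if bad >= patience:
            break
```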
-
How to handle class imbalance
- https://imbalanced-learn.org/en/stable/user_guide.html
- For classification over a long-tail label distribution, keep only the head labels covering ~80% of the data and mark the rest as "others".
- If the imbalance is extreme (99.99% vs. 0.01%, spam-like), plain classification may not work; try other framings such as outlier/anomaly detection.
- This naturally extends to the question of easy vs. hard samples.
- Evaluation metric: AP (average_precision_score).
- Downsampling (the majority class):
- faster convergence, saves disk space; needs calibration afterwards, which is where upweighting comes in.
- Upweighting (the downsampled examples, by the downsampling factor):
- so that every original sample contributes equally to the loss and predicted probabilities stay calibrated (see the sketch below).
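A small scikit-learn sketch of class weighting plus the AP metric mentioned above; the dataset and its ~5% positive rate are synthetic, purely for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score
from sklearn.model_selection import train_test_split

# Toy imbalanced dataset: roughly 5% positives.
X, y = make_classification(n_samples=10_000, weights=[0.95, 0.05],
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# class_weight='balanced' upweights the minority class so each class
# contributes equally to the loss (the "upweight" idea above).
clf = LogisticRegression(class_weight="balanced", max_iter=1000)
clf.fit(X_tr, y_tr)

# AP is the suggested metric; accuracy would be misleading here.
scores = clf.predict_proba(X_te)[:, 1]
print("average precision:", average_precision_score(y_te, scores))
```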
-
How to handle missing data
-
How to handle high-cardinality categorical features
-
Optimizers, and how to choose one
- MSE or log-likelihood objective, optimized with gradient descent.
- SGD: when the training data is very large.
- Adam: for sparse inputs (see the sketch below).
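A PyTorch sketch of the two choices above; the tiny linear model and embedding are placeholders:

```python
import torch
from torch import nn

model = nn.Linear(10, 1)  # placeholder model

# Huge but dense training data: plain SGD (with momentum) is cheap per
# step and scales to arbitrarily many mini-batches.
opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

# Sparse inputs (bag-of-words, IDs): Adam's per-parameter adaptive
# learning rates keep rarely-updated weights learning at a sane pace.
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# For sparse gradients from an nn.Embedding, SparseAdam updates only
# the embedding rows that actually appeared in the batch.
emb = nn.Embedding(100_000, 64, sparse=True)
opt = torch.optim.SparseAdam(emb.parameters(), lr=1e-3)
```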
-
Data collection
- production data, with labels
- public Internet datasets
-
How to handle distribution mismatch
- "Distribution" here is not only about features; labels can shift too. For label shift, the main remedy is collecting more data, which reduces to the data-balancing problem above.
- When the data distribution changes, set up automatic retraining and deployment; if the monitored metrics drop too far, fall back to manual intervention and retrain. (A drift check is sketched below.)
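One common way to detect the shift that triggers retraining is a per-feature two-sample test; this sketch uses a Kolmogorov-Smirnov test (my choice for illustration, not something these notes prescribe) on synthetic stand-in data:

```python
import numpy as np
from scipy.stats import ks_2samp

# Compare a feature's training-time distribution against a window of
# recent production traffic (both synthetic here).
train_feature = np.random.normal(0.0, 1.0, size=5000)
prod_feature = np.random.normal(0.3, 1.0, size=5000)  # shifted

stat, p_value = ks_2samp(train_feature, prod_feature)
if p_value < 0.01:
    # Distribution shift detected: trigger auto-retraining, and escalate
    # to a human if metrics still drop too far after retraining.
    print(f"drift detected (KS={stat:.3f}); retraining pipeline triggered")
```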
-
Recommender systems: scaling, A/B testing, troubleshooting
-
How to reduce model latency
- smaller models
- knowledge distillation
- quantize the model to 8-bit or 4-bit (sketched below)
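For the 8-bit case, a minimal sketch with PyTorch post-training dynamic quantization (the toy model is a placeholder; 4-bit usually needs separate tooling such as bitsandbytes):

```python
import torch
from torch import nn

model = nn.Sequential(nn.Linear(256, 256), nn.ReLU(), nn.Linear(256, 10))
model.eval()

# Dynamic quantization stores Linear weights as int8 and runs their
# matmuls in int8; roughly 4x smaller and faster on CPU for
# Linear/LSTM-heavy models. Activations are quantized on the fly.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)
print(quantized)
```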
-
Generative vs Discriminative
- A generative model learns what the data in each category looks like, while a discriminative model only learns the distinctions between categories.
- Discriminative models will generally outperform generative models on classification tasks. Discriminative model learns the predictive distribution p(y|x) directly while generative model learns the joint distribution p(x, y) then obtains the predictive distribution based on Bayes' rule.
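In symbols, the generative route recovers the same predictive distribution via Bayes' rule:

$$ p(y \mid x) = \frac{p(x, y)}{p(x)} = \frac{p(x \mid y)\,p(y)}{\sum_{y'} p(x \mid y')\,p(y')} $$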
-
The bias-variance tradeoff is a central problem in supervised learning
-
Ideally, one wants to choose a model that both accurately captures the regularities in its training data and generalizes well to unseen data. Unfortunately, it is typically impossible to do both simultaneously.
-
High-variance learning methods may be able to represent their training set well but are at risk of overfitting to noisy or unrepresentative training data.
-
In contrast, algorithms with high bias typically produce simpler models that don't tend to overfit but may underfit their training data, failing to capture important regularities.
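For squared error this tradeoff is exact: assuming $y = f(x) + \varepsilon$ with noise variance $\sigma^2$, the expected test error at a point decomposes as

$$ \mathbb{E}\big[(y - \hat{f}(x))^2\big] = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2} + \underbrace{\operatorname{Var}\big[\hat{f}(x)\big]}_{\text{variance}} + \sigma^2 $$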
-
-
Parallelizing models
- linear/logistic regression
- XGBoost
- CNN
- RNN
- Transformer
- Inside deep learning frameworks, a single tensor multiplication is already parallelized internally. (A map-reduce-style data-parallel sketch follows.)
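As an illustration of the map-reduce idea from the first section applied to linear regression: a data-parallel gradient descent sketch in NumPy, with synthetic data and equal-sized shards (a real system would run the map step on separate workers):

```python
import numpy as np

def grad_shard(w, X, y):
    """Map step: least-squares gradient on one data shard."""
    return 2 * X.T @ (X @ w - y) / len(y)

rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 5))
w_true = np.arange(5.0)
y = X @ w_true + 0.1 * rng.normal(size=10_000)

shards = list(zip(np.array_split(X, 4), np.array_split(y, 4)))

w = np.zeros(5)
for _ in range(200):
    # Map: per-shard gradients; Reduce: average, then one global step.
    grads = [grad_shard(w, Xs, ys) for Xs, ys in shards]
    w -= 0.1 * np.mean(grads, axis=0)
print(w.round(2))  # ~= w_true
```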
- https://github.com/eriklindernoren/ML-From-Scratch
- https://github.com/resumejob/interview-questions
- https://github.com/2019ChenGong/Machine-Learning-Notes
- https://github.com/ctgk/PRML
- https://github.com/nxpeng9235/MachineLearningFAQ/blob/main/bagu.md
- https://docs.qq.com/doc/DR0ZBbmNKc0l3RGR2
- Answers to common ML interview "bagu" (boilerplate) questions
- Notes from ML/DL study and interview discussions