A list of resources on source code analysis using machine learning techniques (e.g., deep learning, PCA, SVM, Bayesian and probabilistic models, reinforcement learning).
Maintainers - Peter Teoh
Please feel free to open pull requests, email Peter Teoh ([email protected]), or join our chat to add links.
[Join the chat at https://gitter.im/tthtlc/awesome-source-analysis]
Machine-Learning-Guided Selectively Unsound Static Analysis http://www.seas.upenn.edu/~kheo/home/paper/icse17-heohyi.pdf
A Survey of Machine Learning for Big Code and Naturalness https://arxiv.org/pdf/1709.06182
Ariadne: Analysis for Machine Learning Programs https://arxiv.org/pdf/1805.04058
The use of machine learning with signal- and NLP processing of source code to fingerprint, detect, and classify vulnerabilities and weaknesses with MARFCAT https://arxiv.org/abs/1010.2511
VulDeePecker: A Deep Learning-Based System for Vulnerability Detection https://arxiv.org/pdf/1801.01681
code2vec: Learning Distributed Representations of Code https://arxiv.org/pdf/1803.09473
Automated software vulnerability detection with machine learning https://arxiv.org/abs/1803.04497
Automatic feature learning for vulnerability prediction https://arxiv.org/pdf/1708.02368
Neural Turing Machines https://arxiv.org/pdf/1410.5401.pdf
DeepCoder: Learning to Write Programs https://arxiv.org/abs/1611.01989
Recent Advances in Neural Program Synthesis https://arxiv.org/pdf/1802.02353
Neural-Guided Deductive Search for Real-Time Program Synthesis https://arxiv.org/pdf/1804.01186
RobustFill: Neural Program Learning under Noisy I/O https://arxiv.org/pdf/1703.07469
Neural Program Search: Solving Programming Tasks from Description https://arxiv.org/pdf/1802.04335
A Syntactic Neural Model for General-Purpose Code Generation https://arxiv.org/pdf/1704.01696
Building Machines That Learn and Think Like People https://arxiv.org/pdf/1604.00289
Differentiable Programs with Neural Libraries https://arxiv.org/pdf/1611.02109
Summary-TerpreT: A Probabilistic Programming Language for Program Induction https://arxiv.org/pdf/1612.00817
Auto-Documentation for Software Development https://arxiv.org/pdf/1701.08485
BOOK: Storing Algorithm-Invariant Episodes for Deep Reinforcement Learning https://arxiv.org/pdf/1709.01308
Boda-RTC: Productive Generation of Portable, Efficient Code ... https://arxiv.org/pdf/1606.00094
Making Neural Programming Architectures Generalize via Recursion https://arxiv.org/pdf/1704.06611
Differentiable Functional Program Interpreters https://arxiv.org/pdf/1611.01988
Deep Probabilistic Programming Languages: A Qualitative Study https://arxiv.org/pdf/1804.06458
BinPro: A Tool for Binary Source Code Provenance https://arxiv.org/pdf/1711.00830
A Survey on Compiler Autotuning using Machine Learning https://arxiv.org/pdf/1801.04405
Estimating defectiveness of source code: A predictive model using GitHub content https://arxiv.org/pdf/1803.07764
EMBER: An Open Dataset for Training Static PE Malware Machine Learning Models https://arxiv.org/pdf/1804.04637
On End-to-End Program Generation from User Intention by Deep Neural Networks https://arxiv.org/pdf/1510.07211
Utilizing Static Analysis and Code Generation to Accelerate Neural Networks https://arxiv.org/abs/1206.6466
DLPaper2Code: Auto-generation of Code from Deep Learning Research Paper https://arxiv.org/pdf/1711.03543
Inferring Generative Model Structure with Static Analysis https://arxiv.org/pdf/1709.02477
Sorting and Transforming Program Repair Ingredients via Deep Learning Code Similarities https://arxiv.org/pdf/1707.04742
DeepAPT: Nation-State APT Attribution Using End-to-End Deep Neural Networks https://arxiv.org/pdf/1711.09666
Automatic Structure Discovery for Large Source Code https://arxiv.org/pdf/1202.3335
Comment Generation for Source Code: Survey https://arxiv.org/pdf/1802.02971
Towards Reverse-Engineering Black-Box Neural Networks https://arxiv.org/abs/1711.01768
Database Reverse Engineering based on Association Rule Mining https://arxiv.org/pdf/1004.3272.pdf
Automated detection and classification of cryptographic algorithms in binary programs through machine learning https://arxiv.org/pdf/1503.01186
Automatically Generating Commit Messages from Diffs using Neural Machine Translation https://arxiv.org/pdf/1708.09492
When Coding Style Survives Compilation: De-anonymizing Programmers from Executable Binaries https://arxiv.org/pdf/1512.08546
Code smells https://arxiv.org/pdf/1802.06063
Data Driven Exploratory Attacks on Black Box Classifiers in Adversarial Domains https://arxiv.org/pdf/1703.07909
pix2code: Generating Code from a Graphical User Interface Screenshot https://arxiv.org/pdf/1705.07962
Deep Learning in Software Engineering https://arxiv.org/pdf/1805.04825
Predicting Software Defects Through SVM: An Empirical Approach https://arxiv.org/pdf/1803.03220
A Survey of Reverse Engineering and Program Comprehension https://arxiv.org/pdf/cs/0503068
OWASP Top 10 - 2017 https://www.owasp.org/images/7/72/OWASP_Top_10-2017_%28en%29.pdf.pdf
https://arxiv.org/pdf/1709.07101.pdf
https://arxiv.org/pdf/1805.05206.pdf
https://arxiv.org/pdf/1807.09160.pdf
https://arxiv.org/pdf/1806.07336.pdf
Or just search arxiv.org directly (expect some inaccuracies in the papers it identifies): recent arxiv.org search
LLVM-based vulnerability search
As an extension
(this site being an offshoot of the paper: https://arxiv.org/abs/1709.06182)
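
The arxiv.org search suggested above can also be scripted. The following is a minimal sketch, not part of this list's tooling: it assumes the public arXiv API endpoint at `export.arxiv.org/api/query` and its `search_query`, `start`, `max_results`, `sortBy`, and `sortOrder` parameters. The helper names `build_arxiv_query` and `parse_entries` and the example search phrases are hypothetical, chosen only for illustration.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

ARXIV_API = "http://export.arxiv.org/api/query"
ATOM = "{http://www.w3.org/2005/Atom}"  # namespace of the Atom feed the API returns


def build_arxiv_query(phrases, max_results=20):
    """Build an arXiv API URL matching every phrase in any field (hypothetical helper)."""
    # Quote each phrase and require all of them, newest submissions first.
    query = " AND ".join(f'all:"{p}"' for p in phrases)
    params = {
        "search_query": query,
        "start": 0,
        "max_results": max_results,
        "sortBy": "submittedDate",
        "sortOrder": "descending",
    }
    return f"{ARXIV_API}?{urlencode(params)}"


def parse_entries(atom_xml):
    """Extract (title, id URL) pairs from an Atom feed string (hypothetical helper)."""
    root = ET.fromstring(atom_xml)
    entries = []
    for entry in root.iter(f"{ATOM}entry"):
        # Titles in the feed may be wrapped across lines; collapse the whitespace.
        title = " ".join(entry.findtext(f"{ATOM}title", default="").split())
        link = entry.findtext(f"{ATOM}id", default="")
        entries.append((title, link))
    return entries


if __name__ == "__main__":
    # Example phrases are assumptions; substitute your own search terms.
    print(build_arxiv_query(["source code", "machine learning"], max_results=10))
```

The sketch only builds the query URL and parses a feed passed in as a string; fetching the URL (for example with `urllib.request.urlopen`) is left to the caller. As with the manual search, expect some unrelated papers in the results.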