ModelXRay: On-device Machine Learning Model Analyzer and Extractor for Android Apps

This repository is the static app analysis tool ModelXRay for our paper "Mind Your Weight(s): A Large-scale Study on Insufficient Machine Learning Model Protection in Mobile Apps"(USENIX Security'21). The main tool is modelxray.py, it requires a default config file modelxray.config. The input of ModelXRay can be either an Android APK file or a directory contains APK files, the output is a directory for each analyzed app. The directory stores the app's profile containing the detailed information about the model files, the machine learning libraries and the evidence. For each machine learning model, it also generate one line containing the model's meta data and stored in a file called reports.

ModelXTractor is based on Frida. For each app, ModelXRay generates the customized instrumentation script for Frida.

modelxray.py: our main tool ModelXRay.
modelxray.config: default config file for ModelXRay.
configure.sh: check the other tools that ModelXRay relies on.
app_collections.txt: statistics for our app collection (we do not provide the original app packages which are huge).
measure_lic_reuse: measure the license reuse among different apps.
gpu_usage_analysis: measure how widely GPU acceleration is used among ML apps/libraries.
intercept_scripts: intermediate scripts used by modelxray.py to generate analysis script for ModelXTractor.
measure_remote_models: measure how many apps use remote models (or cloud-based ML services).
model_encoding_analyzer: some tools used for analyzing model encoding.
ml_app_profile_analysis: further analysis based on the results generated by ModelXRay on our collected apps.

Requirements

ModelXRay is a static analysis tool, it requires a linux/macOS machine with the prerequisite tools listed in configure.sh.
ModelXTractor refers the dynamic analysis, it requires a rooted phone with Frida installed. You also need to install the Android app for analysis. Once this is set, you can run frida on your host machine with the instrumentation script(ModelXTractor) generated by ModelXRay. ModelXTractor will store the suspected model buffer on your phone, and you can pull it onto your host machine for verification.

Usage of ModelXRay

usage: modelxray [-h] [-c CONFIG_FILE] [-r] [-l] [-v] [-f] [-s] [-t] [-j] [-p]
                 [-d]    
                 apkpath                                               
                                                                       
positional arguments:                                                  
  apkpath               path to apk file or directory                                                                                          
                                   
optional arguments:    
  -h, --help            show this help message and exit               
  -c CONFIG_FILE, --config-file CONFIG_FILE
                        the path of modelxray config file                                                                                      
  -r, --regenerate-report                                              
                        regenerate report even if report is there
  -l, --log-file        store log in modelxray.log(default to stdout)
  -v, --verbose         verbose logging info                                                                                                   
  -f, --fast-run        run fast by only analyzing library and assets, not
                        smali code
  -s, --space-efficient                                                                                                                        
                        save space by not storing non-machine learning
                        decomposed apps
  -t, --test-only       donot do anything, just test work splitting for 
                        multiprocessing
  -j, --json-script     automatically generate json for dynamic
                        instrumentation java script
  -p, --package-name    use package name as output directory name, default use
                        apk path name
  -d, --decomposed-package
                        start analysis from already decomposed packages

Issues

If you encounter any problem while using our tool, please open an issue.

For other communications, you can email sun.zhi [at] northeastern.edu.

Citing our paper

@inproceedings {272264,
author = {Zhichuang Sun and Ruimin Sun and Long Lu and Alan Mislove},
title = {Mind Your Weight(s): A Large-scale Study on Insufficient Machine Learning Model Protection in Mobile Apps},
booktitle = {30th {USENIX} Security Symposium ({USENIX} Security 21)},
year = {2021},
url = {https://www.usenix.org/conference/usenixsecurity21/presentation/sun-zhichuang},
publisher = {{USENIX} Association},
month = aug,
}

Disclaimer

All implementations are only research prototypes!

Our code is NOT safe for production use! Please use it only for tests.

License

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Q&A

How to detect encrypted model file?

We use file entropy as indicator for encryption. File entropy in the context of computing measures the degree of randomness of the data. High entropy file fall into three categories:

P1: Pure Random Data
P2: Compressed files
P3: Encrypted files

For suspected model files with high entropy, we rule out P1, assuming it is a model file. Compressed files usually can be detected with file format checking. For example, we can use file extention(.zip, .gzip, etc.) to rule out P2. For the rest, we now have some confidence that the high entropy file is encrypted. See the paper for how we decide on the boundary of encryption for file entropy value. By the way, file entropy can easily calculated with Linux cmdline tool ent (see here).

Where to get the information about the apps?

Please find the app profiles under ml_app_profile_analysis/reports/ grouped by app store names. For example, ml_app_profile_analysis/reports/gplay.entropy_report.filtered are the suspected app with ML models from Google Play.

Note, the suffix are used to indicate what filter has been applied. .filtered means the initial suspected models are filtered with three metrics mentioned in the paper. .filtered.ecrypted means the encrypted models among the filtered models.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

ModelXRay: On-device Machine Learning Model Analyzer and Extractor for Android Apps

Contents

Requirements

Usage of ModelXRay

Issues

Citing our paper

Disclaimer

License

Q&A

How to detect encrypted model file?

Where to get the information about the apps?

About

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
gpu_usage_analysis		gpu_usage_analysis
intercept_scripts		intercept_scripts
measure_lic_reuse		measure_lic_reuse
measure_remote_models		measure_remote_models
ml_app_profile_analysis		ml_app_profile_analysis
model_encoding_analyzer		model_encoding_analyzer
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
app_collections.txt		app_collections.txt
configure.sh		configure.sh
modelxray.config		modelxray.config
modelxray.py		modelxray.py

License

RiS3-Lab/ModelXRay

Folders and files

Latest commit

History

Repository files navigation

ModelXRay: On-device Machine Learning Model Analyzer and Extractor for Android Apps

Contents

Requirements

Usage of ModelXRay

Issues

Citing our paper

Disclaimer

License

Q&A

How to detect encrypted model file?

Where to get the information about the apps?

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages