Skip to content

Unveiling DApp Inconsistencies using LLM and Dataflow-Guided Symbolic Execution

License

Notifications You must be signed in to change notification settings

shuo-young/Hyperion

Repository files navigation

Hyperion

Overview

Artifact repository of the work: Hyperion: Unveiling DApp Inconsistencies using LLM and Dataflow-Guided Symbolic Execution

Workflow

The design of Hyperion refers to the architecture shown below:

arch

Components

There are 4 main parts of our artifact:

  1. HyperText: Instruction-tuned model based on LLaMA2 for DApp description analysis. Please go to this directory for a detailed markdown file, illustrating the complete process of our experiments on the LLM, i.e., prompt design, prompt segmentation, and instruction-tune.

  2. HyperCode: DApp smart contract analysis:

    • gigahorse-toolchain: contains the decompiler and programmed datalog rules in clients/hyperion.dl, serving as a bytecode lifter for obtaining contract IR and dataflow information.

    • symbolic execution: an IR-based symbolic execution module, using the semantics recovered to guide targeted symbolic execution. The IR-based SE is extensible and programmed with the IR instructions.

      • ir_basic_blocks: constructing basic blocks and functions from decompiled contract IR.
      • ir_se: extensible symbolic engine on contract IR, encompassing IR-instruction rules.
      arch
      • state_extractor: on-chain data fetcher for checking attributes, e.g., programmed fee ratio.
      • vargenerator: classic generator for symbolic variables.
      • utils: some utility methods during symbolic execution.
    • semantic_parser: a semantic recovery module based on dataflow information and graph analysis.

      • semantic: main class for preparing recovered semantics from dataflow analysis and graph analysis.
      • decompilation: decompiler dispatcher and information collector for graph analysis.
      • graph_analyzer: connector that assembles dataflow information to prepare function-level information for directed symbolic execution.
      arch
    • nlp: a module for extracting key value attributes from HyperText outputs, based on NLTK.

    • main.py: the entrance for contract bytecode analysis, the supported parameters are showing below:

      -a, --address: Enter the contract address of the DApp.
      -bp, --blockchain_platform: Define the blockchain platform where the DApp contract is deployed (default: "ETH").
       including: 
          1. ETH
          2. Polygon
          3. Cronos
          4. Avalanche
          5. Fantom
          6. Moonbeam
          7. Arbitrum
          8. Tron
          9. Optimism
          10. Moonriver
          11. SKALE
          12. Base
          13. BSC
      -bn, --block_number: Set the tested blockchain snapshot.
      -v, --verbose: Enable verbose output for detailed logging.
      -n, --name: Name of the DApp.
      -d, --directory: Path to the output directory (default: "result").
    • global_params.py: the global parameter settings for contract bytecode analysis.

  3. DApp_dataset: dataset used in our experiments, due to the large file limitation of Github, we put more than 6GB data in an anonymous cloud disk. The dataset includes:

  4. experiment_result: Our experimental results of ground truth and wild DApps, please go further into this folder for detailed experiment information:

    • dataset: csv data that records (DApp's name, DApp contract address, blockchain platform), including ground truth and wild datasets.
    • ground_truth: the running output on ground truth dataset.
    • wild: the running output on wild dataset.
    • We also show details obout missing or wronly reported cases, combining with a real-world case study.

Some other Utils

  • exec: shell scripts used for batch running.
  • exp_utils: python scripts for analyzing experimental results.

About

Unveiling DApp Inconsistencies using LLM and Dataflow-Guided Symbolic Execution

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published