- Ana Vidal
- Simão Andrade
- Discord is a popular communication platform that can be used to exfiltrate data
- Data exfiltration can be done in many ways, such as:
- Sending messages
- Sending files
- Using voice channels
One of the biggest challenges in detecting data exfiltration via Discord is the fact that all communication is encrypted. When data is sent to a webhook, Discord uses HTTPS (HTTP over TLS), which means that the data is encrypted during transmission. This encryption makes it impossible for most network security devices, such as firewalls and Deep Packet Inspection (DPI) systems, which analyze individual packets to identify suspicious content, to directly inspect the content of the traffic. So, although it is possible to see that there is communication with Discord, it is not possible to analyze or filter the content of the data sent due to encryption, which makes this channel an ideal vehicle for exfiltrating data undetected.
Firewalls and SIEM systems based on fixed rules face a critical limitation here. Without access to the encrypted content of the packets, these systems can only monitor basic metadata, such as the destination IP address, the port, and the volume of data. However, as Discord is widely used and permitted on many corporate networks, this traffic appears legitimate and doesn't immediately raise suspicions.
Data exfiltration via Discord webhooks is a popular technique because it uses the platform's own infrastructure to mask malicious activity. Webhooks, designed to facilitate automated integrations and notifications, end up being exploited in cyber attacks, allowing sensitive information to be sent outside the network undetected.
Note
Although Discord has made some security updates (link), malicious users still take advantage of tools that allow the development of plugins.
Some examples of data exfiltration using Discord:
- The Hacker News - NS Stealer Uses Discord Bots to Exfiltrate Data
- Intel471 - How Discord is Abused for Cybercrime
- It has a built-in feature for sending automated messages to a text channel in a server (Webhooks) and an API for bot creation (used here through the discord.py library).
- It allows the upload of a variety of file types (e.g. PNG, PDF, MP4).
- The maximum file upload size is 10 MB.
- Network Pool:
  - 162.159.0.0/16 (owned by Cloudflare, Inc.)
  - Reference: NsLookup.io
- Protocols used for communication:
  - TCP
  - Destination port(s): TCP/80 and TCP/443
  - Source port(s): UDP/50000-65535
- Protocols used for voice communication/attachments:
  - QUIC
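The network indicators above can be turned into a simple pre-filter. The following is a minimal sketch, assuming destination IP and port have already been parsed from each packet; the function name `is_discord_candidate` is hypothetical. Because the range belongs to Cloudflare, a match only marks traffic as a candidate, not as confirmed Discord traffic.

```python
import ipaddress

# Address range quoted above; it is owned by Cloudflare, so other
# Cloudflare-fronted services may also fall inside it.
DISCORD_POOL = ipaddress.ip_network("162.159.0.0/16")
WEB_PORTS = (80, 443)


def is_discord_candidate(dst_ip, dst_port):
    """Return True when a packet targets the quoted pool on a web port."""
    return ipaddress.ip_address(dst_ip) in DISCORD_POOL and dst_port in WEB_PORTS


print(is_discord_candidate("162.159.135.232", 443))
print(is_discord_candidate("8.8.8.8", 443))
```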
Note
Additional information to look for:
- Sender's user data
- Receiver's user data
- Server where the message is sent through
- Type of data sent
- Notifications received
Wireshark captures (.pcap): link
To perform the analysis, the following data will be extracted:
- Group and Private Conversations – the conversation type is obtained at the packet level (uploads/downloads)
- Daily and Weekly message flow with various formats of files – analyzing the timestamps of interactions (uploads/downloads)
In a testing context, we are going to use:
- Wireshark: for network analysis
- Burp Suite: proxy tool for capturing traffic
In a real-world context, we could use:
- Syslog and Agents: to obtain data from endpoints and Discord activity
- Suricata or Palo Alto Networks Firewalls: to monitor the ports and protocols in use (e.g. TCP and QUIC) and detect unusual or unauthorized traffic.
We focus on packet data rather than flow data, since the traffic involves only a small number of flows. Each packet record has the following fields:
- IP Source
- IP Destination
- Packet Size (in bytes)
- Packet Timestamp (in seconds)
- IP Protocol Number
To convert our qualitative data into quantitative data, we chose a sampling interval of 1 second.
This gives a balance between the level of detail needed to capture relevant events and the volume of data generated.
The metrics obtained in each sampling interval are:
- Download/Upload Size of TCP Packets (in bytes)
- Download/Upload Size of UDP Packets (in bytes)
- Download/Upload Number of TCP Packets
- Download/Upload Number of UDP Packets
In the following order:
tcp_upload_packets, tcp_upload_bytes, udp_upload_packets, udp_upload_bytes, tcp_download_packets, tcp_download_bytes, udp_download_packets, udp_download_bytes
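A minimal sketch of this per-interval aggregation follows. It assumes packets are already available as `(timestamp, src, dst, size, proto)` tuples and that "upload" means the source address lies inside the client network. The function name `sample_packets` and the example addresses are hypothetical; this is an illustration, not the project's data_sampling.py.

```python
import ipaddress

TCP, UDP = 6, 17  # IANA IP protocol numbers


def sample_packets(packets, client_net, delta=1.0):
    """Aggregate packets into fixed-width time bins.

    Each output row follows the column order listed above:
    tcp_upload_packets, tcp_upload_bytes, udp_upload_packets, udp_upload_bytes,
    tcp_download_packets, tcp_download_bytes, udp_download_packets, udp_download_bytes
    """
    if not packets:
        return []
    net = ipaddress.ip_network(client_net)
    start = min(p[0] for p in packets)
    end = max(p[0] for p in packets)
    n_bins = int((end - start) // delta) + 1
    rows = [[0] * 8 for _ in range(n_bins)]
    for ts, src, dst, size, proto in packets:
        if proto not in (TCP, UDP):
            continue  # other protocols are ignored in this sketch
        if ipaddress.ip_address(src) in net:
            direction = 0  # upload: client -> server
        elif ipaddress.ip_address(dst) in net:
            direction = 1  # download: server -> client
        else:
            continue  # traffic that does not involve the client network
        col = direction * 4 + (0 if proto == TCP else 2)
        row = rows[int((ts - start) // delta)]
        row[col] += 1         # packet count
        row[col + 1] += size  # byte count
    return rows


example = [
    (0.1, "10.0.0.5", "162.159.1.1", 100, TCP),
    (0.5, "162.159.1.1", "10.0.0.5", 200, TCP),
    (2.2, "10.0.0.5", "162.159.1.1", 50, UDP),
]
for row in sample_packets(example, "10.0.0.0/24", delta=1):
    print(row)
```

Intervals with no traffic are kept as all-zero rows, which matters later when silence periods are measured.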
The feature extraction script (data_processing.py) extracts the following features:
- Mean and Variance of silence times
- Mean, Variance and 95th and 98th percentile of activity times
- Mean, standard deviation, 60th and 90th percentile of upload and download bytes for TCP and UDP (separately)
- Mean and standard deviation of total bytes
- Mean and standard deviation of number of packets
In the following order:
mean_silence_duration, variance_silence_duration, mean_activity_duration, variance_activity_duration, quartiles_activity_duration, tcp_upload_bytes_std_dev, tcp_download_bytes_std_dev, udp_upload_bytes_std_dev, udp_download_bytes_std_dev, tcp_upload_bytes_mean, tcp_download_bytes_mean, udp_upload_bytes_mean, udp_download_bytes_mean, quartiles_upload_bytes, quartiles_download_bytes, bytes_mean, bytes_std_dev, packets_mean, packets_std_dev
Important
The packet-count threshold used to separate silence from activity is 3.
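The silence and activity durations can be sketched as run lengths over the per-interval packet counts. This is a minimal illustration under one assumption not stated in the text: an interval is treated as silent when its total packet count is at most the threshold. The helper names `run_lengths` and `duration_stats` are hypothetical, and population variance is used for simplicity.

```python
from itertools import groupby
from statistics import mean, pvariance


def run_lengths(packet_counts, threshold=3):
    """Split a sequence of per-interval packet counts into silence/activity runs.

    Assumption: an interval is silent when its count is <= threshold.
    Returns two lists of run lengths, measured in sampling intervals.
    """
    silence, activity = [], []
    for is_silent, group in groupby(packet_counts, key=lambda c: c <= threshold):
        length = sum(1 for _ in group)
        (silence if is_silent else activity).append(length)
    return silence, activity


def duration_stats(runs):
    """Mean and population variance of run lengths (0.0, 0.0 if there are none)."""
    if not runs:
        return 0.0, 0.0
    return mean(runs), pvariance(runs)


counts = [0, 1, 10, 12, 0, 0, 0, 8]
silence_runs, activity_runs = run_lengths(counts)
print(silence_runs, activity_runs)
print(duration_stats(silence_runs), duration_stats(activity_runs))
```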
Benign Behavior: generated through normal usage of the application by:
- Humans: sending messages and files as usual
- Bots: made by plugins added to the server
Malicious Behavior: generated using three types of bots:
- Easy to Detect:
- Size: 10MB
- Frequency: Periodically (40s)
- Intermediate to Detect:
- Size: 1-10MB
- Frequency: Same variance as normal behavior
- Hard (almost impossible) to Detect: Through embedded images, using Discord CDN
Note
Command to create a random 10 MB file: dd if=/dev/urandom of=file.txt bs=1M count=10
The files used in the exfiltration process will be located in the data folder.
- Problem Brainstorm
- Data Collection
- Define the Features
- Data Processing
- Choose the Models and define the parameters
- Model Training
- Model Evaluation
.
├── presentation/ (folder with the slides)
│
├── src/
│ ├── .env (file with the Discord Token)
│ ├── data_sampling.py (script to sample the data)
│ ├── data_processing.py (script to extract the features)
│ ├── exfiltration_bot.py
│ ├── model.ipynb (notebook with the model selection)
│ ├── data/ (folder with the data to be exfiltrated)
│ ├── captures/ (folder with the captured data)
│ ├── samples/ (folder with the sampled data)
│ ├── features/ (folder with the extracted features)
│ └── requirements.txt
│
└── README.md

To create a bot in Discord, follow these steps:
- Go to the Discord Developer Portal
- Click on New Application
- Fill in the Name and click on Create
- Go to the Bot section and click on Add Bot
- Click on Copy to copy the Token and paste it in the .env file
- Go to the OAuth2 section and select the bot scope
- Copy the URL and paste it in the browser to add the bot to a server
References:
- Portal to build the bot: https://discord.com/developers/applications
- Tutorial in python: https://www.youtube.com/watch?v=UYJDKSah-Ww
- Add virtual environment:
python -m venv venv
- Enable the virtual environment:
venv\Scripts\activate
- Install the dependencies:
pip install -r requirements.txt
- Add the DISCORD_TOKEN in the .env file:
echo "DISCORD_TOKEN=<your_token>" >> .env
Important
The .env file should be in the src folder
To run the simpler version of the bot, which uses no prior behavior to exfiltrate data, run the following command:
python simple_exfiltration_bot.py
To run the bot that uses prior behavior to exfiltrate data, run the following command:
python complex_exfiltration_bot.py --input <input_file>
Where <input_file> is a CSV file with the packet capture data.
This is the command to sample the data (with a sampling interval of 1 second), given the discord_capture.pcap file:
python data_sampling.py --format 3 --input discord_capture.pcap --output <output_file> --delta 1 --cnet <client_network_pool> --snet 0.0.0.0/0
This is the command to extract the features using the multi-slide observation window (observation window of 5 minutes width and window slide of 30 seconds), given the output_file.txt file:
python data_processing.py --input output_file.txt --method 3 --width 300 --slide 30
Note
Since the sampling interval is 1 second, the width and slide of the observation window are given in seconds
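To make the width and slide parameters concrete, here is a minimal sketch of a plain sliding observation window over the sampled rows. The function name `sliding_windows` is hypothetical, and this single-width version is only an illustration; it is not necessarily what `--method 3` does internally.

```python
def sliding_windows(samples, width=300, slide=30):
    """Yield consecutive windows of `width` samples, advancing by `slide`.

    With a 1-second sampling interval, width=300 and slide=30 correspond to
    a 5-minute window that moves forward 30 seconds at a time.
    Trailing samples that do not fill a whole window are dropped.
    """
    for start in range(0, len(samples) - width + 1, slide):
        yield samples[start:start + width]


# Small example: 10 samples, windows of 4 samples, sliding by 3.
for window in sliding_windows(list(range(10)), width=4, slide=3):
    print(window)
```

Each window would then be passed to the feature extraction step, producing one feature vector per window.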
- Autoencoders
- Type: Neural Network-based
- Use Case: Anomaly detection
- How it works:
- Train an autoencoder to reconstruct "normal" network traffic patterns from packet data.
- During inference, unusual traffic (indicative of exfiltration) will have a higher reconstruction error.
- Why suitable: Autoencoders work well with unlabeled data and are ideal for detecting anomalies like data exfiltration.
- Isolation Forest
- Type: Tree-based anomaly detection
- Use Case: Identify outlier network sessions
- How it works:
- Isolation Forest isolates data points by randomly partitioning feature space.
- Exfiltration traffic, which is rare or abnormal, is "isolated" faster.
- Why suitable: Works efficiently with high-dimensional data like packet captures.
- One-Class SVM (Support Vector Machine)
- Type: Kernel-based anomaly detection
- Use Case: Classifies normal vs. anomalous behavior
- How it works:
- Trains on normal packet behavior to create a decision boundary.
- Exfiltration (anomalous data) lies outside the learned boundary.
- Why suitable: Handles packet-level feature extraction well and doesn't require labels.
- Normalization using MinMaxScaler
- Train with normal behavior and test with 50/50 normal and malicious behavior
- PCA
- Linear discriminant analysis
- Non-negative Matrix Factorization
- Generalized discriminant analysis
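The training setup described above (MinMaxScaler normalization, optional dimensionality reduction, training on normal behavior only, testing on a 50/50 mix) can be sketched with scikit-learn. This is a minimal illustration on synthetic data, not the project's model.ipynb: the feature values, the number of PCA components, and the hyperparameters are all assumptions chosen for the example.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import IsolationForest
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)

# Synthetic stand-ins for the extracted feature vectors (hypothetical data).
normal_train = rng.normal(loc=0.0, scale=1.0, size=(200, 6))
normal_test = rng.normal(loc=0.0, scale=1.0, size=(50, 6))
malicious_test = rng.normal(loc=8.0, scale=1.0, size=(50, 6))

# Test set with 50% normal and 50% malicious samples.
X_test = np.vstack([normal_test, malicious_test])
y_test = np.array([1] * 50 + [-1] * 50)  # 1 = normal, -1 = anomaly

models = {
    "isolation_forest": IsolationForest(random_state=0),
    "one_class_svm": OneClassSVM(kernel="rbf", gamma="scale", nu=0.05),
}

results = {}
for name, model in models.items():
    # Scale to [0, 1], reduce dimensionality, then fit on normal traffic only.
    pipeline = make_pipeline(MinMaxScaler(), PCA(n_components=3), model)
    pipeline.fit(normal_train)
    predictions = pipeline.predict(X_test)  # 1 = inlier, -1 = outlier
    results[name] = float((predictions == y_test).mean())
    print(f"{name}: accuracy = {results[name]:.2f}")
```

Fitting the scaler and PCA inside the pipeline keeps them trained on normal data only, so no information from the malicious test samples leaks into preprocessing.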