Closed
Labels: misdetection (This issue is about a misdetection on a content type currently supported)
Description
Hello,
This Python script modifies any zip file so that Magika misdetects it as a non-zip file type. It does this by prepending a few bytes found through a random search:
import binascii
import random

from magika import Magika

# Read the original (correctly detected) zip file.
with open("test.zip", "rb") as f:
    x = f.read()

m = Magika()
prefix = b""
# Labels that still count as a correct, zip-based detection.
ZIP_FILE_FORMATS = ["zip", "jar", "rpm", "epub", "ods"]

for _ in range(10000):
    old_res = m.identify_bytes(prefix + x)
    new_prefix = prefix
    # Pick one random mutation of the prefix (REPLACE is weighted x2).
    operation = random.choice(
        ["REMOVE_FIRST", "REMOVE_LAST", "ADD_FIRST", "ADD_LAST"] + ["REPLACE"] * 2
    )
    if operation == "REMOVE_FIRST" and len(new_prefix) > 1:
        new_prefix = new_prefix[1:]
    elif operation == "REMOVE_LAST" and len(new_prefix) > 1:
        new_prefix = new_prefix[:-1]
    elif operation == "ADD_FIRST":
        new_prefix = bytes([random.randint(0, 255)]) + new_prefix
    elif operation == "ADD_LAST":
        new_prefix = new_prefix + bytes([random.randint(0, 255)])
    elif operation == "REPLACE" and len(new_prefix) >= 1:
        pos = random.randint(0, len(new_prefix) - 1)
        new_prefix = (
            new_prefix[:pos] + bytes([random.randint(0, 255)]) + new_prefix[pos + 1 :]
        )
        # A replacement must not change the prefix length.
        assert len(new_prefix) == len(prefix)
    new_res = m.identify_bytes(new_prefix + x)
    # Stop as soon as neither the final output nor the raw model prediction
    # is a zip-based format.
    if (
        new_res.output.ct_label not in ZIP_FILE_FORMATS
        and new_res.dl.ct_label not in ZIP_FILE_FORMATS
    ):
        print("success: prefix=", binascii.hexlify(new_prefix), "result=", new_res)
        break
    # Greedy hill climbing: keep the mutation only if it lowers the output score.
    if new_res.output.score < old_res.output.score:
        prefix = new_prefix

# Write the last candidate (the successful one if the loop stopped early).
with open("out.zip", "wb") as f:
    f.write(new_prefix + x)

The script above produces zip files that are misdetected as JPEG, pcap, etc., even though the prepended bytes are not valid JPEG or pcap magic numbers.
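The search in the script above is a greedy hill climb over a byte prefix: mutate the prefix, keep the mutation only if the score drops, and stop once a success condition holds. A minimal sketch of the same loop follows, with Magika swapped out for a hypothetical toy_score function so it runs without the model. The names mutate, hill_climb, toy_score, and the placeholder payload are illustrative assumptions, not part of Magika or of the original script.

```python
import random
from typing import Callable, Optional

# The same six-way choice as the script: REPLACE is listed twice, so it is
# picked twice as often as each other operation.
OPERATIONS = ["REMOVE_FIRST", "REMOVE_LAST", "ADD_FIRST", "ADD_LAST", "REPLACE", "REPLACE"]


def mutate(prefix: bytes, rng: random.Random) -> bytes:
    """Return a copy of `prefix` with one random edit applied (or unchanged)."""
    op = rng.choice(OPERATIONS)
    if op == "REMOVE_FIRST" and len(prefix) > 1:
        return prefix[1:]
    if op == "REMOVE_LAST" and len(prefix) > 1:
        return prefix[:-1]
    if op == "ADD_FIRST":
        return bytes([rng.randint(0, 255)]) + prefix
    if op == "ADD_LAST":
        return prefix + bytes([rng.randint(0, 255)])
    if op == "REPLACE" and prefix:
        pos = rng.randrange(len(prefix))
        return prefix[:pos] + bytes([rng.randint(0, 255)]) + prefix[pos + 1 :]
    return prefix  # the chosen operation does not apply to this prefix


def hill_climb(
    score: Callable[[bytes], float],
    payload: bytes,
    steps: int = 10_000,
    seed: int = 0,
    done: Optional[Callable[[bytes], bool]] = None,
) -> bytes:
    """Greedily search for a prefix that lowers score(prefix + payload).

    Returns the first candidate for which `done` is true, or the best
    prefix found after `steps` iterations."""
    rng = random.Random(seed)
    prefix = b""
    best = score(payload)
    for _ in range(steps):
        candidate = mutate(prefix, rng)
        if done is not None and done(candidate + payload):
            return candidate
        cand_score = score(candidate + payload)
        if cand_score < best:  # accept strict improvements only
            prefix, best = candidate, cand_score
    return prefix


def toy_score(data: bytes) -> float:
    """Hypothetical stand-in for a classifier's "zip" confidence: it drops
    as the b"PK" signature is pushed further from offset 0."""
    offset = data.find(b"PK")
    return 0.0 if offset < 0 else 1.0 / (1 + offset)


# Placeholder payload: a zip-style signature plus padding (not a real archive).
payload = b"PK\x03\x04" + bytes(26)
prefix = hill_climb(
    toy_score, payload, steps=200, seed=1, done=lambda d: toy_score(d) < 0.1
)
print(len(prefix), prefix.hex(), toy_score(prefix + payload))
```

Separating the mutation step from the search loop makes it easy to plug in a real scorer (for example, a function returning Magika's output score) without changing the search itself.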
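Example output (the raw model prediction, dl.ct_label, is jpeg or pcap, while the final output.ct_label is unknown):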
success: prefix= b'bed801' result= MagikaResult(path='-', dl=ModelOutputFields(ct_label='jpeg', score=0.8492175936698914, group='image', mime_type='image/jpeg', magic='JPEG image data', description='JPEG image data'), output=MagikaOutputFields(ct_label='unknown', score=0.8492175936698914, group='unknown', mime_type='application/octet-stream', magic='data', description='Unknown binary data'))
success: prefix= b'04b224' result= MagikaResult(path='-', dl=ModelOutputFields(ct_label='pcap', score=0.5953978300094604, group='application', mime_type='application/vnd.tcpdump.pcap', magic='pcap capture file', description='pcap capture file'), output=MagikaOutputFields(ct_label='unknown', score=0.5953978300094604, group='unknown', mime_type='application/octet-stream', magic='data', description='Unknown binary data'))
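The two prefixes above can be checked against well-known leading signatures. The sketch below assumes a tiny in-memory zip as a stand-in for test.zip and covers only the JPEG start-of-image marker (FF D8), the classic pcap magic numbers in both byte orders, and the zip local file header (PK\x03\x04). It also opens the prefixed data with Python's zipfile module, used here as one example reader. zipfile locates the central directory from the end of the file, so it tolerates prepended bytes; other zip tools may behave differently.

```python
import io
import zipfile

# Leading signatures at offset 0 (only the ones relevant to this issue).
JPEG_SOI = bytes.fromhex("ffd8")  # JPEG start-of-image marker
PCAP_MAGICS = [
    bytes.fromhex("a1b2c3d4"),  # pcap, microsecond timestamps, big-endian
    bytes.fromhex("d4c3b2a1"),  # pcap, microsecond timestamps, little-endian
    bytes.fromhex("a1b23c4d"),  # pcap, nanosecond timestamps, big-endian
    bytes.fromhex("4d3cb2a1"),  # pcap, nanosecond timestamps, little-endian
]
ZIP_LOCAL_HEADER = b"PK\x03\x04"  # zip local file header signature


def leading_magic(data: bytes) -> dict:
    """Report which of the signatures above `data` starts with."""
    return {
        "jpeg": data.startswith(JPEG_SOI),
        "pcap": any(data.startswith(m) for m in PCAP_MAGICS),
        "zip": data.startswith(ZIP_LOCAL_HEADER),
    }


def make_zip() -> bytes:
    """Build a tiny in-memory zip as a stand-in for test.zip."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("hello.txt", "hello")
    return buf.getvalue()


original = make_zip()
for hex_prefix in ("bed801", "04b224"):  # prefixes from the results above
    data = bytes.fromhex(hex_prefix) + original
    # The archive is still readable despite the prepended bytes.
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        content = zf.read("hello.txt")
    print(hex_prefix, leading_magic(data), content)
```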