Arm64 adrp instruction disassembly incorrect #22676
Comments
Try r2 from git |
Same behavior on current dev:
I was able to work around the issue by making radare2 output JSON, which gives me the opcode without disassembly.
The "refs" attribute is also very revealing here. I believe this is an issue with radare2. |
Yes, this is "correct" and the expected behaviour: the address in 0x80000 is exactly the same address as loc.imp.strcmp, so the name is expected, because the disassembler filters that value into the matching flag name. The emulation shows the final computation of the value, which happens in the add instruction after that adrp, and the 0x8000 is actually shown as a comment in the disassembly. So I don't see the problem here, because the code can do an adrp without any add, and then you won't see which symbol is being referenced. And yes, the value shown by the disassembler is computed from the actual instruction position. So let me clarify:
|
Either way, it can't be a reference to strcmp. adrp is used here to initialize structs with constant values. The instructions are from the following function, which doesn't reference strcmp at all:
Here's how Ghidra disassembles the instruction, in case it's of any help:
Ghidra's "Basic Constant Reference Analyzer" correctly determines the constant values, and points me to the correct initial values in data. |
For reference, this is the output from objdump:
which is referencing "main". What I see is that those addresses are patched by relocs, and you can solve this by compiling an executable or a library instead of an object file. Yes, r2, objdump, and many other tools have issues when processing object files, because relocs and ELF objects are a big mess. The same output from objdump for the compiled executable is this:
which in r2 looks like this:
So the constant is there, and the root issue is how object files and relocs are handled. There's nothing wrong with adrp here, IMHO. |
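To illustrate what "patched by relocs" means for an adrp/add pair, here is a minimal Python sketch. It assumes the two relocation types a compiler typically emits for this pattern, R_AARCH64_ADR_PREL_PG_HI21 and R_AARCH64_ADD_ABS_LO12_NC; the thread does not say which relocs the sample object actually uses. The function names and the addresses in the usage note are hypothetical, and this is not radare2's implementation.

```python
def patch_adrp_pg_hi21(insn, place, sym_addr, addend=0):
    """Apply R_AARCH64_ADR_PREL_PG_HI21: Page(S + A) - Page(P).

    insn     -- unrelocated 32-bit ADRP opcode
    place    -- address P of the instruction being patched
    sym_addr -- resolved symbol address S (hypothetical input)
    """
    page_delta = ((sym_addr + addend) & ~0xFFF) - (place & ~0xFFF)
    imm21 = (page_delta >> 12) & 0x1FFFFF   # 21-bit two's complement page count
    immlo = imm21 & 0x3                     # goes to bits 30:29
    immhi = imm21 >> 2                      # goes to bits 23:5
    # Keep the opcode bits and Rd, clear the old immediate fields.
    return (insn & 0x9F00001F) | (immlo << 29) | (immhi << 5)


def patch_add_lo12(insn, sym_addr, addend=0):
    """Apply R_AARCH64_ADD_ABS_LO12_NC: low 12 bits of (S + A)."""
    imm12 = (sym_addr + addend) & 0xFFF
    # imm12 of ADD (immediate) lives in bits 21:10.
    return (insn & 0xFFC003FF) | (imm12 << 10)
```

For example, with a hypothetical symbol at 0x412345 and the adrp placed at 0x400010, `patch_adrp_pg_hi21(0x90000000, 0x400010, 0x412345)` yields 0xD0000080 and `patch_add_lo12(0x91000000, 0x412345)` yields 0x910D1400. Until such a patch is applied, the immediate fields of both instructions are typically zero, which is why a disassembler that ignores the relocs shows a meaningless target.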
Looking further into this object file, I've spotted a bunch of other issues:
It would be good to extend the #22690 PR to add support for the remaining arm64 relocs; would you like to give it a try? Otherwise, as a workaround, I would suggest compiling the executable or library and using it instead of the object file. Thanks for the sample files. Is it OK with you if I push them into the testsuite? |
Found another issue in the ELF imports: all of them are located at the same address because the plt section is not yet available. Going to fix that in another PR too :) Fixed here: #22692 |
Sure. |
Another improvement here would be to ignore flags pointing to the base address of the loaded module; there's no point in showing a section name if that is not set. Still, the right fix here would be to support the needed relocs properly. Let me know about adding this binary to the testsuite. |
I will add your sample and some tests now. Thanks! Let me know if you need some guidance about the relocs. The code is a little broken up into different parts that patch, enumerate, and so on, and the relocs are actually stored 3 times. I may find some time after the release to fix that across the 90 bin plugins. But for your needs none of this should affect you, and it's only ELF related. |
Added tests here: 8471613. Let me know if you need some help implementing the relocs. Feel free to join the Discord or Telegram channels for more interaction. |
ping? |
Environment
Description
Disassembly of adrp instructions does not make sense. Take the following code, which initializes a struct:
Here's radare2's disassembly:
adrp takes a page offset; the resulting address is calculated relative to the 4 KiB page of the current instruction. I assume that r2 interprets the offset as an absolute address, resulting in the disassembly showing "loc.imp.strcmp".
Is this expected behavior? If not, can someone point me to where I can fix this?
Also, is there a way to make radare2 give me the raw operand value instead of symbols?
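For reference, the adrp computation described above can be sketched in a few lines of Python. The field layout follows the A64 encoding of ADRP (immlo in bits 30:29, immhi in bits 23:5, Rd in bits 4:0); the function name and the sample addresses are hypothetical and are not taken from the attached binary.

```python
def decode_adrp(insn, pc):
    """Decode a 32-bit ADRP opcode located at address pc.

    Returns (rd, target), where target is the 4 KiB page address
    that ends up in register Xd.
    """
    # ADRP has bit 31 set and bits 28:24 equal to 0b10000.
    if (insn & 0x9F000000) != 0x90000000:
        raise ValueError("not an ADRP instruction")
    rd = insn & 0x1F
    immlo = (insn >> 29) & 0x3
    immhi = (insn >> 5) & 0x7FFFF
    imm21 = (immhi << 2) | immlo
    if imm21 & (1 << 20):            # sign-extend the 21-bit page count
        imm21 -= 1 << 21
    # Target = page of the instruction + signed page offset.
    target = ((pc & ~0xFFF) + (imm21 << 12)) & 0xFFFFFFFFFFFFFFFF
    return rd, target
```

With an immediate of zero, as in `decode_adrp(0x90000000, 0x08000040)`, the result is register 0 and the page of the instruction itself (0x08000000 in this made-up case). That would be one way to end up with a target that coincides with whatever flag sits at the start of the mapped file.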
Ghidra's output makes more sense:
I don't know why the instruction bytes differ for that specific instruction, but I have verified that radare2's output corresponds to the actual file while Ghidra's has changed, probably due to a performed analysis.
Here are the files I used (source and binary), in case anyone is interested.
random_39855.zip