Better support for real-mode segmented addresses #8366
@astrelsky mmm, I saw! Would like @caheckman to either approve (and merge) or critique it though.
@bluddy does this handle Windows 16-bit Protected Mode?
Hi Yotam, I think you may need to increment FORMAT_VERSION = 4 in SlaFormat.java & slaformat.cc. For ideas on modifying the loader, have a look at these two projects. I'm really curious about how to replicate your adventure and apply the same to better decompilation of win16 NE (NewExecutable) programs. Could you post a blog somewhere on how you went about using an AI to author these changes? Cheers
I think it mostly does, but I need a small change to the pattern matching in ia.sinc. It needs to be tested though. And then there's the story of setting up the segments in the first place based on the headers and all that -- no idea if it'll work correctly (I haven't touched that stuff, and it wasn't impressive on a dos executable). But in theory, once you have the segments set up, this should work just as well for protected mode.
Thanks, I'll update FORMAT_VERSION, and thanks for the pointers to those projects -- I'll check them out when I get a chance. I appreciate the kind words, but I think this really is going to be the new normal. I do have years of experience in low-level stuff, compilers and C++, which means I can identify when the AI is going off-track and understand the patterns, but I really don't know very much about the ghidra codebase -- I was just guiding Claude Sonnet 4.0 via Cursor and occasionally asking about the architecture. We're going to need to get used to a world where there are no programming limits if you have decent experience. It sounds bombastic and dramatic but it's just the truth. I'll perhaps document this effort, but I don't consider it more than a minor detour on my path to what I really wanted, which is to accelerate the decompilation and rebuilding of old games, which I do for fun and relaxation. My previous effort on Railroad Tycoon in IDA took me about two years, and I estimate that I've done a month's worth of work in 2 days with ghidra-mcp. Any improvements to Ghidra that might result from my efforts would simply be icing on the cake.
I took care of ia.sinc cleanly, though I can't be sure support for 16-bit protected mode will work. I also incremented the format version. I'm not seeing any issues after analyzing large chunks of code, so from my perspective this seems ready to be taken out of draft mode, which I will go ahead and do. Loading segments from a file and relocation is a separate topic, and I'd need to test that on a large number of executables. Also, interrupt information (provided by the repo mentioned above in Ghidra Toolbox) isn't really relevant, as the AI can identify interrupts and add comments to them reliably.
ghidra_memory_exploration.md
```sleigh
rel16: reloc is simm16 [ reloc=((inst_next >> 16) << 16) | ((inst_next + simm16) & 0xFFFF); ]
```
This incorrectly tried to extract segment values from linear address upper bits, which is mathematically impossible since multiple segment:offset combinations map to the same linear address.
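The aliasing claim can be checked with a few lines of arithmetic. The sketch below is only an illustration and is not part of the PR: it assumes plain 8086 real-mode translation (linear = segment * 16 + offset, ignoring 20-bit wraparound), and the helper names `linear` and `aliases` are made up for this example.

```python
def linear(segment: int, offset: int) -> int:
    """Real-mode translation: segment * 16 + offset (20-bit wrap ignored)."""
    return ((segment & 0xFFFF) << 4) + (offset & 0xFFFF)


def aliases(linear_addr: int) -> list[tuple[int, int]]:
    """Enumerate every segment:offset pair that maps to linear_addr."""
    pairs = []
    for seg in range(0x10000):
        off = linear_addr - (seg << 4)
        if 0 <= off <= 0xFFFF:
            pairs.append((seg, off))
    return pairs


# Two different segment:offset pairs, one linear address.
same_a = linear(0x1234, 0x0005)
same_b = linear(0x1000, 0x2345)

# Number of distinct pairs that reach linear address 0x12345.
pairs = aliases(0x12345)

# What the quoted expression treats as the "segment base" for that address.
guessed_base = (0x12345 >> 16) << 16
```

Since thousands of pairs reach the same linear address, shifting the linear address can recover the actual segment base only when that base happens to be a multiple of 0x10000.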
I'm not sure, but from my limited understanding of Ghidra's loading process (for Windows 3.XX EXE/DLLs using ProtectedMode), the loader does fix the addresses such that the original rel16: reloc is simm16 {...} works. Therefore there may be other consequences to making that change!
The new version I pushed has special handling for protected mode using pattern matching. From what I can tell, the old implementation was not correct for 16-bit protected mode, since it assumes that every segment starts at an address whose lower 16 bits are 0, which is not a valid assumption. My new implementation should be correct, but it needs to be tested on 16-bit protected mode executables. I'll try to dig one up.
Sorry, accidentally used my work account. The comment came from me.
How do your changes fit with comments like #769 (comment)?
Also, in my git copy of Ghidra I've made several changes (not all in git atm) via #2315 & #3331. Do you expect your changes will conflict?
Is there somewhere we could compare notes offline?
- While there is already some support for segments, especially on the Java side, the way the segment info was externalized was incomplete because there was no next_seg kind of variable for the scripts. This meant we couldn't use the actual segment info and instead had to try to guess the correct segment address from the linear address. This works some of the time for protected mode, and very little of the time for real mode. So I think that comment is partially true - support was added, but it was somewhat incomplete.
- I don't think there should be any conflict, since your changes seem orthogonal to what I did. Hard to say for sure until everything is merged together though.
- I don't really have many more notes than what's here. I'm currently using my patched version of ghidra to disassemble a 16-bit real mode program, and I'm not running into any issues so far. All the issues ghidra initially had with the program seem to have been resolved, and the LLM Agent is working ghidra quite intensively as it renames and pokes around at stuff.

I do want to find a 16-bit Windows executable to test out and make sure that's not broken. Perhaps the old windows Sierra interpreter? I'm open to suggestions.
After applying your changes, would I need to start my analysis again (i.e., would it be necessary to load the 16-bit apps again for the segments to be understood), or could I just continue using what I've already done?
You shouldn't need to load the apps from scratch again - I didn't touch segment initialization logic anyway.
The thing to watch out for is anything getting messed up, like a jump, a call or a data memory access that previously displayed properly. It's not a bad idea to back up your data first, just in case, if you're working on something important.
If there's any improvement on the 16-bit protected mode front with my PR, it would be with segments that don't start at multiples of 0x1000. It's possible that by default, ghidra always assigns those particular segment start addresses and everything looks fine. But if you were to dump data from a debugger and load it into ghidra, it would have various kinds of starting offsets for segment mappings, and a relative jump could get messed up with the logic in master, as it does for real mode.
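To make the failure mode concrete, here is a small sketch comparing the two ways of computing a rel16 target. It is an illustration under stated assumptions, not the code in this PR: `rel16_master` transcribes the arithmetic of the rel16 expression quoted earlier in this review thread, while `rel16_segment_aware` is a hypothetical helper that assumes the segment's linear base address is known and wraps the offset within that segment.

```python
def rel16_master(inst_next: int, simm16: int) -> int:
    """Arithmetic of the quoted expression: keep the upper bits of the
    linear address and wrap only the low 16 bits."""
    return ((inst_next >> 16) << 16) | ((inst_next + simm16) & 0xFFFF)


def rel16_segment_aware(seg_base: int, inst_next: int, simm16: int) -> int:
    """Hypothetical alternative: wrap the 16-bit offset relative to the
    actual segment base, then convert back to a linear address."""
    offset = (inst_next - seg_base + simm16) & 0xFFFF
    return seg_base + offset


# Segment based at a multiple of 0x10000: both computations agree.
aligned_master = rel16_master(0x2FFF0, 0x20)
aligned_aware = rel16_segment_aware(0x20000, 0x2FFF0, 0x20)

# Segment based at 0x12340 (e.g. from a memory dump): a short forward jump
# that crosses a 64 KiB linear boundary lands in the wrong place.
dumped_master = rel16_master(0x1FF40, 0x100)
dumped_aware = rel16_segment_aware(0x12340, 0x1FF40, 0x100)
```

In the second case the segment-relative computation gives 0x20040, while the linear-address arithmetic gives 0x10040, which is below the start of the segment.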
Thanks. I'll duplicate my git+ changes, export all my analysed work, apply your changes to the duplicate, import my exported files and proceed from there!
It'll take some time, but I'll let you know.
Updates:
In any case, I made sure not to harm the current ghidra hack for protected mode.
I wanted to use ghidra with mcp to quickly disassemble old dos executables I was interested in. Soon though, I found ghidra isn't ready for use on these executables. There is a bit of support for segmentation on the Java side, but not nearly enough.
Normally it would have ended there, but in the era of AI, there's no reason to stop there. I guided the AI to add features I thought would be in the spirit of the code: architecture-neutral with architecture-specific stuff residing in the scripting.
The detailed logs can be found in the .md files -- required to prevent the AI from forgetting its work. I'm dogfooding this as we speak, so if I come across more issues, I might need to add more support.
Oh, this is also based on 11.3.2. It needs to be ported to master, but the mcp server https://github.com/LaurieWired/GhidraMCP only supports 11.3.2 currently.
EDIT: Never mind! I ported it to master, and it turns out mcp support still works.
Also, I'm not sure if ghidra's exe import works well in terms of setting up the segments. That might need fixing as well. The problem is I'd need to test this against multiple exes, and it's easier for me to just dump the memory segments from dosbox and load them into ghidra.
NOTE: The x86-specific part of this PR enhances only real mode. However, the generic changes made to the Java/C++ code enhance any viable architecture where a specific segment register value is needed to map to a memory address. We don't handle the additional mapping tables required to map segment registers to locations in memory correctly, such as in 16-bit protected mode. For this mode, we keep the current ghidra hack which is sufficient for programs set up in memory entirely by ghidra.