
How to train? #6

Open
alphaonex86 opened this issue Mar 18, 2024 · 2 comments

Comments


alphaonex86 commented Mar 18, 2024

Hi, I'd like to train on a larger base set (like Gentoo) and on multiple architectures (RISC-V, ARM, MIPS v2, x86, ...).
How would I do that?
Does your code support foreign architectures?
I'd also like to train with old compilers; that would help analyze old, unmaintained code on the MIPS architecture.

albertan017 (Owner) commented

Due to the sequence length constraints of most large language models (LLMs), which typically range from 1,000 to 16,000 tokens, processing extensive inputs directly isn't feasible. It's better to segment your dataset into smaller, function-level chunks that pair the binary code with its corresponding source code. Once the data is prepared, it can be fed into the LLM for fine-tuning.
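The function-level pairing described above can be sketched as follows. This is a minimal illustration, not the project's actual data pipeline: the field names (`asm`, `src`), the prompt format, and the 4-characters-per-token heuristic are all assumptions (a real setup would use the model's tokenizer).

```python
# Hypothetical sketch: build function-level (binary, source) pairs and drop
# any pair that would exceed the model's context window. Field names and the
# chars-per-token heuristic are assumptions, not this repo's actual format.

MAX_TOKENS = 4096        # assumed context limit for the target LLM
CHARS_PER_TOKEN = 4      # rough heuristic; use a real tokenizer in practice

def approx_tokens(text):
    """Crude token estimate from character count."""
    return len(text) // CHARS_PER_TOKEN

def make_pairs(functions):
    """functions: iterable of dicts with 'asm' (disassembly) and 'src' (C source)."""
    pairs = []
    for fn in functions:
        prompt = "# Decompile this function:\n" + fn["asm"]
        # Skip functions whose prompt + target would not fit in the context.
        if approx_tokens(prompt) + approx_tokens(fn["src"]) <= MAX_TOKENS:
            pairs.append({"input": prompt, "output": fn["src"]})
    return pairs

sample = [
    {"asm": "push %rbp\nmov %rsp,%rbp\nmov $0x2a,%eax\npop %rbp\nret",
     "src": "int answer(void) { return 42; }"},
    {"asm": "nop\n" * 100000, "src": "void big(void) {}"},  # too long, filtered out
]
print(len(make_pairs(sample)))  # prints 1: only the small function survives
```

The same length filter is also why whole-binary inputs (as raised in this issue) have to be split up before training.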

Currently, our model is trained to support C language decompilation on the Linux x86_64 architecture. For your interest in working with older compilers, the LLM generally treats input from various compilers similarly, without significant differentiation.

alphaonex86 (Author) commented

I understand completely. But in the real world, where everybody has blocking, unmaintained binaries, we have a lot of large, messy binaries. I also don't see how to decompile in parts; that implies fitting the previous/next chunk into the token window and being able to rewrite the previously written file.
Maybe chunk automatically by function.
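The "chunk automatically by function" idea can be sketched by splitting disassembly on the `<name>:` headers that GNU objdump emits for each function. This is an illustrative sketch, not part of this project: the sample text below is hand-written in objdump's format, not real tool output.

```python
import re

# Sketch of "auto chunk by function": split `objdump -d` style disassembly
# into per-function chunks using the "address <name>:" headers that GNU
# objdump emits. The sample below is illustrative, not real tool output.

FUNC_HEADER = re.compile(r"^[0-9a-f]+ <(?P<name>[^>]+)>:$")

def chunk_by_function(disasm):
    """Return {function_name: body_text} from objdump-style disassembly."""
    chunks, name, lines = {}, None, []
    for line in disasm.splitlines():
        m = FUNC_HEADER.match(line)
        if m:
            if name:                      # close the previous function
                chunks[name] = "\n".join(lines)
            name, lines = m.group("name"), []
        elif name:
            lines.append(line)
    if name:                              # close the last function
        chunks[name] = "\n".join(lines)
    return chunks

sample = """0000000000001129 <main>:
    1129: push %rbp
    112a: mov %rsp,%rbp

0000000000001140 <helper>:
    1140: ret
"""
print(sorted(chunk_by_function(sample)))  # prints ['helper', 'main']
```

Each resulting chunk could then be decompiled independently, which sidesteps the context-window problem for large binaries (at the cost of losing cross-function context).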

Currently, our model is trained to support C language decompilation on the Linux x86_64 architecture

Yes, I'd like to train with more architectures, because I have some router code to study (from MIPS) and code built with GCC 4.6 (kernel modules), obfuscated across multiple .ko files and .so files for userspace.
