help with program-memory-overflow #43
Thank you so much for the report. I hacked … A few observations:
I'll take a closer look at the generated code.
Thank you for your quick and positive reply! You seem to appreciate a challenge :) Your hack is dirty but helpful. I tried the dedicated GCC flags for keeping the defective output, without success. I think we have to disable inlining to find the culprit, since most functions are only used once. BTW: in the meantime I replaced …

```
pru-nm --size-sort --print-size out/pru-core0.elf | grep -w '[Tt]'
```

```
20000138 0000001c t add32
20000138 0000001c t add64
20000538 00000034 t div_uV_n4
20000154 0000004c t cal_conv_adc_raw_to_uV
200004ec 0000004c t cal_conv_uV_to_dac_raw
2000056c 00000054 t ads8691_init
20000354 00000054 t send_message.constprop.0.isra.0
200002e0 00000074 t send_status.constprop.0
20000474 00000078 t mul64
200001a0 00000080 t harvester_initialize
20000220 000000c0 t harvest_iv_cv
200003a8 000000cc t dac8562_init
200005c0 00001778 T main
```

By using …:

[very small objects omitted]
```
20000198 0000000c t calibration_initialize
200004e4 0000000c t get_V_intermediate_raw
200004d8 0000000c t get_V_intermediate_uV
20000504 0000000c t get_state_log_intermediate
2000018c 0000000c t simple_mutex_exit
20000168 00000010 t iep_get_cnt_val
20000138 00000010 t iep_get_tmr_cmp_sts
200004f0 00000014 t get_I_mid_out_nA
20000178 00000014 t simple_mutex_enter
200002a0 00000014 t sub32
200002a0 00000014 t sub64
20000cd8 00000018 t dac8562_init.constprop.0
200001b4 00000018 t mul32
200001b4 00000018 t mul64
200009d8 00000018 t ring_init.constprop.0
20000148 00000020 t iep_clear_evt_cmp
200009f0 00000020 t set_batok_pin.constprop.0
20000ca8 00000030 t sample_dbg_adc
20000304 00000034 t div_uV_n4
20001994 00000044 t get_output_inv_efficiency_n4
20000ba8 00000044 t ring_get.constprop.0
20001bc0 00000044 t sample.constprop.0
200001cc 0000004c t cal_conv_uV_to_dac_raw
200002b4 00000050 t cal_conv_adc_raw_to_uV
20000b58 00000050 t ring_put.constprop.0
20000988 00000050 t sample_iv_harvester
20001624 00000054 t ads8691_init
20000dd0 00000054 t send_message.constprop.0.isra.0
20000bec 0000005c t sample_emu_ADCs
20000c48 00000060 t sample_hrv_ADCs
20001844 00000064 t get_input_efficiency_n8
200015c0 00000064 t sample_adc_harvester
20000d64 0000006c t receive_message.constprop.0
20000cf0 00000074 t send_status.constprop.0
20000220 00000080 t cal_conv_adc_raw_to_nA
20000510 00000080 t harvester_initialize
20000f7c 00000084 t sample_dbg_dac
2000110c 00000094 t harvest_adc_cv
200019d8 000000a0 t converter_calc_out_power
20000590 000000c0 t harvest_iv_cv
20000338 000000c4 t converter_initialize
200003fc 000000dc t converter_update_cap_storage
200018a8 000000ec t converter_calc_inp_power
20000888 00000100 t harvest_iv_mppt_opt
20000650 00000108 t harvest_iv_mppt_voc
20001000 0000010c t handle_kernel_com.constprop.0.isra.0
200012dc 00000120 t harvest_adc_mppt_voc
20000758 00000130 t harvest_iv_mppt_po
200011a0 0000013c t harvest_adc_ivcurve
20000a10 00000148 t converter_update_states_and_output.constprop.0
20001a78 00000148 t sample_emulator.constprop.0
20000e24 00000158 t handle_buffer_swap.constprop.0
20001e1c 000001c0 T main
200013fc 000001c4 t harvest_adc_mppt_po
20001678 000001cc t sample_init.constprop.0
20001c04 00000218 t event_loop.constprop.0
```
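A minimal sketch of the "disable inlining" idea for a single function. `scale_sample` and `process_sample` are hypothetical stand-ins, not functions from the firmware. GCC's `noinline` attribute keeps a single-use function as its own symbol, so its size appears on a separate line of the `pru-nm --print-size` output instead of being absorbed into `main`. The GCC flags `-fno-inline-functions-called-once` or `-fno-inline` would do the same for the whole build, at the cost of changing the code being measured.

```c
#include <stdint.h>

/* Hypothetical single-use helper. Without the attribute, GCC (especially with
 * -Os and LTO) may inline it into its only caller, and its code is then
 * counted as part of that caller in the nm listing. noinline keeps it as a
 * separate symbol. GCC may still clone it, which produces names ending in
 * .constprop.0 or .isra.0 like those in the listing above. */
__attribute__((noinline))
static uint32_t scale_sample(uint32_t raw, uint32_t gain)
{
    /* Fixed-point scaling: gain is expressed in units of 1/256. */
    return (raw * gain) >> 8;
}

uint32_t process_sample(uint32_t raw)
{
    return scale_sample(raw, 300u);
}
```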
Two more questions regarding …
CGT tells me what it uses: …
FYI, I've been trying for a while to optimize 64-bit arithmetic, but haven't made much progress: #32
You lose the C library. And you gain a little bit of DRAM if you use function pointers. So I don't think it would help in your case.
Yes. See https://github.com/dinuxbg/gnupru/tree/master/testing/interop for some GCC and CGT interoperability tests.
I found two inefficiencies. I'll start working on them.
OK, so issue #32 seems to be one part of the problem. I only have a vague idea, but I guess there is more to it, right? When comparing sizes, the versions of the u64 functions modified to use u32 were half or a third of the size. They mostly contain shift operations; the arithmetic is mostly done in the dedicated overflow-safe functions. Another strange observation is that … All right, your second comment just came in and confirms my theory. Is there anything I can do to help with this? Otherwise I would look into dividing our codebase into the two subsystems, but the timing constraints were already tough. GCC probably won't help us here for now, but we'll keep this solution in sight. Thanks again for your quick and helpful support!
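To illustrate why the u64 variants come out two to three times larger, here is a sketch, assuming a 32-bit target such as the PRU where a `uint64_t` occupies two registers, of a saturating add written out the way the compiler has to lower it. `add32_sat` and `add64_sat` are hypothetical names, not the firmware's `add32`/`add64`. The 32-bit version is one add and one compare; the 64-bit version needs two adds, a carry, two overflow checks and twice the live registers, which adds to register pressure.

```c
#include <stdint.h>

/* Hypothetical saturating 32-bit add: one add plus one compare. */
uint32_t add32_sat(uint32_t a, uint32_t b)
{
    uint32_t sum = a + b;          /* unsigned addition wraps on overflow */
    return (sum < a) ? UINT32_MAX : sum;
}

/* Hypothetical saturating 64-bit add, spelled out with 32-bit halves the way
 * a 32-bit target has to execute it. */
uint64_t add64_sat(uint64_t a, uint64_t b)
{
    uint32_t a_lo = (uint32_t)a, a_hi = (uint32_t)(a >> 32);
    uint32_t b_lo = (uint32_t)b, b_hi = (uint32_t)(b >> 32);

    uint32_t lo = a_lo + b_lo;
    uint32_t carry = (lo < a_lo) ? 1u : 0u;  /* carry out of the low word */

    uint32_t hi = a_hi + b_hi;
    int overflow = (hi < a_hi);              /* the high words overflowed */
    uint32_t hi_c = hi + carry;
    overflow |= (hi_c < hi);                 /* adding the carry overflowed */

    if (overflow)
        return UINT64_MAX;
    return ((uint64_t)hi_c << 32) | lo;
}
```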
Yes, excessive register pressure might cause that. I'll check the GCC insn costs for the PRU backend.
Unfortunately there is no mitigation on the firmware side. But please file more GitHub issues if you notice other missed optimizations. Condensed test cases like the two Bugzilla reports above are appreciated, but full firmware source is the next best thing. Thanks,
With GCC trunk e95e91eccd, code size is down to 9352 bytes (overflow is 1160 bytes). I'm continuing to analyze other areas for improvement.
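The two figures also pin down the size limit: an image of 9352 bytes that overflows by 1160 bytes implies a limit of 9352 - 1160 = 8192 bytes, i.e. 8 KiB of instruction memory. A small sketch of that check, where `IMEM_LIMIT_BYTES` and `imem_overflow` are hypothetical and the limit is inferred from these numbers rather than taken from a datasheet:

```c
#include <stdint.h>

/* Limit inferred from the figures above: 9352 - 1160 = 8192 bytes (8 KiB). */
#define IMEM_LIMIT_BYTES 8192u

/* Number of bytes by which a program image exceeds the limit, or 0 if it fits. */
static uint32_t imem_overflow(uint32_t image_bytes)
{
    return image_bytes > IMEM_LIMIT_BYTES ? image_bytes - IMEM_LIMIT_BYTES : 0u;
}
```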
And with LTO enabled it is even better: …
Thank you so much for this! I also have LTO enabled on my current dev branch, and you seem to have decreased the overflow by more than 2 kB!
Adding … Do you know an option to see these size statistics for the CGT binary? I looked through the TI docs and found nothing so far.
Both CGT and GNU use the ELF file format. There are some small differences in how the two toolchains handle ELF, but they mainly affect the final program linking. You should be able to invoke …
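As one possible way to turn such a listing into a single number per build, here is a sketch, assuming the line format printed by the `pru-nm --size-sort --print-size … | grep -w '[Tt]'` command used earlier in this thread (`ADDR SIZE TYPE NAME`, with hexadecimal fields). `sum_text_sizes` is a hypothetical helper, not part of binutils. It sums the sizes of text symbols and reports the largest one. Symbols that share an address, such as `add32` and `add64` in the first listing above, are counted twice, so treat the total as an upper bound.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: sums the sizes of text symbols ('T' or 't') in
 * "ADDR SIZE TYPE NAME" lines as printed by pru-nm --print-size, where ADDR
 * and SIZE are hexadecimal. Lines of any other shape are skipped. The name
 * of the largest symbol is copied into `largest` if largest_cap > 0. */
unsigned long sum_text_sizes(FILE *in, char *largest, size_t largest_cap)
{
    char line[256];
    char name[128];
    unsigned long total = 0;
    unsigned long max_size = 0;

    if (largest_cap > 0)
        largest[0] = '\0';

    while (fgets(line, sizeof line, in) != NULL) {
        unsigned long size;
        char type;

        /* %*lx skips the address; " %c" skips the blank before the type. */
        if (sscanf(line, "%*lx %lx %c %127s", &size, &type, name) != 3)
            continue;
        if (type != 'T' && type != 't')
            continue;

        total += size;
        if (size > max_size) {
            max_size = size;
            if (largest_cap > 0)
                snprintf(largest, largest_cap, "%s", name);
        }
    }
    return total;
}
```

Wrapped in a small `main` that passes `stdin`, it could be fed the piped `pru-nm` output for the GCC binary and, if `pru-nm` accepts it, the CGT binary.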
Two suggestions that would save a few more bytes: …
This unresolved middle-end issue also affects the code quality of this particular firmware program: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81357
Hi @dinuxbg,
This is less a bug report and more a support question. We are currently porting our firmware and hoped GCC would produce more efficient code. It worked for us before, but one PRU codebase overflows IMEM by 33 % even with LTO and -Os enabled. It fits and works when compiled with CGT.
The project uses C99 and no floating point, but it does use u64 math on the overflowing PRU0. The u64 math is also the reason we want to switch to GCC, as CGT shows some flaws there.
We tried enabling -mabi=ti to get closer to CGT behaviour (especially the smaller pointers), but the linker complained about undefined references to memcpy and memset, which our codebase does not use directly.
Do you have any hints how to solve or approach this?
I read through the gnupru GitHub issues and large parts of the GCC documentation, and even started comparing parts of the assembly from both compilers, but found no hints yet.
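One common way to resolve undefined references like these is to provide the two functions yourself. GCC's documentation notes that even freestanding code must supply `memcpy`, `memmove`, `memset` and `memcmp`, because the compiler may emit calls to them (for example for struct copies or zero-initialisation of large objects) even when the source never calls them. The sketch below is a minimal byte-wise version assuming only standard C; it is not taken from gnupru or newlib and has not been checked against the TI ABI. The `volatile` on the destination keeps the compiler from recognising each loop as a copy or fill idiom and replacing it with a call to the very function being defined; with GCC, compiling the file with `-fno-tree-loop-distribute-patterns` is another option.

```c
#include <stddef.h>

/* Minimal byte-wise memcpy for builds without a C library.
 * The volatile destination prevents the loop from being turned back
 * into a call to memcpy itself. */
void *memcpy(void *dst, const void *src, size_t n)
{
    volatile unsigned char *d = dst;
    const unsigned char *s = src;

    while (n--)
        *d++ = *s++;
    return dst;
}

/* Minimal byte-wise memset; same reasoning for the volatile destination. */
void *memset(void *dst, int c, size_t n)
{
    volatile unsigned char *d = dst;

    while (n--)
        *d++ = (unsigned char)c;
    return dst;
}
```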