SRAM and flash usage is too high for parts like the ATtiny212; compare w. ex. ATtiny13 #418
Replies: 21 comments 13 replies
-
Addendum: PlatformIO currently uses version 2.1.5 of this core; the latest version, 2.3.1, doesn't compile at all.
Crosspost in PlatformIO forums here.
-
I cannot reproduce your finding of higher flash usage on the current version of megaTinyCore: I see 1660 bytes of flash and 69 bytes of SRAM with the current development version. I think 2.1.5 was from before I rewrote large portions of the serial code, in part to shave a few hundred bytes from the compiled size, so I don't know why it's coming out larger for you! I'm no expert on PlatformIO, so I can't comment on that; it was @MCUdude who made this work with PlatformIO.
The reason that Serial on this core takes so much more flash is that this is a full-featured serial implementation that is fully compatible with the one on official Arduino boards. A few versions ago I put a considerable amount of effort into reducing the flash usage of the Serial functionality itself, but it is still a huge flash hog, because the virtual functions always get compiled in, with no attempt made to determine whether a function is ever actually called. Even with LTO. It's stupid AF. The serial you were using on the tiny13 was likely a very tightly optimized one designed only for space, which would have supported only the most basic functions. Even after I worked over some parts of Serial, I am fully aware there remains optimization and cleanup that could be done there. The virtual function shit is incredibly annoying and I have no idea how to "fix" this so the methods don't need to be declared virtual.
The reason it is using RAM when the t13's serial doesn't is that on the tiny13 it is a blocking half-duplex software serial implementation. Hence it doesn't need a transmit buffer (execution of other code just stops until the data is fully sent, instead of sending in the background). And if you don't set it up to receive, I don't think it pulls in anything related to that.
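For anyone following along, here is a minimal sketch of what the virtual qualifier buys and why it gets in the way of dead-code removal. It is a host-side illustration, not the core's code: MiniPrint, MiniUart, logTo, StaticPrint and StaticUart are hypothetical names, and std::string stands in for a transmit buffer. Helpers such as print() live in the base class and only know the base type, so they reach write() through the vtable; since the vtable stores the address of every virtual member, the linker sees all of them as referenced even if the sketch never calls them. The second half shows one known alternative, static dispatch through a template (CRTP), where unused members can be discarded.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical, stripped-down stand-in for Arduino's Print base class.
struct MiniPrint {
  virtual size_t write(uint8_t c) = 0;   // resolved at run time through the vtable
  size_t print(const char *s) {          // shared helper: only knows MiniPrint
    size_t n = 0;
    while (*s) n += write(static_cast<uint8_t>(*s++));
    return n;
  }
};

// Hypothetical UART class; `sent` stands in for the hardware/TX buffer.
struct MiniUart : MiniPrint {
  std::string sent;
  size_t write(uint8_t c) override {
    sent.push_back(static_cast<char>(c));
    return 1;
  }
};

// Library code like this is why write() is virtual: it accepts ANY MiniPrint.
size_t logTo(MiniPrint &out, const char *msg) { return out.print(msg); }

// Alternative without virtual: the concrete type is a template parameter,
// so the call to write() is bound at compile time (static dispatch).
template <typename Derived>
struct StaticPrint {
  size_t print(const char *s) {
    size_t n = 0;
    while (*s) n += static_cast<Derived *>(this)->write(static_cast<uint8_t>(*s++));
    return n;
  }
};

struct StaticUart : StaticPrint<StaticUart> {
  std::string sent;
  size_t write(uint8_t c) {
    sent.push_back(static_cast<char>(c));
    return 1;
  }
};
```

The catch is that a StaticUart can no longer be handed to code that expects a MiniPrint&, which is presumably why a core that has to stay compatible with the Arduino Print/Stream API cannot simply delete the keyword.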
As I said above, I am incredibly frustrated by this. I don't understand why those member functions have to be virtual. I've tried to look it up, but none of the discussions seem to get to the level I need: "Okay, so that's what a virtual method means. I have a class that has virtual methods because I maintain a library someone in the past wrote. It runs on microcontrollers with as little as 2k of program space, so it is imperative that optimizations like not including unused functions happen. What do I need to do so that these don't need to be virtual?" And despite hours of researching this intensely frustrating subject, I still do not have a crisp understanding of what the 'virtual' qualifier is needed for. I see examples of the bad behavior it prevents, but I can't understand why that behavior would happen without it. I like C, and I like inline assembly; I am not so good with C++.
I do wonder if it is possible for me, at least on the 0/1-series parts with only one USART, to use #ifdefs to substitute in an alternate, more efficient implementation that doesn't need to support multiple instances of the USART. The DRE interrupt in particular is agonizing to look at:
<snip>\megatinycore/UART0.cpp:48
#else
#error "Don't know what the Data Received interrupt vector is called for Serial"
#endif
#if defined(HWSERIAL0_DRE_VECTOR)
ISR(HWSERIAL0_DRE_VECTOR) {
554: 1f 92 push r1
556: 0f 92 push r0
558: 0f b6 in r0, 0x3f ; 63
55a: 0f 92 push r0
55c: 11 24 eor r1, r1
55e: 2f 93 push r18
560: 3f 93 push r19
562: 4f 93 push r20
564: 5f 93 push r21
566: 6f 93 push r22
568: 7f 93 push r23
56a: 8f 93 push r24
56c: 9f 93 push r25
56e: af 93 push r26
570: bf 93 push r27
572: ef 93 push r30
574: ff 93 push r31
<snip>\megatinycore/UART0.cpp:49
Serial._tx_data_empty_irq();
576: 8f e9 ldi r24, 0x9F ; 159
578: 98 e3 ldi r25, 0x38 ; 56
57a: 0e 94 b2 00 call 0x164 ; 0x164 <UartClass::_tx_data_empty_irq()>
<snip>\megatinycore/UART0.cpp:50
}
57e: ff 91 pop r31
580: ef 91 pop r30
582: bf 91 pop r27
584: af 91 pop r26
586: 9f 91 pop r25
588: 8f 91 pop r24
58a: 7f 91 pop r23
58c: 6f 91 pop r22
58e: 5f 91 pop r21
590: 4f 91 pop r20
592: 3f 91 pop r19
594: 2f 91 pop r18
596: 0f 90 pop r0
598: 0f be out 0x3f, r0 ; 63
59a: 0f 90 pop r0
59c: 1f 90 pop r1
59e: 18 95 reti
000005a0 <__vector_17>:
__vector_17():
<snip>\megatinycore/UART0.cpp:40
// first place.
#if defined(HAVE_HWSERIAL0)
#if defined(HWSERIAL0_RXC_VECTOR)
ISR(HWSERIAL0_RXC_VECTOR) {
5a0: 1f 92 push r1
5a2: 0f 92 push r0
5a4: 0f b6 in r0, 0x3f ; 63
5a6: 0f 92 push r0
5a8: 11 24 eor r1, r1
5aa: 2f 93 push r18
5ac: 3f 93 push r19
5ae: 4f 93 push r20
5b0: 5f 93 push r21
5b2: 6f 93 push r22
5b4: 7f 93 push r23
5b6: 8f 93 push r24
5b8: 9f 93 push r25
5ba: af 93 push r26
5bc: bf 93 push r27
5be: ef 93 push r30
5c0: ff 93 push r31
<snip>\megatinycore/UART0.cpp:41
Serial._rx_complete_irq();
5c2: 8f e9 ldi r24, 0x9F ; 159
5c4: 98 e3 ldi r25, 0x38 ; 56
5c6: 0e 94 67 01 call 0x2ce ; 0x2ce <UartClass::_rx_complete_irq()>
<snip>\megatinycore/UART0.cpp:42
}
5ca: ff 91 pop r31
5cc: ef 91 pop r30
5ce: bf 91 pop r27
5d0: af 91 pop r26
5d2: 9f 91 pop r25
5d4: 8f 91 pop r24
5d6: 7f 91 pop r23
5d8: 6f 91 pop r22
5da: 5f 91 pop r21
5dc: 4f 91 pop r20
5de: 3f 91 pop r19
5e0: 2f 91 pop r18
5e2: 0f 90 pop r0
5e4: 0f be out 0x3f, r0 ; 63
5e6: 0f 90 pop r0
5e8: 1f 90 pop r1
5ea: 18 95 reti
And the thing it calls (note: that's compiled for a different part):
void UartClass::_tx_data_empty_irq(void) {
164: cf 93 push r28
166: df 93 push r29
168: fc 01 movw r30, r24
<snip>\megatinycore/UART.cpp:98
// Check if tx buffer already empty.
if (_tx_buffer_head == _tx_buffer_tail) {
16a: 90 8d ldd r25, Z+24 ; 0x18
16c: 81 8d ldd r24, Z+25 ; 0x19
16e: c4 85 ldd r28, Z+12 ; 0x0c
170: d5 85 ldd r29, Z+13 ; 0x0d
172: 98 13 cpse r25, r24
174: 06 c0 rjmp .+12 ; 0x182 <UartClass::_tx_data_empty_irq()+0x1e>
<snip>\megatinycore/UART.cpp:101
// Buffer empty, so disable "data register empty" interrupt
//VPORTA.IN |= 0x80;
(*_hwserial_module).CTRLA &= (~USART_DREIE_bm);
176: 8d 81 ldd r24, Y+5 ; 0x05
178: 8f 7d andi r24, 0xDF ; 223
17a: 8d 83 std Y+5, r24 ; 0x05
<snip>\megatinycore/UART.cpp:123
if (_tx_buffer_head == _tx_buffer_tail) {
// Buffer empty, so disable "data register empty" interrupt
(*_hwserial_module).CTRLA &= (~USART_DREIE_bm);
//VPORTA.IN |= 0x80;
}
}
17c: df 91 pop r29
17e: cf 91 pop r28
180: 08 95 ret
<snip>\megatinycore/UART.cpp:107
return;
}
// There must be more data in the output
// buffer. Send the next byte
unsigned char c = _tx_buffer[_tx_buffer_tail];
182: a1 8d ldd r26, Z+25 ; 0x19
184: ae 0f add r26, r30
186: bf 2f mov r27, r31
188: b1 1d adc r27, r1
18a: a5 5a subi r26, 0xA5 ; 165
18c: bf 4f sbci r27, 0xFF ; 255
18e: 9c 91 ld r25, X
<snip>\megatinycore/UART.cpp:108
_tx_buffer_tail = (_tx_buffer_tail + 1) & (SERIAL_TX_BUFFER_SIZE-1); //% SERIAL_TX_BUFFER_SIZE;
190: 81 8d ldd r24, Z+25 ; 0x19
192: 8f 5f subi r24, 0xFF ; 255
194: 8f 73 andi r24, 0x3F ; 63
196: 81 8f std Z+25, r24 ; 0x19
<snip>\megatinycore/UART.cpp:113
// clear the TXCIF flag -- "can be cleared by writing a one to its bit
// location". This makes sure flush() won't return until the bytes
// actually got written
(*_hwserial_module).STATUS = USART_TXCIF_bm;
198: 80 e4 ldi r24, 0x40 ; 64
19a: 8c 83 std Y+4, r24 ; 0x04
<snip>\megatinycore/UART.cpp:116
//VPORTA.IN |= 0x40;
(*_hwserial_module).TXDATAL = c;
19c: a4 85 ldd r26, Z+12 ; 0x0c
19e: b5 85 ldd r27, Z+13 ; 0x0d
1a0: 12 96 adiw r26, 0x02 ; 2
1a2: 9c 93 st X, r25
<snip>\megatinycore/UART.cpp:118
if (_tx_buffer_head == _tx_buffer_tail) {
1a4: 90 8d ldd r25, Z+24 ; 0x18
1a6: 81 8d ldd r24, Z+25 ; 0x19
1a8: 98 13 cpse r25, r24
1aa: e8 cf rjmp .-48 ; 0x17c <UartClass::_tx_data_empty_irq()+0x18>
<snip>\megatinycore/UART.cpp:120
// Buffer empty, so disable "data register empty" interrupt
(*_hwserial_module).CTRLA &= (~USART_DREIE_bm);
1ac: 04 84 ldd r0, Z+12 ; 0x0c
1ae: f5 85 ldd r31, Z+13 ; 0x0d
1b0: e0 2d mov r30, r0
1b2: 85 81 ldd r24, Z+5 ; 0x05
1b4: 8f 7d andi r24, 0xDF ; 223
1b6: 85 83 std Z+5, r24 ; 0x05
1b8: e1 cf rjmp .-62 ; 0x17c <UartClass::_tx_data_empty_irq()+0x18>
And... when there's more than one USART... EACH ONE GETS ITS OWN PUSH-POP for like half of all the working registers on the bloody chip, some of them saving and restoring registers that don't even appear to be used! In an ideal world, I wonder if those could be reimplemented as a naked ISR which saved only the two registers it needed to pass the address that the actual ISR is stuffing into the Z register, then jumped to the actual ISR, which would be declared with the signal attribute so the compiler would treat it like an ISR in terms of its prologue and epilogue. I wonder if that would be viable. There has got to be some way to make it so each ISR doesn't need to push and pop half the working registers, especially when the function it calls doesn't even seem to need to use them all x_x
This is the official Arduino serial class, or it was, before I saved ~200 bytes of flash and eliminated a bug that could, under strange corner cases, hang the chip, and which violated the no-astonishment principle anyway (that is, "Don't do things in your library where simple-looking functionality does something that people will be astonished by. For example, if you're writing a Serial UART class, it shouldn't be configuring the CPUINT peripheral to change which interrupts are prioritized."). In any event, it was targeted at the ATmega4809 with 48k flash and 6k RAM; they weren't designing it with a small flash footprint at the top of their list. But you release a core with the implementation of Serial that you have, not the one you want.
Also, I did compile your sketch for the 212 and generate hex, map, and lst files, if you want to see how it comes out when I build it (that was with nearly-ready-for-release 1.3.2-dev). The edited map got tidied up with regexes.
I need to ask my Python dude if he could give me a skeleton of a program that I could add regex substitutions to and call as part of the sketch export process. Also, yes, I can see that the names of the hex files are broken. I swear I fixed that multiple times in the past; I don't know why it is busted again.
My closing thought: do keep in mind that you are using the supported part with the very least flash. Every time I get a chance to, I tell people that the user experience of working with a 2k part in Arduino, when the core hasn't been very aggressively tweaked for that specific part, is pretty lousy. Most megaTinyCore users are using 16k and 32k parts, and wouldn't stand for my removing all normal serial functionality. While I try to keep a lid on flash usage, and optimize more than Arduino does, as a matter of policy I do not bend over backwards (or forwards) to accommodate the bottom-of-the-barrel parts if that comes at the expense of the top-end ones.
-
Are you sure? The 124-byte overflow is the result I get when I use the Arduino IDE with the latest version, installed as per the README.
-
Oh! With Optiboot, yeah, that would do it lol. We do not recommend using Optiboot on any parts with less than 8k of flash, and I should probably mark it as not recommended for 8k parts too. At 2k you're giving up a quarter of your precious flash, on a chip that is already too small to use comfortably with Arduino, just in order to program with a serial adapter on the serial pins instead of programming with... a serial adapter and a Schottky diode (more reliable than the 4.7k resistor method) on the UPDI pin? Don't use Optiboot on a 2k part. The only reason we support the bootloader on all 2k parts is that it doesn't require any extra binaries to be built, and when Bill Westfield did his initial port of Optiboot_x, that's what he decided to support. (He didn't notice, either, that the bootloader binaries were identical except for the 8-pin ones with the default serial pins; he noticed they were the same for all sizes, but it wasn't until a few months ago that I realized they were identical for the different pin counts other than 8 too. With alt serial pins, all pin counts have the same binary; the 3217 uses the same bootloader hex file as the 212!)
-
I'm going to move this to Discussions, as there is no specific defect in the core here; yes, flash usage can always be improved. Maybe I should make a LiteSerial library, with a bare minimum of features and a smaller flash footprint? But that's a long-term thing...
-
There, I also edited the topic name to make it more general. Serial is a real space hog, though it is on classic AVR too. Like everything Arduino, it was written by programmers of average skill whose priority was usability and whose target was a 32k or 48k megaAVR device. So the API that we must conform to isn't really meant to fit in these smaller parts.
-
Man, yeah, this serial implementation is reaaaaly disappointing. I think I need to do some surgery with a chainsaw and flamethrower anyway in order to support more than one pin mux option on the new AVR DD-series parts (which are the most exciting AVR devices coming in 2021; even if we get the EA-series this year, it's a yawn compared to the DD IMO. Obviously the folks who do stuff with analog voltages are far more interested in the 2-series and EA, but as far as I am concerned, the ADC is just this annoying peripheral that I need to get analog read working with on every series, not something I need for my own purposes).
It is fundamentally a basket case. Like, it would be bad even if the compiler did a decent job of compiling it... and it doesn't. It does dumb shit all over the place. You have a row of (*_hwserial_module).register = value lines, so it loads the base address of the serial module into the Z pointer register (r30 and r31) with two ldd instructions, then uses an std instruction to set the value appropriately. Then on the next line, despite already having the address of the USART loaded into the Z register... it goes and loads it again. There's one place where it does that like 4 times in a row. The compiler really doesn't take advantage of what Microchip gave us with the register layout. FFS. Neither do the Arduino programmers.
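One commonly suggested workaround for the repeated pointer reload described above is to copy the member pointer into a local before a run of register writes. The sketch below is host-side and hedged: FakeUsart and UartSketch are hypothetical names (the real USART_t comes from the device headers), 0xC0 is just an illustrative value, and whether the compiler actually keeps the address in one pointer register depends on compiler version and flags. One plausible reason for the reloads is that a store through a pointer to a byte-sized volatile register could, as far as the compiler knows, alias the object holding the pointer.

```cpp
#include <cstdint>

// Hypothetical register block laid out like the offsets seen in the listing
// (TXDATAL at +2, STATUS at +4, CTRLA at +5).
struct FakeUsart {
  volatile uint8_t RXDATAL, RXDATAH, TXDATAL, TXDATAH;
  volatile uint8_t STATUS, CTRLA, CTRLB, CTRLC;
};

class UartSketch {
 public:
  explicit UartSketch(FakeUsart *module) : _module(module) {}

  // Every line goes through the member pointer; the compiler may re-read
  // _module from RAM before each store.
  void configureNaive(uint8_t ctrlc) {
    _module->CTRLA = 0;
    _module->CTRLC = ctrlc;
    _module->CTRLB = 0xC0;
  }

  // The pointer is read once into a local, which gives the compiler the
  // chance to keep it in a single pointer register for all three stores.
  void configureCached(uint8_t ctrlc) {
    FakeUsart *const usart = _module;
    usart->CTRLA = 0;
    usart->CTRLC = ctrlc;
    usart->CTRLB = 0xC0;
  }

 private:
  FakeUsart *_module;
};
```

Both methods leave the registers in the same state; the only way to know whether the second one helps on a given toolchain is to compare the generated listings.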
-
I'd even go as far as just plain copying the bit-bang software implementation from picoUART as an ultra-low-memory option, even for devices with hardware UARTs.
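To make the bit-bang idea concrete, here is a rough host-side sketch of the two things a blocking software transmitter has to get right: the order of the bits in an 8N1 frame and how many CPU cycles each bit lasts. This is not picoUART's code; frameBits8N1 and cyclesPerBit are hypothetical helper names, and a real implementation would toggle a pin with cycle-counted delays instead of returning an array.

```cpp
#include <array>
#include <cstdint>

// 8N1 frame as line levels: start bit (low), 8 data bits LSB first, stop bit (high).
std::array<bool, 10> frameBits8N1(uint8_t c) {
  std::array<bool, 10> bits{};
  bits[0] = false;                       // start bit pulls the idle-high line low
  for (int i = 0; i < 8; ++i) {
    bits[1 + i] = ((c >> i) & 1u) != 0;  // least significant bit goes out first
  }
  bits[9] = true;                        // stop bit returns the line to idle
  return bits;
}

// CPU cycles that one bit must last, rounded to the nearest cycle.
constexpr uint32_t cyclesPerBit(uint32_t fCpuHz, uint32_t baud) {
  return (fCpuHz + baud / 2) / baud;
}
```

Because the CPU does nothing else while the ten bits go out, no buffer and no interrupt vector are needed, which is where the flash and SRAM savings of this approach come from.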
-
Top 5 Files
-
I get 362 bytes, so you're doing something weird. My map (you can generate one if you export the compiled binary):
Most of it is millis, which you can disable by selecting that option from the Tools submenu, which results in:
142 bytes of flash and 0 bytes of SRAM used. 0x44 (68) bytes are in main; I reckon that effectively all of that is linked in from wiring.c. So main starts at 0x46 (because some of that stuff before is at the end, see.
-
Yes! Yes, a lot of space is taken up if you leave millis timekeeping enabled; that's what vector 14 is. You can turn it off from the Tools menu if you don't want millis().
-
142 bytes without any millis timekeeping, 258 with TCB, 356 with timer D, and 334 with TCA for millis.
-
so how is the attiny13 compile getting it down to
-
Also, I thought that when you create a volatile variable and map it to a peripheral register, the register is the memory location, so you don't have local copies; the bits mapped to hardware are the "memory". So the only things you need are initial states for the clock prescaler, a single timer scaler, and a jump vector to kick things off, possibly with a couple of other things I've missed. As an "initial state" you just want the bare bones, and if the user has defined an ADC and a UART and whatever else they need, then you can shove those on top.
-
People use the Arduino IDE because they don't want to start from nothing; they want to start from something with conveniences like a decent timing facility (as opposed to "pick a timer and, with datasheet in hand, figure out how to get the timing you want out of it"), an ADC and PWM set up for them instead of by them, wrappers around reading and writing pins, and so on.
I actually come down much closer to the "bare metal" side than most Arduino people. When I started off adapting the core they made for the Nano Every and Uno WiFi Rev. 2, the pin states and modes were enums! Pin numbers were passed as 16-bit values instead of 8 (actually, that's how all the official cores do it...). I'm always preaching about the virtues of referring to pins not by Arduino pin numbers but by port and bit (hence my pushing PIN_Pxn). My cores have the fast digital I/O functions (where, if the pin, and the state for a write, are known at compile time, it gets optimized down to a single SBI or CBI), either as a general convenience or as a transitional step to wean people off of the Arduino digital I/O functions. I've performed surgery on the ArduinoAPI definitions in my cores with a meat cleaver; from the start in megaTinyCore, I have been trying to slim it down.
I am keenly aware of the grotesque size of the serial libraries. The versions both here and in ATTinyCore need to be slimmed down (it's an easier task over there because the mistakes there are dumber; they're fucking passing bit indexes in the constructor for the HardwareSerial class! As if that's gonna be different between the two USARTs!). Here the real problems with the serial ports are the ISRs, and I've stared at the generated assembly a number of times... like, I can point out a few inefficiencies, but I'm not going to reimplement the entire Serial class in hand-tuned assembly, and the issues aren't huge; not while I have a stupid day job (though if I could continue to have a roof over my head and food in my stomach while writing AVR assembly and improving these cores, I would choose that over my software test day job ANY DAY). And the other half of the pain is because there are all these bloody "virtual" methods of the UART class that get compiled in whether they are called or not, and I don't understand the class-based mumbo-jumbo of C++ well enough to know how to get rid of that. (In the case of Wire, early on I changed it from being based on Hardware_I2C to plain old Stream, because that, at a stroke, eliminated a bunch of those bullshit virtual methods.)
The ATtiny13 doesn't have anything set up by default (and on a core that's 100% optimized for the smallest flash one can imagine using with Arduino, that is realistic). The '13 also has only 10 vectors, meaning 20 bytes of flash used to store vectors. The x12 has 26 vectors, using 52 bytes of flash between them. And like I said, the 412 gets about 16-20 bytes of junk compiled in because the compiler is dumb.
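As a rough illustration of the compile-time-known pin idea behind the fast digital I/O functions mentioned above, the sketch below makes the bit and the level template parameters, so the mask is a constant and the branch disappears after inlining. It is a host-side sketch, not the core's implementation: writePinConst and fakePortOut are hypothetical names, and fakePortOut stands in for a real port register such as VPORTA.OUT. On an AVR, a constant single-bit update of a register in the low I/O space is the pattern the compiler can turn into one SBI or CBI.

```cpp
#include <cstdint>

// Hypothetical stand-in for a port output register so the sketch runs on a PC.
static volatile uint8_t fakePortOut = 0;

// Bit and level are template parameters, so both are compile-time constants.
template <uint8_t Bit, bool High>
inline void writePinConst() {
  static_assert(Bit < 8, "bit index must be 0..7");
  if (High) {
    // Set exactly one bit, leave the others untouched.
    fakePortOut = fakePortOut | static_cast<uint8_t>(1u << Bit);
  } else {
    // Clear exactly one bit, leave the others untouched.
    fakePortOut = fakePortOut & static_cast<uint8_t>(~(1u << Bit));
  }
}
```

A call such as writePinConst<3, true>() carries no run-time pin lookup at all, which is the contrast with digitalWrite(), where the pin number has to be translated to a port and bit mask while the program runs.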
-
I have to balance the demands of people who are using parts with 2 or 4k of flash from the bottom of the tinyAVR line, the people using parts at the top who want more convenience features and can't understand why I'm so uptight about flash usage, and the API weenies who piss and moan whenever I do anything non-standard. I think I've managed to drive most of the last group away, at least.
-
Personally, for my own projects (when did I last have time to work on one of those?) I tend to use 14- and 20-pin tinies... or 48-pin Dx-series. I had some boards made for a project that used the 32-pin AVR128DBs, and assembled like 2-3 of them. But I'm not going to assemble any more. I need to do a little bit of design validation, specifically with regard to the LCD interface, but I'm gonna respin those with the AVR128DB48; I had to make too many compromises to keep within the 32 pins. Obviously, I am very excited for the AVR64DD-series: MVIO, type D timers (and word is they won't have the portmux errata at release), and more RAM and flash than I will ever use, in a 14-pin part that I could solder blindfolded (they actually shine a spotlight on some of the USART issues). Though more likely whatever I do with them will fit in the 16/32k versions. The DD at 5V, with MVIO to talk to an ESP at 3.3V for network connectivity, seems like a killer combination; the whole thing would be tiny and dead simple to assemble at home.
-
Hi, "It seems like Microchip has turned their back on the 8-pin form factor :-(" "I have to balance the demands of people who are using parts with 2 or 4k of flash from the bottom of the tinyAVR line, the people using parts at the top who want more convenience features" "This silicon shortage is punishing"
-
Hello foxabilo,
With void setup() / void loop() I have 142 bytes too (with millis disabled).
With: int main(void) {
I have only 76 bytes of flash and 0 bytes of RAM, writing in C.
----------
592 / 10 bytes: void setup() {} . #include <util/delay.h> int main(void) {
-
Also, unless I'm mistaken, this:
where the stack pointer is set, is completely redundant. Certainly my read of the datasheet for modern AVRs is that the reset values of SPL and SPH are such that the above instructions are setting them to their reset values... in code that runs immediately after reset. And if that's the case, we're just wiping our ass with 8 bytes of flash and flushing 'em down the john. (It implies that the waste could be eliminated, when I next rebuild the toolchain package, by deleting a couple of lines from gcrt1.S. Of course, this would hose code that relied on jumping back to 0x0000 (such code is bad and deserves to be hosed; even in classic AVR days that was true, but with software reset available, it's really true), and it would break compatibility with third-party bootloaders which don't cleanly reset the chip before jumping to the app. Optiboot_x always achieves this by exiting the bootloader only via WDT reset when it's run, so I think it's safe here, and if it's not safe as is, we have enough space left in the bootloader section to make it reset SPL before the jump to the app; and the core is not expected to support arbitrary third-party bootloaders >.>)
-
The following code
when built for the ATtiny13a uses 0 bytes of SRAM and 88 bytes of flash;
when built for the ATtiny212, which has a hardware UART (so one would assume the code would be somewhat smaller), it uses 69 bytes of SRAM and 1754 bytes of flash.
Simple things like digitalWrite are taking 90 to 100+ bytes each, and EEPROM.put is again 90+ bytes. A 20-line program to read EEPROM and ADC values is over 2.5k in size, when on the ATtiny13a it's 706 bytes.
Am I missing some compiler directive or build option?