Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TemplateString::InTextSegment unreliable, leads to failed assertion #81

Open
GoogleCodeExporter opened this issue Mar 12, 2015 · 19 comments

Comments

@GoogleCodeExporter
Copy link

What steps will reproduce the problem?
1. InTextSegment makes the assumption that everything between _start and 
data_start is read-only.  This is not always the case - I'm running into a 
situation where mmap() is returning memory located between the .text and .data 
segments.  This leads to a failed assertion, after memory is overwritten that 
ctemplate thought was immutable.  (I don't have a good way to isolate and 
reproduce this, unfortunately.)

2. As a related but less important issue: Shared libraries get relocated past 
data_start, so their string literals will always be treated as "mutable", and 
won't benefit from the optimization.  StaticTemplateString seems to be a decent 
alternative, though.

What is the expected output? What do you see instead?

Check failed: TemplateString::IdToString(id) == kStsEmpty || memcmp(str.ptr_, 
TemplateString::IdToString(id).ptr_, str.length_) == 0


What version of the product are you using? On what operating system?

ctemplate-1.0
RHEL 5 / Ubuntu 10.04


Please provide any additional information below.

For the sake of portability, maybe the InTextSegment optimization could be a 
compile-time option, off by default.

Original issue reported on code.google.com by [email protected] on 12 Nov 2011 at 4:29

@GoogleCodeExporter
Copy link
Author

ugh, that bit of code was always a bit of a hack and now it's coming back to 
bite us. :-(  What I really want to capture is the addresses of the beginning 
and end of .text, but that's sadly not exposed via gcc.  Though honestly I'm 
surprised that in your system, .data isn't mapped in right after .text.

I'm wondering if it makes sense to figure out where text_end is by analyzing 
the binary itself somehow.  It should be possible, and I see it done in places 
like this:
   http://geco.mines.edu/papi/html/papif_get_exe_info.html
But it may be a lot of work.  I'm not sure what the best path is here.  Making 
it a compile-time option, like you suggest, would work, but I doubt anyone 
would turn it on (especially since it's hard to guarantee it will be safe).


I think your suggestion is probably the best one, but I don't know under what 
situations to recommend that people turn it on.  (Making it a runtime option 
could help, but that will kill any of the speed advantages, probably.)  I'm not 

Original comment by [email protected] on 12 Nov 2011 at 5:56

  • Added labels: Type-Defect, Priority-Medium

@GoogleCodeExporter
Copy link
Author

The only good way I can think of is to read /proc/self/maps and scan for the 
text segment.  It works, but is not portable.

Since you already have the '_start' symbol, it could be simply a matter of 
looking for the map which contains _start's address:

_start = 0x7fe7ee1a58f0
7fe7ee1a5000-7fe7ee1a6000 r-xp 00000000 08:02 589002                     
/tmp/a.out

Thinking a bit more about what I said about shared library text segments, I 
realized those can't be safely considered immutable after all, particularly in 
programs that use dlopen/dlclose.  The only "safe" segment does seem to be the 
text of the main executable.

Looking at the papi sources, they actually do use a /proc/%d/maps approach for 
Linux targets.  However, rather than look for the _start symbol, they're 
looking for a read-only, executable map with a filename that matches 
readlink("/proc/%d/exe").

Original comment by [email protected] on 12 Nov 2011 at 4:37

@GoogleCodeExporter
Copy link
Author

Thanks for doing this research!  It's good to know the options.

I'm not sure how I feel about doing a file I/O operation to implement what may 
well be a very minor optimization.  We should maybe just encourage folks to use 
STS_INIT more for creating template strings, rather than trying to guess 
whether someone is using a static string (that the compiler put in the text 
segment).

I'm writing to some local linker experts to see if they have any ideas.

Original comment by [email protected] on 15 Nov 2011 at 2:04

@GoogleCodeExporter
Copy link
Author

An interesting development.  Here's what the local expert had to say:
---
The situation described in that bug report can occur when using an old
version of glibc which had a bug which permitted this case to happen.
For OS they list RHEL 5 / Ubuntu 10.4.  I'm not sure what that means.  I
would be horrified if the glibc bug existed on Ubuntu 10.4.  I would not
be at all surprised if it existed on RHEL 5.

The glibc change that I'm thinking of is described at
    http://sourceware.org/ml/libc-alpha/2011-10/msg00023.html
---

Were you only seeing the weird behavior on rhel 5?

He goes on to say:
---
The fixed glibc should be mprotecting the pages between the end of the
read segment and the start of the write segment such that they are not
allocated by future calls to mmap.  I don't know how that will show up
in /proc/self/maps.

All that said, I now wonder whether glibc only fills in holes in shared
libraries, and whether perhaps nothing fills in holes in the executable
itself.  I don't know.
---
so that leaves me not certain of what's going on.  But if you are only seeing 
the problem on RHEL 5, then we can try to protect this code with some libc 
detection, perhaps.


Original comment by [email protected] on 15 Nov 2011 at 10:28

@GoogleCodeExporter
Copy link
Author

After looking at my Ubuntu 10.04 machine again, it looks like I had misread the 
process map originally.  RHEL5 was the only place where the issue actually 
occurred.

By the way, I -finally- managed to reproduce the allocation issue just now on 
the x86_64 RHEL5 machine.  The two keys to reproducing it were:
1) Position-independent executable (gcc -pie -fPIC)
2) The process must fork.  The issue affects the child, but not the parent.

/* Compile with: gcc -pie -fPIC
 */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/wait.h>

int main(int argc, char **argv)
{
    char cmdbuf[64];

    if (fork()) {
        wait(NULL);
        exit(0);
    }

    mmap(NULL, 1048576, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_ANONYMOUS, -1, 0);

    sprintf(cmdbuf, "cat /proc/%d/maps", getpid());
    system(cmdbuf);

    return 0;
}

And here's the output I get, with the writable map right in the middle of the 
executable, between the text and data:

2af0880f7000-2af0880f8000 r-xp 00000000 fd:00 165147                     
/tmp/a.out
2af0880f8000-2af0881f8000 rw-s 00000000 00:09 74132637                   
/dev/zero (deleted)     <--- XXX here it is
2af0882f7000-2af0882f8000 rw-p 00000000 fd:00 165147                     
/tmp/a.out
2af0882f8000-2af088314000 r-xp 00000000 fd:00 459046                     
/lib64/ld-2.5.so
2af088314000-2af088315000 rw-p 2af088314000 00:00 0 
2af08831e000-2af08831f000 rw-p 2af08831e000 00:00 0 
2af088513000-2af088514000 r--p 0001b000 fd:00 459046                     
/lib64/ld-2.5.so
2af088514000-2af088515000 rw-p 0001c000 fd:00 459046                     
/lib64/ld-2.5.so
2af088515000-2af088662000 r-xp 00000000 fd:00 459047                     
/lib64/libc-2.5.so
2af088662000-2af088862000 ---p 0014d000 fd:00 459047                     
/lib64/libc-2.5.so
2af088862000-2af088866000 r--p 0014d000 fd:00 459047                     
/lib64/libc-2.5.so
2af088866000-2af088867000 rw-p 00151000 fd:00 459047                     
/lib64/libc-2.5.so
2af088867000-2af08886d000 rw-p 2af088867000 00:00 0 
7fff0844a000-7fff0845f000 rw-p 7ffffffe9000 00:00 0                      [stack]
ffffffffff600000-ffffffffffe00000 ---p 00000000 00:00 0                  [vdso]

Original comment by [email protected] on 16 Nov 2011 at 12:27

@GoogleCodeExporter
Copy link
Author

Nice debugging!

I wonder which of these issues are necessary to reproduce the problem in 
theory, and which just happened to be necessary to trigger it in your case.  We 
could perhaps just turn off this optimization if people compiled with -fPIE, 
for instance, if that was a necessary condition.  But I worry the problem could 
trigger without -PIE, maybe with more difficulty.

If the problem really is with RHEL5 only, maybe we could just turn it off for 
older libc's (it looks like RHEL5 is 4.5 years old now?)

Original comment by [email protected] on 16 Nov 2011 at 1:00

@GoogleCodeExporter
Copy link
Author

It looks like the behavior still exists in current kernels (tested with 3.0.0), 
but only commonly occurs under specific conditions.  The kernel has two 
different strategies for finding unused memory for mmap: arch_get_unmapped_area 
(the legacy "bottomup" layout), and arch_get_unmapped_area_topdown.  I have 
only been able to reproduce this in the "bottomup" memory layout.

As of kernel 2.6.25 
(http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commit;h=cc503c1
b43e002e3f1fed70f46d947e2bf349bb6), x86_64 defaults to the topdown approach.  
In previous versions with PIE randomization support -- including the RHEL5 
patched 2.6.18 kernel -- topdown is only supported for 32-bit processes.

There are four cases where recent kernels will use bottomup instead of topdown:
1) The topdown mmap fails, and tries bottomup as a fallback (Kernel comment: /* 
A failed mmap() very likely causes application failure, so fall back to the 
bottom-up function here. This scenario can happen with large stack limits and 
large mmap() allocations. */)
2) The ADDR_COMPAT_LAYOUT personality flag is set:
    setarch x86_64 --addr-compat-layout ./a.out
3) The stack size is unlimited:
    ulimit -s unlimited ; ./a.out
4) The "vm.legacy_va_layout" sysctl is turned on:
    sudo sysctl -w vm.legacy_va_layout=1 ; ./a.out


# topdown example

$ uname -r -m
3.0.0-12-generic x86_64
$ ./a.out
7f2d48491000-7f2d48626000 r-xp 00000000 08:02 2232605                    
/lib/x86_64-linux-gnu/libc-2.13.so
7f2d48626000-7f2d48825000 ---p 00195000 08:02 2232605                    
/lib/x86_64-linux-gnu/libc-2.13.so
7f2d48825000-7f2d48829000 r--p 00194000 08:02 2232605                    
/lib/x86_64-linux-gnu/libc-2.13.so
7f2d48829000-7f2d4882a000 rw-p 00198000 08:02 2232605                    
/lib/x86_64-linux-gnu/libc-2.13.so
7f2d4882a000-7f2d48830000 rw-p 00000000 00:00 0 
7f2d48830000-7f2d48851000 r-xp 00000000 08:02 2232612                    
/lib/x86_64-linux-gnu/ld-2.13.so
7f2d4892f000-7f2d48a2f000 rw-s 00000000 00:04 1286890          here ===> 
/dev/zero (deleted)
7f2d48a2f000-7f2d48a32000 rw-p 00000000 00:00 0 
7f2d48a4e000-7f2d48a50000 rw-p 00000000 00:00 0 
7f2d48a50000-7f2d48a51000 r--p 00020000 08:02 2232612                    
/lib/x86_64-linux-gnu/ld-2.13.so
7f2d48a51000-7f2d48a53000 rw-p 00021000 08:02 2232612                    
/lib/x86_64-linux-gnu/ld-2.13.so
7f2d48a53000-7f2d48a54000 r-xp 00000000 08:02 524543                     
/tmp/a.out
7f2d48c53000-7f2d48c54000 r--p 00000000 08:02 524543                     
/tmp/a.out
7f2d48c54000-7f2d48c55000 rw-p 00001000 08:02 524543                     
/tmp/a.out
7fff1507a000-7fff1509b000 rw-p 00000000 00:00 0                          [stack]
7fff1516d000-7fff1516e000 r-xp 00000000 00:00 0                          [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  
[vsyscall]


# bottomup example

$ uname -r -m
3.0.0-12-generic x86_64 
$ setarch x86_64 --addr-compat-layout ./a.out
2b5e42cf0000-2b5e42cf1000 r-xp 00000000 08:02 524543                     
/tmp/a.out
2b5e42cf1000-2b5e42df1000 rw-s 00000000 00:04 1288895          here ===> 
/dev/zero (deleted)
2b5e42ef0000-2b5e42ef1000 r--p 00000000 08:02 524543                     
/tmp/a.out
2b5e42ef1000-2b5e42ef2000 rw-p 00001000 08:02 524543                     
/tmp/a.out
2b5e42ef2000-2b5e42f13000 r-xp 00000000 08:02 2232612                    
/lib/x86_64-linux-gnu/ld-2.13.so
2b5e42f13000-2b5e42f15000 rw-p 00000000 00:00 0 
2b5e43112000-2b5e43113000 r--p 00020000 08:02 2232612                    
/lib/x86_64-linux-gnu/ld-2.13.so
2b5e43113000-2b5e43115000 rw-p 00021000 08:02 2232612                    
/lib/x86_64-linux-gnu/ld-2.13.so
2b5e43115000-2b5e432aa000 r-xp 00000000 08:02 2232605                    
/lib/x86_64-linux-gnu/libc-2.13.so
2b5e432aa000-2b5e434a9000 ---p 00195000 08:02 2232605                    
/lib/x86_64-linux-gnu/libc-2.13.so
2b5e434a9000-2b5e434ad000 r--p 00194000 08:02 2232605                    
/lib/x86_64-linux-gnu/libc-2.13.so
2b5e434ad000-2b5e434ae000 rw-p 00198000 08:02 2232605                    
/lib/x86_64-linux-gnu/libc-2.13.so
2b5e434ae000-2b5e434b7000 rw-p 00000000 00:00 0 
7fff19c87000-7fff19ca8000 rw-p 00000000 00:00 0                          [stack]
7fff19dbb000-7fff19dbc000 r-xp 00000000 00:00 0                          [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  
[vsyscall]

Original comment by [email protected] on 16 Nov 2011 at 6:55

@GoogleCodeExporter
Copy link
Author

Thanks for all the research!  It makes sense that topdown vs bottomup would 
affect how likely this is to occur, since the address space in question is 
pretty low in the virtual address space.

It still leaves open the question of whether any part of libc is supposed to 
protect the page having .text on it, to make it not writable even after the end 
of the text segment.

Still not sure what the best course forward might be.  Any suggestions?

Original comment by [email protected] on 16 Nov 2011 at 11:39

@GoogleCodeExporter
Copy link
Author

While tricks such as /proc/$$/maps (checked once at startup) might work, the 
cross-platform support seems like more hassle than it's worth.

One possibility is for the user to supply the end-of-text pointer to ctemplate. 
 The "etext" symbol appears to be commonly available.  Although ctemplate 
wouldn't be able to use it directly (doing so would likely bind to the "etext" 
of libctemplate.so), the main() program could bind etext and supply a pointer 
to ctemplate.

extern char etext;

int main()
{
    ctemplate::set_etext_pointer(&etext);
    // ...
}

By defaulting the value to NULL, the optimization effectively defaults to 
disabled.  Those users who (a) need it, and (b) know what they're doing, have 
the option of enabling it.

To further refine the idea, the main program might also supply its own "start" 
pointer as well.  This would be useful for applications where the user of 
ctemplate is not the main executable, but rather another shared library.

Original comment by [email protected] on 24 Nov 2011 at 2:57

@GoogleCodeExporter
Copy link
Author

Looking at this, in preparation for a new release (hopefully next week!)

I'm wondering if it's not a simpler fix to just replace data_start with _etext. 
 It looks like the gnu binutils supports _etext, and I think it's the one that 
supports data_start as well, so there's no compatibility issues.  And since the 
reference to _etext happens in a .h file, _etext will be evaluated in the 
context of the main executable, not libctemplate, so the concerns you have 
there wouldn't apply either.

Assuming there are no pitfalls -- and this is all poorly documented, so I have 
no idea if _etext is as portable as data_start or not -- I'll aim for that 
solution.

Original comment by [email protected] on 17 Dec 2011 at 3:04

@GoogleCodeExporter
Copy link
Author

I checked with a linker expert, who said __etext should do what we want, so I'm 
going with that.

Original comment by [email protected] on 19 Dec 2011 at 11:18

  • Changed state: Started

@GoogleCodeExporter
Copy link
Author

That's good to hear.  When I tested it, __etext did what we want it to do 
(especially considering that the function is inlined in a header file.)  
However, what about the "_start" symbol, which would only be valid in the 
context of the main executable?  Does the linker provide an alternate way to 
reliably find the start of the current text segment?

Original comment by [email protected] on 20 Dec 2011 at 2:44

@GoogleCodeExporter
Copy link
Author

Both __etext and _start "should" point to the main executable.  I think that is 
what we want in this case.  My expectation is that most uses of TemplateString 
will be in the main executable.  If not, it's just a little bit of optimization 
lost.

Original comment by [email protected] on 21 Dec 2011 at 12:30

@GoogleCodeExporter
Copy link
Author

I attached some test cases involving various combinations of _start/__etext 
referenced by the main executable, a shared library, and the same shared 
library loaded by dlopen.

The case of dlopen(... RTLD_DEEPBIND) will become an issue if/when the 
php-ctemplate (http://code.google.com/p/php-ctemplate/) project revives itself; 
PHP extensions are loaded with dlopen(..., RTLD_LAZY | RTLD_GLOBAL | 
RTLD_DEEPBIND).

Here's the output of the etext-test "make test" on my Ubuntu machine:

Test 1 - _start/__etext referenced by main executable, no libraries
main: _start=0x400490 __etext=0x400696

Test 2 - _start/__etext referenced by library, but not by main executable
shlib: _start=0x400550 __etext=0x400746

Test 3 - _start/__etext referenced by dlopen() library, but not by main 
executable
dl: _start=(nil) __etext=0x2b44414456a6

Test 4 - _start/__etext referenced both by dlopen() library and by main 
executable
main: _start=0x400690 __etext=0x400956
dl: _start=0x400690 __etext=0x400956

Test 5 - _start/__etext referenced both by dlopen() library with RTLD_DEEPBIND, 
and by main executable
main: _start=0x400690 __etext=0x400956
dl: _start=0x400690 __etext=0x2ad51f4fe6a6

Test 6 - _start/__etext referenced dlopen() library with RTLD_DEEPBIND, but not 
by main executable
dl: _start=(nil) __etext=0x2b2b4d7126a6

Original comment by [email protected] on 21 Dec 2011 at 2:15

Attachments:

@GoogleCodeExporter
Copy link
Author

This is good to know -- thanks for doing the tests.

DEEPBIND indeed can cause trouble -- I've had problems with it before.  Though 
given test 3, it looks like dlopen() is a problem even without DEEPBIND?  It 
looks like as long as code doesn't use dlopen(), _start and __etext behave 
sanely, but once dlopen() enters the picture, all bets are off in the 
dlopened-library.  Is that your reading as well?

I ran your tests replacing __etext with data_start and data_start doesn't have 
the problems that __etext does:
---
Test 1 - _start/__etext referenced by main executable, no libraries
main: _start=0x4004c0 data_start=0x601010

Test 2 - _start/__etext referenced by library, but not by main executable
shlib: _start=0x4005a0 data_start=0x601010

Test 3 - _start/__etext referenced by dlopen() library, but not by main 
executable
dl: _start=(nil) data_start=(nil)

Test 4 - _start/__etext referenced both by dlopen() library and by main 
executable
main: _start=0x4006d0 data_start=0x601038
dl: _start=0x4006d0 data_start=0x601038

Test 5 - _start/__etext referenced both by dlopen() library with RTLD_DEEPBIND, 
and by main executable
main: _start=0x4006d0 data_start=0x601038
dl: _start=0x4006d0 data_start=0x601038

Test 6 - _start/__etext referenced dlopen() library with RTLD_DEEPBIND, but not 
by main executable
dl: _start=(nil) data_start=(nil)
---

I was going to release a new ctemplate version tonight, that replaced 
data_start with __etext.  I'm wondering if it wouldn't be better to stick with 
data_start.  What is your feeling?

Original comment by [email protected] on 21 Dec 2011 at 3:13

@GoogleCodeExporter
Copy link
Author

In my testing, we saw some problems when compiling with -O2, with using 
__etext.  Based on that and the results you showed with dlopen(), I think it's 
premature to use __etext at this point.  I've reverted the change, and will 
look into this again after the new ctemplate release is out.

Original comment by [email protected] on 22 Dec 2011 at 10:20

  • Changed state: Accepted

@GoogleCodeExporter
Copy link
Author

Yeah, I think etext might end up being a dead-end.  In testing on my Ubuntu 
11.10 machine, literal strings aren't actually being put into the .text section 
at all.  They're being put into the separate .rodata (read-only, non-executable 
data) section.  Testing against the range [_start, __etext) would end up 
excluding the very strings we're trying to match.

I tried various tactics such as defining a static "landmark" variable in the 
header file (with the hope of using its address as a reference location), but 
without consistent success.

Another possibility might be the __builtin_constant_p() built-in function 
(http://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html#Other-Builtins), but I 
was only able to get that to work from within #define macros.

Original comment by [email protected] on 23 Dec 2011 at 12:33

@GoogleCodeExporter
Copy link
Author

The documentation says this:
---
You may use this built-in function in either a macro or an inline function. 
However, if you use it in an inlined function and pass an argument of the 
function as the argument to the built-in, GCC will never return 1 when you call 
the inline function with a string constant
---
which is exactly the position we're in, so that all matches up.  It's a good 
idea, though!

Does the current data_start correctly capture stuff in .rodata?  Maybe the 
current test isn't working either, and we can just safely remove it.

Probably the best thing would be to actually get perf numbers to see if it 
makes a difference.  It may be that the string-copying here is so fast, the 
optimization is totally unnecessary.

Original comment by [email protected] on 23 Dec 2011 at 1:09

@GoogleCodeExporter
Copy link
Author

The three sections are ordered:
    .text
    [gap]
    .rodata
    .data

On the older RHEL5 system, there is no .rodata, so it's simply:
    .text
    [gap]
    .data

So yes, the existing code is working as intended (at least for strings located 
in the main executable's .rodata or .text section, as opposed to strings 
belonging to a library), except when mmap() decides to place an anonymous 
writable map inside that gap.

Original comment by [email protected] on 23 Dec 2011 at 2:11

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant