Provisioning hangs #6

Open
frankscholten opened this issue Dec 22, 2012 · 3 comments

@frankscholten
Contributor

I added the full path of a shell script that echoes 'hello' to the provisioner variable in the vagueant.conf file. When I run vagueant up, it hangs while running the provisioner.

I have the following processes running:

root     17897  0.0  0.0   2864  1064 ?        Ss   15:09   0:00 lxc-start -n provisioner-test -c /var/run/lxc/provisioner-test.console -d
root     17898  0.0  0.0   4240   540 pts/1    S+   15:09   0:00 tail -f /var/lib/lxc/provisioner-test/rootfs/var/log/runonce.log
root     18166  0.0  0.0   5224  1412 pts/1    S+   15:09   0:00 /bin/bash /usr/bin/lxc-wait -n provisioner-test -s STOPPED
@frankscholten
Contributor Author

I have processes that are in the uninterruptible sleep (D) state:

frank@frankthetank:~/precise2$ ps aux | grep lxc
123       1619  0.0  0.0   3420   892 ?        S    Dec16   0:00 dnsmasq -u lxc-dnsmasq --strict-order --bind-interfaces --pid-file=/var/run/lxc/dnsmasq.pid --conf-file= --listen-address 10.0.3.1 --dhcp-range 10.0.3.2,10.0.3.254 --dhcp-lease-max=253 --dhcp-no-override --except-interface=lo --interface=lxcbr0
root     19731  0.0  0.0   2864   904 ?        Ds   15:14   0:00 lxc-start -n precise -d -c /var/run/lxc/precise.console
root     20136  0.0  0.0   2864   904 ?        Ds   15:16   0:00 lxc-start -n precise -d -c /var/run/lxc/precise.console
root     20730  0.0  0.0   2864   904 ?        Ds   15:19   0:00 lxc-start -n precise -d -c /var/run/lxc/precise.console
root     21124  0.0  0.0   2864   904 ?        Ds   15:20   0:00 lxc-start -n precise2 -d -c /var/run/lxc/precise2.console
frank    21357  0.0  0.0   4396   820 pts/12   S+   15:23   0:00 grep lxc

In dmesg I see the following:

[517896.334578] INFO: task lxc-start:20136 blocked for more than 120 seconds.
[517896.334580] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[517896.334581] lxc-start       D 00000000     0 20136      1 0x00000004
[517896.334585]  ebf5de74 00200082 00000000 00000000 ebf5de20 f746f230 defd8000 0001d6bd
[517896.334591]  c196be00 c196be00 6072be8a 0001d6bd f7babe00 ebd89960 c1107e95 ebf5de30
[517896.334598]  c1107ec6 ebf5de6c c11521a2 eeda7070 c17bc906 ebc72a80 ebc72af8 ebf5de50
[517896.334605] Call Trace:
[517896.334610]  [<c1107e95>] ? __free_pages+0x35/0x40
[517896.334614]  [<c1107ec6>] ? free_pages+0x26/0x30
[517896.334617]  [<c11521a2>] ? mount_fs+0xa2/0x180
[517896.334621]  [<c106d48e>] ? lg_global_unlock+0x3e/0x50
[517896.334625]  [<c15c95d3>] schedule+0x23/0x60
[517896.334628]  [<c15c982d>] schedule_preempt_disabled+0xd/0x10
[517896.334632]  [<c15c8586>] __mutex_lock_slowpath+0xc6/0x120
[517896.334635]  [<c15c8114>] mutex_lock+0x24/0x40
[517896.334638]  [<c14d62cc>] copy_net_ns+0x5c/0xd0
[517896.334642]  [<c106a411>] create_new_namespaces+0xb1/0x150
[517896.334646]  [<c106a5b2>] copy_namespaces+0x72/0xb0
[517896.334650]  [<c10430cb>] copy_process.part.28+0x6db/0x10f0
[517896.334654]  [<c1043c3a>] do_fork+0x11a/0x350
[517896.334658]  [<c10185e4>] sys_clone+0x34/0x40
[517896.334661]  [<c15d12d9>] ptregs_clone+0x15/0x3c
[517896.334665]  [<c15ca5a4>] ? syscall_call+0x7/0xb

I will reboot and try again.
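
For anyone trying to check whether they are hitting the same thing, here is a minimal sketch for picking the stuck processes out of the `ps aux` listing. It assumes the standard `ps aux` column layout, where STAT is the eighth field; the function name `filter_dstate` is made up for illustration and is not part of vagueant or LXC.

```shell
# Hypothetical helper (not part of vagueant): keep only the lines of
# `ps aux` output whose STAT column (field 8) starts with "D",
# i.e. processes in uninterruptible sleep.
filter_dstate() {
    awk '$8 ~ /^D/'
}

# Typical use:
#   ps aux | filter_dstate
```

The header row and processes in other states (such as `S+` or `Ss`) are dropped, so any output at all means something is blocked in the kernel.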

@neerolyte
Owner

I've been trying to reproduce this but haven't managed anything yet.

My next step will be to add some working (at least on my laptop :p ) examples to the repo to see if that helps.

Although given the current time of year it may be a week or two before I manage to push anything functional up :)

The multiple D-state processes have me thinking that maybe we somehow managed to start a container with the same name multiple times - maybe I've got a race condition to sort out...

Cheers,
Dave
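
If the same-name race turns out to be the cause, one possible guard is a per-container lock taken before the container is started. The following is only a sketch under that assumption: the helper names (`acquire_lock`, `release_lock`, `start_once`) and the `LOCK_DIR` default are hypothetical, and this is not how vagueant currently works. It relies on `mkdir` failing when the directory already exists, so only one caller can win.

```shell
# Hypothetical lock helpers; not part of vagueant.
# LOCK_DIR is an assumed location, override it as needed.
LOCK_DIR=${LOCK_DIR:-/tmp/vagueant-locks}

acquire_lock() {
    mkdir -p "$LOCK_DIR" || return 1
    # mkdir fails if the lock directory already exists,
    # so a second caller with the same name gets a non-zero status.
    mkdir "$LOCK_DIR/$1.lock" 2>/dev/null
}

release_lock() {
    rmdir "$LOCK_DIR/$1.lock" 2>/dev/null
}

# Run a command only if no other start for the same name is in progress.
start_once() {
    name=$1; shift
    if ! acquire_lock "$name"; then
        echo "container '$name' is already being started" >&2
        return 1
    fi
    "$@"
    status=$?
    release_lock "$name"
    return $status
}

# Typical use:
#   start_once precise lxc-start -n precise -d
```

One caveat with this design: if the script is killed between taking and releasing the lock, the lock directory is left behind and has to be removed by hand (or by a `trap`).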

@frankscholten
Contributor Author

Yeah, I think it happens if I run 'vagueant up' multiple times, or if I destroy the LXC container while it is tailing the provision log. In the meantime I have been working on a 'vagueant template' command, similar to the 'vagrant box' command. I will probably have something working over the weekend. For now I wish you a Merry Christmas and best wishes for 2013! :-)
