
Unison dual-sync feature (unox) for efficient syncing and user/id mapping #64

Merged
31 commits, merged on Aug 2, 2016

Conversation

mickaelperrin
Contributor

While working on #42, I learned a lot about unison, some of which had already been partially pointed out by contributors (@gagarine and @onnimonni, thanks for making me think about that).

  1. There is absolutely no need to run unison as root with --user and --numericids. Running unison as the user we want is sufficient.
  2. Unison is very efficient when used with the option -repeat watch. It follows events in realtime and updates its index in realtime while performing the sync. When used in conjunction with fsevents, it needs to scan all folders on each request to update its index.
  3. There is no need to run a server on each side. A single call to unison with -socket can handle both sync and watch on both sides.

Points 2 and 3 hold if the system is compatible with unison-fsmonitor. On Mac, you need to install unox to get it working (see the sketch below).
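To make this concrete, here is a minimal sketch of how the two sides fit together; the paths, port and exact flags below are illustrative assumptions, not the precise commands used by the strategies in this PR:

```ruby
# Sketch only: how the two unison sides could be wired up.
# Paths, port and flags are illustrative assumptions, not the exact PR commands.

# Container side: a single unison process in socket mode serves both sync and watch.
container_cmd = 'unison -socket 5000'

# Host side: run as the current user (no root, no numeric id mapping) and let
# unison itself follow file events via -repeat watch (requires unison-fsmonitor,
# i.e. unox on OS X), so no full rescan is needed on every change.
host_cmd = "unison /path/to/project socket://localhost:5000/data " \
           '-auto -batch -repeat watch'

`#{host_cmd}` # docker-sync style invocation via backticks
```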

This PR introduces:

  • a new sync strategy `unison-unox`
  • a new watch strategy `unison`
  • an example

The corresponding unison image is currently here. I won't push a PR to docker-unison because it is so much simpler than the current image. Feel free to grab it under your GitHub account.

I am not sure about the updates to execution.rb and sync_manager.rb. I wasn't able to get unison working in a new thread, so I needed to fork it. This is a problem because the .join method no longer prevents the main process from exiting if you use only this strategy. I use a silly hack with sleep; I'm sure there is a better implementation than that, but my one week of Ruby experience prevented me from finding a better way ;) (see the sketch below)
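For clarity, roughly what the fork-based execution and the sleep hack boil down to (a simplified sketch, not the actual execution.rb code; names and the unison command are illustrative):

```ruby
# Simplified sketch of fork-based execution plus the sleep fallback.
threads = [] # thread-based strategies (fswatch etc.) would push Thread objects here

def forkexec(cmd)
  Process.fork do
    `#{cmd}` # backticks block inside the child until unison exits
  end
end

watch_cmd = 'unison src socket://localhost:5000/data -auto -batch -repeat watch'
forkexec(watch_cmd) # unison runs in a child process, not a thread

# join keeps the main process alive for threads, but a forked child does not:
# with only fork-based strategies there is nothing to join, the main process
# would exit, and the started unison children would be left behind.
threads.each(&:join)
sleep if threads.empty? # the "silly hack": block forever so signals still reach us
```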

Hope we will get some good results with this new strategy until the OSXFS approach (#42) is patched on the Docker side.

- PB: while the launched command has no errors, the sync/watch doesn't work.
- FIX: it looks like creating a new process instead of a thread resolves the issue.
- TODO: ensure that this is the adequate way to resolve the issue
- PB: docker-sync exits because of the lack of threads. Indeed, unison is launched as a fork, so forked unison processes are not killed when docker-sync ends.
- FIX: sleep indefinitely if there are no threads
@EugenMayer
Owner

Wow, that looks very interesting - thank you for outlining your thought process and your way of getting this running.

I would love to take some time to have a look at this, to see where you are going with it.

It's interesting to see that this is a different take on #42, due to the issues you outlined with OSXFS and events after some time.

Do you already have any benchmarks for unox vs fswatch on, let's say, 15k files?

Thank you for your huge effort!

@mickaelperrin
Contributor Author

This is the only approach, along with magic-sync, that allowed me to scaffold a full Drupal project with 15k files in realtime and with reasonable CPU usage.

Unison dualside failed this test, and I needed to deactivate syncing while bootstrapping to use it.

I even wonder whether this would not be faster than magic (if we didn't have the OSXFS issue), as we rely on native file systems on both sides.

@EugenMayer
Owner

  1. So basically unison-dualside fails due to CPU usage on bigger projects (deal breaker for you and me)
  2. And magic fails for now due to the buggy implementation of OSXFS

And this is the third angle to tackle the dual-sync side. Nice progress with those real-case tests.

@mickaelperrin
Contributor Author

Sorry, I was on the go while I answered your latest message.

To be precise, dualside works with heavy projects, but doesn't support their scaffolding. That's why I implemented watch_stop / watch_start. It helped me to bootstrap with a script like the one provided on the wiki.

I would love to see unox get a simple run-and-forget workflow. I will update this PR with my observations while using it.

Would love to have your opinions too, especially on the Ruby implementation of the process management.

@EugenMayer
Owner

Sure, I will. I will probably have to take this on over the weekend though.

@mickaelperrin
Contributor Author

Added a new feature that removes the annoying sync_host_port config. Now the host port is discovered automatically.

Still, I don't like parsing the output of CLI commands. Maybe we should think about using a Ruby Docker API, like docker-api?
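For reference, the kind of CLI parsing this implies, as a rough sketch (the container name and constant are illustrative; the committed code may differ):

```ruby
# Sketch of discovering the host port Docker assigned to the unison container.
UNISON_CONTAINER_PORT = 5000
container_name = 'myproject-unison'

# `docker port` prints something like "127.0.0.1:32768" for the mapped port.
output = `docker port #{container_name} #{UNISON_CONTAINER_PORT}`.strip
sync_host_port = output.split(':').last.to_i
```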

@@ -105,7 +106,7 @@ def start_container
   exists = `docker ps --filter "status=exited" --filter "name=#{container_name}" | grep #{container_name}`
   if exists == ''
     say_status 'ok', "creating #{container_name} container", :white if @options['verbose']
-    cmd = "docker run -p '#{@options['sync_host_port']}:#{UNISON_CONTAINER_PORT}' \
+    cmd = "docker run -p '127.0.0.1::#{UNISON_CONTAINER_PORT}' \
Owner

looks like 2x :: only one needed

Contributor Author

This is intended: I don't specify the host port, which would go in the middle of the pattern ip:host_port:container_port, so Docker assigns one automatically.

Owner

Ah, I see, sorry.

@EugenMayer
Owner

I think what you did there is practical. I would not pull in another library yet (for now); if we get more reasons for it, I would follow you on this one. What do you think?

@EugenMayer
Owner

We probably want to think about https://github.com/mauricioklein/docker-compose-api since it parses and utilizes the compose file better and lets us check status / start things by segments. This could make a lot of sense. I did not think deeply about it; pasted here for gathering information.

end

sleep unless has_threads
Owner

Why do you want to make this blocking by design?

Contributor Author

As I said previously in the first message of the PR.

I wasn't able to get unison working in a new thread, so I needed to fork it. This is a problem because the .join method no longer prevents the main process from exiting if you use only this strategy. I use a silly hack with sleep; I'm sure there is a better implementation than that.

If you don't sleep, the main process exits and you have no way to kill the started unison processes automatically.

Owner

Well, we have to start thinking about whether we want to work with pids / pid files or whatever solutions are common in Ruby.
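Something along these lines, for example (purely a sketch of the idea, nothing that is in this PR; paths and names are assumptions):

```ruby
# Sketch of pid-file based cleanup, not part of this PR.
require 'fileutils'

PID_DIR = '.docker-sync/pids'.freeze

def forkexec_with_pidfile(name, cmd)
  pid = Process.fork { `#{cmd}` }
  FileUtils.mkdir_p(PID_DIR)
  File.write(File.join(PID_DIR, "#{name}.pid"), pid)
  pid
end

def stop_all
  Dir.glob(File.join(PID_DIR, '*.pid')).each do |pidfile|
    begin
      Process.kill('TERM', File.read(pidfile).to_i)
    rescue Errno::ESRCH
      # process already gone
    end
    File.delete(pidfile)
  end
end
```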

@EugenMayer
Owner

For now it seems like you pulled way too many changes into this feature. You replaced the way threadexec works (https://github.com/mickaelperrin/docker-sync/blob/feature/unox/lib/docker-sync/execution.rb#L24), which needs testing by itself. I would suggest making this its own PR we can test in isolation, since it will probably influence all fswatch usage; at the very least, ensure fswatch still works as desired.

In general, let's try not to blow this out of proportion or we will have a hard time getting it all together. It seems like you are already trying to implement the "daemon" mode here, mentioned in #51.

To be reviewed / documented:

  • the new join_threads + the sleep in there
  • the new forkexec
  • WatchStrategy::Unison::initialize -> why is a sync strategy started here? Is this the "host watcher"?

In general, could you document what you did in SyncStrategy::Unison_Unox: what is started, what for, and why those arguments, so we (and other contributors) have some knowledge sharing and can maintain this more easily in the future. Thanks!

@mickaelperrin
Contributor Author

mickaelperrin commented Jul 24, 2016

Well spotted on the threadexec, that was a bad commit. This function doesn't have to change in any way. I will update this.

After reviewing the code, it wasn't a bad commit after all. I simply factored out the common parts of the threadexec and forkexec functions, to show that they are the same function except that one uses Thread.new and the other Process.fork. I will remove the factorization.

@mickaelperrin
Contributor Author

:) This is only unox related indeed.

But my question is indeed more related to general Ruby: why does a process launched from Ruby not behave the same as a process launched from the CLI? Should I pay attention to something in particular?

@onnimonni

onnimonni commented Jul 26, 2016

Could it be that the Ruby process doesn't have unox in its $PATH?

@EugenMayer
Owner

EugenMayer commented Jul 26, 2016

Let's split off the general question into a separate issue then, what do you think? The reason for this issue could be on different layers; we have to investigate. ENV differences like PATH could be a cause.

end
begin
Preconditions::unison_available
Preconditions::unox_available
Contributor Author

@onnimonni thanks for trying to help, but I have implemented a precondition to check that unox is available. Or maybe this check is broken?
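For context, the check boils down to something like this (a sketch; the real Preconditions::unox_available may differ slightly), and it only proves the binary is on docker-sync's own $PATH:

```ruby
# Sketch of what such a unox availability check amounts to.
def unox_available
  `which unison-fsmonitor`
  return if $?.success?

  raise('Could not find unison-fsmonitor binary in path. ' \
        'Please install it, see https://github.com/hnsl/unox')
end
```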

- PB: when run with popen3, unison suddenly stopped catching file events.
- FIX: using backticks did the trick.
- TODO: understand why this happens
@mickaelperrin
Contributor Author

Using backticks instead of popen3 did the trick. Now I need to understand why...

@EugenMayer
Owner

You probably want to have a look at http://i.stack.imgur.com/1Vuvp.png ...

So the main difference between popen3 and backticks is that popen is non-blocking while backticks are blocking; a quick illustration follows below.
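For example (a generic illustration, not the docker-sync code):

```ruby
require 'open3'

# popen3 returns immediately with IO handles; the caller has to read and wait itself.
_stdin, stdout, _stderr, wait_thr = Open3.popen3('unison -version')
puts stdout.read # output is only consumed because we read explicitly
wait_thr.value   # blocks here (on the wait thread), not at the popen3 call

# Backticks block until the command exits and return its stdout as a string.
puts `unison -version`
```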

The change you just made will not make it into the master branch, since it once again influences fswatch behavior. Please either create a custom execution or solve the issue a different way. We don't want stderr/out to be captured, so I suggest you revert that.

@mickaelperrin
Contributor Author

I agree, I wish there were a "better" solution. But to be precise:

  • this won't influence fswatch, as this is implemented in the forkexec function, which is only called by this watch strategy
  • stdout is captured, as indicated in your graphic. If we need stderr, I think adding 2>&1 to the command will help.

@EugenMayer
Owner

Oh sorry, I got confused, my bad. I guess we will have to solve a couple of those CLI-invocation issues somehow, e.g. the process issues in #69 or #65. Probably a totally different approach, as you suggest: either some kind of process manager or something else entirely. The issue will be not to "overcomplicate" things.

I do not have the slightest idea why backticks work and popen does not. In those cases, it was always the ENV being different, or one of the commands being pickier about what needs to be escaped?

@mickaelperrin
Contributor Author

I checked that using backticks still ensures that processes are killed when we kill the main process. But maybe, in some circumstances, we will get some zombie processes like you do with fswatch in #69.

Regarding implementation as a daemon (#69), I think it would be a good enhancement. But to be honest, I need this feature less since using unox. As it performs well so far, I let it run even when doing heavy I/O. If something goes wrong, a kill and restart corrects synchronisation issues.

However, from my point of view, both #69 and #65 are independent of this PR at the moment. But yes, this PR will need to be updated if we change something more globally.

@EugenMayer
Owner

Agreed, they are independent, but they could probably solve issues occurring here; that's why I referenced them, so we can keep track of the progress. I just expect that we will go different routes solving similar issues and see which solution works out best.

@mickaelperrin
Contributor Author

@EugenMayer I'll just let you review the latest commits and will merge the branch if it is OK for you.

raise('Could not find unison-fsmonitor binary in path. Please install it, see https://github.com/hnsl/unox, or simply run :
wget -O /usr/local/bin/unison-fsmonitor https://raw.githubusercontent.com/hnsl/unox/master/unox.py \
&& chmod +x /usr/local/bin/unison-fsmonitor')
end
Owner

@mickaelperrin a pity there is no brew alternative. I am thinking about creating my own tap and adding this.

Actually, in this case, use yes? and ask if the command should be run, then run it automatically. This makes the most sense here to ease up the installation.
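Roughly what I have in mind, using Thor's yes? helper (just a sketch, under the assumption that this runs inside the Thor-based preconditions code; the final implementation may differ):

```ruby
# Sketch of a confirm-and-run install flow using Thor's yes?.
cmd = 'wget -O /usr/local/bin/unison-fsmonitor ' \
      'https://raw.githubusercontent.com/hnsl/unox/master/unox.py ' \
      '&& chmod +x /usr/local/bin/unison-fsmonitor'

if yes?("Could not find unison-fsmonitor. Should I run the install command for you? (#{cmd})")
  `#{cmd}`
else
  raise("Please install it, see https://github.com/hnsl/unox, or simply run:\n #{cmd}")
end
```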

Contributor Author

I thought about auto-install too, but as you didn't provide auto-install for unison / fswatch / rsync..., I stuck with a simple help message on how to install.

Owner

The reason was that not everybody has brew. In your case, no dependencies are needed and it's the only way to install. AFAIK wget is not part of OSX by default; you might want to use curl (see the equivalent below).
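For reference, a curl equivalent of that one-liner could look like this (curl ships with OS X; treat it as a sketch):

```ruby
# curl-based equivalent of the wget install command; curl is preinstalled on OS X.
cmd = 'curl -fsSL -o /usr/local/bin/unison-fsmonitor ' \
      'https://raw.githubusercontent.com/hnsl/unox/master/unox.py ' \
      '&& chmod +x /usr/local/bin/unison-fsmonitor'
```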

I will change the others too, by the way, adding a question like for the images.

Contributor Author

Done

@EugenMayer
Owner

@mickaelperrin from my perspective, this looks great. The one thing, automating / easing up the unox installation with a confirmation, would actually help a lot; then just head on with merging it. Great work.

@EugenMayer
Owner

@mickaelperrin see #80, we should change the image name back to eugenmayer/unison:unox due to legacy / pre-release support. We'll change it to eugenmayer/unison some releases later, so we are sure nobody (or few) are using the older versions.

`#{cmd}`
else
raise("Please install it, see https://github.com/hnsl/unox, or simply run :\n #{cmd}")
end
Owner

perfect
