This repository was archived by the owner on Jul 26, 2019. It is now read-only.

Wc.go hadoop prep #2

Merged
merged 9 commits into from
Jul 25, 2016

Conversation

jordan-heemskerk
Contributor

  • Cleaned out a bunch of stuff that shouldn't be in the repo
  • Restructured wc.go to work with Hadoop streaming, the first step in getting it running on EMR
  • Restructured graphbuilder.go to work with the new wc.go outputs

@jordan-heemskerk
Contributor Author

@eburdon this too!

@jordan-heemskerk
Contributor Author

ec01ea0 is the crowning achievement... see fvbock/trie#2

@eburdon
Owner

eburdon commented Jul 25, 2016

😮 You contributed to open source?! NIIIIIIIIIIIIICE

Just looking through the files now... I was thinking that once this is stable, we'd just fire up the smallest EC2 cluster and run EMR on that instead of spot instances until the 12th. Shouldn't be too expensive and would prevent Lambda from having to configure every time. Thoughts?

@jordan-heemskerk
Contributor Author

> I was thinking that once this is stable, we'd just fire up the smallest EC2 cluster and run EMR on that instead of spot instances until the 12th. Shouldn't be too expensive and would prevent Lambda from having to configure every time. Thoughts?

My thoughts exactly. We can spin up a small EMR cluster and leave it running. Lambda can just submit jobs to it using the API (available for most major languages) and then fetch the results from S3 when they are available.
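
As a rough illustration of what a submitted job might carry, here is a minimal Go sketch that assembles standard Hadoop streaming options (`-files`, `-mapper`, `-reducer`, `-input`, `-output`) for a step. The `StreamingJob` type, the bucket paths, and the `map`/`reduce` mode arguments are assumptions made for illustration; the thread does not specify them, and the actual submission through the EMR API is left out.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// StreamingJob is a hypothetical description of one Hadoop streaming run.
// None of these names come from the repository.
type StreamingJob struct {
	Binary string // S3 URI of the compiled wc binary (assumed location)
	Input  string // S3 URI of the input data
	Output string // S3 URI for results; Hadoop fails if it already exists
}

// Args returns the streaming options for the job. The generic -files option
// comes first so Hadoop ships the binary to every node before the
// streaming-specific options are parsed.
func (j StreamingJob) Args() []string {
	name := path.Base(j.Binary) // file name as it appears on the worker
	return []string{
		"-files", j.Binary,
		"-mapper", name + " map", // "map" mode argument is an assumption
		"-reducer", name + " reduce", // "reduce" mode argument is an assumption
		"-input", j.Input,
		"-output", j.Output,
	}
}

func main() {
	job := StreamingJob{
		Binary: "s3://example-bucket/bin/wc",
		Input:  "s3://example-bucket/input/",
		Output: "s3://example-bucket/output/run-1/",
	}
	fmt.Println(strings.Join(job.Args(), " "))
}
```

Whatever calls the EMR API would pass these as the step's arguments and then poll or wait for the output prefix to appear in S3.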

@eburdon
Owner

eburdon commented Jul 25, 2016

👍 Just for deleting all the junk alone... I packaged the existing repo just for safety's sake.

Otherwise, looks great, and the plan sounds solid! Merge when ready.

@eburdon
Owner

eburdon commented Jul 25, 2016

To confirm, looks like there's 1 input, 1 output now?

@jordan-heemskerk
Contributor Author

I don't think I have permission to merge in this repo. You can just do it if you want, or gimme god mode :P.

@jordan-heemskerk
Contributor Author

Input and output now all happen over STDIN and STDOUT, as required by Hadoop streaming. There is one input; it controls whether the execution is mapping or reducing.
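
For readers unfamiliar with the pattern, a minimal sketch of a Hadoop streaming word count in Go might look like the following. It is not the actual wc.go: the mode names `map` and `reduce`, the tab-separated `word<TAB>count` record format, and all function names are assumptions. It relies on the fact that Hadoop streaming sorts mapper output by key before handing it to the reducer.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"os"
	"strconv"
	"strings"
)

// runMap emits "word\t1" for every whitespace-separated token read from in.
func runMap(in io.Reader, out io.Writer) error {
	sc := bufio.NewScanner(in)
	for sc.Scan() {
		for _, w := range strings.Fields(sc.Text()) {
			fmt.Fprintf(out, "%s\t1\n", w)
		}
	}
	return sc.Err()
}

// runReduce sums counts over runs of identical keys. It assumes the input
// is already sorted by key, so equal keys arrive on consecutive lines.
func runReduce(in io.Reader, out io.Writer) error {
	sc := bufio.NewScanner(in)
	cur, total, seen := "", 0, false
	for sc.Scan() {
		parts := strings.SplitN(sc.Text(), "\t", 2)
		if len(parts) != 2 {
			continue // skip malformed records
		}
		n, err := strconv.Atoi(parts[1])
		if err != nil {
			continue // skip records with a non-numeric count
		}
		if seen && parts[0] != cur {
			fmt.Fprintf(out, "%s\t%d\n", cur, total)
			total = 0
		}
		cur, seen = parts[0], true
		total += n
	}
	if seen {
		fmt.Fprintf(out, "%s\t%d\n", cur, total)
	}
	return sc.Err()
}

// run dispatches on the single mode argument.
func run(mode string, in io.Reader, out io.Writer) error {
	switch mode {
	case "map":
		return runMap(in, out)
	case "reduce":
		return runReduce(in, out)
	}
	return fmt.Errorf("unknown mode %q (want map or reduce)", mode)
}

// runString runs one mode over an in-memory string, handy for local checks
// without a Hadoop cluster.
func runString(mode, input string) (string, error) {
	var sb strings.Builder
	err := run(mode, strings.NewReader(input), &sb)
	return sb.String(), err
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: wc map|reduce")
		os.Exit(2)
	}
	if err := run(os.Args[1], os.Stdin, os.Stdout); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

Locally, the same flow can be approximated with a shell pipeline such as `./wc map < input.txt | sort | ./wc reduce`, where `sort` stands in for Hadoop's shuffle phase.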

@eburdon
Owner

eburdon commented Jul 25, 2016

haha ok. Alex can get the command needed to run from lambda from this codebase / readme?

eburdon merged commit 284b516 into eburdon:master on Jul 25, 2016
@jordan-heemskerk
Contributor Author

jordan-heemskerk commented Jul 25, 2016

> Alex can get the command needed to run from lambda from this codebase / readme?

@eburdon is he in this repo yet? Have him ask me; it's going to depend on what he is calling it from.
