"BigData":{
    # See also: http://www.dzone.com/mz/big-data (also listed under "Others")
    "Hadoop":[
        "http://getindata.com/blog/post/avoiding-the-mess-in-the-hadoop-cluster-part-2/",
        "http://getindata.com/blog/post/avoiding-the-mess-from-the-hadoop-cluster-part-1/",
        "http://appaloud.com/teradata-acquires-rainstor-address-big-data-archiving-using-hadoop/",
        "http://appaloud.com/teradata-offers-mapr-within-teradatas-unified-data-architecture/",
        "http://www.datasciencecentral.com/profiles/blogs/6-cloud-based-machine-learning-services",
        "http://blog.cloudera.com/blog/2015/06/architectural-patterns-for-near-real-time-data-processing-with-apache-hadoop/",
        "https://adtmag.com/articles/2015/05/21/mapr-adds-drill.aspx",
        "http://cscarioni.blogspot.com/2015/02/clustering-customers-for-machine.html",
        "http://hortonworks.com/blog/apache-hadoop-infrastructure-considerations-and-best-practices/",
        "http://www.hadoop360.com/blog/hadoop-whose-to-choose",
        "http://haskell-distributed.github.io/",
        "http://appaloud.com/splice-machine-takes-big-step-releasing-version-1-0-hadoop-rdbms/",
        "http://siliconangle.com/blog/2015/05/15/oracle-announces-new-spatial-graph-tools-for-hadoop-and-nosql/",
        "http://www.datasciencecentral.com/profiles/blogs/get-started-with-hadoop-and-spark-in-10-minutes",
        "http://hortonworks.com/blog/resilience-of-yarn-applications-across-nodemanager-restarts/",
        "http://hortonworks.com/blog/hortonworks-acquires-sequenceiq-to-provide-automated-deployment-of-hadoop-everywhere/",
        "http://www.datasciencecentral.com/profiles/blogs/hadoop-2-helps-systems-integration",
        "http://www.infoq.com/articles/Hadoop-Cluster",
        "http://blog.cloudera.com/blog/2015/01/how-to-deploy-apache-hadoop-clusters-like-a-boss/",
        "https://developer.rackspace.com/blog/monitoring-hadoop-with-rackspace-cloud-services/",
        "http://radar.oreilly.com/2015/02/processing-frameworks-for-hadoop.html",
        "http://blog.cloudera.com/blog/2015/02/understanding-hdfs-recovery-processes-part-1/",
        "https://districtdatalabs.silvrback.com/creating-a-hadoop-pseudo-distributed-environment",
        "http://engineering.viki.com/blog/2015/analytics-infrastracture-updates/",
        "http://saphanatutorial.com/hadoop-1-0-vs-hadoop-2-0/",
        "http://www.hadoop360.com/blog/hadoop-technology-stack",
        "http://blog.cloudera.com/blog/2015/03/understanding-hdfs-recovery-processes-part-2/",
        "http://www.hadoop360.com/xn/detail/6623215:BlogEntry:9496",
        "http://blog.cloudera.com/blog/2015/03/how-to-quickly-configure-kerberos-for-your-apache-hadoop-cluster/",
        "http://www.slideshare.net/andreagioia/fast-data-platforms-hadoop-user-group-italy",
        "http://www.hadoop360.com/blog/spark-shark-and-mesos-data-analytics-stack",
        "http://www.hadoop360.com/blog/8-hadoop-articles-that-you-should-read",
        "http://vision.cloudera.com/data-governance-in-hadoop-part-2/",
        "https://databricks.com/blog/2015/05/28/tuning-java-garbage-collection-for-spark-applications.html",
    ],
    "ApacheStorm":[
        "http://lucidworks.com/blog/integrating-storm-and-solr/",
    ],
    "Flink":[
        "http://data-artisans.com/kafka-flink-a-practical-how-to/",
        "http://flink.apache.org/news/2015/08/24/introducing-flink-gelly.html",
    ],
    "Spark":[
        "https://developer.ibm.com/bluemix/2015/09/04/speed-sql-queries-spark-sql/",
        "http://blog.madhukaraphatak.com/analysing-csv-data-in-spark/",
        "http://blog.cloudera.com/blog/2015/08/using-apache-spark-for-massively-parallel-nlp-at-tripadvisor/",
        "http://www.cakesolutions.net/teamblogs/distributed-dataflow-computing-optimisations-in-apache-spark",
        "http://www.cakesolutions.net/teamblogs/apache-spark-machine-learning-pipelines",
        "https://databricks.com/blog/2015/06/16/zen-and-the-art-of-spark-maintenance-with-cassandra.html?",
        "https://www.mapr.com/blog/quickstart-my-spark-kickstarting-your-spark-based-applications#.VX91Vryli1E",
        "https://dataissexy.wordpress.com/2015/06/23/processing-json-with-sparkling-sparkling-spark-bigdata-clojure/",
        "http://hortonworks.com/hadoop-tutorial/using-apache-spark-technical-preview-with-hdp-2-2/",
        "http://www.datasciencecentral.com/profiles/blogs/implementing-a-distributed-deep-learning-network-over-spark",
        "http://blog.madhukaraphatak.com/introduction-to-spark-data-source-api-part-1/",
        "http://blog.cloudera.com/blog/2015/04/how-to-translate-from-mapreduce-to-apache-spark-part-2/",
        "http://spark-packages.org/package/Stratio/RabbitMQ-Receiver",
        "http://snowplowanalytics.com/blog/2015/05/21/first-experiments-with-apache-spark/",
        "http://java.dzone.com/articles/spark-sql-against-cassandra",
        "http://emerginginsightsnow.com/2015/05/17/apache-spark-ecosystem-grows-rapidly-has-hadoop-met-its-match/",
        "http://www.enterpriseappstoday.com/data-management/is-apache-spark-enterprise-ready.html",
        "http://www.duchess-france.org/analyze-accelerometer-data-with-apache-spark-and-mllib/",
        "http://www.hadoop360.com/blog/spark-shark-and-mesos-data-analytics-stack",
        "https://databricks.com/blog/2015/04/13/deep-dive-into-spark-sqls-catalyst-optimizer.html",
        "http://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-2/",
        "http://www.infoq.com/news/2015/03/pinterest-memsql-spark-streaming",
        "http://rnowling.github.io/spark/2015/04/07/multiuser-spark-mesos.html",
        "http://www.zdnet.com/article/sparks-success-overhyped-or-preordained/",
        "http://developerblog.redhat.com/2015/01/20/microservice-principles-and-immutability-demonstrated-with-apache-spark-and-cassandra/",
        "http://www.kennybastani.com/2015/01/categorical-pagerank-neo4j-spark.html",
        "http://blog.sematext.com/2015/01/21/spark-performance-monitoring-use-case/",
        "https://www.youtube.com/watch?v=z7bIt143smw",
        "http://blog.couchbase.com/introducing-the-couchbase-spark-connector",
        "http://www.cakesolutions.net/teamblogs/using-spark-to-analyse-akka-persistence-events-in-cassandra",
        "http://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-1/",
        "http://java.dzone.com/articles/big-data-processing-spark",
        "https://databricks.com/blog/2015/03/13/announcing-spark-1-3.html",
        "http://www.infoworld.com/article/2903432/application-development/spark-big-datas-brightest-star-needs-to-grow-up.html",
        "http://tech.marksblogg.com/recommendation-engine-spark-python.html",
        "http://www.informationweek.com/big-data/big-data-analytics/apache-spark-three-promising-use-cases/a/d-id/1319660?_mc=sm_iwk",
        "https://haifengl.wordpress.com/2014/09/07/big-data-analytics-shark-and-spark-sql/",
    ],
    "ApacheThrift":[
        "http://blog.cloudera.com/blog/2015/07/thrift-client-authentication-support-in-apache-hbase-1-0/",
    ],
    "Mesos":[
        "http://www.projectcalico.org/mesos-networking-leaps-forward-with-calico/",
        "http://www.infoq.com/presentations/apache-mesos",
        "http://open.mesosphere.com/intro-course/intro.html",
        "https://mesosphere.com/blog/2015/07/23/intel-explains-its-new-program-to-accelerate-cloud-computing-for-all/",
        "http://blog.memsql.com/deploy-memsql-with-mesosphere/",
        "https://mesosphere.com/blog/2015/06/21/web-application-analytics-using-docker-and-marathon/",
        "https://speakerdeck.com/pyr/mesomatic-the-cluster-is-a-library",
        "https://www.mapr.com/blog/my-experience-running-docker-containers-on-mesos",
        "https://www.typesafe.com/blog/using-spark-kafka-cassandra-and-akka-on-mesos-for-real-time-personalization",
        "https://www.mapr.com/blog/yarn-vs-mesos-cant-we-all-just-get-along#.VYuY4byli1E",
        "http://www.datasciencecentral.com/profiles/blogs/spark-shark-and-mesos-data-analytics-stack",
        "http://www.javacodegeeks.com/2014/01/getting-started-with-apache-mesos-and-apache-aurora-using-vagrant.html",
        "http://radar.oreilly.com/2015/02/a-tale-of-two-clusters-mesos-and-yarn.html",
        "http://www.antonlindstrom.com/2015/03/29/introduction-to-apache-mesos.html",
        "http://www.javacodegeeks.com/2015/04/apache-mesos-marathon-and-java-ee.html",
    ],
    # The two "ApacheDrill" lists are merged: a repeated key in a dict literal
    # silently discards the earlier value.
    "ApacheDrill":[
        "https://www.mapr.com/blog/how-convert-csv-file-apache-parquet-using-apache-drill",
        "https://www.mapr.com/blog/apache-drill-how-create-new-function?utm_source=twitter&utm_medium=social&utm_content=Oktopost-twitter-profile&utm_campaign=Oktopost-Content+Curation+July+2015&campaign=2015_Social_twitter&source=Social#.VbrhyJO1nDc",
        "https://www.mapr.com/blog/apache-drill-how-create-new-function#.VbiU0JO1nDc",
        "http://www.datasciencecentral.com/profiles/blogs/implementing-a-distributed-deep-learning-network-over-spark",
        "http://www.datanami.com/2015/05/19/apache-drill-poised-to-crack-tough-data-challenges/",
        "http://drill.apache.org/",
        "http://www.zdnet.com/article/sql-and-hadoop-its-complicated",
        "https://www.mapr.com/blog/industrys-first-schema-free-sql-engine-apache-drill-10-now-generally-available#.VVvwOryli1F",
        "https://www.youtube.com/watch?v=FkcegazNuio",
        "http://www.datasciencecentral.com/profiles/blogs/drill-data-with-apache-drill",
        "http://www.dbta.com/BigDataQuarterly/Articles/The-Importance-of-Apache-Drill-to-the-Big-Data-Ecosystem-103000.aspx",
        "http://drill.apache.org/blog/2015/03/31/drill-0.8-released/",
    ],
    "Mapr":[
        "https://www.mapr.com/blog/high-performance-c-apis-mapr-db",
        "http://appaloud.com/nosql-mapr-db-now-available-for-free-within-mapr-community-edition/",
        "https://www.mapr.com/blog/mining-big-data-wheel#.VbbXypO1nDc",
        "https://www.mapr.com/blog/evolution-database-schemas-using-sql-nosql#.VZtolryli1E",
        "https://www.mapr.com/blog/hbase-and-mapr-db-designed-distribution-scale-and-speed#.VY5XMbyli1E",
        "https://adtmag.com/articles/2015/06/10/new-mapr-hadoop.aspx",
        "http://www.itbusinessedge.com/blogs/it-unmasked/mapr-ships-drill-sql-engine.html",
        "https://www.mapr.com/blog/hadoop-adoption-is-the-cluster-half-full#.VVbvlOSli1E",
        "https://www.mapr.com/blog/5-core-business-functions-can-benefit-hadoop#.VU1smeSEi-k",
        "https://www.mapr.com/blog/4-critical-things-consider-when-building-hadoop-rfp#.VTHF8OSEi-k",
        "https://www.mapr.com/services/mapr-academy/big-data-hadoop-online-training",
        "https://www.mapr.com/blog/strategic-data-beyond-hadoop-and-big-data#.VSWDCuSEi-k",
        "https://www.mapr.com/blog/how-use-mapreduce-api",
        "https://www.mapr.com/blog/polyglot-data-management-big-data-everywhere-recap#.VQ0LIFmNDec",
        "https://www.mapr.com/resources/videos/apache-drill-redefining-sql-hadoop-0",
    ],
    "ApacheKafka":[
        "http://www.michael-noll.com/blog/2014/08/18/apache-kafka-training-deck-and-tutorial/",
        "https://www.sigmoid.com/integrating-spark-kafka-hbase-to-power-a-real-time-dashboard/",
        "http://thenewstack.io/apache-kafka-spark-database-real-time-trinity/?",
        "https://cwiki.apache.org/confluence/display/KAFKA/KIP-11+-+Authorization+Interface",
        "http://blog.confluent.io/2015/02/25/stream-data-platform-1/",
        "http://java.dzone.com/articles/using-apache-kafka-integration",
        "http://blog.confluent.io/2015/04/07/hands-free-kafka-replication-a-lesson-in-operational-simplicity/",
        "http://svds.com/post/using-docker-build-data-acquisition-pipeline-kafka-and-hbase",
        "http://blog.cloudera.com/blog/2015/02/how-to-do-real-time-log-analytics-with-apache-kafka-cloudera-search-and-hue/",
        "https://databricks.com/blog/2015/03/30/improvements-to-kafka-integration-of-spark-streaming.html",
        "https://engineering.linkedin.com/kafka/running-kafka-scale",
    ],
    "ApacheCamel":[
        "http://examples.javacodegeeks.com/core-java/apache-camel-timer-example/",
        "http://examples.javacodegeeks.com/core-java/apache-camel-hello-world-example/",
    ],
    "ApacheMahout":[
        "http://www.javacodegeeks.com/2012/02/apache-mahout-getting-started.html",
        "http://analyticsbot.ml/2015/06/generating-recommendations-using-apache-mahout-part-1/",
        "http://analyticsbot.ml/2015/06/generating-recommendations-using-apache-mahout-part-2-using-hadoop/",
    ],
    "Others":[
        "http://www.datasciencecentral.com/profiles/blogs/batch-vs-real-time-data-processing",
        "http://www.datasciencecentral.com/profiles/blogs/lambda-architecture-for-big-data-systems",
        "http://www.analyticbridge.com/profiles/blogs/10-common-nlp-terms-explained-for-the-text-mining-novice",
        "http://www.datasciencecentral.com/profiles/blogs/hadoop-vs-nosql-vs-sql-vs-newsql-by-example",
        "http://www.dzone.com/mz/big-data",
        "http://www.datasciencecentral.com/group/resources/forum/topics/big-data-poster",
        "http://www.theregister.co.uk/2012/11/07/big_data_analytics",
        "http://www.analyticbridge.com/profiles/blogs/what-mapreduce-can-t-do",
        "http://www.datasciencecentral.com/profiles/blogs/s3-instead-of-hdfs-with-hadoop",
        "http://www.bigdatanews.com/profiles/blogs/big-data-s-real-3vs",
        "http://www.vbprofiles.com/press_releases/55367cb8b4e919dc55000c5c",
        "http://www.hadoop360.com/blog/accumulo-sqrrl-nosql-secure-database",
        "http://blog.algorithmia.com/post/116365814879/how-machines-see-the-web-exploring-the-web",
        "http://www.datasciencecentral.com/profiles/blogs/the-7-most-unusual-applications-of-big-data-you-ve-ever-seen",
        "http://grigory.us/blog/rdbms-mapreduce/",
        "https://www.youtube.com/watch?v=kRjk_Xsf7t4",
        "http://www.hadoop360.com/blog/batch-vs-real-time-data-processing",
        "http://www.javacodegeeks.com/2013/07/mapreduce-algorithms-understanding-data-joins-part-1.html",
        "http://www.efytimes.com/e1/fullnews.asp?edid=162753",
        "https://haifengl.wordpress.com/2014/08/18/big-data-analytics-mapreduce/",
        "http://www.datameer.com/blog/webinars/datameer-bigstep-big-data-analytics-for-your-department-now-not-in-months.html",
        "http://radar.oreilly.com/2015/04/a-real-time-processing-revival.html",
        "http://blogs.teradata.com/tdmo/best-practices-big-data-strategy-execution/",
        "http://blog.swiftype.com/building-an-asynchronous-api-to-improve-performance",
    ],
},