<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>
Documentation | Apache Spark
</title>
<link href="https://cdn.jsdelivr.net/npm/bootstrap@5.0.2/dist/css/bootstrap.min.css" rel="stylesheet"
integrity="sha384-EVSTQN3/azprG1Anm3QDgpJLIm9Nao0Yz1ztcQTwFspd3yD65VohhpuuCOmLASjC" crossorigin="anonymous">
<link rel="preconnect" href="https://fonts.googleapis.com">
<link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
<link href="https://fonts.googleapis.com/css2?family=DM+Sans:ital,wght@0,400;0,500;0,700;1,400;1,500;1,700&Courier+Prime:wght@400;700&display=swap" rel="stylesheet">
<link href="/css/custom.css" rel="stylesheet">
<!-- Code highlighter CSS -->
<link href="/css/pygments-default.css" rel="stylesheet">
<link rel="icon" href="/favicon.ico" type="image/x-icon">
</head>
<body class="global">
<nav class="navbar navbar-expand-lg navbar-dark p-0 px-4" style="background: #1D6890;">
<a class="navbar-brand" href="/">
<img src="/images/spark-logo-rev.svg" alt="" width="141" height="72">
</a>
<button class="navbar-toggler" type="button" data-bs-toggle="collapse" data-bs-target="#navbarContent"
aria-controls="navbarContent" aria-expanded="false" aria-label="Toggle navigation">
<span class="navbar-toggler-icon"></span>
</button>
<div class="collapse navbar-collapse col-md-12 col-lg-auto pt-4" id="navbarContent">
<ul class="navbar-nav me-auto">
<li class="nav-item">
<a class="nav-link active" aria-current="page" href="/downloads.html">Download</a>
</li>
<li class="nav-item dropdown">
<a class="nav-link dropdown-toggle" href="#" id="libraries" role="button" data-bs-toggle="dropdown"
aria-expanded="false">
Libraries
</a>
<ul class="dropdown-menu" aria-labelledby="libraries">
<li><a class="dropdown-item" href="/sql/">SQL and DataFrames</a></li>
<li><a class="dropdown-item" href="/streaming/">Spark Streaming</a></li>
<li><a class="dropdown-item" href="/mllib/">MLlib (machine learning)</a></li>
<li><a class="dropdown-item" href="/graphx/">GraphX (graph)</a></li>
<li>
<hr class="dropdown-divider">
</li>
<li><a class="dropdown-item" href="/third-party-projects.html">Third-Party Projects</a></li>
</ul>
</li>
<li class="nav-item dropdown">
<a class="nav-link dropdown-toggle" href="#" id="documentation" role="button" data-bs-toggle="dropdown"
aria-expanded="false">
Documentation
</a>
<ul class="dropdown-menu" aria-labelledby="documentation">
<li><a class="dropdown-item" href="/docs/latest/">Latest Release</a></li>
<li><a class="dropdown-item" href="/documentation.html">Older Versions and Other Resources</a></li>
<li><a class="dropdown-item" href="/faq.html">Frequently Asked Questions</a></li>
</ul>
</li>
<li class="nav-item">
<a class="nav-link active" aria-current="page" href="/examples.html">Examples</a>
</li>
<li class="nav-item dropdown">
<a class="nav-link dropdown-toggle" href="#" id="community" role="button" data-bs-toggle="dropdown"
aria-expanded="false">
Community
</a>
<ul class="dropdown-menu" aria-labelledby="community">
<li><a class="dropdown-item" href="/community.html">Mailing Lists & Resources</a></li>
<li><a class="dropdown-item" href="/contributing.html">Contributing to Spark</a></li>
<li><a class="dropdown-item" href="/improvement-proposals.html">Improvement Proposals (SPIP)</a>
</li>
<li><a class="dropdown-item" href="https://issues.apache.org/jira/browse/SPARK">Issue Tracker</a>
</li>
<li><a class="dropdown-item" href="/powered-by.html">Powered By</a></li>
<li><a class="dropdown-item" href="/committers.html">Project Committers</a></li>
<li><a class="dropdown-item" href="/history.html">Project History</a></li>
</ul>
</li>
<li class="nav-item dropdown">
<a class="nav-link dropdown-toggle" href="#" id="developers" role="button" data-bs-toggle="dropdown"
aria-expanded="false">
Developers
</a>
<ul class="dropdown-menu" aria-labelledby="developers">
<li><a class="dropdown-item" href="/developer-tools.html">Useful Developer Tools</a></li>
<li><a class="dropdown-item" href="/versioning-policy.html">Versioning Policy</a></li>
<li><a class="dropdown-item" href="/release-process.html">Release Process</a></li>
<li><a class="dropdown-item" href="/security.html">Security</a></li>
</ul>
</li>
</ul>
<ul class="navbar-nav ml-auto">
<li class="nav-item dropdown">
<a class="nav-link dropdown-toggle" href="#" id="apacheFoundation" role="button"
data-bs-toggle="dropdown" aria-expanded="false">
Apache Software Foundation
</a>
<ul class="dropdown-menu" aria-labelledby="apacheFoundation">
<li><a class="dropdown-item" href="https://www.apache.org/">Apache Homepage</a></li>
<li><a class="dropdown-item" href="https://www.apache.org/licenses/">License</a></li>
<li><a class="dropdown-item"
href="https://www.apache.org/foundation/sponsorship.html">Sponsorship</a></li>
<li><a class="dropdown-item" href="https://www.apache.org/foundation/thanks.html">Thanks</a></li>
<li><a class="dropdown-item" href="https://www.apache.org/security/">Security</a></li>
<li><a class="dropdown-item" href="https://www.apache.org/events/current-event">Event</a></li>
</ul>
</li>
</ul>
</div>
</nav>
<div class="container">
<div class="row mt-4">
<div class="col-12 col-md-9">
<h2><span class="text-capitalize">Apache Spark<span class="tm">™</span></span> Documentation</h2>
<p>Setup instructions, programming guides, and other documentation are available for each stable version of Spark below:</p>
<ul>
<li><a href="/docs/3.4.0/">Spark 3.4.0</a></li>
<li><a href="/docs/3.3.2/">Spark 3.3.2</a></li>
<li><a href="/docs/3.3.1/">Spark 3.3.1</a></li>
<li><a href="/docs/3.3.0/">Spark 3.3.0</a></li>
<li><a href="/docs/3.2.4/">Spark 3.2.4</a></li>
<li><a href="/docs/3.2.3/">Spark 3.2.3</a></li>
<li><a href="/docs/3.2.2/">Spark 3.2.2</a></li>
<li><a href="/docs/3.2.1/">Spark 3.2.1</a></li>
<li><a href="/docs/3.2.0/">Spark 3.2.0</a></li>
<li><a href="/docs/3.1.3/">Spark 3.1.3</a></li>
<li><a href="/docs/3.1.2/">Spark 3.1.2</a></li>
<li><a href="/docs/3.1.1/">Spark 3.1.1</a></li>
<li><a href="/docs/3.0.3/">Spark 3.0.3</a></li>
<li><a href="/docs/3.0.2/">Spark 3.0.2</a></li>
<li><a href="/docs/3.0.1/">Spark 3.0.1</a></li>
<li><a href="/docs/3.0.0/">Spark 3.0.0</a></li>
<li><a href="/docs/2.4.8/">Spark 2.4.8</a></li>
<li><a href="/docs/2.4.7/">Spark 2.4.7</a></li>
<li><a href="/docs/2.4.6/">Spark 2.4.6</a></li>
<li><a href="/docs/2.4.5/">Spark 2.4.5</a></li>
<li><a href="/docs/2.4.4/">Spark 2.4.4</a></li>
<li><a href="/docs/2.4.3/">Spark 2.4.3</a></li>
<li><a href="/docs/2.4.2/">Spark 2.4.2</a></li>
<li><a href="/docs/2.4.1/">Spark 2.4.1</a></li>
<li><a href="/docs/2.4.0/">Spark 2.4.0</a></li>
<li><a href="/docs/2.3.4/">Spark 2.3.4</a></li>
<li><a href="/docs/2.3.3/">Spark 2.3.3</a></li>
<li><a href="/docs/2.3.2/">Spark 2.3.2</a></li>
<li><a href="/docs/2.3.1/">Spark 2.3.1</a></li>
<li><a href="/docs/2.3.0/">Spark 2.3.0</a></li>
<li><a href="/docs/2.2.3/">Spark 2.2.3</a></li>
<li><a href="/docs/2.2.2/">Spark 2.2.2</a></li>
<li><a href="/docs/2.2.1/">Spark 2.2.1</a></li>
<li><a href="/docs/2.2.0/">Spark 2.2.0</a></li>
<li><a href="/docs/2.1.3/">Spark 2.1.3</a></li>
<li><a href="/docs/2.1.2/">Spark 2.1.2</a></li>
<li><a href="/docs/2.1.1/">Spark 2.1.1</a></li>
<li><a href="/docs/2.1.0/">Spark 2.1.0</a></li>
<li><a href="/docs/2.0.2/">Spark 2.0.2</a></li>
<li><a href="/docs/2.0.1/">Spark 2.0.1</a></li>
<li><a href="/docs/2.0.0/">Spark 2.0.0</a></li>
<li><a href="/docs/1.6.3/">Spark 1.6.3</a></li>
<li><a href="/docs/1.6.2/">Spark 1.6.2</a></li>
<li><a href="/docs/1.6.1/">Spark 1.6.1</a></li>
<li><a href="/docs/1.6.0/">Spark 1.6.0</a></li>
<li><a href="/docs/1.5.2/">Spark 1.5.2</a></li>
<li><a href="/docs/1.5.1/">Spark 1.5.1</a></li>
<li><a href="/docs/1.5.0/">Spark 1.5.0</a></li>
<li><a href="/docs/1.4.1/">Spark 1.4.1</a></li>
<li><a href="/docs/1.4.0/">Spark 1.4.0</a></li>
<li><a href="/docs/1.3.1/">Spark 1.3.1</a></li>
<li><a href="/docs/1.3.0/">Spark 1.3.0</a></li>
<li><a href="/docs/1.2.1/">Spark 1.2.1</a></li>
<li><a href="/docs/1.1.1/">Spark 1.1.1</a></li>
<li><a href="/docs/1.0.2/">Spark 1.0.2</a></li>
<li><a href="/docs/0.9.2/">Spark 0.9.2</a></li>
<li><a href="/docs/0.8.1/">Spark 0.8.1</a></li>
<li><a href="/docs/0.7.3/">Spark 0.7.3</a></li>
<li><a href="/docs/0.6.2/">Spark 0.6.2</a></li>
</ul>
<p>Documentation for preview releases:</p>
<ul>
<li><a href="/docs/3.0.0-preview2/">Spark 3.0.0 preview2</a></li>
<li><a href="/docs/3.0.0-preview/">Spark 3.0.0 preview</a></li>
<li><a href="/docs/2.0.0-preview/">Spark 2.0.0 preview</a></li>
</ul>
<p>The documentation linked to above covers getting started with Spark, as well as the built-in components <a href="/docs/latest/mllib-guide.html">MLlib</a>,
<a href="/docs/latest/streaming-programming-guide.html">Spark Streaming</a>, and <a href="/docs/latest/graphx-programming-guide.html">GraphX</a>.</p>
<p>In addition, this page lists other resources for learning Spark.</p>
<h3>Videos</h3>
<p>See the <a href="https://www.youtube.com/channel/UCRzsq7k4-kT-h3TDUBQ82-w">Apache Spark YouTube Channel</a> for videos from Spark events. There are separate <a href="https://www.youtube.com/channel/UCRzsq7k4-kT-h3TDUBQ82-w/playlists">playlists</a> for videos on different topics. In addition to the playlists, direct links to videos are listed below.</p>
<h4>Screencast Tutorial Videos</h4>
<ul>
<li><a href="/screencasts/1-first-steps-with-spark.html">Screencast 1: First Steps with Spark</a></li>
<li><a href="/screencasts/2-spark-documentation-overview.html">Screencast 2: Spark Documentation Overview</a></li>
<li><a href="/screencasts/3-transformations-and-caching.html">Screencast 3: Transformations and Caching</a></li>
<li><a href="/screencasts/4-a-standalone-job-in-spark.html">Screencast 4: A Spark Standalone Job in Scala</a></li>
</ul>
<h4>Spark Summit Videos</h4>
<ul>
<li>Videos from Spark Summit 2014, San Francisco, June 30 - July 2 2014
<ul>
<li><a href="https://spark-summit.org/2014/agenda">Full agenda with links to all videos and slides</a></li>
<li><a href="https://spark-summit.org/2014/training">Training videos and slides</a></li>
</ul>
</li>
<li>Videos from Spark Summit 2013, San Francisco, Dec 2-3 2013
<ul>
<li><a href="https://spark-summit.org/2013#agendapluginwidget-4">Full agenda with links to all videos and slides</a></li>
<li><a href="https://www.youtube.com/playlist?list=PL-x35fyliRwjXj33QvAXN0Vlx0gc6u0je">YouTube playlist of all Keynotes</a></li>
<li><a href="https://www.youtube.com/playlist?list=PL-x35fyliRwiNcKwIkDEQZBejiqxEJ79U">YouTube playlist of Track A (Spark Applications)</a></li>
<li><a href="https://www.youtube.com/playlist?list=PL-x35fyliRwiNcKwIkDEQZBejiqxEJ79U">YouTube playlist of Track B (Spark Deployment, Scheduling & Perf, Related projects)</a></li>
<li><a href="https://www.youtube.com/playlist?list=PL-x35fyliRwjR1Umntxz52zv3EcKpbzCp">YouTube playlist of the Training Day (i.e. the 2nd day of the summit)</a></li>
</ul>
</li>
</ul>
<h4><a name="meetup-videos"></a>Meetup Talk Videos</h4>
<p>In addition to the videos listed below, you can also view <a href="http://www.meetup.com/spark-users/files/">all slides from Bay Area meetups here</a>.</p>
<style type="text/css">
.video-meta-info {
font-size: 0.95em;
}
</style>
<ul>
<li><a href="https://www.youtube.com/watch?v=NUQ-8to2XAk&list=PL-x35fyliRwiP3YteXbnhk0QGOtYLBT3a">Spark 1.0 and Beyond</a> (<a href="http://files.meetup.com/3138542/Spark%201.0%20Meetup.ppt">slides</a>) <span class="video-meta-info">by Patrick Wendell, at Cisco in San Jose, 2014-04-23</span></li>
<li><a href="https://www.youtube.com/watch?v=ju2OQEXqONU&list=PL-x35fyliRwiP3YteXbnhk0QGOtYLBT3a">Adding Native SQL Support to Spark with Catalyst</a> (<a href="http://files.meetup.com/3138542/Spark%20SQL%20Meetup%20-%204-8-2012.pdf">slides</a>) <span class="video-meta-info">by Michael Armbrust, at Tagged in SF, 2014-04-08</span></li>
<li><a href="https://www.youtube.com/watch?v=MY0NkZY_tJw&list=PL-x35fyliRwiP3YteXbnhk0QGOtYLBT3a">SparkR and GraphX</a> (slides: <a href="http://files.meetup.com/3138542/SparkR-meetup.pdf">SparkR</a>, <a href="http://files.meetup.com/3138542/graphx%40spark_meetup03_2014.pdf">GraphX</a>) <span class="video-meta-info">by Shivaram Venkataraman & Dan Crankshaw, at SkyDeck in Berkeley, 2014-03-25</span></li>
<li><a href="https://www.youtube.com/watch?v=5niXiiEX5pE&list=PL-x35fyliRwiP3YteXbnhk0QGOtYLBT3a">Simple deployment w/ SIMR & Advanced Shark Analytics w/ TGFs</a> (<a href="http://files.meetup.com/3138542/tgf.pptx">slides</a>) <span class="video-meta-info">by Ali Ghodsi, at Huawei in Santa Clara, 2014-02-05</span></li>
<li><a href="https://www.youtube.com/watch?v=C7gWtxelYNM&list=PL-x35fyliRwiP3YteXbnhk0QGOtYLBT3a">Stores, Monoids & Dependency Injection - Abstractions for Spark</a> (<a href="http://files.meetup.com/3138542/Abstractions%20for%20spark%20streaming%20-%20spark%20meetup%20presentation.pdf">slides</a>) <span class="video-meta-info">by Ryan Weald, at Sharethrough in SF, 2014-01-17</span></li>
<li><a href="https://www.youtube.com/watch?v=IxDnF_X4M-8">Distributed Machine Learning using MLbase</a> (<a href="http://files.meetup.com/3138542/sparkmeetup_8_6_13_final_reduced.pdf">slides</a>) <span class="video-meta-info">by Evan Sparks & Ameet Talwalkar, at Twitter in SF, 2013-08-06</span></li>
<li><a href="https://www.youtube.com/watch?v=vJQ2RZj9hqs">GraphX Preview: Graph Analysis on Spark</a> <span class="video-meta-info">by Reynold Xin & Joseph Gonzalez, at Flurry in SF, 2013-07-02</span></li>
<li><a href="https://www.youtube.com/watch?v=D1knCQZQQnw">Deep Dive with Spark Streaming</a> (<a href="http://www.slideshare.net/spark-project/deep-divewithsparkstreaming-tathagatadassparkmeetup20130617">slides</a>) <span class="video-meta-info">by Tathagata Das, at Plug and Play in Sunnyvale, 2013-06-17</span></li>
<li><a href="https://www.youtube.com/watch?v=cAZ624-69PQ">Tachyon and Shark update</a> (slides: <a href="http://files.meetup.com/3138542/2013-05-09%20Shark%20%40%20Spark%20Meetup.pdf">Shark</a>, <a href="http://files.meetup.com/3138542/Tachyon_2013-05-09_Spark_Meetup.pdf">Tachyon</a>) <span class="video-meta-info">by Ali Ghodsi, Haoyuan Li, Reynold Xin, at Google Ventures, 2013-05-09</span></li>
<li><a href="https://www.youtube.com/playlist?list=PLxwbieuTaYXmWTBovyyw2NibPfUaJk-h4">Spark 0.7: Overview, pySpark, & Streaming</a> <span class="video-meta-info">by Matei Zaharia, Josh Rosen, Tathagata Das, at Conviva on 2013-02-21</span></li>
<li><a href="https://www.youtube.com/watch?v=49Hr5xZyTEA">Introduction to Spark Internals</a> (<a href="http://files.meetup.com/3138542/dev-meetup-dec-2012.pptx">slides</a>) <span class="video-meta-info">by Matei Zaharia, at Yahoo in Sunnyvale, 2012-12-18</span></li>
</ul>
<p><a name="summit"></a></p>
<h3>Training Materials</h3>
<ul>
<li><a href="https://spark-summit.org/2014/training">Training materials and exercises from Spark Summit 2014</a> are available online. These include videos and slides of talks as well as exercises you can run on your laptop. Topics include Spark core, tuning and debugging, Spark SQL, Spark Streaming, GraphX and MLlib.</li>
<li><a href="https://spark-summit.org/2013">Spark Summit 2013</a> included a training session, with slides and videos available on <a href="https://spark-summit.org/summit-2013/#agendapluginwidget-5">the training day agenda</a>.
The session also included <a href="https://spark-summit.org/2013/exercises/">exercises</a> that you can walk through on Amazon EC2.</li>
<li>The <a href="https://amplab.cs.berkeley.edu/">UC Berkeley AMPLab</a> regularly hosts training camps on Spark and related projects.
Slides, videos and EC2-based exercises from each of these are available online:
<ul>
<li><a href="http://ampcamp.berkeley.edu/4/">AMP Camp 4</a> (Strata Santa Clara, Feb 2014) — focus on BlinkDB, MLlib, GraphX, Tachyon</li>
<li><a href="http://ampcamp.berkeley.edu/3/">AMP Camp 3</a> (Berkeley, CA, Aug 2013)</li>
<li><a href="http://ampcamp.berkeley.edu/amp-camp-two-strata-2013/">AMP Camp 2</a> (Strata Santa Clara, Feb 2013)</li>
<li><a href="http://ampcamp.berkeley.edu/agenda-2012/">AMP Camp 1</a> (Berkeley, CA, Aug 2012)</li>
</ul>
</li>
</ul>
<h3>Hands-On Exercises</h3>
<ul>
<li><a href="https://spark-summit.org/2014/training">Hands-on exercises from Spark Summit 2014</a>. These let you install Spark on your laptop and learn basic concepts, Spark SQL, Spark Streaming, GraphX and MLlib.</li>
<li><a href="https://spark-summit.org/2013/exercises/">Hands-on exercises from Spark Summit 2013</a>. These exercises let you launch a small EC2 cluster, load a dataset, and query it with Spark, Shark, Spark Streaming, and MLlib.</li>
</ul>
<h3>External Tutorials, Blog Posts, and Talks</h3>
<ul>
<li><a href="http://codeforhire.com/2014/02/18/using-spark-with-mongodb/">Using Spark with MongoDB</a> — by Sampo Niskanen from Wellmo</li>
<li><a href="https://spark-summit.org/2013">Spark Summit 2013</a> — contained 30 talks about Spark use cases, available as slides and videos</li>
<li><a href="http://zenfractal.com/2013/08/21/a-powerful-big-data-trio/">A Powerful Big Data Trio: Spark, Parquet and Avro</a> — Using Parquet in Spark by Matt Massie</li>
<li><a href="http://www.slideshare.net/EvanChan2/cassandra2013-spark-talk-final">Real-time Analytics with Cassandra, Spark, and Shark</a> — Presentation by Evan Chan from Ooyala at 2013 Cassandra Summit</li>
<li><a href="http://aws.amazon.com/articles/Elastic-MapReduce/4926593393724923">Run Spark and Shark on Amazon Elastic MapReduce</a> — Article by Amazon Elastic MapReduce team member Parviz Deyhim</li>
<li><a href="http://www.ibm.com/developerworks/library/os-spark/">Spark, an alternative for fast data analytics</a> — IBM developerWorks article by M. Tim Jones</li>
</ul>
<h3>Books</h3>
<ul>
<li><a href="http://shop.oreilly.com/product/0636920028512.do">Learning Spark</a>, by Holden Karau, Andy Konwinski, Patrick Wendell and Matei Zaharia (O'Reilly Media)</li>
<li><a href="http://www.manning.com/bonaci/">Spark in Action</a>, by Marko Bonaci and Petar Zecevic (Manning)</li>
<li><a href="http://shop.oreilly.com/product/0636920035091.do">Advanced Analytics with Spark</a>, by Juliet Hougland, Uri Laserson, Sean Owen, Sandy Ryza and Josh Wills (O'Reilly Media)</li>
<li><a href="https://www.manning.com/books/spark-graphx-in-action">Spark GraphX in Action</a>, by Michael Malak (Manning)</li>
<li><a href="https://www.packtpub.com/big-data-and-business-intelligence/fast-data-processing-spark-second-edition">Fast Data Processing with Spark</a>, by Krishna Sankar and Holden Karau (Packt Publishing)</li>
<li><a href="https://www.packtpub.com/big-data-and-business-intelligence/machine-learning-spark">Machine Learning with Spark</a>, by Nick Pentreath (Packt Publishing)</li>
<li><a href="https://www.packtpub.com/big-data-and-business-intelligence/spark-cookbook">Spark Cookbook</a>, by Rishi Yadav (Packt Publishing)</li>
<li><a href="https://www.packtpub.com/big-data-and-business-intelligence/apache-spark-graph-processing">Apache Spark Graph Processing</a>, by Rindra Ramamonjison (Packt Publishing)</li>
<li><a href="https://www.packtpub.com/big-data-and-business-intelligence/mastering-apache-spark">Mastering Apache Spark</a>, by Mike Frampton (Packt Publishing)</li>
<li><a href="http://www.apress.com/9781484209653">Big Data Analytics with Spark: A Practitioner's Guide to Using Spark for Large Scale Data Analysis</a>, by Mohammed Guller (Apress)</li>
<li><a href="https://www.packtpub.com/big-data-and-business-intelligence/large-scale-machine-learning-spark">Large Scale Machine Learning with Spark</a>, by Md. Rezaul Karim, Md. Mahedi Kaysar (Packt Publishing)</li>
<li><a href="https://www.packtpub.com/big-data-and-business-intelligence/big-data-analytics">Big Data Analytics with Spark and Hadoop</a>, by Venkat Ankam (Packt Publishing)</li>
</ul>
<h3>Examples</h3>
<ul>
<li>The <a href="/examples.html">Spark examples page</a> shows the basic API in Scala, Java and Python.</li>
</ul>
<h3>Research Papers</h3>
<p>
Spark was initially developed as a UC Berkeley research project, and much of the design is documented in papers.
The <a href="/research.html">research page</a> lists some of the original motivation and direction.
</p>
</div>
<div class="col-12 col-md-3">
<div class="news" style="margin-bottom: 20px;">
<h5>Latest News</h5>
<ul class="list-unstyled">
<li><a href="/news/spark-3-4-0-released.html">Spark 3.4.0 released</a>
<span class="small">(Apr 13, 2023)</span></li>
<li><a href="/news/spark-3-2-4-released.html">Spark 3.2.4 released</a>
<span class="small">(Apr 13, 2023)</span></li>
<li><a href="/news/spark-3-3-2-released.html">Spark 3.3.2 released</a>
<span class="small">(Feb 17, 2023)</span></li>
<li><a href="/news/spark-3-2-3-released.html">Spark 3.2.3 released</a>
<span class="small">(Nov 28, 2022)</span></li>
</ul>
<p class="small" style="text-align: right;"><a href="/news/index.html">Archive</a></p>
</div>
<div style="text-align:center; margin-bottom: 20px;">
<a href="https://www.apache.org/events/current-event.html">
<img src="https://www.apache.org/events/current-event-234x60.png" style="max-width: 100%;"/>
</a>
</div>
<div class="hidden-xs hidden-sm">
<a href="/downloads.html" class="btn btn-cta btn-lg d-grid" style="margin-bottom: 30px;">
Download Spark
</a>
<p style="font-size: 16px; font-weight: 500; color: #555;">
Built-in Libraries:
</p>
<ul class="list-none">
<li><a href="/sql/">SQL and DataFrames</a></li>
<li><a href="/streaming/">Spark Streaming</a></li>
<li><a href="/mllib/">MLlib (machine learning)</a></li>
<li><a href="/graphx/">GraphX (graph)</a></li>
</ul>
<a href="/third-party-projects.html">Third-Party Projects</a>
</div>
</div>
</div>
<footer class="small">
<hr>
Apache Spark, Spark, Apache, the Apache feather logo, and the Apache Spark project logo are either registered
trademarks or trademarks of The Apache Software Foundation in the United States and other countries.
See guidance on use of Apache Spark <a href="/trademarks.html">trademarks</a>.
All other marks mentioned may be trademarks or registered trademarks of their respective owners.
Copyright © 2018 The Apache Software Foundation, Licensed under the
<a href="https://www.apache.org/licenses/">Apache License, Version 2.0</a>.
</footer>
</div>
<script src="https://cdn.jsdelivr.net/npm/bootstrap@5.0.2/dist/js/bootstrap.bundle.min.js"
integrity="sha384-MrcW6ZMFYlzcLA8Nl+NtUVF0sA7MsXsP1UyJoMp4YLEuNSfAP+JcXn/tWtIaxVXM"
crossorigin="anonymous"></script>
<script src="https://code.jquery.com/jquery.js"></script>
<script src="/js/lang-tabs.js"></script>
<script src="/js/downloads.js"></script>
</body>
</html>