Commit 6c30109

add publications section to index, add medium, new about info
1 parent dbab92f commit 6c30109

16 files changed

+143
-14
lines changed

.gitignore

+1
@@ -2,3 +2,4 @@ _site
22
_posts/blog/drafts
33
Thumbs.db
44
.DS_Store
5+
_posts/project

_config.yml

+2-2
@@ -33,8 +33,8 @@ social:
3333
url: http://www.linkedin.com/in/loldja
3434
- title: twitter
3535
url: http://www.twitter.com/urbanplans
36-
- title: angellist
37-
url: https://angel.co/lauren-oldja
36+
- title: medium
37+
url: https://medium.com/@loldja
3838

3939

4040
# Postal address (add as many lines as necessary). Shown in footer and on Contact page.
@@ -0,0 +1,88 @@
1+
---
2+
layout: post
3+
title: "Installing Apache Spark (PySpark): The missing “quick start” guide for Windows"
4+
date: 2018-01-28 18:40:46
5+
author: "Lauren Oldja | Reposted from Medium"
6+
categories:
7+
- blog
8+
- devops
9+
img: install-windows-spark.png
10+
thumb: metis-logo.gif
11+
---
12+
So you saw the latest Stack Overflow chart of popularity of new languages, and--deciding maybe there’s something to this “big data” trend after all--you feel it’s time to get familiar with Apache Spark.<!--more-->
13+
14+
<blockquote class="twitter-tweet centered" data-lang="en"><p lang="en" dir="ltr">Stack overflow has never seen growth like the tensorflow tag -<a href="https://twitter.com/drob?ref_src=twsrc%5Etfw">@drob</a> <a href="https://twitter.com/hashtag/ddtx18?src=hash&amp;ref_src=twsrc%5Etfw">#ddtx18</a>. My take: the deep learning hype is real <a href="https://t.co/9Mt56WZr2j">pic.twitter.com/9Mt56WZr2j</a></p>&mdash; Emily Robinson (@robinson_es) <a href="https://twitter.com/robinson_es/status/957380397935026176?ref_src=twsrc%5Etfw">January 27, 2018</a></blockquote>
15+
<script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>
16+
17+
Sure, you could get up and running with a few keystrokes on UNIX/MacOS, but what if all you have at home is an old Windows laptop? I tried following the installation instructions from the O’Reilly book Learning Spark ([which, like many wonderful tech reference materials, may be available for free from your local library](https://borrow.bklynlibrary.org/r1s/iii/encore/record/C__Rb11847889__Slearning%20spark__Orightresult__X7)), but the chapter is a bit sparse on details for Windows users and just didn’t work “out of the box” for me. Instead, the following is based on the official Quick Start Guide, trial and error, and lots of Googling.
18+
19+
## This guide assumes the following:
20+
21+
You’re on a Windows 8.1 Pro system. Other similarly old versions of Windows should probably work much the same way. [Windows 10 users might want to check out its Linux Subsystem support](https://docs.microsoft.com/en-us/windows/wsl/install-win10) instead.
22+
23+
You have [Anaconda Python](https://conda.io/docs/user-guide/install/download.html) already installed.
24+
25+
You have removed any previously installed or otherwise conflicting versions of Hadoop. You’ve downloaded a pre-compiled stable version of [Apache Spark with Hadoop included](https://spark.apache.org/downloads.html) (for me, this was *spark-2.2.1-bin-hadoop2.7*)
26+
27+
Your [Java is currently up-to-date](https://java.com/en/download/) (*v.8 update 161*, at the time of this writing).
28+
29+
Any command that starts with > is one you can enter into your Command Prompt. (You don’t enter the “>” part.)
30+
31+
Open Command Prompt as Administrator. You can do this by right-clicking the Windows icon (usually in the bottom-left corner of the taskbar) and choosing the “Command Prompt (Admin)” option.
32+
33+
---
34+
35+
1. Unzip the downloaded Spark .tar file using [7-Zip](http://www.7-zip.org/download.html) or a similar utility. Move the contents of this folder to a new directory you’ve made:
36+
```> mkdir C:\spark```
37+
I usually just do this via the Windows GUI rather than on the command line. You can open up explorer in the current directory anytime by typing
38+
```> explorer .```
39+
(Make sure to include the period for current directory)
40+
41+
2. Download *winutils.exe*. Choose the version that corresponds to the version of Hadoop you downloaded. Move it to a new directory you’ve made just for this purpose:
42+
```> mkdir c:\hadoop\bin```
43+
44+
3. Create environment variables for *SPARK_HOME* and *HADOOP_HOME* and related *PATH* variables. You can do this in the Command Prompt
45+
```> set SPARK_HOME=c:\spark```
46+
```> set HADOOP_HOME=c:\hadoop```
47+
```> set PATH=%SPARK_HOME%\bin;%PATH%```
48+
```> set PATH=%HADOOP_HOME%\bin;%PATH%```
49+
or, to make the variables persist beyond the current Command Prompt session, right-click the [Windows icon, navigate to System, choose Advanced System Settings, then click on the button for Environment Variables](https://www.computerhope.com/issues/ch000549.htm).
50+
51+
4. [Activate your conda virtual environment](https://conda.io/docs/user-guide/tasks/manage-environments.html) with the version of Python you’d like to use. You could make a new environment (recommended!) to try things out, or YOLO just ```> conda install pyspark``` into your main Python environment. **Note on commands**:
52+
```> source activate snakes``` (UNIX/MacOS)
53+
is just ```> activate snakes``` in Windows, where your environment’s name in both cases is “snakes”.
54+
Also make an environment variable for your Python path (your command will differ, but it’ll be something like the following):
55+
```> set PYTHONPATH=C:\Users\Lauren\Anaconda```
56+
57+
5. You need to enable access to the default scratch directory for Hive. First, make sure the directory *C:\tmp\hive* is created; if it doesn’t exist, create it.
58+
Second, you need to use *winutils.exe* to open up the permissions on that directory. Navigate back to where you put this .exe file, then run the permission command:
59+
60+
```> cd c:\hadoop\bin```
61+
62+
```> winutils.exe chmod -R 777 C:\tmp\hive```
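
Before moving on, it can help to double-check that steps 1 through 5 left things where Spark expects them. The following is only a rough sketch, not part of the original walkthrough: the function name `check_spark_setup` and the injectable `exists` argument are made up for illustration, and it assumes the *C:\tmp\hive* scratch directory and the *bin\pyspark.cmd* launcher that ships with the pre-built Spark download.

```python
import os
from pathlib import PureWindowsPath


def check_spark_setup(env, exists=os.path.exists):
    """Return a list of problems found; an empty list means the
    settings from steps 1-5 look consistent.

    `env` is a mapping such as os.environ. `exists` is a hypothetical
    hook so the check can be exercised without touching the real disk.
    """
    problems = []

    # Step 3: both variables should be set.
    for var in ("SPARK_HOME", "HADOOP_HOME"):
        if not env.get(var):
            problems.append(f"{var} is not set")

    # Step 3 (continued): each bin directory should be on PATH.
    # Windows paths are case-insensitive, so compare in lower case.
    path_entries = {
        entry.strip().rstrip("\\").lower()
        for entry in env.get("PATH", "").split(";")
        if entry.strip()
    }
    for var in ("SPARK_HOME", "HADOOP_HOME"):
        home = env.get(var)
        if home:
            bin_dir = str(PureWindowsPath(home) / "bin")
            if bin_dir.lower() not in path_entries:
                problems.append(f"{bin_dir} is not on PATH")

    # Step 1: the Spark launcher should be under SPARK_HOME\bin.
    if env.get("SPARK_HOME"):
        launcher = str(PureWindowsPath(env["SPARK_HOME"]) / "bin" / "pyspark.cmd")
        if not exists(launcher):
            problems.append(f"pyspark.cmd not found at {launcher}")

    # Step 2: winutils.exe should be under HADOOP_HOME\bin.
    if env.get("HADOOP_HOME"):
        winutils = str(PureWindowsPath(env["HADOOP_HOME"]) / "bin" / "winutils.exe")
        if not exists(winutils):
            problems.append(f"winutils.exe not found at {winutils}")

    # Step 5: the Hive scratch directory should exist.
    hive_dir = r"C:\tmp\hive"
    if not exists(hive_dir):
        problems.append(f"{hive_dir} does not exist")

    return problems


if __name__ == "__main__":
    for problem in check_spark_setup(os.environ):
        print("Problem:", problem)
```

Run it with the Python from your activated conda environment, in the same Command Prompt where you set the variables. It only checks that paths exist, not that the permissions from step 5 were applied.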
63+
64+
---
65+
66+
## Pause here to make sure Spark is working.
67+
68+
Run ```> C:\spark\bin\pyspark``` and you should see a welcome screen like this:
69+
70+
![Spark Launch Screen](/assets/img/blog/welcome-to-spark.png)
71+
72+
If you see the Spark ASCII art, you’re in. If you don’t, try closing and restarting the Command Prompt. If it’s still not working, more tutorials are [here](http://deelesh.github.io/pyspark-windows.html), [here](https://anchalkataria.wordpress.com/2016/03/09/installing-apache-spark-in-local-mode-on-windows-8-2/), and [here](https://medium.com/@GalarnykMichael/install-spark-on-windows-pyspark-4498a5d8d66c).
73+
74+
75+
If you do see the Spark welcome screen, you can now run some of the example scripts included with the pre-built Spark download to ensure it’s working, or try out a few lines from the official [Quick Start Guide](https://spark.apache.org/docs/1.2.0/quick-start.html).
76+
77+
```>>> textFile = sc.textFile("README.md")```
78+
79+
```>>> textFile.first() ```
80+
81+
```'# Apache Spark'```
82+
83+
At this point you should also be able to access the Spark UI from your favorite browser at [http://localhost:4040](http://localhost:4040).
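
If you’d rather confirm this from a script than a browser, here is a minimal sketch using only the Python standard library. The function name `spark_ui_reachable` is made up for illustration, and the port is the 4040 from the link above. It only tests whether something is accepting connections on that port, not that the page is actually the Spark UI.

```python
import socket


def spark_ui_reachable(host="localhost", port=4040, timeout=2.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        # create_connection raises OSError (e.g. connection refused,
        # timeout) when nothing is listening at that address.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    if spark_ui_reachable():
        print("Something is listening on port 4040.")
    else:
        print("Nothing is answering on port 4040; is pyspark still running?")
```

Run it from a second Command Prompt while the pyspark shell is still open, since the UI goes away when Spark exits.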
84+
85+
Back in the Command Prompt, exit Spark with the ```>>> quit()``` command.
86+
87+
88+

about.html

+9-9
@@ -6,24 +6,24 @@
66

77
<div class="container mtb">
88
<div class="row">
9-
<div class="col-lg-6">
9+
<div class="col-lg-6 centered">
10+
<img class="img-responsive" src="{{ "/assets/img/about-lo.jpg" | prepend: site.baseurl }}" alt="selfie">
11+
</div>
12+
<div class="col-lg-6">
1013
<h4>My name is Lauren, and I am a Data Scientist.</h4>
1114
<div class="hline"></div>
12-
<p>Professionally, I am an <a href="http://www.bfaglobal.com/" target="_new">Associate at BFA</a>, a niche financial inclusion international consulting firm, on the Quantitative Analytics team.</p>
15+
<p>Most recently, I've worked as an <a href="http://www.bfaglobal.com/" target="_new">Associate at BFA</a>, a niche financial inclusion global advisory firm, on the Quantitative Analytics &amp; Inclusive Fintech team.</p>
1316
<p>Previously, I focused on the monitoring and evaluation (M&amp;E) of international health programs. I am an experienced manager and technical advisor, having overseen program and research field teams in South Sudan, Bangladesh, and South Africa.</p>
14-
<p>With my family I also own a <a href="http://www.oldjaenterprises.com/" target="_new">kitchen remodeling business</a>.</p>
1517
<p><a href="http://dusp.mit.edu/" target="_new">MIT</a> and <a href="http://www.jhsph.edu/departments/international-health" target="_new">Hopkins</a> trained in quantitative and qualitative methods.</p>
1618
<p>I am also an alumna of the <a href="http://www.thisismetis.com/" target="_new">Metis</a> Data Science bootcamp.</p>
1719
<p>My scientific publications are on <a href="https://www.ncbi.nlm.nih.gov/pubmed/?term=Oldja+L" target="_new">PubMed</a>.</p>
20+
<p>My family owns a <a href="http://www.oldjaenterprises.com/" target="_new">kitchen remodeling business</a>.</p>
1821
<p>Proud member of <a href="https://www.eff.org/" target="_new">EFF</a>.</p>
1922
<p>Conversant in Spanish.</p>
20-
21-
<!-- <p>My culture writing is on <a href="https://medium.com/@loldja" target="_new">Medium</a>.</p> -->
23+
2224
<p><br/><a href="/contact/" class="btn btn-theme">Contact Me</a></p>
2325
</div>
24-
<div class="col-lg-6">
25-
<img class="img-responsive" src="{{ "/assets/img/web-photo.png" | prepend: site.baseurl }}" alt="selfie">
26-
</div>
26+
2727
</div>
2828
<div class="spacing"></div>
2929
<div class="row">
@@ -34,7 +34,7 @@ <h4>About this site:</h4>
3434
<p>Color scheme inspired by the <a href="https://www.pantone.com/color-of-the-year-2018" target="_new">PANTONE Color of the Year 2018</a></p>
3535
<p>Open web-fonts: <a href="https://www.google.com/fonts/specimen/Open+Sans" target="_new">Open Sans</a> and <a href="https://www.google.com/fonts/specimen/Lato" target="_new">Lato</a></p>
3636
<p>Images/GIFs edited in Adobe Photoshop</p>
37-
<p>HTML/CSS/JS/Liquid/YAML edited with Sublime Text</p>
37+
<p>HTML/CSS/JS/Liquid/YAML edited with Sublime Text/Notepad++</p>
3838
</div>
3939
</div>
4040
</div><! --/container -->

assets/img/Thumbs.db

-92 KB
Binary file not shown.

assets/img/about-lo.jpg

56.3 KB

assets/img/blog/Thumbs.db

-113 KB
Binary file not shown.
27.9 KB

assets/img/blog/thumbs/Thumbs.db

-25 KB
Binary file not shown.

assets/img/blog/welcome-to-spark.png

21 KB

assets/img/clients/Thumbs.db

-12 KB
Binary file not shown.

assets/img/main/Thumbs.db

-56 KB
Binary file not shown.

assets/img/members/Thumbs.db

-83.5 KB
Binary file not shown.

assets/img/project/Thumbs.db

-136 KB
Binary file not shown.

assets/img/project/carousel/Thumbs.db

-16.5 KB
Binary file not shown.

index.html

+43-3
@@ -7,20 +7,22 @@
77
<div id="service">
88
<div class="container mt">
99
<div class="row centered">
10-
<div class="col-md-12"><p>Hi, I'm <span class="highlight">Lauren Oldja</span>, a Social Scientist-turned-<span class="highlight">Data Scientist</span> based in Brooklyn.</p>
10+
<div class="col-md-12"><h1>Hi, I'm <span class="highlight">Lauren Oldja</span>, a Social Scientist<i class="fa fa-heartbeat fa-lg" aria-hidden="true"></i>-turned-Data Scientist<i class="fa fa-bar-chart fa-lg" aria-hidden="true"></i> based in Brooklyn.</h1></div>
1111
</div><!-- --/row---->
1212
</div><!-- --/container---->
1313
</div><!-- --/service ---->
14+
<hr></hr>
15+
1416
<div id="portfoliowrap">
15-
<p><b>Blog Posts</b></p>
17+
<h2><b>Blog Posts</b></h2>
1618
<div class="portfolio-centered">
1719
<div class="recentitems portfolio">
1820
{% for post in site.categories['blog']%}
1921
<div class="portfolio-item graphic-design">
2022
<div class="he-wrap tpl6">
2123
<a href="{{ post.url | prepend: site.baseurl }}">
2224
<img src="{{ "/assets/img/blog/" | prepend: site.baseurl }}{{ post.img }}" alt="">
23-
<div class="he-view">
25+
<div class="he-view">
2426
<div class="bg a0" data-animate="fadeIn">
2527
<h3 class="a1" data-animate="fadeInDown">{{ post.title }}</h3>
2628
</div><!-- he bg -->
@@ -32,3 +34,41 @@ <h3 class="a1" data-animate="fadeInDown">{{ post.title }}</h3>
3234
</div><!-- portfolio -->
3335
</div><!-- portfolio container -->
3436
</div><!--/Portfoliowrap -->
37+
38+
<div id="service">
39+
<div class="container">
40+
<div class="row centered">
41+
<h2><b>Publications and Acknowledgements</b></h2>
42+
<hr></hr>
43+
</div>
44+
<div class="col-md-4 clearfix">
45+
<p><i class="fa fa-file-pdf-o fa-2x fa-pull-left fa-border" aria-hidden="true"></i>Enteric Infections in Young Children are Associated with Environmental Enteropathy and Impaired Growth (2018)<br/><div><a href="http://onlinelibrary.wiley.com/doi/10.1111/tmi.13002/abstract" class="btn btn-theme">Full Text Online <span class="fa fa-external-link fa-fw" aria-hidden="true"></span></a></div></p>
46+
</div>
47+
<div class="col-md-4 clearfix">
48+
<p><i class="fa fa-rss fa-2x fa-pull-left fa-border" aria-hidden="true"></i><b>NextBillion blog</b><br/>"Stealthy" Saving: Building on Payroll Credit to Automate Savings (2017)
49+
<br/><div><a href="https://nextbillion.net/stealthy-saving-building-on-payroll-credit-to-automate-savings/" class="btn btn-theme">Full Text Online <span class="fa fa-external-link fa-fw" aria-hidden="true"></span></a></div></p>
50+
</div>
51+
<div class="col-md-4 clearfix">
52+
<p><i class="fa fa-file-pdf-o fa-2x fa-pull-left fa-border" aria-hidden="true"></i>Mouthing of Soil Contaminated Objects is Associated with Environmental Enteropathy in Young Children (2017)<br/><div><a href="http://onlinelibrary.wiley.com/doi/10.1111/tmi.12869/abstract" class="btn btn-theme">Full Text Online <span class="fa fa-external-link fa-fw" aria-hidden="true"></span></a></div></p>
53+
</div>
54+
<div class="col-md-4 clearfix">
55+
<p><i class="fa fa-file-pdf-o fa-2x fa-pull-left fa-border" aria-hidden="true"></i>Geophagy Is Associated with Growth Faltering in Children in Rural Bangladesh (2016)
56+
<br/><div><a href="http://www.jpeds.com/article/S0022-3476(16)30506-6/pdf" class="btn btn-theme">Full Text Online <span class="fa fa-usd fa-fw" aria-hidden="true"></span><span class="fa fa-external-link fa-fw" aria-hidden="true"></span></a></div></p>
57+
</div>
58+
<div class="col-md-4 clearfix">
59+
<p><i class="fa fa-file-pdf-o fa-2x fa-pull-left fa-border" aria-hidden="true"></i>Unsafe Child Feces Disposal is Associated with Environmental Enteropathy and Impaired Growth (2016)<br/><div><a href="http://www.jpeds.com/article/S0022-3476(16)30243-8/fulltext" class="btn btn-theme">Full Text Online <span class="fa fa-external-link fa-fw" aria-hidden="true"></span></a></div></p>
60+
</div>
61+
<div class="col-md-4 clearfix">
62+
<p><i class="fa fa-file-pdf-o fa-2x fa-pull-left fa-border" aria-hidden="true"></i>Shigella Infections in Household Contacts of Pediatric Shigellosis Patients in Rural Bangladesh (2015)<br/><div><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4622242/" class="btn btn-theme">Full Text Online <span class="fa fa-external-link fa-fw" aria-hidden="true"></span></a></div></p>
63+
</div>
64+
<div class="col-md-4 clearfix">
65+
<p><i class="fa fa-file-pdf-o fa-2x fa-pull-left fa-border" aria-hidden="true"></i>Fecal Markers of Environmental Enteropathy are Associated with Animal Exposure and Caregiver Hygiene in Bangladesh (2015)
66+
<br/><div><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4530746/" class="btn btn-theme">Full Text Online <span class="fa fa-external-link fa-fw" aria-hidden="true"></span></a></div></p>
67+
</div>
68+
<div class="col-md-4 clearfix">
69+
<p><i class="fa fa-file-pdf-o fa-2x fa-pull-left fa-border" aria-hidden="true"></i>Geophagy is associated with environmental enteropathy and stunting in children in rural Bangladesh (2015)
70+
<br/><div><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4458812/" class="btn btn-theme">Full Text Online <span class="fa fa-external-link fa-fw" aria-hidden="true"></span></a></div></p>
71+
</div>
72+
</div><! --/container -->
73+
</div><! --/service -->
74+
