# Installation of Python for Data Science

- This tutorial gives a recommendation for a data science python stack. This tutorial
- is merely a suggestion for one out of many different usefull alternatives.
+ This tutorial gives a recommendation for installing and organizing a data-science python stack.
+ In this tutorial, I describe a system for organizing the different components. I found that
+ this organization scheme scales particularly well when working on many different projects over time.
+ It allows different python environments and versions to be used for different projects.
+ However, this tutorial is merely a suggestion, one of many useful alternatives.

Overview:
- The first section describes the fastest path
@@ -12,7 +15,7 @@ I will point out where you are taking shortcuts.

# Pythonic Data-Science Stack: fast route

- #### Install Anaconda Python
+ ### Install Anaconda Python

For data science in Python, you need a python interpreter plus various numerical packages.
Continuum Analytics provides commercial images for various cloud platforms that include
@@ -93,71 +96,120 @@ Get the latest PyCharm from [jetbrains](https://www.jetbrains.com/pycharm/).
/Users/alex/Development/Python/pycharm
```

- #### Create a PyCharm Project
+ ### Create a PyCharm Project

When PyCharm first opens, it presents you with a welcome screen
and prompts you to create or open a project. In PyCharm, a project
is simply a configuration that tells PyCharm which python
- packages and folders should be opened and what python environments to use to
+ packages and folders should be opened and what python environments to use to
execute what code. A project _points_ to python code, but
- the project configuration folder itself should _not contain_ python sources.
+ the project configuration folder itself should _not contain_ python sources.
I generally create a new Project for each topic I am working on. In addition,
I keep a `sandbox` project for random little experiments.

- As you might want to open some python sources in the context of several projects,
- I recommend keeping the PyCharm Projects separate from the python source code.
+ As you might want to open some python sources in the context of several projects,
+ I recommend keeping the PyCharm Projects separate from the python source code.
While I check out all source code into `/Users/alex/Git/`,
my PyCharm project configurations live in `/Users/alex/Development/Python/IDE-Project-Configurations/`.

**Creating the `sandbox` Project** (for MacOS):
- 1. On the welcome screen, select `Create New Project`. Alternatively,
+ 1. On the welcome screen, select `Create New Project`. Alternatively,
when closing the main window of the PyCharm IDE, it will go back to the
welcome screen.
- 2. You will be presented with:
-
-
-
- The project's name is implicitly determined by the tailing folder name. Hence,
- to create a project with the name `sandbox`, specify as folder:
+ 2. Your project configuration could look something like this:
+
+ - The project's name is implicitly determined by the trailing folder name. Hence,
+ to create a project with the name `sandbox`, specify the `location`:
```
/Users/alex/Development/Python/IDE-Project-Configurations/sandbox
```
-
- Lets name our first project
-
- 3) Configure Anaconda as an interpreter in pycharm:
-
-
- /Users/alex/Development/Python/Python3.6-x64_Anaconda-5.2.0/bin/python
-
-
-
-
- https://www.anaconda.com/download/
-
- 1) dow Anaconda Python 3.6: https://www.anaconda.com/download/#macos
-
-
-
- 2) create virtual environment (Python 3 already comes with everything to create a virtual environment):
- `python -m venv --symlinks <path-to-new-virtual-environment>`
- `source <path-to-new-virtual-environment>/bin/activate`
- this will change your default python to the one you activated _only in the current shell session_
- to go back to default: `deactivate`
- details: https://docs.python.org/3/library/venv.html (edited)
- 3) In the _activated_ python environment, install tensorflow (https://www.tensorflow.org/install/install_mac):
- • Ensure pip ≥8.1 is installed:
- `easy_install -U pip`
- • install TensorFlow
- `pip3 install --upgrade tensorflow`
- • install other useful dependencies for data science
- `pip install matplotlib pandas h5py` (edited)
+ - Configure the default python interpreter that will be used to
+ execute code in the project:
+ - Select `Existing Interpreter` and click on the `...` button on the right.
+ (If you have already configured PyCharm to use Anaconda before, it
+ should be available in the drop-down menu.)
+ - In the window that opens,
+ choose `System Interpreter` (left) and use the `...` button to select an
+ already installed python environment. You need to select the `python` executable;
+ in our example:
+ ```
+ /Users/alex/Development/Python/Python3.6-x64_Anaconda-5.2.0/bin/python
+ ```
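Once the interpreter is configured, you can confirm which python actually runs your code by evaluating a few lines from the standard `sys` module in the Python Console. This is a minimal sketch. `interpreter_info` is a hypothetical helper name, and the printed paths depend on your installation. With the setup above, `executable` should end in the Anaconda `bin/python`.

```python
import sys


def interpreter_info() -> dict:
    """Collect basic facts about the interpreter running this code.

    `interpreter_info` is a hypothetical helper, for illustration only.
    """
    return {
        # Full path of the python executable, e.g. the Anaconda bin/python
        "executable": sys.executable,
        # Version number only, e.g. "3.6.5"
        "version": sys.version.split()[0],
        # Root directory of the active installation or environment
        "prefix": sys.prefix,
    }


for key, value in interpreter_info().items():
    print(f"{key}: {value}")
```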
+
+ ### PyCharm in Action
+
+ In the following, I will use the [python-tutorials repository](https://github.com/AlexHentschel/python-tutorials) as an example.
+ I assume you have already cloned the repo so you can execute the provided examples.
+ Let's assume the repository is located in `/Users/alex/Git/python-tutorials`.
+
+ 1. Open PyCharm and the `sandbox` project (`File` -> `Open Recent` lets you switch projects).
+ 2. Now, we are going to _add_ the `python-tutorials` folder to PyCharm's `sandbox` project.
+ (This merely instructs PyCharm to add the folder you choose to a list of displayed folders.
+ Files and code remain where they are.)
+ - Go to `File` -> `Open` and select the folder `/Users/alex/Git/python-tutorials`.
+ - Now `python-tutorials` should be listed on the left side of the IDE, with its location on your
+ hard disk next to it in grey print.
+
+ ### The iPython console
+
+ The iPython console allows you to _interactively_ execute code _while_ you are developing it.
+ I find this immensely useful, especially for data science and machine learning projects.
+
+ - Open (double-click) the python script `python-tutorials/example_code/hello_world.py`.
+ - Select all lines of code and press `Control`+`Shift`+`e`. The `Python Console` will open
+ and execute the selected code.
+ - You can open _multiple_ iPython consoles and work with them in parallel.
+
+ ### Interactive plotting
+
+ Similarly, iPython allows you to _interactively_ plot graphs.
+
+ - An example is given in `python-tutorials/example_code/hello_plot.py`.
+ - Again, execute all the lines of code using `Control`+`Shift`+`e`.
+ Now try to edit the code while *keeping the plot open*.
+ On my system, the plot always stays in front, covering up part of the
+ PyCharm editor. I found this rather irritating and counterproductive.
+ - Execute `python-tutorials/example_code/hello_plot2.py` in a _newly opened_
+ Python Console. In this example, we use the `TKAgg` backend for `matplotlib`,
+ which fixed this behaviour for me.
+ - You can make the backend change permanent by editing your
+ `~/.matplotlib/matplotlibrc` file. In its default configuration on MacOS,
+ your `matplotlibrc` contains
+ ```
+ backend : macosx
+ ```
+ Change this to
+ ```
+ backend : TKAgg
+ ```
+ (see [here](http://matplotlib.org/users/customizing.html#the-matplotlibrc-file) for
+ more details)
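Instead of editing `matplotlibrc`, you can also choose the backend for a single Python session. The sketch below assumes a reasonably recent matplotlib. `matplotlib.use` should run before `matplotlib.pyplot` is imported, which is likely why the example asks for a newly opened console. `matplotlib.matplotlib_fname()` reports which `matplotlibrc` file is actually in effect, which helps when it is not at `~/.matplotlib/matplotlibrc`. The helper names `select_backend` and `active_matplotlibrc` are hypothetical.

```python
import matplotlib


def select_backend(name: str = "TkAgg") -> str:
    """Ask matplotlib to use `name` for this Python process only.

    Call this before `import matplotlib.pyplot`. Unlike an entry in
    matplotlibrc, the choice is not persisted. Returns the backend name
    matplotlib reports afterwards. (Hypothetical helper.)
    """
    matplotlib.use(name)
    return matplotlib.get_backend()


def active_matplotlibrc() -> str:
    """Return the path of the matplotlibrc file matplotlib reads. (Hypothetical helper.)"""
    # Either a user file (on macOS typically ~/.matplotlib/matplotlibrc)
    # or the default file shipped inside the matplotlib package.
    return matplotlib.matplotlib_fname()


print("matplotlibrc in use:", active_matplotlibrc())
```

In a fresh console, you would call `select_backend("TkAgg")` first and only then run `import matplotlib.pyplot as plt` and your plotting code.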

# Pythonic Data-Science Stack: best practices

- #### Use Miniconda as root environment
+ ### Use Miniconda as root environment

https://conda.io/miniconda.html

-
- ###
+ ### Use Virtual Python Environments
+
+ **Creation of virtual environments**:
+ - Python 3 already comes with everything to create a virtual environment.
+ Execute on the command line:
+ ```
+ /Users/alex/Development/Python/Python3.6-x64_Anaconda-5.2.0/bin/python -m venv --symlinks <path-to-new-virtual-environment>
+ ```
+ Make sure you select the correct python distribution that should serve
+ as the root.
+ - On the command line, you can activate a virtual environment with
+ ```
+ source <path-to-new-virtual-environment>/bin/activate
+ ```
+ This will change your default python to the one you just activated,
+ but _only in the current shell session_.
+ Note: the command prompt will change and display the name of the active environment.
+ To deactivate (go back to the default python environment), type
+ ```
+ deactivate
+ ```
+ (further reading on virtual environments: https://docs.python.org/3/library/venv.html)
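You can also create the same kind of environment from Python itself through the standard-library `venv` module, which is what `python -m venv` runs. This is a minimal sketch under a few assumptions. It uses the macOS/Linux layout (`bin/python`). It passes `with_pip=False` to stay fast and offline, whereas the command-line tool installs pip by default. The helper names `in_virtual_env` and `create_env` are hypothetical. The check `sys.prefix != sys.base_prefix` is the test the `venv` documentation describes for detecting a virtual environment. Note that it does not detect conda environments.

```python
import sys
import tempfile
import venv
from pathlib import Path


def in_virtual_env() -> bool:
    """Return True if the running interpreter belongs to a venv. (Hypothetical helper.)"""
    # Inside a venv, sys.prefix points to the environment, while sys.base_prefix
    # still points to the root installation it was created from.
    return sys.prefix != sys.base_prefix


def create_env(env_dir: Path) -> Path:
    """Create a venv, similar to `python -m venv --symlinks env_dir`. (Hypothetical helper.)

    Returns the path of the new environment's python executable,
    assuming the macOS/Linux layout env_dir/bin/python.
    """
    # with_pip=False skips the pip bootstrap that the command-line tool performs by default.
    venv.create(env_dir, symlinks=True, with_pip=False)
    return env_dir / "bin" / "python"


if __name__ == "__main__":
    print("inside a virtual environment:", in_virtual_env())
    print("root installation:", sys.base_prefix)
    # Demo: build a throwaway environment in a temporary directory.
    with tempfile.TemporaryDirectory() as tmp:
        new_python = create_env(Path(tmp) / "demo-env")
        print("created:", new_python, "exists:", new_python.exists())
```

Run from an activated environment, `in_virtual_env()` should return `True`; run with the root Anaconda python, it should return `False`.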