Commit 07498e7

Merge pull request #3613 from cal-itp/curriculum_docs_update
Curriculum Docs Update
2 parents c9f7a2d + a7c861e commit 07498e7

6 files changed: +18 -11 lines changed

docs/analytics_new_analysts/03-data-management.md

+1 -1
@@ -28,7 +28,7 @@ Below is a series of tips, tricks and use-cases for managing data throughout the
 
 ### GCS
 
-Our team often uses Google Cloud Storage (GCS) for object storage. If you haven't set up your Google authentication, go [here](https://docs.calitp.org/data-infra/analytics_tools/notebooks.html#connecting-to-warehouse) for the instructions. For a walkthrough on how to use GCS buckets, go [here](https://docs.calitp.org/data-infra/analytics_tools/storing_data.html#in-gcs).
+Our team often uses Google Cloud Storage (GCS) for object storage. If you haven't set up your Google authentication, go [here](https://docs.calitp.org/data-infra/analytics_tools/jupyterhub.html#connecting-to-the-warehouse) for the instructions. For a walkthrough on how to use GCS buckets, go [here](https://docs.calitp.org/data-infra/analytics_tools/storing_data.html#in-gcs).
 
 By putting data on GCS, anybody on the team can use/access/replicate the data without having to transfer data files between machines.
 

docs/analytics_new_analysts/overview.md

+2
@@ -20,12 +20,14 @@ This section is geared towards data analysts who are new to Python. The followin
 - If you are new to Python, take a look at [all the Python tutorials](https://www.linkedin.com/learning/search?keywords=python&u=36029164) available through Caltrans. There are many introductory Python courses [such as this one.](https://www.linkedin.com/learning/python-essential-training-18764650/getting-started-with-python?autoplay=true&u=36029164)
 - [Joris van den Bossche's Geopandas Tutorial](https://github.com/jorisvandenbossche/geopandas-tutorial)
 - [Practical Python for Data Science by Jill Cates](https://www.practicalpythonfordatascience.com/intro.html)
+- [General Python Functions](https://pandas.pydata.org/pandas-docs/stable/reference/general_functions.html)
 - [Ben-Gurion University of the Negev - Geometric operations](https://geobgu.xyz/py/geopandas2.html)
 - [Geographic Thinking for Data Scientists](https://geographicdata.science/book/notebooks/01_geo_thinking.html)
 - [PyGIS Geospatial Tutorials](https://pygis.io/docs/a_intro.html)
 - [Python Courses, compiled by our team](https://docs.google.com/spreadsheets/d/1Omow8F0SUiMx1jyG7GpbwnnJ5yWqlLeMH7SMtKxwG80/edit?usp=sharing)
 - [Why Dask?](https://docs.dask.org/en/stable/why.html)
 - [10 Minutes to Dask](https://docs.dask.org/en/stable/10-minutes-to-dask.html)
+- [Jupyter Notebook Tutorial](https://www.youtube.com/watch?v=LW2Rye_l8L0)
 
 ### Books
 

docs/analytics_tools/knowledge_sharing.md

-1
@@ -60,7 +60,6 @@ Here are some resources data analysts have collected and referenced, that will h
 - When working with data sets where the "merge on" column is a string data type, it can be difficult to get the DataFrames to join. For example, df1 lists <i>County of Sonoma, Human Services Department, Adult and Aging Division</i>, but df2 references the same department as: <i>County of Sonoma (Human Services Department) </i>.
 - Potential Solution #1: [fill in a column in one DataFrame that has a partial match with the string values in another one.](https://stackoverflow.com/questions/61811137/based-on-partial-string-match-fill-one-data-frame-column-from-another-dataframe)
 - Potential Solution #2: [use the package fuzzymatcher. This will require you to carefully comb through for any bad matches.](https://pbpython.com/record-linking.html)
-- Potential Solution #3: [if you don't have too many values, use a dictionary.](https://github.com/cal-itp/data-analyses/blob/main/drmt_grants/TIRCP_functions.py#:~:text=%23%23%23%20RECIPIENTS%20%23%23%23,%7D)
 
 (dates)=
 
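
The string-mismatch problem described in the `knowledge_sharing.md` hunk above can be illustrated with a minimal sketch. It uses Python's standard-library `difflib` as a stand-in for a fuzzy-matching package (it is not `fuzzymatcher`). The helper names `normalize` and `match_names`, the 0.6 cutoff, and the second sample name are assumptions made for illustration, not part of the docs being changed.

```python
import difflib
import re


def normalize(name: str) -> str:
    """Lowercase a name and replace punctuation with single spaces."""
    return " ".join(re.sub(r"[^a-z0-9]+", " ", name.lower()).split())


def match_names(left, right, cutoff=0.6):
    """Map each name in `left` to its closest name in `right`, or None.

    The cutoff is an illustrative assumption; tune it on real data.
    """
    normalized_right = {normalize(r): r for r in right}
    matches = {}
    for name in left:
        close = difflib.get_close_matches(
            normalize(name), list(normalized_right), n=1, cutoff=cutoff
        )
        matches[name] = normalized_right[close[0]] if close else None
    return matches


# The first name in each list is the example from the docs; "City of Fresno" is hypothetical.
df1_names = ["County of Sonoma, Human Services Department, Adult and Aging Division"]
df2_names = ["County of Sonoma (Human Services Department)", "City of Fresno"]

crosswalk = match_names(df1_names, df2_names)
```

The resulting `crosswalk` dictionary could then be used to add a shared key column to one DataFrame before merging. As the docs note for fuzzy matching in general, the matches still need to be reviewed by hand for bad pairings.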

docs/analytics_tools/storing_data.md

+1 -1
@@ -46,7 +46,7 @@ In order to save data being used in a report, you can use two methods:
 
 Watch the screencast below and read the additional information to begin.
 
-**Note**: To access Google Cloud Storage you will need to have set up your Google authentication. If you have yet to do so, [follow these instructions](https://docs.calitp.org/data-infra/analytics_tools/notebooks.html#connecting-to-warehouse).
+**Note**: To access Google Cloud Storage you will need to have set up your Google authentication. If you have yet to do so, [follow these instructions](https://docs.calitp.org/data-infra/analytics_tools/jupyterhub.html#connecting-to-the-warehouse).
 
 (storing-new-data-screencast)=
 

docs/publishing/sections/2_static_files.md

+13 -7
@@ -101,15 +101,21 @@ jupyter nbconvert --to html --no-input --no-prompt my_notebook.ipynb
 weasyprint my_notebook.html my_notebook.pdf
 ```
 
-- There are assignments that require you to rerun the same notebook for different values and save each of these new notebooks in PDF format. This essentially combines parameterization principles using `papermill` with the `weasyprint` steps above. You can reference the code that was used to generate the CSIS scorecards [here](https://github.com/cal-itp/csis-metrics/blob/main/project_prioritization/metrics_summaries/run_papermill.py). This script iterates over [this notebook](https://github.com/cal-itp/csis-metrics/blob/main/project_prioritization/metrics_summaries/sb1_scorecard.ipynb) to produce 50+ PDF files for each of the nominated projects.
+- There are assignments that require you to rerun the same notebook for different values and save each of these new notebooks in PDF format. This essentially combines parameterization principles using papermill with the weasyprint steps above. You can reference the code that was used to generate the CSIS scorecards [here](https://github.com/cal-itp/csis-metrics/blob/main/project_prioritization/metrics_summaries/_make_scorecard.py). This script iterates over [this notebook](https://github.com/cal-itp/csis-metrics/blob/main/project_prioritization/metrics_summaries/08_csis_scorecard.ipynb) to produce PDF files for each of the nominated projects found [here](<https://console.cloud.google.com/storage/browser/calitp-analytics-data/data-analyses/general_csis/scorecards?pageState=(%22StorageObjectListTable%22:(%22f%22:%22%255B%255D%22))&project=cal-itp-data-infra>).
 
-Briefly, the script above does the following:
+Note: Viewer may need access to the private CSIS repository.
 
-- Automates the naming of the new PDF files by taking away punctuation that isn't allowed.
-- Saves the notebook as html files.
-- Converts the html files to PDF.
-- Saves each PDF to the folder (organized by district) to our GCS.
-- Deletes irrelevant files.
+Briefly, the script above does the following:
+
+- Automates the naming of the new PDF files by taking away punctuation that isn't allowed.
+
+- Saves the notebook as html files.
+
+- Converts the html files to PDF.
+
+- Saves each PDF to the folder (organized by district) to our GCS.
+
+- Deletes irrelevant files.
 
 - Here are some tips and tricks when converting notebooks to HTML before PDF conversions.
 
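
The first three steps listed in the `2_static_files.md` hunk above (clean file names, notebook to HTML, HTML to PDF) might be sketched as follows. This is not the CSIS script, which lives in a private repository. The helper names `safe_stem` and `build_commands`, the `scorecards` output folder, the papermill parameter `project_name`, and the sample project name are all hypothetical. The sketch only builds the command lists and does not run them; the GCS upload and cleanup steps are left out.

```python
import re
from pathlib import Path


def safe_stem(project_name: str) -> str:
    """Turn a project name into a file-name stem with no punctuation."""
    cleaned = re.sub(r"[^\w\s-]", "", project_name)  # drop punctuation
    return re.sub(r"\s+", "_", cleaned.strip())      # whitespace runs -> "_"


def build_commands(notebook: str, project_name: str, out_dir: str = "scorecards"):
    """Return the papermill, nbconvert, and weasyprint commands for one project."""
    stem = safe_stem(project_name)
    executed = Path(out_dir) / f"{stem}.ipynb"
    html = executed.with_suffix(".html")
    pdf = executed.with_suffix(".pdf")
    return [
        # Run the notebook for one value; "project_name" is a hypothetical parameter.
        ["papermill", notebook, str(executed), "-p", "project_name", project_name],
        # Same nbconvert flags as in the docs, plus an explicit output directory.
        ["jupyter", "nbconvert", "--to", "html", "--no-input", "--no-prompt",
         "--output-dir", out_dir, str(executed)],
        # Convert the HTML file to PDF.
        ["weasyprint", str(html), str(pdf)],
    ]


# Hypothetical project name containing punctuation that should not appear in a file name.
commands = build_commands("08_csis_scorecard.ipynb", "I-80 / SR-65 Interchange (Phase 2)")
```

Each command list could be passed to `subprocess.run(cmd, check=True)` inside a loop over project names, assuming `papermill`, `nbconvert`, and `weasyprint` are installed in the environment.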

docs/publishing/sections/5_analytics_portfolio_site.md

+1 -1
@@ -210,7 +210,7 @@ build_my_reports:
 git add portfolio/sites/my_report.yml
 ```
 
-### Delete Portfolio/ Refresh Index Page
+### Redeploying Portfolio/ Refresh Index Page
 
 When redeploying your portfolio with new content and there’s an old version with existing files or content on your portfolio site or in your local environment, it’s important to clean up the old files before adding new content.
 
