From 09d1522bd3968771150e4a325522ae3d78c1a1fa Mon Sep 17 00:00:00 2001 From: Jean Cochrane Date: Fri, 24 May 2019 10:25:55 -0400 Subject: [PATCH 01/11] Refactor repo to use nested directories --- README.md | 45 ++++++++----------- .../Dockerizing-Postgres.md | 0 .../Dump-and-restore-Postgres.md | 0 .../Interacting-with-a-remote-database.md | 0 .../quick-n-dirty-sqlalchemy.md | 0 .../lxml-for-web-scraping.md | 0 .../moving-keys-between-servers.md | 0 .../tmux-best-practices.md | 0 8 files changed, 18 insertions(+), 27 deletions(-) rename Dockerizing-Postgres.md => postgres/Dockerizing-Postgres.md (100%) rename Dump-and-restore-Postgres.md => postgres/Dump-and-restore-Postgres.md (100%) rename Interacting-with-a-remote-database.md => postgres/Interacting-with-a-remote-database.md (100%) rename quick-n-dirty-sqlalchemy.md => postgres/quick-n-dirty-sqlalchemy.md (100%) rename lxml-for-web-scraping.md => scraping/lxml-for-web-scraping.md (100%) rename moving-keys-between-servers.md => shell/moving-keys-between-servers.md (100%) rename tmux-best-practices.md => shell/tmux-best-practices.md (100%) diff --git a/README.md b/README.md index 63e9a54..cdb6110 100644 --- a/README.md +++ b/README.md @@ -1,32 +1,23 @@ -# tutorials +# howto _📚 Doing all sorts of things, the DataMade way_ ## What's this? -Here at DataMade, we do a lot of computer programming. Sometimes, we create entire repos dedicated to the practices we employ: - -- [Making Data, the DataMade Way](https://github.com/datamade/data-making-guidelines) -- [Analyzing Data, the DataMade Way](https://github.com/datamade/data-analysis-guidelines) -- [Writing Tests, the DataMade Way](https://github.com/datamade/testing-guidelines) - -But our daily computing also involves a lot of small, but important, tasks. In the spirit of [better living through documentation](https://datamade.us/blog/better-living-through-documentation), we're preserving guides to those tasks, here. 
- -## Index - -### Python libraries - -- [A quick and dirty introduction to `sqlalchemy`](/quick-n-dirty-sqlalchemy.md) -- Prefetching objects in Django -- [`lxml` for web scraping](/lxml-for-web-scraping.md) - -### Database operations - -- [Interacting with a remote database](/Interacting-with-a-remote-database.md) -- [Dumping and restoring a Postgres database](/Dump-and-restore-Postgres.md) -- [Dockerizing Postgres](/Dockerizing-Postgres.md) - -### Devops - -- [tmux, best practices](tmux-best-practices.md) -- [How to move a gpg key between servers](moving-keys-between-servers.md) +Here at DataMade, we do a lot of computer programming. In the spirit of [better living through documentation](https://datamade.us/blog/better-living-through-documentation), we're preserving guides to how we do that here. + +## Contents + +- [Data processing and ETL](https://github.com/datamade/data-making-guidelines) +- [Reproducible data analysis](https://github.com/datamade/data-analysis-guidelines) +- [Software testing](https://github.com/datamade/testing-guidelines) +- [PostgreSQL](/postgres/) + - [A quick and dirty introduction to `sqlalchemy`](/postgres/quick-n-dirty-sqlalchemy.md) + - [Interacting with a remote database](/postgres/Interacting-with-a-remote-database.md) + - [Dumping and restoring a Postgres database](/postgres/Dump-and-restore-Postgres.md) + - [Dockerizing Postgres](/postgres/Dockerizing-Postgres.md) +- [Web scraping](/scraping/) + - [`lxml` for web scraping](/scraping/lxml-for-web-scraping.md) +- [The shell and Ubuntu](/shell/) + - [tmux, best practices](/shell/tmux-best-practices.md) + - [How to move a gpg key between servers](/shell/moving-keys-between-servers.md) diff --git a/Dockerizing-Postgres.md b/postgres/Dockerizing-Postgres.md similarity index 100% rename from Dockerizing-Postgres.md rename to postgres/Dockerizing-Postgres.md diff --git a/Dump-and-restore-Postgres.md b/postgres/Dump-and-restore-Postgres.md similarity index 100% rename from 
Dump-and-restore-Postgres.md rename to postgres/Dump-and-restore-Postgres.md diff --git a/Interacting-with-a-remote-database.md b/postgres/Interacting-with-a-remote-database.md similarity index 100% rename from Interacting-with-a-remote-database.md rename to postgres/Interacting-with-a-remote-database.md diff --git a/quick-n-dirty-sqlalchemy.md b/postgres/quick-n-dirty-sqlalchemy.md similarity index 100% rename from quick-n-dirty-sqlalchemy.md rename to postgres/quick-n-dirty-sqlalchemy.md diff --git a/lxml-for-web-scraping.md b/scraping/lxml-for-web-scraping.md similarity index 100% rename from lxml-for-web-scraping.md rename to scraping/lxml-for-web-scraping.md diff --git a/moving-keys-between-servers.md b/shell/moving-keys-between-servers.md similarity index 100% rename from moving-keys-between-servers.md rename to shell/moving-keys-between-servers.md diff --git a/tmux-best-practices.md b/shell/tmux-best-practices.md similarity index 100% rename from tmux-best-practices.md rename to shell/tmux-best-practices.md From 4aa3a9009d12fca280e568c9800e918552870f82 Mon Sep 17 00:00:00 2001 From: Jean Cochrane Date: Fri, 24 May 2019 10:43:26 -0400 Subject: [PATCH 02/11] Add READMEs to each subdir and add doc for stack changes --- README.md | 2 + postgres/README.md | 11 ++++++ process/README.md | 8 ++++ process/stack-changes.md | 82 ++++++++++++++++++++++++++++++++++++++++ scraping/README.md | 7 ++++ shell/README.md | 9 +++++ 6 files changed, 119 insertions(+) create mode 100644 postgres/README.md create mode 100644 process/README.md create mode 100644 process/stack-changes.md create mode 100644 scraping/README.md create mode 100644 shell/README.md diff --git a/README.md b/README.md index cdb6110..0407538 100644 --- a/README.md +++ b/README.md @@ -21,3 +21,5 @@ Here at DataMade, we do a lot of computer programming. 
In the spirit of [better - [The shell and Ubuntu](/shell/) - [tmux, best practices](/shell/tmux-best-practices.md) - [How to move a gpg key between servers](/shell/moving-keys-between-servers.md) +- [Collaborative processes](/process/) + - [Making changes to the standard stack](/process/stack-changes.md) diff --git a/postgres/README.md b/postgres/README.md new file mode 100644 index 0000000..da7e791 --- /dev/null +++ b/postgres/README.md @@ -0,0 +1,11 @@ +# PostgreSQL + +This directory records best practices for working with the object-relational +database management system PostgreSQL, our primary choice of database. + +## Guides + +- [A quick and dirty introduction to `sqlalchemy`](./quick-n-dirty-sqlalchemy.md) +- [Interacting with a remote database](./Interacting-with-a-remote-database.md) +- [Dumping and restoring a Postgres database](./Dump-and-restore-Postgres.md) +- [Dockerizing Postgres](./Dockerizing-Postgres.md) diff --git a/process/README.md b/process/README.md new file mode 100644 index 0000000..676842f --- /dev/null +++ b/process/README.md @@ -0,0 +1,8 @@ +# Collaborative processes + +This directory records best practices for collaborative software development +processes. + +## Guides + +- [Making changes to the standard stack](./stack-changes.md) diff --git a/process/stack-changes.md b/process/stack-changes.md new file mode 100644 index 0000000..090797e --- /dev/null +++ b/process/stack-changes.md @@ -0,0 +1,82 @@ +# Making changes to the stack + +## Background + +Thanks to the addition of a formal research and development process, we have begun to identify opportunities for changing portions of the DataMade Standard Stack™. This process is intended to guide, but not hinder, the exploration of new tooling. Our goal is to empower lead developers to move as fast as possible in adopting new tools that will help the team work more productively, while minimizing the number of stray projects that we must maintain using tools that no one knows how to use. 
+ +This process will: + +- Enable partners to confidently delegate authority for technical leadership +- Empower lead developers to pursue and implement technical changes +- Make transparent to developers the standard toolkit and process for changes +- Minimize maintenance burden for future developers + +This document is a work in progress. If any step in this process is consistently and unnecessarily painful, it is subject to amendment. Amendments to this process should be proposed and agreed upon by lead developers, and approved by partners. + +## Process + +This process may be exited at any step if lead developers decide that a tool is not a good fit for the team. Lead developers should agree on abandoning a new tool, and the decision should be documented on the research issue tracking work on the tool. For an example of documented abandonment, see the [R&D project on Netlify Add-ons](https://github.com/datamade/ops/issues/610). + +### 1. Propose research project to team of lead developers + +Lead developers are the primary drivers of the research process. As such, all lead developers should comment and give approval on a research proposal before research begins. + +Lead developers should log proposals as GitHub notes in the [R&D project in this repo](https://github.com/datamade/tutorials/projects/1). Once work begins on a proposal, it should be converted into an issue. Proposals can take a number of forms, from a quick description to a formal project plan. Lead developers should work together to settle on a proposal standard that works for them. + +### 2. Conduct research and develop proof of concept + +The goal of a proof of concept is to test the tool under conditions that are as similar as possible to a real DataMade project. To this end, it’s typically useful to implement the proof-of-concept as a refactor or a rewrite of an existing app. + +Depending on the complexity of the tool, this step can take anywhere from one to four R&D days to finish. 
Only once the proof-of-concept reaches feature parity with its counterpart should lead developers move on to the next step. + +### 3. Recommend adoption, further research, or abandonment + +There are three outcomes that we expect from a proof-of-concept: + +1. The tool works as advertised, and lead developers recommend its adoption. +2. The tool has mixed results, and lead developers recommend more research. +3. The tool does not work as advertised, and lead developers recommend abandoning it. + +Whatever the outcome, the recommendation should include a cost/benefit analysis comparing the tool to other tools that DataMade uses to solve similar problems on a number of levels, including implementation time, prerequisite skills, and maintenance outlook. This cost/benefit analysis is a draft: if developers recommend adoption, they should update it as they learn more about the tool. + +Lead developers should make this recommendation as a group. If one lead developer is leading up the research effort, this collaboration can take the form of a draft that other lead developers help revise. If the lead developers are doing research collaboratively, they might consider collaborating on a recommendation. Either way, the group should reach consensus before moving on. + +If lead developers recommend adoption, move on to step 4. If lead developers recommend further research, return to step 1. If lead developers recommend abandonment, document the reasons for abandonment and exit this process. + +### 4. Notify partners of recommendation + +Once lead developers have reached a consensus opinion on adoption or abandonment, they should forward their recommendation to partners for review. Partners may seek clarification, request further research, or approve the recommendation and begin planning for the next step. + +### 5.
Pilot use of the tool on a project + +Once lead developers and partners are in consensus on a recommendation of adoption, they will pilot the new tool on a project. + +Piloting new tools should be undertaken carefully. At this point it’s likely that only one developer has expertise in the tool, and we would like to avoid above all a situation where we have to maintain technology that we have decided not to adopt. + +Some ideal projects on which to pilot a new tool might include: + +- A small, self-contained feature being implemented in a mature codebase +- A one-off project that we don’t expect to spend much time maintaining in the future +- A small greenfield project that will involve chunks of major development in the future, offering the possibility to refactor + +This phase should be conducted in collaboration with another developer in order to diversify perspectives on the new tool. Collaborators should be selected based on a demonstrated foundation of knowledge that will allow them to adapt to the new tool quickly. + +If no other developers yet have the necessary foundation of knowledge to make use of the tool, lead developers will coordinate with partners to plan time for training the collaborator before project work begins. + +### 6. Produce adoption artifacts + +After the pilot project is complete, lead developers should schedule a retrospective to gather feedback and learning about the tool from all developers involved. + +The retrospective is intended to help produce adoption artifacts that will guide future use of the tool, including: + +- Any updates to initial materials produced during R&D +- A list of lessons learned about the tool, including links to helpful resources +- If applicable, a template for bootstrapping a project with the tool in the future + +These artifacts should be written up as a pull request against this repo, which serves as the central knowledge store for the DataMade Standard Stack™. 
Templates or other documentation may exist in separate docs/repos, but they should still be referenced in the tutorials repo in order to encourage centralization of knowledge. + +If developers somehow reach this step and decide to abandon the tool, they should devise a contingency plan for dealing with future maintenance of the project. This contingency plan might include: + +- Refactoring the code to remove the tool + - Ideally this should happen during the course of normal business, but if no budget is available for the project, lead developers should expect to have to use R&D time to do this cleaning +- Adding extra documentation to help future developers understand the context of the tool diff --git a/scraping/README.md b/scraping/README.md new file mode 100644 index 0000000..464ee7a --- /dev/null +++ b/scraping/README.md @@ -0,0 +1,7 @@ +# Web scraping + +This directory records best practices for scraping the web. + +## Guides + +- [`lxml` for web scraping](./lxml-for-web-scraping.md) diff --git a/shell/README.md b/shell/README.md new file mode 100644 index 0000000..48aaa53 --- /dev/null +++ b/shell/README.md @@ -0,0 +1,9 @@ +# The shell and Ubuntu + +This directory records best practices for working with the shell (at DataMade, +usually Bash) and Ubuntu, our choice of operating system for our servers. + +## Guides + +- [tmux, best practices](./tmux-best-practices.md) +- [How to move a gpg key between servers](./moving-keys-between-servers.md) From d4465ce10f39f5c2b12e2083d5f8b95db9c43405 Mon Sep 17 00:00:00 2001 From: Jean Cochrane Date: Fri, 24 May 2019 11:56:43 -0400 Subject: [PATCH 03/11] Add directory for GatsbyJS --- README.md | 1 + gatsby/README.md | 71 ++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 72 insertions(+) create mode 100644 gatsby/README.md diff --git a/README.md b/README.md index 0407538..25a3241 100644 --- a/README.md +++ b/README.md @@ -21,5 +21,6 @@ Here at DataMade, we do a lot of computer programming. 
In the spirit of [better - [The shell and Ubuntu](/shell/) - [tmux, best practices](/shell/tmux-best-practices.md) - [How to move a gpg key between servers](/shell/moving-keys-between-servers.md) +- [GatsbyJS](/gatsby/) - [Collaborative processes](/process/) - [Making changes to the standard stack](/process/stack-changes.md) diff --git a/gatsby/README.md b/gatsby/README.md new file mode 100644 index 0000000..5fb5be4 --- /dev/null +++ b/gatsby/README.md @@ -0,0 +1,71 @@ +# GatsbyJS + +This directory records best practices for working with [GatsbyJS](https://www.gatsbyjs.org/), a static site generator built on top of React and GraphQL. + +## Contents + +- [When to use Gatsby](#when-to-use-gatsby) + - [Gatsby vs. Jekyll](#gatsby-vs-jekyll) + - [Gatsby vs. Django](#gatsby-vs-django) +- [Resources](#resources) + +## When to use Gatsby + +How does Gatsby compare to other tools in the DataMade toolkit, and when should you decide to use it? + +### Gatsby vs. Jekyll + +Gatsby's most obvious competitor is [Jekyll](https://jekyllrb.com/), a static site generator built in Ruby. We've used Jekyll for a number of static sites in the past, including [the DataMade website](https://github.com/datamade/datamade.us), [dedupe.io](https://github.com/dedupeio/dedupe.io), and [CUAMP](https://github.com/datamade/CUAMP-live). + +Gatsby and Jekyll can both be deployed on Netlify as purely static sites, which makes them attractive candidates for sites that require no dynamic backend functionality like faceted search, persistent data storage, user management, or admin interfaces. + +In general, you should prefer Gatsby to Jekyll when: + +- You have a team that has working knowledge of ES6, NPM, React, and GraphQL +- You would like to load and display data from datasources other than Markdown + +#### Pros + +- Gatsby is built on top of JavaScript, a language that we use much more regularly than Ruby.
+- Gatsby supports data loading from a huge variety of datasources via its extensible data layer API. Jekyll can only read data from Markdown posts, or from specially-formatted YAML, JSON, or CSV files living in the `_data` folder. +- Gatsby uses React under the hood and exposes a fully-configured React development environment, meaning you can make use of JSX and the React plugin ecosystem. Jekyll requires you to use the [Liquid templating language](https://jekyllrb.com/docs/liquid/), which does not integrate as closely with JavaScript as JSX. +- Gatsby builds in a lot of performance optimizations, including code splitting, progressive rendering, and resized images for different devices. (For a deep dive on Gatsby's performance optimizations, see [Why is Gatsby so fast?](https://www.gatsbyjs.org/blog/2017-09-13-why-is-gatsby-so-fast/)) +- Gatsby bakes in a modern JavaScript development environment by default, including hot reloading, source maps, and JavaScript package management. + +#### Cons + +- By nature of offering more flexible data management and a modern JavaScript environment, a Gatsby project is typically more complicated to set up than a Jekyll project. +- Gatsby requires working knowledge of ES6, NPM, React, GraphQL. While these tools are very powerful and fun to use once you understand them, they may be overwhelming at first for devs who are learning them all at once. +- Since it is built on top of React, Gatsby does not play well with jQuery. This has the disadvantage of making it incompatible with many of the jQuery libraries DataMade uses regularly, and it means you'll often need to use a React plugin to get older JavaScript libraries to work (e.g. [react-leaflet](https://react-leaflet.js.org/) in addition to vanilla [Leaflet](https://leafletjs.com/)). + +### Gatsby vs.
Django + +At first glance, Gatsby and Django may appear to be very different tools -- Gatsby is a static site generator, while Django is a full-featured MVC web framework -- but in practice, Gatsby can be a good choice for many simple apps that we would otherwise build in Django. + +In particular, Gatsby's flexible data layer means that it can [generate pages programmatically from data](https://www.gatsbyjs.org/tutorial/part-seven/). This means that Gatsby can easily generate list and detail views based on a CSV or Postgres database. And since Gatsby can be deployed as a static site on Netlify, you can save a lot of time that you would otherwise spend provisioning staging and production environments, and serve up fast prebuilt assets to boot. + +In general, you should prefer Gatsby to Django when: + +- You have a team that has working knowledge of ES6, NPM, React, and GraphQL +- Your app doesn't require any dynamic backend behavior like faceted search, persistent data storage, user management, or admin interfaces +- Your primary motivation for considering Django over static HTML/CSS/JS is the ability to generate views based on data + +#### Pros + + +- By treating JavaScript as a first-class citizen, Gatsby and React make the process of building frontend interactions much more intuitive (and much more testable) than Django does. +- Deploying on Netlify reduces the overhead of Continuous Deployment and server provisioning/management. +- In addition to Gatsby's [built-in performance optimizations](https://www.gatsbyjs.org/blog/2017-09-13-why-is-gatsby-so-fast/), the fact that you're deploying static assets instead of rendering responses on a server means that a Gatsby app by default will load pages much faster than a Django app. + +#### Cons + +- So far, we don't have a good way of doing faceted search in a Gatsby app. 
[David Eads has done it](https://www.propublica.org/nerds/the-ticket-trap-news-app-front-to-back-david-eads-propublica-illinois) but we still need to prove it out in our stack. +- Gatsby lacks a simple solution for user management and admin views. [We're tracking the progress of Netlify Identity](https://jeancochrane.com/blog/netlify-identity-dealbreakers), but it's not production-ready yet. +- Gatsby requires working knowledge of ES6, NPM, React, and GraphQL. As of May 2019, these technologies are less well-known on our team than Python and Django. + +## Resources + +- [The Gatsby.js tutorial](https://www.gatsbyjs.org/tutorial/) - Gatsby's official tutorial. A full walkthrough of building a simple blog from scratch in Gatsby, including quick sidebars on the basics of React, JSX, and GraphQL. +- [The React docs](https://reactjs.org/docs/hello-world.html) - A step-by-step guide through the basics of React. Useful for getting your bearings in JSX, components, and React state management, which are important prerequisites for Gatsby. +- [The Fullstack Tutorial for GraphQL](https://www.howtographql.com/) - Gatsby's recommended tutorial for learning GraphQL. We recommend reading the "GraphQL fundamentals" section. +- [@jeancochrane's lunch&learn on Gatsby](https://gist.github.com/jeancochrane/705dda18da74fafe4b8182d15284114d) - A set of brief notes giving a quick overview of Gatsby's features. 
From d5e8263e4feb6a5568b0761dfff8997611174006 Mon Sep 17 00:00:00 2001 From: Jean Cochrane Date: Fri, 24 May 2019 12:07:16 -0400 Subject: [PATCH 04/11] Clean up typos and formatting in READMEs --- gatsby/README.md | 5 ++--- postgres/README.md | 5 ++--- process/README.md | 3 +-- shell/README.md | 3 +-- 4 files changed, 6 insertions(+), 10 deletions(-) diff --git a/gatsby/README.md b/gatsby/README.md index 5fb5be4..f4aa251 100644 --- a/gatsby/README.md +++ b/gatsby/README.md @@ -22,12 +22,12 @@ Gatsby and Jekyll can both be deployed on Netlify as purely static sites, which In general, you should prefer Gatsby to Jekyll when: - You have a team that has working knowledge of ES6, NPM, React, and GraphQL -- You would like to load and display data from datasources other than Markdown +- You would like to load and display data from data sources other than Markdown #### Pros - Gatsby is built on top of JavaScript, a language that we use much more regularly than Ruby. -- Gatsby supports data loading from a huge variety of datasources via its extensible data layer API. Jekyll can only read data from Markdown posts, or from specially-formatted YAML, JSON, or CSV files living in the `_data` folder. +- Gatsby supports data loading from a huge variety of data sources via its extensible data layer API. Jekyll can only read data from Markdown posts, or from specially-formatted YAML, JSON, or CSV files living in the `_data` folder. - Gatsby uses React under the hood and exposes a fully-configured React development environment, meaning you can make use of JSX and the React plugin ecosystem. Jekyll requires you to use the [Liquid templating language](https://jekyllrb.com/docs/liquid/), which does not integrate as closely with JavaScript as JSX. - Gatsby builds in a lot of performance optimizations, including code splitting, progressive rendering, and resized images for different devices. 
(For a deep dive on Gatsby's performance optimizations, see [Why is Gatsby so fast?](https://www.gatsbyjs.org/blog/2017-09-13-why-is-gatsby-so-fast/)) - Gatsby bakes in a modern JavaScript development environment by default, including hot reloading, source maps, and JavaScript package management. @@ -52,7 +52,6 @@ In general, you should prefer Gatsby to Django when: #### Pros - - By treating JavaScript as a first-class citizen, Gatsby and React make the process of building frontend interactions much more intuitive (and much more testable) than Django does. - Deploying on Netlify reduces the overhead of Continuous Deployment and server provisioning/management. - In addition to Gatsby's [built-in performance optimizations](https://www.gatsbyjs.org/blog/2017-09-13-why-is-gatsby-so-fast/), the fact that you're deploying static assets instead of rendering responses on a server means that a Gatsby app by default will load pages much faster than a Django app. diff --git a/postgres/README.md b/postgres/README.md index da7e791..4920999 100644 --- a/postgres/README.md +++ b/postgres/README.md @@ -1,9 +1,8 @@ # PostgreSQL -This directory records best practices for working with the object-relational -database management system PostgreSQL, our primary choice of database. +This directory records best practices for working with the object-relational database management system PostgreSQL, our primary choice of database. -## Guides +## Guides - [A quick and dirty introduction to `sqlalchemy`](./quick-n-dirty-sqlalchemy.md) - [Interacting with a remote database](./Interacting-with-a-remote-database.md) diff --git a/process/README.md b/process/README.md index 676842f..28fb960 100644 --- a/process/README.md +++ b/process/README.md @@ -1,7 +1,6 @@ # Collaborative processes -This directory records best practices for collaborative software development -processes. +This directory records best practices for collaborative software development processes. 
## Guides diff --git a/shell/README.md b/shell/README.md index 48aaa53..fb38ef8 100644 --- a/shell/README.md +++ b/shell/README.md @@ -1,7 +1,6 @@ # The shell and Ubuntu -This directory records best practices for working with the shell (at DataMade, -usually Bash) and Ubuntu, our choice of operating system for our servers. +This directory records best practices for working with the shell -- typically Bash -- and Ubuntu, our choice of operating system for our servers. ## Guides From 1a4022c260953e41e8d8d1be0423e00666ef0074 Mon Sep 17 00:00:00 2001 From: Jean Cochrane Date: Tue, 28 May 2019 15:40:33 -0400 Subject: [PATCH 05/11] Add docs for communicating between EC2 <> RDS --- README.md | 2 ++ aws/rds.md | 26 ++++++++++++++++++++++++++ 2 files changed, 28 insertions(+) create mode 100644 aws/rds.md diff --git a/README.md b/README.md index 25a3241..afe0277 100644 --- a/README.md +++ b/README.md @@ -24,3 +24,5 @@ Here at DataMade, we do a lot of computer programming. In the spirit of [better - [GatsbyJS](/gatsby/) - [Collaborative processes](/process/) - [Making changes to the standard stack](/process/stack-changes.md) +- [Amazon Web Services](/aws/) + - [AWS Relational Database Service (RDS)](/aws/rds.md) diff --git a/aws/rds.md b/aws/rds.md new file mode 100644 index 0000000..7ae5ba5 --- /dev/null +++ b/aws/rds.md @@ -0,0 +1,26 @@ +# AWS Relational Database Service (RDS) + +This doc presents guides on working with AWS RDS, Amazon's Database-as-a-Service provider. + +## Connecting with an EC2 instance + +In order to connect to an RDS instance from an EC2 instance, you'll need to configure the security rules for the database to allow access from the instance. Broadly speaking, AWS allows you to do this by 1) creating the database and the EC2 instance in the same Virtual Private Cloud (VPC) and 2) creating an inbound rule for the database's security group that permits access to PostgreSQL from the EC2 instance's security group. 
+ +These instructions are loosely based on [AWS's official documentation](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_VPC.Scenarios.html#USER_VPC.Scenario1). + +- Navigate to the page for your database in the RDS console. +- Under `Connectivity & security > Networking > VPC`, copy the ID of the VPC that the database runs in. +- Navigate to the EC2 console and begin to launch an EC2 instance [following our guide](https://github.com/datamade/deploy-a-site/blob/master/Launch-a-new-EC2-Instance.md). +- During Step 3, the "Configure Instance Details" page, specify the following options: + - For `Network`, select the VPC matching the ID that you copied above. This will launch your instance in the same VPC as your database. + - For `Subnet`, select any public subnet. This will allow the EC2 instance to access the public Internet. +- During Step 6, the "Configure Security Group" page, create a new security group for this instance instead of using the `default`. Give it any HTTP/SSH access rules you need for your application. +- Finish launching the instance as normal. +- In the EC2 console, copy the ID of the security group for your new EC2 instance. +- In the RDS console, navigate to the security group for your database. +- In the detail view for your database's security group, select `Inbound > Edit > Add Rule` with the following attributes: + - Type: `PostgreSQL` + - Protocol: `TCP` + - Port range: `5432` + - Source: `Custom`, and paste in the ID of the EC2 instance's security group that you copied above +- Test that your server can access your database by [opening up an SSH tunnel](/postgres/Interacting-with-a-remote-database.md) and attempting to `psql` into the database.
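Before opening a full `psql` session, it can help to confirm from the EC2 instance that the database port is reachable at all. The following is a minimal sketch using only the Python standard library; `can_reach` is a hypothetical helper, and the host would be your database's endpoint from the RDS console. A successful TCP connection only shows that the security group rules let traffic through; it does not verify your PostgreSQL credentials.

```python
import socket


def can_reach(host, port=5432, timeout=5.0):
    """Return True if a TCP connection to (host, port) opens within `timeout`
    seconds, and False on refusal, timeout, or DNS failure."""
    try:
        # create_connection resolves the host and tries each address in turn.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # ConnectionRefusedError, socket.timeout, and socket.gaierror are all
        # subclasses of OSError.
        return False


# Example (hypothetical endpoint):
# can_reach("mydb.abc123xyz.us-east-1.rds.amazonaws.com")
```

If this returns `False` with a timeout, the inbound rule or VPC placement is the likely culprit; a quick refusal usually means the host is reachable but nothing is listening on that port.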
From ca6ac19312d2957d1fcee091affabe05dcc81e60 Mon Sep 17 00:00:00 2001 From: Jean Cochrane Date: Thu, 6 Jun 2019 15:10:40 -0500 Subject: [PATCH 06/11] Remove Gatsby docs --- README.md | 1 - gatsby/README.md | 70 ------------------------------------------------ 2 files changed, 71 deletions(-) delete mode 100644 gatsby/README.md diff --git a/README.md b/README.md index afe0277..4245e1c 100644 --- a/README.md +++ b/README.md @@ -21,7 +21,6 @@ Here at DataMade, we do a lot of computer programming. In the spirit of [better - [The shell and Ubuntu](/shell/) - [tmux, best practices](/shell/tmux-best-practices.md) - [How to move a gpg key between servers](/shell/moving-keys-between-servers.md) -- [GatsbyJS](/gatsby/) - [Collaborative processes](/process/) - [Making changes to the standard stack](/process/stack-changes.md) - [Amazon Web Services](/aws/) diff --git a/gatsby/README.md b/gatsby/README.md deleted file mode 100644 index f4aa251..0000000 --- a/gatsby/README.md +++ /dev/null @@ -1,70 +0,0 @@ -# GatsbyJS - -This directory records best practices for working with [GatsbyJS](https://www.gatsbyjs.org/), a static site generator built on top of React and GraphQL. - -## Contents - -- [When to use Gatsby](#when-to-use-gatsby) - - [Gatsby vs. Jekyll](#gatsby-vs-jekyll) - - [Gatsby vs. Django](#gatsby-vs-django) -- [Resources](#resources) - -## When to use Gatsby - -How does Gatsby compare to other tools in the DataMade toolkit, and when should you decide to use it? - -### Gatsby vs. Jekyll - -Gatsby's most obvious competitor is [Jekyll](https://jekyllrb.com/), a static site generator built in Ruby. We've used Jekyll for a number of static sites in the past, including [the DataMade website](https://github.com/datamade/datamade.us), [dedupe.io](https://github.com/dedupeio/dedupe.io), and [CUAMP](https://github.com/datamade/CUAMP-live).
- -Gatsby and Jekyll can both be deployed on Netlify as purely static sites, which makes them attractive candidates for sites that require no dynamic backend functionality like faceted search, persistent data storage, user management, or admin interfaces. - -In general, you should prefer Gatsby to Jekyll when: - -- You have a team that has working knowledge of ES6, NPM, React, and GraphQL -- You would like to load and display data from data sources other than Markdown - -#### Pros - -- Gatsby is built on top of JavaScript, a language that we use much more regularly than Ruby. -- Gatsby supports data loading from a huge variety of data sources via its extensible data layer API. Jekyll can only read data from Markdown posts, or from specially-formatted YAML, JSON, or CSV files living in the `_data` folder. -- Gatsby uses React under the hood and exposes a fully-configured React development environment, meaning you can make use of JSX and the React plugin ecosystem. Jekyll requires you to use the [Liquid templating language](https://jekyllrb.com/docs/liquid/), which does not integrate as closely with JavaScript as JSX. -- Gatsby builds in a lot of performance optimizations, including code splitting, progressive rendering, and resized images for different devices. (For a deep dive on Gatsby's performance optimizations, see [Why is Gatsby so fast?](https://www.gatsbyjs.org/blog/2017-09-13-why-is-gatsby-so-fast/)) -- Gatsby bakes in a modern JavaScript development environment by default, including hot reloading, source maps, and JavaScript package management. - -#### Cons - -- By nature of offering more flexible data management and a modern JavaScript environment, a Gatsby project is typically more complicated to set up than a Jekyll project. -- Gatsby requires working knowledge of ES6, NPM, React, GraphQL. While these tools are very powerful and fun to use once you understand them, they may be overwhelming at first for devs who are learning them all at once. 
-- Since it is built on top of React, Gatsby does not play well with JQuery. This has the disadvantage of making it incompatible with many of the JQuery libraries DataMade uses regularly, and it means you'll often need to use a React plugin to get older JavaScript libraries to work (e.g. [react-leaflet](https://react-leaflet.js.org/) in addition to vanilla [Leaflet](https://leafletjs.com/)). - -### Gatbsy vs. Django - -At first glance, Gatsby and Django may appear to be very different tools -- Gatsby is a static site generator, while Django is a full-featured MVC web framework -- but in practice, Gatsby can be a good choice for many simple apps that we would otherwise build in Django. - -In particular, Gatsby's flexible data layer means that it can [generate pages programmatically from data](https://www.gatsbyjs.org/tutorial/part-seven/). This means that Gatsby can easily generate list and detail views based on a CSV or Postgres database. And since Gatsby can be deployed as a static site on Netlify, you can save a lot of time that you would otherwise spend provisioning staging and production environments, and serve up fast prebuilt assets to boot. - -In general, you should prefer Gatsby to Django when: - -- You have a team that has working knowledge of ES6, NPM, React, and GraphQL -- Your app doesn't require any dynamic backend behavior like faceted search, persistent data storage, user management, or admin interfaces -- Your primary motivation for considering Django over static HTML/CSS/JS is the ability to generate views based on data - -#### Pros - -- By treating JavaScript as a first-class citizen, Gatsby and React make the process of building frontend interactions much more intuitive (and much more testable) than Django does. -- Deploying on Netlify reduces the overhead of Continuous Deployment and server provisioning/management. 
-- In addition to Gatsby's [built-in performance optimizations](https://www.gatsbyjs.org/blog/2017-09-13-why-is-gatsby-so-fast/), the fact that you're deploying static assets instead of rendering responses on a server means that a Gatsby app by default will load pages much faster than a Django app. - -#### Cons - -- So far, we don't have a good way of doing faceted search in a Gatsby app. [David Eads has done it](https://www.propublica.org/nerds/the-ticket-trap-news-app-front-to-back-david-eads-propublica-illinois) but we still need to prove it out in our stack. -- Gatsby lacks a simple solution for user management and admin views. [We're tracking the progress of Netlify Identity](https://jeancochrane.com/blog/netlify-identity-dealbreakers), but it's not production-ready yet. -- Gatsby requires working knowledge of ES6, NPM, React, and GraphQL. As of May 2019, these technologies are less well-known on our team than Python and Django. - -## Resources - -- [The Gatsby.js tutorial](https://www.gatsbyjs.org/tutorial/) - Gatsby's official tutorial. A full walkthrough of building a simple blog from scratch in Gatsby, including quick sidebars on the basics of React, JSX, and GraphQL. -- [The React docs](https://reactjs.org/docs/hello-world.html) - A step-by-step guide through the basics of React. Useful for getting your bearings in JSX, components, and React state management, which are important prerequisites for Gatsby. -- [The Fullstack Tutorial for GraphQL](https://www.howtographql.com/) - Gatsby's recommended tutorial for learning GraphQL. We recommend reading the "GraphQL fundamentals" section. -- [@jeancochrane's lunch&learn on Gatsby](https://gist.github.com/jeancochrane/705dda18da74fafe4b8182d15284114d) - A set of brief notes giving a quick overview of Gatsby's features. 
From 242a8221fb89c0a990d90e3461d75896ca41bd0c Mon Sep 17 00:00:00 2001 From: Jean Cochrane Date: Thu, 6 Jun 2019 15:18:16 -0500 Subject: [PATCH 07/11] Move stack change docs to CONTRIBUTING.md --- process/stack-changes.md => CONTRIBUTING.md | 0 README.md | 6 ++++-- process/README.md | 7 ------- 3 files changed, 4 insertions(+), 9 deletions(-) rename process/stack-changes.md => CONTRIBUTING.md (100%) delete mode 100644 process/README.md diff --git a/process/stack-changes.md b/CONTRIBUTING.md similarity index 100% rename from process/stack-changes.md rename to CONTRIBUTING.md diff --git a/README.md b/README.md index 4245e1c..617ae34 100644 --- a/README.md +++ b/README.md @@ -21,7 +21,9 @@ Here at DataMade, we do a lot of computer programming. In the spirit of [better - [The shell and Ubuntu](/shell/) - [tmux, best practices](/shell/tmux-best-practices.md) - [How to move a gpg key between servers](/shell/moving-keys-between-servers.md) -- [Collaborative processes](/process/) - - [Making changes to the standard stack](/process/stack-changes.md) - [Amazon Web Services](/aws/) - [AWS Relational Database Service (RDS)](/aws/rds.md) + +## Contributing + +The process for making changes to the DataMade Stack, and by extension this repo, is documented in [`CONTRIBUTING.md`](./CONTRIBUTING.md). diff --git a/process/README.md b/process/README.md deleted file mode 100644 index 28fb960..0000000 --- a/process/README.md +++ /dev/null @@ -1,7 +0,0 @@ -# Collaborative processes - -This directory records best practices for collaborative software development processes. 
- -## Guides - -- [Making changes to the standard stack](./stack-changes.md) From 3f3c35d9b9e9d6e49739a296e9105fc1f9545be2 Mon Sep 17 00:00:00 2001 From: Jean Cochrane Date: Thu, 6 Jun 2019 15:18:37 -0500 Subject: [PATCH 08/11] Change the name of the repo to 'how-to' --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 617ae34..cc28b65 100644 --- a/README.md +++ b/README.md @@ -1,4 +1,4 @@ -# howto +# how-to _📚 Doing all sorts of things, the DataMade way_ From 86b55bfbcb1adc2f137d53197dc489fb5cd16443 Mon Sep 17 00:00:00 2001 From: Jean Cochrane Date: Thu, 6 Jun 2019 15:28:15 -0500 Subject: [PATCH 09/11] Update EC2 and RDS networking docs --- aws/rds.md | 21 ++++++++++++++------- 1 file changed, 14 insertions(+), 7 deletions(-) diff --git a/aws/rds.md b/aws/rds.md index 7ae5ba5..0520174 100644 --- a/aws/rds.md +++ b/aws/rds.md @@ -6,17 +6,24 @@ This doc presents guides on working with AWS RDS, Amazon's Database-as-a-Service In order to connect to an RDS instance from an EC2 instance, you'll need to configure the security rules for the database to allow access from the instance. Broadly speaking, AWS allows you to do this by 1) creating the database and the EC2 instance in the same Virtual Private Cloud (VPC) and 2) creating an inbound rule for the database's security group that permits access to PostgreSQL from the EC2 instance's security group. -These instructions are loosely based on [AWS's official documentation](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_VPC.Scenarios.html#USER_VPC.Scenario1). +This process is useful when you have a database that is running in a VPC and that does not have access to the public Internet. Generally, we consider it a best practice to provision databases this way, so that they are protected from attacks over the Internet. For an example of a project that uses this configuration, see https://gitlab.com/ChicagoDataCooperative/court-terminal. 
+ +These instructions are loosely based on [AWS's official documentation](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_VPC.Scenarios.html#USER_VPC.Scenario1). They assume that you already have an RDS instance up and running in a private VPC (docs forthcoming). + +### 1. Create the EC2 instance in the same VPC as the database - Navigate to the page for your database in the RDS console. - Under `Connectivity & security > Networking > VPC`, copy the ID of the VPC that the database runs in. - Navigate to the EC2 console and begin to launch an EC2 instance [following our guide](https://github.com/datamade/deploy-a-site/blob/master/Launch-a-new-EC2-Instance.md) -- During Step 3, the "Configure Instance Details" page, specify the following options: - - For `Network`, select the VPC matching the ID that you copied above. This will launch your instance in the same VPC as your database. - - For `Subnet`, Select any public subnet. This will allow the EC2 instance to access the public Internet. -- During Step 6, the "Configure Security Group" page, create a new security group for this instance instead of using the `default`. Give it any HTTP/SSH access rules you need for your application. -- Finish launching the instance as normal. -- In the EC2 console, copy the ID of the security group for your new EC2 instance. + - During Step 3, the "Configure Instance Details" page, specify the following options: + - For `Network`, select the VPC matching the ID that you copied above. This will launch your instance in the same VPC as your database. + - For `Subnet`, select any public subnet. This will allow the EC2 instance to access the public Internet. + - During Step 6, the "Configure Security Group" page, create a new security group for this instance instead of using the `default`. Give it any HTTP/SSH access rules you need for your application. + - Finish launching the instance as normal. + +### 2.
Use security groups to grant the EC2 instance access to the database + +- In the EC2 console, click on the name of the security group that you created for your new EC2 instance, and copy the security group ID. - In the RDS console, navigate to the security group for your database. - In the detail view for your database's security group, select `Inbound > Edit > Add Rule` with the following attributes: - Type: `PostgreSQL` From 200eae81656b9f2ff59a7241603e08efebbc17fa Mon Sep 17 00:00:00 2001 From: Jean Cochrane Date: Thu, 6 Jun 2019 15:29:21 -0500 Subject: [PATCH 10/11] Update aws/rds.md to fix vim-related typo Co-Authored-By: Hannah Cushman --- aws/rds.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/aws/rds.md b/aws/rds.md index 0520174..4926621 100644 --- a/aws/rds.md +++ b/aws/rds.md @@ -28,6 +28,6 @@ These instructions are loosely based on [AWS's official documentation](https://d - In the detail view for your database's security group, select `Inbound > Edit > Add Rule` with the following attributes: - Type: `PostgreSQL` - Protocol: `TCP` - - Port range: `5432`j + - Port range: `5432` - Source: `Custom`, and paste in the value of the EC2 instance security group from above - Test that your server can access your database by [opening up an SSH tunnel](/postgres/Interacting-with-a-remote-database.md) and attempting to `psql` into the database. 
From de8b4d72386241da4cdecfbcc43d1fb0aa37b9a1 Mon Sep 17 00:00:00 2001 From: Jean Cochrane Date: Thu, 6 Jun 2019 15:30:16 -0500 Subject: [PATCH 11/11] Update aws/rds.md to clarify testing SSH access Co-Authored-By: Hannah Cushman --- aws/rds.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/aws/rds.md b/aws/rds.md index 4926621..f1c70ed 100644 --- a/aws/rds.md +++ b/aws/rds.md @@ -30,4 +30,4 @@ These instructions are loosely based on [AWS's official documentation](https://d - Protocol: `TCP` - Port range: `5432` - Source: `Custom`, and paste in the value of the EC2 instance security group from above -- Test that your server can access your database by [opening up an SSH tunnel](/postgres/Interacting-with-a-remote-database.md) and attempting to `psql` into the database. +- Shell into your server and test that you can access your database by [opening up an SSH tunnel](/postgres/Interacting-with-a-remote-database.md) to your RDS instance and attempting to `psql` into the database. (N.b., you can find the URL to your RDS instance on the database detail page in the RDS console.)