
Merge pull request #10 from datamade/jfc/stack-changes-and-gatsby
Refactor this repo and add stack change docs
jeancochrane authored Jun 14, 2019
2 parents 8a0fac2 + de8b4d7 commit f26cc09
Showing 13 changed files with 164 additions and 27 deletions.
82 changes: 82 additions & 0 deletions CONTRIBUTING.md
@@ -0,0 +1,82 @@
# Making changes to the stack

## Background

Thanks to the addition of a formal research and development process, we have begun to identify opportunities for changing portions of the DataMade Standard Stack™. This process is intended to guide, but not hinder, the exploration of new tooling. Our goal is to empower lead developers to move as fast as possible in adopting new tools that will help the team work more productively, while minimizing the number of stray projects that we must maintain using tools that no one knows how to use.

This process will:

- Enable partners to confidently delegate authority for technical leadership
- Empower lead developers to pursue and implement technical changes
- Make transparent to developers the standard toolkit and process for changes
- Minimize maintenance burden for future developers

This document is a work in progress. If any step in this process is consistently and unnecessarily painful, it is subject to amendment. Amendments to this process should be proposed and agreed upon by lead developers, and approved by partners.

## Process

This process may be exited at any step if lead developers decide that a tool is not a good fit for the team. Lead developers should agree on abandoning a new tool, and the decision should be documented on the research issue that tracks work on the tool. For an example of documented abandonment, see the [R&D project on Netlify Add-ons](https://github.com/datamade/ops/issues/610).

### 1. Propose research project to team of lead developers

Lead developers are the primary drivers of the research process. As such, all lead developers should comment on and approve a research proposal before research begins.

Lead developers should log proposals as GitHub notes in the [R&D project in this repo](https://github.com/datamade/tutorials/projects/1). Once work begins on a proposal, it should be converted into an issue. Proposals can take a number of forms, from a quick description to a formal project plan. Lead developers should work together to settle on a proposal standard that works for them.

### 2. Conduct research and develop proof of concept

The goal of a proof of concept is to test the tool under conditions that are as similar as possible to a real DataMade project. To this end, it's typically useful to implement the proof of concept as a refactor or a rewrite of an existing app.

Depending on the complexity of the tool, this step can take anywhere from one to four R&D days to finish. Only once the proof-of-concept reaches feature parity with its counterpart should lead developers move on to the next step.

### 3. Recommend adoption, further research, or abandonment

There are three outcomes that we expect from a proof-of-concept:

1. The tool works as advertised, and lead developers recommend its adoption.
2. The tool has mixed results, and lead developers recommend more research.
3. The tool does not work as advertised, and lead developers recommend abandoning it.

Whatever the outcome, the recommendation should include a cost/benefit analysis comparing the tool to other tools that DataMade uses to solve similar problems, covering at least implementation time, prerequisite skills, and maintenance outlook. This cost/benefit analysis is a draft: if lead developers recommend adoption, it should be updated as they learn more about the tool.

Lead developers should make this recommendation as a group. If one lead developer is heading up the research effort, this collaboration can take the form of a draft that the other lead developers help revise. If the lead developers are conducting the research together, they might draft the recommendation jointly. Either way, the group should reach consensus before moving on.

If lead developers recommend adoption, move on to step 4. If lead developers recommend further research, return to step 1. If lead developers recommend abandonment, document the reasons for abandonment and exit this process.

### 4. Notify partners of recommendation

Once lead developers have reached a consensus opinion on adoption or abandonment, they should forward their recommendation to partners for review. Partners may seek clarification, request further research, or approve the recommendation and begin planning for the next step.

### 5. Pilot use of the tool on a project

Once lead developers and partners are in consensus on a recommendation of adoption, they will pilot the new tool on a project.

Piloting new tools should be undertaken carefully. At this point it's likely that only one developer has expertise in the tool, and above all we want to avoid a situation where we have to maintain technology that we have decided not to adopt.

Some ideal projects on which to pilot a new tool might include:

- A small, self-contained feature being implemented in a mature codebase
- A one-off project that we don’t expect to spend much time maintaining in the future
- A small greenfield project that will involve chunks of major development in the future, offering the possibility to refactor

This phase should be conducted in collaboration with another developer in order to diversify perspectives on the new tool. Collaborators should be selected based on a demonstrated foundation of knowledge that will allow them to adapt to the new tool quickly.

If no other developers yet have the necessary foundation of knowledge to make use of the tool, lead developers will coordinate with partners to plan time for training the collaborator before project work begins.

### 6. Produce adoption artifacts

After the pilot project is complete, lead developers should schedule a retrospective to gather feedback and learning about the tool from all developers involved.

The retrospective is intended to help produce adoption artifacts that will guide future use of the tool, including:

- Any updates to initial materials produced during R&D
- A list of lessons learned about the tool, including links to helpful resources
- If applicable, a template for bootstrapping a project with the tool in the future

These artifacts should be written up as a pull request against this repo, which serves as the central knowledge store for the DataMade Standard Stack™. Templates or other documentation may exist in separate docs/repos, but they should still be referenced in the tutorials repo in order to encourage centralization of knowledge.

If developers somehow reach this step and decide to abandon the tool, they should devise a contingency plan for dealing with future maintenance of the project. This contingency plan might include:

- Refactoring the code to remove the tool
  - Ideally this should happen in the course of normal business, but if no budget is available for the project, lead developers should expect to use R&D time for this cleanup
- Adding extra documentation to help future developers understand the context of the tool
51 changes: 24 additions & 27 deletions README.md
@@ -1,32 +1,29 @@
# tutorials
# how-to

_📚 Doing all sorts of things, the DataMade way_

## What's this?

Here at DataMade, we do a lot of computer programming. Sometimes, we create entire repos dedicated to the practices we employ:

- [Making Data, the DataMade Way](https://github.com/datamade/data-making-guidelines)
- [Analyzing Data, the DataMade Way](https://github.com/datamade/data-analysis-guidelines)
- [Writing Tests, the DataMade Way](https://github.com/datamade/testing-guidelines)

But our daily computing also involves a lot of small, but important, tasks. In the spirit of [better living through documentation](https://datamade.us/blog/better-living-through-documentation), we're preserving guides to those tasks, here.

## Index

### Python libraries

- [A quick and dirty introduction to `sqlalchemy`](/quick-n-dirty-sqlalchemy.md)
- Prefetching objects in Django
- [`lxml` for web scraping](/lxml-for-web-scraping.md)

### Database operations

- [Interacting with a remote database](/Interacting-with-a-remote-database.md)
- [Dumping and restoring a Postgres database](/Dump-and-restore-Postgres.md)
- [Dockerizing Postgres](/Dockerizing-Postgres.md)

### Devops

- [tmux, best practices](tmux-best-practices.md)
- [How to move a gpg key between servers](moving-keys-between-servers.md)
Here at DataMade, we do a lot of computer programming. In the spirit of [better living through documentation](https://datamade.us/blog/better-living-through-documentation), we're preserving guides to how we do that here.

## Contents

- [Data processing and ETL](https://github.com/datamade/data-making-guidelines)
- [Reproducible data analysis](https://github.com/datamade/data-analysis-guidelines)
- [Software testing](https://github.com/datamade/testing-guidelines)
- [PostgreSQL](/postgres/)
- [A quick and dirty introduction to `sqlalchemy`](/postgres/quick-n-dirty-sqlalchemy.md)
- [Interacting with a remote database](/postgres/Interacting-with-a-remote-database.md)
- [Dumping and restoring a Postgres database](/postgres/Dump-and-restore-Postgres.md)
- [Dockerizing Postgres](/postgres/Dockerizing-Postgres.md)
- [Web scraping](/scraping/)
- [`lxml` for web scraping](/scraping/lxml-for-web-scraping.md)
- [The shell and Ubuntu](/shell/)
- [tmux, best practices](/shell/tmux-best-practices.md)
- [How to move a gpg key between servers](/shell/moving-keys-between-servers.md)
- [Amazon Web Services](/aws/)
- [AWS Relational Database Service (RDS)](/aws/rds.md)

## Contributing

The process for making changes to the DataMade Stack, and by extension this repo, is documented in [`CONTRIBUTING.md`](./CONTRIBUTING.md).
33 changes: 33 additions & 0 deletions aws/rds.md
@@ -0,0 +1,33 @@
# AWS Relational Database Service (RDS)

This doc presents guides on working with AWS RDS, Amazon's managed Database-as-a-Service offering.

## Connecting with an EC2 instance

In order to connect to an RDS instance from an EC2 instance, you'll need to configure the security rules for the database to allow access from the instance. Broadly speaking, AWS allows you to do this by 1) creating the database and the EC2 instance in the same Virtual Private Cloud (VPC) and 2) creating an inbound rule for the database's security group that permits access to PostgreSQL from the EC2 instance's security group.

This process is useful when you have a database that is running in a VPC and that does not have access to the public Internet. Generally, we consider it a best practice to provision databases this way, so that they are protected from attacks over the Internet. For an example of a project that uses this configuration, see https://gitlab.com/ChicagoDataCooperative/court-terminal.

These instructions are loosely based on [AWS's official documentation](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_VPC.Scenarios.html#USER_VPC.Scenario1). They assume that you already have an RDS instance up and running in a private VPC (docs forthcoming).

### 1. Create the EC2 instance in the same VPC as the database

- Navigate to the page for your database in the RDS console.
- Under `Connectivity & security > Networking > VPC`, copy the ID of the VPC that the database runs in.
- Navigate to the EC2 console and begin to launch an EC2 instance [following our guide](https://github.com/datamade/deploy-a-site/blob/master/Launch-a-new-EC2-Instance.md).
- During Step 3, the "Configure Instance Details" page, specify the following options:
  - For `Network`, select the VPC matching the ID that you copied above. This will launch your instance in the same VPC as your database.
  - For `Subnet`, select any public subnet. This will allow the EC2 instance to access the public Internet.
- During Step 6, the "Configure Security Group" page, create a new security group for this instance instead of using the `default`. Give it any HTTP/SSH access rules you need for your application.
- Finish launching the instance as normal.
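
If you'd rather script this step than click through the console, the same configuration can be sketched with the AWS CLI. This is only an illustrative sketch: the AMI, subnet, security group, and key pair values below are placeholders to swap for your own.

```bash
# Launch the instance in the database's VPC by passing a public subnet from
# that VPC, along with the new security group created for this instance.
# All IDs and names below are placeholders.
aws ec2 run-instances \
    --image-id ami-0123456789abcdef0 \
    --instance-type t3.micro \
    --subnet-id subnet-0123456789abcdef0 \
    --security-group-ids sg-0123456789abcdef0 \
    --key-name your-keypair \
    --associate-public-ip-address
```

The `--associate-public-ip-address` flag gives the instance a public IP, mirroring the console's auto-assign setting for public subnets.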

### 2. Use security groups to grant the EC2 instance access to the database

- In the EC2 console, click on the name of the security group that you created for your new EC2 instance, and copy the security group ID.
- In the RDS console, navigate to the security group for your database.
- In the detail view for your database's security group, select `Inbound > Edit > Add Rule` with the following attributes:
  - Type: `PostgreSQL`
  - Protocol: `TCP`
  - Port range: `5432`
  - Source: `Custom`, and paste in the value of the EC2 instance security group from above
- Shell into your server and test that you can access your database by [opening up an SSH tunnel](/postgres/Interacting-with-a-remote-database.md) to your RDS instance and attempting to `psql` into the database. (N.b., you can find the endpoint for your RDS instance on the database detail page in the RDS console.)
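
As a concrete illustration of that last check, the commands below open an SSH tunnel through the EC2 instance and connect with `psql`. This is only a sketch: the key path, EC2 hostname, RDS endpoint, username, and database name are all placeholders.

```bash
# Forward local port 5433 through the EC2 instance to the RDS endpoint
# (the endpoint is listed on the database detail page in the RDS console).
ssh -i ~/.ssh/your-key.pem -N \
    -L 5433:your-db.abcdefghij.us-east-1.rds.amazonaws.com:5432 \
    ubuntu@your-ec2-public-dns &

# Connect through the tunnel; a psql prompt means the security group
# rule is working.
psql -h localhost -p 5433 -U your_db_user your_db_name
```
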
File renamed without changes.
File renamed without changes.
File renamed without changes.
10 changes: 10 additions & 0 deletions postgres/README.md
@@ -0,0 +1,10 @@
# PostgreSQL

This directory records best practices for working with the object-relational database management system PostgreSQL, our primary choice of database.

## Guides

- [A quick and dirty introduction to `sqlalchemy`](./quick-n-dirty-sqlalchemy.md)
- [Interacting with a remote database](./Interacting-with-a-remote-database.md)
- [Dumping and restoring a Postgres database](./Dump-and-restore-Postgres.md)
- [Dockerizing Postgres](./Dockerizing-Postgres.md)
File renamed without changes.
7 changes: 7 additions & 0 deletions scraping/README.md
@@ -0,0 +1,7 @@
# Web scraping

This directory records best practices for scraping the web.

## Guides

- [`lxml` for web scraping](./lxml-for-web-scraping.md)
File renamed without changes.
8 changes: 8 additions & 0 deletions shell/README.md
@@ -0,0 +1,8 @@
# The shell and Ubuntu

This directory records best practices for working with the shell -- typically Bash -- and Ubuntu, our operating system of choice for servers.

## Guides

- [tmux, best practices](./tmux-best-practices.md)
- [How to move a gpg key between servers](./moving-keys-between-servers.md)
File renamed without changes.
File renamed without changes.
