Commit: doc, outputs
zzeppozz committed Feb 12, 2024
1 parent a936101 commit 7a81126
Showing 2 changed files with 26 additions and 26 deletions.
README.md (46 changes: 23 additions & 23 deletions)

@@ -7,7 +7,7 @@
3. Assemble static ancillary inputs (local, AWS)
4. Resolve RIIS records to GBIF accepted taxa (local, copy to AWS S3)
5. Subset GBIF data to BISON (AWS Redshift/Glue)
6. Load BISON subset and ancillary inputs to Redshift
7. Annotate BISON subset with regions and RIIS status (AWS Redshift)
8. Summarize BISON subset by regions and RIIS status (AWS Redshift)
9. Create Presence Absence Matrix (PAM) and compute statistics (local)
@@ -17,8 +17,8 @@
### Download the repository

The [LmBISON repository](https://github.com/lifemapper/bison) can be installed by
downloading it from GitHub. This code repository contains Python code, scripts for AWS
tools, Docker composition files, configuration files, and test data for creating the
outputs.

Type `git` at the command prompt to see if you have git installed. If you do not,
@@ -33,7 +33,7 @@ the command line:

When the clone is complete, move to the top directory of the repository, `bison`.
All hands-on commands will be executed in a command prompt window from this
directory location.
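
For reference, cloning the repository and moving into its top directory typically looks
like the following (a sketch, assuming the HTTPS clone URL of the repository linked above):

```commandline
git clone https://github.com/lifemapper/bison.git
cd bison
```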

### Install dependencies
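
The full dependency instructions are not shown here; as a rough sketch, a typical Python
virtual environment setup (assuming a requirements.txt file at the repository root) might be:

```commandline
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```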

@@ -63,8 +63,8 @@ Use the most current version of the United States Register of Introduced and Invasive Species (US-RIIS)
* Year 4 data: https://doi.org/10.5066/P95XL09Q
* Year 5 data: TBA

The current file is named US-RIIS_MasterList_2021.csv, and is available in the
data/input directory of this repository. Upload this file to
s3://<S3 bucket>/input_data
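
One way to do this upload, assuming the AWS CLI is installed and configured and
`<S3 bucket>` is replaced with the actual bucket name, is:

```commandline
aws s3 cp data/input/US-RIIS_MasterList_2021.csv s3://<S3 bucket>/input_data/
```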

### Census data for county/state
@@ -91,32 +91,32 @@ Upload the shapefile to s3://<S3 bucket>/input_data

### US Protected Areas Database (US-PAD)

We were unable to intersect these data with records because of the complexity of the
shapefiles. Next time we will try using AWS Redshift with a "flattened" version of the data.

Try:
* PAD-US 3.0 Vector Analysis File https://www.sciencebase.gov/catalog/item/6196b9ffd34eb622f691aca7
* PAD-US 3.0 Raster Analysis File https://www.sciencebase.gov/catalog/item/6196bc01d34eb622f691acb5

These are "flattened" though spatial analysis prioritized by GAP Status Code
(ie GAP 1 > GAP 2 > GAP > 3 > GAP 4), these are found on bottom of
https://www.usgs.gov/programs/gap-analysis-project/science/pad-us-data-download page.

The vector datasets are available only as ESRI Geodatabases. The raster datasets are in
Erdas Imagine format; they appear to contain integers between 0 and 92, but may have
additional attributes for those classifications. Try both in AWS Redshift.

Upload the raster and vector flattened zip files (test which works best later) to
s3://<S3 bucket>/input_data
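
The same AWS CLI approach works here; the archive names below are placeholders for
whichever PAD-US vector and raster analysis files were downloaded:

```commandline
aws s3 cp <PAD-US vector analysis>.zip s3://<S3 bucket>/input_data/
aws s3 cp <PAD-US raster analysis>.zip s3://<S3 bucket>/input_data/
```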

## 4. Resolve RIIS records to GBIF accepted taxa

Run this locally until it is converted to an AWS step. Make sure that the
data/config/process_gbif.json file is present. From the bison repo top directory,
making sure the virtual environment is activated, run:

```commandline
python process_gbif.py --config_file=data/config/process_gbif.json resolve
```

Upload the output file (like data/input/US-RIIS_MasterList_2021_annotated_2024-02-01.csv)
to s3://<S3 bucket>/input_data
@@ -129,7 +129,7 @@ For all Redshift steps, do the following with the designated script:

* In the AWS Redshift console, open `Query Editor`, and choose the `Script Editor` button.
* Open an existing script or create a new one (with +) and copy in the appropriate script.
* Update the date string for this processing step to the first day of the current month;
  for example, replace all occurrences of 2024_01_01 with 2024_02_01 (see the sketch
  after this list).
* Run
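
As a sketch of that date-string replacement, the substitution can be done with `sed` on a
local copy of a script before pasting it into the Query Editor (the script name here is
just one of the scripts referenced below):

```commandline
sed 's/2024_01_01/2024_02_01/g' aws_scripts/rs_load_ancillary_data.sql > /tmp/rs_load_ancillary_data_2024_02_01.sql
```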

@@ -139,12 +139,12 @@ GBIF Input

* Use the Global Biodiversity Information Facility (GBIF) Species Occurrences on the
AWS Open Data Registry (ODR) in S3. https://registry.opendata.aws/gbif/
* These data are updated on the first day of every month, with the date string in
the S3 address.
* The date string is appended to all outputs, and referenced in the subset scripts
(Redshift and Glue)
* The data are available in each region; stay within the same AWS ODR region as the
BISON bucket (see the listing sketch below).
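
To check which monthly snapshots are currently published, the ODR bucket can be listed
with the AWS CLI; the bucket name below is an assumption for the us-east-1 copy and
should be verified against the registry page:

```commandline
aws s3 ls --no-sign-request s3://gbif-open-data-us-east-1/occurrence/
```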

### Redshift subset (3 min)

@@ -157,7 +157,7 @@
* Run
* If this method is used, the results must still be loaded into Redshift for steps 7 and 8

## 6. Load ancillary inputs from AWS S3 to AWS Redshift

* Perform Redshift steps, using script: `aws_scripts/rs_load_ancillary_data.sql`

@@ -170,7 +170,7 @@ GBIF Input
## 8. Summarize BISON subset by regions then export to S3 (AWS Redshift)

* Perform Redshift steps, using script: `aws_scripts/rs_aggregate_export`
* Outputs annotated records as CSV files in bucket/folder
s3://bison-321942852011-us-east-1/out_data
* aiannh_lists_<datestr>_000.csv
* state_lists_<datestr>_000.csv
@@ -181,9 +181,9 @@ GBIF Input

## 9. Create Presence Absence Matrix (PAM) and compute statistics (local)

* On a local machine, with the virtual environment activated, run the script
aws_scripts/bison_matrix_stats.py

```commandline
python aws_scripts/bison_matrix_stats.py
```
aws_scripts/bison_matrix_stats.py (6 changes: 3 additions & 3 deletions)

@@ -661,7 +661,7 @@ def write_to_csv(self, filename):

"""
from aws_scripts.bison_matrix_stats import (
read_s3_parquet_to_pandas, reframe_to_heatmatrix, reframe_to_pam, get_logger,
SiteMatrix)
import boto3
@@ -737,14 +737,14 @@ def write_to_csv(self, filename):
# whittaker_new = pam.whittaker()
# species_ct_new = pam.num_species
# site_ct_new = pam.num_sites
#
# # new site stats
# beta_new = pam.beta()
# alpha_new = pam.alpha()
# alpha_proportional_new = pam.alpha_proportional()
# phi_new = pam.phi()
# phi_average_proportional_new = pam.phi_average_proportional()
#
# # new species stats
# omega_new = pam.omega()
# omega_proportional_new = pam.omega_proportional()
