Db refactor (#316)
* Updated to add keywords to search terms table

* Added Dockerfile and docker-compose for pgadmin + secrets

* Added JsonKeywords model

* Modified logic to add + use SearchTerms table, added logic to write json to JsonKeywords table

* Fixed json.loads(...) error

* Changed error handling to populate json_keywords table with error messages instead of using Meta* tables

* Removed Meta* and Keywords classes from ORM

* Modified Datafiles ORM, includes level, removed edr.  Modified search terms.

Modified Datafiles ORM to include level as a single-character column.  Fixes #210.
Modified Datafiles ORM to eliminate 'edr' from source / label.  Fixes #211.
Modified search terms table to include upctime, added footprints, removed
  boundingboxintersections, added max/min/center for phase/incidence/emission angle,
  added err_flag column.  Fixes #197.

* Added functionality to zip .map file

* testing zip fix

* Added conditional decode to Qfile2Qwork function
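The "conditional decode" above presumably guards against queue items that arrive as bytes rather than str. A minimal sketch of that pattern (the function name is illustrative, not the actual Qfile2Qwork code):

```python
def conditional_decode(value, encoding="utf-8"):
    """Decode queue items that arrive as raw bytes; pass strings through."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value
```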

* Finished error handling refactor

* Added namespace argument

* Added .map to zipped files.  Addressed decode errors.

* Removed debugging print statements.

* Removed final references to Meta* tables.

"Bands" information is now placed in keywordsOBJ dictionary, which
is converted to JSON when placed into the database.

Needs testing.
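The flow described above, collecting keywords (including band information) in a dictionary and serializing it on insert, can be sketched as follows; the key names are illustrative assumptions, not the actual keywordsOBJ layout:

```python
import json

# Illustrative stand-in for the keywordsOBJ dictionary
keywordsOBJ = {
    "bands": {"count": 3, "filters": ["RED", "GREEN", "BLUE"]},
    "error": None,
}

# Serialized once, just before being written to the JSON column
json_payload = json.dumps(keywordsOBJ)
```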

* Fixed 'engine' bugs, removed 'meta' from instruments table

* Added DataFiles to auto-generated tables

* Fixed function definition ordering

* Functional prototype up and running.

* Removed debugging print statement

* Initial database_test.py

* Refactored ORM to include foreign keys in search terms + JSONB in json keywords
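A minimal SQLAlchemy sketch of that shape, a foreign key from json keywords back to search terms plus a JSONB column, might look like this (table and column names are assumptions, not the project's actual ORM):

```python
from sqlalchemy import Column, ForeignKey, Integer
from sqlalchemy.dialects.postgresql import JSONB
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class SearchTerms(Base):
    __tablename__ = "search_terms"
    upcid = Column(Integer, primary_key=True)

class JsonKeywords(Base):
    # One JSONB blob of keywords per search-terms row
    __tablename__ = "json_keywords"
    upcid = Column(Integer, ForeignKey("search_terms.upcid"), primary_key=True)
    jsonkeywords = Column(JSONB)
```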

* Added geoserver Dockerfile

* Added scripts for REST API and GeoServer; the notebook takes the BASH database file and converts its commands to Python calls

* Updated with docs and removed scripts

* Removed secrets

* Initial database initialization tool

* Removed table creation functionality from UPC_process

* Updated with new version of PGAdmin

* Refactored upc error reporting (json keywords, queues, logging) (#270)

Removed UPC start/stop times from error dictionary.
Added file name to error dictionary.
Adding warn-level logging for upc failure.
Added UPC error queue + queueing for inputfile + isis program that failed.

* Added mechanisms to allow upc queueing to filter results by substring. (#279)

Added a 'filter' flag to upc queueing.  This parameter functions as
a subquery that uses a substring to filter the files that will
be added to the UPC ready queue.

Closes #276
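In plain Python, the effect of that substring filter is simply the following; the real implementation runs as a SQL subquery, so treat this as a behavioral sketch only:

```python
def filter_files(filenames, substring=None):
    """Keep only file names containing the substring; no-op when unset."""
    if not substring:
        return list(filenames)
    return [name for name in filenames if substring in name]
```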

* INCOMPLETE: Readying for parallel processing.

* INCOMPLETE: Readying for parallel processing. (#285)

Just noting for the record that I am merging this PR so that we have a baseline for conducting a code review of UPC_process.py and that I understand that UPC-related code in this PR is not functional.

* Various bugfixes

* Refactored HPCJobber to accept arguments that will be passed to subsequent procedures.

* Removed now-deprecated 'ingest-override' functionality in favor of parameterized commands.

* Addressed logging, parameterization, and querying issues.

- Added 'persist' flag
- Updated logger formatting to include slurm array id and task id
- Swapped hard-coded paths for config file paths
- Set detached label column in database
- Added spacecraft id to query for instrument lookup
- Removed 'successfully' descriptor in final log message
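One way to get the SLURM array and task IDs into the logger format is to read them from the environment when the formatter is built; a sketch under that assumption (the commit does not show the actual format string):

```python
import logging
import os

def make_upc_logger(name="upc"):
    # SLURM exports these per task; fall back to '-' outside a job array
    array_id = os.environ.get("SLURM_ARRAY_JOB_ID", "-")
    task_id = os.environ.get("SLURM_ARRAY_TASK_ID", "-")
    fmt = "%(asctime)s [{}:{}] %(levelname)s %(message)s".format(array_id, task_id)
    logger = logging.getLogger(name)
    handler = logging.StreamHandler()
    handler.setFormatter(logging.Formatter(fmt))
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger
```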

* Moved duplicate functionality outside of conditionals, removed branches in which logic was identical.

* Fixed detached label and EDRsource

* Added loglevel as argument to UPC_process (#314)

* Merged master into db_refactor (#315)

* Removed args classes in favor of more traditional argparse usage.

* Fix POW recipe for Voyager (#253)

* Update issue templates

* Update issue templates

* PDSinfo.json: replaced .LBL with .IMG in cassini_iss upc_reqs (#272)

Edited pdsinfo file to more accurately represent Cassini's UPC requirements and allow for POW processing.

* Added error message for incorrect volume specification (#273)

* Fixed logic for map and pow locking (#278)

* Service lock (#280)

* Added service lock

* Added logic to exit if services are locked

* Delete bug_report.md

* Delete feature_request.md
AustinSanders authored Sep 25, 2019
1 parent 0a6e3e0 commit f99d4a5
Showing 15 changed files with 949 additions and 584 deletions.
13 changes: 13 additions & 0 deletions REST/GeoServerPy/cURL_scripts/config.xml
@@ -0,0 +1,13 @@

<!-- This configuration is only needed if you are going to be using the BASH 'database.sh' file -->
<dataStore>
  <name>name_of_store</name>
  <connectionParameters>
    <host>hostname</host>
    <port>port</port>
    <database>db_name</database>
    <user>username</user>
    <passwd>password</passwd>
    <dbtype>db_type</dbtype>
  </connectionParameters>
</dataStore>
15 changes: 15 additions & 0 deletions REST/GeoServerPy/cURL_scripts/database.sh
@@ -0,0 +1,15 @@

# Creates a workspace in your running instance of GeoServer
curl -v -u admin:geoserver -XPOST -H "Content-type: text/xml" -d "<workspace><name>upc</name></workspace>" http://localhost:8080/geoserver/rest/workspaces

# Creates a DB connection which is called a store in GeoServer
curl -v -u admin:geoserver -XPOST -T config.xml -H "Content-type: text/xml" http://localhost:8080/geoserver/rest/workspaces/upc/datastores

# Gets the info about the database back so you can make sure it worked
curl -v -u admin:geoserver -XGET http://localhost:8080/geoserver/rest/workspaces/upc/datastores/upcdev.xml

# This makes a table from the database available, i.e. 'publishes' it
curl -v -u admin:geoserver -XPOST -H "Content-type: text/xml" -d "<featureType><name>datafiles_w_footprints</name></featureType>" http://localhost:8080/geoserver/rest/workspaces/upc/datastores/upcdev/featuretypes

# As an example, this returns an image rendered from the datafiles_w_footprints table in the upcdev database
wget http://localhost:8080/geoserver/wms/reflect?layers=upc:datafiles_w_footprints
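The notebook mentioned in the commit log converts these curl calls to Python. A requests-based sketch of the first call, with the request construction split out so it can be checked without a running GeoServer (endpoint and credentials are the defaults used above):

```python
import requests

def workspace_request(name, base="http://localhost:8080/geoserver/rest"):
    """Build the URL and XML body for the workspace-creation call."""
    url = base + "/workspaces"
    body = "<workspace><name>{}</name></workspace>".format(name)
    return url, body

def create_workspace(name, auth=("admin", "geoserver")):
    # Mirrors: curl -u admin:geoserver -XPOST -H "Content-type: text/xml" -d ...
    url, body = workspace_request(name)
    return requests.post(url, auth=auth,
                         headers={"Content-type": "text/xml"}, data=body)
```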
255 changes: 255 additions & 0 deletions REST/GeoServerPy/notebooks/Python for GeoServer.ipynb

Large diffs are not rendered by default.

25 changes: 25 additions & 0 deletions containers/geoserver/Dockerfile
@@ -0,0 +1,25 @@
# For an example on how to run this docker container,
# please check the buildDocker.sh script... or just configure and run the script!
# This container creates an instance of GeoServer 2.15.1

FROM alpine:3.7
USER root

# Installs JDK8
RUN apk update
RUN apk fetch openjdk8
RUN apk add openjdk8

# Creates directories for geoserver and its downloads
RUN mkdir /usr/share/geoserver
RUN mkdir ~/geoserver_download

RUN wget -P ~/geoserver_download https://sourceforge.net/projects/geoserver/files/GeoServer/2.15.1/geoserver-2.15.1-bin.zip
RUN ls -l ~/geoserver_download
RUN unzip ~/geoserver_download/geoserver-2.15.1-bin.zip -d /usr/share/geoserver

# Set environment variable to the correct home
ENV GEOSERVER_HOME="/usr/share/geoserver/geoserver-2.15.1/"

# Runs the geoserver startup.sh
ENTRYPOINT ["/usr/share/geoserver/geoserver-2.15.1/bin/startup.sh"]
22 changes: 22 additions & 0 deletions containers/postgres/docker-compose.yml
@@ -0,0 +1,22 @@
version: "3.1"
services:
  postgres:
    image: mdillon/postgis:9.6-alpine
    environment:
      POSTGRES_PASSWORD: 1234
      POSTGRES_USER: upcmgr
      POSTGRES_DB: upc
    volumes:
      - "/scratch/sakins/upc:/var/lib/postgresql/data"
    ports:
      - "9010:5432"

  admin:
    image: dpage/pgadmin4
    environment:
      PGADMIN_DEFAULT_EMAIL: admin
      PGADMIN_DEFAULT_PASSWORD: 1234
    volumes:
      - ./pgadmin-config:/var/lib/pgadmin
    ports:
      - "9011:80"
42 changes: 42 additions & 0 deletions containers/postgres/pgadmin_container/Dockerfile
@@ -0,0 +1,42 @@
FROM debian:8.5

ENV LANG=C.UTF-8 LC_ALL=C.UTF-8

RUN apt-get update --fix-missing && apt-get install -y wget bzip2 ca-certificates \
    libglib2.0-0 libxext6 libsm6 libxrender1 \
    git mercurial subversion

# The --no-check-certificate is due to internal SSL issues
RUN echo 'export PATH=/opt/conda/bin:$PATH' > /etc/profile.d/conda.sh && \
    wget --no-check-certificate --quiet https://repo.continuum.io/miniconda/Miniconda3-4.3.21-Linux-x86_64.sh -O ~/miniconda.sh && \
    /bin/bash ~/miniconda.sh -b -p /opt/conda && \
    rm ~/miniconda.sh

RUN apt-get install -y curl grep sed dpkg && \
    TINI_VERSION=`curl -k https://github.com/krallin/tini/releases/latest | grep -o "/v.*\"" | sed 's:^..\(.*\).$:\1:'` && \
    curl -k -L "https://github.com/krallin/tini/releases/download/v${TINI_VERSION}/tini_${TINI_VERSION}.deb" > tini.deb && \
    dpkg -i tini.deb && \
    rm tini.deb && \
    apt-get clean

ENV PATH /opt/conda/bin:$PATH

ENTRYPOINT [ "/usr/bin/tini", "--" ]
CMD [ "/bin/bash" ]

# To get a gcc compiler in for building C libs
RUN apt-get -y install build-essential

# Get PGAdmin4 installed w/ dependencies
RUN pip install https://ftp.postgresql.org/pub/pgadmin/pgadmin4/v1.6/pip/pgadmin4-1.6-py2.py3-none-any.whl --trusted-host=ftp.postgresql.org --trusted-host=pypi.python.org

# PGAdmin port
EXPOSE 5050

# Set the server parameter to be false
RUN sed -i 's/SERVER_MODE = True/SERVER_MODE = False/g' /opt/conda/lib/python3.6/site-packages/pgadmin4/config.py
RUN sed -i "s/DEFAULT_SERVER = '127.0.0.1'/DEFAULT_SERVER = '0.0.0.0'/g" /opt/conda/lib/python3.6/site-packages/pgadmin4/config.py

RUN python /opt/conda/lib/python3.6/site-packages/pgadmin4/config.py

CMD ["python", "/opt/conda/lib/python3.6/site-packages/pgadmin4/pgAdmin4.py"]
43 changes: 35 additions & 8 deletions pds_pipelines/HPCjobber.py
@@ -32,13 +32,20 @@ def parse_args(self):
parser.add_argument('--process', '-p', dest="process", required=True,
choices = choices, help="Enter process - {}".format(choices))

-        parser.add_argument('--jobarray', '-j', dest="jobarray",
+        parser.add_argument('--jobarray', '-j', dest="jobarray", type=int,
help="Enter string to set job array size")

parser.add_argument('--norun', action='store_true')

parser.add_argument('--args', dest='process_args', nargs='*', required=False)

parser.set_defaults(norun=False)
args = parser.parse_args()

self.process = args.process
self.jobarray = args.jobarray
self.norun = args.norun
self.process_args = args.process_args


def main():
@@ -60,6 +67,7 @@ def main():
logger.addHandler(logFileHandle)
logger.info(job['info'])


# Parametrize the HPC job using the configuration file
date = datetime.datetime.now(pytz.utc).strftime("%Y%m%d%M")
jobOBJ = HPCjob()
@@ -74,28 +82,47 @@
# a @date@ tag that we replace with the current date
SBfile = job['SBfile'].replace('@date@', date)
cmd = job['cmd']
if args.process_args:
cmd += ' ' + ' '.join(args.process_args)

if args.jobarray:
-        JA = args.jobarray
+        JA = int(args.jobarray)
try:
sctrl = subprocess.Popen("scontrol show config".split(), stdout=subprocess.PIPE)
grep = subprocess.Popen("grep -E MaxArraySize".split(), stdin=sctrl.stdout, stdout=subprocess.PIPE)
output, error = grep.communicate()
max_jobs = int(output.decode('utf-8').split('=')[1])
except:
logger.error("Unable to detect job array size")
exit()

if JA > max_jobs:
logger.error("%d exceeds job limit of %d", JA, max_jobs)
exit()
else:
JA = 1

-    jobOBJ.setJobArray(JA)
+    jobOBJ.setJobArray(str(JA))
jobOBJ.setCommand(cmd)
jobOBJ.MakeJobFile(SBfile)

logger.info('SBATCH file: %s', SBfile)

try:
sb = open(SBfile)
sb.close
logger.info('SBATCH File Creation: Success')
except IOError as e:
logger.error('SBATCH File %s Not Found', SBfile)

-    try:
-        jobOBJ.Run()
-        logger.info('Job Submission to HPC: Success')
-    except IOError as e:
-        logger.error('Jobs NOT Submitted to HPC')
+    if args.norun:
+        logger.info('No-run mode, will not submit HPC job.')
+    else:
+        try:
+            jobOBJ.Run()
+            logger.info('Job Submission to HPC: Success')
+        except IOError as e:
+            logger.error('Jobs NOT Submitted to HPC')

if __name__ == "__main__":
sys.exit(main())
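The MaxArraySize check above shells out to scontrol and grep; the parsing step can be isolated and exercised without SLURM, roughly like this sketch:

```python
def parse_max_array_size(scontrol_output):
    """Pull the MaxArraySize value out of `scontrol show config` output."""
    for line in scontrol_output.splitlines():
        if "MaxArraySize" in line:
            return int(line.split("=")[1])
    raise ValueError("MaxArraySize not found in scontrol output")
```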
3 changes: 2 additions & 1 deletion pds_pipelines/PDSinfo.json
@@ -43,7 +43,8 @@
},
"galileo_ssi_edr": {
"archiveid": "25",
-    "path": "/pds_san/PDS_Archive/Galileo/SSI/"
+    "path": "/pds_san/PDS_Archive/Galileo/SSI/",
+    "bandbinQuery": "FilterName"
},
"kaguya_lism": {
"archiveid": "92",
1 change: 1 addition & 0 deletions pds_pipelines/ServiceFinal.py
@@ -139,6 +139,7 @@ def main(log_level, namespace, key):
datetime.datetime.now().strftime("%Y-%m-%d %H:%M") + "\n")

logOBJ.write(" ISIS VERSION: " + isis_version)

if infoHash.getStatus() == 'ERROR':
logOBJ.write(" JOB STATUS: " +
infoHash.getStatus() + " See Details Below\n")
