Open Software for Restricted Data: an Environmental Epidemiology example

Copyright

Copyright Ivan C Hanigan <[email protected]>. Licensed under the GNU General Public License version 2, WITHOUT ANY WARRANTY.

Abstract


Introduction /introduction.html

Deploying Virtual Machines

Description

We want to deploy the service as a symbiotic pair of virtual machine images that together provide integrated data storage and analysis services in the cloud. A recent review of the prototype system I developed suggested that two servers are needed to comprise this system, to provide a more robust and powerful combination of structural and functional aspects.

The services hosted by the pair of servers will be:

  • The Brains: a Statistical Analysis engine (running the R environment) integrated with a metadata registry, and
  • The Brawn: a PostgreSQL Geographical Information System (GIS) Database server.

Accessing Virtual Machines on the Australian Research Cloud

Launch an instance

  • Log on to the research cloud using your Australian university credentials at http://www.nectar.org.au/research-cloud/.
  • Create two security groups, for brawn and brains; these will hold the special firewall settings for each server.
  • Add ssh port 22 (add http(s) 80 etc, postgres 5432, and icmp -1 as needed).
  • Under "Images & Snapshots" launch the CentOS 6.2 amd64 image.
  • Give your server a name and add its security group.
  • Specify an ssh keypair http://support.rc.nectar.org.au/demos/launching_123.html.
  • Launch, and note the IP address.
  • Go to the "Access & Security" section and edit the rules (e.g. for http add a rule to allow access on ports 8080 to 9000 from 0.0.0.0/0 (CIDR)).
  • Note that this allows access from the whole world; we will think about securing the server later.


requesting help

To: [email protected] Subject: Request about snapshot status

Hi,

My AAF email I login to the cloud with is: [email protected]

What is currently happening: I took a snapshot of one VM and it said success but now in the status column it has ‘shutoff’ and a spinning wheel.

What should happen: It should return to status ‘Active’.

Cheers, Ivan Hanigan Data Management Officer. Thursday PhD scholar (ANU/CSIRO). National Centre for Epidemiology and Population Health. College of Medicine, Biology and Environment. Australian National University Canberra, ACT, 0200. Ph: +61 2 6125 7767. Fax: +61 2 6125 0740. Mob: 0428 265 976. CRICOS provider #00120C.

connect using ssh (this is required to set a password on ec2-user so you can use the web dashboard: use sudo su; passwd ec2-user)

If using Windows, make sure you have git or WinSCP installed; opening a bash shell from these will enable you to connect to your server using ssh. The contents of the following orgmode chunks can then be evaluated on the remote server. If you're on Linux, orgmode can execute these chunks using C-c C-c.

In the next chunk, insert the relevant ip address and you may have to answer yes to the question about adding this RSA fingerprint to your list.

cd ~/.ssh
ssh -i keypairname [email protected]
# it is prudent to set a hideously long password for root
# passwd root

Setting up the basic server framework

This is done in a similar way for both the Brains and the Brawn servers.

Install any updates using the yum package manager

It is recommended to do this every week to keep the server in good condition, especially with regard to security updates.
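The update itself is a single command, run as root (the weekly cron suggestion is mine, not from the original notes):

```shell
yum -y update
# optionally, schedule this weekly by dropping a small script into /etc/cron.weekly/
```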

Security measures

The following section sets some restrictions on the server. I would like to know how important it is to restrict root login, and whether we can permit login via ssh on port 22 if it is only open to the NCEPH VPN range. If so, do I just leave the settings below as yes/yes/yes? I got some advice from a sysadmin at work about the following, but I am sure it is still fairly insecure. There are various TODOs hidden in the source code document.

add an ssh key

  • get the private/public key generated
  • insert the public key
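The two steps above can be sketched as follows (the /tmp path is only a demo location; in practice keep keys under ~/.ssh):

```shell
# generate a private/public keypair (demo path; normally ~/.ssh/id_rsa)
rm -f /tmp/demo_key /tmp/demo_key.pub
ssh-keygen -t rsa -b 2048 -N "" -f /tmp/demo_key
# the public half is what gets inserted into ~/.ssh/authorized_keys on the server
cat /tmp/demo_key.pub
```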

Restrict ssh access to a specific ip address/range

Setting the host.* TCP Wrappers

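TCP Wrappers are configured via /etc/hosts.deny and /etc/hosts.allow. A minimal sketch; the address range shown is an assumption, substitute your own VPN range:

```shell
# /etc/hosts.deny : deny sshd to everyone by default
sshd : ALL
# /etc/hosts.allow : then permit only the trusted range
sshd : 150.203.0.0/255.255.0.0
```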

Security Enhanced Linux (selinux)

To run the database and the RStudio server it is best to disable SELinux.

Find out if this is necessary for PostgreSQL as well as RStudio.

# selinux config
vi /etc/selinux/config
# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
#     enforcing - SELinux security policy is enforced.
#     permissive - SELinux prints warnings instead of enforcing.
#     disabled - No SELinux policy is loaded.
SELINUX=enforcing
# Change SELINUX=enforcing to disabled

Install some base packages

There are a few commonly used packages recommended for both Brawn and Brains.

yum install gcc-gfortran gcc-c++ readline-devel libpng-devel libX11-devel libXt-devel texinfo-tex.x86_64 tetex-dvips docbook-utils-pdf cairo-devel java-1.6.0-openjdk-devel libxml2-devel make unzip hdparm

Hardware set-up

Now we will note the size and number of the disc partitions.
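The sizes can be listed with standard tools:

```shell
# size and mount point of each partition
df -h
# current swap devices
cat /proc/swaps
```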

Swap space /swapon.html

Persistent Net Rules Should Be Avoided On Centos

The current “CentOS 6.2 amd64” image (ID 21401), if launched and snapshotted, is not connectible when launched from this snapshot due to networking. (davidjb, Mon Mar 26, 2012 support.rc.nectar.org.au/forum).

Disk Storage

Every VM on the Research Cloud has a 10GB primary disk which is used for the image you launch. The primary disk is copied in a snapshot, so anything on this primary disk can be backed up via snapshots. In addition every virtual machine gets secondary storage. This secondary disk is not copied or backed up via snapshots. If you reboot your virtual machine, the secondary disk data remains intact. If you shut down your VM, the data disappears (it is not persistent).

find out recommended practice for backups, and purpose of ‘object storage’

import boto.s3.connection

connection = boto.s3.connection.S3Connection(
    aws_access_key_id='7b7a0a6f71994e42a07c8e9b0de0f8ca',
    aws_secret_access_key='2d54f6a679cd4b06820550459451cb50',
    port=8888,
    host='swift.rc.nectar.org.au',
    is_secure=True,
    calling_format=boto.s3.connection.OrdinaryCallingFormat())

buckets = connection.get_all_buckets()
print buckets

  • downside is you have to use python to work with the data?

The Brawn

The Brawn is a PostgreSQL Geographical Information System (GIS) Database server.

OpenGeo Suite (Postgres and friends)

Restricted OpenGeo Suite /opengeosuite-restricted.html

The OpenGeo Suite Installer

Upgrade tomcat6

Secure tomcat

  • We have a service called Nessus that reviewed the VM tomcat port
  • found some minor issues
Apache Tomcat servlet/JSP container default files

Synopsis: The remote web server contains example files.
Description: Example JSPs and Servlets are installed in the remote Apache Tomcat servlet/JSP container. These files should be removed as they may help an attacker uncover information about the remote Tomcat install or host itself, or they may themselves contain vulnerabilities such as cross-site scripting issues.
Solution: Review the files and delete those that are not needed.
Risk Factor: Medium. CVSS Base Score 6.8 (CVSS2#AV:N/AC:M/Au:N/C:P/I:P/A:P)
Plugin Information: Publication date: 2004/03/02, Modification date: 2012/01/20. Ports tcp/8081

The following default files were found :

/examples/servlets/index.html
/examples/jsp/snp/snoop.jsp
/examples/jsp/index.html

Just remove this dir?

cd /usr/share/apache-tomcat-6.0.37/webapps
rm -R -f examples

CGI Generic Cross-Site Scripting (persistent, 3rd Pass)

Synopsis: A CGI application hosted on the remote web server is potentially prone to cross-site scripting attacks.
Description: The remote web server hosts one or more CGI scripts that fail to adequately sanitize request strings containing malicious JavaScript. By leveraging this issue, an attacker may be able to cause arbitrary HTML and script code to be executed in a user's browser within the security context of the affected site. This script identified patterns that were injected to test 'reflected' (aka 'non-persistent') XSS; the issues are likely to be 'persistent' (or 'stored') after all.
Solution: Restrict access to the vulnerable application and contact the vendor for a patch or upgrade.

Postgres standalone

PostgreSQL /postgresql.html

#### Code
    # install the PostgreSQL 9.2 repo package
    rpm -ivh http://yum.postgresql.org/9.2/redhat/rhel-6-x86_64/pgdg-centos92-9.2-6.noarch.rpm
    # install PostgreSQL 9.2 with single command:
    yum groupinstall "PostgreSQL Database Server PGDG"
    # This will install PostgreSQL 9.2 server, along with -contrib subpackage.
    # Once it is installed, first initialize the cluster:
    service postgresql-9.2 initdb
    # Now, you can start PostgreSQL 9.2:
    service postgresql-9.2 start
    # If you want PostgreSQL 9.2 to start everytime on boot, run this:
    chkconfig postgresql-9.2 on

Configure PostgreSQL connection settings

Allow connection to postgres through the firewall
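A sketch of the settings involved (paths are the defaults for the PGDG packages, and the address range is an assumption; substitute your own): PostgreSQL must listen on the network interface, pg_hba.conf must permit the client addresses, and iptables must pass port 5432.

```shell
# /var/lib/pgsql/9.2/data/postgresql.conf
listen_addresses = '*'
# /var/lib/pgsql/9.2/data/pg_hba.conf : md5-authenticated connections from a trusted range
host    all    all    150.203.0.0/16    md5
# /etc/sysconfig/iptables : add alongside the existing port 22 rule
-A INPUT -m state --state NEW -m tcp -p tcp --dport 5432 -j ACCEPT
# then
service postgresql-9.2 restart
service iptables restart
```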

PostGIS 2.0 /postgis.html

Postgis

GDAL, PROJ and GEOS

Create Database

Create a GIS user and a group

Specific transformations grid for Australian projections AGD66 to GDA94

Test transform

PostgreSQL Migration /postgres-migrate.html

Migrate the data

Set up backups

From here to a secure location at my work.

Tuning Postgres Performance

Important shared memory settings /sharedmemory.html

Postgres conf

I noticed when installing opengeo on RHEL and Ubuntu that the installer changes more parameters on the Ubuntu conf file. Compare these with diff and update RHEL to reflect the settings for ubuntu.

Test loading some shapefiles

on your ubuntu desktop install postgis and gdal (see above)

then let’s demo the Tasmanian SLAs:

download the shapefiles
upload with shp2pgsql
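The upload step can be sketched as follows; the file name and connection details are assumptions for illustration (SRID 4283 is GDA94, commonly used for ABS boundary files):

```shell
shp2pgsql -s 4283 -I SLA06aTAS_region.shp public.tas_sla | \
  psql -h ip.address.of.server -U gislibrary -d ewedb
```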

The Brains

The Brains is a Statistical Analysis engine (running the R environment) integrated with a metadata registry.

R Server

R

# rpm -Uvh http://mirror.as24220.net/pub/epel/6/i386/epel-release-6-8.noarch.rpm

To update R as ‘root’ on your system simply type
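(The command itself is not shown in this export; with the EPEL repository enabled it would presumably be:)

```shell
yum install R    # first install
yum update R     # subsequent updates
```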

package management and R updates

Kudos to http://zvfak.blogspot.com.au/2012/06/updating-r-but-keeping-your-installed.html The problem is that when you update R you usually need to re-install your libraries, or change .libPaths() to point to a location that has your previous libraries.

The solution below will work for unix-like operating systems including Mac OS X.

Rstudio (always best to get current version from the website)

check out the RStudio versions at: http://rstudio.org/download/server

firewall access

Check that you can log on via port 8787. Note that this is an insecure R server; the following steps are required to make it secure, and we will need to block port 8787 later on. To run on a different port try vi /etc/rstudio/rserver.conf

www-port=8081

SSL/HTTPS and running a proxy server

git

ssh for github

  • in rstudio
  • tools / options / version control
  • create rsa key, ok, ok
  • view pub key, copy, paste to your github account

gdal

geos

or under ubuntu

test readOGR

NB only possible once the PostGIS install is complete, see below.

rgraphviz

test

install just the postgres bits required for RPostgreSQL package

postgis utilities

unixODBC

Test the Backups of this Minimal R Server.

To test that this will restore successfully if there is a crash or the VM becomes corrupted, take a snapshot from the ResearchCloud dashboard website and launch it as a new VM. Note that this will only restore the 10GB primary disc, which has our software but not the user data on the secondary disc, so try mounting that too and check that the new VM will be a good replacement for the old one.

Backup the 2nd Disc
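One way to do this is to copy the secondary disk's contents somewhere that is captured by snapshots or sent off-host; a sketch, where the device name /dev/vdb and the paths are assumptions (check with fdisk -l):

```shell
fdisk -l                     # identify the secondary disk
mount /dev/vdb /mnt          # if not already mounted
rsync -av /mnt/ /home/seconddisc_backup/
```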

Launch from this snapshot and test the R server and 2nd Disc

The permissions of the user on their home directory can cause issues for logging in.

I've just used userdel -r username, then useradd etc., and used their own user account to scp the backup files across. One solution is to not offer backup services; users would then have to undertake to load all data again from a fresh server if it failed. Data could be backed up on Brawn only?

Oracle XE Permissions and Users System

Install Oracle XE; this is used as a data registry and I have a HTML-DB application available from XXXXXX to be used. It is called OraPUS because the Oracle XE license restricts any references to Oracle in the name of the application. There are other restrictions on this licence, such as XXXXX.

backup local ubuntu version

I have an Ubuntu PC as my main desktop, and to also keep local backups of the Oracle setup I needed a CentOS virtual machine. Use the instructions at http://extr3metech.wordpress.com/2012/10/25/centos-6-3-installation-in-virtual-box-with-screenshots/ and also the comments, i.e.: @Gunwant, one way of configuring the network is to open your VirtualBox settings, go to the "Network" tab and set the "Attached to" option to "Bridged Adapter".

Then boot your centos virtual machine, and type the following in your terminal:

And then change the network config file (i.e. /etc/sysconfig/network-scripts/ifcfg-eth0) to the following:

DEVICE="eth0"
BOOTPROTO="dhcp"
ONBOOT="yes"

Save the file and exit. After that, restart the network service:

service network restart

Now it should work.

INIT

################################################################
# name:init
yum update
# SElinux config
# vi /etc/selinux/config
# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
#     enforcing - SELinux security policy is enforced.
#     permissive - SELinux prints warnings instead of enforcing.
#     disabled - No SELinux policy is loaded.
SELINUX=enforcing
# Change SELINUX=enforcing to disabled and you must reboot the server after applying the change.
# vi /etc/hosts
127.0.0.1  localhost.localdomain localhost ADD-VMNAME 

SWAP

MAKE SURE YOU HAVE 2GB SWAP; this is required for the install, but I have deleted it afterward and things still work (http://bvirtual.nl/2009/08/add-more-swap-space-to-running-linux.html). I SET the count TO 3000, and didn't set it to start at boot, so I have to swapon /swapfile each time.

################################################################
# name:swap
#Create an empty file called /swapfile of about 3GB
dd if=/dev/zero of=/swapfile bs=1024000 count=3000
#Format the new file to make it a swap file
mkswap /swapfile
#Enable the new swapfile. Only the swapon command is needed, but with
#the free command you can clearly see the swap space is made available
#to the system.
free -m | grep Swap
swapon /swapfile
free -m | grep Swap
# TOO LAZY TO ADD TO BOOT SO JUST DO SWAPON EVERY TIME?
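The boot question above can be settled with a standard /etc/fstab entry (this fragment is my suggestion, not from the original notes):

```shell
# append to /etc/fstab so the swap file is enabled at boot
/swapfile  swap  swap  defaults  0 0
```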

DOWNLOAD AND SCP

create free oracle account. download to local and then upload to server.

scp oracle-xe-11.2.0-1.0.x86_64.rpm.zip user@ip.address.of.server:/home/

Install the database

I followed these online guides: http://youtu.be/DRZg8tfmldA (this needs VNC or other) http://www.davidghedini.com/pg/entry/install_oracle_11g_xe_on

yum install libaio bc flex unzip 
cd /home
mkdir OracleXE
unzip oracle-xe-11.2.0-1.0.x86_64.rpm.zip -d OracleXE 
cd OracleXE/Disk1
rpm -ivh oracle-xe-11.2.0-1.0.x86_64.rpm  
/etc/init.d/oracle-xe configure
# accept defaults
#set up the /etc/group file so that oracle is in dba (and oinstall? not there?)
id -a oracle
#log on as oracle and then 
su - oracle
# set the environment
cd /u01/app/oracle/product/11.2.0/xe/bin  
. ./oracle_env.sh 
# NOT DONE, which users needs this?  which dot file?  both?  To set
# the environment permanently for users, add the following to the
# .bashrc or .bash_profile of the users you want to access the
# environment:
     . /u01/app/oracle/product/11.2.0/xe/bin/oracle_env.sh 
sqlplus /nolog
connect sys/password as sysdba
#To allow remote access to Oracle 11g XE GUI (as well as Application Express GUI) issue the following from SQL*Plus
EXEC DBMS_XDB.SETLISTENERLOCALACCESS(FALSE);  
exit
su - root
# modify firewall
# vi /etc/sysconfig/iptables
# copy -A INPUT -m state --state NEW -m tcp -p tcp --dport 22 -j ACCEPT
# twice and modify for 8080 and 1521
# restrict 1521 to your administration IP address
# will also need the port 8080 enabled on the research cloud firewall
service iptables restart

Now we can check the database service is running by pointing a browser at http://ip.address.of.server:8080/apex

Although Apex is already installed, we still need to set the Internal Admin password.

# To do so, run the apxchpwd.sql located under /u01/app/oracle/product/11.2.0/xe/apex:
su - oracle
. /u01/app/oracle/product/11.2.0/xe/bin/oracle_env.sh
sqlplus /nolog
connect sys/password as sysdba
@/u01/app/oracle/product/11.2.0/xe/apex/apxchpwd.sql 
#Note: pick something simple like Password123! as you will be prompted to change it on first log in anyways.
# access the Application Express GUI at:
http://ip.address.of.server:8080/apex/f?p=4550:1

Now log on:

  • Workspace: Internal
  • User Name: admin
  • Password: (whatever you selected above).
  • log in, change password
  • log in again

Now create Workspace:

  • Workspace Name = DDIINDEXDB
  • next page say existing schema “no” and write DDIINDEXDB into the schema Name
  • schema password same as the user you want
  • 100MB
  • admin user IVAN

import application and set Security

Import the application 102, then set authentication. I followed the advice from the oracle documentation http://docs.oracle.com/cd/E17556_01/doc/user.40/e15517/sec.htm#BCGICEAH "Changing the Authentication Scheme Associated with an Application". To change the authentication scheme for an application:

Navigate to the Authentication Schemes:

  • On the Workspace home page, click the Application Builder icon.
  • Select an application.
  • On the Application home page, click Shared Components.
  • The Shared Components page appears.
  • Under Security, select Authentication Schemes.
  • Click the Change Current tab at the top of the page.
  • From Available Authentication Schemes, select a new authentication scheme and click Next.
  • (Application Express is good)
  • Click Make Current.

explain the table creation script

now go to lib/setup_oracle_of_delphe.r to build the tables

set up R, RJDBC and ROracle

Using RJDBC is what I recommend. I found ROracle suboptimal.

su root
#make sure env is set
export LD_LIBRARY_PATH=/u01/app/oracle/product/11.2.0/xe/lib:$LD_LIBRARY_PATH

R
connect2oracle <- function(){
  if(!require(RJDBC)) install.packages('RJDBC'); require(RJDBC)
  drv <- JDBC("oracle.jdbc.driver.OracleDriver",
              '/u01/app/oracle/product/11.2.0/xe/jdbc/lib/ojdbc6.jar')
  p <- readline('enter password: ')
  h <- readline('enter target ip address: ')
  d <- readline('enter database name: ')
  ch <- dbConnect(drv, paste("jdbc:oracle:thin:@", h, ":1521", sep = ''), d, p)
  return(ch)
}
ch <- connect2oracle()


# OR
#install.packages('ROracle')

maintenance

unlock


extending/revising the database to track servers and files

Aim:

  • the db is good for files, and even books etc.
  • I think it might also be good for tracking server maintenance tasks
  • I will experiment by modifying the names of the data entry fields ie from file to server
  • and the tables stay the same. then I can also modify the app.

change the names and fields

  • go to page, edit page, remove fields one at a time
  • keep the pages to the bare minimum
  • go to edit page and change their names

stdydscr becomes Projects

  • idno becomes project_id
  • titl = project_title
  • authenty = primary_investigator
  • distrbtr = contact_address
  • abstract = business_requirements

filedscr becomes servers

  • filename = vm_name
  • Filelocation = ip_address
  • Filedscr = Description
  • reqid = authorised_access_group_id
  • notes = Server function

datadscrs becomes tasks

  • LABL = task

SAVE VERSIONS

export application to dropbox/Catalogue/BACKups

DDIindex

The DDIindex is a java based catalogue that was written by the Australian Data Archives at the ANU (formerly the ASSDA http://assda.anu.edu.au/ddiindex.html). The original application source is still available from the ASSDA site, but I am using the one I have archived because it has some configurations that I don't remember how to make again at the moment.

TOMCAT

http://coastalrocket.blogspot.com.au/2012/08/installing-geostack-on-centos-6-64bit.html

###########################################################################
# newnode: install-tomcat
yum -y install tomcat6 tomcat6-webapps
# not tomcat6-admin-webapps 
vi /etc/tomcat6/tomcat-users.xml
#add a user with roles of admin,manager like this
<user username="tomcat" password="password" roles="admin,manager" />
###########################################################################
# newnode: for-restarts
chkconfig tomcat6 on

changing the Tomcat Port

  • Locate server.xml in {Tomcat installation folder}/conf/
  • change <Connector port="8181"/> from 8080
  • and the connector executor too? only if it is not commented out (it was on centos, not on RHEL)
  • change iptables and restart
service tomcat6 restart

UPLOAD THE DDIINDEX

MySQL

http://www.if-not-true-then-false.com/2010/install-mysql-on-fedora-centos-red-hat-rhel/

rpm -Uvh http://rpms.famillecollet.com/enterprise/remi-release-6.rpm
yum --enablerepo=remi,remi-test list mysql mysql-server
yum --enablerepo=remi,remi-test install mysql mysql-server
/etc/init.d/mysqld start ## use restart after update
## OR ##
service mysqld start ## use restart after update

chkconfig --levels 235 mysqld on

mysqladmin -u root password [your_password_here]
mysql -h localhost -u root -p
# then create a database, a database user and a database password
## CREATE DATABASE ##
mysql> CREATE DATABASE security;
 
## CREATE USER ##
mysql> CREATE USER 'ddiindex'@'localhost' IDENTIFIED BY 'BlahBlahBlah';
 
## GRANT PERMISSIONS ##
mysql> GRANT ALL ON security.* TO 'ddiindex'@'localhost';
 
##  FLUSH PRIVILEGES, Tell the server TO reload the GRANT TABLES  ##
mysql> FLUSH PRIVILEGES;

# and then use the next command at the command line:

mysql -h localhost -u root -p security < security_2010-04-27-1259.sql; 
# Enter password: 
# deprecated let thru iptables port 3306?

make sure ddiindex can connect to mysql as ddiindex user

# now go to /usr/share/tomcat/webapps/ddiindex/WEB-INF/classes/index-conf.xml
# dburl change to 
    <dbUrl>jdbc:mysql://localhost:3306/security</dbUrl>
    <dbUserName>ddiindex</dbUserName>
    <dbPassword>BlahBlahBlah</dbPassword>
# other personalisations here such as <dataServer>
mkdir /opt/ddiindex/ewedb
mkdir /opt/ddiindex/ewedb_s

From home.jsp, login.jsp and search.jsp remove the line:

<%@taglib uri="http://assda.anu.edu.au/ddiindex" prefix="ddiindex"%>

Check the security implications of allowing write permissions here

I found I need to allow write access to the xmldata folder for uploads of the xml metadata files.

adduser ddiindex
chown -R ddiindex /xmldata
chmod 0777 /xmldata
# bad for security, try 0750 instead
# chmod 0777 /opt/ddiindex/nceph
# or this works
chmod -R 777 /opt/ddiindex/nceph
chmod 0777 /opt/ddiindex/nceph_s
chmod -R 777 /opt/ddiindex/ewedb
chmod 0777 /opt/ddiindex/ewedb_s

Running the Indexer

Log on as an admin user (set in the MySQL UserEJB table) and you can run indexer.jsp. Before doing this the first time I did the following:

  • went to nceph and nceph_s deleted the recent created files
  • went to xmldata, deleted all
  • restart TOMCAT
  • go to indexer, upload xml
  • hit the ‘add to index’ button.

personalise the ddiindex

ie home.jsp:

<div class="intro">The Graduate Information Literacy Program at ANU now offers resources for researchers and graduate students on data management planning: a comprehensive Manual and a Data Management Planning Template. <a href="http://ilp.anu.edu.au/dm/" target="_blank">http://ilp.anu.edu.au/dm</a></div>

Set up a private git server for data and code

True-Crypt encrypted volumes

The servers are secure, but we will sometimes want local copies and so should use TrueCrypt or another option. I am using Ubuntu as my desktop, so here are the instructions.

################################################################
#http://www.liberiangeek.net/2012/07/install-truecrypt-via-ppa-in-ubuntu-12-04-precise-pangolin/
sudo add-apt-repository ppa:michael-astrapi/ppa
sudo apt-get update && sudo apt-get install truecrypt

#### FAILED TO INSTALL ON REDHAT ####
# or http://ubuntupop.blogspot.com.au/2012/10/how-to-install-truecrypt-on-centos-63.html
# 1. Download Truecrypt package. Or use wget instead
# wget http://www.truecrypt.org/download/truecrypt-7.1a-linux-x64.tar.gz
# 2.  Now extract the package
# tar xzvf truecrypt-7.1a-linux-x64.tar.gz
# 3. It will produce a file called truecrypt-7.1a-setup-x64
# chmod +x truecrypt-7.1a-setup-x64 
# ./truecrypt-7.1a-setup-x64
# installer ran but then 
# truecrypt -h 
# truecrypt: error while loading shared libraries: libfuse.so.2: cannot open shared object file: No such file or directory
# yum install libfuse.so.2 didn't help

# DIDN'T TRY  http://vinnn.blogspot.com.au/2009/01/tryecrypt-6-on-centos-5.html

ResearchData storage

################################################################
# name:ResearchData
cd /home
mkdir ResearchData
# and add them to a particular group
groupadd researchdata
usermod -a -G researchdata username
chown -R root:researchdata /home/ResearchData
# TODO check what the security numeric codes do
chmod -R 0777 /home/ResearchData
chmod -R 0750 /home/ResearchData
service rstudio-server restart
# user may need to q() from R session and sign out of rstudio?
# for a user to have this link in their home go
su username
ln -s /home/ResearchData /home/username/ResearchData

example workflow to dev a local data package and publish to shared

  • work in personal workspace then as root

[root@i-00002e43 ResearchData]# mkdir GARNAUT_CLIMATE_CHANGE_REVIEW
[root@i-00002e43 ResearchData]# chown -R root:researchdata /home/ResearchData
[root@i-00002e43 ResearchData]# chmod -R 0750 /home/ResearchData
[root@i-00002e43 ResearchData]# ls -l
[root@i-00002e43 ResearchData]# cd GARNAUT_CLIMATE_CHANGE_REVIEW/
[root@i-00002e43 GARNAUT_CLIMATE_CHANGE_REVIEW]# cp -R /home/ivan_hanigan/projects/GARNAUT_CLIMATE_CHANGE_REVIEW/* .
[root@i-00002e43 GARNAUT_CLIMATE_CHANGE_REVIEW]# rm GARNAUT_CLIMATE_CHANGE_REVIEW.Rproj
rm: remove regular file `GARNAUT_CLIMATE_CHANGE_REVIEW.Rproj'? y

Share with windows network

Procedures for adding users to the system.

Description of the access procedure

Getting Access

The “Getting Access” procedure to help users apply for and gain access to mortality data is shown in Figure 1, and a more detailed description of the process is shown in the Appendix, Figure 4. The process is a set of formally defined steps that are designed to move the User through two general stages:

  • Requesting data: guiding the researcher through the process of (a) gaining Ethics Approval from a Human Research Ethics Committee, and (b) Project Level Approval from the Registrar of Births, Deaths and Marriages.
  • Providing data: provision of a confidentialised (often aggregated) dataset in an appropriately secured manner such as access to remote secure servers or encrypted archives accessed on local disk media, with security determined by (a) the nature of the data and (b) any project management related criteria.

Managing Access

Procedures for managing access are shown in Figure 2 (and in the Appendix, Figure 5). These activities are intended to maintain information on the current state of projects using the mortality data, and to report any changes in situation to the State Registries. The User Administrator is responsible for conduct of the “Managing Access” procedures.

The process is initiated by running a query on the ANU-User-DB to make a list of all Projects and Users, and then each Project is sent a reminder to report any changes in Project Status (sent annually to coincide with a similar reminder sent by the ANU Human Research Ethics Committee). The purpose of these reminders is to ensure that Project management plans continue to consider data security as a primary concern, even during long multi-year projects where many project management and staffing issues inevitably arise.

Once a response is received, the User Administrator then enters the relevant information into the ANU-User-DB, and if the project has been concluded will then initiate the final “Ending Access” process.

Ending Access

The procedure for ending access aims to ensure that data are both securely and sustainably stored. It is very important that files used for authorised projects are never re-used in un-authorised projects, but that future researchers may have the opportunity to create an authorised project and potentially replicate historical analyses. This is an important part of reproducible research and the robust practice of scientific enquiry.

The process user administrators go through to set up users

When adding users go to the oraphi and track their permissions

lodge request in user db

  • new study into nceph ddiindexdb
  • new file with filelocation = brawns EWEDB schema, working directory
  • in primary database record (not project) add access requestor, accessors add accessor, date to end
  • finalise approval

r-nceph

production goes through David Fisher; testbed done by me

brains

linux user

adduser userxxx
passwd userxxx 
# (I use R to randomly generate passwords by mixing words, letters,
# numbers and symbols) 
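As an aside, a random password can also be generated in the shell (this assumes openssl is installed):

```shell
# a random password: base64 encoding of 12 random bytes (16 characters)
openssl rand -base64 12
```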

mysql (check ddiindex)

mysql -u root -p security
select * from UserEJB;
insert into UserEJB (id, password, admin) values ('userxxx', 'passwd', 0);
oracle (oraphi)

  • log in as admin user to ddiindexdb workspace and go to administration,
  • on the right there is a tasks pane with create user

github

  • downstream users can just download zips from github
  • collaborators need to sign up to github
  • will create a walk through for those

brawn

postgres

Need to find out if this 0.0.0.0/0 is a big no-no?

# first edit your pg_hba.conf file
# under /home/pgsql/9.2/data/pg_hba.conf I added a super user from my ip
# address and allowed all the www ip addresses access
host    all             postgres        my.desk.ip.address/32       md5
host    all             gislibrary      0.0.0.0/0                   md5
# and save.  now
su - postgres
psql postgres postgres
CREATE ROLE userxxx LOGIN PASSWORD 'xxxxx';
GRANT gislibrary TO userxxx;
CREATE SCHEMA userxxx;
grant ALL on schema userxxx to userxxx;
GRANT ALL ON ALL TABLES IN SCHEMA userxxx TO userxxx;
grant ALL on all functions in schema userxxx to userxxx;
grant ALL on all sequences in schema userxxx to userxxx; 
# might want to set a date for expiry?  now 
select pg_reload_conf();
# or if not connected just service postgresql-9.2 restart

geoserver

  • general users don't need to log in
  • admin users can log on as admin?

alliance wiki

https://alliance.anu.edu.au/welcome/ uses their anu password

email details without password

Dear user,

The new EWEDB server is now ready. You can configure your Kepler environment by following the tutorial at http://swish-climate-impact-assessment.github.io/2013/05/hello-ewedb/

The server details are: database = ewedb host = ipaddress user = user_name

Your password will be sent to you as an SMS to your designated mobile phone number.

NB Please note that the connection information will be saved in plain text to your user profile. As this presents a security risk we request that you only use this service from a secure computer.

Thank you, Ivan Hanigan Data Management Officer. National Centre for Epidemiology and Population Health. College of Medicine, Biology and Environment. Australian National University Canberra, ACT, 0200. Ph: +61 2 6125 7767. Fax: +61 2 6125 0740. Mob: 0428 265 976. CRICOS provider #00120C.

text message the set password

This solution is adequate. Might want to invest in a sacrificial SIM card to avoid having too many phone calls?

edits to oraphi via sql

update accessors 
set othermat = 'phone number'
where accessornames = 'username';

revocation

Backups

General

maintenance

nearline for potential restore

archive and remove

General concerns

Brains

Backup oraphi

Needs to be done from Brains as it needs the oracle client set up.

sync the home directory

Brawn

Find out how big is it?

AKA brawn-dbsize
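(A sketch of how to check, run on the Brawn; the database name ewedb is an assumption:)

```shell
su - postgres
psql -c "SELECT pg_size_pretty(pg_database_size('ewedb'));"
```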

Dump and download it to a secure computer

file system backup


pgdump

Backup the pg_dump to your workplace or another VM on the Research Cloud.

First do this to setup the backup directory

Optionally you could set up a scheduled cron job

Then add this script.
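(The script itself is not shown in this export; a minimal sketch of such a backup script, with the paths and database name assumed:)

```shell
#!/bin/sh
# dump the ewedb database to a dated, compressed archive
BACKUP_DIR=/home/backups
DATE=`date +%Y-%m-%d`
su - postgres -c "pg_dump -Fc ewedb" > $BACKUP_DIR/ewedb_$DATE.dump
# then scp/rsync the dump off to the secure location
```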

or simple home backup

(set up passwordless ssh: http://www.linuxproblem.org/art_9.html)

crontab -e
# add this line, then save with :wq
15 0 * * * rsync -arvR --delete home [email protected]:/

On the backup machine, vi /etc/sysconfig/iptables (to allow ssh in from the prod server).

Also go to /etc/hosts.allow and allow this machine.

from the remote machine copy public key to the backup user’s home directory on the backup machine, like this: scp ~/.ssh/id_rsa.pub [email protected]:

Copy the public SSH key into the list of authorised keys for that user:

mkdir -p .ssh
cat id_rsa.pub >> .ssh/authorized_keys

Then lock down the permissions on .ssh and its contents, since the SSH server is fussy about this.

chmod 700 .ssh
chmod 400 .ssh/authorized_keys

That’s it, passwordless-SSH is now set up: log out of the server then log back in:

Now set up the remote prod server to send its home dir at a regular time to the backup server:

crontab -e
15 0 * * * rsync -arvR --delete home [email protected]:/home/brains_backup/
# save with ESC :wq

Restore the database dump into a new database on another machine.

launch a new Nectar VM from the snapshot image

mount the 2nd disc and load/restore the postgres db and data into it

Disaster Recovery Plan

test a snapshot

should test snapshots 3-monthly?

cleaning up links on github

In case of a completely failed VM, launching a fresh VM from a snapshot will give a different IP address, so GitHub and other links, and users' bookmarks, will need to be revised.

60GB disk is not being saved in snapshots

Will also need to investigate the restore procedure (and security) for the secondary disc.

Restore ORAPHI

This should have restored OK with data intact from the last snapshot. If not, first set up Oracle and the tables, get the most recent backup of the tables, and load the tables into the new ORAPHI. In the event of an upgrade to a tested replacement, you may find that the contents of the running VM being retired are more current than those of the replacement server, so you will need to sync them.

Examples

Pumilio for Bioacoustics

~/projects/pumilio-bushfm/index.org