CDSW INSTALL

This project is all about installing a CDSW cluster using Director on any of the Big Three Cloud Providers.

The basic idea is to have a single cluster definition shared across multiple cloud providers, and to make it very simple for a user to say ‘I want this cluster on cloud provider X, or cloud provider Y’, confident that the cluster definition is the same. In other words, we separate the cluster configuration that is independent of any cloud provider from that which is unique to each provider, and make it easy for the user to indicate which cloud provider to use.

This project is focused on making this easy, not on exposing the end user to every possible configuration alternative.

We support and test on three cloud providers (AWS, Azure and GCP). The user chooses which cloud provider to use by choosing the top-level provider conf file (aws.conf, azure.conf or gcp.conf).

Building a cluster this way takes about an hour, after which it can take up to another HOUR for CDSW to also be ready - I’ve seen this on Azure. On AWS and GCP, CDSW typically takes around 10-20 minutes to be ready to deliver service.

If you want to struggle then stop reading right here, and just whack at it. Lots of people have gone that route and enjoyed the exercise.

If you’d prefer to get your job done and a CDSW cluster constructed, continue reading …

Workflow

This project is based upon the following workflow; a shell sketch of the whole flow appears right after the list. See subsequent sections for details:

  • Install pre-requisites (i.e. Director and optionally an MIT KDC on the Director instance)
  • Get this repository onto the Director instance (git, scp … however)
  • Edit the appropriate files to reflect your environment
  • Add the necessary SECRET and ssh key files
  • Bootstrap the cluster
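For orientation, here is a minimal sketch of that workflow in shell, assuming you fetch the project with git and target AWS; the repository URL, file names and the admin/admin credentials come from the rest of this README, while the key file name is purely illustrative.

# On the Director instance: fetch the project (scp works just as well)
git clone https://github.com/TobyHFerguson/cdsw_install.git
cd cdsw_install

# Edit the provider-specific properties to reflect your environment
vi aws/provider.properties aws/ssh.properties

# Add the SECRET file (ignored by git) and the ssh key your ssh.properties expects
vi aws/SECRET.properties      # AWS_SECRET_ACCESS_KEY=...
cp ~/mykey.pem aws/           # illustrative name - use your own key file

# Bootstrap from the top-level directory (where common.conf lives)
cloudera-director bootstrap-remote aws.conf --lp.remote.username=admin --lp.remote.password=admin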

It will make more sense if you read the Project Structure section.

Pre-requisites

Installing Director

AWS

I simply set up a VM (4 CPUs, 16G RAM) and then use install_director.sh to install Director.

GCP

I simply set up a VM (4 CPUs, 16G RAM) and then use install_director.sh to install Director.

GCP Director Configuration

For GCP you will need to ensure that the plugin supports rhel7. Do this by adding the following line to your google.conf file. This file should be located in the plugin directory /var/lib/cloudera-director-plugins/google-provider-*/etc (where the * matches the version of your plugin - something like 1.0.4). You will likely have to create your own copy of google.conf by copying the google.conf.example located in the same directory. Note that the exact path to the relevant image is obtained by navigating to GCP’s ‘Images’ section and finding the corresponding OS/URL pair.

rhel7 = "https://www.googleapis.com/compute/v1/projects/rhel-cloud/global/images/rhel-7-v20171025"

Assuming gcloud is on your path on that Director instance, the following script will do exactly what you need:

sudo tee /var/lib/cloudera-director-plugins/google-provider-*/etc/google.conf 1>/dev/null <<EOF
google {
  compute {
    imageAliases {
      centos6="$(gcloud compute images list  --filter='name ~ centos-6-v.*' --uri)",
      centos7="$(gcloud compute images list  --filter='name ~ centos-7-v.*' --uri)",
      rhel6="$(gcloud compute images list  --filter='name ~ rhel-6-v.*' --uri)",
      rhel7="$(gcloud compute images list  --filter='name ~ rhel-7-v.*' --uri)"
    }
  }
}
EOF

Azure

Create a Cloudera Director instance using the Microsoft Azure Marketplace. In keeping with minimizing what you have to do, this project assumes you have chosen the defaults wherever possible (e.g. networking).

You’ll need to note:

  • the Resource Group that the director instance is created in
  • The Region that the Resource Group is setup in
  • the public domain name prefix of the director instance (i.e. the hostname and instance name of the director VM)
  • the host fqdn suffix (aka Private DNS domain name). This is the DNS zone in which the Director and cluster will be constructed.
  • the private IP address of the director instance that is created (if you’re going to put an MIT KDC on the Director instance)

Installing MIT Kerberos (optional)

If I choose to use MIT Kerberos I install the MIT KDC on the Director VM, no matter which cloud provider I’m using.

I do that using install_mit_kdc.sh to install an MIT KDC. (There’s also install_mit_client.sh to create a client for testing purposes.)

Preparation

  • Choose the cloud provider you’re going to work with and edit the $PROVIDER/*.properties and $PROVIDER/SECRET files appropriately.
  • Ensure that all the files (including the SSH key file) are available to Director (i.e. copy or clone as necessary to the Director server machine).
  • Ensure that the SECRET files are in place
  • Ensure that the $PROVIDER/kerberos.properties file is either absent (if you don’t want a kerberized cluster) or present and correct; in particular, ensure that the KDC_HOST_IP property is set to the IP address of the KDC server host (which should also be the Director host). Note that it’s the IP address you should use here, because of a CDSW/Kubernetes defect: DSE-1796. A hedged sketch of such a file follows this list.
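As an illustration only, here is a sketch of an MIT-flavoured kerberos.properties, borrowing the property names from the ActiveDirectory example in the appendix and the realm, principals and nip.io trick described elsewhere in this README; the KDC_TYPE value and the 10.0.0.82 address are assumptions you must replace with your own.

krbAdminUsername=cm/[email protected]
krbAdminPassword=Passw0rd!
KDC_TYPE=MIT KDC
KDC_HOST=kdc.10.0.0.82.nip.io
KDC_HOST_IP=10.0.0.82
SECURITY_REALM=HADOOPSECURITY.LOCAL
KRB_MANAGE_KRB5_CONF=true
KRB_ENC_TYPES=aes256-cts-hmac-sha1-96 aes128-cts-hmac-sha1-96 arcfour-hmac-md5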

Cluster Creation

  • Execute a director bootstrap command using the cloud provider you chose, but make sure you do it from the top directory (i.e. the one where the common.conf file is located).
cloudera-director bootstrap-remote $PROVIDER.conf --lp.remote.username=admin --lp.remote.password=admin

See No provider for what happens if you’re in the wrong directory.

Post Creation

  • Once completed, use your cloud provider’s console to find the public IP address (e.g. 104.92.37.53) of the CDSW instance. Its name in the cloud provider’s console will begin with cdsw-.
  • You can reach the CDSW at cdsw.104.92.37.53.nip.io. See NIP.io tricks for details about how that nip.io stuff works.

All nodes in the cluster will contain the user cdsw. That user’s password is Cloudera1. (If you used my MIT KDC installation scripts from below then you’ll also find that this user’s Kerberos username and password are cdsw and Cloudera1 respectively.)
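As a quick smoke test, and assuming the example public IP above, you can verify both the nip.io resolution and the CDSW endpoint from any machine with dig and curl installed:

dig +short cdsw.104.92.37.53.nip.io                                        # prints 104.92.37.53
curl -s -o /dev/null -w '%{http_code}\n' http://cdsw.104.92.37.53.nip.io   # prints an HTTP status code once CDSW answers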

Project Structure

File Organization

Overview

The system comprises a set of files, some common across cloud providers, and some specific to a particular cloud provider. The common files (and those which indicate which cloud provider to use) are all in the top-level directory; the cloud-provider-specific files are in cloud-provider-specific directories.

File Kinds

There are three kinds of files:

  • Property Files - You are expected to modify these. They match the *.properties shell pattern and use the Java Properties format
  • Conf files - You are not expected to modify these. They match the *.conf shell pattern and use the HOCON format (a superset of JSON).
  • SECRET files - these have the prefix SECRET and are used to hold secrets for each provider. The exact format is provider specific.

The intent is that those items that you need to edit are in a format (i.e. *.properties files) that is easy to edit, whereas those items that you don’t need to touch are in the harder to edit HOCON format (i.e. *.conf files).

Directory Structure

The top level directory contains the main conf files (aws.conf, azure.conf & gcp.conf). These are the files that indicate which cloud provider is to be used.

The aws, azure and gcp directories contain the files relevant to each cloud provider. We’ll reference the general notion of a provider directory using the $PROVIDER nomenclature, where $PROVIDER takes the value aws, azure or gcp.

The main configuration file is $PROVIDER.conf. This file itself includes the files needed for the specific cloud provider. We will only describe the properties files here:

  • $PROVIDER/provider.properties - a file containing the provider configuration for the given cloud provider
  • $PROVIDER/ssh.properties - a file containing the details required to configure passwordless ssh access into the machines that director will create.
  • $PROVIDER/kerberos.properties - an optional file containing the details of the Kerberos Key Distribution Center (KDC) to be used for Kerberos authentication. (See Kerberos Tricks below for details on how to easily set up an MIT KDC and use it.) If kerberos.properties is provided then a secure cluster is set up; if it is not provided then an insecure cluster will be set up.

SECRET files

SECRET files are ignored by Git and you must construct them yourself. We recommend setting their mode to 600, although that is not enforced anywhere.
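For example, from the top-level directory (only the file for your chosen provider needs to exist):

chmod 600 aws/SECRET.properties azure/SECRET.properties gcp/SECRET.json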

AWS

The secret file for AWS is aws/SECRET.properties. It is in Java Properties format and contains the AWS secret access key:

AWS_SECRET_ACCESS_KEY=

Mine, with dots hiding characters from the secret key, looks like:

AWS_SECRET_ACCESS_KEY=53Hrd................r0wiBbKn3

If you fail to set AWS_SECRET_ACCESS_KEY then you’ll find that cloudera-director silently fails, but grepping for AWS_SECRET in the local log file will reveal all:

[centos@ip-10-0-0-239 ~]$ unset AWS_ACCESS_KEY_ID #just to make sure its undefined!
[centos@ip-10-0-0-239 ~]$ cloudera-director bootstrap-remote filetest.conf --lp.remote.username=admin --lp.remote.password=admin
Process logs can be found at /home/centos/.cloudera-director/logs/application.log
Plugins will be loaded from /var/lib/cloudera-director-plugins
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=256M; support was removed in 8.0
Cloudera Director 2.4.0 initializing ...
[centos@ip-10-0-0-239 ~]$ 

Looks like it’s failed, right? It doesn’t continue on, and there’s no error message! But if you execute:

[centos@ip-10-0-0-239 ~]$ grep AWS_SECRET ~/.cloudera-director/logs/application.log
com.typesafe.config.ConfigException$UnresolvedSubstitution: filetest.conf: 28: Could not resolve substitution to a value: ${AWS_SECRET_ACCESS_KEY}

You’ll discover the problem! (Or there’s another problem, in which case you should look in that log file for details.)

Azure

Within Azure, applications requiring access to an account are registered in the tenant, and are assigned an authentication key, otherwise known as a client secret. This is documented in the Use portal to create an Azure Active Directory application and service principal that can access resources document. Within that document the section Get application ID and authentication key provides the details to get the application ID and authentication key, or client secret.

The secret file for Azure is called SECRET.properties. It contains a single key/value pair, where the key is CLIENTSECRET.

Here’s my azure/SECRET.properties file:

CLIENTSECRET=jhwf4Gf+ ... zD+e3k=

GCP

The secret file for GCP is called SECRET.json. It contains the full Google secret key, in JSON format, that you obtained when you created your Google service account.

Mine, with characters of the private key id and lines of the private key replaced by dots, looks like:

{
  "type": "service_account",
  "project_id": "gcp-se",
  "private_key_id": "b27f..................66fea",
  "private_key": "-----BEGIN PRIVATE KEY-----\nMIIEvgIBADANBgkqhkiG9w0BAQEFAASCBKgwggSkAgEAAoIBAQDMUKtOk000wkvJ\np/ZdwfkbpowUGMqpn2a0oQ9eTwIaLnPvrTIP3JcibWU7xkzoPXlD4hiANlkSqDqy
.
.
.
.
.
.
UC2sMUZ1rtLCv14qg4iiXuA/RExTs1zRaZZ0r4c\nTDiZwBJEbs0flCAziv7mJ4TZ3LfGKCtrTOhUWRw/jfDHP+uJOpH2isGmytZ7uWVN\ndfllnxLITzHEQEMh0rbc/g3n\n-----END PRIVATE KEY-----\n",
  "client_email": "[email protected]",
  "client_id": "108988546221753267035",
  "auth_uri": "https://accounts.google.com/o/oauth2/auth",
  "token_uri": "https://accounts.google.com/o/oauth2/token",
  "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
  "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/tobys-service-account%40gcp-se.iam.gserviceaccount.com"
}

Troubleshooting

There are two logs of interest:

  • client log: $HOME/.cloudera-director/logs/application.log on client machine
  • server log: /var/log/cloudera-director-server/application.log on server machine

If the cloudera-director client fails before communicating with the server you should look in the client log. Otherwise look in the server log.

The server log can be large - I truncate it frequently (i.e. echo > /var/log/cloudera-director-server/application.log) while the Director server is running, especially before using a new conf file! Don’t simply delete the file; doing so will mess up the Director (unless the Director server is stopped).

No provider

If you see this:

* No provider configuration block found

then you’ve likely executed cloudera-director bootstrap-remote in the $PROVIDER directory. You need to be in the top directory (where the common.conf file is) and execute it there.

GCP

No Plugin

If the client fails with this message:

* ErrorInfo{code=PROVIDER_EXCEPTION, properties={message=Mapping for image alias 'rhel7' not found.}, causes=[]}

then you’ve not configured the plugin for GCP, as detailed in the GCP Director Configuration section.

Old Plugin

If the client fails thus:

* Requesting an instance for Cloudera Manager ............ done
* Installing screen package (1/1) .... done
* Suspended due to failure ...

and the server log contains something like this:

peers certificate marked as not trusted by the user

then you’ve got a plugin configured, but it’s out of date. Update it, as per the GCP Director Configuration section.

Limitations & Issues

Relies on NIP.io tricks to make it work.

Requires that the CDSW port be on the public internet.

Appendix

NIP.io tricks

NIP.io is a public DNS service that resolves names containing an embedded IP address to that address. A simple explanation: if you have your KDC at IP address 10.3.4.6, say, then you can refer to it as kdc.10.3.4.6.nip.io and this name will resolve to 10.3.4.6 (indeed, foo.10.3.4.6.nip.io will likewise resolve to the same address). (Note that earlier releases of this project used xip.io, but that’s hosted in Norway; for me in the USA nip.io, hosted in the Eastern US, works better.)
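You can watch this happen from any shell with dig installed; note that the hostname prefix is arbitrary, which is what makes the trick useful:

dig +short kdc.10.3.4.6.nip.io    # prints 10.3.4.6
dig +short foo.10.3.4.6.nip.io    # prints 10.3.4.6 as well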

This technique is used in two places:

  • In the director conf file, to specify the IP address of the KDC - instead of messing around with bind or /etc/hosts in a bootstrap script, simply set the KDC_HOST to kdc.A.B.C.D.nip.io (choosing appropriate values for A, B, C & D as per your setup).
  • When the cluster is built, you will access the CDSW at the public IP address of the CDSW instance. Let’s assume that that address is C.D.S.W (appropriate, some might say) - then the URL to access that instance would be http://cdsw.C.D.S.W.nip.io.

This is great for hacking around with ephemeral devices such as VMs and Cloud images!

Useful Scripts

Server Config

In /var/kerberos/krb5kdc/kdc.conf:

[kdcdefaults]
 kdc_ports = 88
 kdc_tcp_ports = 88

[realms]
 HADOOPSECURITY.LOCAL = {
 acl_file = /var/kerberos/krb5kdc/kadm5.acl
 dict_file = /usr/share/dict/words
 admin_keytab = /var/kerberos/krb5kdc/kadm5.keytab
 supported_enctypes = aes256-cts-hmac-sha1-96:normal aes128-cts-hmac-sha1-96:normal arcfour-hmac-md5:normal
 max_renewable_life = 7d
}

In /var/kerberos/krb5kdc/kadm5.acl I set up any principal with the /admin extension as having full rights:

*/[email protected]    *

I then execute the following to set up the users etc.:

sudo kdb5_util create -P Passw0rd!
sudo kadmin.local addprinc -pw Passw0rd! cm/admin
sudo kadmin.local addprinc -pw Cloudera1 cdsw

sudo systemctl start krb5kdc
sudo systemctl enable krb5kdc
sudo systemctl start kadmin
sudo systemctl enable kadmin

Note that the CM principal and password are cm/[email protected] and Passw0rd! respectively.
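To confirm the KDC is answering, run the following from the KDC host (kinit prompts for the Passw0rd! password set above):

kinit cm/[email protected]
klist    # should show a valid krbtgt/HADOOPSECURITY.LOCAL ticket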

Client Config (Managed by Cloudera Manager)

In /etc/krb5.conf I have this:

[libdefaults]
 default_realm = HADOOPSECURITY.LOCAL
 dns_lookup_realm = false
 dns_lookup_kdc = false
 ticket_lifetime = 24h
 renew_lifetime = 7d
 forwardable = true
 default_tgs_enctypes = aes256-cts-hmac-sha1-96 aes128-cts-hmac-sha1-96 arcfour-hmac-md5
 default_tkt_enctypes = aes256-cts-hmac-sha1-96 aes128-cts-hmac-sha1-96 arcfour-hmac-md5
 permitted_enctypes = aes256-cts-hmac-sha1-96 aes128-cts-hmac-sha1-96 arcfour-hmac-md5

[realms]
 HADOOPSECURITY.LOCAL = {
  kdc = 10.0.0.82
  admin_server = 10.0.0.82
 }

(Note that the IP address used is the private IP address of the Director server; this is stable across reboots so works well.)

ActiveDirectory

(Deprecated - I found this image to be unstable; it would just stop working after 3 days or so.) I use a public ActiveDirectory AMI set up by Jeff Bean, ami-a3daa0c6, to create an AD instance.

The username/password to the image are Administrator/Passw0rd!

Allow at least 5, maybe 10 minutes for the image to spin up and work properly.

The kerberos settings (which you’d put into kerberos.properties) are:

krbAdminUsername: "[email protected]"
krbAdminPassword: "Passw0rd!"
KDC_TYPE: "Active Directory"
KDC_HOST: "hadoop-ad.hadoopsecurity.local"
KDC_HOST_IP: # WHATEVER THE INTERNAL IP ADDRESS IS FOR THIS INSTANCE
SECURITY_REALM: "HADOOPSECURITY.LOCAL"
AD_KDC_DOMAIN: "OU=hadoop,DC=hadoopsecurity,DC=local"
KRB_MANAGE_KRB5_CONF: true
KRB_ENC_TYPES: "aes256-cts-hmac-sha1-96 aes128-cts-hmac-sha1-96 arcfour-hmac-md5"

(Don’t forget to drop the aes256 encryption if your images don’t have the Java Crypto Extensions installed)

Standard users and groups

I use the following to create standard users and groups, running this on each machine in the cluster:

sudo groupadd supergroup
sudo useradd -G supergroup -u 12354 hdfs_super
sudo useradd -G supergroup -u 12345 cdsw
echo Cloudera1 | sudo passwd --stdin cdsw

And then I add the corresponding HDFS home directory from a single cluster machine:

kinit cdsw
hdfs dfs -mkdir /user/cdsw
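As a sanity check, assuming the commands above succeeded, the new home directory should now be listed in HDFS:

hdfs dfs -ls /user | grep cdsw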
