This repository was archived by the owner on May 23, 2022. It is now read-only.

add TOKEN documentation, deprecate GSI, delete gratia #929

Open
wants to merge 3 commits into
base: master
172 changes: 74 additions & 98 deletions docs/other/install-gwms-frontend.md
@@ -30,7 +30,7 @@ Before Starting

Before starting the installation process, consider the following points (consulting [the Reference section below](#references) as needed):

- **User IDs:** If they do not exist already, the installation will create the Linux users `apache` (UID 48), `condor`, `frontend`, and `gratia`
- **User IDs:** If they do not exist already, the installation will create the Linux users `apache` (UID 48), `condor`, and `frontend`.
- **Network:** The VO frontend must have reliable network connectivity and be on the public internet (i.e. no NAT).
The latest version requires the following TCP ports to be open:
- 80 (HTTP) for monitoring and serving configuration to workers
@@ -54,9 +54,9 @@ Before starting the installation process, consider the following points (consult
port has changed from 9615 to 9618, and you need to update your firewall rules to reflect this change.
You can figure out which port will be used by running the following command:

``` console
``` console
condor_config_val SHARED_PORT_ARGS
```
```

For more detailed information, see [Configuring GlideinWMS Frontend](#configuring-glideinwms-frontend).

@@ -68,7 +68,8 @@ As with all OSG software installations, there are some one-time (per host) steps
- Prepare the [required Yum repositories](../common/yum.md)
- Install [CA certificates](../common/ca.md)

### Credentials and Proxies
## GSI Credentials (3.5 only)
### Credentials and Proxies

The VO Frontend will use two credentials in its interactions with the other GlideinWMS services. At this time, these will be proxy files.

@@ -108,13 +109,32 @@ x509 -in /etc/grid-security/hostcert.pem -subject -issuer -dates -noout`. You
will need that to find out information for the configuration files and the
request to the GlideinWMS Factory.

## JWT (Token) Credentials

Support for GSI authentication is ending with OSG release 3.5. Starting with 3.6, it is replaced with JWT, or **JSON Web Token**, authentication.
There are currently two types of JWT credentials used by the VO Frontend: HTCondor **IDTOKENS** and Open Science Grid **SciTokens**.
Both types are required for interactions with the other GlideinWMS services.

1. An IDTOKEN is an HTCondor-defined JWT that is used for authentication between most local and remote condor daemons, as well as VOFrontend and Factory daemons.
For a remote process to authenticate with a local condor daemon using IDTOKENS, the local operator must issue a token to the remote process owner. This is
typically done with the `condor_token_create` or `condor_token_request_approve` commands.
2. A [SciToken](https://scitokens.org/) is used to authenticate between a VOFrontend and a CE. Its purpose for the Frontend is to allow job submissions to the CEs.
There are various ways to generate and renew SciTokens, one of the most convenient being the [OSG Token Renewer](../osg-token-renewer/).
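
Because both IDTOKENS and SciTokens are JWTs, their claims can be inspected before a token is handed to a service. The following is a minimal sketch, not part of the GlideinWMS tooling: it assumes the token is a single compact-serialized JWT (three dot-separated base64url segments), and the function names, the demo subject, and the demo expiry are all hypothetical. It decodes the payload only and does **not** verify the signature.

```python
import base64
import json


def decode_jwt_claims(token: str) -> dict:
    """Return the claims (payload) of a JWT without verifying its signature."""
    parts = token.strip().split(".")
    if len(parts) != 3:
        raise ValueError("expected three dot-separated JWT segments")
    payload = parts[1]
    # JWT segments are base64url-encoded with the padding stripped; restore it.
    payload += "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))


def _b64url(data: dict) -> str:
    """Encode a dict as an unpadded base64url JSON segment (demo helper only)."""
    raw = json.dumps(data).encode()
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()


# Build a throwaway, unsigned demo token so the example runs on its own.
# The subject and expiry below are made-up values, not real credentials.
demo_token = ".".join([
    _b64url({"alg": "none"}),
    _b64url({"sub": "vofrontend@example.org", "exp": 1700000000}),
    "signature-placeholder",
])

claims = decode_jwt_claims(demo_token)
print(claims["sub"], claims["exp"])
```

In practice the token string would be read from a file instead of being built inline. Checking the `exp` claim this way can help spot an expired token, but it is no substitute for the signature validation performed by the receiving service.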



### OSG Factory access

Before installing the GlideinWMS VO Frontend you need the information about a [Glidein Factory](http://glideinwms.fnal.gov/doc.prd/factory/index.html) that you can access:

1. (recommended) OSG is managing a factory at UCSD
2. You have another Glidein Factory that you can access

!!! Note
The following instructions apply to requesting GSI access to the Factory and VOs. The exact
process for obtaining JWT access is still being worked out between Factory, VOFrontend, and CE
maintainers, but will be similar.

To request access to the OSG Glidein Factory at UCSD you have to send an email to <[email protected]> providing:

1. Your Name
@@ -197,7 +217,7 @@ root@host # yum install glideinwms-userschedd
In addition, you will need to perform the following steps:

- On the vofrontend and userschedd, modify `CONDOR_HOST` to point to your usercollector. This is in `/etc/condor/config.d/00_gwms_general.config`. You can also override this value by placing it in a new config file. (For instance, `/etc/condor/config.d/99_local_custom.config` to avoid rpmsave/rpmnew conflicts on upgrades).
- In `/etc/condor/certs/condor_mapfile`, you will need to add the DNs of each machine (userschedd, usercollector, vofrontend). Take great care to escape all special characters. Alternatively, you can use the `glidecondor_addDN` to add these values.
- (GSI) In `/etc/condor/certs/condor_mapfile`, you will need to add the DNs of each machine (userschedd, usercollector, vofrontend). Take great care to escape all special characters. Alternatively, you can use the `glidecondor_addDN` to add these values.
- In the `/etc/gwms-frontend/frontend.xml` file, change the schedd locations to match the correct server. Also change the collectors tags at the bottom of the file. More details on frontend.xml are in the following sections.
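
The advice above to escape all special characters in a DN can be illustrated with a small sketch. It assumes the common `GSI "^<regex>$" <user>` layout for lines in `condor_mapfile`; the DN, the mapped user name, and the `mapfile_line` helper are hypothetical, and DNs containing double quotes are not handled. The `glidecondor_addDN` tool mentioned above remains the supported way to add entries.

```python
import re


def mapfile_line(dn: str, user: str) -> str:
    """Build one GSI mapfile line that matches exactly the given DN."""
    # re.escape backslash-escapes regex metacharacters (for example '.' and
    # spaces), so the DN is matched literally rather than as a pattern.
    pattern = re.escape(dn)
    # Anchor the pattern so that only the full DN matches.
    return f'GSI "^{pattern}$" {user}'


# Hypothetical DN and user name, for illustration only.
example_dn = "/DC=org/DC=example/OU=Services/CN=vofrontend.example.org"
line = mapfile_line(example_dn, "vofrontend_service")
print(line)
```

The important design point is the anchoring with `^` and `$`: an unanchored or unescaped pattern could map a different certificate whose DN merely contains or resembles the intended one.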

Configuring GlideinWMS Frontend
@@ -238,6 +258,7 @@ Both the `classad_proxy` and `absfname` files should be owned by `frontend` user


:::xml
<!-- osg 3.5 and earlier -->
<security classad_proxy="/tmp/vo_proxy" proxy_DN="DN of vo_proxy"
proxy_selection_plugin="ProxyAll"
security_name="The security name, this is used by factory"
@@ -248,6 +269,17 @@ Both the `classad_proxy` and `absfname` files should be owned by `frontend` user
</credentials>
</security>

<!-- osg 3.6 and above -->
<security classad_proxy="/etc/grid-security/hostcert.pem" proxy_DN="DN of hostcert.pem"
proxy_selection_plugin="ProxyAll"
security_name="The security name, this is used by factory"
sym_key="aes_256_cbc">
<credentials>
<credential absfname="/tmp/pilot_scitoken" security_class="frontend"
trust_domain="OSG" type="scitoken"/>
</credentials>
</security>

4. The schedd information.

- The `DN` of the **VO Frontend Proxy** described previously [here](#credentials-and-proxies).
@@ -269,39 +301,51 @@ Both the `classad_proxy` and `absfname` files should be owned by `frontend` user
- The `DN` of the **VO Frontend Proxy** described previously [here](#credentials-and-proxies).
- The `node` attribute is the full hostname of the collectors (`hostname --fqdn`) and port
- The `secondary` attribute indicates whether the element is for the primary or secondary collectors (True/False).
The default HTCondor configuration of the VO Frontend starts multiple Collector processes on the host (`/etc/condor/config.d/11_gwms_secondary_collectors.config`). The `DN` and `hostname` on the first line are the hostname and the host certificate of the VO Frontend. The `DN` and `hostname` on the second line are the same as the ones in the first one. The hostname (e.g. hostname.domain.tld) is filled automatically during the installation. The secondary collector connection can be defined as sinful string for the sock case , e.g., hostname.domain.tld:9618?sock=collector16.

[Example 1]
- The default HTCondor configuration of the VO Frontend starts multiple Collector processes on the
host (`/etc/condor/config.d/11_gwms_secondary_collectors.config`). The `DN` and `hostname` on the first
line are the hostname and the host certificate of the VO Frontend. The `DN` and `hostname` on the second line
are the same as the ones in the first one. The hostname (e.g. hostname.domain.tld) is filled automatically
during the installation. The secondary collector connection can be defined as a sinful string for the sock
case, e.g., **hostname.domain.tld:9618?sock=collector16**.
- Example 1

:::xml
<collector DN="DN of main collector"
node="hostname.domain.tld:9618" secondary="False"/>
<collector DN="DN of secondary collectors (usually same as DN in line above)"
node="hostname.domain.tld:9620-9660" secondary="True"/>

!!! note
!!! note

In GlideinWMS v3.4.1, shared port only configuration is incompatible if talking to older Factories (v3.4 or older). We strongly recommend any user of GlideinWMS Frontend v3.4.1 or newer, to transition to the use of shared port for secondary collectors and CCBs.
The shared port configuration is incompatible if your Frontend is talking to Factories v3.4 or older and you'll get an error telling you to wait.
To transition to the use of shared port for secondary collectors, you have to change the collectors section in the Frontend configuration. If you are using the default port range for the secondary collectors as shown in [Example 2] below, then you should replace it with port `9618` and the sock-range as shown in [Example 1] above.
In GlideinWMS v3.4.1, shared port only configuration is incompatible if talking to older
Factories (v3.4 or older). We strongly recommend any user of GlideinWMS Frontend v3.4.1 or newer, to
transition to the use of shared port for secondary collectors and CCBs.
The shared port configuration is incompatible if your Frontend is talking to Factories v3.4 or older and you'll get an error telling you to wait.
To transition to the use of shared port for secondary collectors, you have to change the collectors section in the
Frontend configuration. If you are using the default port range for the secondary collectors as shown
in [Example 1] above, then you should replace it with port `9618` and the sock-range as shown in [Example 2] below.

If you have a more complex configuration, please read the [detailed GlideinWMS configuration](http://glideinwms.fnal.gov/doc.prd/frontend/configuration.html)
If you have a more complex configuration, please read the [detailed GlideinWMS configuration](http://glideinwms.fnal.gov/doc.prd/frontend/configuration.html)

[Example 2]
- Example 2

:::xml
<collector DN="DN of main collector"
<collector DN="DN of main collector"
node="hostname.domain.tld:9618" secondary="False"/>
<collector DN="DN of secondary collectors (usually same as DN in line above)"
<collector DN="DN of secondary collectors (usually same as DN in line above)"
node="hostname.domain.tld:9618?sock=collector0-40" secondary="True"/>

6. The CCBs information.
If you have a different configuration of the HTCondor Connection Brokering (CCB servers) from the default (usually the section is empty as the User Collectors acts as CCB if needed), you can set the connection in the CCB section the same way that User Collector information previously mentioned. Also, the same rules for transition to shared_port of the connections, apply to the CCBs.

:::xml
<ccb DN="DN of the CCB server"
If you have a different configuration of the HTCondor Connection Brokering (CCB) servers
from the default (usually the section is empty, as the User Collectors act as CCBs if needed),
you can set the connection in the CCB section the same way as the User Collector information
described previously. Also, the same rules for the transition to shared_port connections
apply to the CCBs.

:::xml
<ccb DN="DN of the CCB server"
node="hostname.domain.tld:9618"/>
<ccb DN="DN of the CCB server"
<ccb DN="DN of the CCB server"
node="hostname.domain.tld:9618?sock=collector0-40" secondary="True"/>

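The `sock=collector0-40` notation used in the collector and CCB `node` attributes above can be read as shorthand for a series of shared-port socket names. The sketch below expands such a value into one address per socket. It is an illustration only, not GlideinWMS code: it assumes the range is inclusive at both ends, and `expand_sock_range` is a hypothetical helper name.

```python
import re
from typing import List


def expand_sock_range(node: str) -> List[str]:
    """Expand 'host:port?sock=name<first>-<last>' into one address per socket."""
    match = re.fullmatch(
        r"(?P<prefix>.+\?sock=\D+)(?P<first>\d+)-(?P<last>\d+)", node
    )
    if match is None:
        # No sock range present (e.g. '...?sock=collector16'): return as-is.
        return [node]
    first, last = int(match["first"]), int(match["last"])
    # Assumption: both ends of the range are included.
    return [f"{match['prefix']}{i}" for i in range(first, last + 1)]


addresses = expand_sock_range("hostname.domain.tld:9618?sock=collector0-40")
print(len(addresses), addresses[0], addresses[-1])
```

All expanded addresses share port `9618`, which is why the shared port setup needs only that single port open instead of the older `9620-9660` range.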

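Since the credential files named in the `<security>` element of step 3 should be owned by the `frontend` user, it can help to list them before checking ownership. The following sketch parses a fragment shaped like the token-based example above using the Python standard library. The `list_credentials` helper is hypothetical, and a real `frontend.xml` contains many more elements than this fragment.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

# Fragment copied from the token-based <security> example in step 3.
FRAGMENT = """
<security classad_proxy="/etc/grid-security/hostcert.pem" proxy_DN="DN of hostcert.pem"
          proxy_selection_plugin="ProxyAll"
          security_name="The security name, this is used by factory"
          sym_key="aes_256_cbc">
   <credentials>
      <credential absfname="/tmp/pilot_scitoken" security_class="frontend"
                  trust_domain="OSG" type="scitoken"/>
   </credentials>
</security>
"""


def list_credentials(xml_text: str) -> List[Tuple[Optional[str], Optional[str]]]:
    """Return (type, absfname) for every <credential> under <security>."""
    root = ET.fromstring(xml_text.strip())
    # Accept either a bare <security> fragment or a document that contains one.
    security = root if root.tag == "security" else root.find(".//security")
    if security is None:
        raise ValueError("no <security> element found")
    return [(c.get("type"), c.get("absfname")) for c in security.iter("credential")]


credentials = list_credentials(FRAGMENT)
print(credentials)
```

Each returned path can then be checked by hand, for example with `ls -l`, to confirm that it is owned by `frontend`.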
@@ -385,6 +429,9 @@ listed.

#### Creating a HTCondor grid mapfile.

!!! Note
This section is not needed for JWT authentication.

The HTCondor mapfile (`/etc/condor/certs/condor_mapfile`) is used for
authentication between the GlideinWMS pilot running on a remote worker node, and
the local collector. HTCondor uses the mapfile to map certificates to pseudo-users
@@ -401,7 +448,10 @@ on the local machine. It is important that you map the DN's of:
- **Frontend proxy**: The DN of the proxy that the Frontend uses to communicate with the other GlideinWMS services. Specified in the frontend.xml security element `proxy_DN` attribute:

:::xml
<security classad_proxy="/tmp/vo_proxy" proxy_DN="DN of vo_proxy" ....
<!-- osg 3.5 and lower-->
<security classad_proxy="/tmp/vo_proxy" proxy_DN="DN of vo_proxy" .... />
<!-- osg 3.6+ -->
<security classad_proxy="/etc/grid-security/hostcert.pem" proxy_DN="DN of hostcert.pem" .... />

- **Each pilot proxy**: The DN of __each__ proxy that the frontend forwards to the factory to use with the GlideinWMS pilots. This allows the GlideinWMS pilot jobs to communicate with the User Collector. Specified in the frontend.xml proxy `absfname` attribute (you need to specify the `DN` of each of those proxies):

@@ -433,7 +483,7 @@ respective proxies.
After configuring HTCondor, be sure to restart HTCondor:

:::console
root@host # service condor restart
root@host # systemctl restart condor

### Proxy Configuration

@@ -506,80 +556,8 @@ and renew the **pilot proxies** and **VO Frontend proxy**. To configure this ser
!!! note
The `[COMMON]` section is required but its contents are optional

### Adding Gratia Accounting and a Local Monitoring Page on a Production Server

You must report accounting information if you are running more than a few test jobs on the OSG .

1. Install the GlideinWMS Gratia Probe on each of your access points in your GlideinWMS installation:

:::console
root@host # yum install gratia-probe-glideinwms

2. Edit the ProbeConfig located in `/etc/gratia/condor/ProbeConfig`. First, edit the `SiteName` and `ProbeName` to be a unique identifier for your GlideinWMS access point. There can be multiple probes (with different names) per site. If you haven't already, you should register your GlideinWMS access point in [OIM](https://github.com/opensciencegrid/topology/). Then you can use the name you used to register the resource.

ProbeName="condor:<hostname>"
SiteName="HCC-GlideinWMW-Frontend"

Next, turn the probe on by setting `EnableProbe`:

EnableProbe="1"

3. Reconfigure HTCondor:

:::console
root@host # condor_reconfig

#### Optional Accounting Configuration

The following sections contain additional configuration that may be required depending on the customizations you've made to your GlideinWMS frontend installation.

##### Users without Certificates #####

If you have users that submit jobs without a certificate explicitly declared in the submit file, you will need to add `MapUnknownToGroup` to the ProbeConfig. In the file `/etc/gratia/condor/ProbeConfig`, add the value after the `EnableProbe`.

``` file hl_lines="4"
...
SuppressGridLocalRecords="0"
EnableProbe="1"
MapUnknownToGroup="1"

Title3="Tuning parameter"
...
```

Further, if you want to record all usage as coming from a single VO, you can configure the probe to override the 'guessed' VO. In the below example, replace `<ENGAGE>` with a registered VO that you would like to report as. If you don't have a VO that you are affiliated with, you may use "Engage".

``` file hl_lines="4"
...
MapUnknownToGroup="1"
MapGroupToRole="1"
VOOverride="<ENGAGE>"
...
```

##### Non-Standard HTCondor Install #####

If HTCondor is installed in a non-standard location (i.e., not RPMs, or relocated RPM outside `/usr/bin`), then you need to tell the probe where to find the HTCondor binaries. This can be done with a script with a special attribute in `/etc/gratia/condor/ProbeConfig`, `CondorLocation`. Point it to the location of the HTCondor install, such that `CondorLocation/bin/condor_version` exists.

##### New Data Directory #####

If your `PER_JOB_HISTORY_DIR` HTCondor configuration variable is different from the default value, you must update the value of `DataFolder` in `/etc/gratia/condor/ProbeConfig`. To check the value of `PER_JOB_HISTORY_DIR` run the following command:

``` console
user@host $ condor_config_val PER_JOB_HISTORY_DIR
```

##### Different collector and other customizations #####

By default the probe reports to the OSG GRACC. To change that you must edit the configuration file, `/etc/gratia/condor/ProbeConfig`, and replace the OSG production host with your desired one:

``` file
...
CollectorHost="gratia-osg-prod.opensciencegrid.org:80"
SSLHost="gratia-osg-prod.opensciencegrid.org:443"
SSLRegistrationHost="gratia-osg-prod.opensciencegrid.org:80"
...
```

### Optional Configuration

@@ -673,8 +651,7 @@ In addition to the GlideinWMS service itself, there are a number of supporting s

| Software | Service name | Notes |
|:-----------|:--------------------------------------|:----------------------------------------------------------------------------------|
| Fetch CRL | EL8: `fetch-crl.timer` <br> EL7: `fetch-crl-boot` and `fetch-crl-cron` | See [CA documentation](../common/ca.md#managing-fetch-crl-services) for more info |
| Gratia | `gratia-probes-cron` | Accounting software |
| Fetch CRL | `fetch-crl-boot` and `fetch-crl-cron` | See [CA documentation](../common/ca.md#managing-fetch-crl-services) for more info |
| HTCondor | `condor` | |
| HTTPD | `httpd` | GlideinWMS monitoring and staging |
| GlideinWMS | `gwms-renew-proxies.timer` | [Automatic proxy renewal](#proxy-configuration) |
@@ -958,7 +935,6 @@ The Glidein WMS Frontend installation will create the following users unless the
| `apache` | 48 | Runs httpd to provide the monitoring page (installed via dependencies). |
| `condor` | none | HTCondor user (installed via dependencies). |
| `frontend` | none | This user runs the glideinWMS VO frontend. It also owns the credentials forwarded to the factory to use for the glideins. |
| `gratia` | none | Runs the Gratia probes to collect accounting data (optional see [the Gratia section below](#adding-gratia-accounting-and-a-local-monitoring-page-on-a-production-server)) |

!!! warning
UID 48 is reserved by RedHat for user `apache`. If it is already taken by a different username, you will experience errors.