Merge pull request #620 from Worklytics/rc-v0.4.43
v0.4.43
eschultink authored Dec 20, 2023
2 parents a18bdfa + bcd4214 commit 18c4364
Showing 148 changed files with 8,792 additions and 822 deletions.
6 changes: 6 additions & 0 deletions .github/workflows/build-java.yaml
@@ -29,6 +29,12 @@ jobs:
          java-version: ${{ inputs.java-version }}
          # https://github.com/actions/setup-java#supported-distributions
          distribution: ${{ inputs.java-distribution }}
      - name: Cache Maven packages
        uses: actions/cache@v3
        with:
          path: ~/.m2
          key: ${{ runner.os }}-m2-v1-${{ hashFiles('**/pom.xml') }}
          restore-keys: ${{ runner.os }}-m2-v1-
      - name: Clear our artifacts from Maven cache # q: does this work!?!?!
        run: |
          rm -rf ~/.m2/repository/co/worklytics/
4 changes: 4 additions & 0 deletions CHANGELOG.md
@@ -20,6 +20,10 @@ Changes to be included in future/planned release notes will be added here.
then wildcard policy to read shared also grants read of secrets across all connectors)
- keys/salts per value kind (PII, item id, etc)

## [0.4.43](https://github.com/Worklytics/psoxy/releases/tag/v0.4.43)
* if you're using the NodeJS test tool, it will be re-installed on your next `terraform apply` due
to a dependency change.

## [0.4.41](https://github.com/Worklytics/psoxy/releases/tag/v0.4.41)
* GCP only: Compute Engine API will be enabled in the project. Newer versions of GCP terraform
provider seem to require this. You may see this in your next `terraform plan`, although it may
15 changes: 9 additions & 6 deletions README.md
@@ -120,16 +120,19 @@ must authorize the Azure Application you provision (with [provided terraform mod
below. This is done via the Azure Portal (Active Directory). If you use our provided Terraform
modules, specific instructions that you can pass to the Microsoft 365 Admin will be output for you.

| Source | Examples | Application Scopes |
|--------|----------|--------------------|
| Active Directory | [data](docs/sources/microsoft-365/directory/example-api-responses) - [rules](docs/sources/microsoft-365/directory/directory.yaml) | `User.Read.All` `Group.Read.All` |
| Calendar | [data](docs/sources/microsoft-365/outlook-cal/example-api-responses) - [rules](docs/sources/microsoft-365/outlook-cal/outlook-cal.yaml) | `User.Read.All` `Group.Read.All` `OnlineMeetings.Read.All` `Calendars.Read` `MailboxSettings.Read` |
| Mail | [data](docs/sources/microsoft-365/outlook-mail/example-api-responses) - [rules](docs/sources/microsoft-365/outlook-mail/outlook-mail.yaml) | `User.Read.All` `Group.Read.All` `Mail.ReadBasic.All` `MailboxSettings.Read` |
| Teams **beta** | [data](docs/sources/microsoft-365/msft-teams/example-api-responses) - [rules](docs/sources/microsoft-365/msft-teams/msft-teams.yaml) | `User.Read.All` `Team.ReadBasic.All` `Channel.ReadBasic.All` `Chat.Read.All` `ChannelMessage.Read.All` `CallRecords.Read.All` `OnlineMeetings.Read.All` |

NOTE: the above scopes are copied from [infra/modules/worklytics-connector-specs](infra/modules/worklytics-connector-specs).
Please refer to that module for a definitive list.

NOTE: usage of the Microsoft Teams APIs may be billable, depending on your Microsoft 365 licenses and level of Teams usage. Please review: [Payment models and licensing requirements for Microsoft Teams APIs](https://learn.microsoft.com/en-us/graph/teams-licenses)

See details: [docs/sources/microsoft-365/README.md](docs/sources/microsoft-365/README.md)

### Other Data Sources via REST APIs

18 changes: 12 additions & 6 deletions docs/bulk-file-sanitization.md
@@ -10,12 +10,18 @@ instance (GCP Cloud Function or AWS Lambda), which will read the file, sanitize
result to a corresponding `-sanitized` bucket.

You should limit the size of files processed by the proxy to 200k rows or less, to ensure processing of
any single file finishes within the run time limitations of the host platform (AWS, GCP). There is
some flexibility here based on the complexity of your rules and file schema, but we've found 200k
rows to be a conservative target.
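
If an export is larger than that, one option is to split it into several files before uploading.
Below is a minimal Python sketch, assuming a CSV whose first row is a header that should be repeated
at the top of every chunk; `split_csv` is a hypothetical helper, not part of Psoxy, and the 200k
default simply mirrors the guidance in this section. It reads the whole file into memory, which is
acceptable for a sketch but may not be for very large exports.

```python
import csv
import io
from typing import Iterator, List


def _to_csv(header: List[str], rows: List[List[str]]) -> str:
    """Serialize a header plus data rows back to CSV text (using \\n line endings)."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return out.getvalue()


def split_csv(text: str, max_rows: int = 200_000) -> Iterator[str]:
    """Yield CSV chunks of at most max_rows data rows, each starting with the header row."""
    reader = csv.reader(io.StringIO(text, newline=""))
    header = next(reader, None)  # assumption: the first row is the header
    if header is None:  # empty input: nothing to emit
        return
    batch: List[List[str]] = []
    for row in reader:  # csv.reader keeps quoted multi-line fields in one row
        batch.append(row)
        if len(batch) == max_rows:
            yield _to_csv(header, batch)
            batch = []
    if batch:  # remaining rows form a final, smaller chunk
        yield _to_csv(header, batch)
```

For example, splitting a file with 3 data rows using `max_rows=2` yields two chunks, each beginning
with the header row; each chunk would then be written as its own object in the `-input` bucket.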

### Compression

To improve performance and reduce storage costs, you should compress (gzip) the files you write to
the `-input` bucket. Psoxy will decompress gzip files before processing and then compress the
result before writing to the `-sanitized` bucket. Ensure that you set `Content-Encoding: gzip` on
all files in your `-input` bucket to enable this behavior. Note that if you are uploading files via
the web UI in GCP/AWS, it is not possible to set this metadata in the initial upload, so you cannot
use compression in that scenario.
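
When uploading from a script rather than the web UI, the `Content-Encoding` metadata can be set at
upload time. A minimal Python sketch, assuming `boto3` (AWS) or `google-cloud-storage` (GCP) is
installed and credentials are already configured; the helper names, the bucket/key arguments and the
`text/csv` content type are illustrative assumptions, not something Psoxy prescribes.

```python
import gzip


def gzip_bytes(data: bytes) -> bytes:
    """Gzip-compress file contents before writing them to the -input bucket."""
    return gzip.compress(data)


def upload_gzipped_to_s3(bucket: str, key: str, data: bytes) -> None:
    """Upload to S3 with `Content-Encoding: gzip` set on the object."""
    import boto3  # imported here so gzip_bytes() works without boto3 installed

    boto3.client("s3").put_object(
        Bucket=bucket,
        Key=key,
        Body=gzip_bytes(data),
        ContentEncoding="gzip",
        ContentType="text/csv",  # assumption: the input files are CSV
    )


def upload_gzipped_to_gcs(bucket: str, key: str, data: bytes) -> None:
    """Upload to GCS with `Content-Encoding: gzip` set on the object."""
    from google.cloud import storage  # imported here for the same reason

    blob = storage.Client().bucket(bucket).blob(key)
    blob.content_encoding = "gzip"  # set before uploading so it is sent with the object
    blob.upload_from_string(gzip_bytes(data), content_type="text/csv")
```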

## Sanitization Rules

50 changes: 50 additions & 0 deletions docs/faq-security.md
@@ -53,3 +53,53 @@ data. There is no database to be vulnerable to SQL injections.
A WAF could make sense if you are using Psoxy to expose an on-prem, in-house built tool to
Worklytics that is otherwise not exposed to the internet.

## Can I deploy Psoxy instances in a VPC?

As of November 2023, that is not directly supported by the Worklytics-provided Terraform modules.
If you have a use-case that you believe requires a VPC, please let us know.

The usual reason for deploying infra in a VPC is to isolate a complex, potentially insecure component,
such as a VM with lots of packages/software/etc. running on it. Proxy instances are small Java bundles,
compiled, packaged, and deployed by your team into serverless infrastructure sandboxes
(AWS Lambda, GCP Cloud Functions, etc.). The code is source-available for your team to review; it
undergoes full automated testing and continual vulnerability scanning.

Access to proxy instances is based on Workload Identity Federation (OIDC) and IAM policies, which is
equivalent to how internal access between your AWS/GCP cloud resources is currently secured.

No workplace data is stored by proxy instances. For the connectors that sanitize bulk file data,
this data is only persisted into GCS/S3 buckets. AWS/GCP do not support deploying such buckets
"in a VPC". You could write an IAM policy that grants access to such buckets *from* a VPC, but
this is functionally equivalent to an IAM policy that grants access to such buckets from a specific
Lambda/Cloud Function.

## Is Domain-wide Delegation (DWD) for Google Workspace secure?

DWD deserves scrutiny. It is a broad grant of data access, generally covering all Google accounts in
your workspace domain. And the UX (pasting a numeric service account ID and a CSV of OAuth scopes)
creates potential for errors/exploitation by malicious actors.

To use DWD securely, you must trust the numeric ID; in a typical scenario, where someone or some
web app is asking you to paste this ID into a form, this is a risk. It is NOT a 3-legged OAuth
flow, where the redirects between the requesting app and Google's consent screen show you which
app is being granted access.

However, the Psoxy workflow mitigates this risk in several ways:
- DWD grants required for Psoxy connections are made to *your own service accounts, provisioned
by you and residing in your own GCP project*. They do not belong to a 3rd party. As such, you
need not trust a number shown to you in a web app or email; you can use the GCP web console,
CLI, etc., to confirm the validity of the service account ID independently.
- Your GCP logs can provide transparency into the usage of the service account, to validate what
data it is being used to access, and from where.
- You remain in control of the only key that can be used to authenticate as the service account -
you may revoke/rotate this key at any moment should you suspect malicious activity.
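
As an illustration of that independent check: the numeric ID an admin pastes into the Google
Workspace Admin console is the service account's OAuth 2.0 client ID, which
`gcloud iam service-accounts describe SA_EMAIL --format=json` reports as `oauth2ClientId`. The
hypothetical Python helper below only compares that JSON output against the ID you were given; it
does not run `gcloud` itself, and `SA_EMAIL` is a placeholder.

```python
import json


def client_id_matches(describe_json: str, expected_id: str) -> bool:
    """Return True if the service account's numeric OAuth 2.0 client ID equals expected_id.

    describe_json is assumed to be the JSON printed by:
        gcloud iam service-accounts describe SA_EMAIL --format=json
    """
    account = json.loads(describe_json)
    actual = str(account.get("oauth2ClientId", "")).strip()
    # A missing or empty field never counts as a match.
    return bool(actual) and actual == expected_id.strip()
```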

Hence, using DWD via Psoxy is more secure than the typical DWD scenario that many security
researchers complain about.

If you remain uncomfortable with DWD, a private Google Marketplace App is a possible alternative,
albeit more tedious to configure. It requires a dedicated GCP project, with additional APIs enabled
in the project.




16 changes: 16 additions & 0 deletions docs/gcp/troubleshooting.md
@@ -38,3 +38,19 @@ If some resources seem to not be properly provisioned, try `terraform taint` or
to force re-creation. Use `terraform state list | grep` to search for specific resource ids.


## Error 400: One or more users named in the policy do not belong to a permitted customer

If you receive an error such as:

```
Error: Error applying IAM policy for cloudfunctions cloudfunction googleapi: Error 400: One or more users named in the policy do not belong to a permitted customer.
```

This may be due to an [Organization Policy](https://cloud.google.com/resource-manager/docs/organization-policy/overview)
that restricts the domains that can be used in IAM policies. See
https://cloud.google.com/resource-manager/docs/organization-policy/restricting-domains

You may need to define an exception for the GCP project in which you're deploying the proxy, or add
the domain of your Worklytics Tenant SA to the list of allowed domains.


51 changes: 49 additions & 2 deletions docs/sources/google-workspace/README.md
@@ -111,8 +111,28 @@ provide can be configured to do this for you if applied regularly.
More information:
https://developers.google.com/workspace/guides/auth-overview


To initially authorize each connector, a sufficiently privileged Google Workspace Admin must make a
Domain-wide Delegation grant to the OAuth Client you create, by pasting its numeric ID and a CSV of
the required OAuth Scopes into the Google Workspace Admin console. This is a one-time setup step.
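
The scopes field in the Admin console takes a single comma-separated list. A small Python sketch for
assembling it, assuming you already have the scopes for a connector; `scopes_csv` is a hypothetical
helper, and the Directory API scopes used to exercise it are illustrative only, not the definitive
list for any Psoxy connector.

```python
from typing import Iterable


def scopes_csv(scopes: Iterable[str]) -> str:
    """Join OAuth scopes into one comma-separated string for the DWD grant.

    Whitespace is trimmed, blank entries are dropped, and duplicates are
    removed while keeping the original order.
    """
    cleaned = (s.strip() for s in scopes)
    return ",".join(dict.fromkeys(s for s in cleaned if s))
```

For example, passing `https://www.googleapis.com/auth/admin.directory.user.readonly` twice plus
`https://www.googleapis.com/auth/admin.directory.group.readonly` produces a string listing each
scope once, separated by commas.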

If you use the provided Terraform modules (namely, `google-workspace-dwd-connection`), a TODO file
with detailed instructions will be created for you, including the actual numeric ID and scopes
required.

Note that while Domain-wide Delegation is a broad grant of data access, the risk of using it with
the proxy is mitigated in several ways, because the GCP Service Account resides in your own GCP
project and remains under your organization's control, unlike the most common Domain-wide Delegation
scenarios, which have been the subject of criticism by security researchers. In particular:
- you may directly verify the numeric ID of the service account in the GCP web console, or via the
GCP CLI; you don't need to take our word for it.
- you may monitor and log the use of each service account and its key as you see fit.
- you can ensure there is never more than one active key for each service account, and rotate
keys at any time.
- the key is only used from infrastructure (GCP Cloud Function or AWS Lambda) in your environment;
you should be able to reconcile logs and usage between your GCP and AWS environments should you
desire to ensure there has been no malicious use of the key.
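
For the point about never having more than one active key, a hypothetical Python check is sketched
below. It assumes the JSON printed by
`gcloud iam service-accounts keys list --iam-account=SA_EMAIL --managed-by=user --format=json`,
where each key is assumed to carry `name` and `keyType` fields and, when disabled,
`"disabled": true`; it does not consider key expiry, and `SA_EMAIL` is a placeholder.

```python
import json
from typing import List


def active_user_managed_keys(keys_json: str) -> List[str]:
    """Return the resource names of enabled, user-managed keys for one service account."""
    keys = json.loads(keys_json)
    return [
        key["name"]
        for key in keys
        if key.get("keyType") == "USER_MANAGED" and not key.get("disabled", False)
    ]


def assert_at_most_one_active_key(keys_json: str) -> None:
    """Raise RuntimeError if more than one enabled user-managed key exists."""
    active = active_user_managed_keys(keys_json)
    if len(active) > 1:
        raise RuntimeError(f"expected at most 1 active key, found {len(active)}: {active}")
```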

### Provisioning API clients without Terraform

While not recommended, it is possible to set up Google API clients without Terraform, via the GCP web
console:
@@ -129,3 +149,30 @@ console:

NOTE: you could also use a single Service Account for everything, but you will need to store its
key repeatedly in AWS/GCP as the `SERVICE_ACCOUNT_KEY` for each of your Google Workspace connections.

## Domain-wide Delegation Alternative
If you remain uncomfortable with Domain-wide Delegation, a private Google Marketplace App is a
possible, if tedious and harder to maintain, alternative. Here are some trade-offs:

Pros:
- Google Workspace Admins may perform a single Marketplace installation, instead of multiple DWD
grants via the admin console
- "install" from the Google Workspace Marketplace is less error-prone/exploitable than copy-pasting
a numeric service account ID
- visual confirmation of the OAuth scopes being granted by the install
- ability to "install" for an Org Unit, rather than the entire domain

Cons:
- you must use a dedicated GCP project for the Marketplace App; "installation" of a Google
Marketplace App grants all the service accounts in the project access to the listed OAuth scopes.
You must understand that the OAuth grant is to the project, not a specific service account.
- you must enable additional APIs in the GCP project (Marketplace SDK).
- as of Dec 2023, Marketplace Apps cannot be completely managed by Terraform resources; so there
are more out-of-band steps that someone must complete by hand to create the App; and a simple
`terraform destroy` will not remove the associated infrastructure. In contrast,
`terraform destroy` in the DWD approach will result in revocation of the access grants when the
service account is deleted.
- you must monitor how many service accounts exist in the project and ensure only the expected
ones are created. Note that all Google Workspace API access, as of Dec 2023, requires the
service account to authenticate with a key, so any SA without a key provisioned cannot access
your data.
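
Monitoring the service accounts in the project can be scripted. A hypothetical Python sketch,
assuming the JSON output of `gcloud iam service-accounts list --project=PROJECT_ID --format=json`
(a list of objects with an `email` field) and an allow-list of the emails you expect to exist;
`PROJECT_ID` and the emails are placeholders.

```python
import json
from typing import Iterable, List


def unexpected_service_accounts(list_json: str, expected_emails: Iterable[str]) -> List[str]:
    """Return emails of service accounts in the project that are not on the allow-list."""
    expected = {email.lower() for email in expected_emails}
    accounts = json.loads(list_json)
    # Compare case-insensitively, since email addresses are not case-sensitive here.
    return sorted(
        acct["email"] for acct in accounts if acct["email"].lower() not in expected
    )
```

Any email this returns is a service account that would also receive the Marketplace App's OAuth
grant, so it is worth investigating.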