
Commit 679d906

Origin/feature generate examples (#1)
* Added generate examples script and trusted dashboard table
1 parent 9f8f60b commit 679d906

File tree

13 files changed: +24119 −9452 lines changed


Diff for: .gitignore

+1
@@ -8,6 +8,7 @@ terraform.tfstate*
 *.tfstate
 .venv
 node_modules
+looker.ini

 .vertex_cf_auth_token
 dist
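
For context: `looker.ini` is the Looker SDK's default credentials file, which is why it now stays out of version control. A minimal sketch of how the Python SDK consumes it, assuming the standard `looker-sdk` package:

```python
# Minimal sketch: the Looker Python SDK reads looker.ini from the working
# directory by default, which is why the file is kept out of version control.
import looker_sdk

sdk = looker_sdk.init40()  # loads the [Looker] section from ./looker.ini
me = sdk.me()              # simple call to verify the credentials work
print(me.display_name)
```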

Diff for: explore-assistant-backend/terraform/bigquery_examples.tf

+25
@@ -42,6 +42,31 @@ resource "google_bigquery_job" "create_explore_assistant_examples_table" {
   }
 }

+resource "google_bigquery_job" "create_explore_assistant_trusted_dashboards_table" {
+  job_id = "create_explore_assistant_trusted_dashboards_table-${formatdate("YYYYMMDDhhmmss", timestamp())}"
+  query {
+    query = <<EOF
+CREATE OR REPLACE TABLE `${google_bigquery_dataset.dataset.dataset_id}.trusted_dashboards` (
+  explore_id STRING OPTIONS (description = 'Explore id of the explore to pull examples for, in the format lookml_model:lookml_explore'),
+  lookml STRING OPTIONS (description = 'LookML dashboard copy for authoritative dashboard(s) based on the given explore_id.')
+)
+EOF
+    create_disposition = ""
+    write_disposition = ""
+    allow_large_results = false
+    flatten_results = false
+    maximum_billing_tier = 0
+    schema_update_options = []
+    use_legacy_sql = false
+  }
+
+  location   = var.deployment_region
+  depends_on = [time_sleep.wait_after_apis_activate]
+
+  lifecycle {
+    ignore_changes = [query, job_id]
+  }
+}

 resource "google_bigquery_job" "create_explore_assistant_refinement_examples_table" {
   job_id = "create_explore_assistant_refinement_examples_table-${formatdate("YYYYMMDDhhmmss", timestamp())}"
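
For context on how the new `trusted_dashboards` table gets used: it simply pairs an `explore_id` with LookML dashboard text. A minimal sketch of writing one row with the BigQuery Python client; the project, dataset, and row values below are placeholders, not part of this commit:

```python
# Illustrative only: inserting one trusted-dashboard row with
# google-cloud-bigquery. Project, dataset, and values are placeholders.
from google.cloud import bigquery

client = bigquery.Client(project="YOUR_PROJECT_ID")
table = client.get_table("YOUR_DATASET.trusted_dashboards")

rows = [{
    "explore_id": "thelook:order_items",             # lookml_model:lookml_explore
    "lookml": "- dashboard: business_pulse\n  ...",  # LookML dashboard copy
}]
errors = client.insert_rows_json(table, rows)
if errors:
    raise RuntimeError(f"insert failed: {errors}")
```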

Diff for: explore-assistant-examples/README.md

+42 −4
@@ -1,6 +1,9 @@
 # BigQuery Data Loader

-This script facilitates the loading of JSON data into Google BigQuery while managing data freshness by ensuring existing rows related to an `explore_id` are deleted before new data is inserted. The script employs a temporary table mechanism to circumvent limitations related to immediate updates or deletions in BigQuery's streaming buffer.
+This folder includes two scripts.
+The first script (generate_examples.py) creates input/output example pairs, for training or one-shot use, based on the top queries for a chosen model and explore. It also creates measure and dimension lists for later use.
+
+The loading script (load_examples.py) loads JSON data into Google BigQuery while managing data freshness: existing rows related to an `explore_id` are deleted before new data is inserted. The script employs a temporary table mechanism to circumvent limitations on immediate updates or deletions in BigQuery's streaming buffer.

 ## Prerequisites

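Background on the temporary-table mechanism mentioned above: rows written through BigQuery's streaming API sit in a streaming buffer where they cannot yet be updated or deleted, so the refresh has to be staged. A sketch of the general pattern, with placeholder table names and payload (this is not load_examples.py verbatim):

```python
# Sketch of the refresh pattern described above: stage new rows via a load
# job, delete the stale rows with a query job, then append the staged rows.
from google.cloud import bigquery

client = bigquery.Client(project="YOUR_PROJECT_ID")
target = "YOUR_DATASET.explore_assistant_examples"
temp = "YOUR_DATASET.examples_tmp"
explore_id = "thelook:order_items"                         # placeholder
new_rows = [{"explore_id": explore_id, "examples": "[]"}]  # placeholder payload

# 1. Load jobs (unlike streaming inserts) leave no streaming buffer behind.
client.load_table_from_json(new_rows, temp).result()

# 2. Delete the stale rows for this explore from the target table.
client.query(
    f"DELETE FROM `{target}` WHERE explore_id = @explore_id",
    job_config=bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ScalarQueryParameter("explore_id", "STRING", explore_id)
        ]
    ),
).result()

# 3. Append the staged rows and clean up the temporary table.
client.query(f"INSERT INTO `{target}` SELECT * FROM `{temp}`").result()
client.delete_table(temp)
```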
@@ -10,6 +13,11 @@ Before you run this script, you need to ensure that your environment is set up w
 2. **Google Cloud SDK** - Install and configure the Google Cloud SDK (gcloud).
 3. **BigQuery API Access** - Ensure that the BigQuery API is enabled in your Google Cloud project.
 4. **Google Cloud Authentication** - Set up authentication by downloading a service account key and setting the `GOOGLE_APPLICATION_CREDENTIALS` environment variable pointing to that key file.
+5. **Looker SDK Initialization** - Set up authentication for the Looker SDK by specifying these variables:
+   - `LOOKERSDK_BASE_URL`: A URL like https://my.looker.com:19999. No default value.
+   - `LOOKERSDK_CLIENT_ID`: API credentials client_id. This and client_secret must be provided in some fashion to the SDK, or no calls to the API will be authorized. No default value.
+   - `LOOKERSDK_CLIENT_SECRET`: API credentials client_secret. No default value.
+

 ## Setup
@@ -23,7 +31,7 @@ pip install -r requirements.txt
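A minimal sketch of initializing the SDK from those variables (the values shown are placeholders):

```python
# Sketch: configuring the Looker SDK from environment variables instead of
# a looker.ini file. Values are placeholders.
import os
import looker_sdk

os.environ["LOOKERSDK_BASE_URL"] = "https://my.looker.com:19999"
os.environ["LOOKERSDK_CLIENT_ID"] = "YOUR_CLIENT_ID"
os.environ["LOOKERSDK_CLIENT_SECRET"] = "YOUR_CLIENT_SECRET"

sdk = looker_sdk.init40()  # picks up the LOOKERSDK_* variables automatically
```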
2331
```
2432
## Usage
2533

26-
### Script Parameters
34+
### Loading Script Parameters
2735

2836
The script accepts several command line arguments to specify the details required for loading data into BigQuery:
2937

@@ -33,7 +41,7 @@ The script accepts several command line arguments to specify the details require
3341
- `--explore_id`: **Required.** A unique identifier for the dataset rows related to a specific use case or query (used in deletion and insertion).
3442
- `--json_file`: The path to the JSON file containing the data to be loaded. Defaults to `examples.json`.
3543

36-
### Running the Script
44+
### Running the Loading Script
3745

3846
**Before Running:** make sure the .env file in this directory is updated to reference your project_id, dataset_id and explore_id
3947

@@ -79,9 +87,17 @@ chmod +x update_examples.sh
 ./update_examples.sh
 ```

+
+Load the trusted dashboard LookML:
+
+```bash
+python load_examples.py --project_id YOUR_PROJECT_ID --explore_id YOUR_EXPLORE_ID --table_id trusted_dashboards --json_file trusted_dashboards.lkml --format text --column_name lookml
+```
+
+
 ### Description

-This Python script is designed to manage data uploads from a JSON file into a Google BigQuery table, particularly focusing on scenarios where specific entries identified by an `explore_id` need to be refreshed or updated in the dataset.
+The load_examples.py script is designed to manage data uploads from a JSON file into a Google BigQuery table, particularly focusing on scenarios where specific entries identified by an `explore_id` need to be refreshed or updated in the dataset.

 1. **Command Line Interface (CLI)**:
    - The script uses `argparse` to define and handle command line inputs that specify the Google Cloud project, dataset, and table details, as well as the path to the JSON data file.
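
For orientation, a condensed sketch of what that `argparse` interface looks like; the flag names mirror the usage examples in this README, while the defaults shown are assumptions:

```python
# Condensed sketch of the load_examples.py CLI described above. Flag names
# mirror this README's usage examples; the defaults here are assumptions.
import argparse

parser = argparse.ArgumentParser(description="Load JSON examples into BigQuery")
parser.add_argument("--project_id", required=True, help="Google Cloud project ID")
parser.add_argument("--dataset_id", default="explore_assistant", help="BigQuery dataset (assumed default)")
parser.add_argument("--table_id", default="explore_assistant_examples", help="Target table (assumed default)")
parser.add_argument("--explore_id", required=True, help="lookml_model:lookml_explore")
parser.add_argument("--json_file", default="examples.json", help="Path to the input file")
parser.add_argument("--format", default="json", choices=["json", "text"], help="Input format")
parser.add_argument("--column_name", help="Target column when --format is text")
args = parser.parse_args()
```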
@@ -100,3 +116,23 @@ This Python script is designed to manage data uploads from a JSON file into a Go

 6. **Error Handling**:
    - Throughout the data deletion and insertion processes, the script checks for and reports any errors that occur. This is vital for debugging and ensuring data integrity.
+
+### Generation Script Parameters
+The generate_examples.py script accepts several command line arguments to specify the details required for generating example files:
+
+- `--model`: Required. Looker model name.
+- `--explore`: Required. Looker explore name.
+- `--project_id`: Required. Google Cloud project ID.
+- `--location`: Required. Google Cloud location.
+
+### Running the Generation Script
+The generate_examples.py script fetches information about an explore's fields and top queries. It calls Gemini to generate sample questions that could be answered by the top queries. These can be tuned or used directly as examples to upload to the Explore Assistant.
+
+```bash
+python generate_examples.py --model YOUR_MODEL_NAME --explore YOUR_EXPLORE_NAME --project_id YOUR_GCP_PROJECT_ID --location YOUR_GCP_LOCATION
+```
+
+If desired, you can upload the files directly after generation by using the --chain_load argument.
+```bash
+python generate_examples.py --model YOUR_MODEL_NAME --explore YOUR_EXPLORE_NAME --project_id YOUR_GCP_PROJECT_ID --location YOUR_GCP_LOCATION --chain_load
+```
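
To make the generation flow concrete, here is a rough sketch of the Looker side of it; the System Activity query and field handling are illustrative, and the script's exact query and Gemini prompt may differ:

```python
# Rough sketch of the generation flow described above; this is illustrative,
# not generate_examples.py verbatim.
import looker_sdk
from looker_sdk import models40 as models

sdk = looker_sdk.init40()

# Field lists for the chosen explore (used for the measure/dimension files).
explore = sdk.lookml_model_explore("YOUR_MODEL_NAME", "YOUR_EXPLORE_NAME")
dimensions = [f.name for f in explore.fields.dimensions]
measures = [f.name for f in explore.fields.measures]

# Top queries for the explore, pulled from Looker's System Activity model.
top_queries = sdk.run_inline_query(
    "json",
    models.WriteQuery(
        model="system__activity",
        view="history",
        fields=["query.slug", "history.query_run_count"],
        filters={"query.model": "YOUR_MODEL_NAME", "query.view": "YOUR_EXPLORE_NAME"},
        sorts=["history.query_run_count desc"],
        limit="50",
    ),
)
# Each top query is then sent to Gemini on Vertex AI to draft a natural-language
# question it answers; the results become the input/output example pairs.
```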
