explore-assistant-examples/README.md
+42-4
@@ -1,6 +1,9 @@
1
1
# BigQuery Data Loader
2
2
3
-
This script facilitates the loading of JSON data into Google BigQuery while managing data freshness by ensuring existing rows related to an `explore_id` are deleted before new data is inserted. The script employs a temporary table mechanism to circumvent limitations related to immediate updates or deletions in BigQuery's streaming buffer.
3
+
This folder includes two scripts.
4
+
The first script (`generate_examples.py`) creates input/output example pairs for training or one-shot use. These are based on the top queries for a chosen model and explore. The script also creates measure and dimension lists for later use.
5
+
6
+
The loading script (load_examples.py) facilitates the loading of JSON data into Google BigQuery while managing data freshness by ensuring existing rows related to an `explore_id` are deleted before new data is inserted. The script employs a temporary table mechanism to circumvent limitations related to immediate updates or deletions in BigQuery's streaming buffer.
4
7
5
8
## Prerequisites
6
9
@@ -10,6 +13,11 @@ Before you run this script, you need to ensure that your environment is set up w
10
13
2. **Google Cloud SDK** - Install and configure the Google Cloud SDK (gcloud).
11
14
3. **BigQuery API Access** - Ensure that the BigQuery API is enabled in your Google Cloud project.
12
15
4. **Google Cloud Authentication** - Set up authentication by downloading a service account key and setting the `GOOGLE_APPLICATION_CREDENTIALS` environment variable to point to that key file.
16
+
5. **Looker SDK Initialization** - Set up authentication for the Looker SDK by specifying these variables:
17
+
`LOOKERSDK_BASE_URL` A URL like https://my.looker.com:19999. No default value.
18
+
`LOOKERSDK_CLIENT_ID` API credentials client_id. Both this and `LOOKERSDK_CLIENT_SECRET` must be provided to the Looker SDK, or no calls to the API will be authorized. No default value.
19
+
`LOOKERSDK_CLIENT_SECRET` API credentials client_secret. No default value.
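
Because none of the three Looker variables has a default value, it can help to confirm they are set before running either script. The following is a minimal sketch, not part of the repository; the helper name `missing_looker_vars` is hypothetical, and only the variable names come from the list above.

```python
import os

# Variable names are taken from the Looker SDK prerequisite list above.
REQUIRED_LOOKER_VARS = (
    "LOOKERSDK_BASE_URL",
    "LOOKERSDK_CLIENT_ID",
    "LOOKERSDK_CLIENT_SECRET",
)


def missing_looker_vars(environ=None):
    """Return the required Looker SDK variables that are unset or empty.

    `environ` is any mapping of names to values; it defaults to the
    process environment. This helper is illustrative only.
    """
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_LOOKER_VARS if not env.get(name)]
```

Calling `missing_looker_vars()` with no arguments inspects the current environment; an empty list means all three settings are present.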
20
+
13
21
14
22
## Setup
15
23
@@ -23,7 +31,7 @@ pip install -r requirements.txt
23
31
```
24
32
## Usage
25
33
26
-
### Script Parameters
34
+
### Loading Script Parameters
27
35
28
36
The script accepts several command line arguments to specify the details required for loading data into BigQuery:
29
37
@@ -33,7 +41,7 @@ The script accepts several command line arguments to specify the details require
33
41
- `--explore_id`: **Required.** A unique identifier for the dataset rows related to a specific use case or query (used in deletion and insertion).
34
42
- `--json_file`: The path to the JSON file containing the data to be loaded. Defaults to `examples.json`.
35
43
36
-
### Running the Script
44
+
### Running the Loading Script
37
45
38
46
**Before Running:** make sure the `.env` file in this directory is updated to reference your `project_id`, `dataset_id`, and `explore_id`.
This Python script is designed to manage data uploads from a JSON file into a Google BigQuery table, particularly focusing on scenarios where specific entries identified by an `explore_id` need to be refreshed or updated in the dataset.
102
+
The load_examples Python script is designed to manage data uploads from a JSON file into a Google BigQuery table, particularly focusing on scenarios where specific entries identified by an `explore_id` need to be refreshed or updated in the dataset.
85
103
86
104
1. **Command Line Interface (CLI)**:
87
105
- The script uses `argparse` to define and handle command line inputs that specify the Google Cloud project, dataset, and table details, as well as the path to the JSON data file.
@@ -100,3 +118,23 @@ This Python script is designed to manage data uploads from a JSON file into a Go
100
118
101
119
6. **Error Handling**:
102
120
- Throughout the data deletion and insertion processes, the script checks for and reports any errors that occur. This is vital for debugging and ensuring data integrity.
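
The delete-then-insert refresh described above could be sketched roughly as follows. This is an assumption about the flow, not the code in `load_examples.py`: it supposes the new rows have already been loaded into a temporary table, and the function names, table names, and the `execute` callable are all hypothetical.

```python
def build_refresh_statements(table, temp_table):
    """Return the SQL for a delete-then-insert refresh, in execution order.

    `table` and `temp_table` are hypothetical, fully qualified BigQuery
    table names. `@explore_id` is a named query parameter that the caller
    would bind when executing the DELETE statement.
    """
    return [
        # Remove the rows that belong to the explore being refreshed.
        f"DELETE FROM `{table}` WHERE explore_id = @explore_id",
        # Copy the freshly loaded rows from the temporary table.
        f"INSERT INTO `{table}` SELECT * FROM `{temp_table}`",
        # Clean up the temporary table afterwards.
        f"DROP TABLE IF EXISTS `{temp_table}`",
    ]


def run_statements(statements, execute):
    """Run each statement with `execute`; report and stop at the first error."""
    for sql in statements:
        try:
            execute(sql)
        except Exception as exc:  # report any failure, then stop
            print(f"Failed: {sql}\n  {exc}")
            return False
    return True
```

Stopping at the first failure means the insert is never attempted if the delete did not succeed, which matches the emphasis on data integrity in the error-handling step.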
121
+
122
+
### Generation Script Parameters
123
+
The generate_examples.py script accepts several command line arguments to specify the details required for generating example files:
124
+
125
+
- `--model`: Required. Looker model name.
126
+
- `--explore`: Required. Looker explore name.
127
+
- `--project_id`: Required. Google Cloud project ID.
128
+
- `--location`: Required. Google Cloud location.
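
As an illustration of how these four required arguments might be declared, here is a minimal `argparse` sketch. The flag names come from the list above; the `build_parser` helper, the description, and the help strings are assumptions and may differ from what `generate_examples.py` actually does.

```python
import argparse


def build_parser():
    """Build a parser for the four required generation arguments (illustrative)."""
    parser = argparse.ArgumentParser(
        description="Generate example files for a Looker model and explore."
    )
    parser.add_argument("--model", required=True, help="Looker model name")
    parser.add_argument("--explore", required=True, help="Looker explore name")
    parser.add_argument("--project_id", required=True, help="Google Cloud project ID")
    parser.add_argument("--location", required=True, help="Google Cloud location")
    return parser
```

With a parser like this, an invocation would look like `python generate_examples.py --model my_model --explore my_explore --project_id my-project --location us-central1`, where every value is a placeholder. Omitting any of the four flags makes `argparse` exit with a usage error.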
129
+
130
+
### Running the Generation Script
131
+
The generate_examples.py script fetches information about an explore's fields and top queries. It calls Gemini to generate sample questions that could be answered by the top queries. These can be tuned or used directly as examples to upload to the Explore Assistant.