[ADAP-1085] [Bug] When using iceberg format, dbt docs generate is unable to populate the columns information #485

Status: Open

shishircc opened this issue Jan 3, 2024 · 1 comment

Labels: feature:iceberg (Issues related to Iceberg support), help-wanted (Extra attention is needed), pkg:dbt-spark (Issue affects dbt-spark), type:bug (Something isn't working as documented)

shishircc commented Jan 3, 2024

Is this a new bug in dbt-spark?

  - [x] I believe this is a new bug in dbt-spark
  - [x] I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

When using the Iceberg table format, dbt docs generate produces an empty catalog.json, so the generated documentation contains no column information.

Expected Behavior

dbt docs generate should produce a properly populated catalog.json.

Steps To Reproduce

  1. Configure EMR to work with Iceberg and the Glue catalog
  2. Set up the Spark Thrift Server
  3. Run the dbt project on EMR through the Thrift Server
  4. Run dbt docs generate
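
For reference, here is a minimal sketch of the kind of connection and model configuration involved. The profile name, host, and schema values are hypothetical placeholders, not taken verbatim from the project above:

```yaml
# profiles.yml -- hypothetical thrift connection to the EMR Spark Thrift Server
c360:
  target: dev
  outputs:
    dev:
      type: spark
      method: thrift
      host: <emr-primary-node-dns>   # placeholder
      port: 10001                    # dbt-spark's default thrift port
      schema: c360bronze
      threads: 1
```

together with the dbt-spark model config that selects the Iceberg table format:

```yaml
# dbt_project.yml (excerpt) -- assumes the dbt-spark file_format config
models:
  c360:
    +file_format: iceberg
```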

Relevant log output

[0m02:50:24.631713 [debug] [MainThread]: Sending event: {'category': 'dbt', 'action': 'invocation', 'label': 'start', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7fa619c86bb0>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7fa617c572b0>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7fa617c57a60>]}


============================== 02:50:24.638185 | c74f1dd7-ae7c-4cbe-aa70-cdad60a70d35 ==============================
[0m02:50:24.638185 [info ] [MainThread]: Running with dbt=1.7.4
[0m02:50:24.639539 [debug] [MainThread]: running dbt with arguments {'printer_width': '80', 'indirect_selection': 'eager', 'write_json': 'True', 'log_cache_events': 'False', 'partial_parse': 'True', 'cache_selected_only': 'False', 'warn_error': 'None', 'debug': 'False', 'fail_fast': 'False', 'log_path': '/home/ec2-user/environment/dbtproject/dags/dbt_blueprint/c360-datalake/logs', 'version_check': 'True', 'profiles_dir': '/home/ec2-user/.dbt', 'use_colors': 'True', 'use_experimental_parser': 'False', 'no_print': 'None', 'quiet': 'False', 'log_format': 'default', 'invocation_command': 'dbt docs generate --vars {"day": "31","hour": "0","month": "12","raw_bucket":"c360-raw-data-*****-us-east-1","ts": "2023-12-31T00:00:00+00:00","year": "2023"}', 'introspect': 'True', 'warn_error_options': 'WarnErrorOptions(include=[], exclude=[])', 'target_path': 'None', 'static_parser': 'True', 'send_anonymous_usage_stats': 'True'}
[0m02:50:24.958174 [debug] [MainThread]: Sending event: {'category': 'dbt', 'action': 'project_id', 'label': 'c74f1dd7-ae7c-4cbe-aa70-cdad60a70d35', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7fa617bece50>]}
[0m02:50:25.203736 [debug] [MainThread]: Sending event: {'category': 'dbt', 'action': 'adapter_info', 'label': 'c74f1dd7-ae7c-4cbe-aa70-cdad60a70d35', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7fa6177c2af0>]}
[0m02:50:25.204633 [info ] [MainThread]: Registered adapter: spark=1.7.0
[0m02:50:25.223511 [debug] [MainThread]: checksum: 577537e0073da8fb99e9f3abffc643b153c4ab719d0d0e1e2dce7637653d4e74, vars: {'day': '31',
 'hour': '0',
 'month': '12',
 'raw_bucket': 'c360-raw-data-********-us-east-1',
 'ts': '2023-12-31T00:00:00+00:00',
 'year': '2023'}, profile: , target: , version: 1.7.4
[0m02:50:25.260540 [debug] [MainThread]: Partial parsing enabled: 0 files deleted, 0 files added, 0 files changed.
[0m02:50:25.261176 [debug] [MainThread]: Partial parsing enabled, no changes found, skipping parsing
[0m02:50:25.269396 [debug] [MainThread]: Sending event: {'category': 'dbt', 'action': 'load_project', 'label': 'c74f1dd7-ae7c-4cbe-aa70-cdad60a70d35', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7fa6175a6f10>]}
[0m02:50:25.272216 [debug] [MainThread]: Sending event: {'category': 'dbt', 'action': 'resource_counts', 'label': 'c74f1dd7-ae7c-4cbe-aa70-cdad60a70d35', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7fa6176cb2b0>]}
[0m02:50:25.272952 [info ] [MainThread]: Found 7 models, 6 sources, 0 exposures, 0 metrics, 439 macros, 0 groups, 0 semantic models
[0m02:50:25.273827 [debug] [MainThread]: Sending event: {'category': 'dbt', 'action': 'runnable_timing', 'label': 'c74f1dd7-ae7c-4cbe-aa70-cdad60a70d35', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7fa6176cb2e0>]}
[0m02:50:25.276742 [info ] [MainThread]: 
[0m02:50:25.278117 [debug] [MainThread]: Acquiring new spark connection 'master'
[0m02:50:25.280561 [debug] [ThreadPool]: Acquiring new spark connection 'list_None_c360bronze'
[0m02:50:25.295222 [debug] [ThreadPool]: Spark adapter: NotImplemented: add_begin_query
[0m02:50:25.295972 [debug] [ThreadPool]: Using spark connection "list_None_c360bronze"
[0m02:50:25.296507 [debug] [ThreadPool]: On list_None_c360bronze: /* {"app": "dbt", "dbt_version": "1.7.4", "profile_name": "c360", "target_name": "dev", "connection_name": "list_None_c360bronze"} */
show table extended in c360bronze like '*'
  
[0m02:50:25.296985 [debug] [ThreadPool]: Opening a new connection, currently in state init
[0m02:50:25.447602 [debug] [ThreadPool]: Spark adapter: Poll response: TGetOperationStatusResp(status=TStatus(statusCode=0, infoMessages=None, sqlState=None, errorCode=None, errorMessage=None), operationState=5, sqlState=None, errorCode=0, errorMessage='org.apache.hive.service.cli.HiveSQLException: Error running query: [_LEGACY_ERROR_TEMP_1200] org.apache.spark.sql.AnalysisException: SHOW TABLE EXTENDED is not supported for v2 tables.;\nShowTableExtended *, [namespace#9839, tableName#9840, isTemporary#9841, information#9842]\n+- ResolvedNamespace org.apache.iceberg.spark.SparkCatalog@6b21d3da, [c360bronze]\n\n\tat org.apache.spark.sql.hive.thriftserver.HiveThriftServerErrors$.runningQueryError(HiveThriftServerErrors.scala:43)\n\tat 
org.apache.spark.sql.SQLContext.sql(SQLContext.scala:651)\n\tat org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:226)\n\t... 16 more\n', taskStatus=None, operationStarted=None, operationCompleted=None, hasResultSet=None, progressUpdateResponse=None)
[0m02:50:25.448538 [debug] [ThreadPool]: Spark adapter: Poll status: 5
[0m02:50:25.449121 [debug] [ThreadPool]: Spark adapter: Error while running:
/* {"app": "dbt", "dbt_version": "1.7.4", "profile_name": "c360", "target_name": "dev", "connection_name": "list_None_c360bronze"} */
show table extended in c360bronze like '*'
  
[0m02:50:25.449863 [debug] [ThreadPool]: Spark adapter: Database Error
  org.apache.hive.service.cli.HiveSQLException: Error running query: [_LEGACY_ERROR_TEMP_1200] org.apache.spark.sql.AnalysisException: SHOW TABLE EXTENDED is not supported for v2 tables.;
  ShowTableExtended *, [namespace#9839, tableName#9840, isTemporary#9841, information#9842]
  +- ResolvedNamespace org.apache.iceberg.spark.SparkCatalog@6b21d3da, [c360bronze]
  
  	at org.apache.spark.sql.hive.thriftserver.HiveThriftServerErrors$.runningQueryError(HiveThriftServerErrors.scala:43)
  	at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:261)

  Caused by: org.apache.spark.sql.AnalysisException: SHOW TABLE EXTENDED is not supported for v2 tables.;
  ShowTableExtended *, [namespace#9839, tableName#9840, isTemporary#9841, information#9842]
  +- ResolvedNamespace org.apache.iceberg.spark.SparkCatalog@6b21d3da, [c360bronze]
  
  	at org.apache.spark.sql.errors.QueryCompilationErrors$.commandUnsupportedInV2TableError(QueryCompilationErrors.scala:2040)
  	at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis0$1(CheckAnalysis.scala:224)
  	... 16 more
  
[0m02:50:25.450777 [debug] [ThreadPool]: Spark adapter: Error while running:
macro list_relations_without_caching
[0m02:50:25.451525 [debug] [ThreadPool]: Spark adapter: Runtime Error
  Database Error
    org.apache.hive.service.cli.HiveSQLException: Error running query: [_LEGACY_ERROR_TEMP_1200] org.apache.spark.sql.AnalysisException: SHOW TABLE EXTENDED is not supported for v2 tables.;
    ShowTableExtended *, [namespace#9839, tableName#9840, isTemporary#9841, information#9842]
    +- ResolvedNamespace org.apache.iceberg.spark.SparkCatalog@6b21d3da, [c360bronze]
    
    	at java.lang.Thread.run(Thread.java:750)
    Caused by: org.apache.spark.sql.AnalysisException: SHOW TABLE EXTENDED is not supported for v2 tables.;
    ShowTableExtended *, [namespace#9839, tableName#9840, isTemporary#9841, information#9842]
    +- ResolvedNamespace org.apache.iceberg.spark.SparkCatalog@6b21d3da, [c360bronze]
    
    	at org.apache.spark.sql.errors.QueryCompilationErrors$.commandUnsupportedInV2TableError(QueryCompilationErrors.scala:2040)
    	at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis0$1(CheckAnalysis.scala:224)
    	at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis0$1$adapted(CheckAnalysis.scala:163)
    	at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:338)
    	at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:226)
    	... 16 more
    
[0m02:50:25.457505 [debug] [ThreadPool]: Using spark connection "list_None_c360bronze"
[0m02:50:25.458084 [debug] [ThreadPool]: On list_None_c360bronze: /* {"app": "dbt", "dbt_version": "1.7.4", "profile_name": "c360", "target_name": "dev", "connection_name": "list_None_c360bronze"} */
show tables in c360bronze like '*'
  
[0m02:50:25.697421 [debug] [ThreadPool]: Spark adapter: Poll status: 2, query complete
[0m02:50:25.698186 [debug] [ThreadPool]: SQL status: OK in 0.0 seconds
[0m02:50:25.708726 [debug] [ThreadPool]: Using spark connection "list_None_c360bronze"
[0m02:50:25.709422 [debug] [ThreadPool]: On list_None_c360bronze: /* {"app": "dbt", "dbt_version": "1.7.4", "profile_name": "c360", "target_name": "dev", "connection_name": "list_None_c360bronze"} */
describe extended c360bronze.stg_clickstream
  
[0m02:50:25.895379 [debug] [ThreadPool]: Spark adapter: Poll status: 2, query complete
[0m02:50:25.896238 [debug] [ThreadPool]: SQL status: OK in 0.0 seconds
[0m02:50:25.905061 [debug] [ThreadPool]: Using spark connection "list_None_c360bronze"
[0m02:50:25.905728 [debug] [ThreadPool]: On list_None_c360bronze: /* {"app": "dbt", "dbt_version": "1.7.4", "profile_name": "c360", "target_name": "dev", "connection_name": "list_None_c360bronze"} */
describe extended c360bronze.stg_clickstream2
  
[0m02:50:26.128223 [debug] [ThreadPool]: Spark adapter: Poll status: 2, query complete
[0m02:50:26.128935 [debug] [ThreadPool]: SQL status: OK in 0.0 seconds
[0m02:50:26.139356 [debug] [ThreadPool]: Using spark connection "list_None_c360bronze"
[0m02:50:26.140353 [debug] [ThreadPool]: On list_None_c360bronze: /* {"app": "dbt", "dbt_version": "1.7.4", "profile_name": "c360", "target_name": "dev", "connection_name": "list_None_c360bronze"} */
describe extended c360bronze.stg_salesdb__cart_items
  
[0m02:50:26.370865 [debug] [ThreadPool]: Spark adapter: Poll status: 2, query complete
[0m02:50:26.371741 [debug] [ThreadPool]: SQL status: OK in 0.0 seconds
[0m02:50:26.381025 [debug] [ThreadPool]: Using spark connection "list_None_c360bronze"
[0m02:50:26.381722 [debug] [ThreadPool]: On list_None_c360bronze: /* {"app": "dbt", "dbt_version": "1.7.4", "profile_name": "c360", "target_name": "dev", "connection_name": "list_None_c360bronze"} */
describe extended c360bronze.stg_salesdb__customer
  
[0m02:50:26.572163 [debug] [ThreadPool]: Spark adapter: Poll status: 2, query complete
[0m02:50:26.573853 [debug] [ThreadPool]: SQL status: OK in 0.0 seconds
[0m02:50:26.584021 [debug] [ThreadPool]: Using spark connection "list_None_c360bronze"
[0m02:50:26.584680 [debug] [ThreadPool]: On list_None_c360bronze: /* {"app": "dbt", "dbt_version": "1.7.4", "profile_name": "c360", "target_name": "dev", "connection_name": "list_None_c360bronze"} */
describe extended c360bronze.stg_salesdb__order_items
  
[0m02:50:26.783624 [debug] [ThreadPool]: Spark adapter: Poll status: 2, query complete
[0m02:50:26.784357 [debug] [ThreadPool]: SQL status: OK in 0.0 seconds
[0m02:50:26.795421 [debug] [ThreadPool]: Using spark connection "list_None_c360bronze"
[0m02:50:26.796071 [debug] [ThreadPool]: On list_None_c360bronze: /* {"app": "dbt", "dbt_version": "1.7.4", "profile_name": "c360", "target_name": "dev", "connection_name": "list_None_c360bronze"} */
describe extended c360bronze.stg_salesdb__product
  
[0m02:50:26.987528 [debug] [ThreadPool]: Spark adapter: Poll status: 2, query complete
[0m02:50:26.988230 [debug] [ThreadPool]: SQL status: OK in 0.0 seconds
[0m02:50:26.996066 [debug] [ThreadPool]: Using spark connection "list_None_c360bronze"
[0m02:50:26.996669 [debug] [ThreadPool]: On list_None_c360bronze: /* {"app": "dbt", "dbt_version": "1.7.4", "profile_name": "c360", "target_name": "dev", "connection_name": "list_None_c360bronze"} */
describe extended c360bronze.stg_salesdb__product_rating
  
[0m02:50:27.228290 [debug] [ThreadPool]: Spark adapter: Poll status: 2, query complete
[0m02:50:27.229005 [debug] [ThreadPool]: SQL status: OK in 0.0 seconds
[0m02:50:27.237495 [debug] [ThreadPool]: Using spark connection "list_None_c360bronze"
[0m02:50:27.238161 [debug] [ThreadPool]: On list_None_c360bronze: /* {"app": "dbt", "dbt_version": "1.7.4", "profile_name": "c360", "target_name": "dev", "connection_name": "list_None_c360bronze"} */
describe extended c360bronze.stg_supportdb__support_chat
  
[0m02:50:27.439212 [debug] [ThreadPool]: Spark adapter: Poll status: 2, query complete
[0m02:50:27.439901 [debug] [ThreadPool]: SQL status: OK in 0.0 seconds
[0m02:50:27.445604 [debug] [ThreadPool]: On list_None_c360bronze: ROLLBACK
[0m02:50:27.446268 [debug] [ThreadPool]: Spark adapter: NotImplemented: rollback
[0m02:50:27.446799 [debug] [ThreadPool]: On list_None_c360bronze: Close
[0m02:50:27.570900 [debug] [MainThread]: Sending event: {'category': 'dbt', 'action': 'runnable_timing', 'label': 'c74f1dd7-ae7c-4cbe-aa70-cdad60a70d35', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7fa617832d60>]}
[0m02:50:27.572181 [info ] [MainThread]: Concurrency: 1 threads (target='dev')
[0m02:50:27.573155 [info ] [MainThread]: 
[0m02:50:27.576111 [debug] [Thread-1  ]: Began running node model.c360.stg_clickstream
[0m02:50:27.578024 [debug] [Thread-1  ]: Re-using an available connection from the pool (formerly list_None_c360bronze, now model.c360.stg_clickstream)
[0m02:50:27.578766 [debug] [Thread-1  ]: Began compiling node model.c360.stg_clickstream
[0m02:50:27.603421 [debug] [Thread-1  ]: Writing injected SQL for node "model.c360.stg_clickstream"
[0m02:50:27.604594 [debug] [Thread-1  ]: Timing info for model.c360.stg_clickstream (compile): 02:50:27.579148 => 02:50:27.604210
[0m02:50:27.605277 [debug] [Thread-1  ]: Began executing node model.c360.stg_clickstream
[0m02:50:27.606030 [debug] [Thread-1  ]: Timing info for model.c360.stg_clickstream (execute): 02:50:27.605629 => 02:50:27.605653
[0m02:50:27.607610 [debug] [Thread-1  ]: Finished running node model.c360.stg_clickstream
[0m02:50:27.608426 [debug] [Thread-1  ]: Began running node model.c360.stg_salesdb__cart_items
[0m02:50:27.609975 [debug] [Thread-1  ]: Re-using an available connection from the pool (formerly model.c360.stg_clickstream, now model.c360.stg_salesdb__cart_items)
[0m02:50:27.610700 [debug] [Thread-1  ]: Began compiling node model.c360.stg_salesdb__cart_items
[0m02:50:27.619178 [debug] [Thread-1  ]: Writing injected SQL for node "model.c360.stg_salesdb__cart_items"
[0m02:50:27.620232 [debug] [Thread-1  ]: Timing info for model.c360.stg_salesdb__cart_items (compile): 02:50:27.611076 => 02:50:27.619876
[0m02:50:27.620860 [debug] [Thread-1  ]: Began executing node model.c360.stg_salesdb__cart_items
[0m02:50:27.621648 [debug] [Thread-1  ]: Timing info for model.c360.stg_salesdb__cart_items (execute): 02:50:27.621216 => 02:50:27.621229
[0m02:50:27.624942 [debug] [Thread-1  ]: Finished running node model.c360.stg_salesdb__cart_items
[0m02:50:27.625894 [debug] [Thread-1  ]: Began running node model.c360.stg_salesdb__customer
[0m02:50:27.627310 [debug] [Thread-1  ]: Re-using an available connection from the pool (formerly model.c360.stg_salesdb__cart_items, now model.c360.stg_salesdb__customer)
[0m02:50:27.628163 [debug] [Thread-1  ]: Began compiling node model.c360.stg_salesdb__customer
[0m02:50:27.635786 [debug] [Thread-1  ]: Writing injected SQL for node "model.c360.stg_salesdb__customer"
[0m02:50:27.636779 [debug] [Thread-1  ]: Timing info for model.c360.stg_salesdb__customer (compile): 02:50:27.628650 => 02:50:27.636440
[0m02:50:27.637526 [debug] [Thread-1  ]: Began executing node model.c360.stg_salesdb__customer
[0m02:50:27.638335 [debug] [Thread-1  ]: Timing info for model.c360.stg_salesdb__customer (execute): 02:50:27.637949 => 02:50:27.637961
[0m02:50:27.639622 [debug] [Thread-1  ]: Finished running node model.c360.stg_salesdb__customer
[0m02:50:27.640276 [debug] [Thread-1  ]: Began running node model.c360.stg_salesdb__order_items
[0m02:50:27.641442 [debug] [Thread-1  ]: Re-using an available connection from the pool (formerly model.c360.stg_salesdb__customer, now model.c360.stg_salesdb__order_items)
[0m02:50:27.642196 [debug] [Thread-1  ]: Began compiling node model.c360.stg_salesdb__order_items
[0m02:50:27.650906 [debug] [Thread-1  ]: Writing injected SQL for node "model.c360.stg_salesdb__order_items"
[0m02:50:27.652276 [debug] [Thread-1  ]: Timing info for model.c360.stg_salesdb__order_items (compile): 02:50:27.642683 => 02:50:27.651808
[0m02:50:27.653043 [debug] [Thread-1  ]: Began executing node model.c360.stg_salesdb__order_items
[0m02:50:27.653692 [debug] [Thread-1  ]: Timing info for model.c360.stg_salesdb__order_items (execute): 02:50:27.653397 => 02:50:27.653410
[0m02:50:27.655082 [debug] [Thread-1  ]: Finished running node model.c360.stg_salesdb__order_items
[0m02:50:27.655742 [debug] [Thread-1  ]: Began running node model.c360.stg_salesdb__product
[0m02:50:27.656697 [debug] [Thread-1  ]: Re-using an available connection from the pool (formerly model.c360.stg_salesdb__order_items, now model.c360.stg_salesdb__product)
[0m02:50:27.657630 [debug] [Thread-1  ]: Began compiling node model.c360.stg_salesdb__product
[0m02:50:27.744326 [debug] [Thread-1  ]: Writing injected SQL for node "model.c360.stg_salesdb__product"
[0m02:50:27.745610 [debug] [Thread-1  ]: Timing info for model.c360.stg_salesdb__product (compile): 02:50:27.658152 => 02:50:27.745075
[0m02:50:27.746830 [debug] [Thread-1  ]: Began executing node model.c360.stg_salesdb__product
[0m02:50:27.747641 [debug] [Thread-1  ]: Timing info for model.c360.stg_salesdb__product (execute): 02:50:27.747323 => 02:50:27.747337
[0m02:50:27.749546 [debug] [Thread-1  ]: Finished running node model.c360.stg_salesdb__product
[0m02:50:27.750300 [debug] [Thread-1  ]: Began running node model.c360.stg_salesdb__product_rating
[0m02:50:27.751857 [debug] [Thread-1  ]: Re-using an available connection from the pool (formerly model.c360.stg_salesdb__product, now model.c360.stg_salesdb__product_rating)
[0m02:50:27.752630 [debug] [Thread-1  ]: Began compiling node model.c360.stg_salesdb__product_rating
[0m02:50:27.760353 [debug] [Thread-1  ]: Writing injected SQL for node "model.c360.stg_salesdb__product_rating"
[0m02:50:27.761355 [debug] [Thread-1  ]: Timing info for model.c360.stg_salesdb__product_rating (compile): 02:50:27.753148 => 02:50:27.761013
[0m02:50:27.762149 [debug] [Thread-1  ]: Began executing node model.c360.stg_salesdb__product_rating
[0m02:50:27.762897 [debug] [Thread-1  ]: Timing info for model.c360.stg_salesdb__product_rating (execute): 02:50:27.762503 => 02:50:27.762526
[0m02:50:27.764336 [debug] [Thread-1  ]: Finished running node model.c360.stg_salesdb__product_rating
[0m02:50:27.765582 [debug] [Thread-1  ]: Began running node model.c360.stg_supportdb__support_chat
[0m02:50:27.768105 [debug] [Thread-1  ]: Re-using an available connection from the pool (formerly model.c360.stg_salesdb__product_rating, now model.c360.stg_supportdb__support_chat)
[0m02:50:27.768876 [debug] [Thread-1  ]: Began compiling node model.c360.stg_supportdb__support_chat
[0m02:50:27.776378 [debug] [Thread-1  ]: Writing injected SQL for node "model.c360.stg_supportdb__support_chat"
[0m02:50:27.777509 [debug] [Thread-1  ]: Timing info for model.c360.stg_supportdb__support_chat (compile): 02:50:27.769323 => 02:50:27.777059
[0m02:50:27.778705 [debug] [Thread-1  ]: Began executing node model.c360.stg_supportdb__support_chat
[0m02:50:27.779701 [debug] [Thread-1  ]: Timing info for model.c360.stg_supportdb__support_chat (execute): 02:50:27.779171 => 02:50:27.779367
[0m02:50:27.781293 [debug] [Thread-1  ]: Finished running node model.c360.stg_supportdb__support_chat
[0m02:50:27.782634 [debug] [MainThread]: Connection 'master' was properly closed.
[0m02:50:27.783134 [debug] [MainThread]: Connection 'model.c360.stg_supportdb__support_chat' was properly closed.
[0m02:50:27.785129 [debug] [MainThread]: Command end result
[0m02:50:27.800011 [debug] [MainThread]: Acquiring new spark connection 'generate_catalog'
[0m02:50:27.800584 [info ] [MainThread]: Building catalog
[0m02:50:27.804828 [debug] [ThreadPool]: Acquiring new spark connection 'spark_catalog.c360raw'
[0m02:50:27.805619 [debug] [ThreadPool]: On "spark_catalog.c360raw": cache miss for schema ".spark_catalog.c360raw", this is inefficient
[0m02:50:27.811565 [debug] [ThreadPool]: Spark adapter: NotImplemented: add_begin_query
[0m02:50:27.812110 [debug] [ThreadPool]: Using spark connection "spark_catalog.c360raw"
[0m02:50:27.812600 [debug] [ThreadPool]: On spark_catalog.c360raw: /* {"app": "dbt", "dbt_version": "1.7.4", "profile_name": "c360", "target_name": "dev", "connection_name": "spark_catalog.c360raw"} */
show table extended in spark_catalog.c360raw like '*'
  
[0m02:50:27.813343 [debug] [ThreadPool]: Opening a new connection, currently in state init
[0m02:50:30.996262 [debug] [ThreadPool]: Spark adapter: Poll status: 2, query complete
[0m02:50:30.997081 [debug] [ThreadPool]: SQL status: OK in 3.0 seconds
[0m02:50:31.017468 [debug] [ThreadPool]: While listing relations in database=, schema=spark_catalog.c360raw, found: cart_items, customer, order_items, product, product_rating, simulation, support_chat
[0m02:50:31.018414 [debug] [ThreadPool]: Spark adapter: Getting table schema for relation c360raw.cart_items
[0m02:50:31.019253 [debug] [ThreadPool]: Spark adapter: Getting table schema for relation c360raw.customer
[0m02:50:31.020171 [debug] [ThreadPool]: Spark adapter: Getting table schema for relation c360raw.order_items
[0m02:50:31.020980 [debug] [ThreadPool]: Spark adapter: Getting table schema for relation c360raw.product
[0m02:50:31.021837 [debug] [ThreadPool]: Spark adapter: Getting table schema for relation c360raw.product_rating
[0m02:50:31.022754 [debug] [ThreadPool]: Spark adapter: Getting table schema for relation c360raw.simulation
[0m02:50:31.023464 [debug] [ThreadPool]: Spark adapter: Getting table schema for relation c360raw.support_chat
[0m02:50:31.030727 [debug] [ThreadPool]: On spark_catalog.c360raw: ROLLBACK
[0m02:50:31.032244 [debug] [ThreadPool]: Spark adapter: NotImplemented: rollback
[0m02:50:31.034169 [debug] [ThreadPool]: On spark_catalog.c360raw: Close
[0m02:50:31.153712 [debug] [ThreadPool]: Re-using an available connection from the pool (formerly spark_catalog.c360raw, now c360bronze)
[0m02:50:31.154930 [debug] [ThreadPool]: Spark adapter: Getting table schema for relation c360bronze.stg_clickstream
[0m02:50:31.156753 [debug] [ThreadPool]: Spark adapter: Getting table schema for relation c360bronze.stg_clickstream2
[0m02:50:31.159402 [debug] [ThreadPool]: Spark adapter: Getting table schema for relation c360bronze.stg_salesdb__cart_items
[0m02:50:31.160427 [debug] [ThreadPool]: Spark adapter: Getting table schema for relation c360bronze.stg_salesdb__customer
[0m02:50:31.161208 [debug] [ThreadPool]: Spark adapter: Getting table schema for relation c360bronze.stg_salesdb__order_items
[0m02:50:31.161775 [debug] [ThreadPool]: Spark adapter: Getting table schema for relation c360bronze.stg_salesdb__product
[0m02:50:31.162315 [debug] [ThreadPool]: Spark adapter: Getting table schema for relation c360bronze.stg_salesdb__product_rating
[0m02:50:31.162877 [debug] [ThreadPool]: Spark adapter: Getting table schema for relation c360bronze.stg_supportdb__support_chat
[0m02:50:31.191842 [info ] [MainThread]: Catalog written to /home/ec2-user/environment/dbtproject/dags/dbt_blueprint/c360-datalake/target/catalog.json
[0m02:50:31.195884 [debug] [MainThread]: Resource report: {"command_name": "generate", "command_success": true, "command_wall_clock_time": 6.6277905, "process_user_time": 3.394287, "process_kernel_time": 0.149442, "process_mem_max_rss": "104176", "process_out_blocks": "4960", "process_in_blocks": "0"}
[0m02:50:31.198437 [debug] [MainThread]: Command `dbt docs generate` succeeded at 02:50:31.198171 after 6.63 seconds
[0m02:50:31.199040 [debug] [MainThread]: Connection 'generate_catalog' was properly closed.
[0m02:50:31.199666 [debug] [MainThread]: Connection 'c360bronze' was properly closed.
[0m02:50:31.200187 [debug] [MainThread]: Sending event: {'category': 'dbt', 'action': 'invocation', 'label': 'end', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7fa619c86bb0>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7fa6179c4370>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7fa6177c2af0>]}
[0m02:50:31.201812 [debug] [MainThread]: Flushing usage events
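
The failure is visible in the log above: the catalog macro's initial show table extended query is rejected for Iceberg (v2) tables, after which dbt falls back to show tables plus per-table describe extended, which succeed — yet catalog.json still ends up empty. The queries from the log can be replayed directly in a Spark SQL session to confirm the behavior (schema and table names below are the ones from this project; any Iceberg schema would do):

```sql
-- Fails against an Iceberg (v2) catalog with:
--   AnalysisException: SHOW TABLE EXTENDED is not supported for v2 tables.
SHOW TABLE EXTENDED IN c360bronze LIKE '*';

-- The fallback queries dbt issues instead, which complete successfully:
SHOW TABLES IN c360bronze LIKE '*';
DESCRIBE EXTENDED c360bronze.stg_clickstream;
```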

Environment

- OS: Amazon Linux
- Python: 3.10
- dbt-core: 1.7.4
- dbt-spark: 1.7.0

Additional Context

This is the catalog.json generated by dbt docs generate:
{"metadata": {"dbt_schema_version": "https://schemas.getdbt.com/dbt/catalog/v1.json", "dbt_version": "1.7.4", "generated_at": "2024-01-02T02:39:55.359210Z", "invocation_id": "4f9b9ed4-e962-49bf-8329-df43b335419a", "env": {}}, "nodes": {}, "sources": {}, "errors": null}

shishircc added the type:bug (Something isn't working as documented) and triage:product (In Product's queue) labels on Jan 3, 2024
github-actions bot changed the title from "[Bug] When using iceberg format doc generate misses the columns information" to "[ADAP-1085] [Bug] When using iceberg format doc generate misses the columns information" on Jan 3, 2024
shishircc then changed the title to "[ADAP-1085] [Bug] When using iceberg format docs generate is unable to populate the columns information", and finally to "[ADAP-1085] [Bug] When using iceberg format, dbt docs generate is unable to populate the columns information", on Jan 3, 2024
shishircc commented Jan 4, 2024

emr_dag_automation_blueprint.py.txt
The attached DAG can be used to create the EMR cluster with the configuration above.

Here is the requirements.txt:

--constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.6.3/constraints-3.10.txt"

apache-airflow==2.6.3
apache-airflow-providers-salesforce
apache-airflow-providers-apache-spark
apache-airflow-providers-amazon
apache-airflow-providers-postgres
apache-airflow-providers-mongo
apache-airflow-providers-ssh
apache-airflow-providers-common-sql
astronomer-cosmos
boto3
simplejson
pymongo
pymssql
smart-open
psycopg2==2.9.5
simple-salesforce

dbeatty10 added the feature:iceberg (Issues related to Iceberg support) label on Feb 7, 2024
Fleid added the help-wanted (Extra attention is needed) label and removed the triage:product label on Feb 22, 2024
mikealfare added the pkg:dbt-spark (Issue affects dbt-spark) label on Jan 13, 2025
mikealfare transferred this issue from dbt-labs/dbt-spark on Jan 13, 2025
mikealfare pushed a commit that referenced this issue on Jan 14, 2025
mikealfare pushed a commit that referenced this issue on Jan 20, 2025
mikealfare pushed a commit that referenced this issue on Jan 24, 2025
colin-rogers-dbt pushed a commit that referenced this issue on Feb 3, 2025