run_snafu does not protect against bad JSON data in certain failures #372

Open
RobertKrawitz opened this issue Nov 5, 2021 · 2 comments

@RobertKrawitz

Running hammerdb-mssql on kata, we've seen occasional errors indicating parse failure on JSON output. The pod then errors out and there's no way to capture the bad JSON.

IMO run_snafu should catch the exception and print a more useful diagnostic, perhaps including the bad JSON data.

[root@perf-sm5039-3-1 benchmark-runner]# oc logs hammerdb-kata-workload-27917bd1--1-swjfq
2021-11-05T19:16:05Z - INFO     - MainProcess - run_snafu: logging level is INFO
2021-11-05T19:16:05Z - INFO     - MainProcess - _load_benchmarks: Successfully imported 3 benchmark modules: coremarkpro, systemd_analyze, uperf
2021-11-05T19:16:05Z - INFO     - MainProcess - _load_benchmarks: Failed to import 0 benchmark modules: 
2021-11-05T19:16:05Z - INFO     - MainProcess - run_snafu: Using elasticsearch server with host: http://10.1.184.179:9200
2021-11-05T19:16:05Z - INFO     - MainProcess - run_snafu: Using index prefix for ES: hammerdb-test-ci
2021-11-05T19:16:05Z - INFO     - MainProcess - run_snafu: Connected to the elasticsearch cluster with info as follows:
/usr/local/lib/python3.6/site-packages/elasticsearch/connection/base.py:208: ElasticsearchWarning: Elasticsearch built-in security features are not enabled. Without authentication, your cluster could be accessible to anyone. See https://www.elastic.co/guide/en/elasticsearch/reference/7.14/security-minimal-setup.html to enable security.
  warnings.warn(message, category=ElasticsearchWarning)
2021-11-05T19:16:05Z - INFO     - MainProcess - run_snafu: {
    "name": "894e68c1af5d",
    "cluster_name": "docker-cluster",
    "cluster_uuid": "TVGTUkgJStG0xhcZNXGx3w",
    "version": {
        "number": "7.14.0",
        "build_flavor": "default",
        "build_type": "docker",
        "build_hash": "dd5a0a2acaa2045ff9624f3729fc8a6f40835aa1",
        "build_date": "2021-07-29T20:49:32.864135063Z",
        "build_snapshot": false,
        "lucene_version": "8.9.0",
        "minimum_wire_compatibility_version": "6.8.0",
        "minimum_index_compatibility_version": "6.0.0-beta1"
    },
    "tagline": "You Know, for Search"
}
2021-11-05T19:16:05Z - INFO     - MainProcess - py_es_bulk: Using streaming bulk indexer
2021-11-05T19:16:05Z - INFO     - MainProcess - wrapper_factory: identified hammerdb as the benchmark wrapper
2021-11-05T19:16:05Z - INFO     - MainProcess - trigger_hammerdb: Starting hammerdb run
2021-11-05T19:16:53Z - INFO     - MainProcess - trigger_hammerdb: Parsing stdout
2021-11-05T19:16:53Z - INFO     - MainProcess - trigger_hammerdb: generating json payload
Traceback (most recent call last):
  File "/usr/local/bin/run_snafu", line 33, in <module>
    sys.exit(load_entry_point('snafu', 'console_scripts', 'run_snafu')())
  File "/opt/snafu/snafu/run_snafu.py", line 142, in main
    es, process_generator(index_args, parser), parallel_setting
  File "/opt/snafu/snafu/utils/py_es_bulk.py", line 172, in streaming_bulk
    for ok, resp_payload in streaming_bulk_generator:
  File "/usr/local/lib/python3.6/site-packages/elasticsearch/helpers/actions.py", line 320, in streaming_bulk
    actions, chunk_size, max_chunk_bytes, client.transport.serializer
  File "/usr/local/lib/python3.6/site-packages/elasticsearch/helpers/actions.py", line 155, in _chunk_actions
    for action, data in actions:
  File "/opt/snafu/snafu/utils/py_es_bulk.py", line 118, in actions_tracking_closure
    for cl_action in cl_actions:
  File "/opt/snafu/snafu/run_snafu.py", line 199, in process_generator
    for action, index in data_object.emit_actions():
  File "/opt/snafu/snafu/hammerdb/trigger_hammerdb.py", line 284, in emit_actions
    timestamp,
  File "/opt/snafu/snafu/hammerdb/trigger_hammerdb.py", line 178, in _json_payload
    "worker": data[i][0],
IndexError: list index out of range
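
For illustration, here is a minimal sketch of the kind of guard requested above. It is not the actual `_json_payload` code in `trigger_hammerdb.py`: the function name `build_worker_records`, the exception class `BenchmarkOutputError`, and the `tpm` field are hypothetical, and it assumes the parsed stdout arrives as a list of rows. The idea is to turn the bare `IndexError` into an error that carries the offending row and the start of the raw output.

```python
import logging

logger = logging.getLogger("snafu")


class BenchmarkOutputError(RuntimeError):
    """Raised when benchmark output cannot be turned into result records."""


def build_worker_records(rows, raw_stdout, max_dump=2000):
    """Build one record per parsed row, failing loudly on malformed input.

    rows:       list of lists from a (hypothetical) stdout parser
    raw_stdout: unparsed benchmark output, kept only for diagnostics
    max_dump:   how many characters of raw output to include in the error
    """
    snippet = raw_stdout[:max_dump]
    if not rows:
        # Nothing was parsed at all, e.g. because every virtual user failed.
        raise BenchmarkOutputError(
            f"no result rows parsed; raw output starts with: {snippet!r}"
        )

    records = []
    for i, row in enumerate(rows):
        try:
            # Field names are placeholders, not the real payload schema.
            records.append({"worker": row[0], "tpm": row[1]})
        except (IndexError, TypeError) as exc:
            logger.error("Malformed row %d: %r; raw output:\n%s", i, row, snippet)
            raise BenchmarkOutputError(
                f"row {i} is malformed ({row!r}); raw output starts with: {snippet!r}"
            ) from exc
    return records
```

With a guard like this, the pod log would show the unparseable data instead of only `IndexError: list index out of range`.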
@RobertKrawitz

@ebattat

@RobertKrawitz

I have a bit more information on this. It appears to have been triggered when the backend mssql database failed. I copied strace into the workload pod and found this:

sh-4.4$ /var/tmp/strace -s 65536 -f -p 6
/var/tmp/strace: Process 6 attached
read(4, "\rError in Virtual User 1: Connection to DRIVER=ODBC Driver 17 for SQL Server;SERVER=tcp:mssql-deployment.mssql-db,1433;UID=SA;PWD=XXXXXXXXXXX could not be established : [Microsoft][ODBC Driver 17 for SQL Server]Login timeout expired\n[Microsoft][ODBC Driver 17 for SQL Server]TCP Provider: Error code 0x2749\n[Microsoft][ODBC Driver 17 for SQL Server]A network-related or instance-specific error has occurred while establishing a connection to SQL Server. Server is not found or not accessible. Check if instance name is correct and if SQL Server is configured to allow remote connections. For more information see SQL Server Books Online.\n(connecting to database)\n", 3283) = 672
read(4, "\rVuser 1:FINISHED FAILED\n", 2611) = 25
read(4, "\rError in Virtual User 2: Connection to DRIVER=ODBC Driver 17 for SQL Server;SERVER=tcp:mssql-deployment.mssql-db,1433;UID=SA;PWD=XXXXXXXXXXX could not be established : [Microsoft][ODBC Driver 17 for SQL Server]Login timeout expired\n[Microsoft][ODBC Driver 17 for SQL Server]TCP Provider: Error code 0x2749\n[Microsoft][ODBC Driver 17 for SQL Server]A network-related or instance-specific error has occurred while establishing a connection to SQL Server. Server is not found or not accessible. Check if instance name is correct and if SQL Server is configured to allow remote connections. For more information see SQL Server Books Online.\n(connecting to database)\n", 2586) = 672
read(4, "\rVuser 2:FINISHED FAILED\n", 1914) = 25
read(4, "\rError in Virtual User 3: Connection to DRIVER=ODBC Driver 17 for SQL Server;SERVER=tcp:mssql-deployment.mssql-db,1433;UID=SA;PWD=XXXXXXXXXXX could not be established : [Microsoft][ODBC Driver 17 for SQL Server]Login timeout expired\n[Microsoft][ODBC Driver 17 for SQL Server]TCP Provider: Error code 0x2749\n[Microsoft][ODBC Driver 17 for SQL Server]A network-related or instance-specific error has occurred while establishing a connection to SQL Server. Server is not found or not accessible. Check if instance name is correct and if SQL Server is configured to allow remote connections. For more information see SQL Server Books Online.\n(connecting to database)\n", 1889) = 672

I can't find the resulting JSON in the strace log.
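
Given the strace output above, one possible approach is to check the HammerDB stdout for virtual-user failures before attempting to build any payload. The sketch below is only an illustration: `find_vuser_failures` is a hypothetical helper, not part of snafu, and the two patterns are taken solely from the messages visible in the strace log, so other failure formats may exist.

```python
import re

# Patterns based only on the lines seen in the strace output above.
_ERROR_RE = re.compile(r"Error in Virtual User (\d+): (.*)")
_FAILED_RE = re.compile(r"Vuser (\d+):FINISHED FAILED")


def find_vuser_failures(stdout):
    """Return {virtual_user_number: reason} for every failed virtual user.

    HammerDB prefixes these lines with a carriage return, so the text is
    split on both '\r' and '\n' before matching.
    """
    failures = {}
    for line in re.split(r"[\r\n]+", stdout):
        m = _ERROR_RE.match(line)
        if m:
            # Keep the first line of the error message as the reason.
            failures.setdefault(int(m.group(1)), m.group(2))
            continue
        m = _FAILED_RE.match(line)
        if m:
            failures.setdefault(int(m.group(1)), "finished with FAILED status")
    return failures
```

A caller could then stop with a clear message such as "virtual users 1, 2, 3 failed to connect" when the returned dict is non-empty, rather than continuing on to payload generation with no usable data.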
