Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Failed to send rsp msg intermittently #1

Open
leiyu-bytedance opened this issue Apr 2, 2021 · 5 comments
Open

Failed to send rsp msg intermittently #1

leiyu-bytedance opened this issue Apr 2, 2021 · 5 comments

Comments

@leiyu-bytedance
Copy link

The kcsbridge gets Failed to send rsp msg error intermittently, where the return value is 0 and the ERROR is "Invalid argument"

The full json-pretty output is:

{
        "CODE_LINE" : "76",
        "_BOOT_ID" : "fc13807c5e13439983e879841967cb6a",
        "_SYSTEMD_INVOCATION_ID" : "d47c47ccf8654964beaea2e3b7962eb3",
        "_COMM" : "kcsbridged",
        "TRANSACTION_ID" : "65467255",
        "__CURSOR" : "s=61672c7be0e848ec99ac8c10313b1bc6;i=11fc;b=fc13807c5e13439983e879841967cb6a;m=2e3a80a8d;t=5bee385a56da0;x=b2bef7d1394a65f0",
        "ERROR" : "Invalid argument",
        "CMD" : "0x23",
        "_SYSTEMD_SLICE" : "system-phosphor\\x2dipmi\\x2dkcs.slice",
        "_MACHINE_ID" : "491b3315fc3749a0becbab8778243aff",
        "_CAP_EFFECTIVE" : "1ffffffffff",
        "SIZE" : "0",
        "__REALTIME_TIMESTAMP" : "1617259292618144",
        "LUN" : "0x00",
        "_TRANSPORT" : "journal",
        "__MONOTONIC_TIMESTAMP" : "12409375373",
        "SYSLOG_IDENTIFIER" : "kcsbridged",
        "_SYSTEMD_UNIT" : "phosphor-ipmi-kcs@ipmi_kcs3.service",
        "_EXE" : "/usr/bin/kcsbridged",
        "_SYSTEMD_CGROUP" : "/system.slice/system-phosphor\\x2dipmi\\x2dkcs.slice/phosphor-ipmi-kcs@ipmi_kcs3.service",
        "_UID" : "0",
        "_CMDLINE" : "/usr/bin/kcsbridged -c ipmi_kcs3",
        "EXPECT" : "10",
        "CC" : "0x00",
        "PRIORITY" : "3",
        "_GID" : "0",
        "CODE_FUNC" : "helper_log",
        "_HOSTNAME" : "g220a",
        "CODE_FILE" : "/usr/src/debug/phosphor-ipmi-kcs/1.0+gitAUTOINC+4a4d1d03d9-r1/recipe-sysroot/usr/include/phosphor-logging/log.hpp",
        "_PID" : "896",
        "_SOURCE_REALTIME_TIMESTAMP" : "1617259292617848",
        "MESSAGE" : "Failed to send rsp msg",
        "NETFN" : "0x0b"
}
@leiyu-bytedance
Copy link
Author

The issue occurs when there are mutliple writes to the kcs device wihtout read.
From the kcs driver, it shows that if one write to the kcs device, it can NOT be written again:

  1     if (kcs_bmc->phase == KCS_PHASE_WAIT_READ) {
  2         kcs_bmc->phase = KCS_PHASE_READ;
  3         kcs_bmc->data_out_idx = 1;
  4         kcs_bmc->data_out_len = count;
  5         memcpy(kcs_bmc->data_out, kcs_bmc->kbuffer, count);
  6         write_data(kcs_bmc, kcs_bmc->data_out[0]);
  7         ret = count;
  8     } else {
  9         ret = -EINVAL;
 10     }

@wak-google
Copy link
Contributor

It can also happen if the host decides to reset the state machine before the bridge finishes writing, but before a new command is sent down the channel and read().

I fixed some edge cases around this behavior in a rewrite of the bridge
https://github.com/openbmc/google-misc/blob/master/subprojects/kcsbridge/src/main.cpp#L127

However, I still haven't made the EINVAL error message more clear as a normal (non-exceptional) error case that the host can direct.

@leiyu-bytedance
Copy link
Author

It can also happen if the host decides to reset the state machine before the bridge finishes writing, but before a new command is sent down the channel and read().

I fixed some edge cases around this behavior in a rewrite of the bridge
https://github.com/openbmc/google-misc/blob/master/subprojects/kcsbridge/src/main.cpp#L127

However, I still haven't made the EINVAL error message more clear as a normal (non-exceptional) error case that the host can direct.

Good to know there is another implementation of kcsbridge.

Comparing the code, it looks like:

  • This repo uses async calls to read the fd, and async calls to ipmid, and sync call to write the fd.
  • Google's new kcsbridge uses epoll (assuming it's epoll) on the fd and read the data from fd, then make async calls to ipmid, and sync call to write the fd.

@wak-google Could you kindly tell which the edge cases are? And will you push Google's re-write of kcsbridge to the community?

@leiyu-bytedance
Copy link
Author

leiyu-bytedance commented Apr 9, 2021

We have verified that the rewrite of kcsbridge in https://github.com/openbmc/google-misc/blob/master/subprojects/kcsbridge/ works fine.
This issue is not reproduced.

I would propose that google push the re-write upstream :)

@wak-google
Copy link
Contributor

Fixed via 03e6def

zhangjian3032 pushed a commit to zhangjian3032/openbmc that referenced this issue Nov 7, 2022
The kcsbridge has bug (openbmc/kcsbridge#1) and google has a rewrite of
this function.
Use Google's kcsbridge implmentation.

Tested: The above issue is not reproduced.

Signed-off-by: Lei YU <[email protected]>
Change-Id: I60e2a63bdf9026bea19db92492e81696d6c6d115
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants