[Fix 20284]: Enhance smartswitch environment variables parsing #21209
base: master
Signed-off-by: Ze Gan <[email protected]>
/azp run Azure.sonic-buildimage
Azure Pipelines successfully started running 1 pipeline(s).
/azpw ms_conflict
env_vars["IS_DPU_DEVICE"] = (smart_switch_dpu ? "true" : "false");
env_vars["NUM_DPU"] = std::to_string(num_dpus);

for (const auto& [key, value] : env_vars) {
I think service files such as database, swss, and syncd are enabled at build time.
Are we certain that these will be started only after systemd-sonic-generator has run at least once?
Please also verify the changes on a smartswitch platform and make sure multiple database instances are created.
Thanks for your comments.
Yes, the generator runs only once, before any services start.
I confirmed the smartswitch scenario and pasted the results in the PR description.
Thanks for testing.
std::unordered_map<std::string, std::string> env_vars;
env_vars["IS_DPU_DEVICE"] = (smart_switch_dpu ? "true" : "false");
env_vars["NUM_DPU"] = std::to_string(num_dpus);
This is a good approach to solving this issue, but I see one drawback: if we need to add new environment variables, the SSG code must be updated, which IMO is not very flexible. I have an idea for a generic solution; let me know what you think.
- Add a oneshot service that runs very early in boot. It reads static environment variables (e.g., $PLATFORM, $NUM_ASIC, $NUM_DPU, $SONIC_BOOT_TYPE) and writes them to a common file, /etc/sonic/static-env-variables.
- We can leverage SSG to write the EnvironmentFile=/etc/sonic/static-env-variables option into all the services, keeping the SSG code minimal and flexible.
- We can potentially clean up a lot of code under docker_image_ctl with this approach.
Good idea, I'll give it a try.
I tried it, but I found it isn't that easy.
- The shell script for the oneshot service you described might look like the following. But at that point the database service isn't ready yet, so sonic-cfggen won't work, and to get the variable I would have to parse /host/machine.conf the same way systemd-sonic-generator does. SSG already does this with an efficient C function, so why should I do it again in a shell script?
#!/bin/bash
SYSTEMD_ENV_FILE="/etc/sonic/static_env"
# Start from an empty file so repeated runs don't append duplicate entries
: > "$SYSTEMD_ENV_FILE"

# Load platform from sonic-cfggen
PLATFORM=${PLATFORM:-`sonic-cfggen -H -v DEVICE_METADATA.localhost.platform`}
echo "PLATFORM='${PLATFORM}'" >> "$SYSTEMD_ENV_FILE"

# Parse environment from platform.json
PLATFORM_JSON=/usr/share/sonic/device/$PLATFORM/platform.json
if [ -f "$PLATFORM_JSON" ]; then
    # Environment variables for Smart Switch
    # NUM_DPU is the number of entries under "DPUS"; default to 0 if jq fails
    NUM_DPU=$(jq -r '.DPUS | length' "$PLATFORM_JSON" 2>/dev/null)
    if [[ -z "$NUM_DPU" ]]; then
        NUM_DPU=0
    fi
    # jq -e exits non-zero when ".DPU" is missing (null), i.e. not a DPU
    if jq -e '.DPU' "$PLATFORM_JSON" >/dev/null 2>&1; then
        IS_DPU_DEVICE="true"
    else
        IS_DPU_DEVICE="false"
    fi
    echo "NUM_DPU='${NUM_DPU}'" >> "$SYSTEMD_ENV_FILE"
    echo "IS_DPU_DEVICE='${IS_DPU_DEVICE}'" >> "$SYSTEMD_ENV_FILE"
fi
- With your proposal, adding a new environment variable still requires updating the shell script, so I don't see any difference, or extra challenge, in doing it in SSG instead. If the concern is that C code isn't flexible: we previously had an SSG written in Python, but we discarded it and rewrote it in C because of efficiency issues.
Yes, the script can't use the DB, just like SSG.
Hmm, the difference is that the script only needs to run once. Since the oneshot service runs before any other services start, the CPU should be fairly idle and the script should execute quickly. We do need to benchmark this solution to measure its impact.
The advantage is that the load on SSG is lower: all it does is add EnvironmentFile=, and with some optimization we wouldn't even need to edit and rewrite the .service file after the first
In any case, I'm okay with the current solution. We might move to the generic oneshot service in the future if required.
If this PR looks good to you, could you please approve it?
env_vars["NUM_DPU"] = std::to_string(num_dpus);

for (const auto& [key, value] : env_vars) {
    tmp_file << "Environment=\"" << key << "=" << value << "\"" << std::endl;
Do we need to add the Environment= option if the file doesn't have a [Service] section?
I worry that a [Service] section might be introduced by other functions or modules in the future, and I don't see any side effects from defining some environment variables at this point.
Please make sure systemd won't throw an error if we add Environment= values after the last section when the file has no [Service] section.
Sounds good to me.
Why I did it
Fix issue: #20284
In 202405 and later, two extra steps were added before the start of every container to check NUM_DPU and IS_DPU_DEVICE by parsing the platform.json file with the jq tool. This is only relevant for Smartswitch. However, it adds some delay during the reconciliation phase of warm/fast reboot (WR/FR), resulting
How I did it
Set the environment variables in the systemd service files via systemd-sonic-generator.
How to verify it
Check for the jq command under swss.sh start and syncd.sh start in the sonic-bootchart.