243 changes: 136 additions & 107 deletions README.md
@@ -1,107 +1,136 @@
# Performance Savior Home (PSH)

[![image](https://img.shields.io/github/v/release/OptimatistOpenSource/psh?include_prereleases&color=blue)](https://github.com/OptimatistOpenSource/psh/releases)
[![License: LGPL v3](https://img.shields.io/badge/License-LGPL%20v3-blue.svg)](http://www.gnu.org/licenses/lgpl-3.0)
[![License: GPL v3](https://img.shields.io/badge/License-GPLv3-blue.svg)](http://www.gnu.org/licenses/gpl-3.0)
[![image](https://img.shields.io/github/stars/OptimatistOpenSource/psh)](https://github.com/OptimatistOpenSource/psh/stargazers)
[![image](https://img.shields.io/github/issues/OptimatistOpenSource/psh)](https://github.com/OptimatistOpenSource/psh/issues)

Performance Savior Home (PSH) collects software and hardware performance data while a cloud service is running.

PSH has two layers: WASM at the top and, beneath it, operators that collect
performance data through mechanisms such as eBPF and the perf_event_open
interface. This design provides a secure execution environment while remaining
easy to use.

It protects both the data acquisition and computation algorithms of performance engineers and the sensitive data of companies that deploy PSH.

## Overview

Performance Savior Home (PSH) is a cutting-edge performance monitoring and analytics solution designed for cloud services.
It securely harvests software and hardware performance metrics while your cloud applications are in operation, safeguarding both the intricate performance tuning algorithms of engineers and the sensitive corporate data of its adopters.

PSH achieves this through a dual-layered architecture leveraging WebAssembly (WASM) at the top and an array of robust operators at its foundation.

PSH encapsulates low-level performance monitoring capabilities within WASM,
streamlining the development of performance collection tools with simplicity and
grace. Built with Rust, PSH inherently boasts memory safety, further enhancing
its robustness and reliability in high-stakes environments.

PSH's vision is to reduce duplicated engineering effort within an enterprise
and to collect performance data in a reliable, low-overhead, and elegant way.

## Key Features

- **Secure Sandboxing**: Leverages WASM to create a secure sandbox for
performance data acquisition and processing algorithms, ensuring isolation and
preventing unauthorized access. Permission control ensures that sensitive data
is not collected, while WASM's performance data processing algorithms are
easier to protect.
- **Low-Level Insights**: PSH harnesses eBPF and perf_event_open to gather
detailed, real-time performance metrics from both software and hardware
levels, encompassing a wide spectrum of metrics across various system layers.
The result is a 360-degree view of your application's performance footprint.
- **Cross-Platform Compatibility**: PSH is designed from the ground up with
performance data acquisition and analysis for the ARM platform in mind, and is
compatible with both x86_64 and RISC-V architectures.
- **Highly Scalable Architecture**: PSH is designed for effortless scalability,
allowing users to easily extend both the algorithms executed within the WASM
environment and the range of performance events captured by operators. This
flexibility ensures that as technology stacks evolve or new monitoring
requirements arise, PSH can be adapted swiftly to meet those needs,
future-proofing your performance monitoring strategy.
- **Minimal Performance Overhead**: Preliminary testing suggests that PSH's
data collection adds roughly 3% overhead, so comprehensive monitoring can stay
enabled with little effect on the system's primary workload.

## Config

The default config is located in `/etc/psh/config.toml`.

See [config template](./doc/config.toml)

## Contribution Guide

We welcome contributions! Please refer to the following guide for details on how
to get involved.

Before submitting a pull request (PR) to PSH, it's crucial to perform a
self-check to ensure the quality and adherence to coding standards. Follow these
steps for an effective self-check:

- Run Clippy: Execute `cargo clippy`, a lint tool for Rust designed to catch
common mistakes and enhance the overall quality of your Rust code.

- Format Code: Utilize `cargo fmt` to format your Rust code, ensuring
consistency in code formatting across the project.

- Security Audit: Employ `cargo audit` to enhance the security of your Rust
code. This command reviews your dependencies for any security vulnerabilities
reported to the RustSec Advisory Database. If you haven't installed
`cargo-audit` yet, you can do so by running `cargo install cargo-audit`.

PRs that skip these checks may not be reviewed promptly, and it may be harder
to find a reviewer willing to assess them. Completing the steps above increases
the likelihood of a timely review.

## Known Issues

### Warning: Failed to initialize NVML with all available methods

PSH requires NVIDIA Management Library (NVML) to collect GPU statistics. This warning appears when PSH cannot find or initialize the NVML library. PSH attempts to initialize NVML in the following order:

1. Default system path
2. Architecture-specific paths:
   - For x86_64:
     - `/usr/lib/x86_64-linux-gnu/libnvidia-ml.so`
     - `/usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1`
   - For ARM64:
     - `/usr/lib/aarch64-linux-gnu/libnvidia-ml.so`
     - `/usr/lib/aarch64-linux-gnu/libnvidia-ml.so.1`

To resolve this issue, you can:

1. Install the NVIDIA driver if not already installed
2. If the library is installed in a non-standard location, add it to `LD_LIBRARY_PATH`:
   ```bash
   export LD_LIBRARY_PATH=/path/to/nvidia/lib:$LD_LIBRARY_PATH
   ```
3. Create a symbolic link to the library in the standard path for your architecture:
   ```bash
   # For x86_64
   sudo ln -s /path/to/libnvidia-ml.so /usr/lib/x86_64-linux-gnu/libnvidia-ml.so

   # For ARM64
   sudo ln -s /path/to/libnvidia-ml.so /usr/lib/aarch64-linux-gnu/libnvidia-ml.so
   ```
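
To check by hand which of the architecture-specific fallback paths exist on a machine, something like the following standalone sketch may help. It is not PSH's actual lookup code: the function names `nvml_candidates` and `first_existing` are made up for illustration, and it only covers step 2 of the search order (the default system path is resolved by the dynamic loader and is not modelled here).

```rust
use std::path::PathBuf;

/// Architecture-specific NVML locations listed in this section.
/// `arch` follows the naming of `std::env::consts::ARCH` ("x86_64", "aarch64").
/// Hypothetical helper, not part of PSH.
fn nvml_candidates(arch: &str) -> &'static [&'static str] {
    match arch {
        "x86_64" => &[
            "/usr/lib/x86_64-linux-gnu/libnvidia-ml.so",
            "/usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1",
        ],
        "aarch64" => &[
            "/usr/lib/aarch64-linux-gnu/libnvidia-ml.so",
            "/usr/lib/aarch64-linux-gnu/libnvidia-ml.so.1",
        ],
        // No fallback paths are documented for other architectures.
        _ => &[],
    }
}

/// Returns the first candidate that exists on disk, in list order.
fn first_existing(candidates: &[&str]) -> Option<PathBuf> {
    candidates
        .iter()
        .map(|p| PathBuf::from(*p))
        .find(|p| p.exists())
}

fn main() {
    let arch = std::env::consts::ARCH;
    match first_existing(nvml_candidates(arch)) {
        Some(path) => println!("found NVML at {}", path.display()),
        None => println!("no NVML library in the fallback paths for {}", arch),
    }
}
```

If this reports no match and the driver is installed, the library most likely lives in a non-standard location, which is the case that options 2 and 3 above address.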

## Acknowledgments

The development of the Performance Savior Home (PSH) project can be attributed
to the collaborative efforts and shared vision of Optimatist Technology Co., Ltd
and Zhejiang University's
[SPAIL – System Performance Analytics Intelligence Lab](https://github.com/ZJU-SPAIL).

<p float="left">
<img src="https://alidocs.oss-cn-zhangjiakou.aliyuncs.com/res/AJdl643eJ4d9qke1/img/15b0f764-17be-42ff-bd26-3b647e89679a.png" width="100" />
<img src="https://avatars.githubusercontent.com/u/165106263" width="100" />
</p>

## License

Performance Savior Home is distributed under the terms of the LGPL3.0/GPL3.0
License.
2 changes: 1 addition & 1 deletion crates/psh-system/src/error.rs
@@ -33,7 +33,7 @@ pub enum Error {
     InvalidCpuMask(String),
     #[error("Value is empty")]
     EmptyValue,
-    #[error("Failed to init nvml: {0}.")]
+    #[error(transparent)]
     Nvml(#[from] NvmlError),
 }
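
Review note: with `thiserror`, `#[error(transparent)]` forwards `Display` and `source()` to the wrapped error instead of adding a message of its own, so the `Failed to init nvml: ` prefix no longer appears. The sketch below spells out roughly equivalent behaviour by hand using only the standard library. `NvmlError` here is a hypothetical stand-in for the `nvml_wrapper` type, and the message text is invented.

```rust
use std::error::Error as StdError;
use std::fmt;

// Hypothetical stand-in for `nvml_wrapper`'s `NvmlError`, for illustration only.
#[derive(Debug)]
struct NvmlError(String);

impl fmt::Display for NvmlError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.0)
    }
}

impl StdError for NvmlError {}

#[derive(Debug)]
enum Error {
    Nvml(NvmlError),
}

// Hand-written counterpart of `#[error(transparent)]`:
// Display delegates to the inner error without adding a prefix.
impl fmt::Display for Error {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Error::Nvml(inner) => fmt::Display::fmt(inner, f),
        }
    }
}

// `source()` is forwarded to the inner error as well.
impl StdError for Error {
    fn source(&self) -> Option<&(dyn StdError + 'static)> {
        match self {
            Error::Nvml(inner) => inner.source(),
        }
    }
}

// Hand-written counterpart of `#[from]`.
impl From<NvmlError> for Error {
    fn from(e: NvmlError) -> Self {
        Error::Nvml(e)
    }
}

fn main() {
    let err: Error = NvmlError("driver not loaded".to_string()).into();
    // Prints the inner message only, with no "Failed to init nvml" prefix.
    println!("{}", err);
}
```

One consequence worth checking at call sites: any log line that relied on the old prefix to identify NVML failures now has to add that context itself.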

47 changes: 43 additions & 4 deletions crates/psh-system/src/gpu/mod.rs
@@ -1,6 +1,7 @@
 mod handle;
 mod raw;

+use nvml_wrapper::enum_wrappers::device::MemoryLocation;
 use nvml_wrapper::struct_wrappers::device::{MemoryInfo, PciInfo, Utilization};

 pub use handle::NvidiaHandle;
@@ -11,18 +12,56 @@ pub struct GpuInfo {
     pub cuda_driver_version: i32,
 }

+#[derive(Clone, Debug)]
+pub struct EccErrorInfo {
+    pub location: MemoryLocation,
+    pub corrected_volatile: u64,
+    pub corrected_aggregate: u64,
+    pub uncorrected_volatile: u64,
+    pub uncorrected_aggregate: u64,
+}
+
 #[derive(Clone, Debug)]
 pub struct GpuStats {
+    // Static fields (rarely change)
     pub uuid: String,
-    /// the vec index is fan index
-    pub fan_speeds: Vec<u32>,
-    pub vbios_version: String,
-    pub temperature: u32,
     pub name: String,
+    pub vbios_version: String,
     pub pci_info: PciInfo,
     pub irq_num: u32,
     pub max_pcie_link_gen: u32,
     pub max_pcie_link_width: u32,
+
+    // Dynamic fields (change frequently)
+    // Temperature and cooling
+    pub temperature: u32,
+    pub fan_speeds: Vec<u32>,
+
+    // PCIe status
+    pub current_pcie_link_gen: u32,
+    pub current_pcie_link_width: u32,
+
+    // Performance and utilization
     pub utilization_rates: Utilization,
+    pub performance_state: u32,
+    pub compute_mode: u32,
+
+    // Memory
     pub memory_info: MemoryInfo,
+    pub ecc_errors: Vec<EccErrorInfo>,
+
+    // Power
+    pub power_usage: u32,
+    pub power_limit: u32,
+    pub enforced_power_limit: u32,
+
+    // Clocks
+    pub memory_clock: u32,
+    pub graphics_clock: u32,
+    pub sm_clock: u32,
+    pub video_clock: u32,
+
+    // Processes
+    pub graphics_processes_count: u32,
+    pub compute_processes_count: u32,
 }
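
Review note: pairing the static `max_pcie_link_*` fields with the new `current_pcie_link_*` fields lets a consumer detect a link running below its maximum. The sketch below illustrates one such check. `PcieLink` and `is_link_degraded` are hypothetical names that do not appear in this PR, and the struct copies only the four PCIe fields so that the example compiles without `nvml_wrapper`.

```rust
/// Hypothetical stand-in carrying only the PCIe fields of `GpuStats`.
struct PcieLink {
    max_pcie_link_gen: u32,
    max_pcie_link_width: u32,
    current_pcie_link_gen: u32,
    current_pcie_link_width: u32,
}

/// True when the link currently runs below its maximum generation or width.
fn is_link_degraded(link: &PcieLink) -> bool {
    link.current_pcie_link_gen < link.max_pcie_link_gen
        || link.current_pcie_link_width < link.max_pcie_link_width
}

fn main() {
    // Example values are invented for illustration.
    let link = PcieLink {
        max_pcie_link_gen: 4,
        max_pcie_link_width: 16,
        current_pcie_link_gen: 3,
        current_pcie_link_width: 16,
    };
    if is_link_degraded(&link) {
        println!(
            "PCIe link at gen {} x{} (max gen {} x{})",
            link.current_pcie_link_gen,
            link.current_pcie_link_width,
            link.max_pcie_link_gen,
            link.max_pcie_link_width
        );
    }
}
```

A lower current generation is not necessarily a fault, since GPUs may reduce link speed at idle to save power, so a check like this is probably best evaluated under load.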