The hardware watchdog and scheduling deadlock exception reporting mechanism on Starry #14
guoweikang
started this conversation in
Show and tell
Replies: 2 comments 2 replies
-
AxWatchdog Mechanism Design Document

1. Overview

Today, once Starry enters a hung or deadlocked path, we have no good diagnostic means to help locate the fault; it takes manual debugging.

2. Overall Architecture

graph TB
subgraph "AxWatchdog core module"
A[NMI handler]
B[Inspection scheduler]
C[Task registry]
end
subgraph "Inspection tasks"
D[Scheduler module inspection]
E[Memory module inspection]
F[Deadlock detection inspection]
G[Other module inspection]
end
subgraph "Exception handling"
H[Anomaly detection trigger]
I[IPI Stop Machine]
J[Snapshot generator]
end
subgraph "Output"
K[System snapshot]
L[CPU states]
M[Task states]
N[Lock holding info]
end
A --> B
B --> C
C --> D
C --> E
C --> F
C --> G
D --> H
E --> H
F --> H
G --> H
H --> I
I --> J
J --> K
K --> L
K --> M
K --> N
3. Core Component Design

3.1 NMI Interrupt Handler

3.1.1 Design Goals

3.1.2 Data Structures

/// Watchdog core structure
pub struct Watchdog {
/// NMI interrupt vector
nmi_vector: u8,
/// Check interval in milliseconds
check_interval_ms: u64,
/// Last check timestamp
last_check_timestamp: AtomicU64,
/// Registered inspection task list
inspection_tasks: RwLock<Vec<Arc<InspectionTask>>>,
/// Whether watchdog is enabled
enabled: AtomicBool,
/// Per-CPU local state
per_cpu_state: PerCpuArray<CpuWatchdogState>,
}
/// Per-CPU watchdog state
pub struct CpuWatchdogState {
/// Last heartbeat time of this CPU
last_heartbeat: AtomicU64,
/// Whether this CPU is responsive to NMI
nmi_responsive: AtomicBool,
/// Current executing task ID
current_task_id: AtomicU64,
}

3.1.3 NMI Handling Flow

/// NMI interrupt handler function
fn nmi_watchdog_handler(context: &mut InterruptContext) {
let cpu_id = arceos_api::current_cpu();
// 1. Update heartbeat timestamp
WATCHDOG.per_cpu_state[cpu_id]
.last_heartbeat
.store(arceos_api::wall_timetick(), Ordering::Release);
// 2. Mark NMI response
WATCHDOG.per_cpu_state[cpu_id]
.nmi_responsive
.store(true, Ordering::Release);
// 3. Execute inspection tasks (only on master watchdog CPU)
if cpu_id == WATCHDOG_MASTER_CPU {
run_inspection_tasks();
}
// 4. Check if stop machine was triggered
if STOP_MACHINE_TRIGGERED.load(Ordering::Acquire) {
freeze_cpu_and_dump(context);
}
}

3.2 Inspection Task Registration Mechanism

3.2.1 Inspection Task Interface

/// Inspection task trait
pub trait InspectionTask: Send + Sync {
/// Task name
fn name(&self) -> &str;
/// Execute inspection, return check result
fn inspect(&self) -> InspectionResult;
/// Inspection interval in milliseconds
fn interval_ms(&self) -> u64;
/// Timeout threshold in milliseconds, exceeding this time without completion is considered abnormal
fn timeout_ms(&self) -> u64;
}
/// Inspection result
pub enum InspectionResult {
/// Normal
Ok,
/// Warning (does not trigger stop machine)
Warning(String),
/// Critical (triggers stop machine)
Critical(String),
}
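As a sanity check of the interface above, the registry and dispatch loop can be exercised in ordinary std Rust. The sketch below is a toy, not Starry code: the names `Registry`, `AlwaysOk`, and `Starved` are invented for illustration, and the `String` payloads are trimmed to `&'static str` for brevity. It shows how a `Critical` result from any registered task surfaces to the caller, which in the real design would then trigger stop machine.

```rust
use std::sync::{Arc, RwLock};

// Trimmed mirror of the doc's InspectionResult.
#[derive(Debug, PartialEq)]
enum InspectionResult {
    Ok,
    Warning(&'static str),
    Critical(&'static str),
}

// Trimmed mirror of the doc's InspectionTask trait.
trait InspectionTask: Send + Sync {
    fn name(&self) -> &str;
    fn inspect(&self) -> InspectionResult;
}

// Toy stand-in for WATCHDOG.inspection_tasks.
struct Registry {
    tasks: RwLock<Vec<Arc<dyn InspectionTask>>>,
}

impl Registry {
    fn new() -> Self {
        Registry { tasks: RwLock::new(Vec::new()) }
    }
    fn register(&self, task: Arc<dyn InspectionTask>) {
        self.tasks.write().unwrap().push(task);
    }
    // Run every task; return the first Critical finding, if any.
    fn run_all(&self) -> Option<(String, &'static str)> {
        for task in self.tasks.read().unwrap().iter() {
            if let InspectionResult::Critical(msg) = task.inspect() {
                return Some((task.name().to_string(), msg));
            }
        }
        None
    }
}

struct AlwaysOk;
impl InspectionTask for AlwaysOk {
    fn name(&self) -> &str { "always_ok" }
    fn inspect(&self) -> InspectionResult { InspectionResult::Ok }
}

struct Starved;
impl InspectionTask for Starved {
    fn name(&self) -> &str { "scheduler_starvation_check" }
    fn inspect(&self) -> InspectionResult {
        InspectionResult::Critical("task waited too long in ready queue")
    }
}

fn main() {
    let registry = Registry::new();
    registry.register(Arc::new(AlwaysOk));
    registry.register(Arc::new(Starved));
    // In the real design, a Some(..) finding here is where trigger_stop_machine would be called.
    let finding = registry.run_all();
    println!("{:?}", finding);
}
```

Note that `run_all` stops at the first `Critical`; whether later tasks should still run before freezing the machine is a design choice the document leaves open.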
/// Register inspection task
pub fn register_inspection_task(task: Arc<dyn InspectionTask>) -> Result<(), WatchdogError> {
let mut tasks = WATCHDOG.inspection_tasks.write();
tasks.push(task);
Ok(())
}

3.2.2 Scheduler Module Inspection Example

/// Scheduler inspection task: check if tasks in ReadyQueue have not been scheduled for a long time
pub struct SchedulerInspectionTask {
/// Starvation threshold in milliseconds
starvation_threshold_ms: u64,
}
impl InspectionTask for SchedulerInspectionTask {
fn name(&self) -> &str {
"scheduler_starvation_check"
}
fn inspect(&self) -> InspectionResult {
let current_time = current_timestamp();
let scheduler = get_global_scheduler();
// Iterate through all CPU ReadyQueues
for cpu_id in 0..num_cpus() {
let ready_queue = scheduler.get_ready_queue(cpu_id);
for task in ready_queue.iter() {
let wait_time = current_time - task.last_ready_time();
if wait_time > self.starvation_threshold_ms {
return InspectionResult::Critical(
format!(
"Task {} (PID: {}) in CPU {} ready queue for {}ms without scheduling",
task.name(), task.pid(), cpu_id, wait_time
)
);
}
}
}
InspectionResult::Ok
}
fn interval_ms(&self) -> u64 {
1000 // Check once per second
}
fn timeout_ms(&self) -> u64 {
5000 // 5 second timeout
}
}

3.3 IPI Stop Machine Mechanism

3.3.1 Design Notes

When an inspection task detects a critical anomaly, all CPUs are frozen via IPI (inter-processor interrupt), preserving the scene for analysis.

3.3.2 Implementation

/// Trigger stop machine
pub fn trigger_stop_machine(reason: &str) {
// Set global flag
STOP_MACHINE_TRIGGERED.store(true, Ordering::Release);
STOP_MACHINE_REASON.lock().push_str(reason);
// Disable interrupts on current CPU
disable_interrupts();
// Send IPI to all other CPUs
for cpu_id in 0..num_cpus() {
if cpu_id != current_cpu_id() {
send_ipi(cpu_id, IPI_STOP_MACHINE);
}
}
// Current CPU also enters frozen state
freeze_cpu_and_dump(&get_current_context());
}
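The freeze-and-rendezvous flow of `trigger_stop_machine` and `freeze_cpu_and_dump` can be modeled in std Rust with threads standing in for CPUs. This is a behavioral sketch only, under loudly stated substitutions: `simulate_stop_machine` is an invented name, `thread::yield_now` replaces spinning on `cpu_halt`, and thread exit replaces the final halt loop.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

// Returns true iff every simulated CPU froze and the master took the snapshot.
fn simulate_stop_machine(num_cpus: usize) -> bool {
    // One "frozen" flag per simulated CPU (stand-in for CPU_FROZEN_FLAGS),
    // plus a counter for the rendezvous and a flag for the snapshot.
    let frozen: Arc<Vec<AtomicBool>> =
        Arc::new((0..num_cpus).map(|_| AtomicBool::new(false)).collect());
    let frozen_count = Arc::new(AtomicUsize::new(0));
    let snapshot_taken = Arc::new(AtomicBool::new(false));

    let handles: Vec<_> = (0..num_cpus)
        .map(|cpu_id| {
            let frozen = Arc::clone(&frozen);
            let frozen_count = Arc::clone(&frozen_count);
            let snapshot_taken = Arc::clone(&snapshot_taken);
            thread::spawn(move || {
                // freeze_cpu_and_dump, modeled: mark this CPU frozen...
                frozen[cpu_id].store(true, Ordering::Release);
                frozen_count.fetch_add(1, Ordering::AcqRel);
                // ...rendezvous with all other CPUs (wait_all_cpus_frozen)...
                while frozen_count.load(Ordering::Acquire) < num_cpus {
                    thread::yield_now();
                }
                // ...and let the master CPU generate the snapshot exactly once.
                if cpu_id == 0 {
                    snapshot_taken.store(true, Ordering::Release);
                }
                // The real code would now loop on cpu_halt(); the thread just exits.
            })
        })
        .collect();

    for h in handles {
        h.join().unwrap();
    }
    snapshot_taken.load(Ordering::Acquire) && frozen.iter().all(|f| f.load(Ordering::Acquire))
}

fn main() {
    assert!(simulate_stop_machine(4));
    println!("all simulated CPUs reached the frozen rendezvous; master dumped");
}
```

The rendezvous matters: the master must not read other CPUs' register/stack state for the snapshot until every CPU has confirmed it is frozen, which is exactly what `wait_all_cpus_frozen` enforces in the design above.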
/// IPI stop machine handler function
fn ipi_stop_machine_handler(context: &mut InterruptContext) {
disable_interrupts();
freeze_cpu_and_dump(context);
}
/// Freeze CPU and wait
fn freeze_cpu_and_dump(context: &InterruptContext) {
let cpu_id = current_cpu_id();
// Mark this CPU as frozen
CPU_FROZEN_FLAGS[cpu_id].store(true, Ordering::Release);
// Wait for all CPUs to be frozen
wait_all_cpus_frozen();
// Master CPU is responsible for generating snapshot
if cpu_id == WATCHDOG_MASTER_CPU {
generate_system_snapshot(context);
}
// Enter infinite loop, waiting for debugging or restart
loop {
cpu_halt();
}
}

4. System Snapshot Design

4.1 Snapshot Contents

/// System snapshot
pub struct SystemSnapshot {
/// Trigger timestamp
pub timestamp: u64,
/// Trigger reason
pub reason: String,
/// State of each CPU
pub cpu_states: Vec<CpuSnapshot>,
/// State of all tasks
pub task_states: Vec<TaskSnapshot>,
/// Lock holding information
pub lock_info: Vec<LockSnapshot>,
}
/// CPU state snapshot
pub struct CpuSnapshot {
/// CPU ID
pub cpu_id: usize,
/// Currently running task ID
pub current_task_id: Option<u64>,
/// Interrupt nesting level
pub interrupt_depth: usize,
/// Whether interrupts are disabled
pub interrupts_disabled: bool,
/// Register state
pub registers: RegisterState,
/// Call stack
pub stack_trace: Vec<StackFrame>,
}
/// Task state snapshot
pub struct TaskSnapshot {
/// Task ID
pub task_id: u64,
/// Process ID
pub pid: u64,
/// Task name
pub name: String,
/// Task state
pub state: TaskState,
/// CPU ID (if running)
pub cpu_id: Option<usize>,
/// Priority
pub priority: u8,
/// Call stack
pub stack_trace: Vec<StackFrame>,
/// List of held locks
pub held_locks: Vec<usize>, // Lock addresses
/// Lock being waited for (if blocked on a lock)
pub waiting_lock: Option<usize>,
}
/// Lock state snapshot
pub struct LockSnapshot {
/// Lock address
pub lock_addr: usize,
/// Lock type
pub lock_type: LockType,
/// Whether the lock is held
pub is_locked: bool,
/// List of holder task IDs (rwlock may have multiple readers)
pub holders: Vec<u64>,
/// Task IDs in the wait queue
pub waiters: Vec<u64>,
}
/// Stack frame (axbacktrace)
pub struct StackFrame {
/// Program counter
pub pc: usize,
/// Function symbol name
pub symbol: Option<String>,
/// Frame pointer
pub fp: usize,
}
/// Task state
#[derive(Debug, Clone, Copy)]
pub enum TaskState {
Running,
Ready,
Blocked,
Sleeping,
Zombie,
}
/// Lock type
#[derive(Debug, Clone, Copy)]
pub enum LockType {
Spinlock,
Mutex,
RwLock,
Semaphore,
}

4.2 Snapshot Generation Implementation

/// Generate system snapshot
fn generate_system_snapshot(context: &InterruptContext) -> SystemSnapshot {
let mut snapshot = SystemSnapshot {
timestamp: current_timestamp(),
reason: STOP_MACHINE_REASON.lock().clone(),
cpu_states: Vec::new(),
task_states: Vec::new(),
lock_info: Vec::new(),
};
// 1. Collect all CPU states
for cpu_id in 0..num_cpus() {
snapshot.cpu_states.push(capture_cpu_state(cpu_id));
}
// 2. Collect all task states
let task_manager = get_global_task_manager();
for task in task_manager.all_tasks() {
snapshot.task_states.push(capture_task_state(task));
}
// 3. Collect lock information
let lock_tracker = get_global_lock_tracker();
for lock in lock_tracker.all_locks() {
snapshot.lock_info.push(capture_lock_state(lock));
}
// 4. Print snapshot
print_snapshot(&snapshot);
snapshot
}
/// Capture CPU state
fn capture_cpu_state(cpu_id: usize) -> CpuSnapshot {
let cpu_context = get_cpu_context(cpu_id);
CpuSnapshot {
cpu_id,
current_task_id: cpu_context.current_task().map(|t| t.id()),
interrupt_depth: cpu_context.interrupt_depth(),
interrupts_disabled: !cpu_context.interrupts_enabled(),
registers: cpu_context.registers().clone(),
stack_trace: unwind_stack(cpu_context.stack_pointer(), cpu_context.frame_pointer()),
}
}
/// Capture task state
fn capture_task_state(task: &Task) -> TaskSnapshot {
TaskSnapshot {
task_id: task.id(),
pid: task.pid(),
name: task.name().to_string(),
state: task.state(),
cpu_id: task.cpu_id(),
priority: task.priority(),
stack_trace: unwind_stack(task.stack_pointer(), task.frame_pointer()),
held_locks: task.held_locks().iter().map(|l| l.addr()).collect(),
waiting_lock: task.waiting_lock().map(|l| l.addr()),
}
}
/// Capture lock state
fn capture_lock_state(lock: &dyn Lock) -> LockSnapshot {
LockSnapshot {
lock_addr: lock.addr(),
lock_type: lock.lock_type(),
is_locked: lock.is_locked(),
holders: lock.holders().iter().map(|t| t.id()).collect(),
waiters: lock.waiters().iter().map(|t| t.id()).collect(),
}
}

5. Initialization and Usage Flow

5.1 System Initialization

/// Initialize watchdog
pub fn init_watchdog() -> Result<(), WatchdogError> {
// 1. Initialize NMI interrupt
register_nmi_handler(nmi_watchdog_handler)?;
// 2. Initialize per-CPU state
for cpu_id in 0..num_cpus() {
WATCHDOG.per_cpu_state[cpu_id] = CpuWatchdogState {
last_heartbeat: AtomicU64::new(current_timestamp()),
nmi_responsive: AtomicBool::new(true),
current_task_id: AtomicU64::new(0),
};
}
// 3. Start NMI timer
setup_nmi_timer(WATCHDOG.check_interval_ms)?;
// 4. Enable watchdog
WATCHDOG.enabled.store(true, Ordering::Release);
// 5. Register IPI handler
register_ipi_handler(IPI_STOP_MACHINE, ipi_stop_machine_handler)?;
Ok(())
}

5.2 Modules Registering Inspection Tasks

/// Register inspection task during scheduler module initialization
pub fn init_scheduler() {
// ... Scheduler initialization code ...
// Register scheduler inspection task
let inspection = Arc::new(SchedulerInspectionTask {
starvation_threshold_ms: 5000, // 5 seconds without scheduling is considered abnormal
});
register_inspection_task(inspection).expect("Failed to register scheduler inspection");
}

6. Lock Tracking Mechanism

6.1 Lock Tracker

/// Global lock tracker
pub struct LockTracker {
/// All registered locks
locks: RwLock<HashMap<usize, Arc<dyn Lock>>>,
}
/// Lock trait (all lock types need to implement this)
pub trait Lock: Send + Sync {
/// Lock address
fn addr(&self) -> usize;
/// Lock type
fn lock_type(&self) -> LockType;
/// Whether the lock is held
fn is_locked(&self) -> bool;
/// Holder list
fn holders(&self) -> Vec<Arc<Task>>;
/// Waiter list
fn waiters(&self) -> Vec<Arc<Task>>;
}
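Before wiring tracking into a real lock type, the bookkeeping itself can be modeled in isolation. The sketch below is a std-only toy: the map layout and the `record_released` helper are assumptions for the example, not Starry APIs. It records, per lock address, the current holder and the set of waiting task IDs, which is exactly the information the snapshot's `LockSnapshot` needs back out.

```rust
use std::collections::{HashMap, HashSet};
use std::sync::Mutex;

// Toy tracker: lock address -> (current holder, tasks waiting).
// Single-holder locks only; rwlocks would need a holder set instead.
#[derive(Default)]
struct LockTracker {
    state: Mutex<HashMap<usize, (Option<u64>, HashSet<u64>)>>,
}

impl LockTracker {
    // Called before the task starts spinning/blocking on the lock.
    fn record_waiting(&self, lock_addr: usize, task_id: u64) {
        let mut s = self.state.lock().unwrap();
        s.entry(lock_addr).or_default().1.insert(task_id);
    }
    // Called once the lock is acquired: promote waiter to holder.
    fn record_acquired(&self, lock_addr: usize, task_id: u64) {
        let mut s = self.state.lock().unwrap();
        let entry = s.entry(lock_addr).or_default();
        entry.0 = Some(task_id);
        entry.1.remove(&task_id);
    }
    // Assumed helper (not shown in the doc): clear the holder on unlock.
    fn record_released(&self, lock_addr: usize) {
        let mut s = self.state.lock().unwrap();
        if let Some(entry) = s.get_mut(&lock_addr) {
            entry.0 = None;
        }
    }
    fn holder(&self, lock_addr: usize) -> Option<u64> {
        self.state.lock().unwrap().get(&lock_addr).and_then(|e| e.0)
    }
}

fn main() {
    let tracker = LockTracker::default();
    let lock_addr = 0xdead_beef; // stand-in for self.addr()
    tracker.record_waiting(lock_addr, 7);
    tracker.record_acquired(lock_addr, 7);
    assert_eq!(tracker.holder(lock_addr), Some(7));
    tracker.record_released(lock_addr);
    assert_eq!(tracker.holder(lock_addr), None);
    println!("tracker recorded acquire/release for task 7");
}
```

In a kernel, the tracker itself must of course not take a blocking lock; the `Mutex` here is only for the userspace model.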
/// Add tracking in lock implementation
impl<T> Spinlock<T> {
pub fn lock(&self) -> SpinlockGuard<T> {
let current_task = current_task();
// Record waiting state before locking
LOCK_TRACKER.record_waiting(self.addr(), current_task.id());
// Actual locking operation
while self.locked.compare_exchange_weak(false, true, Ordering::Acquire, Ordering::Relaxed).is_err() {
core::hint::spin_loop();
}
// Record holding state after successful lock
LOCK_TRACKER.record_acquired(self.addr(), current_task.id());
SpinlockGuard { lock: self }
}
}

7. Advanced Features

7.1 Deadlock Detection

/// Deadlock detection inspection task
pub struct DeadlockDetectionTask;
impl InspectionTask for DeadlockDetectionTask {
fn name(&self) -> &str {
"deadlock_detection"
}
fn inspect(&self) -> InspectionResult {
let lock_tracker = get_global_lock_tracker();
let task_manager = get_global_task_manager();
// Build wait graph
let wait_graph = build_wait_graph(lock_tracker, task_manager);
// Detect cycles (deadlock)
if let Some(cycle) = detect_cycle(&wait_graph) {
return InspectionResult::Critical(
format!("Deadlock detected: {:?}", cycle)
);
}
InspectionResult::Ok
}
fn interval_ms(&self) -> u64 {
2000
}
fn timeout_ms(&self) -> u64 {
5000
}
}

7.2 Performance Monitoring

/// Performance monitoring statistics
pub struct WatchdogStats {
/// NMI trigger count
pub nmi_count: AtomicU64,
/// Inspection task execution count
pub inspection_count: AtomicU64,
/// Detected anomaly count
pub anomaly_count: AtomicU64,
/// Average inspection time in microseconds
pub avg_inspection_time_us: AtomicU64,
}

8. Security Considerations
9. Debugging and Diagnostics

9.1 Snapshot Analysis Tools

/// Analyze snapshot to find possible deadlocks
pub fn analyze_snapshot_for_deadlock(snapshot: &SystemSnapshot) -> Option<Vec<u64>> {
// Build lock dependency graph
// Find tasks holding locks and tasks waiting for locks
// Detect circular dependencies
None // stub: detection logic to be implemented
}
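One plausible way to fill in `analyze_snapshot_for_deadlock`: treat each blocked task as an edge pointing at the holder of the lock it waits on, then follow chains looking for a cycle. The sketch below is std-only and uses a trimmed `TaskSnap` mirror of the snapshot's `TaskSnapshot` fields; it assumes single-holder locks (rwlocks with multiple readers would need multi-edges), so it is an illustration of the approach, not the final algorithm.

```rust
use std::collections::HashMap;

// Minimal mirror of the TaskSnapshot fields the analysis needs.
struct TaskSnap {
    task_id: u64,
    held_locks: Vec<usize>,
    waiting_lock: Option<usize>,
}

// Return one deadlock cycle (as task IDs) if the wait-for graph contains one.
fn find_deadlock(tasks: &[TaskSnap]) -> Option<Vec<u64>> {
    // lock address -> holder task (single-holder locks assumed for the sketch)
    let mut holder_of: HashMap<usize, u64> = HashMap::new();
    for t in tasks {
        for &l in &t.held_locks {
            holder_of.insert(l, t.task_id);
        }
    }
    // task -> task it transitively waits on (via the lock it is blocked on)
    let waits_on: HashMap<u64, u64> = tasks
        .iter()
        .filter_map(|t| {
            let lock = t.waiting_lock?;
            Some((t.task_id, *holder_of.get(&lock)?))
        })
        .collect();
    // Follow the chain from each task; returning to the start task is a cycle.
    for &start in waits_on.keys() {
        let mut path = vec![start];
        let mut cur = start;
        while let Some(&next) = waits_on.get(&cur) {
            if next == start {
                return Some(path);
            }
            if path.contains(&next) {
                break; // a cycle not through `start`; another start will report it
            }
            path.push(next);
            cur = next;
        }
    }
    None
}

fn main() {
    // Task 1 holds lock 0xa and waits on 0xb; task 2 holds 0xb and waits on 0xa.
    let tasks = vec![
        TaskSnap { task_id: 1, held_locks: vec![0xa], waiting_lock: Some(0xb) },
        TaskSnap { task_id: 2, held_locks: vec![0xb], waiting_lock: Some(0xa) },
    ];
    let cycle = find_deadlock(&tasks).expect("deadlock expected");
    println!("deadlock cycle: {:?}", cycle);
}
```

Because the snapshot is taken with all CPUs frozen, the graph is consistent: no task can change its held or waited-on locks while the analysis runs, which is what makes this offline detection sound.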
/// Analyze snapshot to find long-running tasks
pub fn analyze_snapshot_for_long_running(snapshot: &SystemSnapshot) -> Vec<TaskSnapshot> {
// Identify long-running tasks that may cause starvation
Vec::new() // stub: analysis logic to be implemented
}

10. Configuration Options

/// Watchdog configuration
pub struct WatchdogConfig {
/// Whether enabled
pub enabled: bool,
/// NMI check interval in milliseconds
pub check_interval_ms: u64,
/// CPU heartbeat timeout in milliseconds
pub heartbeat_timeout_ms: u64,
/// Maximum stack unwinding depth
pub max_stack_depth: usize,
/// Whether to print verbose logs
pub verbose: bool,
}
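The `heartbeat_timeout_ms` option above drives the CPU liveness check that the `last_heartbeat` fields feed. A minimal model of that check, with invented names (`stale_cpus`, the sample timestamps) and plain `u64`s standing in for the per-CPU atomics:

```rust
// Toy model of the heartbeat-timeout check: the master CPU compares each
// CPU's last heartbeat timestamp against the current time and reports the
// CPUs that have not responded within `heartbeat_timeout_ms`.
fn stale_cpus(last_heartbeats_ms: &[u64], now_ms: u64, heartbeat_timeout_ms: u64) -> Vec<usize> {
    last_heartbeats_ms
        .iter()
        .enumerate()
        .filter(|(_, &hb)| now_ms.saturating_sub(hb) > heartbeat_timeout_ms)
        .map(|(cpu_id, _)| cpu_id)
        .collect()
}

fn main() {
    // CPUs 0..3; CPU 2 last responded 12s ago, everyone else within ~2s.
    let heartbeats = [10_000, 10_500, 0, 10_900];
    let stale = stale_cpus(&heartbeats, 12_000, 5_000);
    assert_eq!(stale, vec![2]);
    println!("stale CPUs (candidates for stop machine): {:?}", stale);
}
```

`saturating_sub` guards against a heartbeat timestamp that is momentarily ahead of the sampled "now", which can happen when timestamps are taken on different CPUs.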
-
NMI Mechanism Implementation

Hardware implementation: the hardware supports NMI directly, with a dedicated NMI interrupt mask bit that disable_irqs cannot mask. This is only available from Armv8.8-A onward.

Software implementation (SDEI): an SDEI-based NMI uses an EL3-level interrupt and delegates handling downward; the lower-level interrupt mask bits cannot mask EL3 interrupts. Virtualized environments do not seem to support this well?

Priority-based approach: register the NMI as the highest-priority interrupt and rewrite enable_irqs/disable_irqs to use the GIC interrupt priority mask instead of the interrupt mask bit, so that only interrupts with a priority above the mask can fire.

Interrupt source question: the priority-based approach looks like the best fit, but how should the interrupt source be handled? openEuler uses the performance monitoring interrupt (PMI) to trigger it, and Starry does not seem to provide PMI support yet. Trigger via a software-generated interrupt (SGI)? Would the CNTV timer work?

Interrupt nesting question: when the CPU takes an interrupt, the implicit exception-entry behavior automatically sets the interrupt mask bit; would clearing the mask bit via enable_nmi inside the interrupt handler be feasible?
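The behavior the priority-based approach relies on can be modeled portably. This is not GIC code: in the real GIC a numerically lower priority value is a higher priority, and an interrupt is forwarded only while its priority value is below the current ICC_PMR_EL1 mask. The constants below are illustrative stand-ins for whatever priorities a rewritten enable_irqs/disable_irqs would program.

```rust
// Behavioral model of GIC priority masking (not real GIC register code):
// an interrupt is delivered only if its priority value is numerically LOWER
// than the current mask (lower value = higher priority in the GIC).
const NMI_PRIO: u8 = 0x20;         // illustrative highest-priority "NMI" source
const NORMAL_PRIO: u8 = 0xa0;      // illustrative ordinary device interrupt
const MASK_ALL_BUT_NMI: u8 = 0x80; // what the rewritten disable_irqs would program
const MASK_NONE: u8 = 0xff;        // what the rewritten enable_irqs would program

fn delivered(irq_prio: u8, pmr: u8) -> bool {
    irq_prio < pmr
}

fn main() {
    // With interrupts "enabled", both sources fire.
    assert!(delivered(NORMAL_PRIO, MASK_NONE));
    assert!(delivered(NMI_PRIO, MASK_NONE));
    // With the priority-mask form of disable_irqs, normal IRQs are blocked
    // but the NMI-priority interrupt still gets through.
    assert!(!delivered(NORMAL_PRIO, MASK_ALL_BUT_NMI));
    assert!(delivered(NMI_PRIO, MASK_ALL_BUT_NMI));
    println!("priority mask blocks normal IRQs but not the NMI-priority source");
}
```

This is the same idea as Linux's arm64 "pseudo-NMI": critical sections raise the priority mask rather than setting the architectural interrupt mask bit, so the one high-priority source keeps firing inside them.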
-
Today, once Starry enters a hung or deadlocked path, we have no good diagnostic means to help locate the fault; manual debugging is required.
By adding a watchdog mechanism to Starry, the watchdog can run inspection tasks and, when an anomaly occurs, print a system snapshot to help us locate the problem.
Functional requirements:
1. A basic watchdog mechanism, preferably implemented with an NMI interrupt (so that deadlocks in interrupt context can also be detected).
2. Different modules can register inspection tasks through the watchdog module. For example, the scheduler module registers an inspection task that checks whether any task in the ReadyQueue has gone unscheduled for a long time.
3. When an inspection detects an anomaly, it can go through an IPI stop machine, freeze all current CPUs, and print a system snapshot.
4. The snapshot should contain enough content to help developers locate the problem quickly: the task currently running on each CPU and its call stack; the state (lock, ready, etc.) and call stack of every task in the system; and the lock-holding situation of all tasks (a lock pointer address, so the lock address can be used to quickly find which tasks hold the same lock).
Deliverables:
1. A basic design document.
2. Source code, test code, and test verification.
3. Feature upkeep: the test framework is being built by colleagues in Tianjin; long-term regression test cases are needed.
4. Patent output; the company has a patent incentive program.