Keep profile and history data in shared memory #66

Open · wants to merge 6 commits into master · Changes from 1 commit
31 changes: 21 additions & 10 deletions collector.c
@@ -129,16 +129,27 @@ probe_waits(const bool write_history, const bool write_profile)
int32 wait_event_info = proc->wait_event_info,
Contributor

Should be uint32 as in PGPROC?

Collaborator Author

Yes, thanks. Will fix it.
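
A minimal sketch of the type fix being agreed here, assuming only the local declaration changes to match PGPROC (the actual follow-up commit may differ):

```c
/* match PGPROC's field types: uint32 wait_event_info, int pid */
uint32		wait_event_info = proc->wait_event_info;
int			pid = proc->pid;
```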

pid = proc->pid;

// FIXME: zero pid actually doesn't indicate that process slot is freed.
// After process termination this field becomes unchanged and thereby
// stores the pid of previous process. The possible indicator of process
// termination might be a condition `proc->procLatch->owner_pid == 0`.
// Abother option is to use the lists of freed PGPROCs from ProcGlocal:
// freeProcs, walsenderFreeProcs, bgworkerFreeProcs and autovacFreeProcs
// to define indexes of all freed slots in allProcs.
// The most appropriate solution here is to iterate over ProcArray items
// to explicitly access to the all live PGPROC entries. This will
// require to build iterator object over protected procArray.
/*
* FIXME: a zero pid doesn't actually indicate that the process slot is
* freed. After process termination this field remains unchanged and
* thereby stores the pid of the previous process. A possible indicator
* of process termination might be the condition
* `proc->procLatch->owner_pid == 0`, but even in this case ProcArrayLock
* doesn't protect the `owner_pid` field from concurrent modifications,
* which might cause race conditions.
*
* Another option is to use the lists of freed PGPROCs from ProcGlobal:
* freeProcs, walsenderFreeProcs, bgworkerFreeProcs and autovacFreeProcs
* to determine the indexes of all freed slots in allProcs. But this
* requires acquiring the ProcStructLock spinlock, which is impractical
* for iterating over such long lists.
*
* The most appropriate solution here is to iterate over ProcArray items
* under ProcArrayLock and over AuxiliaryProcs under the ProcStructLock
* spinlock (AuxiliaryProcs contains just NUM_AUXILIARY_PROCS=5 slots),
* or without any locks as it's done in the pg_stat_get_activity()
* function. These arrays are not accessible externally, so iterator
* objects would have to be added to the corresponding modules.
*/
if (pid == 0)
Contributor

I agree that we can follow pg_stat_get_activity and get away without any locks. The only risk is that we take a few incorrect samples and attribute wait_event_info to a wrong pid. That doesn't seem to be a problem. At least pg_stat_activity lives with it.

Concerning the pid == 0 check. I think we are safe, because the wait_event_info == 0 condition guards us from sampling PGPROC's of terminated backends. Although PGPROC.pid is not cleared upon termination, wait_event_info is (not in ProcKill, but at various other places, e.g. AbortTransaction calls pgstat_report_wait_end).

Should we decide to sample non-waiting states (wait_event_info == 0), we could iterate over BackendStatusArray to visit only live backends, like pg_stat_activity does via pgstat_read_current_status.

What do you think?
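
To make the lock-free approach concrete, here is a minimal sketch of a sampling pass over ProcGlobal->allProcs that relies on wait_event_info as the guard described above; it is illustrative only, not the exact loop in this PR:

```c
#include "postgres.h"
#include "storage/proc.h"

/*
 * Illustrative lock-free sampling pass.  wait_event_info == 0 filters out
 * both non-waiting backends and PGPROC slots left behind by terminated
 * processes (whose pid field is not cleared).  A stale read may still
 * attribute an occasional sample to the wrong pid, which pg_stat_activity
 * tolerates as well.
 */
static void
sample_waits_once(void)
{
	uint32		i;

	for (i = 0; i < ProcGlobal->allProcCount; i++)
	{
		PGPROC	   *proc = &ProcGlobal->allProcs[i];
		uint32		wait_event_info = proc->wait_event_info;
		int			pid = proc->pid;

		if (wait_event_info == 0 || pid == 0)
			continue;

		/* accumulate (pid, wait_event_info) into the profile/history */
	}
}
```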

Collaborator Author

> I agree that we can follow pg_stat_get_activity and get away without any locks.

In pg_stat_activity we iterate over the auxiliary PGPROC slots without any lock. This is practical because these slots are reserved for long-lived background processes such as the bgwriter, checkpointer, walwriter, archiver and wal_receiver. Changes in this specific range of slots occur very rarely and are most commonly initiated externally. That said, we might still use the ProcStructLock spinlock to protect access while gathering wait event info, since this array is small enough (it currently occupies 5 slots).
Iteration over the other slots, intended mostly for regular backends, is protected in pg_stat_activity by ProcArrayLock, and it has to be performed over ProcArray as an indirection level for reaching the live PGPROC slots.

> Concerning the pid == 0 check. I think we are safe, because the wait_event_info == 0 condition guards us from sampling PGPROC's of terminated backends. Although PGPROC.pid is not cleared upon termination, wait_event_info is (not in ProcKill, but at various other places, e.g. AbortTransaction calls pgstat_report_wait_end).

Yes, in the current case it works. But once we start adding the "no wait" state, as proposed in #10, we will have to settle on a working criterion for whether a PGPROC slot is freed.

> Should we decide to sample non-waiting states (wait_event_info == 0), we could iterate over BackendStatusArray to visit only live backends, like pg_stat_activity does via pgstat_read_current_status.

Looking up PGPROC slots by the pid values taken from BackendStatusArray items would make the iteration routine quadratic in time complexity. That cost is the main reason why sampling pg_stat_activity as frequently as the pg_wait_sampling extension does is not practical. IMO the most appropriate solution, which I propose in the code comment, is to organize (in the future) the iteration over ProcArray and AuxiliaryProcs.
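
A rough sketch of how that future iteration might look from the collector's side. Note that ProcArrayGetNumberOfProcs() and ProcArrayGetNthProc() are hypothetical accessors that do not exist today; as noted in the code comment, ProcArray is private to procarray.c, so such an iterator would first have to be added there:

```c
#include "postgres.h"
#include "storage/lwlock.h"
#include "storage/proc.h"

/*
 * Hypothetical future approach: walk only live backends via ProcArray
 * while holding ProcArrayLock in shared mode.  The two accessors used
 * below are placeholders for an iterator API that would have to be
 * added to procarray.c.
 */
static void
sample_waits_via_procarray(void)
{
	int			i;

	LWLockAcquire(ProcArrayLock, LW_SHARED);
	for (i = 0; i < ProcArrayGetNumberOfProcs(); i++)	/* hypothetical */
	{
		PGPROC	   *proc = ProcArrayGetNthProc(i);		/* hypothetical */

		/* proc->pid is valid here: ProcArray lists only live backends */
		/* record (proc->pid, proc->wait_event_info) into the profile */
	}
	LWLockRelease(ProcArrayLock);
}
```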

continue;
