-
Notifications
You must be signed in to change notification settings - Fork 62
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
+metrics #30 basic metrics "skeleton" #42
Conversation
|
||
/// Timing how long it takes to converge (i.e. an update to reach all members) | ||
// TODO: how to measure this without huge overhead, maybe opt in | ||
// let crdt_convergence_time: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yim-lee I wonder about pulling off that metrics, this is not for now at all, but something to keep in mind for later on. I know for sure this would quite excite some people... :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1. would be interesting if we can pull this off. need to define what a unit of update is and when it starts (presumably that's whenever ActorOwned
sends write
command to local replicator?), when the updates reaches all members (after direct replication? gossip?). and since gossip is periodic, it might seem like convergence takes longer than it should but maybe that's ok.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(presumably that's whenever ActorOwned sends write command to local replicator?)
yeah... I think we'll pull it off by "attach some metadata to envelope when pereforming the write (that's "baggage" I sometimes mention), and we carry it around during replication; once a replica notices that "all nodes have now seen this write" we emit a "replication complete: timestamp" event or something like that...
Definitely doable, just need to think it through when we get there :)
// let crdt_convergence_time: | ||
|
||
// ==== ------------------------------------------------------------------------------------------------------------ | ||
// MARK: Actors Group-metrics (i.e. all actors of given "type" or "role") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So in general measuring every single actor by itself is "too much", so that's why we have the metrics .group
so we can group them and offer "those actors in this group process on avg like that" etc.
} else { | ||
self.actors_count.increment() | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this is a case that examplifies why we need the "add" @tomerd
I'm still thinking of workarounds and/or how to change the metrics api... Gauge IS-A Recorder does not really fit.., since we'd have to force RecorderHandler to accept add/sub which seems wrong. Gauge is its own thing really if we decide to go this way. OR we make AddGauge a completely new type that can be supported?
We figured it out, by expressing the lifecycle as independent events and their counters we get all we need and even more information 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @yim-lee for the idea!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By forcing you to read CRDT papers? Yup, that's me. 😃
} else { | ||
self.actors_count.increment() | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @yim-lee for the idea!
@@ -164,6 +167,7 @@ internal final class ActorShell<Message>: ActorContext<Message>, AbstractActor { | |||
_ = system.userCellInitCounter.sub(1) | |||
} | |||
#endif | |||
system.metrics.recordActorStop(self) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ℹ️ note: I'd want to have semantically meaningful "do a thing" methods for all metrics, we should not have to wiggle around with +1 or other "which counter exactly" here; all this should be encapsulated in the Metrics files so we can quickly skim it there what counter is updated when.
// let path = self._myCell.address.description | ||
// return "\(type(of: self))(\(path))" | ||
let prettyTypeName = String(reflecting: Message.self).split(separator: ".").dropFirst().joined(separator: ".") | ||
return "ActorShell<\(prettyTypeName)>(\(self.path))" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sidenote: without this the type was quite useless -- everything was <Message>
😉
self.cluster_members_leaving.record(leaving) | ||
self.cluster_members_removed.record(removed) | ||
self.cluster_unreachable_members.record(unreachable) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At least it is O(n) over the members and not worse... we could make the membership keep count as well, though realistically this is fine.
@@ -160,17 +177,21 @@ let package = Package( | |||
/* --- samples --- */ | |||
|
|||
.executable( | |||
name: "DistributedActorsSampleDiningPhilosophers", | |||
name: "SampleDiningPhilosophers", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 like the shorter names
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
:)
} | ||
|
||
public init() { | ||
self.init(mailbox: .default(), dispatcher: .default, supervision: .init()) | ||
self.init(mailbox: .default(), dispatcher: .default, supervision: .init(), metrics: .default) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Observation: we are not being consistent with how default instance is created (.default()
vs. .default
vs. .init()
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, going to stick to default 👍
public init(mailbox: MailboxProps, dispatcher: DispatcherProps, supervision: SupervisionProps) { | ||
public var metrics: MetricsProps | ||
|
||
public init(mailbox: MailboxProps, dispatcher: DispatcherProps, supervision: SupervisionProps, metrics: MetricsProps) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we combine this initializer with init()
and have a single init
where all parameters as optionals instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okey :)
|
||
let actorsLifecycle = settings.makeLabel("actors", "lifecycle") | ||
self.actors_count_user = .init(label: actorsLifecycle, positive: [rootUser, dimStart], negative: [rootUser, dimStop]) | ||
self.actors_count_system = .init(label: actorsLifecycle, positive: [rootSystem, dimStart], negative: [rootSystem, dimStop]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cool~
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thx for your help here :-)
/// Timing how long it takes to converge (i.e. an update to reach all members) | ||
// TODO: how to measure this without huge overhead, maybe opt in | ||
// let crdt_convergence_time: | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This'll be very exciting; I know for sure some people internally will be very excited if we have CRDTs with convergence metrics <3
// ==== ------------------------------------------------------------------------------------------------------------ | ||
// MARK: Cluster Metrics | ||
|
||
let cluster_members: Gauge |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any specific reason these variable names use underscore instead of camel case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I kind of wanted them to be visually same as the names they end up being in the metrics:
/// actors.count { root=/system, event=stop }
let actors_count ...
/// cluster.members ...
let cluster_members ...
etc
If too annoying we can move to camel, WDYT 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. Just curious. 👍
Maybe document that in the code so we can be consistent with the pattern?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do, thanks for noticing it's not documented wholely :)
@@ -21,6 +21,7 @@ declare -r logs=$1 | |||
failures_count=0 | |||
failures=() | |||
|
|||
IFS=$'\n' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(was broken without this, makes \n the separator for the for
)
Addressed feedback in a18fe28 |
Failure was the known #17 which we should look at; likely fleaky tests |
@swift-server-bot test this please |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a couple of minor things 👍
@@ -18,6 +18,8 @@ public struct SupervisionProps { | |||
// on purpose stored as list, to keep order in which the supervisors are added as we "scan" from first to last when we handle | |||
internal var supervisionMappings: [ErrorTypeBoundSupervisionStrategy] | |||
|
|||
public static let `default`: SupervisionProps = .init() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
} | ||
|
||
public init() { | ||
self.init(mailbox: .default(), dispatcher: .default, supervision: .init()) | ||
self.init(mailbox: .default(), dispatcher: .default, supervision: .default, metrics: .default) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
self.mailbox = mailbox | ||
self.dispatcher = dispatcher | ||
self.supervision = supervision | ||
self.metrics = metrics | ||
} | ||
|
||
public init() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess technically we don't need this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah true, removing
let negative: Counter | ||
|
||
init(label: String, positive positiveDimensions: [(String, String)] = [], negative negativeDimensions: [(String, String)] = []) { | ||
assert(positiveDimensions.map { "\($0)\($1)" }.joined() != negativeDimensions.map { "\($0)\($1)" }.joined(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 good assertion
|
||
init(label: String, positive positiveDimensions: [(String, String)] = [], negative negativeDimensions: [(String, String)] = []) { | ||
assert(positiveDimensions.map { "\($0)\($1)" }.joined() != negativeDimensions.map { "\($0)\($1)" }.joined(), | ||
"Dimensions for PNCounter pair [\(label)] MUST NOT be equal.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PNCounter
intended? Not MetricsPNCounter
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks!
public var systemName: String? | ||
|
||
/// Segment prefixed before all metrics exported automatically by the actor system. | ||
public var systemMetricsPrefix: String? = "sact" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
|
||
/// Prefix all metrics with this segment. | ||
/// | ||
/// Defaults to the actor systems' name. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
systems' => system's?
Whoops thanks will follow up on those comments 👍 |
Introduces skeleton for using metrics in the actor system
Resolves #30
Motivation:
A core feature of the actor system is to be introspectible by default.
Metrics must be offered at the core, and should include general things like actor counts (per groups), message queues and latencies etc.
This is just a "skeleton", we'll get more metrics in place as we go.
Modifications:
system gains
.metrics
which has all the well typed and helpers to emitting metricssettings for the metrics, though not much action there yet
just two PoC metrics for now: actor count and membership status:
Need to figure out if we need to push upstream the AddGauge or not.
Result:
Basic actor system metrics;