+metrics #30 basic metrics "skeleton" #42

ktoso · 2019-08-26T05:57:47Z

Introduces a skeleton for using metrics in the actor system.

Resolves #30

Motivation:

A core feature of the actor system is to be introspectable by default.
Metrics must be offered at the core, and should include general things like actor counts (per group), message queue sizes, latencies, etc.

This is just a "skeleton"; we'll get more metrics in place as we go.

Modifications:

system gains .metrics, which holds all the well-typed metrics and the helpers for emitting them
settings for the metrics, though not much action there yet
just two PoC metrics for now, actor count and membership status:

# Metrics is the name of the actor system, so it'd be whatever users name their system/node with.

# TYPE Metrics.actors.lifecycle counter
Metrics.actors.lifecycle 0
Metrics.actors.lifecycle{root="/user", event="start"} 13
Metrics.actors.lifecycle{root="/user", event="stop"} 5
Metrics.actors.lifecycle{root="/system", event="start"} 6
# TYPE Metrics.cluster.members gauge
Metrics.cluster.members 1.0 # for simplicity of reporting == .up members
Metrics.cluster.members{status="joining"} 0.0
Metrics.cluster.members{status="up"} 1.0
Metrics.cluster.members{status="down"} 0.0
Metrics.cluster.members{status="leaving"} 0.0
Metrics.cluster.members{status="removed"} 0.0
Metrics.cluster.members{reachability="unreachable"} 0.0

Need to figure out whether or not we need to push the AddGauge upstream.

Result:

Basic actor system metrics.

ktoso · 2019-08-26T06:04:07Z

Sources/DistributedActors/Metrics/Metrics.swift

+
+    /// Timing how long it takes to converge (i.e. an update to reach all members)
+    // TODO: how to measure this without huge overhead, maybe opt in
+    // let crdt_convergence_time:


@yim-lee I wonder about pulling off that metric. This is not for now at all, but something to keep in mind for later on. I know for sure this would quite excite some people... :)

+1. Would be interesting if we can pull this off. We need to define what a unit of update is and when it starts (presumably that's whenever ActorOwned sends write command to local replicator?), and when the update has reached all members (after direct replication? gossip?). And since gossip is periodic, it might seem like convergence takes longer than it should, but maybe that's ok.

(presumably that's whenever ActorOwned sends write command to local replicator?)

yeah... I think we'll pull it off by attaching some metadata to the envelope when performing the write (that's the "baggage" I sometimes mention) and carrying it around during replication; once a replica notices that "all nodes have now seen this write", we emit a "replication complete: timestamp" event or something like that...

Definitely doable, just need to think it through when we get there :)
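To make the "baggage" idea above concrete, here is a minimal sketch of how a replica could time convergence. It is not code from this PR: `WriteBaggage`, `ConvergenceTracker` and `markSeen` are hypothetical names, and it assumes the set of members is known and fixed while a write is in flight.

```swift
import Foundation

/// Hypothetical metadata ("baggage") attached to a write when it is first issued.
struct WriteBaggage: Hashable {
    let writeID: UUID
    let startedAt: Date
}

/// Tracks which members have seen a write and reports the elapsed time
/// once every known member has seen it. Assumes a fixed member set.
struct ConvergenceTracker {
    let members: Set<String>
    private var seenBy: [WriteBaggage: Set<String>] = [:]

    init(members: Set<String>) {
        self.members = members
    }

    /// Records that `node` has seen the write. Returns the convergence time in
    /// seconds when the last missing member reports in, otherwise `nil`.
    mutating func markSeen(_ baggage: WriteBaggage, by node: String, at now: Date = Date()) -> TimeInterval? {
        guard members.contains(node) else { return nil }
        var seen = seenBy[baggage, default: []]
        seen.insert(node)
        if seen == members {
            seenBy[baggage] = nil // converged: stop tracking this write
            return now.timeIntervalSince(baggage.startedAt)
        }
        seenBy[baggage] = seen
        return nil
    }
}
```

The returned interval is what a timer metric would record. Because gossip is periodic, the measured value would include the gossip interval, as noted in the thread.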

ktoso · 2019-08-26T06:04:56Z

Sources/DistributedActors/Metrics/Metrics.swift

+    // let crdt_convergence_time:
+
+    // ==== ------------------------------------------------------------------------------------------------------------
+    // MARK: Actors Group-metrics (i.e. all actors of given "type" or "role")


In general, measuring every single actor by itself is "too much", which is why we have the metrics .group: we can group actors and offer "the actors in this group process on avg like that", etc.
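As a rough illustration of the grouping idea, the sketch below aggregates processing times per group name instead of per actor. `GroupTimings` and its methods are hypothetical and are not the API introduced in this PR.

```swift
/// Aggregates processing times per actor group rather than per actor.
/// Hypothetical type, for illustration only.
struct GroupTimings {
    private var totals: [String: (count: Int, totalNanos: UInt64)] = [:]

    init() {}

    /// Adds one observation to the named group.
    mutating func record(group: String, processingNanos: UInt64) {
        var entry = totals[group, default: (count: 0, totalNanos: 0)]
        entry.count += 1
        entry.totalNanos += processingNanos
        totals[group] = entry
    }

    /// Average processing time for the group, or `nil` if nothing was recorded.
    func averageNanos(group: String) -> Double? {
        guard let entry = totals[group], entry.count > 0 else { return nil }
        return Double(entry.totalNanos) / Double(entry.count)
    }
}
```

The number of time series then grows with the number of groups, not with the number of actors.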

Sources/DistributedActors/Metrics/CoreMetrics+AddGauge.swift

ktoso · 2019-08-26T06:28:56Z

Sources/DistributedActors/Metrics/Metrics.swift

+        } else {
+            self.actors_count.increment()
+        }
+    }


~~So this is a case that exemplifies why we need the "add" @tomerd~~

I'm still thinking of workarounds and/or how to change the metrics API... Gauge IS-A Recorder does not really fit, since we'd have to force RecorderHandler to accept add/sub, which seems wrong. Gauge is really its own thing if we decide to go this way. Or do we make AddGauge a completely new type that can be supported?

We figured it out, by expressing the lifecycle as independent events and their counters we get all we need and even more information 👍

Thanks @yim-lee for the idea!

By forcing you to read CRDT papers? Yup, that's me. 😃
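The resolution above (independent start/stop events, each with its own counter) follows the PN-counter shape: two counters that only grow, with the live value derived as their difference. A minimal sketch, using a stand-in `MonotonicCounter` rather than the swift-metrics `Counter`, and hypothetical type names:

```swift
/// A minimal stand-in for a monotonic counter (not the swift-metrics `Counter`).
final class MonotonicCounter {
    private(set) var value: Int = 0

    func increment(by amount: Int = 1) {
        precondition(amount >= 0, "a monotonic counter can only grow")
        value += amount
    }
}

/// Pairs a "positive" and a "negative" counter; the current value is derived.
struct LifecyclePNCounter {
    let positive = MonotonicCounter() // e.g. event="start"
    let negative = MonotonicCounter() // e.g. event="stop"

    func increment() { positive.increment() }
    func decrement() { negative.increment() }

    /// Number of currently alive actors: starts minus stops.
    var current: Int { return positive.value - negative.value }
}
```

With the numbers from the sample output in the description (13 starts and 5 stops under /user), `current` would be 8, and both totals remain available as separate series.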


ktoso · 2019-08-26T08:05:25Z

Sources/DistributedActors/ActorShell.swift

@@ -164,6 +167,7 @@ internal final class ActorShell<Message>: ActorContext<Message>, AbstractActor {
            _ = system.userCellInitCounter.sub(1)
        }
        #endif
+        system.metrics.recordActorStop(self)


ℹ️ note: I'd want to have semantically meaningful "do a thing" methods for all metrics. We should not have to wiggle around with +1 or work out "which counter exactly" here; all this should be encapsulated in the Metrics files, so we can quickly skim there which counter is updated when.
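A small sketch of that encapsulation, assuming a hypothetical `LifecycleMetrics` type keyed by actor root. The real `recordActorStop` in this PR takes the actor shell itself; the signatures below are simplified for illustration.

```swift
/// Hypothetical roots, mirroring the "/user" and "/system" dimensions.
enum ActorRoot: String {
    case user = "/user"
    case system = "/system"
}

/// Call sites state what happened; this type decides which counters change.
final class LifecycleMetrics {
    private(set) var started: [ActorRoot: Int] = [:]
    private(set) var stopped: [ActorRoot: Int] = [:]

    func recordActorStart(root: ActorRoot) { started[root, default: 0] += 1 }
    func recordActorStop(root: ActorRoot) { stopped[root, default: 0] += 1 }

    /// Actors currently alive under the given root.
    func alive(root: ActorRoot) -> Int {
        return (started[root] ?? 0) - (stopped[root] ?? 0)
    }
}
```

The call site stays a single line such as `metrics.recordActorStop(root: .user)`, and the mapping from event to counter lives in one file.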

ktoso · 2019-08-26T08:05:53Z

Sources/DistributedActors/ActorShell.swift

+        // let path = self._myCell.address.description
+        // return "\(type(of: self))(\(path))"
+        let prettyTypeName = String(reflecting: Message.self).split(separator: ".").dropFirst().joined(separator: ".")
+        return "ActorShell<\(prettyTypeName)>(\(self.path))"


sidenote: without this the type was quite useless -- everything was <Message> 😉
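For reference, the name-trimming expression from the diff can be pulled into a standalone helper. The function name `prettyTypeName` is only for this sketch; the body is the same chain of calls as in the change.

```swift
/// Drops the leading module segment from a fully qualified type name,
/// e.g. "Swift.Int" becomes "Int".
func prettyTypeName<T>(_ type: T.Type) -> String {
    return String(reflecting: type)
        .split(separator: ".")
        .dropFirst()
        .joined(separator: ".")
}
```

Only the first segment is dropped, so module prefixes that appear inside generic arguments are kept as they are.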

ktoso · 2019-08-26T08:14:02Z

Sources/DistributedActors/Metrics/ActorSystemMetrics.swift

+            self.cluster_members_leaving.record(leaving)
+            self.cluster_members_removed.record(removed)
+            self.cluster_unreachable_members.record(unreachable)
+        }


At least it is O(n) over the members and not worse... we could make the membership keep count as well, though realistically this is fine.
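The single pass over the membership could look roughly like the sketch below. `Member`, `MemberStatus` and `tally` are simplified stand-ins, not the cluster types from the codebase.

```swift
/// Simplified stand-ins for the cluster membership types.
enum MemberStatus: Hashable {
    case joining, up, down, leaving, removed
}

struct Member {
    let status: MemberStatus
    let isReachable: Bool
}

/// Tallies members per status and counts unreachable members in one O(n) pass.
func tally(_ members: [Member]) -> (byStatus: [MemberStatus: Int], unreachable: Int) {
    var byStatus: [MemberStatus: Int] = [:]
    var unreachable = 0
    for member in members {
        byStatus[member.status, default: 0] += 1
        if !member.isReachable { unreachable += 1 }
    }
    return (byStatus, unreachable)
}
```

Each resulting count would then be recorded into the matching gauge. Keeping running counts inside the membership itself would avoid the pass, at the cost of more bookkeeping.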

yim-lee · 2019-08-26T16:49:49Z

Package.swift

@@ -160,17 +177,21 @@ let package = Package(
        /* ---  samples --- */

        .executable(
-            name: "DistributedActorsSampleDiningPhilosophers",
+            name: "SampleDiningPhilosophers",


👍 like the shorter names


yim-lee · 2019-08-26T18:18:21Z

Sources/DistributedActors/Props.swift

    }

    public init() {
-        self.init(mailbox: .default(), dispatcher: .default, supervision: .init())
+        self.init(mailbox: .default(), dispatcher: .default, supervision: .init(), metrics: .default)


Observation: we are not consistent in how the default instance is created (.default() vs. .default vs. .init())

Thanks, going to stick to default 👍

yim-lee · 2019-08-26T18:20:14Z

Sources/DistributedActors/Props.swift

-    public init(mailbox: MailboxProps, dispatcher: DispatcherProps, supervision: SupervisionProps) {
+    public var metrics: MetricsProps
+
+    public init(mailbox: MailboxProps, dispatcher: DispatcherProps, supervision: SupervisionProps, metrics: MetricsProps) {


Should we combine this initializer with init() and have a single init where all parameters are optional instead?
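One way to read this suggestion is a single initializer with default arguments, which also settles on one spelling for defaults. The sketch below uses empty stand-in types, and the `enabled` flag on `MetricsProps` is invented for the example; it is not a proposal for the final shape of `Props`.

```swift
// Stand-in types; the real ones carry actual configuration.
struct MailboxProps { static let `default` = MailboxProps() }
struct DispatcherProps { static let `default` = DispatcherProps() }
struct SupervisionProps { static let `default` = SupervisionProps() }
struct MetricsProps {
    var enabled: Bool = true // hypothetical field, only for this example
    static let `default` = MetricsProps()
}

struct Props {
    var mailbox: MailboxProps
    var dispatcher: DispatcherProps
    var supervision: SupervisionProps
    var metrics: MetricsProps

    /// A single initializer: every parameter falls back to `.default`.
    init(mailbox: MailboxProps = .default,
         dispatcher: DispatcherProps = .default,
         supervision: SupervisionProps = .default,
         metrics: MetricsProps = .default) {
        self.mailbox = mailbox
        self.dispatcher = dispatcher
        self.supervision = supervision
        self.metrics = metrics
    }
}
```

Callers can then write `Props()` or override just one part, for example `Props(metrics: MetricsProps(enabled: false))`, without a separate no-argument initializer.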

Sources/DistributedActors/Props.swift

yim-lee · 2019-08-26T18:55:20Z

Sources/DistributedActors/Metrics/ActorSystemMetrics.swift

+
+        let actorsLifecycle = settings.makeLabel("actors", "lifecycle")
+        self.actors_count_user = .init(label: actorsLifecycle, positive: [rootUser, dimStart], negative: [rootUser, dimStop])
+        self.actors_count_system = .init(label: actorsLifecycle, positive: [rootSystem, dimStart], negative: [rootSystem, dimStop])


thx for your help here :-)

yim-lee · 2019-08-26T18:56:09Z

Sources/DistributedActors/Metrics/ActorSystemMetrics.swift

+    /// Timing how long it takes to converge (i.e. an update to reach all members)
+    // TODO: how to measure this without huge overhead, maybe opt in
+    // let crdt_convergence_time:
+


This'll be very exciting; I know for sure some people internally will be very excited if we have CRDTs with convergence metrics <3


yim-lee · 2019-08-26T18:59:47Z

Sources/DistributedActors/Metrics/ActorSystemMetrics.swift

+    // ==== ------------------------------------------------------------------------------------------------------------
+    // MARK: Cluster Metrics
+
+    let cluster_members: Gauge


Any specific reason these variable names use underscore instead of camel case?

I kind of wanted them to be visually same as the names they end up being in the metrics:

/// actors.count { root=/system, event=stop }
let actors_count ...
/// cluster.members ...
let cluster_members ...

etc

If too annoying we can move to camel, WDYT 🤔

I see. Just curious. 👍

Maybe document that in the code so we can be consistent with the pattern?

Will do, thanks for noticing it's not fully documented :)

Sources/DistributedActors/Cluster/ClusterShell.swift

ktoso · 2019-08-27T03:28:05Z

scripts/find_test_failures.sh

@@ -21,6 +21,7 @@ declare -r logs=$1
 failures_count=0
 failures=()

+IFS=$'\n'


(was broken without this; it makes \n the separator used by the for loop)

ktoso · 2019-08-27T03:29:24Z

Addressed feedback in a18fe28

ktoso · 2019-08-27T03:44:54Z

Failure was the known #17, which we should look at; likely flaky tests

ktoso · 2019-08-27T03:45:04Z

@swift-server-bot test this please

yim-lee

Just a couple of minor things 👍

yim-lee · 2019-08-27T05:29:11Z

Sources/DistributedActors/Supervision.swift

@@ -18,6 +18,8 @@ public struct SupervisionProps {
    // on purpose stored as list, to keep order in which the supervisors are added as we "scan" from first to last when we handle
    internal var supervisionMappings: [ErrorTypeBoundSupervisionStrategy]

+    public static let `default`: SupervisionProps = .init()


yim-lee · 2019-08-27T05:44:16Z

Sources/DistributedActors/Props.swift

    }

    public init() {
-        self.init(mailbox: .default(), dispatcher: .default, supervision: .init())
+        self.init(mailbox: .default(), dispatcher: .default, supervision: .default, metrics: .default)


yim-lee · 2019-08-27T05:45:08Z

Sources/DistributedActors/Props.swift

        self.mailbox = mailbox
        self.dispatcher = dispatcher
        self.supervision = supervision
+        self.metrics = metrics
    }

    public init() {


I guess technically we don't need this?

ah true, removing

yim-lee · 2019-08-27T05:49:28Z

Sources/DistributedActors/Metrics/CoreMetrics+MetricsPNCounter.swift

+    let negative: Counter
+
+    init(label: String, positive positiveDimensions: [(String, String)] = [], negative negativeDimensions: [(String, String)] = []) {
+        assert(positiveDimensions.map { "\($0)\($1)" }.joined() != negativeDimensions.map { "\($0)\($1)" }.joined(),


👍 good assertion

yim-lee · 2019-08-27T05:50:01Z

Sources/DistributedActors/Metrics/CoreMetrics+MetricsPNCounter.swift

+
+    init(label: String, positive positiveDimensions: [(String, String)] = [], negative negativeDimensions: [(String, String)] = []) {
+        assert(positiveDimensions.map { "\($0)\($1)" }.joined() != negativeDimensions.map { "\($0)\($1)" }.joined(),
+               "Dimensions for PNCounter pair [\(label)] MUST NOT be equal.")


PNCounter intended? Not MetricsPNCounter?

yim-lee · 2019-08-27T05:52:56Z

Sources/DistributedActors/Metrics/ActorSystemSettings+Metrics.swift

+    public var systemName: String?
+
+    /// Segment prefixed before all metrics exported automatically by the actor system.
+    public var systemMetricsPrefix: String? = "sact"


yim-lee · 2019-08-27T05:53:05Z

Sources/DistributedActors/Metrics/ActorSystemSettings+Metrics.swift

+
+    /// Prefix all metrics with this segment.
+    ///
+    /// Defaults to the actor systems' name.


systems' => system's?
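To show how these two settings could feed into label construction, here is a sketch of a `makeLabel` helper that joins the non-nil prefix segments with the metric's own segments. The type name, the segment ordering and the "." separator are assumptions made for this example; the actual `makeLabel` in the PR may differ.

```swift
/// Hypothetical settings type; only the two properties come from the diff.
struct MetricsSettingsSketch {
    var systemName: String? = nil
    var systemMetricsPrefix: String? = "sact"

    /// Joins the optional prefix segments and the given segments with ".",
    /// skipping any that are nil. The ordering is an assumption.
    func makeLabel(_ segments: String...) -> String {
        let prefix = [systemName, systemMetricsPrefix].compactMap { $0 }
        return (prefix + segments).joined(separator: ".")
    }
}
```

With `systemName` set to "Metrics" and no extra prefix, `makeLabel("actors", "lifecycle")` yields "Metrics.actors.lifecycle", which matches the shape of the sample output in the description.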

ktoso · 2019-08-27T06:06:22Z

Whoops, thanks! Will follow up on those comments 👍

* +metrics #30 basic metrics "skeleton"
* +metrics #30 implement actor lifecycle as start/stop pn-counter metric
* Addressed review

+metrics #30 basic metrics "skeleton"

e1079a7


ktoso requested review from tomerd, drexin and yim-lee August 26, 2019 08:16


+metrics #30 implement actor lifecycle as start/stop pn-counter metric

69fbdc8


Addressed review

a18fe28

ktoso merged commit c7ebbb6 into apple:master Aug 27, 2019

ktoso deleted the wip-metrics branch August 27, 2019 05:34


ktoso mentioned this pull request Aug 27, 2019

Small cleanup wrt. metrics PR #44

Merged

ktoso added a commit that referenced this pull request Aug 31, 2019

+metrics #30 basic metrics "skeleton" (#42)

e9032d7
