This repository was archived by the owner on Mar 5, 2024. It is now read-only.

Commit a069095

V3 release (#195)

v3 release:

* added changelog
* start work on upgrade doc to cover v2 -> v3
* rename histogram metrics to clarify seconds
* update grafana dashboard to fix plotting of histogram data
* add quay badge to readme

1 parent 4bee061

File tree

7 files changed (+79, −20 lines)


CHANGELOG.md

Lines changed: 35 additions & 0 deletions
@@ -1,5 +1,40 @@
 # Changelog
 
+## v3.0
+6 December 2018
+
+v3 introduces a change to the gRPC API. Servers are compatible with v2.x Agents although **v3 Agents require v3 Servers**. Other breaking changes have been made so it's worth reading through [docs/UPGRADING.md](docs/UPGRADING.md) for more detail on moving from v2 to v3.
+
+Notable changes:
+
+* [#109](https://github.com/uswitch/kiam/pull/109) v3 API
+* [#110](https://github.com/uswitch/kiam/pull/110) Restrict metadata routes. Everything other than credentials **will be blocked by default**
+* [#122](https://github.com/uswitch/kiam/pull/122) Record Server error messages as Events on Pod
+* [#131](https://github.com/uswitch/kiam/pull/131) Replace go-metrics with native Prometheus metrics client
+* [#140](https://github.com/uswitch/kiam/pull/140) Example Grafana dashboard for Prometheus metrics
+* [#163](https://github.com/uswitch/kiam/pull/163) Server manifests use 127.0.0.1 rather than localhost to avoid DNS
+* [#173](https://github.com/uswitch/kiam/pull/173) Metadata Agent uses 301 rather than 308 redirects
+* [#180](https://github.com/uswitch/kiam/pull/180) Fix race condition with xtables.lock
+* [#193](https://github.com/uswitch/kiam/pull/193) Add optional pprof http handler to add monitoring in live clusters
+
+A huge thanks to the following contributors for this release:
+
+* [@Joseph-Irving](https://github.com/Joseph-Irving)
+* [@max-lobur](https://github.com/max-lobur)
+* [@fernandocarletti](https://github.com/fernandocarletti)
+* [@integrii](https://github.com/integrii)
+* [@duncward](https://github.com/duncward)
+* [@stevenjm](https://github.com/stevenjm)
+* [@tasdikrahman](https://github.com/tasdikrahman)
+* [@word](https://github.com/word)
+* [@DewaldV](https://github.com/DewaldV)
+* [@roffe](https://github.com/roffe)
+* [@sambooo](https://github.com/sambooo)
+* [@idiamond-stripe](https://github.com/idiamond-stripe)
+* [@ash2k](https://github.com/ash2k)
+* [@moofish32](https://github.com/moofish32)
+* [@sp-joseluis-ledesma](https://github.com/sp-joseluis-ledesma)
+
 ## v2.8
 1st June 2018
 

README.md

Lines changed: 3 additions & 0 deletions
@@ -1,4 +1,7 @@
 # kiam
+
+[![Docker Repository on Quay](https://quay.io/repository/uswitch/kiam/status "Docker Repository on Quay")](https://quay.io/repository/uswitch/kiam)
+
 kiam runs as an agent on each node in your Kubernetes cluster and allows cluster users to associate IAM roles to Pods.
 
 Docker images are available at [https://quay.io/repository/uswitch/kiam](https://quay.io/repository/uswitch/kiam).

docs/METRICS.md

Lines changed: 2 additions & 2 deletions
@@ -37,7 +37,7 @@ daemonset status from kube-state-metrics & container metrics from cAdvisor if av
 
 #### Metadata Subsystem
 
-- `kiam_metadata_handler_latency_milliseconds` - Bucketed histogram of handler timings. Tagged by handler
+- `kiam_metadata_handler_latency_seconds` - Bucketed histogram of handler timings. Tagged by handler
 - `kiam_metadata_credential_fetch_errors_total` - Number of errors fetching the credentials for a pod
 - `kiam_metadata_credential_encode_errors_total` - Number of errors encoding credentials for a pod
 - `kiam_metadata_find_role_errors_total` - Number of errors finding the role for a pod
@@ -51,7 +51,7 @@ daemonset status from kube-state-metrics & container metrics from cAdvisor if av
 - `kiam_sts_cache_hit_total` - Number of cache hits to the metadata cache
 - `kiam_sts_cache_miss_total` - Number of cache misses to the metadata cache
 - `kiam_sts_issuing_errors_total` - Number of errors issuing credentials
-- `kiam_sts_assumerole_timing_milliseconds` - Bucketed histogram of assumeRole timings
+- `kiam_sts_assumerole_timing_seconds` - Bucketed histogram of assumeRole timings
 - `kiam_sts_assumerole_current` - Number of assume role calls currently executing
 
 #### K8s Subsystem
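Because the histograms are now denominated in seconds, any dashboards or alerts built on the old `*_milliseconds` series need their expressions updated as well. As an illustrative sketch (the 5m window and 0.99 quantile are arbitrary choices, not values from this commit), a latency quantile over the renamed metric could look like:

```promql
histogram_quantile(0.99,
  sum(rate(kiam_metadata_handler_latency_seconds_bucket{handler="credentials"}[5m])) by (le))
```

The result is already in seconds, so no unit conversion is needed on the panel axis.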

docs/UPGRADING.md

Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
+# Upgrading
+
+## v2 to v3
+
+Kiam changed significantly between v2.x and v3.0. The breaking changes are:
+
+* The gRPC API was changed. v3 Agent processes can only connect and communicate with v3 Server processes.
+* The Agent metadata proxy HTTP server now blocks access to any path other than those used for obtaining credentials.
+* The Server's handling of TLS has changed to remove the port from the Host. This requires certificates to name `kiam-server` rather than `kiam-server:443`, for example, so any previously issued certificates will likely need re-issuing.
+* The separate agent, server and health commands have been merged into a single `kiam` binary. This means that when upgrading the image reference, the command and arguments used will also need to change.
+* The Server now reports events to Pods, requiring additional RBAC privileges for the service account.
+
+We would suggest upgrading in the following way:
+
+1. Generate new TLS assets. You can use [docs/TLS.md](docs/TLS.md) to create new certificates, or use something like [cert-manager](https://github.com/jetstack/cert-manager) or [Vault](https://vaultproject.io). Given the TLS changes, make sure that your server certificate supports the names:
+   * `kiam-server`
+   * `kiam-server:443`
+   * `127.0.0.1`
+2. Create a new DaemonSet to deploy the v3 Server processes, using the new TLS assets created above. This ensures the new server processes run alongside the old servers. Once the v3 servers are running and passing their health checks you can proceed. **Please note that RBAC policy changes are required for the Server** and are documented in [deploy/server-rbac.yaml](deploy/server-rbac.yaml).
+3. Update the Agent DaemonSet to use the v3 image. Because the command has changed, be careful when making this change: the existing configuration will not work with v3. One option is to ensure your DaemonSet uses an `OnDelete` [update strategy](https://kubernetes.io/docs/tasks/manage-daemon/update-daemon-set/#daemonset-update-strategy): you can deploy new nodes running new agents connecting to new servers while leaving existing nodes as-is.
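Before rolling out new TLS assets, it can be worth checking that a certificate actually carries the three names listed above. The sketch below is illustrative only: it builds a throwaway self-signed certificate in memory with those names and verifies each one via Go's standard `crypto/x509` package; in practice you would parse your real PEM-encoded server certificate instead of calling the hypothetical `newSelfSignedCert` helper.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"net"
	"time"
)

// newSelfSignedCert builds a throwaway certificate carrying the names the
// v3 Server expects. Replace this with parsing your real certificate.
func newSelfSignedCert() *x509.Certificate {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		panic(err)
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(1),
		Subject:      pkix.Name{CommonName: "kiam-server"},
		NotBefore:    time.Now(),
		NotAfter:     time.Now().Add(24 * time.Hour),
		// The names required by the v3 upgrade notes.
		DNSNames:    []string{"kiam-server", "kiam-server:443"},
		IPAddresses: []net.IP{net.ParseIP("127.0.0.1")},
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		panic(err)
	}
	cert, err := x509.ParseCertificate(der)
	if err != nil {
		panic(err)
	}
	return cert
}

func main() {
	cert := newSelfSignedCert()
	// Verify each name the v3 Server and Agents will present or dial.
	for _, name := range []string{"kiam-server", "kiam-server:443", "127.0.0.1"} {
		if err := cert.VerifyHostname(name); err != nil {
			fmt.Printf("missing %q: %v\n", name, err)
		} else {
			fmt.Printf("ok: %s\n", name)
		}
	}
}
```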

docs/dashboard-prom.json

Lines changed: 17 additions & 16 deletions
Original file line numberDiff line numberDiff line change
@@ -972,14 +972,14 @@
972972
"min": null,
973973
"mode": "spectrum"
974974
},
975-
"dataFormat": "timeseries",
975+
"dataFormat": "tsbuckets",
976976
"datasource": "$datasource",
977977
"description": "Bucketed histogram of handler timings. Tagged by handler",
978978
"gridPos": {
979979
"h": 5,
980980
"w": 12,
981981
"x": 0,
982-
"y": 13
982+
"y": 24
983983
},
984984
"heatmap": {},
985985
"highlightCards": true,
@@ -990,8 +990,9 @@
990990
"links": [],
991991
"targets": [
992992
{
993-
"expr": "sum(increase(kiam_metadata_handler_latency_milliseconds_bucket{handler=\"credentials\"}[$interval])) by (le)",
994-
"format": "time_series",
993+
"expr": "sum(rate(kiam_metadata_handler_latency_seconds_bucket{handler=\"credentials\"}[$interval])) by (le)",
994+
"format": "heatmap",
995+
"interval": "",
995996
"intervalFactor": 2,
996997
"legendFormat": "{{le}}",
997998
"refId": "A",
@@ -1012,14 +1013,14 @@
10121013
"xBucketSize": null,
10131014
"yAxis": {
10141015
"decimals": null,
1015-
"format": "ms",
1016+
"format": "s",
10161017
"logBase": 1,
10171018
"max": null,
10181019
"min": null,
10191020
"show": true,
10201021
"splitFactor": null
10211022
},
1022-
"yBucketBound": "auto",
1023+
"yBucketBound": "upper",
10231024
"yBucketNumber": null,
10241025
"yBucketSize": null
10251026
},
@@ -1037,14 +1038,14 @@
10371038
"min": null,
10381039
"mode": "spectrum"
10391040
},
1040-
"dataFormat": "timeseries",
1041+
"dataFormat": "tsbuckets",
10411042
"datasource": "$datasource",
10421043
"description": "Bucketed histogram of handler timings. Tagged by handler",
10431044
"gridPos": {
10441045
"h": 5,
10451046
"w": 12,
10461047
"x": 12,
1047-
"y": 13
1048+
"y": 24
10481049
},
10491050
"heatmap": {},
10501051
"highlightCards": true,
@@ -1055,8 +1056,8 @@
10551056
"links": [],
10561057
"targets": [
10571058
{
1058-
"expr": "sum(increase(kiam_metadata_handler_latency_milliseconds_bucket{handler=\"roleName\"}[$interval])) by (le)",
1059-
"format": "time_series",
1059+
"expr": "sum(rate(kiam_metadata_handler_latency_seconds_bucket{handler=\"roleName\"}[$interval])) by (le)",
1060+
"format": "heatmap",
10601061
"interval": "",
10611062
"intervalFactor": 2,
10621063
"legendFormat": "{{le}}",
@@ -1084,7 +1085,7 @@
10841085
"show": true,
10851086
"splitFactor": null
10861087
},
1087-
"yBucketBound": "auto",
1088+
"yBucketBound": "upper",
10881089
"yBucketNumber": null,
10891090
"yBucketSize": null
10901091
},
@@ -1102,14 +1103,14 @@
11021103
"min": null,
11031104
"mode": "spectrum"
11041105
},
1105-
"dataFormat": "timeseries",
1106+
"dataFormat": "tsbuckets",
11061107
"datasource": "$datasource",
11071108
"description": "Bucketed histogram of assumeRole timings",
11081109
"gridPos": {
11091110
"h": 6,
11101111
"w": 24,
11111112
"x": 0,
1112-
"y": 18
1113+
"y": 29
11131114
},
11141115
"heatmap": {},
11151116
"highlightCards": true,
@@ -1120,8 +1121,8 @@
11201121
"links": [],
11211122
"targets": [
11221123
{
1123-
"expr": "sum(increase(kiam_sts_assumerole_timing_milliseconds_bucket[$interval])) by (le)",
1124-
"format": "time_series",
1124+
"expr": "sum(rate(kiam_sts_assumerole_timing_seconds_bucket[$interval])) by (le)",
1125+
"format": "heatmap",
11251126
"intervalFactor": 2,
11261127
"legendFormat": "{{le}}",
11271128
"refId": "A",
@@ -1142,7 +1143,7 @@
11421143
"xBucketSize": null,
11431144
"yAxis": {
11441145
"decimals": null,
1145-
"format": "ms",
1146+
"format": "s",
11461147
"logBase": 1,
11471148
"max": null,
11481149
"min": null,

pkg/aws/metadata/metrics.go

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -9,7 +9,7 @@ var (
99
prometheus.HistogramOpts{
1010
Namespace: "kiam",
1111
Subsystem: "metadata",
12-
Name: "handler_latency_milliseconds",
12+
Name: "handler_latency_seconds",
1313
Help: "Bucketed histogram of handler timings",
1414

1515
// 1ms to 5min

pkg/aws/sts/metrics.go

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -34,7 +34,7 @@ var (
3434
prometheus.HistogramOpts{
3535
Namespace: "kiam",
3636
Subsystem: "sts",
37-
Name: "assumerole_timing_milliseconds",
37+
Name: "assumerole_timing_seconds",
3838
Help: "Bucketed histogram of assumeRole timings",
3939

4040
// 1ms to 5min
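The `// 1ms to 5min` comments above suggest exponentially spaced buckets; the rename only changes the unit the observations are recorded in. As a rough sketch of what seconds-denominated boundaries over that range look like (the start, growth factor, and count below are assumptions for illustration, not the values in this commit; `exponentialBuckets` mirrors the shape of the Prometheus client's `ExponentialBuckets` helper):

```go
package main

import "fmt"

// exponentialBuckets returns `count` boundaries starting at `start`,
// each `factor` times the previous one.
func exponentialBuckets(start, factor float64, count int) []float64 {
	buckets := make([]float64, count)
	next := start
	for i := range buckets {
		buckets[i] = next
		next *= factor
	}
	return buckets
}

func main() {
	// Assumed values: start at 1ms (0.001s) and grow until roughly 5min.
	for _, b := range exponentialBuckets(0.001, 2.5, 15) {
		fmt.Printf("%gs\n", b)
	}
}
```

Expressing the boundaries in seconds keeps the metric consistent with Prometheus naming conventions, which is what the dashboard and docs changes in this commit follow.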
