Releases: cloudfoundry/diego-release
v2.47.0
v2.46.0
Changes
Bug Fixes
- Ensure ClaimLRP request for suspect LRP is ignored @aminjam (#514)
- Change Suspect ActualLRP back to Ordinary if corresponding Ordinary ActualLRP doesn't exist @jvshahid (#501)
- Fix hung BBS with a huge number of goroutines @JimmyMa (#505)
- Return error if os.Setenv fails (cloudfoundry/buildpackapplifecycle#41)
Dependencies
Diego v2.45.0
Resources
- Download release v2.45.0 from bosh.io.
- Verified with cloudfoundry/cf-deployment @ 877dec06620a0474a786b1471286b33f56d4cb0f.
Features
- #455 HTTP healthcheck requests are easy to identify using the User-Agent header.
- #458 Provide generalized recommendations for BBS and Auctioneer VM sizing.
- #448 Use the bin pack first fit strategy, with an adjustable weight, to place LRPs and tasks. Turned off by default.
- Do not skip certificate validation when fetching Docker image metadata.
- Use RFC3339 timestamps in drain, pre-start and post-start scripts.
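These notes do not say how the auctioneer combines the first-fit weight with its other scoring inputs. As a rough illustration only, the hypothetical Go sketch below assumes that placement picks the lowest-scoring cell and that the weight (standing in for diego.auctioneer.bin_pack_first_fit_weight) adds a penalty proportional to a cell's index, so a larger weight packs work onto the first cells that fit. The cell type and the score and pick functions are invented for this example.

```go
package main

import (
	"fmt"
	"math"
)

// cell is a hypothetical, simplified view of a Diego cell for scoring.
type cell struct {
	ID        string
	Index     int     // position of the cell in its instance group (assumed)
	FreeRatio float64 // fraction of the cell's resources still free, 0..1
}

// score returns a placement score where lower is better. The base term
// favours emptier cells; the assumed first-fit term adds weight*Index,
// so a positive weight pushes work toward lower-indexed cells and a
// weight of 0 disables the first-fit term in this sketch.
func score(c cell, binPackFirstFitWeight float64) float64 {
	base := 1 - c.FreeRatio
	return base + binPackFirstFitWeight*float64(c.Index)
}

// pick returns the ID of the lowest-scoring cell that still has room.
func pick(cells []cell, weight float64) (string, bool) {
	best, bestScore := "", math.Inf(1)
	for _, c := range cells {
		if c.FreeRatio <= 0 {
			continue // no room on this cell
		}
		if s := score(c, weight); s < bestScore {
			best, bestScore = c.ID, s
		}
	}
	return best, best != ""
}

func main() {
	cells := []cell{
		{ID: "cell-0", Index: 0, FreeRatio: 0.2},
		{ID: "cell-1", Index: 1, FreeRatio: 0.9},
	}
	off, _ := pick(cells, 0)  // first-fit term disabled: prefer the emptier cell
	on, _ := pick(cells, 1.0) // strong weight: pack onto the first cell that fits
	fmt.Println(off, on)      // cell-1 cell-0
}
```

With a weight of 0 the sketch prefers the emptiest cell; with a weight of 1.0 it places on cell-0 even though cell-1 has more room.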
Bug Fixes
- #500 Route emitters incorrectly emit unregistration messages when binding or unbinding route services
Dependencies
- Bump Go from 1.13.3 to 1.13.8
- Bump crypto to latest release-branch.go1.13
- Other dependency bumps that were opened by dependabot. See the list of PRs
BOSH property changes
- Added diego.auctioneer.bin_pack_first_fit_weight
Diego v2.44.0
Resources
- Download release v2.44.0 from bosh.io.
- Verified with cloudfoundry/cf-deployment @ 9a71d9d39a2e10e41070850b3c91e72f25156b19.
Changes from v2.43.0 to v2.44.0
Significant changes
Local Route Emitters
Declarative Health Checks
Per-Instance Proxy
- Envoy proxy binary bumped to bb7ceff4c3c5bd4555dff28b6e56d27f2f8be0a7
Component Logging and Metrics
BOSH property changes
route_emitter and route_emitter_windows
- diego.route_emitter.nats.tls.hostname: Hostname of the NATS cluster.
Diego v2.43.0
Resources
- Download release v2.43.0 from bosh.io.
- Verified with cloudfoundry/cf-deployment @ 6a1a4292cca658ec876f934c0f39c94be34e37a0.
Diego Team will manage development directly in GitHub
The Diego Team has decided to manage development directly in GitHub rather than syncing work between GitHub and Pivotal Tracker.
This change should give the community more visibility and simplify the Diego team's overall processes.
The Diego Team owns 33 repos; to consolidate the input from all of them, we now ask that issues be submitted only to diego-release.
We've put the following changes in place to make this possible:
- GitHub issues have been disabled for all Diego sub-components (e.g. rep, bbs, auctioneer, cfdot, bytefmt, etc.). Issues remain enabled for diego-release.
- Going forward, report any issues associated with Diego's sub-components to diego-release using the new bug report or feature request templates, and reference the sub-component in question in your report.
- When submitting PRs to a Diego sub-component, please also submit an accompanying PR review request to diego-release and include a pointer to the sub-component PR you wish the team to review.
Changes from v2.42.0 to v2.43.0
Significant changes
Per-Instance Proxy
- Envoy proxy binary bumped to 8f2515a19bdcc75bea0bfd7016231a7661d0be6e
Test Suites and Tooling
- Pre-compile vizzini tests during compilation, so run.erb should no longer need dev_tools (e.g. gcc) to be installed.
Diego v2.42.0
Resources
- Download release v2.42.0 from bosh.io.
- Verified with cloudfoundry/cf-deployment @ 8586950a55497c912c2ad109a93383c68ed34e51.
Changes from v2.41.0 to v2.42.0
Significant changes
Per-Instance Proxy
- Envoy proxy binary bumped to 4e2f2022af592ef13afd325988262789db577f3b
App Logging and Metrics
To address the app "noisy neighbor" problem, this release adds an app log rate limiting feature.
- The configurable app log rate limit is exposed as the diego.executor.max_log_lines_per_second property of Diego's rep sub-component.
- The default value of the property is 0: no rate limit is enabled.
- The rate limit can be set to the same or a different value for each diego-cell instance group.
  - In a deployment with Isolation Segment, Windows, and standard diego-cell instance groups, each instance group can be configured separately with a distinct diego.executor.max_log_lines_per_second value.
  - To set the same rate limit across the entire foundation in that three-instance-group scenario, the rep diego.executor.max_log_lines_per_second property must be set explicitly in the deployment manifest for each instance group.
- For additional details about the implementation of the log rate limiting feature, see the Log Rate Limit Feature Implementation Details section at the bottom of these release notes.
- The stories associated with the feature:
  - As a platform operator I would like to be able to enable and set a rate limit for app log lines per second on the diego-cells and/or isolated diego-cells, so I can be sure no spurious "chatty" app could cause log loss for other apps on the same cell
  - As a platform operator I would like to be able to observe a metric revealing the cumulative number of app instances that have exceeded their log-lines-per-second rate limit over the last 5 minutes, so I can tune the rate limit I've set on the platform and/or become aware that I'll need to work with the offending app developers to decrease the volume of logs generated by their apps
  - As an app developer I would like to see a log entry in my app log stream when my app instances exceed their log-lines-per-second rate limit, so I can make appropriate changes to my app to reduce its log volume and/or reach out to the platform administrator to update the log rate limit for the platform
  - As an app developer, I expect that when an app instance that has exceeded its max-log-lines-per-second rate limit (when set on the platform) is stopped, any logs that have not been written to the CF app log stream are dropped, so my app's logs are not polluted with logs from an instance that is no longer running
Component Logging and Metrics
- cloudfoundry/auction #6: Change cellStates logging to DEBUG
- cloudfoundry/auction #7: Reduce logging of fetched-cell-state to debug
BOSH property changes
rep and rep_windows
- New: diego.executor.max_log_lines_per_second: Maximum log lines allowed per second per app instance (default: 0).
  - The default value of 0 disables rate limiting.
  - Minimum recommended value, if set: 100.
Log Rate Limit Feature Implementation Details
Diego implements the app log rate limiting feature with the Go rate limiting library.
The library is essentially based on the [Token Bucket Algorithm](https://en.wikipedia.org/wiki/Token_bucket).
- If an app exceeds its log rate limit, the rate at which the app's logs are pushed into the logging system is limited (slowed down), with excess lines held up to the buffer limit.
- If the buffer limit is hit and the app is still logging, the extra log messages are dropped. Only logs generated by the app instance that is exceeding the rate limit are dropped; other apps colocated on the same cell as the "noisy neighbor" are not affected.
- If the noisy neighbor app recovers to a logging rate below the platform's rate limit before the buffer limit is hit (for instance, after a short burst of extremely high log message generation), all of its messages are still printed, albeit with a slight delay.
Diego v2.41.0
Resources
- Download release v2.41.0 from bosh.io.
- Verified with cloudfoundry/cf-deployment @ edd69407ed538b140526b654c47c2f3a84a197b7.
Changes from v2.40.0 to v2.41.0
Significant changes
Local Route Emitters
Per-Instance Proxy
- Envoy proxy binary bumped to 373af7564f4e943112456bf40084a7f43d5e9d96
Windows Support
App Logging and Metrics
- cloudfoundry/executor #50: Send CPU spike metric on every reporter iteration
- As an app dev, I can see my app has spiked in the past even when log-cache no longer has metrics that old
Component Logging and Metrics
- As a cf operator I want the "lock loss" log messages to be obvious/clear so I'm not confused by or ignore these important messages and I take appropriate action when they occur
- As a platform operator I want to observe the bbs master election metric as part of the bbs indicator dashboard so that I can take appropriate action if/when the bbs master is swapping outside of platform upgrades
- Add logging in the rep when curling for Azure metadata fails
Dependencies
Test Suites and Tooling
- Migrate BenchmarkBBS to consume instance events because the non-instance events are deprecated
- Bugfix: inigo should clean up test artifacts in the temp directory etc. after it finishes
Documentation
- As a CF operator, I would like a document that describes the process for rotating the Diego intermediate instance identity ca cert and CF application ca cert so I can reliably rotate the certs in my foundation without application downtime
- As a platform operator I want to know which KPIs/metrics/platform-behaviors would indicate diego component/jobs, other than diego-cell, should be scaled up/out so that I can maintain optimum platform health
BOSH property changes
route_emitter and route_emitter_windows
- diego.route_emitter.nats.tls.enabled: Enables route_emitter to connect to the NATS server via TLS (default value: false)
- diego.route_emitter.nats.tls.client_cert: PEM-encoded certificate for the route-emitter to present to NATS for verification when connecting via TLS
- diego.route_emitter.nats.tls.client_key: PEM-encoded private key for the route-emitter to present to NATS for verification when connecting via TLS
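As a hedged illustration of how the route_emitter NATS TLS properties (together with diego.route_emitter.nats.tls.hostname) might map onto a Go TLS client, the sketch below builds a crypto/tls configuration. The natsTLSConfig function and its parameters are hypothetical; only the crypto/tls calls are standard library. It assumes the hostname is the name checked against the NATS server certificate and that the server is verified against the system roots, since these notes do not mention a CA property.

```go
package main

import (
	"crypto/tls"
	"fmt"
)

// natsTLSConfig returns nil when TLS is disabled (the documented default),
// otherwise a client TLS config. All parameter names are hypothetical:
//   enabled       ~ diego.route_emitter.nats.tls.enabled
//   clientCertPEM ~ diego.route_emitter.nats.tls.client_cert
//   clientKeyPEM  ~ diego.route_emitter.nats.tls.client_key
//   hostname      ~ diego.route_emitter.nats.tls.hostname (assumed to be
//                   the name verified against the NATS server certificate)
func natsTLSConfig(enabled bool, clientCertPEM, clientKeyPEM []byte, hostname string) (*tls.Config, error) {
	if !enabled {
		return nil, nil // plain-text NATS connection
	}
	cert, err := tls.X509KeyPair(clientCertPEM, clientKeyPEM)
	if err != nil {
		return nil, fmt.Errorf("loading NATS client key pair: %w", err)
	}
	return &tls.Config{
		Certificates: []tls.Certificate{cert}, // presented to NATS for verification
		ServerName:   hostname,                // checked against the server certificate
		MinVersion:   tls.VersionTLS12,
		// RootCAs is left nil, so the system roots verify the server (assumption).
	}, nil
}

func main() {
	cfg, err := natsTLSConfig(false, nil, nil, "nats.example.internal")
	fmt.Println(cfg == nil, err) // true <nil>
	_, err = natsTLSConfig(true, []byte("not a cert"), []byte("not a key"), "nats.example.internal")
	fmt.Println(err != nil) // true
}
```

Returning nil when TLS is disabled mirrors the default of false: a caller would only attach a TLS config to its NATS connection when one is returned.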
Diego v2.40.0
Resources
- Download release v2.40.0 from bosh.io.
- Verified with cloudfoundry/cf-deployment @ bf96d52f8823c48ef3124cda4a8a93e86e27fa33.
Changes from v2.39.0 to v2.40.0
Significant changes
Component Coordination
CF Tasks
Local Route Emitters
Per-Instance Proxy
- Envoy proxy binary bumped to 1e3e792a45236b8dc29db5f463a08f9a9928ab2c
Dependencies
BOSH property changes
rep and rep_windows
- diego.rep*.bbs.request_timeout: adds the ability to configure the request timeout from rep to bbs (default: 10s)
vizzini
- vizzini.file_server.address: adds the ability to configure the address of the file_server (default: file-server.service.cf.internal:8080)
Diego v2.39.0
Resources
- Download release v2.39.0 from bosh.io.
- Verified with cloudfoundry/cf-deployment @ f9c0a5fd2eaa8ec2d06e8cf533f469015262c7d5.
Changes from v2.38.0 to v2.39.0
Significant changes
Local Route Emitters
Per-Instance Proxy
- Envoy proxy binary bumped to 4478c1984d17146b1ff78d0babedae2a4752b027
App Logging and Metrics
- cloudfoundry/diego-logging-client #7: Send container CPU usage spike metric
- cloudfoundry/executor #49: Let's Emit a CPU Usage Spike Metric
Documentation
BOSH property changes
auctioneer
- bpm.enabled is no longer experimental.
bbs
- The following properties are no longer experimental:
  - bpm.enabled
  - tasks.max_retries
file_server
- bpm.enabled is no longer experimental.
locket
- bpm.enabled is no longer experimental.
rep and rep_windows
- The following spec properties are no longer experimental:
  - bpm.enabled
  - diego.executor.volman.driver_paths (property removed from rep_windows)
  - containers.graceful_shutdown_interval_in_seconds
  - containers.proxy.require_and_verify_client_certificates
  - containers.proxy.trusted_ca_certificates
  - containers.proxy.verify_subject_alt_name
route_emitter and route_emitter_windows
- bpm.enabled is no longer experimental.
ssh_proxy
- bpm.enabled is no longer experimental.
vizzini
- The following properties are no longer experimental:
  - enable_declarative_healthcheck
  - max_task_retries
  - enable_container_proxy_tests
  - vizzini.container_proxy.ca
  - vizzini.container_proxy.client_cert
  - vizzini.container_proxy.client_key
Diego v2.38.0
Changes from v2.37.0 to v2.38.0
Significant changes
Component Coordination
- As an application developer I want the system to handle the transfer of staging results larger than 10K to the cloud controller, so my applications that generate larger staging results can successfully stage and subsequently start/run on the foundation
  - A recent bump to Buildpack Application Lifecycle can cause task result files to be larger than 10K (e.g. Java apps). We are bumping MAX_RESULT_SIZE to 20K to address this change.
Per-Instance Proxy
- Envoy proxy binary bumped to 4478c1984d17146b1ff78d0babedae2a4752b027
Docker/Image Support
- As a CF app developer, I expect to be able to push Docker apps that are hosted on AWS ECR and that they continue to run when restarted, crashed, or evacuated after the typical AWS ECR credential expiration period
  - App developers who wish to push apps based on images from AWS ECR should set their CF_DOCKER_PASSWORD env variable to the AWS Secret Access Key for the IAM user and pass the AWS Access Key ID for the IAM user as the docker-username in their cf push command:
    cf push [appname] --docker-image [repo/ECR-container-image-name] --docker-username [aws-access-key-id]
Component Logging and Metrics
- As a Diego Operator, I can observe the auctioneer logs and see what the auctioneer was trying to place when there was a placement failure so I can better diagnose the root cause of that placement failure
- As a 3rd party network plugin author, I expect my component to be able to tell what containers are internal system containers and don't require networking setup
- Log an error for slow readers on BBS
Test Suites and Tooling
- regenerate-certs.sh under ./rep/cmd/rep/fixtures should regenerate all of the fixture certs
- Flaky Test Crashes with a monitor action when the monitor never succeeds when the process dies with exit code 1 [It] gets marked as crashed (immediately)
- Failing Benchmark BBS Build - mitigate test flakiness
- Failing Benchmark BBS Build - Set the buffer on receiving channel for BBS events