Providing che.openshift.io status and monitoring tools #4730

slemeur · 2019-01-24T14:42:51Z

Goals

The idea of this epic is to improve the mechanisms we have in place to report the state of che.openshift.io. We would add solutions to monitor the state, report the metrics, and expose them in a way that can be leveraged to provide a better user experience.

Example: When we detect that the platform is having trouble starting workspaces, we should inform the user that their workspace might take longer than usual to get ready.

User Story 1: As a user, I should be able to know the status of the platform

Today, we already measure some aspects of how the platform is behaving:

Workspace startup time
PVC mount time with empty PVC
PVC mount time with big PVC
These metrics are currently not exposed, and not everybody can access them.

With these metrics, we want to set up the basis of a status page and our monitoring tools:

Have an agent that runs the tests at a defined interval
Expose the metrics in the Prometheus format
Add a status page which will display the information (like status.io)
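
To make the first two steps above more concrete, here is a minimal sketch of such an agent. It assumes each test is a plain Python callable and that each result is exposed as a gauge. The metric names (`che_workspace_startup_seconds` and so on) and the function names are hypothetical and not defined anywhere in this epic. Only the output layout (`# HELP`, `# TYPE`, then the sample line) follows the Prometheus text exposition format.

```python
import time
from typing import Callable, Dict

# Hypothetical metric names and help strings; this epic does not define any.
PROBES: Dict[str, str] = {
    "che_workspace_startup_seconds": "Time to start a workspace",
    "che_pvc_mount_empty_seconds": "Time to mount an empty PVC",
    "che_pvc_mount_big_seconds": "Time to mount a big PVC",
}


def time_probe(probe: Callable[[], None]) -> float:
    """Run one probe and return its duration in seconds."""
    start = time.monotonic()
    probe()
    return time.monotonic() - start


def render_prometheus(samples: Dict[str, float]) -> str:
    """Render gauge samples in the Prometheus text exposition format."""
    lines = []
    for name, value in sorted(samples.items()):
        lines.append(f"# HELP {name} {PROBES.get(name, name)}")
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {value:.3f}")
    return "\n".join(lines) + "\n"


def run_agent(
    probes: Dict[str, Callable[[], None]], interval_s: float, rounds: int
) -> str:
    """Run every probe once per round, sleeping interval_s between rounds.

    Returns the exposition text for the latest round.
    """
    text = ""
    for i in range(rounds):
        samples = {name: time_probe(fn) for name, fn in probes.items()}
        text = render_prometheus(samples)
        if i < rounds - 1:
            time.sleep(interval_s)
    return text
```

In a real deployment the agent would loop indefinitely and serve the text over HTTP for Prometheus to scrape; the bounded `rounds` parameter only keeps the sketch easy to run.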

There are many online services that provide information about the state of their platform:

User Story 2: As an admin, or ops of the system, I should be notified/alerted when the platform is not behaving properly.

Once we have the metrics reported in the Prometheus format and the status available to end users, we need to put an alerting system in place so that when something goes wrong with the platform, the information is reported to the right people.
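
As an illustration of the alerting idea, the sketch below evaluates simple threshold rules against the latest samples. Everything in it is an assumption made for the example: the `AlertRule` type, the metric name, the 60-second threshold, and the `ops` recipient are hypothetical, and in practice this logic would more likely live in the alerting tooling that sits on top of Prometheus than in custom code.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class AlertRule:
    metric: str          # metric name to watch
    threshold_s: float   # fire when the value exceeds this many seconds
    notify: str          # who should be told (hypothetical channel name)


def evaluate(rules: List[AlertRule], samples: Dict[str, float]) -> List[str]:
    """Return one alert message per rule whose threshold is exceeded."""
    alerts = []
    for rule in rules:
        value = samples.get(rule.metric)
        # A missing sample is skipped here; a real system may want to
        # alert on missing data as well.
        if value is not None and value > rule.threshold_s:
            alerts.append(
                f"[{rule.notify}] {rule.metric}={value:.1f}s "
                f"exceeds {rule.threshold_s:.1f}s"
            )
    return alerts


rules = [AlertRule("che_workspace_startup_seconds", 60.0, "ops")]
for message in evaluate(rules, {"che_workspace_startup_seconds": 90.0}):
    print(message)
```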

User Story 3: As an ops or admin of the platform, I'd like to get more insights about how the platform is behaving.

Once the basics are set up, we would enrich the metrics we track:

time of pulling images
time of pulling images that are already cached
time to create routes
time to clone a repository
time spent initializing a Language Server
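
One possible way to collect per-phase timings like the ones listed above is a small timer wrapped around each step. This is only a sketch: `PhaseTimer` and the phase names are hypothetical, and where such instrumentation would actually hook into workspace startup is not specified in this epic.

```python
import time
from contextlib import contextmanager
from typing import Dict, Iterator


class PhaseTimer:
    """Collects wall-clock durations for named workspace startup phases."""

    def __init__(self) -> None:
        self.durations: Dict[str, float] = {}

    @contextmanager
    def phase(self, name: str) -> Iterator[None]:
        start = time.monotonic()
        try:
            yield
        finally:
            # Record the duration even if the phase raised an exception.
            self.durations[name] = time.monotonic() - start


timer = PhaseTimer()
with timer.phase("pull_images"):
    pass  # placeholder for the real image pull
with timer.phase("clone_repository"):
    pass  # placeholder for the real repository clone
print(sorted(timer.durations))
```

The recorded durations could then be exposed next to the existing startup and PVC metrics.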

User Story 4: As a user of the product, I should be notified in my environment if something is wrong with the platform.

User Story 5: As one deploying Che, I'd like to benefit from this tooling.

We should be able to provide these tools to anyone who sets up Che on their own.

User Story 6: As a user, I want to get in-context feedback about the state of the platform.

There are multiple aspects where we could provide information about the state of the platform:

When starting the workspace, if the current state of the platform prevents a fast workspace start, we should show a message such as "The platform is currently under load, your workspace may take longer than usual to get ready."
When in the IDE, we could have a small status widget in the status bar - showing different indicators about the state of the platform
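
The startup message described above could be driven by the workspace startup metric. The sketch below assumes the client can read a recent startup time and a usual (baseline) startup time. The `startup_notice` function and the 1.5x slowdown factor are hypothetical choices for illustration; only the message text comes from this epic.

```python
from typing import Optional

LOAD_MESSAGE = (
    "The platform is currently under load, your workspace may take "
    "longer than usual to get ready."
)


def startup_notice(
    recent_startup_s: float, usual_startup_s: float, factor: float = 1.5
) -> Optional[str]:
    """Return the load message when recent startups are much slower than usual.

    The factor of 1.5 is an arbitrary assumption, not a value from the epic.
    """
    if recent_startup_s > usual_startup_s * factor:
        return LOAD_MESSAGE
    return None


notice = startup_notice(recent_startup_s=100.0, usual_startup_s=40.0)
if notice is not None:
    print(notice)
```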

slemeur added type/epic area/che team/che/osio labels Jan 24, 2019

ibuziuk mentioned this issue Jan 29, 2019

Need to create system status page of the che.openshift.io redhat-developer/rh-che#1224

Closed

3 tasks

This was referenced Feb 19, 2019

Add traces/metrics relate to routes eclipse-che/che#12699

Closed

Add traces/metrics relate to pulling images eclipse-che/che#12698

Closed
