|
1 | 1 | # SkyWalking AIOps Engine
|
2 |
| -**An AIOps Engine for Observability.** |
3 | 2 |
|
4 |
| -A usable open-source AIOps framework for the domain of cloud computing observability. |
| 3 | +*A practical open-source AIOps engine for the |
| 4 | +era of cloud computing.* |
5 | 5 |
|
6 |
| -### Why this project matters? |
7 |
| -We could answer this from the following progressive questions: |
8 |
| -1. Are there existing algorithms for telemetry data? |
| 6 | +## Why do we build this project? |
| 7 | + |
| 8 | +**We strongly believe that this project will bring value |
| 9 | +to AIOps practitioners and researchers.** |
| 10 | +<details> |
| 11 | + <summary>Towards better Observability</summary> |
| 12 | +We can reason about this through the following progressive questions: |
| 13 | + |
| 14 | +1. Are there existing algorithms for telemetry data? |
9 | 15 | - **Abundant.**
|
10 | 16 |
|
11 |
| -2. Are the existing algorithms empirically verified? |
12 |
| - |
13 |
| - - **Most proposed algorithms are not empirically verified** |
14 | 17 |
|
15 |
| -3. Are there AIOps tools that embed machine learning algorithms? |
| 18 | +2. Are the existing algorithms empirically verified? |
| 19 | + |
| 20 | + - **Most algorithms are not verified in production** |
| 21 | + |
| 22 | + |
| 23 | +3. Are there practical AIOps frameworks? |
16 | 24 | - **Limited, often out of maintenance or commercialized.**
|
17 |
| - |
18 |
| -4. Are there open-source AIOps solutions that integrates with popular backends? |
| 25 | + |
| 26 | + |
| 27 | +4. Are there open-source AIOps solutions that offer out-of-the-box integrations? |
19 | 28 | - **Hardly any.**
|
20 | 29 |
|
| 30 | + |
21 | 31 | 5. Why would I need that?
|
22 | 32 | 1. For developers & organizations curious about AIOps:
|
23 |
| - - a. Just install and start using it, saves budget, saves head-scratching. |
| 33 | + - a. Just install and start using it; it saves budget and prevents head-scratching. |
24 | 34 | - b. Treat this project as a good (or bad) reference for your own AIOps pipeline.
|
25 | 35 | 2. For researchers in the AIOps domain:
|
26 | 36 | - a. For software engineering researchers - sample for AIOps evolution and empirical study.
|
27 | 37 | - b. For algorithm researchers - playground for new algorithms, solid case studies.
|
28 |
| - |
29 | 38 |
|
30 |
| -The above is where we place the value of this project, though our current aim is to become the official AIOps engine |
31 |
| -of [Apache SkyWalking](https://github.com/apache/skywalking), each component could be easily swapped given its |
32 |
| -plugable design. |
| 39 | +</details> |
| 40 | + |
| 41 | + |
| 42 | +Expand the section above to see where we place the value of this project. |
| 43 | +Although our current aim is to become the official AIOps engine |
| 44 | +of [Apache SkyWalking](https://github.com/apache/skywalking), |
| 45 | +each component can be easily swapped, extended, and scaled to fit your own needs. |
33 | 46 |
|
34 | 47 | ### Current Goal
|
35 | 48 |
|
36 |
| -At the current stage, it serves as an **anomaly detection** engine, in the future, we will also explore root cause analysis and |
37 |
| -automatic problem recovery. |
| 49 | +At the current stage, it targets log and metric analysis; |
| 50 | +in the future, we will also explore root cause analysis and |
| 51 | +automatic problem recovery based on traces. |
38 | 52 |
|
39 |
| -This is also the tentative repository for OSPP 2022 and GSOC 2022 student project outcomes. |
| 53 | +This is also the repository for |
| 54 | +OSPP 2022 and GSOC 2022 student research outcomes. |
40 | 55 |
|
41 |
| -Project `Exploration of Advanced Metrics Anomaly Detection & Alerts with Machine Learning in Apache SkyWalking` |
| 56 | +1. `Exploration of Advanced Metrics Anomaly Detection & Alerts with Machine Learning in Apache SkyWalking` |
42 | 57 |
|
43 |
| -Project `Log Outlier Detection in Apache SkyWalking` |
| 58 | +2. `Log Outlier Detection in Apache SkyWalking` |
44 | 59 |
|
45 | 60 | ### Architecture
|
46 | 61 |
|
47 |
| -**TBA** |
| 62 | +**Log Clustering and Log Trend Analysis** |
48 | 63 |
|
49 |
| -**Data pulling:** |
| 64 | + |
50 | 65 |
|
51 |
| -The current data pulling and retention rely on a common set of ingestion methods, with a |
52 |
| -first focus on SkyWalking OAP GraphQL and static file loader. We maintain a local storage for processed data. |
| 66 | + |
53 | 67 |
|
54 |
| -**Alert component:** |
| 68 | +**Metric Anomaly Detection and Visualizations** |
55 | 69 |
|
56 |
| -An anomaly does not directly trigger an alert, it |
57 |
| -goes through a tolerance mechanism. |
| 70 | +TBD - Soon to be added |
58 | 71 |
|
59 | 72 | ### Roadmap
|
60 | 73 |
|
61 |
| -Phase 0 (current) |
62 |
| -1. [ ] Implement essential development infrastructure. |
63 |
| -2. [ ] Implement naive algorithms as baseline & pipline POC (on existing datasets). |
64 |
| -3. [ ] Implement a SkyWalking `GraphQLDataLoaderProvider` to test data pulling. |
65 |
| - |
66 |
| -Phase 1 (summer -> fall 2022, OSPP & GSOC period) |
67 |
| -1. [ ] Implement the remaining core default providers. |
68 |
| -2. [ ] **Research and implement algorithms with OSPP & GSOC students.** |
69 |
| -3. [ ] Integrate with Apache Airflow for orchestration. |
70 |
| -5. [ ] Evaluation based on benchmark microservices systems (anomaly injection). |
71 |
| -6. [ ] MVP ready without UI-side changes. |
72 |
| - |
73 |
| -Phase 2 (fall -> end of 2022) |
74 |
| -1. [ ] Join as an Apache SkyWalking subproject. |
75 |
| -2. [ ] Integrate with SkyWalking Backend & rule-based alert module. |
76 |
| -3. [ ] Propose and request SkyWalking UI-side changes. |
77 |
| -4. [ ] First release for end-user testing. |
78 |
| - |
79 |
| -Phase Next |
| 74 | +For details on our progress, please refer to our project dashboard |
| 75 | +[here](https://github.com/SkyAPM/aiops-engine-for-skywalking/projects?query=is%3Aopen). |
| 76 | + |
| 77 | +Phase Current (fall -> end of 2022) |
| 78 | + |
| 79 | +0. [ ] Finish the POC stage and start implementing dashboards for first-stage users (for demo purposes). |
| 80 | +1. [ ] Real-world data testing and chaos engineering benchmark experiments. |
| 81 | +2. [ ] Join Apache Software Foundation as an Apache SkyWalking subproject. |
| 82 | +3. [ ] Integrate with the SkyWalking backend (export analytics results to SkyWalking). |
| 83 | +4. [ ] Propose and request SkyWalking UI-side changes. |
| 84 | +5. [ ] First release for SkyWalking end-user testing. |
| 85 | + |
| 86 | +Phase Next |
| 87 | + |
80 | 88 | 1. [ ] Towards production-ready.
|