@Linu-Elias Linu-Elias commented Jan 2, 2026

Proposed commit message

Adding alerting rule templates to AWS Content Packs:

  • AWS VPC Flow Logs
  • AWS CloudTrail Logs
  • AWS ELB Logs

Checklist

  • I have reviewed tips for building integrations and this pull request is aligned with them.
  • I have verified that all data streams collect metrics or logs.
  • I have added an entry to my package's changelog.yml file.
  • I have verified that Kibana version constraints are current according to guidelines.
  • I have verified that any added dashboard complies with Kibana's Dashboard good practices

Author's Checklist

  • [ ]

How to test this PR locally

Related issues

Screenshots

aws_cloudtrail aws_elb aws_vpc

@Linu-Elias Linu-Elias self-assigned this Jan 5, 2026
@Linu-Elias Linu-Elias marked this pull request as ready for review January 6, 2026 05:16
@Linu-Elias Linu-Elias requested a review from a team as a code owner January 6, 2026 05:16
@andrewkroh andrewkroh added Integration:aws_elb_otel AWS ELB OpenTelemetry Assets Integration:aws_vpcflow_otel AWS VPC Flow Logs OpenTelemetry Assets Integration:aws_cloudtrail_otel AWS CloudTrail Logs OpenTelemetry Assets Team:Obs-InfraObs Observability Infrastructure Monitoring team [elastic/obs-infraobs-integrations] labels Jan 8, 2026
@muthu-mps

Template name:

Can we update the template names as below? (current name -> suggested name)

  • Excessive high-risk actions succeed -> High-risk actions succeeded
  • Massive resource deletion from same IP -> High resource deletion
  • Multiple error spike from same IP -> High error rate
  • Multiple failed login attempts from same IP -> Multiple failed login attempts
  • Application level failures -> Application errors
  • Backend target failures -> Backend errors
  • Excessive data transfer from a single source -> High data transfer rate
  • Excessive REJECT actions with single source IP -> High reject actions

@elasticmachine

💚 Build Succeeded

cc @Linu-Elias

@tommyers-elastic tommyers-elastic left a comment

i sort of reviewed this back to front, so the more general comments are on the later rules.

i noticed that all these rules run every 5m over the last 10/15m of data. did we consider each rule independently and decide that this is the best schedule in every case?

"id": "aws-vpcflow-otel-massive-data-transfer",
"type": "alerting_rule_template",
"attributes": {
"name": "[AWS VPC OTEL] Excessive data transfer from a single source",

[AWS VPC OTEL] doesn't seem very user-friendly

can we remove 'OTEL'?

"searchType": "esqlQuery",
"timeWindowSize": 10,
"timeWindowUnit": "m",
"esqlQuery": {

i don't think we need to include WHERE @timestamp > NOW()- 10m - it's handled by the timeWindowSize param in the rule.

(same for all other rules in this PR)
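
A minimal sketch of the VPC flow query with the explicit time filter dropped. It assumes, as the comment above suggests, that the rule's timeWindowSize and timeWindowUnit already bound the query window. The index name, fields, and threshold are copied from the template in this PR; nothing else is changed.

```esql
// Assumption: the 10m window is applied by the rule's timeWindowSize/timeWindowUnit,
// so the query carries no WHERE @timestamp > NOW() - 10m clause.
// You can adjust the threshold value in the WHERE clause as needed.
FROM logs-aws.vpcflow.otel-default
| STATS total_bytes = SUM(aws.vpc.flow.bytes) BY source.address
| WHERE total_bytes > 53687091200
| SORT total_bytes DESC
```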

"id": "aws-vpcflow-otel-reject-ip",
"type": "alerting_rule_template",
"attributes": {
"name": "[AWS VPC OTEL] Excessive REJECT actions with single source IP",

let's keep the naming consistent. above we have "from a single source", here we have "with single source IP"

"type": "alerting_rule_template",
"attributes": {
"name": "[AWS VPC OTEL] Excessive data transfer from a single source",
"tags": ["AWS VPC Logs OpenTelemetry Assets"],

this isn't a good tag name

we should have tags for 'aws', 'vpc' (and possibly 'otel'?)

(same for all other rules in this PR)

"timeWindowSize": 10,
"timeWindowUnit": "m",
"esqlQuery": {
"esql": "// Alert triggers when any source IP address whose bytes exceed a threshold (e.g. > 50GB in 10 minutes)\n// You can adjust the threshold value in WHERE clause as needed.\nFROM logs-aws.vpcflow.otel-default | WHERE @timestamp > NOW()- 10m | STATS total_bytes = SUM(aws.vpc.flow.bytes) BY source.address | WHERE total_bytes > 53687091200 | SORT total_bytes DESC"

do we need the SORT?

(same for all other rules)
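
For illustration, a sketch of the query above without the trailing SORT, on the assumption that the alert only needs the set of offending sources and not their order. It also omits the explicit time filter, following the earlier comment about timeWindowSize. All names and the threshold come from the template in this PR.

```esql
// Assumption: row order does not affect whether the alert fires, so SORT is omitted.
// Assumption: the time range comes from the rule's timeWindowSize, not the query.
FROM logs-aws.vpcflow.otel-default
| STATS total_bytes = SUM(aws.vpc.flow.bytes) BY source.address
| WHERE total_bytes > 53687091200
```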

},
"params": {
"searchType": "esqlQuery",
"timeWindowSize": 10,

should be 15m to match description

},
"params": {
"searchType": "esqlQuery",
"timeWindowSize": 10,

does 10m seem like a long time period for detecting failed login attempts?

"timeWindowSize": 10,
"timeWindowUnit": "m",
"esqlQuery": {
"esql": "// Alert triggers when any source IP address whose reject requests exceed a threshold (e.g. > 100 in 10 minutes)\n// You can adjust the threshold value in WHERE clause as needed.\nFROM logs-aws.cloudtrail.otel-default | WHERE @timestamp > NOW()- 10m | WHERE rpc.method == \"ConsoleLogin\" | WHERE aws.error.code IS NOT NULL | STATS failed_count = COUNT(*), users_tried = VALUES(user.name) BY source.address | WHERE failed_count >= 100 | SORT failed_count DESC"

do we need the VALUES agg here?

"timeWindowSize": 10,
"timeWindowUnit": "m",
"esqlQuery": {
"esql": "// Alert triggers when any client IP address whose error count exceed a threshold (e.g. > 50 in 10 minutes)\n// You can adjust the threshold value in WHERE clause as needed.\nFROM logs-aws.elbaccess.otel-default | WHERE @timestamp > NOW()- 10m | WHERE aws.elb.status.code != 200| STATS error_count = COUNT(*) BY client.address | WHERE error_count >= 50 | SORT error_count DESC"

should client errors, e.g. 404, trigger this alert?
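
If client errors should not trigger it, one option is to count only server-side responses. This is a sketch under that assumption, not something decided in this thread. It treats aws.elb.status.code as numeric (the template already compares it to 200) and leaves the time window to the rule, per the earlier comment about timeWindowSize.

```esql
// Assumption: only 5xx responses count as errors; 4xx responses such as 404 are excluded.
// The threshold of 50 is the value from the template in this PR.
FROM logs-aws.elbaccess.otel-default
| WHERE aws.elb.status.code >= 500
| STATS error_count = COUNT(*) BY client.address
| WHERE error_count >= 50
```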

"timeWindowSize": 10,
"timeWindowUnit": "m",
"esqlQuery": {
"esql": "// Alert triggers when any source IP address whose reject requests exceed a threshold (e.g. > 100 in 10 minutes)\n// You can adjust the threshold value in WHERE clause as needed.\nFROM logs-aws.cloudtrail.otel-default | WHERE @timestamp > NOW()- 10m | WHERE rpc.method == \"ConsoleLogin\" | WHERE aws.error.code IS NOT NULL | STATS failed_count = COUNT(*), users_tried = VALUES(user.name) BY source.address | WHERE failed_count >= 100 | SORT failed_count DESC"

i had a few concerns about this rule and did a sanity check by asking chatgpt for some feedback. it has a lot of concerns about this rule.

did we get an LLM to thoroughly review all the queries here?

i don't know if the concerns are valid, but i just want to check we have considered feedback like this.

please DM me for the detail i got from GPT, but the summary was:

Primary concerns:

  • Threshold is orders of magnitude too high

  • Failure signal is weak

  • Missing service scoping

  • Detection intent is unclear

As written, this alert will almost certainly never fire for real attacks, while giving a false sense of coverage.
