Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Survive attempts at indexing events that don't have a @timestamp #777

Closed

Conversation

webmat
Copy link
Contributor

@webmat webmat commented May 25, 2018

  • Events being indexed to an index that doesn't require time-based interpolation
    are unaffected and are still indexed, whether or not they have a @timestamp.
  • When events are missing @timestamp are destined to a timestamped index name,
    we don't send them for indexing, and we log an error.
  • #event_action_tuple now returns nil when an event is discarded this way.
  • Created a helper class to be a home for trivial functions that don't
    require the full plugin to be tested.
  • I am perhaps over-testing this.

This prevents a full Logstash crash, and closes #739

Update

After multiple discussions on the subject elsewhere, let’s remind ourselves that this PR is about fixing a crash today for all users of 5.x and 6.x.

Whether or not we eventually prevent the removal of @timestamp in a future breaking change (in LS 7.0 or later) is orthogonal to this.

Let’s try to discuss only the merits of this PR in whether it solves the crash in a good way or if needs improvements.

Mathieu Martin added 3 commits May 25, 2018 13:46
@webmat
Copy link
Contributor Author

webmat commented May 25, 2018

Prior to this PR, the following crashes Logstash with a FATAL error (see #739 for a well-formatted example):

bin/logstash -e "input { stdin {} generator { count => 1 } }
  filter { mutate { remove_field => ['@timestamp'] } }
  output { elasticsearch {}}"

After this PR, Logstash simply skips the event, and logs this instead:

[2018-05-25T15:10:43,326][ERROR][logstash.outputs.elasticsearch] Cannot index event missing @timestamp to a timestamped index {:index=>"logstash-%{+YYYY.MM.dd}", :event=>"{\"host\":\"matbook-pro.lan\",\"message\":\"Hello world!\",\"sequence\":0,\"@version\":\"1\"}"}

@webmat
Copy link
Contributor Author

webmat commented May 25, 2018

Poll:

  • minor version bump?
  • patch only?

Is fixing a crash a breaking change? 😂

@webmat
Copy link
Contributor Author

webmat commented May 25, 2018

Oh I had not noticed this suggestion #718, which is an even better behaviour to what this PR puts in place.

Although I'd say since this PR fixes a full-on Logstash crash, we should release +/- as is (following review suggestions), and I'll add #718 to my shortlist. I may even include it in #772.

@ph
Copy link
Contributor

ph commented May 25, 2018

Thanks for working on this, I agree this is a bad experience +1 to improving it!

But I think this could be part of a broader discussion:

  1. Do we want to allow users to remove @timestamp, some use case breaks if the field is removed. (Index, serialization, or some filter)
  2. This look like a problem with the string interpolation feature of the event, maybe the string interpolation should return an error or allow a default when the field is missing.

Concerning 1, I think you are right, so many use cases don't need it.

@webmat
Copy link
Contributor Author

webmat commented May 25, 2018

I agree that Logstash's string interpolation for timestamp could be made to not raise an exception. I think there are already tickets for this.

This PR fixes a Logstash crash today for LS 5.x and 6.x users, though.

@webmat
Copy link
Contributor Author

webmat commented May 25, 2018

Looking at the code around DLQ, I see metrics being tracked for each type of outcome for an event.

My PR here does not do that. This potentially needs to be discussed. Or perhaps we can simply introduce such a metric in #772 and fix this crash today.

@webmat
Copy link
Contributor Author

webmat commented May 26, 2018

After multiple discussions on the subject elsewhere, let’s remind ourselves that this PR is about fixing a crash today for all users of 5.x and 6.x.

Whether or not we eventually prevent the removal of @timestamp in a future breaking change (in LS 7.0 or later) is orthogonal to this.

Let’s try to discuss only the merits of this PR in whether it solves the crash in a good way or if needs improvements.

@elasticsearch-bot elasticsearch-bot self-assigned this May 28, 2018
@jsvd
Copy link
Member

jsvd commented May 28, 2018

@webmat I think I'd prefer this to remain a situation where logstash crashes fatally.

Logstash expects that any event will have @timestamp and should operate only under this assumption. With this in mind, we should assume that if @timestamp is removed then something went wrong and we should prevent as much data loss as possible. With this PR the es output would massively start dropping data if for some reason the field gets removed.

@webmat
Copy link
Contributor Author

webmat commented May 28, 2018

@jsvd What about people importing semi-static CSVs (e.g. zip codes) with Logstash? Should these have a @timestamp field?

Is this requirement documented anywhere? Finally, is this a hard requirement for the Logstash => ES scenario only, or should any Logstash configuration (even when it doesn't talk to ElasticSearch) have this hard requirement?

@jsvd
Copy link
Member

jsvd commented May 28, 2018

@jsvd What about people importing semi-static CSVs (e.g. zip codes) with Logstash? Should these have a @timestamp field?
Is this requirement documented anywhere? Finally, is this a hard requirement for the Logstash => ES scenario only, or should any Logstash configuration (even when it doesn't talk to ElasticSearch) have this hard requirement?

A long time ago @version 1 of the LogStash::Event was created and stated that it would always have an @timestamp: elastic/logstash@95a5bcb#diff-b09e11a609fa0d943a749a72972396ccR14

About documentation, we didn't do a good job there at all.

That said, I'm up for discussing if we should have a version 2 where we drop this requirement for @timestamp, but falls out of scope for this PR/issue (and should include core folks)

[edit] link to issue about documenting event api elastic/logstash#8662

@webmat
Copy link
Contributor Author

webmat commented May 28, 2018

Gotcha. Should I close this PR, or repurpose it to output a slightly more explicit message when either are missing? The following has the merit of complaining about the missing timestamp:

[2018-02-22T13:49:42,425][FATAL][logstash.runner ] An unexpected error occurred! {:error=>#<LogStash::Error: timestamp field is missing>

But this message will not stop people from contacting support and saying Logstash is crashing. It doesn't make clear that this is crashing because missing @timestamp is not supported.

So perhaps we could literally output a message that says both @version and @timestamp are mandatory fields that should never be removed, when sending to ElasticSearch? Perhaps we could even have the error contain a link to documentation about this.

The point of this would be to prevent needless support and SDH activity.

Note that with the discussions I had last week, I was under the impression that requiring @timestamp was a potential upcoming requirement, not a requirement that had been in place a long time.

Here's another idea, though (based on #718). We could also consider sending events missing a @timestamp to the DLQ, when it's enabled? It's not as hard as a crash, though, so it may fly under the radar and lead to data loss...

@jsvd
Copy link
Member

jsvd commented May 29, 2018

So perhaps we could literally output a message that says both @Version and @timestamp are mandatory fields that should never be removed, when sending to ElasticSearch? Perhaps we could even have the error contain a link to documentation about this.
The point of this would be to prevent needless support and SDH activity.

++ on improving messaging when this happens. We can catch this error and print a much nicer error message like "An event without @timestamp has been sent to ES output. This should never happen as this field is expected to exist, so assuming fatal error in event manipulation. Please check if any input, filter, codec or output is removing this field. Aborting logstash to prevent further data loss.."

Here's another idea, though (based on #718). We could also consider sending events missing a @timestamp to the DLQ, when it's enabled? It's not as hard as a crash, though, so it may fly under the radar and lead to data loss...

My perspective here is that anytime this situation happens it was by user configuration (e.g. using mutate/remove) so the reaction we want is for this to be solved asap. With this in mind, probably DLQ'ing isn't ideal, wdyt?

@jsvd
Copy link
Member

jsvd commented May 29, 2018

Gotcha. Should I close this PR, or repurpose it to output a slightly more explicit message when either are missing? The following has the merit of complaining about the missing timestamp:

maybe we can create a new PR and refer to this one as the discussion that led to it.

@webmat
Copy link
Contributor Author

webmat commented May 29, 2018

New issue created: #779. Let's move the discussion there.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Unexpected error when @timestamp field missing
4 participants