New intake decoder #4076

Closed

simitt (Contributor) commented Aug 17, 2020

Motivation/summary

This is the first step for changing the JSON decoding and validation. The PR introduces dedicated modeldecoder models with JSON tags for the v2 and v3 RUM API. The generated files are not yet produced by a generator, but show what should be generated in the end.

The next steps are to

  • generate the decoding logic
  • add validation tags and generate the validation logic
  • reuse the models
  • update the profile handler to the new handling and remove the current modeldecoder implementations for metadata

Some first benchmarks comparing the decoding step of metadata suggest roughly 3 times fewer allocations when using the standard library JSON decoder.
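As a rough illustration of the approach described above, the following minimal sketch decodes a document straight into a struct with JSON tags using the standard library decoder. The `metadata` type and the `decodeMetadata` helper are hypothetical and heavily trimmed; they are not the models from this PR.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// metadata is a hypothetical, trimmed stand-in for a modeldecoder model.
// The JSON tags tell encoding/json which input keys map to which fields.
type metadata struct {
	Service struct {
		Name    string `json:"name"`
		Version string `json:"version"`
	} `json:"service"`
	Labels map[string]interface{} `json:"labels"`
}

// decodeMetadata decodes one JSON document directly into the typed model,
// without building an intermediate map[string]interface{} first.
func decodeMetadata(input string) (metadata, error) {
	var m metadata
	err := json.NewDecoder(strings.NewReader(input)).Decode(&m)
	return m, err
}

func main() {
	m, err := decodeMetadata(`{"service":{"name":"checkout","version":"1.2.0"}}`)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Println(m.Service.Name, m.Service.Version)
}
```

Decoding into typed fields also means that a value of the wrong JSON type (for example a number where a string is expected) is reported as an error by the decoder itself.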

Checklist

I have considered changes for:

How to test these changes

Related issues

apmmachine (Contributor) commented Aug 17, 2020

💔 Build Failed


Build stats

  • Build Cause: [Branch indexing]

  • Start Time: 2020-09-17T02:50:11.769+0000

  • Duration: 8 min 1 sec

Steps errors

  • Name: Run intake
    • Description: ./script/jenkins/intake.sh

    • Duration: 5 min 10 sec

    • Start Time: 2020-09-17T02:53:00.301+0000

Log output

[2020-09-17T02:54:27.356Z] Generated fields.yml for apm-server to /var/lib/jenkins/workspace/pm-server_apm-server-mbp_PR-4076/src/github.com/elastic/apm-server/build/fields/fields.all.yml
[2020-09-17T02:54:27.356Z] >> Generating docs/fields.asciidoc for apm-server
[2020-09-17T02:54:27.356Z] exec: /tmp/tmp.dilFVhFEHh/python-env/build/ve/linux/bin/python3 /var/lib/jenkins/workspace/pm-server_apm-server-mbp_PR-4076/pkg/mod/github.com/elastic/beats/[email protected]/libbeat/scripts/generate_fields_docs.py build/fields/fields.all.yml apm-server /var/lib/jenkins/workspace/pm-server_apm-server-mbp_PR-4076/pkg/mod/github.com/elastic/beats/[email protected] --output_path /var/lib/jenkins/workspace/pm-server_apm-server-mbp_PR-4076/src/github.com/elastic/apm-server
[2020-09-17T02:54:28.302Z] 2020/09/17 02:54:28 exec: go list -m
[2020-09-17T02:54:28.565Z] Running target: Check
[2020-09-17T02:54:28.565Z] >> check: Checking source code for common problems
[2020-09-17T02:54:28.565Z] Running dependency: github.com/elastic/beats/v7/dev-tools/mage.CheckYAMLNotExecutable
[2020-09-17T02:54:28.565Z] Running dependency: github.com/elastic/beats/v7/dev-tools/mage.GoVet
[2020-09-17T02:54:28.565Z] Running dependency: github.com/elastic/beats/v7/dev-tools/mage.CheckPythonTestNotExecutable
[2020-09-17T02:54:28.565Z] exec: go vet ./...
[2020-09-17T02:54:28.565Z] go: downloading github.com/stretchr/testify v1.6.1
[2020-09-17T02:54:28.827Z] go: downloading github.com/cloudfoundry-community/go-cfclient v0.0.0-20190808214049-35bcce23fc5f
[2020-09-17T02:54:28.827Z] go: downloading github.com/ua-parser/uap-go v0.0.0-20200325213135-e1c09f13e2fe
[2020-09-17T02:54:28.827Z] go: downloading github.com/dgraph-io/badger/v2 v2.0.3
[2020-09-17T02:54:28.827Z] go: downloading github.com/aws/aws-sdk-go-v2 v0.9.0
[2020-09-17T02:54:28.827Z] go: downloading github.com/tidwall/sjson v1.1.1
[2020-09-17T02:54:28.827Z] go: downloading github.com/chzyer/readline v0.0.0-20180603132655-2972be24d48e
[2020-09-17T02:54:28.827Z] go: downloading github.com/tidwall/gjson v1.6.0
[2020-09-17T02:54:28.827Z] go: downloading code.cloudfoundry.org/go-loggregator v7.4.0+incompatible
[2020-09-17T02:54:28.827Z] go: downloading github.com/tidwall/match v1.0.1
[2020-09-17T02:54:28.827Z] go: downloading github.com/tidwall/pretty v1.0.1
[2020-09-17T02:54:28.827Z] go: downloading code.cloudfoundry.org/go-diodes v0.0.0-20190809170250-f77fb823c7ee
[2020-09-17T02:54:28.827Z] go: downloading github.com/elastic/go-hdrhistogram v0.1.0
[2020-09-17T02:54:28.827Z] go: downloading github.com/Masterminds/semver v1.4.2
[2020-09-17T02:54:28.827Z] go: downloading code.cloudfoundry.org/gofileutils v0.0.0-20170111115228-4d0c80011a0f
[2020-09-17T02:54:29.090Z] go: downloading github.com/cloudfoundry/noaa v2.1.0+incompatible
[2020-09-17T02:54:29.090Z] go: downloading github.com/elastic/elastic-agent-client/v7 v7.0.0-20200709172729-d43b7ad5833a
[2020-09-17T02:54:29.090Z] go: downloading github.com/DataDog/zstd v1.4.1
[2020-09-17T02:54:29.090Z] go: downloading github.com/gorilla/websocket v1.4.1
[2020-09-17T02:54:29.090Z] go: downloading github.com/pmezard/go-difflib v1.0.0
[2020-09-17T02:54:29.090Z] go: downloading github.com/dgryski/go-farm v0.0.0-20190423205320-6a90982ecee2
[2020-09-17T02:54:29.090Z] go: downloading github.com/dgraph-io/ristretto v0.0.2
[2020-09-17T02:54:29.090Z] go: downloading gopkg.in/yaml.v3 v3.0.0-20200313102051-9f266ea9e77c
[2020-09-17T02:54:29.090Z] go: downloading github.com/ianlancetaylor/demangle v0.0.0-20200715173712-053cf528c12f
[2020-09-17T02:54:29.351Z] go: downloading code.cloudfoundry.org/rfc5424 v0.0.0-20180905210152-236a6d29298a
[2020-09-17T02:54:29.351Z] go: downloading github.com/cloudfoundry/sonde-go v0.0.0-20171206171820-b33733203bb4
[2020-09-17T02:54:29.351Z] go: downloading github.com/mailru/easyjson v0.7.1
[2020-09-17T02:54:31.901Z] go: downloading github.com/jmespath/go-jmespath v0.0.0-20180206201540-c2b33e8439af
[2020-09-17T02:56:38.509Z] exec: git update-index -q --refresh
[2020-09-17T02:56:38.509Z] exec: git diff-index HEAD -- .
[2020-09-17T02:56:38.509Z] go build -o build/linux/golint golang.org/x/lint/golint
[2020-09-17T02:56:38.509Z] go: downloading golang.org/x/lint v0.0.0-20200302205851-738671d3881b
[2020-09-17T02:56:38.509Z] go build -o build/linux/reviewdog github.com/reviewdog/reviewdog/cmd/reviewdog
[2020-09-17T02:56:38.509Z] go: downloading github.com/reviewdog/reviewdog v0.9.17
[2020-09-17T02:56:38.509Z] go: downloading github.com/xanzy/go-gitlab v0.22.3
[2020-09-17T02:56:38.509Z] go: downloading github.com/bradleyfalzon/ghinstallation v1.1.0
[2020-09-17T02:56:38.509Z] go: downloading github.com/reviewdog/errorformat v0.0.0-20200109134752-8983be9bc7dd
[2020-09-17T02:56:38.509Z] go: downloading github.com/google/go-github/v29 v29.0.2
[2020-09-17T02:56:38.509Z] go: downloading github.com/haya14busa/go-actions-toolkit v0.0.0-20200105081403-ca0307860f01
[2020-09-17T02:56:38.509Z] go: downloading cloud.google.com/go v0.51.0
[2020-09-17T02:56:38.509Z] go: downloading github.com/mattn/go-shellwords v1.0.7
[2020-09-17T02:56:38.509Z] go: downloading github.com/google/go-github/v28 v28.1.1
[2020-09-17T02:56:38.509Z] go: downloading github.com/dgrijalva/jwt-go v3.2.1-0.20190620180102-5e25c22bd5d6+incompatible
[2020-09-17T02:56:38.509Z] go: downloading github.com/google/go-querystring v1.0.0
[2020-09-17T02:56:38.509Z] go: downloading cloud.google.com/go/datastore v1.0.0
[2020-09-17T02:56:38.509Z] go: downloading google.golang.org/api v0.15.0
[2020-09-17T02:56:38.509Z] go: downloading github.com/googleapis/gax-go/v2 v2.0.5
[2020-09-17T02:56:38.509Z] go: downloading go.opencensus.io v0.22.2
[2020-09-17T02:56:38.509Z] go: downloading github.com/golang/groupcache v0.0.0-20191227052852-215e87163ea7
[2020-09-17T02:57:10.683Z] go build -o build/linux/staticcheck honnef.co/go/tools/cmd/staticcheck
[2020-09-17T02:57:10.683Z] go: downloading honnef.co/go/tools v0.0.1-2019.2.3
[2020-09-17T02:57:10.683Z] go: downloading github.com/BurntSushi/toml v0.3.1
[2020-09-17T02:57:13.992Z] ./build/linux/staticcheck github.com/elastic/apm-server/...
[2020-09-17T02:58:10.309Z] model/modeldecoder/rumv3/transaction.go:48:6: type queue is unused (U1000)
[2020-09-17T02:58:10.309Z] model/modeldecoder/v2/decoder.go:409:2: empty branch (SA9003)
[2020-09-17T02:58:10.309Z] model/modeldecoder/v2/decoder_test.go:96:3: should use 'return <expr>' instead of 'if <expr> { return <bool> }; return <bool>' (S1008)
[2020-09-17T02:58:10.309Z] sampling/sampling_test.go:70:6: func newBool is unused (U1000)
[2020-09-17T02:58:10.309Z] Makefile:207: recipe for target 'staticcheck' failed
[2020-09-17T02:58:10.309Z] make: *** [staticcheck] Error 1
[2020-09-17T02:58:10.309Z] + cleanup
[2020-09-17T02:58:10.309Z] + rm -rf /tmp/tmp.dilFVhFEHh
[2020-09-17T02:58:11.437Z] Stage "Build and Test" skipped due to earlier failure(s)
[2020-09-17T02:58:11.513Z] Stage "linux build" skipped due to earlier failure(s)
[2020-09-17T02:58:11.514Z] Stage "windows build-test" skipped due to earlier failure(s)
[2020-09-17T02:58:11.515Z] Stage "OSX build-test" skipped due to earlier failure(s)
[2020-09-17T02:58:11.515Z] Stage "Unit Test" skipped due to earlier failure(s)
[2020-09-17T02:58:11.516Z] Stage "System and Environment Tests" skipped due to earlier failure(s)
[2020-09-17T02:58:11.517Z] Stage "Benchmarking" skipped due to earlier failure(s)
[2020-09-17T02:58:11.517Z] Stage "Check kibana Obj. Updated" skipped due to earlier failure(s)
[2020-09-17T02:58:11.518Z] Stage "Hey-Apm" skipped due to earlier failure(s)
[2020-09-17T02:58:11.518Z] Stage "Package" skipped due to earlier failure(s)
[2020-09-17T02:58:11.519Z] Stage "APM Integration Tests" skipped due to earlier failure(s)
[2020-09-17T02:58:11.552Z] Stage "Package" skipped due to earlier failure(s)
[2020-09-17T02:58:11.680Z] Failed in branch linux build
[2020-09-17T02:58:11.681Z] Failed in branch windows build-test
[2020-09-17T02:58:11.681Z] Failed in branch OSX build-test
[2020-09-17T02:58:11.682Z] Failed in branch Unit Test
[2020-09-17T02:58:11.683Z] Failed in branch System and Environment Tests
[2020-09-17T02:58:11.683Z] Failed in branch Benchmarking
[2020-09-17T02:58:11.684Z] Failed in branch Check kibana Obj. Updated
[2020-09-17T02:58:11.684Z] Failed in branch Hey-Apm
[2020-09-17T02:58:11.685Z] Failed in branch APM Integration Tests
[2020-09-17T02:58:11.731Z] Stage "Package" skipped due to earlier failure(s)
[2020-09-17T02:58:11.788Z] Failed in branch Package
[2020-09-17T02:58:12.270Z] Running on Jenkins in /var/lib/jenkins/workspace/pm-server_apm-server-mbp_PR-4076
[2020-09-17T02:58:12.501Z] [INFO] getVaultSecret: Getting secrets
[2020-09-17T02:58:12.607Z] Masking supported pattern matches of $VAULT_ADDR or $VAULT_ROLE_ID or $VAULT_SECRET_ID
[2020-09-17T02:58:13.384Z] + chmod 755 generate-build-data.sh
[2020-09-17T02:58:13.384Z] + ./generate-build-data.sh https://apm-ci.elastic.co/blue/rest/organizations/jenkins/pipelines/apm-server/apm-server-mbp/PR-4076/ https://apm-ci.elastic.co/blue/rest/organizations/jenkins/pipelines/apm-server/apm-server-mbp/PR-4076/runs/24 FAILURE 481354
[2020-09-17T02:58:13.634Z] INFO: curl https://apm-ci.elastic.co/blue/rest/organizations/jenkins/pipelines/apm-server/apm-server-mbp/PR-4076/runs/24/steps/?limit=10000 -o steps-info.json
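For readers unfamiliar with the S1008 finding in the staticcheck output above, here is a small sketch of the pattern it flags and the suggested simplification. The functions are hypothetical and unrelated to the actual code in decoder_test.go.

```go
package main

import "fmt"

// hasPrefixVerbose uses the shape that S1008 reports: an if statement
// that returns true, followed by an unconditional return false.
func hasPrefixVerbose(s, prefix string) bool {
	if len(s) >= len(prefix) && s[:len(prefix)] == prefix {
		return true
	}
	return false
}

// hasPrefixSimplified returns the boolean expression directly, which is
// the form the check suggests. Behaviour is unchanged.
func hasPrefixSimplified(s, prefix string) bool {
	return len(s) >= len(prefix) && s[:len(prefix)] == prefix
}

func main() {
	fmt.Println(hasPrefixVerbose("metadata", "meta"), hasPrefixSimplified("metadata", "meta"))
}
```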

model/modeldecoder/rumv3/generated.go (outdated)
model/modeldecoder/v2/model.go (outdated)
Comment on lines 50 to 53
type metadataDecoder interface {
    Decode(m *model.Metadata) error
    Validate(input map[string]interface{}) error
}
Member

WDYT about changing this to be something like

type decodeMetadataFunc func(d decoder.Decoder, m *model.Metadata) error

Where decoder.Decoder is a new interface in the decoder package:

package decoder

type Decoder interface {
    Decode(v interface{}) error
}

Internally the decoder could decode to a json.RawMessage first, and then decode to the struct - until the validation logic is run after decoding into the struct.

I'm not sure if this is great either, I'd just like to keep all the decoding and validation in one call from here. IIUC the Validate call is here only temporarily, and it would later be done immediately after UnmarshalJSON. How do you expect the code in this package to look ultimately?
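A minimal sketch of the suggestion above, assuming the standard library decoder as the implementation of the proposed interface. The `metadata` type, its `validate` method, and the `decodeMetadataString` helper are hypothetical placeholders; the real validation logic is meant to be generated later.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

// Decoder mirrors the interface proposed in the comment above.
// *json.Decoder from the standard library already satisfies it.
type Decoder interface {
	Decode(v interface{}) error
}

// metadata stands in for model.Metadata; hypothetical and trimmed.
type metadata struct {
	ServiceName string `json:"service_name"`
}

// validate is a hypothetical placeholder for generated validation logic.
func (m *metadata) validate() error {
	if m.ServiceName == "" {
		return errors.New("service_name is required")
	}
	return nil
}

// decodeMetadata has the shape of decodeMetadataFunc: it reads the raw
// document first, decodes it into the struct, then validates the struct,
// so the caller makes a single call for decoding and validation.
func decodeMetadata(d Decoder, m *metadata) error {
	var raw json.RawMessage
	if err := d.Decode(&raw); err != nil {
		return err
	}
	if err := json.Unmarshal(raw, m); err != nil {
		return err
	}
	return m.validate()
}

// decodeMetadataString is a small convenience wrapper for the example.
func decodeMetadataString(input string, m *metadata) error {
	return decodeMetadata(json.NewDecoder(strings.NewReader(input)), m)
}

func main() {
	var m metadata
	if err := decodeMetadataString(`{"service_name":"checkout"}`, &m); err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Println(m.ServiceName)
}
```

Keeping the raw message around is only useful while validation still needs the undecoded input; once validation runs on the struct, the intermediate step could be dropped without changing the function signature.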

simitt (Contributor, Author) commented Aug 25, 2020

@axw when having another look please focus on the modeldecoder/v2 part - it shows the two decoder approaches and the validation (logic to be generated later). The latest changes introducing UnmarshalJSON functions for the json-iter decoder introduced quite a few allocations; I haven't had a chance to investigate this further yet.

axw (Member) left a comment

Thanks, I think this is taking good shape.

processor/stream/processor.go (outdated)
model/modeldecoder/validation/typ.go (outdated)
model/modeldecoder/v2/validator.go (outdated)
model/modeldecoder/v2/model.go (outdated)
model/modeldecoder/v2/model.go (outdated)
Comment on lines 140 to 144
if key != "" {
    s = fmt.Sprintf("%s.%s", key, s)
}
Member

Like in the validation code, it would be great if we could use our knowledge of the schema to generate constants (or perfect hash for constant key lookup), to avoid these allocations.
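One way to read the suggestion above is a lookup table of key paths that are known from the schema, so that joining a parent and a child key does not need to format a new string each time. The sketch below is an assumption about how that might look; the constants, the `keyPair` type, and `joinKey` are hypothetical names, and a generator could equally emit a switch statement or a perfect hash.

```go
package main

import "fmt"

// Full key paths known from the schema; a hypothetical subset that a
// generator could emit as constants.
const (
	keyCloudAccountID = "cloud.account.id"
	keyCloudRegion    = "cloud.region"
)

// keyPair identifies a nested key by its parent path and child name.
type keyPair struct{ parent, child string }

// joinedKeys maps known (parent, child) pairs to their precomputed paths.
var joinedKeys = map[keyPair]string{
	{"cloud.account", "id"}: keyCloudAccountID,
	{"cloud", "region"}:     keyCloudRegion,
}

// joinKey returns the dotted path for child under parent. Known paths are
// served from the table; unknown paths fall back to concatenation.
func joinKey(parent, child string) string {
	if parent == "" {
		return child
	}
	if k, ok := joinedKeys[keyPair{parent, child}]; ok {
		return k
	}
	return parent + "." + child
}

func main() {
	fmt.Println(joinKey("cloud.account", "id"))
	fmt.Println(joinKey("service", "name"))
}
```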

return err
}

func decodeElements(obj *simdjson.Object, key string, out interface{}) error {
Member

Do you envision that we will generate type-specific decodeElements functions, or a single one like this with reflection?

Contributor Author

Haven't really thought about it, but now that you mention it, it should also be possible to generate dedicated decode functions here per key.
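A sketch of what dedicated per-key decode functions could look like, as an alternative to one reflection-based function. It uses the standard library instead of simdjson so that it stays self-contained; the `metadata` type, the `fieldDecoders` table, and `decodeElement` are hypothetical, and generated code might just as well use a switch over the key.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// metadata is a hypothetical, trimmed target model.
type metadata struct {
	CloudAccountID string
	CloudRegion    string
}

// fieldDecoder decodes one raw JSON value into its field of metadata.
type fieldDecoder func(raw []byte, m *metadata) error

// fieldDecoders is the kind of table a generator might emit: one entry
// per flattened schema key, each writing to a concrete, typed field.
var fieldDecoders = map[string]fieldDecoder{
	"cloud.account.id": func(raw []byte, m *metadata) error {
		return json.Unmarshal(raw, &m.CloudAccountID)
	},
	"cloud.region": func(raw []byte, m *metadata) error {
		return json.Unmarshal(raw, &m.CloudRegion)
	},
}

// decodeElement dispatches on the flattened key. Unknown keys are ignored
// in this sketch; a real decoder may want to treat them differently.
func decodeElement(key string, raw []byte, m *metadata) error {
	dec, ok := fieldDecoders[key]
	if !ok {
		return nil
	}
	return dec(raw, m)
}

func main() {
	var m metadata
	if err := decodeElement("cloud.account.id", []byte(`"123456"`), &m); err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Println(m.CloudAccountID)
}
```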

model/modeldecoder/validation/typ.go (outdated)
func elementToMetadata(key string, elem simdjson.Element, m *Metadata) error {
    switch key {
    case "cloud.account.id":
        if elem.Type == simdjson.TypeString {
Member

If we silently ignore other types, how will we enforce type validation? Perhaps we should check elem.Type != simdjson.TypeNull, and then just call elem.Iter.String() which will return an error if it's not a string?

Contributor Author

good point, will change that
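The idea discussed above (skip null, but fail on any other non-string type instead of silently ignoring it) can be sketched with the standard library as a stand-in for simdjson. The `decodeOptionalString` helper is a hypothetical name for this example.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// decodeOptionalString treats a JSON null as "not set" and leaves out
// untouched. Any other value must be a JSON string; other types produce
// an error from json.Unmarshal rather than being skipped.
func decodeOptionalString(raw []byte, out *string) error {
	if bytes.Equal(bytes.TrimSpace(raw), []byte("null")) {
		return nil
	}
	return json.Unmarshal(raw, out)
}

func main() {
	var accountID string
	for _, raw := range []string{`"123456"`, `null`, `42`} {
		err := decodeOptionalString([]byte(raw), &accountID)
		fmt.Printf("input=%s value=%q err=%v\n", raw, accountID, err)
	}
}
```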

simitt (Contributor, Author) commented Sep 3, 2020

jenkins run hey-apm tests please

Add transaction model to modeldecoder/rumv3

Create mapping function

Create decoder and change processor
@simitt simitt mentioned this pull request Sep 15, 2020
1 task
@simitt simitt changed the title from "Decode metadata" to "New intake decoder" Sep 18, 2020
simitt (Contributor, Author) commented Oct 3, 2020

Closing in favor of #4261

@simitt simitt closed this Oct 3, 2020
@simitt simitt deleted the decode-metadata branch October 6, 2020 10:12
3 participants