
fleet-server "failed to fetch elasticsearch version" - ECK install on OpenShift isn't working #8111

Open
ALL-SPACE-Anas opened this issue Oct 14, 2024 · 7 comments

@ALL-SPACE-Anas

Elasticsearch Version

Version: 8.15.2, Build: docker/98adf7bf6bb69b66ab95b761c9e5aadb0bb059a3/2024-09-19T10:06:03.564235954Z, JVM: 22.0.1

Installed Plugins

No response

Java Version

bundled

OS Version

OpenShift BareMetal

Problem Description

I have deployed ECK on OpenShift bare-metal servers for a POC. While I can reach the Kibana dashboard, I cannot get fleet-server to start and work. I'm using the default configuration (from these docs: https://www.elastic.co/guide/en/cloud-on-k8s/current/k8s-openshift-deploy-the-operator.html and https://www.elastic.co/guide/en/cloud-on-k8s/current/k8s-elastic-agent-fleet-quickstart.html) for the most part, with small modifications where needed.

These are my manifests:

apiVersion: kibana.k8s.elastic.co/v1
kind: Kibana
metadata:
  name: kibana-sample
spec:
  version: 8.15.2
  count: 1
  elasticsearchRef:
    name: "elasticsearch-sample"
  podTemplate:
    spec:
      containers:
      - name: kibana
        resources:
          limits:
            memory: 1Gi
            cpu: 1
  config:
    server.publicBaseUrl: "https://#######"
    xpack.fleet.agents.elasticsearch.hosts: ["https://elasticsearch-sample-es-http.elastic.svc:9200"]
    xpack.fleet.agents.fleet_server.hosts: ["https://fleet-server-sample-agent-http.elastic.svc:8220"]
    xpack.fleet.packages:
      - name: system
        version: latest
      - name: elastic_agent
        version: latest
      - name: fleet_server
        version: latest
      - name: apm
        version: latest
    xpack.fleet.agentPolicies:
      - name: Fleet Server on ECK policy
        id: eck-fleet-server
        namespace: elastic
        monitoring_enabled:
          - logs
          - metrics
        unenroll_timeout: 900
        package_policies:
        - name: fleet_server-1
          id: fleet_server-1
          package:
            name: fleet_server
      - name: Elastic Agent on ECK policy
        id: eck-agent
        namespace: elastic
        monitoring_enabled:
          - logs
          - metrics
        unenroll_timeout: 900
        package_policies:
          - name: system-1
            id: system-1
            package:
              name: system
---
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: elasticsearch-sample
spec:
  version: 8.15.2
  nodeSets:
    - name: default
      count: 1
      config:
        node.store.allow_mmap: false
        index.store.type: niofs # https://www.elastic.co/guide/en/elasticsearch/reference/current/index-modules-store.html
---
apiVersion: apm.k8s.elastic.co/v1
kind: ApmServer
metadata:
  name: apm-server-sample
spec:
  version: 8.15.2
  count: 1
  elasticsearchRef:
    name: "elasticsearch-sample"
  kibanaRef: 
    name: kibana-sample
  podTemplate:
    spec:
      serviceAccountName: apm-server

Agent state:
oc get agents

NAME                   HEALTH   AVAILABLE   EXPECTED   VERSION   AGE
elastic-agent-sample   green    3           3          8.15.2    138m
fleet-server-sample    red                  1          8.15.2    138m

oc describe agent fleet-server-sample

Name:         fleet-server-sample
Namespace:    elastic
Labels:       <none>
Annotations:  ###
API Version:  agent.k8s.elastic.co/v1alpha1
Kind:         Agent
Metadata: ###
Spec:
  Deployment:
    Pod Template:
      Metadata:
        Creation Timestamp:  <nil>
      Spec:
        Automount Service Account Token:  true
        Containers:                       <nil>
        Security Context:
          Run As User:         0
        Service Account Name:  elastic-agent
        Volumes:
          Name:  agent-data
          Persistent Volume Claim:
            Claim Name:  fleet-server-sample
    Replicas:            1
    Strategy:
  Elasticsearch Refs:
    Name:                elasticsearch-sample
  Fleet Server Enabled:  true
  Fleet Server Ref:
  Http:
    Service:
      Metadata:
      Spec:
    Tls:
      Certificate:
  Kibana Ref:
    Name:     kibana-sample
  Mode:       fleet
  Policy ID:  eck-fleet-server
  Version:    8.15.2
Status:
  Elasticsearch Associations Status:
    elastic/elasticsearch-sample:  Established
  Expected Nodes:                  1
  Health:                          red
  Kibana Association Status:       Established
  Observed Generation:             2
  Version:                         8.15.2
Events:
  Type     Reason                   Age                   From                                 Message
  ----     ------                   ----                  ----                                 -------
  Warning  AssociationError         138m (x5 over 138m)   agent-controller                     Association backend for elasticsearch is not configured
  Warning  AssociationError         138m (x9 over 138m)   agent-controller                     Association backend for kibana is not configured
  Normal   AssociationStatusChange  138m                  agent-es-association-controller      Association status changed from [] to [elastic/elasticsearch-sample: Established]
  Normal   AssociationStatusChange  138m                  agent-kibana-association-controller  Association status changed from [] to [Established]
  Warning  Delayed                  138m (x11 over 138m)  agent-controller                     Delaying deployment of Elastic Agent in Fleet Mode as Kibana is not available yet

fleet-server pod error logs (which is in CrashLoopBackoff):

{"log.level":"error","@timestamp":"2024-10-14T16:35:35.550Z","message":"failed to fetch elasticsearch version","component":{"binary":"fleet-server","dataset":"elastic_agent.fleet_server","id":"fleet-server-default","type":"fleet-server"},"log":{"source":"fleet-server-default"},"@timestamp":"2024-10-14T16:35:35.55Z","ecs.version":"1.6.0","service.name":"fleet-server","service.type":"fleet-server","error.message":"dial tcp [::1]:9200: connect: connection refused","ecs.version":"1.6.0"}
{"log.level":"warn","@timestamp":"2024-10-14T16:35:35.551Z","message":"Failed Elasticsearch output configuration test, using bootstrap values.","component":{"binary":"fleet-server","dataset":"elastic_agent.fleet_server","id":"fleet-server-default","type":"fleet-server"},"log":{"source":"fleet-server-default"},"ecs.version":"1.6.0","service.name":"fleet-server","service.type":"fleet-server","error.message":"dial tcp [::1]:9200: connect: connection refused","output":{"hosts":["localhost:9200"],"protocol":"https","proxy_disable":false,"proxy_headers":{},"service_token":"#####","ssl":{"certificate_authorities":["/mnt/elastic-internal/elasticsearch-association/elastic/elasticsearch-sample/certs/ca.crt"],"verification_mode":"full"},"type":"elasticsearch"},"@timestamp":"2024-10-14T16:35:35.55Z","ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2024-10-14T16:36:35.612Z","message":"panic: runtime error: invalid memory address or nil pointer dereference","component":{"binary":"fleet-server","dataset":"elastic_agent.fleet_server","id":"fleet-server-default","type":"fleet-server"},"log":{"source":"fleet-server-default"},"ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2024-10-14T16:36:36.012Z","message":"[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x55df2cba3217]","component":{"binary":"fleet-server","dataset":"elastic_agent.fleet_server","id":"fleet-server-default","type":"fleet-server"},"log":{"source":"fleet-server-default"},"ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2024-10-14T16:36:36.012Z","message":"goroutine 279 [running]:","component":{"binary":"fleet-server","dataset":"elastic_agent.fleet_server","id":"fleet-server-default","type":"fleet-server"},"log":{"source":"fleet-server-default"},"ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2024-10-14T16:36:36.012Z","message":"github.com/elastic/fleet-server/v7/internal/pkg/server.(*Agent).configFromUnits(0xc000002240, {0x55df2d489218, 0xc000486370})","component":{"binary":"fleet-server","dataset":"elastic_agent.fleet_server","id":"fleet-server-default","type":"fleet-server"},"log":{"source":"fleet-server-default"},"ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2024-10-14T16:36:36.012Z","message":"/opt/buildkite-agent/builds/bk-agent-prod-aws-1726684516326467547/elastic/fleet-server-package-mbp/internal/pkg/server/agent.go:441 +0x97","component":{"binary":"fleet-server","dataset":"elastic_agent.fleet_server","id":"fleet-server-default","type":"fleet-server"},"log":{"source":"fleet-server-default"},"ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2024-10-14T16:36:36.012Z","message":"github.com/elastic/fleet-server/v7/internal/pkg/server.(*Agent).start(0xc000002240, {0x55df2d489218, 0xc000486370})","component":{"binary":"fleet-server","dataset":"elastic_agent.fleet_server","id":"fleet-server-default","type":"fleet-server"},"log":{"source":"fleet-server-default"},"ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2024-10-14T16:36:36.012Z","message":"/opt/buildkite-agent/builds/bk-agent-prod-aws-1726684516326467547/elastic/fleet-server-package-mbp/internal/pkg/server/agent.go:344 +0x51","component":{"binary":"fleet-server","dataset":"elastic_agent.fleet_server","id":"fleet-server-default","type":"fleet-server"},"log":{"source":"fleet-server-default"},"ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2024-10-14T16:36:36.012Z","message":"github.com/elastic/fleet-server/v7/internal/pkg/server.(*Agent).reconfigure(0xc0002fd728?, {0x55df2d489218?, 0xc000486370?})","component":{"binary":"fleet-server","dataset":"elastic_agent.fleet_server","id":"fleet-server-default","type":"fleet-server"},"log":{"source":"fleet-server-default"},"ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2024-10-14T16:36:36.012Z","message":"/opt/buildkite-agent/builds/bk-agent-prod-aws-1726684516326467547/elastic/fleet-server-package-mbp/internal/pkg/server/agent.go:387 +0x8d","component":{"binary":"fleet-server","dataset":"elastic_agent.fleet_server","id":"fleet-server-default","type":"fleet-server"},"log":{"source":"fleet-server-default"},"ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2024-10-14T16:36:36.013Z","message":"github.com/elastic/fleet-server/v7/internal/pkg/server.(*Agent).Run.func5()","component":{"binary":"fleet-server","dataset":"elastic_agent.fleet_server","id":"fleet-server-default","type":"fleet-server"},"log":{"source":"fleet-server-default"},"ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2024-10-14T16:36:36.013Z","message":"/opt/buildkite-agent/builds/bk-agent-prod-aws-1726684516326467547/elastic/fleet-server-package-mbp/internal/pkg/server/agent.go:204 +0x5c5","component":{"binary":"fleet-server","dataset":"elastic_agent.fleet_server","id":"fleet-server-default","type":"fleet-server"},"log":{"source":"fleet-server-default"},"ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2024-10-14T16:36:36.148Z","message":"created by github.com/elastic/fleet-server/v7/internal/pkg/server.(*Agent).Run in goroutine 1","component":{"binary":"fleet-server","dataset":"elastic_agent.fleet_server","id":"fleet-server-default","type":"fleet-server"},"log":{"source":"fleet-server-default"},"ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2024-10-14T16:36:36.148Z","message":"/opt/buildkite-agent/builds/bk-agent-prod-aws-1726684516326467547/elastic/fleet-server-package-mbp/internal/pkg/server/agent.go:162 +0x416","component":{"binary":"fleet-server","dataset":"elastic_agent.fleet_server","id":"fleet-server-default","type":"fleet-server"},"log":{"source":"fleet-server-default"},"ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2024-10-14T16:36:36.515Z","log.origin":{"function":"github.com/elastic/elastic-agent/internal/pkg/agent/application/coordinator.(*Coordinator).watchRuntimeComponents","file.name":"coordinator/coordinator.go","file.line":647},"message":"Component state changed fleet-server-default (STARTING->FAILED): Failed: pid '1214' exited with code '2'","log":{"source":"elastic-agent"},"component":{"id":"fleet-server-default","state":"FAILED","old_state":"STARTING"},"ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2024-10-14T16:36:36.515Z","log.origin":{"function":"github.com/elastic/elastic-agent/internal/pkg/agent/application/coordinator.(*Coordinator).watchRuntimeComponents","file.name":"coordinator/coordinator.go","file.line":665},"message":"Unit state changed fleet-server-default-fleet-server (STARTING->FAILED): Failed: pid '1214' exited with code '2'","log":{"source":"elastic-agent"},"component":{"id":"fleet-server-default","state":"FAILED"},"unit":{"id":"fleet-server-default-fleet-server","type":"input","state":"FAILED","old_state":"STARTING"},"ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2024-10-14T16:36:36.516Z","log.origin":{"function":"github.com/elastic/elastic-agent/internal/pkg/agent/application/coordinator.(*Coordinator).watchRuntimeComponents","file.name":"coordinator/coordinator.go","file.line":665},"message":"Unit state changed fleet-server-default (STARTING->FAILED): Failed: pid '1214' exited with code '2'","log":{"source":"elastic-agent"},"component":{"id":"fleet-server-default","state":"FAILED"},"unit":{"id":"fleet-server-default","type":"output","state":"FAILED","old_state":"STARTING"},"ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2024-10-14T16:36:45.612Z","log.origin":{"function":"github.com/elastic/elastic-agent/internal/pkg/agent/cmd.logReturn","file.name":"cmd/run.go","file.line":162},"message":"2 errors occurred:\n\t* timeout while waiting for managers to shut down: no response from runtime manager, no response from vars manager\n\t* config manager: failed to initialize Fleet Server: context deadline exceeded\n\n","log":{"source":"elastic-agent"},"ecs.version":"1.6.0"}
Error: 2 errors occurred:
	* timeout while waiting for managers to shut down: no response from runtime manager, no response from vars manager
	* config manager: failed to initialize Fleet Server: context deadline exceeded

From the logs it appears that the fleet-server pod is looking for the Elasticsearch cluster at localhost instead of sending requests to the Elasticsearch service. There are other errors as well, but I think this one needs to be resolved first.
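To rule out DNS or network-policy problems, the in-cluster Elasticsearch endpoint can be probed directly from the fleet-server pod. This is only a sketch: the secret name and pod label below follow ECK's usual naming conventions (`<cluster>-es-elastic-user`, `agent.k8s.elastic.co/name`) and may differ in a given deployment.

```shell
# Hypothetical connectivity check from inside the fleet-server pod.
# Secret and label names assume ECK's default naming; adjust as needed.
ES_PASSWORD=$(oc get secret elasticsearch-sample-es-elastic-user -n elastic \
  -o jsonpath='{.data.elastic}' | base64 -d)

FLEET_POD=$(oc get pod -n elastic \
  -l agent.k8s.elastic.co/name=fleet-server-sample -o name | head -n1)

# A 200 response with cluster info means DNS and routing to the ES service work.
oc exec -n elastic "$FLEET_POD" -- \
  curl -sk -u "elastic:${ES_PASSWORD}" \
  https://elasticsearch-sample-es-http.elastic.svc:9200
```

If this curl succeeds while fleet-server still dials `[::1]:9200`, the problem is in the output configuration handed to fleet-server rather than in cluster networking.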

Errors in kibana pod:

[2024-10-14T16:17:47.714+00:00][ERROR][elasticsearch-service] Unable to retrieve version information from Elasticsearch nodes. Request timed out

Steps to Reproduce

Deploy an ECK cluster using the manifests above, which are defaults for the most part with some changes.

Logs (if relevant)

No response

@ALL-SPACE-Anas ALL-SPACE-Anas added the >bug Something isn't working label Oct 14, 2024
@gwbrown

gwbrown commented Oct 15, 2024

I think this issue is more appropriate for the ECK repo rather than the Elasticsearch repo, so I'll move this there. Let me know if there's an underlying issue with Elasticsearch here.

@gwbrown gwbrown transferred this issue from elastic/elasticsearch Oct 15, 2024
@barkbay

barkbay commented Oct 17, 2024

I didn't manage to reproduce the problem using the provided Elasticsearch and Kibana manifests and the following versions:

  • ECK 2.14.0
  • OpenShift 4.17.0 / K8s v1.30.4

I would first try to understand the connectivity issue between Kibana and Elasticsearch. Could you check that the ES cluster is healthy, that all the Pods are running and ready, and whether there is anything suspicious in the ES logs?

FWIW here is the Agent definition I've been using for my test:

apiVersion: agent.k8s.elastic.co/v1alpha1
kind: Agent
metadata:
  name: fleet-server-sample
  namespace: elastic
spec:
  version: 8.15.2
  kibanaRef:
    name: kibana-sample
  elasticsearchRefs:
    - name: elasticsearch-sample
  mode: fleet
  fleetServerEnabled: true
  policyID: eck-fleet-server
  deployment:
    replicas: 1
    podTemplate:
      spec:
        serviceAccountName: fleet-server
        automountServiceAccountToken: true
        securityContext:
          runAsUser: 0
        containers:
          - name: agent
            securityContext:
              privileged: true

With the following command to add the service account to the privileged SCC:

oc adm policy add-scc-to-user privileged -z fleet-server -n elastic

@barkbay barkbay added the triage label Oct 17, 2024
@botelastic botelastic bot removed the triage label Oct 17, 2024
@barkbay barkbay added triage and removed >bug Something isn't working labels Oct 17, 2024
@ALL-SPACE-Anas

ALL-SPACE-Anas commented Oct 21, 2024

Hi @barkbay,

Thanks for your reply. I was basically using the same fleet-server definition, except for the privileged security context. I've now tried with privileged: true and unfortunately the issue isn't resolved. I've done some more troubleshooting and found some more potential clues as to what the problem could be:

oc get kibana

NAME            HEALTH   NODES   VERSION   AGE
kibana-sample   green    1       8.15.2    6d20h

kibana pod error logs:

[2024-10-20T14:20:20.005+00:00][ERROR][elasticsearch-service] Unable to retrieve version information from Elasticsearch nodes. Request timed out
[2024-10-20T14:42:46.570+00:00][ERROR][plugins.security.authentication] Authentication attempt failed:
{
  "error": {
    "root_cause": [
      {
        "type": "security_exception",
        "reason": "unable to authenticate user [elastic-fleet-server-sample-agent-kb-user] for REST request [/_security/_authenticate]",
        "header": {
          "WWW-Authenticate": [
            "Basic realm=\"security\", charset=\"UTF-8\"",
            "Bearer realm=\"security\"",
            "ApiKey"
          ]
        }
      }
    ],
    "type": "security_exception",
    "reason": "unable to authenticate user [elastic-fleet-server-sample-agent-kb-user] for REST request [/_security/_authenticate]",
    "header": {
      "WWW-Authenticate": [
        "Basic realm=\"security\", charset=\"UTF-8\"",
        "Bearer realm=\"security\"",
        "ApiKey"
      ]
    }
  },
  "status": 401
}

Either of these two logs seems to be the root cause of this error. Is it possible that the first log is causing the "failed to fetch elasticsearch version" error?
To find the cause of the first log, it looks like something is wrong with Elasticsearch, so I tried troubleshooting that:

oc get elasticsearch

NAME                   HEALTH   NODES   VERSION   PHASE   AGE
elasticsearch-sample   yellow   1       8.15.2    Ready   6d21h

oc describe elasticsearch

...
Status:
  Available Nodes:  1
  Conditions:
    Last Transition Time:  2024-10-19T21:39:52Z
    Status:                True
    Type:                  ReconciliationComplete
    Last Transition Time:  2024-10-14T14:10:27Z
    Message:               All nodes are running version 8.15.2
    Status:                True
    Type:                  RunningDesiredVersion
    Last Transition Time:  2024-10-19T21:39:52Z
    Message:               Service elastic/elasticsearch-sample-es-internal-http has endpoints
    Status:                True
    Type:                  ElasticsearchIsReachable
    Last Transition Time:  2024-10-14T16:08:39Z
    Message:               Cannot get compute and storage resources from Elasticsearch resource generation 3: cannot compute resources for NodeSet "default": no CPU request or limit set
    Status:                False
    Type:                  ResourcesAwareManagement
  Health:                  yellow
Events:                              <none>

elasticsearch error logs:

{
  "@timestamp": "2024-10-19T21:39:48.160Z",
  "log.level": "ERROR",
  "message": "exception during geoip databases update",
  "ecs.version": "1.2.0",
  "service.name": "ES_ECS",
  "event.dataset": "elasticsearch.server",
  "process.thread.name": "elasticsearch[elasticsearch-sample-es-default-0][generic][T#4]",
  "log.logger": "org.elasticsearch.ingest.geoip.GeoIpDownloader",
  "elasticsearch.cluster.uuid": "d_KKluHQS2Ohfrx6aJn1SA",
  "elasticsearch.node.id": "hUz6RUtXTmep87LFE_FNkQ",
  "elasticsearch.node.name": "elasticsearch-sample-es-default-0",
  "elasticsearch.cluster.name": "elasticsearch-sample",
  "error.type": "org.elasticsearch.ElasticsearchException",
  "error.message": "not all primary shards of [.geoip_databases] index are active",
  "error.stack_trace": "org.elasticsearch.ElasticsearchException: not all primary shards of [.geoip_databases] index are active\n\tat [email protected]/org.elasticsearch.ingest.geoip.GeoIpDownloader.updateDatabases(GeoIpDownloader.java:141)\n\tat [email protected]/org.elasticsearch.ingest.geoip.GeoIpDownloader.runDownloader(GeoIpDownloader.java:293)\n\tat [email protected]/org.elasticsearch.ingest.geoip.GeoIpDownloaderTaskExecutor.nodeOperation(GeoIpDownloaderTaskExecutor.java:162)\n\tat [email protected]/org.elasticsearch.ingest.geoip.GeoIpDownloaderTaskExecutor.nodeOperation(GeoIpDownloaderTaskExecutor.java:61)\n\tat [email protected]/org.elasticsearch.persistent.NodePersistentTasksExecutor$1.doRun(NodePersistentTasksExecutor.java:34)\n\tat [email protected]/org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:984)\n\tat [email protected]/org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)\n\tat java.base/java.lang.Thread.run(Thread.java:1570)\n"
}
{
  "@timestamp": "2024-10-19T21:39:50.271Z",
  "log.level": "WARN",
  "message": "path: /.kibana_task_manager/_search, params: {ignore_unavailable=true, index=.kibana_task_manager}, status: 503",
  "ecs.version": "1.2.0",
  "service.name": "ES_ECS",
  "event.dataset": "elasticsearch.server",
  "process.thread.name": "elasticsearch[elasticsearch-sample-es-default-0][system_read][T#1]",
  "log.logger": "rest.suppressed",
  "trace.id": "c819660c9f92837d8de985ed7fb51b84",
  "elasticsearch.cluster.uuid": "d_KKluHQS2Ohfrx6aJn1SA",
  "elasticsearch.node.id": "hUz6RUtXTmep87LFE_FNkQ",
  "elasticsearch.node.name": "elasticsearch-sample-es-default-0",
  "elasticsearch.cluster.name": "elasticsearch-sample",
  "error.type": "org.elasticsearch.action.search.SearchPhaseExecutionException",
  "error.message": "all shards failed",
  "error.stack_trace": "Failed to execute phase [query], all shards failed; shardFailures {[hUz6RUtXTmep87LFE_FNkQ][.kibana_task_manager_8.15.2_001][0]: org.elasticsearch.action.NoShardAvailableActionException: [elasticsearch-sample-es-default-0][20.128.0.208:9300][indices:data/read/search[phase/query]]\n}\n\tat [email protected]/org.elasticsearch.action.search.AbstractSearchAsyncAction.onPhaseFailure(AbstractSearchAsyncAction.java:724)\n\tat [email protected]/org.elasticsearch.action.search.AbstractSearchAsyncAction.executeNextPhase(AbstractSearchAsyncAction.java:416)\n\tat [email protected]/org.elasticsearch.action.search.AbstractSearchAsyncAction.onPhaseDone(AbstractSearchAsyncAction.java:756)\n\tat [email protected]/org.elasticsearch.action.search.AbstractSearchAsyncAction.onShardFailure(AbstractSearchAsyncAction.java:509)\n\tat [email protected]/org.elasticsearch.action.search.AbstractSearchAsyncAction$1.onFailure(AbstractSearchAsyncAction.java:337)\n\tat [email protected]/org.elasticsearch.action.ActionListenerImplementations.safeAcceptException(ActionListenerImplementations.java:62)\n\tat [email protected]/org.elasticsearch.action.ActionListenerImplementations.safeOnFailure(ActionListenerImplementations.java:73)\n\tat [email protected]/org.elasticsearch.action.DelegatingActionListener.onFailure(DelegatingActionListener.java:31)\n\tat [email protected]/org.elasticsearch.action.ActionListenerResponseHandler.handleException(ActionListenerResponseHandler.java:53)\n\tat [email protected]/org.elasticsearch.action.search.SearchTransportService$ConnectionCountingHandler.handleException(SearchTransportService.java:677)\n\tat [email protected]/org.elasticsearch.transport.TransportService$UnregisterChildTransportResponseHandler.handleException(TransportService.java:1766)\n\tat [email protected]/org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleException(TransportService.java:1490)\n\tat [email protected]/org.elasticsearch.transport.TransportService$DirectResponseChannel.processException(TransportService.java:1624)\n\tat [email protected]/org.elasticsearch.transport.TransportService$DirectResponseChannel.sendResponse(TransportService.java:1599)\n\tat [email protected]/org.elasticsearch.transport.TaskTransportChannel.sendResponse(TaskTransportChannel.java:44)\n\tat [email protected]/org.elasticsearch.action.support.ChannelActionListener.onFailure(ChannelActionListener.java:44)\n\tat [email protected]/org.elasticsearch.action.ActionRunnable.onFailure(ActionRunnable.java:151)\n\tat [email protected]/org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.onFailure(ThreadContext.java:967)\n\tat [email protected]/org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:28)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)\n\tat java.base/java.lang.Thread.run(Thread.java:1570)\nCaused by: org.elasticsearch.action.NoShardAvailableActionException: [elasticsearch-sample-es-default-0][20.128.0.208:9300][indices:data/read/search[phase/query]]\n\tat [email protected]/org.elasticsearch.action.NoShardAvailableActionException.forOnShardFailureWrapper(NoShardAvailableActionException.java:28)\n\tat [email protected]/org.elasticsearch.action.search.AbstractSearchAsyncAction.onShardFailure(AbstractSearchAsyncAction.java:544)\n\tat [email protected]/org.elasticsearch.action.search.AbstractSearchAsyncAction.onShardFailure(AbstractSearchAsyncAction.java:491)\n\t... 18 more\n"
}

For the first log, according to the post https://discuss.elastic.co/t/not-all-primary-shards-of-geoip-databases-index-are-active/324401 this issue should go away on its own, but it doesn't. Other posts suggest it may have to do with limited resources on the servers, but the OpenShift servers have more than enough available CPU and memory.
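If the GeoIP errors turn out to be noise rather than the root cause, the downloader can be switched off via the standard `ingest.geoip.downloader.enabled` setting so it stops flooding the logs. A sketch against the sample manifest (this silences the symptom only; it does not fix the underlying shard allocation):

```yaml
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: elasticsearch-sample
spec:
  version: 8.15.2
  nodeSets:
    - name: default
      count: 1
      config:
        node.store.allow_mmap: false
        # Disable the GeoIP database downloader to stop the
        # ".geoip_databases" update errors.
        ingest.geoip.downloader.enabled: false
```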

Finally, latest fleet server error logs:

{
  "log.level": "error",
  "@timestamp": "2024-10-21T14:15:27.979Z",
  "message": "failed to fetch elasticsearch version",
  "component": {
    "binary": "fleet-server",
    "dataset": "elastic_agent.fleet_server",
    "id": "fleet-server-default",
    "type": "fleet-server"
  },
  "log": {
    "source": "fleet-server-default"
  },
  "service.type": "fleet-server",
  "error.message": "dial tcp [::1]:9200: connect: connection refused",
  "ecs.version": "1.6.0",
  "service.name": "fleet-server"
}
{
  "log.level": "error",
  "@timestamp": "2024-10-21T14:15:28.96Z",
  "message": "failed to fetch elasticsearch version",
  "component": {
    "binary": "fleet-server",
    "dataset": "elastic_agent.fleet_server",
    "id": "fleet-server-default",
    "type": "fleet-server"
  },
  "log": {
    "source": "fleet-server-default"
  },
  "ecs.version": "1.6.0",
  "service.name": "fleet-server",
  "service.type": "fleet-server",
  "error.message": "dial tcp <load-balancer-IP>:9200: connect: no route to host"
}
{
  "log.level": "error",
  "@timestamp": "2024-10-21T14:15:28.962Z",
  "log.origin": {
    "function": "github.com/elastic/elastic-agent/internal/pkg/agent/application/coordinator.(*Coordinator).watchRuntimeComponents",
    "file.name": "coordinator/coordinator.go",
    "file.line": 665
  },
  "message": "Unit state changed fleet-server-default-fleet-server (STARTING->FAILED): Error - failed version compatibility check with elasticsearch: dial tcp <load-balancer-IP>:9200: connect: no route to host",
  "log": {
    "source": "elastic-agent"
  },
  "component": {
    "id": "fleet-server-default",
    "state": "HEALTHY"
  },
  "unit": {
    "id": "fleet-server-default-fleet-server",
    "type": "input",
    "state": "FAILED",
    "old_state": "STARTING"
  },
  "ecs.version": "1.6.0"
}
{
  "log.level": "error",
  "@timestamp": "2024-10-21T14:17:09.078Z",
  "log.origin": {
    "function": "github.com/elastic/elastic-agent/internal/pkg/agent/cmd.logReturn",
    "file.name": "cmd/run.go",
    "file.line": 162
  },
  "message": "1 error occurred:\n\t* timeout while waiting for managers to shut down: no response from vars manager\n\n",
  "log": {
    "source": "elastic-agent"
  },
  "ecs.version": "1.6.0"
}
Error: 1 error occurred:
	* timeout while waiting for managers to shut down: no response from vars manager

Many of the other fleet-server errors appear to be gone, but "failed to fetch elasticsearch version" and similar errors persist, and the fleet-server pod is still in CrashLoopBackOff. The Elasticsearch pod is running and the Elasticsearch service exists, so I'm not sure why fleet-server cannot reach the cluster. This feels more like an ECK issue than an OpenShift/infrastructure issue, as I don't have this problem with any other apps. Can you please nudge me as to what the actual problem could be here?

@michbeck100

I have the same problem, but running ECK on Rancher. I basically followed https://www.elastic.co/guide/en/cloud-on-k8s/current/k8s-stack-helm-chart.html#k8s-install-fleet-agent-elasticsearch-kibana-helm. I can't figure out where the problem with the connection between fleet-server and Elasticsearch is.

@anas0001

@manas-suleman @barkbay @michbeck100 I'm facing the same problem. Did you guys manage to find a solution?

@barkbay

barkbay commented Nov 25, 2024

"error.message": "all shards failed"

Could you get the shard status using the following API calls:

GET /_cat/shards
GET /_cluster/allocation/explain?pretty

Also, your cluster is yellow; please try adding 2 more nodes to rule out any replica issues.
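For reference, scaling the sample cluster to 3 nodes is just a `count` change on the existing nodeSet (same manifest as in the issue, only the count modified; with a single node, replica shards have nowhere to be allocated, which is why the cluster stays yellow):

```yaml
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: elasticsearch-sample
spec:
  version: 8.15.2
  nodeSets:
    - name: default
      count: 3   # was 1; gives replica shards somewhere to go
      config:
        node.store.allow_mmap: false
```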

@ALL-SPACE-Anas

ALL-SPACE-Anas commented Dec 4, 2024

@barkbay please find outputs for both API calls:

GET /_cat/shards:

.kibana_security_session_1                                                          0 p STARTED          1   6.7kb   6.7kb 20.128.0.69 elasticsearch-sample-es-default-0
.ds-metrics-logstash.stack_monitoring.node-elastic-2024.10.15-000001                0 p STARTED      80823  28.1mb  28.1mb 20.128.0.69 elasticsearch-sample-es-default-0
.ds-metrics-logstash.stack_monitoring.node-elastic-2024.10.15-000001                0 r UNASSIGNED
.kibana_alerting_cases_8.15.2_001                                                   0 p STARTED          1   6.9kb   6.9kb 20.128.0.69 elasticsearch-sample-es-default-0
.ds-metrics-elasticsearch.stack_monitoring.pending_tasks-elastic-2024.10.15-000001  0 p STARTED      80901  29.4mb  29.4mb 20.128.0.69 elasticsearch-sample-es-default-0
.ds-metrics-elasticsearch.stack_monitoring.pending_tasks-elastic-2024.10.15-000001  0 r UNASSIGNED
.kibana_task_manager_8.15.2_001                                                     0 p STARTED         29 139.5kb 139.5kb 20.128.0.69 elasticsearch-sample-es-default-0
.ds-metrics-system.load-elastic-2024.11.13-000002                                   0 p STARTED          0    249b    249b 20.128.0.69 elasticsearch-sample-es-default-0
.ds-metrics-system.load-elastic-2024.11.13-000002                                   0 r UNASSIGNED
.ds-.kibana-event-log-ds-2024.10.28-000003                                          0 p STARTED          1   6.3kb   6.3kb 20.128.0.69 elasticsearch-sample-es-default-0
.internal.alerts-ml.anomaly-detection-health.alerts-default-000001                  0 p STARTED          0    249b    249b 20.128.0.69 elasticsearch-sample-es-default-0
.ds-metrics-system.socket_summary-elastic-2024.10.14-000001                         0 p STARTED      98205   4.6mb   4.6mb 20.128.0.69 elasticsearch-sample-es-default-0
.ds-metrics-system.socket_summary-elastic-2024.10.14-000001                         0 r UNASSIGNED
.ds-metrics-elasticsearch.stack_monitoring.shard-elastic-2024.11.14-000002          0 p STARTED          0    249b    249b 20.128.0.69 elasticsearch-sample-es-default-0
.ds-metrics-elasticsearch.stack_monitoring.shard-elastic-2024.11.14-000002          0 r UNASSIGNED
.apm-agent-configuration                                                            0 p STARTED          0    249b    249b 20.128.0.69 elasticsearch-sample-es-default-0
.internal.alerts-ml.anomaly-detection.alerts-default-000001                         0 p STARTED          0    249b    249b 20.128.0.69 elasticsearch-sample-es-default-0
.fleet-agents-7                                                                     0 p STARTED         45 152.1kb 152.1kb 20.128.0.69 elasticsearch-sample-es-default-0
.ds-.fleet-fileds-fromhost-meta-agent-2024.10.15-000001                             0 p STARTED          1   7.2kb   7.2kb 20.128.0.69 elasticsearch-sample-es-default-0
.ds-metrics-elasticsearch.ingest_pipeline-elastic-2024.11.14-000002                 0 p STARTED          0    249b    249b 20.128.0.69 elasticsearch-sample-es-default-0
.ds-metrics-elasticsearch.ingest_pipeline-elastic-2024.11.14-000002                 0 r UNASSIGNED
.internal.alerts-observability.logs.alerts-default-000001                           0 p STARTED          0    249b    249b 20.128.0.69 elasticsearch-sample-es-default-0
.fleet-policies-7                                                                   0 p STARTED         29 295.4kb 295.4kb 20.128.0.69 elasticsearch-sample-es-default-0
.internal.alerts-observability.apm.alerts-default-000001                            0 p STARTED          0    249b    249b 20.128.0.69 elasticsearch-sample-es-default-0
.ds-metrics-elastic_agent.metricbeat-elastic-2024.11.13-000002                      0 p STARTED          0    249b    249b 20.128.0.69 elasticsearch-sample-es-default-0
.ds-metrics-elastic_agent.metricbeat-elastic-2024.11.13-000002                      0 r UNASSIGNED
.ds-metrics-elasticsearch.stack_monitoring.index_recovery-elastic-2024.11.14-000002 0 p STARTED          0    249b    249b 20.128.0.69 elasticsearch-sample-es-default-0
.ds-metrics-elasticsearch.stack_monitoring.index_recovery-elastic-2024.11.14-000002 0 r UNASSIGNED
.ds-metrics-elastic_agent.filebeat_input-elastic-2024.11.13-000002                  0 p STARTED          0    249b    249b 20.128.0.69 elasticsearch-sample-es-default-0
.ds-metrics-elastic_agent.filebeat_input-elastic-2024.11.13-000002                  0 r UNASSIGNED
.ds-logs-elastic_agent.fleet_server-elastic-2024.10.14-000001                       0 p STARTED      34643  15.9mb  15.9mb 20.128.0.69 elasticsearch-sample-es-default-0
.ds-logs-elastic_agent.fleet_server-elastic-2024.10.14-000001                       0 r UNASSIGNED
.ds-metrics-elastic_agent.elastic_agent-elastic-2024.11.13-000002                   0 p STARTED          0    249b    249b 20.128.0.69 elasticsearch-sample-es-default-0
.ds-metrics-elastic_agent.elastic_agent-elastic-2024.11.13-000002                   0 r UNASSIGNED
.ds-ilm-history-7-2024.11.20-000006                                                 0 p STARTED          0    249b    249b 20.128.0.69 elasticsearch-sample-es-default-0
.ds-logs-elastic_agent-elastic-2024.10.14-000001                                    0 p STARTED      83562  32.9mb  32.9mb 20.128.0.69 elasticsearch-sample-es-default-0
.ds-logs-elastic_agent-elastic-2024.10.14-000001                                    0 r UNASSIGNED
.ds-.logs-deprecation.elasticsearch-default-2024.10.14-000001                       0 p STARTED         15 169.3kb 169.3kb 20.128.0.69 elasticsearch-sample-es-default-0
.ds-metrics-elasticsearch.stack_monitoring.cluster_stats-elastic-2024.10.15-000001  0 p STARTED      80901  29.4mb  29.4mb 20.128.0.69 elasticsearch-sample-es-default-0
.ds-metrics-elasticsearch.stack_monitoring.cluster_stats-elastic-2024.10.15-000001  0 r UNASSIGNED
.apm-custom-link                                                                    0 p STARTED          0    249b    249b 20.128.0.69 elasticsearch-sample-es-default-0
.ds-metrics-elasticsearch.stack_monitoring.ml_job-elastic-2024.11.14-000002         0 p STARTED          0    249b    249b 20.128.0.69 elasticsearch-sample-es-default-0
.ds-metrics-elasticsearch.stack_monitoring.ml_job-elastic-2024.11.14-000002         0 r UNASSIGNED
.ds-metrics-system.process-elastic-2024.11.13-000002                                0 p STARTED          0    249b    249b 20.128.0.69 elasticsearch-sample-es-default-0
.ds-metrics-system.process-elastic-2024.11.13-000002                                0 r UNASSIGNED
.ds-metrics-system.process.summary-elastic-2024.11.13-000002                        0 p STARTED          0    249b    249b 20.128.0.69 elasticsearch-sample-es-default-0
.ds-metrics-system.process.summary-elastic-2024.11.13-000002                        0 r UNASSIGNED
.ds-metrics-elasticsearch.stack_monitoring.shard-elastic-2024.10.15-000001          0 p STARTED      80900  29.2mb  29.2mb 20.128.0.69 elasticsearch-sample-es-default-0
.ds-metrics-elasticsearch.stack_monitoring.shard-elastic-2024.10.15-000001          0 r UNASSIGNED
.internal.alerts-transform.health.alerts-default-000001                             0 p STARTED          0    249b    249b 20.128.0.69 elasticsearch-sample-es-default-0
.ds-metrics-elastic_agent.metricbeat-elastic-2024.10.14-000001                      0 p STARTED      79739    25mb    25mb 20.128.0.69 elasticsearch-sample-es-default-0
.ds-metrics-elastic_agent.metricbeat-elastic-2024.10.14-000001                      0 r UNASSIGNED
.kibana_8.15.2_001                                                                  0 p STARTED        251  70.5kb  70.5kb 20.128.0.69 elasticsearch-sample-es-default-0
.kibana-observability-ai-assistant-conversations-000001                             0 p STARTED          0    249b    249b 20.128.0.69 elasticsearch-sample-es-default-0
.ds-metrics-logstash.stack_monitoring.node_stats-elastic-2024.11.14-000002          0 p STARTED          0    249b    249b 20.128.0.69 elasticsearch-sample-es-default-0
.ds-metrics-logstash.stack_monitoring.node_stats-elastic-2024.11.14-000002          0 r UNASSIGNED
.ds-ilm-history-7-2024.10.28-000003                                                 0 p STARTED          5  26.7kb  26.7kb 20.128.0.69 elasticsearch-sample-es-default-0
.ds-metrics-system.diskio-elastic-2024.10.14-000001                                 0 p STARTED     639717  23.2mb  23.2mb 20.128.0.69 elasticsearch-sample-es-default-0
.ds-metrics-system.diskio-elastic-2024.10.14-000001                                 0 r UNASSIGNED
.ds-metrics-elasticsearch.stack_monitoring.index-elastic-2024.11.14-000002          0 p STARTED          0    249b    249b 20.128.0.69 elasticsearch-sample-es-default-0
.ds-metrics-elasticsearch.stack_monitoring.index-elastic-2024.11.14-000002          0 r UNASSIGNED
.apm-source-map                                                                     0 p STARTED          0    249b    249b 20.128.0.69 elasticsearch-sample-es-default-0
.fleet-actions-7                                                                    0 p STARTED          7  43.1kb  43.1kb 20.128.0.69 elasticsearch-sample-es-default-0
.ds-metrics-elasticsearch.stack_monitoring.cluster_stats-elastic-2024.11.14-000002  0 p STARTED          0    249b    249b 20.128.0.69 elasticsearch-sample-es-default-0
.ds-metrics-elasticsearch.stack_monitoring.cluster_stats-elastic-2024.11.14-000002  0 r UNASSIGNED
.security-7                                                                         0 p STARTED        207 484.9kb 484.9kb 20.128.0.69 elasticsearch-sample-es-default-0
.internal.alerts-observability.slo.alerts-default-000001                            0 p STARTED          0    249b    249b 20.128.0.69 elasticsearch-sample-es-default-0
.ds-metrics-fleet_server.agent_versions-default-2024.11.13-000002                   0 p STARTED      28878 813.5kb 813.5kb 20.128.0.69 elasticsearch-sample-es-default-0
.ds-metrics-fleet_server.agent_versions-default-2024.11.13-000002                   0 r UNASSIGNED
.ds-metrics-fleet_server.agent_versions-default-2024.10.14-000001                   0 p STARTED      39886     1mb     1mb 20.128.0.69 elasticsearch-sample-es-default-0
.ds-metrics-fleet_server.agent_versions-default-2024.10.14-000001                   0 r UNASSIGNED
.ds-metrics-system.load-elastic-2024.10.14-000001                                   0 p STARTED      98215   4.4mb   4.4mb 20.128.0.69 elasticsearch-sample-es-default-0
.ds-metrics-system.load-elastic-2024.10.14-000001                                   0 r UNASSIGNED
.internal.alerts-default.alerts-default-000001                                      0 p STARTED          0    249b    249b 20.128.0.69 elasticsearch-sample-es-default-0
.fleet-servers-7                                                                    0 p STARTED          4  32.8kb  32.8kb 20.128.0.69 elasticsearch-sample-es-default-0
.kibana-observability-ai-assistant-kb-000001                                        0 p STARTED          0    249b    249b 20.128.0.69 elasticsearch-sample-es-default-0
.ds-.kibana-event-log-ds-2024.10.14-000001                                          0 p STARTED          9  55.4kb  55.4kb 20.128.0.69 elasticsearch-sample-es-default-0
.ds-metrics-system.network-elastic-2024.11.13-000002                                0 p STARTED          0    249b    249b 20.128.0.69 elasticsearch-sample-es-default-0
.ds-metrics-system.network-elastic-2024.11.13-000002                                0 r UNASSIGNED
.ds-metrics-system.process-elastic-2024.10.14-000001                                0 p STARTED     588497 342.5mb 342.5mb 20.128.0.69 elasticsearch-sample-es-default-0
.ds-metrics-system.process-elastic-2024.10.14-000001                                0 r UNASSIGNED
.ds-.fleet-actions-results-2024.10.15-000001                                        0 p STARTED          6    27kb    27kb 20.128.0.69 elasticsearch-sample-es-default-0
.kibana_ingest_8.15.2_001                                                           0 p STARTED       1027     6mb     6mb 20.128.0.69 elasticsearch-sample-es-default-0
.ds-logs-elastic_agent.filebeat-elastic-2024.10.14-000001                           0 p STARTED       1631   1.3mb   1.3mb 20.128.0.69 elasticsearch-sample-es-default-0
.ds-logs-elastic_agent.filebeat-elastic-2024.10.14-000001                           0 r UNASSIGNED
.ds-metrics-system.memory-elastic-2024.11.13-000002                                 0 p STARTED          0    249b    249b 20.128.0.69 elasticsearch-sample-es-default-0
.ds-metrics-system.memory-elastic-2024.11.13-000002                                 0 r UNASSIGNED
.ds-metrics-fleet_server.agent_status-default-2024.10.14-000001                     0 p STARTED      39886   1.1mb   1.1mb 20.128.0.69 elasticsearch-sample-es-default-0
.ds-metrics-fleet_server.agent_status-default-2024.10.14-000001                     0 r UNASSIGNED
.ds-metrics-elasticsearch.stack_monitoring.index_summary-elastic-2024.10.15-000001  0 p STARTED      80900   4.1mb   4.1mb 20.128.0.69 elasticsearch-sample-es-default-0
.ds-metrics-elasticsearch.stack_monitoring.index_summary-elastic-2024.10.15-000001  0 r UNASSIGNED
.ds-metrics-elasticsearch.ingest_pipeline-elastic-2024.10.15-000001                 0 p STARTED      80901     4mb     4mb 20.128.0.69 elasticsearch-sample-es-default-0
.ds-metrics-elasticsearch.ingest_pipeline-elastic-2024.10.15-000001                 0 r UNASSIGNED
.ds-metrics-elasticsearch.stack_monitoring.node_stats-elastic-2024.11.14-000002     0 p STARTED          0    249b    249b 20.128.0.69 elasticsearch-sample-es-default-0
.ds-metrics-elasticsearch.stack_monitoring.node_stats-elastic-2024.11.14-000002     0 r UNASSIGNED
.ds-metrics-elasticsearch.stack_monitoring.enrich-elastic-2024.11.14-000002         0 p STARTED          0    249b    249b 20.128.0.69 elasticsearch-sample-es-default-0
.ds-metrics-elasticsearch.stack_monitoring.enrich-elastic-2024.11.14-000002         0 r UNASSIGNED
.ds-metrics-elasticsearch.stack_monitoring.ml_job-elastic-2024.10.15-000001         0 p STARTED      80901   3.9mb   3.9mb 20.128.0.69 elasticsearch-sample-es-default-0
.ds-metrics-elasticsearch.stack_monitoring.ml_job-elastic-2024.10.15-000001         0 r UNASSIGNED
.ds-metrics-elasticsearch.stack_monitoring.node_stats-elastic-2024.10.15-000001     0 p STARTED      80901   3.9mb   3.9mb 20.128.0.69 elasticsearch-sample-es-default-0
.ds-metrics-elasticsearch.stack_monitoring.node_stats-elastic-2024.10.15-000001     0 r UNASSIGNED
.ds-ilm-history-7-2024.11.04-000004                                                 0 p STARTED         21  13.5kb  13.5kb 20.128.0.69 elasticsearch-sample-es-default-0
.security-profile-8                                                                 0 p STARTED          1   8.8kb   8.8kb 20.128.0.69 elasticsearch-sample-es-default-0
.kibana_analytics_8.15.2_001                                                        0 p STARTED        795   2.9mb   2.9mb 20.128.0.69 elasticsearch-sample-es-default-0
.ds-logs-elastic_agent-elastic-2024.11.13-000002                                    0 p STARTED          0    249b    249b 20.128.0.69 elasticsearch-sample-es-default-0
.ds-logs-elastic_agent-elastic-2024.11.13-000002                                    0 r UNASSIGNED
.async-search                                                                       0 p STARTED          0    253b    253b 20.128.0.69 elasticsearch-sample-es-default-0
.ds-metrics-system.fsstat-elastic-2024.10.14-000001                                 0 p STARTED      16371 949.1kb 949.1kb 20.128.0.69 elasticsearch-sample-es-default-0
.ds-metrics-system.fsstat-elastic-2024.10.14-000001                                 0 r UNASSIGNED
.slo-observability.sli-v3.3                                                         0 p STARTED          0    249b    249b 20.128.0.69 elasticsearch-sample-es-default-0
.ds-metrics-system.cpu-elastic-2024.10.14-000001                                    0 p STARTED      98213   5.7mb   5.7mb 20.128.0.69 elasticsearch-sample-es-default-0
.ds-metrics-system.cpu-elastic-2024.10.14-000001                                    0 r UNASSIGNED
.ds-metrics-elasticsearch.stack_monitoring.pending_tasks-elastic-2024.11.14-000002  0 p STARTED          0    249b    249b 20.128.0.69 elasticsearch-sample-es-default-0
.ds-metrics-elasticsearch.stack_monitoring.pending_tasks-elastic-2024.11.14-000002  0 r UNASSIGNED
.ds-logs-elastic_agent.fleet_server-elastic-2024.11.13-000002                       0 p STARTED          0    249b    249b 20.128.0.69 elasticsearch-sample-es-default-0
.ds-logs-elastic_agent.fleet_server-elastic-2024.11.13-000002                       0 r UNASSIGNED
.ds-metrics-elasticsearch.stack_monitoring.node-elastic-2024.10.15-000001           0 p STARTED      80900     4mb     4mb 20.128.0.69 elasticsearch-sample-es-default-0
.ds-metrics-elasticsearch.stack_monitoring.node-elastic-2024.10.15-000001           0 r UNASSIGNED
.ds-.kibana-event-log-ds-2024.11.04-000004                                          0 p STARTED          0    249b    249b 20.128.0.69 elasticsearch-sample-es-default-0
.ds-metrics-elasticsearch.stack_monitoring.index-elastic-2024.10.15-000001          0 p STARTED      80900   3.9mb   3.9mb 20.128.0.69 elasticsearch-sample-es-default-0
.ds-metrics-elasticsearch.stack_monitoring.index-elastic-2024.10.15-000001          0 r UNASSIGNED
.ds-.fleet-actions-results-2024.11.14-000002                                        0 p STARTED          0    249b    249b 20.128.0.69 elasticsearch-sample-es-default-0
.ds-metrics-system.fsstat-elastic-2024.11.13-000002                                 0 p STARTED          0    249b    249b 20.128.0.69 elasticsearch-sample-es-default-0
.ds-metrics-system.fsstat-elastic-2024.11.13-000002                                 0 r UNASSIGNED
.ds-metrics-elasticsearch.stack_monitoring.index_summary-elastic-2024.11.14-000002  0 p STARTED          0    249b    249b 20.128.0.69 elasticsearch-sample-es-default-0
.ds-metrics-elasticsearch.stack_monitoring.index_summary-elastic-2024.11.14-000002  0 r UNASSIGNED
.ds-metrics-fleet_server.agent_status-default-2024.11.13-000002                     0 p STARTED      28878 882.1kb 882.1kb 20.128.0.69 elasticsearch-sample-es-default-0
.ds-metrics-fleet_server.agent_status-default-2024.11.13-000002                     0 r UNASSIGNED
.slo-observability.summary-v3.3                                                     0 p STARTED          0    249b    249b 20.128.0.69 elasticsearch-sample-es-default-0
.ds-metrics-system.uptime-elastic-2024.10.14-000001                                 0 p STARTED      98213   3.7mb   3.7mb 20.128.0.69 elasticsearch-sample-es-default-0
.ds-metrics-system.uptime-elastic-2024.10.14-000001                                 0 r UNASSIGNED
.ds-metrics-system.network-elastic-2024.10.14-000001                                0 p STARTED     294638  15.2mb  15.2mb 20.128.0.69 elasticsearch-sample-es-default-0
.ds-metrics-system.network-elastic-2024.10.14-000001                                0 r UNASSIGNED
.ds-logs-elastic_agent.filebeat-elastic-2024.11.13-000002                           0 p STARTED          0    249b    249b 20.128.0.69 elasticsearch-sample-es-default-0
.ds-logs-elastic_agent.filebeat-elastic-2024.11.13-000002                           0 r UNASSIGNED
.ds-metrics-system.diskio-elastic-2024.11.13-000002                                 0 p STARTED          0    249b    249b 20.128.0.69 elasticsearch-sample-es-default-0
.ds-metrics-system.diskio-elastic-2024.11.13-000002                                 0 r UNASSIGNED
.ds-metrics-logstash.stack_monitoring.node_stats-elastic-2024.10.15-000001          0 p STARTED      80823  28.4mb  28.4mb 20.128.0.69 elasticsearch-sample-es-default-0
.ds-metrics-logstash.stack_monitoring.node_stats-elastic-2024.10.15-000001          0 r UNASSIGNED
.fleet-enrollment-api-keys-7                                                        0 p STARTED          6  39.7kb  39.7kb 20.128.0.69 elasticsearch-sample-es-default-0
.ds-metrics-system.process.summary-elastic-2024.10.14-000001                        0 p STARTED      98213   4.4mb   4.4mb 20.128.0.69 elasticsearch-sample-es-default-0
.ds-metrics-system.process.summary-elastic-2024.10.14-000001                        0 r UNASSIGNED
.ds-metrics-elasticsearch.stack_monitoring.node-elastic-2024.11.14-000002           0 p STARTED          0    249b    249b 20.128.0.69 elasticsearch-sample-es-default-0
.ds-metrics-elasticsearch.stack_monitoring.node-elastic-2024.11.14-000002           0 r UNASSIGNED
.ds-metrics-elasticsearch.stack_monitoring.ccr-elastic-2024.11.14-000002            0 p STARTED          0    249b    249b 20.128.0.69 elasticsearch-sample-es-default-0
.ds-metrics-elasticsearch.stack_monitoring.ccr-elastic-2024.11.14-000002            0 r UNASSIGNED
.slo-observability.summary-v3.3.temp                                                0 p STARTED          0    249b    249b 20.128.0.69 elasticsearch-sample-es-default-0
.fleet-policies-leader-7                                                            0 p STARTED          4    12kb    12kb 20.128.0.69 elasticsearch-sample-es-default-0
.ds-metrics-logstash.stack_monitoring.node-elastic-2024.11.14-000002                0 p STARTED          0    249b    249b 20.128.0.69 elasticsearch-sample-es-default-0
.ds-metrics-logstash.stack_monitoring.node-elastic-2024.11.14-000002                0 r UNASSIGNED
.ds-metrics-system.memory-elastic-2024.10.14-000001                                 0 p STARTED      98213   6.2mb   6.2mb 20.128.0.69 elasticsearch-sample-es-default-0
.ds-metrics-system.memory-elastic-2024.10.14-000001                                 0 r UNASSIGNED
.ds-logs-elastic_agent.metricbeat-elastic-2024.11.13-000002                         0 p STARTED          0    249b    249b 20.128.0.69 elasticsearch-sample-es-default-0
.ds-logs-elastic_agent.metricbeat-elastic-2024.11.13-000002                         0 r UNASSIGNED
.ds-metrics-elastic_agent.fleet_server-elastic-2024.10.14-000001                    0 p STARTED       2715   1.1mb   1.1mb 20.128.0.69 elasticsearch-sample-es-default-0
.ds-metrics-elastic_agent.fleet_server-elastic-2024.10.14-000001                    0 r UNASSIGNED
.ds-metrics-system.uptime-elastic-2024.11.13-000002                                 0 p STARTED          0    249b    249b 20.128.0.69 elasticsearch-sample-es-default-0
.ds-metrics-system.uptime-elastic-2024.11.13-000002                                 0 r UNASSIGNED
.ds-metrics-elasticsearch.stack_monitoring.enrich-elastic-2024.10.15-000001         0 p STARTED      80901  29.4mb  29.4mb 20.128.0.69 elasticsearch-sample-es-default-0
.ds-metrics-elasticsearch.stack_monitoring.enrich-elastic-2024.10.15-000001         0 r UNASSIGNED
.ds-.kibana-event-log-ds-2024.10.21-000002                                          0 p STARTED          2  12.5kb  12.5kb 20.128.0.69 elasticsearch-sample-es-default-0
.ds-metrics-elasticsearch.stack_monitoring.index_recovery-elastic-2024.10.15-000001 0 p STARTED      80900   4.2mb   4.2mb 20.128.0.69 elasticsearch-sample-es-default-0
.ds-metrics-elasticsearch.stack_monitoring.index_recovery-elastic-2024.10.15-000001 0 r UNASSIGNED
.ds-ilm-history-7-2024.10.21-000002                                                 0 p STARTED          7  11.6kb  11.6kb 20.128.0.69 elasticsearch-sample-es-default-0
.ds-ilm-history-7-2024.11.13-000005                                                 0 p STARTED        258  95.5kb  95.5kb 20.128.0.69 elasticsearch-sample-es-default-0
.ds-metrics-elastic_agent.elastic_agent-elastic-2024.10.14-000001                   0 p STARTED     136437   8.3mb   8.3mb 20.128.0.69 elasticsearch-sample-es-default-0
.ds-metrics-elastic_agent.elastic_agent-elastic-2024.10.14-000001                   0 r UNASSIGNED
.kibana_security_solution_8.15.2_001                                                0 p STARTED          4    45kb    45kb 20.128.0.69 elasticsearch-sample-es-default-0
.ds-metrics-elastic_agent.fleet_server-elastic-2024.11.13-000002                    0 p STARTED          0    249b    249b 20.128.0.69 elasticsearch-sample-es-default-0
.ds-metrics-elastic_agent.fleet_server-elastic-2024.11.13-000002                    0 r UNASSIGNED
.ds-logs-elastic_agent.metricbeat-elastic-2024.10.14-000001                         0 p STARTED    1138374 141.7mb 141.7mb 20.128.0.69 elasticsearch-sample-es-default-0
.ds-logs-elastic_agent.metricbeat-elastic-2024.10.14-000001                         0 r UNASSIGNED
.ds-.fleet-fileds-fromhost-meta-agent-2024.11.14-000002                             0 p STARTED          0    249b    249b 20.128.0.69 elasticsearch-sample-es-default-0
.internal.alerts-observability.threshold.alerts-default-000001                      0 p STARTED          0    249b    249b 20.128.0.69 elasticsearch-sample-es-default-0
.ds-.logs-deprecation.elasticsearch-default-2024.11.13-000002                       0 p STARTED          2  22.8kb  22.8kb 20.128.0.69 elasticsearch-sample-es-default-0
.ds-metrics-elasticsearch.stack_monitoring.ccr-elastic-2024.10.15-000001            0 p STARTED      80901  30.1mb  30.1mb 20.128.0.69 elasticsearch-sample-es-default-0
.ds-metrics-elasticsearch.stack_monitoring.ccr-elastic-2024.10.15-000001            0 r UNASSIGNED
.ds-metrics-system.socket_summary-elastic-2024.11.13-000002                         0 p STARTED          0    249b    249b 20.128.0.69 elasticsearch-sample-es-default-0
.ds-metrics-system.socket_summary-elastic-2024.11.13-000002                         0 r UNASSIGNED
.geoip_databases                                                                    0 p STARTED         38  36.3mb  36.3mb 20.128.0.69 elasticsearch-sample-es-default-0
.ds-ilm-history-7-2024.10.14-000001                                                 0 p STARTED        153 108.3kb 108.3kb 20.128.0.69 elasticsearch-sample-es-default-0
.internal.alerts-stack.alerts-default-000001                                        0 p STARTED          0    249b    249b 20.128.0.69 elasticsearch-sample-es-default-0
.internal.alerts-security.alerts-default-000001                                     0 p STARTED          0    249b    249b 20.128.0.69 elasticsearch-sample-es-default-0
.ds-.fleet-fileds-fromhost-data-agent-2024.10.22-000002                             0 p STARTED          0    249b    249b 20.128.0.69 elasticsearch-sample-es-default-0
.internal.alerts-observability.uptime.alerts-default-000001                         0 p STARTED          0    249b    249b 20.128.0.69 elasticsearch-sample-es-default-0
.ds-metrics-elastic_agent.filebeat_input-elastic-2024.10.14-000001                  0 p STARTED      18795    22mb    22mb 20.128.0.69 elasticsearch-sample-es-default-0
.ds-metrics-elastic_agent.filebeat_input-elastic-2024.10.14-000001                  0 r UNASSIGNED
.ds-metrics-system.cpu-elastic-2024.11.13-000002                                    0 p STARTED          0    249b    249b 20.128.0.69 elasticsearch-sample-es-default-0
.ds-metrics-system.cpu-elastic-2024.11.13-000002                                    0 r UNASSIGNED
.ds-metrics-elastic_agent.filebeat-elastic-2024.10.14-000001                        0 p STARTED      18767   7.2mb   7.2mb 20.128.0.69 elasticsearch-sample-es-default-0
.ds-metrics-elastic_agent.filebeat-elastic-2024.10.14-000001                        0 r UNASSIGNED
.internal.alerts-observability.metrics.alerts-default-000001                        0 p STARTED          0    249b    249b 20.128.0.69 elasticsearch-sample-es-default-0
.ds-metrics-elastic_agent.filebeat-elastic-2024.11.13-000002                        0 p STARTED          0    249b    249b 20.128.0.69 elasticsearch-sample-es-default-0
.ds-metrics-elastic_agent.filebeat-elastic-2024.11.13-000002                        0 r UNASSIGNED

GET /_cluster/allocation/explain?pretty:

{
  "note" : "No shard was specified in the explain API request, so this response explains a randomly chosen unassigned shard. There may be other unassigned shards in this cluster which cannot be assigned for different reasons. It may not be possible to assign this shard until one of the other shards is assigned correctly. To explain the allocation of other shards (whether assigned or unassigned) you must specify the target shard in the request to this API.",
  "index" : ".ds-metrics-logstash.stack_monitoring.node-elastic-2024.10.15-000001",
  "shard" : 0,
  "primary" : false,
  "current_state" : "unassigned",
  "unassigned_info" : {
    "reason" : "CLUSTER_RECOVERED",
    "at" : "2024-12-04T12:03:08.333Z",
    "last_allocation_status" : "no_attempt"
  },
  "can_allocate" : "no",
  "allocate_explanation" : "Elasticsearch isn't allowed to allocate this shard to any of the nodes in the cluster. Choose a node to which you expect this shard to be allocated, find this node in the node-by-node explanation, and address the reasons which prevent Elasticsearch from allocating this shard there.",
  "node_allocation_decisions" : [
    {
      "node_id" : "hUz6RUtXTmep87LFE_FNkQ",
      "node_name" : "elasticsearch-sample-es-default-0",
      "transport_address" : "20.128.0.69:9300",
      "node_attributes" : {
        "k8s_node_name" : "server1.example.com",
        "transform.config_version" : "10.0.0",
        "xpack.installed" : "true",
        "ml.allocated_processors" : "60",
        "ml.max_jvm_size" : "1073741824",
        "ml.config_version" : "12.0.0",
        "ml.machine_memory" : "2147483648",
        "ml.allocated_processors_double" : "60.0"
      },
      "roles" : [
        "data",
        "data_cold",
        "data_content",
        "data_frozen",
        "data_hot",
        "data_warm",
        "ingest",
        "master",
        "ml",
        "remote_cluster_client",
        "transform"
      ],
      "node_decision" : "no",
      "deciders" : [
        {
          "decider" : "same_shard",
          "decision" : "NO",
          "explanation" : "a copy of this shard is already allocated to this node [[.ds-metrics-logstash.stack_monitoring.node-elastic-2024.10.15-000001][0], node[hUz6RUtXTmep87LFE_FNkQ], [P], s[STARTED], a[id=3G5vPTxMQRGuZhV7Mcnp5g], failed_attempts[0]]"
        }
      ]
    }
  ]
}
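For what it's worth, the blocking decider can also be pulled out of the explain response programmatically rather than read by eye; a minimal sketch, using a trimmed version of the JSON above (the `same_shard` decision is why the replica stays unassigned when the primary's node is the only candidate):

```python
import json

# Allocation-explain response, trimmed to the fields used below
# (values taken from the output pasted in this thread).
explain = json.loads("""
{
  "index": ".ds-metrics-logstash.stack_monitoring.node-elastic-2024.10.15-000001",
  "shard": 0,
  "primary": false,
  "can_allocate": "no",
  "node_allocation_decisions": [
    {
      "node_name": "elasticsearch-sample-es-default-0",
      "node_decision": "no",
      "deciders": [
        {
          "decider": "same_shard",
          "decision": "NO",
          "explanation": "a copy of this shard is already allocated to this node"
        }
      ]
    }
  ]
}
""")

# Collect, per node, every decider that voted NO on placing this shard
blockers = {
    node["node_name"]: [d["decider"] for d in node["deciders"] if d["decision"] == "NO"]
    for node in explain["node_allocation_decisions"]
}
print(blockers)  # {'elasticsearch-sample-es-default-0': ['same_shard']}
```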

I'm not sure what you mean by "add 2 other nodes". Do you mean I should be using a 3-node cluster instead of a single node? I'm already using a 3-node cluster.
