
ERROR: Error: Initialization issues during scap_init #3323

Open · OneideLuizSchneider opened this issue Sep 12, 2024 · 47 comments

@OneideLuizSchneider commented Sep 12, 2024

Describe the bug

After the pod restarted 8 times, it worked.
ERROR: Error: Initialization issues during scap_init

Just install it; details are below.

Expected behaviour
It should not need restarts in order to work.

Screenshots
(screenshot attached: 2024-09-12 at 20:11:59)

Environment

  • Falco version:
    Falco version: 0.38.2 (x86_64)
  • System info:
    Linux version 5.10.223-212.873.amzn2.x86_64 (mockbuild@ip-10-0-60-177) (gcc10-gcc (GCC) 10.5.0 20230707 (Red Hat 10.5.0-1), GNU ld version 2.35.2-9.amzn2.0.1) Digwatch compiler #1 SMP Wed Aug 7 16:53:32 UTC 2024
  • Cloud provider or hardware configuration:
    AWS (EKS)
  • OS:
    Amazon Linux 2
  • Kernel:
    5.10
  • Installation method:
    Helm chart on EKS 1.29

helm upgrade --install falco falcosecurity/falco \
    -f values.yml \
    --create-namespace \
    --namespace falco

values.yaml:

tty: true

driver:
  enabled: true
  kind: modern_ebpf

falco:

  rules_files:
    - /etc/falco/falco_rules.yaml
    - /etc/falco/falco-incubating_rules.yaml
    - /etc/falco/falco-sandbox_rules.yaml
    - /etc/falco/rules.d
  rules:
    - disable:
        tag: T1552.005
    - disable:
        tag: T1565

  json_output: true

extra:
  env:
    - name: FALCO_HOSTNAME
      valueFrom:
        fieldRef:
          fieldPath: spec.nodeName

falcoctl:

  artifact:
    install:
      enabled: true
    follow:
      enabled: true
  config:
    artifact:
      allowedTypes:
        - rulesfile
      install:
        resolveDeps: false
        refs: [falco-rules:latest, falco-incubating-rules:latest, falco-sandbox-rules:latest]
      follow:
        refs: [falco-rules:latest, falco-incubating-rules:latest, falco-sandbox-rules:latest]

falcosidekick:
  enabled: false

Additional context

I've seen many other folks reporting this here, but it's not clear why it happens or how to fix it, if there is a fix.

@FedeDP (Contributor) commented Sep 25, 2024

Hi! Thanks for reporting this issue; I don't have an answer, and this seems really weird. Since at every restart Falco uses the same driver (i.e. the modern eBPF one in this case), perhaps it is a timing issue with something else on the system?
cc @Andreagit97, who perhaps has more ideas, as I don't really know what to look for in this specific case.

/milestone 0.40.0

@poiana poiana added this to the 0.40.0 milestone Sep 25, 2024
@OneideLuizSchneider (Author) commented Oct 1, 2024

@FedeDP FYI, I removed the incubating and sandbox rules and hit the same issue:

 - /etc/falco/falco-incubating_rules.yaml
 - /etc/falco/falco-sandbox_rules.yaml

@Andreagit97 (Member) commented:

IMO we should enable more verbose logging; Error: Initialization issues during scap_init is too generic to understand what is going on.

@kirylbelavus commented Oct 17, 2024

I encountered the same issue in a similar environment, and switching to eBPF mode instead of modern eBPF was the only solution that helped. I tried enabling debug logs, but they didn't provide any insight. Additionally, it's worth noting that in an EKS cluster with 4 nodes, only 1 node failed to start Falco in modern eBPF mode, although the kernel version is the same on all nodes. (The exact values change is sketched below.)
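
For anyone else trying this workaround, the switch should only require changing the driver kind in the chart values (a sketch, reusing the driver: key from the original report):

driver:
  enabled: true
  # fall back to the classic eBPF probe instead of modern_ebpf
  kind: ebpf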

@nadiamoe commented:

Seeing a very similar behavior here:

2024-10-19T09:58:46.595592088Z Sat Oct 19 09:58:46 2024: The --cri option is deprecated and will be removed in Falco 0.40.0. Use -o container_engines.cri.sockets[]=<socket_path> instead.
2024-10-19T09:58:46.598439995Z Sat Oct 19 09:58:46 2024: Falco version: 0.39.1 (x86_64)
2024-10-19T09:58:46.598439995Z Sat Oct 19 09:58:46 2024: Falco initialized with configuration files:
2024-10-19T09:58:46.598451935Z Sat Oct 19 09:58:46 2024:    /etc/falco/config.d/engine-kind-falcoctl.yaml | schema validation: ok
2024-10-19T09:58:46.598451935Z Sat Oct 19 09:58:46 2024:    /etc/falco/falco.yaml | schema validation: ok
2024-10-19T09:58:46.598496263Z Sat Oct 19 09:58:46 2024: System info: Linux version 6.6.57-1-lts (linux-lts@archlinux) (gcc (GCC) 14.2.1 20240910, GNU ld (GNU Binutils) 2.43.0) #1 SMP PREEMPT_DYNAMIC Thu, 17 Oct 2024 13:57:25 +0000
2024-10-19T09:58:46.598824145Z Sat Oct 19 09:58:46 2024: Loading rules from:
2024-10-19T09:58:46.630720133Z Sat Oct 19 09:58:46 2024:    /etc/falco/falco_rules.yaml | schema validation: ok
2024-10-19T09:58:46.651177935Z Sat Oct 19 09:58:46 2024:    /etc/falco/rules.d/rules-override.yaml | schema validation: ok
2024-10-19T09:58:46.651177935Z Sat Oct 19 09:58:46 2024: /etc/falco/rules.d/rules-override.yaml: Ok, with warnings
2024-10-19T09:58:46.651177935Z 1 Warnings:
2024-10-19T09:58:46.651177935Z In rules content: (/etc/falco/falco_rules.yaml:0:0)
2024-10-19T09:58:46.651177935Z     list 'read_sensitive_file_images': (/etc/falco/falco_rules.yaml:382:2)
2024-10-19T09:58:46.651177935Z ------
2024-10-19T09:58:46.651177935Z - list: read_sensitive_file_images
2024-10-19T09:58:46.651177935Z   ^
2024-10-19T09:58:46.651177935Z ------
2024-10-19T09:58:46.651177935Z LOAD_UNUSED_LIST (Unused list): List not referred to by any other rule/macro
2024-10-19T09:58:46.651239866Z Sat Oct 19 09:58:46 2024: Hostname value has been overridden via environment variable to: moniserver
2024-10-19T09:58:46.651761569Z Sat Oct 19 09:58:46 2024: The chosen syscall buffer dimension is: 8388608 bytes (8 MBs)
2024-10-19T09:58:46.651776556Z Sat Oct 19 09:58:46 2024: Starting health webserver with threadiness 4, listening on 0.0.0.0:8765
2024-10-19T09:58:46.652049599Z Sat Oct 19 09:58:46 2024: Loaded event sources: syscall
2024-10-19T09:58:46.652049599Z Sat Oct 19 09:58:46 2024: Enabled event sources: syscall
2024-10-19T09:58:46.652049599Z Sat Oct 19 09:58:46 2024: Opening 'syscall' source with modern BPF probe.
2024-10-19T09:58:46.652049599Z Sat Oct 19 09:58:46 2024: One ring buffer every '2' CPUs.
2024-10-19T09:58:47.613393945Z Sat Oct 19 09:58:47 2024: An error occurred in an event source, forcing termination...
2024-10-19T09:58:47.766747214Z Error: Initialization issues during scap_init
2024-10-19T09:58:47.767029775Z Events detected: 0
2024-10-19T09:58:47.767029775Z Rule counts by severity:
2024-10-19T09:58:47.767029775Z Triggered rules by rule name:
2024-10-19T09:58:53.682056483Z Stream closed EOF for falco/falco-nz8g7 (falco)

This is a very vanilla Helm installation with the following values:

  customRules:
    rules-override.yaml: |-
      - macro: user_known_contact_k8s_api_server_activities
        condition: |-
          container.image.repository = registry.k8s.io/node-problem-detector/node-problem-detector
          or
          proc.name startswith node-problem-de
          or
          container.image.repository = ghcr.io/roobre/ktemplate
          or
          container.image.repository = ghcr.io/k8up-io/k8up
          or
          container.name startswith k8up
        override:
          condition: replace
      - macro: user_known_stand_streams_redirect_activities
        condition: |-
          container.image.repository = ghcr.io/fluxcd/kustomize-controller
          or
          (container.name startswith crocochrome and proc.name = chromium)
        override:
          condition: replace
      - macro: known_drop_and_execute_activities
        condition: |-
          (container.image.repository = ghcr.io/flaresolverr/flaresolverr and proc.name = chromedriver)
        override:
          condition: replace
      - macro: user_read_sensitive_file_containers
        condition: |-
          container.id = host
        override:
          condition: replace
      - list: user_known_packet_socket_binaries
        items:
          - speaker # metallb
          - bfdd # also metallb
        override:
          items: append
  resources:
    requests:
      cpu: 50m
      memory: 128Mi
    limits:
      cpu: null
      memory: 512Mi

  falcosidekick:
    enabled: true
    replicaCount: 1
    resources:
      requests:
        cpu: 10m
        memory: 64Mi
      limits:
        memory: 64Mi
    config:
      existingSecret: creds

Using the default image shipped in the chart:

dependencies:
  - name: falco
    repository: https://falcosecurity.github.io/charts
    version: 4.11.1
Linux moniserver 6.6.57-1-lts #1 SMP PREEMPT_DYNAMIC Thu, 17 Oct 2024 13:57:25 +0000 x86_64 GNU/Linux

Also attaching /proc/config.gz in case it helps
config.gz

@PierreBart commented Oct 31, 2024

Hello,

The Falco pods running in my GKE cluster fail with the same error, Initialization issues during scap_init, but contrary to the author, the pods actually keep restarting forever.

Falco output:

Thu Oct 31 08:50:11 2024: The --cri option is deprecated and will be removed in Falco 0.40.0. Use -o container_engines.cri.sockets[]=<socket_path> instead.
Thu Oct 31 08:50:11 2024: Falco version: 0.39.1 (x86_64)
Thu Oct 31 08:50:11 2024: Falco initialized with configuration files:
Thu Oct 31 08:50:11 2024:    /etc/falco/falco.yaml | schema validation: ok
Thu Oct 31 08:50:11 2024: System info: Linux version 6.6.44+ (builder@5b283881ec70) (Chromium OS 17.0_pre498229-r33 clang version 17.0.0 (/var/cache/chromeos-cache/distfiles/egit-src/external/github.com/llvm/llvm-project 14f0776550b5a49e1c42f49a00213f7f3fa047bf), LLD 17.0.0) #1 SMP PREEMPT_DYNAMIC Sat Sep 28 09:09:42 UTC 2024
Thu Oct 31 08:50:11 2024: Loading rules from:
Thu Oct 31 08:50:11 2024:    /etc/falco/falco_rules.yaml | schema validation: ok
Thu Oct 31 08:50:11 2024: Hostname value has been overridden via environment variable to: gke-***
Thu Oct 31 08:50:11 2024: The chosen syscall buffer dimension is: 8388608 bytes (8 MBs)
Thu Oct 31 08:50:11 2024: Starting health webserver with threadiness 4, listening on 0.0.0.0:8765
Thu Oct 31 08:50:11 2024: Loaded event sources: syscall
Thu Oct 31 08:50:11 2024: Enabled event sources: syscall
Thu Oct 31 08:50:11 2024: Opening 'syscall' source with modern BPF probe.
Thu Oct 31 08:50:11 2024: One ring buffer every '2' CPUs.
Thu Oct 31 08:50:11 2024: An error occurred in an event source, forcing termination...
Error: Initialization issues during scap_init
Events detected: 0
Rule counts by severity:
Triggered rules by rule name:

Environment

  • Falco version: 0.39.1
  • System info: Linux version 6.6.44+ (builder@5b283881ec70) (Chromium OS 17.0_pre498229-r33 clang version 17.0.0 (/var/cache/chromeos-cache/distfiles/egit-src/external/github.com/llvm/llvm-project 14f0776550b5a49e1c42f49a00213f7f3fa047bf), LLD 17.0.0) #1 SMP PREEMPT_DYNAMIC Sat Sep 28 09:09:42 UTC 2024)
  • Cloud provider or hardware configuration: GCP
  • OS: Container-Optimized OS cos-beta-117-18613-0-66
  • Kernel: COS-6.6.44

@xvzf commented Oct 31, 2024

+1, same behaviour as @PierreBart

@shane-lawrence (Contributor) commented:

Which version of k8s are you seeing this behavior on? I'm just starting to troubleshoot the same problem so I haven't had a chance to isolate it yet, but I'm only seeing it in k8s v1.31 so far.

@OneideLuizSchneider (Author) commented:

@shane-lawrence 1.29

@shane-lawrence (Contributor) commented:

Thanks @OneideLuizSchneider; it sounds like the k8s version was a red herring, and it must be some other difference that's triggering this issue. I will provide more context if I find something.

@PierreBart commented Oct 31, 2024

@shane-lawrence I am running 1.31.1, same as you. I have clusters on 1.30.5, and there it runs just fine.

@OneideLuizSchneider (Author) commented Oct 31, 2024

I don't think it has anything to do with the k8s version, because it's random: it happens on some nodes, not on every node.
The full version I'm running now is 1.29.8, and I have added and removed many nodes since I posted here; now I don't see this behavior anymore. I'm starting my tests on 1.31 today, will add Falco there as well, and will come back here.

@tiny-pangolin commented:

I'm experiencing the issue on Fedora 40 and Fedora 41 hosts without Kubernetes. Sometimes Falco works and other times it crashloops with the same config on different hosts. Could it have something to do with too few resources being available to Falco at startup, or with it conflicting with other processes like auditd?

@xvzf commented Nov 4, 2024

@shane-lawrence we're observing it on GKE 1.31.1 right now

@Andreagit97 (Member) commented Nov 4, 2024

Hi folks! Could you try enabling the libs_logger to obtain more info on the failure? This is very likely a verifier issue. You can enable the logger by passing the Falco binary the command-line arguments -o libs_logger.enabled=true -o libs_logger.severity=debug, so something like:

sudo ./usr/bin/falco -c ./etc/falco/falco.yaml -r ./etc/falco/falco_rules.yaml -o libs_logger.enabled=true -o libs_logger.severity=debug
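
If you run Falco through the Helm chart, the equivalent should be setting the same keys in your values file (a sketch, assuming the chart passes the falco: section through into falco.yaml, as with the other options shown in this thread):

falco:
  libs_logger:
    enabled: true
    severity: debug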

@PierreBart commented:

There it is:

Defaulted container "falco" out of: falco, falcoctl-artifact-follow, falcoctl-artifact-install (init)
Mon Nov  4 09:58:02 2024: The --cri option is deprecated and will be removed in Falco 0.40.0. Use -o container_engines.cri.sockets[]=<socket_path> instead.
Mon Nov  4 09:58:02 2024: Falco version: 0.39.1 (x86_64)
Mon Nov  4 09:58:02 2024: Falco initialized with configuration files:
Mon Nov  4 09:58:02 2024:    /etc/falco/falco.yaml | schema validation: ok
Mon Nov  4 09:58:02 2024: System info: Linux version 6.6.44+ (builder@5b283881ec70) (Chromium OS 17.0_pre498229-r33 clang version 17.0.0 (/var/cache/chromeos-cache/distfiles/egit-src/external/github.com/llvm/llvm-project 14f0776550b5a49e1c42f49a00213f7f3fa047bf), LLD 17.0.0) #1 SMP PREEMPT_DYNAMIC Sat Sep 28 09:09:42 UTC 2024
Mon Nov  4 09:58:02 2024: Loading rules from:
Mon Nov  4 09:58:02 2024:    /etc/falco/falco_rules.yaml | schema validation: ok
Mon Nov  4 09:58:02 2024: Hostname value has been overridden via environment variable to: gke-***
Mon Nov  4 09:58:02 2024: The chosen syscall buffer dimension is: 8388608 bytes (8 MBs)
Mon Nov  4 09:58:02 2024: Starting health webserver with threadiness 4, listening on 0.0.0.0:8765
Mon Nov  4 09:58:02 2024: Loaded event sources: syscall
Mon Nov  4 09:58:02 2024: Enabled event sources: syscall
Mon Nov  4 09:58:02 2024: Opening 'syscall' source with modern BPF probe.
Mon Nov  4 09:58:02 2024: One ring buffer every '2' CPUs.
Mon Nov  4 09:58:02 2024: [libs]: libbpf: prog 'capset_x': BPF program load failed: Permission denied
Mon Nov  4 09:58:02 2024: [libs]: libbpf: prog 'capset_x': -- BEGIN PROG LOAD LOG --
reg type unsupported for arg#0 function capset_x#984
0: R1=ctx(off=0,imm=0) R10=fp0
; int BPF_PROG(capset_x, struct pt_regs *regs, long ret) {
0: (bf) r7 = r1                       ; R1=ctx(off=0,imm=0) R7_w=ctx(off=0,imm=0)
; int BPF_PROG(capset_x, struct pt_regs *regs, long ret) {
1: (79) r9 = *(u64 *)(r7 +8)          ; R7_w=ctx(off=0,imm=0) R9_w=scalar()
; uint32_t cpu_id = (uint32_t)bpf_get_smp_processor_id();
2: (85) call bpf_get_smp_processor_id#8       ; R0_w=scalar(umax=3,var_off=(0x0; 0x3))
; uint32_t cpu_id = (uint32_t)bpf_get_smp_processor_id();
3: (63) *(u32 *)(r10 -8) = r0         ; R0_w=scalar(umax=3,var_off=(0x0; 0x3)) R10=fp0 fp-8=
4: (bf) r2 = r10                      ; R2_w=fp0 R10=fp0
;
5: (07) r2 += -8                      ; R2_w=fp-8
; return (struct ringbuf_map *)bpf_map_lookup_elem(&ringbuf_maps, &cpu_id);
6: (18) r1 = 0xffff8880490d1c00       ; R1_w=map_ptr(off=0,ks=4,vs=4,imm=0)
8: (85) call bpf_map_lookup_elem#1    ; R0=map_value_or_null(id=1,off=0,ks=4,vs=4,imm=0)
9: (bf) r6 = r0                       ; R0=map_value_or_null(id=1,off=0,ks=4,vs=4,imm=0) R6_w=map_value_or_null(id=1,off=0,ks=4,vs=4,imm=0)
; if(!rb) {
10: (55) if r6 != 0x0 goto pc+6 17: R0=map_ptr(off=0,ks=0,vs=0,imm=0) R6=map_ptr(off=0,ks=0,vs=0,imm=0) R7=ctx(off=0,imm=0) R9=scalar() R10=fp0 fp-8=????mmmm
; uint32_t cpu_id = (uint32_t)bpf_get_smp_processor_id();
17: (85) call bpf_get_smp_processor_id#8      ; R0_w=scalar(umax=3,var_off=(0x0; 0x3))
; uint32_t cpu_id = (uint32_t)bpf_get_smp_processor_id();
18: (63) *(u32 *)(r10 -8) = r0        ; R0_w=scalar(umax=3,var_off=(0x0; 0x3)) R10=fp0 fp-8=
19: (bf) r2 = r10                     ; R2_w=fp0 R10=fp0
;
20: (07) r2 += -8                     ; R2_w=fp-8
; return (struct counter_map *)bpf_map_lookup_elem(&counter_maps, &cpu_id);
21: (18) r1 = 0xffff888097e63c00      ; R1_w=map_ptr(off=0,ks=4,vs=136,imm=0)
23: (85) call bpf_map_lookup_elem#1   ; R0_w=map_value_or_null(id=2,off=0,ks=4,vs=136,imm=0)
24: (bf) r7 = r0                      ; R0_w=map_value_or_null(id=2,off=0,ks=4,vs=136,imm=0) R7_w=map_value_or_null(id=2,off=0,ks=4,vs=136,imm=0)
; if(!counter) {
25: (15) if r7 == 0x0 goto pc+372     ; R7_w=map_value(off=0,ks=4,vs=136,imm=0)
; counter->n_evts++;
26: (79) r1 = *(u64 *)(r7 +0)         ; R1_w=scalar() R7_w=map_value(off=0,ks=4,vs=136,imm=0)
27: (07) r1 += 1                      ; R1_w=scalar()
28: (7b) *(u64 *)(r7 +0) = r1         ; R1_w=scalar() R7_w=map_value(off=0,ks=4,vs=136,imm=0)
; uint8_t *space = bpf_ringbuf_reserve(rb, event_size, 0);
29: (bf) r1 = r6                      ; R1_w=map_ptr(off=0,ks=0,vs=0,imm=0) R6=map_ptr(off=0,ks=0,vs=0,imm=0)
30: (b7) r2 = 66                      ; R2_w=66
31: (b7) r3 = 0                       ; R3_w=0
32: (85) call bpf_ringbuf_reserve#131         ; R0=ringbuf_mem_or_null(id=4,ref_obj_id=4,off=0,imm=0) refs=4
33: (bf) r6 = r0                      ; R0=ringbuf_mem_or_null(id=4,ref_obj_id=4,off=0,imm=0) R6_w=ringbuf_mem_or_null(id=4,ref_obj_id=4,off=0,imm=0) refs=4
; if(!space) {
34: (55) if r6 != 0x0 goto pc+7 42: R0=ringbuf_mem(ref_obj_id=4,off=0,imm=0) R6_w=ringbuf_mem(ref_obj_id=4,off=0,imm=0) R7=map_value(off=0,ks=4,vs=136,imm=0) R9=scalar() R10=fp0 fp-8=????mmmm refs=4
; return g_event_params_table[event_id];
42: (18) r1 = 0xffffc900015ba010      ; R1_w=map_value(off=16,ks=4,vs=248766,imm=0) refs=4
44: (71) r2 = *(u8 *)(r1 +353)        ; R1_w=map_value(off=16,ks=4,vs=248766,imm=0) R2_w=4 refs=4
; ringbuf->payload_pos = sizeof(struct ppm_evt_hdr) + nparams * sizeof(uint16_t);
45: (bf) r7 = r2                      ; R2_w=4 R7_w=4 refs=4
46: (67) r7 <<= 1                     ; R7_w=8 refs=4
47: (7b) *(u64 *)(r10 -32) = r7       ; R7_w=8 R10=fp0 fp-32_w=8 refs=4
; ringbuf->payload_pos = sizeof(struct ppm_evt_hdr) + nparams * sizeof(uint16_t);
48: (07) r7 += 26                     ; R7_w=34 refs=4
49: (b7) r1 = 20                      ; R1_w=20 refs=4
50: (7b) *(u64 *)(r10 -24) = r2       ; R2_w=4 R10=fp0 fp-24_w=4 refs=4
; PUSH_FIXED_SIZE_TO_RINGBUF(ringbuf, param, sizeof(int64_t));
51: (2d) if r1 > r2 goto pc+1         ; R1_w=20 R2_w=4 refs=4
; return g_settings.boot_time;
53: (18) r1 = 0xffffc90001c8adb8      ; R1_w=map_value(off=3512,ks=4,vs=600281,imm=0) refs=4
55: (79) r8 = *(u64 *)(r1 +0)         ; R1_w=map_value(off=3512,ks=4,vs=600281,imm=0) R8_w=scalar() refs=4
; hdr->ts = maps__get_boot_time() + bpf_ktime_get_boot_ns();
56: (85) call bpf_ktime_get_boot_ns#125       ; R0_w=scalar() refs=4
; hdr->ts = maps__get_boot_time() + bpf_ktime_get_boot_ns();
57: (0f) r0 += r8                     ; R0_w=scalar() R8_w=scalar() refs=4
; hdr->ts = maps__get_boot_time() + bpf_ktime_get_boot_ns();
58: (bf) r1 = r0                      ; R0_w=scalar(id=5) R1_w=scalar(id=5) refs=4
59: (77) r1 >>= 56                    ; R1_w=scalar(umax=255,var_off=(0x0; 0xff)) refs=4
60: (73) *(u8 *)(r6 +7) = r1          ; R1_w=scalar(umax=255,var_off=(0x0; 0xff)) R6=ringbuf_mem(ref_obj_id=4,off=0,imm=0) refs=4
61: (bf) r1 = r0                      ; R0_w=scalar(id=5) R1_w=scalar(id=5) refs=4
62: (77) r1 >>= 48                    ; R1_w=scalar(umax=65535,var_off=(0x0; 0xffff)) refs=4
63: (73) *(u8 *)(r6 +6) = r1          ; R1_w=scalar(umax=65535,var_off=(0x0; 0xffff)) R6=ringbuf_mem(ref_obj_id=4,off=0,imm=0) refs=4
64: (bf) r1 = r0                      ; R0_w=scalar(id=5) R1_w=scalar(id=5) refs=4
65: (77) r1 >>= 40                    ; R1_w=scalar(umax=16777215,var_off=(0x0; 0xffffff)) refs=4
66: (73) *(u8 *)(r6 +5) = r1          ; R1_w=scalar(umax=16777215,var_off=(0x0; 0xffffff)) R6=ringbuf_mem(ref_obj_id=4,off=0,imm=0) refs=4
67: (bf) r1 = r0                      ; R0_w=scalar(id=5) R1_w=scalar(id=5) refs=4
68: (77) r1 >>= 32                    ; R1_w=scalar(umax=4294967295,var_off=(0x0; 0xffffffff)) refs=4
69: (73) *(u8 *)(r6 +4) = r1          ; R1_w=scalar(umax=4294967295,var_off=(0x0; 0xffffffff)) R6=ringbuf_mem(ref_obj_id=4,off=0,imm=0) refs=4
70: (bf) r1 = r0                      ; R0_w=scalar(id=5) R1_w=scalar(id=5) refs=4
71: (77) r1 >>= 24                    ; R1_w=scalar(umax=1099511627775,var_off=(0x0; 0xffffffffff)) refs=4
72: (73) *(u8 *)(r6 +3) = r1          ; R1_w=scalar(umax=1099511627775,var_off=(0x0; 0xffffffffff)) R6=ringbuf_mem(ref_obj_id=4,off=0,imm=0) refs=4
73: (bf) r1 = r0                      ; R0_w=scalar(id=5) R1_w=scalar(id=5) refs=4
74: (77) r1 >>= 16                    ; R1_w=scalar(umax=281474976710655,var_off=(0x0; 0xffffffffffff)) refs=4
75: (73) *(u8 *)(r6 +2) = r1          ; R1_w=scalar(umax=281474976710655,var_off=(0x0; 0xffffffffffff)) R6=ringbuf_mem(ref_obj_id=4,off=0,imm=0) refs=4
76: (73) *(u8 *)(r6 +0) = r0          ; R0_w=scalar(id=5) R6=ringbuf_mem(ref_obj_id=4,off=0,imm=0) refs=4
77: (77) r0 >>= 8                     ; R0_w=scalar(umax=72057594037927935,var_off=(0x0; 0xffffffffffffff)) refs=4
78: (73) *(u8 *)(r6 +1) = r0          ; R0_w=scalar(umax=72057594037927935,var_off=(0x0; 0xffffffffffffff)) R6=ringbuf_mem(ref_obj_id=4,off=0,imm=0) refs=4
; hdr->tid = bpf_get_current_pid_tgid() & 0xffffffff;
79: (85) call bpf_get_current_pid_tgid#14     ; R0=scalar() refs=4
80: (b7) r1 = 1                       ; R1_w=1 refs=4
; hdr->type = ringbuf->event_type;
81: (73) *(u8 *)(r6 +21) = r1         ; R1_w=1 R6=ringbuf_mem(ref_obj_id=4,off=0,imm=0) refs=4
82: (b7) r1 = 97                      ; R1_w=97 refs=4
83: (73) *(u8 *)(r6 +20) = r1         ; R1_w=97 R6=ringbuf_mem(ref_obj_id=4,off=0,imm=0) refs=4
84: (b7) r1 = 0                       ; R1_w=0 refs=4
; hdr->nparams = nparams;
85: (73) *(u8 *)(r6 +25) = r1         ; R1_w=0 R6=ringbuf_mem(ref_obj_id=4,off=0,imm=0) refs=4
86: (73) *(u8 *)(r6 +24) = r1         ; R1_w=0 R6=ringbuf_mem(ref_obj_id=4,off=0,imm=0) refs=4
87: (73) *(u8 *)(r6 +23) = r1         ; R1_w=0 R6=ringbuf_mem(ref_obj_id=4,off=0,imm=0) refs=4
; hdr->tid = bpf_get_current_pid_tgid() & 0xffffffff;
88: (73) *(u8 *)(r6 +15) = r1         ; R1_w=0 R6=ringbuf_mem(ref_obj_id=4,off=0,imm=0) refs=4
89: (73) *(u8 *)(r6 +14) = r1         ; R1_w=0 R6=ringbuf_mem(ref_obj_id=4,off=0,imm=0) refs=4
90: (73) *(u8 *)(r6 +13) = r1         ; R1_w=0 R6=ringbuf_mem(ref_obj_id=4,off=0,imm=0) refs=4
91: (73) *(u8 *)(r6 +12) = r1         ; R1_w=0 R6=ringbuf_mem(ref_obj_id=4,off=0,imm=0) refs=4
; hdr->len = ringbuf->reserved_event_size;
92: (73) *(u8 *)(r6 +19) = r1         ; R1_w=0 R6=ringbuf_mem(ref_obj_id=4,off=0,imm=0) refs=4
93: (73) *(u8 *)(r6 +18) = r1         ; R1_w=0 R6=ringbuf_mem(ref_obj_id=4,off=0,imm=0) refs=4
94: (73) *(u8 *)(r6 +17) = r1         ; R1_w=0 R6=ringbuf_mem(ref_obj_id=4,off=0,imm=0) refs=4
95: (b7) r1 = 66                      ; R1_w=66 refs=4
96: (73) *(u8 *)(r6 +16) = r1         ; R1_w=66 R6=ringbuf_mem(ref_obj_id=4,off=0,imm=0) refs=4
; hdr->tid = bpf_get_current_pid_tgid() & 0xffffffff;
97: (bf) r1 = r0                      ; R0=scalar(id=6) R1_w=scalar(id=6) refs=4
98: (77) r1 >>= 24                    ; R1_w=scalar(umax=1099511627775,var_off=(0x0; 0xffffffffff)) refs=4
99: (73) *(u8 *)(r6 +11) = r1         ; R1_w=scalar(umax=1099511627775,var_off=(0x0; 0xffffffffff)) R6=ringbuf_mem(ref_obj_id=4,off=0,imm=0) refs=4
100: (bf) r1 = r0                     ; R0=scalar(id=6) R1_w=scalar(id=6) refs=4
101: (77) r1 >>= 16                   ; R1_w=scalar(umax=281474976710655,var_off=(0x0; 0xffffffffffff)) refs=4
102: (73) *(u8 *)(r6 +10) = r1        ; R1_w=scalar(umax=281474976710655,var_off=(0x0; 0xffffffffffff)) R6=ringbuf_mem(ref_obj_id=4,off=0,imm=0) refs=4
103: (73) *(u8 *)(r6 +8) = r0         ; R0=scalar(id=6) R6=ringbuf_mem(ref_obj_id=4,off=0,imm=0) refs=4
104: (77) r0 >>= 8                    ; R0_w=scalar(umax=72057594037927935,var_off=(0x0; 0xffffffffffffff)) refs=4
105: (73) *(u8 *)(r6 +9) = r0         ; R0_w=scalar(umax=72057594037927935,var_off=(0x0; 0xffffffffffffff)) R6=ringbuf_mem(ref_obj_id=4,off=0,imm=0) refs=4
; hdr->nparams = nparams;
106: (79) r1 = *(u64 *)(r10 -24)      ; R1_w=4 R10=fp0 fp-24=4 refs=4
107: (73) *(u8 *)(r6 +22) = r1        ; R1_w=4 R6=ringbuf_mem(ref_obj_id=4,off=0,imm=0) refs=4
; PUSH_FIXED_SIZE_TO_RINGBUF(ringbuf, param, sizeof(int64_t));
108: (bf) r1 = r6                     ; R1_w=ringbuf_mem(ref_obj_id=4,off=0,imm=0) R6=ringbuf_mem(ref_obj_id=4,off=0,imm=0) refs=4
109: (0f) r1 += r7                    ; R1_w=ringbuf_mem(ref_obj_id=4,off=34,imm=0) R7=34 refs=4
110: (bf) r2 = r9                     ; R2_w=scalar(id=7) R9=scalar(id=7) refs=4
111: (77) r2 >>= 48                   ; R2_w=scalar(umax=65535,var_off=(0x0; 0xffff)) refs=4
112: (73) *(u8 *)(r1 +6) = r2         ; R1_w=ringbuf_mem(ref_obj_id=4,off=34,imm=0) R2_w=scalar(umax=65535,var_off=(0x0; 0xffff)) refs=4
113: (bf) r2 = r9                     ; R2_w=scalar(id=7) R9=scalar(id=7) refs=4
114: (77) r2 >>= 56                   ; R2_w=scalar(umax=255,var_off=(0x0; 0xff)) refs=4
115: (73) *(u8 *)(r1 +7) = r2         ; R1_w=ringbuf_mem(ref_obj_id=4,off=34,imm=0) R2_w=scalar(umax=255,var_off=(0x0; 0xff)) refs=4
116: (bf) r2 = r9                     ; R2_w=scalar(id=7) R9=scalar(id=7) refs=4
117: (77) r2 >>= 32                   ; R2_w=scalar(umax=4294967295,var_off=(0x0; 0xffffffff)) refs=4
118: (73) *(u8 *)(r1 +4) = r2         ; R1_w=ringbuf_mem(ref_obj_id=4,off=34,imm=0) R2_w=scalar(umax=4294967295,var_off=(0x0; 0xffffffff)) refs=4
119: (bf) r2 = r9                     ; R2_w=scalar(id=7) R9=scalar(id=7) refs=4
120: (77) r2 >>= 40                   ; R2_w=scalar(umax=16777215,var_off=(0x0; 0xffffff)) refs=4
121: (73) *(u8 *)(r1 +5) = r2         ; R1_w=ringbuf_mem(ref_obj_id=4,off=34,imm=0) R2_w=scalar(umax=16777215,var_off=(0x0; 0xffffff)) refs=4
122: (bf) r2 = r9                     ; R2_w=scalar(id=7) R9=scalar(id=7) refs=4
123: (77) r2 >>= 16                   ; R2_w=scalar(umax=281474976710655,var_off=(0x0; 0xffffffffffff)) refs=4
124: (73) *(u8 *)(r1 +2) = r2         ; R1_w=ringbuf_mem(ref_obj_id=4,off=34,imm=0) R2_w=scalar(umax=281474976710655,var_off=(0x0; 0xffffffffffff)) refs=4
125: (bf) r2 = r9                     ; R2_w=scalar(id=7) R9=scalar(id=7) refs=4
126: (77) r2 >>= 24                   ; R2_w=scalar(umax=1099511627775,var_off=(0x0; 0xffffffffff)) refs=4
127: (73) *(u8 *)(r1 +3) = r2         ; R1_w=ringbuf_mem(ref_obj_id=4,off=34,imm=0) R2_w=scalar(umax=1099511627775,var_off=(0x0; 0xffffffffff)) refs=4
128: (73) *(u8 *)(r1 +0) = r9         ; R1_w=ringbuf_mem(ref_obj_id=4,off=34,imm=0) R9=scalar(id=7) refs=4
129: (77) r9 >>= 8                    ; R9_w=scalar(umax=72057594037927935,var_off=(0x0; 0xffffffffffffff)) refs=4
130: (73) *(u8 *)(r1 +1) = r9         ; R1_w=ringbuf_mem(ref_obj_id=4,off=34,imm=0) R9_w=scalar(umax=72057594037927935,var_off=(0x0; 0xffffffffffffff)) refs=4
131: (b7) r1 = 8                      ; R1_w=8 refs=4
132: (6b) *(u16 *)(r6 +26) = r1       ; R1_w=8 R6=ringbuf_mem(ref_obj_id=4,off=0,imm=0) refs=4
133: (18) r1 = 0x1                    ; R1_w=1 refs=4
; if(bpf_core_enum_value_exists(enum bpf_func_id, BPF_FUNC_get_current_task_btf) &&
135: (15) if r1 == 0x0 goto pc+5      ; R1_w=1 refs=4
136: (18) r1 = 0x9e                   ; R1_w=158 refs=4
; if(bpf_core_enum_value_exists(enum bpf_func_id, BPF_FUNC_get_current_task_btf) &&
138: (55) if r1 != 0x9e goto pc+2     ; R1_w=158 refs=4
; return (struct task_struct *)bpf_get_current_task_btf();
139: (85) call bpf_get_current_task_btf#158   ; R0=trusted_ptr_task_struct(off=0,imm=0) refs=4
140: (05) goto pc+1
;
142: (bf) r7 = r0                     ; R0=trusted_ptr_task_struct(off=0,imm=0) R7_w=trusted_ptr_task_struct(off=0,imm=0) refs=4
143: (18) r1 = 0x1                    ; R1_w=1 refs=4
145: (79) r8 = *(u64 *)(r10 -32)      ; R8_w=8 R10=fp0 fp-32=8 refs=4
146: (79) r9 = *(u64 *)(r10 -24)      ; R9_w=4 R10=fp0 fp-24=4 refs=4
; READ_TASK_FIELD_INTO(&cap_struct, task, cred, cap_inheritable);
147: (15) if r1 == 0x0 goto pc+7      ; R1_w=1 refs=4
148: (18) r1 = 0x9e                   ; R1_w=158 refs=4
; READ_TASK_FIELD_INTO(&cap_struct, task, cred, cap_inheritable);
150: (55) if r1 != 0x9e goto pc+4     ; R1_w=158 refs=4
; READ_TASK_FIELD_INTO(&cap_struct, task, cred, cap_inheritable);
151: (79) r1 = *(u64 *)(r7 +1984)     ; R1_w=rcu_ptr_or_null_cred(id=8,off=0,imm=0) R7_w=trusted_ptr_task_struct(off=0,imm=0) refs=4
152: (79) r1 = *(u64 *)(r1 +48)
R1 invalid mem access 'rcu_ptr_or_null_'
processed 146 insns (limit 1000000) max_states_per_insn 0 total_states 7 peak_states 7 mark_read 5
-- END PROG LOAD LOG --
Mon Nov  4 09:58:02 2024: [libs]: libbpf: prog 'capset_x': failed to load: -13
Mon Nov  4 09:58:02 2024: [libs]: libbpf: failed to load object 'bpf_probe'
Mon Nov  4 09:58:02 2024: [libs]: libbpf: failed to load BPF skeleton 'bpf_probe': -13
Mon Nov  4 09:58:02 2024: [libs]: libpman: failed to load BPF object (errno: 13 | message: Permission denied)
Mon Nov  4 09:58:02 2024: An error occurred in an event source, forcing termination...
Error: Initialization issues during scap_init
Events detected: 0
Rule counts by severity:
Triggered rules by rule name:

@Andreagit97 (Member) commented Nov 4, 2024

@PierreBart thank you very much! This is an issue we already fixed in dev (falcosecurity/libs#2118); let me check with the other maintainers what we can do. cc @falcosecurity/falco-maintainers

@FedeDP (Contributor) commented Nov 4, 2024

Since lots of people are having the issue, my 2c is to definitely make a patch release of libs (0.18.2) and then a patch release for Falco 0.39 (0.39.2).

@Andreagit97 (Member) commented:

I agree. I'm still investigating the Fedora issue also reported here by @tiny-pangolin (#3323 (comment)); it would be great to have both of them in the patch.

@jordyb6 commented Nov 6, 2024

Same issue on AlmaLinux 8.7 hosts, no Kubernetes. Some hosts fail to restart after a config change or whenever I tweak one of the rules files. After a while they do manage to start up again.
Using the modern eBPF driver.

@Andreagit97 (Member) commented:

Uhm, interesting. I would separate this into several distinct issues:

  1. Issues with the latest GKE versions -> this is a verifier error already solved in dev: [BUG] Verifier failure on cos-beta-117-18613-0-66 libs#2118.
  2. Issues with the latest Fedora versions, or more generally with kernel versions >= 6.11.4 -> we know what the issue is and we are working on it.
  3. Several restarts before the successful one -> this is still under investigation, but we need more logs; as suggested here (ERROR: Error: Initialization issues during scap_init #3323 (comment)), please enable the libs logger.

More in general, to understand which category you fall under, please enable the libs logger:

  • in the falco config:
libs_logger:
   enabled: true 
   severity: debug
  • or directly from the command line:
sudo ./usr/bin/falco -c ./etc/falco/falco.yaml -r ./etc/falco/falco_rules.yaml -o libs_logger.enabled=true -o libs_logger.severity=debug
  1. The GKE error should be the following:
-- END PROG LOAD LOG --
Mon Nov  4 09:58:02 2024: [libs]: libbpf: prog 'capset_x': failed to load: -13
Mon Nov  4 09:58:02 2024: [libs]: libbpf: failed to load object 'bpf_probe'
Mon Nov  4 09:58:02 2024: [libs]: libbpf: failed to load BPF skeleton 'bpf_probe': -13
Mon Nov  4 09:58:02 2024: [libs]: libpman: failed to load BPF object (errno: 13 | message: Permission denied)
Mon Nov  4 09:58:02 2024: An error occurred in an event source, forcing termination...
  2. The Fedora error should be this one:
libbpf: prog 'pf_user': BPF program load failed: Invalid argument
libbpf: prog 'pf_user': -- BEGIN PROG LOAD LOG --
processed 282 insns (limit 1000000) max_states_per_insn 0 total_states 17 peak_states 17 mark_read 8
-- END PROG LOAD LOG --
libbpf: prog 'pf_user': failed to load: -22
libbpf: failed to load object 'bpf_probe'
libbpf: failed to load BPF skeleton 'bpf_probe': -22
libpman: failed to load BPF object (errno: 22 | message: Invalid argument)
  3. Still under investigation; nobody has provided full logs for this case.
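
To quickly check which category you are in on Kubernetes (a sketch; adjust the namespace and daemonset name to your install), grep the libbpf/libpman lines out of the pod logs once the libs logger is enabled:

# dump only the libbpf/libpman lines from the Falco daemonset logs
kubectl logs -n falco ds/falco --all-containers | grep -E 'libbpf|libpman'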

@tordenist commented:

@Andreagit97 a possible patch release was mentioned. Any news on that patch becoming available anytime soon?

@FedeDP (Contributor) commented Nov 11, 2024

We are still investigating the "several restarts before the successful one" issue. I'd say that we can expect a Falco patch release in a couple of weeks; sorry for the delay!
Also, please note that Falco is having CI issues right now that can slow down the process too.

@FedeDP (Contributor) commented Nov 11, 2024

/milestone 0.39.2

@poiana poiana modified the milestones: 0.40.0, 0.39.2 Nov 11, 2024
@Andreagit97 (Member) commented Nov 11, 2024

Hey @tordenist, issues 1 and 2 reported here (#3323 (comment)) are solved in dev. It would be great to understand the third one as well before cutting a patch release; if you are experiencing it, could you please provide the logs requested above? (@OneideLuizSchneider, you are the initial reporter of the restart issue; could you please provide additional logs as suggested above?)

If we cannot reproduce the third issue, we may release just the fixes for the first two. Maybe the third one is just an unhappy consequence of the first two, but it would be great to understand it.

@FedeDP (Contributor) commented Nov 11, 2024

@OneideLuizSchneider as Andrea said, can you enable libs logging and send us some logs?
I think it might be an issue related to, e.g., SELinux; let's see if the logs confirm this.

@OneideLuizSchneider (Author) commented Nov 12, 2024

@FedeDP @Andreagit97
Sorry for not getting back to you sooner; I was testing it and was no longer able to reproduce it (as I said here as well: #3323 (comment)).
I can send the logs if you still want to see them.

I'm using the image public.ecr.aws/falcosecurity/falco-no-driver:latest

I did test it on:

  • EKS 1.29.8
  • EKS 1.30.4
  • EKS 1.31.0

@tiny-pangolin commented:

Is there somewhere I can post debug logs? The full debug run is about 31,000 lines.

@Andreagit97 (Member) commented:

Hey @tiny-pangolin, you can upload a .txt file here on the issue, or create a gist, as you prefer.

@FedeDP (Contributor) commented Nov 21, 2024

Falco 0.39.2 is out, feel free to test it!
I will move this issue to 0.40.0 to track the only remaining problem :)
/milestone 0.40.0

@poiana poiana modified the milestones: 0.39.2, 0.40.0 Nov 21, 2024
@jordyb6 commented Nov 21, 2024

Regarding my issue above (AlmaLinux 8.7 hosts failing to restart Falco after config or rules changes): it turns out the VMs that failed to restart Falco didn't have enough free memory.

@PierreBart commented:

0.39.2 fixes the issue for me, thanks @FedeDP and @Andreagit97 for your help!

@OneideLuizSchneider (Author) commented:

@FedeDP @Andreagit97
I moved all the EKS worker nodes to Amazon Linux 2023, and there has not been a single restart since.
I tested many instance sizes, from t3.medium to m7a.4xlarge; all good.

@NachoxMacho commented:

(Quoting @Andreagit97's comment above about the three issue categories and the request for logs on the restart issue.)

I'm not the original reporter, but I ran into this issue as well on self-hosted Talos Linux nodes (version 1.8.1, Kubernetes 1.31.2). I noticed a pattern: Falco ran on nodes without Secure Boot enabled, but on nodes with Secure Boot I was getting this error message. It may be a separate issue, but here are debug logs from a good node and a bad node.

falco-logs-bad.log
falco-logs-good.log

@Andreagit97 (Member) commented:

@NachoxMacho this sounds like another issue, so I opened it here: #3416

@salem017 commented Nov 27, 2024

Hello, I get the same type of error.

auxmap__store_u32_param(auxmap, open_flags_to_scap(how.flags));
278: <invalid CO-RE relocation>
failed to resolve CO-RE relocation <byte_off> [1685] struct open_how.flags (0:0 @ offset 0)
processed 232 insns (limit 1000000) max_states_per_insn 0 total_states 13 peak_states 13 mark_read 5
-- END PROG LOAD LOG --
Wed Nov 27 13:23:03 2024: [libs]: libbpf: prog 'openat2_e': failed to load: -22
Wed Nov 27 13:23:03 2024: [libs]: libbpf: failed to load object 'bpf_probe'
Wed Nov 27 13:23:03 2024: [libs]: libbpf: failed to load BPF skeleton 'bpf_probe': -22
Wed Nov 27 13:23:03 2024: [libs]: libpman: failed to load BPF object (errno: 22 | message: Invalid argument)
Wed Nov 27 13:23:03 2024: An error occurred in an event source, forcing termination...
Wed Nov 27 13:23:03 2024: Stopping capture for event source 'syscall'
Wed Nov 27 13:23:03 2024: [libs]: 
n_evts:49

Best regards,

@Andreagit97 (Member) commented:

Thank you for reporting!
This is actually another issue; I opened it here: #3417

This issue tracks multiple restarts of Falco before a successful run; if you are facing other problems (like a verifier failure), please open a new GitHub issue.

@Jmorkcho commented Dec 3, 2024

Unfortunately, upgrading from 0.38.2 to 0.39.2 does not fix the issue for me...
Have you got any other suggestions?

% kubectl describe daemonset falco -n falco | grep 39.2
                app.kubernetes.io/version=0.39.2
    Image:      docker.io/falcosecurity/falco-driver-loader:0.39.2
    Image:      docker.io/falcosecurity/falco-no-driver:0.39.2
% kubectl logs falco-z2546 -n falco | tail -n 12
Defaulted container "falco" out of: falco, falcoctl-artifact-follow, falco-driver-loader (init), falcoctl-artifact-install (init)
If syscalls in rules include high volume syscalls (-> activate via `-A` flag), else syscalls may have been removed via base_syscalls option or might be associated with syscalls undefined on your architecture (https://marcin.juszkiewicz.com.pl/download/tables/syscalls.html)
Tue Dec  3 21:35:37 2024: The chosen syscall buffer dimension is: 8388608 bytes (8 MBs)
Tue Dec  3 21:35:37 2024: Starting health webserver with threadiness 8, listening on 0.0.0.0:8765
Tue Dec  3 21:35:37 2024: Loaded event sources: syscall
Tue Dec  3 21:35:37 2024: Enabled event sources: syscall
Tue Dec  3 21:35:37 2024: Opening 'syscall' source with modern BPF probe.
Tue Dec  3 21:35:37 2024: One ring buffer every '2' CPUs.
Tue Dec  3 21:35:37 2024: An error occurred in an event source, forcing termination...
Error: Initialization issues during scap_init
Events detected: 0
Rule counts by severity:
Triggered rules by rule name:

Edit: adding the version:

% kubectl logs falco-z2546 -n falco | grep "version:"     
Defaulted container "falco" out of: falco, falcoctl-artifact-follow, falco-driver-loader (init), falcoctl-artifact-install (init)
Tue Dec  3 21:40:44 2024: Falco version: 0.39.2 (x86_64)

@Andreagit97 (Member) commented:

Hey @Jmorkcho, can you share the Falco debug logs as explained here: #3323 (comment)?

@Jmorkcho commented Dec 4, 2024

The full log is here:
libs_enabled.log
Although this may be the root cause:

Wed Dec  4 12:59:19 2024: [libs]: libbpf: map 'bpf_prob.bss': created successfully, fd=15
Wed Dec  4 12:59:19 2024: [libs]: libbpf: Error setting initial map(bpf_prob.bss) contents: Cannot allocate memory
Wed Dec  4 12:59:19 2024: [libs]: libbpf: map 'bpf_prob.bss': failed to create: Cannot allocate memory(-12)
Wed Dec  4 12:59:19 2024: [libs]: libbpf: failed to load object 'bpf_probe'
Wed Dec  4 12:59:19 2024: [libs]: libbpf: failed to load BPF skeleton 'bpf_probe': -12
Wed Dec  4 12:59:19 2024: [libs]: libpman: failed to load BPF object (errno: 12 | message: Cannot allocate memory)
Wed Dec  4 12:59:19 2024: An error occurred in an event source, forcing termination...
Wed Dec  4 12:59:19 2024: [libs]: 
n_evts:49
n_drops:140723811443824
n_drops_buffer:0
n_drops_buffer_clone_fork_enter:663013304
n_drops_buffer_clone_fork_exit:667300712
n_drops_buffer_execve_enter:663722816
n_drops_buffer_execve_exit:0
n_drops_buffer_connect_enter:11684262
n_drops_buffer_connect_exit:60
n_drops_buffer_open_enter:605362731791990272
n_drops_buffer_open_exit:67
n_drops_buffer_dir_file_enter:18446744073709534816
n_drops_buffer_dir_file_exit:12
n_drops_buffer_other_interest_enter:24099682
n_drops_buffer_other_interest_exit:44
n_drops_buffer_close_exit:140723811444208
n_drops_buffer_proc_exit:140723811445760
n_drops_scratch_map:140011064102687
n_drops_pf:665319472
n_drops_bug:140011067460400
Wed Dec  4 12:59:19 2024: [libs]: total threads in the table:0, total fds in all threads:0
Error: Initialization issues during scap_init

@Andreagit97 (Member) commented:

(errno: 12 | message: Cannot allocate memory)

It seems you don't have enough memory to deploy Falco on that node.
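
A quick way to double-check (a sketch; the pod name comes from the logs above, and kubectl top requires metrics-server to be installed):

# find the node the crashing pod is scheduled on, then check its memory headroom
kubectl get pod -n falco falco-z2546 -o jsonpath='{.spec.nodeName}'
kubectl top node <node-name>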

@tiny-pangolin commented:

Are there documented minimum requirements for running Falco? I was planning on running Falco on small virtual machines with 2 vCPUs and 4 GB of memory.

@Jmorkcho commented Dec 6, 2024

^ I'm wondering the same, since at the time the pod starts there is almost 1 GB of free memory on the node, yet I still receive the CrashLoopBackOff/scap_init error.


@devasmith commented (this comment has been minimized; it is quoted in the reply below)

@Andreagit97 (Member) commented:

At the moment the current memory resources are like this 74.5G/1007G. This host have 96 CPU cores.

@devasmith This is pretty strange; Falco with modern eBPF requires by default 8 MB for every 2 CPUs.
So in your case, 96 CPUs / 2 * 8 MB = 384 MB, plus some memory allocated by Falco itself. BTW, the memory usage shouldn't reach the limits in the Helm chart:

          resources:
            limits:
              cpu: 1000m
              memory: 1024Mi
            requests:
              cpu: 100m
              memory: 512Mi

BTW yes, using buf_size_preset and cpus_for_each_buffer as knobs is a great strategy to calibrate Falco's memory usage; see the sizing sketch below.

@tiny-pangolin @Jmorkcho could you try the above two configs and see if you can find a fit for your environment?
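
As a rough sizing sketch (assuming the falco: values passthrough used elsewhere in this thread, and that the default buf_size_preset corresponds to the 8388608-byte / 8 MB buffers shown in the logs above), total ring-buffer memory is roughly (number of CPUs / cpus_for_each_buffer) * buffer size:

falco:
  modern_ebpf:
    # one ring buffer shared by every 2 CPUs (the default, per the logs above)
    cpus_for_each_buffer: 2
    # the default preset; yields the 8 MB per-buffer size shown in the logs above
    buf_size_preset: 4
# e.g. on a 96-CPU host with these defaults: (96 / 2) buffers * 8 MB = 384 MB
# of ring buffers, plus Falco's own allocations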

@devasmith commented:

(Quoting @Andreagit97's reply above.)

Thanks for the quick reply! I had actually previously lowered the resource limits, so that's why I hit it. But maybe this helps someone else. :)

@Jmorkcho commented:

Hello, @devasmith and @Andreagit97,

I've adjusted my config like this:

  kind: modern_ebpf
  modern_ebpf:
    buf_size_preset: 6
    cpus_for_each_buffer: 2
    drop_failed_exit: false

This was in order to address another problem we experience in our environment, namely dropped syscalls; in this article I found that increasing buf_size_preset might fix it.

I've also removed the resource limits, but there are still some pods that hit the scap_init problem, restart some number of times, and then run successfully.
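
(For what it's worth, the effective buffer size can be confirmed from the startup log line quoted earlier in this thread; a sketch, reusing the pod name from above:)

# confirm the buffer size Falco actually chose, and watch the restart count
kubectl logs -n falco falco-z2546 | grep "syscall buffer dimension"
kubectl get pods -n falco   # RESTARTS column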
