Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Linux: Address limitations in determining KASLR shifts by introducing VMCoreInfo support #1332

Open
wants to merge 20 commits into
base: develop
Choose a base branch
from

Conversation

gcmoreira
Copy link
Contributor

@gcmoreira gcmoreira commented Nov 4, 2024

This pull request addresses a limitation in the current method for determining KASLR and ASLR shifts by extracting these values from the VMCOREINFO ELF note. It also retrieves the DTB and selects the layer using only data from this ELF note and independently of any ISF symbol.

When present, VMCOREINFO offers a more accurate way to retrieve the current kernel parameters, making it the preferred method in tools like crash, makedumpfile, and drgn.

The KASLR and ASLR shifts issue

The existing scanning method can inaccurately calculate these shifts in certain situations. For instance, a QEMU memory dump demonstrates this issue.

$ ./vol.py \
    -f ./dump_ubuntu180464bit_4.15.0-213-generic_reptile.core \
    linux.pslist 
Volatility 3 Framework 2.11.0      
OFFSET (V)      PID     TID     PPID    COMM    File output


Volatility was unable to read a requested page:
Page error 0x8c1cbc89b82c in layer layer_name (Page Fault at entry 0x0 in table page directory pointer)

        * Memory smear during acquisition (try re-acquiring if possible)
        * An intentionally invalid page lookup (operating system protection)
        * A bug in the plugin/volatility3 (re-run with -vvv and file a bug)

No further results will be produced

This is because the sample being analyzed has not just one swapper string match, but four, with the last one being the correct match.

$ binwalk -R "swapper/0\x00\x00" ./dump_ubuntu180464bit_4.15.0-213-generic.core

DECIMAL       HEXADECIMAL     DESCRIPTION
--------------------------------------------------------------------------------
346048320     0x14A04740      Raw signature (swapper/0\x00\x00)
1516259136    0x5A604740      Raw signature (swapper/0\x00\x00)
1556105024    0x5CC04740      Raw signature (swapper/0\x00\x00)
1975535424    0x75C04740      Raw signature (swapper/0\x00\x00)

This causes the current find_aslr() scanning implementation to incorrectly calculate the KASLR/ASLR shifts, resulting in the entire analysis failing.

VMCoreInfo

The VMCoreInfo was introduced in Linux kernel 2.6.24 to assist user-space tools such as Crash and makedumpfile in analyzing the kernel's memory layout.
It provides all the essential information and kernel parameters needed to analyze a crash memory dump.

Volatility3 using VMCoreInfo

This PR enhances the Volatility 3 framework by adding support for searching the VMCoreInfo ELF note during initialization. It retrieves the ASLR shift and calculates the physical shift. If the VMCoreInfo ELF note is not found, it gracefully falls back to the traditional method.

Pros

  • Self-Sufficient: Work without reliance on external ISF, enabling early collection of all system parameters during framework initialization. This is especially useful for handling more complex architectures compared to x86-64, like AArch64.
  • Improved Speed: When successful, it delivers faster execution time than the conventional method. However, if VMCoreInfo is missing, there is additional overhead as both methods need to be executed.

Cons

  • Only works when VMCoreInfo is enable in the kernel.
  • It doesn't support kernels < 4.10 ( KERNELOFFSET was added in kernels 3.13, while phys_base as a value was added in 4.10). Given that kernel 4.10 was released in 2017, I think this limitation is relatively manageable.

Demo

$ time python3 ./vol.py \
    -f ./dump_ubuntu180464bit_4.15.0-213-generic_reptile.core \
    linux.pslist
Volatility 3 Framework 2.11.0
OFFSET (V)      PID     TID     PPID    COMM    File output

0x8c1cbc8997c0  1       1       0       systemd Disabled
0x8c1cbc89af80  2       2       0       kthreadd        Disabled
0x8c1cbc898000  3       3       2       kworker/0:0     Disabled
...
0x8c1cb7f897c0  1126    1126    1125    bash    Disabled

real    0m8.990s
user    0m7.880s
sys     0m0.982s

The VMCoreInfo plugin

Additionally, this PR introduces the VMCoreInfo plugin. It's a particular plugin that works without relying on an external ISF symbol file, needing only the elf.json already included with the framework. It supports any architecture, with the only requirement being that it must be little-endian. This can be addressed in future updates.

Demos

x86-64

$ ./vol.py -r pretty \
    -f ./dump_ubuntu180464bit_4.15.0-213-generic_reptile_infected.core \
    linux.vmcoreinfo
Volatility 3 Framework 2.11.0
  |         Offset |                                    Key |              Value
* | 0x96853c9f6000 |                              OSRELEASE | 4.15.0-213-generic
* | 0x96853c9f6000 |                               PAGESIZE |               4096
* | 0x96853c9f6000 |                    SYMBOL(init_uts_ns) | 0xffffffffb1c13280
* | 0x96853c9f6000 |                SYMBOL(node_online_map) | 0xffffffffb1e60f20
* | 0x96853c9f6000 |                 SYMBOL(swapper_pg_dir) | 0xffffffffb1c0a000
* | 0x96853c9f6000 |                         SYMBOL(_stext) | 0xffffffffb0800000
* | 0x96853c9f6000 |                 SYMBOL(vmap_area_list) | 0xffffffffb1d07bd0
* | 0x96853c9f6000 |                    SYMBOL(mem_section) | 0xffff96853ff2c000
* | 0x96853c9f6000 |                    LENGTH(mem_section) |               2048
* | 0x96853c9f6000 |                      SIZE(mem_section) |                 16
* | 0x96853c9f6000 |    OFFSET(mem_section.section_mem_map) |                  0
* | 0x96853c9f6000 |                             SIZE(page) |                 64
* | 0x96853c9f6000 |                      SIZE(pglist_data) |             172864
* | 0x96853c9f6000 |                             SIZE(zone) |               1664
* | 0x96853c9f6000 |                        SIZE(free_area) |                104
* | 0x96853c9f6000 |                        SIZE(list_head) |                 16
* | 0x96853c9f6000 |                       SIZE(nodemask_t) |                128
* | 0x96853c9f6000 |                     OFFSET(page.flags) |                  0
* | 0x96853c9f6000 |                 OFFSET(page._refcount) |                 28
* | 0x96853c9f6000 |                   OFFSET(page.mapping) |                  8
* | 0x96853c9f6000 |                       OFFSET(page.lru) |                 32
* | 0x96853c9f6000 |                 OFFSET(page._mapcount) |                 24
* | 0x96853c9f6000 |                   OFFSET(page.private) |                 48
* | 0x96853c9f6000 |             OFFSET(page.compound_dtor) |                 40
* | 0x96853c9f6000 |            OFFSET(page.compound_order) |                 44
* | 0x96853c9f6000 |             OFFSET(page.compound_head) |                 32
* | 0x96853c9f6000 |         OFFSET(pglist_data.node_zones) |                  0
* | 0x96853c9f6000 |           OFFSET(pglist_data.nr_zones) |             172192
* | 0x96853c9f6000 |     OFFSET(pglist_data.node_start_pfn) |             172200
* | 0x96853c9f6000 | OFFSET(pglist_data.node_spanned_pages) |             172216
* | 0x96853c9f6000 |            OFFSET(pglist_data.node_id) |             172224
* | 0x96853c9f6000 |                 OFFSET(zone.free_area) |                192
* | 0x96853c9f6000 |                   OFFSET(zone.vm_stat) |               1472
* | 0x96853c9f6000 |             OFFSET(zone.spanned_pages) |                112
* | 0x96853c9f6000 |            OFFSET(free_area.free_list) |                  0
* | 0x96853c9f6000 |                 OFFSET(list_head.next) |                  0
* | 0x96853c9f6000 |                 OFFSET(list_head.prev) |                  8
* | 0x96853c9f6000 |             OFFSET(vmap_area.va_start) |                  0
* | 0x96853c9f6000 |                 OFFSET(vmap_area.list) |                 48
* | 0x96853c9f6000 |                 LENGTH(zone.free_area) |                 11
* | 0x96853c9f6000 |                        SYMBOL(log_buf) | 0xffffffffb1c63ce0
* | 0x96853c9f6000 |                    SYMBOL(log_buf_len) | 0xffffffffb1c63cdc
* | 0x96853c9f6000 |                  SYMBOL(log_first_idx) | 0xffffffffb21ab4f8
* | 0x96853c9f6000 |                      SYMBOL(clear_idx) | 0xffffffffb21ab4c4
* | 0x96853c9f6000 |                   SYMBOL(log_next_idx) | 0xffffffffb21ab4e8
* | 0x96853c9f6000 |                       SIZE(printk_log) |                 16
* | 0x96853c9f6000 |             OFFSET(printk_log.ts_nsec) |                  0
* | 0x96853c9f6000 |                 OFFSET(printk_log.len) |                  8
* | 0x96853c9f6000 |            OFFSET(printk_log.text_len) |                 10
* | 0x96853c9f6000 |            OFFSET(printk_log.dict_len) |                 12
* | 0x96853c9f6000 |            LENGTH(free_area.free_list) |                  6
* | 0x96853c9f6000 |                  NUMBER(NR_FREE_PAGES) |                  0
* | 0x96853c9f6000 |                         NUMBER(PG_lru) |                  5
* | 0x96853c9f6000 |                     NUMBER(PG_private) |                 12
* | 0x96853c9f6000 |                   NUMBER(PG_swapcache) |                  9
* | 0x96853c9f6000 |                        NUMBER(PG_slab) |                  8
* | 0x96853c9f6000 |                    NUMBER(PG_hwpoison) |                 22
* | 0x96853c9f6000 |                   NUMBER(PG_head_mask) |              32768
* | 0x96853c9f6000 |      NUMBER(PAGE_BUDDY_MAPCOUNT_VALUE) |               -128
* | 0x96853c9f6000 |              NUMBER(HUGETLB_PAGE_DTOR) |                  2
* | 0x96853c9f6000 |                      NUMBER(phys_base) |          400556032
* | 0x96853c9f6000 |                   SYMBOL(init_top_pgt) | 0xffffffffb1c0a000
* | 0x96853c9f6000 |                      SYMBOL(node_data) | 0xffffffffb1e5c6c0
* | 0x96853c9f6000 |                      LENGTH(node_data) |               1024
* | 0x96853c9f6000 |                           KERNELOFFSET |         0x2f800000
* | 0x96853c9f6000 |              NUMBER(KERNEL_IMAGE_SIZE) |         1073741824

i686

$ ./vol.py -r pretty \
    -f ./ubuntu-4.9.0-19-686.core  \
    linux.vmcoreinfo
Volatility 3 Framework 2.11.0
  |     Offset |                                    Key |            Value
* | 0xcb92f140 |                              OSRELEASE | 4.9.0-19-686-pae
* | 0xcb92f140 |                               PAGESIZE |             4096
* | 0xcb92f140 |                    SYMBOL(init_uts_ns) |       0xcb7894a0
* | 0xcb92f140 |                SYMBOL(node_online_map) |       0xcb82209c
* | 0xcb92f140 |                 SYMBOL(swapper_pg_dir) |       0xcb8f5000
* | 0xcb92f140 |                         SYMBOL(_stext) |       0xcb000358
* | 0xcb92f140 |                 SYMBOL(vmap_area_list) |       0xcb7bae40
* | 0xcb92f140 |                        SYMBOL(mem_map) |       0xcb93bf84
* | 0xcb92f140 |               SYMBOL(contig_page_data) |       0xcb80b3c0
* | 0xcb92f140 |                             SIZE(page) |               36
* | 0xcb92f140 |                      SIZE(pglist_data) |             3520
* | 0xcb92f140 |                             SIZE(zone) |              768
* | 0xcb92f140 |                        SIZE(free_area) |               44
* | 0xcb92f140 |                        SIZE(list_head) |                8
* | 0xcb92f140 |                       SIZE(nodemask_t) |                4
* | 0xcb92f140 |                     OFFSET(page.flags) |                0
* | 0xcb92f140 |                 OFFSET(page._refcount) |               16
* | 0xcb92f140 |                   OFFSET(page.mapping) |                4
* | 0xcb92f140 |                       OFFSET(page.lru) |               20
* | 0xcb92f140 |                 OFFSET(page._mapcount) |               12
* | 0xcb92f140 |                   OFFSET(page.private) |               28
* | 0xcb92f140 |             OFFSET(page.compound_dtor) |               24
* | 0xcb92f140 |            OFFSET(page.compound_order) |               26
* | 0xcb92f140 |             OFFSET(page.compound_head) |               20
* | 0xcb92f140 |         OFFSET(pglist_data.node_zones) |                0
* | 0xcb92f140 |           OFFSET(pglist_data.nr_zones) |             3112
* | 0xcb92f140 |       OFFSET(pglist_data.node_mem_map) |             3116
* | 0xcb92f140 |     OFFSET(pglist_data.node_start_pfn) |             3124
* | 0xcb92f140 | OFFSET(pglist_data.node_spanned_pages) |             3132
* | 0xcb92f140 |            OFFSET(pglist_data.node_id) |             3136
* | 0xcb92f140 |                 OFFSET(zone.free_area) |              128
* | 0xcb92f140 |                   OFFSET(zone.vm_stat) |              704
* | 0xcb92f140 |             OFFSET(zone.spanned_pages) |               52
* | 0xcb92f140 |            OFFSET(free_area.free_list) |                0
* | 0xcb92f140 |                 OFFSET(list_head.next) |                0
* | 0xcb92f140 |                 OFFSET(list_head.prev) |                4
* | 0xcb92f140 |             OFFSET(vmap_area.va_start) |                0
* | 0xcb92f140 |                 OFFSET(vmap_area.list) |               24
* | 0xcb92f140 |                 LENGTH(zone.free_area) |               11
* | 0xcb92f140 |                        SYMBOL(log_buf) |       0xcb7a68c8
* | 0xcb92f140 |                    SYMBOL(log_buf_len) |       0xcb7a68c4
* | 0xcb92f140 |                  SYMBOL(log_first_idx) |       0xcb92d858
* | 0xcb92f140 |                      SYMBOL(clear_idx) |       0xcb92d82c
* | 0xcb92f140 |                   SYMBOL(log_next_idx) |       0xcb92d848
* | 0xcb92f140 |                       SIZE(printk_log) |               16
* | 0xcb92f140 |             OFFSET(printk_log.ts_nsec) |                0
* | 0xcb92f140 |                 OFFSET(printk_log.len) |                8
* | 0xcb92f140 |            OFFSET(printk_log.text_len) |               10
* | 0xcb92f140 |            OFFSET(printk_log.dict_len) |               12
* | 0xcb92f140 |            LENGTH(free_area.free_list) |                5
* | 0xcb92f140 |                  NUMBER(NR_FREE_PAGES) |                0
* | 0xcb92f140 |                         NUMBER(PG_lru) |                5
* | 0xcb92f140 |                     NUMBER(PG_private) |               11
* | 0xcb92f140 |                   NUMBER(PG_swapcache) |               15
* | 0xcb92f140 |                        NUMBER(PG_slab) |                7
* | 0xcb92f140 |                    NUMBER(PG_hwpoison) |               22
* | 0xcb92f140 |                   NUMBER(PG_head_mask) |            16384
* | 0xcb92f140 |      NUMBER(PAGE_BUDDY_MAPCOUNT_VALUE) |             -128
* | 0xcb92f140 |              NUMBER(KERNEL_IMAGE_SIZE) |        536870912
* | 0xcb92f140 |              NUMBER(HUGETLB_PAGE_DTOR) |                2
* | 0xcb92f140 |                         CONFIG_X86_PAE |                y

aarch64

AARCH64 uses NUMBER() with hex values, see line which seems odd, or at least it’s not what other kernel developers are doing on x86, where NUMBER is always a decimal number.
Anyway, fortunately, it includes the 0x prefix, allowing the API value parser to handle these cases correctly. Unfortunately, when linux.vmcoreinfo performs the inverse process, it cannot determine whether to display the value as a decimal or hexadecimal number. That’s why it shows lines like:

* | 0x40928000 |                      NUMBER(PHYS_OFFSET) |                     18446654373317574656

Nonetheless, it doesn’t hurt. Internally, the VMCOREINFO API converts all values to integers and manages them correctly.

$ ./vol.py -r pretty \
    -f ./ram_6.8.0-35-generic_aarch64.raw \
    linux.vmcoreinfo
Volatility 3 Framework 2.11.0
  |     Offset |                                      Key |                                    Value
* | 0x40928000 |                                OSRELEASE |                         6.8.0-35-generic
* | 0x40928000 |                                 BUILD-ID | f0c1677ab8b19cd28c8686a717772df3f8dc6d99
* | 0x40928000 |                                 PAGESIZE |                                     4096
* | 0x40928000 |                      SYMBOL(init_uts_ns) |                       0xffffd231dec63400
* | 0x40928000 |               OFFSET(uts_namespace.name) |                                        0
* | 0x40928000 |                  SYMBOL(node_online_map) |                       0xffffd231de7dec28
* | 0x40928000 |                   SYMBOL(swapper_pg_dir) |                       0xffffd231ddbd8000
* | 0x40928000 |                           SYMBOL(_stext) |                       0xffffd231db5c0000
* | 0x40928000 |                   SYMBOL(vmap_area_list) |                       0xffffd231de966b28
* | 0x40928000 |                      SYMBOL(mem_section) |                       0xffff51957fa0e5c0
* | 0x40928000 |                      LENGTH(mem_section) |                                     8192
* | 0x40928000 |                        SIZE(mem_section) |                                       16
* | 0x40928000 |      OFFSET(mem_section.section_mem_map) |                                        0
* | 0x40928000 |                NUMBER(SECTION_SIZE_BITS) |                                       27
* | 0x40928000 |                 NUMBER(MAX_PHYSMEM_BITS) |                                       48
* | 0x40928000 |                               SIZE(page) |                                       64
* | 0x40928000 |                        SIZE(pglist_data) |                                    23104
* | 0x40928000 |                               SIZE(zone) |                                     1984
* | 0x40928000 |                          SIZE(free_area) |                                      104
* | 0x40928000 |                          SIZE(list_head) |                                       16
* | 0x40928000 |                         SIZE(nodemask_t) |                                        8
* | 0x40928000 |                       OFFSET(page.flags) |                                        0
* | 0x40928000 |                   OFFSET(page._refcount) |                                       52
* | 0x40928000 |                     OFFSET(page.mapping) |                                       24
* | 0x40928000 |                         OFFSET(page.lru) |                                        8
* | 0x40928000 |                   OFFSET(page._mapcount) |                                       48
* | 0x40928000 |                     OFFSET(page.private) |                                       40
* | 0x40928000 |               OFFSET(page.compound_head) |                                        8
* | 0x40928000 |           OFFSET(pglist_data.node_zones) |                                        0
* | 0x40928000 |             OFFSET(pglist_data.nr_zones) |                                    20192
* | 0x40928000 |       OFFSET(pglist_data.node_start_pfn) |                                    20200
* | 0x40928000 |   OFFSET(pglist_data.node_spanned_pages) |                                    20216
* | 0x40928000 |              OFFSET(pglist_data.node_id) |                                    20224
* | 0x40928000 |                   OFFSET(zone.free_area) |                                      256
* | 0x40928000 |                     OFFSET(zone.vm_stat) |                                     1792
* | 0x40928000 |               OFFSET(zone.spanned_pages) |                                      152
* | 0x40928000 |              OFFSET(free_area.free_list) |                                        0
* | 0x40928000 |                   OFFSET(list_head.next) |                                        0
* | 0x40928000 |                   OFFSET(list_head.prev) |                                        8
* | 0x40928000 |               OFFSET(vmap_area.va_start) |                                        0
* | 0x40928000 |                   OFFSET(vmap_area.list) |                                       40
* | 0x40928000 |                   LENGTH(zone.free_area) |                                       14
* | 0x40928000 |                              SYMBOL(prb) |                       0xffffd231de821078
* | 0x40928000 |                 SYMBOL(printk_rb_static) |                       0xffffd231de821338
* | 0x40928000 |                        SYMBOL(clear_seq) |                       0xffffd231decfd168
* | 0x40928000 |                  SIZE(printk_ringbuffer) |                                       88
* | 0x40928000 |      OFFSET(printk_ringbuffer.desc_ring) |                                        0
* | 0x40928000 | OFFSET(printk_ringbuffer.text_data_ring) |                                       48
* | 0x40928000 |           OFFSET(printk_ringbuffer.fail) |                                       80
* | 0x40928000 |                      SIZE(prb_desc_ring) |                                       48
* | 0x40928000 |         OFFSET(prb_desc_ring.count_bits) |                                        0
* | 0x40928000 |              OFFSET(prb_desc_ring.descs) |                                        8
* | 0x40928000 |              OFFSET(prb_desc_ring.infos) |                                       16
* | 0x40928000 |            OFFSET(prb_desc_ring.head_id) |                                       24
* | 0x40928000 |            OFFSET(prb_desc_ring.tail_id) |                                       32
* | 0x40928000 |                           SIZE(prb_desc) |                                       24
* | 0x40928000 |               OFFSET(prb_desc.state_var) |                                        0
* | 0x40928000 |           OFFSET(prb_desc.text_blk_lpos) |                                        8
* | 0x40928000 |                  SIZE(prb_data_blk_lpos) |                                       16
* | 0x40928000 |          OFFSET(prb_data_blk_lpos.begin) |                                        0
* | 0x40928000 |           OFFSET(prb_data_blk_lpos.next) |                                        8
* | 0x40928000 |                        SIZE(printk_info) |                                       88
* | 0x40928000 |                  OFFSET(printk_info.seq) |                                        0
* | 0x40928000 |              OFFSET(printk_info.ts_nsec) |                                        8
* | 0x40928000 |             OFFSET(printk_info.text_len) |                                       16
* | 0x40928000 |            OFFSET(printk_info.caller_id) |                                       20
* | 0x40928000 |             OFFSET(printk_info.dev_info) |                                       24
* | 0x40928000 |                    SIZE(dev_printk_info) |                                       64
* | 0x40928000 |        OFFSET(dev_printk_info.subsystem) |                                        0
* | 0x40928000 |            LENGTH(printk_info_subsystem) |                                       16
* | 0x40928000 |           OFFSET(dev_printk_info.device) |                                       16
* | 0x40928000 |               LENGTH(printk_info_device) |                                       48
* | 0x40928000 |                      SIZE(prb_data_ring) |                                       32
* | 0x40928000 |          OFFSET(prb_data_ring.size_bits) |                                        0
* | 0x40928000 |               OFFSET(prb_data_ring.data) |                                        8
* | 0x40928000 |          OFFSET(prb_data_ring.head_lpos) |                                       16
* | 0x40928000 |          OFFSET(prb_data_ring.tail_lpos) |                                       24
* | 0x40928000 |                      SIZE(atomic_long_t) |                                        8
* | 0x40928000 |            OFFSET(atomic_long_t.counter) |                                        0
* | 0x40928000 |                        SIZE(latched_seq) |                                       24
* | 0x40928000 |                  OFFSET(latched_seq.val) |                                        8
* | 0x40928000 |              LENGTH(free_area.free_list) |                                        6
* | 0x40928000 |                    NUMBER(NR_FREE_PAGES) |                                        0
* | 0x40928000 |                           NUMBER(PG_lru) |                                        5
* | 0x40928000 |                       NUMBER(PG_private) |                                       15
* | 0x40928000 |                     NUMBER(PG_swapcache) |                                       12
* | 0x40928000 |                    NUMBER(PG_swapbacked) |                                       19
* | 0x40928000 |                          NUMBER(PG_slab) |                                       11
* | 0x40928000 |                      NUMBER(PG_hwpoison) |                                       22
* | 0x40928000 |                     NUMBER(PG_head_mask) |                                       64
* | 0x40928000 |        NUMBER(PAGE_BUDDY_MAPCOUNT_VALUE) |                                     -129
* | 0x40928000 |                       NUMBER(PG_hugetlb) |                                        8
* | 0x40928000 |      NUMBER(PAGE_OFFLINE_MAPCOUNT_VALUE) |                                     -257
* | 0x40928000 |                   SYMBOL(kallsyms_names) |                       0xffffd231dcf88838
* | 0x40928000 |                SYMBOL(kallsyms_num_syms) |                       0xffffd231dcf88830
* | 0x40928000 |             SYMBOL(kallsyms_token_table) |                       0xffffd231dd174e88
* | 0x40928000 |             SYMBOL(kallsyms_token_index) |                       0xffffd231dd175258
* | 0x40928000 |                 SYMBOL(kallsyms_offsets) |                       0xffffd231dd175458
* | 0x40928000 |           SYMBOL(kallsyms_relative_base) |                       0xffffd231dd212080
* | 0x40928000 |                          NUMBER(VA_BITS) |                                       48
* | 0x40928000 |                    NUMBER(MODULES_VADDR) |                     18446603336221196288
* | 0x40928000 |                      NUMBER(MODULES_END) |                     18446603338368679936
* | 0x40928000 |                    NUMBER(VMALLOC_START) |                     18446603338368679936
* | 0x40928000 |                      NUMBER(VMALLOC_END) |                     18446739675394605056
* | 0x40928000 |                    NUMBER(VMEMMAP_START) |                     18446739675663040512
* | 0x40928000 |                      NUMBER(VMEMMAP_END) |                     18446741874686296064
* | 0x40928000 |                   NUMBER(kimage_voffset) |                     18446693708645531648
* | 0x40928000 |                      NUMBER(PHYS_OFFSET) |                     18446654373317574656
* | 0x40928000 |                     NUMBER(TCR_EL1_T1SZ) |                                       16
* | 0x40928000 |                             KERNELOFFSET |                           0x52315b5b0000
* | 0x40928000 |                    NUMBER(KERNELPACMASK) |                                        0

@eve-mem
Copy link
Contributor

eve-mem commented Nov 4, 2024

This is extremely interesting! Great work!

It seems like with a little more digging we can use it to get to kallsyms using this method and then get all the symbols for the image.

Copy link
Member

@ikelos ikelos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks awesome, and I'm really excited to see it land! 5:D Just needs a version check at the point of use, and some confidence that we do some kind of verification of the results we get, and then it's all good to go in! 5:D

context, layer_name, progress_callback=progress_callback
)
if aslr_shifts:
kaslr_shift, aslr_shift = aslr_shifts
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the rest of the automagic ever validate these values in any way? If not, perhaps they should (checking for an ELF signature or mapping the virtual kernel to the physical one and checking a number of bytes match, just something to make sure the map works correctly)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so. But, if it already validates these values for find_aslr_classic(), it will also do so for find_aslr_vmcoreinfo() and vice versa.

To validate these values and, if it fails, proceed with a fallback action like trying the next potential candidate, we should convert the find_aslr_*() to return a generator and make a loop. Once the layer is created here try to do something with it to test if it works.

The find_aslr_vmcoreinfo() could easily test if the ASLR address is correct even without creating a layer. The VMCOREINFO provides the virtual address (with the aslr shift already applied) of many symbols.
For instance, we can check that the address provided by get_symbol("init_uts_ns"), which is the System.map value (without the shift applied) be equal to the address provided in VMCOREINFO -> SYMBOL(init_uts_ns) minus the KERNELOFFSET. If that doesn't match there's something wrong with that VMCOREINFO table and it will have to find the next one. We can validate one, some or all of them, for instance:

    SYMBOL(init_uts_ns)=ffffffffb99e82e0
    SYMBOL(node_online_map)=ffffffffb9a4b680
    SYMBOL(swapper_pg_dir)=ffffffffb963c000
    SYMBOL(_stext)=ffffffffb7200000
    SYMBOL(vmap_area_list)=ffffffffb983ce70
    SYMBOL(mem_section)=ffff8d57ff7d1000
    SYMBOL(prb)=ffffffffb96a24e0
    SYMBOL(printk_rb_static)=ffffffffb96a2500
    SYMBOL(clear_seq)=ffffffffba01d220
    SYMBOL(kallsyms_names)=ffffffffb89ace68
    SYMBOL(kallsyms_num_syms)=ffffffffb89ace60
    SYMBOL(kallsyms_token_table)=ffffffffb8c74fb0
    SYMBOL(kallsyms_token_index)=ffffffffb8c75338
    SYMBOL(kallsyms_offsets)=ffffffffb8c75538
    SYMBOL(kallsyms_relative_base)=ffffffffb8d3bc00
    SYMBOL(init_top_pgt)=ffffffffb963c000
    SYMBOL(node_data)=ffffffffb9a46720

However, this approach makes the VMCOREINFO implementation dependent on a symbol table, which IMO is a mistake. It would prevent future capabilities, such as retrieving symbols from kallsyms. VMCOREINFO should be the first step and must remain completely independent.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the other hand, I think that we could also move the find_aslr_vmcoreinfo() to be executed before and outside the layer.scan() banner loop. Since it's independent of the ISF, I think it should be here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, at the moment the classes are just trusted to have returned the right ones, but you're right it makes sense to do the double checking for either of them. Care to knock something up for that (you may have already, I'm just trying to get through all the PRs so I may have missed it, I feel one recently involved this bit of code...) 5;P

volatility3/framework/automagic/linux.py Outdated Show resolved Hide resolved
@gcmoreira
Copy link
Contributor Author

@ikelos I would also like to store the vmcoreinfo dict in a location that's accessible to other parts of the framework, including the core and plugins. What are your thoughts on this? Where do you think would be the best place for it?

@ikelos
Copy link
Member

ikelos commented Nov 13, 2024

Wouldn't it just be an extended type with an offset you could construct from a symbol? That feels like it should be a one liner to construct, and that would allow you to add helper functions to the class if needed? Anything else would likely require a fair bit of special casing to support properly? If it's something that explicitly needs locating, then a plugin to do the location and a static/classmethod other plugins can call to get to it?

Maximize use of VMCOREINFO data without reliance on ISF symbols:
- Obtain the DTB
- Utilize OSRELEASE (the same as UTS_RELEASE used in the Linux banner and init_uts_ns/new_utsname) to prefilter the list of Linux banners, reducing search time for linux_banner in memory.
- Find the correct layer using the VMCOREINFO data (including 32bit PAE).
@gcmoreira
Copy link
Contributor Author

Hey @ikelos, apologies for the post-review refactor, but I found a better approach and many improvements. In this latest commit, I moved the VMCoreInfo implementation into its own stacker class, leaving the original untouched (except for the virtual_to_physical_address that's now staticmethod) and making it easier to fall back to the traditional one if something goes wrong.

Apart from that, these are all the improvements from the previous implementation:

Maximize use of VMCOREINFO data without reliance on ISF symbols:

  • Obtain the DTB
  • Use OSRELEASE (the same as UTS_RELEASE used in the Linux banner and init_uts_ns/new_utsname) to prefilter the list of Linux banners, reducing search time for linux_banner in memory.
  • Find the correct layer using the VMCOREINFO data (including 32bit PAE).
  • Better docstings and comments to document the code

@Abyss-W4tcher
Copy link
Contributor

Abyss-W4tcher commented Nov 18, 2024

Hi, thanks for the awesome work as usual !

Before your PR, to handle potential new architectures in the future, I started to work on a local branch about a more scalable LinuxStacker, which can be easily modified to integrate new stackers.

Do you think it would be possible to directly scale your implementation here, where code is unified as much as possible ? Even if there is still only Intel for now, at least the ground work will be laid down and no complete rethinking will be needed later.

Something like that, as pseudocode:

class LinuxVMCOREINFOStacker(interfaces.automagic.StackerLayerInterface):
	def stack():
		arch_stacker_to_use : LinuxIntelVMCOREINFOStacker = cls._determine_arch()
		layer = arch_stacker_to_use.stack() 

class LinuxIntelVMCOREINFOStacker():
	def stack():
		# implies _vmcoreinfo_find_aslr() is a classmethod
		kaslr, aslr = LinuxVMCOREINFOStacker._vmcoreinfo_find_aslr()
		pass

# This might be done in another PR as it's not vmcore related ?
class LinuxStacker(interfaces.automagic.StackerLayerInterface):
	def stack():
		arch_stacker_to_use : LinuxIntelStacker = cls._determine_arch()
		layer = arch_stacker_to_use.stack() 

class LinuxIntelStacker():
	def stack():
		kaslr, aslr = LinuxStacker.find_aslr()
		pass

Thanks by advance, and feel free to tell me if something is not clear about what I'm asking or if it doesn't fit 👍.

@ikelos
Copy link
Member

ikelos commented Nov 18, 2024

Ok, before I review the refactored version, lemme get a couple quick points down:

  • It doesn't feel like a stacker? Stackers take a layer and stack them on top of another layer. This adds additional info to the process, and I'd run it as an automagic in its own right, prioritized to run after the stacker. I haven't checked, but it didn't sound like linux stacking required any info from the VMCOREINFO structure, so I'd just find it separately later? As I say, I may look through this and realize why that wouldn't work, but my first guess would be to do it that way.

  • Second, you wanted somewhere to stash the information so that other parts of the framework could find it. The layers have a special structure called metadata that should allow you to do that. It's setup to be stackable (as in, you can overwrite values in a higher layer and they should read down through the layers) but the data is freeform pretty much and just keyed off a specific name. For an example, this is how PAE layers flag themselves as PAE even though most everybody just checks for isinstance(IntelPAE) because at the moment there's only one PAE layer. This is accessed externally through the metadata property exposed on all layers (and chainmapped with lower layer's metadata to make sure it's cumulative). This is how crashdump passes up parameters from the data section of the crashdump file, so higher layers can find it...

I'll try to give this a review soon, but since it's affecting a core component it might take me a little while to go through it properly. I like the idea very much though, just not as its own stacker, either as part of the LinuxIntel stacker, or its own automagic...

@gcmoreira
Copy link
Contributor Author

Hey @ikelos

* It doesn't feel like a stacker?  Stackers take a layer and stack them on top of another layer.  This adds additional info to the process, and I'd run it as an automagic in its own right, prioritized to run after the stacker.  I haven't checked, but it didn't sound like linux stacking required any info from the VMCOREINFO structure, so I'd just find it separately later?  As I say, I may look through this and realize why that wouldn't work, but my first guess would be to do it that way.

It performs the same tasks as LinuxIntelStacker: finding ASLR offsets, the DTB, selecting the ISF, and returning the appropriate layer. The only difference is in how it locates these elements. LinuxIntelStacker find the init task and derives all other parameters based on it. This new approach pulls values directly from VMCOREINFO, providing a more direct solution.

@gcmoreira
Copy link
Contributor Author

* Second, you wanted somewhere to stash the information so that other parts of the framework could find it.  The layers have a special structure called `metadata` that should allow you to do that.  It's setup to be stackable (as in, you can overwrite values in a higher layer and they should read down through the layers) but the data is freeform pretty much and just keyed off a specific name.  

That's awesome, this worked like a charm:

    layer._direct_metadata = collections.ChainMap(
        {"vmcoreinfo": vmcoreinfo}, layer._direct_metadata
    )

@ikelos
Copy link
Member

ikelos commented Nov 18, 2024

Hmmmm, ok. If it serves the same purpose. I guess things can't rely on metadata existing (since there's no guarantee). I assume this is prioritized to run first and that if it succeeds, the second one bombs out immediately?

@gcmoreira
Copy link
Contributor Author

Hmmmm, ok. If it serves the same purpose. I guess things can't rely on metadata existing (since there's no guarantee). I assume this is prioritized to run first and that if it succeeds, the second one bombs out immediately?

Yeah, we'll need to check if that's presnet i.e layer.metadata.get("vmcoreinfo") is not None. Otherwise, we'll have to call the VMCOREINFO API again to obtain the VMCOREINFO dictionary. It's possible that LinuxIntelVMCOREINFOStacker may not succeed in finding a suitable and safe vmcoreinfo to set up the layer (e.g., it may not be able to obtain the ASLR offsets if the kernel is too old). However, the information available on it is still valid for other purposes.

The LinuxIntelVMCOREINFOStacker has higher priority than the LinuxIntelStacker, so it runs first. Since it returns a valid layer, the subsequent stackers are not executed.

@ikelos
Copy link
Member

ikelos commented Nov 18, 2024

I mean, it could get a bit sticky, but you may also need to consider virtualization, so the layer below this might be its own linux layer with its own vmcoreinfo rather than just a physical layer... Just a little gotcha to watch out for...

@gcmoreira
Copy link
Contributor Author

I mean, it could get a bit sticky, but you may also need to consider virtualization, so the layer below this might be its own linux layer with its own vmcoreinfo rather than just a physical layer... Just a little gotcha to watch out for...

Unless I'm missing something, since the the metadata is also stackable, I don't think that will be an issue. There will be two (or more) vmcoreinfo at different levels. The same way there more than one "architecture" key:

>>> layer.metadata._dict
ChainMap({'os': 'Linux'}, ChainMap({'architecture': 'Intel64'}, ChainMap({'architecture': 'Intel32'}, {'mapped': True}, {'architecture': 'Unknown', 'os': 'Unknown'})), <volatility3.framework.interfaces.objects.ReadOnlyMapping object at 0x70360bd48a90>)

As long as you are in the right layer, you will get the right vmcoreinfo. Same way, we get Intel64 and not Intel32 here:

>>> layer.metadata.architecture
'Intel64'

On the other hand, regarding virtualization :) ... I don't think LinuxIntelStacker supports it, does it? That could also be a challenge for LinuxIntelVMCOREINFOStacker. We'll need to investigate further to figure out how to properly identify OS instances and stack them correctly. So far, both methods use the first "valid" info they can find in memory.

@ikelos
Copy link
Member

ikelos commented Nov 18, 2024

Unless I'm missing something, since the the metadata is also stackable, I don't think that will be an issue. There will be two (or more) vmcoreinfo at different levels. The same way there more than one "architecture" key:

The concern was using the presence of it in the metadata to bomb out of trying to find it. Otherwise stacking should work as expected (and the lower layer can be interrogated for its vmcoreinfo if needed.

I don't think the linux stacker supports it yet, just thinking of my work on the VMCS stuff, and Intel stacked on Intel. Yeah, should be ok (the lower layers will only have access to their own version, but you might have to test one and see if it fits as to whether to keep searching...

@gcmoreira
Copy link
Contributor Author

The concern was using the presence of it in the metadata to bomb out of trying to find it.

Oh you mean if for instance, the current layer (let's say a guess OS) doesn't have a vmcoreinfo but the host (or the parent OS) does. If we do layer.metadata.get("vmcoreinfo") from a guess os layer, we will get the parent one, right?

Hmm, is there a way to check if the key is present at the top of the metadata? Alternatively, we could set the vmcoreinfo key to empty in the metadata if it wasn't obtained from LinuxIntelVMCOREINFOStacker. Is there a way to assign 'default' values, or do you have any suggestions on the best approach to handle this?

@ikelos
Copy link
Member

ikelos commented Nov 18, 2024

Yeah, that kind of issue.

Hmmm, well, either you can check the lower layers to make sure their vmcoreinfos are different, or you can assign it None if it fails. That won't get overwritten by different layers, but any linux layer would get it blatted over? Possibly worth doing in the alternate linux layer as well, so that if the automagic is ever turned off, you always get the value for a linux system, but it'll be None if the vmcoreinfo stuff failed?

@ikelos
Copy link
Member

ikelos commented Nov 18, 2024

As I say, if you traverse up for layer list, each layer will have its version of the metadata in it, so you could get back to old (now hidden) vmcoreinfos if needed...

@gcmoreira
Copy link
Contributor Author

How does the last commit look?

…o be modified; otherwise, it will only exist within the instance namespace.
@gcmoreira
Copy link
Contributor Author

Actually, there was a bug. It needs to be added in the class namespace. It should be fine now.

@gcmoreira
Copy link
Contributor Author

gcmoreira commented Nov 19, 2024

Actually, there was a bug. It needs to be added in the class namespace. It should be fine now.

I found what's happening. The layer instance returned by the stacker object is added to the context here. However, that's not the final context, it's a clone of it, see new_context.
There's a configuration merge here. So, the only thing it persists from the layer instances created in this process is the layer configurations; all other data is discarded. Is that okay?

  • Is it ok to update the __class__._direct_metadata? or
  • Should we persist this in the configuration?
  • Finally, should I create a new ChainMap as it is now, or simply set
    layer_class._direct_metadata["vmcoreinfo"] = vmcoreinfo

@ikelos
Copy link
Member

ikelos commented Nov 19, 2024

First off, don't update the class metadata, or it could bleed through into other layers that share the same class, it needs to be on a per-instance basis. Also, each layer creates its own chainmap, so you shouldn't need a new one, but there should be very little impact on making a new one if you really want or see a need to...

The reason for stashing the original context and rebuilding it at the end is because the stacker will try lots of layers, and we didn't want the context filling up with bogus layers that didn't work. There should be a way of getting the information across the gap, but I'm not sure how best to do it. We could:

  • Put the metadata into the configuration (this will need some thought, but would ensure the metadata is accessible if the configuration is loaded from a config file)
  • Change the stacker to pull and repopulate metadata (this should only copy the top chainmap, since the rest should get mapped during creation, and oughtn't to end up with references to old contexts/things that will get deleted, but it's possible?).
  • Re-find the information outside of the layer stacker code in a normal automagic (so the layer's been made already, this has the downside of not being saved/loaded as part of the config, unless the details are added as part of a config object)
  • Add a new type of requirement (although it would need to be optional, since not every type of layer will have a vmcoreinfo). Seems impractical.

I think I'm leaning towards the configuration option, but the downside with that is, the configuration intentionally only allows simple data values, so you'd need to split up the object into the bits you want/need. It might be tricky to access (although, to be fair, we should probably do that with the metadata, so that the object isn't constructed on a clone context that then deviates from the "real" context later on). So yeah, some questions that could do with answering at some point, but I still feel there's a way of achieving the goal, we just need to choose it carefully...

@gcmoreira
Copy link
Contributor Author

gcmoreira commented Nov 20, 2024

Sounds good! I updated the code at this line to instead include vmcoreinfo in the layer configuration, like this:

                    for key, value in vmcoreinfo.items():
                        layer.config[join("vmcoreinfo", key)] = value

However, when it's merged here, the vmcoreinfo config path is not merged in the context.

                context.config.merge(
                    path, new_context.layers[layer].build_configuration()
                )
...

>>> list(new_context.layers[layer].config)
['kernel_banner', 'memory_layer', 'page_map_offset', 'kernel_virtual_offset', 'vmcoreinfo.OSRELEASE', 'vmcoreinfo.BUILD-ID', 'vmcoreinfo.PAGESIZE', 'vmcoreinfo.SYMBOL(init_uts_ns)',  ...]

>>> list(context.config)
['automagic.LayerStacker.single_location', 'automagic.LayerStacker.stackers', 'plugins.PsList.threads', 'plugins.PsList.decorate_comm', 'plugins.PsList.dump', 'plugins.PsList.pid', 'plugins.PsList.elfs.elfs', 'plugins.PsList.elfs.pslist.pslist', 'plugins.PsList.elfs.pslist.elfs.elfs', 'plugins.PsList.kernel.layer_name.page_map_offset', 'plugins.PsList.kernel.layer_name.kernel_virtual_offset', 'plugins.PsList.kernel.layer_name.kernel_banner', 'plugins.PsList.kernel.layer_name.class', 'plugins.PsList.kernel.layer_name.memory_layer.class', 'plugins.PsList.kernel.layer_name.memory_layer.base_layer.location', 'plugins.PsList.kernel.layer_name.memory_layer.base_layer.class']

>>> list(new_context.layers[layer].build_configuration())
['page_map_offset', 'kernel_virtual_offset', 'kernel_banner', 'class', 'memory_layer.class', 'memory_layer.base_layer.location', 'memory_layer.base_layer.class']

And if I take that layer from an automagic that runs after LayerStacker, the vmcoreinfo is not there.

>>> layer_name
'layer_name'

>>> context.layers[layer_name].config
['page_map_offset', 'kernel_virtual_offset', 'kernel_banner', 'class', 'memory_layer', 'memory_layer.class', 'memory_layer.base_layer', 'memory_layer.isf_url', 'memory_layer.symbol_mask', 'memory_layer.base_layer.location', 'memory_layer.base_layer.class']

any clue?

EDIT: Ok, it's only merging the values required by the Intel layer here.
So, we could create new mixins for Linux like my other PR does here and add vmcoreinfo as an optional requirement, however, if the configuration doesn't accept dicts, list of lists or tuples, adding a dictionary would be challenging; unless we serialize it, such as by pickling it and store it as string? That won’t work, for example, for saving the configuration to disk, but it could be suitable for moving data internally. Too horrible workaround :) ? It works, from an automagic:

>>> layer_name
'layer_name'
>>> layer = context.layers[layer_name]
>>> pickle.loads(layer.config["vmcoreinfo"])
{'OSRELEASE': '6.8.0-41-generic', 'BUILD-ID': '60a8f2b523b8e496d3358c463af50fcfa301a572', 'PAGESIZE': 4096, 'SYMBOL(init_uts_ns)': 184467440717779769 ...

@gcmoreira
Copy link
Contributor Author

Hey @ikelos, in the last commit, I reverted the changes related to storing the vmcoreinfo. It's not needed at this stage, and including it would delay and block this PR. Let's proceed without that.

Copy link
Member

@ikelos ikelos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, this looks pretty good, just a couple minor points (possibly from some edits/experiments at some point). Would probably've been quicker/easier to review as a plugin and then the automagic bit, but happy to keep them all in one PR if you'd prefer.

volatility3/framework/automagic/linux.py Outdated Show resolved Hide resolved
volatility3/framework/automagic/linux.py Outdated Show resolved Hide resolved
volatility3/framework/automagic/linux.py Show resolved Hide resolved
volatility3/framework/automagic/linux.py Outdated Show resolved Hide resolved
volatility3/framework/automagic/linux.py Outdated Show resolved Hide resolved
"""Converts the input VMCoreInfo data buffer into a dictionary"""

# Ensure the whole buffer is printable
if not all(c in string.printable.encode() for c in vmcoreinfo_data):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a big fan of single letter variable names... 5:S Could I tempt you to use char (or even chr) instead of c, pretty please? 5:)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Also, still hoping to get this changed). 5;)

@gcmoreira
Copy link
Contributor Author

@ikelos I think everything was addressed. It's ready to go.

Copy link
Member

@ikelos ikelos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking really good! Just a couple small points, but not a lot left to get it across the line! 5:)


# Never stack on top of an intel layer
# FIXME: Find a way to improve this check
if isinstance(layer, intel.Intel):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, this will go wrong for virtualized hosts. We're still a little way off supporting them yet, but definitely one not to forget about...

context,
config_path=config_path,
name=new_layer_name,
metadata={"os": "Linux"},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had you wanted to pass the vmcore through the metadata? If so, here's where you'd do it, I guess? Don't remember if we mind metadata being complex types...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Scratch that, I just read back through the old comments and know why that won't work.. We probably should pull the metadata across from the constructed layer after stacking (although loading it from a config won't pull it across, so it might still require a mechanism to repopulate somehow)...

# in kernels 4.13 in 65ade2f872b474fa8a04c2d397783350326634e6) to init_top_pgt.
# In x86-32, the pgd is swapper_pg_dir. So, in any case, for VMCOREINFO
# SYMBOL(swapper_pg_dir) will always have the right value.
dtb_vaddr = vmcoreinfo.get("SYMBOL(swapper_pg_dir)")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given we use this twice, should it not be a property or something?

):
for key, value in vmcoreinfo.items():
if key.startswith("SYMBOL(") or key == "KERNELOFFSET":
value = f"0x{value:x}"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Think this is the same as f"{value:#x}"?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants