| CVE |
Vendors |
Products |
Updated |
CVSS v3.1 |
| In the Linux kernel, the following vulnerability has been resolved:
mm: call ->free_folio() directly in folio_unmap_invalidate()
We can only call filemap_free_folio() if we have a reference to (or hold a
lock on) the mapping. Otherwise, we've already removed the folio from the
mapping so it no longer pins the mapping and the mapping can be removed,
causing a use-after-free when accessing mapping->a_ops.
Follow the same pattern as __remove_mapping() and load the free_folio
function pointer before dropping the lock on the mapping. That lets us
make filemap_free_folio() static as this was the only caller outside
filemap.c. |
| In the Linux kernel, the following vulnerability has been resolved:
KVM: SEV: Lock all vCPUs when synchronzing VMSAs for SNP launch finish
Lock all vCPUs when synchronizing and encrypting VMSAs for SNP guests, as
allowing userspace to manipulate and/or run a vCPU while its state is being
synchronized would at best corrupt vCPU state, and at worst crash the host
kernel.
Opportunistically assert that vcpu->mutex is held when synchronizing its
VMSA (the SEV-ES path already locks vCPUs). |
| In the Linux kernel, the following vulnerability has been resolved:
KVM: SEV: Protect *all* of sev_mem_enc_register_region() with kvm->lock
Take and hold kvm->lock for before checking sev_guest() in
sev_mem_enc_register_region(), as sev_guest() isn't stable unless kvm->lock
is held (or KVM can guarantee KVM_SEV_INIT{2} has completed and can't
rollack state). If KVM_SEV_INIT{2} fails, KVM can end up trying to add to
a not-yet-initialized sev->regions_list, e.g. triggering a #GP
Oops: general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] SMP KASAN NOPTI
KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
CPU: 110 UID: 0 PID: 72717 Comm: syz.15.11462 Tainted: G U W O 6.16.0-smp-DEV #1 NONE
Tainted: [U]=USER, [W]=WARN, [O]=OOT_MODULE
Hardware name: Google, Inc. Arcadia_IT_80/Arcadia_IT_80, BIOS 12.52.0-0 10/28/2024
RIP: 0010:sev_mem_enc_register_region+0x3f0/0x4f0 ../include/linux/list.h:83
Code: <41> 80 3c 04 00 74 08 4c 89 ff e8 f1 c7 a2 00 49 39 ed 0f 84 c6 00
RSP: 0018:ffff88838647fbb8 EFLAGS: 00010256
RAX: dffffc0000000000 RBX: 1ffff92015cf1e0b RCX: dffffc0000000000
RDX: 0000000000000000 RSI: 0000000000001000 RDI: ffff888367870000
RBP: ffffc900ae78f050 R08: ffffea000d9e0007 R09: 1ffffd4001b3c000
R10: dffffc0000000000 R11: fffff94001b3c001 R12: 0000000000000000
R13: ffff8982ab0bde00 R14: ffffc900ae78f058 R15: 0000000000000000
FS: 00007f34e9dc66c0(0000) GS:ffff89ee64d33000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fe180adef98 CR3: 000000047210e000 CR4: 0000000000350ef0
Call Trace:
<TASK>
kvm_arch_vm_ioctl+0xa72/0x1240 ../arch/x86/kvm/x86.c:7371
kvm_vm_ioctl+0x649/0x990 ../virt/kvm/kvm_main.c:5363
__se_sys_ioctl+0x101/0x170 ../fs/ioctl.c:51
do_syscall_x64 ../arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0x6f/0x1f0 ../arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7f34e9f7e9a9
Code: <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f34e9dc6038 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00007f34ea1a6080 RCX: 00007f34e9f7e9a9
RDX: 0000200000000280 RSI: 000000008010aebb RDI: 0000000000000007
RBP: 00007f34ea000d69 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 0000000000000000 R14: 00007f34ea1a6080 R15: 00007ffce77197a8
</TASK>
with a syzlang reproducer that looks like:
syz_kvm_add_vcpu$x86(0x0, &(0x7f0000000040)={0x0, &(0x7f0000000180)=ANY=[], 0x70}) (async)
syz_kvm_add_vcpu$x86(0x0, &(0x7f0000000080)={0x0, &(0x7f0000000180)=ANY=[@ANYBLOB="..."], 0x4f}) (async)
r0 = openat$kvm(0xffffffffffffff9c, &(0x7f0000000200), 0x0, 0x0)
r1 = ioctl$KVM_CREATE_VM(r0, 0xae01, 0x0)
r2 = openat$kvm(0xffffffffffffff9c, &(0x7f0000000240), 0x0, 0x0)
r3 = ioctl$KVM_CREATE_VM(r2, 0xae01, 0x0)
ioctl$KVM_SET_CLOCK(r3, 0xc008aeba, &(0x7f0000000040)={0x1, 0x8, 0x0, 0x5625e9b0}) (async)
ioctl$KVM_SET_PIT2(r3, 0x8010aebb, &(0x7f0000000280)={[...], 0x5}) (async)
ioctl$KVM_SET_PIT2(r1, 0x4070aea0, 0x0) (async)
r4 = ioctl$KVM_CREATE_VM(0xffffffffffffffff, 0xae01, 0x0)
openat$kvm(0xffffffffffffff9c, 0x0, 0x0, 0x0) (async)
ioctl$KVM_SET_USER_MEMORY_REGION(r4, 0x4020ae46, &(0x7f0000000400)={0x0, 0x0, 0x0, 0x2000, &(0x7f0000001000/0x2000)=nil}) (async)
r5 = ioctl$KVM_CREATE_VCPU(r4, 0xae41, 0x2)
close(r0) (async)
openat$kvm(0xffffffffffffff9c, &(0x7f0000000000), 0x8000, 0x0) (async)
ioctl$KVM_SET_GUEST_DEBUG(r5, 0x4048ae9b, &(0x7f0000000300)={0x4376ea830d46549b, 0x0, [0x46, 0x0, 0x0, 0x0, 0x0, 0x1000]}) (async)
ioctl$KVM_RUN(r5, 0xae80, 0x0)
Opportunistically use guard() to avoid having to define a new error label
and goto usage. |
| In the Linux kernel, the following vulnerability has been resolved:
KVM: SEV: Reject attempts to sync VMSA of an already-launched/encrypted vCPU
Reject synchronizing vCPU state to its associated VMSA if the vCPU has
already been launched, i.e. if the VMSA has already been encrypted. On a
host with SNP enabled, accessing guest-private memory generates an RMP #PF
and panics the host.
BUG: unable to handle page fault for address: ff1276cbfdf36000
#PF: supervisor write access in kernel mode
#PF: error_code(0x80000003) - RMP violation
PGD 5a31801067 P4D 5a31802067 PUD 40ccfb5063 PMD 40e5954063 PTE 80000040fdf36163
SEV-SNP: PFN 0x40fdf36, RMP entry: [0x6010fffffffff001 - 0x000000000000001f]
Oops: Oops: 0003 [#1] SMP NOPTI
CPU: 33 UID: 0 PID: 996180 Comm: qemu-system-x86 Tainted: G OE
Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
Hardware name: Dell Inc. PowerEdge R7625/0H1TJT, BIOS 1.5.8 07/21/2023
RIP: 0010:sev_es_sync_vmsa+0x54/0x4c0 [kvm_amd]
Call Trace:
<TASK>
snp_launch_update_vmsa+0x19d/0x290 [kvm_amd]
snp_launch_finish+0xb6/0x380 [kvm_amd]
sev_mem_enc_ioctl+0x14e/0x720 [kvm_amd]
kvm_arch_vm_ioctl+0x837/0xcf0 [kvm]
kvm_vm_ioctl+0x3fd/0xcc0 [kvm]
__x64_sys_ioctl+0xa3/0x100
x64_sys_call+0xfe0/0x2350
do_syscall_64+0x81/0x10f0
entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7ffff673287d
</TASK>
Note, the KVM flaw has been present since commit ad73109ae7ec ("KVM: SVM:
Provide support to launch and run an SEV-ES guest"), but has only been
actively dangerous for the host since SNP support was added. With SEV-ES,
KVM would "just" clobber guest state, which is totally fine from a host
kernel perspective since userspace can clobber guest state any time before
sev_launch_update_vmsa(). |
| In the Linux kernel, the following vulnerability has been resolved:
arm64: mm: Handle invalid large leaf mappings correctly
It has been possible for a long time to mark ptes in the linear map as
invalid. This is done for secretmem, kfence, realm dma memory un/share,
and others, by simply clearing the PTE_VALID bit. But until commit
a166563e7ec37 ("arm64: mm: support large block mapping when
rodata=full") large leaf mappings were never made invalid in this way.
It turns out various parts of the code base are not equipped to handle
invalid large leaf mappings (in the way they are currently encoded) and
I've observed a kernel panic while booting a realm guest on a
BBML2_NOABORT system as a result:
[ 15.432706] software IO TLB: Memory encryption is active and system is using DMA bounce buffers
[ 15.476896] Unable to handle kernel paging request at virtual address ffff000019600000
[ 15.513762] Mem abort info:
[ 15.527245] ESR = 0x0000000096000046
[ 15.548553] EC = 0x25: DABT (current EL), IL = 32 bits
[ 15.572146] SET = 0, FnV = 0
[ 15.592141] EA = 0, S1PTW = 0
[ 15.612694] FSC = 0x06: level 2 translation fault
[ 15.640644] Data abort info:
[ 15.661983] ISV = 0, ISS = 0x00000046, ISS2 = 0x00000000
[ 15.694875] CM = 0, WnR = 1, TnD = 0, TagAccess = 0
[ 15.723740] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[ 15.755776] swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000081f3f000
[ 15.800410] [ffff000019600000] pgd=0000000000000000, p4d=180000009ffff403, pud=180000009fffe403, pmd=00e8000199600704
[ 15.855046] Internal error: Oops: 0000000096000046 [#1] SMP
[ 15.886394] Modules linked in:
[ 15.900029] CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 7.0.0-rc4-dirty #4 PREEMPT
[ 15.935258] Hardware name: linux,dummy-virt (DT)
[ 15.955612] pstate: 21400005 (nzCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
[ 15.986009] pc : __pi_memcpy_generic+0x128/0x22c
[ 16.006163] lr : swiotlb_bounce+0xf4/0x158
[ 16.024145] sp : ffff80008000b8f0
[ 16.038896] x29: ffff80008000b8f0 x28: 0000000000000000 x27: 0000000000000000
[ 16.069953] x26: ffffb3976d261ba8 x25: 0000000000000000 x24: ffff000019600000
[ 16.100876] x23: 0000000000000001 x22: ffff0000043430d0 x21: 0000000000007ff0
[ 16.131946] x20: 0000000084570010 x19: 0000000000000000 x18: ffff00001ffe3fcc
[ 16.163073] x17: 0000000000000000 x16: 00000000003fffff x15: 646e612065766974
[ 16.194131] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
[ 16.225059] x11: 0000000000000000 x10: 0000000000000010 x9 : 0000000000000018
[ 16.256113] x8 : 0000000000000018 x7 : 0000000000000000 x6 : 0000000000000000
[ 16.287203] x5 : ffff000019607ff0 x4 : ffff000004578000 x3 : ffff000019600000
[ 16.318145] x2 : 0000000000007ff0 x1 : ffff000004570010 x0 : ffff000019600000
[ 16.349071] Call trace:
[ 16.360143] __pi_memcpy_generic+0x128/0x22c (P)
[ 16.380310] swiotlb_tbl_map_single+0x154/0x2b4
[ 16.400282] swiotlb_map+0x5c/0x228
[ 16.415984] dma_map_phys+0x244/0x2b8
[ 16.432199] dma_map_page_attrs+0x44/0x58
[ 16.449782] virtqueue_map_page_attrs+0x38/0x44
[ 16.469596] virtqueue_map_single_attrs+0xc0/0x130
[ 16.490509] virtnet_rq_alloc.isra.0+0xa4/0x1fc
[ 16.510355] try_fill_recv+0x2a4/0x584
[ 16.526989] virtnet_open+0xd4/0x238
[ 16.542775] __dev_open+0x110/0x24c
[ 16.558280] __dev_change_flags+0x194/0x20c
[ 16.576879] netif_change_flags+0x24/0x6c
[ 16.594489] dev_change_flags+0x48/0x7c
[ 16.611462] ip_auto_config+0x258/0x1114
[ 16.628727] do_one_initcall+0x80/0x1c8
[ 16.645590] kernel_init_freeable+0x208/0x2f0
[ 16.664917] kernel_init+0x24/0x1e0
[ 16.680295] ret_from_fork+0x10/0x20
[ 16.696369] Code: 927cec03 cb0e0021 8b0e0042 a9411c26 (a900340c)
[ 16.723106] ---[ end trace 0000000000000000 ]---
[ 16.752866] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[ 16.792556] Kernel Offset: 0x3396ea200000 from 0xffff8000800000
---truncated--- |
| In the Linux kernel, the following vulnerability has been resolved:
vfio/xe: Reorganize the init to decouple migration from reset
Attempting to issue reset on VF devices that don't support migration
leads to the following:
BUG: unable to handle page fault for address: 00000000000011f8
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: Oops: 0000 [#1] SMP NOPTI
CPU: 2 UID: 0 PID: 7443 Comm: xe_sriov_flr Tainted: G S U 7.0.0-rc1-lgci-xe-xe-4588-cec43d5c2696af219-nodebug+ #1 PREEMPT(lazy)
Tainted: [S]=CPU_OUT_OF_SPEC, [U]=USER
Hardware name: Intel Corporation Alder Lake Client Platform/AlderLake-P DDR4 RVP, BIOS RPLPFWI1.R00.4035.A00.2301200723 01/20/2023
RIP: 0010:xe_sriov_vfio_wait_flr_done+0xc/0x80 [xe]
Code: ff c3 cc cc cc cc 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 55 48 89 e5 41 54 53 <83> bf f8 11 00 00 02 75 61 41 89 f4 85 f6 74 52 48 8b 47 08 48 89
RSP: 0018:ffffc9000f7c39b8 EFLAGS: 00010202
RAX: ffffffffa04d8660 RBX: ffff88813e3e4000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: ffffc9000f7c39c8 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffff888101a48800
R13: ffff88813e3e4150 R14: ffff888130d0d008 R15: ffff88813e3e40d0
FS: 00007877d3d0d940(0000) GS:ffff88890b6d3000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000000011f8 CR3: 000000015a762000 CR4: 0000000000f52ef0
PKRU: 55555554
Call Trace:
<TASK>
xe_vfio_pci_reset_done+0x49/0x120 [xe_vfio_pci]
pci_dev_restore+0x3b/0x80
pci_reset_function+0x109/0x140
reset_store+0x5c/0xb0
dev_attr_store+0x17/0x40
sysfs_kf_write+0x72/0x90
kernfs_fop_write_iter+0x161/0x1f0
vfs_write+0x261/0x440
ksys_write+0x69/0xf0
__x64_sys_write+0x19/0x30
x64_sys_call+0x259/0x26e0
do_syscall_64+0xcb/0x1500
? __fput+0x1a2/0x2d0
? fput_close_sync+0x3d/0xa0
? __x64_sys_close+0x3e/0x90
? x64_sys_call+0x1b7c/0x26e0
? do_syscall_64+0x109/0x1500
? __task_pid_nr_ns+0x68/0x100
? __do_sys_getpid+0x1d/0x30
? x64_sys_call+0x10b5/0x26e0
? do_syscall_64+0x109/0x1500
? putname+0x41/0x90
? do_faccessat+0x1e8/0x300
? __x64_sys_access+0x1c/0x30
? x64_sys_call+0x1822/0x26e0
? do_syscall_64+0x109/0x1500
? tick_program_event+0x43/0xa0
? hrtimer_interrupt+0x126/0x260
? irqentry_exit+0xb2/0x710
entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7877d5f1c5a4
Code: c7 00 16 00 00 00 b8 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 80 3d a5 ea 0e 00 00 74 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 55 48 89 e5 48 83 ec 20 48 89
RSP: 002b:00007fff48e5f908 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007877d5f1c5a4
RDX: 0000000000000001 RSI: 00007877d621b0c9 RDI: 0000000000000009
RBP: 0000000000000001 R08: 00005fb49113b010 R09: 0000000000000007
R10: 0000000000000000 R11: 0000000000000202 R12: 00007877d621b0c9
R13: 0000000000000009 R14: 00007fff48e5fac0 R15: 00007fff48e5fac0
</TASK>
This is caused by the fact that some of the xe_vfio_pci_core_device
members needed for handling reset are only initialized as part of
migration init.
Fix the problem by reorganizing the code to decouple VF init from
migration init. |
| In the Linux kernel, the following vulnerability has been resolved:
wifi: rtw88: fix device leak on probe failure
Driver core holds a reference to the USB interface and its parent USB
device while the interface is bound to a driver and there is no need to
take additional references unless the structures are needed after
disconnect.
This driver takes a reference to the USB device during probe but does
not to release it on all probe errors (e.g. when descriptor parsing
fails).
Drop the redundant device reference to fix the leak, reduce cargo
culting, make it easier to spot drivers where an extra reference is
needed, and reduce the risk of further memory leaks. |
| In the Linux kernel, the following vulnerability has been resolved:
fbdev: udlfb: avoid divide-by-zero on FBIOPUT_VSCREENINFO
Much like commit 19f953e74356 ("fbdev: fb_pm2fb: Avoid potential divide
by zero error"), we also need to prevent that same crash from happening
in the udlfb driver as it uses pixclock directly when dividing, which
will crash. |
| In the Linux kernel, the following vulnerability has been resolved:
smb: client: avoid double-free in smbd_free_send_io() after smbd_send_batch_flush()
smbd_send_batch_flush() already calls smbd_free_send_io(),
so we should not call it again after smbd_post_send()
moved it to the batch list. |
| In the Linux kernel, the following vulnerability has been resolved:
ksmbd: validate EaNameLength in smb2_get_ea()
smb2_get_ea() reads ea_req->EaNameLength from the client request and
passes it directly to strncmp() as the comparison length without
verifying that the length of the name really is the size of the input
buffer received.
Fix this up by properly checking the size of the name based on the value
received and the overall size of the request, to prevent a later
strncmp() call to use the length as a "trusted" size of the buffer.
Without this check, uninitialized heap values might be slowly leaked to
the client. |
| In the Linux kernel, the following vulnerability has been resolved:
usb: gadget: f_phonet: fix skb frags[] overflow in pn_rx_complete()
A broken/bored/mean USB host can overflow the skb_shared_info->frags[]
array on a Linux gadget exposing a Phonet function by sending an
unbounded sequence of full-page OUT transfers.
pn_rx_complete() finalizes the skb only when req->actual < req->length,
where req->length is set to PAGE_SIZE by the gadget. If the host always
sends exactly PAGE_SIZE bytes per transfer, fp->rx.skb will never be
reset and each completion will add another fragment via
skb_add_rx_frag(). Once nr_frags exceeds MAX_SKB_FRAGS (default 17),
subsequent frag stores overwrite memory adjacent to the shinfo on the
heap.
Drop the skb and account a length error when the frag limit is reached,
matching the fix applied in t7xx by commit f0813bcd2d9d ("net: wwan:
t7xx: fix potential skb->frags overflow in RX path"). |
| In the Linux kernel, the following vulnerability has been resolved:
fbdev: tdfxfb: avoid divide-by-zero on FBIOPUT_VSCREENINFO
Much like commit 19f953e74356 ("fbdev: fb_pm2fb: Avoid potential divide
by zero error"), we also need to prevent that same crash from happening
in the udlfb driver as it uses pixclock directly when dividing, which
will crash. |
| In the Linux kernel, the following vulnerability has been resolved:
bnge: return after auxiliary_device_uninit() in error path
When auxiliary_device_add() fails, the error block calls
auxiliary_device_uninit() but does not return. The uninit drops the
last reference and synchronously runs bnge_aux_dev_release(), which sets
bd->auxr_dev = NULL and frees the underlying object. The subsequent
bd->auxr_dev->net = bd->netdev then dereferences NULL, which is not a
good thing to have happen when trying to clean up from an error.
Add the missing return, as the auxiliary bus documentation states is a
requirement (seems that LLM tools read documentation better than humans
do...) |
| In the Linux kernel, the following vulnerability has been resolved:
x86/CPU: Fix FPDSS on Zen1
Zen1's hardware divider can leave, under certain circumstances, partial
results from previous operations. Those results can be leaked by
another, attacker thread.
Fix that with a chicken bit. |
| In the Linux kernel, the following vulnerability has been resolved:
rxrpc: Fix buffer overread in rxgk_do_verify_authenticator()
Fix rxgk_do_verify_authenticator() to check the buffer size before checking
the nonce. |
| In the Linux kernel, the following vulnerability has been resolved:
rxrpc: Fix integer overflow in rxgk_verify_response()
In rxgk_verify_response(), there's a potential integer overflow due to
rounding up token_len before checking it, thereby allowing the length check to
be bypassed.
Fix this by checking the unrounded value against len too (len is limited as
the response must fit in a single UDP packet). |
| In the Linux kernel, the following vulnerability has been resolved:
rxrpc: fix reference count leak in rxrpc_server_keyring()
This patch fixes a reference count leak in rxrpc_server_keyring()
by checking if rx->securities is already set. |
| In the Linux kernel, the following vulnerability has been resolved:
rxrpc: reject undecryptable rxkad response tickets
rxkad_decrypt_ticket() decrypts the RXKAD response ticket and then
parses the buffer as plaintext without checking whether
crypto_skcipher_decrypt() succeeded.
A malformed RESPONSE can therefore use a non-block-aligned ticket
length, make the decrypt operation fail, and still drive the ticket
parser with attacker-controlled bytes.
Check the decrypt result and abort the connection with RXKADBADTICKET
when ticket decryption fails. |
| In the Linux kernel, the following vulnerability has been resolved:
rxrpc: Only put the call ref if one was acquired
rxrpc_input_packet_on_conn() can process a to-client packet after the
current client call on the channel has already been torn down. In that
case chan->call is NULL, rxrpc_try_get_call() returns NULL and there is
no reference to drop.
The client-side implicit-end error path does not account for that and
unconditionally calls rxrpc_put_call(). This turns a protocol error
path into a kernel crash instead of rejecting the packet.
Only drop the call reference if one was actually acquired. Keep the
existing protocol error handling unchanged. |
| In the Linux kernel, the following vulnerability has been resolved:
mm: filemap: fix nr_pages calculation overflow in filemap_map_pages()
When running stress-ng on my Arm64 machine with v7.0-rc3 kernel, I
encountered some very strange crash issues showing up as "Bad page state":
"
[ 734.496287] BUG: Bad page state in process stress-ng-env pfn:415735fb
[ 734.496427] page: refcount:0 mapcount:1 mapping:0000000000000000 index:0x4cf316 pfn:0x415735fb
[ 734.496434] flags: 0x57fffe000000800(owner_2|node=1|zone=2|lastcpupid=0x3ffff)
[ 734.496439] raw: 057fffe000000800 0000000000000000 dead000000000122 0000000000000000
[ 734.496440] raw: 00000000004cf316 0000000000000000 0000000000000000 0000000000000000
[ 734.496442] page dumped because: nonzero mapcount
"
After analyzing this page’s state, it is hard to understand why the
mapcount is not 0 while the refcount is 0, since this page is not where
the issue first occurred. By enabling the CONFIG_DEBUG_VM config, I can
reproduce the crash as well and captured the first warning where the issue
appears:
"
[ 734.469226] page: refcount:33 mapcount:0 mapping:00000000bef2d187 index:0x81a0 pfn:0x415735c0
[ 734.469304] head: order:5 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
[ 734.469315] memcg:ffff000807a8ec00
[ 734.469320] aops:ext4_da_aops ino:100b6f dentry name(?):"stress-ng-mmaptorture-9397-0-2736200540"
[ 734.469335] flags: 0x57fffe400000069(locked|uptodate|lru|head|node=1|zone=2|lastcpupid=0x3ffff)
......
[ 734.469364] page dumped because: VM_WARN_ON_FOLIO((_Generic((page + nr_pages - 1),
const struct page *: (const struct folio *)_compound_head(page + nr_pages - 1), struct page *:
(struct folio *)_compound_head(page + nr_pages - 1))) != folio)
[ 734.469390] ------------[ cut here ]------------
[ 734.469393] WARNING: ./include/linux/rmap.h:351 at folio_add_file_rmap_ptes+0x3b8/0x468,
CPU#90: stress-ng-mlock/9430
[ 734.469551] folio_add_file_rmap_ptes+0x3b8/0x468 (P)
[ 734.469555] set_pte_range+0xd8/0x2f8
[ 734.469566] filemap_map_folio_range+0x190/0x400
[ 734.469579] filemap_map_pages+0x348/0x638
[ 734.469583] do_fault_around+0x140/0x198
......
[ 734.469640] el0t_64_sync+0x184/0x188
"
The code that triggers the warning is: "VM_WARN_ON_FOLIO(page_folio(page +
nr_pages - 1) != folio, folio)", which indicates that set_pte_range()
tried to map beyond the large folio’s size.
By adding more debug information, I found that 'nr_pages' had overflowed
in filemap_map_pages(), causing set_pte_range() to establish mappings for
a range exceeding the folio size, potentially corrupting fields of pages
that do not belong to this folio (e.g., page->_mapcount).
After above analysis, I think the possible race is as follows:
CPU 0 CPU 1
filemap_map_pages() ext4_setattr()
//get and lock folio with old inode->i_size
next_uptodate_folio()
.......
//shrink the inode->i_size
i_size_write(inode, attr->ia_size);
//calculate the end_pgoff with the new inode->i_size
file_end = DIV_ROUND_UP(i_size_read(mapping->host), PAGE_SIZE) - 1;
end_pgoff = min(end_pgoff, file_end);
......
//nr_pages can be overflowed, cause xas.xa_index > end_pgoff
end = folio_next_index(folio) - 1;
nr_pages = min(end, end_pgoff) - xas.xa_index + 1;
......
//map large folio
filemap_map_folio_range()
......
//truncate folios
truncate_pagecache(inode, inode->i_size);
To fix this issue, move the 'end_pgoff' calculation before
next_uptodate_folio(), so the retrieved folio stays consistent with the
file end to avoid
---truncated--- |