Drivers: hv: mshv_vtl: fix GUP into VTL0 device mappings #141

Merged
namancse merged 2 commits into product/hcl-main/6.18 from user/namjain/6.18-warn-fix-v3
Jun 9, 2026

namancse commented Jun 3, 2026

Contributor

Restores GUP (get_user_pages) into VTL0 memory mappings, broken by the 6.15 ZONE_DEVICE / pte_devmap removal (aed877c, d3f7922). After that refactor, GUP only walks PTEs/PMDs/PUDs that point to a real folio with a held reference; the legacy pte_devmap fast-path is gone. mshv_vtl_low was still installing devmap PTEs via vmf_insert_pfn_*, so userspace pins on /dev/mshv_vtl_low mappings silently failed.

Two commits:

  1. use folio-aware inserters for huge VTL0 mappings — switches the PMD/PUD fault paths to vmf_insert_folio_pmd / vmf_insert_folio_pud, resolving the pfn to its struct page / pgmap folio and verifying the folio order matches the fault order.

  2. fix GUP into VTL0 mappings on the 4K fault path — adds a folio-aware 4K path using vmf_insert_page_mkwrite once the pgmap is live, with a pte_special fallback (via vmf_insert_mixed) for early faults before devm_memremap_pages has run. Captures the chardev address_space on first open (cmpxchg) and calls unmap_mapping_range for both the encrypted and DECRYPTED_MASK-aliased pfns after pgmap registration so any stale special PTEs are dropped and refaulted as folio-backed. VM_MIXEDMAP | VM_DONTEXPAND are set on the VMA.
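
The control flow described by the two commits can be pictured with a small userspace model. This is only a sketch under stated assumptions: `model_page`, `resolve_page()`, `handle_fault()` and the `enum action` values are hypothetical stand-ins for the driver's resolver and for the kernel inserters named above, and the order constants assume x86-64 with 4K base pages.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed x86-64 values with 4K base pages; other configurations differ. */
#define MODEL_PMD_ORDER 9
#define MODEL_PUD_ORDER 18

/* Hypothetical stand-in for a pfn's backing page and its folio order. */
struct model_page {
    bool pgmap_live;          /* models "devm_memremap_pages() has run" */
    unsigned int folio_order; /* order of the folio backing this pfn */
};

enum action {
    INSERT_FOLIO_PTE,   /* models vmf_insert_page_mkwrite() */
    INSERT_FOLIO_PMD,   /* models vmf_insert_folio_pmd() */
    INSERT_FOLIO_PUD,   /* models vmf_insert_folio_pud() */
    INSERT_SPECIAL_PTE, /* models vmf_insert_mixed(), i.e. pte_special */
    FALLBACK            /* models VM_FAULT_FALLBACK */
};

/* Returns the page only when it is pgmap-backed, otherwise NULL. */
static const struct model_page *resolve_page(const struct model_page *p)
{
    return (p && p->pgmap_live) ? p : NULL;
}

static enum action handle_fault(const struct model_page *p, unsigned int order)
{
    const struct model_page *page = resolve_page(p);

    /* 4K path: folio-backed once the pgmap is live, special PTE before. */
    if (order == 0)
        return page ? INSERT_FOLIO_PTE : INSERT_SPECIAL_PTE;

    /* Huge paths need a live pgmap and an exactly matching folio order. */
    if (!page || page->folio_order != order)
        return FALLBACK;
    if (order == MODEL_PMD_ORDER)
        return INSERT_FOLIO_PMD;
    if (order == MODEL_PUD_ORDER)
        return INSERT_FOLIO_PUD;
    return FALLBACK;
}
```

In this model a fault before registration yields `INSERT_SPECIAL_PTE` at order 0 and `FALLBACK` at huge orders, which is why the description adds a later zap of the early special PTEs.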

Copilot AI review requested due to automatic review settings June 3, 2026 06:06

Copilot AI left a comment

Pull request overview

This PR updates the mshv_vtl_low mmap fault paths so VTL0 ZONE_DEVICE mappings become GUP-pinnable again after the removal of the pte_devmap fast-path in 6.15. It does so by switching huge faults to folio-aware inserters and by making the 4K fault path insert a refcounted page once the pgmap exists, while providing an early pre-pgmap pte_special fallback and later zapping those stale PTEs.

Changes:

  • Add a pgmap-backed PFN→struct page resolver and use vmf_insert_page_mkwrite() (4K) / vmf_insert_folio_pmd() / vmf_insert_folio_pud() (huge) so faults install folio-backed entries suitable for GUP.
  • Capture the /dev/mshv_vtl_low address_space on first open and invalidate early-fault pte_special mappings after pgmap registration.
  • Tighten VMA flags for the mapping (VM_MIXEDMAP + VM_DONTEXPAND) to support the mixed fallback and keep the mapping size pinned.
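
The "capture on first open" step is a first-writer-wins publication. A minimal userspace sketch with C11 atomics follows; `model_mapping`, `low_mapping` and `capture_mapping()` are hypothetical names, and the kernel code would use `cmpxchg()` on a `struct address_space *` rather than `<stdatomic.h>`. The model says nothing about the lifetime of the captured object.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for struct address_space. */
struct model_mapping {
    int id;
};

/* Global slot; stays NULL until the first open publishes a mapping. */
static _Atomic(struct model_mapping *) low_mapping = NULL;

/*
 * Publish @m only if no mapping has been captured yet (first writer wins).
 * Returns the mapping that is stored in the slot after the call.
 */
static struct model_mapping *capture_mapping(struct model_mapping *m)
{
    struct model_mapping *expected = NULL;

    if (atomic_compare_exchange_strong(&low_mapping, &expected, m))
        return m;    /* the slot was empty, so @m is now published */
    return expected; /* the slot was taken; @expected holds the winner */
}
```

A second open with a different mapping leaves the slot unchanged, so only the first captured mapping is ever invalidated later.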

Comment thread drivers/hv/mshv_vtl_main.c Outdated
@namancse namancse force-pushed the user/namjain/6.18-warn-fix-v3 branch from 09c2a53 to dce4fdc Compare June 3, 2026 06:41

namancse commented Jun 3, 2026

Contributor Author

Adding the bug link for future reference.
https://microsoft.visualstudio.com/OS/_workitems/edit/62398261/

@namancse namancse requested a review from hargar19 June 4, 2026 04:31
Comment thread drivers/hv/mshv_vtl_main.c Outdated
Comment thread drivers/hv/mshv_vtl_main.c Outdated
@namancse namancse force-pushed the user/namjain/6.18-warn-fix-v3 branch from dce4fdc to b727d48 Compare June 8, 2026 05:19
Copilot AI review requested due to automatic review settings June 8, 2026 05:19

namancse commented Jun 8, 2026

Contributor Author

Fixed KPA issues reported by Hardik offline.

=== [1/2] Drivers: hv: mshv_vtl: use folio-aware inserters for huge VTL0 mappings ===

  1. [HIGH] mshv_vtl_low_resolve_page() accepts any user-selected MEMORY_DEVICE_GENERIC ZONE_DEVICE PFN without proving it belongs to mshv_vtl and without taking a live dev_pagemap reference, so the new huge fau…
    Sources: review-prompts
    Evidence: mshv_vtl_low_resolve_page() accepts any user-selected MEMORY_DEVICE_GENERIC
    ZONE_DEVICE PFN without proving it belongs to mshv_vtl and
    without taking a live dev_pagemap reference, so the new huge
    fault path can map foreign device memory and race foreign pgmap
    teardown.
    Impact: Severity: High — matches "Use-after-free, NULL deref, or double-free reachable
    by any reasonable user-space activity (e.g., bringing an interface
    up after a probe failure, reading a debugfs file that points at
    freed memory)" because a CAP_SYS_ADMIN process can choose the mmap
    offset, fault a foreign generic devmap PFN through this misc
    device, and race the owning device's removal while this helper uses
    pgmap/page metadata without get_dev_pagemap(). The PFN is derived
    directly from the VMA offset: drivers/hv/mshv_vtl_main.c:3723 sets
    pfn = vmf->pgoff & ~DECRYPTED_MASK, and
    drivers/hv/mshv_vtl_main.c:3682-3698 can only align the huge fault
    within the VMA; it does not check a registered mshv_vtl range. The
    resolver's success condition is only generic devmap type:
    drivers/hv/mshv_vtl_main.c:3710-3718 checks pfn_valid(),
    is_zone_device_page(), page_pgmap(page), and pgmap->type ==
    MEMORY_DEVICE_GENERIC, then returns page with no owner/range check
    and no get_dev_pagemap(). The structure explicitly has an owner
    field for this purpose: include/linux/memremap.h:121-123 says
    @owner is "Used by various helpers to make sure that no foreign
    ZONE_DEVICE memory is accessed." Foreign generic pgmaps exist
    in-tree, e.g. drivers/dax/device.c:445-449 sets pgmap->type =
    MEMORY_DEVICE_GENERIC and calls devm_memremap_pages(). The proper
    live-reference API is mm/memremap.c:399-415, where
    get_dev_pagemap() takes a live percpu ref under RCU; this helper
    does not use it. Teardown can proceed independently for foreign
    pgmaps: mm/memremap.c:112-126 kills the pgmap ref, waits for
    completion, unmaps ranges, and exits the ref, while
    drivers/hv/mshv_vtl_main.c:3746-3753 and 3759-3766 can continue
    from the unpinned page into page_folio(), folio_order(), and
    vmf_insert_folio_pmd()/vmf_insert_folio_pud().
    Suggested fix: (see Evidence)
    Fixed

  2. [HIGH] A fault can race MSHV_ADD_VTL0_MEMORY while devm_memremap_pages() has made PFNs valid and ZONE_DEVICE but before __init_zone_device_page() publishes page->pgmap, causing mshv_vtl_low_resolve_page() t…
    Sources: review-prompts
    Evidence: A fault can race MSHV_ADD_VTL0_MEMORY while devm_memremap_pages() has made
    PFNs valid and ZONE_DEVICE but before __init_zone_device_page()
    publishes page->pgmap, causing mshv_vtl_low_resolve_page() to
    dereference the lru/pgmap union as a bogus dev_pagemap pointer.
    Impact: Severity: High — matches "Race condition with a realistic concurrent access
    pattern (probe vs IRQ, suspend vs ioctl, two CPUs hitting the same
    hot path) that produces corruption or crash" because one thread can
    register VTL0 memory while another faults the same user-selected
    PFN through an existing mapping, and the helper dereferences
    page_pgmap() without synchronization against page metadata
    publication. Registration exposes a wide publication window:
    mm/memremap.c:180-183 stores the pgmap in pgmap_array,
    mm/memremap.c:221-234 calls arch_add_memory(),
    move_pfn_range_to_zone(), and mem_hotplug_done(), then
    mm/memremap.c:238-244 only later calls memmap_init_zone_device().
    move_pfn_range_to_zone() initializes visible page metadata before
    the pgmap backpointer: mm/memory_hotplug.c:776-784 notes a visible
    range and calls memmap_init_range(); mm/mm_init.c:581-592 shows
    __init_single_page() zeroes the page, sets zone links, initializes
    ref/mapcount, and INIT_LIST_HEAD(&page->lru).
    __init_zone_device_page() then sets the ZONE_DEVICE page's pgmap
    only after __init_single_page(): mm/mm_init.c:1007-1029 calls
    __init_single_page(page, pfn, zone_idx, nid), sets PageReserved,
    and later assigns page_folio(page)->pgmap = pgmap. The fault
    helper's sequence is unsafe in that window:
    drivers/hv/mshv_vtl_main.c:3710-3718 checks pfn_valid(), then
    is_zone_device_page(), then calls page_pgmap(page) and immediately
    dereferences pgmap->type. include/linux/mmzone.h:1209-1213
    implements page_pgmap() as return page_folio(page)->pgmap, while
    include/linux/mm_types.h:381-394 shows pgmap shares the union with
    struct list_head lru; after INIT_LIST_HEAD but before pgmap
    assignment, a racing fault can read the list_head value as a
    non-NULL pgmap and dereference pgmap->type.
    Suggested fix: (see Evidence)
    Fixed

  3. [MEDIUM] PMD/PUD faults now install refcounted file-rmapped folios into a non-DAX VM_MIXEDMAP VMA, but the huge zap/split paths classify that VMA as special and skip the matching rmap, RSS, and folio-referenc…
    Sources: review-prompts
    Evidence: PMD/PUD faults now install refcounted file-rmapped folios into a non-DAX
    VM_MIXEDMAP VMA, but the huge zap/split paths classify that VMA
    as special and skip the matching rmap, RSS, and folio-reference
    teardown.
    Impact: Severity: Medium — matches "Resource leak with bounded blast radius (a few
    pages, a single file descriptor, one workqueue) on a rare path; not
    exploitable as a DoS" because the device is CAP_SYS_ADMIN-gated,
    but every successful huge VTL0 mapping can leak the mapping
    reference and stale rmap/accounting until the page lifetime ends.
    The introduced path is reachable:
    drivers/hv/mshv_vtl_main.c:3743-3753 handles PMD_ORDER by resolving
    a page, requiring folio_order(folio) == PMD_ORDER, then calling
    vmf_insert_folio_pmd(); drivers/hv/mshv_vtl_main.c:3755-3766 does
    the same for PUD_ORDER. Those inserters add the resources the
    commit relies on: mm/huge_memory.c:1440-1448 builds a normal folio
    PMD, calls folio_get(), folio_add_file_rmap_pmd(), and
    add_mm_counter(); mm/huge_memory.c:1562-1567 does the analogous PUD
    folio_get(), folio_add_file_rmap_pud(), and add_mm_counter(). The
    VMA remains a non-DAX VM_MIXEDMAP file VMA: mm/vma.c:2408 sets
    vma->vm_file = get_file(map->file),
    drivers/hv/mshv_vtl_main.c:3786-3788 assigns mshv_vtl_low_vm_ops
    and sets VM_HUGEPAGE | VM_MIXEDMAP | VM_DONTEXPAND, and
    include/linux/fs.h:3796-3799 defines vma_is_dax() as
    file_is_dax(vma->vm_file) with no DAX setup in this driver.
    include/linux/mm.h:4142-4145 makes vma_is_special_huge() true for
    vma->vm_file && VM_MIXEDMAP. On teardown,
    mm/huge_memory.c:2196-2200 makes zap_huge_pmd() take the
    !vma_is_dax(vma) && vma_is_special_huge(vma) branch and unlock
    without reaching mm/huge_memory.c:2211-2245, where
    folio_remove_rmap_pmd(), add_mm_counter(..., -HPAGE_PMD_NR), and
    tlb_remove_page_size() would run. mm/huge_memory.c:2709-2711 makes
    zap_huge_pud() take the same special branch and skip
    mm/huge_memory.c:2720-2726, where folio_remove_rmap_pud(), RSS
    decrement, and tlb_remove_page_size() would run. The split path has
    the same classification: mm/huge_memory.c:2851-2860 clears a
    non-anonymous PMD and returns immediately for non-DAX special huge
    VMAs, skipping mm/huge_memory.c:2869-2878, where the file folio
    rmap/ref/accounting cleanup happens.
    Suggested fix: (see Evidence)

    : Added a comment, this is not practical with current design of OpenVMM:
    /*
     * Note on rmap/RSS accounting for huge VTL0 mappings:
     * vmf_insert_folio_{pmd,pud}() takes a folio reference, adds a file rmap,
     * and bumps mm RSS, but the matching teardown is skipped at zap/split time
     * because vma_is_special_huge() is true (VM_MIXEDMAP) while vma_is_dax() is
     * false (CONFIG_FS_DAX is not set in OHCL). The drift is theoretical for
     * OpenVMM/OpenHCL: VTL0 memory is mapped once per partition and held for
     * its lifetime - there is no map/unmap cycling, no partial munmap, and the
     * driver is not unloaded. Stale refs land on ZONE_DEVICE folios whose
     * pgmap is intentionally never released, no real bytes are leaked, and the
     * mm's inflated RSS is discarded with the mm at process exit.
     */

  1. [MEDIUM] The folio insertion path adds file rmap state for ZONE_DEVICE folios without initializing folio->mapping and folio->index to the mshv_vtl_low address_space, so object-based rmap walks cannot find map…
    Sources: review-prompts
    Evidence: The folio insertion path adds file rmap state for ZONE_DEVICE folios without
    initializing folio->mapping and folio->index to the mshv_vtl_low
    address_space, so object-based rmap walks cannot find mappings
    that were accounted as file-rmapped.
    Impact: Severity: Medium — matches "Partial-state hazard: failure path leaves the
    device half-torn-down but a subsequent normal operation (remove,
    reboot) cleans it up; user-visible symptom only if userspace
    touches the half-state" because the fault path creates
    file-rmap/mapcount/accounting state, but later rmap-based
    operations on the folio have no file mapping/index to walk. The
    driver resolves a ZONE_DEVICE page and directly inserts it as a
    file folio: drivers/hv/mshv_vtl_main.c:3746-3753 calls
    vmf_insert_folio_pmd(), and drivers/hv/mshv_vtl_main.c:3759-3766
    calls vmf_insert_folio_pud(), with no assignment to folio->mapping
    or folio->index. ZONE_DEVICE initialization sets pgmap metadata but
    not the file mapping fields: mm/mm_init.c:1023-1029 assigns
    page_folio(page)->pgmap = pgmap and page->zone_device_data = NULL,
    while include/linux/mm_types.h:393-399 shows pgmap, mapping, and
    index are distinct folio fields. The reference device-dax path does
    the missing setup: drivers/dax/device.c:75-99 computes the file
    offset and assigns folio->mapping = filp->f_mapping and
    folio->index = pgoff + i before drivers/dax/device.c:173-176 calls
    vmf_insert_folio_pmd() and drivers/dax/device.c:219-222 calls
    vmf_insert_folio_pud(). The consequence is concrete in rmap:
    mm/rmap.c:2937-2952 requires folio->mapping and returns immediately
    if it is NULL, so the file-rmap state added by
    mm/huge_memory.c:1446-1448 and 1565-1567 cannot be found by
    object-based rmap walkers.
    Suggested fix: (see Evidence)
    : This issue will not happen in practice on OpenHCL, but adding it maintains parity with DAX implementation. Adding it now.

  2. [LOW] missing Fixes:
    Sources: llm-analysis
    Evidence: missing Fixes: tag
    Sources: review-prompts/kernel/missing-fixes-tag.md
    Evidence: This is a bug fix: the patch addresses a WARN in
    try_grab_folio() and -ENOMEM I/O failure after v6.15 GUP changes
    stopped taking pgmap references for ZONE_DEVICE pages while huge
    VTL0 mappings still used vmf_insert_pfn_{pmd,pud}() without
    holding a folio reference. The commit message identifies the
    v6.15 changes as the regression source, but no Fixes: trailer is
    present.
    Impact: (Phase 3 finding — see Evidence for full reasoning)
    Suggested fix: add Fixes: aed877c2b425
    Impact: lost attribution / incomplete stable backports

  3. [LOW] [checkpatch:WARNING] line length of 103 exceeds 100 columns
    Sources: tools
    Evidence: checkpatch.pl (WARNING) on
    0001-Drivers-hv-mshv_vtl-use-folio-aware-inserters-for-huge-VTL0-.patch:
    WARNING: line length of 103 exceeds 100 columns
    Impact: checkpatch warning: style/quality issue surfaced by the kernel's patch linter.
    Suggested fix: Address the checkpatch.pl finding in the patch before submission.

  4. [LOW] [checkpatch:WARNING] line length of 108 exceeds 100 columns
    Sources: tools
    Evidence: checkpatch.pl (WARNING) on
    0001-Drivers-hv-mshv_vtl-use-folio-aware-inserters-for-huge-VTL0-.patch:
    WARNING: line length of 108 exceeds 100 columns
    Impact: checkpatch warning: style/quality issue surfaced by the kernel's patch linter.
    Suggested fix: Address the checkpatch.pl finding in the patch before submission.
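
The second finding in the list above rests on `pgmap` sharing a union with `lru`. The userspace sketch below only models that aliasing: `model_folio`, `model_pgmap`, `init_single_page()` and `racy_pgmap_value()` are hypothetical, the real `struct folio` has many more fields, and the initialization order is taken from the finding rather than verified here.

```c
#include <assert.h>
#include <stddef.h>

struct list_head {
    struct list_head *next, *prev;
};

/* Same effect as the kernel's INIT_LIST_HEAD(): an empty list points at itself. */
static void init_list_head(struct list_head *h)
{
    h->next = h;
    h->prev = h;
}

/* Hypothetical stand-in for struct dev_pagemap. */
struct model_pgmap {
    int type;
};

/* Simplified: only the union discussed in the finding is modeled. */
struct model_folio {
    union {
        struct list_head lru;
        struct model_pgmap *pgmap;
    };
};

/* Models the window after generic page init but before pgmap is assigned. */
static void init_single_page(struct model_folio *f)
{
    init_list_head(&f->lru);
}

/* What a racing reader would obtain as "pgmap"; it is not dereferenced here. */
static const void *racy_pgmap_value(const struct model_folio *f)
{
    return f->pgmap;
}
```

In that window the value read through `pgmap` is non-NULL and equals the address of `lru` itself, so a NULL check alone would not reject it before a `->type` dereference.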

=== [2/2] Drivers: hv: mshv_vtl: fix GUP into VTL0 mappings on the 4K fault path ===

  1. [HIGH] mshv_vtl_low_mapping stores only the first opened character-device inode's address_space without pinning its inode and without tracking other active inode mappings, so stale fallback PTEs can be miss…
    Sources: review-prompts
    Evidence: mshv_vtl_low_mapping stores only the first opened character-device inode's
    address_space without pinning its inode and without tracking
    other active inode mappings, so stale fallback PTEs can be missed
    and a later ADD_VTL0_MEMORY can dereference freed inode storage.
    Impact: Severity: High - matches the High row 'Use-after-free, NULL deref, or
    double-free reachable by any reasonable user-space activity'
    because a temporary or alternate character-device inode can be
    opened to publish its embedded address_space, then closed/unlinked
    and evicted before a later successful ioctl dereferences the raw
    global pointer. The driver publishes only the first inode mapping
    at drivers/hv/mshv_vtl_main.c:3668-3672 and has no .release hook in
    drivers/hv/mshv_vtl_main.c:3817-3821 to clear it or drop a
    reference. struct inode contains the address_space pointer and
    embedded i_data at include/linux/fs.h:807 and
    include/linux/fs.h:883, and do_dentry_open() initializes each file
    from the opened inode at fs/open.c:911-913. VMAs are linked into
    file->f_mapping at mm/vma.c:1774-1783, while
    misc_open()/chrdev_open() pass the actual opened inode to the
    device open path at drivers/char/misc.c:160-163 and
    fs/char_dev.c:412-415, so alternate device nodes with the same
    dev_t can have different address_space trees that the single global
    will not zap. On inode lifetime, iput() calls iput_final() when
    i_count reaches zero at fs/inode.c:1957-1966, evict() calls
    destroy_inode() at fs/inode.c:809-834, and destroy_inode()
    schedules the inode memory for freeing at fs/inode.c:389-401. The
    ADD_VTL0_MEMORY success path later blindly uses the saved pointer
    in unmap_mapping_pages() at drivers/hv/mshv_vtl_main.c:1220-1222.
    Suggested fix: (see Evidence)
    : This is not practical with the OpenVMM design. OpenVMM is the only userspace of the VTL2 kernel, and such attacks are not possible.

  2. [HIGH] The new order-0 folio path accepts any MEMORY_DEVICE_GENERIC ZONE_DEVICE PFN selected by the mmap offset as if it belonged to mshv_vtl, allowing /dev/mshv_vtl_low to install normal pinnable PTEs for…
    Sources: review-prompts
    Evidence: The new order-0 folio path accepts any MEMORY_DEVICE_GENERIC ZONE_DEVICE PFN
    selected by the mmap offset as if it belonged to mshv_vtl,
    allowing /dev/mshv_vtl_low to install normal pinnable PTEs for
    foreign device memory.
    Impact: Severity: High - matches the High row 'Security boundary violation: missing
    capability check on a privileged operation, namespace escape,
    missing ns_capable for a write' because the driver bypasses the
    owning device's mmap and lifetime policy by treating unrelated
    generic dev_pagemap memory as VTL0 memory. The commit claims the
    normal path is for an mshv_vtl pgmap, but
    mshv_vtl_low_resolve_page() only checks pfn_valid(),
    is_zone_device_page(), page_pgmap(page), and pgmap->type ==
    MEMORY_DEVICE_GENERIC at drivers/hv/mshv_vtl_main.c:3710-3718. The
    mshv registration code creates MEMORY_DEVICE_GENERIC pgmaps at
    drivers/hv/mshv_vtl_main.c:1184-1188 but sets no owner and records
    no range registry used by resolve_page(). struct dev_pagemap
    documents owner as identifying the managing entity and preventing
    foreign ZONE_DEVICE access at include/linux/memremap.h:121-123.
    Other in-tree code creates MEMORY_DEVICE_GENERIC pgmaps, including
    device DAX at drivers/dax/device.c:445-449 and Xen unpopulated
    memory at drivers/xen/unpopulated-alloc.c:90-97. The order-0 fault
    uses vmf->pgoff as the PFN at drivers/hv/mshv_vtl_main.c:3723 and
    maps any non-NULL resolved page with vmf_insert_page_mkwrite() at
    drivers/hv/mshv_vtl_main.c:3730-3741; mshv_vtl_low_mmap() performs
    no range validation at drivers/hv/mshv_vtl_main.c:3784-3794.
    Suggested fix: (see Evidence)
    : Pre-existing issue, not practical with the existing OpenVMM design.

  3. [HIGH] The new order-0 vmf_insert_page_mkwrite() path can make PFNs writable for MAP_PRIVATE mappings from an O_RDONLY /dev/mshv_vtl_low file descriptor, violating the read-only fd and private-mapping contr…
    Sources: review-prompts
    Evidence: The new order-0 vmf_insert_page_mkwrite() path can make PFNs writable for
    MAP_PRIVATE mappings from an O_RDONLY /dev/mshv_vtl_low file
    descriptor, violating the read-only fd and private-mapping
    contract once the PFN resolves to a pgmap-backed page.
    Impact: Severity: High - matches the High row 'Security boundary violation: missing
    capability check on a privileged operation, namespace escape,
    missing ns_capable for a write' because a read-only fd can be
    delegated and then used for PROT_WRITE MAP_PRIVATE faults that
    write the underlying VTL0/device page instead of creating a private
    copy. mmap permission checks require FMODE_WRITE for
    MAP_SHARED|PROT_WRITE at mm/mmap.c:445-450, but MAP_PRIVATE only
    requires FMODE_READ at mm/mmap.c:466-468; vm_flags already include
    the PROT_WRITE-derived VM_WRITE at mm/mmap.c:400-401.
    mshv_vtl_low_open() checks CAP_SYS_ADMIN but not filp->f_mode at
    drivers/hv/mshv_vtl_main.c:3668-3672, and mshv_vtl_low_mmap() does
    not reject private or writable VMAs at
    drivers/hv/mshv_vtl_main.c:3784-3794. The new fault path derives
    write from FAULT_FLAG_WRITE at drivers/hv/mshv_vtl_main.c:3724 and
    calls vmf_insert_page_mkwrite(vmf, page, write) at
    drivers/hv/mshv_vtl_main.c:3741. insert_page_into_pte_locked()
    makes the PTE dirty and maybe writable when mkwrite is true at
    mm/memory.c:2306-2309, and maybe_mkwrite() sets write permission
    whenever VM_WRITE is set at include/linux/mm.h:1298-1302.
    Suggested fix: (see Evidence)

  4. [MEDIUM] A concurrent order-0 /dev/mshv_vtl_low fault can install a stale pte_special mapping after MSHV_ADD_VTL0_MEMORY has already registered and zapped the range, so later GUP can still fail on a freshly r…
    Sources: review-prompts
    Evidence: A concurrent order-0 /dev/mshv_vtl_low fault can install a stale pte_special
    mapping after MSHV_ADD_VTL0_MEMORY has already registered and
    zapped the range, so later GUP can still fail on a freshly
    registered VTL0 chunk.
    Impact: Severity: Medium - matches the Medium row 'Concurrency hazard that requires
    unusual scheduling to trigger (very narrow window, requires
    specific hardware behavior)' because it requires a fault racing the
    registration ioctl, but the failure is concrete: stale pte_special
    state remains and vm_normal_page()/GUP can still fail. The success
    path registers the pgmap at drivers/hv/mshv_vtl_main.c:1204, then
    performs a one-shot zap at drivers/hv/mshv_vtl_main.c:1216-1223.
    The order-0 fault path decides once at
    drivers/hv/mshv_vtl_main.c:3730-3739:
    mshv_vtl_low_resolve_page(pfn) can return NULL, after which the
    handler later calls vmf_insert_mixed(vmf->vma, vmf->address, pfn).
    unmap_mapping_pages() only walks and zaps currently mapped PTEs
    under i_mmap_lock_read at mm/memory.c:4218-4233; it does not
    serialize future fault insertion. vmf_insert_mixed() reaches
    insert_pfn(), which takes the PTE lock and installs the PTE at
    mm/memory.c:2576-2612. Therefore the structurally possible
    interleaving is: CPU0 resolves NULL before devm_memremap_pages() is
    visible, CPU1 completes memremap and unmap_mapping_pages() while no
    PTE exists, then CPU0 resumes and inserts the special PTE after the
    only zap.
    Suggested fix: (see Evidence)
    : So: real race, narrow window, observable only as the original WARN re-firing. Not a crash, not a security hole. Same severity as the bug the series already addresses, just at a finer interleaving. I am thinking of not overcomplicating this path.

  5. [LOW] missing Fixes:
    Sources: llm-analysis
    Evidence: missing Fixes: tag
    Sources: review-prompts/kernel/missing-fixes-tag.md
    Evidence: This patch fixes a user-visible GUP/O_DIRECT failure in
    the existing /dev/mshv_vtl_low 4K fault path caused by
    vmf_insert_mixed() installing pte_special PTEs that
    pin_user_pages*() cannot pin after MSHV_ADD_VTL0_MEMORY, but it
    lacks a Fixes: tag. The affected VTL0 mapping/add_vtl0_mem path
    appears to have been introduced by hyperv/mshv_vtl: Add SEV SNP
    guest support.
    Impact: (Phase 3 finding — see Evidence for full reasoning)
    Suggested fix: add Fixes: be32e6590b6f
    Impact: lost attribution / incomplete stable backports

  6. [LOW] [checkpatch:WARNING] line length of 107 exceeds 100 columns
    Sources: tools
    Evidence: checkpatch.pl (WARNING) on
    0002-Drivers-hv-mshv_vtl-fix-GUP-into-VTL0-mappings-on-the-4K-fau.patch:
    WARNING: line length of 107 exceeds 100 columns
    Impact: checkpatch warning: style/quality issue surfaced by the kernel's patch linter.
    Suggested fix: Address the checkpatch.pl finding in the patch before submission.
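
For context on the third finding in the list above, the userspace sketch below exercises the contract that finding refers to, using an ordinary temporary file instead of `/dev/mshv_vtl_low`. It assumes Linux or another POSIX system with a writable `/tmp`; `private_write_leaves_file_intact()` is a hypothetical helper name. On a regular file, `mmap()` accepts `PROT_WRITE | MAP_PRIVATE` on an `O_RDONLY` descriptor and the write goes to a private copy.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Returns 1 if a PROT_WRITE | MAP_PRIVATE mapping of an O_RDONLY fd could be
 * written while the file content stayed unchanged, 0 if the file changed,
 * and -1 on any setup or mapping error.
 */
static int private_write_leaves_file_intact(void)
{
    char path[] = "/tmp/rdonly-private-XXXXXX";
    char check = 0;
    int result = -1;
    int fd, rofd;
    char *p;

    fd = mkstemp(path);
    if (fd < 0)
        return -1;
    if (write(fd, "A", 1) != 1) {
        close(fd);
        unlink(path);
        return -1;
    }
    close(fd);

    rofd = open(path, O_RDONLY);
    if (rofd < 0) {
        unlink(path);
        return -1;
    }

    p = mmap(NULL, 1, PROT_READ | PROT_WRITE, MAP_PRIVATE, rofd, 0);
    if (p != MAP_FAILED) {
        p[0] = 'B'; /* write fault: the kernel copies the page privately */
        if (pread(rofd, &check, 1, 0) == 1)
            result = (check == 'A'); /* the file itself must still hold 'A' */
        munmap(p, 1);
    }
    close(rofd);
    unlink(path);
    return result;
}
```

The finding's concern is that the driver's order-0 path, as described, would map the underlying page writable instead of producing such a private copy.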

Copilot AI left a comment

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

Comment thread drivers/hv/mshv_vtl_main.c
Comment thread Microsoft/hcl-x64.config
Comment thread Microsoft/hcl-arm64.config
Naman Jain added 2 commits June 8, 2026 05:31

Drivers: hv: mshv_vtl: use folio-aware inserters for huge VTL0 mappings

Since v6.15 (aed877c, d3f7922), GUP no longer takes a pgmap
reference for ZONE_DEVICE pages and walks huge entries through the
unified folio path. With vmf_insert_pfn_{pmd,pud}() the mapping holds
no folio reference, so a zap racing with pin_user_pages_fast() can
briefly drop the folio refcount to 0 and trigger a WARN in
try_grab_folio() with the I/O failing as -ENOMEM.

Switch the PMD/PUD fault paths to vmf_insert_folio_{pmd,pud}(),
mirroring drivers/dax/device.c. Each map takes folio_get(); the
matching folio_put() in zap keeps the refcount above 0. Gate the huge
inserters on pfn_valid() + ZONE_DEVICE + MEMORY_DEVICE_GENERIC via
mshv_vtl_low_resolve_page(); fall back to VM_FAULT_FALLBACK when the
folio order does not match PMD_ORDER/PUD_ORDER or the PFN is not yet
pgmap-backed, so the core can retry at smaller order. Add VM_DONTEXPAND
to the VMA to block mremap() growth past the pgmap.

Signed-off-by: Naman Jain <namjain@linux.microsoft.com>

Drivers: hv: mshv_vtl: fix GUP into VTL0 mappings on the 4K fault path

Extend the folio-aware fault path to the 4K case so GUP into
/dev/mshv_vtl_low works after MSHV_ADD_VTL0_MEMORY has registered the
range. With the previous vmf_insert_mixed() path the PTE was always
pte_special, vm_normal_page() returned NULL during pin_user_pages*(),
follow_pfn_pte() returned -EEXIST, and io_uring O_DIRECT surfaced it
as "disk io error: io error: File exists (os error 17)" on the first
DMA into a freshly-registered VTL0 chunk.

The 4K path now resolves the PFN via mshv_vtl_low_resolve_page(): when
backed by an mshv_vtl pgmap the PTE is installed with
vmf_insert_page_mkwrite(), giving GUP a normal pinnable page; otherwise
it falls back to vmf_insert_mixed() so early CPU accesses (e.g. the
VTL2 guest-memory self test reading GPA 0 before any add_vtl0_mem
ioctl) still succeed instead of SIGBUSing.

Such fallback PTEs would persist across registration and break later
GUP. Capture the cdev's address_space on first open and, on successful
MSHV_ADD_VTL0_MEMORY, invalidate the file-offset range via
unmap_mapping_range() for both the encrypted (pfn) and decrypted
(pfn | DECRYPTED_MASK) aliases that mshv_vtl_low_mmap() exposes. The
next access re-faults into the folio path and GUP works.

Signed-off-by: Naman Jain <namjain@linux.microsoft.com>
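
The two aliases invalidated by the commit above correspond to two file-offset ranges per registered chunk. The arithmetic can be sketched as follows. The value of `MODEL_DECRYPTED_MASK` is purely hypothetical (the PR does not give the real `DECRYPTED_MASK`), a 4K page size is assumed, and `struct hole`, `alias_hole()` and `pgoff_to_pfn()` are illustrative names.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12
/* HYPOTHETICAL value: the real DECRYPTED_MASK is not stated in this PR. */
#define MODEL_DECRYPTED_MASK (UINT64_C(1) << 34)

/* Byte range in the style of unmap_mapping_range()'s holebegin/holelen. */
struct hole {
    uint64_t begin;
    uint64_t len;
};

/* File-offset range for the encrypted (plain pfn) or decrypted alias. */
static struct hole alias_hole(uint64_t start_pfn, uint64_t nr_pages, int decrypted)
{
    uint64_t pgoff = decrypted ? (start_pfn | MODEL_DECRYPTED_MASK) : start_pfn;
    struct hole h = { pgoff << MODEL_PAGE_SHIFT, nr_pages << MODEL_PAGE_SHIFT };

    return h;
}

/* Mirrors the fault path's pfn = pgoff & ~DECRYPTED_MASK. */
static uint64_t pgoff_to_pfn(uint64_t pgoff)
{
    return pgoff & ~MODEL_DECRYPTED_MASK;
}
```

Both ranges resolve to the same pfn on the fault path, which is why zapping only one alias would leave special PTEs behind in the other.
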
@namancse namancse force-pushed the user/namjain/6.18-warn-fix-v3 branch from b727d48 to c79bbfd Compare June 8, 2026 05:32
@namancse namancse merged commit ac3cf3e into product/hcl-main/6.18 Jun 9, 2026
11 checks passed
namancse pushed a commit to namancse/openvmm that referenced this pull request Jun 10, 2026
Upgrade kernel used in OpenVMM to 6.18.0.6 release tag.
This adds a fix for try_grab_folio warning in VTL2 kernel and
associated Hyper-V GuestBVT test failure.

Kernel PRs:
microsoft/OHCL-Linux-Kernel#141
microsoft/OHCL-Linux-Kernel#144
Bug: https://microsoft.visualstudio.com/OS/_workitems/edit/62100614

Signed-off-by: Naman Jain <namjain@linux.microsoft.com>
benhillis pushed a commit to microsoft/openvmm that referenced this pull request Jun 10, 2026
Upgrade kernel used in OpenHCL to 6.18.0.6 release tag. This adds a fix
for try_grab_folio warning in VTL2 kernel and associated Hyper-V
GuestBVT test failure.

Kernel PRs:
microsoft/OHCL-Linux-Kernel#141
microsoft/OHCL-Linux-Kernel#144
Bug: https://microsoft.visualstudio.com/OS/_workitems/edit/62100614

Signed-off-by: Naman Jain <namjain@linux.microsoft.com>
Co-authored-by: Naman Jain <namjain@linux.microsoft.com>
moor-coding pushed a commit to moor-coding/openvmm that referenced this pull request Jun 17, 2026
Upgrade kernel used in OpenHCL to 6.18.0.6 release tag. This adds a fix
for try_grab_folio warning in VTL2 kernel and associated Hyper-V
GuestBVT test failure.

Kernel PRs:
microsoft/OHCL-Linux-Kernel#141
microsoft/OHCL-Linux-Kernel#144
Bug: https://microsoft.visualstudio.com/OS/_workitems/edit/62100614

Signed-off-by: Naman Jain <namjain@linux.microsoft.com>
Co-authored-by: Naman Jain <namjain@linux.microsoft.com>