Commit c5c9f25

Nishanth Aravamudan authored and axboe committed
NVMe: default to 4k device page size
We received a bug report recently when DDW (64-bit direct DMA on Power) is not enabled for NVMe devices. In that case, we fall back to 32-bit DMA via the IOMMU, which is always done via 4K TCEs (Translation Control Entries).

The NVMe device driver, though, assumes that the DMA alignment for the PRP entries will match the device's page size, and that the DMA alignment matches the kernel's page alignment. On Power, the IOMMU page size, as mentioned above, can be 4K, while the device can have a page size of 8K and the kernel has a page size of 64K. This eventually trips the BUG_ON in nvme_setup_prps(), as we have a 'dma_len' that is a multiple of 4K but not 8K (e.g., 0xF000).

In this particular case of page sizes, we clearly want to use the IOMMU's page size in the driver. And generally, the NVMe driver in this function should be using the IOMMU's page size for the default device page size, rather than the kernel's page size. There is not currently an API to obtain the IOMMU's page size across all architectures, and in the interest of a stop-gap fix to this functional issue, default the NVMe device page size to 4K, with the intent of adding such an API and implementation across all architectures in the next merge window.

With the functionally equivalent v3 of this patch, our hardware test exerciser survives when using 32-bit DMA; without the patch, the kernel will BUG within a few minutes.

Signed-off-by: Nishanth Aravamudan <nacc at linux.vnet.ibm.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
1 parent 6ffeba9 commit c5c9f25

1 file changed

Lines changed: 6 additions & 9 deletions

drivers/nvme/host/pci.c

@@ -1728,9 +1728,13 @@ static int nvme_configure_admin_queue(struct nvme_dev *dev)
 	u32 aqa;
 	u64 cap = lo_hi_readq(&dev->bar->cap);
 	struct nvme_queue *nvmeq;
-	unsigned page_shift = PAGE_SHIFT;
+	/*
+	 * default to a 4K page size, with the intention to update this
+	 * path in the future to accomodate architectures with differing
+	 * kernel and IO page sizes.
+	 */
+	unsigned page_shift = 12;
 	unsigned dev_page_min = NVME_CAP_MPSMIN(cap) + 12;
-	unsigned dev_page_max = NVME_CAP_MPSMAX(cap) + 12;
 
 	if (page_shift < dev_page_min) {
 		dev_err(dev->dev,
@@ -1739,13 +1743,6 @@ static int nvme_configure_admin_queue(struct nvme_dev *dev)
 				1 << page_shift);
 		return -ENODEV;
 	}
-	if (page_shift > dev_page_max) {
-		dev_info(dev->dev,
-			"Device maximum page size (%u) smaller than "
-			"host (%u); enabling work-around\n",
-			1 << dev_page_max, 1 << page_shift);
-		page_shift = dev_page_max;
-	}
 
 	dev->subsystem = readl(&dev->bar->vs) >= NVME_VS(1, 1) ?
 						NVME_CAP_NSSRC(cap) : 0;
