
Commit fe10f4e

htejun authored and Alex Shi committed
cgroup: replace unified-hierarchy.txt with a proper cgroup v2 documentation
Now that cgroup v2 is almost out of the door, replace the development
documentation unified-hierarchy.txt with Documentation/cgroup.txt which is
a superset of unified-hierarchy.txt and authoritatively describes all
userland-visible aspects of cgroup.

v2: Updated to include all information from blkio-controller.txt and list
    filesystems which support cgroup writeback as suggested by Vivek.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
(cherry picked from commit 6c2920926b10e8303378408e3c2b8952071d4344)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
1 parent 07d48ac commit fe10f4e

3 files changed

Lines changed: 1293 additions & 724 deletions


Documentation/cgroup-legacy/blkio-controller.txt

Lines changed: 0 additions & 79 deletions
@@ -374,82 +374,3 @@ One can experience an overall throughput drop if you have created multiple
 groups and put applications in that group which are not driving enough
 IO to keep disk busy. In that case set group_idle=0, and CFQ will not idle
 on individual groups and throughput should improve.
-
-Writeback
-=========
-
-Page cache is dirtied through buffered writes and shared mmaps and
-written asynchronously to the backing filesystem by the writeback
-mechanism. Writeback sits between the memory and IO domains and
-regulates the proportion of dirty memory by balancing dirtying and
-write IOs.
-
-On traditional cgroup hierarchies, relationships between different
-controllers cannot be established making it impossible for writeback
-to operate accounting for cgroup resource restrictions and all
-writeback IOs are attributed to the root cgroup.
-
-If both the blkio and memory controllers are used on the v2 hierarchy
-and the filesystem supports cgroup writeback, writeback operations
-correctly follow the resource restrictions imposed by both memory and
-blkio controllers.
-
-Writeback examines both system-wide and per-cgroup dirty memory status
-and enforces the more restrictive of the two. Also, writeback control
-parameters which are absolute values - vm.dirty_bytes and
-vm.dirty_background_bytes - are distributed across cgroups according
-to their current writeback bandwidth.
-
-There's a peculiarity stemming from the discrepancy in ownership
-granularity between memory controller and writeback. While memory
-controller tracks ownership per page, writeback operates on inode
-basis. cgroup writeback bridges the gap by tracking ownership by
-inode but migrating ownership if too many foreign pages, pages which
-don't match the current inode ownership, have been encountered while
-writing back the inode.
-
-This is a conscious design choice as writeback operations are
-inherently tied to inodes making strictly following page ownership
-complicated and inefficient. The only use case which suffers from
-this compromise is multiple cgroups concurrently dirtying disjoint
-regions of the same inode, which is an unlikely use case and decided
-to be unsupported. Note that as memory controller assigns page
-ownership on the first use and doesn't update it until the page is
-released, even if cgroup writeback strictly follows page ownership,
-multiple cgroups dirtying overlapping areas wouldn't work as expected.
-In general, write-sharing an inode across multiple cgroups is not well
-supported.
-
-Filesystem support for cgroup writeback
----------------------------------------
-
-A filesystem can make writeback IOs cgroup-aware by updating
-address_space_operations->writepage[s]() to annotate bio's using the
-following two functions.
-
-* wbc_init_bio(@wbc, @bio)
-
-Should be called for each bio carrying writeback data and associates
-the bio with the inode's owner cgroup. Can be called anytime
-between bio allocation and submission.
-
-* wbc_account_io(@wbc, @page, @bytes)
-
-Should be called for each data segment being written out. While
-this function doesn't care exactly when it's called during the
-writeback session, it's the easiest and most natural to call it as
-data segments are added to a bio.
-
-With writeback bio's annotated, cgroup support can be enabled per
-super_block by setting MS_CGROUPWB in ->s_flags. This allows for
-selective disabling of cgroup writeback support which is helpful when
-certain filesystem features, e.g. journaled data mode, are
-incompatible.
-
-wbc_init_bio() binds the specified bio to its cgroup. Depending on
-the configuration, the bio may be executed at a lower priority and if
-the writeback session is holding shared resources, e.g. a journal
-entry, may lead to priority inversion. There is no one easy solution
-for the problem. Filesystems can try to work around specific problem
-cases by skipping wbc_init_bio() or using bio_associate_blkcg()
-directly.
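
The removed Writeback section says that absolute limits such as vm.dirty_bytes are distributed across cgroups according to their current writeback bandwidth, and that writeback enforces the more restrictive of the system-wide and per-cgroup dirty status. The following is a minimal userland sketch of that arithmetic, not kernel code. The function names, the integer rounding, and the even-split fallback when no bandwidth has been measured are assumptions made for illustration.

```python
def split_dirty_bytes(global_dirty_bytes, wb_bandwidth):
    """Split an absolute dirty limit across cgroups by bandwidth share.

    wb_bandwidth maps a cgroup name to its current writeback bandwidth
    (any consistent unit). Hypothetical helper, not a kernel interface.
    """
    if not wb_bandwidth:
        return {}
    total = sum(wb_bandwidth.values())
    if total == 0:
        # Assumption: with no measured bandwidth, fall back to an even split.
        share = global_dirty_bytes // len(wb_bandwidth)
        return {cg: share for cg in wb_bandwidth}
    # Integer division rounds each share down.
    return {cg: global_dirty_bytes * bw // total
            for cg, bw in wb_bandwidth.items()}


def should_throttle(sys_dirty, sys_limit, cg_dirty, cg_limit):
    """Enforce the more restrictive of the two views: throttle the writer
    if either the system-wide or the per-cgroup limit is exceeded."""
    return sys_dirty > sys_limit or cg_dirty > cg_limit


limits = split_dirty_bytes(100 * 2**20, {"a": 30, "b": 10})
print(limits)
```

With a 3:1 bandwidth ratio, cgroup "a" receives three quarters of the 100 MiB limit and "b" the remaining quarter. A cgroup that is under its own share can still be throttled when the system as a whole is over its limit.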

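The removed text also describes how cgroup writeback tracks ownership per inode and migrates it when too many foreign pages are seen during writeback. The toy model below illustrates only that idea. The class name, the per-round evaluation, and the 50% threshold are hypothetical; the kernel's actual heuristic is more involved and is not reproduced here.

```python
from collections import Counter


class InodeOwnership:
    """Toy model: an inode has one owner cgroup, pages have their own owners."""

    def __init__(self, owner, foreign_threshold=0.5):
        self.owner = owner
        # Assumed threshold: fraction of foreign pages that triggers migration.
        self.foreign_threshold = foreign_threshold

    def finish_writeback_round(self, page_owners):
        """page_owners lists the owning cgroup of each page written back.

        Returns the inode owner after the round.
        """
        if not page_owners:
            return self.owner
        counts = Counter(page_owners)
        # Foreign pages are those whose owner differs from the inode's owner.
        foreign = {cg: n for cg, n in counts.items() if cg != self.owner}
        foreign_pages = sum(foreign.values())
        if foreign_pages / len(page_owners) > self.foreign_threshold:
            # Migrate to the foreign cgroup that dirtied the most pages.
            self.owner = max(foreign, key=foreign.get)
        return self.owner


inode = InodeOwnership("A")
inode.finish_writeback_round(["A"] * 8 + ["B"] * 2)  # mostly A's pages
inode.finish_writeback_round(["A"] * 2 + ["B"] * 8)  # mostly foreign pages
print(inode.owner)
```

The model also shows why disjoint concurrent writers are a poor fit: if two cgroups keep dirtying the same inode, ownership follows whichever one dominates a given round, so the other's writeback IO is charged to the wrong cgroup.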