kernel-aes67/mm
Christoph Lameter f6ac2354d7 [PATCH] zoned vm counters: create vmstat.c/.h from page_alloc.c/.h
NOTE: ZVC are *not* the lightweight event counters.  ZVCs are reliable whereas
event counters do not need to be.

Zone based VM statistics are necessary to be able to determine what the state
of memory in one zone is.  In a NUMA system this can be helpful for local
reclaim and other memory optimizations that may be able to shift VM load in
order to get more balanced memory use.

It is also useful to know how the computing load affects the memory
allocations on various zones.  This patchset allows the retrieval of that data
from userspace.

The patchset introduces a framework for counters that is a cross between the
existing page_stats --which are simply global counters split per cpu-- and the
approach of deferred incremental updates implemented for nr_pagecache.

Small per cpu 8 bit counters are added to struct zone.  If the counter exceeds
certain thresholds then the counters are accumulated in an array of
atomic_long in the zone and in a global array that sums up all zone values.
The small 8 bit counters are next to the per cpu page pointers and so they
will be in high in the cpu cache when pages are allocated and freed.

Access to VM counter information for a zone and for the whole machine is then
possible by simply indexing an array (Thanks to Nick Piggin for pointing out
that approach).  The access to the total number of pages of various types does
no longer require the summing up of all per cpu counters.

Benefits of this patchset right now:

- Ability for UP and SMP configuration to determine how memory
  is balanced between the DMA, NORMAL and HIGHMEM zones.

- loops over all processors are avoided in writeback and
  reclaim paths. We can avoid caching the writeback information
  because the needed information is directly accessible.

- Special handling for nr_pagecache removed.

- zone_reclaim_interval vanishes since VM stats can now determine
  when it is worth to do local reclaim.

- Fast inline per node page state determination.

- Accurate counters in /sys/devices/system/node/node*/meminfo. Current
  counters are counting simply which processor allocated a page somewhere
  and guestimate based on that. So the counters were not useful to show
  the actual distribution of page use on a specific zone.

- The swap_prefetch patch requires per node statistics in order to
  figure out when processors of a node can prefetch. This patch provides
  some of the needed numbers.

- Detailed VM counters available in more /proc and /sys status files.

References to earlier discussions:
V1 http://marc.theaimsgroup.com/?l=linux-kernel&m=113511649910826&w=2
V2 http://marc.theaimsgroup.com/?l=linux-kernel&m=114980851924230&w=2
V3 http://marc.theaimsgroup.com/?l=linux-kernel&m=115014697910351&w=2
V4 http://marc.theaimsgroup.com/?l=linux-kernel&m=115024767318740&w=2

Performance tests with AIM7 did not show any regressions.  Seems to be a tad
faster even.  Tested on ia64/NUMA.  Builds fine on i386, SMP / UP.  Includes
fixes for s390/arm/uml arch code.

This patch:

Move counter code from page_alloc.c/page-flags.h to vmstat.c/h.

Create vmstat.c/vmstat.h by separating the counter code and the proc
functions.

Move the vm_stat_text array before zoneinfo_show.

[akpm@osdl.org: s390 build fix]
[akpm@osdl.org: HOTPLUG_CPU build fix]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-30 11:25:34 -07:00
..
bootmem.c [PATCH] x86_64: Handle empty PXMs that only contain hotplug memory 2006-04-09 11:53:16 -07:00
fadvise.c [PATCH] sys_sync_file_range() 2006-03-31 12:18:54 -08:00
filemap_xip.c [PATCH] mark address_space_operations const 2006-06-28 14:59:04 -07:00
filemap.c [PATCH] generic_file_buffered_write(): handle zero-length iovec segments 2006-06-29 10:26:20 -07:00
filemap.h [PATCH] generic_file_buffered_write(): handle zero-length iovec segments 2006-06-29 10:26:20 -07:00
fremap.c [PATCH] fix update_mmu_cache in fremap.c 2006-06-23 07:42:52 -07:00
highmem.c BUG_ON() Conversion in mm/highmem.c 2006-04-02 13:47:35 +02:00
hugetlb.c [PATCH] tightening hugetlb strict accounting 2006-06-23 07:42:48 -07:00
internal.h [PATCH] remove set_page_count() outside mm/ 2006-03-22 07:54:02 -08:00
Kconfig Merge master.kernel.org:/pub/scm/linux/kernel/git/gregkh/pci-2.6 2006-06-29 10:49:17 -07:00
madvise.c [PATCH] Fix MADV_REMOVE protection checking 2006-04-17 18:22:18 -07:00
Makefile [PATCH] zoned vm counters: create vmstat.c/.h from page_alloc.c/.h 2006-06-30 11:25:34 -07:00
memory_hotplug.c [PATCH] Register sysfs file for hotplugged new node 2006-06-27 17:32:36 -07:00
memory.c [PATCH] add page_mkwrite() vm_operations method 2006-06-23 07:42:51 -07:00
mempolicy.c [PATCH] proc: don't lock task_structs indefinitely 2006-06-26 09:58:25 -07:00
mempool.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial 2006-03-26 09:41:18 -08:00
migrate.c [PATCH] Allow migration of mlocked pages 2006-06-25 10:00:55 -07:00
mincore.c [PATCH] freepgt: sys_mincore ignore FIRST_USER_PGD_NR 2005-04-19 13:29:20 -07:00
mlock.c [PATCH] move capable() to capability.h 2006-01-11 18:42:13 -08:00
mmap.c [PATCH] add page_mkwrite() vm_operations method 2006-06-23 07:42:51 -07:00
mmzone.c [PATCH] uninline zone helpers 2006-03-27 08:44:48 -08:00
mprotect.c [PATCH] add page_mkwrite() vm_operations method 2006-06-23 07:42:51 -07:00
mremap.c [PATCH] move capable() to capability.h 2006-01-11 18:42:13 -08:00
msync.c [PATCH] Kill PF_SYNCWRITE flag 2006-06-23 17:10:39 +02:00
nommu.c [PATCH] overcommit: use totalreserve_pages for nommu 2006-04-11 06:18:32 -07:00
oom_kill.c [PATCH] mm: fix typos in comments in mm/oom_kill.c 2006-06-23 07:42:47 -07:00
page_alloc.c [PATCH] zoned vm counters: create vmstat.c/.h from page_alloc.c/.h 2006-06-30 11:25:34 -07:00
page_io.c [PATCH] mm: split page table lock 2005-10-29 21:40:42 -07:00
page-writeback.c [PATCH] cpu hotplug: make cpu_notifier related notifier calls __cpuinit only 2006-06-27 17:32:41 -07:00
pdflush.c [PATCH] pdflush: handle resume wakeups 2006-06-25 10:01:06 -07:00
prio_tree.c Linux-2.6.12-rc2 2005-04-16 15:20:36 -07:00
readahead.c spelling fixes 2006-06-26 18:35:02 +02:00
rmap.c [PATCH] Allow migration of mlocked pages 2006-06-25 10:00:55 -07:00
shmem.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/devfs-2.6 2006-06-29 14:19:21 -07:00
slab.c [PATCH] pi-futex: rt mutex debug 2006-06-27 17:32:47 -07:00
slob.c [PATCH] mm/slob.c: for_each_possible_cpu(), not NR_CPUS 2006-04-19 09:13:49 -07:00
sparse.c [PATCH] spin/rwlock init cleanups 2006-06-27 17:32:39 -07:00
swap_state.c [PATCH] mark address_space_operations const 2006-06-28 14:59:04 -07:00
swap.c [PATCH] core: use list_move() 2006-06-26 09:58:17 -07:00
swapfile.c [PATCH] read_mapping_page for address space 2006-06-23 07:43:02 -07:00
thrash.c [PATCH] temporarily disable swap token on memory pressure 2005-11-28 14:42:25 -08:00
tiny-shmem.c [PATCH] devfs: Remove the devfs_fs_kernel.h file from the tree 2006-06-26 12:25:08 -07:00
truncate.c [PATCH] Remove semi-softlockup from invalidate_mapping_pages 2006-06-23 07:43:07 -07:00
util.c [PATCH] slab: optimize constant-size kzalloc calls 2006-03-25 08:22:49 -08:00
vmalloc.c [PATCH] mm: introduce remap_vmalloc_range() 2006-06-23 07:42:49 -07:00
vmscan.c [PATCH] cpu hotplug: revert init patch submitted for 2.6.17 2006-06-27 17:32:40 -07:00
vmstat.c [PATCH] zoned vm counters: create vmstat.c/.h from page_alloc.c/.h 2006-06-30 11:25:34 -07:00