Commit Graph

1349 Commits

Author SHA1 Message Date
Christoph Lameter
43c0f3d25c Fix: find_or_create_page skips cpuset memory spreading.
We call alloc_page where we should be calling __page_cache_alloc.

__page_cache_alloc performs cpuset memory spreading.  alloc_page does not.
There is no reason that pages allocated via find_or_create should be
exempt.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-16 21:19:15 -07:00
Hugh Dickins
1800782016 slub: don't confuse ctor and dtor
kmem_cache_create() was swapping ctor and dtor in calling find_mergeable():
though it caused no bug, and probably never would, even if destructors are
retained; but fix it so as not to generate anxiety ;)

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Cc: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-16 21:19:15 -07:00
Paul Mundt
6c645ac725 sh64: generic quicklist support.
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2007-05-14 09:55:35 +09:00
Miklos Szeredi
0ea9718016 consolidate generic_writepages and mpage_writepages
Clean up massive code duplication between mpage_writepages() and
generic_writepages().

The new generic function, write_cache_pages() takes a function pointer
argument, which will be called for each page to be written.

Maybe cifs_writepages() too can use this infrastructure, but I'm not
touching that with a ten-foot pole.

The upcoming page writeback support in fuse will also want this.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Acked-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-11 08:29:35 -07:00
Christoph Lameter
39bf6270f5 VM statistics: Make timer deferrable
VM statistics updates do not matter if the kernel is in idle powersaving
mode.  So allow the timer to be deferred.

It would be better though if we could switch the timer between deferrable
and nondeferrable based on differentials present.  The timer would start
out nondeferrable and if we find that there were no updates in the last
statistics interval then we would switch the timer to deferrable.  If the
timer later finds again that there are differentials then go to
nondeferrable again.

And yet another way would be to run the timer shortly before going to idle?

The solution here means that the VM counters may be slightly off during
idle since differentials may be still pending while the timer is deferred.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-11 08:29:32 -07:00
Mika Kukkonen
7faaa5f0bf Bug in mm/thrash.c function grab_swap_token()
Following bug was uncovered by compiling with '-W' flag:

  CC      mm/thrash.o
mm/thrash.c: In function ‘grab_swap_token’:
mm/thrash.c:52: warning: comparison of unsigned expression < 0 is always false

Variable token_priority is unsigned, so decrementing first and then
checking the result does not work; fixed by reversing the test, patch
attached (compile tested only).

I am not sure if likely() makes much sense in this new situation, but
I'll let somebody else to make a decision on that.

Signed-off-by: Mika Kukkonen <mikukkon@iki.fi>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-11 08:29:32 -07:00
Christoph Lameter
bcf889f965 SLUB: remove nr_cpu_ids hack
This was in SLUB in order to head off trouble while the nr_cpu_ids
functionality was not merged.  Its merged now so no need to still have this.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-10 09:26:53 -07:00
Stephen Rothwell
6f076f5dd9 early_pfn_to_nid needs to be __meminit
Since it is referenced by memmap_init_zone (which is __meminit) via the
early_pfn_in_nid macro when CONFIG_NODES_SPAN_OTHER_NODES is set (which
basically means PowerPC 64).

This removes a section mismatch warning in those circumstances.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-10 09:26:52 -07:00
Christoph Lameter
894b8788d7 slub: support concurrent local and remote frees and allocs on a slab
Avoid atomic overhead in slab_alloc and slab_free

SLUB needs to use the slab_lock for the per cpu slabs to synchronize with
potential kfree operations.  This patch avoids that need by moving all free
objects onto a lockless_freelist.  The regular freelist continues to exist
and will be used to free objects.  So while we consume the
lockless_freelist the regular freelist may build up objects.

If we are out of objects on the lockless_freelist then we may check the
regular freelist.  If it has objects then we move those over to the
lockless_freelist and do this again.  There is a significant savings in
terms of atomic operations that have to be performed.

We can even free directly to the lockless_freelist if we know that we are
running on the same processor.  So this speeds up short lived objects.
They may be allocated and freed without taking the slab_lock.  This is
particular good for netperf.

In order to maximize the effect of the new faster hotpath we extract the
hottest performance pieces into inlined functions.  These are then inlined
into kmem_cache_alloc and kmem_cache_free.  So hotpath allocation and
freeing no longer requires a subroutine call within SLUB.

[I am not sure that it is worth doing this because it changes the easy to
read structure of slub just to reduce atomic ops.  However, there is
someone out there with a benchmark on 4 way and 8 way processor systems
that seems to show a 5% regression vs.  Slab.  Seems that the regression is
due to increased atomic operations use vs.  SLAB in SLUB).  I wonder if
this is applicable or discernable at all in a real workload?]

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-10 09:26:52 -07:00
Linus Torvalds
d84c4124c4 Merge master.kernel.org:/pub/scm/linux/kernel/git/lethal/sh-2.6
* master.kernel.org:/pub/scm/linux/kernel/git/lethal/sh-2.6:
  sh: Fix stacktrace simplification fallout.
  sh: SH7760 DMABRG support.
  sh: clockevent/clocksource/hrtimers/nohz TMU support.
  sh: Truncate MAX_ACTIVE_REGIONS for the common case.
  rtc: rtc-sh: Fix rtc_dev pointer for rtc_update_irq().
  sh: Convert to common die chain.
  sh: Wire up utimensat syscall.
  sh: landisk mv_nr_irqs definition.
  sh: Fixup ndelay() xloops calculation for alternate HZ.
  sh: Add 32-bit opcode feature CPU flag.
  sh: Fix PC adjustments for varying opcode length.
  sh: Support for SH-2A 32-bit opcodes.
  sh: Kill off redundant __div64_32 symbol export.
  sh: Share exception vector table for SH-3/4.
  sh: Always define TRAPA_BUG_OPCODE.
  sh: __GFP_REPEAT for pte allocations, too.
  rtc: rtc-sh: Fix up dev_dbg() warnings.
  sh: generic quicklist support.
2007-05-09 13:08:20 -07:00
David Howells
c855ff3718 Fix a bad error case handling in read_cache_page_async()
Commit 6fe6900e1e introduced a nasty bug
in read_cache_page_async().

It added a "mark_page_accessed(page)" at the final return path in
read_cache_page_async().  But in error cases, 'page' holds the error
code, and you can't mark it accessed.

[ and Glauber de Oliveira Costa points out that we can use a return
  instead of adding more goto's ]

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09 13:04:03 -07:00
Linus Torvalds
9a9136e270 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial
* git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial: (25 commits)
  sound: convert "sound" subdirectory to UTF-8
  MAINTAINERS: Add cxacru website/mailing list
  include files: convert "include" subdirectory to UTF-8
  general: convert "kernel" subdirectory to UTF-8
  documentation: convert the Documentation directory to UTF-8
  Convert the toplevel files CREDITS and MAINTAINERS to UTF-8.
  remove broken URLs from net drivers' output
  Magic number prefix consistency change to Documentation/magic-number.txt
  trivial: s/i_sem /i_mutex/
  fix file specification in comments
  drivers/base/platform.c: fix small typo in doc
  misc doc and kconfig typos
  Remove obsolete fat_cvf help text
  Fix occurrences of "the the "
  Fix minor typoes in kernel/module.c
  Kconfig: Remove reference to external mqueue library
  Kconfig: A couple of grammatical fixes in arch/i386/Kconfig
  Correct comments in genrtc.c to refer to correct /proc file.
  Fix more "deprecated" spellos.
  Fix "deprecated" typoes.
  ...

Fix trivial comment conflict in kernel/relay.c.
2007-05-09 12:54:17 -07:00
Christoph Lameter
4037d45220 Move remote node draining out of slab allocators
Currently the slab allocators contain callbacks into the page allocator to
perform the draining of pagesets on remote nodes.  This requires SLUB to have
a whole subsystem in order to be compatible with SLAB.  Moving node draining
out of the slab allocators avoids a section of code in SLUB.

Move the node draining so that is is done when the vm statistics are updated.
At that point we are already touching all the cachelines with the pagesets of
a processor.

Add a expire counter there.  If we have to update per zone or global vm
statistics then assume that the pageset will require subsequent draining.

The expire counter will be decremented on each vm stats update pass until it
reaches zero.  Then we will drain one batch from the pageset.  The draining
will cause vm counter updates which will then cause another expiration until
the pcp is empty.  So we will drain a batch every 3 seconds.

Note that remote node draining is a somewhat esoteric feature that is required
on large NUMA systems because otherwise significant portions of system memory
can become trapped in pcp queues.  The number of pcp is determined by the
number of processors and nodes in a system.  A system with 4 processors and 2
nodes has 8 pcps which is okay.  But a system with 1024 processors and 512
nodes has 512k pcps with a high potential for large amount of memory being
caught in them.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09 12:30:56 -07:00
Christoph Lameter
77461ab332 Make vm statistics update interval configurable
Make it configurable.  Code in mm makes the vm statistics intervals
independent from the cache reaper use that opportunity to make it
configurable.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09 12:30:56 -07:00
Christoph Lameter
d1187ed210 vmstat: use our own timer events
vmstat is currently using the cache reaper to periodically bring the
statistics up to date.  The cache reaper does only exists in SLUB as a way to
provide compatibility with SLAB.  This patch removes the vmstat calls from the
slab allocators and provides its own handling.

The advantage is also that we can use a different frequency for the updates.
Refreshing vm stats is a pretty fast job so we can run this every second and
stagger this by only one tick.  This will lead to some overlap in large
systems.  F.e a system running at 250 HZ with 1024 processors will have 4 vm
updates occurring at once.

However, the vm stats update only accesses per node information.  It is only
necessary to stagger the vm statistics updates per processor in each node.  Vm
counter updates occurring on distant nodes will not cause cacheline
contention.

We could implement an alternate approach that runs the first processor on each
node at the second and then each of the other processor on a node on a
subsequent tick.  That may be useful to keep a large amount of the second free
of timer activity.  Maybe the timer folks will have some feedback on this one?

[jirislaby@gmail.com: add missing break]
Cc: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09 12:30:56 -07:00
Rafael J. Wysocki
8bb7844286 Add suspend-related notifications for CPU hotplug
Since nonboot CPUs are now disabled after tasks and devices have been
frozen and the CPU hotplug infrastructure is used for this purpose, we need
special CPU hotplug notifications that will help the CPU-hotplug-aware
subsystems distinguish normal CPU hotplug events from CPU hotplug events
related to a system-wide suspend or resume operation in progress.  This
patch introduces such notifications and causes them to be used during
suspend and resume transitions.  It also changes all of the
CPU-hotplug-aware subsystems to take these notifications into consideration
(for now they are handled in the same way as the corresponding "normal"
ones).

[oleg@tv-sign.ru: cleanups]
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Gautham R Shenoy <ego@in.ibm.com>
Cc: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09 12:30:56 -07:00
Nate Diller
01f2705daf fs: convert core functions to zero_user_page
It's very common for file systems to need to zero part or all of a page,
the simplist way is just to use kmap_atomic() and memset().  There's
actually a library function in include/linux/highmem.h that does exactly
that, but it's confusingly named memclear_highpage_flush(), which is
descriptive of *how* it does the work rather than what the *purpose* is.
So this patchset renames the function to zero_user_page(), and calls it
from the various places that currently open code it.

This first patch introduces the new function call, and converts all the
core kernel callsites, both the open-coded ones and the old
memclear_highpage_flush() ones.  Following this patch is a series of
conversions for each file system individually, per AKPM, and finally a
patch deprecating the old call.  The diffstat below shows the entire
patchset.

[akpm@linux-foundation.org: fix a few things]
Signed-off-by: Nate Diller <nate.diller@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09 12:30:55 -07:00
Christoph Lameter
5830c59021 slab: shut down cache_reaper when cpu goes down
Shutdown the cache_reaper if the cpu is brought down and set the
cache_reap.func to NULL.  Otherwise hotplug shuts down the reaper for good.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09 12:30:53 -07:00
Heiko Carstens
38c3bd96a0 slab: use CPU_LOCK_[ACQUIRE|RELEASE]
Looks like this was forgotten when CPU_LOCK_[ACQUIRE|RELEASE] was
introduced.

Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Gautham Shenoy <ego@in.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09 12:30:51 -07:00
David Howells
ef71c15c46 AFS: export a couple of core functions for AFS write support
Export a couple of core functions for AFS write support to use:

	find_get_pages_contig()
	find_get_pages_tag()

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09 12:30:50 -07:00
Ken Chen
8a63011275 pretend cpuset has some form of hugetlb page reservation
When cpuset is configured, it breaks the strict hugetlb page reservation as
the accounting is done on a global variable.  Such reservation is
completely rubbish in the presence of cpuset because the reservation is not
checked against page availability for the current cpuset.  Application can
still potentially OOM'ed by kernel with lack of free htlb page in cpuset
that the task is in.  Attempt to enforce strict accounting with cpuset is
almost impossible (or too ugly) because cpuset is too fluid that task or
memory node can be dynamically moved between cpusets.

The change of semantics for shared hugetlb mapping with cpuset is
undesirable.  However, in order to preserve some of the semantics, we fall
back to check against current free page availability as a best attempt and
hopefully to minimize the impact of changing semantics that cpuset has on
hugetlb.

Signed-off-by: Ken Chen <kenchen@google.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Christoph Lameter <clameter@engr.sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09 12:30:49 -07:00
Ken Chen
ace4bd29c2 fix leaky resv_huge_pages when cpuset is in use
The internal hugetlb resv_huge_pages variable can permanently leak nonzero
value in the error path of hugetlb page fault handler when hugetlb page is
used in combination of cpuset.  The leaked count can permanently trap N
number of hugetlb pages in unusable "reserved" state.

Steps to reproduce the bug:

  (1) create two cpuset, user1 and user2
  (2) reserve 50 htlb pages in cpuset user1
  (3) attempt to shmget/shmat 50 htlb page inside cpuset user2
  (4) kernel oom the user process in step 3
  (5) ipcrm the shm segment

At this point resv_huge_pages will have a count of 49, even though
there are no active hugetlbfs file nor hugetlb shared memory segment
in the system.  The leak is permanent and there is no recovery method
other than system reboot. The leaked count will hold up all future use
of that many htlb pages in all cpusets.

The culprit is that the error path of alloc_huge_page() did not
properly undo the change it made to resv_huge_page, causing
inconsistent state.

Signed-off-by: Ken Chen <kenchen@google.com>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: Adam Litke <agl@us.ibm.com>
Cc: Martin Bligh <mbligh@google.com>
Acked-by: David Gibson <dwg@au1.ibm.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09 12:30:48 -07:00
Pekka J Enberg
7ae439ce0c krealloc: fix kerneldoc comments
No "blank" (or "*") line is allowed between the function name and lines for
it parameter(s).

Cc: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09 12:30:46 -07:00
Christoph Lameter
5e6d444ea1 SLUB: rework slab order determination
In some cases SLUB is creating uselessly slabs that are larger than
slub_max_order. Also the layout of some of the slabs was not satisfactory.

Go to an iterarive approach.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09 12:30:46 -07:00
Christoph Lameter
45edfa580b SLUB: include lifetime stats and sets of cpus / nodes in tracking output
We have information about how long an object existed and about the nodes and
cpus where the allocations and frees took place.  Add that information to the
tracking output in /sys/slab/xx/alloc_calls and /sys/slab/free_calls

This will then enable slabinfo to output nice reports like this:

  christoph@qirst:~/slub$ ./slabinfo kmalloc-128

  Slabcache: kmalloc-128           Aliases:  0 Order :  0

  Sizes (bytes)     Slabs              Debug                Memory
  ------------------------------------------------------------------------
  Object :     128  Total  :      12   Sanity Checks : On   Total:   49152
  SlabObj:     200  Full   :       7   Redzoning     : On   Used :   24832
  SlabSiz:    4096  Partial:       4   Poisoning     : On   Loss :   24320
  Loss   :      72  CpuSlab:       1   Tracking      : On   Lalig:   13968
  Align  :       8  Objects:      20   Tracing       : Off  Lpadd:    1152

  kmalloc-128 has no kmem_cache operations

  kmalloc-128: Kernel object allocation
  -----------------------------------------------------------------------
        6 param_sysfs_setup+0x71/0x130 age=284512/284512/284512 pid=1 nodes=0-1,3
       11 percpu_populate+0x39/0x80 age=283914/284428/284512 pid=1 nodes=0
       21 __register_chrdev_region+0x31/0x170 age=282896/284347/284473 pid=1-1705 nodes=0-2
        1 sys_inotify_init+0x76/0x1c0 age=283423 pid=1004 nodes=0
       19 as_get_io_context+0x32/0xd0 age=6/247567/283988 pid=1-11782 nodes=0,2
       10 ida_pre_get+0x4a/0x80 age=277666/283773/284526 pid=0-2177 nodes=0,2
       24 kobject_kset_add_dir+0x37/0xb0 age=282727/283860/284472 pid=1-1723 nodes=0-2
        1 acpi_ds_build_internal_buffer_obj+0xd3/0x11d age=284508 pid=1 nodes=0
       24 con_insert_unipair+0xd7/0x110 age=284438/284438/284438 pid=1 nodes=0,2
        1 uart_open+0x2d2/0x4b0 age=283896 pid=1 nodes=0
       26 dma_pool_create+0x73/0x1a0 age=282762/282833/282916 pid=1705-1723 nodes=0
        1 neigh_table_init_no_netlink+0xd2/0x210 age=284461 pid=1 nodes=0
        2 neigh_parms_alloc+0x2b/0xe0 age=284410/284411/284412 pid=1 nodes=2
        2 neigh_resolve_output+0x1e1/0x280 age=276289/276291/276293 pid=0-2443 nodes=0
        1 netlink_kernel_create+0x90/0x170 age=284472 pid=1 nodes=0
        4 xt_alloc_table_info+0x39/0xf0 age=283958/283958/283959 pid=1 nodes=1
        3 fn_hash_insert+0x473/0x720 age=277653/277661/277666 pid=2177-2185 nodes=0
        1 get_mtrr_state+0x285/0x2a0 age=284526 pid=0 nodes=0
        1 cacheinfo_cpu_callback+0x26d/0x3e0 age=284458 pid=1 nodes=0
       29 kernel_param_sysfs_setup+0x25/0x90 age=284511/284511/284512 pid=1 nodes=0-1,3
        5 process_zones+0x5e/0x170 age=284546/284546/284546 pid=0 nodes=0
        1 drm_core_init+0x48/0x160 age=284421 pid=1 nodes=2

  kmalloc-128: Kernel object freeing
  ------------------------------------------------------------------------
      163 <not-available> age=4295176847 pid=0 nodes=0-3
        1 __vunmap+0x6e/0xf0 age=282907 pid=1723 nodes=0
       28 free_as_io_context+0x12/0x90 age=9243/262197/283474 pid=42-11754 nodes=0
        1 acpi_get_object_info+0x1b7/0x1d4 age=284475 pid=1 nodes=0
        1 do_acpi_find_child+0x45/0x4e age=284475 pid=1 nodes=0

  NUMA nodes           :    0    1    2    3
  ------------------------------------------
  All slabs                 7    2    2    1
  Partial slabs             2    2    0    0

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09 12:30:46 -07:00
Christoph Lameter
41ecc55b8a SLUB: add CONFIG_SLUB_DEBUG
CONFIG_SLUB_DEBUG can be used to switch off the debugging and sysfs components
of SLUB.  Thus SLUB will be able to replace SLOB.  SLUB can arrange objects in
a denser way than SLOB and the code size should be minimal without debugging
and sysfs support.

Note that CONFIG_SLUB_DEBUG is materially different from CONFIG_SLAB_DEBUG.
CONFIG_SLAB_DEBUG is used to enable slab debugging in SLAB.  SLUB enables
debugging via a boot parameter.  SLUB debug code should always be present.

CONFIG_SLUB_DEBUG can be modified in the embedded config section.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09 12:30:45 -07:00
Christoph Lameter
02cbc87446 SLUB: move tracking definitions and check_valid_pointer() away from debug code
Move the tracking definitions and the check_valid_pointer() function away from
the debugging related functions.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09 12:30:45 -07:00
Christoph Lameter
636f0d7de8 SLUB: consolidate trace code
Trace in both slab_alloc and slab_free has a lot of common code.  Use a single
function for both.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09 12:30:45 -07:00
Christoph Lameter
35e5d7ee27 SLUB: introduce DebugSlab(page)
This replaces the PageError() checking.  DebugSlab is clearer and allows for
future changes to the page bit used.  We also need it to support
CONFIG_SLUB_DEBUG.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09 12:30:45 -07:00
Christoph Lameter
b345970905 SLUB: move resiliency check into SYSFS section
Move the resiliency check into the SYSFS section after validate_slab that is
used by the resiliency check.  This will avoid a forward declaration.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09 12:30:45 -07:00
Christoph Lameter
7656c72b5a SLUB: add macros for scanning objects in a slab
Scanning of objects happens in a number of functions.  Consolidate that code.
DECLARE_BITMAP instead of coding the declaration for bitmaps.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09 12:30:45 -07:00
Christoph Lameter
672bba3a4b SLUB: update comments
Update comments throughout SLUB to reflect the new developments.  Fix up
various awkward sentences.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09 12:30:45 -07:00
Christoph Lameter
26a7bd0302 SLUB: get rid of finish_bootstrap
Its only purpose was to bring some sort of symmetry to sysfs usage when
dealing with bootstrapping per cpu flushing.  Since we do not time out slabs
anymore we have no need to run finish_bootstrap even without sysfs.  Fold it
back into slab_sysfs_init and drop the initcall for the !SYFS case.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09 12:30:45 -07:00
Christoph Lameter
1f99a283dc SLUB: clean up krealloc
We really do not need all this gaga there.

ksize gives us all the information we need to figure out if the object can
cope with the new size.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09 12:30:45 -07:00
Christoph Lameter
abcd08a6f5 SLUB: use check_valid_pointer in kmem_ptr_validate
We needlessly duplicate code. Also make check_valid_pointer inline.

Signed-off-by: Christoph LAemter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09 12:30:44 -07:00
Christoph Lameter
be7b3fbcef SLUB: after object padding only needed for Redzoning
If no redzoning is selected then we do not need padding before the next
object.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09 12:30:44 -07:00
Christoph Lameter
65c02d4cfb SLUB: add support for dynamic cacheline size determination
SLUB currently assumes that the cacheline size is static.  However, i386 f.e.
supports dynamic cache line size determination.

Use cache_line_size() instead of L1_CACHE_BYTES in the allocator.

That also explains the purpose of SLAB_HWCACHE_ALIGN.  So we will need to keep
that one around to allow dynamic aligning of objects depending on boot
determination of the cache line size.

[akpm@linux-foundation.org: need to define it before we use it]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09 12:30:44 -07:00
Michael Opdenacker
59c51591a0 Fix occurrences of "the the "
Signed-off-by: Michael Opdenacker <michael@free-electrons.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2007-05-09 08:57:56 +02:00
Paul Mundt
5f8c9908f2 sh: generic quicklist support.
This moves SH over to the generic quicklists. As per x86_64,
we have special mappings for the PGDs, so these go on their
own list..

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2007-05-09 01:35:00 +00:00
Roland McGrath
74add80cbd Remove unused variable in get_unmapped_area
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:35:28 -07:00
Jaya Kumar
60b59beafb fbdev: mm: Deferred IO support
This implements deferred IO support in fbdev.  Deferred IO is a way to delay
and repurpose IO.  This implementation is done using mm's page_mkwrite and
page_mkclean hooks in order to detect, delay and then rewrite IO.  This
functionality is used by hecubafb.

[adaplas]
This is useful for graphics hardware with no directly addressable/mappable
framebuffer. Implementing this will allow the "framebuffer" to be accesible
from user space via mmap().

Signed-off-by: Jaya Kumar <jayakumar.lkml@gmail.com>
Signed-off-by: Antonino Daplas <adaplas@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:26 -07:00
Alexey Dobriyan
a5c43dae7a Fix race between cat /proc/slab_allocators and rmmod
Same story as with cat /proc/*/wchan race vs rmmod race, only
/proc/slab_allocators want more info than just symbol name.

Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:08 -07:00
Mark Fasheh
ef51c97623 Remove do_sync_file_range()
Remove do_sync_file_range() and convert callers to just use
do_sync_mapping_range().

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:04 -07:00
Christoph Hellwig
1eeb66a1bb move die notifier handling to common code
This patch moves the die notifier handling to common code.  Previous
various architectures had exactly the same code for it.  Note that the new
code is compiled unconditionally, this should be understood as an appel to
the other architecture maintainer to implement support for it aswell (aka
sprinkling a notify_die or two in the proper place)

arm had a notifiy_die that did something totally different, I renamed it to
arm_notify_die as part of the patch and made it static to the file it's
declared and used at.  avr32 used to pass slightly less information through
this interface and I brought it into line with the other architectures.

[akpm@linux-foundation.org: build fix]
[akpm@linux-foundation.org: fix vmalloc_sync_all bustage]
[bryan.wu@analog.com: fix vmalloc_sync_all in nommu]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: <linux-arch@vger.kernel.org>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Bryan Wu <bryan.wu@analog.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:04 -07:00
Guillaume Chazarain
3e9f45bd18 Factor outstanding I/O error handling
Cleanup: setting an outstanding error on a mapping was open coded too many
times.  Factor it out in mapping_set_error().

Signed-off-by: Guillaume Chazarain <guichaz@yahoo.fr>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:14:57 -07:00
Yasunori Goto
72280ede31 Add white list into modpost.c for memory hotplug code and ia64's machvec section
This patch is add white list into modpost.c for some functions and
ia64's section to fix section mismatchs.

  sparse_index_alloc() and zone_wait_table_init() calls bootmem allocator
  at boot time, and kmalloc/vmalloc at hotplug time. If config
  memory hotplug is on, there are references of bootmem allocater(init text)
  from them (normal text). This is cause of section mismatch.

  Bootmem is called by many functions and it must be
  used only at boot time. I think __init of them should keep for
  section mismatch check. So, I would like to register sparse_index_alloc()
  and zone_wait_table_init() into white list.

  In addition, ia64's .machvec section is function table of some platform
  dependent code. It is mixture of .init.text and normal text. These
  reference of __init functions are valid too.

Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:14:57 -07:00
Yasunori Goto
a3142c8e1d Fix section mismatch of memory hotplug related code.
This is to fix many section mismatches of code related to memory hotplug.
I checked compile with memory hotplug on/off on ia64 and x86-64 box.

Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:14:57 -07:00
Dmitriy Monakhov
0ceb331433 mm: move common segment checks to separate helper function
[akpm@linux-foundation.org: cleanup]
Signed-off-by: Monakhov Dmitriy <dmonakhov@openvz.org>
Cc: Christoph Hellwig <hch@lst.de>
Acked-by: Anton Altaparmakov <aia21@cam.ac.uk>
Acked-by: David Chinner <dgc@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:14:57 -07:00
David Woodhouse
b46b8f19c9 Increase slab redzone to 64bits
There are two problems with the existing redzone implementation.

Firstly, it's causing misalignment of structures which contain a 64-bit
integer, such as netfilter's 'struct ipt_entry' -- causing netfilter
modules to fail to load because of the misalignment.  (In particular, the
first check in
net/ipv4/netfilter/ip_tables.c::check_entry_size_and_hooks())

On ppc32 and sparc32, amongst others, __alignof__(uint64_t) == 8.

With slab debugging, we use 32-bit redzones. And allocated slab objects
aren't sufficiently aligned to hold a structure containing a uint64_t.

By _just_ setting ARCH_KMALLOC_MINALIGN to __alignof__(u64) we'd disable
redzone checks on those architectures.  By using 64-bit redzones we avoid that
loss of debugging, and also fix the other problem while we're at it.

When investigating this, I noticed that on 64-bit platforms we're using a
32-bit value of RED_ACTIVE/RED_INACTIVE in the 64-bit memory location set
aside for the redzone.  Which means that the four bytes immediately before
or after the allocated object at 0x00,0x00,0x00,0x00 for LE and BE
machines, respectively.  Which is probably not the most useful choice of
poison value.

One way to fix both of those at once is just to switch to 64-bit
redzones in all cases.

Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Christoph Lameter <clameter@engr.sgi.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:14:57 -07:00
Linus Torvalds
0f9008ef38 Fix up SLUB compile
The newly merged SLUB allocator patches had been generated before the
removal of "struct subsystem", and ended up applying fine, but wouldn't
build based on the current tree as a result.

Fix up that merge error - not that SLUB is likely really ready for
showtime yet, but at least I can fix the trivial stuff.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-07 12:31:58 -07:00