Some (block) devices have a '/' in the name, and need special
handling. Let's have that rule to the core, so we can remove it
from the block class.
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* Optimize various places where a pointer to the cpumask_of_cpu value
will result in reducing stack pressure.
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
ftrace: do not trace library functions
ftrace: do not trace scheduler functions
ftrace: fix lockup with MAXSMP
ftrace: fix merge buglet
make function tracing more robust: do not trace library functions.
We've already got a sizable list of exceptions:
ifdef CONFIG_FTRACE
# Do not profile string.o, since it may be used in early boot or vdso
CFLAGS_REMOVE_string.o = -pg
# Also do not profile any debug utilities
CFLAGS_REMOVE_spinlock_debug.o = -pg
CFLAGS_REMOVE_list_debug.o = -pg
CFLAGS_REMOVE_debugobjects.o = -pg
CFLAGS_REMOVE_find_next_bit.o = -pg
CFLAGS_REMOVE_cpumask.o = -pg
CFLAGS_REMOVE_bitmap.o = -pg
endif
... and the pattern has been that random library functionality showed
up in ftrace's critical path (outside of its recursion check), causing
hard to debug lockups.
So be a bit defensive about it and exclude all lib/*.o functions by
default. It's not that they are overly interesting for tracing purposes
anyway. Specific ones can still be traced, in an opt-in manner.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
MAXSMP brings in lots of use of various bitops in smp_processor_id()
and friends - causing ftrace to lock up during bootup:
calling anon_inode_init+0x0/0x130
initcall anon_inode_init+0x0/0x130 returned 0 after 0 msecs
calling acpi_event_init+0x0/0x57
[ hard hang ]
So exclude the bitops facilities from tracing.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: (72 commits)
Revert "x86/PCI: ACPI based PCI gap calculation"
PCI: remove unnecessary volatile in PCIe hotplug struct controller
x86/PCI: ACPI based PCI gap calculation
PCI: include linux/pm_wakeup.h for device_set_wakeup_capable
PCI PM: Fix pci_prepare_to_sleep
x86/PCI: Fix PCI config space for domains > 0
Fix acpi_pm_device_sleep_wake() by providing a stub for CONFIG_PM_SLEEP=n
PCI: Simplify PCI device PM code
PCI PM: Introduce pci_prepare_to_sleep and pci_back_from_sleep
PCI ACPI: Rework PCI handling of wake-up
ACPI: Introduce new device wakeup flag 'prepared'
ACPI: Introduce acpi_device_sleep_wake function
PCI: rework pci_set_power_state function to call platform first
PCI: Introduce platform_pci_power_manageable function
ACPI: Introduce acpi_bus_power_manageable function
PCI: make pci_name use dev_name
PCI: handle pci_name() being const
PCI: add stub for pci_set_consistent_dma_mask()
PCI: remove unused arch pcibios_update_resource() functions
PCI: fix pci_setup_device()'s sprinting into a const buffer
...
Fixed up conflicts in various files (arch/x86/kernel/setup_64.c,
arch/x86/pci/irq.c, arch/x86/pci/pci.h, drivers/acpi/sleep/main.c,
drivers/pci/pci.c, drivers/pci/pci.h, include/acpi/acpi_bus.h) from x86
and ACPI updates manually.
* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (102 commits)
[SCSI] scsi_dh: fix kconfig related build errors
[SCSI] sym53c8xx: Fix bogus sym_que_entry re-implementation of container_of
[SCSI] scsi_cmnd.h: remove double inclusion of linux/blkdev.h
[SCSI] make struct scsi_{host,target}_type static
[SCSI] fix locking in host use of blk_plug_device()
[SCSI] zfcp: Cleanup external header file
[SCSI] zfcp: Cleanup code in zfcp_erp.c
[SCSI] zfcp: zfcp_fsf cleanup.
[SCSI] zfcp: consolidate sysfs things into one file.
[SCSI] zfcp: Cleanup of code in zfcp_aux.c
[SCSI] zfcp: Cleanup of code in zfcp_scsi.c
[SCSI] zfcp: Move status accessors from zfcp to SCSI include file.
[SCSI] zfcp: Small QDIO cleanups
[SCSI] zfcp: Adapter reopen for large number of unsolicited status
[SCSI] zfcp: Fix error checking for ELS ADISC requests
[SCSI] zfcp: wait until adapter is finished with ERP during auto-port
[SCSI] ibmvfc: IBM Power Virtual Fibre Channel Adapter Client Driver
[SCSI] sg: Add target reset support
[SCSI] lib: Add support for the T10 (SCSI) Data Integrity Field CRC
[SCSI] sd: Move scsi_disk() accessor function to sd.h
...
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (61 commits)
ext4: Documention update for new ordered mode and delayed allocation
ext4: do not set extents feature from the kernel
ext4: Don't allow nonextenst mount option for large filesystem
ext4: Enable delalloc by default.
ext4: delayed allocation i_blocks fix for stat
ext4: fix delalloc i_disksize early update issue
ext4: Handle page without buffers in ext4_*_writepage()
ext4: Add ordered mode support for delalloc
ext4: Invert lock ordering of page_lock and transaction start in delalloc
mm: Add range_cont mode for writeback
ext4: delayed allocation ENOSPC handling
percpu_counter: new function percpu_counter_sum_and_set
ext4: Add delayed allocation support in data=writeback mode
vfs: add hooks for ext4's delayed allocation support
jbd2: Remove data=ordered mode support using jbd buffer heads
ext4: Use new framework for data=ordered mode in JBD2
jbd2: Implement data=ordered mode handling via inodes
vfs: export filemap_fdatawrite_range()
ext4: Fix lock inversion in ext4_ext_truncate()
ext4: Invert the locking order of page_lock and transaction start
...
The SCSI Block Protocol uses this 16-bit CRC to verify the integrity
of each data sector. crc_t10dif() is used by sd_dif.c when performing
I/O to or from disks formatted with protection information.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Delayed allocation need to check free blocks at every write time.
percpu_counter_read_positive() is not quit accurate. delayed
allocation need a more accurate accounting, but using
percpu_counter_sum_positive() is frequently is quite expensive.
This patch added a new function to update center counter when sum
per-cpu counter, to increase the accurate rate for next
percpu_counter_read() and require less calling expensive
percpu_counter_sum().
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
For fsm text search, handle case insensitive parameter as -EINVAL.
Signed-off-by: Joonwoo Park <joonwpark81@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for case insensitive search to Knuth-Morris-Pratt algorithm.
Signed-off-by: Joonwoo Park <joonwpark81@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for case insensitive search to Boyer-Moore algorithm.
Signed-off-by: Joonwoo Park <joonwpark81@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The function textsearch_prepare has a new flag to support case
insensitive searching.
Signed-off-by: Joonwoo Park <joonwpark81@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
They print out a pointer in symbolic format, if possible (ie using
symbolic KALLSYMS information). The '%pS' format is for regular direct
pointers (which can point to data or code and that you find on the stack
during backtraces etc), while '%pF' is for C function pointer types.
On most architectures, the two mean exactly the same thing, but some
architectures use an indirect pointer for C function pointers, where the
function pointer points to a function descriptor (which in turn contains
the actual pointer to the code). The '%pF' code automatically does the
appropriate function descriptor dereference on such architectures.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This expands the kernel '%p' handling with an arbitrary alphanumberic
specifier extension string immediately following the '%p'. Right now
it's just being ignored, but the next commit will start adding some
specific pointer type extensions.
NOTE! The reason the extension is appended to the '%p' is to allow
minimal gcc type checking: gcc will still see the '%p' and will check
that the argument passed in is indeed a pointer, and yet will not
complain about the extended information that gcc doesn't understand
about (on the other hand, it also won't actually check that the pointer
type and the extension are compatible).
Alphanumeric characters were chosen because there is no sane existing
use for a string format with a hex pointer representation immediately
followed by alphanumerics (which is what such a format string would have
traditionally resulted in).
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The actual code is the same, just split out into a helper function.
This makes it easier to read, and allows for simple future extension
of %p handling.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The actual code is the same, just split out into a helper function.
This makes it easier to read, and allows for future sharing of the
string code.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 95b570c9ce ("Taint kernel after
WARN_ON(condition)") introduced a TAINT_WARN that was implemented for
all architectures using the generic warn_on_slowpath(), which excluded
any architecture that set HAVE_ARCH_WARN_ON.
As all of the architectures that implement their own WARN_ON() all go
through the report_bug() path (specifically handling BUG_TRAP_TYPE_WARN),
taint the kernel there as well for consistency.
Tested on avr32 and sh. Also relevant for s390, parisc, and powerpc.
Signed-off-by: Haavard Skinnemoen <hskinnemoen@atmel.com>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Acked-by: Kyle McMartin <kyle@mcmartin.ca>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Remove all clameter@sgi.com addresses from the kernel tree since they will
become invalid on June 27th. Change my maintainer email address for the
slab allocators to cl@linux-foundation.org (which will be the new email
address for the future).
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Matt Mackall <mpm@selenic.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (55 commits)
net: fib_rules: fix error code for unsupported families
netdevice: Fix wrong string handle in kernel command line parsing
net: Tyop of sk_filter() comment
netlink: Unneeded local variable
net-sched: fix filter destruction in atm/hfsc qdisc destruction
net-sched: change tcf_destroy_chain() to clear start of filter list
ipv4: fix sysctl documentation of time related values
mac80211: don't accept WEP keys other than WEP40 and WEP104
hostap: fix sparse warnings
hostap: don't report useless WDS frames by default
textsearch: fix Boyer-Moore text search bug
netfilter: nf_conntrack_tcp: fixing to check the lower bound of valid ACK
ipv6 route: Convert rt6_device_match() to use RT6_LOOKUP_F_xxx flags.
netlabel: Fix a problem when dumping the default IPv6 static labels
net/inet_lro: remove setting skb->ip_summed when not LRO-able
inet fragments: fix race between inet_frag_find and inet_frag_secret_rebuild
CONNECTOR: add a proc entry to list connectors
netlink: Fix some doc comments in net/netlink/attr.c
tcp: /proc/net/tcp rto,ato values not scaled properly (v2)
include/linux/netdevice.h: don't export MAX_HEADER to userspace
...
The current logic has a bug which cannot find matching pattern, if the
pattern is matched from the first character of target string.
for example:
pattern=abc, string=abcdefg
pattern=a, string=abcdefg
Searching algorithm should return 0 for those things.
Signed-off-by: Joonwoo Park <joonwpark81@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds saved stack-traces to the backtrace suite of self-tests.
Note that we don't depend on or unconditionally enable CONFIG_STACKTRACE
because not all architectures may have it (and we still want to enable the
other tests for those architectures).
Cc: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch re-institutes the ability to build rcutorture directly into
the Linux kernel. The reason that this capability was removed was that
this could result in your kernel being pretty much useless, as rcutorture
would be running starting from early boot. This problem has been avoided
by (1) making rcutorture run only three seconds of every six by default,
(2) adding a CONFIG_RCU_TORTURE_TEST_RUNNABLE that permits rcutorture
to be quiesced at boot time, and (3) adding a sysctl in /proc named
/proc/sys/kernel/rcutorture_runnable that permits rcutorture to be
quiesced and unquiesced when built into the kernel.
Please note that this /proc file is -not- available when rcutorture
is built as a module. Please also note that to get the earlier
take-no-prisoners behavior, you must use the boot command line to set
rcutorture's "stutter" parameter to zero.
The rcutorture quiescing mechanism is currently quite crude: loops
in each rcutorture process that poll a global variable once per tick.
Suggestions for improvement are welcome. The default action will
be to reduce the polling rate to a few times per second.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Suggested-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Daniel J Blueman reported:
| =======================================================
| [ INFO: possible circular locking dependency detected ]
| 2.6.26-rc5-201c #1
| -------------------------------------------------------
| nscd/3669 is trying to acquire lock:
| (&n->list_lock){.+..}, at: [<ffffffff802bab03>] deactivate_slab+0x173/0x1e0
|
| but task is already holding lock:
| (&obj_hash[i].lock){++..}, at: [<ffffffff803fa56f>]
| __debug_object_init+0x2f/0x350
|
| which lock already depends on the new lock.
There are two locks involved here; the first is a SLUB-local lock, and
the second is a debugobjects-local lock. They are basically taken in two
different orders:
1. SLUB { debugobjects { ... } }
2. debugobjects { SLUB { ... } }
This patch changes pattern #2 by trying to fill the memory pool (e.g.
the call into SLUB/kmalloc()) outside the debugobjects lock, so now the
two patterns look like this:
1. SLUB { debugobjects { ... } }
2. SLUB { } debugobjects { ... }
[ daniel.blueman@gmail.com: pool_lock needs to be taken irq safe in fill_pool ]
Reported-by: Daniel J Blueman <daniel.blueman@gmail.com>
Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This reverts commit 9aaffc898f.
That commit was a very bad idea. RCU_TORTURE found many boot timing
bugs and other sorts of bugs in the past, so excluding it from
boot images is very silly.
The option already depends on DEBUG_KERNEL and is disabled by default.
Even when it runs, the test threads are reniced. If it annoys people
we could add a runtime sysctl.
We shrink a radix tree when its root node has only one child, in the left
most slot. The child becomes the new root node. To perform this
operation in a manner compatible with concurrent lockless lookups, we
atomically switch the root pointer from the parent to its child.
However a concurrent lockless lookup may now have loaded a pointer to the
parent (and is presently deciding what to do next). For this reason, we
also have to keep the parent node in a valid state after shrinking the
tree, until the next RCU grace period -- otherwise this lookup with the
parent pointer may not do the right thing. Notably, we need to keep the
child in the left most slot there in case that is requested by the lookup.
This is all pretty standard RCU stuff. It is worth repeating because in
my eagerness to obey the radix tree node constructor scheme, I had broken
it by zeroing the radix tree node before the grace period.
What could happen is that a lookup can load the parent pointer, then
decide it wants to follow the left most child slot, only to find the slot
contained NULL due to the concurrent shrinker having zeroed the parent
node before waiting for a grace period. The lookup would return a false
negative as a result.
Fix it by doing that clearing in the RCU callback. I would normally want
to rip out the constructor entirely, but radix tree nodes are one of those
places where they make sense (only few cachelines will be touched soon
after allocation).
This was never actually found in any lockless pagecache testing or by the
test harness, but by seeing the odd problem with my scalable vmap rewrite.
I have not tickled the test harness into reproducing it yet, but I'll
keep working at it.
Fortunately, it is not a problem anywhere lockless pagecache is used in
mainline kernels (pagecache probe is not a guarantee, and brd does not
have concurrent lookups and deletes).
Signed-off-by: Nick Piggin <npiggin@suse.de>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
iter_div_u64_rem is used in the x86-64 vdso, which cannot call other
kernel code. For this case, provide the always_inlined version,
__iter_div_u64_rem.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
We have a few instances of the open-coded iterative div/mod loop, used
when we don't expcet the dividend to be much bigger than the divisor.
Unfortunately modern gcc's have the tendency to strength "reduce" this
into a full mod operation, which isn't necessarily any faster, and
even if it were, doesn't exist if gcc implements it in libgcc.
The workaround is to put a dummy asm statement in the loop to prevent
gcc from performing the transformation.
This patch creates a single implementation of this loop, and uses it
to replace the open-coded versions I know about.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Segher Boessenkool <segher@kernel.crashing.org>
Cc: Christian Kujau <lists@nerdbynature.de>
Cc: Robert Hancock <hancockr@shaw.ca>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Detect all physical PCI slots as described by ACPI, and create entries in
/sys/bus/pci/slots/.
Not all physical slots are hotpluggable, and the acpiphp module does not
detect them. Now we know the physical PCI geography of our system, without
caring about hotplug.
[kaneshige.kenji@jp.fujitsu.com: export-kobject_rename-for-pci_hotplug_core]
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Acked-by: Greg KH <greg@kroah.com>
[akpm@linux-foundation.org: build fix]
[akpm@linux-foundation.org: fix build with CONFIG_DMI=n]
Signed-off-by: Alex Chiang <achiang@hp.com>
Cc: Greg KH <greg@kroah.com>
Cc: Kristen Carlson Accardi <kristen.c.accardi@intel.com>
Cc: Len Brown <lenb@kernel.org>
Acked-by: Len Brown <len.brown@intel.com>
Acked-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Bluetooth will be able to use this.
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Cc: Marcel Holtmann <marcel@holtmann.org>
Cc: Dave Young <hidave.darkstar@gmail.com>
Cc: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
allow users to configure the softlockup detector to generate a panic
instead of a warning message.
high-availability systems might opt for this strict method (combined
with panic_timeout= boot option/sysctl), instead of generating
softlockup warnings ad infinitum.
also, automated tests work better if the system reboots reliably (into
a safe kernel) in case of a lockup.
The full spectrum of configurability is supported: boot option, sysctl
option and Kconfig option.
it's default-disabled.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch removes the Makefile turd and uses the nice CFLAGS_REMOVE macro
in the lib directory.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The debug functions in spin_lock debugging pollute the output of the
function tracer. This patch adds the debug files in the lib director
to those that should not be compiled with mcount tracing.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Most archs define the string and memory compare functions in assembly.
Some do not. But these functions may be used in some archs at early
boot up.
Since most archs define this code in assembly and they are not usually
traced, there's no need to trace them when they are not defined in
assembly.
This patch removes the -pg from the CFLAGS for lib/string.o.
This prevents the string functions use in either vdso or early bootup
from crashing the system.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The debug smp_processor_id caused a recursive fault in debugging
the irqsoff tracer. The tracer used a smp_processor_id in the
ftrace callback, and this function called preempt_disable which
also is traced. This caused a recursive fault (stack overload).
Since using smp_processor_id without debugging on does not cause
faults with the tracer (even when the tracer is wrong), the
debug version should not cause a system reboot.
This changes the debug_smp_processor_id to use the notrace versions
of preempt_disable and enable.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
If CONFIG_FTRACE is selected and /proc/sys/kernel/ftrace_enabled is
set to a non-zero value the ftrace routine will be called everytime
we enter a kernel function that is not marked with the "notrace"
attribute.
The ftrace routine will then call a registered function if a function
happens to be registered.
[ This code has been highly hacked by Steven Rostedt and Ingo Molnar,
so don't blame Arnaldo for all of this ;-) ]
Update:
It is now possible to register more than one ftrace function.
If only one ftrace function is registered, that will be the
function that ftrace calls directly. If more than one function
is registered, then ftrace will call a function that will loop
through the functions to call.
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Mark with "notrace" functions in core code that should not be
traced. The "notrace" attribute will prevent gcc from adding
a call to ftrace on the annotated funtions.
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* Increase performance for systems with large count NR_CPUS by limiting
the range of the cpumask operators that loop over the bits in a cpumask_t
variable. This removes a large amount of wasted cpu cycles.
* Add performance variants of the cpumask operators:
int cpus_weight_nr(mask) Same using nr_cpu_ids instead of NR_CPUS
int first_cpu_nr(mask) Number lowest set bit, or nr_cpu_ids
int next_cpu_nr(cpu, mask) Next cpu past 'cpu', or nr_cpu_ids
for_each_cpu_mask_nr(cpu, mask) for-loop cpu over mask using nr_cpu_ids
* Modify following to use performance variants:
#define num_online_cpus() cpus_weight_nr(cpu_online_map)
#define num_possible_cpus() cpus_weight_nr(cpu_possible_map)
#define num_present_cpus() cpus_weight_nr(cpu_present_map)
#define for_each_possible_cpu(cpu) for_each_cpu_mask_nr((cpu), ...)
#define for_each_online_cpu(cpu) for_each_cpu_mask_nr((cpu), ...)
#define for_each_present_cpu(cpu) for_each_cpu_mask_nr((cpu), ...)
* Comment added to include/linux/cpumask.h:
Note: The alternate operations with the suffix "_nr" are used
to limit the range of the loop to nr_cpu_ids instead of
NR_CPUS when NR_CPUS > 64 for performance reasons.
If NR_CPUS is <= 64 then most assembler bitmask
operators execute faster with a constant range, so
the operator will continue to use NR_CPUS.
Another consideration is that nr_cpu_ids is initialized
to NR_CPUS and isn't lowered until the possible cpus are
discovered (including any disabled cpus). So early uses
will span the entire range of NR_CPUS.
(The net effect is that for systems with 64 or less CPU's there are no
functional changes.)
For inclusion into sched-devel/latest tree.
Based on:
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
+ sched-devel/latest .../mingo/linux-2.6-sched-devel.git
Cc: Paul Jackson <pj@sgi.com>
Cc: Christoph Lameter <clameter@sgi.com>
Reviewed-by: Paul Jackson <pj@sgi.com>
Reviewed-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Move rcu-protected lists from list.h into a new header file rculist.h.
This is done because list are a very used primitive structure all over the
kernel and it's currently impossible to include other header files in this
list.h without creating some circular dependencies.
For example, list.h implements rcu-protected list and uses rcu_dereference()
without including rcupdate.h. It actually compiles because users of
rcu_dereference() are macros. Others RCU functions could be used too but
aren't probably because of this.
Therefore this patch creates rculist.h which includes rcupdates without to
many changes/troubles.
Signed-off-by: Franck Bui-Huu <fbuihuu@gmail.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Josh Triplett <josh@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
lib/lmb.c: In function 'lmb_dump_all':
lib/lmb.c:51: warning: format '%lx' expects type 'long unsigned int', but argument 2 has type 'u64'
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
* 'for-linus' of ssh://master.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
9p: fix error path during early mount
9p: make cryptic unknown error from server less scary
9p: fix flags length in net
9p: Correct fidpool creation failure in p9_client_create
9p: use struct mutex instead of struct semaphore
9p: propagate parse_option changes to client and transports
fs/9p/v9fs.c (v9fs_parse_options): Handle kstrdup and match_strdup failure.
9p: Documentation updates
add match_strlcpy() us it to make v9fs make uname and remotename parsing more robust
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6:
sparc64: Use a TS_RESTORE_SIGMASK
lmb: Make lmb debugging more useful.
lmb: Fix inconsistent alignment of size argument.
sparc: Fix mremap address range validation.
Add a common hex array in hexdump.c so everyone can use it.
Add a common hi/lo helper to avoid the shifting masking that is
done to get the upper and lower nibbles of a byte value.
Pull the pack_hex_byte helper from kgdb as it is opencoded many
places in the tree that will be consolidated.
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Acked-by: Paul Mundt <lethal@linux-sh.org>
Cc: Jason Wessel <jason.wessel@windriver.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
match_strcpy() is a somewhat creepy function: the caller needs to make sure
that the destination buffer is big enough, and when he screws up or
forgets, match_strcpy() happily overruns the buffer.
There's exactly one customer: v9fs_parse_options(). I believe it currently
can't overflow its buffer, but that's not exactly obvious.
The source string is a substing of the mount options. The kernel silently
truncates those to PAGE_SIZE bytes, including the terminating zero. See
compat_sys_mount() and do_mount().
The destination buffer is obtained from __getname(), which allocates from
name_cachep, which is initialized by vfs_caches_init() for size PATH_MAX.
We're safe as long as PATH_MAX <= PAGE_SIZE. PATH_MAX is 4096. As far as
I know, the smallest PAGE_SIZE is also 4096.
Here's a patch that makes the code a bit more obviously correct. It
doesn't depend on PATH_MAX <= PAGE_SIZE.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Cc: Latchesar Ionkov <lucho@ionkov.net>
Cc: Jim Meyering <meyering@redhat.com>
Cc: "Randy.Dunlap" <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
They aren't used. They were briefly used as part of some other patches to
provide an alternative format for displaying some /proc and /sys cpumasks.
They probably should have been removed when those other patches were dropped,
in favor of a different solution.
Signed-off-by: Paul Jackson <pj@sgi.com>
Cc: "Mike Travis" <travis@sgi.com>
Cc: "Bert Wesarg" <bert.wesarg@googlemail.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: WANG Cong <xiyou.wangcong@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Having to muck with the build and set DEBUG just to
get lmb_dump_all() to print things isn't very useful.
So use pr_info() and use an early boot param
"lmb=debug" so we can simply ask users to reboot
with this option when we need some debugging from
them.
Signed-off-by: David S. Miller <davem@davemloft.net>
When allocating, if we will align up the size when making
the reservation, we should also align the size for the
check that the space is actually available.
The simplest thing is to just aling the size up from
the beginning, then we can use plain 'size' throughout.
Signed-off-by: David S. Miller <davem@davemloft.net>
The generic semaphore rewrite had a huge performance regression on AIM7
(and potentially other BKL-heavy benchmarks) because the generic
semaphores had been rewritten to be simple to understand and fair. The
latter, in particular, turns a semaphore-based BKL implementation into a
mess of scheduling.
The attempt to fix the performance regression failed miserably (see the
previous commit 00b41ec261 'Revert
"semaphore: fix"'), and so for now the simple and sane approach is to
instead just go back to the old spinlock-based BKL implementation that
never had any issues like this.
This patch also has the advantage of being reported to fix the
regression completely according to Yanmin Zhang, unlike the semaphore
hack which still left a couple percentage point regression.
As a spinlock, the BKL obviously has the potential to be a latency
issue, but it's not really any different from any other spinlock in that
respect. We do want to get rid of the BKL asap, but that has been the
plan for several years.
These days, the biggest users are in the tty layer (open/release in
particular) and Alan holds out some hope:
"tty release is probably a few months away from getting cured - I'm
afraid it will almost certainly be the very last user of the BKL in
tty to get fixed as it depends on everything else being sanely locked."
so while we're not there yet, we do have a plan of action.
Tested-by: Yanmin Zhang <yanmin_zhang@linux.intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Matthew Wilcox <matthew@wil.cx>
Cc: Alexander Viro <viro@ftp.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We provide an ioremap_flags, so this provides a corresponding
devm_ioremap_prot. The slight name difference is at Ben
Herrenschmidt's request as he plans on changing ioremap_flags to
ioremap_prot in the future.
Signed-off-by: Emil Medve <Emilian.Medve@Freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Acked-by: Tejun Heo <htejun@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The return inside the loop makes us free only a single layer.
Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net>
Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Jim Houston <jim.houston@comcast.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add a new sysfs_streq() string comparison function, which ignores
the trailing newlines found in sysfs inputs. By example:
sysfs_streq("a", "b") ==> false
sysfs_streq("a", "a") ==> true
sysfs_streq("a", "a\n") ==> true
sysfs_streq("a\n", "a") ==> true
This is intended to simplify parsing of sysfs inputs, letting them
avoid the need to manually strip off newlines from inputs.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Acked-by: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Rename div64_64 to div64_u64 to make it consistent with the other divide
functions, so it clearly includes the type of the divide. Move its definition
to math64.h as currently no architecture overrides the generic implementation.
They can still override it of course, but the duplicated declarations are
avoided.
Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
Cc: Avi Kivity <avi@qumranet.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Patrick McHardy <kaber@trash.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The current do_div doesn't explicitly say that it's unsigned and the signed
counterpart is missing, which is e.g. needed when dealing with time values.
This introduces 64bit signed/unsigned divide functions which also attempts to
cleanup the somewhat awkward calling API, which often requires the use of
temporary variables for the dividend. To avoid the need for temporary
variables everywhere for the remainder, each divide variant also provides a
version which doesn't return the remainder.
Each architecture can now provide optimized versions of these function,
otherwise generic fallback implementations will be used.
As an example I provided an alternative for the current x86 divide, which
avoids the asm casts and using an union allows gcc to generate better code.
It also avoids the upper divde in a few more cases, where the result is known
(i.e. upper quotient is zero).
Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Use a resource_size_t instead of unsigned long since some arch's are
capable of having ioremap deal with addresses greater than the size of a
unsigned long.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Cc: Tejun Heo <htejun@gmail.com>
Cc: Jeff Garzik <jgarzik@pobox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Add klist_add_after() and klist_add_before() which puts a new node
after and before an existing node, respectively. This is useful for
callers which need to keep klist ordered. Note that synchronizing
between simultaneous additions for ordering is the caller's
responsibility.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
__FUNCTION__ is gcc specific, use __func__
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add calls to the generic object debugging infrastructure and provide fixup
functions which allow to keep the system alive when recoverable problems have
been detected by the object debugging core code.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Greg KH <greg@kroah.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We can see an ever repeating problem pattern with objects of any kind in the
kernel:
1) freeing of active objects
2) reinitialization of active objects
Both problems can be hard to debug because the crash happens at a point where
we have no chance to decode the root cause anymore. One problem spot are
kernel timers, where the detection of the problem often happens in interrupt
context and usually causes the machine to panic.
While working on a timer related bug report I had to hack specialized code
into the timer subsystem to get a reasonable hint for the root cause. This
debug hack was fine for temporary use, but far from a mergeable solution due
to the intrusiveness into the timer code.
The code further lacked the ability to detect and report the root cause
instantly and keep the system operational.
Keeping the system operational is important to get hold of the debug
information without special debugging aids like serial consoles and special
knowledge of the bug reporter.
The problems described above are not restricted to timers, but timers tend to
expose it usually in a full system crash. Other objects are less explosive,
but the symptoms caused by such mistakes can be even harder to debug.
Instead of creating specialized debugging code for the timer subsystem a
generic infrastructure is created which allows developers to verify their code
and provides an easy to enable debug facility for users in case of trouble.
The debugobjects core code keeps track of operations on static and dynamic
objects by inserting them into a hashed list and sanity checking them on
object operations and provides additional checks whenever kernel memory is
freed.
The tracked object operations are:
- initializing an object
- adding an object to a subsystem list
- deleting an object from a subsystem list
Each operation is sanity checked before the operation is executed and the
subsystem specific code can provide a fixup function which allows to prevent
the damage of the operation. When the sanity check triggers a warning message
and a stack trace is printed.
The list of operations can be extended if the need arises. For now it's
limited to the requirements of the first user (timers).
The core code enqueues the objects into hash buckets. The hash index is
generated from the address of the object to simplify the lookup for the check
on kfree/vfree. Each bucket has it's own spinlock to avoid contention on a
global lock.
The debug code can be compiled in without being active. The runtime overhead
is minimal and could be optimized by asm alternatives. A kernel command line
option enables the debugging code.
Thanks to Ingo Molnar for review, suggestions and cleanup patches.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Greg KH <greg@kroah.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add "max_ratio" to /sys/class/bdi. This indicates the maximum percentage of
the global dirty threshold allocated to this bdi.
[mszeredi@suse.cz]
- fix parsing in max_ratio_store().
- export bdi_set_max_ratio() to modules
- limit bdi_dirty with bdi->max_ratio
- document new sysfs attribute
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Provide a place in sysfs (/sys/class/bdi) for the backing_dev_info object.
This allows us to see and set the various BDI specific variables.
In particular this properly exposes the read-ahead window for all relevant
users and /sys/block/<block>/queue/read_ahead_kb should be deprecated.
With patient help from Kay Sievers and Greg KH
[mszeredi@suse.cz]
- split off NFS and FUSE changes into separate patches
- document new sysfs attributes under Documentation/ABI
- do bdi_class_init as a core_initcall, otherwise the "default" BDI
won't be initialized
- remove bdi_init_fmt macro, it's not used very much
[akpm@linux-foundation.org: fix ia64 warning]
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Acked-by: Greg KH <greg@kroah.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc:
[RAPIDIO] Change RapidIO doorbell source and target ID field to 16-bit
[RAPIDIO] Add RapidIO connection info print out and re-training for broken connections
[RAPIDIO] Add serial RapidIO controller support, which includes MPC8548, MPC8641
[RAPIDIO] Add RapidIO node probing into MPC86xx_HPCN board id table
[RAPIDIO] Add RapidIO node into MPC8641HPCN dts file
[RAPIDIO] Auto-probe the RapidIO system size
[RAPIDIO] Add OF-tree support to RapidIO controller driver
[RAPIDIO] Add RapidIO multi mport support
[RAPIDIO] Move include/asm-ppc/rio.h to asm-powerpc
[RAPIDIO] Add RapidIO option to kernel configuration
[RAPIDIO] Change RIO function mpc85xx_ to fsl_
[POWERPC] Provide walk_memory_resource() for powerpc
[POWERPC] Update lmb data structures for hotplug memory add/remove
[POWERPC] Hotplug memory remove notifications for powerpc
[POWERPC] windfarm: Add PowerMac 12,1 support
[POWERPC] Fix building of pmac32 when CONFIG_NVRAM=m
[POWERPC] Add IRQSTACKS support on ppc32
[POWERPC] Use __always_inline for xchg* and cmpxchg*
[POWERPC] Add fast little-endian switch system call
The mapsize optimizations which were moved from x86 to the generic
code in commit 64970b68d2 increased the
binary size on non x86 architectures.
Looking into the real effects of the "optimizations" it turned out
that they are not used in find_next_bit() and find_next_zero_bit().
The ones in find_first_bit() and find_first_zero_bit() are used in a
couple of places but none of them is a real hot path.
Remove the "optimizations" all together and call the library functions
unconditionally.
Boot-tested on x86 and compile tested on every cross compiler I have.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Avoid a possible kmem_cache_create() failure by creating idr_layer_cache
unconditionary at boot time rather than creating it on-demand when idr_init()
is called the first time.
This change also enables us to eliminate the check every time idr_init() is
called.
[akpm@linux-foundation.org: rename init_id_cache() to idr_init_cache()]
[akpm@linux-foundation.org: fix alpha build]
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change all ia64 machvecs to use the new dma_*map*_attrs() interfaces.
Implement the old dma_*map_*() interfaces in terms of the corresponding new
interfaces. For ia64/sn, make use of one dma attribute,
DMA_ATTR_WRITE_BARRIER. Introduce swiotlb_*map*_attrs() functions.
Signed-off-by: Arthur Kepner <akepner@sgi.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: Jes Sorensen <jes@sgi.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Roland Dreier <rdreier@cisco.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: David Miller <davem@davemloft.net>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Grant Grundler <grundler@parisc-linux.org>
Cc: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Due to the rcupreempt.h WARN_ON trigged, I got 2G syslog file. For some
serious complaining of kernel, we need repeat the warnings, so here I isolate
the ratelimit part of printk.c to a standalone file.
Signed-off-by: Dave Young <hidave.darkstar@gmail.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
iommu_is_span_boundary in lib/iommu-helper.c was exported for PARISC IOMMUs
(commit 3715863aa1). SWIOTLB can use it instead
of the homegrown function.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There's a pointlessly braced block of code in there. Remove the braces and
save a tabstop.
Cc: Andi Kleen <ak@suse.de>
Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Jan Beulich <jbeulich@novell.com>
Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Almost all implementations of pci_iomap() in the kernel, including the generic
lib/iomap.c one, copies the content of a struct resource into unsigned long's
which will break on 32 bits platforms with 64 bits resources.
This fixes all definitions of pci_iomap() to use resource_size_t. I also
"fixed" the 64bits arch for consistency.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Provide walk_memory_resource() for 64-bit powerpc. PowerPC maintains
logical memory region mapping in the lmb.memory structure. Walk
through these structures and do the callbacks for the contiguous
chunks.
Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The powerpc kernel maintains information about logical memory blocks
in the lmb.memory structure, which is initialized and updated at boot
time, but not when memory is added or removed while the kernel is
running.
This adds a hotplug memory notifier which updates lmb.memory when
memory is added or removed. This information is useful for eHEA
driver to find out the memory layout and holes.
NOTE: No special locking is needed for lmb_add() and lmb_remove().
Calls to these are serialized by caller. (pSeries_reconfig_chain).
Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The following adds two more bitmap operators, bitmap_onto() and bitmap_fold(),
with the usual cpumask and nodemask wrappers.
The bitmap_onto() operator computes one bitmap relative to another. If the
n-th bit in the origin mask is set, then the m-th bit of the destination mask
will be set, where m is the position of the n-th set bit in the relative mask.
The bitmap_fold() operator folds a bitmap into a second that has bit m set iff
the input bitmap has some bit n set, where m == n mod sz, for the specified sz
value.
There are two substantive changes between this patch and its
predecessor bitmap_relative:
1) Renamed bitmap_relative() to be bitmap_onto().
2) Added bitmap_fold().
The essential motivation for bitmap_onto() is to provide a mechanism for
converting a cpuset-relative CPU or Node mask to an absolute mask. Cpuset
relative masks are written as if the current task were in a cpuset whose CPUs
or Nodes were just the consecutive ones numbered 0..N-1, for some N. The
bitmap_onto() operator is provided in anticipation of adding support for the
first such cpuset relative mask, by the mbind() and set_mempolicy() system
calls, using a planned flag of MPOL_F_RELATIVE_NODES. These bitmap operators
(and their nodemask wrappers, in particular) will be used in code that
converts the user specified cpuset relative memory policy to a specific system
node numbered policy, given the current mems_allowed of the tasks cpuset.
Such cpuset relative mempolicies will address two deficiencies
of the existing interface between cpusets and mempolicies:
1) A task cannot at present reliably establish a cpuset
relative mempolicy because there is an essential race
condition, in that the tasks cpuset may be changed in
between the time the task can query its cpuset placement,
and the time the task can issue the applicable mbind or
set_memplicy system call.
2) A task cannot at present establish what cpuset relative
mempolicy it would like to have, if it is in a smaller
cpuset than it might have mempolicy preferences for,
because the existing interface only allows specifying
mempolicies for nodes currently allowed by the cpuset.
Cpuset relative mempolicies are useful for tasks that don't distinguish
particularly between one CPU or Node and another, but only between how many of
each are allowed, and the proper placement of threads and memory pages on the
various CPUs and Nodes available.
The motivation for the added bitmap_fold() can be seen in the following
example.
Let's say an application has specified some mempolicies that presume 16 memory
nodes, including say a mempolicy that specified MPOL_F_RELATIVE_NODES (cpuset
relative) nodes 12-15. Then lets say that application is crammed into a
cpuset that only has 8 memory nodes, 0-7. If one just uses bitmap_onto(),
this mempolicy, mapped to that cpuset, would ignore the requested relative
nodes above 7, leaving it empty of nodes. That's not good; better to fold the
higher nodes down, so that some nodes are included in the resulting mapped
mempolicy. In this case, the mempolicy nodes 12-15 are taken modulo 8 (the
weight of the mems_allowed of the confining cpuset), resulting in a mempolicy
specifying nodes 4-7.
Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: <kosaki.motohiro@jp.fujitsu.com>
Cc: <ray-lk@madrabbit.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Migrate flags must be set on slab creation as agreed upon when the antifrag
logic was reviewed. Otherwise some slabs of a slabcache will end up in the
unmovable and others in the reclaimable section depending on which flag was
active when a new slab page was allocated.
This likely slid in somehow when antifrag was merged. Remove it.
The buffer_heads are always allocated with __GFP_RECLAIMABLE because the
SLAB_RECLAIM_ACCOUNT option is set. The set_migrateflags() never had any
effect there.
Radix tree allocations are not directly reclaimable but they are allocated
with __GFP_RECLAIMABLE set on each allocation. We now set
SLAB_RECLAIM_ACCOUNT on radix tree slab creation making sure that radix
tree slabs are consistently placed in the reclaimable section. Radix tree
slabs will also be accounted as such.
There is then no user left of set_migratepages. So remove it.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Introduce GENERIC_FIND_FIRST_BIT and GENERIC_FIND_NEXT_BIT in
lib/Kconfig, defaulting to off. An arch that wants to use the
generic implementation now only has to use a select statement
to include them.
I added an always-y option (X86_CPU) to arch/x86/Kconfig.cpu
and used that to select the generic search functions. This
way ARCH=um SUBARCH=i386 automatically picks up the change
too, and arch/um/Kconfig.i386 can therefore be simplified a
bit. ARCH=um SUBARCH=x86_64 does things differently, but
still compiles fine. It seems that a "def_bool y" always
wins over a "def_bool n"?
Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Generic versions of __find_first_bit and __find_first_zero_bit
are introduced as simplified versions of __find_next_bit and
__find_next_zero_bit. Their compilation and use are guarded by
a new config variable GENERIC_FIND_FIRST_BIT.
The generic versions of find_first_bit and find_first_zero_bit
are implemented in terms of the newly introduced __find_first_bit
and __find_first_zero_bit.
This patch does not remove the i386-specific implementation,
but it does switch i386 to use the generic functions by setting
GENERIC_FIND_FIRST_BIT=y for X86_32.
Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This moves an optimization for searching constant-sized small
bitmaps form x86_64-specific to generic code.
On an i386 defconfig (the x86#testing one), the size of vmlinux hardly
changes with this applied. I have observed only four places where this
optimization avoids a call into find_next_bit:
In the functions return_unused_surplus_pages, alloc_fresh_huge_page,
and adjust_pool_surplus, this patch avoids a call for a 1-bit bitmap.
In __next_cpu a call is avoided for a 32-bit bitmap. That's it.
On x86_64, 52 locations are optimized with a minimal increase in
code size:
Current #testing defconfig:
146 x bsf, 27 x find_next_*bit
text data bss dec hex filename
5392637 846592 724424 6963653 6a41c5 vmlinux
After removing the x86_64 specific optimization for find_next_*bit:
94 x bsf, 79 x find_next_*bit
text data bss dec hex filename
5392358 846592 724424 6963374 6a40ae vmlinux
After this patch (making the optimization generic):
146 x bsf, 27 x find_next_*bit
text data bss dec hex filename
5392396 846592 724424 6963412 6a40d4 vmlinux
[ tglx@linutronix.de: build fixes ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The versions with inline assembly are in fact slower on the machines I
tested them on (in userspace) (Athlon XP 2800+, p4-like Xeon 2.8GHz, AMD
Opteron 270). The i386-version needed a fix similar to 06024f21 to avoid
crashing the benchmark.
Benchmark using: gcc -fomit-frame-pointer -Os. For each bitmap size
1...512, for each possible bitmap with one bit set, for each possible
offset: find the position of the first bit starting at offset. If you
follow ;). Times include setup of the bitmap and checking of the
results.
Athlon Xeon Opteron 32/64bit
x86-specific: 0m3.692s 0m2.820s 0m3.196s / 0m2.480s
generic: 0m2.622s 0m1.662s 0m2.100s / 0m1.572s
If the bitmap size is not a multiple of BITS_PER_LONG, and no set
(cleared) bit is found, find_next_bit (find_next_zero_bit) returns a
value outside of the range [0, size]. The generic version always returns
exactly size. The generic version also uses unsigned long everywhere,
while the x86 versions use a mishmash of int, unsigned (int), long and
unsigned long.
Using the generic version does give a slightly bigger kernel, though.
defconfig: text data bss dec hex filename
x86-specific: 4738555 481232 626688 5846475 5935cb vmlinux (32 bit)
generic: 4738621 481232 626688 5846541 59360d vmlinux (32 bit)
x86-specific: 5392395 846568 724424 6963387 6a40bb vmlinux (64 bit)
generic: 5392458 846568 724424 6963450 6a40fa vmlinux (64 bit)
Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Add option to enable -Wframe-larger-than= on gcc 4.4
gcc mainline (upcoming 4.4) added a new -Wframe-larger-than=...
option to warn at build time about too large stack frames. Add a config
option to enable this warning, since this very useful for the kernel.
I choose (somewhat arbitarily) 2048 as default warning threshold for 64bit
and 1024 as default for 32bit architectures. With some research and
fixing all the code for smaller values these defaults should be probably
lowered.
With the default allyesconfigs have some new warnings, but I think
that is all code that should be just fixed.
At some point (when gcc 4.4 is released and widely used) this should
obsolete make checkstack
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Changeset d9024df02f ("[LMB] Restructure
allocation loops to avoid unsigned underflow") removed the alignment
of the 'size' argument to call lmb_add_region() done by __lmb_alloc_base().
In doing so it reintroduced the bug fixed by changeset
eea89e13a9 ("[LMB]: Fix bug in
__lmb_alloc_base().").
This puts it back.
Signed-off-by: David S. Miller <davem@davemloft.net>
* 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc: (202 commits)
[POWERPC] Fix compile breakage for 64-bit UP configs
[POWERPC] Define copy_siginfo_from_user32
[POWERPC] Add compat handler for PTRACE_GETSIGINFO
[POWERPC] i2c: Fix build breakage introduced by OF helpers
[POWERPC] Optimize fls64() on 64-bit processors
[POWERPC] irqtrace support for 64-bit powerpc
[POWERPC] Stacktrace support for lockdep
[POWERPC] Move stackframe definitions to common header
[POWERPC] Fix device-tree locking vs. interrupts
[POWERPC] Make pci_bus_to_host()'s struct pci_bus * argument const
[POWERPC] Remove unused __max_memory variable
[POWERPC] Simplify xics direct/lpar irq_host setup
[POWERPC] Use pseries_setup_i8259_cascade() in pseries_mpic_init_IRQ()
[POWERPC] Turn xics_setup_8259_cascade() into a generic pseries_setup_i8259_cascade()
[POWERPC] Move xics_setup_8259_cascade() into platforms/pseries/setup.c
[POWERPC] Use asm-generic/bitops/find.h in bitops.h
[POWERPC] 83xx: mpc8315 - fix USB UTMI Host setup
[POWERPC] 85xx: Fix the size of qe muram for MPC8568E
[POWERPC] 86xx: mpc86xx_hpcn - Temporarily accept old dts node identifier.
[POWERPC] 86xx: mark functions static, other minor cleanups
...