Commit Graph

3972 Commits

Author SHA1 Message Date
Ingo Molnar
3640543df2 [PATCH] netpoll: fix netpoll lockup
current -git doesnt boot on my laptop due to netpoll not unlocking the
tx lock in the else branch.

booted this up on my laptop with lockdep enabled and there are no
locking complaints and it works fine.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-12 08:37:51 -08:00
Andrew Morton
a49f99ffca [NETPOLL]: Fix local_bh_enable() warning.
During boot we get:

netconsole: device eth0 not up yet, forcing it
e1000: eth0: e1000_watchdog: NIC Link is Up 100 Mbps Full Duplex
WARNING (!__warned) at kernel/softirq.c:137 local_bh_enable()

Call Trace:
 [<ffffffff80235baf>] local_bh_enable+0x41/0xa3
 [<ffffffff8045ab8e>] netpoll_send_skb+0x116/0x144
 [<ffffffff8045b1ee>] netpoll_send_udp+0x263/0x271
 [<ffffffff803d41ec>] write_msg+0x42/0x5e
 [<ffffffff80230c9b>] __call_console_drivers+0x5f/0x70
 [<ffffffff80230d19>] _call_console_drivers+0x6d/0x71
 [<ffffffff802313f0>] release_console_sem+0x148/0x1ec
 [<ffffffff802316ce>] register_console+0x1b1/0x1ba
 [<ffffffff803d4178>] init_netconsole+0x54/0x68
 [<ffffffff802071ae>] init+0x152/0x308
 [<ffffffff804dac8b>] _spin_unlock_irq+0x14/0x30
 [<ffffffff8022c15e>] schedule_tail+0x43/0x9f
 [<ffffffff8020a758>] child_rip+0xa/0x12

Herbert sayeth:

  Normally networking isn't invoked with interrupts turned off, but I
  suppose we don't have a choice here.  This is unique being a place where you
  can get called with BH on, off, or IRQs off.

  Given that this is only used for printk, the easiest solution is probably
  just to disable local IRQs instead of BH.

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-11 17:24:46 -08:00
Simon Horman
37004af3aa [IPVS]: Make ip_vs_sync.c <= 80col wide.
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-11 14:35:03 -08:00
Simon Horman
89eaeb09ba [IPVS]: Use msleep_interruptable() instead of ssleep() aka msleep()
Dean Manners notices that when an IPVS synchonisation daemons are
started the system load slowly climbs up to 1. This seems to be related
to the call to ssleep(1) (aka msleep(1000) in the main loop. Replacing
this with a call to msleep_interruptable() seems to make the problem go
away. Though I'm not sure that it is correct.

This is the second edition of this patch, which replaces ssleep()
in the main loop for both the master and backup threads, as well
as some thread synchronisation code. The latter is just for thorougness
as it shouldn't be causing any problems.

Signed-Off-By: Simon Horman <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-11 14:35:02 -08:00
Ralf Baechle
f654c854d1 [HAMRADIO]: Fix baycom_epp.c compile failure.
Fix foobar in 15b1c0e822 and
e8cc49bb0f patch series.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-11 14:35:01 -08:00
Arnaldo Carvalho de Melo
8109b02b53 [DCCP]: Whitespace cleanups
That accumulated over the last months hackaton, shame on me for not
using git-apply whitespace helping hand, will do that from now on.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-12-11 14:35:00 -08:00
Arnaldo Carvalho de Melo
1fba78b6cb [DCCP] ccid3: Fixup some type conversions related to rtts
Spotted by David Miller when compiling on sparc64, I reproduced it here on
parisc64, that are the only platforms to define __kernel_suseconds_t as an
'int', all the others, x86_64 and x86 included typedef it as a 'long', but from
the definition of suseconds_t it should just be an 'int' on platforms where it
is >= 32bits, it would not require all the castings from suseconds_t to (int)
when printking variables of this type, that are not needed on parisc64 and
sparc64.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-12-11 14:34:59 -08:00
Gerrit Renker
9e8efc8240 [DCCP] ccid3: BUG-FIX - conversion errors
This fixes conversion errors which arose by not properly type-casting
from u32 to __u64. Fixed by explicitly casting each type which is not
__u64, or by performing operation after assignment.

The patch further adds missing debug information to track the current
value of X_recv.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-12-11 14:34:58 -08:00
Gerrit Renker
7af5af3013 [DCCP] ccid3: Reorder packet history source file
No code change at all.

This reorders the source file to follow the same order as the corresponding
header file.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-12-11 14:34:57 -08:00
Gerrit Renker
85dcb1f780 [DCCP] ccid3: Reorder packet history header file
No code change at all.

To make the header file easier to read, the following ordering is established
among the declarations:
	* hist_new
	* hist_delete
	* hist_entry_new
	* hist_head
	* hist_find_entry
	* hist_add_entry
	* hist_entry_delete
	* hist_purge

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-12-11 14:34:56 -08:00
Gerrit Renker
a967241129 [DCCP] ccid3: Make debug output consistent
This patch does not alter any algorithm, just the debug message format:

 * s#%s, sk=%p#%s(%p)#g

 * when a statename is present, it now uses %s(%p, state=%s)

 * when only function entry is debugged, it adds an `- entry'

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-12-11 14:34:55 -08:00
Gerrit Renker
c5a1ae9a4c [DCCP] ccid3: Perform history operations only after packet has been sent
This migrates all packet history operations into the routine
 ccid3_hc_tx_packet_sent, thereby removing synchronization problems
 that occur when, as before, the operations are spread over multiple
 routines.
 The following minor simplifications are also applied:
  * several simplifications now follow from this change - several tests
    are now no longer required
  * removal of one unnecessary variable (dp)

Justification:

 Currently packet history operations span two different routines,
 one of which is likely to pass through several iterations of sleeping
 and awakening.
 The first routine, ccid3_hc_tx_send_packet, allocates an entry and
 sets a few fields. The remaining fields are filled in when the second
 routine (which is not within a sleeping context), ccid3_hc_tx_packet_sent,
 is called. This has several strong drawbacks:
  * it is not necessary to split history operations - all fields can be
    filled in by the second routine
  * the first routine is called multiple times, until a packet can be sent,
    and sleeps meanwhile - this causes a lot of difficulties with regard to
    keeping the list consistent
  * since both routines do not have a producer-consumer like synchronization,
    it is very difficult to maintain data across calls to these routines
  * the fact that the routines are called in different contexts (sleeping, not
    sleeping) adds further problems

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-12-11 14:34:54 -08:00
Gerrit Renker
e312d100f1 [DCCP] ccid3: TX history - remove unused field
This removes the `dccphtx_ccval' field since it is nowhere used in the code and
in fact not necessary for the accounting.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-12-11 14:34:53 -08:00
Gerrit Renker
9f8681db96 [DCCP] ccid3: Shift window counter computation
This puts the window counter computation [RFC 4342, 8.1] into a separate
 function which is called whenever a new packet is ready for immediate
 transmission in ccid3_hc_tx_send_packet.

Justification:

 The window counter update was previously computed after the packet was sent. This has
 two drawbacks, both fixed by this patch:
   1) re-compute another timestamp almost directly after the packet was sent (expensive),
   2) the CCVal for the window counter is needed at the instant the packet is sent.

Further details:

 The initialisation of the window counter is left in the state NO_SENT, as before.
 The algorithm will do nothing if either RTT is initialised to 0 (which is ok) or if
 the RTT value remains below 4 microseconds (which is almost pathological).

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-12-11 14:34:52 -08:00
Gerrit Renker
de553c189e [DCCP] ccid3: Sanity-check RTT samples
CCID3 performance depends much on the accuracy of RTT samples.  If RTT
samples grow too large, performance can be catastrophically poor.

To limit the amount of possible damage in such cases, the patch
 * introduces an upper limit which identifies a maximum `sane' RTT value;
 * uses a macro to enforce this upper limit.

Using a macro was given preference, since it is necessary to identify the
calling function in the warning message. Since exceeding this threshold
identifies a critical condition, DCCP_CRIT is used and not DCCP_WARN.

Many thanks to Ian McDonald for collaboration on this issue.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-12-11 14:34:51 -08:00
Gerrit Renker
fe0499ae95 [DCCP] ccid3: Initialise RTT values
In both the sender and the receiver it is possible that the stored
RTT value is accessed before an actual RTT estimate has been computed.

This patch
 * initialises the sender RTT to 0
     - the sender always accesses the RTT in ccid3_hc_tx_packet_sent
     - the RTT is further needed for the window counter algorithm

 * replaces the receiver initialisation of 5msec with 0
     - which has the same effect and removes an `XXX'
     - the RTT value is needed in ccid3_hc_rx_packet_recv as rtt_prev

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-12-11 14:34:50 -08:00
Gerrit Renker
65d6c2b42e [DCCP] ccid: Deprecate ccid_hc_tx_insert_options
The function ccid3_hc_tx_insert_options only does a redundant no-op,
 as the operation

  DCCP_SKB_CB(skb)->dccpd_ccval = hctx->ccid3hctx_last_win_count;

 is already performed _unconditionally_ in ccid3_hc_tx_send_packet.

 Since there is further no current need for this function, it is removed
 entirely. Since furthermore, there is actually no present need for the
 entire interface function ccid_hc_tx_insert_options, it was decided to
 remove it also, to clean up the interface.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-12-11 14:34:49 -08:00
Gerrit Renker
f6282f4da5 [DCCP]: Warn when discarding packet due to internal errors
This adds a (debug) warning message which is triggered whenever a packet is
discarded due to send failure.

It also adds a conditional, so that an interruption during dccp_wait_for_ccid
is not treated as a `BUG': the rationale is that interruptions are external,
whereas bug warnings are concerned with the internals.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-12-11 14:34:48 -08:00
Gerrit Renker
bf58a381e8 [DCCP]: Only deliver to the CCID rx side in charge
This is an optimisation to reduce CPU load. The received feedback is now
only directed to the active CCID component, without requiring processing
also by the inactive one.

As a consequence, a similar test in ccid3.c is now redundant and is
also removed.

Justification:

 Currently DCCP works as a unidirectional service, i.e. a listening server
 is not at the same time a connecting client.
 As far as I can see, several modifications are necessary until that
 becomes possible.
 At the present time, received feedback is both fed to the rx/tx CCID
 modules. In unidirectional service, only one of these is active at any
 one time.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-12-11 14:34:47 -08:00
Gerrit Renker
d63d8364cf [DCCP]: Simplify TFRC calculation
In migrating towards using the newer functions scaled_div/scaled_div32
for TFRC computations mapped from floating-point onto integer arithmetic,
this completes the last stage of modifications.

In particular, the overflow case for computing X_calc is circumvented by
 * breaking the computation into two stages
 * the first stage, res = (s*1E6)/R, cannot overflow due to use of u64
 * in the second stage, res = (res*1E6)/f, overflow on u32 is avoided due
   to (i) returning UINT_MAX in this case (which is logically appropriate)
   and (ii) issuing a warning message into the system log (since very likely
   there is a problem somewhere else with the parameters)

Lastly, all such scaling operations are now exported into tfrc.h, since
actually this form of scaled computation is specific to TFRC and not to CCID3.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-12-11 14:34:46 -08:00
Gerrit Renker
0f9e5b573f [DCCP]: Debug timeval operations
Problem:

 Most target types in the CCID3 code are u32, so subtle conversion errors
 can occur if signed time calculations yield negative results: the original
 values are lost in the conversion to unsigned, calculation errors go undetected.

 This patch therefore
   * sets all critical time types from unsigned to suseconds_t
   * avoids comparison between signed/unsigned via type-casting
   * provides ample warning messages in case time calculations are negative

 These warning messages can be removed at a later stage when the code
 has undergone more testing.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-12-11 14:34:45 -08:00
Gerrit Renker
bfe24a6cc2 [DCCP] ccid3: Simplify calculation for reverse lookup of p
This simplifies the calculation of a value p for a given fval when the
 first loss interval is computed (RFC 3448, 6.3.1). It makes use of the
 two new functions scaled_div/scaled_div32 to provide overflow protection.

 Additionally, protection against divide-by-zero is extended - in this
 case the function will return the maximally possible value of p=100%.

Background:

 The maximum fval, f(100%), is approximately 244, i.e. the scaled value of fval
 should never exceed 244E6, which fits easily into u32. The problem is the scaling
 by 10^6, since additionally R(TT) is in microseconds.
 This is resolved by breaking the division into two stages: the first stage
 computes fval=(s*10^6)/R, stores that into u64; the second stage computes
 fval = (fval*10^6)/X_recv and complains if overflow is reached for u32.
 This case is safe since the TFRC reverse-lookup routine then returns p=100%.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-12-11 14:34:44 -08:00
Gerrit Renker
b9039a2a8d [DCCP] ccid3: Replace scaled division operations
This replaces the remaining uses of usecs_div with scaled_div32, which
internally uses 64bit division and produces a warning on overflow.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-12-11 14:34:43 -08:00
Gerrit Renker
1a21e49a8d [DCCP] ccid3: Finer-grained resolution of sending rates
This patch
 * resolves a bug where packets smaller than 32/64 bytes resulted in sending rates of 0
 * supports all sending rates from 1/64 bytes/second up to 4Gbyte/second
 * simplifies the present overflow problems in calculations

Current sending rate X and the cached value X_recv of the receiver-estimated
sending rate are both scaled by 64 (2^6) in order to
 * cope with low sending rates (minimally 1 byte/second)
 * allow upgrading to use a packets-per-second implementation of CCID 3
 * avoid calculation errors due to integer arithmetic cut-off

The patch implements a revised strategy from
http://www.mail-archive.com/dccp@vger.kernel.org/msg01040.html

The only difference with regard to that strategy is that t_ipi is already
used in the calculation of the nofeedback timeout, which saves one division.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-12-11 14:34:42 -08:00
Gerrit Renker
179ebc9f92 [DCCP] ccid3: Fix two bugs in sending rate computation
This fixes
 1) a bug in the recomputation of the sending rate by the nofeedback
    timer when no feedback at all has so far been sent by the receiver:
    min_t was used instead of max_t, which is wrong (cf. RFC 3448, p. 10);

 2) an error in the computation of larger initial windows: instead of
    min(... max()) (cf. RFC 4342, 5.), the code had used max(... max()).

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-12-11 14:34:41 -08:00
Gerrit Renker
ff58629824 [DCCP] ccid3: Two optimisations for sending rate recomputation
This performs two optimisations for the recomputation of the sending rate.

1) Currently the target sending rate X_calc is recalculated whenever
	a) the nofeedback timer expires, or
	b) a feedback packet is received.
   In the (a) case, recomputing X_calc is redundant, since

    * the parameters p and RTT do not change in between the
      reception of feedback packets;

    * the parameter X_recv is either modified from received
      feedback or via the nofeedback timer;

    * a test (`p == 0') in the nofeedback timer avoids using
      a stale/undefined value of X_calc if p was previously 0.

2) The nofeedback timer now only recomputes a timestamp when p == 0.
   This is according to step (4) of [RFC 3448, 4.3] and avoids
   unnecessarily determining a timestamp.

A debug statement about not updating X is also removed - it helps very
little in debugging and just clutters the logs.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-12-11 14:34:40 -08:00
Gerrit Renker
45393a66a2 [DCCP] ccid3: Check against too large p
This patch follows a suggestion by Ian McDonald and ensures that in
the current code the value of p can not exceed 100%.  Such a value is
illegal and would consequently cause a bug condition in tfrc_calc_x().

The receiver case is also tested, and a warning message is added.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-12-11 14:34:39 -08:00
Ian McDonald
5cc3741d6c [DCCP]: Remove timeo from output.c
It simplifies waiting for the CCID module to signal that a packet
is ready to be sent.  Other simplifications flow on from this such as
removing constants.

As a result of this EAGAIN is not returned any more by dccp_wait_for_ccid
(which would otherwise lead to unnecessarily discarding the packet in
dccp_write_xmit).

Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-12-11 14:34:37 -08:00
Andrew Morton
e37b8d9319 [NETPOLL]: Make sure TX lock is taken with BH disabled.
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-11 14:34:36 -08:00
Alexey Dobriyan
1f29bcd739 [PATCH] sysctl: remove unused "context" param
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andi Kleen <ak@suse.de>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: David Howells <dhowells@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 09:55:41 -08:00
David Howells
db9a758c38 [PATCH] workstruct: fix ieee80211-softmac compile problem
Fix ieee80211-softmac compile problem where it's using schedule_work() on a
delayed_work struct.

Signed-off-by: David Howells <dhowells@redhat.com>
Cc: "John W. Linville" <linville@tuxdriver.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 09:55:39 -08:00
Jarek Poplawski
160d5e10f8 [NET_SCHED] sch_htb: turn intermediate classes into leaves
- turn intermediate classes into leaves again when their
  last child is deleted (struct htb_class changed)

Signed-off-by: Jarek Poplawski <jarkao2@o2.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-08 17:19:32 -08:00
Jarek Poplawski
a37ef2e325 [NET_SCHED] sch_cbq: deactivating when grafting, purging etc.
- deactivating of active classes when q.qlen drops to zero
  (cbq_drop)

- a redundant instruction removed from cbq_deactivate_class

PS: probably htb_deactivate in htb_delete and
cbq_deactivate_class in cbq_delete are also
redundant now.

Signed-off-by: Jarek Poplawski <jarkao2@o2.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-08 17:19:31 -08:00
Neil Horman
47bbec0282 [NETPOLL]: make arp replies through netpoll use mac address of sender
Back in 2.4 arp requests that were recevied by netpoll were processed
in netconsole_receive_skb, where they were responded to using the src
mac of the request sender.  In the 2.6 kernel arp_reply is responsible
for this function, but instead of using the src mac address of the
incomming request, the stored mac address that was registered for the
netconsole application is used.  While this is usually ok, it can lead
to failures in netpoll in some situations (specifically situations
where a network may have two gateways, as arp requests from one may be
responded to using the mac address of the other).  This patch reverts
the behavior to what we had in 2.4, in which all arp requests are sent
back using the src address of the request sender.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Chris Lalancette <clalance@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-08 17:19:28 -08:00
Ralf Baechle
15b1c0e822 [AX.25]: Fix default address and broadcast address initialization.
Only the callsign but not the SSID part of an AX.25 address is ASCII
based but Linux by initializes the SSID which should be just a 4-bit
number from ASCII anyway.

Fix that and convert the code to use a shared constant for both default
addresses.  While at it, use the same style for null_ax25_address also.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-08 17:19:26 -08:00
Ralf Baechle
e8cc49bb0f [AX.25]: Constify ax25 utility functions
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-08 17:19:25 -08:00
Stephen Hemminger
3644f0cee7 [NET]: Convert hh_lock to seqlock.
The hard header cache is in the main output path, so using
seqlock instead of reader/writer lock should reduce overhead.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-08 17:19:20 -08:00
Alan Cox
606d099cdd [PATCH] tty: switch to ktermios
This is the grungy swap all the occurrences in the right places patch that
goes with the updates.  At this point we have the same functionality as
before (except that sgttyb() returns speeds not zero) and are ready to
begin turning new stuff on providing nobody reports lots of bugs

If you are a tty driver author converting an out of tree driver the only
impact should be termios->ktermios name changes for the speed/property
setting functions from your upper layers.

If you are implementing your own TCGETS function before then your driver
was broken already and its about to get a whole lot more painful for you so
please fix it 8)

Also fill in c_ispeed/ospeed on init for most devices, although the current
code will do this for you anyway but I'd like eventually to lose that extra
paranoia

[akpm@osdl.org: bluetooth fix]
[mp3@de.ibm.com: sclp fix]
[mp3@de.ibm.com: warning fix for tty3270]
[hugh@veritas.com: fix tty_ioctl powerpc build]
[jdike@addtoit.com: uml: fix ->set_termios declaration]
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Martin Peschke <mp3@de.ibm.com>
Acked-by: Peter Oberparleiter <oberpar@de.ibm.com>
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Jeff Dike <jdike@addtoit.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-08 08:28:57 -08:00
Alan Cox
edc6afc549 [PATCH] tty: switch to ktermios and new framework
This is the core of the switch to the new framework.  I've split it from the
driver patches which are mostly search/replace and would encourage people to
give this one a good hard stare.

The references to BOTHER and ISHIFT are the termios values that must be
defined by a platform once it wants to turn on "new style" ioctl support.  The
code patches here ensure that providing

1. The termios overlays the ktermios in memory
2. The only new kernel only fields are c_ispeed/c_ospeed (or none)

the existing behaviour is retained.  This is true for the patches at this
point in time.

Future patches will define BOTHER, ISHIFT and enable newer termios structures
for each architecture, and once they are all done some of the ifdefs also
vanish.

[akpm@osdl.org: warning fix]
[akpm@osdl.org: IRDA fix]
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-08 08:28:56 -08:00
Josef Sipek
592ccbf9fb [PATCH] struct path: convert unix
Signed-off-by: Josef Sipek <jsipek@fsl.cs.sunysb.edu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-08 08:28:50 -08:00
Josef Sipek
303b46bb77 [PATCH] struct path: convert sunrpc
Signed-off-by: Josef Sipek <jsipek@fsl.cs.sunysb.edu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-08 08:28:50 -08:00
Josef Sipek
6db5fc5d53 [PATCH] struct path: convert netlink
Signed-off-by: Josef Sipek <jsipek@fsl.cs.sunysb.edu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-08 08:28:48 -08:00
Josef Sipek
6df81ab227 [PATCH] struct path: convert netfilter
Signed-off-by: Josef Sipek <jsipek@fsl.cs.sunysb.edu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-08 08:28:48 -08:00
Josef Sipek
3126a42c4d [PATCH] struct path: convert net
Signed-off-by: Josef Sipek <jsipek@fsl.cs.sunysb.edu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-08 08:28:48 -08:00
Josef Sipek
76a0f17429 [PATCH] struct path: convert atm
Signed-off-by: Josef Sipek <jsipek@fsl.cs.sunysb.edu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-08 08:28:44 -08:00
Trond Myklebust
21b4e73692 Merge branch 'master' of /home/trondmy/kernel/linux-2.6/ into merge_linus 2006-12-07 16:35:17 -05:00
Trond Myklebust
34161db6b1 Merge branch 'master' of /home/trondmy/kernel/linux-2.6/ into merge_linus
Conflicts:

	include/linux/sunrpc/xprt.h
	net/sunrpc/xprtsock.c
Fix up conflicts with the workqueue changes.
2006-12-07 15:48:15 -05:00
Linus Torvalds
0a01707b28 Merge branch 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6
* 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6: (43 commits)
  [wireless] zd1211rw: workqueue-related build fixes
  [netdrvr] netxen: workqueue-related build fixes
  [PATCH] sky2: sparse warnings
  [PATCH] skge: fix sparse warnings
  [PATCH] myri10ge: write as 2 32-byte blocks in myri10ge_submit_8rx
  [PATCH] sky2: receive queue watermark tweak
  [PATCH] sky2: beter ram buffer partitioning
  [PATCH] sky2: add comments to PCI ids
  [PATCH] sky2: add PCI for 88ec033
  [PATCH] AT91RM9200 Ethernet: Use dev_alloc_skb()
  [PATCH] AT91RM9200 Ethernet: Add netpoll / netconsole support
  [PATCH] AT91RM9200 Ethernet: Move check_timer variable and use mod_timer()
  [PATCH] AT91RM9200 Ethernet: Remove 'at91_dev' and use netdev_priv()
  [PATCH] ipw2200: Fix debug output endian issue
  [PATCH] ipw2200: Fix a typo
  [PATCH] ipw2200: Update version stamp to 1.2.0
  [PATCH] ipw2200: Add IEEE80211_RADIOTAP_TSFT for promiscuous mode
  [PATCH] softmac: fix unbalanced mutex_lock/unlock in ieee80211softmac_wx_set_mlme
  [PATCH] softmac: Fixed handling of deassociation from AP
  [PATCH] ipw2200: replace kmalloc+memset with kcalloc
  ...
2006-12-07 09:09:52 -08:00
Linus Torvalds
2685b267bc Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
* master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: (48 commits)
  [NETFILTER]: Fix non-ANSI func. decl.
  [TG3]: Identify Serdes devices more clearly.
  [TG3]: Use msleep.
  [TG3]: Use netif_msg_*.
  [TG3]: Allow partial speed advertisement.
  [TG3]: Add TG3_FLG2_IS_NIC flag.
  [TG3]: Add 5787F device ID.
  [TG3]: Fix Phy loopback.
  [WANROUTER]: Kill kmalloc debugging code.
  [TCP] inet_twdr_hangman: Delete unnecessary memory barrier().
  [NET]: Memory barrier cleanups
  [IPSEC]: Fix inetpeer leak in ipv4 xfrm dst entries.
  audit: disable ipsec auditing when CONFIG_AUDITSYSCALL=n
  audit: Add auditing to ipsec
  [IRDA] irlan: Fix compile warning when CONFIG_PROC_FS=n
  [IrDA]: Incorrect TTP header reservation
  [IrDA]: PXA FIR code device model conversion
  [GENETLINK]: Fix misplaced command flags.
  [NETLIK]: Add a pointer to the Generic Netlink wiki page.
  [IPV6] RAW: Don't release unlocked sock.
  ...
2006-12-07 09:05:15 -08:00
Eric Dumazet
304e61e6fb [PATCH] net: don't insert socket dentries into dentry_hashtable
We currently insert socket dentries into the global dentry hashtable.  This
is suboptimal because there is currently no way these entries can be used
for a lookup().  (/proc/xxx/fd/xxx uses a different mechanism).  Inserting
them in dentry hashtable slows dcache lookups.

To let __dpath() still work correctly (ie not adding a " (deleted)") after
dentry name, we do :

- Right after d_alloc(), pretend they are hashed by clearing the
  DCACHE_UNHASHED bit.

- Call d_instantiate() instead of d_add() : dentry is not inserted in
  hash table.

  __dpath() & friends work as intended during dentry lifetime.

- At dismantle time, once dput() must clear the dentry, setting again
  DCACHE_UNHASHED bit inside the custom d_delete() function provided by
  socket code, so that dput() can just kill_it.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:41 -08:00
Ingo Molnar
0231606785 [PATCH] hotplug CPU: clean up hotcpu_notifier() use
There was lots of #ifdef noise in the kernel due to hotcpu_notifier(fn,
prio) not correctly marking 'fn' as used in the !HOTPLUG_CPU case, and thus
generating compiler warnings of unused symbols, hence forcing people to add
#ifdefs.

the compiler can skip truly unused functions just fine:

    text    data     bss     dec     hex filename
 1624412  728710 3674856 6027978  5bfaca vmlinux.before
 1624412  728710 3674856 6027978  5bfaca vmlinux.after

[akpm@osdl.org: topology.c fix]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:39 -08:00
Peter Zijlstra
6cfd76a26d [PATCH] lockdep: name some old style locks
Name some of the remaning 'old_style_spin_init' locks

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:36 -08:00
Peter Zijlstra
ed07536ed6 [PATCH] lockdep: annotate nfs/nfsd in-kernel sockets
Stick NFS sockets in their own class to avoid some lockdep warnings.  NFS
sockets are never exposed to user-space, and will hence not trigger certain
code paths that would otherwise pose deadlock scenarios.

[akpm@osdl.org: cleanups]
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Steven Dickson <SteveD@redhat.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Acked-by: Neil Brown <neilb@suse.de>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
[ Fixed patch corruption by quilt, pointed out by Peter Zijlstra ]
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:30 -08:00
Nigel Cunningham
7dfb71030f [PATCH] Add include/linux/freezer.h and move definitions from sched.h
Move process freezing functions from include/linux/sched.h to freezer.h, so
that modifications to the freezer or the kernel configuration don't require
recompiling just about everything.

[akpm@osdl.org: fix ueagle driver]
Signed-off-by: Nigel Cunningham <nigel@suspend2.net>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:27 -08:00
Christoph Lameter
e18b890bb0 [PATCH] slab: remove kmem_cache_t
Replace all uses of kmem_cache_t with struct kmem_cache.

The patch was generated using the following script:

	#!/bin/sh
	#
	# Replace one string by another in all the kernel sources.
	#

	set -e

	for file in `find * -name "*.c" -o -name "*.h"|xargs grep -l $1`; do
		quilt add $file
		sed -e "1,\$s/$1/$2/g" $file >/tmp/$$
		mv /tmp/$$ $file
		quilt refresh
	done

The script was run like this

	sh replace kmem_cache_t "struct kmem_cache"

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:25 -08:00
Christoph Lameter
e94b176609 [PATCH] slab: remove SLAB_KERNEL
SLAB_KERNEL is an alias of GFP_KERNEL.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:24 -08:00
Christoph Lameter
54e6ecb239 [PATCH] slab: remove SLAB_ATOMIC
SLAB_ATOMIC is an alias of GFP_ATOMIC

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:24 -08:00
Alan Stern
a120586873 [PATCH] Allow NULL pointers in percpu_free
The patch (as824b) makes percpu_free() ignore NULL arguments, as one would
expect for a deallocation routine.  (Note that free_percpu is #defined as
percpu_free in include/linux/percpu.h.) A few callers are updated to remove
now-unneeded tests for NULL.  A few other callers already seem to assume
that passing a NULL pointer to percpu_free() is okay!

The patch also removes an unnecessary NULL check in percpu_depopulate().

Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:22 -08:00
Christoph Hellwig
b30973f877 [PATCH] node-aware skb allocation
Node-aware allocation of skbs for the receive path.

Details:

  - __alloc_skb gets a new node argument and cals the node-aware
    slab functions with it.
  - netdev_alloc_skb passed the node number it gets from dev_to_node
    to it, everyone else passes -1 (any node)

Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Christoph Lameter <clameter@engr.sgi.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:22 -08:00
Jeff Garzik
359f2d17e3 Merge branch 'upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6 into upstream
Conflicts:

	drivers/net/wireless/zd1211rw/zd_mac.h
	net/ieee80211/softmac/ieee80211softmac_assoc.c
2006-12-07 05:02:40 -05:00
Randy Dunlap
272491ef42 [NETFILTER]: Fix non-ANSI func. decl.
Fix non-ANSI function declaration:
net/netfilter/nf_conntrack_core.c:1096:25: warning: non-ANSI function declaration of function 'nf_conntrack_flush'

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-07 01:17:24 -08:00
David S. Miller
456c38f968 [WANROUTER]: Kill kmalloc debugging code.
It duplicates what SLAB debug can do already.

Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-07 00:18:22 -08:00
David S. Miller
905eee008b [TCP] inet_twdr_hangman: Delete unnecessary memory barrier().
As per Ralf Baechle's observations, the schedule_work() call
should give enough of a memory barrier, so the explicit one
here is totally unnecessary.

Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-07 00:12:30 -08:00
Ralf Baechle
e16aa207cc [NET]: Memory barrier cleanups
I believe all the below memory barriers only matter on SMP so
therefore the smp_* variant of the barrier should be used.

I'm wondering if the barrier in net/ipv4/inet_timewait_sock.c should be
dropped entirely.  schedule_work's implementation currently implies a
memory barrier and I think sane semantics of schedule_work() should imply
a memory barrier, as needed so the caller shouldn't have to worry.
It's not quite obvious why the barrier in net/packet/af_packet.c is
needed; maybe it should be implied through flush_dcache_page?

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-07 00:11:33 -08:00
David S. Miller
26db167702 [IPSEC]: Fix inetpeer leak in ipv4 xfrm dst entries.
We grab a reference to the route's inetpeer entry but
forget to release it in xfrm4_dst_destroy().

Bug discovered by Kazunori MIYAZAWA <kazunori@miyazawa.org>

Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-06 23:45:15 -08:00
Joy Latten
c9204d9ca7 audit: disable ipsec auditing when CONFIG_AUDITSYSCALL=n
Disables auditing in ipsec when CONFIG_AUDITSYSCALL is
disabled in the kernel.

Also includes a bug fix for xfrm_state.c as a result of
original ipsec audit patch.

Signed-off-by: Joy Latten <latten@austin.ibm.com>
Signed-off-by: James Morris <jmorris@namei.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-06 20:14:23 -08:00
Joy Latten
161a09e737 audit: Add auditing to ipsec
An audit message occurs when an ipsec SA
or ipsec policy is created/deleted.

Signed-off-by: Joy Latten <latten@austin.ibm.com>
Signed-off-by: James Morris <jmorris@namei.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-06 20:14:22 -08:00
Jeet Chaudhuri
e694ba4428 [IrDA]: Incorrect TTP header reservation
We must reserve SAR + MAX_HEADER bytes for IrLMP to fit in.

Patch from Jeet Chaudhuri <jeetlinux@yahoo.co.in>

Signed-off-by: Samuel Ortiz <samuel@sortiz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-06 20:08:45 -08:00
Jamal Hadi Salim
48d4ed7a86 [GENETLINK]: Fix misplaced command flags.
The command flags for dump and do were swapped..

Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-06 20:06:25 -08:00
Masahide NAKAMURA
4e33fa14fa [IPV6] RAW: Don't release unlocked sock.
When user builds IPv6 header and send it through raw socket, kernel
tries to release unlocked sock. (Kernel log shows
"BUG: bad unlock balance detected" with enabled debug option.)

The lock is held only for non-hdrincl sock in this function
then this patch fix to do nothing about lock for hdrincl one.

Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-06 18:39:09 -08:00
YOSHIFUJI Hideaki
9a217a1c7e [IPV6]: Repair IPv6 Fragments
The commit "[IPV6]: Use kmemdup" (commit-id:
af879cc704) broke IPv6 fragments.

Bug was spotted by Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-06 18:39:08 -08:00
Patrick McHardy
5c804bfdcc [NET_SCHED]: cls_fw: fix NULL pointer dereference
When the first fw classifier is initialized, there is a small window
between the ->init() and ->change() calls, during which the classifier
is active but not entirely set up and tp->root is still NULL (->init()
does nothing).

When a packet is queued during this window a NULL pointer dereference
occurs in fw_classify() when trying to dereference head->mask;

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-06 18:39:07 -08:00
Bart De Schuymer
f216f082b2 [NETFILTER]: bridge netfilter: deal with martians correctly
The attached patch resolves an issue where a IP DNATed packet with a
martian source is forwarded while it's better to drop it. It also
resolves messages complaining about ip forwarding being disabled while
it's actually enabled. Thanks to lepton <ytht.net@gmail.com> for
reporting this problem.

This is probably a candidate for the -stable release.

Signed-off-by: Bart De Schuymer <bdschuym@pandora.be>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-06 18:39:06 -08:00
Yasuyuki Kozakai
ece006416d [NETFILTER]: nf_conntrack: Don't try to find clashed expectation
The original code continues loop to find expectation in list if the master
conntrack of the found expectation is unconfirmed. But it never success
in that case, because nf_conntrack_expect_related() never insert
clashed expectation to the list.

This stops loop in that case.

Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-06 18:39:05 -08:00
Dmitry Mishin
f6677f4312 [NETFILTER]: Fix iptables compat hook validation
In compat mode, matches and targets valid hooks checks always successful due
to not initialized e->comefrom field yet. This patch separates this checks from
translation code and moves them after mark_source_chains() call, where these
marks are initialized.

Signed-off-by: Dmitry Mishin <dim@openvz.org>
Signed-off-by; Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-06 18:39:03 -08:00
Dmitry Mishin
74c9c0c17d [NETFILTER]: Fix {ip,ip6,arp}_tables hook validation
Commit 590bdf7fd2 introduced a regression
in match/target hook validation. mark_source_chains builds a bitmask
for each rule representing the hooks it can be reached from, which is
then used by the matches and targets to make sure they are only called
from valid hooks. The patch moved the match/target specific validation
before the mark_source_chains call, at which point the mask is always zero.

This patch returns back to the old order and moves the standard checks
to mark_source_chains. This allows to get rid of a special case for
standard targets as a nice side-effect.

Signed-off-by: Dmitry Mishin <dim@openvz.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-06 18:39:02 -08:00
Kazunori MIYAZAWA
7cf4c1a5fd [IPSEC]: Add support for AES-XCBC-MAC
The glue of xfrm.

Signed-off-by: Kazunori MIYAZAWA <miyazawa@linux-ipv6.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2006-12-06 18:38:51 -08:00
Jamal Hadi Salim
94b9bb5480 [XFRM] Optimize SA dumping
Same comments as in "[XFRM] Optimize policy dumping"

The numbers are (20K SAs):
2006-12-06 18:38:45 -08:00
Jamal Hadi Salim
baf5d743d1 [XFRM] Optimize policy dumping
This change optimizes the dumping of Security policies.

1) Before this change ..
speedopolis:~# time ./ip xf pol

real    0m22.274s
user    0m0.000s
sys     0m22.269s

2) Turn off sub-policies

speedopolis:~# ./ip xf pol

real    0m13.496s
user    0m0.000s
sys     0m13.493s

i suppose the above is to be expected

3) With this change ..
speedopolis:~# time ./ip x policy

real    0m7.901s
user    0m0.008s
sys     0m7.896s
2006-12-06 18:38:44 -08:00
Patrick McHardy
1b6651f1bf [XFRM]: Use output device disable_xfrm for forwarded packets
Currently the behaviour of disable_xfrm is inconsistent between
locally generated and forwarded packets. For locally generated
packets disable_xfrm disables the policy lookup if it is set on
the output device, for forwarded traffic however it looks at the
input device. This makes it impossible to disable xfrm on all
devices but a dummy device and use normal routing to direct
traffic to that device.

Always use the output device when checking disable_xfrm.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-06 18:38:43 -08:00
Jamal Hadi Salim
334c29a645 [GENETLINK]: Move command capabilities to flags.
This patch moves command capabilities to command flags. Other than
being cleaner, saves several bytes.
We increment the nlctrl version so as to signal to user space that
to not expect the attributes. We will try to be careful
not to do this too often ;->

Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-06 18:38:41 -08:00
Chuck Lever
5847e1f4d0 SUNRPC: Remove pprintk() from net/sunrpc/xprt.c
These appear to be deprecated.  Removing them also gets rid of some sparse
noise.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-12-06 10:46:55 -05:00
Chuck Lever
fbf76683ff SUNRPC: relocate the creation of socket-specific tunables
Clean-up:

The RPC client currently creates some sysctls that are specific to the
socket transport.  Move those entirely into xprtsock.c.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-12-06 10:46:53 -05:00
Chuck Lever
282b32e17f SUNRPC: create stubs for xprtsock init and cleanup
Over time we will want to add some specific init and cleanup logic for the
xprtsock implementation.  Add stub routines for initialization and exit
processing.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-12-06 10:46:53 -05:00
Chuck Lever
dd4564715e SUNRPC: Rename skb_reader_t and friends
Clean-up:  hch suggested that the RPC client shouldn't pollute the name
space used by the generic skb manipulation routines in net/core/skbuff.c.

Rename a couple of types in xdr.h to adhere to this convention.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-12-06 10:46:52 -05:00
Chuck Lever
9d29231690 SUNRPC: skb_read_bits is the same as xs_tcp_copy_data
Clean-up: eliminate xs_tcp_copy_data -- it's exactly the same logic as the
common routine skb_read_bits.  The UDP and TCP socket read code now share
the same routine for copying data into an xdr_buf.

Now that skb_read_bits() is exported, rename it to avoid confusing it with
a generic skb_* function.  As these functions are XDR-specific, they should
not have names that suggest they are of generic use.  Also rename
skb_read_and_csum_bits() to be consistent.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-12-06 10:46:52 -05:00
Chuck Lever
7559c7a28f SUNRPC: Make address format buffers more generic
For now we will assume that all transports will use the address format
buffers in the rpc_xprt struct to store their addresses.  Change
rpc_peer2str() to be a generic routine to handle this, and get rid of the
print_address() op in the rpc_xprt_ops vector.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-12-06 10:46:51 -05:00
Chuck Lever
314dfd7987 SUNRPC: move saved socket callback functions to a private data structure
Move the three fields for saving socket callback functions out of the
rpc_xprt structure and into a private data structure maintained in
net/sunrpc/xprtsock.c.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-12-06 10:46:51 -05:00
Chuck Lever
7c6e066ec2 SUNRPC: Move the UDP socket bufsize parameters to a private data structure
Move the socket-specific buffer size parameters for UDP sockets to a
private data structure maintained in net/sunrpc/xprtsock.c.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-12-06 10:46:50 -05:00
Chuck Lever
c847546182 SUNRPC: Move rpc_xprt socket connect fields into private data structure
Move the socket-specific connection management fields out of the generic
rpc_xprt structure into a private data structure maintained in
net/sunrpc/xprtsock.c.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-12-06 10:46:50 -05:00
Chuck Lever
e136d0926e SUNRPC: Move TCP state flags into xprtsock.c
Move "XPRT_LAST_FRAG" and friends from xprt.h into xprtsock.c, and rename
them to use the naming scheme in use in xprtsock.c.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-12-06 10:46:50 -05:00
Chuck Lever
51971139b2 SUNRPC: Move TCP receive state variables into private data structure
Move the TCP receive state variables from the generic rpc_xprt structure to
a private structure maintained inside net/sunrpc/xprtsock.c.

Also rename a function/variable pair to refer to RPC fragment headers
instead of record markers, to be consistent with types defined in
sunrpc/*.h.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-12-06 10:46:49 -05:00
Chuck Lever
ee0ac0c227 SUNRPC: Remove sock and inet fields from rpc_xprt
The "sock" and "inet" fields are socket-specific.  Move them to a private
data structure maintained entirely within net/sunrpc/xprtsock.c

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-12-06 10:46:49 -05:00
Chuck Lever
ffc2e518c9 SUNRPC: Allocate a private data area for socket-specific rpc_xprt fields
When setting up a new transport instance, allocate enough memory for an
rpc_xprt and a private area.  As part of the same memory allocation, it
will be easy to find one, given a pointer to the other.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-12-06 10:46:48 -05:00
J. Bruce Fields
94efa93435 rpcgss: krb5: miscellaneous cleanup
Miscellaneous cosmetic fixes.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-12-06 10:46:48 -05:00
J. Bruce Fields
717757ad10 rpcgss: krb5: ignore seed
We're currently not actually using seed or seed_init.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-12-06 10:46:47 -05:00
J. Bruce Fields
d922a84a8b rpcgss: krb5: sanity check sealalg value in the downcall
The sealalg is checked in several places, giving the impression it could be
either SEAL_ALG_NONE or SEAL_ALG_DES.  But in fact SEAL_ALG_NONE seems to
be sufficient only for making mic's, and all the contexts we get must be
capable of wrapping as well.  So the sealalg must be SEAL_ALG_DES.  As
with signalg, just check for the right value on the downcall and ignore it
otherwise.  Similarly, tighten expectations for the sealalg on incoming
tokens, in case we do support other values eventually.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-12-06 10:46:47 -05:00
J. Bruce Fields
39a21dd1b0 rpcgss: krb5: clean up some goto's, etc.
Remove some unnecessary goto labels; clean up some return values; etc.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-12-06 10:46:46 -05:00
J. Bruce Fields
ca54f89645 rpcgss: simplify make_checksum
We're doing some pointless translation between krb5 constants and kernel
crypto string names.

Also clean up some related spkm3 code as necessary.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-12-06 10:46:46 -05:00
J. Bruce Fields
2818bf81a8 rpcgss: krb5: kill checksum_type, miscellaneous small cleanup
Previous changes reveal some obvious cruft.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-12-06 10:46:45 -05:00
J. Bruce Fields
5eb064f939 rpcgss: krb5: expect a constant signalg value
We also only ever receive one value of the signalg, so let's not pretend
otherwise

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-12-06 10:46:45 -05:00
J. Bruce Fields
e678e06bf8 gss: krb5: remove signalg and sealalg
We designed the krb5 context import without completely understanding the
context.  Now it's clear that there are a number of fields that we ignore,
or that we depend on having one single value.

In particular, we only support one value of signalg currently; so let's
check the signalg field in the downcall (in case we decide there's
something else we could support here eventually), but ignore it otherwise.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-12-06 10:46:44 -05:00
Olga Kornievskaia
adeb8133dd rpc: spkm3 update
This updates the spkm3 code to bring it up to date with our current
understanding of the spkm3 spec.

In doing so, we're changing the downcall format used by gssd in the spkm3 case,
which will cause an incompatilibity with old userland spkm3 support.  Since the
old code a) didn't implement the protocol correctly, and b) was never
distributed except in the form of some experimental patches from the citi web
site, we're assuming this is OK.

We do detect the old downcall format and print warning (and fail).  We also
include a version number in the new downcall format, to be used in the
future in case any further change is required.

In some more detail:

	- fix integrity support
	- removed dependency on NIDs. instead OIDs are used
	- known OID values for algorithms added.
	- fixed some context fields and types

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-12-06 10:46:44 -05:00
Olga Kornievskaia
37a4e6cb03 rpc: move process_xdr_buf
Since process_xdr_buf() is useful outside of the kerberos-specific code, we
move it to net/sunrpc/xdr.c, export it, and rename it in keeping with xdr_*
naming convention of xdr.c.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-12-06 10:46:44 -05:00
J. Bruce Fields
87d918d667 rpc: gss: fix a kmap_atomic race in krb5 code
This code is never called from interrupt context; it's always run by either
a user thread or rpciod.  So KM_SKB_SUNRPC_DATA is inappropriate here.

Thanks to Aimé Le Rouzic for capturing an oops which showed the kernel
taking an interrupt while we were in this piece of code, resulting in a
nested kmap_atomic(.,KM_SKB_SUNRPC_DATA) call from
xdr_partial_copy_from_skb().

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-12-06 10:46:43 -05:00
J. Bruce Fields
8fc7500bb8 rpc: gss: eliminate print_hexl()'s
Dumping all this data to the logs is wasteful (even when debugging is turned
off), and creates too much output to be useful when it's turned on.

Fix a minor style bug or two while we're at it.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-12-06 10:46:43 -05:00
Chuck Lever
2b577f1f14 SUNRPC: another pmap wakeup fix
Don't wake up bind waiters if a task finds that another task is already
trying to bind.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-12-06 10:46:42 -05:00
Chuck Lever
c8541ecdd5 SUNRPC: Make the transport-specific setup routine allocate rpc_xprt
Change the location where the rpc_xprt structure is allocated so each
transport implementation can allocate a private area from the same
chunk of memory.

Note also that xprt->ops->destroy, rather than xprt_destroy, is now
responsible for freeing rpc_xprt when the transport is destroyed.

Test plan:
Connectathon.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-12-06 10:46:34 -05:00
Trond Myklebust
24c5684b65 SUNRPC: Clean up xs_send_pages()
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-12-06 10:46:33 -05:00
Trond Myklebust
bee57c99c3 SUNRPC: Ensure xdr_buf_read_netobj() checks for memory overruns
Also clean up the code...

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-12-06 10:46:33 -05:00
Trond Myklebust
4e3e43ad14 SUNRPC: Add __(read|write)_bytes_from_xdr_buf
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-12-06 10:46:32 -05:00
Trond Myklebust
1e78957e0a SUNRPC: Clean up argument types in xdr.c
Converts various integer buffer offsets and sizes to unsigned integer.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-12-06 10:46:32 -05:00
Trond Myklebust
6d5fcb5a52 SUNRPC: Remove BKL around the RPC socket operations etc.
All internal RPC client operations should no longer depend on the BKL,
however lockd and NFS callbacks may still require it.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-12-06 10:46:30 -05:00
Trond Myklebust
bbd5a1f9fc SUNRPC: Fix up missing BKL in asynchronous RPC callback functions
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-12-06 10:46:29 -05:00
Trond Myklebust
3e32a5d99a SUNRPC: Give cloned RPC clients their own rpc_pipefs directory
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-12-06 10:46:29 -05:00
Trond Myklebust
23bf85ba43 SUNRPC: Handle the cases where rpc_alloc_iostats() fails
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-12-06 10:46:28 -05:00
Trond Myklebust
8aca67f0ae SUNRPC: Fix a potential race in rpc_wake_up_task()
Use RCU to ensure that we can safely call rpc_finish_wakeup after we've
called __rpc_do_wake_up_task. If not, there is a theoretical race, in which
the rpc_task finishes executing, and gets freed first.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-12-06 10:46:26 -05:00
Trond Myklebust
e6b3c4db6f Fix a second potential rpc_wakeup race...
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-12-06 10:46:25 -05:00
Christophe Saout
cc4dc59e55 Subject: Re: [PATCH] Fix SUNRPC wakeup/execute race condition
The sunrpc scheduler contains a race condition that can let an RPC
task end up being neither running nor on any wait queue. The race takes
place between rpc_make_runnable (called from rpc_wake_up_task) and
__rpc_execute under the following condition:

First __rpc_execute calls tk_action which puts the task on some wait
queue. The task is dequeued by another process before __rpc_execute
continues its execution. While executing rpc_make_runnable exactly after
setting the task `running' bit and before clearing the `queued' bit
__rpc_execute picks up execution, clears `running' and subsequently
both functions fall through, both under the false assumption somebody
else took the job.

Swapping rpc_test_and_set_running with rpc_clear_queued in
rpc_make_runnable fixes that hole. This introduces another possible
race condition that can be handled by checking for `queued' after
setting the `running' bit.

Bug noticed on a 4-way x86_64 system under XEN with an NFSv4 server
on the same physical machine, apparently one of the few ways to hit
this race condition at all.

Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Christophe Saout <christophe@saout.de>
Signed-off-by: Trond Myklebust <trond.myklebust@fys.uio.no>
2006-12-06 10:46:24 -05:00
Maxime Austruy
cc8ce997d2 [PATCH] softmac: fix unbalanced mutex_lock/unlock in ieee80211softmac_wx_set_mlme
Routine ieee80211softmac_wx_set_mlme has one return that fails
to release a mutex acquired at entry.

Signed-off-by: Maxime Austruy <maxime@tralhalla.org>
Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2006-12-05 19:31:33 -05:00
Ulrich Kunitz
2b50c24554 [PATCH] softmac: Fixed handling of deassociation from AP
In 2.6.19 a deauthentication from the AP doesn't start a
reassociation by the softmac code. It appears that
mac->associnfo.associating must be set and the
ieee80211softmac_assoc_work function must be scheduled. This patch
fixes that.

Signed-off-by: Ulrich Kunitz <kune@deine-taler.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2006-12-05 19:31:33 -05:00
David Howells
9db7372445 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6
Conflicts:

	drivers/ata/libata-scsi.c
	include/linux/libata.h

Futher merge of Linus's head and compilation fixups.

Signed-Off-By: David Howells <dhowells@redhat.com>
2006-12-05 17:01:28 +00:00
David Howells
4c1ac1b491 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6
Conflicts:

	drivers/infiniband/core/iwcm.c
	drivers/net/chelsio/cxgb2.c
	drivers/net/wireless/bcm43xx/bcm43xx_main.c
	drivers/net/wireless/prism54/islpci_eth.c
	drivers/usb/core/hub.h
	drivers/usb/input/hid-core.c
	net/core/netpoll.c

Fix up merge failures with Linus's head and fix new compilation failures.

Signed-Off-By: David Howells <dhowells@redhat.com>
2006-12-05 14:37:56 +00:00
Linus Torvalds
9b8ab9f6c3 Merge branch 'for-linus4' of master.kernel.org:/pub/scm/linux/kernel/git/viro/bird
* 'for-linus4' of master.kernel.org:/pub/scm/linux/kernel/git/viro/bird:
  [PATCH] severing poll.h -> mm.h
  [PATCH] severing skbuff.h -> mm.h
  [PATCH] severing skbuff.h -> poll.h
  [PATCH] severing skbuff.h -> highmem.h
  [PATCH] severing uaccess.h -> sched.h
  [PATCH] severing fs.h, radix-tree.h -> sched.h
  [PATCH] severing module.h->sched.h
2006-12-04 10:37:06 -08:00
Al Viro
d7fe0f241d [PATCH] severing skbuff.h -> mm.h
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2006-12-04 02:00:34 -05:00
Al Viro
a1f8e7f7fb [PATCH] severing skbuff.h -> highmem.h
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2006-12-04 02:00:29 -05:00
David S. Miller
83ac58ba0a Merge master.kernel.org:/pub/scm/linux/kernel/git/acme/net-2.6 2006-12-03 19:24:40 -08:00
David S. Miller
b4ad86bf52 [XFRM] xfrm_user: Better validation of user templates.
Since we never checked the ->family value of templates
before, many applications simply leave it at zero.
Detect this and fix it up to be the pol->family value.

Also, do not clobber xp->family while reading in templates,
that is not necessary.

Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-03 19:19:26 -08:00
Gerrit Renker
2bbf29acd8 [DCCP] tfrc: Binary search for reverse TFRC lookup
This replaces the linear search algorithm for reverse lookup with
binary search.

It has the advantage of better scalability: O(log2(N)) instead of O(N).
This means that the average number of iterations is reduced from 250
(linear search if each value appears equally likely) down to at most 9.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-12-03 14:53:27 -02:00
Gerrit Renker
44158306d7 [DCCP] ccid3: Deprecate TFRC_SMALLEST_P
This patch deprecates the existing use of an arbitrary value TFRC_SMALLEST_P
 for low-threshold values of p. This avoids masking low-resolution errors.
 Instead, the code now checks against real boundaries (implemented by preceding
 patch) and provides warnings whenever a real value falls below the threshold.

 If such messages are observed, it is a better solution to take this as an
 indication that the lookup table needs to be re-engineered.

Changelog:
----------
 This patch
   * makes handling all TFRC resolution errors local to the TFRC library

   * removes unnecessary test whether X_calc is 'infinity' due to p==0 -- this
     condition is already caught by tfrc_calc_x()

   * removes setting ccid3hctx_p = TFRC_SMALLEST_P in ccid3_hc_tx_packet_recv
     since this is now done by the TFRC library

   * updates BUG_ON test in ccid3_hc_tx_no_feedback_timer to take into account
     that p now is either 0 (and then X_calc is irrelevant), or it is > 0; since
     the handling of TFRC_SMALLEST_P is now taken care of in the tfrc library

Justification:
--------------
 The TFRC code uses a lookup table which has a bounded resolution.
 The lowest possible value of the loss event rate `p' which can be
 resolved is currently 0.0001.  Substituting this lower threshold for
 p when p is less than 0.0001 results in a huge, exponentially-growing
 error.  The error can be computed by the following formula:

    (f(0.0001) - f(p))/f(p) * 100      for p < 0.0001

 Currently the solution is to use an (arbitrary) value
     TFRC_SMALLEST_P  =   40 * 1E-6   =   0.00004
 and to consider all values below this value as `virtually zero'.  Due to
 the exponentially growing resolution error, this is not a good idea, since
 it hides the fact that the table can not resolve practically occurring cases.
 Already at p == TFRC_SMALLEST_P, the error is as high as 58.19%!

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-12-03 14:53:07 -02:00
Gerrit Renker
006042d7e1 [DCCP] tfrc: Identify TFRC table limits and simplify code
This
 * adds documentation about the lowest resolution that is possible within
   the bounds of the current lookup table
 * defines a constant TFRC_SMALLEST_P which defines this resolution
 * issues a warning if a given value of p is below resolution
 * combines two previously adjacent if-blocks of nearly identical
   structure into one

This patch does not change the algorithm as such.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-12-03 14:52:41 -02:00
Gerrit Renker
8d0086adac [DCCP] tfrc: Add protection against invalid parameters to TFRC routines
1) For the forward X_calc lookup, it
    * protects effectively against RTT=0 (this case is possible), by
      returning the maximal lookup value instead of just setting it to 1
    * reformulates the array-bounds exceeded condition: this only happens
      if p is greater than 1E6 (due to the scaling)
    * the case of negative indices can now with certainty be excluded,
      since documentation shows that the formulas are within bounds
    * additional protection against p = 0 (would give divide-by-zero)

 2) For the reverse lookup, it warns against
    * protects against exceeding array bounds
    * now returns 0 if f(p) = 0, due to function definition
    * warns about minimal resolution error and returns the smallest table
      value instead of p=0 [this would mask congestion conditions]

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-12-03 14:52:26 -02:00
Gerrit Renker
90fb0e60dd [DCCP] tfrc: Fix small error in reverse lookup of p for given f(p)
This fixes the following small error in tfrc_calc_x_reverse_lookup.

 1) The table is generated by the following equations:
	lookup[index][0] = g((index+1) * 1000000/TFRC_CALC_X_ARRSIZE);
	lookup[index][1] = g((index+1) * TFRC_CALC_X_SPLIT/TFRC_CALC_X_ARRSIZE);
    where g(q) is 1E6 * f(q/1E6)

 2) The reverse lookup assigns an entry in lookup[index][small]

 3) This index needs to match the above, i.e.
    * if small=0 then

      		p  = (index + 1) * 1000000/TFRC_CALC_X_ARRSIZE

    * if small=1 then

		p = (index+1) * TFRC_CALC_X_SPLIT/TFRC_CALC_X_ARRSIZE

These are exactly the changes that the patch makes; previously the code did
not conform to the way the lookup table was generated (this difference resulted
in a mean error of about 1.12%).

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-12-03 14:52:01 -02:00
Gerrit Renker
50ab46c790 [DCCP] tfrc: Document boundaries and limits of the TFRC lookup table
This adds documentation for the TCP Reno throughput equation which is at
the heart of the TFRC sending rate / loss rate calculations.

It spells out precisely how the values were determined and what they mean.
The equations were derived through reverse engineering and found to be
fully accurate (verified using test programs).

This patch does not change any code.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-12-03 14:51:29 -02:00
Gerrit Renker
26af3072b0 [DCCP] ccid3: Fix warning message about illegal ACK
This avoids a (harmless) warning message being printed at the DCCP server
(the receiver of a DCCP half connection).

Incoming packets are both directed to

 * ccid_hc_rx_packet_recv() for the server half
 * ccid_hc_tx_packet_recv() for the client half

The message gets printed since on a server the client half is currently not
sending data packets.
This is resolved for the moment by checking the DCCP-role first. In future
times (bidirectional DCCP connections), this test may have to be more
sophisticated.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-12-03 14:51:14 -02:00
Gerrit Renker
5c3fbb6acf [DCCP] ccid3: Fix bug in calculation of send rate
The main object of this patch is the following bug:
 ==> In ccid3_hc_tx_packet_recv, the parameters p and X_recv were updated
     _after_ the send rate was calculated. This is clearly an error and is
     resolved by re-ordering statements.

In addition,
  * r_sample is converted from u32 to long to check whether the time difference
    was negative (it would otherwise be converted to a large u32 value)
  * protection against RTT=0 (this is possible) is provided in a further patch
  * t_elapsed is also converted to long, to match the type of r_sample
  * adds a a more debugging information regarding current send rates
  * various trivial comment/documentation updates

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-12-03 14:50:56 -02:00
Gerrit Renker
76d127779e [DCCP]: Fix BUG in retransmission delay calculation
This bug resulted in ccid3_hc_tx_send_packet returning negative
delay values, which in turn triggered silently dequeueing packets in
dccp_write_xmit. As a result, only a few out of the submitted packets made
it at all onto the network.  Occasionally, when dccp_wait_for_ccid was
involved, this also triggered a bug warning since ccid3_hc_tx_send_packet
returned a negative value (which in reality was a negative delay value).

The cause for this bug lies in the comparison

 if (delay >= hctx->ccid3hctx_delta)
	return delay / 1000L;

The type of `delay' is `long', that of ccid3hctx_delta is `u32'. When comparing
negative long values against u32 values, the test returned `true' whenever delay
was smaller than 0 (meaning the packet was overdue to send).

The fix is by casting, subtracting, and then testing the difference with
regard to 0.

This has been tested and shown to work.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-12-03 14:50:42 -02:00
Gerrit Renker
8a508ac26e [DCCP]: Use higher RTO default for CCID3
The TFRC nofeedback timer normally expires after the maximum of 4
RTTs and twice the current send interval (RFC 3448, 4.3). On LANs
with a small RTT this can mean a high processing load and reduced
performance, since then the nofeedback timer is triggered very
frequently.

This patch provides a configuration option to set the bound for the
nofeedback timer, using as default 100 milliseconds.

By setting the configuration option to 0, strict RFC 3448 behaviour
can be enforced for the nofeedback timer.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-12-03 14:50:23 -02:00
Jamal Hadi Salim
2b5f6dcce5 [XFRM]: Fix aevent structuring to be more complete.
aevents can not uniquely identify an SA. We break the ABI with this
patch, but consensus is that since it is not yet utilized by any
(known) application then it is fine (better do it now than later).

Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 22:22:25 -08:00
Yasuyuki Kozakai
02dba025b0 [NETFILTER]: xtables: fixes warning on compilation of hashlimit
To use ipv6_find_hdr(), IP6_NF_IPTABLES is necessary.

Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 22:19:01 -08:00
Alexey Dobriyan
0506d4068b [ROSE] rose_add_loopback_node: propagate -E
David Binderman's icc logs:
net/rose/rose_route.c(399): remark #593: variable "err" was set but never used

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 22:17:48 -08:00
Yasuyuki Kozakai
1863f0965e [NETFILTER]: nf_conntrack: fix header inclusions for helpers
Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 22:12:54 -08:00
Patrick McHardy
13b1833910 [NETFILTER]: nf_conntrack: EXPORT_SYMBOL cleanup
- move EXPORT_SYMBOL next to exported symbol
- use EXPORT_SYMBOL_GPL since this is what the original code used

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 22:11:25 -08:00
Patrick McHardy
a3c479772c [NETFILTER]: Mark old IPv4-only connection tracking scheduled for removal
Also remove the references to "new connection tracking" from Kconfig.
After some short stabilization period of the new connection tracking
helpers/NAT code the old one will be removed.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 22:11:01 -08:00
Patrick McHardy
807467c22a [NETFILTER]: nf_nat: add SNMP NAT helper port
Add nf_conntrack port of the SNMP NAT helper.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 22:10:34 -08:00
Patrick McHardy
a536df35b3 [NETFILTER]: nf_conntrack/nf_nat: add TFTP helper port
Add IPv4 and IPv6 capable nf_conntrack port of the TFTP conntrack/NAT helper.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 22:10:18 -08:00
Patrick McHardy
9fafcd7b20 [NETFILTER]: nf_conntrack/nf_nat: add SIP helper port
Add IPv4 and IPv6 capable nf_conntrack port of the SIP conntrack/NAT helper.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 22:09:57 -08:00
Patrick McHardy
f09943fefe [NETFILTER]: nf_conntrack/nf_nat: add PPTP helper port
Add nf_conntrack port of the PPtP conntrack/NAT helper. Since there seems
to be no IPv6-capable PPtP implementation the helper only support IPv4.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 22:09:41 -08:00
Patrick McHardy
92703eee4c [NETFILTER]: nf_conntrack: add NetBIOS name service helper port
Add nf_conntrack port of the NetBIOS name service conntrack helper.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 22:09:24 -08:00
Patrick McHardy
869f37d8e4 [NETFILTER]: nf_conntrack/nf_nat: add IRC helper port
Add nf_conntrack port of the IRC conntrack/NAT helper. Since DCC doesn't
support IPv6 yet, the helper is still IPv4 only.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 22:09:06 -08:00