Commit Graph

5801 Commits

Author SHA1 Message Date
Stephen Hemminger
3b04ddde02 [NET]: Move hardware header operations out of netdevice.
Since hardware header operations are part of the protocol class
not the device instance, make them into a separate object and
save memory.

Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:52:52 -07:00
Stephen Hemminger
b95cce3576 [NET]: Wrap hard_header_parse
Wrap the hard_header_parse function to simplify next step of
header_ops conversion.

Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:52:51 -07:00
Stephen Hemminger
0c4e85813d [NET]: Wrap netdevice hardware header creation.
Add inline for common usage of hardware header creation, and
fix bug in IPV6 mcast where the assumption about negative return is
an errno. Negative return from hard_header means not enough space
was available,(ie -N bytes).

Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:52:50 -07:00
Eric W. Biederman
2774c7aba6 [NET]: Make the loopback device per network namespace.
This patch makes loopback_dev per network namespace.  Adding
code to create a different loopback device for each network
namespace and adding the code to free a loopback device
when a network namespace exits.

This patch modifies all users the loopback_dev so they
access it as init_net.loopback_dev, keeping all of the
code compiling and working.  A later pass will be needed to
update the users to use something other than the initial network
namespace.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:52:49 -07:00
Eric W. Biederman
0cc217e16c [IPV4]: When possible test for IFF_LOOPBACK and not dev == loopback_dev
Now that multiple loopback devices are becoming possible it makes
the code a little cleaner and more maintainable to test if a deivice
is th a loopback device by testing dev->flags & IFF_LOOPBACK instead
of dev == loopback_dev.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:52:48 -07:00
Eric W. Biederman
5967789dbc [IPV4]: Remove unnecessary test for the loopback device from inetdev_destroy
Currently we never call unregister_netdev for the loopback device so
it is impossible for us to reach inetdev_destroy with the loopback
device.  So the test in inetdev_destroy is unnecessary.

Further when testing with my network namespace patches removing
unregistering the loopback device and calling inetdev_destroy works
fine so there appears to be no reason for avoiding unregistering the
loopback device.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:52:48 -07:00
Eric W. Biederman
9dd776b6d7 [NET]: Add network namespace clone & unshare support.
This patch allows you to create a new network namespace
using sys_clone, or sys_unshare.

As the network namespace is still experimental and under development
clone and unshare support is only made available when CONFIG_NET_NS is
selected at compile time.

As this patch introduces network namespace support into code paths
that exist when the CONFIG_NET is not selected there are a few
additions made to net_namespace.h to allow a few more functions
to be used when the networking stack is not compiled in.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:52:46 -07:00
Eric W. Biederman
8b41d1887d [NET]: Fix running without sysfs
When sysfs support is compiled out the kernel still keeps and maintains
the kobject tree.  So it is not safe to skip our kobject reference counting or
to avoid becoming members of the kobject tree.  It is safe to not add
the networking specific sysfs attributes.

This patch removes the sysfs special cases from net/core/dev.c
renames functions from netdev_sysfs_xxxx to netdev_kobject_xxxx
and always compiles in net-sysfs.c

net-sysfs.c is modified with a CONFIG_SYSFS guard around the parts
that are actually sysfs specific.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:52:46 -07:00
Arnaldo Carvalho de Melo
bb293e6a24 [CCID3]: Remove ifdef surrounding BUG_ON
As suggested by DaveM.

Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
2007-10-10 16:52:45 -07:00
Gerrit Renker
cecd8d0ec4 [DCCP]: Reduce the number of writable states
Since DCCP requires to close both ends of a connection simultaneously,
permission to write in state DCCP_CLOSING is removed in dccp_sendmsg():
 * if the sending end closed, it would encounter a write error anyhow;
 * if the other end has closed the connection, it accepts no more data.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
2007-10-10 16:52:45 -07:00
Gerrit Renker
e356d37a09 [DCCP]: Factor out common code for generating Resets
This factors code common to dccp_v{4,6}_ctl_send_reset into a separate function,
and adds support for filling in the Data 1 ... Data 3 fields from RFC 4340, 5.6.

It is useful to have this separate, since the following Reset codes will always
be generated from the control socket rather than via dccp_send_reset:
 * Code 3, "No Connection", cf. 8.3.1;
 * Code 4, "Packet Error" (identification for Data 1 added);
 * Code 5, "Option Error" (identification for Data 1..3 added, will be used later);
 * Code 6, "Mandatory Error" (same as Option Error);
 * Code 7, "Connection Refused" (what on Earth is the difference to "No Connection"?);
 * Code 8, "Bad Service Code";
 * Code 9, "Too Busy";
 * Code 10, "Bad Init Cookie" (not used).

Code 0 is not recommended by the RFC, the following codes would be used in
dccp_send_reset() instead, since they all relate to an established DCCP connection:
 * Code 1, "Closed";
 * Code 2, "Aborted";
 * Code 11, "Aggression Penalty" (12.3).

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
2007-10-10 16:52:44 -07:00
Gerrit Renker
9bf55cda9b [DCCP]: Sequence number wrap-around when sending reset
This replaces normal addition with mod-48 addition so that sequence number
wraparound is respected.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
2007-10-10 16:52:43 -07:00
Gerrit Renker
a94f0f9705 [DCCP]: Rate-limit DCCP-Syncs
This implements a SHOULD from RFC 4340, 7.5.4:
 "To protect against denial-of-service attacks, DCCP implementations SHOULD
  impose a rate limit on DCCP-Syncs sent in response to sequence-invalid packets,
  such as not more than eight DCCP-Syncs per second."

The rate-limit is maintained on a per-socket basis. This is a more stringent
policy than enforcing the rate-limit on a per-source-address basis and
protects against attacks with forged source addresses.

Moreover, the mechanism is deliberately kept simple. In contrast to
xrlim_allow(), bursts of Sync packets in reply to sequence-invalid packets
are not supported.  This foils such attacks where the receipt of a Sync
triggers further sequence-invalid packets. (I have tested this mechanism against
xrlim_allow algorithm for Syncs, permitting bursts just increases the problems.)

In order to keep flexibility, the timeout parameter can be set via sysctl; and
the whole mechanism can even be disabled (which is however not recommended).

The algorithm in this patch has been improved with regard to wrapping issues
thanks to a suggestion by Arnaldo.

Commiter note: Rate limited the step 6 DCCP_WARN too, as it says we're
               sending a sync.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
2007-10-10 16:52:43 -07:00
Gerrit Renker
ee1a15922d [DCCP]: Remove duplicate code for Reset from connected socket
In this patch, duplicated code is removed for the case when a Reset packet is
sent from a connected socket. This code duplication is between dccp_make_reset
and dccp_transmit_skb, which already contained an (up to now entirely unused)
switch statement to fill in the reset code from the DCCP_SKB_CB.

The only thing that has been removed is the call to dst_clone(dst), since
the queue_xmit functions use sk_dst_cache anyway.

I wasn't sure which purpose inet_sk_rebuild_header served, so I left it in.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
2007-10-10 16:52:42 -07:00
Gerrit Renker
0430ee3451 [DCCP]: Add Support for Data 1 .. 3 fields of Reset packets
This adds fields to support the informational Data 1..3 fields of the
DCCP-Reset packets (RFC 4340, 5.6), and makes minor cosmetic changes
to documentation.
Code which fills in these fields follows in subsequent patches, it is
primarily used for reporting option-processing and feature-negotiation
errors.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
2007-10-10 16:52:42 -07:00
Gerrit Renker
727ecc5faa [DCCP]: Add FIXME for send_delayed_ack
This adds a FIXME to signal that the function dccp_send_delayed_ack is nowhere
used in the entire DCCP/CCID code.

Using a delayed Ack timer is suggested in 11.3 of RFC 4340, but it has also
rather subtle implications for the Ack-Ratio-accounting.

CCID2 does not use this (maybe it should).

I think leaving the function in is good, in case someone wants to implement
this.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
2007-10-10 16:52:41 -07:00
Gerrit Renker
2e86908f7d [CCID3]: Move NULL-protection into function
This moves several instances of testing against NULL into the function which is
used to de-reference the CCID-private data.

Committer note: Made the BUG_ON depend on having CONFIG_IP_DCCP_CCID3_DEBUG, as it
                is too much to have this on production code. Also made sure that
                the macro is used only after checking if sk_state is not LISTEN,
                to make it equivalent to what we had before.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
2007-10-10 16:52:41 -07:00
Gerrit Renker
08831700cc [DCCP]: Send Reset upon Sync in state REQUEST
This fixes the code to correspond to RFC 4340, 7.5.4, which states the
exception that a Sync received in state REQUEST generates a Reset (not
a SyncAck).

To achieve this, only a small change is required. Since
dccp_rcv_request_sent_state_process() already uses the correct Reset Code
number 4 ("Packet Error"), we only need to shift the if-statement a few
lines further down.

(To test this case: replace DCCP_PKT_RESPONSE with DCCP_PKT_SYNC
                    in dccp_make_response.)

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
2007-10-10 16:52:40 -07:00
WANG Cong
53465eb4ab [BLUETOOTH]: Make hidp_setup_input() return int
This patch:
- makes hidp_setup_input() return int to indicate errors;
- checks its return value to handle errors.

And this time it is against -rc7-mm1 tree.

Thanks to roel and Marcel Holtmann for comments.

Signed-off-by: WANG Cong <xiyou.wangcong@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:52:39 -07:00
Ilpo Järvinen
912d8f0b1f [TCP] MIB: Count FRTO's successfully detected spurious RTOs
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:52:39 -07:00
Ilpo Järvinen
93e6802029 [TCP]: Reordered ACK's (old) SACKs not included to discarded MIB
In case of ACK reordering, the SACK block might be valid in it's
time but is already obsoleted since we've received another kind
of confirmation about arrival of the segments through snd_una
advancement of an earlier packet.

I didn't bother to build distinguishing of valid and invalid
SACK blocks but simply made reordered SACK blocks that are too
old always not counted regardless of their "real" validity which
could be determined by using the ack field of the reordered
packet (won't be significant IMHO).

DSACKs can very well be considered useful even in this situation,
so won't do any of this for them.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:52:38 -07:00
Ilpo Järvinen
a6963a6b3d [TCP]: Re-place highest_sack check to a more robust position
I previously added checking to position that is rather poor as
state has already been adjusted quite a bit. Re-placing it above
all state changes should be more robust though the return should
never ever get executed regardless of its place :-).

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:52:38 -07:00
Gerrit Renker
b0d045ca45 [DCCP]: Parameter renaming
The parameter `seq' of dccp_send_sync() is in fact an acknowledgement number
and not a sequence number - thus renamed by this patch into `ackno'.

Secondly, a `critical' warning is added when a Sync/SyncAck could not be sent.

Sanity: I have checked all other functions that are called in dccp_transmit_skb,
        there are no clashes with the use of dccpd_ack_seq; no other function is
        using this slot at the same time.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:52:37 -07:00
Gerrit Renker
e155d76922 [DCCP]: Fix Reset/Sync-Flood Bug
This updates sequence number checking with regard to RFC 4340, 7.5.4.
Missing in the code was an exception for sequence-invalid Reset packets,
which get a Sync acknowledging GSR, instead of (as usual) P.seqno.

This can lead to an oscillating ping-pong flood of Reset packets.

In fact, it has been observed on the wire as follows:

 1. client establishes connection to server;
 2. before server can write to client, client crashes without notifying
    the server (NB: now no longer possible due to ABORT function);
 3. server sends DCCP-Data packet (has no ackno);
 4. client generates Reset "No Connection", seqno=0, increments seqno;
 5. server replies with Sync, using ackno = P.seqno;
 6. client generates Reset "No Connection" with seqno = ackno + 1;
 7. goto (5).

The difference is that now in (5) the server uses GSR.  This causes the
Reset sent by the client in (6) to become sequence-valid, so that in (7)
the vicious circle is broken; the Reset is then enqueued and causes the
socket to enter TIMEWAIT state.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:52:37 -07:00
Gerrit Renker
cbe1f5f88a [DCCP]: Shorten variable names in dccp_check_seqno
This patch is in part required by the next patch; it

 * replaces 6 instances of `DCCP_SKB_CB(skb)->dccpd_seq' with `seqno';
 * replaces 7 instances of `DCCP_SKB_CB(skb)->dccpd_ack_seq' with `ackno';
 * replaces 1 use of dccp_inc_seqno() by unfolding `ADD48' macro in place.

No changes in algorithm, all changes are text replacement/substitution.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:52:36 -07:00
Gerrit Renker
3393da8241 [DCCP]: Simplify interface of dccp_sample_rtt
The third parameter of dccp_sample_rtt now becomes useless and is removed.

Also combined the subtraction of the timestamp echo and the elapsed time.
This is safe, since (a) presence of timestamp echo is tested first and (b)
elapsed time is either present and non-zero or it is not set and equals 0
due to the memset in dccp_parse_options.

To avoid measuring option-processing time, the timestamp for measuring the
initial Request/Response RTT sample is taken directly when the function is
called (the Linux implementation always adds a timestamp on the Request,
so there is no loss in doing this).

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:52:35 -07:00
Gerrit Renker
4c70f383e0 [DCCP]: Provide 10s of microsecond timesource
This provides a timesource, conveniently used for DCCP timestamps, which
returns the elapsed time in 10s of microseconds since initialisation.
This makes for a wrap-around time of about 11.9 hours, which should be
sufficient for most applications.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:52:35 -07:00
Gerrit Renker
aa97efd97a [DCCP]: Reuse ktime_get_real() calls again
This patch reduces the number of timestamps taken in the receive path
for each packet.

The ccid3_hc_tx_update_x() routine is called in
 * the receive path for each CCID3-controlled packet
 * for the nofeedback timer (if no feedback arrives during 4 RTT)

Currently, when there is no loss, each packet gets timestamped twice.
The patch resolves this by recycling the first timestamp taken on packet
reception for RTT sampling.

When the no_feedback_timer() is called, then the timestamp argument is
simply set to NULL - so that ccid3_hc_tx_update_x() takes care of the logic.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:52:34 -07:00
Michael Wu
e0eb685962 [MAC80211]: rename ieee80211_cfg.h to cfg.h
Might as well rename ieee80211_cfg.h to cfg.h to keep things consistent.

Signed-off-by: Michael Wu <flamingice@sourmilk.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:52:34 -07:00
Johannes Berg
d86ec781ef [MAC80211]: kill vlan_id
Each station has a vlan_id that is useless. Remove it.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Michael Wu <flamingice@sourmilk.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:52:33 -07:00
Johannes Berg
c095df531f [MAC80211]: kill IE parse typedef
The parse result typedef isn't needed.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Michael Wu <flamingice@sourmilk.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:52:33 -07:00
Johannes Berg
fa5fea711f [MAC80211]: rename ieee80211_cfg.c to cfg.c
It's just painful to have the extra ieee80211_ prefix.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Michael Wu <flamingice@sourmilk.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:52:32 -07:00
Johannes Berg
dd1cd4c620 [MAC80211]: print out wiphy name instead of master device
This makes mac80211 print out the wiphy name instead of the
master device name where appropriate.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Michael Wu <flamingice@sourmilk.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:52:32 -07:00
Johannes Berg
693d454dff [MAC80211]: fix warnings introduced by the doc patches
This fixes a warning about NUM_IEEE80211_MODES missing
in a switch statement. Intentionally do not add a default
case so we get warnings at these places if we need to add
new modes.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Michael Wu <flamingice@sourmilk.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:52:30 -07:00
Johannes Berg
011bfcc4f3 [MAC80211]: remove key threshold stuff
This patch removes the key threshold stuff from mac80211.
I have patches for later that add it as a per-key setting
to nl/cfg80211.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Michael Wu <flamingice@sourmilk.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:52:29 -07:00
Johannes Berg
72abd81b98 [MAC80211]: allow drivers to indicate failed FCS/PLCP checksum
This patch allows drivers to indicate bad FCS/PLCP CRC to the stack and
have the stack drop packets like that except for monitor interfaces.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Michael Wu <flamingice@sourmilk.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:52:28 -07:00
Michael Buesch
61609bc0e4 [MAC80211]: Add support for setting TX power and radio status
This adds support for disabling the radio and setting the TXpower
through wext.
This also fixes the prism TXpower ioctl (It always overwrote the TXpower
value in ieee80211_hw_config())

Signed-off-by: Michael Buesch <mb@bu3sch.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:52:23 -07:00
Johannes Berg
501d857ec9 [IEEE80211]: Fix softmac lockdep reports.
It seems I was actually able to hit this deadlock, on my quad G5 softmac
locks up more often than not. This fixes it by using an own workqueue
that can safely be flushed under RTNL.

Not sure if the patch is correct with the workqueue naming. And don't
think with the patch it doesn't continually lock up. It still does, just
doesn't invoke lockdep warnings all the time.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:52:22 -07:00
Jamal Hadi Salim
8236632fb3 [NET_SCHED]: explict hold dev tx lock
For N cpus, with full throttle traffic on all N CPUs, funneling traffic
to the same ethernet device, the devices queue lock is contended by all
N CPUs constantly. The TX lock is only contended by a max of 2 CPUS.
In the current mode of operation, after all the work of entering the
dequeue region, we may endup aborting the path if we are unable to get
the tx lock and go back to contend for the queue lock. As N goes up,
this gets worse.

The changes in this patch result in a small increase in performance
with a 4CPU (2xdual-core) with no irq binding. Both e1000 and tg3
showed similar behavior;

Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:52:15 -07:00
Daniel Lezcano
de3cb747ff [NET]: Dynamically allocate the loopback device, part 1.
This patch replaces all occurences to the static variable
loopback_dev to a pointer loopback_dev. That provides the
mindless, trivial, uninteressting change part for the dynamic
allocation for the loopback.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Acked-By: Kirill Korotaev <dev@sw.ru>
Acked-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:52:14 -07:00
Johannes Berg
5568296573 [NL80211]: add netlink interface to cfg80211
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:52:14 -07:00
Ilpo Järvinen
b76892051c [TCP]: Avoid clearing sacktag hint in trivial situations
There's no reason to clear the sacktag skb hint when small part
of the rexmit queue changes. Account changes (if any) instead when
fragmenting/collapsing. RTO/FRTO do not touch SACKED_ACKED bits so
no need to discard SACK tag hint at all.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:52:12 -07:00
Ilpo Järvinen
c96fd3d461 [TCP]: Enable SACK enhanced FRTO (RFC4138) by default
Most of the description that follows comes from my mail to
netdev (some editing done):

Main obstacle to FRTO use is its deployment as it has to be on
the sender side where as wireless link is often the receiver's
access link. Take initiative on behalf of unlucky receivers and
enable it by default in future Linux TCP senders. Also IETF
seems to interested in advancing FRTO from experimental [1].

How does FRTO help?
===================

FRTO detects spurious RTOs and avoids a number of unnecessary
retransmissions and a couple of other problems that can arise
due to incorrect guess made at RTO (i.e., that segments were
lost when they actually got delayed which is likely to occur
e.g. in wireless environments with link-layer retransmission).
Though FRTO cannot prevent the first (potentially unnecessary)
retransmission at RTO, I suspect that it won't cost that much
even if you have to pay for each bit (won't be that high
percentage out of all packets after all :-)). However, usually
when you have a spurious RTO, not only the first segment
unnecessarily retransmitted but the *whole window*. It goes like
this: all cumulative ACKs got delayed due to in-order delivery,
then TCP will actually send 1.5*original cwnd worth of data in
the RTO's slow-start when the delayed ACKs arrive (basically the
original cwnd worth of it unnecessarily). In case one is
interested in minimizing unnecessary retransmissions e.g. due to
cost, those rexmissions must never see daylight. Besides, in the
worst case the generated burst overloads the bottleneck buffers
which is likely to significantly delay the further progress of
the flow. In case of ll rexmissions, ACK compression often
occurs at the same time making the burst very "sharp edged" (in
that case TCP often loses most of the segments above high_seq
=> very bad performance too). When FRTO is enabled, those
unnecessary retransmissions are fully avoided except for the
first segment and the cwnd behavior after detected spurious RTO
is determined by the response (one can tune that by sysctl).

Basic version (non-SACK enhanced one), FRTO can fail to detect
spurious RTO as spurious and falls back to conservative
behavior. ACK lossage is much less significant than reordering,
usually the FRTO can detect spurious RTO if at least 2
cumulative ACKs from original window are preserved (excluding
the ACK that advances to high_seq). With SACK-enhanced version,
the detection is quite robust.

FRTO should remove the need to set a high lower bound for the
RTO estimator due to delay spikes that occur relatively common
in some environments (esp. in wireless/cellular ones).

[1] http://www1.ietf.org/mail-archive/web/tcpm/current/msg02862.html

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:52:12 -07:00
Ilpo Järvinen
009a2e3e4e [TCP] FRTO: Improve interoperability with other undo_marker users
Basically this change enables it, previously other undo_marker
users were left with nothing. Reverse undo_marker logic
completely to get it set right in CA_Loss. On the other hand,
when spurious RTO is detected, clear it. Clearing might be too
heavy for some scenarios but seems safe enough starting point
for now and shouldn't have much effect except in majority of
cases (if in any).

By adding a new FLAG_ we avoid looping through write_queue when
RTO occurs.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:52:11 -07:00
Ilpo Järvinen
7c46a03e67 [TCP]: Cleanup tcp_tso_acked and tcp_clean_rtx_queue
Implements following cleanups:
- Comment re-placement (CodingStyle)
- tcp_tso_acked() local (wrapper-like) variable removal
  (readability)
- __-types removed (IMHO they make local variables jumpy looking
  and just was space)
- acked -> flag (naming conventions elsewhere in TCP code)
- linebreak adjustments (readability)
- nested if()s combined (reduced indentation)
- clarifying newlines added

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:52:10 -07:00
Ilpo Järvinen
13fcf850cc [TCP]: Move accounting from tso_acked to clean_rtx_queue
The accounting code is pretty much the same, so it's a shame
we do it in two places.

I'm not too sure if added fully_acked check in MTU probing is
really what we want perhaps the added end_seq could be used in
the after() comparison.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:52:09 -07:00
Ilpo Järvinen
5af4ec236f [TCP]: clear_all_retrans_hints prefixed by tcp_
In addition, fix its function comment spacing.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
2007-10-10 16:52:09 -07:00
Ilpo Järvinen
91fed7a15c [TCP]: Make fackets_out accurate
Substraction for fackets_out is unconditional when snd_una
advances, thus there's no need to do it inside the loop. Just
make sure correct bounds are honored.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:52:08 -07:00
Ilpo Järvinen
0dde7b5404 [TCP]: Maintain highest_sack accurately to the highest skb
In general, it should not be necessary to call tcp_fragment for
already SACKed skbs, but it's better to be safe than sorry. And
indeed, it can be called from sacktag when a DSACK arrives or
some ACK (with SACK) reordering occurs (sacktag could be made
to avoid the call in the latter case though I'm not sure if it's
worth of the trouble and added complexity to cover such marginal
case).

The collapse case has return for SACKED_ACKED case earlier, so
just WARN_ON if internal inconsistency is detected for some
reason.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:52:08 -07:00
Joe Perches
0795af5729 [NET]: Introduce and use print_mac() and DECLARE_MAC_BUF()
This is nicer than the MAC_FMT stuff.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:51:42 -07:00