a/softnet-howto
This document is intended to help the porting of "legacy" network
drivers to the new interface provided by the new softnet architecture.
It does not delve into the details of softnet (that would be another
article).
Softnet's main website is at:
ftp://ftp.inr.ac.ru/
But unless you are going to try and find the latest and greatest version,
please don't overload that important site with unnecessary traffic. Use
the mirrors at:
-------------
ftp://linux.wauug.org/pub/net
ftp://ftp.nc.ras.ru/pub/mirrors/ftp.inr.ac.ru/ip-routing/
ftp://ftp.gts.cz/MIRRORS/ftp.inr.ac.ru/
ftp://ftp.funet.fi/pub/mirrors/ftp.inr.ac.ru/ip-routing/
ftp://sunsite.icm.edu.pl/pub/Linux/iproute/
ftp://ftp.sunet.se/pub/Linux/ip-routing/
ftp://ftp.nvg.ntnu.no/pub/linux/ip-routing/
ftp://ftp.crc.ca/pub/systems/linux/ip-routing/
ftp://ftp.proxad.net/mirrors/ftp.inr.ac.ru/ip-routing/
ftp://donlug.dn.ua/pub/mirrors/ip-routing/
ftp://omni.rk.tusur.ru/mirrors/ftp.inr.ac.ru/ip-routing/
ftp://ftp.src.uchicago.edu/pub/linux/ip-routing/
http://www.asit.ro/ip-routing/
ftp://ftp.infoscience.co.jp/pub/linux/ip-routing/ (Japan)
ftp://ftp.sucs.swan.ac.uk/pub/mirrors
-------------
Throughout the document, "Toplevel" and "core" are used interchangeably
to refer to the network code above the driver.
FLAG OBSOLETION
---------------
Several device flags are obsoleted; their functionality is embedded in
a different re-incarnation as will be explained.
** The obsoleted device flags are:
dev->tbusy
dev->start
dev->interrupt
The current replacements/new flags are stored in the
bitmap dev->state
Currently defined state bits/flags are:
bit 0: LINK_STATE_XOFF
bit 1: LINK_STATE_DOWN
bit 3: LINK_STATE_START
bit 4: LINK_STATE_RXSEM
bit 5: LINK_STATE_TXSEM
bit 6: LINK_STATE_SCHED
LINK_STATE_XOFF is the replacement for dev->tbusy
LINK_STATE_DOWN is the replacement for the !IFF_RUNNING check
LINK_STATE_START is the replacement for dev->start
LINK_STATE_RXSEM is the replacement for dev->interrupt
LINK_STATE_TXSEM is a flag used only by fastrouting drivers
This document will not go into the details of fastrouting other
than to say it is a mechanism for direct NIC-to-NIC routing.
If your driver was fastroute enabled in 2.2 (and I only know of two
such drivers ;->) then this is equivalent to dev->tx_semaphore.
Essentially, if your device does not know about fast routing
then you need not worry about this.
LINK_STATE_SCHED is an internal flag to the Top Level (layer
above the driver).
The driver writer need not be concerned about LINK_STATE_SCHED
but needs to understand the implications as will hopefully
be conveyed by this document.
Caution needs to be exercised, as the replacement is not a
direct one-to-one mapping. Given the new mechanism being
introduced by softnet to take advantage of SMP, there are
intricacies involved. Hopefully after reading this two times,
it should become clear ;-> (OK, once for the experienced).
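A small userspace model may help make the flag mapping concrete. It is
only a sketch: struct fake_dev, the fake_*() helpers and the numeric enum
values are stand-ins invented for illustration. The real definitions live
in the kernel headers, and real drivers use the atomic set_bit(),
clear_bit() and test_bit() operations.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the state bits named above. The numeric values the
 * compiler assigns here are illustrative only. */
enum fake_link_state {
	LINK_STATE_XOFF,
	LINK_STATE_DOWN,
	LINK_STATE_START,
	LINK_STATE_RXSEM,
	LINK_STATE_TXSEM,
	LINK_STATE_SCHED
};

/* Hypothetical, minimal replacement for the device structure. */
struct fake_dev {
	unsigned long state;	/* replaces tbusy, start and interrupt */
};

/* Non-atomic models of the kernel's set_bit()/clear_bit()/test_bit(). */
static void fake_set_bit(int nr, unsigned long *addr)
{
	*addr |= 1UL << nr;
}

static void fake_clear_bit(int nr, unsigned long *addr)
{
	*addr &= ~(1UL << nr);
}

static int fake_test_bit(int nr, const unsigned long *addr)
{
	return (int)((*addr >> nr) & 1UL);
}

/* Old: "if (dev->tbusy)".  New: ask whether XOFF is set. */
static int fake_queue_stopped(const struct fake_dev *dev)
{
	assert(dev != NULL);
	return fake_test_bit(LINK_STATE_XOFF, &dev->state);
}
```

The point of the model is that one word of state now carries what used to
be three separate fields, so every check becomes a bit test on dev->state.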
NO MORE POLLING dev->tbusy
--------------------------
** Gone is the "polling" mechanism that involved checking
by the core layer (above the driver) for the dev->tbusy setting;
No more baby-sitting; the driver is now in charge of its own fate ;->
(even if one can still define what happens as polling).
Some background explanation is necessary to understand the change
in the core:
For historical reasons it is worth noting that in 2.0
dev_queue_xmit() ignored dev->tbusy and a functionality similar to that
of a watchdog was played by _new_ packets pushing to the driver for
transmission. This way the driver could recover from transmission lockups
etc. All the devices, including DOWNed ones were stored
in a runqueue and scanned twice on each net_bh event.
In 2.2 only the devices which have some packets in the queue
or recently had something are added to the runqueue, which is still
scanned on each net_bh event twice. If dev->tbusy is set, the
packet is _not_ handed to the driver for transmission. As a backup,
a watchdog timer monitors dev->tbusy and if it is stuck for
too long a time, then a dev->hard_start_xmit() is forced.
The 2.2 behavior is maintained in the current 2.3.
With softnet, dev->hard_start_xmit() is no longer invoked by
the watchdog timer. Instead a new interface dev->tx_timeout() is
invoked.
The replacement timer routine is registered into the dev structure
by the driver. Most drivers already had a timer routine used to recover
from transmission lockups. A simple re-use of such a routine with proper
replacements of the old flag transitions would suffice in most cases.
As a result of the requirement, there are two new entries in the net_dev
structure: dev->tx_timeout which is the function pointer to the driver
timeout routine and dev->watchdog_timeo which is the value that is used
for the timeout.
The attachment to the device structure for these entries is done around
the same time that things like dev->hard_start_xmit are attached to the
net_dev structure by the driver, i.e. normally in the dev_probe() routine.
The timer, however, is fired once the device is ifup'ed by the core
(i.e. it is not the responsibility of the driver author to add it to
the timer list).
As a quick guideline:
code which used to do (in the mydev_hw_start_xmit() routine)
something like this:
---
if (test_and_set_bit(0, (void*)&dev->tbusy) != 0) {
	if (jiffies - dev->trans_start >= TX_TIMEOUT)
		dev_tx_timeout(dev);
	return 1;
}
----
should now be ripped out and replaced by (in the probe() routine)
---
dev->tx_timeout = &my_tx_timeout_func;
dev->watchdog_timeo = MY_DEV_TX_TIMEOUT;
---
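How the core might drive the new hook can be modelled in userspace. This
is a sketch under stated assumptions, not the core's actual code:
struct fake_dev, fake_watchdog(), the "now" argument standing in for
jiffies, the timeout value 5, and the rule "fire only while the queue is
stopped and no transmit has started for more than watchdog_timeo ticks"
are all illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, minimal replacement for the device structure. */
struct fake_dev {
	int xoff;			/* models LINK_STATE_XOFF */
	unsigned long trans_start;	/* tick of the last transmit start */
	unsigned long watchdog_timeo;	/* set by the driver at probe time */
	void (*tx_timeout)(struct fake_dev *dev);
	int resets;			/* counts recoveries, for the demo */
};

/* Driver side: the recovery routine most drivers already have. */
static void my_tx_timeout_func(struct fake_dev *dev)
{
	dev->resets++;		/* a real driver would reset the chip here */
	dev->xoff = 0;		/* stands in for netif_wake_queue(dev) */
}

/* Driver side: what the probe routine registers. */
static void my_probe(struct fake_dev *dev)
{
	dev->tx_timeout = my_tx_timeout_func;
	dev->watchdog_timeo = 5;	/* hypothetical MY_DEV_TX_TIMEOUT */
}

/* Core side (modelled): called periodically once the device is up. */
static void fake_watchdog(struct fake_dev *dev, unsigned long now)
{
	assert(dev != NULL && dev->tx_timeout != NULL);
	if (dev->xoff && now - dev->trans_start > dev->watchdog_timeo)
		dev->tx_timeout(dev);
}
```

Note that the driver's transmit routine no longer contains any timeout
test at all; it only supplies the callback and the interval.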
*** Also a new important change:
It is up to the device to add itself to the runqueue,
i.e. it is no longer the core's responsibility. This is done via the
netif_wake_queue() call by the driver.
More on this below in the "GENERAL GUIDELINES" section.
[Obviously this change in softnet not only reduces CPU utilization,
but more importantly reduces the code complexity]
GENERAL GUIDELINES
------------------
This is a general guideline on how you replace the flags.
The timer should be added as indicated above.
************************************************************
1) OPENING the device:
************************************************************
current:
--------
This involves setting the dev->start flag and
clearing the dev->tbusy and dev->interrupt
New way:
-------
invoke netif_start_queue(dev). At the moment,
this clears the LINK_STATE_XOFF state bit.
On return from the open() the core/Toplevel code
sets the LINK_STATE_START bit.
It is not advisable for you to clr/set_bit() on any of the
above. Let the netif_start_queue() and the general Toplevel
code take care of things;
it is more portable this way, in case in the future some new
functionality gets added to netif_start_queue() etc.
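The division of labour at open time can be sketched as a userspace model.
Everything prefixed fake_ or my_ is hypothetical, as are the two bit
positions; the model only encodes what is stated above, namely that
netif_start_queue() currently clears XOFF and that the Toplevel sets
START after open() returns.

```c
#include <assert.h>
#include <stddef.h>

#define FAKE_XOFF_BIT  0	/* illustrative positions only */
#define FAKE_START_BIT 1

/* Hypothetical, minimal replacement for the device structure. */
struct fake_dev {
	unsigned long state;
	int (*open)(struct fake_dev *dev);
};

/* Model of netif_start_queue(): per the text, it currently clears XOFF. */
static void fake_netif_start_queue(struct fake_dev *dev)
{
	dev->state &= ~(1UL << FAKE_XOFF_BIT);
}

/* Driver open(): no direct writes to tbusy, start or interrupt. */
static int my_open(struct fake_dev *dev)
{
	/* a real driver would request its IRQ and set up the hardware here */
	fake_netif_start_queue(dev);
	return 0;
}

/* Model of the Toplevel: it, not the driver, sets START after open(). */
static int fake_core_open(struct fake_dev *dev)
{
	int err;

	assert(dev != NULL && dev->open != NULL);
	err = dev->open(dev);
	if (err == 0)
		dev->state |= 1UL << FAKE_START_BIT;
	return err;
}
```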
****************************************************
2) the dev->tbusy DURING DEVICE OPERATIONS
i.e when the device is sending and receiving packets
****************************************************
A) Clearing it:
current:
=======
code snippets from different drivers:
--
clear_bit(0,(void *)&dev->tbusy)
--
or in some drivers
---
if (test_and_set_bit(0, (void*)&dev->tbusy) != 0)
	dev->tbusy = 0;
---
etc
new way:
=======
invoke netif_wake_queue(dev)
this takes the device out of the LINK_STATE_XOFF state
(meaning it is ready to receive packets from the upper layers).
Note that it is extremely dangerous for you to clear_bit()
directly on the LINK_STATE_XOFF. netif_wake_queue() does a lot
more than just reset this bit. It schedules the device to be serviced.
Scheduling here also includes explicitly doing something equivalent
to the mark_bh(NET_BH) which was done by some drivers
to help kick some of the receive packet input processing
as well as getting added to the device runqueue.
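The difference between clearing the bit by hand and waking the queue can
be shown with a userspace model. This is a sketch, not the real
netif_wake_queue(): struct fake_dev, its fields, and the my_tx_done()
handler are hypothetical, and "scheduled" merely stands for being linked
on the core's runqueue.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, minimal replacement for the device structure. */
struct fake_dev {
	int xoff;	/* models LINK_STATE_XOFF */
	int scheduled;	/* models "linked on the core's runqueue" */
	int tx_free;	/* free transmit slots in the hypothetical ring */
};

/* Model of netif_wake_queue(): clear XOFF *and* schedule the device. */
static void fake_netif_wake_queue(struct fake_dev *dev)
{
	dev->xoff = 0;
	dev->scheduled = 1;
}

/* Transmit-done part of a driver's interrupt handler. */
static void my_tx_done(struct fake_dev *dev, int completed)
{
	assert(dev != NULL && completed >= 0);
	dev->tx_free += completed;
	/* old code: clear_bit(0, (void *)&dev->tbusy); mark_bh(NET_BH); */
	if (dev->tx_free > 0)
		fake_netif_wake_queue(dev);
}
```

A driver that only cleared the bit would leave "scheduled" at 0 in this
model, which corresponds to a device that is able to transmit but that
nobody ever services.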
B) setting it
current:
========
--
test_and_set_bit(0,(void *)&dev->tbusy)
---
etc
new way:
========
call:
netif_stop_queue(dev)
this takes the device into the LINK_STATE_XOFF state
(which means it can't receive packets anymore from upper layers).
It is your responsibility to ensure that you go back to the
non-XOFF state at some point later on; somehow somewhere you need
to guarantee that the netif_wake_queue() is invoked.
Your dev_tx_timeout() must either call netif_wake_queue() or
restart its transmitter, so that netif_wake_queue() is called
later from interrupt code.
General rules of thumb:
1. If the device sets XOFF (due to a call to netif_stop_queue()), it MUST
call netif_wake_queue() at some point later.
2. Top level (core above the driver) WILL NOT call hard_start_xmit(),
if XOFF is set.
3. Top level forgets about the device if XOFF is set and it is the
device's responsibility to re-link itself to the runqueue with
netif_wake_queue() so it can be serviced again.
4. The device MUST not set XOFF from a context not protected by
a private xmit_lock, really. Such a lock could be stored in the device's
private structure.
If it does, it cannot guarantee that hard_start_xmit() is called
with a clean XOFF. The interrupt handler could for example
call netif_wake_queue() and a race condition could result.
One way to reduce the probability of a race condition happening is
not to stop the interface (by calling netif_stop_queue()) right at
the entry to hard_start_xmit(), but rather to stop it only when
hard_start_xmit() finds that the device will not be able to accept
more packets (e.g. when its transmit ring is full).
The general principles of this are as follows:
-------OLD-------------------------------|--------NEW---------------
if (test_and_set_bit(0, &dev->tbusy)) {  | netif_stop_queue(dev)
        return 1;                        |
} else {                                 | if (1) {
        real work                        |         real work
        if (it_is_not_full())            |         if (it_is_not_full())
                dev->tbusy = 0;          |                 netif_wake_queue()
        return 0;                        |         return 0;
}                                        | }
-----------------------------------------|--------------------------
Look at the softnetted-3c509.c for an example of such a trick.
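For readers without that driver at hand, the "stop only when full" idea
can be modelled in userspace. This sketch is not taken from 3c509.c:
struct fake_dev, FAKE_TX_RING_SIZE and the fake_/my_ functions are
hypothetical. It follows the wording of the text (stop once the ring
fills) rather than the stop-then-wake ordering shown in the NEW column.

```c
#include <assert.h>
#include <stddef.h>

#define FAKE_TX_RING_SIZE 4	/* hypothetical ring size */

/* Hypothetical, minimal replacement for the device structure. */
struct fake_dev {
	int xoff;	/* models LINK_STATE_XOFF */
	int tx_used;	/* occupied slots in the transmit ring */
};

/* Model of netif_stop_queue(): put the device into the XOFF state. */
static void fake_netif_stop_queue(struct fake_dev *dev)
{
	dev->xoff = 1;
}

/* Returns 0 when the packet was accepted. The core does not call this
 * while XOFF is set, so the ring always has room on entry. */
static int my_hard_start_xmit(struct fake_dev *dev)
{
	assert(dev != NULL && dev->tx_used < FAKE_TX_RING_SIZE);
	dev->tx_used++;			/* "real work": queue the packet */
	if (dev->tx_used == FAKE_TX_RING_SIZE)
		fake_netif_stop_queue(dev);	/* stop only once full */
	return 0;
}
```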
Such code should be used really as an intermediate solution only
because it does not guarantee you safety. The safe way to do it, without
a doubt, is to add a spinlock in your device_private_structure and use
it to lock critical sections such as those clearing XOFF.
This is a trivial task for cards which already have this lock
(eg eepro100, 8390 etc.), but is difficult for those that lack
it (eg tulip). The softnetted eepro100.c is a really
good example of something that conforms well.
[TODO: Update this with an example from the eepro]
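Until that example exists, the locking idea can at least be modelled in
userspace. This sketch is NOT the eepro100 code: struct fake_priv,
fake_lock()/fake_unlock() (a C11 atomic_flag standing in for a kernel
spinlock) and the ring size are assumptions made for illustration. In a
real driver the transmit path would presumably need an interrupt-safe
lock variant so the handler cannot spin against it on the same CPU.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define FAKE_TX_RING_SIZE 2	/* hypothetical ring size */

struct fake_priv {
	atomic_flag lock;	/* stand-in for a lock in the private struct */
	int tx_used;		/* occupied slots in the transmit ring */
};

struct fake_dev {
	int xoff;		/* models LINK_STATE_XOFF */
	struct fake_priv priv;
};

static void fake_lock(struct fake_priv *p)
{
	while (atomic_flag_test_and_set(&p->lock))
		;	/* spin until the holder releases the lock */
}

static void fake_unlock(struct fake_priv *p)
{
	atomic_flag_clear(&p->lock);
}

/* Transmit path: the "is it full?" test and the XOFF change happen
 * under the same lock as the interrupt path below. */
static int my_hard_start_xmit(struct fake_dev *dev)
{
	assert(dev != NULL);
	fake_lock(&dev->priv);
	dev->priv.tx_used++;
	if (dev->priv.tx_used >= FAKE_TX_RING_SIZE)
		dev->xoff = 1;		/* netif_stop_queue(dev) */
	fake_unlock(&dev->priv);
	return 0;
}

/* Interrupt path: free one slot and clear XOFF under the lock. */
static void my_tx_done(struct fake_dev *dev)
{
	assert(dev != NULL);
	fake_lock(&dev->priv);
	if (dev->priv.tx_used > 0)
		dev->priv.tx_used--;
	if (dev->priv.tx_used < FAKE_TX_RING_SIZE)
		dev->xoff = 0;		/* netif_wake_queue(dev) */
	fake_unlock(&dev->priv);
}
```

Because both paths take the same lock, the wake in the interrupt path can
no longer slip in between the fullness test and the stop in the transmit
path.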
************************************************************
3) The dev->interrupt DURING DEVICE OPERATIONS
i.e when the device is sending and receiving packets
************************************************************
A) setting it:
current:
========
dev->interrupt=1;
new way
=======
set_bit(LINK_STATE_RXSEM, &dev->state)
This takes the device into the "input interrupt processing" state
B) clearing it
current:
========
dev->interrupt=0;
new way
=======
clear_bit(LINK_STATE_RXSEM, &dev->state)
This takes the device out of the "input interrupt processing" state
Really, dev->interrupt was obsoleted even in 2.2 but
was still being used by some drivers.
dev->interrupt is a no-op in 2.2 kernels and an SMP bug workaround in 2.0.
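For drivers that still want the old re-entrancy check, the set and clear
pair can be modelled as below. This is a userspace sketch with
hypothetical names (struct fake_dev, my_interrupt(), the fake_ helpers,
and an illustrative bit position). Using a test-and-set to detect
re-entry is a choice made for this sketch; the text above only asks for
set_bit() on entry and clear_bit() on exit.

```c
#include <assert.h>
#include <stddef.h>

#define FAKE_RXSEM_BIT 3	/* illustrative position only */

/* Hypothetical, minimal replacement for the device structure. */
struct fake_dev {
	unsigned long state;
	int rx_packets;
};

/* Non-atomic model of test_and_set_bit(): returns the previous value. */
static int fake_test_and_set_bit(int nr, unsigned long *addr)
{
	int old = (int)((*addr >> nr) & 1UL);

	*addr |= 1UL << nr;
	return old;
}

/* Non-atomic model of clear_bit(). */
static void fake_clear_bit(int nr, unsigned long *addr)
{
	*addr &= ~(1UL << nr);
}

/* Returns 0 if the interrupt was handled, -1 if we were re-entered. */
static int my_interrupt(struct fake_dev *dev, int pending)
{
	assert(dev != NULL);
	/* old code: dev->interrupt = 1; */
	if (fake_test_and_set_bit(FAKE_RXSEM_BIT, &dev->state))
		return -1;
	dev->rx_packets += pending;	/* "real work": drain the receive ring */
	/* old code: dev->interrupt = 0; */
	fake_clear_bit(FAKE_RXSEM_BIT, &dev->state);
	return 0;
}
```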
************************************************************
4) mark_bh(NET_BH)
************************************************************
Gone.
invoke instead: netif_wake_queue(dev);
************************************************************
5) Miscellanea:
************************************************************
checking for set bits etc.:
use the test_bit() calls
eg:
to check if we are in the LINK_STATE_START state
do:
if (test_bit(LINK_STATE_START, &dev->state))
************************************************************
6) TODO
************************************************************
- eepro example
- skeleton.c example
************************************************************
7) Credits
************************************************************
19991227: Authors
- Jamal Hadi Salim <hadi@cyberus.ca>
- Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
************************************************************
8) Changes
************************************************************
20000106: JHS: Add the sites where softnet is found
20000107: Andre Dahlqvist <andre@beta.telenordia.se>: typo fixes