Some USB endpoints require that a short packet be used to terminate
transfers, since they have no other way to determine message
boundaries. If the message length happens to be an exact multiple of
the USB packet size, then this requires the use of an additional
zero-length packet.
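As a rough illustration of the rule above, the following sketch counts the packets needed for a message on an endpoint that requires a terminating short packet. The helper names are hypothetical and are not part of the iPXE USB API.

```c
#include <assert.h>
#include <stddef.h>

/* Number of packets needed to send len bytes on an endpoint that
 * requires a terminating short packet (hypothetical helper).
 */
static size_t usb_packets_with_terminator ( size_t len, size_t mtu ) {
	assert ( mtu > 0 );
	/* Full-sized packets, plus one final short packet.  The final
	 * packet carries the remaining ( len % mtu ) bytes, which is
	 * zero bytes when len is an exact multiple of mtu.
	 */
	return ( ( len / mtu ) + 1 );
}

/* True if the terminating packet will be a zero-length packet */
static int usb_needs_zlp ( size_t len, size_t mtu ) {
	assert ( mtu > 0 );
	return ( ( len % mtu ) == 0 );
}
```

With a 512-byte packet size, a 1024-byte message needs two full packets plus a zero-length packet, while a 1000-byte message already ends in a short packet.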
Signed-off-by: Michael Brown <mcb30@ipxe.org>
USB Communications Device Class devices may use a union functional
descriptor to group several interfaces into a function.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Iterate over a USB device's available configurations until we find one
for which we have working drivers.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some protocols (such as ARP) may modify the received packet and re-use
the same I/O buffer for transmission of a reply. To allow this,
reserve sufficient headroom at the start of each received packet
buffer for our transmit datapath headers.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some devices have a very small number of internal buffers, and rely on
being able to pack multiple packets into each buffer. Using 2048-byte
buffers on such devices produces throughput of around 100Mbps. Using
a small number of much larger buffers (e.g. 32kB) increases the
throughput to around 780Mbps. (The full 1Gbps is not reached because
the high RTT induced by the use of multi-packet buffers causes us to
saturate our 256kB TCP window.)
Since allocation of large buffers is very likely to fail, allocate the
buffer set only once when the device is opened and recycle buffers
immediately after use. Received data is now always copied to
per-packet buffers.
If allocation of large buffers fails, fall back to allocating a larger
number of smaller buffers. This will give reduced performance, but
the device will at least still be functional.
Share code between the interrupt and bulk IN endpoint handlers, since
the buffer handling is now very similar.
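The fallback allocation strategy can be sketched roughly as follows. This is a simplified model and not the driver code: the structure, the function names, the buffer counts (2 and 16) and the stand-in allocator are all assumptions made for illustration; only the 32kB and 2048-byte sizes come from the text above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define BUF_MAX 16

/* Hypothetical description of a recycled buffer set */
struct buf_set {
	void *buf[BUF_MAX];
	unsigned int count;
	size_t len;
};

typedef void * ( * alloc_fn ) ( size_t len );

/* Free all buffers in the set */
static void free_set ( struct buf_set *set ) {
	while ( set->count )
		free ( set->buf[--set->count] );
}

/* Allocate count buffers of len bytes each, or none at all */
static int alloc_set ( struct buf_set *set, alloc_fn alloc,
		       unsigned int count, size_t len ) {
	assert ( count <= BUF_MAX );
	set->count = 0;
	set->len = len;
	while ( set->count < count ) {
		void *buf = alloc ( len );
		if ( ! buf ) {
			free_set ( set );
			return -1;
		}
		set->buf[set->count++] = buf;
	}
	return 0;
}

/* Called once at device open: prefer a few large buffers, fall back
 * to a larger number of small buffers.
 */
static int open_bufs ( struct buf_set *set, alloc_fn alloc ) {
	if ( alloc_set ( set, alloc, 2, 32768 ) == 0 )
		return 0;
	return alloc_set ( set, alloc, BUF_MAX, 2048 );
}

/* Stand-in allocator that refuses large requests, as a fragmented
 * heap might.
 */
static void * small_only_alloc ( size_t len ) {
	return ( ( len > 4096 ) ? NULL : malloc ( len ) );
}
```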
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allow drivers to specify a supported PCI class code. To save space in
the final binary, make this an attribute of the driver rather than an
attribute of a PCI device ID list entry.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The (undocumented) VMBus protocol seems to allow for transfer
page-based packets where the data payload is split into an arbitrary
set of ranges within the transfer page set.
The RNDIS protocol includes a length field within the header of each
message, and it is known from observation that multiple RNDIS messages
can be concatenated into a single VMBus message.
iPXE currently assumes that the transfer page range boundaries are
entirely arbitrary, and uses the RNDIS header length to determine the
RNDIS message boundaries.
Windows Server 2012 R2 generates an RNDIS_INDICATE_STATUS_MSG for an
undocumented and unknown status code (0x40020006) with a malformed
RNDIS header length: the length does not cover the StatusBuffer
portion of the message. This causes iPXE to report a malformed RNDIS
message and to discard any further RNDIS messages within the same
VMBus message.
The Linux Hyper-V driver assumes that the transfer page range
boundaries correspond to RNDIS message boundaries, and so does not
notice the malformed length field in the RNDIS header.
Match the behaviour of the Linux Hyper-V driver: assume that the
transfer page range boundaries correspond to the RNDIS message
boundaries and ignore the RNDIS header length. This avoids triggering
the "malformed packet" error and also avoids unnecessary data copying:
since we now have one I/O buffer per RNDIS message, there is no longer
any need to use iob_split().
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Empirical observation suggests that 32 is a sensible size to minimise
the number of deferred packet transmissions without overflowing the
VMBus transmit ring buffer.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allow for elision of transmitted TCP ACKs by handling all received
VMBus messages in each network device poll operation.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
On Windows Server 2012 R2, the receive buffer teardown completion
message seems to occasionally be deferred until after the VMBus
channel has been closed. This happens even if there are no packets
currently in the receive buffer.
Work around this problem by separating the revocation and teardown of
the receive buffer, and deferring the teardown until after the VMBus
channel has been closed.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The i350 (and possibly other Intel NICs) have a non-trivial
correspondence between the PCI function number and the external
physical port number. For example, the i350 has a "LAN Function Sel"
bit within the EEPROM which can invert the mapping so that function 0
becomes port 3, function 1 becomes port 2, etc.
Unfortunately the MAC addresses within the EEPROM are indexed by
physical port number rather than PCI function number. The end result
is that when anything other than the default mapping is used, iPXE
will use the wrong address as the base MAC address.
Fix by using the autoloaded MAC address if it is valid, and falling
back to reading the MAC address directly from the EEPROM only if no
autoloaded address is available.
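A minimal sketch of the selection logic, assuming a validity check that rejects all-zero and multicast addresses. Both helper names are hypothetical, and the real driver reads these values from hardware registers and the EEPROM rather than from arrays.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAC_LEN 6

/* Hypothetical validity check: reject all-zero and multicast (which
 * includes broadcast) addresses.
 */
static int mac_is_valid ( const uint8_t *mac ) {
	static const uint8_t zero[MAC_LEN];

	assert ( mac != NULL );
	return ( ( ! ( mac[0] & 0x01 ) ) &&
		 ( memcmp ( mac, zero, MAC_LEN ) != 0 ) );
}

/* Prefer the autoloaded address; use the EEPROM address only if the
 * autoloaded address is not valid.
 */
static void choose_mac ( const uint8_t *autoloaded, const uint8_t *eeprom,
			 uint8_t *out ) {
	memcpy ( out, ( mac_is_valid ( autoloaded ) ? autoloaded : eeprom ),
		 MAC_LEN );
}
```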
Signed-off-by: Michael Brown <mcb30@ipxe.org>
End users almost certainly don't care whether the underlying interface
is SNP or NII/UNDI. Try to minimise surprise and unnecessary
documentation by including the NII driver whenever the SNP driver is
requested.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
iPXE itself exposes a dummy NII protocol with no UNDI. Avoid
potentially dereferencing a NULL pointer by checking for a non-zero
UNDI address.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some UEFI network drivers provide a software UNDI interface which is
exposed via the Network Interface Identifier Protocol (NII), rather
than providing a Simple Network Protocol (SNP).
The UEFI platform firmware will usually include the SnpDxe driver,
which attaches to NII and provides an SNP interface. The SNP
interface is usually provided on the same handle as the underlying NII
device. This causes problems for our EFI driver model: when
efi_driver_connect() detaches existing drivers from the handle it will
cause the SNP interface to be uninstalled, and so our SNP driver will
not be able to attach to the handle. The platform firmware will
eventually reattach the SnpDxe driver and may attach us to the SNP
handle, but we have no way to prevent other drivers from attaching
first.
Fix by providing a driver which can attach directly to the NII
protocol, using the software UNDI interface to drive the network
device.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The snpnet driver uses netdev_tx_defer() and so must ensure that space
in the (single-entry) transmit descriptor ring is freed up before
calling netdev_tx_complete().
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add the ID for the LM variant and differentiate it from the I217-V.
Signed-off-by: Jan Kiszka <jan.kiszka@web.de>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
We currently require information about the underlying PCI device to
populate the snpnet device's name and description. If the underlying
device is not a PCI device, this will fail and prevent the device from
being registered.
Fix by falling back to populating the device description with
information based on the EFI handle, if no PCI device information is
available.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some systems will install a child of the SNP device and use this as
our loaded image's device handle, duplicating the installation of the
underlying SNP protocol onto the child device handle. On such
systems, we want to end up driving the parent device (and
disconnecting any other drivers, such as MNP, which may be attached to
the parent device).
Fix by recording the SNP protocol instance at initialisation time, and
using this to match against device handles (rather than simply
comparing the handles themselves).
Reported-by: Jarrod Johnson <jarrod.b.johnson@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
ICH8 devices have an erratum which requires us to reconfigure the
packet buffer size (PBS) register, and correspondingly adjust the
packet buffer allocation (PBA) register. The "Intel I/O Controller
Hub ICH8/9/10 and 82566/82567/82562V Software Developer's Manual"
notes for the PBS register that:
    10.4.20 Packet Buffer Size - PBS (01008h; R/W)
    Note: The default setting of this register is 20 KB and is
    incorrect. This register must be programmed to 16 KB.
    Initial value: 0014h
                   0018h (ICH9/ICH10)
It is unclear from this comment precisely which devices require the
workaround to be applied. We currently attempt to err on the side of
caution: if we detect an initial value of either 0x14 or 0x18 then the
workaround will be applied. If the workaround is applied
unnecessarily, then the effect should be just that we use less than
the full amount of the available packet buffer memory.
Unfortunately this approach does not play nicely with other device
drivers. For example, the Linux e1000e driver will rewrite PBA while
assuming that PBS still contains the default value, which can result
in inconsistent values between the two registers, and a corresponding
inability to transmit or receive packets. Even more unfortunately,
the contents of PBS and PBA are not reset by anything less than a
power cycle, meaning that this error condition will survive a hardware
reset.
The Linux driver (written and maintained by Intel) applies the PBS/PBA
errata workaround only for devices in the ICH8 family, identified via
the PCI device ID. Adopt a similar approach, using the PCI_ROM()
driver data field to indicate when the workaround is required.
Reported-by: Donald Bindner <dbindner@truman.edu>
Debugged-by: Donald Bindner <dbindner@truman.edu>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Under some circumstances (e.g. if iPXE itself is booted via iSCSI, or
after an unclean reboot), the backend may not be in the expected
InitWait state when iPXE starts up.
There is no generic reset mechanism for Xenbus devices. Recent
versions of xen-netback will gracefully perform all of the required
steps if the frontend sets its state to Initialising. Older versions
(such as that found in XenServer 6.2.0) require the frontend to
transition through Closed before reaching Initialising.
Add a reset mechanism for netfront devices which does the following:
- read current backend state
- if backend state is anything other than InitWait, then set the
  frontend state to Closed and wait for the backend to also reach
  Closed
- set the frontend state to Initialising and wait for the backend to
  reach InitWait.
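The steps above can be sketched as a function that computes which frontend states to set, in order. The state values follow Xen's public xenbus interface; the planning helper itself is hypothetical, and the real reset must also wait for the backend after each step.

```c
#include <assert.h>
#include <stddef.h>

/* Xenbus states, numbered as in Xen's public io/xenbus.h */
enum xenbus_state {
	XenbusStateUnknown = 0,
	XenbusStateInitialising = 1,
	XenbusStateInitWait = 2,
	XenbusStateInitialised = 3,
	XenbusStateConnected = 4,
	XenbusStateClosing = 5,
	XenbusStateClosed = 6,
};

/* Hypothetical helper: fill in the sequence of frontend states to
 * set, given the current backend state.  After setting Closed, the
 * frontend waits for the backend to reach Closed; after setting
 * Initialising, it waits for the backend to reach InitWait.
 */
static unsigned int netfront_reset_plan ( enum xenbus_state backend,
					  enum xenbus_state plan[2] ) {
	unsigned int count = 0;

	assert ( plan != NULL );
	if ( backend != XenbusStateInitWait )
		plan[count++] = XenbusStateClosed;
	plan[count++] = XenbusStateInitialising;
	return count;
}
```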
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Using version 1 grant tables limits guests to using 16TB of grantable
RAM, and prevents the use of subpage grants. Some versions of the Xen
hypervisor refuse to allow the grant table version to be set after the
first grant references have been created, so the loaded operating
system may be stuck with whatever choice we make here. We therefore
currently use version 2 grant tables, since they give the most
flexibility to the loaded OS.
Current versions (7.2.0) of the Windows PV drivers have no support for
version 2 grant tables, and will merrily create version 1 entries in
what the hypervisor believes to be a version 2 table. This causes
some confusion.
Avoid this problem by attempting to use version 1 tables, since
otherwise we may render Windows unable to boot.
Play nicely with other potential bootloaders by accepting either
version 1 or version 2 grant tables (if we are unable to set our
requested version).
Note that the use of version 1 tables on a 64-bit system introduces a
possible failure path in which a frame number cannot fit into the
32-bit field within the v1 structure. This in turn introduces
additional failure paths into netfront_transmit() and
netfront_refill_rx().
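The new failure path can be illustrated with a short sketch, assuming 4kB pages. The function name is hypothetical. With a 32-bit frame number and 4kB pages, the highest addressable byte lies just below 16TB, which matches the limit mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12 /* assumes 4kB pages */

/* Convert a physical address to a version 1 grant frame number.
 * Returns 0 on success, or -1 if the frame number does not fit into
 * the 32-bit field (hypothetical helper).
 */
static int grant_v1_frame ( uint64_t phys, uint32_t *frame ) {
	uint64_t pfn = ( phys >> PAGE_SHIFT );

	assert ( frame != NULL );
	if ( pfn > UINT32_MAX )
		return -1;
	*frame = ( ( uint32_t ) pfn );
	return 0;
}
```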
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The behavior observed in the Apple EFI (1.10) ReceiveFilters() call
is:
- failure if any of the PROMISCUOUS or MULTICAST filters are
  included
- success if only UNICAST is included, however the result is
  UNICAST|BROADCAST
- success if only UNICAST and BROADCAST are included
- if UNICAST, or UNICAST|BROADCAST are used, but the previous call
  tried (and failed) to set UNICAST|BROADCAST|MULTICAST, then the
  result is UNICAST|BROADCAST|MULTICAST
Work around this apparently broken SNP implementation by trying
ReceiveFilterMask, then falling back to UNICAST|BROADCAST|MULTICAST,
then UNICAST|BROADCAST, and finally UNICAST.
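The fallback order can be sketched as a simple loop over candidate filter sets. The bit values follow the UEFI Simple Network Protocol definitions, but the shortened macro names, the callback type and the stand-in firmware function are assumptions for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Receive filter bits, valued as in the UEFI Simple Network Protocol */
#define RX_UNICAST	0x01
#define RX_MULTICAST	0x02
#define RX_BROADCAST	0x04
#define RX_PROMISCUOUS	0x08

/* Hypothetical wrapper around the firmware call: returns 0 on success */
typedef int ( * set_filters_fn ) ( uint32_t filters );

/* Try progressively smaller filter sets.  Returns the accepted set,
 * or zero if every attempt failed.
 */
static uint32_t set_rx_filters ( uint32_t supported, set_filters_fn set ) {
	const uint32_t attempts[] = {
		supported,
		( RX_UNICAST | RX_BROADCAST | RX_MULTICAST ),
		( RX_UNICAST | RX_BROADCAST ),
		RX_UNICAST,
	};
	size_t i;

	assert ( set != NULL );
	for ( i = 0 ; i < ( sizeof ( attempts ) /
			    sizeof ( attempts[0] ) ) ; i++ ) {
		if ( set ( attempts[i] ) == 0 )
			return attempts[i];
	}
	return 0;
}

/* Stand-in for the observed firmware behaviour: reject any request
 * that includes a multicast or promiscuous filter.
 */
static int picky_firmware ( uint32_t filters ) {
	return ( ( filters & ( RX_MULTICAST | RX_PROMISCUOUS ) ) ? -1 : 0 );
}
```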
Modified-by: Michael Brown <mcb30@ipxe.org>
Tested-by: Curtis Larsen <larsen@dixie.edu>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some EFI 1.10 systems (observed on an Apple iMac) do not allow us to
open the device path protocol with an attribute of
EFI_OPEN_PROTOCOL_BY_DRIVER and so we cannot maintain a safe,
long-lived pointer to the device path. Work around this by instead
opening the device path protocol with an attribute of
EFI_OPEN_PROTOCOL_GET_PROTOCOL whenever we need to use it.
Debugged-by: Curtis Larsen <larsen@dixie.edu>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
According to the UEFI specification, the MCastFilter parameter (which
we currently pass as NULL, along with a zero MCastFilterCnt) is
optional only if ResetMCastFilter is true.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Dump the existing openers of a protocol whenever we are unable to open
a protocol using attributes of BY_DRIVER, EXCLUSIVE, or
BY_CHILD_CONTROLLER.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Using efi_devpath_text() is marginally more efficient if we already
have the device path protocol available, but the mild increase in
efficiency is not worth compromising the clarity of the pattern:
    DBGC ( device, "THING %p %s ...", device, efi_handle_name ( device ) );
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Rewrite the SNP NIC driver to use non-blocking and deferrable
transmissions, to provide link status detection, to provide
information about the underlying (PCI) hardware device, and to avoid
unnecessary I/O buffer allocations during receive polling.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The VF might not have been assigned a MAC address upon startup, and will
end up with a random MAC address during probe(). With this patch the
MAC address can be changed later on.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
If the VF doesn't have a MAC address assigned we should create a
random MAC address.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The iBFT includes an "origin" field to indicate the source of the IP
address. We use the heuristic of assuming that the source should be
"manual" if the IP address originates directly from the network device
settings block, and "DHCP" otherwise. This is an imperfect guess, but
is likely to be correct in most common situations.
Originally-implemented-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Parse the sense data to extract the response code, the sense key, the
additional sense code, and the additional sense code qualifier.
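A minimal sketch of such a parser, covering the fixed and descriptor sense data formats as laid out in the SCSI Primary Commands standard. The structure and function names are hypothetical and do not reflect the names used in the iPXE SCSI code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the parsed fields */
struct sense_info {
	uint8_t code;	/* response code */
	uint8_t key;	/* sense key */
	uint8_t asc;	/* additional sense code */
	uint8_t ascq;	/* additional sense code qualifier */
};

/* Parse sense data.  Returns 0 on success, or -1 if the data is too
 * short or uses an unrecognised response code.
 */
static int parse_sense ( const uint8_t *data, size_t len,
			 struct sense_info *info ) {
	assert ( info != NULL );
	if ( len < 1 )
		return -1;
	/* Bit 7 of the first byte is not part of the response code */
	info->code = ( data[0] & 0x7f );
	switch ( info->code ) {
	case 0x70:
	case 0x71:
		/* Fixed format */
		if ( len < 14 )
			return -1;
		info->key = ( data[2] & 0x0f );
		info->asc = data[12];
		info->ascq = data[13];
		return 0;
	case 0x72:
	case 0x73:
		/* Descriptor format */
		if ( len < 4 )
			return -1;
		info->key = ( data[1] & 0x0f );
		info->asc = data[2];
		info->ascq = data[3];
		return 0;
	default:
		return -1;
	}
}
```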
Originally-implemented-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
As of commit d28bb51 ("[tcp] Defer sending ACKs until all received
packets have been processed"), increasing the RX ring size will
increase the number of received packets per transmitted ACK (since
each poll will process up to one complete receive ring). Under KVM,
this can make a substantial (up to ~200%) difference to the overall
download speed, since transmissions are very expensive.
Increase the ring fill level from four to eight packets: this
increases the download speed by around 50% at a cost of around 8kB of
heap space. Further speedups are possible by increasing the ring size
further, but it would be preferable to find alternative methods which
do not use noticeable amounts of heap space.
Tested-by: Robin Smidsrød <robin@smidsrod.no>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
When profiling, exclude any time spent inside the hypervisor
responding to our MMIO accesses. This substantially reduces the
variance accumulated on many other profilers.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Inside a virtual machine, writing the RX ring tail pointer may incur a
substantial overhead of processing inside the hypervisor. Minimise
this overhead by writing the tail pointer once per batch of
descriptors, rather than once per descriptor.
Profiling under qemu-kvm (version 1.6.2) shows that this reduces the
amount of time taken to refill the RX descriptor ring by around 90%.
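The batching idea can be sketched with a simulated ring in which every tail pointer write is counted. The structure, the names and the ring and fill sizes are assumptions chosen for illustration, and the counter merely stands in for the expensive MMIO write.

```c
#include <assert.h>

#define RX_RING_SIZE 8	/* illustrative ring size */
#define RX_FILL 4	/* illustrative fill level */

/* Simulated receive ring (hypothetical structure) */
struct rx_ring {
	unsigned int prod;		/* descriptors posted so far */
	unsigned int cons;		/* descriptors consumed so far */
	unsigned int tail;		/* last value written to the tail */
	unsigned int tail_writes;	/* number of simulated MMIO writes */
};

/* Stand-in for the MMIO write to the tail pointer register */
static void write_tail ( struct rx_ring *ring ) {
	ring->tail = ( ring->prod % RX_RING_SIZE );
	ring->tail_writes++;
}

/* Refill the ring, writing the tail pointer once per batch */
static void refill ( struct rx_ring *ring ) {
	unsigned int refilled = 0;

	assert ( ( ring->prod - ring->cons ) <= RX_FILL );
	while ( ( ring->prod - ring->cons ) < RX_FILL ) {
		/* A real driver would populate the descriptor here */
		ring->prod++;
		refilled++;
	}
	if ( refilled )
		write_tail ( ring );
}
```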
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Operations which are negligible on physical hardware (such as issuing
a posted write to the transmit ring tail register) may involve
substantial amounts of processing within the hypervisor if running in
a virtual machine.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
It is unclear from the datasheets whether or not the TX ring can be
completely filled (i.e. whether writing the tail value as equal to the
current head value will cause the ring to be treated as completely
full or completely empty). It is very plausible that this edge case
could differ in behaviour between real hardware and the many
implementations of an emulated Intel NIC found in various virtual
machines. Err on the side of caution and always leave at least one
ring entry empty.
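One way to express this rule is a fullness check that treats the ring as full once only a single entry remains unused. The ring size and function name below are illustrative assumptions, with free-running producer and consumer counters.

```c
#include <assert.h>

#define TX_RING_SIZE 8 /* illustrative ring size */

/* Treat the ring as full when only one entry remains unused, so that
 * the tail value written to the hardware never equals the head.
 */
static int tx_ring_full ( unsigned int prod, unsigned int cons ) {
	assert ( ( prod - cons ) < TX_RING_SIZE );
	return ( ( prod - cons ) >= ( TX_RING_SIZE - 1 ) );
}
```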
Signed-off-by: Michael Brown <mcb30@ipxe.org>
On an Asus Z87-K motherboard with an onboard 8168 NIC, booting into
Windows 7 and then warm rebooting into iPXE results in a broken RX
datapath: packets can be transmitted successfully but garbage is
received. A cold reboot clears the problem.
A dump of the PHY registers reveals only one difference: in the
failure case the bits ADVERTISE_PAUSE_CAP and ADVERTISE_PAUSE_ASYM are
cleared. Explicitly setting these bits does not fix the problem.
A dump of the MAC registers reveals a few differences, of which the
most obvious culprit is the undocumented bit 24 of the Receive
Configuration Register (RCR), which is set in the failure case.
Explicitly clearing this bit does fix the problem.
Reported-by: Sebastian Nielsen <ipxe@sebbe.eu>
Reported-by: Oliver Rath <rath@mglug.de>
Debugged-by: Sebastian Nielsen <ipxe@sebbe.eu>
Tested-by: Sebastian Nielsen <ipxe@sebbe.eu>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The fetch_setting() family of functions may currently modify the
definition of the specified setting (e.g. to add missing type
information). Clean up this interface by requiring callers to provide
an explicit buffer to contain the completed definition of the fetched
setting, if required.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Give tap devices a meaningful name, and avoid segmentation faults when
attempting to retrieve ${net0/bustype} by assigning a new bus type for
tap devices.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Prevent the card from flagging packets of 1518 bytes length as
overlength.
This fixes the High-MTU loopback test.
Signed-off-by: Thomas Miletich <thomas.miletich@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The 3c90x B and C revisions support rounding up the packet length to a
specific boundary. Disable this feature to avoid overlength packets.
This fixes the loopback test.
Signed-off-by: Thomas Miletich <thomas.miletich@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
According to the 3c90x datasheet we have to stall the upload (receive)
engine before setting the receive ring address.
Signed-off-by: Thomas Miletich <thomas.miletich@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some hardware (observed with an onboard RTL8168) will erroneously
report a buffer overflow error if the received packet exactly fills
the receive buffer.
Fix by adding an extra four bytes of padding to each receive buffer.
Debugged-by: Thomas Miletich <thomas.miletich@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Adrian Jamróz <adrian.jamroz@gmail.com>
Modified-by: Thomas Miletich <thomas.miletich@gmail.com>
Signed-off-by: Thomas Miletich <thomas.miletich@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Replace the old via-rhine driver with a new version using the iPXE
API.
Includes fixes by Thomas Miletich for:
- MMIO access
- Link detection
- RX completion in RX overflow case
- Reset and EEPROM reloading
- CRC stripping
- Missing cpu_to_le32() calls
- Missing memory barriers
Signed-off-by: Adrian Jamróz <adrian.jamroz@gmail.com>
Modified-by: Thomas Miletich <thomas.miletich@gmail.com>
Tested-by: Thomas Miletich <thomas.miletich@gmail.com>
Tested-by: Robin Smidsrød <robin@smidsrod.no>
Modified-by: Michael Brown <mcb30@ipxe.org>
Tested-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allow values to be read from PCI configuration space using the syntax
    ${pci/<busdevfn>.<offset>.<length>}
where <busdevfn> is the bus:dev.fn address of the PCI device
(expressed as a single integer, as returned by ${net0/busloc}),
<offset> is the offset within PCI configuration space, and <length> is
the length of the value to be read.
Values are returned in reverse byte order, since PCI configuration
space is little-endian by definition.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
realtek_destroy_ring() currently does nothing if the card is operating
in legacy (pre-RTL8139C+) mode. In particular, the producer and
consumer counters are incorrectly left holding their current values.
Virtual hardware (e.g. the emulated RTL8139 in qemu and similar VMs)
is tolerant of this behaviour, but real hardware will fail to transmit
if the descriptors are not used in the correct order.
Fix by resetting the producer and consumer counters in
realtek_destroy_ring() even if the card is operating in legacy mode.
Reported-by: Gelip <mrgelip@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Create an explicit concept of "settings scope" and eliminate the magic
values used for numerical setting tags.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
On some systems, it appears to be possible for writes to the EEPROM
registers to be delayed for long enough that the EEPROM's setup and
hold times are violated, resulting in invalid data being read from the
EEPROM.
Fix by inserting a PCI read cycle immediately after writes to
RTL_9346CR, to ensure that the write has completed before starting the
udelay() used to time the SPI bus transitions.
Reported-by: Gelip <mrgelip@gmail.com>
Tested-by: Gelip <mrgelip@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some older RTL8139 chips seem to not immediately update the
RTL_CR.BUFE bit in response to a write to RTL_CAPR. This results in
iPXE seeing a spurious zero-length received packet, and thereafter
being out of sync with the hardware's RX ring offset.
Fix by inserting an extra PCI read cycle after writing to RTL_CAPR, to
give the chip time to react before we next read RTL_CR.
Reported-by: Gelip <mrgelip@gmail.com>
Tested-by: Gelip <mrgelip@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some onboard RTL8169 NICs seem to leave the EEPROM pins disconnected.
The existing is_valid_ether_addr() test will not necessarily catch
this, since it expects a missing EEPROM to show up as a MAC address of
00:00:00:00:00:00 or ff:ff:ff:ff:ff:ff. When the EEPROM pins are
floating the MAC address may read as e.g. 00:00:00:00:0f:00, which
will not be detected as invalid.
Check the ID word in the first two bytes of the EEPROM (which should
have the value 0x8129 for all RTL8139 and RTL8169 chips), and use this
to determine whether or not an EEPROM is present.
Reported-by: Carl Karsten <carl@nextdayvideo.com>
Tested-by: Carl Karsten <carl@nextdayvideo.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Exploit the redefinition of iPXE error codes to include a "platform
error code" to allow for meaningful conversion of EFI_STATUS values to
iPXE errors and vice versa.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The Intel 10 Gigabit NICs have a datapath that is almost
register-compatible with the Intel 1 Gigabit NICs. Expose common
functionality to avoid duplication of code in the new "intelx" driver.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The Intel 10 Gigabit NICs use the same simplified (aka "legacy")
descriptor format and the same layout for descriptor register blocks
as the Intel 1 Gigabit NICs. The offsets of the descriptor register
blocks are not the same.
Simplify reuse of the existing code by removing all hardcoded offsets
for registers within descriptor register blocks, and ensuring that all
offsets are calculated using the descriptor register block base
address provided via intel_init_ring().
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Avoid using UINT16 and similar typedefs, which are non-standard in the
iPXE codebase and generate conflicts when trying to include any of the
EFI headers.
Also fix trailing whitespace in the affected files, to prevent
complaints from git.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Remove macros which aren't used anywhere in the driver, and which
conflict with macros of the same name used in the EFI headers.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Remove macros which aren't used anywhere in the driver, and which
conflict with macros of the same name used in the EFI headers.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The iBFT NIC section has a VLAN field which must be filled in so that
iSCSI booting works over VLANs.
Unfortunately it is unclear from the IBM specification linked in
ibft.c whether the VLAN field is just the 802.1Q VLAN Identifier or
the full 802.1Q TCI. For now just fill in the VID; the Priority Code
Point and Drop Eligible Indicator could be set in the future if it
turns out they should be present too.
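For reference, the 802.1Q TCI packs the three fields into 16 bits as sketched below. The macro and function names are hypothetical and are not taken from ibft.c.

```c
#include <assert.h>
#include <stdint.h>

/* 802.1Q TCI layout: PCP (3 bits), DEI (1 bit), VID (12 bits) */
#define VLAN_VID( tci ) ( (tci) & 0x0fff )
#define VLAN_DEI( tci ) ( ( (tci) >> 12 ) & 0x1 )
#define VLAN_PCP( tci ) ( ( (tci) >> 13 ) & 0x7 )

/* Build a TCI from its component fields (hypothetical helper) */
static uint16_t vlan_tci ( unsigned int pcp, unsigned int dei,
			   unsigned int vid ) {
	assert ( ( pcp <= 7 ) && ( dei <= 1 ) && ( vid <= 0x0fff ) );
	return ( ( uint16_t ) ( ( pcp << 13 ) | ( dei << 12 ) | vid ) );
}
```

Filling in only the VID corresponds to storing VLAN_VID ( tci ), with the upper four bits left as zero.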
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Commit 947976d ("[netdevice] Do not force a poll on net_tx()")
requires network devices to have TX rings that are sufficiently large
to allow a transmitted response to all packets received during a
single poll.
Reported-by: Robin Smidsrød <robin@smidsrod.no>
Tested-by: Robin Smidsrød <robin@smidsrod.no>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The Intel NIC emulation in some versions of VMware seems to suffer
from a flaw whereby the Interrupt Cause Register (ICR) fails to assert
the usual "packet received" bit (ICR.RXT0) if a receive overflow
(ICR.RXO) has also occurred.
Work around this flaw by polling for completed descriptors whenever
either ICR.RXT0 or ICR.RXO is asserted.
Reported-by: Miroslav Halas <miroslav.halas@bankofamerica.com>
Debugged-by: Miroslav Halas <miroslav.halas@bankofamerica.com>
Tested-by: Miroslav Halas <miroslav.halas@bankofamerica.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Almost all clients of the raw-packet interfaces (UNDI and SNP) can
handle only Ethernet link layers. Expose an Ethernet-compatible link
layer to local clients, while remaining compatible with IPoIB on the
wire. This requires manipulation of ARP (but not DHCP) packets within
the IPoIB driver.
This is ugly, but it's the only viable way to allow IPoIB devices to
be driven via the raw-packet interfaces.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some RTL8169 cards seem to drive the EEPROM CS line high (i.e. active)
when 9346CR.EEM is set to "normal operating mode", with the result
that the CS line is never deasserted. The symptom of this is that the
first read from the EEPROM will work, while all subsequent reads will
return garbage data.
Reported-by: Thomas Miletich <thomas.miletich@gmail.com>
Debugged-by: Thomas Miletich <thomas.miletich@gmail.com>
Tested-by: Thomas Miletich <thomas.miletich@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some RTL8169 cards (observed with an RTL8169SC) power up advertising
only 100Mbps, despite being capable of 1000Mbps. Forcibly enable
advertisement of 1000Mbps on any RTL8169-like card.
This change relies on the assumption that the CTRL1000 register will
not exist on 100Mbps-only RTL8169 cards such as the RTL8101.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some RTL8169 cards (observed with an RTL8169SC) crash and burn if DAC
is enabled, even if only 32-bit addresses are used. Observed
behaviour includes system lockups and repeated transmission of garbage
data onto the wire.
This seems to be a known problem. The Linux r8169 driver disables DAC
by default and provides a "use_dac" module parameter.
There appears to be no known test for determining whether or not DAC
will work. As a workaround, enable DAC only if we are built as a
64-bit binary. This at least eliminates the problem in the common
case of a 32-bit build, which will never use 64-bit addresses anyway.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some bits in the C+ Command register are always one. Testing for the
presence of the register must allow for this.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some RTL8169 cards (observed with an RTL8169SC) power up with
TCR.MXDMA set to 16 bytes. While this does not prevent proper
operation, it almost certainly degrades performance.
Fix by explicitly setting TCR.MXDMA to "unlimited".
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some RTL8169 cards (observed with an RTL8169SC) power up with invalid
values in RCR.RXFTH and RCR.MXDMA, causing receive DMA to fail. Fix
by setting explicit values for both fields.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some RTL8169 cards (observed with an RTL8169SC) power up with garbage
values in the ring address registers, and do not clear the registers
on reset.
Fix by always setting the high dword of the ring address registers.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Change the DMA alignment from 4096 bytes to 16 bytes, to conserve
available DMA memory. The hardware doesn't have any specific
alignment requirements.
Signed-off-by: Thomas Miletich <thomas.miletich@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Datasheet pp. 41-42 define 'rx packet length' as the upper word of the
'status' dword field of the receive descriptor table.
http://www.smsc.com/media/Downloads_Archive/discontinued/83c171.pdf
Tested on SMC EtherPower II.
Signed-off-by: Alexey Smazhenko <darkover@corbina.com.ua>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
gcc 4.7 produces a spurious warning about an array subscript being out
of bounds. Use a pointer dereference instead of an array lookup to
inhibit this spurious warning.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
On i350 the datasheet contradicts itself in stating that the default
value of RXDCTL.ENABLE for queue zero is both set (according to the
"Receive Initialization" section) and unset (according to the "Receive
Descriptor Control - RXDCTL" section). Empirical evidence suggests
that the default value is unset.
Explicitly enable both transmit and receive queues to avoid any
ambiguity.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
On 82576 (and probably others), the datasheet states that "the tail
register of the queue (RDT[n]) should not be bumped until the queue is
enabled". There is some confusion over exactly what constitutes
"enabled": the initialisation blurb says that we should "poll the
RXDCTL register until the ENABLE bit is set", while the description
for the RXDCTL register says that the ENABLE bit is set by default
(for queue zero). Empirical evidence suggests that the ENABLE bit
reads as set immediately after writing to RCTL.EN, and so polling is
not necessary.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Use hw pointer in PCI driver data as expected by sky2_remove().
Signed-off-by: Valentine Barshak <gvaxon@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
RTL8139C+ cards use essentially the same datapath as RTL8169, which is
zerocopy and 64-bit capable. Older RTL8139 cards use a single receive
ring buffer rather than a descriptor ring, but still share substantial
amounts of functionality with RTL8169.
Include support for RTL8139 cards within the generic Realtek driver,
since there is no way to differentiate between RTL8139 and RTL8139C+
cards based on the PCI IDs alone.
Many thanks to all the people who worked on the rtl8139 driver over
the years.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The link state is currently set at probe time, and updated only when
the device is polled. This results in the user seeing a misleading
stale "Link: down" message, if autonegotiation did not complete within
the short timespan of the probe routine.
Fix by updating the link state when the device is opened, so that the
message that ends up being displayed to the user reflects the real
link state at device open time.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Tested-by: Thomas Miletich <thomas.miletich@gmail.com>
Debugged-by: Thomas Miletich <thomas.miletich@gmail.com>
Tested-by: Robin Smidsrød <robin@smidsrod.no>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
iPXE provides no support for manually configuring the link speed.
Provide a generic routine which should be able to reset any MII/GMII
PHY and enable autonegotiation.
Prototyped-by: Thomas Miletich <thomas.miletich@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add support for 82579-based chips such as those found on Sandy Bridge
motherboards. Based on d3738bb8203acf8552c3ec8b3447133fc0938ddd in
Linux.
Signed-off-by: Daniel Hokka Zakrisson <daniel@hozac.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Practically speaking, it seems the convention is to only have one
packet pending and not rely upon any mechanism to associate returned
txbuf with txqueue.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
This function never did much in this driver anyway, and after commit
b5ed30b2 ("[tg3] Fix compilation on newer gcc versions") it became
apparent that its remaining functionality could be easily moved to
tg3_test_dma().
Signed-off-by: Thomas Miletich <thomas.miletich@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Replace the old Etherboot tg3 driver with a more up-to-date driver
using the iPXE API.
Signed-off-by: Thomas Miletich <thomas.miletich@gmail.com>
Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The RS bit is used to instruct the NIC to update the TX descriptor
status byte. The RPS bit is used to instruct the NIC to defer this
update until after the packet has been transmitted on the wire (rather
than merely read into the transmit FIFO).
The driver currently sets RPS but not RS. Some e1000 models seem to
interpret this as implying that the status byte should be updated;
some don't. On the ones that don't, we never see any TX completions
and so rapidly run out of TX buffers.
Fix by setting the RS bit in the TX descriptor. (We don't care about
when the packet reaches the wire, so don't bother setting the RPS
bit.)
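A minimal sketch of the resulting command byte follows. The bit
positions are assumptions based on the commonly documented legacy TX
descriptor layout, and the function names are hypothetical; this is
not the driver's actual code.

```c
#include <stdint.h>

/* Assumed bit positions within the legacy TX descriptor command and
 * status bytes; confirm against the datasheet for the target device. */
#define TXD_CMD_EOP  0x01  /* end of packet */
#define TXD_CMD_IFCS 0x02  /* insert frame checksum */
#define TXD_CMD_RS   0x08  /* report status: update the status byte */
#define TXD_CMD_RPS  0x10  /* report packet sent: defer until on the wire */
#define TXD_STAT_DD  0x01  /* descriptor done */

/* Command byte for an ordinary single-descriptor transmit.  RS is set
 * so that a completion is always requested; RPS is deliberately left
 * clear, since we do not care when the packet reaches the wire. */
static inline uint8_t txd_command(void)
{
	return (TXD_CMD_EOP | TXD_CMD_IFCS | TXD_CMD_RS);
}

/* A descriptor is complete once the NIC has set DD in the status byte */
static inline int txd_is_complete(uint8_t status)
{
	return ((status & TXD_STAT_DD) != 0);
}
```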
Reported-by: Miroslav Halas <miroslav.halas@bankofamerica.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
WinPE has been observed to call PXENV_UNDI_SHUTDOWN but not
PXENV_STOP_UNDI. This means that Hermon hardware is left partially
active (firmware running and one event queue mapped) when WinPE starts
up, which can cause a Blue Screen of Death.
Fix by ensuring that the hardware is left quiescent (with the firmware
stopped) when no interfaces are open.
Reported-by: Itay Gazit <itayg@mellanox.co.il>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Avoid spurious matches for peer key 0 against empty peer cache
entries, and set the LL_MULTICAST flag in addition to LL_BROADCAST.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The ChipCmd register is only an 8-bit register. The 16-bit access
used by iPXE caused problems with qemu's emulated rtl8139 device,
which was improperly aligning I/Os.
Signed-off-by: Julian Pidancet <julian.pidancet@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Arbel seems to crash the system as soon as the first send WQE
completes on an RC queue pair. (NOPs complete successfully, so this
is a problem specific to the work queue rather than the completion
queue.) The cause of this problem has remained unknown for over a
year.
Check in the non-functioning code to avoid bit-rot, and in the hope
that someone will find the fix.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allow the link layer to directly report whether or not a packet is
multicast or broadcast at the time of calling pull(), rather than
relying on heuristics to determine this at a later stage.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Reported-by: Stefan Hajnoczi <stefanha@gmail.com>
Signed-off-by: Thomas Miletich <thomas.miletich@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Give the step() method a pointer to the containing object, rather than
a pointer to the process. This is consistent with the operation of
interface methods, and allows a single function to serve as both an
interface method and a process step() method.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
This reverts commit 15c1200 ("[hermon] Work around missing mport
support in current BOFM implementations").
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Current BOFM versions are unable to create entries with mport>1, which
means that only the port 1 MAC address can be explicitly specified.
Work around this by using the provided MAC address as a base address
for all subsequent ports. For example, if BOFM assigns the address
00:1A:64:76:00:09 for port 1
then we will assign the addresses
00:1A:64:76:00:09 for port 1
00:1A:64:76:00:0a for port 2
Future BOFM versions that may correctly support mport will work with
this scheme without modification provided that the BOFM entries are
created in increasing order of mport. Since BOFM tools tend to
generate entries in increasing order (of slot, port, etc), this is not
an unreasonable compromise.
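The derivation can be sketched as below. The function name and the
carry into higher-order bytes are assumptions made for illustration;
the text above only shows the last byte changing.

```c
#include <stdint.h>
#include <string.h>

#define MAC_LEN 6

/* Derive the MAC address for a 1-based port number by adding
 * (port - 1) to the base address, treated as a big-endian integer.
 * Carrying into higher-order bytes is an assumption of this sketch. */
static void port_mac(const uint8_t base[MAC_LEN], unsigned int port,
		     uint8_t mac[MAC_LEN])
{
	unsigned int carry = (port - 1);
	int i;

	memcpy(mac, base, MAC_LEN);
	for (i = (MAC_LEN - 1); (i >= 0) && carry; i--) {
		carry += mac[i];
		mac[i] = (carry & 0xff);
		carry >>= 8;
	}
}
```

With the base address 00:1A:64:76:00:09, port 1 keeps the base
address and port 2 yields 00:1A:64:76:00:0a.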
Signed-off-by: Michael Brown <mcb30@ipxe.org>
iPXE operates the forcedeth NIC in promiscuous mode, and never changes
the unicast MAC address filter registers. We should not therefore set
the flag indicating (to other drivers loaded later) that the MAC
address order has already been corrected.
Reported-by: Tal Aloni <tal.aloni.il@gmail.com>
Tested-by: Tal Aloni <tal.aloni.il@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The forcedeth driver currently implements unicast MAC address
filtering in software. This is almost invariably the wrong thing to
do (since the network stack must already be able to cope with unwanted
packets) and it breaks FCoE (which requires the card to operate in
promiscuous mode).
Also, the implementation is buggy: is_local_ether_addr() is used to
check for a locally-assigned Ethernet address (not to check for a
unicast address), and the current link-layer address is in
netdev->ll_addr, not netdev->hw_addr.
Fix by removing this code.
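For reference, the two checks that the removed code confused look at
different bits of the first address byte. The helper names below are
hypothetical and serve only to illustrate the distinction.

```c
#include <stdint.h>

/* Bit 1 of the first byte marks a locally-assigned address */
static inline int addr_is_local(const uint8_t *addr)
{
	return ((addr[0] & 0x02) != 0);
}

/* Bit 0 of the first byte marks a multicast (or broadcast) address */
static inline int addr_is_multicast(const uint8_t *addr)
{
	return ((addr[0] & 0x01) != 0);
}

/* A unicast address is simply one that is not multicast */
static inline int addr_is_unicast(const uint8_t *addr)
{
	return (!addr_is_multicast(addr));
}
```

A globally-assigned unicast address such as 00:1A:64:76:00:09 is
unicast but not local, so a filter built on the "local" test would
reject it.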
Reported-by: Tal Aloni <tal.aloni.il@gmail.com>
Tested-by: Tal Aloni <tal.aloni.il@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Avoid unused-but-set variable warning in gcc 4.6 which was introduced
by commit 9215b7f ("[forcedeth] Clear the MII link status register on
link status changes").
Signed-off-by: Thomas Miletich <thomas.miletich@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Expose a function setting_applies() to allow a caller to determine
whether or not a particular setting is applicable to a particular
settings block.
Restrict DHCP-backed settings blocks to accepting only DHCP-based
settings.
Restrict network device settings blocks to accepting only DHCP-based
settings and network device-specific settings such as "mac".
Inspired-by: Glenn Brown <glenn@myri.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
From a cursory examination, it appears as though the calculation of
tx_available is redundant, since eepro_transmit() waits for transmit
completion before returning anyway.
Reported-by: Ralph Giles <giles@thaumas.net>
Tested-by: Ralph Giles <giles@thaumas.net>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
On reset and close, the ICR register is read to clear any pending
interrupts, but the value is simply ignored. Avoid assigning the
value to a variable, to inhibit a warning from gcc 4.6.
Also fix a potential race condition in reset routines which clear
interrupts before disabling them.
Reported-by: Ralph Giles <giles@thaumas.net>
Tested-by: Ralph Giles <giles@thaumas.net>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
These unused portions trigger a compiler warning under gcc 4.6, due to
the ambiguity over the "page" field in struct igbvf_buffer.
Reported-by: Ralph Giles <giles@thaumas.net>
Tested-by: Ralph Giles <giles@thaumas.net>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
In a virtual environment such as qemu, we can legitimately receive
packets less than 64 bytes in length, such as ARP replies. These are
currently discarded, causing most IPv4 communication to fail.
Fix by ignoring the RFDShort bit when receiving packets.
Reported-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
When chainloading rtl8139.pxe from an old Etherboot rtl8139.zrom, iPXE
can end up misreading the first word of the MAC address from the
EEPROM as being all zeroes. This is presumably because Etherboot has
left the serial EEPROM in an unexpected state.
Fix by using the chip select line to reset the SPI device before we
start accessing it.
Reported-by: Mandar U Jog <mandarjog@gmail.com>
Tested-by: Mandar U Jog <mandarjog@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The default initiator IQN is "iqn.2000-09.org.etherboot:UNKNOWN".
This is problematic for two reasons:
a) the etherboot.org domain (and hence the associated IQN namespace)
is not under the control of the iPXE project, and
b) some targets (correctly) refuse to allow concurrent connections
from different initiators using the same initiator IQN.
Solve both problems by changing the default initiator IQN to be
iqn.2010-04.org.ipxe:<hostname> if a hostname is set, or
iqn.2010-04.org.ipxe:<uuid> if no hostname is set.
Explicit initiator IQNs set via DHCP option 203 are not affected by
this change.
Unfortunately, this change is likely to break some existing
configurations, where ACL rules have been put in place referring to
the old default initiator IQN. Users may need to update ACLs, or
force the use of the old IQN using an iPXE script line such as
set initiator-iqn iqn.2000-09.org.etherboot:UNKNOWN
or a dhcpd.conf option such as
option iscsi-initiator-iqn "iqn.2000-09.org.etherboot:UNKNOWN"
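The selection of the new default can be sketched as follows. The
function name is hypothetical, and the hostname and UUID are assumed
to be available as already-formatted strings.

```c
#include <stddef.h>
#include <stdio.h>

/* Build the default initiator IQN: use the hostname if one is set,
 * otherwise fall back to the UUID.  Returns the snprintf() result. */
static int default_initiator_iqn(char *buf, size_t len,
				 const char *hostname, const char *uuid)
{
	const char *suffix = ((hostname && hostname[0]) ? hostname : uuid);

	return snprintf(buf, len, "iqn.2010-04.org.ipxe:%s", suffix);
}
```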
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some operating environments require (or at least prefer) that we do
not perform our own PCI bus scan, but deal only with specified
devices. Modularise the PCI core to allow for this.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Merge the "bus" and "devfn" fields into a single "busdevfn" field, to
match the format used by the majority of external code.
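A sketch of the combined field, assuming the conventional 16-bit
encoding (8-bit bus, 5-bit slot, 3-bit function). The helper names
are hypothetical.

```c
#include <stdint.h>

/* Pack bus, slot and function into a single 16-bit value */
static inline uint16_t busdevfn(unsigned int bus, unsigned int slot,
				unsigned int func)
{
	return (((bus & 0xff) << 8) | ((slot & 0x1f) << 3) | (func & 0x07));
}

/* Extract the individual components again */
static inline unsigned int busdevfn_bus(uint16_t bdf)
{
	return ((bdf >> 8) & 0xff);
}

static inline unsigned int busdevfn_slot(uint16_t bdf)
{
	return ((bdf >> 3) & 0x1f);
}

static inline unsigned int busdevfn_func(uint16_t bdf)
{
	return (bdf & 0x07);
}
```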
Signed-off-by: Michael Brown <mcb30@ipxe.org>
On 64-bit builds, MLX_DECLARE_STRUCT() produces a structure that is
always a multiple of 64 bits long, causing the HCR structure to be
over-length by one dword. This in turn causes hermon_cmd() to write
beyond the end of the HCR, which causes commands to fail.
Reported-by: Itay Gazit <itayg@mellanox.co.il>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Avoid leaking memory for unhandled events by operating the event
queue as a circular queue.
Signed-off-by: Itay Gazit <itaygazit@gmail.com>
Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Remove duplicate hardware resets, remove network interface logic
reset.
This also fixes a bug where some 3c905C variants would return bogus
EEPROM values because of a too short delay after the network reset.
Signed-off-by: Thomas Miletich <thomas.miletich@gmail.com>
Reported-by: Peter Huewe <peterhuewe@gmx.de>
Tested-by: Peter Huewe <peterhuewe@gmx.de>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
DBG is reserved for errors and important warnings only.
DBG2 is used for additional information, e.g. "received packet".
DBGP is used to print the name of every function as it is called.
Signed-off-by: Thomas Miletich <thomas.miletich@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Remove the concept of shutdown exit flags, and replace it with a
counter used to keep track of exposed interfaces that require devices
to remain active.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Support a new function mode "multi-function 8 Direct IO" which is used
in ESX Direct I/O configuration.
Update driver version to 3.5.0.1
Signed-off-by: Masroor Vettuparambil <masroor.vettuparambil@exar.com>
Signed-off-by: Sivakumar Subramani <sivakumar.subramani@exar.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Most xxx_init() functions are void functions with no failure cases.
Allow pci_vpd_init() to be used in the same way. (Subsequent calls to
pci_vpd_read() etc. will fail if pci_vpd_init() fails.)
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Since its implementation several years ago, no driver has used a
fragment list containing more than a single fragment. Simplify the
NVO core and the drivers that use it by removing the whole concept of
the fragment list, and using a simple (address,length) pair instead.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Driver for Intel 82576 based virtual functions, based on Intel source
code available at:
http://sourceforge.net/projects/e1000 (igbvf-1.0.7)
Based on initial port from Eric Keller <ekeller@princeton.edu>.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Changes were made to files where the licence text within the files
themselves confirms that the files are GPL version 2 or later.
Signed-off-by: Shao Miller <shao.miller@yrdsb.edu.on.ca>
Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Pass the settings block name as a parameter to register_settings(),
rather than defining it with settings_init() (and then possibly
changing it by directly manipulating settings->name).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some SCSI targets (observed with an EMC CLARiiON Fibre Channel target)
will not respond to commands correctly until a TEST UNIT READY has
been issued. In particular, a READ CAPACITY (10) command will return
with a success status, but no capacity data.
Fix by issuing a TEST UNIT READY command automatically, and delaying
further SCSI commands until the TEST UNIT READY has succeeded.
Reported-by: Hadar Hen Zion <hadarh@mellanox.co.il>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
FCoE requires us to be able to receive unicast packets for multiple
addresses. Support this by operating in promiscuous mode.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add PRM structures to support Hermon Ethernet devices.
Signed-off-by: Itay Gazit <itaygazit@gmail.com>
Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Unlike Arbel, port parameters must be applied via a separate call to
SET_PORT, rather than as parameters to INIT_PORT.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Mapping a single page at a time causes a several-second delay at
device initialisation time. Reduce this by mapping multiple pages at
a time, using the largest block sizes possible given the alignment
constraints.
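One way to choose the block sizes is sketched below, assuming that
each block must be a power of two in size and aligned on its own
size, and that the start address and length are multiples of the
page size. The function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Split [start, start + len) into naturally aligned power-of-two
 * blocks, recording each block size.  Returns the number of blocks. */
static size_t split_region(uint64_t start, uint64_t len,
			   uint64_t *sizes, size_t max_blocks)
{
	size_t count = 0;

	while (len && (count < max_blocks)) {
		/* Largest power of two on which "start" is aligned */
		uint64_t size = (start ? (start & (~start + 1)) :
				 ((uint64_t) 1 << 63));

		/* Shrink until the block fits in what remains */
		while (size > len)
			size >>= 1;

		sizes[count++] = size;
		start += size;
		len -= size;
	}
	return count;
}
```

For example, five 4kB pages starting at address 0x3000 need only two
mapping operations (one 4kB block and one 16kB block) rather than
five.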
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Mapping a single page at a time causes a several-second delay at
device initialisation time. Reduce this by mapping multiple pages at
a time, using the largest block sizes possible given the alignment
constraints.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Use individual page mappings rather than a single whole-region
mapping, to avoid the waste of memory that occurs due to the
constraint that each mapped block must be aligned on its own size.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Backport some changes from the Hermon driver to the Arbel driver.
Specifically:
o Rename reserved_lkey to lkey
o Add arbel_rate() to calculate transmission rates
o Structure code to allow for addition of RC queue pairs
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Reduce the amount of ICM space required by choosing to order the
various allocations in approximately descending order of alignment
requirements.
This saves approximately 512kB of host memory.
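The effect of ordering can be illustrated with a small layout
calculation, assuming for simplicity that each table is a power of
two in size and must be aligned on its own size. The names and table
sizes are hypothetical.

```c
#include <stddef.h>

/* Round "offset" up to a multiple of "align" (a power of two) */
static size_t align_up(size_t offset, size_t align)
{
	return ((offset + align - 1) & ~(align - 1));
}

/* Total space used when tables are placed in the given order, each
 * aligned on its own size. */
static size_t layout_size(const size_t *sizes, size_t count)
{
	size_t offset = 0;
	size_t i;

	for (i = 0; i < count; i++) {
		offset = align_up(offset, sizes[i]);
		offset += sizes[i];
	}
	return offset;
}
```

Placing tables of 4kB, 64kB and 16kB in that order occupies 0x24000
bytes because of alignment padding, whereas the descending order
64kB, 16kB, 4kB occupies only 0x15000 bytes.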
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The current method for ICM allocation exactly matches the addresses
chosen by the old Etherboot driver, but does not match the
specification. Some ICM tables (notably the queue pair context table)
therefore end up incorrectly aligned.
Fix by performing allocations as per the specification.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Improve the utility of debugging messages by including the relevant
port number, queue number (QPN, CQN, EQN), work queue entry (WQE)
number, and physical addresses wherever applicable.
Add arbel_dump_cqctx() for dumping a completion queue context and
arbel_dump_qpctx() for dumping a queue pair context.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
This is a backport of commit 0b1222f ("[hermon] Randomise the
high-order bits of queue pair numbers") to the Arbel driver.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
This is a backport of commit cd5a213 ("[hermon] Allow software GMA to
receive packets destined for QP1") to the Arbel driver.
This patch includes a correction to a bug in the autogenerated
hardware description header file.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Only port state change events are currently mapped to our event queue,
since those are the only events we are prepared to handle. This
ignores a potentially useful source of diagnostic information in the
case of unexpected failures.
Fix by mapping all events to the event queue; a build with debugging
enabled will therefore at least dump the raw content of the unexpected
events.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Only port state change events are currently mapped to our event queue,
since those are the only events we are prepared to handle. This
ignores a potentially useful source of diagnostic information in the
case of unexpected failures.
Fix by mapping all events to the event queue; a build with debugging
enabled will therefore at least dump the raw content of the unexpected
events.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
iPXE currently uses the first port's port GUID as the node GUID,
rather than using the (possibly distinct) real node GUID. This can
confuse opensm during the handover to a loaded OS: it thinks the port
already belongs to a different node and so discards our port
information with a warning message about duplicate ports. Everything
is picked up correctly on the second subnet sweep, after opensm has
established that the "old" node no longer exists, but this can delay
link-up unnecessarily by several seconds.
Fix by using the real node GUID.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
No event is generated upon reaching INIT, so we must poll separately
for link state changes while we remain DOWN.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
No event is generated upon reaching INIT, so we must poll separately
for link state changes while we remain DOWN.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
ib_smc_update() potentially updates the Infiniband port state, and so
should almost always be followed by a call to ib_link_state_changed().
The one exception is the call made to ib_smc_update() before the
device is registered.
Fix by removing explicit calls to ib_link_state_changed() from drivers
using ib_smc_update(), including a call to ib_link_state_changed()
within ib_smc_update(), and creating a separate ib_smc_init() for use
prior to device registration.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The sense key gives a first idea of what the problem might be, and so
is potentially useful in diagnosing problems in a non-debug build.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The block device interface used in gPXE predates the invention of even
the old gPXE data-transfer interface, let alone the current iPXE
generic asynchronous interface mechanism. Bring this old code up to
date, with the following benefits:
o Block device commands can be cancelled by the requestor. The INT 13
layer uses this to provide a global timeout on all INT 13 calls,
with the result that an unexpected passive failure mode (such as
an iSCSI target ACKing the request but never sending a response)
will lead to a timeout that gets reported back to the INT 13 user,
rather than simply freezing the system.
o INT 13,00 (reset drive) is now able to reset the underlying block
device. INT 13 users, such as DOS, that use INT 13,00 as a method
for error recovery now have a chance of recovering.
o All block device commands are tagged, with a numerical tag that
will show up in debugging output and in packet captures; this will
allow easier interpretation of bug reports that include both
sources of information.
o The extremely ugly hacks used to generate the boot firmware tables
have been eradicated and replaced with a generic acpi_describe()
method (exploiting the ability of iPXE interfaces to pass through
methods to an underlying interface). The ACPI tables are now
built in a shared data block within .bss16, rather than each
requiring dedicated space in .data16.
o The architecture-independent concept of a SAN device has been
exposed to the iPXE core through the sanboot API, which provides
calls to hook, unhook, boot, and describe SAN devices. This
allows for much more flexible usage patterns (such as hooking an
empty SAN device and then running an OS installer via TFTP).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Doorbell records are currently embedded within the completion queue
and receive work queue structures, which are allocated using zalloc()
and so have an alignment guarantee of only sizeof(void*), i.e. four
bytes. This is sufficient for the receive work queue, but not for the
completion queue, which requires an alignment guarantee of eight
bytes.
Though not guaranteed, it so happens that zalloc() will always return
a pointer that is exactly four bytes above a sixteen-byte boundary.
The completion queue doorbell record is therefore always misaligned,
and the value passed to the hardware via SW2HW_CQ is actually always
pointing to the page_offset value within the MTT descriptor (which
directly precedes the inline doorbell record). Provided that the page
offset is greater than 0x100, this looks to the hardware like an
update_ci value of greater than 0x010000 (taking into account
endianness differences), and so the hardware will happily deliver more
than 0x010000 completions before stopping. Hence this problem is
rarely observable.
Fix by allocating the doorbell records separately and using the
correct alignment constraints.
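The misalignment can be checked with simple address arithmetic. The
helpers below are a hypothetical illustration, not the driver's
allocator.

```c
#include <stddef.h>
#include <stdint.h>

/* True if "addr" is a multiple of "align" (a power of two) */
static int is_aligned(uintptr_t addr, size_t align)
{
	return ((addr & ((uintptr_t) align - 1)) == 0);
}

/* Round "addr" up to the next multiple of "align" */
static uintptr_t align_up(uintptr_t addr, size_t align)
{
	return ((addr + align - 1) & ~((uintptr_t) align - 1));
}
```

An address four bytes above a sixteen-byte boundary, such as 0x1004,
passes the four-byte check but fails the eight-byte check required
for the completion queue doorbell record; the next suitable address
is 0x1008.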
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Give completion queues a chance to deliver exception events by
programming in the number of our event queue (currently used only for
port state changes).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Improve the utility of debugging messages by including the relevant
port number, queue number (QPN, CQN, EQN), work queue entry (WQE)
number, and physical addresses wherever applicable.
Add hermon_dump_cqctx() for dumping a completion queue context, and
hermon_fill_nop_send_wqe() for inserting NOPs into send work queues.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
An attempt to transmit a packet of 8192 bytes or larger will collide
with the status bits in the TX descriptor. This gives the appearance
of the network card's transmit data path having just suddenly stopped
responding; iPXE is waiting for the card to report a TX completion
but, because of the status bit collision, the card thinks that the
descriptor has not yet been written.
Fix by explicitly checking for oversized packets in rtl_transmit().
Discovered during Fibre Channel over Ethernet testing, and debugged by
using gdb to examine the state of the emulated rtl8139 card in qemu.
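The check amounts to something like the following sketch. The
function name and the choice of error code are assumptions; only the
8192-byte limit comes from the description above.

```c
#include <errno.h>
#include <stddef.h>

/* Lengths of 8192 bytes or more would spill into the status bits */
#define TX_MAX_LEN 8192

/* Reject packets too large for the TX descriptor length field.
 * Returns 0 if the length is acceptable, or a negative error. */
static int check_tx_len(size_t len)
{
	if (len >= TX_MAX_LEN)
		return -ERANGE;
	return 0;
}
```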
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Replace the explicit calls from the Infiniband core to the IPoIB layer
with the general concept of an Infiniband upper-layer driver
(analogous to a PCI driver) which can create arbitrary devices on top
of Infiniband devices.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The rtl8139 driver includes the Ethernet CRC within the received
packet. All current protocols ignore trailing garbage, but FCoE
requires the frame length to be correct (since the FCoE footer
position is calculated from the end of the packet), so fix the driver
to strip out the CRC.
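The length adjustment is sketched below, assuming the standard 4-byte
Ethernet CRC. The function name is hypothetical.

```c
#include <stddef.h>

#define ETH_CRC_LEN 4

/* Length to hand to the network stack, given the length reported by
 * the hardware (which includes the trailing CRC).  Frames too short
 * to contain a CRC yield zero. */
static size_t rx_payload_len(size_t hw_len)
{
	return ((hw_len >= ETH_CRC_LEN) ? (hw_len - ETH_CRC_LEN) : 0);
}
```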
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add PCI ID 8086:27dc to the eepro100 driver.
Reported-by: Cédric Delmas <c.delmas@akka.eu>
Signed-off-by: Thomas Miletich <thomas.miletich@gmail.com>
Signed-off-by: Marty Connor <mdc@etherboot.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add the tap driver, which can be used as follows:
$ ./ipxe.linux --net tap,if=tap0,mac=00:0c:29:c5:39:a1
The if setting is mandatory.
Signed-off-by: Piotr Jaroszyński <p.jaroszynski@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add the base on which to build Linux drivers and the Linux UI code.
The UI fills in device requests, which the Linux root_driver later
walks and delegates to specific Linux drivers.
Signed-off-by: Piotr Jaroszyński <p.jaroszynski@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
alloc_memblock() and free_memblock() are internal.
Signed-off-by: Piotr Jaroszyński <p.jaroszynski@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The pcbios-specific get_memmap() is used by the b44 driver, making
all-drivers builds fail on other platforms. Move it to the I/O API
group and provide a dummy implementation on EFI.
Signed-off-by: Piotr Jaroszyński <p.jaroszynski@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
This patch adds a native iPXE forcedeth driver and removes the legacy
Etherboot forcedeth driver. It supports 40 different chips, compared
to the original 14.
It has been tested on a NIC with a CK804 Ethernet Controller.
Downloading five 100 MB images in a row took 12/11/11/11/11 seconds;
booting DSL using pxelinux also succeeded. The driver has also been
tested by chaining undionly.kpxe, and it worked.
Signed-off-by: Andrei Faur <da3drus@gmail.com>
Tested-by: Andrei Faur <da3drus@gmail.com>
Tested-by: Guo-Fu Tseng <cooldavid@cooldavid.org>
Signed-off-by: Stefan Hajnoczi <stefanha@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
This patch adds a native iPXE virtio-net driver and removes the legacy
Etherboot virtio-net driver. The main reasons for doing this are:
1. Multiple virtio-net NICs are now supported by iPXE. The legacy
driver kept global state and caused issues in virtual machines with
more than one virtio-net device.
2. Faster downloads. The native iPXE driver downloads 100 MB over
HTTP in 12s, the legacy Etherboot driver in 37s. This simple
benchmark uses KVM with tap networking and the Python
SimpleHTTPServer both running on the same host.
Changes to core virtio code reduce vring descriptors to 256 (QEMU uses
128 for virtio-blk and 256 for virtio-net) and change the opaque token
from u16 to void*. Lowering the descriptor count reduces memory
consumption. The void* opaque token change makes driver code simpler.
Signed-off-by: Stefan Hajnoczi <stefanha@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
This bug caused .probe to fail because the NIC did not reset properly.
Signed-off-by: Andrei Faur <da3drus@gmail.com>
Signed-off-by: Stefan Hajnoczi <stefanha@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add NonVolatile Option (nvo) and NonVolatile Storage (nvs) support to
the myri10ge driver using the EEPROM read/write mechanism provided by
the NIC's Vendor Specific PCI capability.
The myri10ge NIC is capable of storing 64KB or more of nonvolatile
options, but this patch advertises only 512 bytes of nvo storage
because iPXE malloc's a buffer matching the total size we advertise.
512 is plenty without wasting malloc'd memory. (The 2 other drivers
currently supporting nvo advertise 256 bytes or less.)
Signed-off-by: Stefan Hajnoczi <stefanha@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
This patch removes the cfg lookup made in the r8169 driver and
replaces it with equivalent information found in the driver_data field
of the pci_device_id structure.
Signed-off-by: Andrei Faur <da3drus@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Remove data-xfer as an interface type, and replace data-xfer
interfaces with generic interfaces supporting the data-xfer methods.
Filter interfaces (as used by the TLS layer) are handled using the
generic pass-through interface capability. A side-effect of this is
that deliver_raw() no longer exists as a data-xfer method. (In
practice this doesn't lose any efficiency, since there are no
instances within the current codebase where xfer_deliver_raw() is used
to pass data to an interface supporting the deliver_raw() method.)
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Standardise on using ref_init() to initialise an embedded reference
count, to match the coding style used by other embedded objects.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
This patch replaces the old pcnet32 driver with a new one that
uses iPXE's API.
Signed-off-by: Andrei Faur <da3drus@gmail.com>
Signed-off-by: Stefan Hajnoczi <stefanha@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
After changing the driver to refill after feed, if any error occurs a
non-contiguous empty buffer will be introduced in the ring due to my
reuse-buffer-when-error implementation.
Reported-by: Marty Connor <mdc@etherboot.org>
Signed-off-by: Guo-Fu Tseng <cooldavid@cooldavid.org>
Signed-off-by: Stefan Hajnoczi <stefanha@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
A new driver for JMicron Ethernet controller.
Reviewed-by: Joshua Oreman <oremanj@rwcr.net>
Reviewed-by: Michael Brown <mbrown@fensystems.co.uk>
Reviewed-by: Marty Connor <mdc@etherboot.org>
Signed-off-by: Guo-Fu Tseng <cooldavid@cooldavid.org>
Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add a new network driver that consumes the EFI Simple Network
Protocol. Also add a bus driver that can find the Simple Network
Protocol that iPXE was loaded from; the resulting behavior is similar
to the "undionly" driver for BIOS systems.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Fix up the whitespace errors inadvertently introduced by the
last-minute rename from the internal QLogic codename to "qib7322".
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Apart from format specifier fixes there are two changes in proper code:
- Change type of regs in skge_hw to unsigned long
- Cast result of sizeof in myri10ge to uint32_t
Neither changes anything on i386, and both should be fine on x86_64.
Signed-off-by: Piotr Jaroszyński <p.jaroszynski@gmail.com>
Signed-off-by: Joshua Oreman <oremanj@rwcr.net>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Access to the gpxe.org and etherboot.org domains and associated
resources has been revoked by the registrant of the domain. Work
around this problem by renaming the project from gPXE to iPXE, and
updating URLs to match.
Also update README, LOG and COPYRIGHTS to remove obsolete information.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Christopher Armenio reported link detection problems with an
integrated eepro100 NIC. Thomas Miletich removed link detection code
from the eepro100 driver and verified that the driver continued to
function. Christopher verified Thomas' patch on his integrated
eepro100 NIC.
Reported-by: Christopher Armenio <christopher.armenio@resquared.com>
Signed-off-by: Thomas Miletich <thomas.miletich@gmail.com>
Signed-off-by: Marty Connor <mdc@etherboot.org>
While building gPXE for openSUSE Factory (as part of the kvm package),
the compiler identified a few problems. This patch addresses them.
Signed-off-by: Bruce Rogers <brogers@novell.com>
Signed-off-by: Stefan Hajnoczi <stefanha@gmail.com>
Signed-off-by: Marty Connor <mdc@etherboot.org>
The interrupt control mechanism on Phantom cards has changed
substantially since the driver was initially written. This updates
the code to match the mechanism used in production firmware.
This is sufficient to allow DOS wget to function successfully using
the 3Com UNDI/NDIS, Intel UNDI/NDIS, and UNDIPD.COM UNDI/PD stacks.
Signed-off-by: Michael Brown <mcb30@etherboot.org>
This commit adds an igb (Intel GigaBit) driver based on Intel source
code available at:
http://sourceforge.net/projects/e1000/
which is upstream source for the Linux kernel e1000 drivers, and
should support some PCIe e1000 variants.
Signed-off-by: Marty Connor <mdc@etherboot.org>
This commit adds an e1000e driver based on Intel source code
available at:
http://sourceforge.net/projects/e1000/
which is upstream source for the Linux kernel e1000 drivers, and
should support many PCIe e1000 variants.
Signed-off-by: Marty Connor <mdc@etherboot.org>
This commit replaces the current gPXE e1000 driver with one ported
from Intel source code available at
http://sourceforge.net/projects/e1000/
which is upstream source for the Linux kernel e1000 drivers, and
should support most if not all PCI e1000 variants.
Signed-off-by: Marty Connor <mdc@etherboot.org>
The vxge driver code is split over several files, including vxge_main.c.
This causes the build system and ROM-o-matic to see the driver as
"vxge_main".
This patch adds a stub vxge.c which takes up no space but gives the
driver its proper name, "vxge".
Signed-off-by: Stefan Hajnoczi <stefanha@gmail.com>
Signed-off-by: Marty Connor <mdc@etherboot.org>
Align each ICM member alloc to the member size instead of page size.
Increase multicast table size to 128.
Signed-off-by: Itay Gazit <itaygazit@gmail.com>
Signed-off-by: Marty Connor <mdc@etherboot.org>
The rtl818x driver uses programmed I/O but has a fallback to
memory-mapped I/O registers. The fallback currently will not work since
the registers are accessed using inl()/outl() programmed I/O functions
in the driver. This patch removes the fallback so we fail cleanly when
programmed I/O is not possible.
Signed-off-by: Stefan Hajnoczi <stefanha@gmail.com>
Signed-off-by: Joshua Oreman <oremanj@rwcr.net>
Signed-off-by: Marty Connor <mdc@etherboot.org>
This driver uses programmed I/O to access hardware registers. There is
a stray memory-mapped I/O read on a programmed I/O address. Perhaps
this is an artifact of porting the driver. Fix this by converting it to
programmed I/O.
Signed-off-by: Stefan Hajnoczi <stefanha@gmail.com>
Signed-off-by: Marty Connor <mdc@etherboot.org>
This seems to be necessary for some types of PCI devices. We had
problems when using gPXE in KVM virtual machines with direct
PCI device access.
Signed-off-by: Bernhard Kohl <bernhard.kohl@nsn.com>
Signed-off-by: Shao Miller <shao.miller@yrdsb.edu.on.ca>
Modified-by: Marty Connor <mdc@etherboot.org>
Signed-off-by: Marty Connor <mdc@etherboot.org>
The previous [skge] commit should have been recorded as authored by
Thomas Miletich <thomas.miletich@gmail.com>
I mistakenly committed it improperly after fixing a merge issue.
Signed-off-by: Marty Connor <mdc@etherboot.org>
This code is based on the Linux skge driver. It supports Marvell Yukon
and SysKonnect Gigabit chipsets.
The code is based on code Michael Decker <mrd999@gmail.com> wrote for
Google Summer of Code 2008.
Support for dual-port cards is untested. The code, however, was left
in. In my opinion it's easier to fix the code if we need to, instead
of having to add support for it from scratch.
Signed-off-by: Marty Connor <mdc@etherboot.org>
This driver supports all current Myricom 10 gigabit Ethernet NICs.
It was written from scratch for gPXE by Glenn Brown <glenn@myri.com>,
referencing Myricom's Linux and EFI drivers, with permission.
Signed-off-by: Glenn Brown <glenn@myri.com>
Signed-off-by: Stefan Hajnoczi <stefanha@gmail.com>
Signed-off-by: Marty Connor <mdc@etherboot.org>
This version is based on Michael Decker's GSoC 2008 code.
A number of cleanups and fixes were applied.
Earlier-version-reviewed-by: Marty Connor <mdc@etherboot.org>
Earlier-version-tested-by: Marty Connor <mdc@etherboot.org>
Earlier-version-tested-by: Shao Miller <Shao.Miller@yrdsb.edu.on.ca>
Reviewed-by: Stefan Hajnoczi <stefanha@gmail.com>
Reviewed-by: Joshua Oreman <oremanj@rwcr.net>
Signed-off-by: Marty Connor <mdc@etherboot.org>
The 82571 supports an alternate MAC address location in NVRAM.
When this is set, use this for the MAC rather than the default
physical MAC address.
Ported from linux-2.6.git 93ca161027eb6a1761fb674ad7b995aedccf5f6e
Signed-off-by: Alex Williamson <alex.williamson@hp.com>
Tested-by: Thomas Miletich <thomas.miletich@gmail.com>
Signed-off-by: Marty Connor <mdc@etherboot.org>
The latest RTL-generated register lists include (mostly redundant)
xxx_MSB values alongside xxx_LSB and xxx_RMASK, and also include
default register values.
Some subnet managers expect the GetResponse from a SetPortInfo MAD to
contain the new link state. The transition is not immediate, so we
often end up returning the previous link state. This can cause the SM
to fail to activate the port.
Fix by waiting for up to 20us for the link state transition to take
effect.
The first byte of the IPoIB MAC address is used for flags indicating
support for "connected mode". Strip out the non-QPN bits of the first
dword when constructing the address vector for transmitted IPoIB
packets, so as not to end up passing an invalid QPN in the BTH.
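As a minimal sketch of the masking described above (not the driver's actual code): queue pair numbers are 24 bits wide, so the top byte of the first dword can be discarded. The function name and the QPN_MASK constant are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* Queue pair numbers are 24 bits wide */
#define QPN_MASK 0x00ffffffUL

/*
 * Extract the QPN from the first four bytes of an IPoIB MAC address.
 * The bytes are in network (big-endian) order; the top byte holds
 * flags and is discarded.
 */
static uint32_t ipoib_mac_qpn ( const uint8_t *mac ) {
	uint32_t first_dword;

	assert ( mac != NULL );
	first_dword = ( ( ( uint32_t ) mac[0] << 24 ) |
			( ( uint32_t ) mac[1] << 16 ) |
			( ( uint32_t ) mac[2] << 8 ) |
			( ( uint32_t ) mac[3] << 0 ) );
	return ( ( uint32_t ) ( first_dword & QPN_MASK ) );
}
```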
Error message was:
[BUILD] bin/atl1e.o
cc1: warnings being treated as errors
drivers/net/atl1e.c: In function 'atl1e_get_permanent_address':
drivers/net/atl1e.c:1326: error: dereferencing type-punned pointer will break strict-aliasing rules
make: *** [bin/atl1e.o] Error 1
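One common way to avoid this class of error is to copy through memcpy() instead of dereferencing a cast pointer. The sketch below shows the general pattern only; it is not necessarily the change made to atl1e.c, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Store a 32-bit value into a byte array without type punning.
 * A cast such as *( ( uint32_t * ) buf ) = value would break the
 * strict-aliasing rules; memcpy() is well defined.
 */
static void store_u32 ( uint8_t *buf, uint32_t value ) {
	assert ( buf != NULL );
	memcpy ( buf, &value, sizeof ( value ) );
}

/* Load a 32-bit value from a byte array without type punning */
static uint32_t load_u32 ( const uint8_t *buf ) {
	uint32_t value;

	assert ( buf != NULL );
	memcpy ( &value, buf, sizeof ( value ) );
	return value;
}
```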
Reported-by: Giandomenico De Tullio <ghisha@email.it>
Signed-off-by: Michael Brown <mcb30@etherboot.org>
Modified-by: Michael Brown <mcb30@etherboot.org>
Remove spaces in 3rd PCI_ROM field.
Debugged-by: Marty Connor <mdc@etherboot.org>
Reported-by: Giandomenico De Tullio <ghisha@email.it>
Signed-off-by: Michael Brown <mcb30@etherboot.org>
The iBFT is Ethernet-centric in providing only six bytes for a MAC
address. This is most probably an indirect consequence of a similar
design flaw in the Windows NDIS stack. (The WinOF IPoIB stack
performs all sorts of contortions in order to pretend to the NDIS
layer that it is dealing with six-byte MAC addresses.)
There is no sensible way in which to extend the iBFT without breaking
compatibility with programs that expect to parse it. Add the notion
of an "Ethernet-compatible" MAC address to our link layer abstraction,
so that link layers can provide their own workarounds for this
limitation.
gcc 3.3.3 gave the following error when compiling sis190.c
drivers/net/sis190.c: In function 'sis190_get_mac_addr_from_apc':
drivers/net/sis190.c:966: warning: 'isa_bridge' might be used
uninitialized in this function
make: *** [bin/sis190.o] Error 1
This patch allows error-free compilation.
Signed-off-by: Marty Connor <mdc@etherboot.org>
Some BIOSes set the PCI cacheline size to zero for the card; the ath5k
driver fixes it to a reasonable value in PCI config space, but failed to
correct the internal value it had already read. This resulted in
divide-by-zero errors when cacheline-aligning various data structures.
Fix by setting the internal cachelsz to a sane value at the same time
as we write that value to PCI config space.
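A simplified sketch of the idea, using a simulated device rather than the real ath5k structures. The structure, the function names and the DEFAULT_CACHELINE fallback (expressed here in bytes) are all assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical fallback, in bytes, used when the BIOS reports zero */
#define DEFAULT_CACHELINE 32

struct fake_nic {
	unsigned int pci_cachelsz;	/* value in simulated PCI config space */
	unsigned int cachelsz;		/* driver's internal copy */
};

/* Fix up a zero cacheline size in both places at the same time */
static void fix_cacheline ( struct fake_nic *nic ) {
	nic->cachelsz = nic->pci_cachelsz;
	if ( nic->cachelsz == 0 ) {
		nic->pci_cachelsz = DEFAULT_CACHELINE;
		nic->cachelsz = DEFAULT_CACHELINE;
	}
}

/* Round len up to a multiple of the cacheline size */
static size_t cacheline_align ( const struct fake_nic *nic, size_t len ) {
	assert ( nic->cachelsz != 0 );	/* would otherwise divide by zero */
	return ( ( ( len + nic->cachelsz - 1 ) / nic->cachelsz ) *
		 nic->cachelsz );
}
```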
Signed-off-by: Marty Connor <mdc@etherboot.org>
This adds basic rfkill support for enabling the wireless card on certain
laptops, and changes miscellaneous other details that may help in obscure
cases.
Also change the error handling to not report CRC errors, which due to the
basic facts of wireless may happen even more frequently than valid packets.
Signed-off-by: Marty Connor <mdc@etherboot.org>
Add the 82576 to the e1000 driver. This is based on:
- Examination of the Linux 2.6.30-rc4 igb driver, which supports this
card; and
- Information available in the Intel® 82576 Gigabit Ethernet
Controller Datasheet v2.1, which is available from Intel's web site.
I only have a dual-ported card with Copper PHY, so any code paths relating
to Fibre haven't been tested. Also, I have only tested using auto-negotiation
of speed and duplex, and no flow control. Other code paths relating to
those settings also have not been exercised.
Signed-off-by: Simon Horman <horms@verge.net.au>
Sponsored-by: Thomas Miletich <thomas.miletich@gmail.com>
Modified-by: Thomas Miletich <thomas.miletich@gmail.com>
Modified-by: Marty Connor <mdc@etherboot.org>
Signed-off-by: Marty Connor <mdc@etherboot.org>
Enable interrupts in sis900_irq(). Doing so allows some programs using
gPXE's UNDI interface to work properly, including Symantec Ghost.
Tested-by: Hubert Mercier <hubert.mercier@unilim.fr>
Signed-off-by: Marty Connor <mdc@etherboot.org>
Both methods disabled packet tx and rx just to have it enabled again
by calling a3c90x_reset().
Fixed by disabling tx and rx after the call to a3c90x_reset().
Tested by booting Ubuntu Intrepid (8.10) directly from gPXE and pxelinux.
Tested on 3c905, 3c905B, 3c905C.
Signed-off-by: Marty Connor <mdc@etherboot.org>
Some systems will retry their boot sequence in the event of a boot
failure. On these systems, the second and subsequent boot attempts
will fail to initialise the Hermon HCA.
Fix by resetting the HCA during probe(). This incurs a one-second
cost, but there seems to be no viable alternative.
Originally-fixed-by: Itay Gazit <itaygazit@gmail.com>
Some devices can only be reset via a mechanism that also resets the
card's PCI core, thus necessitating a backup and restore of all or
part of the PCI configuration space across a reset.
IPoIB has a 20-byte link-layer address, of which only eight bytes
represent anything relating to a "hardware address".
The PXE and EFI SNP APIs expect the permanent address to be the same
size as the link-layer address, so fill in the "permanent address"
field with the initial link layer address (as generated by
register_netdev() based upon the real hardware address).
The hardware address is an intrinsic property of the hardware, while
the link-layer address can be changed at runtime. This separation is
exposed via APIs such as PXE and EFI, but is currently elided by gPXE.
Expose the hardware and link-layer addresses as separate properties
within a net device. Drivers should now fill in hw_addr, which will
be used to initialise ll_addr at the time of calling
register_netdev().
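The relationship between the two addresses can be sketched as follows. This is an illustration only: the structure and function below are simplified stand-ins, MAX_ADDR_LEN is a hypothetical constant, and both addresses are assumed to have the same length.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_ADDR_LEN 20	/* hypothetical; large enough for IPoIB */

struct fake_netdev {
	uint8_t hw_addr[MAX_ADDR_LEN];	/* filled in by the driver */
	uint8_t ll_addr[MAX_ADDR_LEN];	/* may be changed at runtime */
	size_t addr_len;
};

/* Stand-in for registration: initialise ll_addr from hw_addr */
static void fake_register_netdev ( struct fake_netdev *netdev ) {
	assert ( netdev->addr_len <= MAX_ADDR_LEN );
	memcpy ( netdev->ll_addr, netdev->hw_addr, netdev->addr_len );
}
```

Changing ll_addr after registration leaves hw_addr untouched, which is the separation that APIs such as PXE and EFI expect to see.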
With iSCSI, connection attempts are expensive; it may take many
seconds to determine that a connection will fail. SRP connection
attempts are much less expensive, so we may as well avoid the
"optimisation" of declaring a state of permanent failure after a
certain number of attempts. This allows a gPXE SRP initiator to
resume operations after an arbitrary amount of SRP target downtime.
SRP is the SCSI RDMA Protocol. It allows for a method of SAN booting
whereby the target is responsible for reading and writing data using
Remote DMA directly to the initiator's memory. The software initiator
merely sends and receives SCSI commands; it never has to touch the
actual data.
The ACK timeout determines how long we take to notice a failed
Reliable Connection. Reducing it from the arbitrary value of 19 down
to 14 reduces the individual ACK timeout from around 2.1s to 67ms;
this in turn reduces the time to tear down and re-establish a broken
SRP session from around 30s to around 1s.
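The figures above follow from the Infiniband encoding of the ACK timeout as 4.096us * 2^value. A small sketch of the arithmetic (the function name is hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Convert an Infiniband ACK timeout exponent to nanoseconds:
 * timeout = 4.096us * 2^exponent.
 */
static uint64_t ib_ack_timeout_ns ( unsigned int exponent ) {
	assert ( exponent < 32 );
	return ( ( uint64_t ) 4096 << exponent );
}
```

With this encoding, a value of 19 gives 2147483648ns (about 2.1s) and a value of 14 gives 67108864ns (about 67ms).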
The Infiniband Communication Manager will refuse to establish a
connection if it believes the connection is already established.
There is no immediately obvious way to ask it to tear down the
existing connection and replace it; to issue a DREP we would need to
know the local and remote communication IDs used for the previous
connection setup.
We can work around this by randomising the high-order bits of the
queue pair number; these have no significance to the hardware, but are
sufficient to convince the IB CM that this is a different connection.
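A sketch of the workaround, under the assumption that some upper portion of the 24-bit QPN is ignored by the hardware. Exactly which bits are ignored is device specific; the QPN_RANDOM_MASK value and the function name below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define QPN_MASK	0x00ffffffUL	/* QPNs are 24 bits wide */
#define QPN_RANDOM_MASK	0x00fff000UL	/* hypothetical ignored bits */

/*
 * Combine the QPN that the hardware actually uses with random
 * high-order bits, so that each connection attempt presents a
 * different QPN to the remote communication manager.
 */
static uint32_t randomise_qpn ( uint32_t hw_qpn, uint32_t random_bits ) {
	assert ( ( hw_qpn & ~( QPN_MASK & ~QPN_RANDOM_MASK ) ) == 0 );
	return ( ( uint32_t ) ( ( random_bits & QPN_RANDOM_MASK ) | hw_qpn ) );
}
```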
The prior net80211 model of physical-layer behavior for drivers was
overly simplistic and limited the drivers that could be written. To
be more flexible, split the driver-provided list of supported rates by
band, and add a means for specifying a list of supported channels.
Allow drivers to specify a hardware channel value that will be tied to
uses of the channel.
Expose net80211_duration() to drivers, and make the rate it uses in
its computations configurable, so that it can be used in calculating
durations that must be set in hardware for ACK and CTS packets. Add
net80211_cts_duration() for the common case of calculating the
duration for a CTS packet.
Signed-off-by: Michael Brown <mcb30@etherboot.org>
The IBA specification refers to management "interfaces" and "agents".
The interface is the component that connects to the queue pair and
sends and receives MADs; the agent is the component that constructs
the reply to the MAD.
Rename the IB_{QPN,QKEY,QPT} constants as a first step towards making
this separation in gPXE.
The Linux IB Communication Manager will always send MADs to QP1,
rather than back to the originating QP. On Hermon, QP1 is by default
handled by the embedded firmware. We can change this, but the cost is
that we have to handle both QP0 and QP1 (i.e. we have to provide SMA
as well as GMA service in software), and we have to use MLX queues
rather than standard UD queues (i.e. we have to construct the UD
datagrams by hand).
There doesn't seem to be any viable way around this situation, ugly
though it is.
Queue pairs are now assumed to be created in the INIT state, with a
call to ib_modify_qp() required to bring the queue pair to the RTS
state.
ib_modify_qp() no longer takes a modification list; callers should
modify the relevant queue pair parameters (e.g. qkey) directly and
then call ib_modify_qp() to synchronise the changes to the hardware.
The packet sequence number is now a property of the queue pair, rather
than of the device.
Each queue pair may have an associated address vector. For RC queue
pairs, this is the address vector that will be programmed in to the
hardware as the remote address. For UD queue pairs, it will be used
as the default address vector if none is supplied to ib_post_send().
The queue key is stored as a property of the queue pair, and so can
optionally be added by the Infiniband core at the time of calling
ib_post_send(), rather than always having to be specified by the
caller.
This allows IPoIB to avoid explicitly keeping track of the data queue
key.
Now that path record lookups are handled entirely via
ib_resolve_path(), the only role of the IPoIB peer cache is as a
lookup table for MAC addresses. Update the code structure and
comments to reflect this.
The IPoIB broadcast MAC address varies according to the partition key.
Now that the broadcast MAC address is a property of the network device
rather than of the link layer, we can expose this real MAC address
directly.
The broadcast LID is now identified via a path record lookup; this is
marginally inefficient (since it was present in the MCMemberRecord
GetResponse), but avoids the need to special-case broadcasts when
constructing the address vector in ipoib_transmit().
Currently, all Infiniband users must create a process for polling
their completion queues (or rely on a regular hook such as
netdev_poll() in ipoib.c).
Move instead to a model whereby the Infiniband core maintains a single
process calling ib_poll_eq(), and polling the event queue triggers
polls of the applicable completion queues. (At present, the
Infiniband core simply polls all of the device's completion queues.)
Polling a completion queue will now implicitly refill all attached
receive work queues; this is analogous to the way that netdev_poll()
implicitly refills the RX ring.
Infiniband users no longer need to create a process just to poll their
completion queues and refill their receive rings.
IPoIB and the SMA have separate constants for the packet size to be
used for I/O buffer allocations. Merge these into the single
IB_MAX_PAYLOAD_SIZE constant.
(Various other points in the Infiniband stack have hard-coded
assumptions of a 2048-byte payload; we don't currently support
variable MTUs.)
IPoIB has a link-layer broadcast address that varies according to the
partition key. We currently go through several contortions to pretend
that the link-layer address is a fixed constant; by making the
broadcast address a property of the network device rather than the
link-layer protocol it will be possible to simplify IPoIB's broadcast
handling.
Move the icky call to step() from aoe.c to ata.c; this takes it at
least one step further away from where it really doesn't belong.
Unfortunately, AoE has the ugly aoe_discover() mechanism which means
that we still have a step() loop in aoe.c for now; this needs to be
replaced at some future point.
In order to construct outgoing link-layer frames or parse incoming
ones properly, some protocols (such as 802.11) need more state than is
available in the existing variables passed to the link-layer protocol
handlers. To remedy this, add struct net_device *netdev as the first
argument to each of these functions, so that more information can be
fetched from the link layer-private part of the network device.
Updated all three call sites (netdevice.c, efi_snp.c, pxe_undi.c) and
both implementations (ethernet.c, ipoib.c) of ll_protocol to use the
new argument.
Signed-off-by: Michael Brown <mcb30@etherboot.org>
Several SPI chips will respond to an SPI read command with a dummy
zero bit immediately prior to the first real data bit. This can be
used to autodetect the address length, provided that the command
length and data length are already known, and that the MISO data line
is tied high.
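The detection can be simulated as below. This is a sketch under stated assumptions, not the bit-bashing code itself: it assumes that zero-valued address bits are clocked out one at a time, that MISO is sampled once per address bit, and that the dummy zero bit becomes visible while the final address bit is being clocked. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * miso[i] is the level sampled on MISO while address bit i is
 * clocked out.  With MISO tied high, every sample reads as 1 until
 * the device drives its dummy zero bit.  Returns the detected
 * address length in bits, or -1 if no zero bit is seen within
 * max_len bits.
 */
static int spi_detect_address_len ( const unsigned char *miso,
				    size_t max_len ) {
	size_t len;

	assert ( miso != NULL );
	for ( len = 1 ; len <= max_len ; len++ ) {
		if ( miso[ len - 1 ] == 0 )
			return ( ( int ) len );
	}
	return -1;
}
```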
Tested-by: Thomas Miletich <thomas.miletich@gmail.com>
Debugged-by: Thomas Miletich <thomas.miletich@gmail.com>
The pcnet32 driver mismanages its RX buffers, with the result that
packets get corrupted if more than one packet arrives between calls to
poll().
Originally-fixed-by: Bill Lortz <Bill.Lortz@premier.org>
Reviewed-by: Stefan Hajnoczi <stefanha@gmail.com>
Tested-by: Stefan Hajnoczi <stefanha@gmail.com>
Also adds the MAC_ADDR_CORRECT flag, to indicate whether or not the
MAC address needs to be fixed up by the driver.
Signed-off-by: Michael Brown <mcb30@etherboot.org>
This is a major rewrite of the legacy etherboot 3c90x driver using the
gPXE API for much improved performance over the legacy driver it
replaces.
This driver has been tested on 3c905, 3c905B, and 3c905C cards.
Reviewed-by: Stefan Hajnoczi <stefanha@gmail.com>
Reviewed-by: Marty Connor <mdc@etherboot.org>
Tested-by: Marty Connor <mdc@etherboot.org>
Tested-by: Daniel Verkamp <daniel@drv.nu>
Signed-off-by: Marty Connor <mdc@etherboot.org>
Intel's C compiler (icc) chokes on the zero-length arrays that we
currently use as part of the mechanism for accessing linker table
entries. Abstract away the zero-length arrays, to make a port to icc
easier.
Introduce macros such as for_each_table_entry() to simplify the common
case of iterating over all entries in a linker table.
Represent table names as #defined string constants rather than
unquoted literals; this avoids visual confusion between table names
and C variable or type names, and also allows us to force a
compilation error in the event of incorrect table names.
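The shape of the iteration macro can be illustrated as follows. This is a simplified sketch: a plain array stands in for the linker-collected table, so the table_start() and table_end() definitions here are assumptions and differ from the real linker table mechanism. The table contents are made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct command {
	const char *name;
};

/* Stand-in for a linker-collected table (assumption: a plain array) */
static struct command command_table[] = {
	{ "dhcp" }, { "ifstat" }, { "boot" },
};

#define table_start( table ) ( &(table)[0] )
#define table_end( table ) \
	( &(table)[ sizeof ( table ) / sizeof ( (table)[0] ) ] )

/* Iterate over every entry without exposing how bounds are found */
#define for_each_table_entry( pointer, table )			\
	for ( (pointer) = table_start ( table ) ;		\
	      (pointer) < table_end ( table ) ; (pointer)++ )

static struct command * find_command ( const char *name ) {
	struct command *command;

	assert ( name != NULL );
	for_each_table_entry ( command, command_table ) {
		if ( strcmp ( command->name, name ) == 0 )
			return command;
	}
	return NULL;
}
```

Callers only ever see for_each_table_entry(), so the way the start and end of the table are located can change without touching them.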
Following the example of the Linux driver, we add a check and delay to
make sure that the NIC has finished resetting before the driver issues
any additional commands.
Signed-off-by: Marty Connor <mdc@etherboot.org>
This previously unsupported NIC variant was found to work using
the current driver:
PCI_ROM(0x13f0, 0x0200, "ip100a", "IC+ IP100A"),
Signed-off-by: Marty Connor <mdc@etherboot.org>
Some targets send a spurious CHECK CONDITION message in response to
the first SCSI command. We issue (and ignore the status of) an
arbitrary harmless SCSI command (a READ CAPACITY (10)) in order to draw
out this response.
The Solaris Comstar target seems to send more than one spurious CHECK
CONDITION response. Attempt up to SCSI_MAX_DUMMY_READ_CAP dummy READ
CAPACITY (10) commands before assuming that error responses are
meaningful.
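The retry logic can be sketched against a simulated target. Everything here is illustrative: the fake target, the function names and the value chosen for SCSI_MAX_DUMMY_READ_CAP are assumptions, and the real limit is whatever the SCSI code defines.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical value; the real limit is defined by the SCSI code */
#define SCSI_MAX_DUMMY_READ_CAP 10

/* Simulated target that fails its first few commands */
struct fake_target {
	unsigned int spurious_errors;	/* CHECK CONDITIONs still to send */
	unsigned int commands;		/* commands received so far */
};

/* Simulated READ CAPACITY (10): returns 0 on success, -1 on error */
static int fake_read_capacity_10 ( struct fake_target *target ) {
	target->commands++;
	if ( target->spurious_errors ) {
		target->spurious_errors--;
		return -1;
	}
	return 0;
}

/* Issue dummy commands until one succeeds or the limit is reached */
static int scsi_draw_out_errors ( struct fake_target *target ) {
	unsigned int i;

	assert ( target != NULL );
	for ( i = 0 ; i < SCSI_MAX_DUMMY_READ_CAP ; i++ ) {
		if ( fake_read_capacity_10 ( target ) == 0 )
			return 0;
	}
	return -1;	/* errors are now assumed to be meaningful */
}
```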
Problem reported by Kristof Van Doorsselaere <kvandoor@aserver.com>
and Shiva Shankar <802.11e@gmail.com>.
Driver was storing the result of pci_bar_start() and pci_bar_size() in
an int, rather than an unsigned long.
(Bug was introduced in the vendor's tree in commit eac85cd "Port
etherfabric driver to net_device api".)
adjust_pci_device() has historically enabled bus-mastering and I/O
cycles, but has never previously needed to enable memory cycles. Some
EFI systems seem not to enable memory cycles by default, so add that
to the list of PCI command register bits that we force on.
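In terms of the standard PCI command register bits, the change amounts to adding the memory-space bit to the set that is forced on. A sketch operating on a simulated register value (the function name is hypothetical, and real code would read and write PCI configuration space):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Standard PCI command register bits */
#define PCI_COMMAND_IO		0x0001	/* I/O cycles */
#define PCI_COMMAND_MEM		0x0002	/* memory cycles */
#define PCI_COMMAND_MASTER	0x0004	/* bus mastering */

/*
 * Force on the required bits in a simulated command register.
 * Returns nonzero if the register had to be rewritten.
 */
static int adjust_command ( uint16_t *command ) {
	uint16_t new_command;

	assert ( command != NULL );
	new_command = ( *command | PCI_COMMAND_MASTER |
			PCI_COMMAND_MEM | PCI_COMMAND_IO );
	if ( new_command == *command )
		return 0;
	*command = new_command;
	return 1;
}
```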
When compiling for the Linux kernel, PCI_BASE_ADDRESS_0 == 0, and
PCI_BASE_ADDRESS_1 == 1. This is not so when compiling for gPXE. We
must use the symbolic names rather than integers to get the correct
values.
Bug identified and patch supplied by:
George Chou <george.chou@advantech.com>
The patch file supplied for commit 3a799e9 ("[hermon] Add PCI ID for
ConnectX QDR card") accidentally marked drivers/infiniband/hermon.c as
being executable.
This driver is based on Stefan Hajnoczi's summer work, which
is in turn based on version 1.01 of the Linux b44 driver.
I just assembled the pieces and fixed/added a few pieces
here and there to make it work for my hardware.
The most major limitation is that this driver won't work
on systems with >1GB RAM due to the card not having enough
address bits for that and gPXE not working around this
limitation.
Still, other than that the driver works well enough for
at least 2 users :) and the above limitation can always
be fixed when somebody wants it bad enough :)
Signed-off-by: Pantelis Koukousoulas <pktoss@gmail.com>
This brings us in to line with Linux definitions, and also simplifies
adding x86_64 support since both platforms have 2-byte shorts, 4-byte
ints and 8-byte long longs.
The return path in directed route SMPs lists the egress ports in order
from SM to node, rather than from node to SM.
To write to the correct offset within the return path, we need to
parse the hop pointer. This is held within the class-specific data
portion of the MAD header, which was previously unused by us and
defined to be a uint16_t. Define this field to be a union type; this
requires some rearrangement of ib_mad.h and corresponding changes to
ipoib.c.
These cards very nearly support our current IB Verbs model. There is
one minor difference: multicast packets will always be delivered by
the hardware to QP0, so the driver has to redirect them to the
appropriate QP. This means that QP owners may see receive completions
for buffers that they never posted. Nothing in our current codebase
will break because of this.
This can be used with cards that require the driver to construct and
parse packet headers manually. Headers are optionally handled
out-of-line from the packet payload, since some such cards will split
received headers into a separate ring buffer.
Some Infiniband cards will not be as accommodating as the Arbel and
Hermon cards in providing enough space for us to push a fake extra
header at the start of the received packet. We must therefore make do
with squeezing enough information to identify source and destination
addresses into the two bytes of padding within a genuine IPoIB
link-layer header.
Not all Infiniband cards have embedded subnet management agents.
Split out the code that communicates with such an embedded SMA into a
separate ib_smc.c file, and have drivers call ib_smc_update()
explicitly when they suspect that the answers given by the embedded
SMA may have changed.
Receive completion handlers now get passed an address vector
containing the information extracted from the packet headers
(including the GRH, if present), and only the payload remains in the
I/O buffer.
This breaks the symmetry between transmit and receive completions, so
remove the ib_completer_t type and use an ib_completion_queue_operations
structure instead.
Rename the "destination QPN" and "destination LID" fields in struct
ib_address_vector to reflect its new dual usage.
Since the ib_completion structure now contains only an IB status code
("syndrome"), replace it with a generic gPXE integer status code.
Avoid leaking I/O buffers in ib_destroy_qp() by completing any
outstanding work queue entries with a generic error code. This
requires the completion handlers to be available to ib_destroy_qp(),
which is done by making them static configuration parameters of the CQ
(set by ib_create_cq()) rather than being provided on each call to
ib_poll_cq().
This mimics the functionality of netdev_{tx,rx}_flush(). The netdev
flush functions would previously have been catching any I/O buffers
leaked by the IPoIB data queue (though not by the IPoIB metadata
queue).
Add the simplified ne2k_isa driver. It is just a selective copy+paste
of the relevant parts from ns8390.c plus a little trivial hacking to
make it actually work.
It is true that the code is pretty ugly, but:
a) ns8390.c is worse
b) It is only 372 lines and no #ifdefs
c) It works both in qemu/bochs and in real hardware
and we all know it is easier to clean up working code
Hope someone will find the time to rewrite this driver properly,
but until then at least for me this is an ok solution.
Signed-off-by: Pantelis Koukousoulas <pktoss@gmail.com>
Halting the PEGs breaks platforms where there is sideband access to
the NIC (e.g. HP machines using iLO). (We have to retain the
unhalting code because on some other platforms (e.g. IBM blades with
BOFM) the pre-PXE firmware must halt the PEGs to avoid issues with the
BIOS rereading via the expansion ROM BAR.)
This is something of an ugly hack to accommodate an OEM requirement.
The NIC has only one expansion ROM BAR, rather than one per port. To
allow individual ports to be selectively enabled/disabled for PXE boot
(as required), we must therefore leave the expansion ROM always
enabled, and place the per-port enable/disable logic within the gPXE
driver.
The Phantom firmware selectively disables PCI functions based on the
board type, with the end result that we see one PCI function for each
network port. This allows us to eliminate the code for reading from
flash and, more importantly, removes knowledge of the board type magic
number from the gPXE driver.
Settings can be constructed using a dotted-decimal notation, to allow
for access to unnamed settings. The default interpretation is as a
DHCP option number (with encapsulated options represented as
"<encapsulating option>.<encapsulated option>").
In several contexts (e.g. SMBIOS, Phantom CLP), it is useful to
interpret the dotted-decimal notation as referring to non-DHCP
options. In this case, it becomes necessary for these contexts to
ignore standard DHCP options, otherwise we end up trying to, for
example, retrieve the boot filename from SMBIOS.
Allow settings blocks to specify a "tag magic". When dotted-decimal
notation is used to construct a setting, the tag magic value of the
originating settings block will be ORed in to the tag number.
Store/fetch methods can then check for the magic number before
interpreting arbitrarily-numbered settings.
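A sketch of how the tag magic might be applied and checked. The magic value, the mask, and the function names are hypothetical, and the parser assumes well-formed input with each component in the range 0-255.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical magic occupying the top byte of the tag */
#define SMBIOS_TAG_MAGIC	0x5b000000UL
#define TAG_MAGIC_MASK		0xff000000UL

/* Parse "a.b" into ( ( a << 8 ) | b ) and OR in the tag magic */
static uint32_t parse_tag ( const char *name, uint32_t tag_magic ) {
	uint32_t tag = 0;
	char *end;

	assert ( name != NULL );
	for ( ; ; ) {
		tag = ( ( tag << 8 ) |
			( uint32_t ) ( strtoul ( name, &end, 10 ) & 0xff ) );
		if ( *end != '.' )
			break;
		name = ( end + 1 );
	}
	return ( tag | tag_magic );
}

/* Check whether a tag was constructed by a given settings block */
static int tag_is_ours ( uint32_t tag, uint32_t tag_magic ) {
	return ( ( tag & TAG_MAGIC_MASK ) == tag_magic );
}
```

A fetch method for the non-DHCP block would call tag_is_ours() first and refuse anything else, so a standard DHCP option such as the boot filename is never looked up in SMBIOS.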
This interface provides access to firmware settings (e.g. MAC address)
that will apply to all drivers loaded for the duration of the current
system boot.
A hardware bug means that reads through the expansion ROM BAR can
return corrupted data if the PEGs are running. This breaks platforms
that re-read the expansion ROM after invoking gPXE code, such as IBM
blade servers.
Halt PEGs during driver shutdown, and unhalt PEGs during driver
startup if we detect that this is not the first startup since
power-on.
Most other Phantom drivers define a register space in terms of a 64M
virtual address space. While this doesn't map in any meaningful way
to the actual addresses used on the latest cards, it makes maintenance
easier if we do the same.
The virtnet_transmit() logic for waiting for the packet to be transmitted
is reversed: we can't wait for the packet to be transmitted if we haven't
yet called kick() on the ring. The vring_more_used() while loop logic is
also reversed, which explains why the code works today.
The current code risks trying to free a buffer from the used ring
when none is available. This will happen most of the time, because KVM
doesn't handle the packet immediately on kick(). Luckily it was working
because it was unlikely to have a buffer still queued for transmit when
virtnet_transmit() was called.
Also add a BUG_ON() to vring_get_buf(), to catch cases where we try
to free a buffer from the used ring when there was none available.
Patch for Etherboot. gPXE has the same problem in its code, but I haven't
had a chance to test gPXE using virtio-net yet.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@gmail.com>
EFI requires us to be able to specify the source address for
individual transmitted packets, and to be able to extract the
destination address on received packets.
Take advantage of this to rationalise the push() and pull() methods so
that push() takes a (dest,source,proto) tuple and pull() returns a
(dest,source,proto) tuple.
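For Ethernet, the symmetric pair might look like the sketch below. It works on a bare 14-byte header rather than an I/O buffer, keeps the protocol in host byte order for simplicity, and uses hypothetical function names; none of this is the actual link-layer API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN 6	/* Ethernet address length */
#define ETH_HLEN 14	/* Ethernet header length */

/* Write an Ethernet header for (dest, source, proto) at hdr */
static void eth_push ( uint8_t *hdr, const uint8_t *dest,
		       const uint8_t *source, uint16_t proto ) {
	assert ( hdr != NULL );
	memcpy ( &hdr[0], dest, ETH_ALEN );
	memcpy ( &hdr[ETH_ALEN], source, ETH_ALEN );
	hdr[12] = ( uint8_t ) ( proto >> 8 );
	hdr[13] = ( uint8_t ) ( proto & 0xff );
}

/*
 * Recover (dest, source, proto) from an Ethernet header.  The
 * returned address pointers refer into the header itself.
 */
static void eth_pull ( const uint8_t *hdr, const uint8_t **dest,
		       const uint8_t **source, uint16_t *proto ) {
	assert ( hdr != NULL );
	*dest = &hdr[0];
	*source = &hdr[ETH_ALEN];
	*proto = ( uint16_t ) ( ( hdr[12] << 8 ) | hdr[13] );
}
```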
Multicast hashing is an ugly overlap between network and link layers.
EFI requires us to provide access to this functionality, so move it
out of ipv4.c and expose it as a method of the link layer.
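For IPv4 over Ethernet the hash is the standard mapping: 01:00:5e followed by the low 23 bits of the group address. A sketch, with a hypothetical function name and the address passed in host byte order:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Map an IPv4 multicast address (host byte order) to its Ethernet
 * multicast MAC address: 01:00:5e followed by the low 23 bits.
 */
static void eth_mc_hash_ipv4 ( uint32_t ipv4, uint8_t *ll_addr ) {
	assert ( ( ipv4 >> 28 ) == 0xe );	/* 224.0.0.0/4 only */
	ll_addr[0] = 0x01;
	ll_addr[1] = 0x00;
	ll_addr[2] = 0x5e;
	ll_addr[3] = ( uint8_t ) ( ( ipv4 >> 16 ) & 0x7f );
	ll_addr[4] = ( uint8_t ) ( ( ipv4 >> 8 ) & 0xff );
	ll_addr[5] = ( uint8_t ) ( ipv4 & 0xff );
}
```

Because only 23 bits survive, distinct groups such as 224.1.2.3 and 225.1.2.3 map to the same MAC address, which is why this is a hash rather than a one-to-one translation.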
-Wformat-nonliteral is not enabled by -Wall and needs to be explicitly
specified.
Modified the few files that use nonliteral format strings to work with
this new setting in place.
Inspired by a patch from Carl Karsten <carl@personnelware.com> and an
identical patch from Rorschach <r0rschach@lavabit.com>.
Some devices (e.g. the Atmel AT24C11) have no concept of a device
address; they respond to every device address and use this value as
the word address. Some other devices use part of the device address
field to extend the word address field.
Generalise the i2c bit-bashing support to handle this by defining the
device address length and word address length as properties of an i2c
device. The word address is assumed to overflow into the device
address field if the address used exceeds the width of the word
address field.
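The overflow rule can be sketched as below. The structure and function names are simplified stand-ins, lengths are taken to be in bytes, and the device parameters used to exercise it are made up for the example.

```c
#include <assert.h>
#include <stddef.h>

struct fake_i2c_device {
	unsigned int dev_addr;		/* base device address */
	unsigned int word_addr_len;	/* word address length, in bytes */
};

/* Device address field: any excess offset bits overflow into it */
static unsigned int i2c_dev_field ( const struct fake_i2c_device *dev,
				    unsigned int offset ) {
	assert ( dev->word_addr_len < sizeof ( offset ) );
	return ( dev->dev_addr | ( offset >> ( 8 * dev->word_addr_len ) ) );
}

/* Word address field: the low word_addr_len bytes of the offset */
static unsigned int i2c_word_field ( const struct fake_i2c_device *dev,
				     unsigned int offset ) {
	assert ( dev->word_addr_len < sizeof ( offset ) );
	return ( offset & ( ( 1U << ( 8 * dev->word_addr_len ) ) - 1 ) );
}
```

A device with no concept of a device address is then simply one with a zero-length word address, so that the whole offset lands in the device address field.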
Also add a bus reset mechanism. i2c chips don't usually have a reset
line, so rebooting the host will not clear any bizarre state that the
chip may be in. We reset the bus by clocking SCL until we see SDA
high, at which point we know we can generate a start condition and
have it seen by all devices. We then generate a stop condition to
leave the bus in a known state prior to use.
Finally, add some extra debugging messages to i2c_bit.c.
Use individual page mappings rather than a single whole-region
mapping, to avoid the waste of memory that occurs due to the
constraint that each mapped block must be aligned on its own size.
We were accidentally allocating only half the required amount of
memory (given the alignment method) for the firmware buffer, leading
to conflicts between the firmware buffer and gPXE code/data segments.
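The factor of two comes from aligning a block on its own size using an allocator that gives no alignment guarantee: in the worst case almost a whole extra block is lost to rounding. A sketch of the arithmetic, with hypothetical function names and made-up addresses:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Bytes to request from an allocator with no alignment guarantee
 * so that a block of size bytes, aligned on size, is sure to fit.
 */
static uint64_t fw_alloc_len ( uint64_t size ) {
	return ( 2 * size );
}

/* Round the allocation base up to the next multiple of size */
static uint64_t fw_aligned_base ( uint64_t alloc_base, uint64_t size ) {
	assert ( size && ( ( size & ( size - 1 ) ) == 0 ) ); /* power of 2 */
	return ( ( alloc_base + size - 1 ) & ~( size - 1 ) );
}
```

For example, a 1MB buffer allocated at 0x12345000 is aligned up to 0x12400000 and ends at 0x12500000. That fits within a 2MB allocation, but would overrun a 1MB allocation by 0xbb000 bytes.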
It is possible for the BIOS to use the UNDI API to bring up the NIC
prior to system boot. If this happens, UNM_NIC_REG_CMDPEG_STATE will
contain the value 0xf00f (UNM_NIC_REG_CMDPEG_STATE_INITIALIZE_ACK),
and we should skip initialising the command PEG.
The firmware will now determine the right port mode on all cards, so
the PXE driver doesn't have to set it. (Setting the port mode
apparently breaks some newer cards.)
Commit f58cc3f introduced a temporary workaround for a bug in current
prototype silicon, but failed to apply it to all eight PCI functions
within the device.
Determine the network-layer packet type and fill it in for UNDI
clients. This is required by some NBPs such as emBoot's winBoot/i.
This change requires refactoring the link-layer portions of the
gPXE netdevice API, so that it becomes possible to strip the
link-layer header without passing the packet up the network stack.
This patch adds support for the virtio-net adapter provided by KVM.
Written by Laurent Vivier <Laurent.Vivier@bull.net> for Etherboot.
Wrapped as legacy driver for gPXE by Stefan Hajnoczi
<stefanha@gmail.com>.
In tg3_chip_reset(), the PCI_EXPRESS change is taken from the Linux
tg3 driver. I am not sure what exactly it does (it is not documented
in the Linux driver), but it is necessary for the NIC to work
correctly.
Conjecture: The hardware issues 64-bit DMA writes of status descriptors,
which some PCI bridges seem to split into two 32-bit writes in reverse
order (i.e. dword 1 first). This means that we sometimes observe a
partial status descriptor. Add an explicit check to ensure that the
descriptor is complete before processing it.
Also ensure that the RDS consumer counter is incremented only when we
know that we have actually consumed an RX descriptor.
ns8390.c can produce four different drivers (one PCI, three ISA). The
ISA drivers require setting a few macros; do that by setting defines
in stub files instead of using src/Config.
Currently, all the ISA drivers are broken (they were not enabled by
default), so #if 0 them out.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
From: Daniel Mealha Cabrita <dancab@utfpr.edu.br>
I've added tg3-5721 support for gPXE, the patch (against gpxe-0.9.3) is
attached to this message.
This chipset is present in HP ML150 G2 servers (possibly other HP machines
as well).
Drivers are not allowed to call printf(). Converted eprintf() to DBG(),
and removed spurious startup banner.
Fixed hardcoded inclusion of little_bswap.h.
Use EIO rather than 1 as an error number.
Add ability for network devices to flag link up/down state to the
networking core.
Autobooting code will now wait for link-up before attempting DHCP.
IPoIB reflects the Infiniband link state as the network device link state
(which is not strictly correct; we also need a successful IPoIB IPv4
broadcast group join), but is probably more informative.
Infiniband devices no longer block waiting for link-up in
register_ibdev().
Hermon driver needs to create an event queue and poll for link-up events.
Infiniband core needs to reread MAD parameters when link state changes.
IPoIB needs to cope with Infiniband link parameters being only partially
available at probe and open time.
Arbel and Hermon cards both have multiple ports. Add the
infrastructure required to register each port as a separate IB
device. Don't yet register more than one port, since registration
will currently fail unless a valid link is detected.
Use ib_*_{set,get}_{drv,owner}data wrappers to access driver- and
owner-private data on Infiniband structures.
Pull out common code for handling management datagrams from arbel.c
and hermon.c into infiniband.c.
Add port number to struct ib_device.
Add open(), close() and mad() methods to struct ib_device_operations.
From: Geert Stappers <stappers@stappers.nl>
To: etherboot-developers@lists.sourceforge.net
Subject: [Etherboot-developers] 3c90x polling again [patch]
Date: Thu, 29 Nov 2007 09:22:36 +0100
User-Agent: Mutt/1.5.16 (2007-06-11)
Hello,
gPXE didn't work on 3COM 905C Tornado cards for me.
It did transmit the DHCP request, but it didn't see the DHCP offer.
Adding debug print statements already solved the problem.
Attached is a patch that has a cleaner delay than print statements.
The core of it is
- for(i=0;i<40000;i++);
+ mdelay(1);
There was no research if the change is about a longer delay
or about code NOT being optimized away. It works for me :-)
Cheers
Geert Stappers
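The email leaves open whether the fix works because the delay is longer
or because the empty loop was being optimised away. One way to tell the
two apart, sketched here with a hypothetical helper, would be to keep
the counting loop but make the counter volatile, which obliges the
compiler to perform every iteration.

```c
/* Busy-wait for the given number of loop iterations.  The volatile
 * counter prevents the compiler from removing the otherwise empty
 * loop.  Returns the final counter value.
 */
static unsigned long spin_delay ( unsigned long iterations ) {
	volatile unsigned long i;

	for ( i = 0 ; i < iterations ; i++ ) {
		/* Do nothing */
	}
	return i;
}
```

The duration of such a loop still depends on CPU speed, which is one
reason a timer-based delay such as mdelay(1) is the cleaner choice.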
driver's probe() routine fills in nic->irqno. This is so that
non-interrupt-capable legacy drivers which set nic->irqno=0 will end
up reporting IRQ#0 via PXENV_UNDI_GET_INFORMATION; this in turn means
that the calling PXE NBP will (should) hook the timer interrupt, and
everything will sort of work.
The e1000_irq() routine should (per mcb30) do enable on non-zero,
disable on zero. This is not consistent in all drivers, so I'll
wait to update it when doing a global sweep.
This needs to be done manually because if the irq() routine is
implemented then we want something like "nic->irqno = pci->irqno;",
else we do "nic->irqno = 0;". nic->ioaddr may also need to be set
carefully.
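The rule for filling in nic->irqno can be sketched as below. The
structure and helper are hypothetical stand-ins for illustration only;
the real struct nic has many more fields.

```c
/* Hypothetical cut-down stand-in for the legacy NIC structure */
struct nic_sketch {
	unsigned int irqno;
};

/* Report the PCI IRQ only if the driver implements irq(); otherwise
 * report IRQ 0 so that the NBP falls back to the timer interrupt.
 */
static void nic_fill_irqno ( struct nic_sketch *nic, unsigned int pci_irqno,
			     int has_irq_routine ) {
	nic->irqno = ( has_irq_routine ? pci_irqno : 0 );
}
```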
Also added local variables to end of many files, for emacs indentation
to match kernel style (tab does 8 space indent).
There may still be an issue with memory handling, since it seems to
die ungracefully when ARP packets come in after loading a kernel.
Something to debug.