On devices with no EEPROM or OTP, the MAC_CR register defaults to not
using automatic link speed detection, with the result that no packets
are successfully sent or received.
Fix by always enabling automatic speed and duplex detection, since
iPXE provides no mechanism for manual configuration of either link
speed or duplex.
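
As a minimal sketch of the fix, assuming hypothetical names and bit
positions for the two enable bits (these are illustrative and are not
taken from the datasheet):

```c
#include <stdint.h>

/* Hypothetical MAC_CR bit definitions, for illustration only */
#define MAC_CR_AUTO_DUPLEX 0x00001000UL
#define MAC_CR_AUTO_SPEED  0x00000800UL

/* Return the MAC_CR value to write back: whatever the register
 * defaulted to, with automatic speed and duplex detection enabled.
 */
uint32_t mac_cr_enable_auto ( uint32_t mac_cr ) {
	return ( mac_cr | MAC_CR_AUTO_DUPLEX | MAC_CR_AUTO_SPEED );
}
```

The bits are set unconditionally, so the result does not depend on
whether an EEPROM or OTP happened to provide a sensible default.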
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The physical function defaults to operating in "PXE mode" after a
power-on reset. In this mode, receive descriptors are fetched and
written back as single descriptors. In normal (non-PXE mode)
operation, receive descriptors are fetched and written back only as
complete cachelines unless an interrupt is raised.
There is no way to return to PXE mode from non-PXE mode, and there is
no way for the virtual function driver to operate in PXE mode.
Choose to operate in non-PXE mode. This requires us to trick the
hardware into believing that it is raising an interrupt, so that it
will not defer writing back receive descriptors until a complete
cacheline (i.e. four packets) has been consumed. We do so by
configuring the hardware to use MSI-X with a dummy target location in
place of the usual APIC register.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The virtual function driver will use the same transmit and receive
descriptor ring structures, but will not itself construct and program
the ring context. Split out ring creation and destruction from the
programming of the ring context, to allow code to be shared between
physical and virtual function drivers.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The virtual function transmit and receive ring tail register offsets
do not match those of the physical function. Allow the tail register
offsets to be specified separately.
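
A rough sketch of the idea, using a hypothetical ring structure and a
register file modelled as an array of 32-bit words (none of these
names or offsets are taken from the driver):

```c
#include <stdint.h>

/* Hypothetical descriptor ring with a per-ring tail register offset */
struct ring {
	unsigned int tail_reg;	/* Tail register offset, in bytes */
	unsigned int prod;	/* Producer index */
};

/* Write the producer index to the ring's own tail register.  The
 * register file is modelled here as an array of 32-bit words.
 */
void ring_write_tail ( uint32_t *regs, struct ring *ring ) {
	regs[ ring->tail_reg / sizeof ( regs[0] ) ] = ring->prod;
}
```

The shared ring code then needs no knowledge of whether it is running
on behalf of the physical or the virtual function.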
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The physical function driver does not allow the virtual function to
request the use of 16-byte receive descriptors. Switch to using
32-byte receive descriptors.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Provide a weak stub function for handling the "send to VF" event used
for communications between the physical and virtual function drivers.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The "send to PF" and "send to VF" admin queue descriptors (ab)use the
cookie field to hold the extended opcode and return code values.
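
One possible way to express this overlay, assuming a hypothetical
layout in which the cookie occupies two 32-bit words (the field names
and widths are assumptions, not the hardware definition):

```c
#include <stdint.h>

/* Hypothetical descriptor fragment: the two 32-bit cookie words are
 * reused to carry the extended opcode and return code.
 */
struct admin_cookie {
	union {
		uint32_t cookie_high;	/* Ordinary commands */
		uint32_t vopcode;	/* "Send to PF/VF": extended opcode */
	};
	union {
		uint32_t cookie_low;	/* Ordinary commands */
		int32_t vret;		/* "Send to PF/VF": return code */
	};
};
```

Using anonymous unions keeps the descriptor size unchanged while
giving the overloaded fields meaningful names.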
Signed-off-by: Michael Brown <mcb30@ipxe.org>
A virtual function reset is triggered via an admin queue command and
will reset the admin queue configuration registers. Allow the admin
queues to be reinitialised after such a reset, without requiring the
overhead (and potential failure paths) of freeing and reallocating the
queues.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
We currently use a single data buffer shared between all admin queue
descriptors. This works for the physical function driver since we
have at most one command in progress and only a single event (which
does not use a data buffer).
The communication path between the physical and virtual function
drivers uses the event data buffer, and there is no way to prevent a
solicited event (i.e. a response to a request) from being overwritten
by an unsolicited event (e.g. a link status change).
Provide individual data buffers for each admin event queue descriptor
(and for each admin command queue descriptor, for the sake of
consistency).
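
A simplified sketch of per-descriptor buffers, with a hypothetical
ring size and buffer length:

```c
#define ADMIN_NUM_DESC 4	/* Hypothetical ring size */
#define ADMIN_BUF_LEN 64	/* Hypothetical buffer length */

/* Admin queue with one data buffer per descriptor */
struct admin_queue {
	unsigned char buf[ADMIN_NUM_DESC][ADMIN_BUF_LEN];
};

/* Get the data buffer belonging to a given (free-running)
 * descriptor index.
 */
unsigned char * admin_buffer ( struct admin_queue *admin,
			       unsigned int index ) {
	return admin->buf[ index % ADMIN_NUM_DESC ];
}
```

Since consecutive descriptors never share a buffer, an unsolicited
event cannot overwrite the data belonging to an earlier event that has
not yet been processed.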
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The register map for the virtual functions appears to have been
constructed using a random number generator.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The physical function driver does not allow the virtual function to
request that VLAN tags are left unstripped. Extract and use the VLAN
tag from the receive descriptor if present.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The first adapters in this family are X2522-10, X2522-25, X2541 and
X2542.
These no longer use PCI BAR 0 for I/O, but use that for memory. In
other words, BAR 2 on SFN8xxx adapters now becomes BAR 0.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Devices that support jumbo frames will currently default to the
largest possible MTU. This assumption is valid for virtual adapters
such as virtio-net, where the MTU must have been configured by a
system administrator, but is unsafe in the general case of a physical
adapter.
Default to the standard Ethernet MTU, unless explicitly overridden
either by the driver or via the ${netX/mtu} setting.
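
The selection logic can be sketched roughly as follows. The function
name is hypothetical, and clamping to the device maximum is an
assumption made for the sake of the example:

```c
#include <stddef.h>

#define ETH_DEFAULT_MTU 1500	/* Standard Ethernet MTU (payload bytes) */

/* Choose the initial MTU.  "override" is zero if neither the driver
 * nor the ${netX/mtu} setting has specified a value.
 */
size_t initial_mtu ( size_t max_mtu, size_t override ) {
	size_t mtu = ( override ? override : ETH_DEFAULT_MTU );

	/* Never exceed what the device supports (assumed behaviour) */
	return ( ( mtu > max_mtu ) ? max_mtu : mtu );
}
```

A jumbo-capable device therefore starts at 1500 bytes unless something
explicitly asks for more.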
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add the function mii_find() in order to locate the PHY address.
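
A minimal sketch of how such a search might work, assuming that an
absent PHY reads back as all-zeros or all-ones. The callback type and
function names are hypothetical and do not reflect the actual
mii_find() signature:

```c
#include <stdint.h>

#define MII_MAX_PHY_ADDRESS 31	/* MII addresses are five bits wide */
#define MII_PHYSID1 0x02	/* PHY identifier register 1 */

/* Hypothetical register read callback: returns 16-bit register value */
typedef uint16_t ( * mii_read_t ) ( unsigned int phy, unsigned int reg );

/* Scan all PHY addresses; return first responding address, or -1 */
int find_phy ( mii_read_t read ) {
	unsigned int phy;
	uint16_t id;

	for ( phy = 0 ; phy <= MII_MAX_PHY_ADDRESS ; phy++ ) {
		id = read ( phy, MII_PHYSID1 );
		/* Assume an absent PHY reads as all-zeros or all-ones */
		if ( ( id != 0x0000 ) && ( id != 0xffff ) )
			return phy;
	}
	return -1;
}

/* Simulated bus for demonstration: a single PHY at address 3 */
uint16_t demo_read ( unsigned int phy, unsigned int reg ) {
	return ( ( ( phy == 3 ) && ( reg == MII_PHYSID1 ) ) ? 0x0022 : 0xffff );
}
```

Calling find_phy ( demo_read ) on the simulated bus locates the PHY at
address 3.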
Signed-off-by: Sylvie Barlow <sylvie.c.barlow@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
We currently have no generic concept of a PHY address, since all
existing implementations simply hardcode the PHY address within the
MII access methods.
A bit-bashing MII interface will need to be provided with an explicit
PHY address in order to generate the correct waveform. Allow for this
by separating out the concept of a MII device (i.e. a specific PHY
address attached to a particular MII interface).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some drivers are known to call the optional Map_Mem() callback without
first checking that the callback exists. Provide a usable basic
implementation of Map_Mem() along with the other callbacks that become
mandatory if Map_Mem() is provided.
Note that in theory the PCI I/O protocol is allowed to require
multiple calls to Map(), with each call handling only a subset of the
overall mapped range. However, the reference implementation in EDK2
assumes that a single Map() will always suffice, so we can probably
make the same simplifying assumption here.
Tested with the Intel E3522X2.EFI driver (which, incidentally, fails
to cleanly remove one of its mappings).
Originally-implemented-by: Maor Dickman <maord@mellanox.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The i219 appears to have a seriously broken reset mechanism. After
any transmit or receive activity, resetting the card will break both
the transmit and receive datapaths until the next PCI bus reset.
The Linux and BSD drivers include a convoluted workaround authored by
Intel which involves setting a bit in the undocumented FEXTNVM11
register, then transmitting a dummy 512-byte packet containing garbage
data, then reconfiguring the receive descriptor prefetch thresholds
and temporarily reenabling the receive datapath. The comments in the
Intel fix do not even remotely match what the code actually does, and
the code accidentally leaves the transmitter enabled after use.
Experimentation suggests that an equivalent fix is to simply set the
undocumented bit in FEXTNVM11 before enabling the transmit or receive
descriptor rings.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some older versions of gcc (observed with gcc 4.7.2) report a spurious
uninitialised variable warning in ena_get_device_attributes(). Work
around this warning by manually inlining the relevant code (which has
only a single call site).
Reported-by: xbgmsharp <xbgmsharp@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Most drivers do not utilise an MII interface, since the link state is
typically available directly from a memory-mapped register.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The SnpDxe driver raises the task priority level to TPL_CALLBACK when
calling the UNDI entry point. This does not appear to be a documented
requirement, but we should probably match the behaviour of SnpDxe to
minimise surprises to third party code.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The UEFI specification has an implicit and demonstrably incorrect
requirement (in the Mem_IO() calling convention) that any UNDI network
device has at most one memory BAR and one I/O BAR.
Some UEFI platforms have been observed to report the existence of
non-existent additional I/O BARs, causing iPXE to select the wrong
BAR. This problem does not affect the SnpDxe driver, since that
driver will always choose the lowest numbered existent BAR of each
type.
Adjust iPXE's behaviour to match that of SnpDxe, i.e. to always select
the lowest numbered BAR(s).
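
The selection rule can be sketched as below, using a hypothetical
pre-classified table of BAR types rather than real PCI configuration
space accesses:

```c
#define NUM_BARS 6	/* A standard PCI header has six BARs */

enum bar_type { BAR_NONE = 0, BAR_MEM, BAR_IO };

/* Return the lowest numbered BAR of the given type, or -1 if none */
int lowest_bar ( const enum bar_type *bars, enum bar_type type ) {
	int i;

	for ( i = 0 ; i < NUM_BARS ; i++ ) {
		if ( bars[i] == type )
			return i;
	}
	return -1;
}
```

Any spurious higher-numbered I/O BAR reported by the platform is then
simply never reached.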
Debugged-by: Andreas Hammarskjöld <junior@2PintSoftware.com>
Debugged-by: Adklei <adklei@realtek.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The LAN78xx datapath is essentially identical to that of the SMSC75xx.
Expose the transmit, poll, and bulk IN endpoint operations to allow
for reuse by the LAN78xx driver.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The LAN78xx PHY interrupt source and mask registers do not match those
used by the SMSC75xx and SMSC95xx.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Since we do not enable the IOMMU at all, we can simply enable IOMMU
support by claiming support for VIRTIO_F_IOMMU_PLATFORM.
This fixes a boot failure when iommu_platform is set on the QEMU
command line.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The smsc75xx and smsc95xx drivers include a substantial amount of
identical functionality, varying only in the base address of register
sets. Abstract out this common functionality to allow code to be
shared between the drivers.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The Xen network backend (xen-netback) suffered from a regression
between upstream Linux kernels 3.18 and 4.2 inclusive, which would
cause packet reception to fail unless at least 18 receive buffers were
available. This bug was fixed in kernel commit 1d5d485 ("xen-netback:
require fewer guest Rx slots when not using GSO").
Work around this bug in affected versions of xen-netback by providing
the requisite 18 receive buffers.
Reported-by: Taylor Schneider <tschneider@live.com>
Tested-by: Taylor Schneider <tschneider@live.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
An "enlightened" external bootloader (such as Windows Server 2016's
winload.exe) may take ownership of the Hyper-V connection before all
INT 13 operations have been completed. When this happens, all VMBus
devices are implicitly closed and we are left with a non-functional
network connection.
Detect when our Hyper-V connection has been lost (by checking the
SynIC message page MSR). Reclaim ownership of the Hyper-V connection
and reestablish any VMBus devices, without disrupting any existing
iPXE state (such as IPv4 settings attached to the network device).
Windows Server 2016 will not cleanly take ownership of an active
Hyper-V connection. Experimentation shows that we can quiesce by
resetting only the SynIC message page MSR; this results in a
successful SAN boot (on a Windows 2012 R2 physical host). Choose to
quiesce by resetting (almost) all MSRs, in the hope that this will be
more robust against corner cases such as a stray synthetic interrupt
occurring during the handover.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
On most Intel NICs, Auto-Speed Detection Enable (ASDE) can be used to
automatically detect the correct link speed by sampling the link using
the internal PHY. This feature is automatically inhibited when not
appropriate for the physical link (e.g. when using internal SerDes
mode on the 8254x).
On the i350 datasheet ASDE is a reserved bit, but the relevant
auto-speed detection hardware appears still to be present. However,
enabling ASDE on the i350 1000BASE-KX backplane NIC seems to cause an
immediate link failure. It is possible that the auto-speed detection
hardware is still present, is not connected to a physical link, and is
not inhibited from being applied in this mode.
Work around this problem by adding an INTEL_NO_ASDE flag bit
(analogous to INTEL_NO_PHY_RST), and applying this for the i350
backplane NIC.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
In situations where iPXE fails to reach link-up as expected, it is
useful to know the original values of the CTRL and STATUS registers
prior to our reset attempt.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The following changes were introduced:
- added GetBgxProp and GetLmacProp methods to ThunderxConfigProtocol
- replaced direct BOARD_CFG access with usage of the introduced methods
- removed redundant BOARD_CFG
- changed GUID of ThunderxConfigProtocol, as this is not compatible
  with the previous version
- changed UINTN* to UINT64* buffer type to fix an issue with the MAC
  address on 32-bit platforms
This change avoids the need to keep the BOARD_CFG definitions in sync
every time they change in UEFI.
Signed-off-by: Konrad Adamczyk <konrad.adamczyk@cavium.com>
Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
B0_CTST is a 24-bit register according to the vendor driver (sk98lin).
A 16-bit read of B0_CTST will always return 0 for Y2_VAUX_AVAIL
(1<<16), so use a 32-bit read when testing Y2_VAUX_AVAIL.
[This patch is copied directly from the Linux kernel tree.]
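
The truncation can be illustrated in isolation as follows. This is a
standalone demonstration with hypothetical helper names, not the
patched driver code:

```c
#include <stdint.h>

#define Y2_VAUX_AVAIL ( 1UL << 16 )	/* Bit 16, as described above */

/* Test the flag using a 16-bit read: the upper bits are discarded */
int vaux_avail_16 ( uint32_t b0_ctst ) {
	uint16_t value = ( uint16_t ) b0_ctst;

	return ( ( value & Y2_VAUX_AVAIL ) != 0 );
}

/* Test the flag using a 32-bit read */
int vaux_avail_32 ( uint32_t b0_ctst ) {
	return ( ( b0_ctst & Y2_VAUX_AVAIL ) != 0 );
}
```

With the register holding 0x10000, only the 32-bit variant reports the
flag as set.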
Signed-off-by: Mike McCormack <mikem@ring3k.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The value of ( ( x & 0x0c00 ) | 0x0c00 ) is always 0x0c00 regardless
of the value of x, and so the read_csr() is redundant. (There are no
read side effects for this register, according to the datasheet.)
This line of code originated in Linux kernel 2.3.19pre1 as
a->write_csr(ioaddr, 80, a->read_csr(ioaddr, 80) | 0x0c00);
and was modified in kernel 2.3.41pre4 to read
a->write_csr(ioaddr, 80, (a->read_csr(ioaddr, 80) & 0x0C00) | 0x0c00);
In the absence of commit messages, the intention of the code is
unclear. However, the logic resulting in a fixed value of 0x0c00 has
remained unaltered for over 17 years, and can probably be assumed to
have the correct overall result.
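
The claim can be checked exhaustively, treating the value read back
from the register as 16 bits wide (an assumption; the helper names are
hypothetical):

```c
#include <stdint.h>

/* Value written to CSR 80 by the historical code, for a given
 * value "x" read back from the register.
 */
uint16_t csr80_value ( uint16_t x ) {
	return ( ( x & 0x0c00 ) | 0x0c00 );
}

/* Count the 16-bit inputs for which the result is not 0x0c00 */
unsigned int csr80_mismatches ( void ) {
	unsigned int x;
	unsigned int count = 0;

	for ( x = 0 ; x <= 0xffff ; x++ ) {
		if ( csr80_value ( x ) != 0x0c00 )
			count++;
	}
	return count;
}
```

The mask can only ever let through bits that the subsequent OR sets
anyway, so no input produces a mismatch.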
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Any underlying errors arising during ib_create_cq() or ib_create_qp()
are lost since the functions simply return NULL on error. This makes
debugging harder, since a debug-enabled build is required to discover
the root cause of the error.
Fix by returning a status code from these functions, thereby allowing
any underlying errors to be propagated.
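
The calling convention can be sketched in simplified form as below.
The structure and function are hypothetical stand-ins and do not
reproduce the real ib_create_cq() parameters:

```c
#include <errno.h>
#include <stdlib.h>

struct cq { unsigned int num_cqes; };	/* Hypothetical completion queue */

/* Create a completion queue, returning a status code and passing the
 * new object back via "new_cq" (rather than returning NULL on error).
 */
int create_cq ( unsigned int num_cqes, struct cq **new_cq ) {
	struct cq *cq;

	if ( num_cqes == 0 )
		return -EINVAL;
	cq = malloc ( sizeof ( *cq ) );
	if ( ! cq )
		return -ENOMEM;
	cq->num_cqes = num_cqes;
	*new_cq = cq;
	return 0;
}
```

The caller can now distinguish an invalid request from an allocation
failure without needing a debug-enabled build.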
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Report errors in eoib_duplicate() via netdev_tx_err() rather than
netdev_tx_complete_err(), since netdev_tx_complete_err() accepts only
valid I/O buffers that are currently in the network device's transmit
queue.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
We currently request cable detection in PXE_OPCODE_INITIALIZE to work
around buggy Emulex drivers (see commit c0b61ba ("[efi] Work around
bugs in Emulex NII driver")).
This causes problems with some other NII drivers (e.g. Mellanox),
which may time out if the underlying link is intrinsically slow to
come up.
Attempt to work around both problems simultaneously by requesting
cable detection only if the underlying NII driver does not support
link status reporting via PXE_OPCODE_GET_STATUS. (This is based on a
potentially incorrect assumption that the buggy Emulex drivers do not
claim to report link status via PXE_OPCODE_GET_STATUS.)
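
The decision can be sketched as follows. The flag and option names
here are hypothetical placeholders; the real NII definitions use
different names and values:

```c
/* Hypothetical capability flag and initialisation options; the real
 * NII definitions use different names and values.
 */
#define NII_CAP_LINK_STATUS 0x0001	/* GET_STATUS reports link state */
#define INIT_DETECT_CABLE 0x0001
#define INIT_DO_NOT_DETECT_CABLE 0x0002

/* Choose the cable detection option for the INITIALIZE operation */
unsigned int init_cable_option ( unsigned int capabilities ) {
	if ( capabilities & NII_CAP_LINK_STATUS )
		return INIT_DO_NOT_DETECT_CABLE;
	return INIT_DETECT_CABLE;
}
```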
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Several files define the ARRAY_SIZE() macro as used in Linux. Provide
a common definition for this in include/compiler.h.
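
For reference, the conventional form of such a macro is shown below;
the exact definition added to include/compiler.h may differ. The
demonstration table and function are hypothetical:

```c
#include <stddef.h>

/* Number of elements in a true array (not a pointer) */
#define ARRAY_SIZE( array ) ( sizeof ( array ) / sizeof ( (array)[0] ) )

static const unsigned int demo_rates[] = { 10, 100, 1000 };

/* Number of entries in the demonstration table */
size_t demo_rate_count ( void ) {
	return ARRAY_SIZE ( demo_rates );
}
```

Note that the macro silently gives the wrong answer if applied to a
pointer, since sizeof then measures the pointer rather than the array.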
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some VF data is not cleared with reset, so make sure to return all the
settings to default before configuring the VF.
This fixes an issue where network packets would fail to be received if
the VF was previously used by the Linux ixgbevf driver.
Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
According to ThunderX Errata G-17560, the NIC_PF_CFG[ENA] bit should
not be cleared at exit. Leaving it set allows other drivers to access
the NIC registers correctly.
Signed-off-by: Konrad Adamczyk <konrad.adamczyk@cavium.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The BGX context state for the LMAC must be reset using the
BGX_CMR_CONFIG register.
This solves a problem with network connectivity in Linux when booted
from iPXE.
Signed-off-by: Bartosz Szczepanek <bartosz.szczepanek@cavium.com>
Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Originally-implemented-by: Malte zu Klampen <malte@pclab.ifg.uni-kiel.de>
Originally-implemented-by: Richard Moore <rich@richud.com>
Tested-by: Esben Storgaard Nielsen <esn@solar.dk>
Signed-off-by: Christian Nilsson <nikize@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allow the active timer (providing udelay() and currticks()) to be
selected at runtime based on probing during the INIT_EARLY stage of
initialisation.
TICKS_PER_SEC is now a fixed compile-time constant for all builds, and
is independent of the underlying clock tick rate. We choose the value
1024 to allow multiplications and divisions on seconds to be converted
to bit shifts.
TICKS_PER_MS is defined as 1, allowing multiplications and divisions
on milliseconds to be omitted entirely. The 2% inaccuracy in this
definition is negligible when using the standard BIOS timer (running
at around 18.2Hz).
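
The resulting conversions can be sketched as below; the helper
function names are hypothetical:

```c
#define TICKS_PER_SEC 1024	/* As chosen above */
#define TICKS_PER_MS 1		/* As chosen above */

/* Convert seconds to ticks: multiplication by 1024 is a 10-bit shift */
unsigned long secs_to_ticks ( unsigned long secs ) {
	return ( secs << 10 );
}

/* Convert ticks to whole seconds: division by 1024 is a 10-bit shift */
unsigned long ticks_to_secs ( unsigned long ticks ) {
	return ( ticks >> 10 );
}

/* Convert milliseconds to ticks: no arithmetic required */
unsigned long ms_to_ticks ( unsigned long ms ) {
	return ( ms * TICKS_PER_MS );
}
```

One second expressed as 1000 milliseconds yields 1000 ticks rather
than 1024, which is the source of the small inaccuracy noted above.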
TIMER_RDTSC now checks for a constant TSC before claiming to be a
usable timer. (This timer can be tested in KVM via the command-line
option "-cpu host,+invtsc".)
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some host implementations (notably Google Compute Platform) are known
to unconditionally write back VIRTIO_NET_HDR_F_DATA_VALID to
header->flags for received packets, regardless of the features
negotiated by the driver. This breaks the transmit datapath by
effectively setting an illegal flag for all subsequent transmitted
packets.
Work around this problem by using separate empty header buffers for
the receive and transmit queues.
Debugged-by: Ladi Prosek <lprosek@redhat.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The virtio 0.9 implementation was limited to a maximum virtqueue size
of MAX_QUEUE_NUM, and the virtio-net driver would fail to initialize
on hosts exceeding this limit.
This commit lifts the restriction by allocating the queue memory based on
the actual queue size instead of using a fixed maximum. Note that virtio
1.0 still uses the MAX_QUEUE_NUM constant to cap the size (unfortunately
this functionality is not available in virtio 0.9).
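
As an illustration of sizing by the actual queue size, the following
sketch computes the memory needed for a legacy split virtqueue,
following the layout described for the legacy interface in the virtio
specification. It is not the driver's allocation code, and the names
are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

#define VRING_DESC_LEN 16	/* Bytes per descriptor table entry */
#define VRING_USED_ELEM_LEN 8	/* Bytes per used ring entry */

/* Size in bytes of a legacy (virtio 0.9) split virtqueue with "num"
 * entries, where the used ring starts on an "align" boundary
 * ("align" must be a power of two).
 */
size_t vring_len ( unsigned int num, size_t align ) {
	size_t len;

	/* Descriptor table, then available ring (flags, index, ring
	 * entries, and used_event: 3 + num 16-bit words).
	 */
	len = ( ( VRING_DESC_LEN * num ) +
		( sizeof ( uint16_t ) * ( 3 + num ) ) );
	/* Round up so that the used ring is aligned */
	len = ( ( len + align - 1 ) & ~( align - 1 ) );
	/* Used ring (flags, index, and avail_event, plus entries) */
	len += ( ( sizeof ( uint16_t ) * 3 ) +
		 ( VRING_USED_ELEM_LEN * num ) );
	return len;
}
```

The required memory grows with the queue size reported by the host,
which is why a fixed-size allocation cannot accommodate larger queues.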
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
This commit introduces virtnet_free_virtqueues(), which is called on
all virtqueue error and shutdown paths. vpm_find_vqs() no longer
cleans up after itself and instead expects virtnet_free_virtqueues()
to always be called to undo its effect.
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Commit db34436 ("[intel] Strip spurious VLAN tags received by virtual
function NICs") accidentally introduced two copies of the
intel[x]vf_mbox_queues() function. Remove the unintended copy.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The physical function may be configured to transparently insert a VLAN
tag into all transmitted packets. Unfortunately, it does not
equivalently strip this same VLAN tag from all received packets. This
behaviour may be observed in some Amazon EC2 instances with Enhanced
Networking enabled: transmissions work as expected but all packets
received by iPXE appear to have a spurious VLAN tag.
We can configure the receive queue to strip VLAN tags via the
RXDCTL.VME bit. We need to find out from the PF driver whether or not
we should do so.
There exists a "get queue configuration" mailbox message which
contains a field labelled IXGBE_VF_TRANS_VLAN in the Linux driver.
A comment in the Linux PF driver describes this field as "notify VF of
need for VLAN tag stripping, and correct queue". It will be filled
with a non-zero value if the PF is enforcing the use of a single VLAN
tag. It will also be filled with a non-zero value if the PF is using
multiple traffic classes.
The Linux VF driver seems to treat this field as being simply the
number of traffic classes, and gives it no VLAN-related
interpretation. The Linux VF driver instead handles the VLAN tag
stripping by simply assuming that any unrecognised VLAN tag ought to
be silently dropped.
We choose to strip and ignore the VLAN tag if the IXGBE_VF_TRANS_VLAN
field has a non-zero value.
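
The resulting decision can be sketched as follows. The bit value used
for RXDCTL.VME and the function name are assumptions made for the sake
of the example:

```c
#include <stdint.h>

#define RXDCTL_VME 0x40000000UL	/* Strip VLAN tag (value assumed) */

/* Compute the RXDCTL value, enabling VLAN stripping if the PF
 * reported a non-zero "transparent VLAN" field.
 */
uint32_t rxdctl_value ( uint32_t rxdctl, uint32_t trans_vlan ) {
	if ( trans_vlan )
		rxdctl |= RXDCTL_VME;
	return rxdctl;
}
```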
Reported-by: Leonid Vasetsky <leonidv@velostrata.com>
Tested-by: Leonid Vasetsky <leonidv@velostrata.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
ARM64 has a weaker memory ordering model than x86. The missing memory
barrier caused the PHY initialization notification to be delayed
beyond the link-wait timeout (15 seconds).
Signed-off-by: Leendert van Doorn <leendert@paramecium.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Extend the 16-bit PCI bus:dev.fn address to a 32-bit seg:bus:dev.fn
address, assuming a segment value of zero in contexts where multiple
segments are unsupported by the underlying data structures (e.g. in
the iBFT or BOFM tables).
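
One way to picture the extended address is shown below, assuming that
the segment occupies the upper 16 bits and that the lower 16 bits keep
the traditional layout. The function names are hypothetical:

```c
#include <stdint.h>

/* Pack a segment, bus, device and function into 32 bits.  The low 16
 * bits retain the traditional bus, device and function layout.
 */
uint32_t pci_busdevfn ( uint16_t seg, uint8_t bus, uint8_t dev, uint8_t fn ) {
	return ( ( ( uint32_t ) seg << 16 ) | ( ( uint32_t ) bus << 8 ) |
		 ( ( uint32_t ) ( dev & 0x1f ) << 3 ) | ( fn & 0x07 ) );
}

/* Reduce to a 16-bit address for structures lacking a segment field */
uint16_t pci_legacy_busdevfn ( uint32_t busdevfn ) {
	return ( busdevfn & 0xffff );
}
```

With this layout, any address in segment zero has the same numeric
value as its old 16-bit form.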
Signed-off-by: Michael Brown <mcb30@ipxe.org>
This is a backport of upstream Linux kernel commit 83d6f1f ("ath9k:
fix buffer overrun for ar9287").
Signed-off-by: Christian Hesse <mail@eworm.de>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The UEFI specification requires the EFI_SIMPLE_NETWORK_PROTOCOL
GetStatus() method to set TxBuf to NULL if there are no transmit
buffers to recycle.
Some implementations (observed with Lan9118Dxe in EDK2) fill in TxBuf
only when there is a transmit buffer to recycle, which leads to large
numbers of "spurious TX completion" errors.
Work around this problem by initialising TxBuf to NULL before calling
the GetStatus() method.
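
The workaround can be demonstrated against a simulated implementation
that exhibits the same behaviour. Both functions below are
hypothetical and greatly simplified compared to the real protocol
method:

```c
#include <stddef.h>

/* Simulated buggy GetStatus(): writes *txbuf only when a transmit
 * buffer is ready to be recycled.
 */
void buggy_get_status ( void *completed, void **txbuf ) {
	if ( completed )
		*txbuf = completed;
}

/* Poll for a completed transmit buffer, defending against the bug */
void * poll_tx ( void *completed ) {
	void *txbuf = NULL;	/* Initialise before the call */

	buggy_get_status ( completed, &txbuf );
	return txbuf;
}
```

Without the initialisation, the caller would interpret whatever
happened to be on the stack as a completed transmit buffer.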
Signed-off-by: Michael Brown <mcb30@ipxe.org>