This is required to work around a bug in some firmware versions.
Signed-off-by: Ameer Mahagneh <ameerm@mellanox.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some drivers are known to call the optional Map_Mem() callback without
first checking that the callback exists. Provide a usable basic
implementation of Map_Mem() along with the other callbacks that become
mandatory if Map_Mem() is provided.
Note that in theory the PCI I/O protocol is allowed to require
multiple calls to Map(), with each call handling only a subset of the
overall mapped range. However, the reference implementation in EDK2
assumes that a single Map() will always suffice, so we can probably
make the same simplifying assumption here.
Tested with the Intel E3522X2.EFI driver (which, incidentally, fails
to cleanly remove one of its mappings).
Originally-implemented-by: Maor Dickman <maord@mellanox.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Remove the global variable shomron_nodnic_supported, since it may have
different values for different PCI devices.
Originally-fixed-by: Mohammed Taha <mohammedt@mellanox.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Reporting a completion via usb_complete() will pass control outside
the scope of xhci.c, and could potentially result in a further call to
xhci_event_poll() before returning from usb_complete(). Since we
currently update the event consumer counter only after calling
usb_complete(), this can result in duplicate completions and
consequent corruption of the submission TRB ring structures.
Fix by updating the event ring consumer counter before passing control
to usb_complete().
Reported-by: Andreas Hammarskjöld <junior@2PintSoftware.com>
Tested-by: Andreas Hammarskjöld <junior@2PintSoftware.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The i219 appears to have a seriously broken reset mechanism. After
any transmit or receive activity, resetting the card will break both
the transmit and receive datapaths until the next PCI bus reset.
The Linux and BSD drivers include a convoluted workaround authored by
Intel which involves setting a bit in the undocumented FEXTNVM11
register, then transmitting a dummy 512-byte packet containing garbage
data, then reconfiguring the receive descriptor prefetch thresholds
and temporarily reenabling the receive datapath. The comments in the
Intel fix do not even remotely match what the code actually does, and
the code accidentally leaves the transmitter enabled after use.
Experimentation suggests that an equivalent fix is to simply set the
undocumented bit in FEXTNVM11 before enabling the transmit or receive
descriptor rings.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Invalid protocol speed ID tables appear to be increasingly common in
the wild, to the point that it is infeasible to apply an explicit
XHCI_BAD_PSIV flag for each offending PCI device ID.
Fix by assuming an invalid PSI table as soon as any invalid value is
reported by the hardware.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some older versions of gcc (observed with gcc 4.7.2) report a spurious
uninitialised variable warning in ena_get_device_attributes(). Work
around this warning by manually inlining the relevant code (which has
only a single call site).
Reported-by: xbgmsharp <xbgmsharp@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Most drivers do not utilise an MII interface, since the link state is
typically available directly from a memory-mapped register.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The SnpDxe driver raises the task priority level to TPL_CALLBACK when
calling the UNDI entry point. This does not appear to be a documented
requirement, but we should probably match the behaviour of SnpDxe to
minimise surprises to third party code.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The tap driver can retrieve a potentially unlimited number of packets
in a single poll. This can lead to heap exhaustion under heavy load.
Fix by imposing an artificial receive quota (as already used in other
drivers without natural receive limits).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The UEFI specification has an implicit and demonstrably incorrect
requirement (in the Mem_IO() calling convention) that any UNDI network
device has at most one memory BAR and one I/O BAR.
Some UEFI platforms have been observed to report the existence of
non-existent additional I/O BARs, causing iPXE to select the wrong
BAR. This problem does not affect the SnpDxe driver, since that
driver will always choose the lowest numbered existent BAR of each
type.
Adjust iPXE's behaviour to match that of SnpDxe, i.e. to always select
the lowest numbered BAR(s).
Debugged-by: Andreas Hammarskjöld <junior@2PintSoftware.com>
Debugged-by: Adklei <adklei@realtek.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The LAN78xx datapath is essentially identical to that of the SMSC75xx.
Expose the transmit, poll, and bulk IN endpoint operations to allow
for reuse by the LAN78xx driver.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The LAN78xx PHY interrupt source and mask registers do not match those
used by the SMSC75xx and SMSC95xx.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Since we don't enable IOMMU at all, we can then simply enable the
IOMMU support by claiming the support of VIRITO_F_IOMMU_PLATFORM.
This fixes booting failure when iommu_platform is set from qemu cli.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The smsc75xx and smsc95xx drivers include a substantial amount of
identical functionality, varying only in the base address of register
sets. Abstract out this common functionality to allow code to be
shared between the drivers.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
We currently use a zero language ID to retrieve strings such as the
ECM/NCM MAC address. This works on most hardware devices, but is
known to fail on some software emulated CDC-NCM devices.
Fix by using the first supported language ID, falling back to English
(0x0409) if any error occurs when fetching the list of supported
languages. This matches the behaviour of the Linux kernel.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The Xen network backend (xen-netback) suffered from a regression
between upstream Linux kernels 3.18 and 4.2 inclusive, which would
cause packet reception to fail unless at least 18 receive buffers were
available. This bug was fixed in kernel commit 1d5d485 ("xen-netback:
require fewer guest Rx slots when not using GSO").
Work around this bug in affected versions of xen-netback by providing
the requisite 18 receive buffers.
Reported-by: Taylor Schneider <tschneider@live.com>
Tested-by: Taylor Schneider <tschneider@live.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Commit 7cfdd76 ("[block] Describe all SAN devices via ACPI tables")
changed the definition of the iSCSI initiator IQN in the iBFT to
represent a common initiator IQN used for all iSCSI sessions, and
attempted to calculate this common initiator IQN by fetching the
common ${initiator-iqn} setting.
This fails when no explicit ${initiator-iqn} has been specified
(i.e. when an initiator IQN has instead been constructed from either
the hostname or system UUID), and results in an empty initiator IQN in
the iBFT.
Fix by using the initiator IQN of an arbitrary iSCSI session
present in the iBFT.
Debugged-by: Tal Aloni <tal.aloni.il@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
An "enlightened" external bootloader (such as Windows Server 2016's
winload.exe) may take ownership of the Hyper-V connection before all
INT 13 operations have been completed. When this happens, all VMBus
devices are implicitly closed and we are left with a non-functional
network connection.
Detect when our Hyper-V connection has been lost (by checking the
SynIC message page MSR). Reclaim ownership of the Hyper-V connection
and reestablish any VMBus devices, without disrupting any existing
iPXE state (such as IPv4 settings attached to the network device).
Windows Server 2016 will not cleanly take ownership of an active
Hyper-V connection. Experimentation shows that we can quiesce by
resetting only the SynIC message page MSR; this results in a
successful SAN boot (on a Windows 2012 R2 physical host). Choose to
quiesce by resetting (almost) all MSRs, in the hope that this will be
more robust against corner cases such as a stray synthetic interrupt
occurring during the handover.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
On most Intel NICs, Auto-Speed Detection Enable (ASDE) can be used to
automatically detect the correct link speed by sampling the link using
the internal PHY. This feature is automatically inhibited when not
appropriate for the physical link (e.g. when using internal SerDes
mode on the 8254x).
On the i350 datasheet ASDE is a reserved bit, but the relevant
auto-speed detection hardware appears still to be present. However,
enabling ASDE on the i350 1000BASE-KX backplane NIC seems to cause an
immediate link failure. It is possible that the auto-speed detection
hardware is still present, is not connected to a physical link, and is
not inhibited from being applied in this mode.
Work around this problem by adding an INTEL_NO_ASDE flag bit
(analogous to INTEL_NO_PHY_RST), and applying this for the i350
backplane NIC.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
In situations where iPXE fails to reach link-up as expected, it is
useful to know the original values of the CTRL and STATUS registers
prior to our reset attempt.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Following changes were introduced:
- added GetBgxProp and GetLmacProp methods to ThunderxConfigProtocol
- replaced direct BOARD_CFG access with usage of introduced methods
- removed redundant BOARD_CFG
- changed GUID of ThunderxConfigProtocol, as this is not compatible
with previous version
- changed UINTN* to UINT64* buffer type to fix issue on 32-bit
platforms with MAC address
This change allows us to avoid alignment of BOARD_CFG definitions
every time it changes in UEFI.
Signed-off-by: Konrad Adamczyk <konrad.adamczyk@cavium.com>
Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The TEST UNIT READY command is issued automatically when the device is
opened, and is not the result of a command being issued by the caller.
This is required in order that a permanent TEST UNIT READY failure can
be used to identify unusable paths in a multipath SAN device.
Since the TEST UNIT READY command is not part of the caller's command
issuing process, it is not covered by any external retry loops (such
as the main retry loop in sandev_command()).
We must therefore be prepared to retry the TEST UNIT READY command
within the SCSI layer itself. We retry only the TEST UNIT READY
command so as not to multiply the number of potential retries for
normal commands (which are already retried by sandev_command()).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Describe all SAN devices via ACPI tables such as the iBFT. For tables
that can describe only a single device (i.e. the aBFT and sBFT), one
table is installed per device. For multi-device tables (i.e. the
iBFT), all devices are described in a single table.
An underlying SAN device connection may be closed at the time that we
need to construct an ACPI table. We therefore introduce the concept
of an "ACPI descriptor" which enables the SAN boot code to maintain an
opaque pointer to the underlying object, and an "ACPI model" which can
build tables from a list of such descriptors. This separates the
lifecycles of ACPI descriptions from the lifecycles of the block
device interfaces, and allows for construction of the ACPI tables even
if the block device interface has been closed.
For a multipath SAN device, iPXE will wait until sufficient
information is available to describe all devices but will not wait for
all paths to connect successfully. For example: with a multipath
iSCSI boot iPXE will wait until at least one path has become available
and name resolution has completed on all other paths. We do this
since the iBFT has to include IP addresses rather than DNS names. We
will commence booting without waiting for the inactive paths to either
become available or close; this avoids unnecessary boot delays.
Note that the Linux kernel will refuse to accept an iBFT with more
than two NIC or target structures. We therefore describe only the
NICs that are actually required in order to reach the described
targets. Any iBFT with at most two targets is therefore guaranteed to
describe at most two NICs.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
When the TEST UNIT READY command receives an error response, the
shutdown of the command's block data interface will result in
scsidev_ready() closing the SCSI device. This will subsequently
result in a duplicate call to scsicmd_close(), leading to an assertion
failure when list_del() is called for the second time.
Fix by removing the command from the list of outstanding commands
before shutting down the command's interfaces.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
B0_CTST is a 24bit register according to the vendor driver (sk98lin).
A 16bit read on B0_CTST will always return 0 for Y2_VAUX_AVAIL
(1<<16), so use a 32bit read when testing Y2_VAUX_AVAIL.
[This patch is copied directly from the Linux kernel tree.]
Signed-off-by: Mike McCormack <mikem@ring3k.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The value of ( ( x & 0x0c00 ) | 0x0c00 ) is always 0x0c00 regardless
of the value of x, and so the read_csr() is redundant. (There are no
read side effects for this register, according to the datasheet.)
This line of code originated in Linux kernel 2.3.19pre1 as
a->write_csr(ioaddr, 80, a->read_csr(ioaddr, 80) | 0x0c00);
and was modified in kernel 2.3.41pre4 to read
a->write_csr(ioaddr, 80, (a->read_csr(ioaddr, 80) & 0x0C00) | 0x0c00);
In the absence of commit messages, the intention of the code is
unclear. However, the logic resulting in a fixed value of 0x0c00 has
remained unaltered for over 17 years, and can probably be assumed to
have the correct overall result.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Any underlying errors arising during ib_create_cq() or ib_create_qp()
are lost since the functions simply return NULL on error. This makes
debugging harder, since a debug-enabled build is required to discover
the root cause of the error.
Fix by returning a status code from these functions, thereby allowing
any underlying errors to be propagated.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Report errors in eoib_duplicate() via netdev_tx_err() rather than
netdev_tx_complete_err(), since netdev_tx_complete_err() accepts only
valid I/O buffers that are currently in the network device's transmit
queue.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
When the area to be mapped straddles the 2GB boundary, the expression
(high+size) will overflow on the first loop iteration. Fix by using
(end-size), which cannot underflow.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
When the area to be mapped straddles the 2GB boundary, the expression
(high+size) will overflow on the first loop iteration. Fix by using
(end-size), which cannot underflow.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
We currently request cable detection in PXE_OPCODE_INITIALIZE to work
around buggy Emulex drivers (see commit c0b61ba ("[efi] Work around
bugs in Emulex NII driver")).
This causes problems with some other NII drivers (e.g. Mellanox),
which may time out if the underlying link is intrinsically slow to
come up.
Attempt to work around both problems simultaneously by requesting
cable detection only if the underlying NII driver does not support
link status reporting via PXE_OPCODE_GET_STATUS. (This is based on a
potentially incorrect assumption that the buggy Emulex drivers do not
claim to report link status via PXE_OPCODE_GET_STATUS.)
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Several files define the ARRAY_SIZE() macro as used in Linux. Provide
a common definition for this in include/compiler.h.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some VF data is not cleared with reset, so make sure to return all the
settings to default before configuring the VF.
This fixes an issue where network packets would fail to be received if
the VF was previously used by the linux ixgbevf driver.
Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
When a SCSI device is closed in error, the shutdown of the device's
block data interface will probably lead to any outstanding commands
being closed (by whichever object is currently connected to the block
data interface). However, commands remain in the list of outstanding
commands until the final reference is dropped. The result is that
scsidev_close() will make a second call to scsicmd_close() for each
command. This is harmless, but produces confusing debug messages.
Fix by treating the outstanding command list as holding an explicit
reference to each command, and removing the command from the list of
outstanding commands in scsicmd_close().
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The SCSI layer currently implements a retry loop in order to retry
commands that fail due to spurious "error" conditions such as "power
on occurred". Move this retry loop to the generic SAN device layer:
this allow for retries due to other transient error conditions such as
an iSCSI target having dropped the connection due to inactivity.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
According to ThunderX Errata G-17560, NIC_PF_CFG[ENA] bit should not
be cleared at exit. This allows other drivers to access the NIC regs
correctly.
Signed-off-by: Konrad Adamczyk <konrad.adamczyk@cavium.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
It is required to reset BGX context state for the LMAC using
BGX_CMR_CONFIG register.
This solves problem with network connectivity in Linux booted from
iPXE.
Signed-off-by: Bartosz Szczepanek <bartosz.szczepanek@cavium.com>
Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Originally-implemented-by: Malte zu Klampen <malte@pclab.ifg.uni-kiel.de>
Originally-implemented-by: Richard Moore <rich@richud.com>
Tested-by: Esben Storgaard Nielsen <esn@solar.dk>
Signed-off-by: Christian Nilsson <nikize@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allow the active timer (providing udelay() and currticks()) to be
selected at runtime based on probing during the INIT_EARLY stage of
initialisation.
TICKS_PER_SEC is now a fixed compile-time constant for all builds, and
is independent of the underlying clock tick rate. We choose the value
1024 to allow multiplications and divisions on seconds to be converted
to bit shifts.
TICKS_PER_MS is defined as 1, allowing multiplications and divisions
on milliseconds to be omitted entirely. The 2% inaccuracy in this
definition is negligible when using the standard BIOS timer (running
at around 18.2Hz).
TIMER_RDTSC now checks for a constant TSC before claiming to be a
usable timer. (This timer can be tested in KVM via the command-line
option "-cpu host,+invtsc".)
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some host implementations (notably Google Compute Platform) are known
to unconditionally write back VIRTIO_NET_HDR_F_DATA_VALID to
header->flags for received packets, regardless of the features
negotiated by the driver. This breaks the transmit datapath by
effectively setting an illegal flag for all subsequent transmitted
packets.
Work around this problem by using separate empty header buffers for
the receive and transmit queues.
Debugged-by: Ladi Prosek <lprosek@redhat.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
This code largely inspired by tap.c. Allows for testing iPXE on real
NICs from within Linux. For example:
make bin-x86_64-linux/af_packet.linux
valgrind ./bin-x86_64-linux/af_packet.linux --net af_packet,if=eth3
Tested as x86_64 and i386 binary.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Virtio 0.9 implementation was limited to the maximum virtqueue size of
MAX_QUEUE_NUM and the virtio-net driver would fail to initialize on hosts
exceeding this limit.
This commit lifts the restriction by allocating the queue memory based on
the actual queue size instead of using a fixed maximum. Note that virtio
1.0 still uses the MAX_QUEUE_NUM constant to cap the size (unfortunately
this functionality is not available in virtio 0.9).
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
This commit introduces virtnet_free_virtqueues called on all virtqueue
error and shutdown paths. vpm_find_vqs no longer cleans up after itself
and instead expects virtnet_free_virtqueues to be always called to undo
its effect.
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
vpm_find_vqs incorrectly accepted the host provided queue size with no
regard to iPXE's internal limitations. Virtio 1.0 makes it possible for
the driver to override the queue size to reduce memory requirements and
iPXE is a great use case for this feature.
Also removing the extra vq->vring.num assignment which is already
handled in vring_init.
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Updates:
- Nodnic: Support for arm cq doorbell via the UAR BAR
- Ensure hardware is quiescent when no interface is open - WinPE WA
- Support for clear interrupt via BAR
- Nodnic: Support for send TX doorbells via the UAR BAR
- Added ConnectX-5EX device
- Added ConnectX-5 device
Signed-off-by: Raed Salem <raeds@mellanox.com>
Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Commit db34436 ("[intel] Strip spurious VLAN tags received by virtual
function NICs") accidentally introduced two copies of the
intel[x]vf_mbox_queues() function. Remove the unintended copy.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The physical function may be configured to transparently insert a VLAN
tag into all transmitted packets. Unfortunately, it does not
equivalently strip this same VLAN tag from all received packets. This
behaviour may be observed in some Amazon EC2 instances with Enhanced
Networking enabled: transmissions work as expected but all packets
received by iPXE appear to have a spurious VLAN tag.
We can configure the receive queue to strip VLAN tags via the
RXDCTL.VME bit. We need to find out from the PF driver whether or not
we should do so.
There exists a "get queue configuration" mailbox message which
contains a field labelled IXGBE_VF_TRANS_VLAN in the Linux driver.
A comment in the Linux PF driver describes this field as "notify VF of
need for VLAN tag stripping, and correct queue". It will be filled
with a non-zero value if the PF is enforcing the use of a single VLAN
tag. It will also be filled with a non-zero value if the PF is using
multiple traffic classes.
The Linux VF driver seems to treat this field as being simply the
number of traffic classes, and gives it no VLAN-related
interpretation. The Linux VF driver instead handles the VLAN tag
stripping by simply assuming that any unrecognised VLAN tag ought to
be silently dropped.
We choose to strip and ignore the VLAN tag if the IXGBE_VF_TRANS_VLAN
field has a non-zero value.
Reported-by: Leonid Vasetsky <leonidv@velostrata.com>
Tested-by: Leonid Vasetsky <leonidv@velostrata.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
iPXE debug logging doesn't support %u. This commit replaces it with
%d in virtio-pci debug format strings.
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
ARM64 has a weaker memory order model than x86. The missing memory
barrier caused phy initialization notification to be delayed beyond
the link-wait timeout (15 secs).
Signed-off-by: Leendert van Doorn <leendert@paramecium.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Extend the 16-bit PCI bus:dev.fn address to a 32-bit seg🚌dev.fn
address, assuming a segment value of zero in contexts where multiple
segments are unsupported by the underlying data structures (e.g. in
the iBFT or BOFM tables).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
This backport is from linux kernel upstream commit 83d6f1f ("ath9k:
fix buffer overrun for ar9287").
Signed-off-by: Christian Hesse <mail@eworm.de>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The UEFI specification requires the EFI_SIMPLE_NETWORK_PROTOCOL
GetStatus() method to set TxBuf to NULL if there are no transmit
buffers to recycle.
Some implementations (observed with Lan9118Dxe in EDK2) fill in TxBuf
only when there is a transmit buffer to recycle, which leads to large
numbers of "spurious TX completion" errors.
Work around this problem by initialising TxBuf to NULL before calling
the GetStatus() method.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Commit 86f96a4 ("[tg3] Remove x86-specific inline assembly")
introduced a regression in _tg3_flag() in 64-bit builds, since any
flags in the upper 32 bits of a 64-bit unsigned long would be
discarded when truncating to a 32-bit int.
Debugged-by: Shane Thompson <shane.thompson@aeontech.com.au>
Tested-by: Shane Thompson <shane.thompson@aeontech.com.au>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
This commit makes virtio-net support devices with VEN 0x1af4 and DEV
0x1041, which is how non-transitional (modern-only) virtio-net devices
are exposed on the PCI bus.
Transitional devices supporting both the old 0.9.5 and new 1.0 version
of the virtio spec are driven using the new protocol. Legacy devices
are driven using the old protocol, same as before this commit.
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
This commit adds support for driving virtio 1.0 PCI devices. In
addition to various helpers, a number of vpm_ functions are introduced
to be used instead of their legacy vp_ counterparts when accessing
virtio 1.0 (aka modern) devices.
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Virtio 1.0 introduces new constants and data structures, common to all
devices as well as specific to virtio-net. This commit adds a subset
of these to be able to drive the virtio-net 1.0 network device.
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
PCI devices may support more capabilities of the same type (for
example PCI_CAP_ID_VNDR) and there was no way to discover all of them.
This commit adds a new API pci_find_next_capability which provides
this functionality. It would typically be used like so:
for (pos = pci_find_capability(pci, PCI_CAP_ID_VNDR);
pos > 0;
pos = pci_find_next_capability(pci, pos, PCI_CAP_ID_VNDR)) {
...
}
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
There is no way for the hardware to give us an invalid length in the
LRH, since it must have parsed this length field in order to perform
header splitting. However, this is difficult to prove conclusively.
Add an unnecessary length check to explicitly reject any packets
larger than the posted receive I/O buffer.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
There is no way for the hardware to give us an invalid length in the
LRH, since it must have parsed this length field in order to perform
header splitting. However, this is difficult to prove conclusively.
Add an unnecessary length check to explicitly reject any packets
larger than the posted receive I/O buffer.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some versions of gcc complain that "'__bswap_variable_32' is static
but used in inline function 'golan_check_rc_and_cmd_status' which is
not static".
Fix by making golan_check_rc_and_cmd_status() a static inline.
Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The Infiniband specification (volume 1, section 11.4.1.2 "Post Receive
Request") notes that for UD QPs, the GRH will be placed in the first
40 bytes of the receive buffer if present. (If no GRH is present,
which is normal, then the first 40 bytes of the receive buffer will be
unused.)
Mellanox hardware performs this placement automatically: other headers
will be stripped (and their values returned via the CQE), but the
first 40 bytes of the data buffer will be consumed by the (probably
non-existent) GRH.
This does not fit neatly into iPXE's internal abstraction, which
expects the data buffer to represent just the data payload with the
addresses from the GRH (if present) passed as additional parameters to
ib_complete_recv().
The end result of this discrepancy is that attempts to receive
full-sized 2048-byte IPoIB packets on Mellanox hardware will fail.
Fix by allocating a separate ring buffer to hold the received GRHs.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
This driver is the original source of the current readq() and writeq()
implementations for 32-bit iPXE. Switch to using the now-centralised
definitions, to avoid including architecture-specific code in an
otherwise architecture-independent driver.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some EoIB implementations utilise an EoIB-to-Ethernet gateway device
that does not perform a FullMember join to the multicast group for the
EoIB broadcast domain. This has various exciting side-effects, such
as requiring every EoIB node to send every broadcast packet twice.
As an added bonus, the gateway may also break the EoIB MAC address to
GID mapping protocol by sending Ethernet-sourced packets from the
wrong QPN.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some EoIB implementations require each individual EoIB node to create
the multicast group for the EoIB broadcast domain.
It is left as an exercise for the interested reader to determine how
such an implementation might ever allow the parameters of such a
multicast group to be changed without requiring a simultaneous upgrade
of every driver on every operating system on every machine currently
attached to the fabric.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some EoIB implementations transmit a vendor-proprietary heartbeat
packet on the same multicast group used to provide the EoIB broadcast
domain.
Silently ignore these heartbeat packets, to avoid cluttering up the
network interface error statistics.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
EoIB is a fairly simple protocol in which raw Ethernet frames
(excluding the CRC) are encapsulated within Infiniband Unreliable
Datagrams, with a four-byte fixed EoIB header (which conveys no actual
information). The Ethernet broadcast domain is provided by a
multicast group, similar to the IPoIB IPv4 multicast group.
The mapping from Ethernet MAC addresses to Infiniband address vectors
is achieved by snooping incoming traffic and building a peer cache
which can then be used to map a MAC address into a port GID. The
address vector is completed using a path record lookup, as for IPoIB.
Note that this requires every packet to include a GRH.
Add basic support for EoIB devices. This driver is substantially
derived from the IPoIB driver. There is currently no mechanism for
automatically creating EoIB devices.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Commit e62e52b ("[ipoib] Simplify test for received broadcast
packets") relies upon the multicast LID being present in the
destination address vector as passed to ipoib_complete_recv().
Unfortunately, this information is not present in many Infiniband
devices' completion queue entries.
Fix by testing instead for the presence of a multicast GID.
Signed-off-by: Michael Brown <mcb30@ipxe.org>