opengnsys_ipxe

Commit Graph

Author	SHA1	Message	Date
Michael Brown	7737fec5c6	[efi] Define an attachment priority order for EFI drivers Define an ordering for internal EFI drivers on the basis of how close the driver is to the hardware, and attempt to start drivers in this order. Signed-off-by: Michael Brown <mcb30@ipxe.org>	2025-03-29 18:44:34 +00:00
Michael Brown	2399c79980	[fdt] Allow for the existence of multiple device trees When running on a platform that uses FDT as its hardware description mechanism, we are likely to have multiple device tree structures. At a minimum, there will be the device tree passed to us from the previous boot stage (e.g. OpenSBI), and the device tree that we construct to be passed to the booted operating system. Update the internal FDT API to include an FDT pointer in all function parameter lists. Signed-off-by: Michael Brown <mcb30@ipxe.org>	2025-03-28 14:14:32 +00:00
Michael Brown	32a9408217	[efi] Allow use of typed pointers for efi_open() et al Provide wrapper macros to allow efi_open() and related functions to accept a pointer to any pointer type as the "interface" argument, in order to allow a substantial amount of type adjustment boilerplate to be removed. Signed-off-by: Michael Brown <mcb30@ipxe.org>	2025-03-24 15:43:56 +00:00
Michael Brown	bac3187439	[efi] Use efi_open() for all ephemeral protocol opens Signed-off-by: Michael Brown <mcb30@ipxe.org>	2025-03-24 13:19:26 +00:00
Michael Brown	5a5e2a1dae	[efi] Use efi_open_unsafe() for all explicitly unsafe protocol opens Signed-off-by: Michael Brown <mcb30@ipxe.org>	2025-03-24 13:19:26 +00:00
Michael Brown	9dd30f11f7	[efi] Use efi_open_by_driver() for all by-driver protocol opens Signed-off-by: Michael Brown <mcb30@ipxe.org>	2025-03-24 13:19:26 +00:00
Joseph Wong	bd90abf487	[bnxt] Allocate TX rings with firmware input Use queue_id value retrieved from firmware unconditionally when allocating TX rings. Signed-off by: Joseph Wong <joseph.wong@broadcom.com>	2025-02-07 09:26:15 +00:00
Michael Brown	24db39fb29	[gve] Run startup process only while device is open The startup process is scheduled to run when the device is opened and terminated (if still running) when the device is closed. It assumes that the resource allocation performed in gve_open() has taken place, and that the admin and transmit/receive data structure pointers are therefore valid. The process initialisation in gve_probe() erroneously calls process_init() rather than process_init_stopped() and will therefore schedule the startup process immediately, before the relevant resources have been allocated. This bug is masked in the typical use case of a Google Cloud instance with a single NIC built with the config/cloud/gce.ipxe embedded script, since the embedded script will immediately open the NIC (and therefore allocate the required resources) before the scheduled process is allowed to run for the first time. In a multi-NIC instance, undefined behaviour will arise as soon as the startup process for the second NIC is allowed to run. Fix by using process_init_stopped() to avoid implicitly scheduling the startup process during gve_probe(). Originally-fixed-by: Kal Cutter Conley <kalcutterc@nvidia.com> Signed-off-by: Michael Brown <mcb30@ipxe.org>	2024-12-03 13:57:06 +00:00
Michael Brown	c69f9589cc	[usb] Expose USB device descriptor and strings via settings Allow scripts to read basic information from USB device descriptors via the settings mechanism. For example: echo USB vendor ID: ${usb/${busloc}.8.2} echo USB device ID: ${usb/${busloc}.10.2} echo USB manufacturer name: ${usb/${busloc}.14.0} The general syntax is usb/<bus:dev>.<offset>.<length> where bus:dev is the USB bus:device address (as obtained via the "usbscan" command, or from e.g. ${net0/busloc} for a USB network device), and <offset> and <length> select the required portion of the USB device descriptor. Following the usage of SMBIOS settings tags, a <length> of zero may be used to indicate that the byte at <offset> contains a USB string descriptor index, and an <offset> of zero may be used to indicate that the <length> contains a literal USB string descriptor index. Since the byte at offset zero can never contain a string index, and a literal string index can never be zero, the combination of both <length> and <offset> being zero may be used to indicate that the entire device descriptor is to be read as a raw hex dump. Signed-off-by: Michael Brown <mcb30@ipxe.org>	2024-10-18 13:13:51 +01:00
Michael Brown	59d123658b	[gve] Allocate all possible event counters The admin queue API requires us to tell the device how many event counters we have provided via the "configure device resources" admin queue command. There is, of course, absolutely no documentation indicating how many event counters actually need to be provided. We require only two event counters: one for the transmit queue, one for the receive queue. (The receive queue doesn't seem to actually make any use of its event counter, but the "create receive queue" admin queue command will fail if it doesn't have an available event counter to choose.) In the absence of any documentation, we currently make the assumption that allocating and configuring 16 counters (i.e. one whole cacheline) will be sufficient to allow for the use of two counters. This assumption turns out to be incorrect. On larger instance types (observed with a c3d-standard-16 instance in europe-west4-a), we find that creating the transmit or receive queues will each fail with a probability of around 50% with the "failed precondition" error code. Experimentation suggests that even though the device has accepted our "configure device resources" command indicating that we are providing only 16 event counters, it will attempt to choose any of its potential 32 event counters (and will then fail since the event counter that it unilaterally chose is outside of the agreed range). Work around this firmware bug by always allocating the maximum number of event counters supported by the device. (This requires deferring the allocation of the event counters until after issuing the "describe device" command.) Signed-off-by: Michael Brown <mcb30@ipxe.org>	2024-09-17 13:37:20 +01:00
Michael Brown	f88761ef49	[ena] Change reported operating system type to "iPXE" As described in commit `3b81a4e` ("[ena] Provide a host information page"), we currently report an operating system type of "Linux" in order to work around broken versions of the ENA firmware that will fail to create a completion queue if we report the correct operating system type. As of September 2024, the ENA team at AWS assures us that the entire AWS fleet has been upgraded to fix this bug, and that we are now safe to report the correct operating system type value in the "type" field of struct ena_host_info. The ENA team has also clarified that at least some deployed versions of the ENA firmware still have the defect that requires us to report an operating system version number of 2 (regardless of operating system type), and so we continue to report ENA_HOST_INFO_VERSION_WTF in the "version" field of struct ena_host_info. Add an explicit warning on the previous known failure path, in case some deployed versions of the ENA firmware turn out to not have been upgraded as expected. Signed-off-by: Michael Brown <mcb30@ipxe.org>	2024-09-05 14:13:16 +01:00
Animesh Bhatt	c7f2e75519	[aqc1xx] Add support for Marvell AQtion Ethernet controller This patch adds support for the AQtion Ethernet controller, enabling iPXE to recognize and utilize the specific models (AQC114, AQC113, and AQC107). Tested-by: Animesh Bhatt <animeshb@marvell.com> Signed-off-by: Animesh Bhatt <animeshb@marvell.com>	2024-09-02 13:45:54 +01:00
Michael Brown	7f75d320f6	[etherfabric] Fix use of uninitialised variable in falcon_xaui_link_ok() The link status check in falcon_xaui_link_ok() reads from the FCN_XX_CORE_STAT_REG_MAC register only on production hardware (where the FPGA version reads as zero), but modifies the value and writes back to this register unconditionally. This triggers an uninitialised variable warning on newer versions of gcc. Fix by assuming that the register exists only on production hardware, and so moving the "modify-write" portion of the "read-modify-write" operation to also be covered by the same conditional check. Signed-off-by: Michael Brown <mcb30@ipxe.org>	2024-09-02 12:24:57 +01:00
Michael Brown	46937a9df6	[crypto] Remove the concept of a public-key algorithm reusable context Instances of cipher and digest algorithms tend to get called repeatedly to process substantial amounts of data. This is not true for public-key algorithms, which tend to get called only once or twice for a given key. Simplify the public-key algorithm API so that there is no reusable algorithm context. In particular, this allows callers to omit the error handling currently required to handle memory allocation (or key parsing) errors from pubkey_init(), and to omit the cleanup calls to pubkey_final(). This change does remove the ability for a caller to distinguish between a verification failure due to a memory allocation failure and a verification failure due to a bad signature. This difference is not material in practice: in both cases, for whatever reason, the caller was unable to verify the signature and so cannot proceed further, and the cause of the error will be visible to the user via the return status code. Signed-off-by: Michael Brown <mcb30@ipxe.org>	2024-08-21 21:00:57 +01:00
Michael Brown	be2784649d	[gve] Add missing error codes in EUNIQ() list of potential errors Signed-off-by: Michael Brown <mcb30@ipxe.org>	2024-08-20 22:44:15 +01:00
Michael Brown	53f089b723	[crypto] Pass asymmetric keys as ASN.1 cursors Asymmetric keys are invariably encountered within ASN.1 structures such as X.509 certificates, and the various large integers within an RSA key are themselves encoded using ASN.1. Simplify all code handling asymmetric keys by passing keys as a single ASN.1 cursor, rather than separate data and length pointers. Signed-off-by: Michael Brown <mcb30@ipxe.org>	2024-08-18 15:44:38 +01:00
Michael Brown	d2d194bc60	[gve] Increase number of receive buffers to reduce packet loss Experiments suggest that using fewer than 64 receive buffers leads to excessive packet drop rates on some instance types (observed with a c3-standard-4 instance in europe-west4-a). Fix by increasing the number of receive data buffers (and adjusting the length of the registrable queue page address list to match). Signed-off-by: Michael Brown <mcb30@ipxe.org>	2024-07-25 00:13:33 +01:00
Michael Brown	c7b76e3adc	[gve] Add driver for Google Virtual Ethernet NIC The Google Virtual Ethernet NIC (GVE or gVNIC) is found only in Google Cloud instances. There is essentially zero documentation available beyond the mostly uncommented source code in the Linux kernel. Signed-off-by: Michael Brown <mcb30@ipxe.org>	2024-07-24 14:45:46 +01:00
Michael Brown	b940d54235	[cachedhcp] Allow cached DHCPACK to apply to temporary network devices Retain a reference to the cached DHCPACK until the late startup phase, and allow it to be recycled for reuse. This allows the cached DHCPACK to be used for a temporary MNP network device and then subsequently reused for the corresponding real network device. Signed-off-by: Michael Brown <mcb30@ipxe.org>	2024-04-02 22:59:50 +01:00
Michael Brown	b66f6025fa	[efi] Add the ability to create a temporary MNP network device An MNP network device may be temporarily and non-destructively installed on top of an existing UEFI network stack without having to disconnect existing drivers. Add the ability to create such a temporary network device. Signed-off-by: Michael Brown <mcb30@ipxe.org>	2024-03-29 14:46:13 +00:00
Michael Brown	dcad73ca5a	[efi] Add support for driving EFI_MANAGED_NETWORK_PROTOCOL devices We want exclusive access to the network device, both for performance reasons and because we perform operations such as EAPoL that affect the entire link. We currently drive the network card via either a native hardware driver or via the SNP or NII/UNDI interfaces, both of which grant us this exclusive access. Add an alternative driver that drives the network card non-exclusively via the EFI_MANAGED_NETWORK_PROTOCOL interface. This can function as a fallback for situations where neither SNP nor NII/UNDI interfaces are functional, and also opens up the possibility of non-destructively installing a temporary network device over which to download the autoexec.ipxe script. Signed-off-by: Michael Brown <mcb30@ipxe.org>	2024-03-25 17:58:33 +00:00
Michael Brown	a15ce00182	[efi] Match chainloaded device by uppermost matching handle Commit `4c5b794` ("[efi] Use the SNP protocol instance to match the SNP chainloading device") switched the chainloaded device matching logic to use a target protocol instance rather than the loaded image's device handle, on the basis that we want to bind to the parent SNP device rather than to a duplicate SNP protocol instance installed onto an IPv4 or IPv6 child device handle. It is possible that our calls to DisconnectController() and ConnectController() will cause the target protocol instance to be uninstalled and reinstalled, which may change the value of the protocol instance pointer. Allow for this by identifying and matching against the uppermost handle that initially has this target protocol instance installed. Signed-off-by: Michael Brown <mcb30@ipxe.org>	2024-03-25 17:58:33 +00:00
Michael Brown	926816c58f	[efi] Pad transmit buffer length to work around vendor driver bugs The Mellanox/Nvidia UEFI driver is built from the same codebase as the iPXE driver, and appears to contain the bug that was fixed in commit `c11734e` ("[golan] Use ETH_HLEN for inline header size"). This results in identical failures when using the SNP or NII interface (via e.g. snponly.efi) to drive a Mellanox card while EAPoL is enabled. Work around the underlying UEFI driver bug by padding transmit I/O buffers to the minimum Ethernet frame length before passing them to the underlying driver's transmit function. This padding is not technically necessary, since almost all modern hardware will insert transmit padding as necessary (and where the hardware does not support doing so, the underlying UEFI driver is responsible for adding any necessary padding). However, it is guaranteed to be harmless (other than a miniscule performance impact): the Ethernet specification requires zero padding up to the minimum frame length for packets that are transmitted onto the wire, and so the receiver will see the same packet whether or not we manually insert this padding in software. The additional padding causes the underlying Mellanox driver to avoid its faulty code path, since it will never be asked to transmit a very short packet. Tested-by: Eric Hagberg <ehagberg@janestreet.com> Signed-off-by: Michael Brown <mcb30@ipxe.org>	2024-03-18 22:52:05 +00:00
Michael Brown	bac967d51a	[snp] Allocate additional padding for receive buffers Some SNP implementations (observed with a wifi adapter in a Dell Latitude 3440 laptop) seem to require additional space in the allocated receive buffers, otherwise full-length packets will be silently dropped. The EDK2 MnpDxe driver happens to allocate an additional 8 bytes of padding (4 for a VLAN tag, 4 for the Ethernet frame checksum). Match this behaviour since drivers are very likely to have been tested against MnpDxe. Signed-off-by: Michael Brown <mcb30@ipxe.org>	2024-03-16 23:28:34 +00:00
Geert Stappers	e5f3ba0ca7	[drivers] Sort PCI_ROM() entries numerically Done with the help of this Perl script: $MARKER = 'PCI_ROM'; # a regex $AB = 1; # At Begin @HEAD = (); @ITEMS = (); @TAIL = (); foreach $fn (@ARGV) { open(IN, $fn) or die "Can't open file '$fn': $!\n"; while (<IN>) { if (/$MARKER/) { push @ITEMS, $_; $AB = 0; # not anymore at begin } else { if ($AB) { push @HEAD, $_; } else { push @TAIL, $_; } } } } continue { close IN; open(OUT, ">$fn") or die "Can't open file '$fn' for output: $!\n"; print OUT @HEAD; print OUT sort @ITEMS; print OUT @TAIL; close OUT; # For a next file $AB = 1; @HEAD = (); @ITEMS = (); @TAIL = (); } Executed that script while src/drivers/ as current working directory, provided '$(grep -rl PCI_ROM)' as argument. Signed-off-by: Geert Stappers <stappers@stappers.it>	2024-02-22 14:19:04 +00:00
Joseph Wong	a846c4ccfc	[bnxt] Add support for BCM957608 Add support for BCM957608 device. Add support for additional link speeds supported by BCM957608. Signed-off-by: Joseph Wong <joseph.wong@broadcom.com>	2024-02-08 15:10:12 +00:00
Joseph Wong	de8a0821c7	[bnxt] Add support for additional chip IDs Add additional chip IDs that can be recognized as part of the thor family. Signed-off-by: Michael Brown <mcb30@ipxe.org>	2024-01-19 22:08:48 +00:00
Christian Helmuth	119c415ee4	[intel] Add PCI ID for I219-LM (23) Successfully tested on FUJITSU LIFEBOOK U7413. Signed-off-by: Christian Helmuth <christian.helmuth@genode-labs.com>	2023-12-21 13:53:24 +01:00
Michael Brown	115707c0ed	[iphone] Add missing va_start()/va_end() around reused argument list The ipair_tx() function uses a va_list twice (first to calculate the formatted string length before allocation, then to construct the string in the allocated buffer) but is missing the va_start() and va_end() around the second usage. This is undefined behaviour that happens to work on some build platforms. Fix by adding the missing va_start() and va_end() around the second usage of the variadic argument list. Reported-by: Andreas Hammarskjöld <andreas@2PintSoftware.com> Signed-off-by: Michael Brown <mcb30@ipxe.org>	2023-10-24 11:43:56 +01:00
Michael Brown	ae4e85bde9	[netdevice] Allocate private data for each network upper-layer driver Allow network upper-layer drivers (such as LLDP, which attaches to each network device in order to provide a corresponding LLDP settings block) to specify a size for private data, which will be allocated as part of the network device structure (as with the existing private data allocated for the underlying device driver). This will allow network upper-layer drivers to be simplified by omitting memory allocation and freeing code. If the upper-layer driver requires a reference counter (e.g. for interface initialisation), then it may use the network device's existing reference counter, since this is now the reference counter for the containing block of memory. Signed-off-by: Michael Brown <mcb30@ipxe.org>	2023-09-13 20:23:46 +01:00
Michael Brown	eeb7cd56e5	[netdevice] Remove netdev_priv() helper function Some network device drivers use the trivial netdev_priv() helper function while others use the netdev->priv pointer directly. Standardise on direct use of netdev->priv, in order to free up the function name netdev_priv() for reuse. Signed-off-by: Michael Brown <mcb30@ipxe.org>	2023-09-13 16:29:48 +01:00
Michael Brown	2689a6e776	[efi] Always poll for TX completions Polling for TX completions is arguably redundant when there are no transmissions currently in progress. Commit `c6c7e78` ("[efi] Poll for TX completions only when there is an outstanding TX buffer") switched to setting the PXE_OPFLAGS_GET_TRANSMITTED_BUFFERS flag only when there is an in-progress transmission awaiting completion, in order to reduce reported TX errors and debug message noise from buggy NII implementations that report spurious TX completions whenever the transmit queue is empty. Some other NII implementations (observed with the Realtek driver in a Dell Latitude 3440) seem to have a bug in the transmit datapath handling which results in the transmit ring freezing after sending a few hundred packets under heavy load. The symptoms are that the TPPoll register's NPQ bit remains set and the 256-entry transmit ring contains a large number of uncompleted descriptors (with the OWN bit set), the first two of which have identical data buffer addresses. Though iPXE will submit at most one in-progress transmission via NII, the Dell/Realtek driver seems to make a page-aligned copy of each transmit data buffer and to report TX completions immediately without waiting for the packet to actually be transmitted. These synthetic TX completions continue even after the hardware transmit ring freezes. Setting PXE_OPFLAGS_GET_TRANSMITTED_BUFFERS on every poll reduces the probability of this Dell/Realtek driver bug being triggered by a factor of around 500, which brings the failure rate down to the point that it can sensibly be managed by external logic such as the "--timeout" option for image downloads. Closing and reopening the interface (via "ifclose"/"ifopen") will clear the error condition and allow transmissions to resume. Revert to setting PXE_OPFLAGS_GET_TRANSMITTED_BUFFERS on every poll, and silently ignore situations in which the hardware reports a completion when no transmission is in progress. This approximately matches the behaviour of the SnpDxe driver, which will also generally set PXE_OPFLAGS_GET_TRANSMITTED_BUFFERS on every poll. Signed-off-by: Michael Brown <mcb30@ipxe.org>	2023-06-21 11:49:53 +01:00
Matt Parrella	bf25e23d07	[intel] Add workaround for I210 reset hardware bugs The Intel I210's packet buffer size registers reset only on power up, not when a reset signal is asserted. This can lead to the inability to pass traffic in the event that the DMA TX Maximum Packet Size (which does reset to its default value on reset) is bigger than the TX Packet Buffer Size. For example, an operating system may be using the time sensitive networking features of the I210 and the registers may be programmed correctly, but then a reset signal is asserted and iPXE on the next boot will be unable to use the I210. Mimic what Linux does and forcibly set the registers to their default values. Signed-off-by: Matt Parrella <parrella.matthew@gmail.com>	2023-03-14 14:44:32 +00:00
Forest Crossman	523788ccda	[intelx] Add PCI IDs for Intel 82599 10GBASE-T NIC Signed-off-by: Forest Crossman <cyrozap@gmail.com>	2023-03-05 18:22:18 -06:00
Michael Brown	b6304f2984	[realtek] Explicitly disable VLAN offload Some cards seem to have the receive VLAN tag stripping feature enabled by default, which causes received VLAN packets to be misinterpreted as being received by the trunk device. Fix by disabling VLAN tag stripping in the C+ Command Register. Debugged-by: Xinming Lai <yiyihu@gmail.com> Tested-by: Xinming Lai <yiyihu@gmail.com> Signed-off-by: Michael Brown <mcb30@ipxe.org>	2023-02-01 19:09:30 +00:00
Michael Brown	68734b9a4d	[efi] Bind to only the topmost instance of the SNP or NII protocols UEFI has the mildly annoying habit of installing copies of the EFI_SIMPLE_NETWORK_PROTOCOL instance on the IPv4 and IPv6 child device handles. This can cause iPXE's SNP driver to attempt to bind to a copy of the EFI_SIMPLE_NETWORK_PROTOCOL that iPXE itself provided on a different handle. Fix by refusing to bind to an SNP (or NII) handle if there exists another instance of the same protocol further up the device path (on the basis that we always want to bind to the highest possible device). Signed-off-by: Michael Brown <mcb30@ipxe.org>	2023-01-23 19:27:13 +00:00
Michael Brown	2fef0c541e	[efi] Extend efi_locate_device() to allow searching up the device path Extend the functionality of efi_locate_device() to allow callers to find instances of the protocol that may exist further up the device path. Signed-off-by: Michael Brown <mcb30@ipxe.org>	2023-01-23 19:27:13 +00:00
Alexander Graf	6b977d1250	[ena] Allocate an unused Asynchronous Event Notification Queue (AENQ) We currently don't allocate an Asynchronous Event Notification Queue (AENQ) because we don't actually care about any of the events that may come in. The ENA firmware found on Graviton instances requires the AENQ to exist, otherwise all admin queue commands will fail. Fix by allocating an AENQ and disabling all events (so that we do not need to include code to acknowledge any events that may arrive). Signed-off-by: Alexander Graf <graf@amazon.com>	2023-01-18 22:47:58 +00:00
Michael Brown	c4c03e5be8	[netdevice] Allow duplicate MAC addresses Many laptops now include the ability to specify a "system-specific MAC address" (also known as "pass-through MAC"), which is supposed to be used for both the onboard NIC and for any attached docking station or other USB NIC. This is intended to simplify interoperability with software or hardware that relies on a MAC address to recognise an individual machine: for example, a deployment server may associate the MAC address with a particular operating system image to be deployed. This therefore creates legitimate situations in which duplicate MAC addresses may exist within the same system. As described in commit `98d09a1` ("[netdevice] Avoid registering duplicate network devices"), the Xen netfront driver relies on the rejection of duplicate MAC addresses in order to inhibit registration of the emulated PCI devices that a Xen PV-HVM guest will create to shadow each of the paravirtual network devices. Move the code that rejects duplicate MAC addresses from the network device core to the Xen netfront driver, to allow for the existence of duplicate MAC addresses in non-Xen setups. Signed-off-by: Michael Brown <mcb30@ipxe.org>	2023-01-15 00:42:52 +00:00
Michael Brown	ab19546386	[efi] Disable receive filters to work around buggy UNDI drivers Some UNDI drivers (such as the AMI UsbNetworkPkg currently in the process of being upstreamed into EDK2) have a bug that will prevent any packets from being received unless at least one attempt has been made to disable some receive filters. Work around these buggy drivers by attempting to disable receive filters before enabling them. Ignore any errors, since we genuinely do not care whether or not the disabling succeeds. Signed-off-by: Michael Brown <mcb30@ipxe.org>	2023-01-11 00:18:18 +00:00
Christian I. Nilsson	563bff4722	[intel] Add PCI ID for I219-V and -LM 16,17 Signed-off-by: Christian I. Nilsson <nikize@gmail.com> Signed-off-by: Michael Brown <mcb30@ipxe.org>	2022-11-15 13:05:28 +00:00
Michael Brown	081b3eefc4	[ena] Assign memory BAR if left empty by BIOS Some BIOSes in AWS EC2 (observed with a c6i.metal instance in eu-west-2) will fail to assign an MMIO address to the ENA device, which causes ioremap() to fail. Experiments show that the ENA device is the only device behind its bridge, even when multiple ENA devices are present, and that the BIOS does assign a memory window to the bridge. We may therefore choose to assign the device an MMIO address at the start of the bridge's memory window. Signed-off-by: Michael Brown <mcb30@ipxe.org>	2022-09-19 17:49:25 +01:00
Michael Brown	a80124456e	[ena] Increase receive ring size to 128 entries Some versions of the ENA hardware (observed on a c6i.large instance in eu-west-2) seem to require a receive ring containing at least 128 entries: any smaller ring will never see receive completions or will stall after the first few completions. Increase the receive ring size to 128 entries (determined empirically) for compatibility with these hardware versions. Limit the receive ring fill level to 16 (as at present) to avoid consuming more memory than will typically be available in the internal heap. Signed-off-by: Michael Brown <mcb30@ipxe.org>	2022-08-26 19:38:27 +01:00
Michael Brown	3b81a4e256	[ena] Provide a host information page Some versions of the ENA firmware (observed on a c6i.large instance in eu-west-2) seem to require a host information page, without which the CREATE_CQ command will fail with ENA_ADMIN_UNKNOWN_ERROR. These firmware versions also seem to require us to claim that we are a Linux kernel with a specific driver major version number. This appears to be a firmware bug, as revealed by Linux kernel commit 1a63443af ("net/amazon: Ensure that driver version is aligned to the linux kernel"): this commit changed the value of the driver version number field to be the Linux kernel version, and was hastily reverted in commit 92040c6da ("net: ena: fix broken interface between ENA driver and FW") which clarified that the version number field does actually have some undocumented significance to some versions of the firmware. Fix by providing a host information page via the SET_FEATURE command, incorporating the apparently necessary lies about our identity. Signed-off-by: Michael Brown <mcb30@ipxe.org>	2022-08-26 19:38:27 +01:00
Michael Brown	9f81e97af5	[ena] Specify the unused completion queue MSI-X vector as 0xffffffff Some versions of the ENA firmware (observed on a c6i.large instance in eu-west-2) will complain if the completion queue's MSI-X vector field is left empty, even though the queue configuration specifies that interrupts are not used. Work around these firmware versions by passing in what appears to be the magic "no MSI-X vector" value in this field. Signed-off-by: Michael Brown <mcb30@ipxe.org>	2022-08-26 19:38:27 +01:00
Michael Brown	6d2cead461	[ena] Allow for out-of-order completions The ENA data path design has separate submission and completion queues. Submission queues must be refilled in strict order (since there is only a single linear tail pointer used to communicate the existence of new entries to the hardware), and completion queue entries include a request identifier copied verbatim from the submission queue entry. Once the submission queue doorbell has been rung, software never again reads from the submission queue entry and nothing ever needs to write back to the submission queue entry since completions are reported via the separate completion queue. This design allows the hardware to complete submission queue entries out of order, provided that it internally caches at least as many entries as it leaves gaps. Record and identify I/O buffers by request identifier (using a circular ring buffer of unique request identifiers), and remove the assumption that submission queue entries will be completed in order. Signed-off-by: Michael Brown <mcb30@ipxe.org>	2022-08-26 19:38:25 +01:00
Michael Brown	856ffe000e	[ena] Limit submission queue fill level to completion queue size The CREATE_CQ command is permitted to return a size smaller than requested, which could leave us in a situation where the completion queue could overflow. Avoid overflow by limiting the submission queue fill level to the actual size of the completion queue. Signed-off-by: Michael Brown <mcb30@ipxe.org>	2022-08-26 19:37:54 +01:00
Michael Brown	c5af41a6f5	[intelxl] Explicitly request a single queue pair for virtual functions Current versions of the E810 PF driver fail to set the number of in-use queue pairs in response to the CONFIG_VSI_QUEUES message. When the number of in-use queue pairs is less than the number of available queue pairs, this results in some packets being directed to nonexistent receive queues and hence silently dropped. Work around this PF driver bug by explicitly configuring the number of available queue pairs via the REQUEST_QUEUES message. This message triggers a VF reset that, in turn, requires us to reopen the admin queue and issue an additional GET_RESOURCES message to restore the VF to a functional state. Signed-off-by: Michael Brown <mcb30@ipxe.org>	2022-08-16 19:31:06 +01:00
Michael Brown	04879352c4	[intelxl] Allow for admin commands that trigger a VF reset The RESET_VF admin queue command does not complete via the usual mechanism, but instead requires us to poll registers to wait for the reset to take effect and then reopen the admin queue. Allow for the existence of other admin queue commands that also trigger a VF reset, by separating out the logic that waits for the reset to complete. Signed-off-by: Michael Brown <mcb30@ipxe.org>	2022-08-16 19:29:01 +01:00
Michael Brown	491c075f7f	[intelxl] Negotiate virtual function API version 1.1 Negotiate API version 1.1 in order to allow access to virtual function opcodes that are disallowed by default on the E810. Signed-off-by: Michael Brown <mcb30@ipxe.org>	2022-08-16 17:58:52 +01:00

1032 Commits (7737fec5c63077aa8407f559dea0e3666d9abf70)