Commit Graph

6384 Commits (a80124456ee9ade7a63b69f167c3b1348c8d93c0)
 

Author SHA1 Message Date
Michael Brown a80124456e [ena] Increase receive ring size to 128 entries
Some versions of the ENA hardware (observed on a c6i.large instance in
eu-west-2) seem to require a receive ring containing at least 128
entries: any smaller ring will never see receive completions or will
stall after the first few completions.

Increase the receive ring size to 128 entries (determined empirically)
for compatibility with these hardware versions.  Limit the receive
ring fill level to 16 (as at present) to avoid consuming more memory
than will typically be available in the internal heap.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-08-26 19:38:27 +01:00
Michael Brown 3b81a4e256 [ena] Provide a host information page
Some versions of the ENA firmware (observed on a c6i.large instance in
eu-west-2) seem to require a host information page, without which the
CREATE_CQ command will fail with ENA_ADMIN_UNKNOWN_ERROR.

These firmware versions also seem to require us to claim that we are a
Linux kernel with a specific driver major version number.  This
appears to be a firmware bug, as revealed by Linux kernel commit
1a63443af ("net/amazon: Ensure that driver version is aligned to the
linux kernel"): this commit changed the value of the driver version
number field to be the Linux kernel version, and was hastily reverted
in commit 92040c6da ("net: ena: fix broken interface between ENA
driver and FW") which clarified that the version number field does
actually have some undocumented significance to some versions of the
firmware.

Fix by providing a host information page via the SET_FEATURE command,
incorporating the apparently necessary lies about our identity.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-08-26 19:38:27 +01:00
Michael Brown 9f81e97af5 [ena] Specify the unused completion queue MSI-X vector as 0xffffffff
Some versions of the ENA firmware (observed on a c6i.large instance in
eu-west-2) will complain if the completion queue's MSI-X vector field
is left empty, even though the queue configuration specifies that
interrupts are not used.

Work around these firmware versions by passing in what appears to be
the magic "no MSI-X vector" value in this field.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-08-26 19:38:27 +01:00
Michael Brown 6d2cead461 [ena] Allow for out-of-order completions
The ENA data path design has separate submission and completion
queues.  Submission queues must be refilled in strict order (since
there is only a single linear tail pointer used to communicate the
existence of new entries to the hardware), and completion queue
entries include a request identifier copied verbatim from the
submission queue entry.  Once the submission queue doorbell has been
rung, software never again reads from the submission queue entry and
nothing ever needs to write back to the submission queue entry since
completions are reported via the separate completion queue.

This design allows the hardware to complete submission queue entries
out of order, provided that it internally caches at least as many
entries as it leaves gaps.

Record and identify I/O buffers by request identifier (using a
circular ring buffer of unique request identifiers), and remove the
assumption that submission queue entries will be completed in order.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-08-26 19:38:25 +01:00
Michael Brown 856ffe000e [ena] Limit submission queue fill level to completion queue size
The CREATE_CQ command is permitted to return a size smaller than
requested, which could leave us in a situation where the completion
queue could overflow.

Avoid overflow by limiting the submission queue fill level to the
actual size of the completion queue.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-08-26 19:37:54 +01:00
Michael Brown c5af41a6f5 [intelxl] Explicitly request a single queue pair for virtual functions
Current versions of the E810 PF driver fail to set the number of
in-use queue pairs in response to the CONFIG_VSI_QUEUES message.  When
the number of in-use queue pairs is less than the number of available
queue pairs, this results in some packets being directed to
nonexistent receive queues and hence silently dropped.

Work around this PF driver bug by explicitly configuring the number of
available queue pairs via the REQUEST_QUEUES message.  This message
triggers a VF reset that, in turn, requires us to reopen the admin
queue and issue an additional GET_RESOURCES message to restore the VF
to a functional state.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-08-16 19:31:06 +01:00
Michael Brown 04879352c4 [intelxl] Allow for admin commands that trigger a VF reset
The RESET_VF admin queue command does not complete via the usual
mechanism, but instead requires us to poll registers to wait for the
reset to take effect and then reopen the admin queue.

Allow for the existence of other admin queue commands that also
trigger a VF reset, by separating out the logic that waits for the
reset to complete.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-08-16 19:29:01 +01:00
Michael Brown 491c075f7f [intelxl] Negotiate virtual function API version 1.1
Negotiate API version 1.1 in order to allow access to virtual function
opcodes that are disallowed by default on the E810.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-08-16 17:58:52 +01:00
Michael Brown b52ea20841 [intelxl] Show virtual function packet statistics for debugging
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-08-16 17:58:46 +01:00
Michael Brown cad1cc6b44 [intelxl] Add driver for Intel 100 Gigabit Ethernet NICs
Add a driver for the E810 family of 100 Gigabit Ethernet NICs.  The
core datapath is identical to that of the 40 Gigabit XL710, and this
part of the code is shared between both drivers.  The admin queue
mechanism is sufficiently similar to make it worth reusing substantial
portions of the code, with separate implementations for several
commands to handle the (unnecessarily) breaking changes in data
structure layouts.  The major differences are in the mechanisms for
programming queue contexts (where the E810 abandons TX/RX symmetry)
and for configuring the transmit scheduler and receive filters: these
portions are sufficiently different to justify a separate driver.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-08-12 16:15:17 +01:00
Michael Brown 6871a7de70 [intelxl] Use admin queue to set port MAC address and maximum frame size
Remove knowledge of the PRTGL_SA[HL] registers, and instead use the
admin queue to set the MAC address and maximum frame size.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-08-12 13:24:06 +01:00
Michael Brown 727b034f11 [intelxl] Use admin queue to get port MAC address
Remove knowledge of the PRTPM_SA[HL] registers, and instead use the
admin queue to retrieve the MAC address.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-08-12 13:03:12 +01:00
Michael Brown 06467ee70f [intelxl] Defer fetching MAC address until after opening admin queue
Allow for the MAC address to be fetched using an admin queue command,
instead of reading the PRTPM_SA[HL] registers directly.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-08-12 13:03:12 +01:00
Michael Brown d6e36a2d73 [intelxl] Set maximum frame size to 9728 bytes as per datasheet
The PRTGL_SAH register contains the current maximum frame size, and is
not guaranteed on reset to contain the actual maximum frame size
supported by the hardware, which the datasheet specifies as 9728 bytes
(including the 4-byte CRC).

Set the maximum packet size to a hardcoded 9728 bytes instead of
reading from the PRTGL_SAH register.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-08-12 13:03:12 +01:00
Michael Brown 99242bbe2e [intelxl] Always issue "clear PXE mode" admin queue command
Remove knowledge of the GLLAN_RCTL_0 register (which changes location
between the XL810 and E810 register maps), and instead unconditionally
issue the "clear PXE mode" command with the EEXIST error silenced.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-08-11 15:28:03 +01:00
Michael Brown faf26bf8b8 [intelxl] Allow expected admin queue command errors to be silenced
The "clear PXE mode" admin queue command will return an EEXIST error
if the device is already in non-PXE mode, but there is no other admin
queue command that can be used to determine whether the device has
already been switched into non-PXE mode.

Provide a mechanism to allow expected errors from a command to be
silenced, to allow the "clear PXE mode" command to be cleanly used
without needing to first check the GLLAN_RCTL_0 register value.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-08-11 15:28:03 +01:00
Michael Brown f0ea19b238 [intelxl] Increase data buffer size to 4kB
At least one E810 admin queue command (Query Default Scheduling Tree
Topology) insists upon being provided with a 4kB data buffer, even
when the data to be returned is much smaller.

Work around this requirement by increasing the admin queue data buffer
size to 4kB.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-08-11 15:24:29 +01:00
Michael Brown fb69d14002 [intelxl] Separate virtual function driver definitions
Move knowledge of the virtual function data structures and admin
command definitions from intelxl.h to intelxlvf.h.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-08-11 14:53:57 +01:00
Michael Brown c220b93f31 [intelxl] Reuse admin command descriptor and buffer for VF responses
Remove the large static admin data buffer structure embedded within
struct intelxl_nic, and instead copy the response received via the
"send to VF" admin queue event to the (already consumed and completed)
admin command descriptor and data buffer.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-08-11 14:53:57 +01:00
Michael Brown 67f8878e10 [intelxl] Handle admin events via a callback
The physical and virtual function drivers each care about precisely
one admin queue event type.  Simplify event handling by using a
per-driver callback instead of the existing weak function symbol.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-08-11 14:53:54 +01:00
Michael Brown 9e46ffa924 [intelxl] Rename 8086:1889 PCI ID to "iavf"
The PCI device ID 8086:1889 is for the Intel Ethernet Adaptive Virtual
Function, which is a generic virtual function that can be exposed by
different generations of Intel hardware.

Rename the PCI ID from "xl710-vf-ad" to "iavf" to reflect that the
driver is not XL710-specific.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-08-10 12:29:47 +01:00
Michael Brown ef70667557 [intelxl] Increase receive descriptor ring size to 64 entries
The E810 requires that receive descriptor rings have at least 64
entries (and are a multiple of 32 entries).

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-08-10 12:29:47 +01:00
Michael Brown 9f5b9e3abb [intelxl] Negotiate API version for virtual function via admin queue
Do not attempt to use the admin commands to get the firmware version
and report the driver version for the virtual function driver, since
these will be rejected by the E810 firmware as invalid commands when
issued by a virtual function.  Instead, use the mailbox interface to
negotiate the API version with the physical function driver.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-08-10 12:29:47 +01:00
Michael Brown b4216fa506 [intelxl] Use non-zero MSI-X vector for virtual function interrupts
The 100 Gigabit physical function driver requires a virtual function
driver to request that transmit and receive queues are mapped to MSI-X
vector 1 or higher.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-08-10 12:29:47 +01:00
Michael Brown 1b61c2118c [intelxl] Fix invocation of intelxlvf_admin_queues()
The second parameter to intelxlvf_admin_queues() is a boolean used to
select the VF opcode, rather than the raw VF opcode itself.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-08-10 12:29:45 +01:00
Michael Brown a202de385d [intelxl] Use function-level reset instead of PFGEN_CTRL.PFSWR
Remove knowledge of the PFGEN_CTRL register (which changes location
between XL710 and E810 register maps), and instead use PCIe FLR to
reset the physical function.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-08-08 16:43:36 +01:00
Michael Brown 0965cec53c [pci] Generalise function-level reset mechanism
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-08-08 16:39:40 +01:00
Michael Brown 9dfcdc04c8 [intelxl] Update list of PCI IDs
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-08-08 15:59:55 +01:00
Michael Brown d8014b1801 [intelxl] Include admin command response data buffer in debug output
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-08-08 15:59:55 +01:00
Michael Brown 319caeaa7b [intelxl] Identify rings consistently in debug messages
Use the tail register offset (which exists for all ring types) as the
ring identifier in all relevant debug messages.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-08-08 15:59:55 +01:00
Michael Brown 814aef68c5 [intelxl] Add missing padding bytes to receive queue context
For the sake of completeness, ensure that all 32 bytes of the receive
queue context are programmed (including the unused final 8 bytes).

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-08-08 15:59:55 +01:00
Michael Brown 725f0370fa [intelxl] Fix bit width of function number in PFFUNC_RID register
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-08-08 15:59:55 +01:00
Michael Brown 5d3fad5c10 [intelxl] Fix retrieval of switch configuration via admin queue
Commit 8f3e648 ("[intelxl] Use one admin queue buffer per admin queue
descriptor") changed the API for intelxl_admin_command() such that the
caller now constructs the command directly within the next available
descriptor ring entry, rather than relying on intelxl_admin_command()
to copy the descriptor to and from the descriptor ring.

This introduced a regression in intelxl_admin_switch(), since the
second and subsequent iterations of the loop will not have constructed
a valid command in the new descriptor ring entry before calling
intelxl_admin_command().

Fix by constructing the command within the loop.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-08-08 15:59:55 +01:00
Michael Brown d3c8944d5c [acpi] Expose system MAC address via ${sysmac} setting
Expose the system MAC address (if any) via the ${sysmac} setting.
This allows scripts to access the system MAC address even when iPXE
has decided not to apply it to a network device (e.g. because the
cached DHCPACK MAC address was selected in order to match the
behaviour of a previous boot stage).

The setting is named ${sysmac} rather than ${acpimac} in order to
allow for forward compatibility with non-ACPI mechanisms that may
exist in future for specifying a system MAC address.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-06-10 13:44:40 +01:00
Michael Brown d72c8fdc90 [cachedhcp] Allow cached DHCPACK to override a temporary MAC address
When running on a system with an ACPI-provided system-specific MAC
address, iPXE will apply this address to an ECM or NCM USB NIC.  If
iPXE has been chainloaded from a previous stage that does not
understand the ACPI MAC mechanism then this can result in iPXE using a
different MAC address than the previous stage, which is surprising to
users.

Attempt to minimise surprise by allowing the MAC address found in a
cached DHCPACK packet to override a temporary MAC address, if the
DHCPACK MAC address matches the network device's permanent MAC
address.  When a previous stage has chosen to use the network device's
permanent MAC address (e.g. because it does not understand the ACPI
MAC mechanism), this will cause iPXE to make the same choice.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-05-23 13:05:24 +01:00
Michael Brown 87f1796f15 [ecm] Treat ACPI MAC address as being a non-permanent MAC address
When applying an ACPI-provided system-specific MAC address, apply it
to netdev->ll_addr rather than netdev->hw_addr.  This allows iPXE
scripts to access the permanent MAC address via the ${netX/hwaddr}
setting (and thereby provides scripts with a mechanism to ascertain
that the NIC is using a MAC address other than its own permanent
hardware address).

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-05-23 12:23:53 +01:00
Michael Brown 70995397e5 [cloud] Allow aws-import script to run on Python 3.6
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-04-06 14:36:07 +01:00
Michael Brown f58b5109f4 [acpi] Support the "_RTXMAC_" format for ACPI-based MAC addresses
Some newer HP products expose the host-based MAC (HBMAC) address using
an ACPI method named "RTMA" returning a part-binary string of the form
"_RTXMAC_#<mac>#", where "<mac>" comprises the raw MAC address bytes.

Extend the existing support to handle this format alongside the older
"_AUXMAC_" format (which uses a base16-encoded MAC address).

Reported-by: Andreas Hammarskjöld <junior@2PintSoftware.com>
Tested-by: Andreas Hammarskjöld <junior@2PintSoftware.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-03-25 16:47:06 +00:00
Michael Brown 614c3f43a1 [acpi] Add MAC address extraction self-tests
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-03-24 12:58:52 +00:00
Michael Brown 1e1b9593e6 [linux] Add stub phys_to_user() implementation
For symmetry with the stub user_to_phys() implementation, provide
phys_to_user() with the same underlying assumption that virtual
addresses are physical (since there is no way to know the real
physical address when running as a Linux userspace executable).

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-03-24 12:58:52 +00:00
Michael Brown 27825e5557 [acpi] Allow for the possibility of overriding ACPI tables at link time
Allow for linked-in code to override the mechanism used to locate an
ACPI table, thereby opening up the possibility of ACPI self-tests.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-03-24 12:58:52 +00:00
Michael Brown dd35475438 [efi] Support Unicode character output via framebuffer console
Extend the glyph cache to include a number of dynamic entries that are
populated on demand whenever a non-ASCII character needs to be drawn.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-03-15 17:30:52 +00:00
Michael Brown ba93c9134c [fbcon] Support Unicode character output
Accumulate UTF-8 characters in fbcon_putchar(), and require the frame
buffer console's .glyph() method to accept Unicode character values.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-03-15 17:27:18 +00:00
Michael Brown 2ff3385e00 [efi] Support Unicode character output via text console
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-03-15 17:09:58 +00:00
Michael Brown 7e9631b60f [utf8] Add UTF-8 accumulation self-tests
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-03-15 16:25:13 +00:00
Michael Brown 3cd3a73261 [utf8] Add ability to accumulate Unicode characters from UTF-8 bytes
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-03-01 15:57:33 +00:00
Michael Brown 2acdc92994 [dns] Always start DNS queries using the first configured DNS server
We currently define the active DNS server as a global variable.  All
queries will start by attempting to contact the active DNS server, and
the active DNS server will be changed only if we fail to get a
response.  This effectively treats the DNS server list as expressing a
weak preference ordering: we will try servers in order, but once we
have found a working server we will stick with that server for as long
as it continues to respond to queries.

Some sites are misconfigured to hand out DNS servers that do not have
a consistent worldview.  For example: the site may hand out two DNS
server addresses, the first being an internal DNS server (which is
able to resolve names in private DNS domains) and the second being a
public DNS server such as 8.8.8.8 (which will correctly return
NXDOMAIN for any private DNS domains).  This type of configuration is
fundamentally broken and should never be used, since any DNS resolver
performing a query for a name within a private DNS domain may obtain a
spurious NXDOMAIN response for a valid private DNS name.

Work around these broken configurations by treating the DNS server
list as expressing a strong preference ordering, and always starting
DNS queries from the first server in the list (rather than maintaining
a global concept of the active server).  This will have the debatable
benefit of converting permanent spurious NXDOMAIN errors into
transient spurious NXDOMAIN errors, which can at least be worked
around at a higher level (e.g. by retrying a download in a loop within
an iPXE script).

The cost of always starting DNS queries from the first server in the
list is a slight delay introduced when the first server is genuinely
unavailable.  This should be negligible in practice since DNS queries
are relatively infrequent and the failover expiry time is short.

Treating the DNS server list as a preference ordering is permitted by
the language of RFC 2132, which defines DHCP option 6 as a list in
which "[DNS] servers SHOULD be listed in order of preference".  No
specification defines a precise algorithm for how this preference
order should be applied in practice: this new approach seems as good
as any.

Requested-by: Andreas Hammarskjöld <junior@2PintSoftware.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-02-23 23:17:05 +00:00
Michael Brown bc5c612f75 [console] Include mappings for AltGr-Shift-<key>
The BIOS console's interpretation of LShift+RShift as equivalent to
AltGr requires the shifted ASCII characters to be present in the AltGr
mapping table, to allow AltGr-Shift-<key> to be interpreted in the
same way as AltGr-<key>.

For keyboard layouts that have different ASCII characters for
AltGr-<key> and AltGr-Shift-<key>, this will potentially leave the
character for AltGr-<key> inaccessible via the BIOS console if the
BIOS requires the use of the LShift+RShift workaround.  This
theoretically affects the numeric keys in the Lithuanian ("lt")
keyboard layout (where the numerals are accessed via AltGr-<key> and
punctuation characters via AltGr-Shift-<key>), but the simple
workaround for that keyboard layout is to avoid using AltGr and Shift
entirely since the unmodified numeric keys are not remapped anyway.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-02-16 15:31:47 +00:00
Michael Brown 304333dace [console] Support changing keyboard map at runtime
Provide the special keyboard map named "dynamic" which allows the
active keyboard map to be selected at runtime via the ${keymap}
setting, e.g.:

  #define KEYBOARD_MAP dynamic

  iPXE> set keymap uk

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-02-16 14:06:33 +00:00
Michael Brown 674963e2a6 [settings] Always process all settings applicators
Settings applicators are entirely independent, and there is no reason
why a failure in one applicator should prevent other applicators from
being processed.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-02-16 13:50:41 +00:00