Commit Graph

6500 Commits (6714b20ea20af5046cdb36b58ab777b55a3c003c)

Author SHA1 Message Date
Michael Brown 475c0dfa8e [linux] Centralise the linker script for Linux binaries
Reduce duplication between i386 and x86_64 by providing a single
shared linker script that both architectures can include.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2023-01-22 12:38:03 +00:00
Michael Brown a99e435c8e [efi] Do not rely on ProcessorBind.h when building host binaries
We cannot rely on the EDK2 ProcessorBind.h headers when compiling a
binary for execution on the build host itself (e.g. elf2efi), since
the host's CPU architecture may not even be supported by EDK2.

Fix by skipping ProcessorBind.h when building a host binary, and
defining the bare minimum required to allow other EDK2 headers to
compile cleanly.

Reported-by: Michal Suchánek <msuchanek@suse.de>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2023-01-20 00:17:49 +00:00
Alexander Graf 6b977d1250 [ena] Allocate an unused Asynchronous Event Notification Queue (AENQ)
We currently don't allocate an Asynchronous Event Notification Queue
(AENQ) because we don't actually care about any of the events that may
come in.

The ENA firmware found on Graviton instances requires the AENQ to
exist, otherwise all admin queue commands will fail.

Fix by allocating an AENQ and disabling all events (so that we do not
need to include code to acknowledge any events that may arrive).

Signed-off-by: Alexander Graf <graf@amazon.com>
2023-01-18 22:47:58 +00:00
Michael Brown 08740220ba [netdevice] Ensure consistent interpretation of "netX" device name
Ensure that the "${netX/...}" settings mechanism always uses the same
interpretation of the network device corresponding to "netX" as any
other mechanism that performs a name-based lookup of a network device.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2023-01-17 12:42:46 +00:00
Michael Brown 2dcef4b7a1 [efi] Create VLAN autoboot device automatically
When chainloading iPXE from an EFI VLAN device, configure the
corresponding iPXE VLAN device to be created automatically.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2023-01-15 22:42:30 +00:00
Michael Brown f07630c74f [vlan] Support automatic VLAN device creation
Add the ability to automatically create a VLAN device for a specified
trunk device link-layer address and VLAN tag.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2023-01-15 22:35:44 +00:00
Michael Brown 5a2fa6040e [autoboot] Include VLAN tag in filter for identifying autoboot device
When chainloading iPXE from a VLAN device, the MAC address of the
loaded image's device handle will match the MAC address of the trunk
device created by iPXE, and the autoboot process will then erroneously
consider the trunk device to be an autoboot device.

Fix by recording the VLAN tag along with the MAC address, and treating
the VLAN tag as part of the filter used to match the MAC address
against candidate network devices.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2023-01-15 21:36:08 +00:00
Michael Brown c4c03e5be8 [netdevice] Allow duplicate MAC addresses
Many laptops now include the ability to specify a "system-specific MAC
address" (also known as "pass-through MAC"), which is supposed to be
used for both the onboard NIC and for any attached docking station or
other USB NIC.  This is intended to simplify interoperability with
software or hardware that relies on a MAC address to recognise an
individual machine: for example, a deployment server may associate the
MAC address with a particular operating system image to be deployed.
This therefore creates legitimate situations in which duplicate MAC
addresses may exist within the same system.

As described in commit 98d09a1 ("[netdevice] Avoid registering
duplicate network devices"), the Xen netfront driver relies on the
rejection of duplicate MAC addresses in order to inhibit registration
of the emulated PCI devices that a Xen PV-HVM guest will create to
shadow each of the paravirtual network devices.

Move the code that rejects duplicate MAC addresses from the network
device core to the Xen netfront driver, to allow for the existence of
duplicate MAC addresses in non-Xen setups.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2023-01-15 00:42:52 +00:00
Michael Brown 47af48012e [netdevice] Separate concept of scope ID from network device name index
The network device index currently serves two purposes: acting as a
sequential index for network device names ("net0", "net1", etc), and
acting as an opaque unique integer identifier used in socket address
scope IDs.

There is no particular need for these usages to be linked, and it can
lead to situations in which devices are named unexpectedly.  For
example: if a system has two network devices "net0" and "net1", a VLAN
is created as "net1-42", and then a USB NIC is connected, then the USB
NIC will be named "net3" rather than the expected "net2" since the
VLAN device "net1-42" will have consumed an index.

Separate the usages: rename the "index" field to "scope_id" (matching
its one and only use case), and assign the name without reference to
the scope ID by finding the first unused name.  For consistency,
assign the scope ID by similarly finding the first unused scope ID.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2023-01-14 00:09:20 +00:00
Michael Brown ab19546386 [efi] Disable receive filters to work around buggy UNDI drivers
Some UNDI drivers (such as the AMI UsbNetworkPkg currently in the
process of being upstreamed into EDK2) have a bug that will prevent
any packets from being received unless at least one attempt has been
made to disable some receive filters.

Work around these buggy drivers by attempting to disable receive
filters before enabling them.  Ignore any errors, since we genuinely
do not care whether or not the disabling succeeds.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2023-01-11 00:18:18 +00:00
Michael Brown 7147532c3f [cachedhcp] Retain cached DHCPACK after startup if not already consumed
We currently free an unclaimed cached DHCPACK immediately after
startup, in order to free up memory.  This prevents the cached DHCPACK
from being applied to a device that is created after startup, such as
a VLAN device created via the "vcreate" command.

Retain any unclaimed DHCPACK after startup to allow it to be matched
against (and applied to) any device that gets created at runtime.
Free the DHCPACK during shutdown if it still remains unclaimed, in
order to exit with memory cleanly freed.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-12-22 15:12:34 +00:00
Michael Brown 60b5532cfc [cachedhcp] Include VLAN tag in filter for applying cached DHCPACK
When chainloading iPXE from a VLAN device, the MAC address within the
cached DHCPACK will match the MAC address of the trunk device created
by iPXE, and the cached DHCPACK will then end up being erroneously
applied to the trunk device.  This tends to break outbound IPv4
routing, since both the trunk and VLAN devices will have the same
assigned IPv4 address.

Fix by recording the VLAN tag along with the cached DHCPACK, and
treating the VLAN tag as part of the filter used to match the cached
DHCPACK against candidate network devices.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-12-22 14:59:29 +00:00
Michael Brown b9571ca12e [efi] Add efi_path_vlan() utility function
EFI provides no API for determining the VLAN tag (if any) for a
specified device handle.  There is the EFI_VLAN_CONFIG_PROTOCOL, but
that exists only on the trunk device handle (not on the VLAN device
handle), and provides no way to match VLAN tags against the trunk
device's child device handles.

The EDK2 codebase seems to rely solely on the device path to determine
the VLAN tag for a specified device handle: both NetLibGetVlanId() and
BmGetNetworkDescription() will parse the device path to search for a
VLAN_DEVICE_PATH component.

Add efi_path_vlan() which uses the same device path parsing logic to
determine the VLAN tag.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-12-22 14:27:56 +00:00
Michael Brown 099e4d39b3 [efi] Expose efi_path_next() utility function
Provide a single central implementation of the logic for stepping
through elements of an EFI device path.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-12-22 13:34:28 +00:00
Michael Brown 0f3ace92c6 [efi] Allow passing a NULL device path to path utility functions
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-12-22 13:30:02 +00:00
Michael Brown d879c8e4d9 [efi] Provide VLAN configuration protocol
UEFI implements VLAN support within the Managed Network Protocol (MNP)
driver, which may create child VLAN devices automatically based on
stored UEFI variables.  These child devices do not themselves provide
a raw-packet interface via EFI_SIMPLE_NETWORK_PROTOCOL, and may be
consumed only via the EFI_MANAGED_NETWORK_PROTOCOL interface.

The device paths constructed for these child devices may conflict with
those for the EFI_SIMPLE_NETWORK_PROTOCOL instances that iPXE attempts
to install for its own VLAN devices.  The upshot is that creating an
iPXE VLAN device (e.g. via the "vcreate" command) will fail if the
UEFI Managed Network Protocol has already created a device for the
same VLAN tag.

Fix by providing our own EFI_VLAN_CONFIG_PROTOCOL instance on the same
device handle as EFI_SIMPLE_NETWORK_PROTOCOL.  This causes the MNP
driver to treat iPXE's device as supporting hardware VLAN offload, and
it will therefore not attempt to install its own instance of the
protocol.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-12-14 11:51:52 +00:00
Michael Brown 5e62b4bc6c [vlan] Allow external code to identify VLAN priority as well as tag
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-12-14 11:05:37 +00:00
Michael Brown b0ded89e91 [build] Disable dangling pointer checking for GCC
The dangling pointer warning introduced in GCC 12 reports false
positives that result in build failures.  In particular, storing the
address of a local code label used to record the current state of a
state machine (as done in crypto/deflate.c) is reported as an error.

There seems to be no way to mark the pointer type as being permitted
to hold such a value, so unconditionally disable the warning.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-12-14 01:29:49 +00:00
Michael Brown 54c4c1d403 [build] Disable array bounds checking for GCC
The array bounds checker on GCC 12 and newer reports a very large
number of false positives that result in build failures.  In
particular, accesses through pointers to zero-length arrays (such as
those used by the linker table mechanism in include/ipxe/tables.h) are
reported as errors, contrary to the GCC documentation.

Work around this GCC issue by unconditionally disabling the warning.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-12-14 00:54:13 +00:00
Christian I. Nilsson 563bff4722 [intel] Add PCI ID for I219-V and -LM 16,17
Signed-off-by: Christian I. Nilsson <nikize@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-11-15 13:05:28 +00:00
Michael Brown 2ae5355321 [pci] Backup and restore standard config space across PCIe FLR
The behaviour of PCI devices across a function-level reset seems to be
inconsistent in practice: some devices will preserve PCI BARs, some
will not.

Fix the behaviour of FLR on devices that do not preserve PCI BARs by
backing up and restoring PCI configuration space across the reset.
Preserve only the standard portion of the configuration space, since
there may be registers with unexpected side effects in the remaining
non-standardised space.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-11-13 21:38:41 +00:00
Michael Brown ca2be7e094 [pci] Allow PCI config space backup to be limited by maximum offset
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-11-13 20:42:09 +00:00
Michael Brown 688646fe6d [tls] Add GCM cipher suites
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-11-10 09:58:44 +00:00
Michael Brown f5c829b6f8 [tests] Verify ability to perform in-place encryption and decryption
TLS relies upon the ability of ciphers to perform in-place decryption,
in order to avoid allocating additional I/O buffers for received data.

Add verification of in-place encryption and decryption to the cipher
self-tests.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-11-10 09:58:44 +00:00
Michael Brown 4acded7e57 [crypto] Support in-place decryption for GCM ciphers
The hash calculation is currently performed incorrectly when
decrypting in place, since the ciphertext will have been overwritten
with the plaintext before being used to update the hash value.

Restructure the code to allow for in-place encryption and decryption.
Choose to optimise for the decryption case, since we are likely to
decrypt much more data than we encrypt.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-11-10 09:58:37 +00:00
Michael Brown 63fdd9b581 [tests] Verify ability to reset cipher initialisation vector
TLS relies upon the ability to reuse a cipher by resetting only the
initialisation vector while reusing the existing key.

Add verification of resetting the initialisation vector to the cipher
self-tests.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-11-09 16:54:13 +00:00
Michael Brown 63577207ab [crypto] Ensure relevant GCM cipher state is cleared by cipher_setiv()
Reset the accumulated authentication state when cipher_setiv() is
called, to allow the cipher to be reused without resetting the key.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-11-09 16:48:50 +00:00
Michael Brown 7256a6eb24 [tls] Allow handshake digest algorithm to be specified by cipher suite
All existing cipher suites use SHA-256 as the TLSv1.2 and above
handshake digest algorithm (even when using SHA-1 as the MAC digest
algorithm).  Some GCM cipher suites use SHA-384 as the handshake
digest algorithm.

Allow the cipher suite to specify the handshake (and PRF) digest
algorithm to be used for TLSv1.2 and above.

This requires some restructuring to allow for the fact that the
ClientHello message must be included within the handshake digest, even
though the relevant digest algorithm is not yet known at the point
that the ClientHello is sent.  Fortunately, the ClientHello may be
reproduced verbatim at the point of receiving the ServerHello, so we
rely on reconstructing (rather than storing) this message.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-11-09 14:49:42 +00:00
Michael Brown 51ecc05490 [tls] Always send maximum supported version in ClientHello
Always send the maximum supported version in our ClientHello message,
even when performing renegotiation (in which case the current version
may already be lower than the maximum supported version).

This is permitted by the specification, and allows the ClientHello to
be reconstructed verbatim at the point of selecting the handshake
digest algorithm in tls_new_server_hello().

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-11-09 14:49:42 +00:00
Michael Brown 54d83e92f0 [tls] Add support for AEAD ciphers
Allow for AEAD cipher suites where the MAC length may be zero and the
authentication is instead provided by an authenticating cipher, with
the plaintext authentication tag appended to the ciphertext.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-11-08 15:14:19 +00:00
Michael Brown 186306d619 [tls] Treat invalid block padding as zero length padding
Harden against padding oracle attacks by treating invalid block
padding as zero length padding, thereby deferring the failure until
after computing the (incorrect) MAC.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-11-08 15:14:06 +00:00
Michael Brown 634a86093a [tls] Allow for arbitrary-length initialisation vectors
Restructure the encryption and decryption operations to allow for the
use of ciphers where the initialisation vector is constructed by
concatenating the fixed IV (derived as part of key expansion) with a
record IV (prepended to the ciphertext).

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-11-08 15:14:04 +00:00
Michael Brown c453b4c284 [tls] Add MAC length as a cipher suite parameter
TLS stream and block ciphers use a MAC with a length equal to the
output length of the digest algorithm in use.  For AEAD ciphers there
is no MAC, with the equivalent functionality provided by the cipher
algorithm's authentication tag.

Allow for the existence of AEAD cipher suites by making the MAC length
a parameter of the cipher suite.

Assume that the MAC key length is equal to the MAC length, since this
is true for all currently supported cipher suites.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-11-08 14:09:18 +00:00
Michael Brown b6eef14858 [tls] Abstract out concept of a TLS authentication header
All TLS cipher types use a common structure for the per-record data
that is authenticated in addition to the plaintext itself.  This data
is used as a prefix in the HMAC calculation for stream and block
ciphers, or as additional authenticated data for AEAD ciphers.

Define a "TLS authentication header" structure to hold this data as a
contiguous block, in order to meet the alignment requirement for AEAD
ciphers such as GCM.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-11-08 13:48:45 +00:00
Michael Brown 6a360ebfde [tls] Ensure cipher alignment size is respected
Adjust the length of the first received ciphertext data buffer to
ensure that all decryption operations respect the cipher's alignment
size.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-11-07 11:19:49 +00:00
Michael Brown 30243ad739 [crypto] Add concept of cipher alignment size
The GCM cipher mode of operation (in common with other counter-based
modes of operation) has a notion of blocksize that does not neatly
fall into our current abstraction: it does operate in 16-byte blocks
but allows for an arbitrary overall data length (i.e. the final block
may be incomplete).

Model this by adding a concept of alignment size.  Each call to
encrypt() or decrypt() must begin at a multiple of the alignment size
from the start of the data stream.  This allows us to model GCM by
using a block size of 1 byte and an alignment size of 16 bytes.

As a side benefit, this same concept allows us to neatly model the
fact that raw AES can encrypt only a single 16-byte block, by
specifying an alignment size of zero on this cipher.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-11-07 11:19:48 +00:00
Michael Brown d1bc872a2e [tls] Formalise notions of fixed and record initialisation vectors
TLS block ciphers always use CBC (as per RFC 5246 section 6.2.3.2)
with a record initialisation vector length that is equal to the cipher
block size, and no fixed initialisation vector.

The initialisation vector for AEAD ciphers such as GCM is less
straightforward, and requires both a fixed and per-record component.

Extend the definition of a cipher suite to include fixed and record
initialisation vector lengths, and generate the fixed portion (if any)
as part of key expansion.

Do not add explicit calls to cipher_setiv() in tls_assemble_block()
and tls_split_block(), since the constraints imposed by RFC 5246 are
specifically chosen to allow implementations to avoid doing so.
(Instead, add a sanity check that the record initialisation vector
length is equal to the cipher block size.)

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-11-07 11:19:48 +00:00
Michael Brown f8565a655e [tls] Remove support for TLSv1.0
The TLSv1.0 protocol was deprecated by RFC 8996 (along with TLSv1.1),
and has been disabled by default in iPXE since commit dc785b0fb
("[tls] Default to supporting only TLSv1.1 or above") in June 2020.

While there is value in continuing to support older protocols for
interoperability with older server appliances, the additional
complexity of supporting the implicit initialisation vector for
TLSv1.0 is not worth the cost.

Remove support for the obsolete TLSv1.0 protocol, to reduce complexity
of the implementation and simplify ongoing maintenance.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-11-07 11:19:48 +00:00
Michael Brown 7b60a48752 [efi] Clear DMA-coherent buffers before mapping
The DMA mapping is performed implicitly as part of the call to
dma_alloc().  The current implementation creates the IOMMU mapping for
the allocated and potentially uninitialised data before returning to
the caller (which will immediately zero out or otherwise initialise
the buffer).  This leaves a small window within which a malicious PCI
device could potentially attempt to retrieve firmware-owned secrets
present in the uninitialised buffer.  (Note that the hypothetically
malicious PCI device has no viable way to know the address of the
buffer from which to attempt a DMA read, rendering the attack
extremely implausible.)

Guard against any such hypothetical attacks by zeroing out the
allocated buffer prior to creating the coherent DMA mapping.

Suggested-by: Mateusz Siwiec <Mateusz.Siwiec@ioactive.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-11-04 20:28:09 +00:00
Michael Brown f48b01cb01 [bzimage] Fix parsing of "vga=..." when not at end of command line
bzimage_parse_cmdline() uses strcmp() to identify the named "vga=..."
kernel command line option values, which will give a false negative if
the option is not last on the command line.

Fix by temporarily changing the relevant command line separator (if
any) to a NUL terminator.

Debugged-by: Simon Rettberg <simon.rettberg@rz.uni-freiburg.de>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-10-27 13:05:35 +01:00
Michael Brown 8fce26730c [crypto] Add block cipher Galois/Counter mode of operation
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-10-25 13:21:30 +01:00
Michael Brown da81214cec [crypto] Add concept of authentication tag to cipher algorithms
Some ciphers (such as GCM) support the concept of a tag that can be
used to authenticate the encrypted data.  Add a cipher method for
generating an authentication tag.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-10-25 13:21:30 +01:00
Michael Brown 0c383bf00a [crypto] Add concept of additional data to cipher algorithms
Some ciphers (such as GCM) support the concept of additional
authenticated data, which does not appear in the ciphertext but may
affect the operation of the cipher.

Allow cipher_encrypt() and cipher_decrypt() to be called with a NULL
destination buffer in order to pass additional data.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-10-25 13:21:30 +01:00
Michael Brown 8e478e648f [crypto] Allow initialisation vector length to vary from cipher blocksize
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-10-25 13:21:28 +01:00
Michael Brown 52f72d298a [crypto] Expose null crypto algorithm methods for reuse
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-10-25 13:20:22 +01:00
Michael Brown 2c78242732 [tls] Add support for DHE variants of the existing cipher suites
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-10-11 15:42:13 +01:00
Michael Brown 6b2c94d3a7 [tls] Add support for Ephemeral Diffie-Hellman key exchange
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-10-11 15:42:11 +01:00
Michael Brown ea33ea33c0 [tls] Add key exchange mechanism to definition of cipher suite
Allow for the key exchange mechanism to vary depending upon the
selected cipher suite.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-10-11 14:37:12 +01:00
Michael Brown 80c45c5c71 [tls] Record ServerKeyExchange record, if provided
Accept and record the ServerKeyExchange record, which is required for
key exchange mechanisms such as Ephemeral Diffie-Hellman (DHE).

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-10-11 14:37:12 +01:00
Michael Brown 028aac99a3 [tls] Generate pre-master secret at point of sending ClientKeyExchange
The pre-master secret is currently constructed at the time of
instantiating the TLS connection.  This precludes the use of key
exchange mechanisms such as Ephemeral Diffie-Hellman (DHE), which
require a ServerKeyExchange message to exchange additional key
material before the pre-master secret can be constructed.

Allow for the use of such cipher suites by deferring generation of the
master secret until the point of sending the ClientKeyExchange
message.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-10-11 14:37:12 +01:00
Michael Brown 1a7317e7d4 [tls] Generate master secret at point of sending ClientKeyExchange
The master secret is currently constructed upon receiving the
ServerHello message.  This precludes the use of key exchange
mechanisms such as Ephemeral Diffie-Hellman (DHE), which require a
ServerKeyExchange message to exchange additional key material before
the pre-master secret and master secret can be constructed.

Allow for the use of such cipher suites by deferring generation of the
master secret until the point of sending the ClientKeyExchange
message.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-10-11 14:37:12 +01:00
Michael Brown 18b861024a [crypto] Add Ephemeral Diffie-Hellman key exchange algorithm
Add an implementation of the Ephemeral Diffie-Hellman key exchange
algorithm as defined in RFC2631, with test vectors taken from the NIST
Cryptographic Toolkit.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-10-11 14:33:19 +01:00
Michael Brown 007d3cb800 [crypto] Simplify internal HMAC API
Simplify the internal HMAC API so that the key is provided only at the
point of calling hmac_init(), and the (potentially reduced) key is
stored as part of the context for later use by hmac_final().

This simplifies the calling code, and avoids the need for callers such
as TLS to allocate a potentially variable length block in order to
retain a copy of the unmodified key.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-10-10 12:21:54 +01:00
Michael Brown 88419b608d [test] Add HMAC self-tests
The HMAC code is already tested indirectly via several consuming
algorithms that themselves provide self-tests (e.g. HMAC-DRBG, NTLM
authentication, and PeerDist content identification), but lacks any
direct test vectors.

Add explicit HMAC tests and ensure that corner cases such as empty
keys, block-length keys, and over-length keys are all covered.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-10-10 12:17:39 +01:00
Michael Brown 081b3eefc4 [ena] Assign memory BAR if left empty by BIOS
Some BIOSes in AWS EC2 (observed with a c6i.metal instance in
eu-west-2) will fail to assign an MMIO address to the ENA device,
which causes ioremap() to fail.

Experiments show that the ENA device is the only device behind its
bridge, even when multiple ENA devices are present, and that the BIOS
does assign a memory window to the bridge.

We may therefore choose to assign the device an MMIO address at the
start of the bridge's memory window.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-09-19 17:49:25 +01:00
Michael Brown 3aa6b79c8d [pci] Add minimal PCI bridge driver
Add a minimal driver for PCI bridges that can be used to locate the
bridge to which a PCI device is attached.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-09-19 17:47:57 +01:00
Michael Brown 649176cd60 [pci] Select PCI I/O API at runtime for cloud images
Pretty much all physical machines and off-the-shelf virtual machines
will provide a functional PCI BIOS.  We therefore default to using
only the PCI BIOS, with no fallback to an alternative mechanism if the
PCI BIOS fails.

AWS EC2 provides the opportunity to experience some exceptions to this
rule.  For example, the t3a.nano instances in eu-west-1 have no
functional PCI BIOS at all.  As of commit 83516ba ("[cloud] Use
PCIAPI_DIRECT for cloud images") we therefore use direct Type 1
configuration space accesses in the images built and published for use
in the cloud.

Recent experience has discovered yet more variation in AWS EC2
instances.  For example, some of the metal instance types have
multiple PCI host bridges and the direct Type 1 accesses therefore
see only a subset of the PCI devices.

Attempt to accommodate future such variations by making the PCI I/O
API selectable at runtime and choosing ECAM (if available), falling
back to the PCI BIOS (if available), then finally falling back to
direct Type 1 accesses.

This is implemented as a dedicated PCIAPI_CLOUD API, rather than by
having the PCI core select a suitable API at runtime (as was done for
timers in commit 302f1ee ("[time] Allow timer to be selected at
runtime").  The common case will remain that only the PCI BIOS API is
required, and we would prefer to retain the optimisations that come
from inlining the configuration space accesses in this common case.
Cloud images are (at present) disk images rather than ROM images, and
so the increased code size required for this design approach in the
PCIAPI_CLOUD case is acceptable.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-09-18 13:41:21 +01:00
Michael Brown 9448ac5445 [bios] Allow pcibios_discover() to return an empty range
Allow pcibios_discover() to return an empty range if the INT 1A,B101
PCI BIOS installation check call fails.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-09-18 13:35:58 +01:00
Michael Brown be667ba948 [pci] Add support for the Enhanced Configuration Access Mechanism (ECAM)
The ACPI MCFG table describes a direct mapping of PCI configuration
space into MMIO space.  This mapping allows access to extended
configuration space (up to 4096 bytes) and also provides for the
existence of multiple host bridges.

Add support for the ECAM mechanism described by the ACPI MCFG table,
as a selectable PCI I/O API alongside the existing PCI BIOS and Type 1
mechanisms.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-09-16 01:05:47 +01:00
Michael Brown ff228f745c [pci] Generalise pci_num_bus() to pci_discover()
Allow pci_find_next() to discover devices beyond the first PCI
segment, by generalising pci_num_bus() (which implicitly assumes that
there is only a single PCI segment) with pci_discover() (which has the
ability to return an arbitrary contiguous chunk of PCI bus:dev.fn
address space).

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-09-15 16:49:47 +01:00
Michael Brown 56b30364c5 [pci] Check for wraparound in callers of pci_find_next()
The semantics of the bus:dev.fn parameter passed to pci_find_next()
are "find the first existent PCI device at this address or higher",
with the caller expected to increment the address between finding
devices.  This does not allow the parameter to distinguish between the
two cases "start from address zero" and "wrapped after incrementing
maximal possible address", which could therefore lead to an infinite
loop in the degenerate case that a device with address ffff:ff:1f.7
really exists.

Fix by checking for wraparound in the caller (which is already
responsible for performing the increment).

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-09-15 15:20:58 +01:00
Michael Brown 8fc3c26eae [pci] Allow pci_find_next() to return non-zero PCI segments
Separate the return status code from the returned PCI bus:dev.fn
address, in order to allow pci_find_next() to be used to find devices
with a non-zero PCI segment number.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-09-15 15:20:58 +01:00
Michael Brown 6459e3b7b1 [linux] Add missing PROVIDE_PCIAPI_INLINE() macros
Ensure type consistency of the PCI I/O API methods by adding the
missing PROVIDE_PCIAPI_INLINE() macros.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-09-15 15:20:58 +01:00
Michael Brown 8f5fc16143 [ipv6] Ignore SLAAC on prefixes with an incompatible prefix length
Experience suggests that routers are often misconfigured to advertise
SLAAC even on prefixes that do not have a SLAAC-compatible prefix
length.  iPXE will currently treat this as an error, resulting in the
prefix being ignored completely.

Handle this misconfiguration by ignoring the autonomous address flag
when the prefix length is unsuitable for SLAAC.

Reported-by: Malte Janduda <mail@janduda.net>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-09-13 13:25:19 +01:00
Michael Brown bc19aeca5f [ipv6] Fix mask calculation when prefix length is not a multiple of 8
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-09-06 13:04:19 +01:00
Michael Brown 131daf1aae [test] Validate constructed IPv6 routing table entries
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-09-06 12:31:32 +01:00
Michael Brown a80124456e [ena] Increase receive ring size to 128 entries
Some versions of the ENA hardware (observed on a c6i.large instance in
eu-west-2) seem to require a receive ring containing at least 128
entries: any smaller ring will never see receive completions or will
stall after the first few completions.

Increase the receive ring size to 128 entries (determined empirically)
for compatibility with these hardware versions.  Limit the receive
ring fill level to 16 (as at present) to avoid consuming more memory
than will typically be available in the internal heap.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-08-26 19:38:27 +01:00
Michael Brown 3b81a4e256 [ena] Provide a host information page
Some versions of the ENA firmware (observed on a c6i.large instance in
eu-west-2) seem to require a host information page, without which the
CREATE_CQ command will fail with ENA_ADMIN_UNKNOWN_ERROR.

These firmware versions also seem to require us to claim that we are a
Linux kernel with a specific driver major version number.  This
appears to be a firmware bug, as revealed by Linux kernel commit
1a63443af ("net/amazon: Ensure that driver version is aligned to the
linux kernel"): this commit changed the value of the driver version
number field to be the Linux kernel version, and was hastily reverted
in commit 92040c6da ("net: ena: fix broken interface between ENA
driver and FW") which clarified that the version number field does
actually have some undocumented significance to some versions of the
firmware.

Fix by providing a host information page via the SET_FEATURE command,
incorporating the apparently necessary lies about our identity.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-08-26 19:38:27 +01:00
Michael Brown 9f81e97af5 [ena] Specify the unused completion queue MSI-X vector as 0xffffffff
Some versions of the ENA firmware (observed on a c6i.large instance in
eu-west-2) will complain if the completion queue's MSI-X vector field
is left empty, even though the queue configuration specifies that
interrupts are not used.

Work around these firmware versions by passing in what appears to be
the magic "no MSI-X vector" value in this field.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-08-26 19:38:27 +01:00
Michael Brown 6d2cead461 [ena] Allow for out-of-order completions
The ENA data path design has separate submission and completion
queues.  Submission queues must be refilled in strict order (since
there is only a single linear tail pointer used to communicate the
existence of new entries to the hardware), and completion queue
entries include a request identifier copied verbatim from the
submission queue entry.  Once the submission queue doorbell has been
rung, software never again reads from the submission queue entry and
nothing ever needs to write back to the submission queue entry since
completions are reported via the separate completion queue.

This design allows the hardware to complete submission queue entries
out of order, provided that it internally caches at least as many
entries as it leaves gaps.

Record and identify I/O buffers by request identifier (using a
circular ring buffer of unique request identifiers), and remove the
assumption that submission queue entries will be completed in order.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-08-26 19:38:25 +01:00
Michael Brown 856ffe000e [ena] Limit submission queue fill level to completion queue size
The CREATE_CQ command is permitted to return a size smaller than
requested, which could leave us in a situation where the completion
queue could overflow.

Avoid overflow by limiting the submission queue fill level to the
actual size of the completion queue.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-08-26 19:37:54 +01:00
Michael Brown c5af41a6f5 [intelxl] Explicitly request a single queue pair for virtual functions
Current versions of the E810 PF driver fail to set the number of
in-use queue pairs in response to the CONFIG_VSI_QUEUES message.  When
the number of in-use queue pairs is less than the number of available
queue pairs, this results in some packets being directed to
nonexistent receive queues and hence silently dropped.

Work around this PF driver bug by explicitly configuring the number of
available queue pairs via the REQUEST_QUEUES message.  This message
triggers a VF reset that, in turn, requires us to reopen the admin
queue and issue an additional GET_RESOURCES message to restore the VF
to a functional state.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-08-16 19:31:06 +01:00
Michael Brown 04879352c4 [intelxl] Allow for admin commands that trigger a VF reset
The RESET_VF admin queue command does not complete via the usual
mechanism, but instead requires us to poll registers to wait for the
reset to take effect and then reopen the admin queue.

Allow for the existence of other admin queue commands that also
trigger a VF reset, by separating out the logic that waits for the
reset to complete.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-08-16 19:29:01 +01:00
Michael Brown 491c075f7f [intelxl] Negotiate virtual function API version 1.1
Negotiate API version 1.1 in order to allow access to virtual function
opcodes that are disallowed by default on the E810.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-08-16 17:58:52 +01:00
Michael Brown b52ea20841 [intelxl] Show virtual function packet statistics for debugging
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-08-16 17:58:46 +01:00
Michael Brown cad1cc6b44 [intelxl] Add driver for Intel 100 Gigabit Ethernet NICs
Add a driver for the E810 family of 100 Gigabit Ethernet NICs.  The
core datapath is identical to that of the 40 Gigabit XL710, and this
part of the code is shared between both drivers.  The admin queue
mechanism is sufficiently similar to make it worth reusing substantial
portions of the code, with separate implementations for several
commands to handle the (unnecessarily) breaking changes in data
structure layouts.  The major differences are in the mechanisms for
programming queue contexts (where the E810 abandons TX/RX symmetry)
and for configuring the transmit scheduler and receive filters: these
portions are sufficiently different to justify a separate driver.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-08-12 16:15:17 +01:00
Michael Brown 6871a7de70 [intelxl] Use admin queue to set port MAC address and maximum frame size
Remove knowledge of the PRTGL_SA[HL] registers, and instead use the
admin queue to set the MAC address and maximum frame size.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-08-12 13:24:06 +01:00
Michael Brown 727b034f11 [intelxl] Use admin queue to get port MAC address
Remove knowledge of the PRTPM_SA[HL] registers, and instead use the
admin queue to retrieve the MAC address.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-08-12 13:03:12 +01:00
Michael Brown 06467ee70f [intelxl] Defer fetching MAC address until after opening admin queue
Allow for the MAC address to be fetched using an admin queue command,
instead of reading the PRTPM_SA[HL] registers directly.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-08-12 13:03:12 +01:00
Michael Brown d6e36a2d73 [intelxl] Set maximum frame size to 9728 bytes as per datasheet
The PRTGL_SAH register contains the current maximum frame size, and is
not guaranteed on reset to contain the actual maximum frame size
supported by the hardware, which the datasheet specifies as 9728 bytes
(including the 4-byte CRC).

Set the maximum packet size to a hardcoded 9728 bytes instead of
reading from the PRTGL_SAH register.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-08-12 13:03:12 +01:00
Michael Brown 99242bbe2e [intelxl] Always issue "clear PXE mode" admin queue command
Remove knowledge of the GLLAN_RCTL_0 register (which changes location
between the XL810 and E810 register maps), and instead unconditionally
issue the "clear PXE mode" command with the EEXIST error silenced.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-08-11 15:28:03 +01:00
Michael Brown faf26bf8b8 [intelxl] Allow expected admin queue command errors to be silenced
The "clear PXE mode" admin queue command will return an EEXIST error
if the device is already in non-PXE mode, but there is no other admin
queue command that can be used to determine whether the device has
already been switched into non-PXE mode.

Provide a mechanism to allow expected errors from a command to be
silenced, to allow the "clear PXE mode" command to be cleanly used
without needing to first check the GLLAN_RCTL_0 register value.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-08-11 15:28:03 +01:00
Michael Brown f0ea19b238 [intelxl] Increase data buffer size to 4kB
At least one E810 admin queue command (Query Default Scheduling Tree
Topology) insists upon being provided with a 4kB data buffer, even
when the data to be returned is much smaller.

Work around this requirement by increasing the admin queue data buffer
size to 4kB.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-08-11 15:24:29 +01:00
Michael Brown fb69d14002 [intelxl] Separate virtual function driver definitions
Move knowledge of the virtual function data structures and admin
command definitions from intelxl.h to intelxlvf.h.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-08-11 14:53:57 +01:00
Michael Brown c220b93f31 [intelxl] Reuse admin command descriptor and buffer for VF responses
Remove the large static admin data buffer structure embedded within
struct intelxl_nic, and instead copy the response received via the
"send to VF" admin queue event to the (already consumed and completed)
admin command descriptor and data buffer.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-08-11 14:53:57 +01:00
Michael Brown 67f8878e10 [intelxl] Handle admin events via a callback
The physical and virtual function drivers each care about precisely
one admin queue event type.  Simplify event handling by using a
per-driver callback instead of the existing weak function symbol.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-08-11 14:53:54 +01:00
Michael Brown 9e46ffa924 [intelxl] Rename 8086:1889 PCI ID to "iavf"
The PCI device ID 8086:1889 is for the Intel Ethernet Adaptive Virtual
Function, which is a generic virtual function that can be exposed by
different generations of Intel hardware.

Rename the PCI ID from "xl710-vf-ad" to "iavf" to reflect that the
driver is not XL710-specific.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-08-10 12:29:47 +01:00
Michael Brown ef70667557 [intelxl] Increase receive descriptor ring size to 64 entries
The E810 requires that receive descriptor rings have at least 64
entries (and are a multiple of 32 entries).

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-08-10 12:29:47 +01:00
Michael Brown 9f5b9e3abb [intelxl] Negotiate API version for virtual function via admin queue
Do not attempt to use the admin commands to get the firmware version
and report the driver version for the virtual function driver, since
these will be rejected by the E810 firmware as invalid commands when
issued by a virtual function.  Instead, use the mailbox interface to
negotiate the API version with the physical function driver.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-08-10 12:29:47 +01:00
Michael Brown b4216fa506 [intelxl] Use non-zero MSI-X vector for virtual function interrupts
The 100 Gigabit physical function driver requires a virtual function
driver to request that transmit and receive queues are mapped to MSI-X
vector 1 or higher.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-08-10 12:29:47 +01:00
Michael Brown 1b61c2118c [intelxl] Fix invocation of intelxlvf_admin_queues()
The second parameter to intelxlvf_admin_queues() is a boolean used to
select the VF opcode, rather than the raw VF opcode itself.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-08-10 12:29:45 +01:00
Michael Brown a202de385d [intelxl] Use function-level reset instead of PFGEN_CTRL.PFSWR
Remove knowledge of the PFGEN_CTRL register (which changes location
between XL710 and E810 register maps), and instead use PCIe FLR to
reset the physical function.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-08-08 16:43:36 +01:00
Michael Brown 0965cec53c [pci] Generalise function-level reset mechanism
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-08-08 16:39:40 +01:00
Michael Brown 9dfcdc04c8 [intelxl] Update list of PCI IDs
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-08-08 15:59:55 +01:00
Michael Brown d8014b1801 [intelxl] Include admin command response data buffer in debug output
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-08-08 15:59:55 +01:00
Michael Brown 319caeaa7b [intelxl] Identify rings consistently in debug messages
Use the tail register offset (which exists for all ring types) as the
ring identifier in all relevant debug messages.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-08-08 15:59:55 +01:00
Michael Brown 814aef68c5 [intelxl] Add missing padding bytes to receive queue context
For the sake of completeness, ensure that all 32 bytes of the receive
queue context are programmed (including the unused final 8 bytes).

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-08-08 15:59:55 +01:00
Michael Brown 725f0370fa [intelxl] Fix bit width of function number in PFFUNC_RID register
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-08-08 15:59:55 +01:00
Michael Brown 5d3fad5c10 [intelxl] Fix retrieval of switch configuration via admin queue
Commit 8f3e648 ("[intelxl] Use one admin queue buffer per admin queue
descriptor") changed the API for intelxl_admin_command() such that the
caller now constructs the command directly within the next available
descriptor ring entry, rather than relying on intelxl_admin_command()
to copy the descriptor to and from the descriptor ring.

This introduced a regression in intelxl_admin_switch(), since the
second and subsequent iterations of the loop will not have constructed
a valid command in the new descriptor ring entry before calling
intelxl_admin_command().

Fix by constructing the command within the loop.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-08-08 15:59:55 +01:00
Michael Brown d3c8944d5c [acpi] Expose system MAC address via ${sysmac} setting
Expose the system MAC address (if any) via the ${sysmac} setting.
This allows scripts to access the system MAC address even when iPXE
has decided not to apply it to a network device (e.g. because the
cached DHCPACK MAC address was selected in order to match the
behaviour of a previous boot stage).

The setting is named ${sysmac} rather than ${acpimac} in order to
allow for forward compatibility with non-ACPI mechanisms that may
exist in future for specifying a system MAC address.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-06-10 13:44:40 +01:00
Michael Brown d72c8fdc90 [cachedhcp] Allow cached DHCPACK to override a temporary MAC address
When running on a system with an ACPI-provided system-specific MAC
address, iPXE will apply this address to an ECM or NCM USB NIC.  If
iPXE has been chainloaded from a previous stage that does not
understand the ACPI MAC mechanism then this can result in iPXE using a
different MAC address than the previous stage, which is surprising to
users.

Attempt to minimise surprise by allowing the MAC address found in a
cached DHCPACK packet to override a temporary MAC address, if the
DHCPACK MAC address matches the network device's permanent MAC
address.  When a previous stage has chosen to use the network device's
permanent MAC address (e.g. because it does not understand the ACPI
MAC mechanism), this will cause iPXE to make the same choice.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-05-23 13:05:24 +01:00
Michael Brown 87f1796f15 [ecm] Treat ACPI MAC address as being a non-permanent MAC address
When applying an ACPI-provided system-specific MAC address, apply it
to netdev->ll_addr rather than netdev->hw_addr.  This allows iPXE
scripts to access the permanent MAC address via the ${netX/hwaddr}
setting (and thereby provides scripts with a mechanism to ascertain
that the NIC is using a MAC address other than its own permanent
hardware address).

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-05-23 12:23:53 +01:00
Michael Brown f58b5109f4 [acpi] Support the "_RTXMAC_" format for ACPI-based MAC addresses
Some newer HP products expose the host-based MAC (HBMAC) address using
an ACPI method named "RTMA" returning a part-binary string of the form
"_RTXMAC_#<mac>#", where "<mac>" comprises the raw MAC address bytes.

Extend the existing support to handle this format alongside the older
"_AUXMAC_" format (which uses a base16-encoded MAC address).

Reported-by: Andreas Hammarskjöld <junior@2PintSoftware.com>
Tested-by: Andreas Hammarskjöld <junior@2PintSoftware.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-03-25 16:47:06 +00:00
Michael Brown 614c3f43a1 [acpi] Add MAC address extraction self-tests
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-03-24 12:58:52 +00:00
Michael Brown 1e1b9593e6 [linux] Add stub phys_to_user() implementation
For symmetry with the stub user_to_phys() implementation, provide
phys_to_user() with the same underlying assumption that virtual
addresses are physical (since there is no way to know the real
physical address when running as a Linux userspace executable).

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-03-24 12:58:52 +00:00
Michael Brown 27825e5557 [acpi] Allow for the possibility of overriding ACPI tables at link time
Allow for linked-in code to override the mechanism used to locate an
ACPI table, thereby opening up the possibility of ACPI self-tests.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-03-24 12:58:52 +00:00
Michael Brown dd35475438 [efi] Support Unicode character output via framebuffer console
Extend the glyph cache to include a number of dynamic entries that are
populated on demand whenever a non-ASCII character needs to be drawn.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-03-15 17:30:52 +00:00
Michael Brown ba93c9134c [fbcon] Support Unicode character output
Accumulate UTF-8 characters in fbcon_putchar(), and require the frame
buffer console's .glyph() method to accept Unicode character values.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-03-15 17:27:18 +00:00
Michael Brown 2ff3385e00 [efi] Support Unicode character output via text console
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-03-15 17:09:58 +00:00
Michael Brown 7e9631b60f [utf8] Add UTF-8 accumulation self-tests
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-03-15 16:25:13 +00:00
Michael Brown 3cd3a73261 [utf8] Add ability to accumulate Unicode characters from UTF-8 bytes
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-03-01 15:57:33 +00:00
Michael Brown 2acdc92994 [dns] Always start DNS queries using the first configured DNS server
We currently define the active DNS server as a global variable.  All
queries will start by attempting to contact the active DNS server, and
the active DNS server will be changed only if we fail to get a
response.  This effectively treats the DNS server list as expressing a
weak preference ordering: we will try servers in order, but once we
have found a working server we will stick with that server for as long
as it continues to respond to queries.

Some sites are misconfigured to hand out DNS servers that do not have
a consistent worldview.  For example: the site may hand out two DNS
server addresses, the first being an internal DNS server (which is
able to resolve names in private DNS domains) and the second being a
public DNS server such as 8.8.8.8 (which will correctly return
NXDOMAIN for any private DNS domains).  This type of configuration is
fundamentally broken and should never be used, since any DNS resolver
performing a query for a name within a private DNS domain may obtain a
spurious NXDOMAIN response for a valid private DNS name.

Work around these broken configurations by treating the DNS server
list as expressing a strong preference ordering, and always starting
DNS queries from the first server in the list (rather than maintaining
a global concept of the active server).  This will have the debatable
benefit of converting permanent spurious NXDOMAIN errors into
transient spurious NXDOMAIN errors, which can at least be worked
around at a higher level (e.g. by retrying a download in a loop within
an iPXE script).

The cost of always starting DNS queries from the first server in the
list is a slight delay introduced when the first server is genuinely
unavailable.  This should be negligible in practice since DNS queries
are relatively infrequent and the failover expiry time is short.

Treating the DNS server list as a preference ordering is permitted by
the language of RFC 2132, which defines DHCP option 6 as a list in
which "[DNS] servers SHOULD be listed in order of preference".  No
specification defines a precise algorithm for how this preference
order should be applied in practice: this new approach seems as good
as any.

Requested-by: Andreas Hammarskjöld <junior@2PintSoftware.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-02-23 23:17:05 +00:00
Michael Brown bc5c612f75 [console] Include mappings for AltGr-Shift-<key>
The BIOS console's interpretation of LShift+RShift as equivalent to
AltGr requires the shifted ASCII characters to be present in the AltGr
mapping table, to allow AltGr-Shift-<key> to be interpreted in the
same way as AltGr-<key>.

For keyboard layouts that have different ASCII characters for
AltGr-<key> and AltGr-Shift-<key>, this will potentially leave the
character for AltGr-<key> inaccessible via the BIOS console if the
BIOS requires the use of the LShift+RShift workaround.  This
theoretically affects the numeric keys in the Lithuanian ("lt")
keyboard layout (where the numerals are accessed via AltGr-<key> and
punctuation characters via AltGr-Shift-<key>), but the simple
workaround for that keyboard layout is to avoid using AltGr and Shift
entirely since the unmodified numeric keys are not remapped anyway.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-02-16 15:31:47 +00:00
Michael Brown 304333dace [console] Support changing keyboard map at runtime
Provide the special keyboard map named "dynamic" which allows the
active keyboard map to be selected at runtime via the ${keymap}
setting, e.g.:

  #define KEYBOARD_MAP dynamic

  iPXE> set keymap uk

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-02-16 14:06:33 +00:00
Michael Brown 674963e2a6 [settings] Always process all settings applicators
Settings applicators are entirely independent, and there is no reason
why a failure in one applicator should prevent other applicators from
being processed.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-02-16 13:50:41 +00:00
Michael Brown 11e17991d0 [console] Ensure that US keyboard map appears at start of linker table
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-02-16 13:50:41 +00:00
Michael Brown 252cff5e9a [xsigo] Avoid storing unused uninitialised fields in gateway address
As reported by Coverity, xsmp_rx_xve_modify() currently passes a
partially initialised struct ib_address_vector to xve_update_tca() and
thence to eoib_set_gateway(), which uses memcpy() to store the whole
structure including the (unused and unneeded) uninitialised fields.

Silence the Coverity warning by zeroing the whole structure.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-02-16 13:29:53 +00:00
Michael Brown 04288974f6 [pci] Ensure that pci_read_config() initialises all fields
As per the general pattern for initialisation functions in iPXE,
pci_init() saves code size by assuming that the caller has already
zeroed the underlying storage (e.g. as part of zeroing a larger
containing structure).  There are several places within the code where
pci_init() is deliberately used to initialise a transient struct
pci_device without zeroing the entire structure, because the calling
code knows that only the PCI bus:dev.fn address is required to be
initialised (e.g. when reading from PCI configuration space).

Ensure that using pci_init() followed by pci_read_config() will fully
initialise the struct pci_device even if the caller did not previously
zero the underlying storage, since Coverity reports that there are
several places in the code that rely upon this.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-02-16 12:55:28 +00:00
Michael Brown 5d22307c41 [image] Do not clear current working URI when executing embedded image
Embedded images do not have an associated URI.  This currently causes
the current working URI (cwuri) to be cleared when starting an
embedded image.

If the current working URI has been set via a ${next-server} setting
from a cached DHCP packet then this will result in unexpected
behaviour.  An attempt by the embedded script to use a relative URI to
download files from the TFTP server will fail with the error:

  Could not start download: Operation not supported (ipxe.org/3c092083)

Rerunning the "dhcp" command will not fix this error, since the TFTP
settings applicator will not see any change to the ${next-server}
setting and so will not reset the current working URI.

Fix by setting the current working URI to the image's URI only if the
image actually has an associated URI.

Debugged-by: Ignat Korchagin <ignat@cloudflare.com>
Originally-fixed-by: Ignat Korchagin <ignat@cloudflare.com>
Tested-by: Ignat Korchagin <ignat@cloudflare.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-02-16 00:21:19 +00:00
Michael Brown 419b2e71da [console] Fix definition of unreachability for remapped keys
The AltGr remapping table is constructed to include only keys that are
not reachable after applying the basic remapping table.  The logic
currently fails to include keys that are omitted entirely from the
basic remapping table since they would map to a non-ASCII character.

Fix this logic by allowing the remapping tables to include null
mappings, which are then elided only at the point of constructing the
C code fragment.

Reported-by: Christian Nilsson <nikize@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-02-15 16:46:58 +00:00
Michael Brown 4a37b05008 [console] Add Swedish "se" keymap
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-02-15 14:12:18 +00:00
Michael Brown 5aee6b81d7 [build] Avoid invoking genkeymap.py via Perl
The build process currently invokes the Python genkeymap.py script via
the Perl executable.  Strangely, this appears to work.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-02-15 13:54:28 +00:00
Michael Brown 510f9de0a2 [console] Ensure that all ASCII characters are reachable in all keymaps
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-02-15 13:38:21 +00:00
Michael Brown 429d4beb89 [console] Remove "az" keymap
The "az" keymap has several unreachable ASCII characters, with no
obvious closest equivalent keys.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-02-15 13:38:04 +00:00
Michael Brown a7a79ab12b [console] Fix unreachable characters in "mt" keymap
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-02-15 13:37:54 +00:00
Michael Brown 164db2cc63 [console] Fix unreachable characters in "il" keymap
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-02-15 13:37:44 +00:00
Michael Brown c7d7819291 [console] Treat dead keys as producing their ASCII equivalents
Treat dead keys in target keymaps as producing the closest equivalent
ASCII character, since many of these characters are otherwise
unrepresented on the keyboard.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-02-15 13:37:41 +00:00
Michael Brown e1cedbc0d4 [console] Support AltGr to access ASCII characters via remapping
Several keyboard layouts define ASCII characters as accessible only
via the AltGr modifier.  Add support for this modifier to ensure that
all ASCII characters are accessible.

Experiments suggest that the BIOS console is likely to fail to
generate ASCII characters when the AltGr key is pressed.  Work around
this limitation by accepting LShift+RShift (which will definitely
produce an ASCII character) as a synonym for AltGr.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-02-15 12:50:26 +00:00
Michael Brown f2a59d5973 [console] Centralise handling of key modifiers
Handle Ctrl and CapsLock key modifiers within key_remap(), to provide
consistent behaviour across different console types.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-02-15 11:58:50 +00:00
Michael Brown 871dd236d4 [console] Allow for named keyboard mappings
Separate the concept of a keyboard mapping from a list of remapped
keys, to allow for the possibility of supporting multiple keyboard
mappings at runtime.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-02-15 11:58:47 +00:00
Michael Brown 1150321595 [tables] Add ability to declare static table start and end markers
The compound statement expression within __table_entries() prevents
the use of top-level declarations such as

  static struct thing *things = table_start ( THINGS );

Define TABLE_START() and TABLE_END() macros that can be used as:

  static TABLE_START ( things_start, THINGS );
  static struct thing *things = things_start;

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-02-14 13:21:09 +00:00
Michael Brown 0bbd896783 [console] Handle remapping of scancode 86
The key with scancode 86 appears in the position between left shift
and Z on a US keyboard, where it typically fails to exist entirely.
Most US keyboard maps define this nonexistent key as generating "\|",
with the notable exception of "loadkeys" which instead reports it as
generating "<>".  Both of these mapping choices duplicate keys that
exist elsewhere in the map, which causes problems for our ASCII-based
remapping mechanism.

Work around these quirks by treating the key as generating "\|" with
the high bit set, and making it subject to remapping.  Where the BIOS
generates "\|" as expected, this allows us to remap to the correct
ASCII value.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-02-10 13:59:32 +00:00
Michael Brown 3f05a82fec [console] Update genkeymap to work with current databases
Rewrite genkeymap.pl in Python with added sanity checks, and update
the list of keyboard mappings to remove those no longer supported by
the underlying "loadkeys" tool.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-02-10 13:59:32 +00:00
Michael Brown 0979b3a11d [efi] Support keyboard remapping via the EFI console
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-02-10 13:11:27 +00:00
Michael Brown eb92ba0a4f [usb] Handle upper/lower case and Ctrl-<key> after applying remapping
Some keyboard layouts (e.g. "fr") swap letter and punctuation keys.
Apply the logic for upper and lower case and for Ctrl-<key> only after
applying remapping, in order to handle these layouts correctly.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-02-10 13:11:27 +00:00
Michael Brown 468980db2b [usb] Support keyboard remapping via the native USB keyboard driver
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-02-10 13:11:27 +00:00
Michael Brown fa708015e5 [console] Avoid attempting to remap numeric keypad on BIOS console
To minimise code size, our keyboard mapping works on the basis of
allowing the BIOS to convert the keyboard scancode into an ASCII
character and then remapping the ASCII character.

This causes problems with keyboard layouts such as "fr" that swap the
shifted and unshifted digit keys, since the ASCII-based remapping will
spuriously remap the numeric keypad (which produces the same ASCII
values as the digit keys).

Fix by checking that the keyboard scancode is within the range of keys
that vary between keyboard mappings before attempting to remap the
ASCII character.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-02-10 13:11:27 +00:00
Michael Brown f51a62bc3f [console] Generalise bios_keymap() as key_remap()
Allow the keyboard remapping functionality to be exposed to consoles
other than the BIOS console.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-02-10 13:11:27 +00:00
Michael Brown 64113751c3 [efi] Enable IMAGE_GZIP by default for AArch64
AArch64 kernels tend to be distributed as gzip compressed images.
Enable IMAGE_GZIP by default for AArch64 to avoid the need for
uncompressed images to be provided.

Originally-implemented-by: Alessandro Di Stefano <aleskandro@redhat.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-02-10 12:47:25 +00:00
Michael Brown bc35b24e3e [prefix] Fix use of writable code segment on 486 and earlier CPUs
In real mode, code segments are always writable.  In protected mode,
code segments can never be writable.  The precise implementation of
this attribute differs between CPU generations, with subtly different
behaviour arising on the transitions from protected mode to real mode.

At the point of transition (when the PE bit is cleared in CR0) the
hidden portion of the %cs descriptor will retain whatever attributes
were in place for the protected-mode code segment, including the fact
that the segment is not writable.  The immediately following code will
perform a far control flow transfer (such as ljmp or lret) in order to
load a real-mode value into %cs.

On the Pentium and later CPUs, the retained protected-mode attributes
will be ignored for any accesses via %cs while the CPU is in real
mode.  A write via %cs will therefore be allowed even though the
hidden portion of the %cs descriptor still describes a non-writable
segment.

On the 486 and earlier CPUs, the retained protected-mode attributes
will not be ignored for accesses via %cs.  A write via %cs will
therefore cause a CPU fault.  To obtain normal real-mode behaviour
(i.e. a writable %cs descriptor), special logic is added to the ljmp
instruction that populates the hidden portion of the %cs descriptor
with real-mode attributes when a far jump is executed in real mode.
The result is that writes via %cs will cause a CPU fault until the
first ljmp instruction is executed, after which writes via %cs will be
allowed as expected in real mode.

The transition code in libprefix.S currently uses lret to load a
real-mode value into %cs after clearing the PE bit.  Experimentation
shows that only the ljmp instruction will work to load real-mode
attributes into the hidden portion of the %cs descriptor: other far
control flow transfers (such as lret, lcall, or int) do not do so.

When running on a 486 or earlier CPU, this results in code within
libprefix.S running with a non-writable code segment after a mode
transition, which in turn results in a CPU fault when real-mode code
in liba20.S attempts to write to %cs:enable_a20_method.

Fix by constructing and executing an ljmp instruction, to trigger the
relevant descriptor population logic on 486 and earlier CPUs.  This
ljmp instruction is constructed on the stack, since the .prefix
section may be executing directly from ROM (or from memory that the
BIOS has write-protected in order to emulate an ISA ROM region) and so
cannot be modified.

Reported-by: Nikolai Zhubr <n-a-zhubr@yandex.ru>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-02-02 13:34:50 +00:00
Michael Brown 6ba671acd9 [efi] Attempt to fetch autoexec script via TFTP
Attempt to fetch the autoexec.ipxe script via TFTP using the PXE base
code protocol installed on the loaded image's device handle, if
present.

This provides a generic alternative to the use of an embedded script
for chainloaded binaries, which is particularly useful in a UEFI
Secure Boot environment since it allows the script to be modified
without the need to sign a new binary.

As a side effect, this also provides a third method for breaking the
PXE chainloading loop (as an alternative to requiring an embedded
script or custom DHCP server configuration).

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-01-18 13:16:12 +00:00
Michael Brown ec746c0001 [efi] Allow for autoexec scripts that are not located in a filesystem
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-01-18 13:16:12 +00:00
Michael Brown e814d33900 [uri] Allow for relative URIs that include colons within the path
RFC3986 allows for colons to appear within the path component of a
relative URI, but iPXE will currently parse such URIs incorrectly by
interpreting the text before the colon as the URI scheme.

Fix by checking for valid characters when identifying the URI scheme.
Deliberately deviate from the RFC3986 definition of valid characters
by accepting "_" (which was incorrectly used in the iPXE-specific
"ib_srp" URI scheme and so must be accepted for compatibility with
existing deployments), and by omitting the code to check for
characters that are not used in any URI scheme supported by iPXE.

Reported-by: Ignat Korchagin <ignat@cloudflare.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-01-13 15:03:22 +00:00
Michael Brown f4f9adf618 [efi] Include Secure Boot Advanced Targeting (SBAT) metadata
SBAT defines an encoding for security generation numbers stored as a
CSV file within a special ".sbat" section in the signed binary.  If a
Secure Boot exploit is discovered then the generation number will be
incremented alongside the corresponding fix.

Platforms may then record the minimum generation number required for
any given product.  This allows for an efficient revocation mechanism
that consumes minimal flash storage space (in contrast to the DBX
mechanism, which allows for only a single-digit number of revocation
events to ever take place across all possible signed binaries).

Add SBAT metadata to iPXE EFI binaries to support this mechanism.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-01-13 14:12:44 +00:00
Michael Brown fbbdc39260 [build] Ensure version.%.o is always rebuilt as expected
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-01-13 13:43:08 +00:00
Michael Brown 53a5de3641 [doc] Update user-visible ipxe.org URIs to use HTTPS
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-01-13 12:48:38 +00:00
Michael Brown 91c77e2592 [efi] Do not align VirtualSize for .reloc and .debug sections
As of commit f1e9e2b ("[efi] Align EFI image sections by page size"),
the VirtualSize fields for the .reloc and .debug sections have been
rounded up to the (4kB) image alignment.  This breaks the PE
relocation logic in the UEFI shim, which requires the VirtualSize
field to exactly match the size as recorded in the data directory.

Fix by setting the VirtualSize field to the unaligned size of the
section, as is already done for normal PE sections (i.e. those other
than .reloc and .debug).

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-01-11 15:27:14 +00:00
Michael Brown f43c2fd697 [settings] Support formatting UUIDs as little-endian GUIDs
The RFC4122 specification defines UUIDs as being in network byte
order, but an unfortunately significant amount of (mostly Microsoft)
software treats them as having the first three fields in little-endian
byte order.

In an ideal world, any server-side software that compares UUIDs for
equality would perform an endian-insensitive comparison (analogous to
comparing strings for equality using a case-insensitive comparison),
and would therefore not care about byte order differences.

Define a setting type name ":guid" to allow a UUID setting to be
formatted in little-endian order, to simplify interoperability with
server-side software that expects such a formatting.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-01-04 14:03:12 +00:00
Michael Brown 9062544f6a [efi] Disable EFI watchdog timer when shutting down to boot an OS
The UEFI specification mandates that the EFI watchdog timer should be
disabled by the platform firmware as part of the ExitBootServices()
call, but some platforms (e.g. Hyper-V) are observed to occasionally
forget to do so, resulting in a reboot approximately five minutes
after starting the operating system.

Work around these firmware bugs by disabling the watchdog timer
ourselves.

Requested-by: Andreas Hammarskjöld <junior@2PintSoftware.com>
Tested-by: Andreas Hammarskjöld <junior@2PintSoftware.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2021-11-25 09:30:59 +00:00
Michael Brown 562c74e1ea [efi] Run ExitBootServices shutdown hook at TPL_NOTIFY
On some systems (observed with the Thunderbolt ports on a ThinkPad X1
Extreme Gen3 and a ThinkPad P53), if the IOMMU is enabled then the
system firmware will install an ExitBootServices notification event
that disables bus mastering on the Thunderbolt xHCI controller and all
PCI bridges, and destroys any extant IOMMU mappings.  This leaves the
xHCI controller unable to perform any DMA operations.

As described in commit 236299b ("[xhci] Avoid DMA during shutdown if
firmware has disabled bus mastering"), any subsequent DMA operation
attempted by the xHCI controller will end up completing after the
operating system kernel has reenabled bus mastering, resulting in a
DMA operation to an area of memory that the hardware is no longer
permitted to access and, on Windows with the Driver Verifier enabled,
a STOP 0xE6 (DRIVER_VERIFIER_DMA_VIOLATION).

That commit avoids triggering any DMA attempts during the shutdown of
the xHCI controller itself.  However, this is not a complete solution
since any attached and opened USB device (e.g. a USB NIC) may
asynchronously trigger DMA attempts that happen to occur after bus
mastering has been disabled but before we reset the xHCI controller.

Avoid this problem by installing our own ExitBootServices notification
event at TPL_NOTIFY, thereby causing it to be invoked before the
firmware's own ExitBootServices notification event that disables bus
mastering.

This unsurprisingly causes the shutdown hook itself to be invoked at
TPL_NOTIFY, which causes a fatal error when later code attempts to
raise the TPL to TPL_CALLBACK (which is a lower TPL).  Work around
this problem by redefining the "internal" iPXE TPL to be variable, and
set this internal TPL to TPL_NOTIFY when the shutdown hook is invoked.

Avoid calling into an underlying SNP protocol instance from within our
shutdown hook at TPL_NOTIFY, since the underlying SNP driver may
attempt to raise the TPL to TPL_CALLBACK (which would cause a fatal
error).  Failing to shut down the underlying SNP device is safe to do
since the underlying device must, in any case, have installed its own
ExitBootServices hook if any shutdown actions are required.

Reported-by: Andreas Hammarskjöld <junior@2PintSoftware.com>
Tested-by: Andreas Hammarskjöld <junior@2PintSoftware.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2021-11-23 15:55:01 +00:00
Michael Brown 0f4cc4b5a7 [build] Include EFI system partition table entry in isohybrid images
Add the "--uefi" option when invoking isohybrid on an EFI-bootable
image, to create a partition mapping to the EFI system partition
embedded within the ISO image.

This allows the resulting isohybrid image to be booted on UEFI systems
that will not recognise an El Torito boot catalog on a non-CDROM
device.

Originally-fixed-by: Christian Hesse <mail@eworm.de>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2021-11-23 15:26:55 +00:00
Michael Brown a046da21a4 [efi] Raise TPL during driver unload entry point
The efi_unload() function is currently missing the calls to raise and
restore the TPL.  This has the side effect of causing iPXE to return
from the driver unload entry point at TPL_CALLBACK, which will cause
unexpected behaviour (typically a system lockup) shortly afterwards.

Fix by adding the missing calls to raise and restore the TPL.

Debugged-by: Petr Borsodi <petr.borsodi@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2021-11-22 12:50:38 +00:00
Benedikt Braunger 3ad27fbe78 [intel] Add PCI ID for Intel X553 0x15e4
Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2021-11-22 12:42:18 +00:00
Michael Brown b6045a8cbb [efi] Modify global system table when wrapping a loaded image
The EFI loaded image protocol allows an image to be provided with a
custom system table, and we currently use this mechanism to wrap any
boot services calls made by the loaded image in order to provide
strace-like debugging via DEBUG=efi_wrap.

The ExitBootServices() call will modify the global system table,
leaving the loaded image using a system table that is no longer
current.  When DEBUG=efi_wrap is used, this generally results in the
machine locking up at the point that the loaded operating system calls
ExitBootServices().

Fix by modifying the global EFI system table to point to our wrapper
functions, instead of providing a custom system table via the loaded
image protocol.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2021-11-21 13:34:10 +00:00
Michael Brown 51612b6e69 [efi] Do not attempt to use console output after ExitBootServices()
A successful call to ExitBootServices() will result in the EFI console
becoming unusable.  Ensure that the EFI wrapper produces a complete
line of debug output before calling the wrapped ExitBootServices()
method, and attempt subsequent debug output only if the call fails.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2021-11-21 13:24:24 +00:00
Michael Brown 236299baa3 [xhci] Avoid DMA during shutdown if firmware has disabled bus mastering
On some systems (observed with the Thunderbolt ports on a ThinkPad X1
Extreme Gen3 and a ThinkPad P53), the system firmware will disable bus
mastering on the xHCI controller and all PCI bridges at the point that
ExitBootServices() is called if the IOMMU is enabled.  This leaves the
xHCI controller unable to shut down cleanly since all commands will
fail with a timeout.

Commit 85eb961 ("[xhci] Allow for permanent failure of the command
mechanism") allows us to detect that this has happened and respond
cleanly.  However, some unidentified hardware component (either the
xHCI controller or one of the PCI bridges) seems to manage to enqueue
the attempted DMA operation and eventually complete it after the
operating system kernel has reenabled bus mastering.  This results in
a DMA operation to an area of memory that the hardware is no longer
permitted to access.  On Windows with the Driver Verifier enabled,
this will result in a STOP 0xE6 (DRIVER_VERIFIER_DMA_VIOLATION).

Work around this problem by detecting when bus mastering has been
disabled, and immediately failing the device to avoid initiating any
further DMA attempts.

Reported-by: Andreas Hammarskjöld <junior@2PintSoftware.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2021-11-12 22:27:25 +00:00
Michael Brown 1844aacc83 [uri] Retain original encodings for path, query, and fragment fields
iPXE decodes any percent-encoded characters during the URI parsing
stage, thereby allowing protocol implementations to consume the raw
field values directly without further decoding.

When reconstructing a URI string for use in an HTTP request line, the
percent-encoding is currently reapplied in a reversible way: we
guarantee that our reconstructed URI string could be decoded to give
the same raw field values.

This technically violates RFC3986, which states that "URIs that differ
in the replacement of a reserved character with its corresponding
percent-encoded octet are not equivalent".  Experiments show that
several HTTP server applications will attach meaning to the choice of
whether or not a particular character was percent-encoded, even when
the percent-encoding is unnecessary from the perspective of parsing
the URI into its component fields.

Fix by storing the originally encoded substrings for the path, query,
and fragment fields and using these original encoded versions when
reconstructing a URI string.  The path field is also stored as a
decoded string, for use by protocols such as TFTP that communicate
using raw strings rather than URI-encoded strings.  All other fields
(such as the username and password) continue to be stored only in
their decoded versions since nothing ever needs to know the originally
encoded versions of these fields.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2021-11-12 09:58:29 +00:00
Michael Brown 85eb961bf9 [xhci] Allow for permanent failure of the command mechanism
Some xHCI controllers (observed with the Thunderbolt ports on a
ThinkPad X1 Extreme Gen3 and a ThinkPad P53) seem to suffer a
catastrophic failure at the point that ExitBootServices() is called if
the IOMMU is enabled.  The symptoms appear to be consistent with
another UEFI driver (e.g. the IOMMU driver, or the Thunderbolt driver)
having torn down the DMA mappings, leaving the xHCI controller unable
to write to host memory.  The observable effect is that all commands
fail with a timeout, and attempts to abort command execution similarly
fail since the xHCI controller is unable to report the abort
completion.

Check for failure to abort a command, and respond by performing a full
device reset (as recommended by the xHCI specification) and by marking
the device as permanently failed.

Reported-by: Andreas Hammarskjöld <junior@2PintSoftware.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2021-10-28 23:18:07 +01:00
Aaron Young f24a2794e1 [virtio] Update driver to use DMA API
Signed-off-by: Aaron Young <aaron.young@oracle.com>
2021-10-28 13:19:30 +01:00
Michael Brown 2265a65191 [readline] Extend maximum read line length to 1024 characters
Realistic Linux kernel command lines may exceed our current 256
character limit for interactively edited commands or settings.

Switch from stack allocation to heap allocation, and increase the
limit to 1024 characters.

Requested-by: Matteo Guglielmi <Matteo.Guglielmi@dalco.ch>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2021-09-10 15:51:14 +01:00
Michael Brown 05a76acc6d [ecm] Use ACPI-provided system-specific MAC address if present
Use the "system MAC address" provided within the DSDT/SSDT if such an
address is available and has not already been assigned to a network
device.

Tested-by: Andreas Hammarskjöld <junior@2PintSoftware.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2021-09-09 12:56:02 +01:00
Michael Brown 91e147213c [ecm] Expose USB vendor/device information to ecm_fetch_mac()
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2021-09-09 12:52:12 +01:00
Michael Brown 0cc4c42f0a [acpi] Allow for extraction of a MAC address from the DSDT/SSDT
Some vendors provide a "system MAC address" within the DSDT/SSDT, to
be used to override the MAC address for a USB docking station.

A full implementation would require an ACPI bytecode interpreter,
since at least one OEM allows the MAC address to be constructed by
executable ACPI bytecode (rather than a fixed data structure).

We instead attempt to extract a plausible-looking "_AUXMAC_#.....#"
string that appears shortly after an "AMAC" or "MACA" signature.  This
should work for most implementations encountered in practice.

Debugged-by: Andreas Hammarskjöld <junior@2PintSoftware.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2021-09-09 12:18:00 +01:00
Michael Brown 02ec659b73 [acpi] Generalise DSDT/SSDT data extraction logic
Allow for the DSDT/SSDT signature-scanning and value extraction code
to be reused for extracting a pass-through MAC address.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2021-09-08 14:46:30 +01:00
Michael Brown e09e1142a3 [efi] Record cached ProxyDHCPOFFER and PXEBSACK, if present
Commit cd3de55 ("[efi] Record cached DHCPACK from loaded image's
device handle, if present") added the ability for a chainloaded UEFI
iPXE to reuse an IPv4 address and DHCP options previously obtained by
a built-in PXE stack, without needing to perform a second DHCP
request.

Extend this to also record the cached ProxyDHCPOFFER and PXEBSACK
obtained from the EFI_PXE_BASE_CODE_PROTOCOL instance installed on the
loaded image's device handle, if present.

This allows a chainloaded UEFI iPXE to reuse a boot filename or other
options that were provided via a ProxyDHCP or PXE boot server
mechanism, rather than by standard DHCP.

Tested-by: Andreas Hammarskjöld <junior@2PintSoftware.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2021-07-27 13:50:36 +01:00
Michael Brown db6310c3e5 [efi] Use zero for PCI vendor/device IDs when no applicable ID exists
When building an EFI ROM image for which no PCI vendor/device ID is
applicable (e.g. bin-x86_64-efi/ipxe.efirom), the build process will
currently construct a command such as

  ./util/efirom -v -d -c bin-x86_64-efi/ipxe.efidrv \
                         bin-x86_64-efi/ipxe.efirom

which gets interpreted as a vendor ID of "-0xd" (i.e. 0xfff3, after
truncation to 16 bits).

Fix by using an explicit zero ID when no applicable ID exists, as is
already done when constructing BIOS ROM images.

Reported-by: Konstantin Aladyshev <aladyshev22@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2021-07-26 15:47:47 +01:00
JuniorJPDJ b33cc1efe3 [build] Fix genfsimg to work with FATDIR with space
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2021-07-26 15:34:33 +01:00
Michael Brown 4d180be517 [cloud] Retry DHCP aggressively in AWS EC2
The DHCP service in EC2 has been observed to occasionally stop
responding for bursts of several seconds.  This can easily result in a
failed boot, since the current cloud boot script will attempt DHCP
only once.

Work around this problem by retrying DHCP in a fairly tight cycle
within the cloud boot script, and falling back to a reboot after
several failed DHCP attempts.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2021-07-20 13:19:15 +01:00
Michael Brown c64dfff0a9 [efi] Match signtool expectations for file alignment
As of commit f1e9e2b ("[efi] Align EFI image sections by page size"),
our SectionAlignment has been increased to 4kB in order to allow for
page-level memory protection to be applied by the UEFI firmware, with
FileAlignment left at 32 bytes.

The PE specification states that the value for FileAlignment "should
be a power of 2 between 512 and 64k, inclusive", and that "if the
SectionAlignment is less than the architecture's page size, then
FileAlignment must match SectionAlignment".

Testing shows that signtool.exe will reject binaries where
FileAlignment is less than 512, unless FileAlignment is equal to
SectionAlignment.  This indicates a somewhat zealous interpretation of
the word "should" in the PE specification.

Work around this interpretation by increasing FileAlignment from 32
bytes to 512 bytes, and add explanatory comments for both
FileAlignment and SectionAlignment.

Debugged-by: Andreas Hammarskjöld <junior@2PintSoftware.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2021-07-15 15:45:24 +01:00
Michael Brown 8d08300ad9 [libc] Allow for externally-defined LITTLE_ENDIAN and BIG_ENDIAN constants
When building the Linux userspace binaries, the external system
headers may have already defined values for the __LITTLE_ENDIAN and
__BIG_ENDIAN constants.

Fix by retaining the existing values if already defined, since the
actual values of these constants do not matter.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2021-07-15 14:16:17 +01:00
Michael Brown 2690f73096 [uri] Make URI schemes case-insensitive
RFC 3986 section 3.1 defines URI schemes as case-insensitive (though
the canonical form is always lowercase).

Use strcasecmp() rather than strcmp() to allow for case insensitivity
in URI schemes.

Requested-by: Andreas Hammarskjöld <junior@2PintSoftware.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2021-07-01 16:32:46 +01:00
Michael Brown 4aa0375821 [rdc] Add driver for RDC R6040 embedded NIC
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2021-06-28 12:32:19 +01:00
Michael Brown 5622575c5e [realtek] Work around hardware bug on RTL8211B
The RTL8211B seems to have a bug that prevents the link from coming up
unless the MII_MMD_DATA register is cleared.

The Linux kernel driver applies this workaround (in rtl8211b_resume())
only to the specific RTL8211B PHY model, along with a matching
workaround to set bit 9 of MII_MMD_DATA when suspending the PHY.
Since we have no need to ever suspend the PHY, and since writing a
zero ought to be harmless, we just clear the register unconditionally.

Debugged-by: Nikolay Pertsev <nikolay.p@cos.flag.org>
Tested-by: Nikolay Pertsev <nikolay.p@cos.flag.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2021-06-24 12:36:46 +01:00
Michael Brown 0688114ea6 [cloud] Show ifstat output after a failed boot attempt
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2021-06-23 10:22:38 +01:00
Michael Brown 9b6ad2d888 [peerdist] Assume that most recently discovered peer can be reused
The peer discovery time has a significant impact on the overall
PeerDist download speed, since each block requires an individual
discovery attempt.  In most cases, a peer that responds for block N
will turn out to also respond for block N+1.

Assume that the most recently discovered peer (for any block) probably
has a copy of the next block to be discovered, thereby allowing the
peer download attempt to begin immediately.

In the case that this assumption is incorrect, the existing error
recovery path will allow for fallback to newly discovered peers (or to
the origin server).

Suggested-by: Andreas Hammarskjöld <junior@2PintSoftware.com>
Tested-by: Andreas Hammarskjöld <junior@2PintSoftware.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2021-06-22 09:45:21 +01:00
Michael Brown 51c88a4a62 [build] Fix building on broken versions of GNU binutils
Some versions of GNU objcopy (observed with binutils 2.23.52.0.1 on
CentOS 7.0.1406) document the -D/--enable-deterministic-archives
option but fail to recognise the short form of the option.

Work around this problem by using the long form of the option.

Reported-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2021-06-17 14:37:39 +01:00
Bernhard M. Wiedemann bf4ccd4265 [build] Ensure build ID is deterministic
Commit 040cdd0 ("[linux] Add a prefix to all symbols to avoid future
name collisions") unintentionally reintroduced an element of
non-determinism into the build ID, by omitting the -D option when
manipulating the blib.a archive.

Fix by adding the -D option to restore determinism.

Reworded-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2021-06-14 14:36:51 +01:00
Michael Brown 3c040ad387 [efi] Veto the Itautec Ip4ConfigDxe driver
The Ip4ConfigDxe driver bug that was observed on Dell systems in
commit 64b4452 ("[efi] Blacklist the Dell Ip4ConfigDxe driver") has
also been observed on systems with a manufacturer name of "Itautec
S.A.".  The symptoms of the bug are identical: an attempt to call
DisconnectController() on the LOM device handle will lock up the
system.

Fix by extending the veto to cover the Ip4ConfigDxe driver for this
manufacturer.

Debugged-by: Celso Viana <celso.vianna@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2021-06-11 15:14:21 +01:00
Michael Brown 3dd1989ac0 [libc] Match standard prototype for putchar()
Reported-by: Bernhard M. Wiedemann <bwiedemann@suse.de>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2021-06-07 13:26:01 +01:00
Michael Brown 52300ccf98 [base64] Include terminating NUL within base64 character array
Reported-by: Bernhard M. Wiedemann <bwiedemann@suse.de>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2021-06-07 13:20:02 +01:00
Michael Brown 92807f5759 [rndis] Fix size of reserved fields
Most RNDIS data structures include a trailing 4-byte reserved field.
For the REMOTE_NDIS_PACKET_MSG and REMOTE_NDIS_INITIALIZE_CMPLT
structures, this is an 8-byte field instead.

iPXE currently uses incorrect structure definitions with a 4-byte
reserved field in all data structures, resulting in data payloads that
overlap the last 4 bytes of the 8-byte reserved field.

RNDIS uses explicit offsets to locate any data payloads beyond the
message header, and so liberal RNDIS parsers (such as those used in
Hyper-V and in the Linux USB Ethernet gadget driver) are still able to
parse the malformed structures.

A stricter RNDIS parser (such as that found in some older Android
builds that seem to use an out-of-tree USB Ethernet gadget driver) may
reject the malformed structures since the data payload offset is less
than the header length, causing iPXE to be unable to transmit packets.

Fix by correcting the length of the reserved fields.

Debugged-by: Martin Nield <pmn1492@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2021-06-07 12:01:10 +01:00
Michael Brown 065dce8d59 [ath5k] Avoid returning uninitialised data on EEPROM read errors
Originally-implemented-by: Bernhard M. Wiedemann <bwiedemann@suse.de>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2021-06-04 14:16:44 +01:00
Michael Brown f3f568e382 [crypto] Add memory output constraints for big-integer inline assembly
The ARM versions of the big-integer inline assembly functions include
constraints to indicate that the output value is modified by the
assembly code.  These constraints are not present in the equivalent
code for the x86 versions.

As of GCC 11, this results in the compiler reporting that the output
values may be uninitialized.

Fix by including the relevant memory output constraints.

Reported-by: Christian Hesse <mail@eworm.de>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2021-06-03 13:34:14 +01:00
Michael Brown 74c54461cb [build] Use SOURCE_DATE_EPOCH for isohybrid MBR ID if it exists
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2021-05-24 15:38:54 +01:00
Michael Brown 0d68d71519 [build] Use SOURCE_DATE_EPOCH for .iso timestamps if it exists
Originally-implemented-by: Bernhard M. Wiedemann <bwiedemann@suse.de>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2021-05-24 15:30:08 +01:00
Michael Brown e5f0255173 [efi] Provide an "initrd.magic" file for use by UEFI kernels
Provide a file "initrd.magic" via the EFI_SIMPLE_FILE_SYSTEM_PROTOCOL
that contains the initrd file as constructed for BIOS bzImage kernels
(including injected files with CPIO headers constructed by iPXE).

This allows BIOS and UEFI kernels to obtain the exact same initramfs
image, by adding "initrd=initrd.magic" to the kernel command line.
For example:

  #!ipxe
  kernel boot/vmlinuz initrd=initrd.magic
  initrd boot/initrd.img
  initrd boot/modules/e1000.ko      /lib/modules/e1000.ko
  initrd boot/modules/af_packet.ko  /lib/modules/af_packet.ko
  boot

Do not include the "initrd.magic" file within the root directory
listing, since doing so would break software such as wimboot that
processes all files within the root directory.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2021-05-21 20:18:50 +01:00
Michael Brown ef9953b712 [efi] Allow for non-image-backed virtual files
Restructure the EFI_SIMPLE_FILE_SYSTEM_PROTOCOL implementation to
allow for the existence of virtual files that are not simply backed by
a single underlying image.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2021-05-21 16:32:36 +01:00
Michael Brown bfca3db41e [cpio] Split out bzImage initrd CPIO header construction
iPXE will construct CPIO headers for images that have a non-empty
command line, thereby allowing raw images (without CPIO headers) to be
injected into a dynamically constructed initrd.  This feature is
currently implemented within the BIOS-only bzImage format support.

Split out the CPIO header construction logic to allow for reuse in
other contexts such as in a UEFI build.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2021-05-21 15:19:38 +01:00
Michael Brown fc8bd4ba1a [x509] Use case-insensitive comparison for certificate names
DNS names are case-insensitive, and RFC 5280 (unlike RFC 3280)
mandates support for case-insensitive name comparison in X.509
certificates.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2021-05-18 11:46:28 +01:00
Michael Brown 661093054b [libc] Add strncasecmp()
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2021-05-18 11:45:24 +01:00
Joseph 059c4dc688 [bnxt] Use hexadecimal values in PCI_ROM entries
Use hexadecimal values instead of macros in PCI_ROM entries so Perl
script can parse them correctly.  Move PCI_ROM entries from header
file to C file.  Integrate bnxt_vf_nics array into PCI_ROM entries by
introducing BNXT_FLAG_PCI_VF flag into driver_data field.  Add
whitespaces in PCI_ROM entries for style consistency.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2021-05-17 22:35:53 +01:00
Christian Nilsson adb2ed907e [intel] Add PCI ID for I219-V and -LM 10 to 15
Signed-off-by: Christian Nilsson <nikize@gmail.com>
2021-05-17 22:29:07 +01:00
Michael Brown d7bc9e9d67 [image] Support archive image formats independently of "imgextract" command
Support for the zlib and gzip archive image formats is currently
included only if the IMAGE_ARCHIVE_CMD is used to enable the
"imgextract" command.

The ability to transparently execute a single-member archive image
without using the "imgextract" command renders this unintuitive: a
user wanting to gain the ability to boot a gzip-compressed kernel
image would expect to have to enable IMAGE_GZIP rather than
IMAGE_ARCHIVE_CMD.

Reverse the inclusion logic, so that archive image formats must now be
enabled explicitly (via IMAGE_GZIP and/or IMAGE_ZLIB), with the
archive image management commands dragged in as needed if any archive
image formats are enabled.  The archive image management commands may
be explicitly disabled via IMAGE_ARCHIVE_CMD if necessary.

This matches the behaviour of IBMGMT_CMD and similar options, where
the relevant commands are included only when something else already
drags in the underlying feature.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2021-05-12 14:50:34 +01:00
Michael Brown 62f732207e [image] Propagate trust flag to extracted archive images
An extracted image is wholly derived from the original archive image.
If the original archive image has been verified and marked as trusted,
then this trust logically extends to any image extracted from it.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2021-05-12 14:14:52 +01:00
Michael Brown 191f8825cb [image] Allow single-member archive images to be executed transparently
Provide image_extract_exec() as a helper method to allow single-member
archive images (such as gzip compressed images) to be executed without
an explicit "imgextract" step.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2021-05-12 13:57:35 +01:00
Michael Brown a6a8bb1a9a [undi] Read TSC only when profiling
Avoid using the "rdtsc" instruction unless profiling is enabled.  This
allows the non-debug build of the UNDI driver to be used on a CPU such
as a 486 that does not support the TSC.

Reported-by: Nikolai Zhubr <n-a-zhubr@yandex.ru>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2021-05-12 11:03:01 +01:00
Michael Brown 05fcf1a2f0 [rng] Check for TSC support before using RTC entropy source
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2021-05-12 10:24:00 +01:00
Michael Brown 13c1abe10a [prefix] Specify i486 architecture for LZMA decompressor
The decompressor uses the i486 "bswap" instruction, but does not
require any instructions that exist only on i586 or above.  Update the
".arch" directive to reflect the requirements of the code as
implemented.

Reported-by: Martin Habets <habetsm.xilinx@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2021-05-12 10:09:33 +01:00
Michael Brown 866fa1ce76 [gzip] Add support for gzip archive images
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2021-05-08 15:34:19 +01:00
Michael Brown d093683d93 [zlib] Add support for zlib archive images
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2021-05-08 15:34:19 +01:00