Commit Graph

6521 Commits (9fb28080d97fac1061660befacfad8caaa2bc522)
 

Author SHA1 Message Date
Michael Brown 54d83e92f0 [tls] Add support for AEAD ciphers
Allow for AEAD cipher suites where the MAC length may be zero and the
authentication is instead provided by an authenticating cipher, with
the plaintext authentication tag appended to the ciphertext.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-11-08 15:14:19 +00:00
Michael Brown 186306d619 [tls] Treat invalid block padding as zero length padding
Harden against padding oracle attacks by treating invalid block
padding as zero length padding, thereby deferring the failure until
after computing the (incorrect) MAC.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-11-08 15:14:06 +00:00
Michael Brown 634a86093a [tls] Allow for arbitrary-length initialisation vectors
Restructure the encryption and decryption operations to allow for the
use of ciphers where the initialisation vector is constructed by
concatenating the fixed IV (derived as part of key expansion) with a
record IV (prepended to the ciphertext).

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-11-08 15:14:04 +00:00
Michael Brown c453b4c284 [tls] Add MAC length as a cipher suite parameter
TLS stream and block ciphers use a MAC with a length equal to the
output length of the digest algorithm in use.  For AEAD ciphers there
is no MAC, with the equivalent functionality provided by the cipher
algorithm's authentication tag.

Allow for the existence of AEAD cipher suites by making the MAC length
a parameter of the cipher suite.

Assume that the MAC key length is equal to the MAC length, since this
is true for all currently supported cipher suites.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-11-08 14:09:18 +00:00
Michael Brown b6eef14858 [tls] Abstract out concept of a TLS authentication header
All TLS cipher types use a common structure for the per-record data
that is authenticated in addition to the plaintext itself.  This data
is used as a prefix in the HMAC calculation for stream and block
ciphers, or as additional authenticated data for AEAD ciphers.

Define a "TLS authentication header" structure to hold this data as a
contiguous block, in order to meet the alignment requirement for AEAD
ciphers such as GCM.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-11-08 13:48:45 +00:00
Michael Brown 6a360ebfde [tls] Ensure cipher alignment size is respected
Adjust the length of the first received ciphertext data buffer to
ensure that all decryption operations respect the cipher's alignment
size.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-11-07 11:19:49 +00:00
Michael Brown 30243ad739 [crypto] Add concept of cipher alignment size
The GCM cipher mode of operation (in common with other counter-based
modes of operation) has a notion of blocksize that does not neatly
fall into our current abstraction: it does operate in 16-byte blocks
but allows for an arbitrary overall data length (i.e. the final block
may be incomplete).

Model this by adding a concept of alignment size.  Each call to
encrypt() or decrypt() must begin at a multiple of the alignment size
from the start of the data stream.  This allows us to model GCM by
using a block size of 1 byte and an alignment size of 16 bytes.

As a side benefit, this same concept allows us to neatly model the
fact that raw AES can encrypt only a single 16-byte block, by
specifying an alignment size of zero on this cipher.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-11-07 11:19:48 +00:00
Michael Brown d1bc872a2e [tls] Formalise notions of fixed and record initialisation vectors
TLS block ciphers always use CBC (as per RFC 5246 section 6.2.3.2)
with a record initialisation vector length that is equal to the cipher
block size, and no fixed initialisation vector.

The initialisation vector for AEAD ciphers such as GCM is less
straightforward, and requires both a fixed and per-record component.

Extend the definition of a cipher suite to include fixed and record
initialisation vector lengths, and generate the fixed portion (if any)
as part of key expansion.

Do not add explicit calls to cipher_setiv() in tls_assemble_block()
and tls_split_block(), since the constraints imposed by RFC 5246 are
specifically chosen to allow implementations to avoid doing so.
(Instead, add a sanity check that the record initialisation vector
length is equal to the cipher block size.)

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-11-07 11:19:48 +00:00
Michael Brown f8565a655e [tls] Remove support for TLSv1.0
The TLSv1.0 protocol was deprecated by RFC 8996 (along with TLSv1.1),
and has been disabled by default in iPXE since commit dc785b0fb
("[tls] Default to supporting only TLSv1.1 or above") in June 2020.

While there is value in continuing to support older protocols for
interoperability with older server appliances, the additional
complexity of supporting the implicit initialisation vector for
TLSv1.0 is not worth the cost.

Remove support for the obsolete TLSv1.0 protocol, to reduce complexity
of the implementation and simplify ongoing maintenance.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-11-07 11:19:48 +00:00
Michael Brown 7b60a48752 [efi] Clear DMA-coherent buffers before mapping
The DMA mapping is performed implicitly as part of the call to
dma_alloc().  The current implementation creates the IOMMU mapping for
the allocated and potentially uninitialised data before returning to
the caller (which will immediately zero out or otherwise initialise
the buffer).  This leaves a small window within which a malicious PCI
device could potentially attempt to retrieve firmware-owned secrets
present in the uninitialised buffer.  (Note that the hypothetically
malicious PCI device has no viable way to know the address of the
buffer from which to attempt a DMA read, rendering the attack
extremely implausible.)

Guard against any such hypothetical attacks by zeroing out the
allocated buffer prior to creating the coherent DMA mapping.

Suggested-by: Mateusz Siwiec <Mateusz.Siwiec@ioactive.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-11-04 20:28:09 +00:00
Michael Brown f48b01cb01 [bzimage] Fix parsing of "vga=..." when not at end of command line
bzimage_parse_cmdline() uses strcmp() to identify the named "vga=..."
kernel command line option values, which will give a false negative if
the option is not last on the command line.

Fix by temporarily changing the relevant command line separator (if
any) to a NUL terminator.

Debugged-by: Simon Rettberg <simon.rettberg@rz.uni-freiburg.de>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-10-27 13:05:35 +01:00
Michael Brown 8fce26730c [crypto] Add block cipher Galois/Counter mode of operation
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-10-25 13:21:30 +01:00
Michael Brown da81214cec [crypto] Add concept of authentication tag to cipher algorithms
Some ciphers (such as GCM) support the concept of a tag that can be
used to authenticate the encrypted data.  Add a cipher method for
generating an authentication tag.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-10-25 13:21:30 +01:00
Michael Brown 0c383bf00a [crypto] Add concept of additional data to cipher algorithms
Some ciphers (such as GCM) support the concept of additional
authenticated data, which does not appear in the ciphertext but may
affect the operation of the cipher.

Allow cipher_encrypt() and cipher_decrypt() to be called with a NULL
destination buffer in order to pass additional data.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-10-25 13:21:30 +01:00
Michael Brown 8e478e648f [crypto] Allow initialisation vector length to vary from cipher blocksize
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-10-25 13:21:28 +01:00
Michael Brown 52f72d298a [crypto] Expose null crypto algorithm methods for reuse
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-10-25 13:20:22 +01:00
Michael Brown 2c78242732 [tls] Add support for DHE variants of the existing cipher suites
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-10-11 15:42:13 +01:00
Michael Brown 6b2c94d3a7 [tls] Add support for Ephemeral Diffie-Hellman key exchange
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-10-11 15:42:11 +01:00
Michael Brown ea33ea33c0 [tls] Add key exchange mechanism to definition of cipher suite
Allow for the key exchange mechanism to vary depending upon the
selected cipher suite.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-10-11 14:37:12 +01:00
Michael Brown 80c45c5c71 [tls] Record ServerKeyExchange record, if provided
Accept and record the ServerKeyExchange record, which is required for
key exchange mechanisms such as Ephemeral Diffie-Hellman (DHE).

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-10-11 14:37:12 +01:00
Michael Brown 028aac99a3 [tls] Generate pre-master secret at point of sending ClientKeyExchange
The pre-master secret is currently constructed at the time of
instantiating the TLS connection.  This precludes the use of key
exchange mechanisms such as Ephemeral Diffie-Hellman (DHE), which
require a ServerKeyExchange message to exchange additional key
material before the pre-master secret can be constructed.

Allow for the use of such cipher suites by deferring generation of the
master secret until the point of sending the ClientKeyExchange
message.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-10-11 14:37:12 +01:00
Michael Brown 1a7317e7d4 [tls] Generate master secret at point of sending ClientKeyExchange
The master secret is currently constructed upon receiving the
ServerHello message.  This precludes the use of key exchange
mechanisms such as Ephemeral Diffie-Hellman (DHE), which require a
ServerKeyExchange message to exchange additional key material before
the pre-master secret and master secret can be constructed.

Allow for the use of such cipher suites by deferring generation of the
master secret until the point of sending the ClientKeyExchange
message.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-10-11 14:37:12 +01:00
Michael Brown 18b861024a [crypto] Add Ephemeral Diffie-Hellman key exchange algorithm
Add an implementation of the Ephemeral Diffie-Hellman key exchange
algorithm as defined in RFC2631, with test vectors taken from the NIST
Cryptographic Toolkit.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-10-11 14:33:19 +01:00
Michael Brown 007d3cb800 [crypto] Simplify internal HMAC API
Simplify the internal HMAC API so that the key is provided only at the
point of calling hmac_init(), and the (potentially reduced) key is
stored as part of the context for later use by hmac_final().

This simplifies the calling code, and avoids the need for callers such
as TLS to allocate a potentially variable length block in order to
retain a copy of the unmodified key.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-10-10 12:21:54 +01:00
Michael Brown 88419b608d [test] Add HMAC self-tests
The HMAC code is already tested indirectly via several consuming
algorithms that themselves provide self-tests (e.g. HMAC-DRBG, NTLM
authentication, and PeerDist content identification), but lacks any
direct test vectors.

Add explicit HMAC tests and ensure that corner cases such as empty
keys, block-length keys, and over-length keys are all covered.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-10-10 12:17:39 +01:00
Michael Brown 081b3eefc4 [ena] Assign memory BAR if left empty by BIOS
Some BIOSes in AWS EC2 (observed with a c6i.metal instance in
eu-west-2) will fail to assign an MMIO address to the ENA device,
which causes ioremap() to fail.

Experiments show that the ENA device is the only device behind its
bridge, even when multiple ENA devices are present, and that the BIOS
does assign a memory window to the bridge.

We may therefore choose to assign the device an MMIO address at the
start of the bridge's memory window.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-09-19 17:49:25 +01:00
Michael Brown 3aa6b79c8d [pci] Add minimal PCI bridge driver
Add a minimal driver for PCI bridges that can be used to locate the
bridge to which a PCI device is attached.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-09-19 17:47:57 +01:00
Michael Brown 649176cd60 [pci] Select PCI I/O API at runtime for cloud images
Pretty much all physical machines and off-the-shelf virtual machines
will provide a functional PCI BIOS.  We therefore default to using
only the PCI BIOS, with no fallback to an alternative mechanism if the
PCI BIOS fails.

AWS EC2 provides the opportunity to experience some exceptions to this
rule.  For example, the t3a.nano instances in eu-west-1 have no
functional PCI BIOS at all.  As of commit 83516ba ("[cloud] Use
PCIAPI_DIRECT for cloud images") we therefore use direct Type 1
configuration space accesses in the images built and published for use
in the cloud.

Recent experience has discovered yet more variation in AWS EC2
instances.  For example, some of the metal instance types have
multiple PCI host bridges and the direct Type 1 accesses therefore
see only a subset of the PCI devices.

Attempt to accommodate future such variations by making the PCI I/O
API selectable at runtime and choosing ECAM (if available), falling
back to the PCI BIOS (if available), then finally falling back to
direct Type 1 accesses.

This is implemented as a dedicated PCIAPI_CLOUD API, rather than by
having the PCI core select a suitable API at runtime (as was done for
timers in commit 302f1ee ("[time] Allow timer to be selected at
runtime").  The common case will remain that only the PCI BIOS API is
required, and we would prefer to retain the optimisations that come
from inlining the configuration space accesses in this common case.
Cloud images are (at present) disk images rather than ROM images, and
so the increased code size required for this design approach in the
PCIAPI_CLOUD case is acceptable.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-09-18 13:41:21 +01:00
Michael Brown 9448ac5445 [bios] Allow pcibios_discover() to return an empty range
Allow pcibios_discover() to return an empty range if the INT 1A,B101
PCI BIOS installation check call fails.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-09-18 13:35:58 +01:00
Michael Brown be667ba948 [pci] Add support for the Enhanced Configuration Access Mechanism (ECAM)
The ACPI MCFG table describes a direct mapping of PCI configuration
space into MMIO space.  This mapping allows access to extended
configuration space (up to 4096 bytes) and also provides for the
existence of multiple host bridges.

Add support for the ECAM mechanism described by the ACPI MCFG table,
as a selectable PCI I/O API alongside the existing PCI BIOS and Type 1
mechanisms.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-09-16 01:05:47 +01:00
Michael Brown ff228f745c [pci] Generalise pci_num_bus() to pci_discover()
Allow pci_find_next() to discover devices beyond the first PCI
segment, by generalising pci_num_bus() (which implicitly assumes that
there is only a single PCI segment) with pci_discover() (which has the
ability to return an arbitrary contiguous chunk of PCI bus:dev.fn
address space).

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-09-15 16:49:47 +01:00
Michael Brown 56b30364c5 [pci] Check for wraparound in callers of pci_find_next()
The semantics of the bus:dev.fn parameter passed to pci_find_next()
are "find the first existent PCI device at this address or higher",
with the caller expected to increment the address between finding
devices.  This does not allow the parameter to distinguish between the
two cases "start from address zero" and "wrapped after incrementing
maximal possible address", which could therefore lead to an infinite
loop in the degenerate case that a device with address ffff:ff:1f.7
really exists.

Fix by checking for wraparound in the caller (which is already
responsible for performing the increment).

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-09-15 15:20:58 +01:00
Michael Brown 8fc3c26eae [pci] Allow pci_find_next() to return non-zero PCI segments
Separate the return status code from the returned PCI bus:dev.fn
address, in order to allow pci_find_next() to be used to find devices
with a non-zero PCI segment number.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-09-15 15:20:58 +01:00
Michael Brown 6459e3b7b1 [linux] Add missing PROVIDE_PCIAPI_INLINE() macros
Ensure type consistency of the PCI I/O API methods by adding the
missing PROVIDE_PCIAPI_INLINE() macros.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-09-15 15:20:58 +01:00
Michael Brown 8f5fc16143 [ipv6] Ignore SLAAC on prefixes with an incompatible prefix length
Experience suggests that routers are often misconfigured to advertise
SLAAC even on prefixes that do not have a SLAAC-compatible prefix
length.  iPXE will currently treat this as an error, resulting in the
prefix being ignored completely.

Handle this misconfiguration by ignoring the autonomous address flag
when the prefix length is unsuitable for SLAAC.

Reported-by: Malte Janduda <mail@janduda.net>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-09-13 13:25:19 +01:00
Michael Brown bc19aeca5f [ipv6] Fix mask calculation when prefix length is not a multiple of 8
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-09-06 13:04:19 +01:00
Michael Brown 131daf1aae [test] Validate constructed IPv6 routing table entries
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-09-06 12:31:32 +01:00
Michael Brown a80124456e [ena] Increase receive ring size to 128 entries
Some versions of the ENA hardware (observed on a c6i.large instance in
eu-west-2) seem to require a receive ring containing at least 128
entries: any smaller ring will never see receive completions or will
stall after the first few completions.

Increase the receive ring size to 128 entries (determined empirically)
for compatibility with these hardware versions.  Limit the receive
ring fill level to 16 (as at present) to avoid consuming more memory
than will typically be available in the internal heap.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-08-26 19:38:27 +01:00
Michael Brown 3b81a4e256 [ena] Provide a host information page
Some versions of the ENA firmware (observed on a c6i.large instance in
eu-west-2) seem to require a host information page, without which the
CREATE_CQ command will fail with ENA_ADMIN_UNKNOWN_ERROR.

These firmware versions also seem to require us to claim that we are a
Linux kernel with a specific driver major version number.  This
appears to be a firmware bug, as revealed by Linux kernel commit
1a63443af ("net/amazon: Ensure that driver version is aligned to the
linux kernel"): this commit changed the value of the driver version
number field to be the Linux kernel version, and was hastily reverted
in commit 92040c6da ("net: ena: fix broken interface between ENA
driver and FW") which clarified that the version number field does
actually have some undocumented significance to some versions of the
firmware.

Fix by providing a host information page via the SET_FEATURE command,
incorporating the apparently necessary lies about our identity.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-08-26 19:38:27 +01:00
Michael Brown 9f81e97af5 [ena] Specify the unused completion queue MSI-X vector as 0xffffffff
Some versions of the ENA firmware (observed on a c6i.large instance in
eu-west-2) will complain if the completion queue's MSI-X vector field
is left empty, even though the queue configuration specifies that
interrupts are not used.

Work around these firmware versions by passing in what appears to be
the magic "no MSI-X vector" value in this field.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-08-26 19:38:27 +01:00
Michael Brown 6d2cead461 [ena] Allow for out-of-order completions
The ENA data path design has separate submission and completion
queues.  Submission queues must be refilled in strict order (since
there is only a single linear tail pointer used to communicate the
existence of new entries to the hardware), and completion queue
entries include a request identifier copied verbatim from the
submission queue entry.  Once the submission queue doorbell has been
rung, software never again reads from the submission queue entry and
nothing ever needs to write back to the submission queue entry since
completions are reported via the separate completion queue.

This design allows the hardware to complete submission queue entries
out of order, provided that it internally caches at least as many
entries as it leaves gaps.

Record and identify I/O buffers by request identifier (using a
circular ring buffer of unique request identifiers), and remove the
assumption that submission queue entries will be completed in order.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-08-26 19:38:25 +01:00
Michael Brown 856ffe000e [ena] Limit submission queue fill level to completion queue size
The CREATE_CQ command is permitted to return a size smaller than
requested, which could leave us in a situation where the completion
queue could overflow.

Avoid overflow by limiting the submission queue fill level to the
actual size of the completion queue.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-08-26 19:37:54 +01:00
Michael Brown c5af41a6f5 [intelxl] Explicitly request a single queue pair for virtual functions
Current versions of the E810 PF driver fail to set the number of
in-use queue pairs in response to the CONFIG_VSI_QUEUES message.  When
the number of in-use queue pairs is less than the number of available
queue pairs, this results in some packets being directed to
nonexistent receive queues and hence silently dropped.

Work around this PF driver bug by explicitly configuring the number of
available queue pairs via the REQUEST_QUEUES message.  This message
triggers a VF reset that, in turn, requires us to reopen the admin
queue and issue an additional GET_RESOURCES message to restore the VF
to a functional state.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-08-16 19:31:06 +01:00
Michael Brown 04879352c4 [intelxl] Allow for admin commands that trigger a VF reset
The RESET_VF admin queue command does not complete via the usual
mechanism, but instead requires us to poll registers to wait for the
reset to take effect and then reopen the admin queue.

Allow for the existence of other admin queue commands that also
trigger a VF reset, by separating out the logic that waits for the
reset to complete.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-08-16 19:29:01 +01:00
Michael Brown 491c075f7f [intelxl] Negotiate virtual function API version 1.1
Negotiate API version 1.1 in order to allow access to virtual function
opcodes that are disallowed by default on the E810.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-08-16 17:58:52 +01:00
Michael Brown b52ea20841 [intelxl] Show virtual function packet statistics for debugging
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-08-16 17:58:46 +01:00
Michael Brown cad1cc6b44 [intelxl] Add driver for Intel 100 Gigabit Ethernet NICs
Add a driver for the E810 family of 100 Gigabit Ethernet NICs.  The
core datapath is identical to that of the 40 Gigabit XL710, and this
part of the code is shared between both drivers.  The admin queue
mechanism is sufficiently similar to make it worth reusing substantial
portions of the code, with separate implementations for several
commands to handle the (unnecessarily) breaking changes in data
structure layouts.  The major differences are in the mechanisms for
programming queue contexts (where the E810 abandons TX/RX symmetry)
and for configuring the transmit scheduler and receive filters: these
portions are sufficiently different to justify a separate driver.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-08-12 16:15:17 +01:00
Michael Brown 6871a7de70 [intelxl] Use admin queue to set port MAC address and maximum frame size
Remove knowledge of the PRTGL_SA[HL] registers, and instead use the
admin queue to set the MAC address and maximum frame size.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-08-12 13:24:06 +01:00
Michael Brown 727b034f11 [intelxl] Use admin queue to get port MAC address
Remove knowledge of the PRTPM_SA[HL] registers, and instead use the
admin queue to retrieve the MAC address.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-08-12 13:03:12 +01:00
Michael Brown 06467ee70f [intelxl] Defer fetching MAC address until after opening admin queue
Allow for the MAC address to be fetched using an admin queue command,
instead of reading the PRTPM_SA[HL] registers directly.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2022-08-12 13:03:12 +01:00