When creating a device tree to pass to a booted operating system,
ensure that the "chosen" node exists, and populate the "bootargs"
property with the image command line.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The allocation of memory for the certificate chain link may cause the
certificate itself to be freed by the cache discarder, if the only
current reference to the certificate is held by the certificate store
and the system runs out of memory during the call to malloc().
Ensure that this cannot happen by taking out a temporary additional
reference to the certificate within x509_append(), rather than
requiring the caller to do so.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Large transmitted records may arise if we have long client certificate
chains or if a client sends a large block of data (such as a large
HTTP POST payload). Fragment records as needed to comply with the
value that we advertise via the max_fragment_length extension.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
RFC5246 states that "a client MAY send no certificates if it does not
have an appropriate certificate to send in response to the server's
authentication request". This use case may arise when the server is
using optional client certificate verification and iPXE has not been
provided with a client certificate to use.
Treat the absence of a suitable client certificate as a non-fatal
condition and send a Certificate message containing no certificates as
permitted by RFC5246.
Reported-by: Alexandre Ravey <alexandre@voilab.ch>
Originally-implemented-by: Alexandre Ravey <alexandre@voilab.ch>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Without any explicit alignment requirement, we will currently allocate
I/O buffers on their own size rounded up to the nearest power of two.
This is done to simplify driver transmit code paths, which can assume
that a standard Ethernet frame lies within a single physical page and
therefore does not need to be split even for devices with DMA engines
that cannot cross page boundaries.
Limit this automatic alignment to a maximum of the page size, to avoid
requiring excessive alignment for unusually large buffers (such as a
buffer allocated for an HTTP POST with a large parameter list).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Provide a custom xfer_alloc_iob() handler to ensure that transmit I/O
buffers contain sufficient headroom for the TLS record header and
record initialisation vector, and sufficient tailroom for the MAC,
block cipher padding, and authentication tag. This allows us to use
in-place encryption for the actual data within the I/O buffer, which
essentially halves the amount of memory that needs to be allocated for
a TLS data transmission.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Datagram sockets such as UDP, ICMP, and fibre channel tend to provide
a custom xfer_alloc_iob() handler to ensure that transmit I/O buffers
contain sufficient headroom to accommodate any required protocol
headers.
Stream sockets such as TCP and TLS do not typically provide a custom
xfer_alloc_iob() handler at present. The default handler simply calls
alloc_iob(), and so stream socket consumers can therefore get away
with using alloc_iob() rather than xfer_alloc_iob().
Fix the HTTP and ONC RPC protocols to use xfer_alloc_iob() where
relevant, in order to operate correctly if the underlying stream
socket chooses to provide a custom xfer_alloc_iob() handler.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Legacy ISA device probing involves poking at various I/O addresses to
guess whether or not a particular device is present.
Actual legacy ISA cards are essentially nonexistent by now, but the
probed I/O addresses have a habit of being reused for various
OEM-specific functions. This can cause some very undesirable side
effects. For example, probing for the "ne2k_isa" driver on an HP
Elitebook 840 G10 will cause the system to lock up in a way that
requires two cold reboots to recover.
Enable ISA_PROBE_ONLY in config/isa.h by default. This limits ISA
probing to use only the addresses specified in ISA_PROBE_ADDRS, which
is empty by default, and so effectively disables ISA probing. The
vanishingly small number of users who require ISA probing can simply
adjust this configuration in config/local/isa.h.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The executed image may call DisconnectController() to remove our
network device. This will leave the net device unregistered but not
yet freed (since our installed PXE base code protocol retains a
reference to the net device).
Unregistration will cause the network upper-layer driver removal
functions to be called, which will free the SNP device structure.
When the image returns from StartImage(), the snpdev pointer may
therefore no longer be valid.
The SNP device structure is not reference counted, and so we cannot
simply take out a reference to ensure that it remains valid across the
call to StartImage(). However, the code path following the call to
StartImage() doesn't actually require the SNP device pointer, only the
EFI device handle.
Store the device handle in a local variable and ensure that snpdev is
invalidated before the call to StartImage() so that future code cannot
accidentally reintroduce this issue.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
UEFI does not provide a direct method to disconnect the existing
driver of a specific protocol from a handle. We currently use
DisconnectController() to remove all drivers from a handle that we
want to drive ourselves, and then rely on recursion in the call to
ConnectController() to reconnect any drivers that did not need to be
disconnected in the first place.
Experience shows that OEMs tend not to ever test the disconnection
code paths in their UEFI drivers, and it is common to find drivers
that refuse to disconnect, fail to close opened handles, fail to
function correctly after reconnection, or lock up the entire system.
Implement a more selective form of disconnection, in which we use
OpenProtocolInformation() to identify the driver associated with a
specific protocol, and then disconnect only that driver.
Perform disconnections in reverse order of attachment priority, since
this is the order likely to minimise the number of cascaded implicit
disconnections.
This allows our MNP driver to avoid performing any disconnections at
all, since it does not require exclusive access to the MNP protocol.
It also avoids performing unnecessary disconnections and reconnections
of unrelated drivers such as the "UEFI WiFi Connection Manager" that
attaches to wireless network interfaces in order to manage wireless
network associations.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Define an ordering for internal EFI drivers on the basis of how close
the driver is to the hardware, and attempt to start drivers in this
order.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
UEFI assumes in several places that an image installs only a single
driver binding protocol instance, and that this is installed on the
image handle itself. We therefore provide a single driver binding
protocol instance, which delegates to the various internal drivers
(for EFI_PCI_IO_PROTOCOL, EFI_USB_IO_PROTOCOL, etc) as appropriate.
The debug messages produced by our Supported() method can end up
slightly misleading, since they will report only the first internal
driver that claims support for a device. In the common case of the
all-drivers build, there may be multiple drivers that claim support
for the same handle: for example, the PCI, NII, SNP, and MNP drivers
are all likely to initially find the protocols that they need on the
same device handle.
Report all internal drivers that claim support for a device, to avoid
confusing debug messages.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Return success if asked to stop driving a device that we are not
currently driving. This avoids propagating spurious errors to an
external caller of DisconnectController().
Signed-off-by: Michael Brown <mcb30@ipxe.org>
If we have a device tree available (e.g. because the user has
explicitly downloaded a device tree using the "fdt" command), then
provide it to the booted operating system as an EFI configuration
table.
Since x86 does not typically use device trees, we create weak symbols
for efi_fdt_install() and efi_fdt_uninstall() to avoid dragging FDT
support into all x86 UEFI binaries.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Provide fdt_create() to create a device tree to be passed to a booted
operating system. The device tree will be created from the FDT image
(if present), falling back to the system device tree (if present).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
EFI configuration tables may be freed at any time, and there is no way
to be notified when the table becomes invalidated. Create a copy of
the system flattened device tree (if present), so that we do not risk
being left with an invalid pointer.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allow for parsing device trees where an external factor (such as a
downloaded image length) determines the maximum length, which must be
validated against the length within the device tree header.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
When running on a platform that uses FDT as its hardware description
mechanism, we are likely to have multiple device tree structures. At
a minimum, there will be the device tree passed to us from the
previous boot stage (e.g. OpenSBI), and the device tree that we
construct to be passed to the booted operating system.
Update the internal FDT API to include an FDT pointer in all function
parameter lists.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allow a Flattened Device Tree blob (DTB) to be provided to a booted
operating system using a script such as:
#!ipxe
kernel /images/vmlinuz console=ttyAMA0
initrd /images/initrd.img
fdt /images/rk3566-radxa-zero-3e.dtb
boot
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Define the concept of an "FDT" image, representing a Flattened Device
Tree blob that has been downloaded in order to be provided to a kernel
or other executable image. FDT images are represented using an image
tag (as with other special-purpose images such as the UEFI shim), and
are similarly marked as hidden so that they will not be included in a
generated magic initrd or show up in a virtual filesystem directory
listing.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
If the UNDI interrupt remains constantly asserted (e.g. because the
BIOS has enabled interrupts for an unrelated device sharing the same
IRQ, or because of bugs in the OEM UNDI driver), then we may get stuck
in an interrupt storm.
We cannot safely chain to the previous interrupt handler (which could
plausibly handle an unrelated device interrupt) since there is no
well-defined behaviour for previous interrupt handlers. We have
observed BIOSes to provide default interrupt handlers that variously
do nothing, send EOI, disable the IRQ, or crash the system.
Fix by disabling the UNDI interrupt whenever our handler is triggered,
and rearm it as needed when polling the network device. This ensures
that forward progress continues to be made even if something causes
the interrupt to be constantly asserted.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
When using the undionly.kkpxe binary (which is never recommended), the
UNDI interrupt may still be enabled when iPXE starts up. If the PXE
base code interrupt handler is not well-behaved, this can result in
undefined behaviour when interrupts are first enabled (e.g. for
entropy gathering, or for allowing the timer tick to occur).
Fix by detecting and disabling the UNDI interrupt during the prefix
code.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The implementation of PXENV_UNDI_GET_NIC_TYPE in some PXE ROMs
(observed with an Intel X710 ROM in a Dell PowerEdge R6515) will fail
to write the NicType byte, leaving it uninitialised.
Prepopulate the NicType byte with a highly unlikely value as a
sentinel to allow us to detect this, and assume that any such devices
are overwhelmingly likely to be PCI devices.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Provide wrapper macros to allow efi_open() and related functions to
accept a pointer to any pointer type as the "interface" argument, in
order to allow a substantial amount of type adjustment boilerplate to
be removed.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
It is now simpler to use efi_open() than to use HandleProtocol() to
obtain an ephemeral protocol instance. Remove all remaining uses of
HandleProtocol() to simplify the code.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The UEFI model for opening and closing protocols is broken by design
and cannot be repaired.
Calling OpenProtocol() to obtain a protocol interface pointer does
not, in general, provide any guarantees about the lifetime of that
pointer. It is theoretically possible that the pointer has already
become invalid by the time that OpenProtocol() returns the pointer to
its caller. (This can happen when a USB device is physically removed,
for example.)
Various UEFI design flaws make it occasionally necessary to hold on to
a protocol interface pointer despite the total lack of guarantees that
the pointer will remain valid.
The UEFI driver model overloads the semantics of OpenProtocol() to
accommodate the use cases of recording a driver attachment (which is
modelled as opening a protocol with EFI_OPEN_PROTOCOL_BY_DRIVER
attributes) and recording the existence of a related child controller
(which is modelled as opening a protocol with
EFI_OPEN_PROTOCOL_BY_CHILD_CONTROLLER attributes).
The parameters defined for CloseProtocol() are not sufficient to allow
the implementation to precisely identify the matching call to
OpenProtocol(). While the UEFI model appears to allow for matched
open and close pairs, this is merely an illusion. Calling
CloseProtocol() will delete *all* matching records in the protocol
open information tables.
Since the parameters defined for CloseProtocol() do not include the
attributes passed to OpenProtocol(), this means that a matched
open/close pair using EFI_OPEN_PROTOCOL_GET_PROTOCOL can inadvertently
end up deleting the record that defines a driver attachment or the
existence of a child controller. This in turn can cause some very
unexpected side effects, such as allowing other UEFI drivers to start
controlling hardware to which iPXE believes it has exclusive access.
This rarely ends well.
To prevent this kind of inadvertent deletion, we establish a
convention for four different types of protocol opening:
- ephemeral opens: always opened with ControllerHandle = NULL
- unsafe opens: always opened with ControllerHandle = AgentHandle
- by-driver opens: always opened with ControllerHandle = Handle
- by-child opens: always opened with ControllerHandle != Handle
This convention ensures that the four types of open never overlap
within the set of parameters defined for CloseProtocol(), and so a
close of one type cannot inadvertently delete the record corresponding
to a different type.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
In preparation for formalising the way that EFI protocols are opened
across the codebase, remove the efipci_open() wrapper.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
We currently have both efipci_info() and efi_pci_info() serving
different but related purposes. Rename the latter to reduce
confusion.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Commit e727f57 ("[efi] Include a copy of the device path within struct
efi_device") neglected to delete the closure of the parent's device
path from the success code path in efi_snp_probe().
Reduce confusion by removing this (harmless) additional close.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some non-driver handles may have an installed component name protocol.
In particular, iPXE itself installs these protocols on its SNP device
handles, to simplify the process of delegating GetControllerName()
from our single-instance driver binding protocol to whatever child
controllers the relevant EFI driver may have installed.
For non-driver handles, the device path is more useful as debugging
information than the driver name. Limit the use of the component name
protocols to handles with a driver binding protocol installed, so that
we will end up using the device path for non-driver handles such as
the SNP device.
Continue to prefer the driver name to the device path for handles with
a driver binding protocol installed, since these will generally map to
things we are likely to conceptualise as drivers rather than as
devices.
Note that we deliberately do not use GetControllerName() to attempt to
get a human-readable name for a controller handle. In the normal
course of events, iPXE is likely to disconnect at least some existing
drivers from their controller handles. This would cause the name
obtained via GetControllerName() to change. By using the device path
instead, we ensure that the debug message name remains the same even
when the driver controlling the handle is changed.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Attempt to get the veto candidate driver name from both the current
and obsolete versions of the component name protocol.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allow for drivers that do not install the driver binding protocol on
the image handle by opening the component name protocol on the driver
binding's ImageHandle rather than on the driver handle itself.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
When hunting down a misbehaving OEM driver to add it to the veto list,
it can be very useful to know the address ranges used by each driver.
Add this information to the verbose debug messages.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The driver name is usually more informative for debug messages than
the device path from which a driver was loaded. Try using the various
mechanisms for obtaining a driver name before trying the device path.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Not all drivers will install the driver binding protocol on the image
handle. Accommodate these drivers by attempting to retrieve the
driver name via the component name protocol(s) located on the driver
binding's ImageHandle, as well as on the driver handle itself.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
When DEBUG=efi_wrap is enabled, we construct a patched copy of the
boot services table and patch the global system table to point to this
copy. This ensures that any subsequently loaded EFI binaries will
call our wrappers.
Previously loaded EFI binaries will typically have cached the boot
services table pointer (in the gBS variable used by EDK2 code), and
therefore will not pick up the updated pointer and so will not call
our wrappers. In most cases, this is what we want to happen: we are
interested in tracing the calls issued by the newly loaded binary and
we do not want to be distracted by the high volume of boot services
calls issued by existing UEFI drivers.
In some circumstances (such as when a badly behaved OEM driver is
causing the system to lock up during the ExitBootServices() call), it
can be very useful to be able to patch the global boot services table
in situ, so that we can trace calls issued by existing drivers.
Restructure the wrapping code to allow wrapping to be enabled or
disabled at any time, and to allow for patching the global boot
services table in situ.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The debug wrappers for CloseEvent() and CheckEvent() are currently
both calling SignalEvent() instead (presumably due to copy-paste
errors). Astonishingly, this has generally not prevented a successful
boot in the (very rare) case that DEBUG=efi_wrap is enabled.
Fix the wrappers to call the intended functions.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The virtual filesystem that we provide to expose downloaded images
will erroneously interpret filenames with redundant path separators
such as ".\filename" as an attempt to open the directory, rather than
an attempt to open "filename".
This shows up most obviously when chainloading from one iPXE into
another iPXE, when the inner iPXE may end up attempting to open
".\autoexec.ipxe" from the outer iPXE's virtual filesystem. (The
erroneously opened file will have a zero length and will therefore be
ignored, but is still confusing.)
Fix by discarding any dot or backslash characters after a potential
initial backslash. This is very liberal and will accept some
syntactically invalid paths, but this is acceptable since our virtual
filesystem does not implement directories anyway.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
On some systems (observed with an HP Elitebook 840 G10), writing
console output that happens to cause the display to scroll will modify
the system memory map. This causes builds with DEBUG=efi_wrap to
typically fail to boot, since the debug output from the wrapped
ExitBootServices() call itself is sufficient to change the memory map
and therefore cause ExitBootServices() to fail due to an invalid
memory map key.
Work around these UEFI firmware bugs by prescrolling the display after
a failed ExitBootServices() attempt, in order to minimise the chance
that further scrolling will happen during the subsequent attempt.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The python-asn1 documentation indicates that end of file may be
detected either by obtaining a True value from .eof() or by obtaining
a None value from .peek(), but does not mention any way to detect the
end of a constructed tag (rather than the end of the overall file).
We currently use .eof() to detect the end of a constructed tag, based
on the observed behaviour of the library.
The behaviour of .eof() changed between versions 2.8.0 and 3.0.0, such
that .eof() no longer returns True at the end of a constructed tag.
Switch to testing for a None value returned from .peek() to determine
when we have reached the end of a constructed tag, since this works on
both newer and older versions.
Continue to treat .eof() as a necessary but not sufficient condition
for reaching the overall end of file, to maintain compatibility with
older versions.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Legacy IRQ 8 appears to be enabled by default on some platforms. If
iPXE selects the RTC entropy source, this will currently result in the
RTC IRQ 8 being unconditionally disabled. This can break assumptions
made by BIOSes or subsequent bootloaders: in particular, the FreeBSD
loader may lock up at the point of starting its default 10-second
countdown when it calls INT 15,86.
Fix by restoring the previous state of IRQ 8 instead of disabling it
unconditionally. Note that we do not need to disable IRQ 8 around the
point of hooking (or unhooking) the ISR, since this code will be
executing in iPXE's normal state of having interrupts disabled anyway.
Also restore the previous state of the RTC periodic interrupt enable,
rather than disabling it unconditionally.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Return the previous interrupt enabled state from enable_irq() and
disable_irq(), to allow callers to more easily restore this state.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
UEFI's built-in HTTPS boot mechanism requires the trusted CA
certificates to be provided via the TlsCaCertificates variable.
(There is no equivalent of the iPXE cross-signing mechanism, so it is
not possible for UEFI to automatically use public CA certificates.)
Users who have configured UEFI HTTPS boot to use a custom root of
trust (e.g. a private CA certificate) may find it useful to have iPXE
automatically pick up and use this same root of trust, so that iPXE
can seamlessly fetch files via HTTPS from the same servers that were
trusted by UEFI HTTPS boot, in addition to servers that iPXE can
validate through other means such as cross-signed certificates.
Parse the TlsCaCertificates variable at startup, add any certificates
to the certificate store, and mark these certificates as trusted.
There are no access restrictions on modifying the TlsCaCertificates
variable: anybody with access to write UEFI variables is permitted to
change the root of trust. The UEFI security model assumes that anyone
with access to run code prior to ExitBootServices() or with access to
modify UEFI variables from within a loaded operating system is
supposed to be able to change the system's root of trust for TLS.
Any certificates parsed from TlsCaCertificates will show up in the
output of "certstat", and may be discarded using "certfree" if
unwanted.
Support for parsing TlsCaCertificates is enabled by default in EFI
builds, but may be disabled in config/general.h if needed.
As with the ${trust} setting, the contents of the TlsCaCertificates
variable will be ignored if iPXE has been compiled with an explicit
root of trust by specifying TRUST=... on the build command line.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add the TlsAuthentication.h header from EDK2's NetworkPkg, along with
a GUID definition for EFI_TLS_CA_CERTIFICATE_GUID.
It is unclear whether or not the TlsCaCertificate variable is intended
to be a UEFI standard. Its presence in NetworkPkg (rather than
MdePkg) suggests not, but the choice of EFI_TLS_CA_CERTIFICATE_GUID
(rather than e.g. EDKII_TLS_CA_CERTIFICATE_GUID) suggests that it is
intended to be included in future versions of the standard.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allow for the possibility of creating empty directories (without
having to include a dummy file inside the directory) using a
zero-length image and a CPIO filename with a trailing slash, such as:
initrd emptyfile /usr/share/oem/
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Commit 12ea8c4 ("[cpio] Allow for construction of parent directories
as needed") introduced a regression in constructing CPIO archive
headers for relative paths (e.g. simple filenames with no leading
slash).
Fix by counting the number of path components rather than the number
of path separators, and add some test cases to cover CPIO header
construction.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add support for the EFI signature list image format (as produced by
tools such as efisecdb).
The parsing code does not require any EFI boot services functions and
so may be enabled even in non-EFI builds. We default to enabling it
only for EFI builds.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
We currently provide pem_asn1() to allow for parsing of PEM data that
is not necessarily contained in an image. Provide an equivalent
function der_asn1() to allow for similar parsing of DER data.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The debug message transcription of well-known EFI GUIDs does not
require any EFI boot services calls. Move this code from efi_debug.c
to efi_guid.c, to allow it to be linked in to non-EFI builds.
We continue to rely on linker garbage collection to ensure that the
code is omitted completely from any non-debug builds.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The EFI host utilities (such as elf2efi64, efirom, etc) include the
EDK2 headers, which include static assertions to ensure that they are
built with -fshort-wchar enabled. When building the host utilities,
we currently bypass these assertions by defining MDE_CPU_EBC. The EBC
compiler apparently does not support static assertions, and defining
MDE_CPU_EBC therefore causes EDK2's Base.h to define STATIC_ASSERT()
as a no-op.
Newer versions of the EDK2 headers omit the check for MDE_CPU_EBC (and
will presumably therefore fail to build with the EBC compiler). This
causes our host utility builds to fail since the static assertion now
detects that we are building with the host's default ABI (i.e. without
enabling -fshort-wchar).
Fix by enabling -fshort-wchar when building EFI host utilities. This
produces binaries that are technically incompatible with the host ABI.
However, since our host utilities never handle any wide-character
strings, this nominal ABI incompatiblity has no effect.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The UsbHostController.h header has been removed from the EDK2 codebase
since it was never defined in a released UEFI specification. However,
we may still encounter it in the wild and so it is useful to retain
the GUID and the corresponding protocol name for debug messages.
Add an iPXE include guard to this file so that the EDK2 header import
script will no longer attempt to import it from the EDK2 tree.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The bzImage specification allows two bytes for the setup code jump
instruction at offset 0x200, which limits its relative offset to +0x7f
bytes. This currently imposes an upper limit on the length of the
version string, which currently precedes the setup code.
Fix by moving the version string to the .prefix.data section, so that
it no longer affects the placement of the setup code.
Originally-fixed-by: Miao Wang <shankerwangmiao@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
iPXE allows individual raw files to be automatically wrapped with
suitable CPIO headers and injected into the magic initrd image as
exposed to a booted Linux kernel. This feature is currently limited
to placing files within directories that already exist in the initrd
filesystem.
Remove this limitation by adding the ability for iPXE to construct
CPIO headers for parent directories as needed, under control of the
"mkdir=<n>" command-line argument. For example:
initrd config.ign /usr/share/oem/config.ign mkdir=1
will create CPIO headers for the "/usr/share/oem" directory as well as
for the "/usr/share/oem/config.ign" file itself.
This simplifies the process of booting operating systems such as
Flatcar Linux, which otherwise require the single "config.ign" file to
be manually wrapped up as a CPIO archive solely in order to create the
relevant parent directory entries.
The value <n> may be used to control the number of parent directory
entries that are created. For example, "mkdir=2" would cause up to
two parent directories to be created (i.e. "/usr/share" and
"/usr/share/oem" in the above example). A negative value such as
"mkdir=-1" may be used to create all parent directories up to the root
of the tree.
Do not create any parent directory entries by default, since doing so
would potentially cause the modes and ownership information for
existing directories to be overwritten.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allow the "--retimeout" option to be used to specify a timeout value
that will be (re)applied after each keypress activity. This allows
script authors to ensure that a single (potentially accidental)
keypress will not pause the boot process indefinitely.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The ANS X9.82 specification implicitly assumes that the RBG_Startup
function will be called before it is needed, and includes checks to
make sure that Generate_function fails if this has not happened.
However, there is no well-defined point at which the RBG_Startup
function is to be called: it's just assumed that this happens as part
of system startup.
We currently call RBG_Startup to instantiate the DRBG as an iPXE
startup function, with the corresponding shutdown function
uninstantiating the DRBG. This works for most use cases, and avoids
an otherwise unexpected user-visible delay when a caller first
attempts to use the DRBG (e.g. by attempting an HTTPS download).
The download of autoexec.ipxe for UEFI is triggered by the EFI root
bus probe in efi_probe(). Both the root bus probe and the RBG startup
function run at STARTUP_NORMAL, so there is no defined ordering
between them. If the base URI for autoexec.ipxe uses HTTPS, then this
may cause random bits to be requested before the RBG has been started.
Extend the logic in rbg_generate() to automatically start up the RBG
if startup has not already been attempted. If startup fails
(e.g. because the entropy source is broken), then do not automatically
retry since this could result in extremely long delays waiting for
entropy that will never arrive.
Reported-by: Michael Niehaus <niehaus@live.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
In almost all cases, the download timeout for autoexec.ipxe is
irrelevant: the operation will either succeed or fail relatively
quickly (e.g. due to a nonexistent file). The overall download
timeout exists only to ensure that an unattended or headless system
will not wait indefinitely in the case of a degenerate network
response (e.g. an HTTP server that returns an endless trickle of data
using chunked transfer encoding without ever reaching the end of the
file).
The current download timeout is too short if PeerDist content encoding
is enabled, since the overall download will abort before the first
peer discovery attempt has completed, and without allowing sufficient
time for an origin server range request.
The single timeout value is currently used for both the download
timeout and the sync timeout. The latter timeout exists only to allow
network communication to be gracefully quiesced before removing the
temporary MNP network device, and may safely be shortened without
affecting functionality.
Fix by increasing the download timeout from two seconds to 30 seconds,
and defining a separate one-second timeout for the sync operation.
Reported-by: Michael Niehaus <niehaus@live.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The only remaining use case for direct reduction (outside of the unit
tests) is in calculating the constant R^2 mod N used during Montgomery
multiplication.
The current implementation of direct reduction requires a writable
copy of the modulus (to allow for shifting), and both the modulus and
the result buffer must be padded to be large enough to hold (R^2 - N),
which is twice the size of the actual values involved.
For the special case of reducing R^2 mod N (or any power of two mod
N), we can run the same algorithm without needing either a writable
copy of the modulus or a padded result buffer. The working state
required is only two bits larger than the result buffer, and these
additional bits may be held in local variables instead.
Rewrite bigint_reduce() to handle only this use case, and remove the
no longer necessary uses of double-sized big integers.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
When allocating memory with a non-zero alignment offset, the free
memory block structure following the allocation may end up improperly
aligned.
Ensure that free memory blocks always remain aligned to the size of
the free memory block structure.
Ensure that the initial heap is also correctly aligned, thereby
allowing the logic for leaking undersized free memory blocks to be
omitted.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The NIST elliptic curves are Weierstrass curves and have the form
y^2 = x^3 + ax + b
with each curve defined by its field prime, the constants "a" and "b",
and a generator base point.
Implement a constant-time algorithm for point addition, based upon
Algorithm 1 from "Complete addition formulas for prime order elliptic
curves" (Joost Renes, Craig Costello, and Lejla Batina), and use this
as a Montgomery ladder commutative operation to perform constant-time
point multiplication.
The code for point addition is implemented using a custom bytecode
interpreter with 16-bit instructions, since this results in
substantially smaller code than compiling the somewhat lengthy
sequence of arithmetic operations directly. Values are calculated
modulo small multiples of the field prime in order to allow for the
use of relaxed Montgomery reduction.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The Montgomery ladder may be used to perform any operation that is
isomorphic to exponentiation, i.e. to compute the result
r = g^e = g * g * g * g * .... * g
for an arbitrary commutative operation "*", base or generator "g", and
exponent "e".
Implement a generic Montgomery ladder for use by both modular
exponentiation and elliptic curve point multiplication (both of which
are isomorphic to exponentiation).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The elliptic curve point representation for the x25519 curve includes
only the X value, since the curve is designed such that the Montgomery
ladder does not need to ever know or calculate a Y value. There is no
curve point format byte: the public key data is simply the X value.
The pre-master secret is also simply the X value of the shared secret
curve point.
The point representation for the NIST curves includes both X and Y
values, and a single curve point format byte that must indicate that
the format is uncompressed. The pre-master secret for the NIST curves
does not include both X and Y values: only the X value is used.
Extend the definition of an elliptic curve to allow the point size to
be specified separately from the key size, and extend the definition
of a TLS named curve to include an optional curve point format byte
and a pre-master secret length.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Split out the portion of tls_send_client_key_exchange_ecdhe() that
actually performs the elliptic curve key exchange into a separate
function ecdhe_key().
Signed-off-by: Michael Brown <mcb30@ipxe.org>
In debug messages, big integers are currently printed as hex dumps.
This is quite verbose and cumbersome to check against external
sources.
Add bigint_ntoa() to transcribe big integers into a static buffer
(following the model of inet_ntoa(), eth_ntoa(), uuid_ntoa(), etc).
Abbreviate big integers that will not fit within the static buffer,
showing both the most significant and least significant digits in the
transcription. This is generally the most useful form when visually
comparing against external sources (such as test vectors, or results
produced by high-level languages).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Calculating the Montgomery constant (R^2 mod N) is done in our
implementation by zeroing the double-width representation of N,
subtracting N once to give (R^2 - N) in order to obtain a positive
value, then reducing this value modulo N.
Extract this logic from bigint_mod_exp() to a separate function
bigint_reduce_supremum(), to allow for reuse by other code.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Classic Montgomery reduction involves a single conditional subtraction
to ensure that the result is strictly less than the modulus.
When performing chains of Montgomery multiplications (potentially
interspersed with additions and subtractions), it can be useful to
work with values that are stored modulo some small multiple of the
modulus, thereby allowing some reductions to be elided. Each addition
and subtraction stage will increase this running multiple, and the
following multiplication stages can be used to reduce the running
multiple since the reduction carried out for multiplication products
is generally strong enough to absorb some additional bits in the
inputs. This approach is already used in the x25519 code, where
multiplication takes two 258-bit inputs and produces a 257-bit output.
Split out the conditional subtraction from bigint_montgomery() and
provide a separate bigint_montgomery_relaxed() for callers who do not
require immediate reduction to within the range of the modulus.
Modular exponentiation could potentially make use of relaxed
Montgomery multiplication, but this would require R>4N, i.e. that the
two most significant bits of the modulus be zero. For both RSA and
DHE, this would necessitate extending the modulus size by one element,
which would negate any speed increase from omitting the conditional
subtractions. We therefore retain the use of classic Montgomery
reduction for modular exponentiation, apart from the final conversion
out of Montgomery form.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Reduce the number of parameters passed to bigint_montgomery() by
calculating the inverse of the modulus modulo the element size on
demand. Cache the result, since Montgomery reduction will be used
repeatedly with the same modulus value.
In all currently supported algorithms, the modulus is a public value
(or a fixed value defined by specification) and so this non-constant
timing does not leak any private information.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The startup process is scheduled to run when the device is opened and
terminated (if still running) when the device is closed. It assumes
that the resource allocation performed in gve_open() has taken place,
and that the admin and transmit/receive data structure pointers are
therefore valid.
The process initialisation in gve_probe() erroneously calls
process_init() rather than process_init_stopped() and will therefore
schedule the startup process immediately, before the relevant
resources have been allocated.
This bug is masked in the typical use case of a Google Cloud instance
with a single NIC built with the config/cloud/gce.ipxe embedded
script, since the embedded script will immediately open the NIC (and
therefore allocate the required resources) before the scheduled
process is allowed to run for the first time. In a multi-NIC
instance, undefined behaviour will arise as soon as the startup
process for the second NIC is allowed to run.
Fix by using process_init_stopped() to avoid implicitly scheduling the
startup process during gve_probe().
Originally-fixed-by: Kal Cutter Conley <kalcutterc@nvidia.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
There is no further need for a standalone modular multiplication
primitive, since the only consumer is modular exponentiation (which
now uses Montgomery multiplication instead).
Remove the now obsolete bigint_mod_multiply().
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Speed up modular exponentiation by using Montgomery reduction rather
than direct modular reduction.
Montgomery reduction in base 2^n requires the modulus to be coprime to
2^n, which would limit us to requiring that the modulus is an odd
number. Extend the implementation to include support for
exponentiation with even moduli via Garner's algorithm as described in
"Montgomery reduction with even modulus" (Koç, 1994).
Since almost all use cases for modular exponentation require a large
prime (and hence odd) modulus, the support for even moduli could
potentially be removed in future.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Montgomery reduction is substantially faster than direct reduction,
and is better suited for modular exponentiation operations.
Add bigint_montgomery() to perform the Montgomery reduction operation
(often referred to as "REDC"), along with some test vectors.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Montgomery reduction requires only the least significant element of an
inverse modulo 2^k, which in turn depends upon only the least
significant element of the invertend.
Use the inverse size (rather than the invertend size) as the effective
size for bigint_mod_invert(). This eliminates around 97% of the loop
iterations for a typical 2048-bit RSA modulus.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
With a slight modification to the algorithm to ignore bits of the
residue that can never contribute to the result, it is possible to
reuse the as-yet uncalculated portions of the inverse to hold the
residue. This removes the requirement for additional temporary
working space.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Direct modular reduction is expected to be used in situations where
there is no requirement to retain the original (unreduced) value.
Modify the API for bigint_reduce() to reduce the value in place,
(removing the separate result buffer), impose a constraint that the
modulus and value have the same size, and require the modulus to be
passed in writable memory (to allow for scaling in place). This
removes the requirement for additional temporary working space.
Reverse the order of arguments so that the constant input is first,
to match the usage pattern for bigint_add() et al.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Expose the effective carry (or borrow) out flag from big integer
addition and subtraction, and use this to elide an explicit bit test
when performing x25519 reduction.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add a dedicated bigint_msb_is_set() to reduce the amount of open
coding required in the common case of testing the sign of a two's
complement big integer.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
UEFI systems may choose not to connect drivers for local disk drives
when the boot policy is set to attempt a network boot. This may cause
the "sanboot" command to be unable to boot from a local drive, since
the relevant block device and filesystem drivers may not have been
connected.
Fix by ensuring that all available drivers are connected before
attempting to boot from an EFI block device.
Reported-by: Andrew Cottrell <andrew.cottrell@xtxmarkets.com>
Tested-by: Andrew Cottrell <andrew.cottrell@xtxmarkets.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
We currently require the variable CROSS (or CROSS_COMPILE) to be set
to specify the global cross-compilation prefix. This becomes
cumbersome when developing across multiple CPU architectures,
requiring frequent editing of build command lines and preventing
incompatible architectures from being built with a single command.
Allow a default cross-compilation prefix for each architecture to be
specified via the CROSS_COMPILE_<arch> variables. These may then be
provided as environment variables, e.g. using
export CROSS_COMPILE_arm32=arm-linux-gnu-
export CROSS_COMPILE_arm64=aarch64-linux-gnu-
export CROSS_COMPILE_loong64=loongarch64-linux-gnu-
export CROSS_COMPILE_riscv32=riscv64-linux-gnu-
export CROSS_COMPILE_riscv64=riscv64-linux-gnu-
This change requires some portions of the Makefile to be rearranged,
to allow for the fact that $(CROSS_COMPILE) may not have been set
until the build directory has been parsed to determine the CPU
architecture.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The seed CSR defined by the Zkr extension is accessible only in M-mode
by default. Older versions of OpenSBI (prior to version 1.4) do not
set mseccfg.sseed, with the result that attempts to access the seed
CSR from S-mode will raise an illegal instruction exception.
Add a facility for testing the accessibility of arbitrary CSRs, and
use it to check that the seed CSR is accessible before reporting the
seed CSR entropy source as being functional.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add basic support for running directly on top of SBI, with no UEFI
firmware present. Build as e.g.:
make CROSS=riscv64-linux-gnu- bin-riscv64/ipxe.sbi
The resulting binary can be tested in QEMU using e.g.:
qemu-system-riscv64 -M virt -cpu max -serial stdio \
-kernel bin-riscv64/ipxe.sbi
No drivers or executable binary formats are supported yet, but the
unit test suite may be run successfully.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Restructure the parsing of the build directory name from
bin[[-<arch>]-<platform>]
to
bin[-<arch>[-<platform>]]
and allow for a per-architecture default build platform.
For the sake of backwards compatibility, handle "bin-efi" as a special
case equivalent to "bin-i386-efi".
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The timer and entropy seed CSRs will, by design, return different
values each time they are read.
Add the missing volatile qualifiers on the inline assembly to prevent
gcc from assuming that repeated invocations may be elided.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The Zkr entropy source extension defines a potentially unprivileged
seed CSR that can be read to obtain 16 bits of entropy input, with a
mandated requirement that 256 entropy input bits read from the seed
CSR will contain at least 128 bits of min-entropy.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The Zicntr extension defines an unprivileged wall-clock time CSR that
roughly matches the behaviour of an invariant TSC on x86. The nominal
frequency of this timer may be read from the "timebase-frequency"
property of the CPU node in the device tree.
Add a timer source using RDTIME to provide implementations of udelay()
and currticks(), modelled on the existing RDTSC-based timer for x86.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
RISC-V seems to allow for direct discovery of CPU features only from
M-mode (e.g. by setting up a trap handler and then attempting to
access a CSR), with S-mode code expected to read the resulting
constructed ISA description from the device tree.
Add the ability to check for the presence of named extensions listed
in the "riscv,isa" property of the device tree node corresponding to
the boot hart.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allow for the existence of platforms with no PCI bus by including the
PCI settings mechanism only if PCI bus support is included.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Running with flat physical addressing is a fairly common early boot
environment. Rename UACCESS_EFI to UACCESS_FLAT so that this code may
be reused in non-UEFI boot environments that also use flat physical
addressing.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add the ability to issue Supervisor Binary Interface (SBI) calls via
the ECALL instruction, and use the SBI DBCN extension to implement a
debug console.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Montgomery multiplication requires calculating the inverse of the
modulus modulo a larger power of two.
Add bigint_mod_invert() to calculate the inverse of any (odd) big
integer modulo an arbitrary power of two, using a lightly modified
version of the algorithm presented in "A New Algorithm for Inversion
mod p^k (Koç, 2017)".
The power of two is taken to be 2^k, where k is the number of bits
available in the big integer representation of the invertend. The
inverse modulo any smaller power of two may be obtained simply by
masking off the relevant bits in the inverse.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allow scripts to read basic information from USB device descriptors
via the settings mechanism. For example:
echo USB vendor ID: ${usb/${busloc}.8.2}
echo USB device ID: ${usb/${busloc}.10.2}
echo USB manufacturer name: ${usb/${busloc}.14.0}
The general syntax is
usb/<bus:dev>.<offset>.<length>
where bus:dev is the USB bus:device address (as obtained via the
"usbscan" command, or from e.g. ${net0/busloc} for a USB network
device), and <offset> and <length> select the required portion of the
USB device descriptor.
Following the usage of SMBIOS settings tags, a <length> of zero may be
used to indicate that the byte at <offset> contains a USB string
descriptor index, and an <offset> of zero may be used to indicate that
the <length> contains a literal USB string descriptor index.
Since the byte at offset zero can never contain a string index, and a
literal string index can never be zero, the combination of both
<length> and <offset> being zero may be used to indicate that the
entire device descriptor is to be read as a raw hex dump.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Implement a "usbscan" command as a direct analogy of the existing
"pciscan" command, allowing scripts to iterate over all detected USB
devices.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Faster modular multiplication algorithms such as Montgomery
multiplication will still require the ability to perform a single
direct modular reduction.
Neaten up the implementation of direct reduction and split it out into
a separate bigint_reduce() function, complete with its own unit tests.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Every architecture uses the same implementation for bigint_is_set(),
and there is no reason to suspect that a future CPU architecture will
provide a more efficient way to implement this operation.
Simplify the code by providing a single architecture-independent
implementation of bigint_is_set().
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The big integer shift operations are misleadingly described as
rotations since the original x86 implementations are essentially
trivial loops around the relevant rotate-through-carry instruction.
The overall operation performed is a shift rather than a rotation.
Update the function names and descriptions to reflect this.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
An n-bit multiplication product may be added to up to two n-bit
integers without exceeding the range of a (2n)-bit integer:
(2^n - 1)*(2^n - 1) + (2^n - 1) + (2^n - 1) = 2^(2n) - 1
Exploit this to perform big integer multiplication in constant time
without requiring the caller to provide temporary carry space.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add support for building as a Linux userspace binary for AArch32.
This allows the self-test suite to be more easily run for the 32-bit
ARM code. For example:
make CROSS=arm-linux-gnu- bin-arm32-linux/tests.linux
qemu-arm -L /usr/arm-linux-gnu/sys-root/ \
./bin-arm32-linux/tests.linux
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Reading from PMCCNTR causes an undefined instruction exception when
running in PL0 (e.g. as a Linux userspace binary), unless the
PMUSERENR.EN bit is set.
Restructure profile_timestamp() for 32-bit ARM to perform an
availability check on the first invocation, with subsequent
invocations returning zero if PMCCNTR could not be enabled.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
All consumers of profile_timestamp() currently treat the value as an
unsigned long. Only the elapsed number of ticks is ever relevant: the
absolute value of the timestamp is not used. Profiling is used to
measure short durations that are generally fewer than a million CPU
cycles, for which an unsigned long is easily large enough.
Standardise the return type of profile_timestamp() as unsigned long
across all CPU architectures. This allows 32-bit architectures such
as i386 and riscv32 to omit all logic associated with retrieving the
upper 32 bits of the 64-bit hardware counter, which simplifies the
code and allows riscv32 and riscv64 to share the same implementation.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Big integer multiplication currently performs immediate carry
propagation from each step of the long multiplication, relying on the
fact that the overall result has a known maximum value to minimise the
number of carries performed without ever needing to explicitly check
against the result buffer size.
This is not a constant-time algorithm, since the number of carries
performed will be a function of the input values. We could make it
constant-time by always continuing to propagate the carry until
reaching the end of the result buffer, but this would introduce a
large number of redundant zero carries.
Require callers of bigint_multiply() to provide a temporary carry
storage buffer, of the same size as the result buffer. This allows
the carry-out from the accumulation of each double-element product to
be accumulated in the temporary carry space, and then added in via a
single call to bigint_add() after the multiplication is complete.
Since the structure of big integer multiplication is identical across
all current CPU architectures, provide a single shared implementation
of bigint_multiply(). The architecture-specific operation then
becomes the multiplication of two big integer elements and the
accumulation of the double-element product.
Note that any intermediate carry arising from accumulating the lower
half of the double-element product may be added to the upper half of
the double-element product without risk of overflow, since the result
of multiplying two n-bit integers can never have all n bits set in its
upper half. This simplifies the carry calculations for architectures
such as RISC-V and LoongArch64 that do not have a carry flag.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The admin queue API requires us to tell the device how many event
counters we have provided via the "configure device resources" admin
queue command. There is, of course, absolutely no documentation
indicating how many event counters actually need to be provided.
We require only two event counters: one for the transmit queue, one
for the receive queue. (The receive queue doesn't seem to actually
make any use of its event counter, but the "create receive queue"
admin queue command will fail if it doesn't have an available event
counter to choose.)
In the absence of any documentation, we currently make the assumption
that allocating and configuring 16 counters (i.e. one whole cacheline)
will be sufficient to allow for the use of two counters.
This assumption turns out to be incorrect. On larger instance types
(observed with a c3d-standard-16 instance in europe-west4-a), we find
that creating the transmit or receive queues will each fail with a
probability of around 50% with the "failed precondition" error code.
Experimentation suggests that even though the device has accepted our
"configure device resources" command indicating that we are providing
only 16 event counters, it will attempt to choose any of its potential
32 event counters (and will then fail since the event counter that it
unilaterally chose is outside of the agreed range).
Work around this firmware bug by always allocating the maximum number
of event counters supported by the device. (This requires deferring
the allocation of the event counters until after issuing the "describe
device" command.)
Signed-off-by: Michael Brown <mcb30@ipxe.org>
As of commit 79c0173 ("[build] Create util/genfsimg for building
filesystem-based images"), the EFI boot file name for each CPU
architecture is defined within the genfsimg script itself, rather than
being passed in as a Makefile parameter.
Remove the now-redundant Makefile definitions for EFI_BOOT_FILE.
Reported-by: Christian I. Nilsson <nikize@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add support for building iPXE as a 64-bit or 32-bit RISC-V binary, for
either UEFI or Linux userspace platforms. For example:
# RISC-V 64-bit UEFI
make CROSS=riscv64-linux-gnu- bin-riscv64-efi/ipxe.efi
# RISC-V 32-bit UEFI
make CROSS=riscv64-linux-gnu- bin-riscv32-efi/ipxe.efi
# RISC-V 64-bit Linux
make CROSS=riscv64-linux-gnu- bin-riscv64-linux/tests.linux
qemu-riscv64 -L /usr/riscv64-linux-gnu/sys-root \
./bin-riscv64-linux/tests.linux
# RISC-V 32-bit Linux
make CROSS=riscv64-linux-gnu- SYSROOT=/usr/riscv32-linux-gnu/sys-root \
bin-riscv32-linux/tests.linux
qemu-riscv32 -L /usr/riscv32-linux-gnu/sys-root \
./bin-riscv32-linux/tests.linux
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The cross-compiler will typically use the appropriate sysroot
directory automatically. This may not work for toolchains where a
single cross-compiler is used to produce output for multiple CPU
variants (e.g. 32-bit and 64-bit RISC-V).
Add a SYSROOT=... parameter that may be used to specify the relevant
sysroot directory, e.g.
make CROSS=riscv64-linux-gnu- SYSROOT=/usr/riscv32-linux-gnu/sys-root \
bin-riscv32-linux/tests.linux
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The EDK2 header macros VA_START(), VA_ARG() etc produce build errors
on some CPU architectures (notably on 32-bit RISC-V, which is not yet
supported by EDK2).
Fix by using the standard variable argument list macros.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
For some 32-bit CPUs, we need to provide implementations of 64-bit
shifts as libgcc helper functions. Add test cases to cover these.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Define a cpu_halt() function which is architecture-specific but
platform-independent, and merge the multiple architecture-specific
implementations of the EFI cpu_nap() function into a single central
efi_cpu_nap() that uses cpu_halt() if applicable.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The definitions of the setjmp() and longjmp() functions are common to
all architectures, with only the definition of the jump buffer
structure being architecture-specific.
Move the architecture-specific portions to bits/setjmp.h and provide a
common setjmp.h for the function definitions.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add the "--retain <N>" option to limit the number of retained old AMI
images (within the same family, architecture, and public visibility).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allow for easier identification of images and snapshots created by the
aws-import script by adding tags for image family (e.g. "iPXE") and
architecture (e.g. "x86_64") to both.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
As described in commit 3b81a4e ("[ena] Provide a host information
page"), we currently report an operating system type of "Linux" in
order to work around broken versions of the ENA firmware that will
fail to create a completion queue if we report the correct operating
system type.
As of September 2024, the ENA team at AWS assures us that the entire
AWS fleet has been upgraded to fix this bug, and that we are now safe
to report the correct operating system type value in the "type" field
of struct ena_host_info.
The ENA team has also clarified that at least some deployed versions
of the ENA firmware still have the defect that requires us to report
an operating system version number of 2 (regardless of operating
system type), and so we continue to report ENA_HOST_INFO_VERSION_WTF
in the "version" field of struct ena_host_info.
Add an explicit warning on the previous known failure path, in case
some deployed versions of the ENA firmware turn out to not have been
upgraded as expected.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Move the <gdbmach.h> file to <bits/gdbmach.h>, and provide a common
dummy implementation for all architectures that have not yet
implemented support for GDB.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Simplify the process of adding a new CPU architecture by providing
common implementations of typically empty architecture-specific header
files.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
This patch adds support for the AQtion Ethernet controller, enabling
iPXE to recognize and utilize the specific models (AQC114, AQC113, and
AQC107).
Tested-by: Animesh Bhatt <animeshb@marvell.com>
Signed-off-by: Animesh Bhatt <animeshb@marvell.com>
The link status check in falcon_xaui_link_ok() reads from the
FCN_XX_CORE_STAT_REG_MAC register only on production hardware (where
the FPGA version reads as zero), but modifies the value and writes
back to this register unconditionally. This triggers an uninitialised
variable warning on newer versions of gcc.
Fix by assuming that the register exists only on production hardware,
and so moving the "modify-write" portion of the "read-modify-write"
operation to also be covered by the same conditional check.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add the "imgdecrypt" command that can be used to decrypt a detached
encrypted data image using a cipher key obtained from a separate CMS
envelope image. For example:
# Create non-detached encrypted CMS messages
#
openssl cms -encrypt -binary -aes-256-gcm -recip client.crt \
-in vmlinuz -outform DER -out vmlinuz.cms
openssl cms -encrypt -binary -aes-256-gcm -recip client.crt \
-in initrd.img -outform DER -out initrd.img.cms
# Detach data from envelopes (using iPXE's contrib/crypto/cmsdetach)
#
cmsdetach vmlinuz.cms -d vmlinuz.dat -e vmlinuz.env
cmsdetach initrd.img.cms -d initrd.img.dat -e initrd.img.env
and then within iPXE:
#!ipxe
imgfetch http://192.168.0.1/vmlinuz.dat
imgfetch http://192.168.0.1/initrd.img.dat
imgdecrypt vmlinuz.dat http://192.168.0.1/vmlinuz.env
imgdecrypt initrd.img.dat http://192.168.0.1/initrd.img.env
boot vmlinuz
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add support for decrypting images containing detached encrypted data
using a cipher key obtained from a separate CMS envelope image (in DER
or PEM format).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The openssl toolchain does not currently seem to support creating CMS
envelopedData or authEnvelopedData messages with detached encrypted
data.
Add a standalone tool "cmsdetach" that can be used to detach the
encrypted data from a CMS message. For example:
openssl cms -encrypt -binary -aes-256-gcm -recip client.crt \
-in bootfile -outform DER -out bootfile.cms
cmsdetach bootfile.cms --data bootfile.dat --envelope bootfile.env
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Generalise CMS self-test data structure and macro names to refer to
"messages" rather than "signatures", in preparation for adding image
decryption tests.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some ASN.1 OID-identified algorithms require additional parameters,
such as an initialisation vector for a block cipher. The structure of
the parameters is defined by the individual algorithm.
Extend asn1_algorithm() to allow these additional parameters to be
returned via a separate ASN.1 cursor.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Reduce the number of dynamic allocations required to parse a CMS
message by retaining the ASN.1 cursor returned from image_asn1() for
the lifetime of the CMS message. This allows embedded ASN.1 cursors
to be used for parsed objects within the message, such as embedded
signatures.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Instances of cipher and digest algorithms tend to get called
repeatedly to process substantial amounts of data. This is not true
for public-key algorithms, which tend to get called only once or twice
for a given key.
Simplify the public-key algorithm API so that there is no reusable
algorithm context. In particular, this allows callers to omit the
error handling currently required to handle memory allocation (or key
parsing) errors from pubkey_init(), and to omit the cleanup calls to
pubkey_final().
This change does remove the ability for a caller to distinguish
between a verification failure due to a memory allocation failure and
a verification failure due to a bad signature. This difference is not
material in practice: in both cases, for whatever reason, the caller
was unable to verify the signature and so cannot proceed further, and
the cause of the error will be visible to the user via the return
status code.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The TLS connection structure has grown to become unmanageably large as
new features and support for new TLS protocol versions have been added
over time.
Split out the portions of struct tls_connection that are specific to
client and server operations into separate structures, and simplify
some structure field names.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The TLS connection structure has grown to become unmanageably large as
new features and support for new TLS protocol versions have been added
over time.
Split out the portions of struct tls_connection that are specific to
transmit and receive operations into separate structures, and simplify
some structure field names.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The rom-o-matic code does not form part of the iPXE codebase, has not
been maintained for over a decade, and does not appear to still be in
use anywhere in the world.
It does, however, result in a large number of false positive security
vulnerability reports from some low quality automated code analysis
tools such as Fortify SCA.
Remove this unused and obsolete code to reduce the burden of
responding to these false positives.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Generalise the existing support for performing RSA public-key
encryption, decryption, signature, and verification tests, and update
the code to use okx() for neater reporting of test results.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Asymmetric keys are invariably encountered within ASN.1 structures
such as X.509 certificates, and the various large integers within an
RSA key are themselves encoded using ASN.1.
Simplify all code handling asymmetric keys by passing keys as a single
ASN.1 cursor, rather than separate data and length pointers.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Generalise the logic for identifying the matching PCI root bridge I/O
protocol to allow for identifying the closest matching PCI bus:dev.fn
address range, and use this to provide PCI address range discovery
(while continuing to inhibit automatic PCI bus probing).
This allows the "pciscan" command to work as expected under UEFI.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The UEFI device model requires us to not probe the PCI bus directly,
but instead to wait to be offered the opportunity to drive devices via
our driver service binding handle.
We currently inhibit PCI bus probing by having pci_discover() return
an empty range when using the EFI PCI I/O API. This has the unwanted
side effect that scanning the bus manually using the "pciscan" command
will also fail to discover any devices.
Separate out the concept of being allowed to probe PCI buses from the
mechanism for discovering PCI bus:dev.fn address ranges, so that this
limitation may be removed.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
An attempt to use a validator for an empty certificate chain will
correctly fail the overall validation with the "empty certificate
chain" error propagated from x509_auto_append().
In a debug build, the call to validator_name() will attempt to call
x509_name() on a non-existent certificate, resulting in garbage in the
debug message.
Fix by checking for the special case of an empty certificate chain.
This issue does not affect non-debug builds, since validator_name() is
(as per its description) called only for debug messages.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
There is some exploitable similarity between the data structures used
for representing CMS signatures and CMS encryption keys. In both
cases, the CMS message fundamentally encodes a list of participants
(either message signers or message recipients), where each participant
has an associated certificate and an opaque octet string representing
the signature or encrypted cipher key. The ASN.1 structures are not
identical, but are sufficiently similar to be worth exploiting: for
example, the SignerIdentifier and RecipientIdentifier data structures
are defined identically.
Rename data structures and functions, and add the concept of a CMS
message type.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Extend the definition of an ASN.1 OID-identified algorithm to include
a potential cipher suite, and add identifiers for AES-CBC and AES-GCM.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The cms_signature() and cms_verify() functions currently accept raw
data pointers. This will not be possible for cms_decrypt(), which
will need the ability to extract fragments of ASN.1 data from a
potentially large image.
Change cms_signature() and cms_verify() to accept an image as an input
parameter, and move the responsibility for setting the image trust
flag within cms_verify() since that now becomes a more natural fit.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allow passing a NULL value for the certificate list to all functions
used for identifying an X.509 certificate from an existing set of
certificates, and rename function parameters to indicate that this
certificate list represents an unordered certificate store (rather
than an ordered certificate chain).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Centralise all current mechanisms for identifying an X.509 certificate
(by raw content, by subject, by issuer and serial number, and by
matching public key), and remove the certstore-specific and
CMS-specific variants of these functions.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Handling large ASN.1 objects such as encrypted CMS files will require
the ability to use the asn1_enter() and asn1_skip() family of
functions on partial object cursors, where a defined additional length
is known to exist after the end of the data buffer pointed to by the
ASN.1 object cursor.
We already have support for partial object cursors in the underlying
asn1_start() operation used by both asn1_enter() and asn1_skip(), and
this is used by the DER image probe routine to check that the
potential DER file comprises a single ASN.1 SEQUENCE object.
Add asn1_enter_partial() to formalise the process of entering an ASN.1
partial object, and refactor the DER image probe routine to use this
instead of open-coding calls to the underlying asn1_start() operation.
There is no need for an equivalent asn1_skip_partial() function, since
only objects that are wholly contained within the partial cursor may
be successfully skipped.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Calling asn1_skip_if_exists() on a malformed ASN.1 object may
currently leave the cursor in a partially-updated state, where the tag
byte and one of the length bytes have been stripped. The cursor is
left with a valid data pointer and length and so no out-of-bounds
access can arise, but the cursor no longer points to the start of an
ASN.1 object.
Ensure that each ASN.1 cursor manipulation code path leads to the
cursor being either fully updated, left unmodified, or invalidated,
and update the function descriptions to reflect this.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Successfully reaching the end of a well-formed ASN.1 object list is
arguably not an error, but the current code (dating back to the
original ASN.1 commit in 2007) will explicitly check for and report
this as an error condition.
Remove the explicit check for reaching the end of a well-formed ASN.1
object list, and instead return success along with a zero-length (and
hence implicitly invalidated) cursor.
Almost every existing caller of asn1_skip() or asn1_skip_if_exists()
currently ignores the return value anyway. Skipped objects are (by
definition) not of interest to the caller, and the invalidation
behaviour of asn1_skip() ensures that any errors will be safely caught
on a subsequent attempt to actually use the ASN.1 object content.
Since these existing callers ignore the return value, they cannot be
affected by this change.
There is one existing caller of asn1_skip_if_exists() that does check
the return value: in asn1_skip() itself, an error returned from
asn1_skip_if_exists() will cause the cursor to be invalidated. In the
case of an error indicating only that the cursor length is already
zero, invalidation is a no-op, and so this change affects only the
return value propagated from asn1_skip().
This leaves only a single call site within ocsp_request() where the
return value from asn1_skip() is currently checked. The return status
here is moot since there is no way for the code in question to fail
(absent a bug in the ASN.1 construction or parsing code).
There are therefore no callers of asn1_skip() or asn1_skip_if_exists()
that rely on an error being returned for successfully reaching the end
of a well-formed ASN.1 object list. Simplify the code by redefining
this as a successful outcome.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Redefine bit 30 of an SMBIOS numerical setting to be part of the
function number, in order to allow access to hypervisor CPUID leaves.
This technically breaks backwards compatibility with scripts
attempting to read more than 64 consecutive functions. Since there is
no meaningful block of 64 consecutive related functions, it is
vanishingly unlikely that this capability has ever been used.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Hypervisors typically intercept CPUID leaves in the range 0x40000000
to 0x400000ff, with leaf 0x40000000 returning the maximum supported
function within this range in register %eax.
iPXE currently masks off bit 30 from the requested CPUID leaf when
checking to see if a function is supported, which causes this check to
read from leaf 0x00000000 instead of 0x40000000.
Fix by including bit 30 within the mask.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The general syntax for SMBIOS settings:
smbios/<instance>.<type>.<offset>.<length>
is currently extended such that a <length> of zero indicates that the
byte at <offset> contains a string index, and an <offset> of zero
indicates that the <length> contains a literal string index.
Since the byte at offset zero can never contain a string index, and a
literal string index can never have a zero value, the combination of
both <length> and <offset> being zero is currently invalid and will
always return "not found".
Extend the syntax such that the combination of both <length> and
<offset> being zero may be used to read the entire data structure.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Following the example of aws-int13con, add a utility that can be used
to read the INT13 console log from a used iPXE boot disk in Google
Compute Engine.
There seems to be no easy way to directly read the contents of either
a disk image or a snapshot in Google Cloud. Work around this
limitation by creating a snapshot and attaching this snapshot as a
data disk to a temporary Linux instance, which is then used to echo
the INT13 console log to the serial port.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Experiments suggest that using fewer than 64 receive buffers leads to
excessive packet drop rates on some instance types (observed with a
c3-standard-4 instance in europe-west4-a).
Fix by increasing the number of receive data buffers (and adjusting
the length of the registrable queue page address list to match).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The Google Virtual Ethernet NIC (GVE or gVNIC) is found only in Google
Cloud instances. There is essentially zero documentation available
beyond the mostly uncommented source code in the Linux kernel.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Following the example of aws-import, add a utility that can be used to
upload an iPXE disk image to Google Compute Engine as a bootable
image. For example:
make CONFIG=cloud EMBED=config/cloud/gce.ipxe \
bin-x86_64-pcbios/ipxe.usb bin-x86_64-efi/ipxe.usb
make CONFIG=cloud EMBED=config/cloud/gce.ipxe \
CROSS=aarch64-linux-gnu- bin-arm64-efi/ipxe.usb
../contrib/cloud/gce-import -p \
bin-x86_64-pcbios/ipxe.usb \
bin-x86_64-efi/ipxe.usb \
bin-arm64-efi/ipxe.usb
The iPXE disk image is automatically wrapped into a tarball containing
a single file named "disk.raw", uploaded to a temporary bucket in
Google Cloud Storage, and used to create a bootable image. The
temporary bucket is deleted after use.
An appropriate image family name is identified automatically: "ipxe"
for BIOS images, "ipxe-uefi-x86-64" for x86_64 UEFI images, and
"ipxe-uefi-arm64" for AArch64 UEFI images. This allows the latest
image within each family to be launched within needing to know the
precise image name.
Google Compute Engine images are globally scoped and are available
(and cached upon first use) in all regions. The initial placement of
the image may be controlled indirectly by using the "--location"
option to specify the Google Cloud Storage location used for the
temporary upload bucket: the image will then be created in the closest
multi-region to the storage location.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The DHCPv6 protocol does not itself provide a router address or a
prefix length. This information is instead obtained from the router
advertisements.
Our IPv6 minirouting table construction logic will first construct an
entry for each advertised prefix, and later update the entry to
include an address assigned within that prefix via stateful DHCPv6 (if
applicable).
This logic fails if the address assigned via stateful DHCPv6 does not
fall within any of the advertised prefixes (e.g. if the network is
configured to use DHCPv6-assigned /128 addresses with no advertised
on-link prefixes). We will currently treat this situation as
equivalent to having a manually assigned address with no corresponding
router address or prefix length: the routing table entry will use the
default /64 prefix length and will not include the router address.
DHCPv6 is triggered only in response to a router advertisement with
the "Managed Address Configuration (M)" or "Other Configuration (O)"
flags set, and a router address is therefore available at the point
that we initiate DHCPv6.
Record the router address when initiating DHCPv6, and expose this
router address as part of the DHCPv6 settings block. This allows the
routing table entry for any address assigned via stateful DHCPv6 to
correctly include the router address, even if the assigned address
does not fall within an advertised prefix.
Also provide a fixed /128 prefix length as part of the DHCPv6 settings
block. When an address assigned via stateful DHCPv6 does not fall
within an advertised prefix, this will cause the routing table entry
to have a /128 prefix length as expected. (When such an address does
fall within an advertised prefix, it will continue to use the
advertised prefix length.)
Originally-fixed-by: Guvenc Gulce <guevenc.guelce@sap.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
In a small subnet (with a /31 or /32 subnet mask), all addresses
within the subnet are valid host addresses: there is no separate
network address or directed broadcast address.
The logic used in iPXE to determine whether or not to use a link-layer
broadcast address will currently fail in these subnets. In a /31
subnet, the higher of the two host addresses (i.e. the address with
all host bits set) will be treated as a broadcast address. In a /32
subnet, the single valid host address will be treated as a broadcast
address.
Fix by adding the concept of a host mask, defined such that an address
in the local subnet with all of the mask bits set to zero represents
the network address, and an address in the local subnet with all of
the mask bits set to one represents the directed broadcast address.
For most subnets, this is simply the inverse of the subnet mask. For
small subnets (/31 or /32) we can obtain the desired behaviour by
setting the host mask to all ones, so that only the local broadcast
address 255.255.255.255 will be treated as a broadcast address.
Originally-fixed-by: Lukas Stockner <lstockner@genesiscloud.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Remove the now-unused generalised text widget user interface, along
with the associated concept of a widget set and the implementation of
a read-only label widget.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Rewrite the code implementing the "login" user interface to use a
predefined interactive form. The command "login" then becomes roughly
equivalent to:
#!ipxe
form
item username Username
item --secret password Password
present
with the result that login form customisations (e.g. to add a Windows
domain name) may be implemented within the scripting language.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add support for presenting a dynamic user interface as an interactive
form, alongside the existing support for presenting a dynamic user
interface as a menu.
An interactive form may be used to allow a user to input (or edit)
values for multiple settings on a single screen, as a user-friendly
alternative to prompting for setting values via the "read" command.
In the present implementation, all input fields must fit on a single
screen (with no scrolling), and the only supported widget type is an
editable text box.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
For interactive forms, the concept of a secret value becomes
meaningful (e.g. for password fields).
Add a flag to indicate that an item represents a secret value, and
allow this flag to be set via the "--secret" option of the "item"
command.
This flag has no meaning for menu items, but is silently accepted
anyway to keep the code size minimal.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Generalise the ability to look up a dynamic user interface item by
index or by shortcut key, to allow for reuse of this code for
interactive forms.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
We currently have an abstract model of a dynamic menu as a list of
items, each of which has a name, a description, and assorted metadata
such as a shortcut key. The "menu" and "item" commands construct
representations in this abstract model, and the "choose" command then
presents the items as a single-choice menu, with the selected item's
name used as the output value.
This same abstraction may be used to model a dynamic form as a list of
editable items, each of which has a corresponding setting name, an
optional description label, and assorted metadata such as a shortcut
key. By defining a "form" command as an alias for the "menu" command,
we could construct and present forms using commands such as:
#!ipxe
form Login to ${url}
item username Username or email address
item --secret password Password
present
or
#!ipxe
form Configure IPv4 networking for ${netX/ifname}
item netX/ip IPv4 address
item netX/netmask Subnet mask
item netX/gateway Gateway address
item netX/dns DNS server address
present
Reusing the same abstract model for both menus and forms allows us to
minimise the increase in code size, since the implementation of the
"form" and "item" commands is essentially zero-cost.
Rename everything within the abstract data model from "menu" to
"dynamic user interface" to reflect this generalisation.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add support for wraparound scrolling and allow the tab key to be used
to move forward through a list of elements, wrapping back around to
the beginning of the list on overflow.
This is mildly useful for a menu, and likely to be a strong user
expectation for an interactive form.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Switch terminology for the "item" command from "item <label> <text>"
to "item <name> <text>", in preparation for repurposing the "item"
command to cover interactive forms as well as menus.
Since this renaming affects only a positional parameter, it does not
break compatibility with any existing scripts.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The msg() and alert() functions currently defined in settings_ui.c
provide a general-purpose facility for printing messages centred on
the screen.
Split this out to a separate file to allow for reuse by the form
presentation code.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The curses concept of a window has been supported but never actively
used in iPXE since the mucurses library was first implemented in 2006.
Simplify the code by removing the ability to place a widget set in a
specified window, and instead use the standard screen for all drawing
operations.
This simplification allows the widget set parameter to be omitted for
the draw_widget() and edit_widget() operations, since the only reason
for its inclusion was to provide access to the specified window.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Create a generic abstraction of a text widget, refactor the existing
editable text box widget to use this abstraction, add an
implementation of a non-editable text label widget, and generalise the
login user interface to use this generic widget abstraction.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The comments for replace_string() state that a successful return
status guarantees that the dynamically allocated string pointer is no
longer NULL (even if it was initially NULL and the replacement string
is NULL or empty). This is relied upon by readline() to guarantee
that it will always return a non-NULL string if successful.
The code behaviour does not currently match this comment: an empty
replacement string may result in a successful return status even if
the (single-byte) allocation fails.
Fix up the code behaviour to match the comments, and to additionally
ensure that the edit history is filled in even in the event of an
allocation failure.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The reference implementation of Dhcp6Dxe in EDK2 has a fatal flaw: the
code in EfiDhcp6Stop() will poll the network in a tight loop until
either a response is received or a timer tick (at TPL_CALLBACK)
occurs. When EfiDhcp6Stop() is called at TPL_CALLBACK or higher, this
will result in an endless loop and an apparently frozen system.
Since this is the reference implementation of Dhcp6Dxe, it is likely
that almost all platforms have the same problem.
Fix by vetoing the broken driver. If the upstream driver is ever
fixed and a new version number issued, then we could plausibly test
against the version number exposed via the driver binding protocol.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Editable strings currently require a fixed-size buffer, which is
inelegant and limits the potential for creating interactive forms with
a variable number of edit box widgets.
Remove this limitation by switching to using a dynamically allocated
buffer for editable strings and edit box widgets.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
If we do not have a current working URI (after applying the EFI device
path settings and any cached DHCP settings), then an attempt to
download autoexec.ipxe will fail since there is no base URI from which
to resolve the full autoexec.ipxe URI.
Avoid this potentially confusing error message by attempting the
download only if we have successfully obtained a current working URI.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add a new setting to provide access to the link layer protocol type
from scripts. This can be useful in order to skip configuring
interfaces based on their link layer protocol or, conversely,
configure only selected interface types (Ethernet, IPoIB, etc.)
Example script:
set idx:int32 0
:loop
isset ${net${idx}/mac} || exit 0
iseq ${net${idx}/linktype} IPoIB && goto try_next ||
autoboot net${idx} ||
:try_next
inc idx && goto loop
Signed-off-by: Pavel Krotkiy <porsh@nebius.com>
Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
We currently attempt to obtain the autoexec.ipxe script via early use
of the EFI_SIMPLE_FILE_SYSTEM_PROTOCOL or EFI_PXE_BASE_CODE_PROTOCOL
interfaces to obtain an opaque block of memory, which is then
registered as an image at an appropriate point during our startup
sequence. The early use of these existent interfaces allows us to
obtain the script even if our subsequent actions (e.g. disconnecting
drivers in order to connect up our own) may cause the script to become
inaccessible.
This mirrors the approach used under BIOS, where the autoexec.ipxe
script is provided by the prefix (e.g. as an initrd image when using
the .lkrn build of iPXE) and so must be copied into a normally
allocated image from wherever it happens to previously exist in
memory.
We do not currently have support for downloading an autoexec.ipxe
script if we were ourselves downloaded via UEFI HTTP boot.
There is an EFI_HTTP_PROTOCOL defined within the UEFI specification,
but it is so poorly designed as to be unusable for the simple purpose
of downloading an additional file from the same directory. It
provides almost nothing more than a very slim wrapper around
EFI_TCP4_PROTOCOL (or EFI_TCP6_PROTOCOL). It will not handle
redirection, content encoding, retries, or even fundamentals such as
the Content-Length header, leaving all of this up to the caller.
The UEFI HTTP Boot driver will install an EFI_LOAD_FILE_PROTOCOL
instance on the loaded image's device handle. This looks promising at
first since it provides the LoadFile() API call which is specified to
accept an arbitrary filename parameter. However, experimentation (and
inspection of the code in EDK2) reveals a multitude of problems that
prevent this from being usable. Calling LoadFile() will idiotically
restart the entire DHCP process (and potentially pop up a UI requiring
input from the user for e.g. a wireless network password). The
filename provided to LoadFile() will be ignored. Any downloaded file
will be rejected unless it happens to match one of the limited set of
types expected by the UEFI HTTP Boot driver. The list of design
failures and conceptual mismatches is fairly impressive.
Choose to bypass every possible aspect of UEFI HTTP support, and
instead use our own HTTP client and network stack to download the
autoexec.ipxe script over a temporary MNP network device. Since this
approach works for TFTP as well as HTTP, drop the direct use of
EFI_PXE_BASE_CODE_PROTOCOL. For consistency and simplicity, also drop
the direct use of EFI_SIMPLE_FILE_SYSTEM_PROTOCOL and rely upon our
existing support to access local files via "file:" URIs.
This approach results in console output during the "iPXE initialising
devices...ok" message that appears while startup is in progress.
Remove the trailing "ok" so that this intermediate output appears at a
sensible location on the screen. The welcome banner that will be
printed immediately afterwards provides an indication that startup has
completed successfully even absent the explicit "ok".
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Retain a reference to the cached DHCPACK until the late startup phase,
and allow it to be recycled for reuse. This allows the cached DHCPACK
to be used for a temporary MNP network device and then subsequently
reused for the corresponding real network device.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
An MNP network device may be temporarily and non-destructively
installed on top of an existing UEFI network stack without having to
disconnect existing drivers.
Add the ability to create such a temporary network device.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Split out the code that allocates our internal struct efi_device
representations, to allow for the creation of temporary MNP devices in
order to download the autoexec.ipxe script.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add an abbreviated "Not found" error message for an HTTP 404 status
code, so that any automatic attempt to download a non-existent
autoexec.ipxe script produces only a minimal error message.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add an abbreviated "Not found" error message for a TFTP "file not
found" error code, so that any automatic attempt to download a
non-existent autoexec.ipxe script produces only a minimal error
message.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add an abbreviated "Not found" error message for an EFI_NOT_FOUND
error encountered when attempting to open a file on a local
filesystem, so that any automatic attempt to download a non-existent
autoexec.ipxe script produces only a minimal error message.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
iPXE is designed around fully asynchronous I/O, including asynchronous
connection opening. Almost all errors are therefore necessarily
reported as occurring during an in-progress download, rather than
occurring at the time that the URI is opened.
Local file access is currently an exception to this: errors such as
nonexistent files will be encountered while opening the URI. This
results in mildly unexpected error messages of the form "Could not
start download", rather than the usual pattern of showing the URI, the
initial progress dots, and then the error message.
Fix this inconsistency by deferring the local filesystem access until
the local file download process is running.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some URI schemes allow for a path name to be specified via the opaque
component of the URI (e.g. "file:/script.ipxe" to specify a path on
the filesystem from which iPXE itself was loaded). Files loaded from
such paths will currently fail to be assigned an appropriate name,
since only the path component of the URI will be used to construct a
default image name.
Fix by falling back to attempt deriving an image name from the opaque
component of a URI, if no path component is specified.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
For unknown reasons, miscellaneous versions of gcc seem to struggle
with the static assertions used to ensure the correct layout of the
GCM structures.
Adjust the assertions to use offsetof() rather than direct pointer
comparison, on the basis that offsetof() must be a compile-time
constant value.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The UEFI HTTP boot mechanism is extraordinarily badly designed, even
by the standards of the UEFI specification in general. It has the
symptoms of a feature that has been designed entirely in terms of user
stories, without any consideration at all being given to the
underlying technical architecture. It does work, provided that you
are doing precisely and only what was envisioned by the product owner.
If you want to try anything outside the bounds of the product owner's
extremely limited imagination, then you are almost certainly about to
enter a world of pain.
As one very minor example of this: the cached DHCP packet is not
available when using HTTP boot. The UEFI HTTP boot code does perform
DHCP, but it pointlessly and unhelpfully throws away the DHCP packet
and trashes the network interface configuration before handing over to
the downloaded executable.
Work around this imbecility by parsing and applying the few network
configuration settings that are persisted into the loaded image's
device path. This is limited to very basic information such as the IP
address, gateway address, and DNS server address, but it does at least
provide enough for a functional routing table.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
We want exclusive access to the network device, both for performance
reasons and because we perform operations such as EAPoL that affect
the entire link. We currently drive the network card via either a
native hardware driver or via the SNP or NII/UNDI interfaces, both of
which grant us this exclusive access.
Add an alternative driver that drives the network card non-exclusively
via the EFI_MANAGED_NETWORK_PROTOCOL interface. This can function as
a fallback for situations where neither SNP nor NII/UNDI interfaces
are functional, and also opens up the possibility of non-destructively
installing a temporary network device over which to download the
autoexec.ipxe script.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
When using a service binding protocol, CreateChild() will create a new
protocol instance (and optionally a new handle). The caller will then
typically open this new protocol instance with BY_DRIVER attributes,
since the service binding mechanism has no equivalent of the driver
binding protocol's Stop() method, and there is therefore no other way
for the caller to be informed if the protocol instance is about to
become invalid (e.g. because the service driver wants to remove the
child).
The caller cannot ask CreateChild() to install the new protocol
instance on the original handle (i.e. the service binding handle),
since the whole point of the service binding protocol is to allow for
the existence of multiple children, and UEFI does not permit multiple
instances of the same protocol to be installed on a handle.
Our current drivers all open the original handle (as passed to our
driver binding's Start() method) with BY_DRIVER attributes, and so the
same handle will be passed to our Stop() method. This changes when
our driver must use a separate handle, as described above.
Add an optional "child handle" field to struct efi_device (on the
assumption that we will not have any drivers that need to create
multiple children), and generalise efidev_find() to match on either
the original handle or the child handle.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The EFI service binding abstraction is used to add and remove child
handles for multiple different protocols. Provide a common interface
for doing so.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Commit 4c5b794 ("[efi] Use the SNP protocol instance to match the SNP
chainloading device") switched the chainloaded device matching logic
to use a target protocol instance rather than the loaded image's
device handle, on the basis that we want to bind to the parent SNP
device rather than to a duplicate SNP protocol instance installed onto
an IPv4 or IPv6 child device handle.
It is possible that our calls to DisconnectController() and
ConnectController() will cause the target protocol instance to be
uninstalled and reinstalled, which may change the value of the
protocol instance pointer. Allow for this by identifying and matching
against the uppermost handle that initially has this target protocol
instance installed.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
When booted via HTTP, our loaded image's device path will include the
URI from which we were downloaded. Set this as the current working
URI, so that an embedded script may perform subsequent downloads
relative to the iPXE binary, or construct explicit relative paths via
the ${cwduri} setting.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
iPXE maintains a concept of a current working URI, which is used when
resolving relative URIs and allows scripts to download files using
URIs relative to the script itself.
There are situations in which it is valuable for a script to be able
to access the URI explicitly as a string, not just implicitly as a
base URI for subsequent downloads. For example, when booting a Fedora
installer, the "inst.repo" command-line parameter may be used to pass
the URI of the repository to the installer.
Expose the current working URI as ${cwuri}. Since relative URIs may
be constructed as strings only from a directory URI (not from a full
URI), also expose the current working directory URI as ${cwduri}.
This feature may be used as e.g.
#!ipxe
echo Booting from ${cwuri}
prompt -k 0x197e -t 2000 Press F12 to install Fedora... || exit
kernel images/pxeboot/vmlinux inst.repo=${cwduri}
initrd images/pxeboot/initrd.img
boot
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The Mellanox/Nvidia UEFI driver is built from the same codebase as the
iPXE driver, and appears to contain the bug that was fixed in commit
c11734e ("[golan] Use ETH_HLEN for inline header size"). This results
in identical failures when using the SNP or NII interface (via
e.g. snponly.efi) to drive a Mellanox card while EAPoL is enabled.
Work around the underlying UEFI driver bug by padding transmit I/O
buffers to the minimum Ethernet frame length before passing them to
the underlying driver's transmit function.
This padding is not technically necessary, since almost all modern
hardware will insert transmit padding as necessary (and where the
hardware does not support doing so, the underlying UEFI driver is
responsible for adding any necessary padding). However, it is
guaranteed to be harmless (other than a miniscule performance impact):
the Ethernet specification requires zero padding up to the minimum
frame length for packets that are transmitted onto the wire, and so
the receiver will see the same packet whether or not we manually
insert this padding in software.
The additional padding causes the underlying Mellanox driver to avoid
its faulty code path, since it will never be asked to transmit a very
short packet.
Tested-by: Eric Hagberg <ehagberg@janestreet.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The driver does not correctly handle very short transmitted packets
such as EAPoL-Start where the entire DMA content lies within the
current send work queue entry inline header length of 18 bytes.
Fix by reducing the inline header length to the Ethernet frame header
length of 14 bytes.
Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Older versions of gcc (observed with gcc 4.8.5 on CentOS 7) complain
about having the label "err_ioremap" at the end of a compound
statement in bios_mp_start_all(). The label is correctly placed,
since it immediately follows the iounmap() that would be required to
undo a successful ioremap() in the non-error case.
Fix by adding an explicit "return" immediately after the label.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some SNP implementations (observed with a wifi adapter in a Dell
Latitude 3440 laptop) seem to require additional space in the
allocated receive buffers, otherwise full-length packets will be
silently dropped.
The EDK2 MnpDxe driver happens to allocate an additional 8 bytes of
padding (4 for a VLAN tag, 4 for the Ethernet frame checksum). Match
this behaviour since drivers are very likely to have been tested
against MnpDxe.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Intel and AMD distribute microcode updates, which are typically
applied by the BIOS and/or the booted operating system.
BIOS updates can be difficult to obtain and cumbersome to apply, and
are often neglected. Operating system updates may be subject to
strict change control processes, particularly for production
workloads. There is therefore value in being able to update the
microcode at boot time using a freshly downloaded microcode update
file, particularly in scenarios where the physical hardware and the
installed operating system are controlled by different parties (such
as in a public cloud infrastructure).
Add support for parsing Intel and AMD microcode update images, and for
applying the updates to all CPUs in the system.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Provide an implementation of the iPXE multiprocessor API for BIOS,
based on sending broadcast INIT and SIPI interprocessor interrupts to
start up all application processors.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Application processors are started via INIT and SIPI interprocessor
interrupts: the INIT places the processor into a "wait for SIPI"
state, and the SIPI then starts the processor in real mode at a
page-aligned address derived from the SIPI vector number.
Add support for installing a real-mode SIPI handler that will switch
the CPU into protected mode with flat physical addressing, load
initial register contents, and then jump to the address of a
protected-mode SIPI handler. No stack pointer is set up, to avoid the
need to allocate stack space for each available processor.
We use 32-bit physical addressing in order to minimise the changes
required for a 64-bit build. The existing long mode transition code
relies on the existence of the stack, so we cannot easily switch the
application processor into long mode. We could use 32-bit virtual
addressing, but this runtime environment does not currently exist
outside of librm.S itself in a 64-bit build, and using it would
complicate the implementation of the protected-mode SIPI handler.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Provide an implementation of the iPXE multiprocessor API for EFI,
based on using EFI_MP_SERVICES to start up a wrapper function on all
application processors.
Note that the processor numbers used by EFI_MP_SERVICES are opaque
integers that bear no relation to the underlying CPU identity
(e.g. the APIC ID), and so we must rely on our own (architecture-
specific) implementation to determine the relevant CPU identifiers.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Define an API for executing very limited functions on application
processors in a multiprocessor system, along with an x86-only
implementation.
The normal iPXE runtime environment is effectively non-existent on
application processors. There is no ability to make firmware calls
(e.g. to write to a console), and there may be no stack space
available.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The return status from efi_block_local() indicates whether or not the
handle is eligible to be assigned a local virtual drive number. There
will always be several enumerated EFI_BLOCK_IO_PROTOCOL handles that
are not eligible for a local virtual drive number (e.g. the handles
corresponding to partitions, rather than to complete disks), and this
is not an interesting error to report.
Do not report errors from efi_block_local() as the overall error
status for a SAN boot, since doing so would be likely to mask a much
more relevant error from having previously attempted to scan for a
matching filesystem within an eligible block device handle.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add a "--label" option that can be used to specify a filesystem label,
to be matched against the FAT volume label.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add an "--extra" option that can be used to specify an extra
(non-boot) filename that must exist within the booted filesystem.
Note that only files within the FAT-formatted bootable partition will
be visible to this filter. Files within the operating system's root
disk (e.g. "/etc/redhat-release") are not generally accessible to the
firmware and so cannot be used as the existence check filter filename.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add a "--uuid" option which may be used to specify a boot device UUID,
to be matched against the GPT partition GUID.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
EFI provides no API for determining the partition GUID (if any) for a
specified device handle. The partition GUID appears to be exposed
only as part of the device path.
Add efi_path_guid() to extract the partition GUID (if any) from a
device path.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The drive specification alone does not necessarily contain enough
information to perform a SAN boot (or local disk boot) under UEFI. If
the next-stage bootloader is installed in the EFI system partition
under a non-standard name (e.g. "\EFI\debian\grubx64.efi") then this
explicit boot filename must also be specified.
Generalise this concept to use a "SAN boot configuration parameters"
structure (currently containing only the optional explicit boot
filename), to allow for easy expansion to provide other parameters
such as the partition UUID or volume label.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Extend the EFI SAN boot code to allow for booting from a local disk,
as is already possible with the BIOS SAN boot code.
There is unfortunately no direct UEFI equivalent of the BIOS drive
number. The UEFI shell does provide numbered mappings fs0:, blk0:,
etc, but these numberings exist only while the UEFI shell is running
and are not necessarily stable between shell invocations or across
reboots.
A substantial amount of existing third-party documentation for iPXE
will suggest using "sanboot --drive 0x80" to boot from a local disk
(when no SAN drives are present), since this suggestion has been
present in the official documentation for the "sanboot" command for
almost thirteen years. We therefore aim to ensure that this
instruction will also work for UEFI, i.e. that in a situation where
there are local disks but no SAN disks, then the first local disk will
be treated as being drive 0x80.
We therefore assign local disks the virtual drive numbers 0x80, 0x81,
etc, matching the numbering typically used in a BIOS environment.
Where a SAN disk is already occupying one of these drive numbers, the
local disks' virtual drive numbers will be incremented as necessary.
This provides a rough approximation of the equivalent functionality
under BIOS, where existing local disks' drive numbers are remapped to
make way for SAN disks.
We do not make any attempt to sort the list of local disks: the order
used for allocating virtual drive numbers will be whatever order is
returned by LocateHandle(). This will typically match the creation
order of the EFI handles, which will typically match the hardware
enumeration order of the devices, which will typically match user
expectations as to which local disk is first, second, etc.
We explicitly do not attempt to match the numbering used by the UEFI
shell (which initially sorts in increasing order of device path, but
does not renumber when new devices are added or removed). We can
never guarantee matching this partly transient UEFI shell numbering,
so it is best not to set any expectation that it will be matched.
(Using local drive numbers starting at 0x80 helps to avoid setting up
this impossible expectation, since the UEFI shell uses local drive
numbers starting at zero.)
Since floppy disks are essentially non-existent in any plausible UEFI
system, overload "--drive 0" to mean "boot from any drive containing
the specified (or default) boot filename".
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Maintain the SAN device list in order of drive number, and provide
sandev_next() to locate the first SAN device at or above a given drive
number.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
SAN devices created by iPXE are visible to the firmware, and may be
accessed using the firmware's standard block I/O device interface
(e.g. INT 13 for BIOS, or EFI_BLOCK_IO_PROTOCOL for UEFI). The iPXE
code to perform a SAN boot acts as a client of this standard block I/O
device interface, even when the underlying block I/O is being
performed by iPXE itself.
We rely on this separation to allow the "sanboot" command to be used
to boot from a local disk: since the code to perform a SAN boot does
not need direct access to an underlying iPXE SAN device, it may be
used to boot from any device providing the firmware's standard block
I/O device interface.
Clean up the EFI SAN boot code to require only a drive number and an
EFI_BLOCK_IO_PROTOCOL handle, in preparation for adding support for
booting from a local disk under UEFI.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The "sanboot" command allows a custom boot filename to be specified
via the "--filename" option. We currently rely on LoadImage() to
perform both the existence check and to load the image ready for
execution. This may give a false negative result if Secure Boot is
enabled and the boot file is not correctly signed.
Carry out the existence check using EFI_SIMPLE_FILE_SYSTEM_PROTOCOL
separately from loading the image via LoadImage().
Signed-off-by: Michael Brown <mcb30@ipxe.org>
We currently use the SAN device pointer as the debug message stream
identifier. This pointer is not always available: for example, when
booting from a local disk there is no underlying SAN device.
Switch to using the drive number as the debug message colour stream
identifier, so that all block device debug messages may be colourised
consistently.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
We currently call ConvertDevicePathToText() with DisplayOnly=TRUE when
constructing a device path to appear within a debug message. For
ATAPI device paths, this will unfortunately omit some key information:
the textual representation will not indicate which ATA bus or drive is
represented. This can lead to misleading debug messages that appear
to refer to identical devices.
Fix by setting DisplayOnly=FALSE to select the long form of device
path textual representations.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The ":uuid" and ":guid" settings types are currently format-only: it
is possible to format a setting as a UUID (via e.g. "show foo:uuid")
but it is not currently possible to parse a string into a UUID setting
(via e.g. "set foo:uuid 406343fe-998b-44be-8a28-44ca38cb202b").
Use uuid_aton() to implement parsing of these settings types, and add
appropriate test cases for both.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add uuid_aton() to parse a UUID value from a string (analogous to
inet_aton(), inet6_aton(), sock_aton(), etc), treating it as a
32-digit hex string with optional hyphen separators. The placement of
the separators is not checked: each byte within the hex string may be
separated by a hyphen, or not separated at all.
Add dedicated self-tests for UUID parsing and formatting (already
partially covered by the ":uuid" and ":guid" settings self-tests).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The UEFI shim installs wrappers around several boot services functions
before invoking its next stage bootloader, in an attempt to enforce
its desired behaviour upon the aforementioned bootloader. For
example, shim checks that the bootloader has either invoked
StartImage() or has called into the "shim lock protocol" before
allowing an ExitBootServices() call to proceed.
When invoking a shim, iPXE will also install boot services function
wrappers in order to work around assorted bugs in the UEFI shim code
that would otherwise prevent it from being used to boot a kernel. For
details on these workarounds, see commits 28184b7 ("[efi] Add support
for executing images via a shim") and 5b43181 ("[efi] Support versions
of shim that perform SBAT verification").
Using boot services function wrappers in this way is not intrinsically
problematic, provided that wrappers are installed before starting the
wrapped program, and uninstalled only after the wrapped program exits.
This strict ordering requirement ensures that all layers of wrappers
are called in the expected order, and that no calls are issued through
a no-longer-valid function pointer.
Unfortunately, the UEFI shim does not respect this strict ordering
requirement, and will instead uninstall (and reinstall) its wrappers
midway through the execution of the wrapped program. This leaves the
wrapped program with an inconsistent view of the boot services table,
leading to incorrect behaviour.
This results in a boot failure when a first shim is used to boot iPXE,
which then uses a second shim to boot a Linux kernel:
- First shim installs StartImage() and ExitBootServices() wrappers
- First shim invokes iPXE via its own PE loader
- iPXE installs ExitBootServices() wrapper
- iPXE invokes second shim via StartImage()
At this point, the first shim's StartImage() wrapper will illegally
uninstall its ExitBootServices() wrapper, without first checking that
nothing else has modified the ExitBootServices function pointer. This
effectively bypasses iPXE's own ExitBootServices() wrapper, which
causes a boot failure since the code within that wrapper does not get
called.
A proper fix would be for shim to install its wrappers before starting
the image and uninstall its wrappers only after the started image has
exited. Instead of repeatedly uninstalling and reinstalling its
wrappers while the wrapped program is running, shim should simply use
a flag to keep track of whether or not it needs to modify the
behaviour of the wrapped calls.
Experience shows that there is unfortunately no point in trying to get
a fix for this upstreamed into shim. We therefore work around the
shim bug by removing our ExitBootServices() wrapper and moving the
relevant code into our GetMemoryMap() wrapper.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add support for EAP-MSCHAPv2 (note that this is not the same as
PEAP-MSCHAPv2), controllable via the build configuration option
EAP_METHOD_MSCHAPV2 in config/general.h.
Our model for EAP does not encompass mutual authentication: we will
starting sending plaintext packets (e.g. DHCP requests) over the link
even before EAP completes, and our only use for an EAP success is to
mark the link as unblocked.
We therefore ignore the content of the EAP-MSCHAPv2 success request
(containing the MS-CHAPv2 authenticator response) and just send back
an EAP-MSCHAPv2 success response, so that the EAP authenticator will
complete the process and send through the real EAP success packet
(which will, in turn, cause us to unblock the link).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
RFC 3748 states that implementations must support the MD5-Challenge
method. However, some network environments may wish to disable it as
a matter of policy.
Allow support for MD5-Challenge to be controllable via the build
configuration option EAP_METHOD_MD5 in config/general.h.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add debug messages for each EAP Request and Response, and to show the
list of methods offered when sending a Nak.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Several new relocations types have been added in LoongArch ABI version
2.10. In particular:
- R_LARCH_B16 (18-bit PC-relative jump)
- R_LARCH_B21 (23-bit PC-relative jump)
- R_LARCH_PCREL20_S2 (22-bit PC-relative offset)
Also relocation relaxations have been introduced. Recent GCC (13.2)
and binutils 2.41+ use these types of relocations, which confuses
elf2efi tool. As a result, iPXE EFI images for LoongArch fail to
build with the following error:
Unrecognised relocation type 103
Fix by ignoring R_LARCH_B{16,21} and R_LARCH_PCREL20_S2 (as with other
PC-relative relocations), and by ignoring relaxations (R_LARCH_RELAX).
Relocation relaxations are basically optimizations: ignoring them
results in a correct binary (although it might be suboptimal).
Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Done with the help of this Perl script:
$MARKER = 'PCI_ROM'; # a regex
$AB = 1; # At Begin
@HEAD = ();
@ITEMS = ();
@TAIL = ();
foreach $fn (@ARGV) {
open(IN, $fn) or die "Can't open file '$fn': $!\n";
while (<IN>) {
if (/$MARKER/) {
push @ITEMS, $_;
$AB = 0; # not anymore at begin
}
else {
if ($AB) {
push @HEAD, $_;
}
else {
push @TAIL, $_;
}
}
}
} continue {
close IN;
open(OUT, ">$fn") or die "Can't open file '$fn' for output: $!\n";
print OUT @HEAD;
print OUT sort @ITEMS;
print OUT @TAIL;
close OUT;
# For a next file
$AB = 1;
@HEAD = ();
@ITEMS = ();
@TAIL = ();
}
Executed that script while src/drivers/ as current working directory,
provided '$(grep -rl PCI_ROM)' as argument.
Signed-off-by: Geert Stappers <stappers@stappers.it>
Inspection of the generated assembly shows that gcc will often emit
standalone implementations of frequently invoked functions such as
digest_update(), which contain no logic and exist only as syntactic
sugar.
Force inlining of these functions to reduce the overall binary size.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add an implementation of the authentication portions of the MS-CHAPv2
algorithm as defined in RFC 2759, along with the single test vector
provided therein.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Certificates issued by Let's Encrypt have two options for their chain
of trust: the chain can either terminate in the self-signed ISRG Root
X1 root certificate, or in an intermediate ISRG Root X1 certificate
that is signed in turn by the self-signed DST Root CA X3 root
certificate. This is a historical artifact: when Let's Encrypt first
launched as a project, the chain ending in DST Root CA X3 was used
since existing clients would not have recognised the ISRG Root X1
certificate as a trusted root certificate.
The DST Root CA X3 certificate expired in September 2021, and so is no
longer trusted by clients (such as iPXE) that validate the expiry
times of all certificates in the certificate chain.
In order to maintain usability of certificates on older Android
devices, the default certificate chain provided by Let's Encrypt still
terminates in DST Root CA X3, even though that certificate has now
expired. On newer devices which include ISRG Root X1 as a trusted
root certificate, the intermediate version of ISRG Root X1 in the
certificate chain is ignored and validation is performed as though the
chain had terminated in the self-signed ISRG Root X1 root certificate.
On older Android devices which do not include ISRG Root X1 as a
trusted root certificate, the validation succeeds since Android
chooses to ignore expiry times for root certificates and so continues
to trust the DST Root CA X3 root certificate.
This backwards compatibility hack unfortunately breaks the cross-
signing mechanism used by iPXE, which assumes that the certificate
chain will always terminate in a non-expired root certificate.
Generalise the validator's cross-signed certificate download mechanism
to walk up the certificate chain in the event of a failure, attempting
to find a replacement cross-signed certificate chain starting from the
next level up. This allows the validator to step over the expired
(and hence invalidatable) DST Root CA X3 certificate, and instead
download the cross-signed version of the ISRG Root X1 certificate.
This generalisation also gives us the ability to handle servers that
provide a full certificate chain including their root certificate:
iPXE will step over the untrusted public root certificate and attempt
to find a cross-signed version of it instead.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Downloading a cross-signed certificate chain to partially replace
(rather than simply extend) an existing chain will require the ability
to discard all certificates after a specified link in the chain.
Extract the relevant logic from x509_free_chain() and expose it
separately as x509_truncate().
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some versions of gcc (observed with gcc 4.8.5 in CentOS 7) will report
spurious build_assert() failures for some assertions about structure
layouts. There is no clear pattern as to what causes these spurious
failures, and the build assertion does succeed in that no unresolvable
symbol reference is generated in the compiled code.
Adjust the assertions to work around these apparent compiler issues.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
We build with -Werror by default so that any warning is treated as an
error and aborts the build. The build system allows NO_WERROR=1 to be
used to override this behaviour, in order to allow builds to succeed
when spurious warnings occur (e.g. when using a newer compiler that
includes checks for which the codebase is not yet prepared).
Some versions of gcc (observed with gcc 4.8.5 in CentOS 7) will report
spurious build_assert() failures: the compilation will fail due to an
allegedly unelided call to the build assertion's external function
declared with __attribute__((error)) even though the compiler does
manage to successfully elide the call (as verified by the fact that
there are no unresolvable symbol references in the compiler output).
Change build_assert() to declare __attribute__((warning)) instead of
__attribute__((error)) on its extern function. This will still abort
a normal build if the assertion fails, but may be overridden using
NO_WERROR=1 if necessary to work around a spurious assertion failure.
Note that if the build assertion has genuinely failed (i.e. if the
compiler has genuinely not been able to elide the call) then the
object will still contain an unresolvable symbol reference that will
cause the link to fail (which matches the behaviour of the old
linker_assert() mechanism).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The DES block cipher dates back to the 1970s. It is no longer
relevant for use in TLS cipher suites, but it is still used by the
MS-CHAPv2 authentication protocol which remains unfortunately common
for 802.1x port authentication.
Add an implementation of the DES block cipher, complete with the
extremely comprehensive test vectors published by NBS (the precursor
to NIST) in the form of an utterly adorable typewritten and hand-drawn
paper document.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
A block cipher in ECB mode has no concept of an initialisation vector,
and any data provided to cipher_setiv() for an ECB cipher will be
ignored. There is no requirement within our cipher algorithm
abstraction for a dummy initialisation vector to be provided.
Remove the entirely spurious dummy 16-byte initialisation vector from
the ECB test cases.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The CBC_CIPHER() macro contains some accidentally hardcoded references
to an underlying AES cipher, instead of using the cipher specified in
the macro parameters.
Fix by using the macro parameter as required.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Coverity reported that tls_send_plaintext() failed to check the return
status from tls_generate_random(), which could potentially result in
uninitialised random data being used as the block initialisation
vector (instead of intentionally random data).
Add the missing return status check, and separate out the error
handling code paths (since on the successful exit code path there will
be no need to free either the plaintext or the ciphertext anyway).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
When ExitBootServices() invokes efi_shutdown_hook(), there may be
nothing to generate an interrupt since the timer is disabled in the
first step of ExitBootServices(). Additionally, for VMs OVMF masks
everything from the PIC (except the timer) by default. This means
that calling cpu_nap() may hang indefinitely. This was seen in
practice in netfront_reset() when running in a VM on XenServer.
Fix this by skipping the halt if an EFI shutdown is in progress.
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allow the choice of key exchange algorithms to be controlled via build
configuration options in config/crypto.h, as is already done for the
choices of public-key algorithms, cipher algorithms, and digest
algorithms.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
DHE and ECDHE use essentially the same mechanism for verifying the
signature over the Diffie-Hellman parameters, though the format of the
parameters is different between the two methods.
Split out the verification of the parameter signature so that it may
be shared between the DHE and ECDHE key exchange algorithms.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The construction of the key material for the pending cipher suites
from the TLS master secret must happen regardless of which key
exchange algorithm is in use, and the key material is not required to
send the ClientKeyExchange handshake (which is sent before changing
cipher suites).
Centralise the call to tls_generate_keys() after performing key
exchange via the selected algorithm.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Define an individual local structure for each extension and a single
structure for the list of extensions. This makes it viable to add
extensions such as the Supported Elliptic Curves extension, which must
not be present if the list of curves is empty.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Define an abstraction of an elliptic curve with a fixed generator and
one supported operation (scalar multiplication of a curve point).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
RFC7748 states that it is entirely optional for X25519 Diffie-Hellman
implementations to check whether or not the result is the all-zero
value (indicating that an attacker sent a malicious public key with a
small order). RFC8422 states that implementations in TLS must abort
the handshake if the all-zero value is obtained.
Return an error if the all-zero value is obtained, so that the TLS
code will not require knowledge specific to the X25519 curve.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add an implementation of the X25519 key exchange algorithm as defined
in RFC7748.
This implementation is inspired by and partially based upon the paper
"Implementing Curve25519/X25519: A Tutorial on Elliptic Curve
Cryptography" by Martin Kleppmann, available for download from
https://www.cl.cam.ac.uk/teaching/2122/Crypto/curve25519.pdf
The underlying modular addition, subtraction, and multiplication
operations are completely redesigned for substantially improved
efficiency compared to the TweetNaCl implementation studied in that
paper (approximately 5x-10x faster and with 70% less memory usage).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The slightly incomprehensible LoongArch64 implementation for
bigint_subtract() is observed to produce incorrect results for some
input values.
Replace the suspicious LoongArch64 implementations of bigint_add(),
bigint_subtract(), bigint_rol() and bigint_ror(), and add a test case
for a subtraction that was producing an incorrect result with the
previous implementation.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add a helper function bigint_swap() that can be used to conditionally
swap a pair of big integers in constant time.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Big integers may be efficiently copied using bigint_shrink() (which
will always copy only the size of the destination integer), but this
is potentially confusing to a reader of the code.
Provide bigint_copy() as an alias for bigint_shrink() so that the
intention of the calling code may be more obvious.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Big integer multiplication is currently used only as part of modular
exponentiation, where both multiplicand and multiplier will be the
same size.
Relax this requirement to allow for the use of big integer
multiplication in other contexts.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
We currently implement build-time assertions via a mechanism that
generates a call to an undefined external function that will cause the
link to fail unless the compiler can prove that the asserted condition
is true (and thereby eliminate the undefined function call).
This assertion mechanism can be used for conditions that are not
amenable to the use of static_assert(), since static_assert() will not
allow for proofs via dead code elimination.
Add __attribute__((error(...))) to the undefined external function, so
that the error is raised at compile time rather than at link time.
This allows us to provide a more meaningful error message (which will
include the file name and line number, as with any other compile-time
error), and avoids the need for the caller to specify a unique symbol
name for the external function.
Change the name from linker_assert() to build_assert(), since the
assertion now takes place at compile time rather than at link time.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Expose static_assert() via assert.h and migrate link-time assertions
to build-time assertions where possible.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Newer versions of the GNU assembler (observed with binutils 2.41) will
complain about the ".arch i386" in files assembled with "as --64",
with the message "Error: 64bit mode not supported on 'i386'".
In files such as stack.S that contain no instructions to be assembled,
the ".arch i386" is redundant and may be removed entirely.
In the remaining files, fix by moving ".arch i386" below the relevant
".code16" or ".code32" directive, so that the assembler is no longer
expecting 64-bit instructions to be used by the time that the ".arch
i386" directive is encountered.
Reported-by: Ali Mustakim <alim@forwardcomputers.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The .text directive is entirely redundant when followed by a .section
directive giving an explicit section name and attributes.
Remove these unnecessary directives to simplify the code.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
RFC 3748 states that support for MD5-Challenge is mandatory for EAP
implementations. The MD5 and CHAP code is already included in the
default build since it is required by iSCSI, and so this does not
substantially increase the binary size.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allow the ${netX/username} setting to be used to specify an EAP
identity to be returned in response to a Request-Identity, and provide
a mechanism for responding with a NAK to indicate which authentication
types we support.
If no identity is specified then fall back to the current behaviour of
not sending any Request-Identity response, so that switches will time
out and switch to MAC Authentication Bypass (MAB) if applicable.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
EAP responses (including our own) may be broadcast by switches but are
not of interest to us and can be safely ignored if received.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Ensure that .gitignore rules do not cover any files that do exist
within the repository.
Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Support scanning for the 64-bit SMBIOS3 entry point in addition to the
32-bit SMBIOS2 entry point.
Prefer use of the 32-bit entry point if present, since this is
guaranteed to be within accessible memory.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add definitions for relocation types that may be missing on older
versions of the host system's elf.h.
This mirrors wimboot commit 47f6298 ("[efi] Add potentially missing
relocation types").
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The result of multiplying a uint16_t by another uint16_t will be a
signed int. Comparing this against a size_t will perform an unwanted
sign extension.
Fix by explicitly casting e_phnum to an unsigned int, thereby matching
the data type used for the loop index variable (and avoiding the
unwanted sign extension).
This mirrors wimboot commit 15f6162 ("[efi] Fix Coverity warning about
unintended sign extension").
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add additional PC-relative relocation types that may be encountered
when converting binaries compiled with clang.
This mirrors the relevant elf2efi portions of wimboot commit 7910830
("[build] Support building with the clang compiler").
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The clang compiler does not (and apparently will not ever) allow for
variable-length arrays within structs.
Work around this limitation by using a fixed-length array to hold the
PDB filename in the debug section.
This mirrors wimboot commit f52c3ff ("[efi] Allow compiling elf2efi
with clang").
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The function efi_pecoff_debug_name() (called by efi_handle_name()) is
used to extract a filename from the debug data directory entry located
within a PE/COFF image. The name is copied into a temporary static
buffer to allow for modifications, but the code currently erroneously
modifies the original name within the loaded PE/COFF image.
Fix by performing the modification on the copy in the temporary
buffer, as originally intended.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Hybrid bzImage and UEFI binaries (such as wimboot) may place sections
at explicit offsets within the PE file, as described in commit b30a098
("[efi] Use load memory address as file offset for hybrid binaries").
This can leave a gap after the PE headers that is not covered by any
section. It is not entirely clear whether or not such gaps are
permitted in binaries submitted for Secure Boot signing.
To minimise potential problems, extend the PE header size to cover any
space before the first explicitly placed section.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
iPXE images are linked with a starting virtual address of zero. Other
images (such as wimboot) may use a non-zero starting virtual address.
There is no direct equivalent of the PE ImageBase address field within
ELF object files. Choose to use the highest possible address that
accommodates all sections and the PE header itself, since this will
minimise the memory allocated to hold the loaded image.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The BaseOfCode (and, in PE32, BaseOfData) fields imply an assumption
that binaries are laid out as code followed by initialised data
followed by uninitialised data. This assumption may not be valid for
complex binaries such as wimboot.
Remove this implicit assumption, and use arguably justifiable values
for the assorted summary start and size fields within the PE headers.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Hybrid bzImage and UEFI binaries (such as wimboot) may include 16-bit
sections such as .bss16 that do not need to consume an entry in the PE
section list. Treat any such sections as hidden.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The PE debug information generated by elf2efi is used only to hold the
image filename, and the debug information is located via the relevant
data directory entry rather than via the section table.
Make the .debug section a hidden section in order to save one entry in
the PE section list. Choose to place the debug information in the
unused space at the end of the PE headers, since it no longer needs to
satisfy the general section alignment constraints.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Commit 1e4c378 ("[efi] Shrink size of data directory in PE header")
reduced the number of entries used in the data directory and reduced
the recorded size of the NT "optional" header, but did not also adjust
the recorded overall size of the PE headers, resulting in unused space
between the PE headers and the first section.
Fix by reducing the initial recorded size of the PE headers by the
size of the omitted data directory entries.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Hybrid bzImage and UEFI binaries (such as wimboot) include a bzImage
header within a section starting at offset zero, with the PE header
effectively occupying unused space within this section.
Allow for this by treating a section placed at offset zero as hidden,
and by deferring the writing of the PE header until after the output
sections have been written.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Hybrid bzImage and UEFI binaries (such as wimboot) may be loaded as a
single contiguous blob without reference to the PE headers, and the
placement of sections within the PE file must therefore be known at
link time.
Use the load memory address (extracted from the ELF program headers)
to determine the physical placement of the section within the PE file
when generating a hybrid binary.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The images generated by elf2efi can be loaded anywhere in the address
space, and are not limited to the low 2GB.
Indicate this by setting the "large address aware" flag within the PE
header, for compatibility with EFI images generated by the EDK2 build
process. (The EDK2 PE loader does not ever check this flag, and it is
unlikely that any other EFI PE loader ever does so, but we may as well
report it accurately.)
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Indicate that the binary is compatible with W^X protections by setting
the NXCOMPAT bit in the DllCharacteristics field of the PE header.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Hybrid bzImage and UEFI binaries (such as wimboot) may include 16-bit
executable code that is opaque data from the perspective of a UEFI PE
binary, as described in wimboot commit fe456ca ("[efi] Use separate
.text and .data PE sections").
The ELF section will be marked as containing both executable code and
writable data. Choose to treat such a section as a data section
rather than a code section, since that matches the expected semantics
for ELF files that we expect to process.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some AWS instance types still do not support serial console output or
screenshots. For these instance types, the only viable way to extract
debugging information is to use the INT13 console (which is already
enabled via CONFIG=cloud for all AWS images).
Obtaining the INT13 console output can be very cumbersome, since there
is no direct way to read from an AWS volume. The simplest current
approach is to stop the instance under test, detach its root volume,
and reattach the volume to a Linux instance in the same region.
Add a utility script aws-int13con to retrieve the INT13 console output
by creating a temporary snapshot, reading the first block from the
snapshot, and extracting the INT13 console partition content.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
AMI names must be unique within a region. Add a --overwrite option
that allows an existing AMI of the same name to be deregistered (and
its underlying snapshot deleted).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
EAP exchanges may take a long time to reach a final status, especially
when relying upon MAC Authentication Bypass (MAB). Our current
behaviour of sending EAPoL-Start every few seconds until a final
status is obtained can prevent these exchanges from ever completing.
Fix by redefining the EAP supplicant state to allow EAPoL-Start to be
suppressed: either temporarily (while waiting for a full EAP exchange
to complete, in which case we need to eventually resend EAPoL-Start if
the final Success or Failure packet is lost), or permanently (while
waiting for the potentially very long MAC Authentication Bypass
timeout period).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The PCI cloud API (PCIAPI_CLOUD) currently selects the first PCI API
that successfully discovers a PCI device address range. The ECAM API
may discover an address range but subsequently be unable to map the
configuration space region, which would result in the selected PCI API
being unusable.
Fix by instead selecting the first PCI API that can be successfully
used to discover a PCI device.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some machines (observed with an AWS EC2 m7a.large instance) will place
the ECAM configuration space window above 4GB, thereby making it
unreachable from non-paged 32-bit code. This problem is currently
ignored by iPXE, since the address is silently truncated in the call
to ioremap(). (Note that other uses of ioremap() are not affected
since the PCI core will already have checked for unreachable 64-bit
BARs when retrieving the physical address to be mapped.)
Fix by adding an explicit check that the region to be mapped starts
within the reachable memory address space. (Assume that no machines
will be sufficiently peverse to provide a region that straddles the
4GB boundary.)
Signed-off-by: Michael Brown <mcb30@ipxe.org>
When an error occurs during ECAM configuration space mapping, preserve
the error within the existing cached mapping (instead of invalidating
the cached mapping) in order to avoid flooding the debug log with
repeated identical mapping errors.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The base address provided in the PCI ECAM allocation within the ACPI
MCFG table is the base address for the segment as a whole, not for the
starting bus within that allocation. On machines that provide ECAM
allocations with a non-zero starting bus number (observed with an AWS
EC2 m7a.large instance), this will result in iPXE accessing the wrong
memory addresses within the ECAM region.
Fix by adding the appropriate starting bus offset to the base address.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The PCIe specification requires that "processor and host bridge
implementations must ensure that a method exists for the software to
determine when the write using the ECAM is completed by the completer"
but does not specify any particular method to be used. Some platforms
might treat writes to the ECAM region as non-posted, others might
require reading back from a dedicated (and implementation-specific)
completion register to determine when the configuration space write
has completed.
Since PCI configuration space writes will never be used for any
performance-critical datapath operations (on any sane hardware), a
simple and platform-independent solution is to always read back from
the written register in order to guarantee that the write must have
completed. This is safe to do, since the PCIe specification defines a
limited set of configuration register types, none of which have read
side effects.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The ipair_tx() function uses a va_list twice (first to calculate the
formatted string length before allocation, then to construct the
string in the allocated buffer) but is missing the va_start() and
va_end() around the second usage. This is undefined behaviour that
happens to work on some build platforms.
Fix by adding the missing va_start() and va_end() around the second
usage of the variadic argument list.
Reported-by: Andreas Hammarskjöld <andreas@2PintSoftware.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
We currently use the number of timer ticks since power-on as a seed
for the non-cryptographic RNG implemented by random(). Since iPXE is
often executed directly after power-on, and since the timer tick
resolution is generally low, this can often result in identical seed
values being used on each cold boot attempt.
As of commit 41f786c ("[settings] Add "unixtime" builtin setting to
expose the current time"), the current wall-clock time is always
available within the default build of iPXE. Use this time instead, to
introduce variability between cold boot attempts on the same host.
(Note that variability between different hosts is obtained by using
the MAC address as an additional seed value.)
This has no effect on the separate DRBG used by cryptographic code.
Suggested-by: Heiko <heik0@xs4all.nl>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
We have no way to force a link-layer restart in iPXE, and therefore no
way to explicitly trigger a restart of EAP authentication. If an iPXE
script has performed some action that requires such a restart
(e.g. registering a device such that the port VLAN assignment will be
changed), then the only means currently available to effect the
restart is to reboot the whole system. If iPXE is taking over a
physical link already used by a preceding bootloader, then even a
reboot may not work.
In the EAP model, the supplicant is a pure responder and never
initiates transmissions. EAPoL extends this to include an EAPoL-Start
packet type that may be sent by the supplicant to (re)trigger EAP.
Add support for sending EAPoL-Start packets at two-second intervals on
links that are open and have reached physical link-up, but for which
EAP has not yet completed. This allows "ifclose ; ifopen" to be used
to restart the EAP process.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Extend the EAP model to include a record of whether or not EAP
authentication has completed (successfully or otherwise), and to
provide a method for transmitting EAP responses.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Simplify the FCoE code by using driver-private data to hold the FCoE
port for each network device, instead of using a separate allocation.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Simplify the per-netdevice GuestInfo settings code by using
driver-private data to hold the settings block, instead of using a
separate allocation.
The settings block (if existent) will be automatically unregistered
when the parent network device settings block is unregistered, and no
longer needs to be separately freed. The guestinfo_net_remove()
function may therefore be omitted completely.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Simplify the IPv6 link-local settings code by using driver-private
data to hold the settings block, instead of using a separate
allocation.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Simplify the LLDP code by using driver-private data to hold the LLDP
settings block, instead of using a separate allocation. This avoids
the need to maintain a list of LLDP settings blocks (since the LLDP
settings block pointer can always be obtained using netdev_priv()) and
obviates several failure paths.
Any recorded LLDP data is now freed when the network device is
unregistered, since there is no longer a dedicated reference counter
for the LLDP settings block. To minimise surprise, we also now
explicitly unregister the settings block. This is not strictly
necessary (since the block will be automatically unregistered when the
parent network device settings block is unregistered), but it
maintains symmetry between lldp_probe() and lldp_remove().
The overall reduction in the size of the LLDP code is around 15%.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allow network upper-layer drivers (such as LLDP, which attaches to
each network device in order to provide a corresponding LLDP settings
block) to specify a size for private data, which will be allocated as
part of the network device structure (as with the existing private
data allocated for the underlying device driver).
This will allow network upper-layer drivers to be simplified by
omitting memory allocation and freeing code. If the upper-layer
driver requires a reference counter (e.g. for interface
initialisation), then it may use the network device's existing
reference counter, since this is now the reference counter for the
containing block of memory.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some network device drivers use the trivial netdev_priv() helper
function while others use the netdev->priv pointer directly.
Standardise on direct use of netdev->priv, in order to free up the
function name netdev_priv() for reuse.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
We currently use "push $1f" within inline assembly to push the address
of the real-mode code fragment, relying on the assembler to treat this
as "pushl" for 32-bit code or "pushq" for 64-bit code.
As of binutils commit 5cc0077 ("x86: further adjust extend-to-32bit-
address conditions"), first included in binutils-2.41, this implicit
operand size is no longer calculated as expected and 64-bit builds
will fail with
Error: operand size mismatch for `push'
Fix by adding an explicit operand size to the "push" instruction.
Originally-fixed-by: Justin Cano <jstncno@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The current implementation of vpm_ioread32() erroneously reads only 16
bits of data, which fails when used with the (stricter) virtio device
emulation in VirtualBox.
Fix by using the correct readl()/inl() I/O wrappers.
Reworded-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Define the IPv4 NTP server setting to simplify the use of a
DHCP-provided NTP server in scripts, using e.g.
#!ipxe
dhcp
ntp ${ntp}
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Commit 3ef4f7e ("[console] Avoid overlap between special keys and
Unicode characters") renumbered the special key encoding to avoid
collisions with Unicode key values outside the ASCII range. This
change broke backwards compatibility with existing scripts that
specify key values using e.g. "prompt --key" or "menu --key".
Restore compatibility with existing scripts by tweaking the special
key encoding so that the relative key value (i.e. the delta from
KEY_MIN) is numerically equal to the old pre-Unicode key value, and by
modifying parse_key() to accept a relative key value.
Reported-by: Sven Dreyer <sven@dreyer-net.de>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Avoid the need to always specify a local MAC address on the command
line by setting a default hardware MAC address (using the same default
address as for slirp devices).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
A running link block timer holds a reference to the network device and
will prevent it from being freed until the timer expires. It is
impossible for free_netdev() to be called while the timer is still
running: the call to stop_timer() therein is therefore a no-op.
Stop the link block timer when the device is closed, to allow a
link-blocked device to be freed immediately upon unregistration of the
device. (Since link block state is updated in response to received
packets, the state is effectively undefined for a closed device: there
is therefore no reason to leave the timer running.)
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The interface debug message values constructed by INTF_DBG() et al
rely on the interface being embedded within a containing object. This
assumption is not valid for the temporary outbound-only interfaces
constructed on the stack by intf_shutdown() and xfer_vredirect().
Formalise the notion of a temporary outbound-only interface as having
a NULL interface descriptor, and overload the "original interface
descriptor" field to contain a pointer to the original interface that
the temporary interface is shadowing.
Originally-fixed-by: Vincent Fazio <vfazio@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The special key range (from KEY_MIN upwards) currently overlaps with
the valid range for Unicode characters, and therefore prohibits the
use of Unicode key values outside the ASCII range.
Create space for Unicode key values by moving the special keys to the
range immediately above the maximum valid Unicode character. This
allows the existing encoding of special keys as an efficiently packed
representation of the equivalent ANSI escape sequence to be maintained
almost as-is.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The keyboard remapping flags currently occupy bits 8 and upwards of
the to-be-mapped character value. This overlaps the range used for
special keys (KEY_MIN and upwards) and also overlaps the valid Unicode
character range.
No conflict is created by this overlap, since by design only ASCII
character values (as generated by an ASCII-only keyboard driver) are
subject to remapping, and so the to-be-remapped character values exist
in a conceptually separate namespace from either special keys or
non-ASCII Unicode characters. However, the overlap is potentially
confusing for readers of the code.
Minimise cognitive load by using bits 24 and upwards for the keyboard
remapping flags.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some versions of ld will complain that the automatically created (and
unused by our build process) ELF program headers include a "LOAD
segment with RWX permissions".
Silence this warning by adding "-z separate-code" to the linker
options, where supported.
For BIOS builds, where the prefix will generally require writable
access to its own (tiny) code segment, simply inhibit the warning
completely via "--no-warn-rwx-segments".
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Multiple target patterns in pattern rules are treated as grouped
targets regardless of the separator character. Newer verions of make
will generate "warning: pattern recipe did not update peer target" to
warn that the rule was expected to update all of the (implicitly)
grouped targets.
Fix by splitting all multiple target pattern rules into single target
pattern rules.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
There is no common standard for I/O-space access for non-x86 CPU
families, and non-MMIO peripherals are vanishingly rare.
Generalise the existing ARM definitions for dummy PIO to allow for
reuse by other CPU architectures.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The PCI I/O API (supporting accesses to PCI configuration space) is
not related to the general I/O API (supporting accesses to
memory-mapped I/O peripherals).
Remove the spurious inclusion of ipxe/io.h from the PCI I/O header.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
While not guaranteed by the UEFI specification, the enumeration of
handles, protocols, and openers will generally return results in order
of creation. Processing these objects in reverse order (as is already
done when calling DisconnectController() on the list of all handles)
will generally therefore perform the forcible uninstallation
operations in reverse order of object creation, which minimises the
number of implicit operations performed (e.g. when disconnecting a
controller that itself still has existent child controllers).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The UEFI specification states that the AgentHandle may be either the
driving binding protocol handle or the image handle.
Check for both handles when searching for stale handles to be forcibly
closed on behalf of a vetoed driver.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
In most cases, the driver handle will be the image handle itself.
However, this is not required by the UEFI specification, and some
images will install multiple driver binding handles.
Use the image handle (extracted from the driver binding protocol
instance) when attempting to unload the driver's image.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Pass the driver binding handle, the driver binding protocol instance,
the image handle, and the loaded image protocol instance to all veto
methods.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Simplify the process of adding new entries to the veto list by
including the manufacturer name within the standard debug output.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Polling for TX completions is arguably redundant when there are no
transmissions currently in progress. Commit c6c7e78 ("[efi] Poll for
TX completions only when there is an outstanding TX buffer") switched
to setting the PXE_OPFLAGS_GET_TRANSMITTED_BUFFERS flag only when
there is an in-progress transmission awaiting completion, in order to
reduce reported TX errors and debug message noise from buggy NII
implementations that report spurious TX completions whenever the
transmit queue is empty.
Some other NII implementations (observed with the Realtek driver in a
Dell Latitude 3440) seem to have a bug in the transmit datapath
handling which results in the transmit ring freezing after sending a
few hundred packets under heavy load. The symptoms are that the
TPPoll register's NPQ bit remains set and the 256-entry transmit ring
contains a large number of uncompleted descriptors (with the OWN bit
set), the first two of which have identical data buffer addresses.
Though iPXE will submit at most one in-progress transmission via NII,
the Dell/Realtek driver seems to make a page-aligned copy of each
transmit data buffer and to report TX completions immediately without
waiting for the packet to actually be transmitted. These synthetic TX
completions continue even after the hardware transmit ring freezes.
Setting PXE_OPFLAGS_GET_TRANSMITTED_BUFFERS on every poll reduces the
probability of this Dell/Realtek driver bug being triggered by a
factor of around 500, which brings the failure rate down to the point
that it can sensibly be managed by external logic such as the
"--timeout" option for image downloads. Closing and reopening the
interface (via "ifclose"/"ifopen") will clear the error condition and
allow transmissions to resume.
Revert to setting PXE_OPFLAGS_GET_TRANSMITTED_BUFFERS on every poll,
and silently ignore situations in which the hardware reports a
completion when no transmission is in progress. This approximately
matches the behaviour of the SnpDxe driver, which will also generally
set PXE_OPFLAGS_GET_TRANSMITTED_BUFFERS on every poll.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
EFI variables do not map neatly to the iPXE settings mechanism, since
the EFI variable identifier includes a namespace GUID that cannot
cleanly be supplied as part of a setting name. Creating a new EFI
variable requires the variable's attributes to be specified, which
does not fit within iPXE's settings concept.
However, EFI variable names are generally unique even without the
namespace GUID, and EFI does provide a mechanism to iterate over all
existent variables. We can therefore provide read-only access to EFI
variables by comparing only the names and ignoring the namespace
GUIDs.
Provide an "efi" settings block that implements this mechanism using a
syntax such as:
echo Platform language is ${efi/PlatformLang:string}
show efi/SecureBoot:int8
Settings are returned as raw binary values by default since an EFI
variable may contain boolean flags, integer values, ASCII strings,
UCS-2 strings, EFI device paths, X.509 certificates, or any other
arbitrary blob of data.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The EDK2 UefiPxeBcDxe driver includes some remarkably convoluted and
unsafe logic in its driver binding protocol Start() and Stop() methods
in order to support a pair of nominally independent driver binding
protocols (one for IPv4, one for IPv6) sharing a single dynamically
allocated data structure. This PXEBC_PRIVATE_DATA structure is
installed as a dummy protocol on the NIC handle in order to allow both
IPv4 and IPv6 driver binding protocols to locate it as needed.
The error handling code path in the UefiPxeBcDxe driver's Start()
method may attempt to uninstall the dummy protocol but fail to do so.
This failure is ignored and the containing memory is subsequently
freed anyway. On the next invocation of the driver binding protocol,
it will find and use this already freed block of memory. At some
point another memory allocation will occur, the PXEBC_PRIVATE_DATA
structure will be corrupted, and some undefined behaviour will occur.
The UEFI firmware used in VMware ESX 8 includes some proprietary
changes which attempt to install copies of the EFI_LOAD_FILE_PROTOCOL
and EFI_PXE_BASE_CODE_PROTOCOL instances from the IPv4 child handle
onto the NIC handle (along with a VMware-specific protocol with GUID
5190120d-453b-4d48-958d-f0bab3bc2161 and a NULL instance pointer).
This will inevitably fail with iPXE, since the NIC handle already
includes an EFI_LOAD_FILE_PROTOCOL instance.
These VMware proprietary changes end up triggering the unsafe error
handling code path described above. The typical symptom is that an
attempt to exit from iPXE back to the UEFI firmware will crash the VM
with a General Protection fault from within the UefiPxeBcDxe driver:
this happens when the UefiPxeBcDxe driver's Stop() method attempts to
call through a function pointer in the (freed) PXEBC_PRIVATE_DATA
structure, but the function pointer has by then been overwritten by
UCS-2 character data from an unrelated memory allocation.
Work around this failure by adding the VMware UefiPxeBcDxe driver to
the driver veto list.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The old IPv4-only IScsiDxe driver in MdeModulePkg/Universal/Network
was replaced by a dual-stack IScsiDxe driver in NetworkPkg.
Add the module GUID for this driver.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The EDK2 headers may be included even in builds for non-EFI platforms.
Commits such as 9de6c45 ("[arm] Use -fno-short-enums for all 32-bit
ARM builds") have so far ensured that the compile-time checks within
the EDK2 headers will pass even when building for a non-EFI platform.
As a more general solution, temporarily disable static assertions
while including UefiBaseType.h if building on a non-EFI platform.
This avoids the need to modify the ABI on other platforms.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The "shim" command will skip downloading the shim binary (and is
therefore a conditional no-op) if there is already a selected EFI
image that can be executed directly via LoadImage()/StartImage().
This allows the same iPXE script to be used with Secure Boot either
enabled or disabled.
Generalise this further to provide a dummy "shim" command that is an
unconditional no-op on non-EFI platforms. This then allows the same
iPXE script to be used for BIOS, EFI with Secure Boot disabled, or EFI
with Secure Boot enabled.
The same effect could be achieved by using "iseq ${platform} efi"
within the script, but this would complicate end-user documentation.
To minimise the code size impact, the dummy "shim" command is a pure
no-op that does not call parse_options() and so will ignore even
standardised arguments such as "--help".
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The UEFI shim implements a fairly nicely designed revocation mechanism
designed around the concept of security generations. Unfortunately
nobody in the shim community has thus far added the relevant metadata
to the Linux kernel, with the result that current versions of shim are
incapable of booting current versions of the Linux kernel.
Experience shows that there is unfortunately no point in trying to get
a fix for this upstreamed into shim. We therefore default to working
around this undesirable behaviour by patching data read from the
"SbatLevel" variable used to hold SBAT configuration.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allow a shim to be used to facilitate booting a kernel using a script
such as:
kernel /images/vmlinuz console=ttyS0,115200n8
initrd /images/initrd.img
shim /images/shimx64.efi
boot
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add support for using a shim as a helper to execute an EFI image.
When a shim has been specified via shim(), the shim image will be
passed to LoadImage() instead of the selected EFI image and the
command line will be prepended with the name of the selected EFI
image. The selected EFI image will be accessible to the shim via the
virtual filesystem as a hidden file.
Reduce the Secure Boot attack surface by removing, where possible, the
spurious requirement for a third party second stage loader binary such
as GRUB to be used solely in order to call the "shim lock protocol"
entry point.
Do not install the EFI PXE APIs when using a shim, since if shim finds
EFI_PXE_BASE_CODE_PROTOCOL on the loaded image's device handle then it
will attempt to download files afresh instead of using the files
already downloaded by iPXE and exposed via the EFI_SIMPLE_FILE_SYSTEM
protocol. (Experience shows that there is no point in trying to get a
fix for this upstreamed into shim.)
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The UEFI shim includes a "shim lock protocol" that can be used by a
third party second stage loader such as GRUB to verify a kernel image.
Add definitions for the relevant portions of this protocol interface.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Most image flags are independent values: any combination of flags may
be set for any image, and the flags for one image are independent of
the flags for any other image. The "selected" flag does not follow
this pattern: at most one image may be marked as selected at any time.
When invoking a kernel via the UEFI shim, there will be multiple
"special" images: the selected kernel itself, the shim image, and
potentially a shim-signed GRUB binary to be used as a crutch to assist
shim in loading the kernel (since current versions of the UEFI shim
are not capable of directly loading a Linux kernel).
Remove the "selected" image flag and replace it with a general concept
of an image tag with the same semantics: a given tag may be assigned
to at most one image, an image may be found by its tag only while the
image is currently registered, and a tag will survive unregistration
and reregistration of an image (if it has not already been assigned to
a new image). For visual consistency, also replace the current image
pointer with a current image tag.
The image pointer stored within the image tag holds only a weak
reference to the image, since the selection of an image should not
prevent that image from being freed. (The strong reference to the
currently executing image is held locally within the execution scope
of image_exec(), and is logically separate from the current image
pointer.)
Signed-off-by: Michael Brown <mcb30@ipxe.org>
An EFI image that is rejected by LoadImage() due to failing Secure
Boot verification is still an EFI image. Unfortunately, the extremely
broken UEFI Secure Boot model provides no way for us to unambiguously
determine that a valid EFI executable image was rejected only because
it failed signature verification. We must therefore use heuristics to
guess whether not an image that was rejected by LoadImage() could
still be loaded via a separate PE loader such as the UEFI shim.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The libc6-dbg:i386 package has spontaneously started failing to
install from the Azure package repositories used by the GitHub Actions
runners, with the somewhat recalcitrant error message:
libc6:i386: Depends: libgcc-s1:i386 but it is not going to be installed
Work around this unexplained issue by explicitly requesting
installation of the libgcc-s1:i386 package.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Versions 15.4 and earlier of the UEFI shim are incapable of correctly
parsing the command line in order to extract the second stage loader
filename, and will always attempt to load "grubx64.efi" or equivalent.
Versions 15.3 and later of the UEFI shim are currently incapable of
loading a Linux kernel directly anyway, since the kernel does not
include SBAT metadata. These versions will require a genuine
shim-signed GRUB binary to be used as a crutch to assist shim in
loading a Linux kernel.
This leaves versions 15.2 and earlier of the UEFI shim (as currently
used in e.g. RHEL7) as being capable of directly loading a Linux
kernel, but incorrectly attempting to load it using the filename
"grubx64.efi" or equivalent. To support the bugs in these older
versions of the UEFI shim, allow the currently selected image to be
opened via any filename of the form "grub*.efi".
Signed-off-by: Michael Brown <mcb30@ipxe.org>
When invoking a kernel via the UEFI shim, the kernel image must be
accessible via EFI_SIMPLE_FILE_SYSTEM_PROTOCOL but must not be present
in the magic initrd constructed from all registered images.
Re-register a currently executing EFI image and mark it as hidden,
thereby allowing it to be accessed via the virtual filesystem exposed
via EFI_SIMPLE_FILE_SYSTEM_PROTOCOL without appearing in the magic
initrd contents.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
When invoking a kernel via the UEFI shim, the kernel (and potentially
also a helper binary such as GRUB) must be accessible via the virtual
filesystem exposed via EFI_SIMPLE_FILE_SYSTEM_PROTOCOL but must not be
present in the magic initrd constructed from all registered images.
Allow for images to be flagged as hidden, which will cause them to be
excluded from API-level lists of all images such as the virtual
filesystem directory contents, the magic initrd, or the Multiboot
module list. Hidden images remain visible to iPXE commands including
"imgstat", which will show a "[HIDDEN]" flag for such images.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Show the original filename as used by the consumer when calling our
EFI_SIMPLE_FILE_SYSTEM_PROTOCOL's Open() method.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Try searching for a matching registered image before checking for
fixed filenames (such as "initrd.magic" for the dynamically generated
magic initrd file). This minimises surprise by ensuring that an
explicitly downloaded image will always be used verbatim.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Hybrid bzImage and UEFI binaries (such as wimboot) include a bzImage
header within a section starting at offset zero, with the PE header
effectively occupying unused space within this section. This section
should not appear as a named section in the resulting PE file.
Allow for the existence of hidden sections that do not result in a
section header being written to the PE file.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Hybrid 32-bit BIOS and 64-bit UEFI binaries (such as wimboot) may
include R_X86_64_32 relocation records for the 32-bit BIOS portions.
These should be ignored when generating PE relocation records, since
they apply only to code that cannot be executed within the context of
the 64-bit UEFI binary, and creating a 4-byte relocation record is
invalid in a binary that may be relocated anywhere within the 64-bit
address space (see commit 907cffb "[efi] Disallow R_X86_64_32
relocations").
Add a "--hybrid" option to elf2efi, which will cause R_X86_64_32
relocation records to be silently discarded.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Hybrid bzImage and UEFI binaries (such as wimboot) require the PE
header to be kept as small as possible, since the bzImage header
starts at a fixed offset 0x1f1.
The EFI_IMAGE_OPTIONAL_HEADER structures in PeImage.h define an
optional header containing 16 data directory entries, of which the
last eight are unused in binaries that we create. Shrink the data
directory to contain only the first eight entries, to minimise the
overall size of the PE header.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Hybrid bzImage and UEFI binaries (such as wimboot) require the PE
header to be kept as small as possible, since the bzImage header
starts at a fixed offset 0x1f1.
The PE header currently includes 128 bytes of zero padding between the
DOS and NT header portions. This padding has been present since
commit 81d92c6 ("[efi] Add EFI image format and basic runtime
environment") first added support for EFI images in iPXE, and was
included on the basis of matching the observed behaviour of the
Microsoft toolchain. There appears to be no requirement for this
padding to exist: EDK2 binaries built with gcc include only 64 bytes
of zero padding, Linux kernel binaries include 66 bytes of non-zero
padding, and wimboot binaries include no padding at all.
Remove the unnecessary padding between the DOS and NT header portions
to minimise the overall size of the PE header.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Prepare for the possibility that a record handler may choose not to
consume the entire record by passing the I/O buffer and requiring the
handler to mark consumed data using iob_pull().
Signed-off-by: Michael Brown <mcb30@ipxe.org>
As documented in commits 6a004be ("[efi] Support the initrd
autodetection mechanism in newer Linux kernels") and 04e60a2 ("[efi]
Omit EFI_LOAD_FILE2_PROTOCOL for a zero-length initrd"), the choice in
Linux of using a fixed device path requires bootloaders to allow for
the fact that a previous bootloader may have already installed a
handle with the fixed device path.
We currently deal with this situation by reusing the existing handle,
replacing the EFI_LOAD_FILE2_PROTOCOL instance with our own. Simplify
the code by instead uninstalling the EFI_DEVICE_PATH_PROTOCOL instance
from the existing handle (if present), thereby allowing the creation
of a new handle to succeed.
Create the new handle only if we have a non-empty initrd to provide.
This works around bugs in bootloaders such as the systemd EFI stub
that fail to allow for the existence of multiple-bootloader chains.
(The workaround is not comprehensive: if the user has downloaded other
images in iPXE before invoking the systemd Unified Kernel Image (UKI),
then the systemd EFI stub will still crash and burn since it fails to
allow for the fact that a previous bootloader has already installed a
handle with the fixed device path.)
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The Intel I210's packet buffer size registers reset only on power up,
not when a reset signal is asserted. This can lead to the inability
to pass traffic in the event that the DMA TX Maximum Packet Size
(which does reset to its default value on reset) is bigger than the TX
Packet Buffer Size.
For example, an operating system may be using the time sensitive
networking features of the I210 and the registers may be programmed
correctly, but then a reset signal is asserted and iPXE on the next
boot will be unable to use the I210.
Mimic what Linux does and forcibly set the registers to their default
values.
Signed-off-by: Matt Parrella <parrella.matthew@gmail.com>
When a DHCP transaction does not result in the registration of a new
"proxydhcp" or "pxebs" settings block, any existing settings blocks
are currently left unaltered.
This can cause surprising behaviour. For example: when chainloading
iPXE, the "proxydhcp" and "pxebs" settings blocks may be prepopulated
using cached values from the previous PXE bootloader. If iPXE
performs a subsequent DHCP request, then the DHCP or ProxyDHCP servers
may choose to respond differently to iPXE. The response may choose to
omit the ProxyDHCP or PXEBS stages, in which case no new "proxydhcp"
or "pxebs" settings blocks may be registered. This will result in
iPXE using a combination of both old and new DHCP responses.
Fix by assuming that a successful DHCPACK effectively acquires
ownership of the "proxydhcp" and "pxebs" settings blocks, and that any
existing settings blocks should therefore be unregistered.
Reported-by: Henry Tung <htung@palantir.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
We unregister script images during their execution, to prevent a
"boot" command from re-executing the containing script. This also has
the side effect of preventing executing scripts from showing up within
the Linux magic initrd image (or the Multiboot module list).
Additional logic in bzimage.c and efi_file.c prevents a currently
executing kernel from showing up within the magic initrd image.
Similar logic in multiboot.c prevents the Multiboot kernel from
showing up as a Multiboot module.
This still leaves some corner cases that are not covered correctly.
For example: when using a gzip-compressed kernel image, nothing will
currently hide the original compressed image from the magic initrd.
Fix by moving the logic that temporarily unregisters the current image
from script_exec() to image_exec(), so that it applies to all image
types, and simplify the magic initrd and Multiboot module list
construction logic on the basis that no further filtering of the
registered image list is necessary.
This change has the side effect of hiding currently executing EFI
images from the virtual filesystem exposed by iPXE. For example, when
using iPXE to boot wimboot, the wimboot binary itself will no longer
be visible within the virtual filesystem.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Extend the request parameter mechanism to allow for arbitrary HTTP
headers to be specified via e.g.:
params
param --header Referer http://www.example.com
imgfetch http://192.168.0.1/script.ipxe##params
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Prepare for the parameter mechanism to be generalised to specifying
request parameters that are passed via mechanisms other than an
application/x-www-form-urlencoded form.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
An attempt to use an existent but empty form parameter list will
currently result in an invalid POST request since the Content-Length
header will be missing.
Fix by using GET instead of POST if the form parameter list is empty.
This is a non-breaking change (since the current behaviour produces an
invalid request), and simplifies the imminent generalisation of the
parameter list concept to handle both header and form parameters.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
When the Linux kernel is being used with no initrd, iPXE will still
provide a zero-length initrd.magic file within the virtual filesystem.
As of commit 6a004be ("[efi] Support the initrd autodetection
mechanism in newer Linux kernels"), this zero-length file will also be
exposed via an EFI_LOAD_FILE2_PROTOCOL instance on a handle with a
fixed device path.
The correct handling of zero-length files via EFI_LOAD_FILE2_PROTOCOL
is unfortunately not well defined.
Linux expects the first call to LoadFile() to always fail with
EFI_BUFFER_TOO_SMALL. When the initrd is genuinely zero-length, iPXE
will return success since the buffer is not too small to hold the
(zero-length) file. This causes Linux to immediately report a
spurious EFI_LOAD_ERROR boot failure.
We could change the logic in iPXE's efi_file_load() to always return
EFI_BUFFER_TOO_SMALL if Buffer is NULL on entry. Since the correct
behaviour of LoadFile() in the corner case of a zero-length file is
left undefined by the UEFI specification, this would be permissible.
Unfortunately this approach would not fix the problem. If we return
EFI_BUFFER_TOO_SMALL and set the file length to zero, then Linux will
call the boot services AllocatePages() method with a zero length. In
at least the EDK2 implementation, this combination of parameters will
cause AllocatePages() to return EFI_OUT_OF_RESOURCES, and Linux will
again report a boot failure.
Another approach would be to install the initrd device path handle
only if we have a non-empty initrd to offer. Unfortunately this would
lead to a failure in yet another corner case: if a previous bootloader
has installed an initrd device path handle (e.g. to pass a boot script
to iPXE) then we must not leave that initrd in place, since then our
loaded kernel would end up seeing the wrong initrd content.
The cleanest fix seems to be to ensure that the initrd device path
handle is installed with the EFI_DEVICE_PATH_PROTOCOL instance present
but with the EFI_LOAD_FILE2_PROTOCOL instance absent (and forcibly
uninstalled if necessary), matching the state in which we leave the
handle after uninstalling our virtual filesystem. Linux will then not
find any handle that supports EFI_LOAD_FILE2_PROTOCOL within the fixed
device path, and so will fall through to trying other mechanisms to
locate the initrd.
Reported-by: Chris Bradshaw <cwbshaw@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Commit 7ca801d ("[efi] Use the EFI_RNG_PROTOCOL as an entropy source
if available") added EFI_RNG_PROTOCOL as an alternative entropy source
via an ad-hoc mechanism specific to efi_entropy.c.
Split out EFI_RNG_PROTOCOL to a separate entropy source, and allow the
entropy core to handle the selection of RDRAND, EFI_RNG_PROTOCOL, or
timer ticks as the active source.
The fault detection logic added in commit a87537d ("[efi] Detect and
disable seriously broken EFI_RNG_PROTOCOL implementations") may be
removed completely, since the failure will already be detected by the
generic ANS X9.82-mandated repetition count test and will now be
handled gracefully by the entropy core.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Provide per-source state variables for the repetition count test and
adaptive proportion test, to allow for the situation in which an
entropy source can be enabled but then fails during the startup tests,
thereby requiring an alternative entropy source to be used.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
As noted in commit 3c83843 ("[rng] Check for several functioning RTC
interrupts"), experimentation shows that Hyper-V cannot be trusted to
reliably generate RTC interrupts. (As noted in commit f3ba0fb
("[hyperv] Provide timer based on the 10MHz time reference count
MSR"), Hyper-V appears to suffer from a general problem in reliably
generating any legacy interrupts.) An alternative entropy source is
therefore required for an image that may be used in a Hyper-V Gen1
virtual machine.
The x86 RDRAND instruction provides a suitable alternative entropy
source, but may not be supported by all CPUs. We must therefore allow
for multiple entropy sources to be compiled in, with the single active
entropy source selected only at runtime.
Restructure the internal entropy API to allow a working entropy source
to be detected and chosen at runtime.
Enable the RDRAND entropy source for all x86 builds, since it is
likely to be substantially faster than any other source.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
We currently specify only the iSCSI default value for MaxBurstLength
and ignore any negotiated value, since our internal block device API
allows only for receiving directly into caller-allocated buffers and
so we have no intrinsic limit on burst length.
A conscientious target may however refuse to attempt a transfer that
we request for a number of blocks that would exceed the negotiated
maximum burst length.
Fix by recording the negotiated maximum burst length and using it to
limit the maximum number of blocks per transfer as reported by the
SCSI layer.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Linux 5.7 added the ability to autodetect an initrd by searching for a
handle via a fixed vendor-specific "Linux initrd device path" and then
locating and using the EFI_LOAD_FILE2_PROTOCOL instance on that
handle.
This maps quite naturally onto our existing concept of a "magic
initrd" as introduced for EFI in commit e5f0255 ("[efi] Provide an
"initrd.magic" file for use by UEFI kernels").
Add an EFI_LOAD_FILE2_PROTOCOL instance to our EFI virtual files
(backed by simply calling the existing EFI_SIMPLE_FILE_SYSTEM_PROTOCOL
method to read from the file), and install the protocol instance for
the "initrd.magic" virtual file onto a new device handle that also
provides the Linux initrd device path.
The design choice in Linux of using a single fixed device path makes
this unfortunately messy to support, since device paths must be unique
within a system. When multiple bootloaders are used (e.g. GRUB
loading iPXE loading Linux) then only one bootloader can ever install
the device path onto a handle. Subsequent bootloaders must locate the
existing handle and replace the load file protocol instance with their
own.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Show the requested range when a caller reads from a virtual file via
the EFI_SIMPLE_FILE_SYSTEM_PROTOCOL interface.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The Linux kernel bzImage image format and the CPIO archive constructor
will parse the image command line for certain arguments of the form
"key=value". This parsing is currently implemented using strstr() in
a way that can cause a false positive suffix match. For example, a
command line containing "highmem=<n>" would erroneously be treated as
containing a value for "mem=<n>".
Fix by centralising the logic used for parsing such arguments, and
including a check that the argument immediately follows a whitespace
delimiter (or is at the start of the string).
Reported-by: Filippo Giunchedi <filippo@esaurito.net>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Commit 74222cd ("[rng] Check for functioning RTC interrupt") added a
check that the RTC is capable of generating interrupts via the legacy
PIC, since this mechanism appears to be broken in some Hyper-V virtual
machines.
Experimentation shows that the RTC is sometimes capable of generating
a single interrupt, but will then generate no subsequent interrupts.
This currently causes rtc_entropy_check() to falsely detect that the
entropy gathering mechanism is functional.
Fix by checking for several RTC interrupts before declaring that it is
a functional entropy source.
Reported-by: Andreas Hammarskjöld <junior@2PintSoftware.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
EISA expansion slot I/O port addresses overlap space that may be
assigned to PCI devices, which can lead to register reads and writes
with unwanted side effects during EISA probing.
Reduce the chances of performing EISA probing on PCI devices by
probing EISA slot vendor and product ID registers only if the EISA
system board vendor ID register indicates that the motherboard
supports EISA.
Debugged-by: Václav Ovsík <vaclav.ovsik@gmail.com>
Tested-by: Václav Ovsík <vaclav.ovsik@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add support for building a LoongArch64 Linux userspace binary.
Signed-off-by: Xiaotian Wu <wuxiaotian@loongson.cn>
Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The test suites for the various architectures are often run back to
back, and there is currently nothing to visually distinguish one test
run from another.
Include the architecture name within the self-test startup banner, to
aid in visual identification of test results.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Speed up the "Install packages" step for each CI run by caching the
downloaded packages in /var/cache/apt.
Do not include libc6-dbg:i386 within the cache, since apt seems to
complain if asked to download both gcc-aarch64-linux-gnu and
libc6-dbg:i386 at the same time.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The PAGE_SHIFT definition is an architectural property, rather than an
aspect of a particular I/O API implementation (of which, in theory,
there may be more than one per architecture).
Reflect this by moving the definition to the top-level bits/io.h for
each architecture.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Over the years, the undocumented operand modifier used to produce the
unprefixed constant values in __einfo_error() has varied from "%c0" to
"%a0" in commit 1a77466 ("[build] Fix use of inline assembly on GCC
4.8 ARM64 builds") and back to "%c0" in commit 3fb3ffc ("[build] Fix
use of inline assembly on GCC 8 ARM64 builds"), according to the
evolving demands of the toolchain.
LoongArch64 suffers from a similar issue: GCC 13 will allow either,
but the currently released GCC 12 allows only the "%a0" form.
Introduce a macro ASM_NO_PREFIX, defined in bits/compiler.h, to
abstract away this difference and allow different architectures to use
different operand modifiers.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The Xen headers support only x86 and ARM. Allow for platforms such as
LoongArch64 to build despite the absence of Xen support by providing
an architecture-specific <bits/xen.h> that simply does:
#ifndef _BITS_XEN_H
#define _BITS_XEN_H
#include <ipxe/nonxen.h>
#endif /* _BITS_XEN_H */
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add support for recording LLDP packets and exposing TLV values via the
settings mechanism. LLDP settings are encoded as
${netX.lldp/<prefix>.<type>.<index>.<offset>.<length>}
where
<type> is the TLV type
<offset> is the starting offset within the TLV value
<length> is the length (or zero to read the from <offset> to the end)
<prefix>, if it has a non-zero value, is the subtype byte string of
length <offset> to match at the start of the TLV value, up to a
maximum matched length of 4 bytes
<index> is the index of the entry matching <type> and <prefix> to be
accessed, with zero indicating the first matching entry
The <prefix> is designed to accommodate both matching of the OUI
within an organization-specific TLV (e.g. 0x0080c2 for IEEE 802.1
TLVs) and of a subtype byte as found within many TLVs.
This encoding allows most LLDP values to be extracted easily. For
example
System name: ${netX.lldp/5.0.0.0:string}
System description: ${netX.lldp/6.0.0.0:string}
Port description: ${netX.lldp/4.0.0.0:string}
Port interface name: ${netX.lldp/5.2.0.1.0:string}
Chassis MAC address: ${netX.lldp/4.1.0.1.0:hex}
Management IPv4 address: ${netX.lldp/5.1.8.0.2.4:ipv4}
Port VLAN ID: ${netX.lldp/0x0080c2.1.127.0.4.2:int16}
Port VLAN name: ${netX.lldp/0x0080c2.3.127.0.7.0:string}
Maximum frame size: ${netX.lldp/0x00120f.4.127.0.4.2:uint16}
Originally-implemented-by: Marin Hannache <git@mareo.fr>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
RFC 2131 leaves undefined the behaviour of the client in response to a
DHCPNAK that comes from a server other than the selected DHCP server.
A substantial amount of online documentation suggests using multiple
independent DHCP servers with non-overlapping ranges in the same
subnet in order to provide some minimal redundancy. Experimentation
shows that in this setup, at least ISC dhcpd will send a DHCPNAK in
response to the client's DHCPREQUEST for an address that is not within
the range defined on that server. (Since the requested address does
lie within the subnet defined on that server, this will happen
regardless of the "authoritative" parameter.) The client will
therefore receive a DHCPACK from the selected DHCP server along with
one or more DHCPNAKs from each of the non-selected DHCP servers.
Filter out responses from non-selected DHCP servers before checking
for a DHCPNAK, so that these arguably spurious DHCPNAKs will not cause
iPXE to return to the discovery state.
Continue to check for DHCPNAK before filtering out responses for
non-selected lease addresses, since experimentation shows that the
DHCPNAK will usually have an empty yiaddr field.
Reported-by: Anders Blomdell <anders.blomdell@control.lth.se>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The "bridge" driver introduced in 3aa6b79 ("[pci] Add minimal PCI
bridge driver") is required only for BIOS builds using the ENA driver,
where experimentation shows that we cannot rely on the BIOS to fully
assign MMIO addresses.
Since the driver is a valid PCI driver, it will end up binding to all
PCI bridge devices even on a UEFI platform, where the firmware is
likely to have completed MMIO address assignment correctly. This has
no impact on most systems since there is generally no UEFI driver for
PCI bridges: the enumeration of the whole PCI bus is handled by the
PciBusDxe driver bound to the root bridge.
Experimentation shows that at least one laptop will freeze at the
point that iPXE attempts to bind to the bridge device. No deeper
investigation has been carried out to find the root cause.
Fix by causing efipci_supported() to return an error unless the
configuration space header type indicates a non-bridge device.
Reported-by: Marcel Petersen <mp@sbe.de>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Try loading the autoexec.ipxe script first from the directory
containing the iPXE binary (based on the relative file path provided
to us via EFI_LOADED_IMAGE_PROTOCOL), then fall back to trying the
root directory.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some cards seem to have the receive VLAN tag stripping feature enabled
by default, which causes received VLAN packets to be misinterpreted as
being received by the trunk device.
Fix by disabling VLAN tag stripping in the C+ Command Register.
Debugged-by: Xinming Lai <yiyihu@gmail.com>
Tested-by: Xinming Lai <yiyihu@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Update to pick up the upstream commit bda715b ("MdePkg: Fix UINT64 and
INT64 word length for LoongArch64").
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The self-test suite does not currently ever attempt to sleep the CPU.
This is an operation that may fail (e.g. by attempting to execute a
privileged instruction while running as a Linux userspace binary, or
by halting the CPU with all interrupts disabled).
Add a trivial self-test to exercise the ability to sleep the CPU
without crashing or halting forever.
Inspired-by: Xiaotian Wu <wuxiaotian@loongson.cn>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Treat a command line passed to iPXE via UEFI LoadOptions as an image
to be registered at startup, as is already done for the .lkrn, .pxe,
and .exe BIOS images.
Originally-implemented-by: Ladi Prosek <lprosek@redhat.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The obsolete ConsoleControl.h header is no longer present in the
current EDK2 codebase, but is still required for interoperability with
old iMacs.
Add an iPXE include guard to this file so that the EDK2 header import
script will no longer attempt to import it from the EDK2 tree.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The IntelFrameworkPkg and EdkCompatibilityPkg directories have been
removed from the EDK2 codebase. Remove these directories from the
EDK2 header import script.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
As with util/elf2efi32 and util/elf2efi64 in commit a99e435 ("[efi] Do
not rely on ProcessorBind.h when building host binaries"), build
util/efirom without using any architecture-specific EDK2 headers since
the build host's CPU architecture may not be supported by EDK2.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The current maximum window size of 256kB was calculated based on rough
link bandwidth and RTT measurements taken in 2012, and is too small to
avoid filling the TCP window on some modern links.
Update the list of typical link bandwidth and RTT figures to reflect
the modern world, and increase the maximum window size accordingly.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The PXE UDP receive queue may grow without limit if the PXE NBP does
not call PXENV_UDP_READ sufficiently frequently.
Fix by implementing a cache discarder for received PXE UDP packets
(similar to the TCP cache discarder).
Reported-by: Tal Shorer <shorer@amazon.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Many consoles will scroll immediately upon drawing a character in the
rightmost column of the bottom row of the display, in order to be able
to advance the cursor to the next character (even if the cursor is
disabled).
This causes PXE menus to display incorrectly. Specifically, pressing
the down arrow key while already on the last menu item may cause the
whole screen to scroll and the line to be duplicated.
Fix by moving the PXE menu one row up from the bottom of the screen.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
UEFI has the mildly annoying habit of installing copies of the
EFI_SIMPLE_NETWORK_PROTOCOL instance on the IPv4 and IPv6 child device
handles. This can cause iPXE's SNP driver to attempt to bind to a
copy of the EFI_SIMPLE_NETWORK_PROTOCOL that iPXE itself provided on a
different handle.
Fix by refusing to bind to an SNP (or NII) handle if there exists
another instance of the same protocol further up the device path (on
the basis that we always want to bind to the highest possible device).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Extend the functionality of efi_locate_device() to allow callers to
find instances of the protocol that may exist further up the device
path.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some versions of the 32-bit ARM linker seem to treat the absence of a
.note.GNU-stack section as implying an executable stack, and will
print a warning that this is deprecated behaviour.
Silence the warning by adding a .note.GNU-stack section to each
assembly file and retaining the sections in the Linux linker script.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The EFI ABI requires the use of -mfloat-abi=soft, but other platforms
may require -mfloat-abi=hard.
Allow for this by using -mfloat-abi=soft only for EFI builds.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The EFI ABI requires the use of -fno-short-enums, and the EDK2 headers
will perform a compile-time check that enums are 32 bits.
The EDK2 headers may be included even in builds for non-EFI platforms,
and so the -fno-short-enums flag must be used in all 32-bit ARM
builds. Fortunately, nothing else currently cares about enum sizes.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add support for building as a Linux userspace binary for AArch64.
This allows the self-test suite to be more easily run for the 64-bit
ARM code. For example:
# On a native AArch64 system:
#
make bin-arm64-efi/tests.linux && ./bin-arm64-efi/tests.linux
# On a non-AArch64 system (e.g. x86_64) via cross-compilation,
# assuming that kernel and glibc headers are present within
# /usr/aarch64-linux-gnu/sys-root/:
#
make bin-arm64-linux/tests.linux CROSS=aarch64-linux-gnu- && \
qemu-aarch64 -L /usr/aarch64-linux-gnu/sys-root/ \
./bin-arm64-linux/tests.linux
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Move the platform-specific DHCP client architecture definitions to
header files of the form <ipxe/$(PLATFORM)/dhcparch.h>. This
simplifies the directory structure and allows the otherwise unused
arch/$(ARCH)/include/$(PLATFORM) to be removed from the include
directory search path, which avoids the confusing situation in which a
header file may potentially be accessed through more than one path.
For Linux userspace binaries on any architecture, use the EFI values
for that architecture by delegating to the EFI header file. This
avoids the need to explicitly select values for Linux userspace
binaries for each architecture.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The requirement to undo the implicit "-Dlinux" is not specific to the
x86 architecture. Move this out of the x86-specific Makefile.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Reduce duplication between i386 and x86_64 by providing a single
shared linker script that both architectures can include.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
We cannot rely on the EDK2 ProcessorBind.h headers when compiling a
binary for execution on the build host itself (e.g. elf2efi), since
the host's CPU architecture may not even be supported by EDK2.
Fix by skipping ProcessorBind.h when building a host binary, and
defining the bare minimum required to allow other EDK2 headers to
compile cleanly.
Reported-by: Michal Suchánek <msuchanek@suse.de>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
We currently don't allocate an Asynchronous Event Notification Queue
(AENQ) because we don't actually care about any of the events that may
come in.
The ENA firmware found on Graviton instances requires the AENQ to
exist, otherwise all admin queue commands will fail.
Fix by allocating an AENQ and disabling all events (so that we do not
need to include code to acknowledge any events that may arrive).
Signed-off-by: Alexander Graf <graf@amazon.com>
Ensure that the "${netX/...}" settings mechanism always uses the same
interpretation of the network device corresponding to "netX" as any
other mechanism that performs a name-based lookup of a network device.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
When chainloading iPXE from an EFI VLAN device, configure the
corresponding iPXE VLAN device to be created automatically.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add the ability to automatically create a VLAN device for a specified
trunk device link-layer address and VLAN tag.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
When chainloading iPXE from a VLAN device, the MAC address of the
loaded image's device handle will match the MAC address of the trunk
device created by iPXE, and the autoboot process will then erroneously
consider the trunk device to be an autoboot device.
Fix by recording the VLAN tag along with the MAC address, and treating
the VLAN tag as part of the filter used to match the MAC address
against candidate network devices.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Many laptops now include the ability to specify a "system-specific MAC
address" (also known as "pass-through MAC"), which is supposed to be
used for both the onboard NIC and for any attached docking station or
other USB NIC. This is intended to simplify interoperability with
software or hardware that relies on a MAC address to recognise an
individual machine: for example, a deployment server may associate the
MAC address with a particular operating system image to be deployed.
This therefore creates legitimate situations in which duplicate MAC
addresses may exist within the same system.
As described in commit 98d09a1 ("[netdevice] Avoid registering
duplicate network devices"), the Xen netfront driver relies on the
rejection of duplicate MAC addresses in order to inhibit registration
of the emulated PCI devices that a Xen PV-HVM guest will create to
shadow each of the paravirtual network devices.
Move the code that rejects duplicate MAC addresses from the network
device core to the Xen netfront driver, to allow for the existence of
duplicate MAC addresses in non-Xen setups.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The network device index currently serves two purposes: acting as a
sequential index for network device names ("net0", "net1", etc), and
acting as an opaque unique integer identifier used in socket address
scope IDs.
There is no particular need for these usages to be linked, and it can
lead to situations in which devices are named unexpectedly. For
example: if a system has two network devices "net0" and "net1", a VLAN
is created as "net1-42", and then a USB NIC is connected, then the USB
NIC will be named "net3" rather than the expected "net2" since the
VLAN device "net1-42" will have consumed an index.
Separate the usages: rename the "index" field to "scope_id" (matching
its one and only use case), and assign the name without reference to
the scope ID by finding the first unused name. For consistency,
assign the scope ID by similarly finding the first unused scope ID.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some UNDI drivers (such as the AMI UsbNetworkPkg currently in the
process of being upstreamed into EDK2) have a bug that will prevent
any packets from being received unless at least one attempt has been
made to disable some receive filters.
Work around these buggy drivers by attempting to disable receive
filters before enabling them. Ignore any errors, since we genuinely
do not care whether or not the disabling succeeds.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
We currently free an unclaimed cached DHCPACK immediately after
startup, in order to free up memory. This prevents the cached DHCPACK
from being applied to a device that is created after startup, such as
a VLAN device created via the "vcreate" command.
Retain any unclaimed DHCPACK after startup to allow it to be matched
against (and applied to) any device that gets created at runtime.
Free the DHCPACK during shutdown if it still remains unclaimed, in
order to exit with memory cleanly freed.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
When chainloading iPXE from a VLAN device, the MAC address within the
cached DHCPACK will match the MAC address of the trunk device created
by iPXE, and the cached DHCPACK will then end up being erroneously
applied to the trunk device. This tends to break outbound IPv4
routing, since both the trunk and VLAN devices will have the same
assigned IPv4 address.
Fix by recording the VLAN tag along with the cached DHCPACK, and
treating the VLAN tag as part of the filter used to match the cached
DHCPACK against candidate network devices.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
EFI provides no API for determining the VLAN tag (if any) for a
specified device handle. There is the EFI_VLAN_CONFIG_PROTOCOL, but
that exists only on the trunk device handle (not on the VLAN device
handle), and provides no way to match VLAN tags against the trunk
device's child device handles.
The EDK2 codebase seems to rely solely on the device path to determine
the VLAN tag for a specified device handle: both NetLibGetVlanId() and
BmGetNetworkDescription() will parse the device path to search for a
VLAN_DEVICE_PATH component.
Add efi_path_vlan() which uses the same device path parsing logic to
determine the VLAN tag.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Provide a single central implementation of the logic for stepping
through elements of an EFI device path.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
UEFI implements VLAN support within the Managed Network Protocol (MNP)
driver, which may create child VLAN devices automatically based on
stored UEFI variables. These child devices do not themselves provide
a raw-packet interface via EFI_SIMPLE_NETWORK_PROTOCOL, and may be
consumed only via the EFI_MANAGED_NETWORK_PROTOCOL interface.
The device paths constructed for these child devices may conflict with
those for the EFI_SIMPLE_NETWORK_PROTOCOL instances that iPXE attempts
to install for its own VLAN devices. The upshot is that creating an
iPXE VLAN device (e.g. via the "vcreate" command) will fail if the
UEFI Managed Network Protocol has already created a device for the
same VLAN tag.
Fix by providing our own EFI_VLAN_CONFIG_PROTOCOL instance on the same
device handle as EFI_SIMPLE_NETWORK_PROTOCOL. This causes the MNP
driver to treat iPXE's device as supporting hardware VLAN offload, and
it will therefore not attempt to install its own instance of the
protocol.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The dangling pointer warning introduced in GCC 12 reports false
positives that result in build failures. In particular, storing the
address of a local code label used to record the current state of a
state machine (as done in crypto/deflate.c) is reported as an error.
There seems to be no way to mark the pointer type as being permitted
to hold such a value, so unconditionally disable the warning.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The array bounds checker on GCC 12 and newer reports a very large
number of false positives that result in build failures. In
particular, accesses through pointers to zero-length arrays (such as
those used by the linker table mechanism in include/ipxe/tables.h) are
reported as errors, contrary to the GCC documentation.
Work around this GCC issue by unconditionally disabling the warning.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The behaviour of PCI devices across a function-level reset seems to be
inconsistent in practice: some devices will preserve PCI BARs, some
will not.
Fix the behaviour of FLR on devices that do not preserve PCI BARs by
backing up and restoring PCI configuration space across the reset.
Preserve only the standard portion of the configuration space, since
there may be registers with unexpected side effects in the remaining
non-standardised space.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
TLS relies upon the ability of ciphers to perform in-place decryption,
in order to avoid allocating additional I/O buffers for received data.
Add verification of in-place encryption and decryption to the cipher
self-tests.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The hash calculation is currently performed incorrectly when
decrypting in place, since the ciphertext will have been overwritten
with the plaintext before being used to update the hash value.
Restructure the code to allow for in-place encryption and decryption.
Choose to optimise for the decryption case, since we are likely to
decrypt much more data than we encrypt.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
TLS relies upon the ability to reuse a cipher by resetting only the
initialisation vector while reusing the existing key.
Add verification of resetting the initialisation vector to the cipher
self-tests.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Reset the accumulated authentication state when cipher_setiv() is
called, to allow the cipher to be reused without resetting the key.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
All existing cipher suites use SHA-256 as the TLSv1.2 and above
handshake digest algorithm (even when using SHA-1 as the MAC digest
algorithm). Some GCM cipher suites use SHA-384 as the handshake
digest algorithm.
Allow the cipher suite to specify the handshake (and PRF) digest
algorithm to be used for TLSv1.2 and above.
This requires some restructuring to allow for the fact that the
ClientHello message must be included within the handshake digest, even
though the relevant digest algorithm is not yet known at the point
that the ClientHello is sent. Fortunately, the ClientHello may be
reproduced verbatim at the point of receiving the ServerHello, so we
rely on reconstructing (rather than storing) this message.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Always send the maximum supported version in our ClientHello message,
even when performing renegotiation (in which case the current version
may already be lower than the maximum supported version).
This is permitted by the specification, and allows the ClientHello to
be reconstructed verbatim at the point of selecting the handshake
digest algorithm in tls_new_server_hello().
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allow for AEAD cipher suites where the MAC length may be zero and the
authentication is instead provided by an authenticating cipher, with
the plaintext authentication tag appended to the ciphertext.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Harden against padding oracle attacks by treating invalid block
padding as zero length padding, thereby deferring the failure until
after computing the (incorrect) MAC.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Restructure the encryption and decryption operations to allow for the
use of ciphers where the initialisation vector is constructed by
concatenating the fixed IV (derived as part of key expansion) with a
record IV (prepended to the ciphertext).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
TLS stream and block ciphers use a MAC with a length equal to the
output length of the digest algorithm in use. For AEAD ciphers there
is no MAC, with the equivalent functionality provided by the cipher
algorithm's authentication tag.
Allow for the existence of AEAD cipher suites by making the MAC length
a parameter of the cipher suite.
Assume that the MAC key length is equal to the MAC length, since this
is true for all currently supported cipher suites.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
All TLS cipher types use a common structure for the per-record data
that is authenticated in addition to the plaintext itself. This data
is used as a prefix in the HMAC calculation for stream and block
ciphers, or as additional authenticated data for AEAD ciphers.
Define a "TLS authentication header" structure to hold this data as a
contiguous block, in order to meet the alignment requirement for AEAD
ciphers such as GCM.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Adjust the length of the first received ciphertext data buffer to
ensure that all decryption operations respect the cipher's alignment
size.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The GCM cipher mode of operation (in common with other counter-based
modes of operation) has a notion of blocksize that does not neatly
fall into our current abstraction: it does operate in 16-byte blocks
but allows for an arbitrary overall data length (i.e. the final block
may be incomplete).
Model this by adding a concept of alignment size. Each call to
encrypt() or decrypt() must begin at a multiple of the alignment size
from the start of the data stream. This allows us to model GCM by
using a block size of 1 byte and an alignment size of 16 bytes.
As a side benefit, this same concept allows us to neatly model the
fact that raw AES can encrypt only a single 16-byte block, by
specifying an alignment size of zero on this cipher.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
TLS block ciphers always use CBC (as per RFC 5246 section 6.2.3.2)
with a record initialisation vector length that is equal to the cipher
block size, and no fixed initialisation vector.
The initialisation vector for AEAD ciphers such as GCM is less
straightforward, and requires both a fixed and per-record component.
Extend the definition of a cipher suite to include fixed and record
initialisation vector lengths, and generate the fixed portion (if any)
as part of key expansion.
Do not add explicit calls to cipher_setiv() in tls_assemble_block()
and tls_split_block(), since the constraints imposed by RFC 5246 are
specifically chosen to allow implementations to avoid doing so.
(Instead, add a sanity check that the record initialisation vector
length is equal to the cipher block size.)
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The TLSv1.0 protocol was deprecated by RFC 8996 (along with TLSv1.1),
and has been disabled by default in iPXE since commit dc785b0fb
("[tls] Default to supporting only TLSv1.1 or above") in June 2020.
While there is value in continuing to support older protocols for
interoperability with older server appliances, the additional
complexity of supporting the implicit initialisation vector for
TLSv1.0 is not worth the cost.
Remove support for the obsolete TLSv1.0 protocol, to reduce complexity
of the implementation and simplify ongoing maintenance.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The DMA mapping is performed implicitly as part of the call to
dma_alloc(). The current implementation creates the IOMMU mapping for
the allocated and potentially uninitialised data before returning to
the caller (which will immediately zero out or otherwise initialise
the buffer). This leaves a small window within which a malicious PCI
device could potentially attempt to retrieve firmware-owned secrets
present in the uninitialised buffer. (Note that the hypothetically
malicious PCI device has no viable way to know the address of the
buffer from which to attempt a DMA read, rendering the attack
extremely implausible.)
Guard against any such hypothetical attacks by zeroing out the
allocated buffer prior to creating the coherent DMA mapping.
Suggested-by: Mateusz Siwiec <Mateusz.Siwiec@ioactive.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
bzimage_parse_cmdline() uses strcmp() to identify the named "vga=..."
kernel command line option values, which will give a false negative if
the option is not last on the command line.
Fix by temporarily changing the relevant command line separator (if
any) to a NUL terminator.
Debugged-by: Simon Rettberg <simon.rettberg@rz.uni-freiburg.de>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some ciphers (such as GCM) support the concept of a tag that can be
used to authenticate the encrypted data. Add a cipher method for
generating an authentication tag.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some ciphers (such as GCM) support the concept of additional
authenticated data, which does not appear in the ciphertext but may
affect the operation of the cipher.
Allow cipher_encrypt() and cipher_decrypt() to be called with a NULL
destination buffer in order to pass additional data.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Accept and record the ServerKeyExchange record, which is required for
key exchange mechanisms such as Ephemeral Diffie-Hellman (DHE).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The pre-master secret is currently constructed at the time of
instantiating the TLS connection. This precludes the use of key
exchange mechanisms such as Ephemeral Diffie-Hellman (DHE), which
require a ServerKeyExchange message to exchange additional key
material before the pre-master secret can be constructed.
Allow for the use of such cipher suites by deferring generation of the
master secret until the point of sending the ClientKeyExchange
message.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The master secret is currently constructed upon receiving the
ServerHello message. This precludes the use of key exchange
mechanisms such as Ephemeral Diffie-Hellman (DHE), which require a
ServerKeyExchange message to exchange additional key material before
the pre-master secret and master secret can be constructed.
Allow for the use of such cipher suites by deferring generation of the
master secret until the point of sending the ClientKeyExchange
message.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add an implementation of the Ephemeral Diffie-Hellman key exchange
algorithm as defined in RFC2631, with test vectors taken from the NIST
Cryptographic Toolkit.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Simplify the internal HMAC API so that the key is provided only at the
point of calling hmac_init(), and the (potentially reduced) key is
stored as part of the context for later use by hmac_final().
This simplifies the calling code, and avoids the need for callers such
as TLS to allocate a potentially variable length block in order to
retain a copy of the unmodified key.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The HMAC code is already tested indirectly via several consuming
algorithms that themselves provide self-tests (e.g. HMAC-DRBG, NTLM
authentication, and PeerDist content identification), but lacks any
direct test vectors.
Add explicit HMAC tests and ensure that corner cases such as empty
keys, block-length keys, and over-length keys are all covered.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some BIOSes in AWS EC2 (observed with a c6i.metal instance in
eu-west-2) will fail to assign an MMIO address to the ENA device,
which causes ioremap() to fail.
Experiments show that the ENA device is the only device behind its
bridge, even when multiple ENA devices are present, and that the BIOS
does assign a memory window to the bridge.
We may therefore choose to assign the device an MMIO address at the
start of the bridge's memory window.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add a minimal driver for PCI bridges that can be used to locate the
bridge to which a PCI device is attached.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Pretty much all physical machines and off-the-shelf virtual machines
will provide a functional PCI BIOS. We therefore default to using
only the PCI BIOS, with no fallback to an alternative mechanism if the
PCI BIOS fails.
AWS EC2 provides the opportunity to experience some exceptions to this
rule. For example, the t3a.nano instances in eu-west-1 have no
functional PCI BIOS at all. As of commit 83516ba ("[cloud] Use
PCIAPI_DIRECT for cloud images") we therefore use direct Type 1
configuration space accesses in the images built and published for use
in the cloud.
Recent experience has discovered yet more variation in AWS EC2
instances. For example, some of the metal instance types have
multiple PCI host bridges and the direct Type 1 accesses therefore
see only a subset of the PCI devices.
Attempt to accommodate future such variations by making the PCI I/O
API selectable at runtime and choosing ECAM (if available), falling
back to the PCI BIOS (if available), then finally falling back to
direct Type 1 accesses.
This is implemented as a dedicated PCIAPI_CLOUD API, rather than by
having the PCI core select a suitable API at runtime (as was done for
timers in commit 302f1ee ("[time] Allow timer to be selected at
runtime"). The common case will remain that only the PCI BIOS API is
required, and we would prefer to retain the optimisations that come
from inlining the configuration space accesses in this common case.
Cloud images are (at present) disk images rather than ROM images, and
so the increased code size required for this design approach in the
PCIAPI_CLOUD case is acceptable.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allow pcibios_discover() to return an empty range if the INT 1A,B101
PCI BIOS installation check call fails.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The ACPI MCFG table describes a direct mapping of PCI configuration
space into MMIO space. This mapping allows access to extended
configuration space (up to 4096 bytes) and also provides for the
existence of multiple host bridges.
Add support for the ECAM mechanism described by the ACPI MCFG table,
as a selectable PCI I/O API alongside the existing PCI BIOS and Type 1
mechanisms.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allow pci_find_next() to discover devices beyond the first PCI
segment, by generalising pci_num_bus() (which implicitly assumes that
there is only a single PCI segment) with pci_discover() (which has the
ability to return an arbitrary contiguous chunk of PCI bus:dev.fn
address space).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The semantics of the bus:dev.fn parameter passed to pci_find_next()
are "find the first existent PCI device at this address or higher",
with the caller expected to increment the address between finding
devices. This does not allow the parameter to distinguish between the
two cases "start from address zero" and "wrapped after incrementing
maximal possible address", which could therefore lead to an infinite
loop in the degenerate case that a device with address ffff:ff:1f.7
really exists.
Fix by checking for wraparound in the caller (which is already
responsible for performing the increment).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Separate the return status code from the returned PCI bus:dev.fn
address, in order to allow pci_find_next() to be used to find devices
with a non-zero PCI segment number.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Experience suggests that routers are often misconfigured to advertise
SLAAC even on prefixes that do not have a SLAAC-compatible prefix
length. iPXE will currently treat this as an error, resulting in the
prefix being ignored completely.
Handle this misconfiguration by ignoring the autonomous address flag
when the prefix length is unsuitable for SLAAC.
Reported-by: Malte Janduda <mail@janduda.net>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some versions of the ENA hardware (observed on a c6i.large instance in
eu-west-2) seem to require a receive ring containing at least 128
entries: any smaller ring will never see receive completions or will
stall after the first few completions.
Increase the receive ring size to 128 entries (determined empirically)
for compatibility with these hardware versions. Limit the receive
ring fill level to 16 (as at present) to avoid consuming more memory
than will typically be available in the internal heap.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some versions of the ENA firmware (observed on a c6i.large instance in
eu-west-2) seem to require a host information page, without which the
CREATE_CQ command will fail with ENA_ADMIN_UNKNOWN_ERROR.
These firmware versions also seem to require us to claim that we are a
Linux kernel with a specific driver major version number. This
appears to be a firmware bug, as revealed by Linux kernel commit
1a63443af ("net/amazon: Ensure that driver version is aligned to the
linux kernel"): this commit changed the value of the driver version
number field to be the Linux kernel version, and was hastily reverted
in commit 92040c6da ("net: ena: fix broken interface between ENA
driver and FW") which clarified that the version number field does
actually have some undocumented significance to some versions of the
firmware.
Fix by providing a host information page via the SET_FEATURE command,
incorporating the apparently necessary lies about our identity.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some versions of the ENA firmware (observed on a c6i.large instance in
eu-west-2) will complain if the completion queue's MSI-X vector field
is left empty, even though the queue configuration specifies that
interrupts are not used.
Work around these firmware versions by passing in what appears to be
the magic "no MSI-X vector" value in this field.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The ENA data path design has separate submission and completion
queues. Submission queues must be refilled in strict order (since
there is only a single linear tail pointer used to communicate the
existence of new entries to the hardware), and completion queue
entries include a request identifier copied verbatim from the
submission queue entry. Once the submission queue doorbell has been
rung, software never again reads from the submission queue entry and
nothing ever needs to write back to the submission queue entry since
completions are reported via the separate completion queue.
This design allows the hardware to complete submission queue entries
out of order, provided that it internally caches at least as many
entries as it leaves gaps.
Record and identify I/O buffers by request identifier (using a
circular ring buffer of unique request identifiers), and remove the
assumption that submission queue entries will be completed in order.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The CREATE_CQ command is permitted to return a size smaller than
requested, which could leave us in a situation where the completion
queue could overflow.
Avoid overflow by limiting the submission queue fill level to the
actual size of the completion queue.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Current versions of the E810 PF driver fail to set the number of
in-use queue pairs in response to the CONFIG_VSI_QUEUES message. When
the number of in-use queue pairs is less than the number of available
queue pairs, this results in some packets being directed to
nonexistent receive queues and hence silently dropped.
Work around this PF driver bug by explicitly configuring the number of
available queue pairs via the REQUEST_QUEUES message. This message
triggers a VF reset that, in turn, requires us to reopen the admin
queue and issue an additional GET_RESOURCES message to restore the VF
to a functional state.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The RESET_VF admin queue command does not complete via the usual
mechanism, but instead requires us to poll registers to wait for the
reset to take effect and then reopen the admin queue.
Allow for the existence of other admin queue commands that also
trigger a VF reset, by separating out the logic that waits for the
reset to complete.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Negotiate API version 1.1 in order to allow access to virtual function
opcodes that are disallowed by default on the E810.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add a driver for the E810 family of 100 Gigabit Ethernet NICs. The
core datapath is identical to that of the 40 Gigabit XL710, and this
part of the code is shared between both drivers. The admin queue
mechanism is sufficiently similar to make it worth reusing substantial
portions of the code, with separate implementations for several
commands to handle the (unnecessarily) breaking changes in data
structure layouts. The major differences are in the mechanisms for
programming queue contexts (where the E810 abandons TX/RX symmetry)
and for configuring the transmit scheduler and receive filters: these
portions are sufficiently different to justify a separate driver.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Remove knowledge of the PRTGL_SA[HL] registers, and instead use the
admin queue to set the MAC address and maximum frame size.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Remove knowledge of the PRTPM_SA[HL] registers, and instead use the
admin queue to retrieve the MAC address.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allow for the MAC address to be fetched using an admin queue command,
instead of reading the PRTPM_SA[HL] registers directly.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The PRTGL_SAH register contains the current maximum frame size, and is
not guaranteed on reset to contain the actual maximum frame size
supported by the hardware, which the datasheet specifies as 9728 bytes
(including the 4-byte CRC).
Set the maximum packet size to a hardcoded 9728 bytes instead of
reading from the PRTGL_SAH register.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Remove knowledge of the GLLAN_RCTL_0 register (which changes location
between the XL810 and E810 register maps), and instead unconditionally
issue the "clear PXE mode" command with the EEXIST error silenced.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The "clear PXE mode" admin queue command will return an EEXIST error
if the device is already in non-PXE mode, but there is no other admin
queue command that can be used to determine whether the device has
already been switched into non-PXE mode.
Provide a mechanism to allow expected errors from a command to be
silenced, to allow the "clear PXE mode" command to be cleanly used
without needing to first check the GLLAN_RCTL_0 register value.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
At least one E810 admin queue command (Query Default Scheduling Tree
Topology) insists upon being provided with a 4kB data buffer, even
when the data to be returned is much smaller.
Work around this requirement by increasing the admin queue data buffer
size to 4kB.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Move knowledge of the virtual function data structures and admin
command definitions from intelxl.h to intelxlvf.h.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Remove the large static admin data buffer structure embedded within
struct intelxl_nic, and instead copy the response received via the
"send to VF" admin queue event to the (already consumed and completed)
admin command descriptor and data buffer.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The physical and virtual function drivers each care about precisely
one admin queue event type. Simplify event handling by using a
per-driver callback instead of the existing weak function symbol.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The PCI device ID 8086:1889 is for the Intel Ethernet Adaptive Virtual
Function, which is a generic virtual function that can be exposed by
different generations of Intel hardware.
Rename the PCI ID from "xl710-vf-ad" to "iavf" to reflect that the
driver is not XL710-specific.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The E810 requires that receive descriptor rings have at least 64
entries (and are a multiple of 32 entries).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Do not attempt to use the admin commands to get the firmware version
and report the driver version for the virtual function driver, since
these will be rejected by the E810 firmware as invalid commands when
issued by a virtual function. Instead, use the mailbox interface to
negotiate the API version with the physical function driver.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The 100 Gigabit physical function driver requires a virtual function
driver to request that transmit and receive queues are mapped to MSI-X
vector 1 or higher.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The second parameter to intelxlvf_admin_queues() is a boolean used to
select the VF opcode, rather than the raw VF opcode itself.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Remove knowledge of the PFGEN_CTRL register (which changes location
between XL710 and E810 register maps), and instead use PCIe FLR to
reset the physical function.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Use the tail register offset (which exists for all ring types) as the
ring identifier in all relevant debug messages.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
For the sake of completeness, ensure that all 32 bytes of the receive
queue context are programmed (including the unused final 8 bytes).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Commit 8f3e648 ("[intelxl] Use one admin queue buffer per admin queue
descriptor") changed the API for intelxl_admin_command() such that the
caller now constructs the command directly within the next available
descriptor ring entry, rather than relying on intelxl_admin_command()
to copy the descriptor to and from the descriptor ring.
This introduced a regression in intelxl_admin_switch(), since the
second and subsequent iterations of the loop will not have constructed
a valid command in the new descriptor ring entry before calling
intelxl_admin_command().
Fix by constructing the command within the loop.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Expose the system MAC address (if any) via the ${sysmac} setting.
This allows scripts to access the system MAC address even when iPXE
has decided not to apply it to a network device (e.g. because the
cached DHCPACK MAC address was selected in order to match the
behaviour of a previous boot stage).
The setting is named ${sysmac} rather than ${acpimac} in order to
allow for forward compatibility with non-ACPI mechanisms that may
exist in future for specifying a system MAC address.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
When running on a system with an ACPI-provided system-specific MAC
address, iPXE will apply this address to an ECM or NCM USB NIC. If
iPXE has been chainloaded from a previous stage that does not
understand the ACPI MAC mechanism then this can result in iPXE using a
different MAC address than the previous stage, which is surprising to
users.
Attempt to minimise surprise by allowing the MAC address found in a
cached DHCPACK packet to override a temporary MAC address, if the
DHCPACK MAC address matches the network device's permanent MAC
address. When a previous stage has chosen to use the network device's
permanent MAC address (e.g. because it does not understand the ACPI
MAC mechanism), this will cause iPXE to make the same choice.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
When applying an ACPI-provided system-specific MAC address, apply it
to netdev->ll_addr rather than netdev->hw_addr. This allows iPXE
scripts to access the permanent MAC address via the ${netX/hwaddr}
setting (and thereby provides scripts with a mechanism to ascertain
that the NIC is using a MAC address other than its own permanent
hardware address).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some newer HP products expose the host-based MAC (HBMAC) address using
an ACPI method named "RTMA" returning a part-binary string of the form
"_RTXMAC_#<mac>#", where "<mac>" comprises the raw MAC address bytes.
Extend the existing support to handle this format alongside the older
"_AUXMAC_" format (which uses a base16-encoded MAC address).
Reported-by: Andreas Hammarskjöld <junior@2PintSoftware.com>
Tested-by: Andreas Hammarskjöld <junior@2PintSoftware.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
For symmetry with the stub user_to_phys() implementation, provide
phys_to_user() with the same underlying assumption that virtual
addresses are physical (since there is no way to know the real
physical address when running as a Linux userspace executable).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allow for linked-in code to override the mechanism used to locate an
ACPI table, thereby opening up the possibility of ACPI self-tests.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Extend the glyph cache to include a number of dynamic entries that are
populated on demand whenever a non-ASCII character needs to be drawn.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Accumulate UTF-8 characters in fbcon_putchar(), and require the frame
buffer console's .glyph() method to accept Unicode character values.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
We currently define the active DNS server as a global variable. All
queries will start by attempting to contact the active DNS server, and
the active DNS server will be changed only if we fail to get a
response. This effectively treats the DNS server list as expressing a
weak preference ordering: we will try servers in order, but once we
have found a working server we will stick with that server for as long
as it continues to respond to queries.
Some sites are misconfigured to hand out DNS servers that do not have
a consistent worldview. For example: the site may hand out two DNS
server addresses, the first being an internal DNS server (which is
able to resolve names in private DNS domains) and the second being a
public DNS server such as 8.8.8.8 (which will correctly return
NXDOMAIN for any private DNS domains). This type of configuration is
fundamentally broken and should never be used, since any DNS resolver
performing a query for a name within a private DNS domain may obtain a
spurious NXDOMAIN response for a valid private DNS name.
Work around these broken configurations by treating the DNS server
list as expressing a strong preference ordering, and always starting
DNS queries from the first server in the list (rather than maintaining
a global concept of the active server). This will have the debatable
benefit of converting permanent spurious NXDOMAIN errors into
transient spurious NXDOMAIN errors, which can at least be worked
around at a higher level (e.g. by retrying a download in a loop within
an iPXE script).
The cost of always starting DNS queries from the first server in the
list is a slight delay introduced when the first server is genuinely
unavailable. This should be negligible in practice since DNS queries
are relatively infrequent and the failover expiry time is short.
Treating the DNS server list as a preference ordering is permitted by
the language of RFC 2132, which defines DHCP option 6 as a list in
which "[DNS] servers SHOULD be listed in order of preference". No
specification defines a precise algorithm for how this preference
order should be applied in practice: this new approach seems as good
as any.
Requested-by: Andreas Hammarskjöld <junior@2PintSoftware.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The BIOS console's interpretation of LShift+RShift as equivalent to
AltGr requires the shifted ASCII characters to be present in the AltGr
mapping table, to allow AltGr-Shift-<key> to be interpreted in the
same way as AltGr-<key>.
For keyboard layouts that have different ASCII characters for
AltGr-<key> and AltGr-Shift-<key>, this will potentially leave the
character for AltGr-<key> inaccessible via the BIOS console if the
BIOS requires the use of the LShift+RShift workaround. This
theoretically affects the numeric keys in the Lithuanian ("lt")
keyboard layout (where the numerals are accessed via AltGr-<key> and
punctuation characters via AltGr-Shift-<key>), but the simple
workaround for that keyboard layout is to avoid using AltGr and Shift
entirely since the unmodified numeric keys are not remapped anyway.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Provide the special keyboard map named "dynamic" which allows the
active keyboard map to be selected at runtime via the ${keymap}
setting, e.g.:
#define KEYBOARD_MAP dynamic
iPXE> set keymap uk
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Settings applicators are entirely independent, and there is no reason
why a failure in one applicator should prevent other applicators from
being processed.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
As reported by Coverity, xsmp_rx_xve_modify() currently passes a
partially initialised struct ib_address_vector to xve_update_tca() and
thence to eoib_set_gateway(), which uses memcpy() to store the whole
structure including the (unused and unneeded) uninitialised fields.
Silence the Coverity warning by zeroing the whole structure.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
As per the general pattern for initialisation functions in iPXE,
pci_init() saves code size by assuming that the caller has already
zeroed the underlying storage (e.g. as part of zeroing a larger
containing structure). There are several places within the code where
pci_init() is deliberately used to initialise a transient struct
pci_device without zeroing the entire structure, because the calling
code knows that only the PCI bus:dev.fn address is required to be
initialised (e.g. when reading from PCI configuration space).
Ensure that using pci_init() followed by pci_read_config() will fully
initialise the struct pci_device even if the caller did not previously
zero the underlying storage, since Coverity reports that there are
several places in the code that rely upon this.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Embedded images do not have an associated URI. This currently causes
the current working URI (cwuri) to be cleared when starting an
embedded image.
If the current working URI has been set via a ${next-server} setting
from a cached DHCP packet then this will result in unexpected
behaviour. An attempt by the embedded script to use a relative URI to
download files from the TFTP server will fail with the error:
Could not start download: Operation not supported (ipxe.org/3c092083)
Rerunning the "dhcp" command will not fix this error, since the TFTP
settings applicator will not see any change to the ${next-server}
setting and so will not reset the current working URI.
Fix by setting the current working URI to the image's URI only if the
image actually has an associated URI.
Debugged-by: Ignat Korchagin <ignat@cloudflare.com>
Originally-fixed-by: Ignat Korchagin <ignat@cloudflare.com>
Tested-by: Ignat Korchagin <ignat@cloudflare.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The AltGr remapping table is constructed to include only keys that are
not reachable after applying the basic remapping table. The logic
currently fails to include keys that are omitted entirely from the
basic remapping table since they would map to a non-ASCII character.
Fix this logic by allowing the remapping tables to include null
mappings, which are then elided only at the point of constructing the
C code fragment.
Reported-by: Christian Nilsson <nikize@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The build process currently invokes the Python genkeymap.py script via
the Perl executable. Strangely, this appears to work.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Treat dead keys in target keymaps as producing the closest equivalent
ASCII character, since many of these characters are otherwise
unrepresented on the keyboard.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Several keyboard layouts define ASCII characters as accessible only
via the AltGr modifier. Add support for this modifier to ensure that
all ASCII characters are accessible.
Experiments suggest that the BIOS console is likely to fail to
generate ASCII characters when the AltGr key is pressed. Work around
this limitation by accepting LShift+RShift (which will definitely
produce an ASCII character) as a synonym for AltGr.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Handle Ctrl and CapsLock key modifiers within key_remap(), to provide
consistent behaviour across different console types.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Separate the concept of a keyboard mapping from a list of remapped
keys, to allow for the possibility of supporting multiple keyboard
mappings at runtime.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The compound statement expression within __table_entries() prevents
the use of top-level declarations such as
static struct thing *things = table_start ( THINGS );
Define TABLE_START() and TABLE_END() macros that can be used as:
static TABLE_START ( things_start, THINGS );
static struct thing *things = things_start;
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The key with scancode 86 appears in the position between left shift
and Z on a US keyboard, where it typically fails to exist entirely.
Most US keyboard maps define this nonexistent key as generating "\|",
with the notable exception of "loadkeys" which instead reports it as
generating "<>". Both of these mapping choices duplicate keys that
exist elsewhere in the map, which causes problems for our ASCII-based
remapping mechanism.
Work around these quirks by treating the key as generating "\|" with
the high bit set, and making it subject to remapping. Where the BIOS
generates "\|" as expected, this allows us to remap to the correct
ASCII value.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Rewrite genkeymap.pl in Python with added sanity checks, and update
the list of keyboard mappings to remove those no longer supported by
the underlying "loadkeys" tool.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some keyboard layouts (e.g. "fr") swap letter and punctuation keys.
Apply the logic for upper and lower case and for Ctrl-<key> only after
applying remapping, in order to handle these layouts correctly.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
To minimise code size, our keyboard mapping works on the basis of
allowing the BIOS to convert the keyboard scancode into an ASCII
character and then remapping the ASCII character.
This causes problems with keyboard layouts such as "fr" that swap the
shifted and unshifted digit keys, since the ASCII-based remapping will
spuriously remap the numeric keypad (which produces the same ASCII
values as the digit keys).
Fix by checking that the keyboard scancode is within the range of keys
that vary between keyboard mappings before attempting to remap the
ASCII character.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
AArch64 kernels tend to be distributed as gzip compressed images.
Enable IMAGE_GZIP by default for AArch64 to avoid the need for
uncompressed images to be provided.
Originally-implemented-by: Alessandro Di Stefano <aleskandro@redhat.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
In real mode, code segments are always writable. In protected mode,
code segments can never be writable. The precise implementation of
this attribute differs between CPU generations, with subtly different
behaviour arising on the transitions from protected mode to real mode.
At the point of transition (when the PE bit is cleared in CR0) the
hidden portion of the %cs descriptor will retain whatever attributes
were in place for the protected-mode code segment, including the fact
that the segment is not writable. The immediately following code will
perform a far control flow transfer (such as ljmp or lret) in order to
load a real-mode value into %cs.
On the Pentium and later CPUs, the retained protected-mode attributes
will be ignored for any accesses via %cs while the CPU is in real
mode. A write via %cs will therefore be allowed even though the
hidden portion of the %cs descriptor still describes a non-writable
segment.
On the 486 and earlier CPUs, the retained protected-mode attributes
will not be ignored for accesses via %cs. A write via %cs will
therefore cause a CPU fault. To obtain normal real-mode behaviour
(i.e. a writable %cs descriptor), special logic is added to the ljmp
instruction that populates the hidden portion of the %cs descriptor
with real-mode attributes when a far jump is executed in real mode.
The result is that writes via %cs will cause a CPU fault until the
first ljmp instruction is executed, after which writes via %cs will be
allowed as expected in real mode.
The transition code in libprefix.S currently uses lret to load a
real-mode value into %cs after clearing the PE bit. Experimentation
shows that only the ljmp instruction will work to load real-mode
attributes into the hidden portion of the %cs descriptor: other far
control flow transfers (such as lret, lcall, or int) do not do so.
When running on a 486 or earlier CPU, this results in code within
libprefix.S running with a non-writable code segment after a mode
transition, which in turn results in a CPU fault when real-mode code
in liba20.S attempts to write to %cs:enable_a20_method.
Fix by constructing and executing an ljmp instruction, to trigger the
relevant descriptor population logic on 486 and earlier CPUs. This
ljmp instruction is constructed on the stack, since the .prefix
section may be executing directly from ROM (or from memory that the
BIOS has write-protected in order to emulate an ISA ROM region) and so
cannot be modified.
Reported-by: Nikolai Zhubr <n-a-zhubr@yandex.ru>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Attempt to fetch the autoexec.ipxe script via TFTP using the PXE base
code protocol installed on the loaded image's device handle, if
present.
This provides a generic alternative to the use of an embedded script
for chainloaded binaries, which is particularly useful in a UEFI
Secure Boot environment since it allows the script to be modified
without the need to sign a new binary.
As a side effect, this also provides a third method for breaking the
PXE chainloading loop (as an alternative to requiring an embedded
script or custom DHCP server configuration).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
RFC3986 allows for colons to appear within the path component of a
relative URI, but iPXE will currently parse such URIs incorrectly by
interpreting the text before the colon as the URI scheme.
Fix by checking for valid characters when identifying the URI scheme.
Deliberately deviate from the RFC3986 definition of valid characters
by accepting "_" (which was incorrectly used in the iPXE-specific
"ib_srp" URI scheme and so must be accepted for compatibility with
existing deployments), and by omitting the code to check for
characters that are not used in any URI scheme supported by iPXE.
Reported-by: Ignat Korchagin <ignat@cloudflare.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
SBAT defines an encoding for security generation numbers stored as a
CSV file within a special ".sbat" section in the signed binary. If a
Secure Boot exploit is discovered then the generation number will be
incremented alongside the corresponding fix.
Platforms may then record the minimum generation number required for
any given product. This allows for an efficient revocation mechanism
that consumes minimal flash storage space (in contrast to the DBX
mechanism, which allows for only a single-digit number of revocation
events to ever take place across all possible signed binaries).
Add SBAT metadata to iPXE EFI binaries to support this mechanism.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
As of commit f1e9e2b ("[efi] Align EFI image sections by page size"),
the VirtualSize fields for the .reloc and .debug sections have been
rounded up to the (4kB) image alignment. This breaks the PE
relocation logic in the UEFI shim, which requires the VirtualSize
field to exactly match the size as recorded in the data directory.
Fix by setting the VirtualSize field to the unaligned size of the
section, as is already done for normal PE sections (i.e. those other
than .reloc and .debug).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The RFC4122 specification defines UUIDs as being in network byte
order, but an unfortunately significant amount of (mostly Microsoft)
software treats them as having the first three fields in little-endian
byte order.
In an ideal world, any server-side software that compares UUIDs for
equality would perform an endian-insensitive comparison (analogous to
comparing strings for equality using a case-insensitive comparison),
and would therefore not care about byte order differences.
Define a setting type name ":guid" to allow a UUID setting to be
formatted in little-endian order, to simplify interoperability with
server-side software that expects such a formatting.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The UEFI specification mandates that the EFI watchdog timer should be
disabled by the platform firmware as part of the ExitBootServices()
call, but some platforms (e.g. Hyper-V) are observed to occasionally
forget to do so, resulting in a reboot approximately five minutes
after starting the operating system.
Work around these firmware bugs by disabling the watchdog timer
ourselves.
Requested-by: Andreas Hammarskjöld <junior@2PintSoftware.com>
Tested-by: Andreas Hammarskjöld <junior@2PintSoftware.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
On some systems (observed with the Thunderbolt ports on a ThinkPad X1
Extreme Gen3 and a ThinkPad P53), if the IOMMU is enabled then the
system firmware will install an ExitBootServices notification event
that disables bus mastering on the Thunderbolt xHCI controller and all
PCI bridges, and destroys any extant IOMMU mappings. This leaves the
xHCI controller unable to perform any DMA operations.
As described in commit 236299b ("[xhci] Avoid DMA during shutdown if
firmware has disabled bus mastering"), any subsequent DMA operation
attempted by the xHCI controller will end up completing after the
operating system kernel has reenabled bus mastering, resulting in a
DMA operation to an area of memory that the hardware is no longer
permitted to access and, on Windows with the Driver Verifier enabled,
a STOP 0xE6 (DRIVER_VERIFIER_DMA_VIOLATION).
That commit avoids triggering any DMA attempts during the shutdown of
the xHCI controller itself. However, this is not a complete solution
since any attached and opened USB device (e.g. a USB NIC) may
asynchronously trigger DMA attempts that happen to occur after bus
mastering has been disabled but before we reset the xHCI controller.
Avoid this problem by installing our own ExitBootServices notification
event at TPL_NOTIFY, thereby causing it to be invoked before the
firmware's own ExitBootServices notification event that disables bus
mastering.
This unsurprisingly causes the shutdown hook itself to be invoked at
TPL_NOTIFY, which causes a fatal error when later code attempts to
raise the TPL to TPL_CALLBACK (which is a lower TPL). Work around
this problem by redefining the "internal" iPXE TPL to be variable, and
set this internal TPL to TPL_NOTIFY when the shutdown hook is invoked.
Avoid calling into an underlying SNP protocol instance from within our
shutdown hook at TPL_NOTIFY, since the underlying SNP driver may
attempt to raise the TPL to TPL_CALLBACK (which would cause a fatal
error). Failing to shut down the underlying SNP device is safe to do
since the underlying device must, in any case, have installed its own
ExitBootServices hook if any shutdown actions are required.
Reported-by: Andreas Hammarskjöld <junior@2PintSoftware.com>
Tested-by: Andreas Hammarskjöld <junior@2PintSoftware.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add the "--uefi" option when invoking isohybrid on an EFI-bootable
image, to create a partition mapping to the EFI system partition
embedded within the ISO image.
This allows the resulting isohybrid image to be booted on UEFI systems
that will not recognise an El Torito boot catalog on a non-CDROM
device.
Originally-fixed-by: Christian Hesse <mail@eworm.de>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The efi_unload() function is currently missing the calls to raise and
restore the TPL. This has the side effect of causing iPXE to return
from the driver unload entry point at TPL_CALLBACK, which will cause
unexpected behaviour (typically a system lockup) shortly afterwards.
Fix by adding the missing calls to raise and restore the TPL.
Debugged-by: Petr Borsodi <petr.borsodi@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The EFI loaded image protocol allows an image to be provided with a
custom system table, and we currently use this mechanism to wrap any
boot services calls made by the loaded image in order to provide
strace-like debugging via DEBUG=efi_wrap.
The ExitBootServices() call will modify the global system table,
leaving the loaded image using a system table that is no longer
current. When DEBUG=efi_wrap is used, this generally results in the
machine locking up at the point that the loaded operating system calls
ExitBootServices().
Fix by modifying the global EFI system table to point to our wrapper
functions, instead of providing a custom system table via the loaded
image protocol.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
A successful call to ExitBootServices() will result in the EFI console
becoming unusable. Ensure that the EFI wrapper produces a complete
line of debug output before calling the wrapped ExitBootServices()
method, and attempt subsequent debug output only if the call fails.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
On some systems (observed with the Thunderbolt ports on a ThinkPad X1
Extreme Gen3 and a ThinkPad P53), the system firmware will disable bus
mastering on the xHCI controller and all PCI bridges at the point that
ExitBootServices() is called if the IOMMU is enabled. This leaves the
xHCI controller unable to shut down cleanly since all commands will
fail with a timeout.
Commit 85eb961 ("[xhci] Allow for permanent failure of the command
mechanism") allows us to detect that this has happened and respond
cleanly. However, some unidentified hardware component (either the
xHCI controller or one of the PCI bridges) seems to manage to enqueue
the attempted DMA operation and eventually complete it after the
operating system kernel has reenabled bus mastering. This results in
a DMA operation to an area of memory that the hardware is no longer
permitted to access. On Windows with the Driver Verifier enabled,
this will result in a STOP 0xE6 (DRIVER_VERIFIER_DMA_VIOLATION).
Work around this problem by detecting when bus mastering has been
disabled, and immediately failing the device to avoid initiating any
further DMA attempts.
Reported-by: Andreas Hammarskjöld <junior@2PintSoftware.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
iPXE decodes any percent-encoded characters during the URI parsing
stage, thereby allowing protocol implementations to consume the raw
field values directly without further decoding.
When reconstructing a URI string for use in an HTTP request line, the
percent-encoding is currently reapplied in a reversible way: we
guarantee that our reconstructed URI string could be decoded to give
the same raw field values.
This technically violates RFC3986, which states that "URIs that differ
in the replacement of a reserved character with its corresponding
percent-encoded octet are not equivalent". Experiments show that
several HTTP server applications will attach meaning to the choice of
whether or not a particular character was percent-encoded, even when
the percent-encoding is unnecessary from the perspective of parsing
the URI into its component fields.
Fix by storing the originally encoded substrings for the path, query,
and fragment fields and using these original encoded versions when
reconstructing a URI string. The path field is also stored as a
decoded string, for use by protocols such as TFTP that communicate
using raw strings rather than URI-encoded strings. All other fields
(such as the username and password) continue to be stored only in
their decoded versions since nothing ever needs to know the originally
encoded versions of these fields.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some xHCI controllers (observed with the Thunderbolt ports on a
ThinkPad X1 Extreme Gen3 and a ThinkPad P53) seem to suffer a
catastrophic failure at the point that ExitBootServices() is called if
the IOMMU is enabled. The symptoms appear to be consistent with
another UEFI driver (e.g. the IOMMU driver, or the Thunderbolt driver)
having torn down the DMA mappings, leaving the xHCI controller unable
to write to host memory. The observable effect is that all commands
fail with a timeout, and attempts to abort command execution similarly
fail since the xHCI controller is unable to report the abort
completion.
Check for failure to abort a command, and respond by performing a full
device reset (as recommended by the xHCI specification) and by marking
the device as permanently failed.
Reported-by: Andreas Hammarskjöld <junior@2PintSoftware.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Realistic Linux kernel command lines may exceed our current 256
character limit for interactively edited commands or settings.
Switch from stack allocation to heap allocation, and increase the
limit to 1024 characters.
Requested-by: Matteo Guglielmi <Matteo.Guglielmi@dalco.ch>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Use the "system MAC address" provided within the DSDT/SSDT if such an
address is available and has not already been assigned to a network
device.
Tested-by: Andreas Hammarskjöld <junior@2PintSoftware.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some vendors provide a "system MAC address" within the DSDT/SSDT, to
be used to override the MAC address for a USB docking station.
A full implementation would require an ACPI bytecode interpreter,
since at least one OEM allows the MAC address to be constructed by
executable ACPI bytecode (rather than a fixed data structure).
We instead attempt to extract a plausible-looking "_AUXMAC_#.....#"
string that appears shortly after an "AMAC" or "MACA" signature. This
should work for most implementations encountered in practice.
Debugged-by: Andreas Hammarskjöld <junior@2PintSoftware.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allow for the DSDT/SSDT signature-scanning and value extraction code
to be reused for extracting a pass-through MAC address.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Commit cd3de55 ("[efi] Record cached DHCPACK from loaded image's
device handle, if present") added the ability for a chainloaded UEFI
iPXE to reuse an IPv4 address and DHCP options previously obtained by
a built-in PXE stack, without needing to perform a second DHCP
request.
Extend this to also record the cached ProxyDHCPOFFER and PXEBSACK
obtained from the EFI_PXE_BASE_CODE_PROTOCOL instance installed on the
loaded image's device handle, if present.
This allows a chainloaded UEFI iPXE to reuse a boot filename or other
options that were provided via a ProxyDHCP or PXE boot server
mechanism, rather than by standard DHCP.
Tested-by: Andreas Hammarskjöld <junior@2PintSoftware.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
When building an EFI ROM image for which no PCI vendor/device ID is
applicable (e.g. bin-x86_64-efi/ipxe.efirom), the build process will
currently construct a command such as
./util/efirom -v -d -c bin-x86_64-efi/ipxe.efidrv \
bin-x86_64-efi/ipxe.efirom
which gets interpreted as a vendor ID of "-0xd" (i.e. 0xfff3, after
truncation to 16 bits).
Fix by using an explicit zero ID when no applicable ID exists, as is
already done when constructing BIOS ROM images.
Reported-by: Konstantin Aladyshev <aladyshev22@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The DHCP service in EC2 has been observed to occasionally stop
responding for bursts of several seconds. This can easily result in a
failed boot, since the current cloud boot script will attempt DHCP
only once.
Work around this problem by retrying DHCP in a fairly tight cycle
within the cloud boot script, and falling back to a reboot after
several failed DHCP attempts.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
As of commit f1e9e2b ("[efi] Align EFI image sections by page size"),
our SectionAlignment has been increased to 4kB in order to allow for
page-level memory protection to be applied by the UEFI firmware, with
FileAlignment left at 32 bytes.
The PE specification states that the value for FileAlignment "should
be a power of 2 between 512 and 64k, inclusive", and that "if the
SectionAlignment is less than the architecture's page size, then
FileAlignment must match SectionAlignment".
Testing shows that signtool.exe will reject binaries where
FileAlignment is less than 512, unless FileAlignment is equal to
SectionAlignment. This indicates a somewhat zealous interpretation of
the word "should" in the PE specification.
Work around this interpretation by increasing FileAlignment from 32
bytes to 512 bytes, and add explanatory comments for both
FileAlignment and SectionAlignment.
Debugged-by: Andreas Hammarskjöld <junior@2PintSoftware.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
When building the Linux userspace binaries, the external system
headers may have already defined values for the __LITTLE_ENDIAN and
__BIG_ENDIAN constants.
Fix by retaining the existing values if already defined, since the
actual values of these constants do not matter.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
RFC 3986 section 3.1 defines URI schemes as case-insensitive (though
the canonical form is always lowercase).
Use strcasecmp() rather than strcmp() to allow for case insensitivity
in URI schemes.
Requested-by: Andreas Hammarskjöld <junior@2PintSoftware.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The RTL8211B seems to have a bug that prevents the link from coming up
unless the MII_MMD_DATA register is cleared.
The Linux kernel driver applies this workaround (in rtl8211b_resume())
only to the specific RTL8211B PHY model, along with a matching
workaround to set bit 9 of MII_MMD_DATA when suspending the PHY.
Since we have no need to ever suspend the PHY, and since writing a
zero ought to be harmless, we just clear the register unconditionally.
Debugged-by: Nikolay Pertsev <nikolay.p@cos.flag.org>
Tested-by: Nikolay Pertsev <nikolay.p@cos.flag.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The peer discovery time has a significant impact on the overall
PeerDist download speed, since each block requires an individual
discovery attempt. In most cases, a peer that responds for block N
will turn out to also respond for block N+1.
Assume that the most recently discovered peer (for any block) probably
has a copy of the next block to be discovered, thereby allowing the
peer download attempt to begin immediately.
In the case that this assumption is incorrect, the existing error
recovery path will allow for fallback to newly discovered peers (or to
the origin server).
Suggested-by: Andreas Hammarskjöld <junior@2PintSoftware.com>
Tested-by: Andreas Hammarskjöld <junior@2PintSoftware.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some versions of GNU objcopy (observed with binutils 2.23.52.0.1 on
CentOS 7.0.1406) document the -D/--enable-deterministic-archives
option but fail to recognise the short form of the option.
Work around this problem by using the long form of the option.
Reported-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Commit 040cdd0 ("[linux] Add a prefix to all symbols to avoid future
name collisions") unintentionally reintroduced an element of
non-determinism into the build ID, by omitting the -D option when
manipulating the blib.a archive.
Fix by adding the -D option to restore determinism.
Reworded-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The Ip4ConfigDxe driver bug that was observed on Dell systems in
commit 64b4452 ("[efi] Blacklist the Dell Ip4ConfigDxe driver") has
also been observed on systems with a manufacturer name of "Itautec
S.A.". The symptoms of the bug are identical: an attempt to call
DisconnectController() on the LOM device handle will lock up the
system.
Fix by extending the veto to cover the Ip4ConfigDxe driver for this
manufacturer.
Debugged-by: Celso Viana <celso.vianna@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Most RNDIS data structures include a trailing 4-byte reserved field.
For the REMOTE_NDIS_PACKET_MSG and REMOTE_NDIS_INITIALIZE_CMPLT
structures, this is an 8-byte field instead.
iPXE currently uses incorrect structure definitions with a 4-byte
reserved field in all data structures, resulting in data payloads that
overlap the last 4 bytes of the 8-byte reserved field.
RNDIS uses explicit offsets to locate any data payloads beyond the
message header, and so liberal RNDIS parsers (such as those used in
Hyper-V and in the Linux USB Ethernet gadget driver) are still able to
parse the malformed structures.
A stricter RNDIS parser (such as that found in some older Android
builds that seem to use an out-of-tree USB Ethernet gadget driver) may
reject the malformed structures since the data payload offset is less
than the header length, causing iPXE to be unable to transmit packets.
Fix by correcting the length of the reserved fields.
Debugged-by: Martin Nield <pmn1492@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The ARM versions of the big-integer inline assembly functions include
constraints to indicate that the output value is modified by the
assembly code. These constraints are not present in the equivalent
code for the x86 versions.
As of GCC 11, this results in the compiler reporting that the output
values may be uninitialized.
Fix by including the relevant memory output constraints.
Reported-by: Christian Hesse <mail@eworm.de>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Provide a file "initrd.magic" via the EFI_SIMPLE_FILE_SYSTEM_PROTOCOL
that contains the initrd file as constructed for BIOS bzImage kernels
(including injected files with CPIO headers constructed by iPXE).
This allows BIOS and UEFI kernels to obtain the exact same initramfs
image, by adding "initrd=initrd.magic" to the kernel command line.
For example:
#!ipxe
kernel boot/vmlinuz initrd=initrd.magic
initrd boot/initrd.img
initrd boot/modules/e1000.ko /lib/modules/e1000.ko
initrd boot/modules/af_packet.ko /lib/modules/af_packet.ko
boot
Do not include the "initrd.magic" file within the root directory
listing, since doing so would break software such as wimboot that
processes all files within the root directory.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Restructure the EFI_SIMPLE_FILE_SYSTEM_PROTOCOL implementation to
allow for the existence of virtual files that are not simply backed by
a single underlying image.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
iPXE will construct CPIO headers for images that have a non-empty
command line, thereby allowing raw images (without CPIO headers) to be
injected into a dynamically constructed initrd. This feature is
currently implemented within the BIOS-only bzImage format support.
Split out the CPIO header construction logic to allow for reuse in
other contexts such as in a UEFI build.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
DNS names are case-insensitive, and RFC 5280 (unlike RFC 3280)
mandates support for case-insensitive name comparison in X.509
certificates.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Use hexadecimal values instead of macros in PCI_ROM entries so Perl
script can parse them correctly. Move PCI_ROM entries from header
file to C file. Integrate bnxt_vf_nics array into PCI_ROM entries by
introducing BNXT_FLAG_PCI_VF flag into driver_data field. Add
whitespaces in PCI_ROM entries for style consistency.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Support for the zlib and gzip archive image formats is currently
included only if the IMAGE_ARCHIVE_CMD is used to enable the
"imgextract" command.
The ability to transparently execute a single-member archive image
without using the "imgextract" command renders this unintuitive: a
user wanting to gain the ability to boot a gzip-compressed kernel
image would expect to have to enable IMAGE_GZIP rather than
IMAGE_ARCHIVE_CMD.
Reverse the inclusion logic, so that archive image formats must now be
enabled explicitly (via IMAGE_GZIP and/or IMAGE_ZLIB), with the
archive image management commands dragged in as needed if any archive
image formats are enabled. The archive image management commands may
be explicitly disabled via IMAGE_ARCHIVE_CMD if necessary.
This matches the behaviour of IBMGMT_CMD and similar options, where
the relevant commands are included only when something else already
drags in the underlying feature.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
An extracted image is wholly derived from the original archive image.
If the original archive image has been verified and marked as trusted,
then this trust logically extends to any image extracted from it.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Provide image_extract_exec() as a helper method to allow single-member
archive images (such as gzip compressed images) to be executed without
an explicit "imgextract" step.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Avoid using the "rdtsc" instruction unless profiling is enabled. This
allows the non-debug build of the UNDI driver to be used on a CPU such
as a 486 that does not support the TSC.
Reported-by: Nikolai Zhubr <n-a-zhubr@yandex.ru>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The decompressor uses the i486 "bswap" instruction, but does not
require any instructions that exist only on i586 or above. Update the
".arch" directive to reflect the requirements of the code as
implemented.
Reported-by: Martin Habets <habetsm.xilinx@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add the concept of extracting an image from an archive (which could be
a single-file archive such as a gzip-compressed file), along with an
"imgextract" command to expose this functionality to scripts.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allow both x86_64 and arm64 images to be imported in a single import
command, thereby allowing for e.g.
make CONFIG=cloud EMBED=config/cloud/aws.ipxe bin/ipxe.usb
make CONFIG=cloud EMBED=config/cloud/aws.ipxe \
CROSS=aarch64-linux-gnu- bin-arm64-efi/ipxe.usb
../contrib/cloud/aws-import -w amilist.txt -p \
bin/ipxe.usb bin-arm64-efi/ipxe.usb
This simplifies the process of generating a single amilist.txt file
for inclusion in the documentation at https://ipxe.org/howto/ec2
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The AWS console user interface provides no convenient way to sort AMIs
by creation date.
Provide a default AMI name constructed from the current date and CPU
architecture, to simplify the task of finding the most recent iPXE AMI
in a given AWS region.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add an option to generate the amilist.txt list of current AMI images
as included in the EC2 documentation at https://ipxe.org/howto/ec2
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The EFI PCI API takes a page count as the input to AllocateBuffer()
but a byte count as the input to Map(). There is nothing in the UEFI
specification that requires us to map exactly the allocated length,
and no systems have yet been observed that will fail if the map length
does not exactly match the allocated length. However, it is plausible
that some implementations may fail if asked to map a length that does
not match the length of the corresponding allocation.
Avoid potential future problems by always mapping the full allocated
length.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Commit 79c0173 ("[build] Create util/genfsimg for building
filesystem-based images") introduced the new genfsimg, which lacks the
-l option when building ISO files. This option is required to build
level 2 (long plain) ISO9660 filenames, which are required when using
the .lkrn extensions on older versions of ISOLINUX.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The use of jumbo frames for the Xen netfront virtual NIC requires the
use of scatter-gather ("feature-sg"), with the receive descriptor ring
becoming a list of page-sized buffers and the backend using as many
page buffers as required for each packet.
Since iPXE's abstraction of an I/O buffer does not include any sort of
scatter-gather list, this requires an extra allocation and copy on the
receive datapath for any packet that spans more than a single page.
This support is required in order to successfully boot an AWS EC2
virtual machine (with non-enhanced networking) via iSCSI if jumbo
frames are enabled, since the netback driver used in EC2 seems not to
allow "feature-sg" to be renegotiated once the Linux kernel driver
takes over.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The INT 13 extensions provide a mechanism for accessing disks using
linear (LBA) rather than C/H/S addressing. SAN protocols such as
iSCSI invariably support only linear addresses and so iPXE currently
provides LBA access to all SAN disks (with autodetection and emulation
of an appropriate geometry for C/H/S accesses).
Most BIOSes will not report support for INT 13 extensions for floppy
disk drives, and some operating systems may be confused by a floppy
drive that claims such support.
Minimise surprise by reporting the existence of support for INT 13
extensions only for non-floppy drive numbers. Continue to provide
support for all drive numbers, to avoid breaking operating systems
that may unconditionally use the INT 13 extensions without first
checking for support.
Reported-by: Valdo Toost <vtoost@hot.ee>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
When CONSOLE_SYSLOG is used, a DBG() from within a network device
driver may cause its transmit() or poll() methods to be unexpectedly
re-entered. Since these methods are not intended to be re-entrant,
this can lead to undefined behaviour.
Add an explicit re-entrancy guard to both methods. Note that this
must operate at a per-netdevice level, since there are legitimate
circumstances under which the netdev_tx() or netdev_poll() functions
may be re-entered (e.g. when using VLAN devices).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
There is no method for obtaining the number of PCI buses when using
PCIAPI_DIRECT, and we therefore currently scan all possible bus
numbers. This can cause a several-second startup delay in some
virtualised environments, since PCI configuration space access will
necessarily require the involvement of the hypervisor.
Ameliorate this situation by defaulting to scanning only a single bus,
and expanding the number of PCI buses to accommodate any subordinate
buses that are detected during enumeration.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Adding this missing identifier allows the X557-AT2 chipset seen on (at
least) Super Micro A2SDI-H-TF motherboards to function with iPXE.
Signed-off-by: Tyler J. Stachecki <stachecki.tyler@gmail.com>
After a PE image is fully loaded and relocated, the loader code may
opt to zero discardable sections for security reasons. This includes
relocation and debug information, as both contain hints about specific
locations within the binary. Mark both generated sections as
discardable, which follows the PE specification.
Signed-off-by: Marvin Häuser <mhaeuser@posteo.de>
For optimal memory permission management, PE sections need to be
aligned by the platform's minimum page size. Currently, the PE
section alignment is fixed to 32 bytes, which is below the typical 4kB
page size. Align all sections to 4kB and adjust ELF to PE image
conversion accordingly.
Signed-off-by: Marvin Häuser <mhaeuser@posteo.de>
As per https://github.com/ipxe/ipxe/pull/313#issuecomment-816018398,
these sections are not required for EFI execution. Discard them to
avoid implementation-defined alignment malforming binaries.
Signed-off-by: Marvin Häuser <mhaeuser@posteo.de>
Handle a DHCPNAK by returning to the discovery state to allow iPXE to
attempt to obtain a replacement IPv4 address.
Reuse the existing logic for deferring discovery when the link is
blocked: this avoids hammering a misconfigured DHCP server with a
non-stop stream of requests and allows the DHCP process to eventually
time out and fail.
Originally-implemented-by: Blake Rouse <blake.rouse@canonical.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The iPXE build system is constructed for a standalone codebase with no
external dependencies, and does not have any equivalent of the
standard userspace ./configure script. We currently check for the
ability to include slirp/libslirp.h and conditionalise portions of
linux_api.c on its presence. The actual slirp driver code is built
unconditionally, as with all iPXE drivers.
This currently leads to a silent runtime failure if attempting to use
slirp.linux built on a system that was missing slirp/libslirp.h.
Convert this to a link-time failure by deliberately omitting the
relevant symbols from linux_api.c when slirp/libslirp.h is not
present. This allows other builds (e.g. tap.linux or tests.linux) to
succeed: the link-time failure will occur only if the slirp driver is
included within the build target.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Linux kernel 3.12 and earlier report a zero size via stat() for all
ACPI table files in sysfs. There is no way to determine the file size
other than by reading the file until EOF.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Consumers of acpi_find() will assume that returned structures include
a valid table header and that the length in the table header is
correct. These assumptions are necessary when dealing with raw ACPI
tables, since there exists no independent source of length
information.
Ensure that these assumptions are also valid for ACPI tables read from
sysfs.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The statx() system call has a clean header file and a consistent
layout, but was unfortunately added only in kernel 4.11.
Using stat() or fstat() directly is extremely messy since glibc does
not necessarily use the kernel native data structures. However, as
the only current use case is to obtain the length of an open file, we
can merely provide a wrapper that does precisely this.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The DNS server list is currently printed as a debug message whenever
settings are applied. This can result in some very noisy debug logs
when a script makes extensive use of settings.
Move the DNS server list debug messages to DBGLVL_EXTRA.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allow arbitrary settings to be specified on the Linux command line.
For example:
./bin-x86_64-linux/slirp.linux \
--net slirp,testserver=qa-test.ipxe.org
This can be useful when using the Linux userspace build to test
embedded scripts, since it allows arbitrary parameters to be passed
directly on the command line.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Versions of gcc prior to 9.1 do not support the single-argument form
of static_assert(). Fix by unconditionally defining a compatibility
macro for the single file that uses this.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add a driver using libslirp to provide a virtual network interface
without requiring root permissions on the host. This simplifies the
process of running iPXE as a Linux userspace application with network
access. For example:
make bin-x86_64-linux/slirp.linux
./bin-x86_64-linux/slirp.linux --net slirp
libslirp will provide a built-in emulated DHCP server and NAT router.
Settings such as the boot filename may be controlled via command-line
options. For example:
./bin-x86_64-linux/slirp.linux \
--net slirp,filename=http://192.168.0.1/boot.ipxe
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The "used" attribute can be applied only to functions or variables,
which prevents the use of __asmcall as a type attribute.
Fix by removing "used" from the definition of __asmcall for i386 and
x86_64 architectures, and adding explicit __used annotations where
necessary.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The ACPI API currently expects platforms to provide access to a single
contiguous ACPI table. Some platforms (e.g. Linux userspace) do not
provide a convenient way to obtain the entire ACPI table, but do
provide access to individual tables.
All iPXE consumers of the ACPI API require access only to individual
tables.
Redefine the internal API to make acpi_find() an API method, with all
existing implementations delegating to the current RSDT-based
implementation.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The result from acpi_find_rsdt() is used only for the debug message.
Simplify the debug message and remove the otherwise redundant call to
acpi_find_rsdt().
Signed-off-by: Michael Brown <mcb30@ipxe.org>
When building as a Linux userspace application, iPXE currently
implements its own system calls to the host kernel rather than relying
on the host's C library. The output binary is statically linked and
has no external dependencies.
This matches the general philosophy of other platforms on which iPXE
runs, since there are no external libraries available on either BIOS
or UEFI bare metal. However, it would be useful for the Linux
userspace application to be able to link against host libraries such
as libslirp.
Modify the build process to perform a two-stage link: first picking
out the requested objects in the usual way from blib.a but with
relocations left present, then linking again with a helper object to
create a standard hosted application. The helper object provides the
standard main() entry point and wrappers for the Linux system calls
required by the iPXE Linux drivers and interface code.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allow for the possibility of linking to platform libraries for the
Linux userspace build by adding an iPXE-specific symbol prefix.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Recent versions of the GNU assembler (observed with GNU as 2.35 on
Fedora 33) will produce a warning message
Warning: no instruction mnemonic suffix given and no register
operands; using default for `bts'
The operand size affects only the potential range for the bit number.
Since we pass the bit number as an unsigned int, it is already
constrained to 32 bits for both i386 and x86_64.
Silence the assembler warning by specifying an explicit 32-bit operand
size (and thereby matching the choice that the assembler would
otherwise make automatically).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Use the reference implementation of the EFI compression algorithm
(taken from the EDK2 codebase, with minor bugfixes to allow
compilation with -Werror) to compress EFI ROM images.
Inspired-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Assume that preservation of the %xmm registers is unnecessary during
installation of iPXE into memory, since this is an operation that by
its nature substantially disrupts large portions of the system anyway
(such as the E820 memory map). This assumption allows us to utilise
the existing CPUID code to check that FXSAVE/FXRSTOR are supported.
Test for support during the call to init_librm and store the flag for
use during subsequent calls to virt_call.
Reduce the scope of TIVOLI_VMM_WORKAROUND to affecting only the call
to check_fxsr(), to reduce #ifdef pollution in the remaining code.
Debugged-by: Johannes Heimansberg <git@jhe.dedyn.io>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The __asmcall declaration has no effect on a void function with no
parameters, but should be included for completeness since the function
is called directly from assembly code.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Provide a generic raw image prefix, which assumes that the iPXE image
has been loaded in its entirety on a paragraph boundary.
The resulting .raw image can be loaded via RPL using an rpld.conf file
such as:
HOST {
ethernet = 00:00:00:00:00:00/6;
FILE {
path="ipxe.raw";
load=0x2000;
};
execute=0x2000;
};
Debugged-by: Johannes Heimansberg <git@jhe.dedyn.io>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
A zero-length initrd file will currently cause an endless loop during
reshuffling as the empty image is repeatedly swapped with itself.
Fix by terminating the inner loop before considering an image as a
candidate to be swapped with itself.
Reported-by: Pico Mitchell <pico@randomapplications.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Most EFI firmware builds (including those found on ARM64 instances in
AWS EC2) will already send console output to the serial port.
Do not enable direct serial console output in EFI builds using
CONFIG=cloud.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Record the cached DHCPACK obtained from the EFI_PXE_BASE_CODE_PROTOCOL
instance installed on the loaded image's device handle, if present.
This allows a chainloaded UEFI iPXE to reuse the IPv4 address and DHCP
options previously obtained by the built-in PXE stack, as is already
done for a chainloaded BIOS iPXE.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The code to detect the autoboot link-layer address and to load the
autoexec script currently runs before the call to initialise() and so
has to function without a working heap.
This requirement can be relaxed by deferring this code to run via an
initialisation function. This gives the code a normal runtime
environment, but still invokes it early enough to guarantee that the
original loaded image device handle has not yet been invalidated.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The "autoboot device" and "autoexec script" functionalities in
efi_autoboot.c are unrelated except in that they both need to be
invoked by efiprefix.c before device drivers are loaded.
Split out the autoexec script portions to a separate file to avoid
potential confusion.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Split out the portions of cachedhcp.c that can be shared between BIOS
and UEFI (both of which can provide a buffer containing a previously
obtained DHCP packet, and neither of which provide a means to
determine the length of this DHCP packet).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The GCC11 compiler pointed out something that apparently no previous
compiler noticed: in ath5k_eeprom_pread_turbo_modes, local variable
val is used uninitialized. From what I can see, the code is just
missing an initial AR5K_EEPROM_READ. Add it right before the switch
statement.
Signed-off-by: Bruce Rogers <brogers@suse.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add a utility that can be used to upload an iPXE disk image to AWS EC2
as an Amazon Machine Image (AMI). For example:
make CONFIG=cloud EMBED=config/cloud/aws.ipxe bin/ipxe.usb
../contrib/cloud/aws-import -p -n "iPXE 1.21.1" bin/ipxe.usb
Uploads are performed in parallel across all regions, and use the EBS
direct APIs to avoid the need to store temporary files in S3 or to run
VM import tasks.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some versions of GNU ld (observed with binutils 2.36 on Arch Linux)
introduce a .note.gnu.property section marked as loadable at a high
address and with non-empty contents. This adds approximately 128MB of
garbage to the BIOS .usb disk images.
Fix by using a custom linker script for the prefix-only binaries such
as the USB disk partition table and MBR, in order to allow unwanted
sections to be explicitly discarded.
Reported-by: Christian Hesse <mail@eworm.de>
Tested-by: Christian Hesse <mail@eworm.de>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The version of SeaBIOS found on some AWS EC2 instances (observed with
t3a.nano in eu-west-1) has no support for the INT 1A PCI BIOS calls.
Bring config/ioapi.h into the named-configuration set of headers, and
specify the use of PCIAPI_DIRECT for CONFIG=cloud, to work around the
missing PCI BIOS support.
Switching to a different named configuration will now unfortunately
cause an almost complete rebuild of iPXE. As described in commit
c801cb2 ("[build] Allow for named configurations at build time"), this
is the reason why config/ioapi.h was not originally in the
named-configuration set of header files.
This rebuild cost is acceptable given that build times are
substantially faster now than seven years ago, and that very few
people are likely to be switching named configurations on a regular
basis.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The Linux and FreeBSD drivers for the (totally undocumented) ENA
adapters use a two-phase reset mechanism: first set ENA_CTRL.RESET and
wait for this to be reflected in ENA_STAT.RESET, then clear
ENA_CTRL.RESET and again wait for it to be reflected in
ENA_STAT.RESET.
The iPXE driver currently assumes a self-clearing reset mechanism,
which appeared to work at the time that the driver was created but
seems no longer to function, at least on the t3.nano and t3a.nano
instance types found in eu-west-1.
Switch to a simplified version of the two-phase reset mechanism as
used by Linux and FreeBSD.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The semantics of the assembler's .align directive vary by CPU
architecture. For the ARM builds, it specifies a power of two rather
than a number of bytes. This currently leads to the .einfo entries
(which do not appear in the final binary) having an alignment of 256
bytes for the ARM builds.
Fix by switching to the GNU-specific directive .balign, which is
consistent across architectures
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Support for building with the Intel C compiler (icc) was added in 2009
in the expectation that UEFI support would eventually involve
compiling iPXE to EFI Byte Code.
EFI Byte Code has never found any widespread use: no widely available
compilers can emit it, Microsoft refuses to sign EFI Byte Code
binaries for UEFI Secure Boot, and I have personally never encountered
any examples of EFI Byte Code in the wild.
The support for using the Intel C compiler has not been tested in over
a decade, and would almost certainly require modification to work with
current releases of the compiler.
Simplify the build process by removing this old legacy code.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
As of commit 7c3d186 ("[build] Check that mkisofs equivalent supports
the required options"), we may refuse to use a mkisofs equivalent if
it does not support the options required to produce the requested
output file.
This can result in confusing error messages since the user is unaware
of the reason for which the installed mkisofs or genisoimage has been
rejected.
Fix by explicitly reporting the reason why each possible mkisofs
equivalent could not be used.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The scheduled Coverity Scan run is triggered by an external mechanism
that synchronises the coverity_scan branch with the master branch.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some patched versions of gcc (observed with gcc 9.3.0 on Ubuntu 20.04)
enable -fcf-protection=full by default. This breaks code that is not
explicitly written to expect the use of this flag. The breakage
occurs only at runtime if the affected code (such as setjmp()) happens
to execute, and is therefore a particularly pernicious class of bug to
be introduced into working code by a broken compiler.
Work around these broken patched versions of gcc by detecting support
for -fcf-protection and explicitly setting -fcf-protection=none if
found.
If any Ubuntu maintainers are listening: PLEASE STOP DOING THIS. It's
extremely unhelpful to have to keep working around breakages that you
introduce by modifying the compiler's default behaviour. Do what Red
Hat does instead: set your preferred CFLAGS within the package build
system rather than by patching the compiler to behave in violation of
its own documentation.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Several distributions include versions of gcc that are patched to
create position-independent executables by default. These have caused
multiple problems over the years: see e.g. commits fe61f6d ("[build]
Fix compilation when gcc is patched to default to -fPIE -Wl,-pie"),
5de1346 ("[build] Apply the "-fno-PIE -nopie" workaround only to i386
builds"), 7c395b0 ("[build] Use -no-pie on newer versions of gcc"),
and decee20 ("[build] Disable position-independent code for ARM64 EFI
builds").
The build system currently attempts to work around these mildly broken
patched versions of gcc for the i386 and arm64 architectures. This
misses the relatively obscure bin-x86_64-pcbios build platform, which
turns out to also require the same workaround.
Attempt to preempt the next such required workaround by moving the
existing i386 version to apply to all platforms and all architectures,
unless -fpie has been requested explicitly by another Makefile (as is
done by arch/x86_64/Makefile.efi).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The function trace recorder build logic defaults to making "clean" a
dependency of the first build in a clean checkout. This is redundant
and causes problems if the build process spins up multiple make
invocations to handle multiple build architectures.
Fix by replacing with logic based on the known-working patterns used
for the ASSERT and PROFILE build parameters.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
After a ton of tedious work, I am pleased to finally introduce full
support for ConnectX-3 cards in iPXE!
The work has been done by finding all publicly available versions of
the Mellanox Flexboot sources, cleaning them up, synthesizing a git
history from them, cleaning out non-significant changes, and
correlating with the iPXE upstream git history.
After this, a proof-of-concept diff was produced, that allowed iPXE to
be compiled with rudimentary ConnectX-3 support. This diff was over
10k lines, and contained many changes that were not part of the core
driver.
Special thanks to Michael Brown <mcb30@ipxe.org> for answering my
barrage of questions, and helping brainstorm the development along the
way.
Signed-off-by: Christian Iversen <ci@iversenit.dk>
Some network devices can take a substantial time to close and reopen.
Avoid closing the device from which we are about to attempt booting,
in case it happens to be already open.
Suggested-by: Christian Iversen <ci@iversenit.dk>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The CQE length field will not be valid for a completion in error.
Avoid parsing the length field and just call the completion handler
directly.
In debug builds, also dump the queue pair context to allow for
inspection of the error.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Check for reset completion by waiting for the device to respond to PCI
configuration cycles, as documented in the Programmer's Reference
Manual. On the original ConnectX HCA, this reduces the time spent on
reset from 1000ms down to 1ms.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
When auto-detecting the initial port type, the Hermon driver will spam
the debug output without hesitation. Add a short delay in each
iteration to fix this.
Signed-off-by: Christian Iversen <ci@iversenit.dk>
Inspired by Flexboot, the function hermon_event_port_mgmnt_change() is
added to handle the HERMON_EV_PORT_MGMNT_CHANGE event type, which
updates the Infiniband subsystem.
Signed-off-by: Christian Iversen <ci@iversenit.dk>
Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Hermon Ethernet work queues have more RX than TX entries, unlike most
other drivers. This is possibly the source of some stochastic
deadlocks previously experienced with this driver.
Update the sizes to be in line with other drivers, and make them
slightly larger for better performance. These new queue sizes have
been found to work well with ConnectX-3 hardware.
Signed-off-by: Christian Iversen <ci@iversenit.dk>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The programming documentation states that the reset magic value is
"0x00000001 (Big Endian)", and the current code matches this by using
the value 0x01000000 for the implicitly little-endian writel().
Inspection of the FlexBoot source code reveals an exciting variety of
reset values, some suggestive of confusion around endianness.
Experimentation suggests that the value 0x01000001 works reliably
across a wide range of hardware.
Debugged-by: Christian Iversen <ci@iversenit.dk>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some older versions of the hardware (and/or firmware) do not report an
event when an Infiniband link reaches the INIT state. The driver
works around this missing event by calling ib_smc_update() on each
event queue poll while the link is in the DOWN state.
Commit 6cb12ee ("[hermon] Increase polling rate for command
completions") addressed this by speeding up the time taken to issue
each command invoked by ib_smc_update(). Experimentation shows that
the impact is still significant: for example, in a situation where an
unplugged port is opened, the throughput on the other port can be
reduced by over 99%.
Fix by throttling the rate at which link polling is attempted.
Debugged-by: Christian Iversen <ci@iversenit.dk>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The version of awk used in FreeBSD seems to be incapable of formatting
unsigned 32-bit integers above 0x80000000 and will silently render any
such value as 0x80000000. For example:
echo 3735928559 | awk '{printf "0x%08x", $1}'
will produce 0x80000000 instead of the correct 0xdeadbeef.
This results in an approximately 50% chance of a build ID collision
when building on FreeBSD.
Work around this problem by passing the decimal value directly in the
ld --defsym argument value.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add a few more ABSOLUTE() expressions to convince the FreeBSD linker
that already-absolute symbols are, in fact, absolute.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The elftoolchain version of objcopy (as used in FreeBSD) seems to be
unusable for generating a raw binary file, since it will apparently
ignore the load memory addresses specified for each section in the
input file.
The binutils version of objcopy may be used on FreeBSD by specifying
OBJCOPY=/usr/local/bin/objcopy
Detect an attempt to use the unusable elftoolchain version of objcopy
and report it as an error.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some versions of objcopy will spuriously complain when asked to
extract the .zinfo section since doing so will nominally alter the
load addresses of the (non-loadable) .bss.* sections.
Avoid these warnings by placing the .zinfo section at the very end of
the load memory address space.
Allocate non-overlapping load memory addresses for the (non-loadable)
.bss.* sections, in the hope of avoiding spurious warnings about
overlapping load addresses.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Calculate the build ID as a checksum over the input files. Since the
input files include $(BIN)/version.%.o which itself includes the build
target name (from which TGT_LD_FLAGS is calculated), this should be
sufficient to meet the requirement that the build ID be unique for
each $(BIN)/%.tmp even within the same build run.
Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
When using $(shell), make will first invoke BUILD_ID_CMD and then have
the value defined when calling $(LD). This means we get to see the
_build_id when building with make V=1. Previously the build_id was
figured out as a subshell command run during the recipe execution
without being able to see the build_id itself.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Directories may be left behind by failed filesystem image builds, and
will not currently be successfully removed by a "make clean".
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The elf.h on FreeBSD defines ELF_R_TYPE and ELF_R_SYM (based on the
host platform) and omits some but not all of the AArch64 relocation
types.
Fix by undefining ELF_R_TYPE and ELF_R_SYM in favour of our own
definitions, and by placing each potentially missing relocation type
within an individual #ifdef guard.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The -boot-info-table option to mkisofs will cause it to overwrite a
portion of the local copy of isolinux.bin. Ensure that this file is
writable.
Originally-implemented-by: Nikolai Lifanov <lifanov@mail.lifanov.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Make the contents of $(BLIB) deterministic to allow it to be
subsequently used for calculating a build ID.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
This change is ported from Flexboot sources. When stopping a Hermon
device, perform hermon_unmap_mpt() which runs HERMON_HCR_HW2SW_MPT to
bring the Memory Protection Table (MPT) back to software control.
Signed-off-by: Christian Iversen <ci@iversenit.dk>
Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The eIPoIB local Ethernet MAC is currently constructed from the port
GUID. Given a base GUID/MAC value of N, Mellanox seems to populate:
Node GUID: N + 0
Port 1 GUID: N + 1
Port 2 GUID: N + 2
and
Port 1 MAC: N + 0
Port 2 MAC: N + 1
This causes a duplicate local MAC address when port 1 is configured as
Infiniband and port 2 as Ethernet, since both will derive their MAC
address as (N + 1).
Fix by using the port's Ethernet MAC as the eIPoIB local EMAC. This
is a behavioural change that could potentially break configurations
that rely on the local EMAC value, such as a DHCP server relying on
the chaddr field for DHCP reservations.
Signed-off-by: Christian Iversen <ci@iversenit.dk>
Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some older versions of the hardware (and/or firmware) do not report an
event when an Infiniband link reaches the INIT state. The driver
works around this missing event by calling ib_smc_update() on each
event queue poll while the link is in the DOWN state. This results in
a very large number of commands being issued while any open Infiniband
link is in the DOWN state (e.g. unplugged), to the point that the 1ms
delay from waiting for each command to complete will noticeably affect
responsiveness.
Fix by decreasing the command completion polling delay from 1ms to
10us.
Signed-off-by: Christian Iversen <ci@iversenit.dk>
Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add hermon_dump_eqctx() for dumping the event queue context and
hermon_dump_eqes() for dumping any unconsumed event queue entries.
Originally-implemented-by: Christian Iversen <ci@iversenit.dk>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some commands (particularly in relation to device initialization) can
occasionally take longer than 2 seconds, and the Mellanox documentation
recommends a 10 second timeout.
Signed-off-by: Christian Iversen <ci@iversenit.dk>
The original EFI_SIMPLE_TEXT_INPUT_PROTOCOL is not technically
required to handle the use of the Ctrl key, and the long-obsolete EFI
1.10 specification lists only backspace, tab, linefeed, and carriage
return as required. Some particularly brain-dead vendor UEFI firmware
implementations dutifully put in the extra effort of ensuring that all
other control characters (such as Ctrl-C) are impossible to type via
EFI_SIMPLE_TEXT_INPUT_PROTOCOL.
Current versions of the UEFI specification mandate that the console
input handle must support both EFI_SIMPLE_TEXT_INPUT_PROTOCOL and
EFI_SIMPLE_TEXT_INPUT_EX_PROTOCOL, the latter of which at least
provides access to modifier key state.
Unlike EFI_SIMPLE_TEXT_INPUT_PROTOCOL, the pointer to the
EFI_SIMPLE_TEXT_INPUT_EX_PROTOCOL instance does not appear within the
EFI system table and must therefore be opened explicitly. The UEFI
specification provides no safe way to do so, since we cannot open the
handle BY_DRIVER or BY_CHILD_CONTROLLER and so nothing guarantees that
this pointer will remain valid for the lifetime of iPXE. We must
simply hope that no UEFI firmware implementation ever discovers a
motivation for reinstalling the EFI_SIMPLE_TEXT_INPUT_EX_PROTOCOL
instance.
Use EFI_SIMPLE_TEXT_INPUT_EX_PROTOCOL if available, falling back to
the existing EFI_SIMPLE_TEXT_PROTOCOL otherwise.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
iPXE when used as a NIC option ROM can sometimes be reloaded by the
UEFI/BIOS and any pre-initialised memory will remain loaded. When the
imgtrust command is run it sets `require_trusted_images'. Upon
reloading, iPXE tries to load the first embedded image but fails as it
is not marked trusted.
Setting this flag ensures that imgtrust with the first embedded script
is reentrant.
Signed-off-by: Joe Groocock <jgroocock@cloudflare.com>
Require drivers to report the total number of Infiniband ports. This
is necessary to report the correct number of ports on devices with
dynamic port types.
For example, dual-port Mellanox cards configured for (eth, ib) would
be rejected by the subnet manager, because they report using "port 2,
out of 1".
Signed-off-by: Christian Iversen <ci@iversenit.dk>
This is useful on devices that perform auto-detection for ports.
Example output:
iPXE> ifstat
net0: 00:11:22:33:44:55 using mt4099 on 0000:00:03.0 (Ethernet) [open]
[Link:down, TX:0 TXE:0 RX:0 RXE:0]
[Link status: Unknown (http://ipxe.org/1a086101)]
net1: 00:11:22:33:44:56 using mt4099 on 0000:00:03.0 (IPoIB) [open]
[Link:down, TX:0 TXE:0 RX:0 RXE:0]
[Link status: Initialising (http://ipxe.org/1a136101)]
Signed-off-by: Christian Iversen <ci@iversenit.dk>
When booting iPXE from a filesystem (e.g. a FAT-formatted USB key) it
can be useful to have an iPXE script loaded automatically from the
same filesystem. Compared to using an embedded script, this has the
advantage that the script can be edited without recompiling the iPXE
binary.
For the BIOS version of iPXE, loading from a filesystem is handled
using syslinux (or isolinux) which allows the script to be passed to
the iPXE .lkrn image as an initrd.
For the UEFI version of iPXE, the platform firmware loads the iPXE
.efi image directly and there is currently no equivalent of the BIOS
initrd mechanism.
Add support for automatically loading a file "autoexec.ipxe" (if
present) from the root of the filesystem containing the UEFI iPXE
binary.
A combined BIOS and UEFI image for a USB key can be created using e.g.
./util/genfsimg -o usbkey.img -s myscript.ipxe \
bin-x86_64-efi/ipxe.efi bin/ipxe.lkrn
The file "myscript.ipxe" would appear as "autoexec.ipxe" on the USB
key, and would be loaded automatically on both BIOS and UEFI systems.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Consolidate the remaining logic common to initrd_init() and imgmem()
into a shared image_memory() function.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The "-e" option required for creating EFI boot images is supported
only by widely used patched versions of genisoimage.
Check that the required options are supported when selecting a mkisofs
equivalent, thereby allowing a fallback to the use of xorrisofs when
building a UEFI ISO image on a system with an unpatched version of
genisoimage.
Continue to prefer the use of genisoimage over xorrisofs, since there
is apparently no way to inhibit the irritatingly useless startup
banner message printed by xorrisofs even when the "-quiet" option is
specified.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Provide some visibility into the turnaround times on both client and
server sides as perceived by iPXE, to assist in debugging inexplicably
slow TFTP transfers.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Provide the "imgmem" command to create an image from an existing block
of memory, for debugging purposes only.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
For FAT filesystem images larger than a 1.44MB floppy disk, round up
the image size to a whole number of 504kB cylinders before formatting.
This avoids losing up to a cylinder's worth of expected space in the
filesystem image.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allow generated filesystem images to be accessed using the file:// URI
syntax by setting a defined volume name. This allows a script placed
on the same filesystem image to be accessed using e.g.
chain file://iPXE/script.ipxe
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Extract the PE header offset from the MZ header rather than assuming a
fixed offset as used in the binaries created by the iPXE build system.
This allows genfsimg to be used to create bootable filesystem images
from third party UEFI binaries such as the UEFI shell.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
With the default timeouts for Cisco MAC Authentication Bypass, the
link will remain blocked for around 90 seconds (plus a likely
subsequent delay for STP).
Extend the maximum number of DHCP discovery deferrals to allow for up
to three minutes of waiting for a link to become unblocked.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
A switch port using 802.1x authentication will send EAP
Request-Identity packets once the physical link is up, and will not be
forwarding packets until the port identity has been established.
We do not currently support 802.1x authentication. However, a
reasonably common configuration involves using a preset list of
permitted MAC addresses, with the "authentication" taking place
between the switch and a RADIUS server. In this configuration, the
end device does not need to perform any authentication step, but does
need to be prepared for the switch port to fail to forward packets for
a substantial time after physical link-up. This exactly matches the
"blocked link" semantics already used when detecting a non-forwarding
switch port via LACP or STP.
Treat a received EAP Request-Identity as indicating a blocked link.
Unlike LACP or STP, there is no way to determine the expected time
until the next EAP packet and so we must choose a fixed timeout.
Erroneously assuming that the link is blocked is relatively harmless
since we will still attempt to transmit and receive data even over a
link that is marked as blocked, and so the net effect is merely to
prolong DHCP attempts. In contrast, erroneously assuming that the
link is unblocked will potentially cause DHCP to time out and give up,
resulting in a failed boot.
The default EAP Request-Identity interval in Cisco switches (where
this is most likely to be encountered in practice) is 30 seconds, so
choose 45 seconds as a timeout that is likely to avoid gaps during
which we falsely assume that the link is unblocked.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Replace the GPL2+-only EAPoL code (currently used only for WPA) with
new code licensed under GPL2+-or-UBDL.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Continue to transmit DHCPDISCOVER while waiting for a blocked link, in
order to support mechanisms such as Cisco MAC Authentication Bypass
that require repeated transmission attempts in order to trigger the
action that will result in the link becoming unblocked.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add support for xorrisofs, a GNU mkisofs equivalent that is available
in most distro repositories.
Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some versions of gcc (observed with gcc 9.3.0 on NixOS Linux) produce
a spurious warning about an out-of-bounds array access for the
isa_extra_probe_addrs[] array.
Work around this compiler bug by redefining the array index as a
signed long, which seems to somehow avoid this spurious warning.
Debugged-by: Manuel Mendez <mmendez534@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Generalise util/geniso, util/gensdsk, and util/genefidsk to create a
single script util/genfsimg that can be used to build either FAT
filesystem images or ISO images.
Extend the functionality to allow for building multi-architecture UEFI
bootable ISO images and combined BIOS+UEFI images.
For example:
./util/genfsimg -o combined.iso \
bin-x86_64-efi/ipxe.efi \
bin-arm64-efi/ipxe.efi \
bin/ipxe.lkrn
would generate a hybrid image that could be used as a CDROM (or hard
disk or USB key) on legacy BIOS, x86_64 UEFI, or ARM64 UEFI.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some UEFI device drivers will react to an asynchronous USB transfer
failure by dubiously terminating the scheduled transfer from within
the completion handler.
We already have code from commit fbb776f ("[efi] Leave USB endpoint
descriptors in existence until device is removed") that avoids freeing
memory in this situation, in order to avoid use-after-free bugs. This
is not sufficient to avoid potential problems, since with an xHCI
controller the act of closing the endpoint requires issuing a command
and awaiting completion via the event ring, which may in turn dispatch
further USB transfer completion events.
Avoid these problems by leaving the USB endpoint open (but with the
refill timer stopped) until the device is finally removed, as is
already done for control and bulk transfers.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Ensure that any command failure messages are followed up with an error
message indicating what the failed command was attempting to perform.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The xHCI driver can handle only a single command TRB in progress at
any one time. Immediately fail any attempts to issue concurrent
commands (which should not occur in normal operation).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
There may be multiple instances of EFI_PCI_ROOT_BRIDGE_IO_PROTOCOL for
a single PCI segment. Use the bus number range descriptor from the
ACPI resource list to identify the correct protocol instance.
There is some discrepancy between the ACPI and UEFI specifications
regarding the interpretation of values within the ACPI resource list.
The ACPI specification defines the min/max field values to be within
the secondary (device-side) address space, and defines the offset
field value as "the offset that must be added to the address on the
secondary side to obtain the address on the primary side".
The UEFI specification states instead that the offset field value is
the "offset to apply to the starting address to convert it to a PCI
address", helpfully omitting to clarify whether "to apply" in this
context means "to add" or "to subtract". The implication of the
wording is also that the "starting address" is not already a "PCI
address" and must therefore be a host-side address rather than the
ACPI-defined device-side address.
Code comments in the EDK2 codebase seem to support the latter
(non-ACPI) interpretation of these ACPI structures. For example, in
the PciHostBridgeDxe driver there can be found the comment
Macros to translate device address to host address and vice versa.
According to UEFI 2.7, device address = host address + translation
offset.
along with a pair of macros TO_HOST_ADDRESS() and TO_DEVICE_ADDRESS()
which similarly negate the sense of the "translation offset" from the
definition found in the ACPI specification.
The existing logic in efipci_ioremap() (based on a presumed-working
externally contributed patch) applies the non-ACPI interpretation: it
assumes that min/max field values are host-side addresses and that the
offset field value is negated.
Match this existing logic by assuming that min/max field values are
host-side bus numbers. (The bus number offset value is therefore not
required and so can be ignored.)
As noted in commit 9b25f6e ("[efi] Fall back to assuming identity
mapping of MMIO address space"), some systems seem to fail to provide
MMIO address space descriptors. Assume that some systems may
similarly fail to provide bus number range descriptors, and fall back
in this situation to assuming that matching on segment number alone is
sufficient.
Testing any of this is unfortunately impossible without access to
esoteric hardware that actually uses non-zero translation offsets.
Originally-implemented-by: Thomas Walker <twalker@twosigma.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Email from solarflare.com will stop working, so update those. Remove
email for Shradha Shah, as she is not involved with this any more.
Update copyright notices for files touched.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
We surface this debugging information in cases where a cert actually
lacks an issuer, but also in cases where it *has* an issuer, but we
cannot trust it (e.g. due to issues in establishing a trust chain).
Signed-off-by: Josh McSavaney <me@mcsau.cc>
Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
iPXE seems to be almost alone in the UEFI world in attempting to shut
down cleanly, free resources, and leave hardware in a well-defined
reset state before handing over to the booted operating system.
The UEFI driver model does allow for graceful shutdown via
uninstallation of protocol interfaces. However, virtually no other
UEFI drivers do this, and the external code paths that react to
uninstallation are consequently poorly tested. This leads to a
proliferation of bugs found in UEFI implementations in the wild, as
described in commits such as 1295b4a ("[efi] Allow initialisation via
SNP interface even while claimed") or b6e2ea0 ("[efi] Veto the HP
XhciDxe Driver").
Try to avoid triggering such bugs by unconditionally skipping the
protocol interface uninstallation during UEFI boot services shutdown,
leaving the interfaces present but nullified and deliberately leaking
the containing memory.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
USB tethering via an iPhone is unreasonably complicated due to the
requirement to perform a pairing operation that involves establishing
a TLS session over a completely unrelated USB function that speaks a
protocol that is almost, but not quite, entirely unlike TCP.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Record the root of trust used at the point that a certificate is
validated, redefine validation as checking a certificate against a
specific root of trust, and pass an explicit root of trust when
creating a TLS connection.
This allows a custom TLS connection to be used with a custom root of
trust, without causing any validated certificates to be treated as
valid for normal purposes.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
OCSP currently calls x509_validate() with an empty root certificate
list, on the basis that the OCSP signer certificate (if existent) must
be signed directly by the issuer certificate.
Using an empty root certificate list is not required to achieve this
goal, since x509_validate() already accepts an explicit issuer
certificate parameter. The explicit empty root certificate list
merely prevents the signer certificate from being evaluated as a
potential trusted root certificate.
Remove the dummy OCSP root certificate list and use the default root
certificate list when calling x509_validate().
Signed-off-by: Michael Brown <mcb30@ipxe.org>
There is nothing OID-specific about the ASN1_OID_CURSOR macro. Rename
to allow it to be used for constructing ASN.1 cursors with arbitrary
contents.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Use the existing certificate store to automatically append any
available issuing certificates to the selected client certificate.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Restructure the use of add_tls() to insert a TLS filter onto an
existing interface. This allows for the possibility of using
add_tls() to start TLS on an existing connection (as used in several
protocols which will negotiate the choice to use TLS before the
ClientHello is sent).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Generalise the filter interface insertion logic from block_translate()
and expose as intf_insert(), allowing a filter interface to be
inserted on any existing interface.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The HP XhciDxe driver (observed on an HP EliteBook 840 G6) does not
respond correctly to driver disconnection, and will leave the PciIo
protocol instance opened with BY_DRIVER attributes even after
returning successfully from its Stop() method. This prevents iPXE
from subsequently connecting to the PCI device handle.
Veto this driver if the iPXE build includes a native xHCI driver.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some UEFI drivers (observed with the "Usb Xhci Driver" on an HP
EliteBook) are particularly badly behaved: they cannot be unloaded and
will leave handles opened with BY_DRIVER attributes even after
disconnecting the driver, thereby preventing a replacement iPXE driver
from opening the handle.
Allow such drivers to be vetoed by falling back to a brute-force
mechanism that will disconnect the driver from all handles, uninstall
the driver binding protocol (to prevent it from attaching to any new
handles), and finally close any stray handles that the vetoed driver
has left open.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Most veto checks are likely to use the manufacturer name and driver
name, so pass these as parameters to minimise code duplication.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allow external code to dump the information for an opened protocol
information entry via DBG_EFI_OPENER() et al.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some devices (e.g. xHCI USB host controllers) may require the use of
large areas of host memory for private use by the device. These
allocations cannot be satisfied from iPXE's limited heap space, and so
are currently allocated using umalloc() which will allocate external
system memory (and alter the system memory map as needed).
Provide dma_umalloc() to provide such allocations as part of the DMA
API, since there is otherwise no way to guarantee that the allocated
regions are usable for coherent DMA.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The UEFI specification does not prohibit zero-length DMA mappings.
However, there is a reasonable chance that at least one implementation
will treat it as an invalid parameter. As a precaution, avoid calling
EFI_PCI_IO_PROTOCOL.Map() with a length of zero.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Unlike netdev_rx_err(), there is no valid circumstance under which
netdev_rx() may be called with a null I/O buffer, since a call to
netdev_rx() represents the successful reception of a packet. Fix the
code comment to reflect this.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
netdev_tx_err() may be called with a null I/O buffer (e.g. to record a
transmit error with no associated buffer). Avoid a potential null
pointer dereference in the DMA unmapping code path.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Include a potential DMA mapping within the definition of an I/O
buffer, and move all I/O buffer DMA mapping functions from dma.h to
iobuf.h. This avoids the need for drivers to maintain a separate list
of DMA mappings for each I/O buffer that they may handle.
Network device drivers typically do not keep track of transmit I/O
buffers, since the network device core already maintains a transmit
queue. Drivers will typically call netdev_tx_complete_next() to
complete a transmission without first obtaining the relevant I/O
buffer pointer (and will rely on the network device core automatically
cancelling any pending transmissions when the device is closed).
To allow this driver design approach to be retained, update the
netdev_tx_complete() family of functions to automatically perform the
DMA unmapping operation if required. For symmetry, also update the
netdev_rx() family of functions to behave the same way.
As a further convenience for drivers, allow the network device core to
automatically perform DMA mapping on the transmit datapath before
calling the driver's transmit() method. This avoids the need to
introduce a mapping error handling code path into the typically
error-free transmit methods.
With these changes, the modifications required to update a typical
network device driver to use the new DMA API are fairly minimal:
- Allocate and free descriptor rings and similar coherent structures
using dma_alloc()/dma_free() rather than malloc_phys()/free_phys()
- Allocate and free receive buffers using alloc_rx_iob()/free_rx_iob()
rather than alloc_iob()/free_iob()
- Calculate DMA addresses using dma() or iob_dma() rather than
virt_to_bus()
- Set a 64-bit DMA mask if needed using dma_set_mask_64bit() and
thereafter eliminate checks on DMA address ranges
- Either record the DMA device in netdev->dma, or call iob_map_tx() as
part of the transmit() method
- Ensure that debug messages use virt_to_phys() when displaying
"hardware" addresses
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Redefine the value stored within a DMA mapping to be the offset
between physical addresses and DMA addresses within the mapped region.
Provide a dma() wrapper function to calculate the DMA address for any
pointer within a mapped region, thereby simplifying the use cases when
a device needs to be given addresses other than the region start
address.
On a platform using the "flat" DMA implementation the DMA offset for
any mapped region is always zero, with the result that dma_map() can
be optimised away completely and dma() reduces to a straightforward
call to virt_to_phys().
Signed-off-by: Michael Brown <mcb30@ipxe.org>
iPXE will currently fail all SNP interface methods with EFI_NOT_READY
while the network devices are claimed for use by iPXE's own network
stack.
As of commit c70b3e0 ("[efi] Always enable recursion when calling
ConnectController()"), this exposes latent UEFI firmware bugs on some
systems at the point of calling ExitBootServices().
With recursion enabled, the MnpDxe driver will immediately attempt to
consume the SNP protocol instance provided by iPXE. Since the network
devices are claimed by iPXE at this point, the calls by MnpDxe to
Start() and Initialize() will both fail with EFI_NOT_READY.
This unfortunately triggers a broken error-handling code path in the
Ip6Dxe driver. Specifically: Ip6DriverBindingStart() will call
Ip6CreateService(), which will call Ip6ServiceConfigMnp(), which will
return an error. The subsequent error handling code path in
Ip6CreateService() simply calls Ip6CleanService(). The code in
Ip6CleanService() will attempt to leave the all-nodes multicast group,
which will fail since the group was never joined. This will result in
Ip6CleanService() returning an error and omitting most of the required
clean-up operations. In particular, the MNP protocol instance will
remain opened with BY_DRIVER attributes even though the Ip6Dxe driver
start method has failed.
When ExitBootServices() is eventually called, iPXE will attempt to
uninstall the SNP protocol instance. This results in the UEFI core
calling Ip6DriverBindingStop(), which will fail since there is no
EFI_IP6_SERVICE_BINDING_PROTOCOL instance installed on the handle.
A failure during a call to UninstallMultipleProtocolInterfaces() will
result in the UEFI core attempting to reinstall any successfully
uninstalled protocols. This is an intrinsically unsafe operation, and
represents a fundamental design flaw in UEFI. Failure code paths
cannot be required to themselves handle failures, since there is no
well-defined correct outcome of such a situation.
With a current build of OVMF, this results in some unexpected debug
messages occurring at the time that the loaded operating system calls
ExitBootServices(). With the UEFI firmware in Hyper-V, the result is
an immediate reboot.
Work around these UEFI design and implementation flaws by allowing the
calls to our EFI_SIMPLE_NETWORK_PROTOCOL instance's Start() and
Initialize() methods to return success even when the network devices
are claimed for exclusive use by iPXE. This is sufficient to allow
MnpDxe to believe that it has successfully initialised the device, and
thereby avoids the problematic failure code paths in Ip6Dxe.
Debugged-by: Aaron Heusser <aaron_heusser@hotmail.com>
Debugged-by: Pico Mitchell <pico@randomapplications.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
For the physical function driver, the transmit queue needs to be
configured to be associated with the relevant physical function
number. This is currently obtained from the bus:dev.fn address of the
underlying PCI device.
In the case of a virtual machine using the physical function via PCI
passthrough, the PCI bus:dev.fn address within the virtual machine is
unrelated to the real physical function number. Such a function will
typically be presented to the virtual machine as a single-function
device. The function number extracted from the PCI bus:dev.fn address
will therefore always be zero.
Fix by reading from the Function Requester ID Information Register,
which always returns the real PCI bus:dev.fn address as used by the
physical host.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The datasheet is fairly incomprehensible in terms of identifying the
appropriate MAC address for use by the physical function driver.
Choose to read the MAC address from PRTPM_SAH and PRTPM_SAL, which at
least matches the MAC address as selected by the Linux i40e driver.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
iPXE will currently drop to TPL_APPLICATION whenever the current
system time is obtained via currticks(), since the system time
mechanism relies on a timer that can fire only when the TPL is below
TPL_CALLBACK.
This can cause unexpected behaviour if the system time is obtained in
the middle of an API call into iPXE by external code. For example,
MnpDxe sets up a 10ms periodic timer running at TPL_CALLBACK to poll
the underling EFI_SIMPLE_NETWORK_PROTOCOL device for received packets.
If the resulting poll within iPXE happens to hit a code path that
requires obtaining the current system time (e.g. due to reception of
an STP packet, which affects iPXE's blocked link timer), then iPXE
will end up temporarily dropping to TPL_APPLICATION. This can
potentially result in retriggering the MnpDxe periodic timer, causing
code to be unexpectedly re-entered.
Fix by recording the external TPL at any entry point into iPXE and
dropping only as far as this external TPL, rather than dropping
unconditionally to TPL_APPLICATION.
The side effect of this change is that iPXE's view of the current
system time will be frozen for the duration of any API calls made into
iPXE by external code at TPL_CALLBACK or above. Since any such
external code is already responsible for allowing execution at
TPL_APPLICATION to occur, then this should not cause a problem in
practice.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Physical addresses in debug messages are more meaningful from an
end-user perspective than potentially IOMMU-mapped I/O virtual
addresses, and have the advantage of being calculable without access
to the original DMA mapping entry (e.g. when displaying an address for
a single failed completion within a descriptor ring).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
For a software UNDI, the addresses in PXE_CPB_TRANSMIT.FrameAddr and
PXE_CPB_RECEIVE.BufferAddr are host addresses, not bus addresses.
Remove the spurious (and no-op) use of virt_to_bus() and replace with
a cast via intptr_t.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The UEFI specification defines PXE_CPB_TRANSMIT.DataLen as excluding
the length of the media header. iPXE currently fills in DataLen as
the whole frame length (including the media header), along with
placing the media header length separately in MediaheaderLen. On some
UNDI implementations (observed using a VMware ESXi 7.0b virtual
machine), this causes transmitted packets to include 14 bytes of
trailing garbage.
Match the behaviour of the EDK2 SnpDxe driver, which fills in DataLen
as the whole frame length (including the media header) and leaves
MediaheaderLen as zero. This behaviour also violates the UEFI
specification, but is likely to work in practice since EDK2 is the
reference implementation.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
iPXE currently assumes that DMA-capable devices can directly address
physical memory using host addresses. This assumption fails when
using an IOMMU.
Define an internal DMA API with two implementations: a "flat"
implementation for use in legacy BIOS or other environments in which
flat physical addressing is guaranteed to be used and all allocated
physical addresses are guaranteed to be within a 32-bit address space,
and an "operations-based" implementation for use in UEFI or other
environments in which DMA mapping may require bus-specific handling.
The purpose of the fully inlined "flat" implementation is to allow the
trivial identity DMA mappings to be optimised out at build time,
thereby avoiding an increase in code size for legacy BIOS builds.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The malloc_dma() function allocates memory with specified physical
alignment, and is typically (though not exclusively) used to allocate
memory for DMA.
Rename to malloc_phys() to more closely match the functionality, and
to create name space for functions that specifically allocate and map
DMA-capable buffers.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Provide opened EFI PCI devices with access to the underlying
EFI_PCI_IO_PROTOCOL instance, in order to facilitate the future use of
the DMA mapping methods within the fast data path.
Do not require the use of this stored EFI_PCI_IO_PROTOCOL instance for
memory-mapped I/O (since the entire point of memory-mapped I/O as a
concept is to avoid this kind of unnecessary complexity) or for
slow-path PCI configuration space accesses (since these may be
required for access to PCI bus:dev.fn addresses that do not correspond
to a device bound via our driver binding protocol instance).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The legacy transmit descriptor index is not reset by anything short of
a full device reset. This can cause the legacy transmit ring to stall
after closing and reopening the device, since the hardware and
software indices will be out of sync.
Fix by performing a reset after closing the interface. Do this only
if operating in legacy mode, since in C+ mode the reset is not
required and would undesirably clear additional state (such as the C+
command register itself).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some UEFI systems (observed with a Supermicro X11SPG-TF motherboard)
seem to fail to provide a valid ACPI address space descriptor for the
MMIO address space associated with a PCI root bridge.
If no valid descriptor can be found, fall back to assuming that the
MMIO address space is identity mapped, thereby matching the behaviour
prior to commit 27e886c ("[efi] Use address offset as reported by
EFI_PCI_ROOT_BRIDGE_IO_PROTOCOL").
Debugged-by: Tore Anderson <tore@fud.no>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Commit 87e39a9c9 ("[efi] Split efi_usb_path() out to a separate
function") unintentionally introduced an undefined symbol reference
from efi_path.o to usb_depth(), causing the USB subsystem to become a
dependency of all EFI builds.
Fix by converting usb_depth() to a static inline function.
Reported-by: Pico Mitchell <pico@randomapplications.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The UEFI specification allows uninstallation of a protocol interface
to fail. There is no sensible way for code to react to this, since
uninstallation is likely to be taking place on a code path that cannot
itself fail (e.g. a code path that is itself a failure path).
Where the protocol structure exists within a dynamically allocated
block of memory, this leads to possible use-after-free bugs. Work
around this unfortunate design choice by nullifying the protocol
(i.e. overwriting the method pointers with no-ops) and leaking the
memory containing the protocol structure.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Use the device path constructed via efi_describe() for the installed
EFI_BLOCK_IO_PROTOCOL device handle.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The UEFI specification provides a partial definition of an Infiniband
device path structure. Use this structure to construct what may be a
plausible path containing at least some of the information required to
identify an SRP target device.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The ACPI table contents are typically large and are likely to cause
any preceding error messages to scroll off-screen.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
There is no standard defined for AoE device paths in the UEFI
specification, and it seems unlikely that any standard will be adopted
in future.
Choose to construct an AoE device path using a concatenation of the
network device path and a SATA device path, treating the AoE major and
minor numbers as the HBA port number and port multiplier port number
respectively.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Provide efi_netdev_path() as a standalone function, to allow for reuse
when constructing child device paths.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allow an interface operation to be declared as unused. This will
perform full type-checking and compilation of the implementing method,
without including any code in the resulting object (other than a NULL
entry in the interface operations table).
The intention is to provide a relatively clean way for interface
operation methods to be omitted in builds for which the operation is
not required (such as an operation to describe an object using an EFI
device path, which would not be required in a non-EFI build).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Now that IPv6 is enabled by default for UEFI builds, it is important
that iPXE does not delay unnecessarily in the (still relatively
common) case of a network that lacks IPv6 routers.
Apply the timeout values used for neighbour discovery to the router
discovery process.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
IPv6 PXE was included in the UEFI specification over eight years ago,
specifically in version 2.3 (Errata D).
http://www.uefi.org/sites/default/files/resources/UEFI_Spec_2_3_D.pdf
When iPXE is being chainloaded from a UEFI firmware performing a PXE
boot in an IPv6 network, it is essential that iPXE supports IPv6 as
well.
I understand that the reason for NET_PROTO_IPV6 being disabled by
default (in src/config/general.h) is that it would cause certain
space-constrained build targets to become too large. However, this
should not be an issue for EFI builds.
It is also worth noting that RFC 6540 makes a clear recommendation
that IPv6 support should not be considered optional.
https://tools.ietf.org/html/rfc6540
Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Tore Anderson <tore@fud.no>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The LACP responder reuses the received I/O buffer to construct the
response LACP (or marker) packet. Any received padding will therefore
be unintentionally included within the response.
Truncate the received I/O buffer to the expected length (which is
already defined in a way to allow for future protocol expansion)
before reusing it to construct the response.
Reported-by: Tore Anderson <tore@fud.no>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some external drivers (observed with the UEFI NII driver provided by
an HPE-branded Mellanox ConnectX-3 Pro) seem to cause LACP packets
transmitted by iPXE to be looped back as received packets. Since
iPXE's trivial LACP responder will send one response per received
packet, this results in an immediate LACP packet storm.
Detect looped back LACP packets (based on the received LACP actor MAC
address), and refuse to respond to such packets.
Reported-by: Tore Anderson <tore@fud.no>
Tested-by: Tore Anderson <tore@fud.no>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
When iPXE is downloading a file from an EFI_FILE_PROTOCOL instance
backed by an EFI_BLOCK_IO_PROTOCOL instance provided by the same iPXE
binary (e.g. via a hooked SAN device), then it is possible for step()
to be invoked as a result of the calls into the EFI_BLOCK_IO_PROTOCOL
methods. This can potentially result in efi_local_step() being run
prematurely, before the file has been opened and before the parent
interface has been attached.
Fix by deferring starting the download process until immediately prior
to returning from efi_local_open().
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some UEFI BIOSes (observed with at least the Insyde UEFI BIOS on a
Microsoft Surface Go) provide a very broken version of the
UsbMassStorageDxe driver that is incapable of binding to the standard
EFI_USB_IO_PROTOCOL instances and instead relies on an undocumented
proprietary protocol (with GUID c965c76a-d71e-4e66-ab06-c6230d528425)
installed by the platform's custom version of UsbCoreDxe.
The upshot is that USB mass storage devices become inaccessible once
iPXE's native USB host controller drivers are loaded.
One possible workaround is to load a known working version of
UsbMassStorageDxe (e.g. from the EDK2 tree): this driver will
correctly bind to the standard EFI_USB_IO_PROTOCOL instances exposed
by iPXE. This workaround is ugly in practice, since it involves
embedding UsbMassStorageDxe.efi into the iPXE binary and including an
embedded script to perform the required "chain UsbMassStorageDxe.efi".
Provide a native USB mass storage driver for iPXE, allowing USB mass
storage devices to be exposed as iPXE SAN devices.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
iPXE will often have multiple drivers available for a USB device. For
example: some USB network devices will support both RNDIS and CDC-ECM,
and any device may be consumed by the fallback "usbio" driver under
UEFI in order to expose an EFI_USB_IO_PROTOCOL instance.
The driver scoring mechanism is used to select a device configuration
based on the availability of drivers for the interfaces exposed in
each configuration.
For the case of RNDIS versus CDC-ECM, this mechanism will always
produce the correct result since RNDIS and CDC-ECM will not exist
within the same configuration and so each configuration will receive a
score based on the relevant driver.
This guarantee does not hold for the "usbio" driver, which will match
against any device. It is a surprising coincidence that the "usbio"
driver seems to usually end up at the tail end of the USB drivers
list, thereby resulting in the expected behaviour.
Guarantee the expected behaviour by explicitly placing the "usbio"
driver at the end of the USB drivers list.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
For USB mass storage devices, we do not want to submit more bulk IN
packets than are required for the inbound data, since this will waste
memory.
Allow an upper limit to be specified on each refill attempt. The
endpoint will be refilled to the lower of this limit or the limit
specified by usb_refill_init().
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Closing and reopening a USB endpoint will clear any halt status
recorded by the host controller, but may leave the endpoint halted at
the device. This will cause the first packet submitted to the
reopened endpoint to be lost, before the automatic stall recovery
mechanism detects the halt and resets the endpoint.
This is relatively harmless for USB network or HID devices, since the
wire protocols will recover gracefully from dropped packets. Some
protocols (e.g. for USB mass storage devices) assume zero packet loss
and so would be adversely affected.
Fix by allowing any device endpoint halt status to be cleared on a
freshly opened endpoint.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
There appears to be no reason for avoiding recursion when calling
ConnectController(), and recursion provides the least surprising
behaviour.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
iPXE is already capable of loading EFI drivers on demand (via
e.g. "chain UsbMassStorageDxe.efi") but there is currently no way to
trigger connection of the driver to any preexisting handles.
Add an explicit call to (re)connect all drivers after successfully
loading an image with a code type that indicates a boot services
driver.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
A zero divisor will currently lead to a 16-bit integer overflow when
calculating the transmit padding, and a potential division by zero if
assertions are enabled.
Avoid these problems by treating a divisor value of zero as equivalent
to a divisor value of one (i.e. no alignment requirements).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The length as returned by UsbGetSupportedLanguages() should not
include the length of the descriptor header itself.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allow temporary debugging code to call efi_wrap_systab() to obtain a
pointer to the wrapper EFI system table. This can then be used to
e.g. forcibly overwrite the boot services table pointer used by an
already loaded and running UEFI driver, in order to trace calls made
by that driver.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The call to UninstallMultipleProtocolInterfaces() will implicitly
disconnect any relevant controllers, and there is no specified
requirement to explicitly call DisconnectController() prior to
callling UninstallMultipleProtocolInterfaces().
However, some UEFI implementations (observed with the USB keyboard
driver on a Microsoft Surface Go) will fail to implicitly disconnect
the controller and will consequently fail to uninstall the protocols.
The net effect is that unplugging and replugging a USB keyboard may
leave the keyboard in a non-functional state.
Work around these broken UEFI implementations by including an
unnecessary call to DisconnectController() before the call to
UninstallMultipleProtocolInterfaces().
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some UEFI USB drivers (e.g. the UsbKbDxe driver in EDK2) will react to
a reported EFI_USB_ERR_STALL by attempting to clear the endpoint halt.
This is redundant with iPXE's EFI_USB_IO_PROTOCOL implementation,
since endpoint stalls are cleared automatically by the USB core as
needed.
The UEFI USB driver's attempt to clear the endpoint halt can introduce
an unwanted 5 second delay per endpoint if the USB error was the
result of a device being physically removed, since the control
transfer will always time out.
Fix by reporting all USB errors as EFI_USB_ERR_SYSTEM instead of
EFI_USB_ERR_STALL.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some UEFI USB drivers (observed with the keyboard driver on a
Microsoft Surface Go) will react to an asynchronous USB transfer
failure by terminating the transfer from within the completion
handler. This closes the USB endpoint and, in the current
implementation, frees the containing structure.
This can lead to use-after-free bugs after the UEFI USB driver's
completion handler returns, since the calling code in iPXE expects
that a completion handler will not perform a control-flow action such
as terminating the transfer.
Fix by leaving the USB endpoint structure allocated until the device
is finally removed, as is already done (as an optimisation) for
control and bulk transfers.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The current error handling mechanism defers the endpoint reset until
the next use of the endpoint, on the basis that errors are detected
during completions and completion handling should not recursively call
usb_poll().
In the case of usb_control(), we are already at the level that calls
usb_poll() and can therefore safely perform the endpoint reset
immediately. This has no impact on functionality, but does make
debugging traces easier to read since the reset will appear
immediately after the causative error.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Retrieve the address windows and translation offsets for the
appropriate PCI root bridge and use them to adjust the PCI BAR address
prior to calling ioremap().
Originally-implemented-by: Pankaj Bansal <pankaj.bansal@nxp.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Define pci_ioremap() as a wrapper around ioremap() that could allow
for a non-zero address translation offset.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Older versions of gcc (observed with gcc 4.5.3) require attributes to
be specified on the first declaration of a symbol, and will silently
ignore attributes specified after the initial declaration. This
causes the ASN.1 OID-identified algorithms to end up misaligned.
Fix by adding __asn1_algorithm to the initial declarations in asn1.h.
Debugged-by: Dentcho Bankov <dbankov@vmware.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
We currently use a heuristic to determine whether or not to request
cable detection in PXE_OPCODE_INITIALIZE, based on the need to work
around a known Emulex driver bug (see commit c0b61ba "[efi] Work
around bugs in Emulex NII driver") and the need to accommodate links
that are legitimately slow to come up (see commit 6324227 "[efi] Skip
cable detection at initialisation where possible").
This heuristic appears to fail with newer Emulex drivers. Attempt to
support all known drivers (past and present) by first attempting
initialisation with cable detection, then falling back to attempting
initialisation without cable detection.
Reported-by: Kwang Woo Lee <kwleeyh@gmail.com>
Tested-by: Kwang Woo Lee <kwleeyh@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The file:/ URI syntax may be used to refer to local files on the
filesystem from which the iPXE binary was loaded. This is currently
implemented by directly using the DeviceHandle recorded in our
EFI_LOADED_IMAGE_PROTOCOL.
This mechanism will fail when a USB-enabled build of iPXE is loaded
from USB storage and subsequently installs its own USB host controller
drivers, since doing so will disconnect and reconnect the existing USB
storage drivers and thereby invalidate the original storage device
handle.
Fix by recording the device path for the loaded image's DeviceHandle
at initialisation time and later using the recorded device path to
locate the appropriate device handle.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The various USB specifications all use one-based numbering for ports.
This scheme is applied consistently across the various relevant
specifications, covering both port numbers that appear on the wire
(i.e. downstream hub port numbers) and port numbers that exist only
logically (i.e. root hub port numbers).
The UEFI specification is ambiguous about the port numbers as used for
the ParentPortNumber field within a USB_DEVICE_PATH structure. As of
UEFI specification version 2.8 errata B:
- section 10.3.4.5 just states "USB Parent Port Number" with no
indication of being zero-based or one-based
- section 17.1.1 notes that for the EFI_USB2_HC_PROTOCOL, references
to PortNumber parameters are zero-based for root hub ports
- section 17.1.1 also mentions a TranslatorPortNumber used by
EFI_USB2_HC_PROTOCOL, with no indication of being zero-based or
one-based
- there are no other mentions of USB port numbering schemes.
Experimentation and inspection of the EDK2 codebase reveals that at
least the EDK2 reference implementation will use zero-based numbering
for both root and non-root hub ports when populating a USB_DEVICE_PATH
structure (though will inconsistently use one-based numbering for the
TranslatorPortNumber parameter).
Use zero-based numbering for both root and non-root hub ports when
constructing a USB_DEVICE_PATH in order to match the behaviour of the
EDK2 implementation.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
This change fixes the offset used when retrieving the iPXE stack
pointer after a COM32 binary returns. The iPXE stack pointer is saved
at the top of the available memory then the the top of the stack for
the COM32 binary is set just below it. However seven more items are
pushed on the COM32 stack before the entry point is invoked so when
the COM32 binary returns the location of the iPXE stack pointer is 28
(and not 24) bytes above the current stack pointer.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
According to the latest UEFI specification (Version 2.8 Errata B)
p. 7.2:
"Buffer: A pointer to a pointer to the allocated buffer if the call
succeeds; undefined otherwise."
So implementations are obliged neither to return NULL, if the
allocation fails, nor to preserve the contents of the pointer.
Make the logic more reliable by checking the status code from
AllocatePool() instead of checking the returned pointer for NULL
Signed-off-by: Ignat Korchagin <ignat@cloudflare.com>
Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
At the moment '\s*' is silently interpreted as just 's*', but in the
future it will be an error:
sed: 1: "s/\.o\s*:/_DEPS +=/": RE error: trailing backslash (\)
cf. https://bugs.freebsd.org/229925
Signed-off-by: Tobias Kortkamp <t@tobik.me>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Split debug message since eth_ntoa() uses a static result buffer.
Originally-fixed-by: Michael Bazzinotti <bazz@bazz1.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Fix memcmp() to return proper standard positive/negative values for
unequal comparisons. Current implementation is backwards (i.e. the
functions are returning negative when should be positive and
vice-versa).
Currently most consumers of these functions only check the return value
for ==0 or !=0 and so we can safely change the implementation without
breaking things.
However, there is one call that checks the polarity of this function,
and that is prf_sha1() for wireless WPA 4-way handshake. Due to the
incorrect memcmp() polarity, the WPA handshake creates an incorrect
PTK, and the handshake would fail after step 2. Undoubtedly, the AP
noticed the supplicant failed the mic check. This commit fixes that
issue.
Similar to commit 3946aa9 ("[libc] Fix strcmp()/strncmp() to return
proper values").
Signed-off-by: Michael Bazzinotti <bazz@bazz1.com>
Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
This caused iPXE to reject images even when enough memory was
available.
Signed-off-by: David Decotigny <ddecotig@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
gensdsk currently creates a syslinux.cfg file that is invalid if the
filename ends in lkrn. Fix by setting the default target to label($b)
instead of filename($g).
Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
When no response is obtained from the first configured DNS server,
fall back to attempting the other configured servers.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
All implemented socket openers provide definitions for both IPv4 and
IPv6 using exactly the same opener method. Simplify the logic by
omitting the address family from the definition.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Claiming the SNP devices has the side effect of raising the TPL to
iPXE's normal operating level of TPL_CALLBACK (see the commit message
for c89a446 ("[efi] Run at TPL_CALLBACK to protect against UEFI
timers") for details). This must happen before executing any code
that relies upon the TPL having been raised to TPL_CALLBACK.
The call to efi_snp_claim() in efi_download_start() currently happens
only after the call to xfer_open(). Calling xfer_open() will
typically result in a retry timer being started, which will result in
a call to currticks() in order to initialise the timer. The call to
currticks() will drop to TPL_APPLICATION and restore to TPL_CALLBACK
in order to allow a timer tick to occur. Since this call happened
before the call to efi_snp_claim(), the restored TPL is incorrect.
This in turn results in efi_snp_claim() recording the incorrect
original TPL, causing efi_snp_release() to eventually restore the
incorrect TPL, causing the system to lock up when ExitBootServices()
is called at TPL_CALLBACK.
Fix by moving the call to efi_snp_claim() to the start of
efi_download_start().
Debugged-by: Jarrod Johnson <jjohnson2@lenovo.com>
Debugged-by: He He4 Huang <huanghe4@lenovo.com>
Debugged-by: James Wang <jameswang@ami.com.tw>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The NUL byte included within the stack cookie to act as a string
terminator should be placed at the lowest byte address within the
stack cookie, in order to avoid potentially including the stack cookie
value within an accidentally unterminated string.
Suggested-by: Pete Beck <pete.beck@ioactive.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Several of the values used to compute a stack cookie (in the absence
of a viable entropy source) will tend to have either all-zeroes or
all-ones in the higher order bits. Rotate the values in order to
distribute the (minimal) available entropy more evenly.
Suggested-by: Pete Beck <pete.beck@ioactive.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Generalise the bit rotation implementations to use a common macro, and
add roll() and rorl() to handle unsigned long values.
Each function will still compile down to a single instruction.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The only remaining use case in iPXE for the CPU direction flag is in
__memcpy_reverse() where it is set to allow the use of "rep movsb" to
perform the memory copy. This matches the equivalent functionality in
the EDK2 codebase, which has functions such as InternalMemCopyMem that
also temporarily set the direction flag in order to use "rep movsb".
As noted in commit d2fb317 ("[crypto] Avoid temporarily setting
direction flag in bigint_is_geq()"), some UEFI implementations are
known to have buggy interrupt handlers that may reboot the machine if
a timer interrupt happens to occur while the direction flag is set.
Work around these buggy UEFI implementations by using the
(unoptimised) generic_memcpy_reverse() on i386 or x86_64 UEFI
platforms.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The UEFI specification states that the calling convention for IA-32
and x64 includes "Direction flag in EFLAGS is clear". This
specification covers only the calling convention used at the point of
calling functions annotated with EFIAPI. The specification explicitly
states that other functions (such as private functions or static
library calls) are not required to follow the UEFI calling
conventions.
The reference EDK2 implementation follows this specification. In
particular, the EDK2 interrupt handlers will clear the direction flag
before calling any EFIAPI functions, and will restore the direction
flag when returning from the interrupt handler. Some EDK2 private
library functions (most notably InternalMemCopyMem) may set the
direction flag temporarily in order to make efficient use of CPU
string operations.
The current implementation of iPXE's bigint_is_geq() for i386 and
x86_64 will similarly set the direction flag temporarily in order to
make efficient use of CPU string operations.
On some UEFI implementations (observed with a Getac RX10 tablet), a
timer interrupt that happens to occur while the direction flag is set
will reboot the machine. This very strongly indicates that the UEFI
timer interrupt handler is failing to clear the direction flag before
performing an affected operation (such as copying a block of memory).
Work around such buggy UEFI implementations by rewriting
bigint_is_geq() to avoid the use of string operations and so obviate
the requirement to temporarily set the direction flag.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
A failure in device registration (e.g. due to a device with malformed
descriptors) will currently result in the port being disabled as part
of the error path. This in turn causes the hardware to detect the
device as newly connected, leading to an endless loop of failed device
registrations.
Fix by leaving the port enabled in the case of a registration failure.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
When connected to a USB3 port, the AX88179 seems to have an
approximately 50% chance of producing a USB transaction error on each
of its three endpoints after being closed and reopened. The root
cause is unclear, but rewriting the USB device configuration value
seems to clear whatever internal error state has accumulated.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Experimentation shows that the existing 20ms delay is insufficient,
and often results in device detection being deferred until after iPXE
has completed startup.
Fix by increasing the delay to 100ms.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The driver-private data for root hubs is already set immediately after
allocating the USB bus. There seems to be no reason to set it again
when opening the root hub.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The "disabled" port states for USB2 and USB3 are not directly
equivalent. In particular, a disabled USB3 port will not detect new
device connections. The result is that a USB3 device disconnected
from and reconnected to an xHCI root hub port will end up reconnecting
as a USB2 device.
Fix by setting the link state to RxDetect after disabling the port, as
is already done during initialisation.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The USB3 specification removes PORT_ENABLE from the list of features
that may be cleared via a CLEAR_FEATURE request. Experimentation
shows that omitting the attempt to clear PORT_ENABLE seems to result
in the correct hotplug behaviour.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Resetting the host endpoint may immediately restart any pending
transfers for that endpoint. If the device endpoint halt has not yet
been cleared, then this will probably result in a second failed
transfer.
This second failure may be detected within usb_endpoint_reset() while
waiting for usb_clear_feature() to complete. The endpoint will
subsequently be removed from the list of halted endpoints, causing the
second failure to be effectively ignored and leaving the host endpoint
in a permanently halted state.
Fix by deferring the host endpoint reset until after the device
endpoint is ready to accept new transfers.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The ASIX USB NICs are capable of autodetecting the Ethernet link speed
and reporting it via PLSR but will not automatically update the
relevant GM and PS bits in MSR. The result is that a non-gigabit link
will fail to send or receive any packets.
The interrupt endpoint used to report link state includes the values
of the PHY BMSR and LPA registers. These are not sufficient to
differentiate between 100Mbps and 1000Mbps, since the LPA_NPAGE bit
does not necessarily indicate that the link partner is advertising
1000Mbps.
Extend axge_check_link() to write the MSR value based on the link
speed read from PLSR, and simplify the interrupt endpoint handler to
merely trigger a call to axge_check_link().
Signed-off-by: Michael Brown <mcb30@ipxe.org>
As per commit c89a446 ("[efi] Run at TPL_CALLBACK to protect against
UEFI timers") we expect to run at TPL_CALLBACK almost all of the time.
Various code paths rely on this assumption. Code paths that need to
temporarily lower the TPL (e.g. for entropy gathering) will restore it
to TPL_CALLBACK.
The entropy gathering code will be run during DRBG initialisation,
which happens during the call to startup(). In the case of iPXE
compiled as an EFI application this code will run within the scope of
efi_snp_claim() and so will execute at TPL_CALLBACK as expected.
In the case of iPXE compiled as an EFI driver the code will
incorrectly run at TPL_APPLICATION since there is nothing within the
EFI driver entry point that raises (and restores) the TPL. The net
effect is that a build that includes the entropy-gathering code
(e.g. a build with HTTPS enabled) will return from the driver entry
point at TPL_CALLBACK, which causes a system lockup.
Fix by raising and restoring the TPL within the EFI driver entry
point.
Debugged-by: Ignat Korchagin <ignat@cloudflare.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The EFI_RNG_PROTOCOL on the Microsoft Surface Go does not generate
random numbers. Successive calls to GetRNG() without any intervening
I/O operations (such as writing to the console) will produce identical
results. Successive reboots will produce identical results.
It is unclear what the Microsoft Surface Go is attempting to use as an
entropy source, but it is demonstrably producing zero bits of entropy.
The failure is already detected by the ANS-mandated Repetition Count
Test performed as part of our GetEntropy implementation. This
currently results in the entropy source being marked as broken, with
the result that iPXE refuses to perform any operations that require a
working entropy source.
We cannot use the existing EFI driver blacklisting mechanism to unload
the broken driver, since the RngDxe driver is integrated into the
DxeCore image.
Work around the broken driver by checking for consecutive identical
results returned by EFI_RNG_PROTOCOL and falling back to the original
timer-based entropy source.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some versions of gcc (observed with the cross-compiling gcc 9.3.0 in
Ubuntu 20.04) default to enabling -fPIE. Experimentation shows that
this results in the emission of R_AARCH64_ADR_GOT_PAGE relocation
records for __stack_chk_guard. These relocation types are not
supported by elf2efi.c.
Fix by explicitly disabling position-independent code for ARM64 EFI
builds.
Debugged-by: Antony Messerli <antony@mes.ser.li>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
GCC 10 emits warnings for implicit conversions of enumerated types.
The flexboot_nodnic code defines nodnic_queue_pair_type with values
identical to those of ib_queue_pair_type, and implicitly casts between
them. Add an explicit cast to fix the warning.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
GCC 10 produces a spurious warning about an out-of-bounds array access
for the unsized raw dword array in union intelvf_msg.
Avoid the warning by embedding the zero-length array within a struct.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
gcc10 switched default behavior from -fcommon to -fno-common. Since
"__shared" relies on the legacy behavior, explicitly specify it.
Signed-off-by: Bruce Rogers <brogers@suse.com>
Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Various implementation quirks in OCSP servers make it impractical to
use anything other than SHA1 to construct the issuerNameHash and
issuerKeyHash identifiers in the request certID. For example: both
the OpenCA OCSP responder used by ipxe.org and the Boulder OCSP
responder used by LetsEncrypt will fail if SHA256 is used in the
request certID.
As of commit 6ffe28a ("[ocsp] Accept response certID with missing
hashAlgorithm parameters") we rely on asn1_digest_algorithm() to parse
the algorithm identifier in the response certID. This will fail if
SHA1 is disabled via config/crypto.h.
Fix by using a direct ASN.1 object comparison on the OID within the
algorithm identifier.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Enable -fstack-protector for EFI builds, where binary size is less
critical than for BIOS builds.
The stack cookie must be constructed immediately on entry, which
prohibits the use of any viable entropy source. Construct a cookie by
XORing together various mildly random quantities to produce a value
that will at least not be identical on each run.
On detecting a stack corruption, attempt to call Exit() with an
appropriate error. If that fails, then lock up the machine since
there is no other safe action that can be taken.
The old conditional check for support of -fno-stack-protector is
omitted since this flag dates back to GCC 4.1.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some devices (observed with a Getac RX10 tablet and docking station
containing an embedded AX88179 USB NIC) seem to be capable of
detecting link state only during the call to Initialize(), and will
occasionally erroneously report that the link is down for the first
few such calls.
Work around these devices by retrying the Initialize() call multiple
times, terminating early if a link is detected. The eventual absence
of a link is treated as a non-fatal error, since it is entirely
possible that the link really is down.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Disable the use of MD5 as an OID-identifiable algorithm. Note that
the MD5 algorithm implementation will still be present in the build,
since it is used implicitly by various cryptographic components such
as HTTP digest authentication; this commit removes it only from the
list of OID-identifiable algorithms.
It would be appropriate to similarly disable the use of SHA-1 by
default, but doing so would break the use of OCSP since several OCSP
responders (including the current version of openca-ocspd) are not
capable of interpreting the hashAlgorithm field and so will fail if
the client uses any algorithm other than the configured default.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
There are many ways in which the object for a cryptographic algorithm
may be included, even if not explicitly enabled in config/crypto.h.
For example: the MD5 algorithm is required by TLSv1.1 or earlier, by
iSCSI CHAP authentication, by HTTP digest authentication, and by NTLM
authentication.
In the current implementation, inclusion of an algorithm for any
reason will result in the algorithm's ASN.1 object identifier being
included in the "asn1_algorithms" table, which consequently allows the
algorithm to be used for any ASN1-identified purpose. For example: if
the MD5 algorithm is included in order to support HTTP digest
authentication, then iPXE would accept a (validly signed) TLS
certificate using an MD5 digest.
Split the ASN.1 object identifiers into separate files that are
required only if explicitly enabled in config/crypto.h. This allows
an algorithm to be omitted from the "asn1_algorithms" table even if
the algorithm implementation is dragged in for some other purpose.
The end result is that only the algorithms that are explicitly enabled
in config/crypto.h can be used for ASN1-identified purposes such as
signature verification.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The supported ciphers and digest algorithms may already be specified
via config/crypto.h. Extend this to allow a minimum TLS protocol
version to be specified.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some platforms (observed with an AMI BIOS on an Apollo Lake system)
will spuriously fail the call to ConnectController() when the UEFI
network stack is disabled. This appears to be a BIOS bug that also
affects attempts to connect any non-iPXE driver to the NIC controller
handle via the UEFI shell "connect" utility.
Work around this BIOS bug by falling back to calling our
efi_driver_start() directly if the call to ConnectController() fails.
This bypasses any BIOS policy in terms of deciding which driver to
connect but still cooperates with the UEFI driver model in terms of
handle ownership, since the use of EFI_OPEN_PROTOCOL_BY_DRIVER ensures
that the BIOS is aware of our ownership claim.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The URI parsing code for "host[:port]" checks that the final character
is not ']' in order to allow for IPv6 literals. If the entire
"host[:port]" portion of the URL is an empty string, then this will
access the preceding character. This does not result in accessing
invalid memory (since the string is guaranteed by construction to
always have a preceding character) and does not result in incorrect
behaviour (since if the string is empty then strrchr() is guaranteed
to return NULL), but it does make the code confusing to read.
Fix by inverting the order of the two tests.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
As described in the previous commit, work around a UEFI specification
bug that necessitates calling UnloadImage if the return value from
LoadImage is EFI_SECURITY_VIOLATION.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
iPXE currently assumes that any error returned from LoadImage()
indicates that the image was not loaded. This assumption was correct
at the time the code was written and remained correct for UEFI
specifications up to and including version 2.1.
In version 2.3, the UEFI specification broke API and ABI compatibility
by defining that a return value of EFI_SECURITY_VIOLATION would now
indicate that the image had been loaded and a valid image handle had
been created, but that the image should not be started.
The wording in version 2.2 is ambiguous, and does not define whether
or not a return value of EFI_SECURITY_VIOLATION indicates that a valid
image handle has been created.
Attempt to work around all of these incompatible and partially
undefined APIs by calling UnloadImage if we get a return value of
EFI_SECURITY_VIOLATION. Minimise the risk of passing an uninitialised
pointer to UnloadImage by setting ImageHandle to NULL prior to calling
LoadImage.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Reduce the size of the USB disk image in the common case that
CONSOLE_INT13 is not enabled.
Originally-implemented-by: Romain Guyard <romain.guyard@mujin.co.jp>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Eliminate an unnecessary variable-length stack allocation and memory
copy by allowing TFTP option processors to modify the option string
in-place.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Remove clone depth limit, to ensure that the most recent tag (from
which the version should be constructed) is always present.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2020-01-03 00:14:03 +01:00
1153 changed files with 116959 additions and 32956 deletions