Hybrid bzImage and UEFI binaries (such as wimboot) may place sections
at explicit offsets within the PE file, as described in commit b30a098
("[efi] Use load memory address as file offset for hybrid binaries").
This can leave a gap after the PE headers that is not covered by any
section. It is not entirely clear whether or not such gaps are
permitted in binaries submitted for Secure Boot signing.
To minimise potential problems, extend the PE header size to cover any
space before the first explicitly placed section.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
iPXE images are linked with a starting virtual address of zero. Other
images (such as wimboot) may use a non-zero starting virtual address.
There is no direct equivalent of the PE ImageBase address field within
ELF object files. Choose to use the highest possible address that
accommodates all sections and the PE header itself, since this will
minimise the memory allocated to hold the loaded image.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The BaseOfCode (and, in PE32, BaseOfData) fields imply an assumption
that binaries are laid out as code followed by initialised data
followed by uninitialised data. This assumption may not be valid for
complex binaries such as wimboot.
Remove this implicit assumption, and use arguably justifiable values
for the assorted summary start and size fields within the PE headers.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Hybrid bzImage and UEFI binaries (such as wimboot) may include 16-bit
sections such as .bss16 that do not need to consume an entry in the PE
section list. Treat any such sections as hidden.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The PE debug information generated by elf2efi is used only to hold the
image filename, and the debug information is located via the relevant
data directory entry rather than via the section table.
Make the .debug section a hidden section in order to save one entry in
the PE section list. Choose to place the debug information in the
unused space at the end of the PE headers, since it no longer needs to
satisfy the general section alignment constraints.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Commit 1e4c378 ("[efi] Shrink size of data directory in PE header")
reduced the number of entries used in the data directory and reduced
the recorded size of the NT "optional" header, but did not also adjust
the recorded overall size of the PE headers, resulting in unused space
between the PE headers and the first section.
Fix by reducing the initial recorded size of the PE headers by the
size of the omitted data directory entries.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Hybrid bzImage and UEFI binaries (such as wimboot) include a bzImage
header within a section starting at offset zero, with the PE header
effectively occupying unused space within this section.
Allow for this by treating a section placed at offset zero as hidden,
and by deferring the writing of the PE header until after the output
sections have been written.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Hybrid bzImage and UEFI binaries (such as wimboot) may be loaded as a
single contiguous blob without reference to the PE headers, and the
placement of sections within the PE file must therefore be known at
link time.
Use the load memory address (extracted from the ELF program headers)
to determine the physical placement of the section within the PE file
when generating a hybrid binary.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The images generated by elf2efi can be loaded anywhere in the address
space, and are not limited to the low 2GB.
Indicate this by setting the "large address aware" flag within the PE
header, for compatibility with EFI images generated by the EDK2 build
process. (The EDK2 PE loader does not ever check this flag, and it is
unlikely that any other EFI PE loader ever does so, but we may as well
report it accurately.)
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Indicate that the binary is compatible with W^X protections by setting
the NXCOMPAT bit in the DllCharacteristics field of the PE header.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Hybrid bzImage and UEFI binaries (such as wimboot) may include 16-bit
executable code that is opaque data from the perspective of a UEFI PE
binary, as described in wimboot commit fe456ca ("[efi] Use separate
.text and .data PE sections").
The ELF section will be marked as containing both executable code and
writable data. Choose to treat such a section as a data section
rather than a code section, since that matches the expected semantics
for ELF files that we expect to process.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
EAP exchanges may take a long time to reach a final status, especially
when relying upon MAC Authentication Bypass (MAB). Our current
behaviour of sending EAPoL-Start every few seconds until a final
status is obtained can prevent these exchanges from ever completing.
Fix by redefining the EAP supplicant state to allow EAPoL-Start to be
suppressed: either temporarily (while waiting for a full EAP exchange
to complete, in which case we need to eventually resend EAPoL-Start if
the final Success or Failure packet is lost), or permanently (while
waiting for the potentially very long MAC Authentication Bypass
timeout period).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The PCI cloud API (PCIAPI_CLOUD) currently selects the first PCI API
that successfully discovers a PCI device address range. The ECAM API
may discover an address range but subsequently be unable to map the
configuration space region, which would result in the selected PCI API
being unusable.
Fix by instead selecting the first PCI API that can be successfully
used to discover a PCI device.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some machines (observed with an AWS EC2 m7a.large instance) will place
the ECAM configuration space window above 4GB, thereby making it
unreachable from non-paged 32-bit code. This problem is currently
ignored by iPXE, since the address is silently truncated in the call
to ioremap(). (Note that other uses of ioremap() are not affected
since the PCI core will already have checked for unreachable 64-bit
BARs when retrieving the physical address to be mapped.)
Fix by adding an explicit check that the region to be mapped starts
within the reachable memory address space. (Assume that no machines
will be sufficiently peverse to provide a region that straddles the
4GB boundary.)
Signed-off-by: Michael Brown <mcb30@ipxe.org>
When an error occurs during ECAM configuration space mapping, preserve
the error within the existing cached mapping (instead of invalidating
the cached mapping) in order to avoid flooding the debug log with
repeated identical mapping errors.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The base address provided in the PCI ECAM allocation within the ACPI
MCFG table is the base address for the segment as a whole, not for the
starting bus within that allocation. On machines that provide ECAM
allocations with a non-zero starting bus number (observed with an AWS
EC2 m7a.large instance), this will result in iPXE accessing the wrong
memory addresses within the ECAM region.
Fix by adding the appropriate starting bus offset to the base address.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The PCIe specification requires that "processor and host bridge
implementations must ensure that a method exists for the software to
determine when the write using the ECAM is completed by the completer"
but does not specify any particular method to be used. Some platforms
might treat writes to the ECAM region as non-posted, others might
require reading back from a dedicated (and implementation-specific)
completion register to determine when the configuration space write
has completed.
Since PCI configuration space writes will never be used for any
performance-critical datapath operations (on any sane hardware), a
simple and platform-independent solution is to always read back from
the written register in order to guarantee that the write must have
completed. This is safe to do, since the PCIe specification defines a
limited set of configuration register types, none of which have read
side effects.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The ipair_tx() function uses a va_list twice (first to calculate the
formatted string length before allocation, then to construct the
string in the allocated buffer) but is missing the va_start() and
va_end() around the second usage. This is undefined behaviour that
happens to work on some build platforms.
Fix by adding the missing va_start() and va_end() around the second
usage of the variadic argument list.
Reported-by: Andreas Hammarskjöld <andreas@2PintSoftware.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
We currently use the number of timer ticks since power-on as a seed
for the non-cryptographic RNG implemented by random(). Since iPXE is
often executed directly after power-on, and since the timer tick
resolution is generally low, this can often result in identical seed
values being used on each cold boot attempt.
As of commit 41f786c ("[settings] Add "unixtime" builtin setting to
expose the current time"), the current wall-clock time is always
available within the default build of iPXE. Use this time instead, to
introduce variability between cold boot attempts on the same host.
(Note that variability between different hosts is obtained by using
the MAC address as an additional seed value.)
This has no effect on the separate DRBG used by cryptographic code.
Suggested-by: Heiko <heik0@xs4all.nl>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
We have no way to force a link-layer restart in iPXE, and therefore no
way to explicitly trigger a restart of EAP authentication. If an iPXE
script has performed some action that requires such a restart
(e.g. registering a device such that the port VLAN assignment will be
changed), then the only means currently available to effect the
restart is to reboot the whole system. If iPXE is taking over a
physical link already used by a preceding bootloader, then even a
reboot may not work.
In the EAP model, the supplicant is a pure responder and never
initiates transmissions. EAPoL extends this to include an EAPoL-Start
packet type that may be sent by the supplicant to (re)trigger EAP.
Add support for sending EAPoL-Start packets at two-second intervals on
links that are open and have reached physical link-up, but for which
EAP has not yet completed. This allows "ifclose ; ifopen" to be used
to restart the EAP process.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Extend the EAP model to include a record of whether or not EAP
authentication has completed (successfully or otherwise), and to
provide a method for transmitting EAP responses.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Simplify the FCoE code by using driver-private data to hold the FCoE
port for each network device, instead of using a separate allocation.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Simplify the per-netdevice GuestInfo settings code by using
driver-private data to hold the settings block, instead of using a
separate allocation.
The settings block (if existent) will be automatically unregistered
when the parent network device settings block is unregistered, and no
longer needs to be separately freed. The guestinfo_net_remove()
function may therefore be omitted completely.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Simplify the IPv6 link-local settings code by using driver-private
data to hold the settings block, instead of using a separate
allocation.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Simplify the LLDP code by using driver-private data to hold the LLDP
settings block, instead of using a separate allocation. This avoids
the need to maintain a list of LLDP settings blocks (since the LLDP
settings block pointer can always be obtained using netdev_priv()) and
obviates several failure paths.
Any recorded LLDP data is now freed when the network device is
unregistered, since there is no longer a dedicated reference counter
for the LLDP settings block. To minimise surprise, we also now
explicitly unregister the settings block. This is not strictly
necessary (since the block will be automatically unregistered when the
parent network device settings block is unregistered), but it
maintains symmetry between lldp_probe() and lldp_remove().
The overall reduction in the size of the LLDP code is around 15%.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allow network upper-layer drivers (such as LLDP, which attaches to
each network device in order to provide a corresponding LLDP settings
block) to specify a size for private data, which will be allocated as
part of the network device structure (as with the existing private
data allocated for the underlying device driver).
This will allow network upper-layer drivers to be simplified by
omitting memory allocation and freeing code. If the upper-layer
driver requires a reference counter (e.g. for interface
initialisation), then it may use the network device's existing
reference counter, since this is now the reference counter for the
containing block of memory.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some network device drivers use the trivial netdev_priv() helper
function while others use the netdev->priv pointer directly.
Standardise on direct use of netdev->priv, in order to free up the
function name netdev_priv() for reuse.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
We currently use "push $1f" within inline assembly to push the address
of the real-mode code fragment, relying on the assembler to treat this
as "pushl" for 32-bit code or "pushq" for 64-bit code.
As of binutils commit 5cc0077 ("x86: further adjust extend-to-32bit-
address conditions"), first included in binutils-2.41, this implicit
operand size is no longer calculated as expected and 64-bit builds
will fail with
Error: operand size mismatch for `push'
Fix by adding an explicit operand size to the "push" instruction.
Originally-fixed-by: Justin Cano <jstncno@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The current implementation of vpm_ioread32() erroneously reads only 16
bits of data, which fails when used with the (stricter) virtio device
emulation in VirtualBox.
Fix by using the correct readl()/inl() I/O wrappers.
Reworded-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Define the IPv4 NTP server setting to simplify the use of a
DHCP-provided NTP server in scripts, using e.g.
#!ipxe
dhcp
ntp ${ntp}
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Commit 3ef4f7e ("[console] Avoid overlap between special keys and
Unicode characters") renumbered the special key encoding to avoid
collisions with Unicode key values outside the ASCII range. This
change broke backwards compatibility with existing scripts that
specify key values using e.g. "prompt --key" or "menu --key".
Restore compatibility with existing scripts by tweaking the special
key encoding so that the relative key value (i.e. the delta from
KEY_MIN) is numerically equal to the old pre-Unicode key value, and by
modifying parse_key() to accept a relative key value.
Reported-by: Sven Dreyer <sven@dreyer-net.de>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Avoid the need to always specify a local MAC address on the command
line by setting a default hardware MAC address (using the same default
address as for slirp devices).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
A running link block timer holds a reference to the network device and
will prevent it from being freed until the timer expires. It is
impossible for free_netdev() to be called while the timer is still
running: the call to stop_timer() therein is therefore a no-op.
Stop the link block timer when the device is closed, to allow a
link-blocked device to be freed immediately upon unregistration of the
device. (Since link block state is updated in response to received
packets, the state is effectively undefined for a closed device: there
is therefore no reason to leave the timer running.)
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The interface debug message values constructed by INTF_DBG() et al
rely on the interface being embedded within a containing object. This
assumption is not valid for the temporary outbound-only interfaces
constructed on the stack by intf_shutdown() and xfer_vredirect().
Formalise the notion of a temporary outbound-only interface as having
a NULL interface descriptor, and overload the "original interface
descriptor" field to contain a pointer to the original interface that
the temporary interface is shadowing.
Originally-fixed-by: Vincent Fazio <vfazio@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The special key range (from KEY_MIN upwards) currently overlaps with
the valid range for Unicode characters, and therefore prohibits the
use of Unicode key values outside the ASCII range.
Create space for Unicode key values by moving the special keys to the
range immediately above the maximum valid Unicode character. This
allows the existing encoding of special keys as an efficiently packed
representation of the equivalent ANSI escape sequence to be maintained
almost as-is.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The keyboard remapping flags currently occupy bits 8 and upwards of
the to-be-mapped character value. This overlaps the range used for
special keys (KEY_MIN and upwards) and also overlaps the valid Unicode
character range.
No conflict is created by this overlap, since by design only ASCII
character values (as generated by an ASCII-only keyboard driver) are
subject to remapping, and so the to-be-remapped character values exist
in a conceptually separate namespace from either special keys or
non-ASCII Unicode characters. However, the overlap is potentially
confusing for readers of the code.
Minimise cognitive load by using bits 24 and upwards for the keyboard
remapping flags.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some versions of ld will complain that the automatically created (and
unused by our build process) ELF program headers include a "LOAD
segment with RWX permissions".
Silence this warning by adding "-z separate-code" to the linker
options, where supported.
For BIOS builds, where the prefix will generally require writable
access to its own (tiny) code segment, simply inhibit the warning
completely via "--no-warn-rwx-segments".
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Multiple target patterns in pattern rules are treated as grouped
targets regardless of the separator character. Newer verions of make
will generate "warning: pattern recipe did not update peer target" to
warn that the rule was expected to update all of the (implicitly)
grouped targets.
Fix by splitting all multiple target pattern rules into single target
pattern rules.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
There is no common standard for I/O-space access for non-x86 CPU
families, and non-MMIO peripherals are vanishingly rare.
Generalise the existing ARM definitions for dummy PIO to allow for
reuse by other CPU architectures.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The PCI I/O API (supporting accesses to PCI configuration space) is
not related to the general I/O API (supporting accesses to
memory-mapped I/O peripherals).
Remove the spurious inclusion of ipxe/io.h from the PCI I/O header.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
While not guaranteed by the UEFI specification, the enumeration of
handles, protocols, and openers will generally return results in order
of creation. Processing these objects in reverse order (as is already
done when calling DisconnectController() on the list of all handles)
will generally therefore perform the forcible uninstallation
operations in reverse order of object creation, which minimises the
number of implicit operations performed (e.g. when disconnecting a
controller that itself still has existent child controllers).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The UEFI specification states that the AgentHandle may be either the
driving binding protocol handle or the image handle.
Check for both handles when searching for stale handles to be forcibly
closed on behalf of a vetoed driver.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
In most cases, the driver handle will be the image handle itself.
However, this is not required by the UEFI specification, and some
images will install multiple driver binding handles.
Use the image handle (extracted from the driver binding protocol
instance) when attempting to unload the driver's image.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Pass the driver binding handle, the driver binding protocol instance,
the image handle, and the loaded image protocol instance to all veto
methods.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Simplify the process of adding new entries to the veto list by
including the manufacturer name within the standard debug output.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Polling for TX completions is arguably redundant when there are no
transmissions currently in progress. Commit c6c7e78 ("[efi] Poll for
TX completions only when there is an outstanding TX buffer") switched
to setting the PXE_OPFLAGS_GET_TRANSMITTED_BUFFERS flag only when
there is an in-progress transmission awaiting completion, in order to
reduce reported TX errors and debug message noise from buggy NII
implementations that report spurious TX completions whenever the
transmit queue is empty.
Some other NII implementations (observed with the Realtek driver in a
Dell Latitude 3440) seem to have a bug in the transmit datapath
handling which results in the transmit ring freezing after sending a
few hundred packets under heavy load. The symptoms are that the
TPPoll register's NPQ bit remains set and the 256-entry transmit ring
contains a large number of uncompleted descriptors (with the OWN bit
set), the first two of which have identical data buffer addresses.
Though iPXE will submit at most one in-progress transmission via NII,
the Dell/Realtek driver seems to make a page-aligned copy of each
transmit data buffer and to report TX completions immediately without
waiting for the packet to actually be transmitted. These synthetic TX
completions continue even after the hardware transmit ring freezes.
Setting PXE_OPFLAGS_GET_TRANSMITTED_BUFFERS on every poll reduces the
probability of this Dell/Realtek driver bug being triggered by a
factor of around 500, which brings the failure rate down to the point
that it can sensibly be managed by external logic such as the
"--timeout" option for image downloads. Closing and reopening the
interface (via "ifclose"/"ifopen") will clear the error condition and
allow transmissions to resume.
Revert to setting PXE_OPFLAGS_GET_TRANSMITTED_BUFFERS on every poll,
and silently ignore situations in which the hardware reports a
completion when no transmission is in progress. This approximately
matches the behaviour of the SnpDxe driver, which will also generally
set PXE_OPFLAGS_GET_TRANSMITTED_BUFFERS on every poll.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
EFI variables do not map neatly to the iPXE settings mechanism, since
the EFI variable identifier includes a namespace GUID that cannot
cleanly be supplied as part of a setting name. Creating a new EFI
variable requires the variable's attributes to be specified, which
does not fit within iPXE's settings concept.
However, EFI variable names are generally unique even without the
namespace GUID, and EFI does provide a mechanism to iterate over all
existent variables. We can therefore provide read-only access to EFI
variables by comparing only the names and ignoring the namespace
GUIDs.
Provide an "efi" settings block that implements this mechanism using a
syntax such as:
echo Platform language is ${efi/PlatformLang:string}
show efi/SecureBoot:int8
Settings are returned as raw binary values by default since an EFI
variable may contain boolean flags, integer values, ASCII strings,
UCS-2 strings, EFI device paths, X.509 certificates, or any other
arbitrary blob of data.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The EDK2 UefiPxeBcDxe driver includes some remarkably convoluted and
unsafe logic in its driver binding protocol Start() and Stop() methods
in order to support a pair of nominally independent driver binding
protocols (one for IPv4, one for IPv6) sharing a single dynamically
allocated data structure. This PXEBC_PRIVATE_DATA structure is
installed as a dummy protocol on the NIC handle in order to allow both
IPv4 and IPv6 driver binding protocols to locate it as needed.
The error handling code path in the UefiPxeBcDxe driver's Start()
method may attempt to uninstall the dummy protocol but fail to do so.
This failure is ignored and the containing memory is subsequently
freed anyway. On the next invocation of the driver binding protocol,
it will find and use this already freed block of memory. At some
point another memory allocation will occur, the PXEBC_PRIVATE_DATA
structure will be corrupted, and some undefined behaviour will occur.
The UEFI firmware used in VMware ESX 8 includes some proprietary
changes which attempt to install copies of the EFI_LOAD_FILE_PROTOCOL
and EFI_PXE_BASE_CODE_PROTOCOL instances from the IPv4 child handle
onto the NIC handle (along with a VMware-specific protocol with GUID
5190120d-453b-4d48-958d-f0bab3bc2161 and a NULL instance pointer).
This will inevitably fail with iPXE, since the NIC handle already
includes an EFI_LOAD_FILE_PROTOCOL instance.
These VMware proprietary changes end up triggering the unsafe error
handling code path described above. The typical symptom is that an
attempt to exit from iPXE back to the UEFI firmware will crash the VM
with a General Protection fault from within the UefiPxeBcDxe driver:
this happens when the UefiPxeBcDxe driver's Stop() method attempts to
call through a function pointer in the (freed) PXEBC_PRIVATE_DATA
structure, but the function pointer has by then been overwritten by
UCS-2 character data from an unrelated memory allocation.
Work around this failure by adding the VMware UefiPxeBcDxe driver to
the driver veto list.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The old IPv4-only IScsiDxe driver in MdeModulePkg/Universal/Network
was replaced by a dual-stack IScsiDxe driver in NetworkPkg.
Add the module GUID for this driver.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The EDK2 headers may be included even in builds for non-EFI platforms.
Commits such as 9de6c45 ("[arm] Use -fno-short-enums for all 32-bit
ARM builds") have so far ensured that the compile-time checks within
the EDK2 headers will pass even when building for a non-EFI platform.
As a more general solution, temporarily disable static assertions
while including UefiBaseType.h if building on a non-EFI platform.
This avoids the need to modify the ABI on other platforms.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The "shim" command will skip downloading the shim binary (and is
therefore a conditional no-op) if there is already a selected EFI
image that can be executed directly via LoadImage()/StartImage().
This allows the same iPXE script to be used with Secure Boot either
enabled or disabled.
Generalise this further to provide a dummy "shim" command that is an
unconditional no-op on non-EFI platforms. This then allows the same
iPXE script to be used for BIOS, EFI with Secure Boot disabled, or EFI
with Secure Boot enabled.
The same effect could be achieved by using "iseq ${platform} efi"
within the script, but this would complicate end-user documentation.
To minimise the code size impact, the dummy "shim" command is a pure
no-op that does not call parse_options() and so will ignore even
standardised arguments such as "--help".
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The UEFI shim implements a fairly nicely designed revocation mechanism
designed around the concept of security generations. Unfortunately
nobody in the shim community has thus far added the relevant metadata
to the Linux kernel, with the result that current versions of shim are
incapable of booting current versions of the Linux kernel.
Experience shows that there is unfortunately no point in trying to get
a fix for this upstreamed into shim. We therefore default to working
around this undesirable behaviour by patching data read from the
"SbatLevel" variable used to hold SBAT configuration.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allow a shim to be used to facilitate booting a kernel using a script
such as:
kernel /images/vmlinuz console=ttyS0,115200n8
initrd /images/initrd.img
shim /images/shimx64.efi
boot
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add support for using a shim as a helper to execute an EFI image.
When a shim has been specified via shim(), the shim image will be
passed to LoadImage() instead of the selected EFI image and the
command line will be prepended with the name of the selected EFI
image. The selected EFI image will be accessible to the shim via the
virtual filesystem as a hidden file.
Reduce the Secure Boot attack surface by removing, where possible, the
spurious requirement for a third party second stage loader binary such
as GRUB to be used solely in order to call the "shim lock protocol"
entry point.
Do not install the EFI PXE APIs when using a shim, since if shim finds
EFI_PXE_BASE_CODE_PROTOCOL on the loaded image's device handle then it
will attempt to download files afresh instead of using the files
already downloaded by iPXE and exposed via the EFI_SIMPLE_FILE_SYSTEM
protocol. (Experience shows that there is no point in trying to get a
fix for this upstreamed into shim.)
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The UEFI shim includes a "shim lock protocol" that can be used by a
third party second stage loader such as GRUB to verify a kernel image.
Add definitions for the relevant portions of this protocol interface.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Most image flags are independent values: any combination of flags may
be set for any image, and the flags for one image are independent of
the flags for any other image. The "selected" flag does not follow
this pattern: at most one image may be marked as selected at any time.
When invoking a kernel via the UEFI shim, there will be multiple
"special" images: the selected kernel itself, the shim image, and
potentially a shim-signed GRUB binary to be used as a crutch to assist
shim in loading the kernel (since current versions of the UEFI shim
are not capable of directly loading a Linux kernel).
Remove the "selected" image flag and replace it with a general concept
of an image tag with the same semantics: a given tag may be assigned
to at most one image, an image may be found by its tag only while the
image is currently registered, and a tag will survive unregistration
and reregistration of an image (if it has not already been assigned to
a new image). For visual consistency, also replace the current image
pointer with a current image tag.
The image pointer stored within the image tag holds only a weak
reference to the image, since the selection of an image should not
prevent that image from being freed. (The strong reference to the
currently executing image is held locally within the execution scope
of image_exec(), and is logically separate from the current image
pointer.)
Signed-off-by: Michael Brown <mcb30@ipxe.org>
An EFI image that is rejected by LoadImage() due to failing Secure
Boot verification is still an EFI image. Unfortunately, the extremely
broken UEFI Secure Boot model provides no way for us to unambiguously
determine that a valid EFI executable image was rejected only because
it failed signature verification. We must therefore use heuristics to
guess whether not an image that was rejected by LoadImage() could
still be loaded via a separate PE loader such as the UEFI shim.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Versions 15.4 and earlier of the UEFI shim are incapable of correctly
parsing the command line in order to extract the second stage loader
filename, and will always attempt to load "grubx64.efi" or equivalent.
Versions 15.3 and later of the UEFI shim are currently incapable of
loading a Linux kernel directly anyway, since the kernel does not
include SBAT metadata. These versions will require a genuine
shim-signed GRUB binary to be used as a crutch to assist shim in
loading a Linux kernel.
This leaves versions 15.2 and earlier of the UEFI shim (as currently
used in e.g. RHEL7) as being capable of directly loading a Linux
kernel, but incorrectly attempting to load it using the filename
"grubx64.efi" or equivalent. To support the bugs in these older
versions of the UEFI shim, allow the currently selected image to be
opened via any filename of the form "grub*.efi".
Signed-off-by: Michael Brown <mcb30@ipxe.org>
When invoking a kernel via the UEFI shim, the kernel image must be
accessible via EFI_SIMPLE_FILE_SYSTEM_PROTOCOL but must not be present
in the magic initrd constructed from all registered images.
Re-register a currently executing EFI image and mark it as hidden,
thereby allowing it to be accessed via the virtual filesystem exposed
via EFI_SIMPLE_FILE_SYSTEM_PROTOCOL without appearing in the magic
initrd contents.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
When invoking a kernel via the UEFI shim, the kernel (and potentially
also a helper binary such as GRUB) must be accessible via the virtual
filesystem exposed via EFI_SIMPLE_FILE_SYSTEM_PROTOCOL but must not be
present in the magic initrd constructed from all registered images.
Allow for images to be flagged as hidden, which will cause them to be
excluded from API-level lists of all images such as the virtual
filesystem directory contents, the magic initrd, or the Multiboot
module list. Hidden images remain visible to iPXE commands including
"imgstat", which will show a "[HIDDEN]" flag for such images.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Show the original filename as used by the consumer when calling our
EFI_SIMPLE_FILE_SYSTEM_PROTOCOL's Open() method.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Try searching for a matching registered image before checking for
fixed filenames (such as "initrd.magic" for the dynamically generated
magic initrd file). This minimises surprise by ensuring that an
explicitly downloaded image will always be used verbatim.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Hybrid bzImage and UEFI binaries (such as wimboot) include a bzImage
header within a section starting at offset zero, with the PE header
effectively occupying unused space within this section. This section
should not appear as a named section in the resulting PE file.
Allow for the existence of hidden sections that do not result in a
section header being written to the PE file.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Hybrid 32-bit BIOS and 64-bit UEFI binaries (such as wimboot) may
include R_X86_64_32 relocation records for the 32-bit BIOS portions.
These should be ignored when generating PE relocation records, since
they apply only to code that cannot be executed within the context of
the 64-bit UEFI binary, and creating a 4-byte relocation record is
invalid in a binary that may be relocated anywhere within the 64-bit
address space (see commit 907cffb "[efi] Disallow R_X86_64_32
relocations").
Add a "--hybrid" option to elf2efi, which will cause R_X86_64_32
relocation records to be silently discarded.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Hybrid bzImage and UEFI binaries (such as wimboot) require the PE
header to be kept as small as possible, since the bzImage header
starts at a fixed offset 0x1f1.
The EFI_IMAGE_OPTIONAL_HEADER structures in PeImage.h define an
optional header containing 16 data directory entries, of which the
last eight are unused in binaries that we create. Shrink the data
directory to contain only the first eight entries, to minimise the
overall size of the PE header.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Hybrid bzImage and UEFI binaries (such as wimboot) require the PE
header to be kept as small as possible, since the bzImage header
starts at a fixed offset 0x1f1.
The PE header currently includes 128 bytes of zero padding between the
DOS and NT header portions. This padding has been present since
commit 81d92c6 ("[efi] Add EFI image format and basic runtime
environment") first added support for EFI images in iPXE, and was
included on the basis of matching the observed behaviour of the
Microsoft toolchain. There appears to be no requirement for this
padding to exist: EDK2 binaries built with gcc include only 64 bytes
of zero padding, Linux kernel binaries include 66 bytes of non-zero
padding, and wimboot binaries include no padding at all.
Remove the unnecessary padding between the DOS and NT header portions
to minimise the overall size of the PE header.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Prepare for the possibility that a record handler may choose not to
consume the entire record by passing the I/O buffer and requiring the
handler to mark consumed data using iob_pull().
Signed-off-by: Michael Brown <mcb30@ipxe.org>
As documented in commits 6a004be ("[efi] Support the initrd
autodetection mechanism in newer Linux kernels") and 04e60a2 ("[efi]
Omit EFI_LOAD_FILE2_PROTOCOL for a zero-length initrd"), the choice in
Linux of using a fixed device path requires bootloaders to allow for
the fact that a previous bootloader may have already installed a
handle with the fixed device path.
We currently deal with this situation by reusing the existing handle,
replacing the EFI_LOAD_FILE2_PROTOCOL instance with our own. Simplify
the code by instead uninstalling the EFI_DEVICE_PATH_PROTOCOL instance
from the existing handle (if present), thereby allowing the creation
of a new handle to succeed.
Create the new handle only if we have a non-empty initrd to provide.
This works around bugs in bootloaders such as the systemd EFI stub
that fail to allow for the existence of multiple-bootloader chains.
(The workaround is not comprehensive: if the user has downloaded other
images in iPXE before invoking the systemd Unified Kernel Image (UKI),
then the systemd EFI stub will still crash and burn since it fails to
allow for the fact that a previous bootloader has already installed a
handle with the fixed device path.)
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The Intel I210's packet buffer size registers reset only on power up,
not when a reset signal is asserted. This can lead to the inability
to pass traffic in the event that the DMA TX Maximum Packet Size
(which does reset to its default value on reset) is bigger than the TX
Packet Buffer Size.
For example, an operating system may be using the time sensitive
networking features of the I210 and the registers may be programmed
correctly, but then a reset signal is asserted and iPXE on the next
boot will be unable to use the I210.
Mimic what Linux does and forcibly set the registers to their default
values.
Signed-off-by: Matt Parrella <parrella.matthew@gmail.com>
When a DHCP transaction does not result in the registration of a new
"proxydhcp" or "pxebs" settings block, any existing settings blocks
are currently left unaltered.
This can cause surprising behaviour. For example: when chainloading
iPXE, the "proxydhcp" and "pxebs" settings blocks may be prepopulated
using cached values from the previous PXE bootloader. If iPXE
performs a subsequent DHCP request, then the DHCP or ProxyDHCP servers
may choose to respond differently to iPXE. The response may choose to
omit the ProxyDHCP or PXEBS stages, in which case no new "proxydhcp"
or "pxebs" settings blocks may be registered. This will result in
iPXE using a combination of both old and new DHCP responses.
Fix by assuming that a successful DHCPACK effectively acquires
ownership of the "proxydhcp" and "pxebs" settings blocks, and that any
existing settings blocks should therefore be unregistered.
Reported-by: Henry Tung <htung@palantir.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
We unregister script images during their execution, to prevent a
"boot" command from re-executing the containing script. This also has
the side effect of preventing executing scripts from showing up within
the Linux magic initrd image (or the Multiboot module list).
Additional logic in bzimage.c and efi_file.c prevents a currently
executing kernel from showing up within the magic initrd image.
Similar logic in multiboot.c prevents the Multiboot kernel from
showing up as a Multiboot module.
This still leaves some corner cases that are not covered correctly.
For example: when using a gzip-compressed kernel image, nothing will
currently hide the original compressed image from the magic initrd.
Fix by moving the logic that temporarily unregisters the current image
from script_exec() to image_exec(), so that it applies to all image
types, and simplify the magic initrd and Multiboot module list
construction logic on the basis that no further filtering of the
registered image list is necessary.
This change has the side effect of hiding currently executing EFI
images from the virtual filesystem exposed by iPXE. For example, when
using iPXE to boot wimboot, the wimboot binary itself will no longer
be visible within the virtual filesystem.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Extend the request parameter mechanism to allow for arbitrary HTTP
headers to be specified via e.g.:
params
param --header Referer http://www.example.com
imgfetch http://192.168.0.1/script.ipxe##params
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Prepare for the parameter mechanism to be generalised to specifying
request parameters that are passed via mechanisms other than an
application/x-www-form-urlencoded form.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
An attempt to use an existent but empty form parameter list will
currently result in an invalid POST request since the Content-Length
header will be missing.
Fix by using GET instead of POST if the form parameter list is empty.
This is a non-breaking change (since the current behaviour produces an
invalid request), and simplifies the imminent generalisation of the
parameter list concept to handle both header and form parameters.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
When the Linux kernel is being used with no initrd, iPXE will still
provide a zero-length initrd.magic file within the virtual filesystem.
As of commit 6a004be ("[efi] Support the initrd autodetection
mechanism in newer Linux kernels"), this zero-length file will also be
exposed via an EFI_LOAD_FILE2_PROTOCOL instance on a handle with a
fixed device path.
The correct handling of zero-length files via EFI_LOAD_FILE2_PROTOCOL
is unfortunately not well defined.
Linux expects the first call to LoadFile() to always fail with
EFI_BUFFER_TOO_SMALL. When the initrd is genuinely zero-length, iPXE
will return success since the buffer is not too small to hold the
(zero-length) file. This causes Linux to immediately report a
spurious EFI_LOAD_ERROR boot failure.
We could change the logic in iPXE's efi_file_load() to always return
EFI_BUFFER_TOO_SMALL if Buffer is NULL on entry. Since the correct
behaviour of LoadFile() in the corner case of a zero-length file is
left undefined by the UEFI specification, this would be permissible.
Unfortunately this approach would not fix the problem. If we return
EFI_BUFFER_TOO_SMALL and set the file length to zero, then Linux will
call the boot services AllocatePages() method with a zero length. In
at least the EDK2 implementation, this combination of parameters will
cause AllocatePages() to return EFI_OUT_OF_RESOURCES, and Linux will
again report a boot failure.
Another approach would be to install the initrd device path handle
only if we have a non-empty initrd to offer. Unfortunately this would
lead to a failure in yet another corner case: if a previous bootloader
has installed an initrd device path handle (e.g. to pass a boot script
to iPXE) then we must not leave that initrd in place, since then our
loaded kernel would end up seeing the wrong initrd content.
The cleanest fix seems to be to ensure that the initrd device path
handle is installed with the EFI_DEVICE_PATH_PROTOCOL instance present
but with the EFI_LOAD_FILE2_PROTOCOL instance absent (and forcibly
uninstalled if necessary), matching the state in which we leave the
handle after uninstalling our virtual filesystem. Linux will then not
find any handle that supports EFI_LOAD_FILE2_PROTOCOL within the fixed
device path, and so will fall through to trying other mechanisms to
locate the initrd.
Reported-by: Chris Bradshaw <cwbshaw@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Commit 7ca801d ("[efi] Use the EFI_RNG_PROTOCOL as an entropy source
if available") added EFI_RNG_PROTOCOL as an alternative entropy source
via an ad-hoc mechanism specific to efi_entropy.c.
Split out EFI_RNG_PROTOCOL to a separate entropy source, and allow the
entropy core to handle the selection of RDRAND, EFI_RNG_PROTOCOL, or
timer ticks as the active source.
The fault detection logic added in commit a87537d ("[efi] Detect and
disable seriously broken EFI_RNG_PROTOCOL implementations") may be
removed completely, since the failure will already be detected by the
generic ANS X9.82-mandated repetition count test and will now be
handled gracefully by the entropy core.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Provide per-source state variables for the repetition count test and
adaptive proportion test, to allow for the situation in which an
entropy source can be enabled but then fails during the startup tests,
thereby requiring an alternative entropy source to be used.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
As noted in commit 3c83843 ("[rng] Check for several functioning RTC
interrupts"), experimentation shows that Hyper-V cannot be trusted to
reliably generate RTC interrupts. (As noted in commit f3ba0fb
("[hyperv] Provide timer based on the 10MHz time reference count
MSR"), Hyper-V appears to suffer from a general problem in reliably
generating any legacy interrupts.) An alternative entropy source is
therefore required for an image that may be used in a Hyper-V Gen1
virtual machine.
The x86 RDRAND instruction provides a suitable alternative entropy
source, but may not be supported by all CPUs. We must therefore allow
for multiple entropy sources to be compiled in, with the single active
entropy source selected only at runtime.
Restructure the internal entropy API to allow a working entropy source
to be detected and chosen at runtime.
Enable the RDRAND entropy source for all x86 builds, since it is
likely to be substantially faster than any other source.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
We currently specify only the iSCSI default value for MaxBurstLength
and ignore any negotiated value, since our internal block device API
allows only for receiving directly into caller-allocated buffers and
so we have no intrinsic limit on burst length.
A conscientious target may however refuse to attempt a transfer that
we request for a number of blocks that would exceed the negotiated
maximum burst length.
Fix by recording the negotiated maximum burst length and using it to
limit the maximum number of blocks per transfer as reported by the
SCSI layer.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Linux 5.7 added the ability to autodetect an initrd by searching for a
handle via a fixed vendor-specific "Linux initrd device path" and then
locating and using the EFI_LOAD_FILE2_PROTOCOL instance on that
handle.
This maps quite naturally onto our existing concept of a "magic
initrd" as introduced for EFI in commit e5f0255 ("[efi] Provide an
"initrd.magic" file for use by UEFI kernels").
Add an EFI_LOAD_FILE2_PROTOCOL instance to our EFI virtual files
(backed by simply calling the existing EFI_SIMPLE_FILE_SYSTEM_PROTOCOL
method to read from the file), and install the protocol instance for
the "initrd.magic" virtual file onto a new device handle that also
provides the Linux initrd device path.
The design choice in Linux of using a single fixed device path makes
this unfortunately messy to support, since device paths must be unique
within a system. When multiple bootloaders are used (e.g. GRUB
loading iPXE loading Linux) then only one bootloader can ever install
the device path onto a handle. Subsequent bootloaders must locate the
existing handle and replace the load file protocol instance with their
own.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Show the requested range when a caller reads from a virtual file via
the EFI_SIMPLE_FILE_SYSTEM_PROTOCOL interface.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The Linux kernel bzImage image format and the CPIO archive constructor
will parse the image command line for certain arguments of the form
"key=value". This parsing is currently implemented using strstr() in
a way that can cause a false positive suffix match. For example, a
command line containing "highmem=<n>" would erroneously be treated as
containing a value for "mem=<n>".
Fix by centralising the logic used for parsing such arguments, and
including a check that the argument immediately follows a whitespace
delimiter (or is at the start of the string).
Reported-by: Filippo Giunchedi <filippo@esaurito.net>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Commit 74222cd ("[rng] Check for functioning RTC interrupt") added a
check that the RTC is capable of generating interrupts via the legacy
PIC, since this mechanism appears to be broken in some Hyper-V virtual
machines.
Experimentation shows that the RTC is sometimes capable of generating
a single interrupt, but will then generate no subsequent interrupts.
This currently causes rtc_entropy_check() to falsely detect that the
entropy gathering mechanism is functional.
Fix by checking for several RTC interrupts before declaring that it is
a functional entropy source.
Reported-by: Andreas Hammarskjöld <junior@2PintSoftware.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
EISA expansion slot I/O port addresses overlap space that may be
assigned to PCI devices, which can lead to register reads and writes
with unwanted side effects during EISA probing.
Reduce the chances of performing EISA probing on PCI devices by
probing EISA slot vendor and product ID registers only if the EISA
system board vendor ID register indicates that the motherboard
supports EISA.
Debugged-by: Václav Ovsík <vaclav.ovsik@gmail.com>
Tested-by: Václav Ovsík <vaclav.ovsik@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add support for building a LoongArch64 Linux userspace binary.
Signed-off-by: Xiaotian Wu <wuxiaotian@loongson.cn>
Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The test suites for the various architectures are often run back to
back, and there is currently nothing to visually distinguish one test
run from another.
Include the architecture name within the self-test startup banner, to
aid in visual identification of test results.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The PAGE_SHIFT definition is an architectural property, rather than an
aspect of a particular I/O API implementation (of which, in theory,
there may be more than one per architecture).
Reflect this by moving the definition to the top-level bits/io.h for
each architecture.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Over the years, the undocumented operand modifier used to produce the
unprefixed constant values in __einfo_error() has varied from "%c0" to
"%a0" in commit 1a77466 ("[build] Fix use of inline assembly on GCC
4.8 ARM64 builds") and back to "%c0" in commit 3fb3ffc ("[build] Fix
use of inline assembly on GCC 8 ARM64 builds"), according to the
evolving demands of the toolchain.
LoongArch64 suffers from a similar issue: GCC 13 will allow either,
but the currently released GCC 12 allows only the "%a0" form.
Introduce a macro ASM_NO_PREFIX, defined in bits/compiler.h, to
abstract away this difference and allow different architectures to use
different operand modifiers.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The Xen headers support only x86 and ARM. Allow for platforms such as
LoongArch64 to build despite the absence of Xen support by providing
an architecture-specific <bits/xen.h> that simply does:
#ifndef _BITS_XEN_H
#define _BITS_XEN_H
#include <ipxe/nonxen.h>
#endif /* _BITS_XEN_H */
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add support for recording LLDP packets and exposing TLV values via the
settings mechanism. LLDP settings are encoded as
${netX.lldp/<prefix>.<type>.<index>.<offset>.<length>}
where
<type> is the TLV type
<offset> is the starting offset within the TLV value
<length> is the length (or zero to read the from <offset> to the end)
<prefix>, if it has a non-zero value, is the subtype byte string of
length <offset> to match at the start of the TLV value, up to a
maximum matched length of 4 bytes
<index> is the index of the entry matching <type> and <prefix> to be
accessed, with zero indicating the first matching entry
The <prefix> is designed to accommodate both matching of the OUI
within an organization-specific TLV (e.g. 0x0080c2 for IEEE 802.1
TLVs) and of a subtype byte as found within many TLVs.
This encoding allows most LLDP values to be extracted easily. For
example
System name: ${netX.lldp/5.0.0.0:string}
System description: ${netX.lldp/6.0.0.0:string}
Port description: ${netX.lldp/4.0.0.0:string}
Port interface name: ${netX.lldp/5.2.0.1.0:string}
Chassis MAC address: ${netX.lldp/4.1.0.1.0:hex}
Management IPv4 address: ${netX.lldp/5.1.8.0.2.4:ipv4}
Port VLAN ID: ${netX.lldp/0x0080c2.1.127.0.4.2:int16}
Port VLAN name: ${netX.lldp/0x0080c2.3.127.0.7.0:string}
Maximum frame size: ${netX.lldp/0x00120f.4.127.0.4.2:uint16}
Originally-implemented-by: Marin Hannache <git@mareo.fr>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
RFC 2131 leaves undefined the behaviour of the client in response to a
DHCPNAK that comes from a server other than the selected DHCP server.
A substantial amount of online documentation suggests using multiple
independent DHCP servers with non-overlapping ranges in the same
subnet in order to provide some minimal redundancy. Experimentation
shows that in this setup, at least ISC dhcpd will send a DHCPNAK in
response to the client's DHCPREQUEST for an address that is not within
the range defined on that server. (Since the requested address does
lie within the subnet defined on that server, this will happen
regardless of the "authoritative" parameter.) The client will
therefore receive a DHCPACK from the selected DHCP server along with
one or more DHCPNAKs from each of the non-selected DHCP servers.
Filter out responses from non-selected DHCP servers before checking
for a DHCPNAK, so that these arguably spurious DHCPNAKs will not cause
iPXE to return to the discovery state.
Continue to check for DHCPNAK before filtering out responses for
non-selected lease addresses, since experimentation shows that the
DHCPNAK will usually have an empty yiaddr field.
Reported-by: Anders Blomdell <anders.blomdell@control.lth.se>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The "bridge" driver introduced in 3aa6b79 ("[pci] Add minimal PCI
bridge driver") is required only for BIOS builds using the ENA driver,
where experimentation shows that we cannot rely on the BIOS to fully
assign MMIO addresses.
Since the driver is a valid PCI driver, it will end up binding to all
PCI bridge devices even on a UEFI platform, where the firmware is
likely to have completed MMIO address assignment correctly. This has
no impact on most systems since there is generally no UEFI driver for
PCI bridges: the enumeration of the whole PCI bus is handled by the
PciBusDxe driver bound to the root bridge.
Experimentation shows that at least one laptop will freeze at the
point that iPXE attempts to bind to the bridge device. No deeper
investigation has been carried out to find the root cause.
Fix by causing efipci_supported() to return an error unless the
configuration space header type indicates a non-bridge device.
Reported-by: Marcel Petersen <mp@sbe.de>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Try loading the autoexec.ipxe script first from the directory
containing the iPXE binary (based on the relative file path provided
to us via EFI_LOADED_IMAGE_PROTOCOL), then fall back to trying the
root directory.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some cards seem to have the receive VLAN tag stripping feature enabled
by default, which causes received VLAN packets to be misinterpreted as
being received by the trunk device.
Fix by disabling VLAN tag stripping in the C+ Command Register.
Debugged-by: Xinming Lai <yiyihu@gmail.com>
Tested-by: Xinming Lai <yiyihu@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Update to pick up the upstream commit bda715b ("MdePkg: Fix UINT64 and
INT64 word length for LoongArch64").
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The self-test suite does not currently ever attempt to sleep the CPU.
This is an operation that may fail (e.g. by attempting to execute a
privileged instruction while running as a Linux userspace binary, or
by halting the CPU with all interrupts disabled).
Add a trivial self-test to exercise the ability to sleep the CPU
without crashing or halting forever.
Inspired-by: Xiaotian Wu <wuxiaotian@loongson.cn>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Treat a command line passed to iPXE via UEFI LoadOptions as an image
to be registered at startup, as is already done for the .lkrn, .pxe,
and .exe BIOS images.
Originally-implemented-by: Ladi Prosek <lprosek@redhat.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The obsolete ConsoleControl.h header is no longer present in the
current EDK2 codebase, but is still required for interoperability with
old iMacs.
Add an iPXE include guard to this file so that the EDK2 header import
script will no longer attempt to import it from the EDK2 tree.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The IntelFrameworkPkg and EdkCompatibilityPkg directories have been
removed from the EDK2 codebase. Remove these directories from the
EDK2 header import script.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
As with util/elf2efi32 and util/elf2efi64 in commit a99e435 ("[efi] Do
not rely on ProcessorBind.h when building host binaries"), build
util/efirom without using any architecture-specific EDK2 headers since
the build host's CPU architecture may not be supported by EDK2.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The current maximum window size of 256kB was calculated based on rough
link bandwidth and RTT measurements taken in 2012, and is too small to
avoid filling the TCP window on some modern links.
Update the list of typical link bandwidth and RTT figures to reflect
the modern world, and increase the maximum window size accordingly.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The PXE UDP receive queue may grow without limit if the PXE NBP does
not call PXENV_UDP_READ sufficiently frequently.
Fix by implementing a cache discarder for received PXE UDP packets
(similar to the TCP cache discarder).
Reported-by: Tal Shorer <shorer@amazon.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Many consoles will scroll immediately upon drawing a character in the
rightmost column of the bottom row of the display, in order to be able
to advance the cursor to the next character (even if the cursor is
disabled).
This causes PXE menus to display incorrectly. Specifically, pressing
the down arrow key while already on the last menu item may cause the
whole screen to scroll and the line to be duplicated.
Fix by moving the PXE menu one row up from the bottom of the screen.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
UEFI has the mildly annoying habit of installing copies of the
EFI_SIMPLE_NETWORK_PROTOCOL instance on the IPv4 and IPv6 child device
handles. This can cause iPXE's SNP driver to attempt to bind to a
copy of the EFI_SIMPLE_NETWORK_PROTOCOL that iPXE itself provided on a
different handle.
Fix by refusing to bind to an SNP (or NII) handle if there exists
another instance of the same protocol further up the device path (on
the basis that we always want to bind to the highest possible device).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Extend the functionality of efi_locate_device() to allow callers to
find instances of the protocol that may exist further up the device
path.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some versions of the 32-bit ARM linker seem to treat the absence of a
.note.GNU-stack section as implying an executable stack, and will
print a warning that this is deprecated behaviour.
Silence the warning by adding a .note.GNU-stack section to each
assembly file and retaining the sections in the Linux linker script.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The EFI ABI requires the use of -mfloat-abi=soft, but other platforms
may require -mfloat-abi=hard.
Allow for this by using -mfloat-abi=soft only for EFI builds.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The EFI ABI requires the use of -fno-short-enums, and the EDK2 headers
will perform a compile-time check that enums are 32 bits.
The EDK2 headers may be included even in builds for non-EFI platforms,
and so the -fno-short-enums flag must be used in all 32-bit ARM
builds. Fortunately, nothing else currently cares about enum sizes.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add support for building as a Linux userspace binary for AArch64.
This allows the self-test suite to be more easily run for the 64-bit
ARM code. For example:
# On a native AArch64 system:
#
make bin-arm64-efi/tests.linux && ./bin-arm64-efi/tests.linux
# On a non-AArch64 system (e.g. x86_64) via cross-compilation,
# assuming that kernel and glibc headers are present within
# /usr/aarch64-linux-gnu/sys-root/:
#
make bin-arm64-linux/tests.linux CROSS=aarch64-linux-gnu- && \
qemu-aarch64 -L /usr/aarch64-linux-gnu/sys-root/ \
./bin-arm64-linux/tests.linux
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Move the platform-specific DHCP client architecture definitions to
header files of the form <ipxe/$(PLATFORM)/dhcparch.h>. This
simplifies the directory structure and allows the otherwise unused
arch/$(ARCH)/include/$(PLATFORM) to be removed from the include
directory search path, which avoids the confusing situation in which a
header file may potentially be accessed through more than one path.
For Linux userspace binaries on any architecture, use the EFI values
for that architecture by delegating to the EFI header file. This
avoids the need to explicitly select values for Linux userspace
binaries for each architecture.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The requirement to undo the implicit "-Dlinux" is not specific to the
x86 architecture. Move this out of the x86-specific Makefile.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Reduce duplication between i386 and x86_64 by providing a single
shared linker script that both architectures can include.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
We cannot rely on the EDK2 ProcessorBind.h headers when compiling a
binary for execution on the build host itself (e.g. elf2efi), since
the host's CPU architecture may not even be supported by EDK2.
Fix by skipping ProcessorBind.h when building a host binary, and
defining the bare minimum required to allow other EDK2 headers to
compile cleanly.
Reported-by: Michal Suchánek <msuchanek@suse.de>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
We currently don't allocate an Asynchronous Event Notification Queue
(AENQ) because we don't actually care about any of the events that may
come in.
The ENA firmware found on Graviton instances requires the AENQ to
exist, otherwise all admin queue commands will fail.
Fix by allocating an AENQ and disabling all events (so that we do not
need to include code to acknowledge any events that may arrive).
Signed-off-by: Alexander Graf <graf@amazon.com>
Ensure that the "${netX/...}" settings mechanism always uses the same
interpretation of the network device corresponding to "netX" as any
other mechanism that performs a name-based lookup of a network device.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
When chainloading iPXE from an EFI VLAN device, configure the
corresponding iPXE VLAN device to be created automatically.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add the ability to automatically create a VLAN device for a specified
trunk device link-layer address and VLAN tag.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
When chainloading iPXE from a VLAN device, the MAC address of the
loaded image's device handle will match the MAC address of the trunk
device created by iPXE, and the autoboot process will then erroneously
consider the trunk device to be an autoboot device.
Fix by recording the VLAN tag along with the MAC address, and treating
the VLAN tag as part of the filter used to match the MAC address
against candidate network devices.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Many laptops now include the ability to specify a "system-specific MAC
address" (also known as "pass-through MAC"), which is supposed to be
used for both the onboard NIC and for any attached docking station or
other USB NIC. This is intended to simplify interoperability with
software or hardware that relies on a MAC address to recognise an
individual machine: for example, a deployment server may associate the
MAC address with a particular operating system image to be deployed.
This therefore creates legitimate situations in which duplicate MAC
addresses may exist within the same system.
As described in commit 98d09a1 ("[netdevice] Avoid registering
duplicate network devices"), the Xen netfront driver relies on the
rejection of duplicate MAC addresses in order to inhibit registration
of the emulated PCI devices that a Xen PV-HVM guest will create to
shadow each of the paravirtual network devices.
Move the code that rejects duplicate MAC addresses from the network
device core to the Xen netfront driver, to allow for the existence of
duplicate MAC addresses in non-Xen setups.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The network device index currently serves two purposes: acting as a
sequential index for network device names ("net0", "net1", etc), and
acting as an opaque unique integer identifier used in socket address
scope IDs.
There is no particular need for these usages to be linked, and it can
lead to situations in which devices are named unexpectedly. For
example: if a system has two network devices "net0" and "net1", a VLAN
is created as "net1-42", and then a USB NIC is connected, then the USB
NIC will be named "net3" rather than the expected "net2" since the
VLAN device "net1-42" will have consumed an index.
Separate the usages: rename the "index" field to "scope_id" (matching
its one and only use case), and assign the name without reference to
the scope ID by finding the first unused name. For consistency,
assign the scope ID by similarly finding the first unused scope ID.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some UNDI drivers (such as the AMI UsbNetworkPkg currently in the
process of being upstreamed into EDK2) have a bug that will prevent
any packets from being received unless at least one attempt has been
made to disable some receive filters.
Work around these buggy drivers by attempting to disable receive
filters before enabling them. Ignore any errors, since we genuinely
do not care whether or not the disabling succeeds.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
We currently free an unclaimed cached DHCPACK immediately after
startup, in order to free up memory. This prevents the cached DHCPACK
from being applied to a device that is created after startup, such as
a VLAN device created via the "vcreate" command.
Retain any unclaimed DHCPACK after startup to allow it to be matched
against (and applied to) any device that gets created at runtime.
Free the DHCPACK during shutdown if it still remains unclaimed, in
order to exit with memory cleanly freed.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
When chainloading iPXE from a VLAN device, the MAC address within the
cached DHCPACK will match the MAC address of the trunk device created
by iPXE, and the cached DHCPACK will then end up being erroneously
applied to the trunk device. This tends to break outbound IPv4
routing, since both the trunk and VLAN devices will have the same
assigned IPv4 address.
Fix by recording the VLAN tag along with the cached DHCPACK, and
treating the VLAN tag as part of the filter used to match the cached
DHCPACK against candidate network devices.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
EFI provides no API for determining the VLAN tag (if any) for a
specified device handle. There is the EFI_VLAN_CONFIG_PROTOCOL, but
that exists only on the trunk device handle (not on the VLAN device
handle), and provides no way to match VLAN tags against the trunk
device's child device handles.
The EDK2 codebase seems to rely solely on the device path to determine
the VLAN tag for a specified device handle: both NetLibGetVlanId() and
BmGetNetworkDescription() will parse the device path to search for a
VLAN_DEVICE_PATH component.
Add efi_path_vlan() which uses the same device path parsing logic to
determine the VLAN tag.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Provide a single central implementation of the logic for stepping
through elements of an EFI device path.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
UEFI implements VLAN support within the Managed Network Protocol (MNP)
driver, which may create child VLAN devices automatically based on
stored UEFI variables. These child devices do not themselves provide
a raw-packet interface via EFI_SIMPLE_NETWORK_PROTOCOL, and may be
consumed only via the EFI_MANAGED_NETWORK_PROTOCOL interface.
The device paths constructed for these child devices may conflict with
those for the EFI_SIMPLE_NETWORK_PROTOCOL instances that iPXE attempts
to install for its own VLAN devices. The upshot is that creating an
iPXE VLAN device (e.g. via the "vcreate" command) will fail if the
UEFI Managed Network Protocol has already created a device for the
same VLAN tag.
Fix by providing our own EFI_VLAN_CONFIG_PROTOCOL instance on the same
device handle as EFI_SIMPLE_NETWORK_PROTOCOL. This causes the MNP
driver to treat iPXE's device as supporting hardware VLAN offload, and
it will therefore not attempt to install its own instance of the
protocol.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The dangling pointer warning introduced in GCC 12 reports false
positives that result in build failures. In particular, storing the
address of a local code label used to record the current state of a
state machine (as done in crypto/deflate.c) is reported as an error.
There seems to be no way to mark the pointer type as being permitted
to hold such a value, so unconditionally disable the warning.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The array bounds checker on GCC 12 and newer reports a very large
number of false positives that result in build failures. In
particular, accesses through pointers to zero-length arrays (such as
those used by the linker table mechanism in include/ipxe/tables.h) are
reported as errors, contrary to the GCC documentation.
Work around this GCC issue by unconditionally disabling the warning.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The behaviour of PCI devices across a function-level reset seems to be
inconsistent in practice: some devices will preserve PCI BARs, some
will not.
Fix the behaviour of FLR on devices that do not preserve PCI BARs by
backing up and restoring PCI configuration space across the reset.
Preserve only the standard portion of the configuration space, since
there may be registers with unexpected side effects in the remaining
non-standardised space.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
TLS relies upon the ability of ciphers to perform in-place decryption,
in order to avoid allocating additional I/O buffers for received data.
Add verification of in-place encryption and decryption to the cipher
self-tests.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The hash calculation is currently performed incorrectly when
decrypting in place, since the ciphertext will have been overwritten
with the plaintext before being used to update the hash value.
Restructure the code to allow for in-place encryption and decryption.
Choose to optimise for the decryption case, since we are likely to
decrypt much more data than we encrypt.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
TLS relies upon the ability to reuse a cipher by resetting only the
initialisation vector while reusing the existing key.
Add verification of resetting the initialisation vector to the cipher
self-tests.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Reset the accumulated authentication state when cipher_setiv() is
called, to allow the cipher to be reused without resetting the key.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
All existing cipher suites use SHA-256 as the TLSv1.2 and above
handshake digest algorithm (even when using SHA-1 as the MAC digest
algorithm). Some GCM cipher suites use SHA-384 as the handshake
digest algorithm.
Allow the cipher suite to specify the handshake (and PRF) digest
algorithm to be used for TLSv1.2 and above.
This requires some restructuring to allow for the fact that the
ClientHello message must be included within the handshake digest, even
though the relevant digest algorithm is not yet known at the point
that the ClientHello is sent. Fortunately, the ClientHello may be
reproduced verbatim at the point of receiving the ServerHello, so we
rely on reconstructing (rather than storing) this message.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Always send the maximum supported version in our ClientHello message,
even when performing renegotiation (in which case the current version
may already be lower than the maximum supported version).
This is permitted by the specification, and allows the ClientHello to
be reconstructed verbatim at the point of selecting the handshake
digest algorithm in tls_new_server_hello().
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allow for AEAD cipher suites where the MAC length may be zero and the
authentication is instead provided by an authenticating cipher, with
the plaintext authentication tag appended to the ciphertext.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Harden against padding oracle attacks by treating invalid block
padding as zero length padding, thereby deferring the failure until
after computing the (incorrect) MAC.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Restructure the encryption and decryption operations to allow for the
use of ciphers where the initialisation vector is constructed by
concatenating the fixed IV (derived as part of key expansion) with a
record IV (prepended to the ciphertext).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
TLS stream and block ciphers use a MAC with a length equal to the
output length of the digest algorithm in use. For AEAD ciphers there
is no MAC, with the equivalent functionality provided by the cipher
algorithm's authentication tag.
Allow for the existence of AEAD cipher suites by making the MAC length
a parameter of the cipher suite.
Assume that the MAC key length is equal to the MAC length, since this
is true for all currently supported cipher suites.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
All TLS cipher types use a common structure for the per-record data
that is authenticated in addition to the plaintext itself. This data
is used as a prefix in the HMAC calculation for stream and block
ciphers, or as additional authenticated data for AEAD ciphers.
Define a "TLS authentication header" structure to hold this data as a
contiguous block, in order to meet the alignment requirement for AEAD
ciphers such as GCM.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Adjust the length of the first received ciphertext data buffer to
ensure that all decryption operations respect the cipher's alignment
size.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The GCM cipher mode of operation (in common with other counter-based
modes of operation) has a notion of blocksize that does not neatly
fall into our current abstraction: it does operate in 16-byte blocks
but allows for an arbitrary overall data length (i.e. the final block
may be incomplete).
Model this by adding a concept of alignment size. Each call to
encrypt() or decrypt() must begin at a multiple of the alignment size
from the start of the data stream. This allows us to model GCM by
using a block size of 1 byte and an alignment size of 16 bytes.
As a side benefit, this same concept allows us to neatly model the
fact that raw AES can encrypt only a single 16-byte block, by
specifying an alignment size of zero on this cipher.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
TLS block ciphers always use CBC (as per RFC 5246 section 6.2.3.2)
with a record initialisation vector length that is equal to the cipher
block size, and no fixed initialisation vector.
The initialisation vector for AEAD ciphers such as GCM is less
straightforward, and requires both a fixed and per-record component.
Extend the definition of a cipher suite to include fixed and record
initialisation vector lengths, and generate the fixed portion (if any)
as part of key expansion.
Do not add explicit calls to cipher_setiv() in tls_assemble_block()
and tls_split_block(), since the constraints imposed by RFC 5246 are
specifically chosen to allow implementations to avoid doing so.
(Instead, add a sanity check that the record initialisation vector
length is equal to the cipher block size.)
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The TLSv1.0 protocol was deprecated by RFC 8996 (along with TLSv1.1),
and has been disabled by default in iPXE since commit dc785b0fb
("[tls] Default to supporting only TLSv1.1 or above") in June 2020.
While there is value in continuing to support older protocols for
interoperability with older server appliances, the additional
complexity of supporting the implicit initialisation vector for
TLSv1.0 is not worth the cost.
Remove support for the obsolete TLSv1.0 protocol, to reduce complexity
of the implementation and simplify ongoing maintenance.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The DMA mapping is performed implicitly as part of the call to
dma_alloc(). The current implementation creates the IOMMU mapping for
the allocated and potentially uninitialised data before returning to
the caller (which will immediately zero out or otherwise initialise
the buffer). This leaves a small window within which a malicious PCI
device could potentially attempt to retrieve firmware-owned secrets
present in the uninitialised buffer. (Note that the hypothetically
malicious PCI device has no viable way to know the address of the
buffer from which to attempt a DMA read, rendering the attack
extremely implausible.)
Guard against any such hypothetical attacks by zeroing out the
allocated buffer prior to creating the coherent DMA mapping.
Suggested-by: Mateusz Siwiec <Mateusz.Siwiec@ioactive.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
bzimage_parse_cmdline() uses strcmp() to identify the named "vga=..."
kernel command line option values, which will give a false negative if
the option is not last on the command line.
Fix by temporarily changing the relevant command line separator (if
any) to a NUL terminator.
Debugged-by: Simon Rettberg <simon.rettberg@rz.uni-freiburg.de>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some ciphers (such as GCM) support the concept of a tag that can be
used to authenticate the encrypted data. Add a cipher method for
generating an authentication tag.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some ciphers (such as GCM) support the concept of additional
authenticated data, which does not appear in the ciphertext but may
affect the operation of the cipher.
Allow cipher_encrypt() and cipher_decrypt() to be called with a NULL
destination buffer in order to pass additional data.
Signed-off-by: Michael Brown <mcb30@ipxe.org>