The PCI I/O API (supporting accesses to PCI configuration space) is
not related to the general I/O API (supporting accesses to
memory-mapped I/O peripherals).
Remove the spurious inclusion of ipxe/io.h from the PCI I/O header.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
When invoking a kernel via the UEFI shim, the kernel (and potentially
also a helper binary such as GRUB) must be accessible via the virtual
filesystem exposed via EFI_SIMPLE_FILE_SYSTEM_PROTOCOL but must not be
present in the magic initrd constructed from all registered images.
Allow for images to be flagged as hidden, which will cause them to be
excluded from API-level lists of all images such as the virtual
filesystem directory contents, the magic initrd, or the Multiboot
module list. Hidden images remain visible to iPXE commands including
"imgstat", which will show a "[HIDDEN]" flag for such images.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
We unregister script images during their execution, to prevent a
"boot" command from re-executing the containing script. This also has
the side effect of preventing executing scripts from showing up within
the Linux magic initrd image (or the Multiboot module list).
Additional logic in bzimage.c and efi_file.c prevents a currently
executing kernel from showing up within the magic initrd image.
Similar logic in multiboot.c prevents the Multiboot kernel from
showing up as a Multiboot module.
This still leaves some corner cases that are not covered correctly.
For example: when using a gzip-compressed kernel image, nothing will
currently hide the original compressed image from the magic initrd.
Fix by moving the logic that temporarily unregisters the current image
from script_exec() to image_exec(), so that it applies to all image
types, and simplify the magic initrd and Multiboot module list
construction logic on the basis that no further filtering of the
registered image list is necessary.
This change has the side effect of hiding currently executing EFI
images from the virtual filesystem exposed by iPXE. For example, when
using iPXE to boot wimboot, the wimboot binary itself will no longer
be visible within the virtual filesystem.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
As noted in commit 3c83843 ("[rng] Check for several functioning RTC
interrupts"), experimentation shows that Hyper-V cannot be trusted to
reliably generate RTC interrupts. (As noted in commit f3ba0fb
("[hyperv] Provide timer based on the 10MHz time reference count
MSR"), Hyper-V appears to suffer from a general problem in reliably
generating any legacy interrupts.) An alternative entropy source is
therefore required for an image that may be used in a Hyper-V Gen1
virtual machine.
The x86 RDRAND instruction provides a suitable alternative entropy
source, but may not be supported by all CPUs. We must therefore allow
for multiple entropy sources to be compiled in, with the single active
entropy source selected only at runtime.
Restructure the internal entropy API to allow a working entropy source
to be detected and chosen at runtime.
Enable the RDRAND entropy source for all x86 builds, since it is
likely to be substantially faster than any other source.
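The runtime selection described above can be sketched roughly as
follows. This is a minimal illustration only: the structure, the
function names, and the stand-in probe functions are all hypothetical
and are not taken from the iPXE entropy API.

```c
#include <stddef.h>

/* Hypothetical entropy source descriptor (illustrative names only) */
struct entropy_source_sketch {
	/* Human-readable name */
	const char *name;
	/* Probe the source: return 0 if usable on this machine */
	int ( * probe ) ( void );
};

/* Return the first compiled-in source whose probe succeeds, or NULL */
const struct entropy_source_sketch *
select_entropy_source ( const struct entropy_source_sketch *sources,
			size_t count ) {
	size_t i;

	for ( i = 0 ; i < count ; i++ ) {
		if ( sources[i].probe() == 0 )
			return &sources[i];
	}
	return NULL;
}

/* Stand-in probes, used only to exercise the selection logic */
int probe_fails ( void ) { return -1; }
int probe_works ( void ) { return 0; }
```

Listing the fastest source first means that it is chosen whenever its
probe succeeds, with slower sources acting purely as fallbacks.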
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The Linux kernel bzImage image format and the CPIO archive constructor
will parse the image command line for certain arguments of the form
"key=value". This parsing is currently implemented using strstr() in
a way that can cause a false positive suffix match. For example, a
command line containing "highmem=<n>" would erroneously be treated as
containing a value for "mem=<n>".
Fix by centralising the logic used for parsing such arguments, and
including a check that the argument immediately follows a whitespace
delimiter (or is at the start of the string).
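A minimal sketch of the delimiter check, assuming a hypothetical
helper named find_arg() (this is not the centralised iPXE function,
and its signature is an assumption made for illustration):

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: locate the value for "key" (e.g. "mem=")
 * within a command line.  A match is accepted only at the start of
 * the string or immediately after a whitespace character, so that
 * "highmem=" is never mistaken for "mem=".
 */
const char * find_arg ( const char *cmdline, const char *key ) {
	const char *match = cmdline;
	size_t key_len = strlen ( key );

	while ( ( match = strstr ( match, key ) ) != NULL ) {
		/* Accept only a match at a word boundary */
		if ( ( match == cmdline ) ||
		     isspace ( ( unsigned char ) match[-1] ) )
			return ( match + key_len );
		/* Skip this false positive and keep searching */
		match++;
	}
	return NULL;
}
```

With this check, find_arg ( "highmem=1G", "mem=" ) returns NULL,
while find_arg ( "quiet mem=512M", "mem=" ) returns a pointer to
"512M".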
Reported-by: Filippo Giunchedi <filippo@esaurito.net>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Commit 74222cd ("[rng] Check for functioning RTC interrupt") added a
check that the RTC is capable of generating interrupts via the legacy
PIC, since this mechanism appears to be broken in some Hyper-V virtual
machines.
Experimentation shows that the RTC is sometimes capable of generating
a single interrupt, but will then generate no subsequent interrupts.
This currently causes rtc_entropy_check() to falsely detect that the
entropy gathering mechanism is functional.
Fix by checking for several RTC interrupts before declaring that it is
a functional entropy source.
Reported-by: Andreas Hammarskjöld <junior@2PintSoftware.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add support for building a LoongArch64 Linux userspace binary.
Signed-off-by: Xiaotian Wu <wuxiaotian@loongson.cn>
Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The PAGE_SHIFT definition is an architectural property, rather than an
aspect of a particular I/O API implementation (of which, in theory,
there may be more than one per architecture).
Reflect this by moving the definition to the top-level bits/io.h for
each architecture.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Over the years, the undocumented operand modifier used to produce the
unprefixed constant values in __einfo_error() has varied from "%c0" to
"%a0" in commit 1a77466 ("[build] Fix use of inline assembly on GCC
4.8 ARM64 builds") and back to "%c0" in commit 3fb3ffc ("[build] Fix
use of inline assembly on GCC 8 ARM64 builds"), according to the
evolving demands of the toolchain.
LoongArch64 suffers from a similar issue: GCC 13 will allow either,
but the currently released GCC 12 allows only the "%a0" form.
Introduce a macro ASM_NO_PREFIX, defined in bits/compiler.h, to
abstract away this difference and allow different architectures to use
different operand modifiers.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The PXE UDP receive queue may grow without limit if the PXE NBP does
not call PXENV_UDP_READ sufficiently frequently.
Fix by implementing a cache discarder for received PXE UDP packets
(similar to the TCP cache discarder).
Reported-by: Tal Shorer <shorer@amazon.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some versions of the 32-bit ARM linker seem to treat the absence of a
.note.GNU-stack section as implying an executable stack, and will
print a warning that this is deprecated behaviour.
Silence the warning by adding a .note.GNU-stack section to each
assembly file and retaining the sections in the Linux linker script.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The EFI ABI requires the use of -mfloat-abi=soft, but other platforms
may require -mfloat-abi=hard.
Allow for this by using -mfloat-abi=soft only for EFI builds.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The EFI ABI requires the use of -fno-short-enums, and the EDK2 headers
will perform a compile-time check that enums are 32 bits.
The EDK2 headers may be included even in builds for non-EFI platforms,
and so the -fno-short-enums flag must be used in all 32-bit ARM
builds. Fortunately, nothing else currently cares about enum sizes.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add support for building as a Linux userspace binary for AArch64.
This allows the self-test suite to be more easily run for the 64-bit
ARM code. For example:
  # On a native AArch64 system:
  #
  make bin-arm64-linux/tests.linux && ./bin-arm64-linux/tests.linux
  # On a non-AArch64 system (e.g. x86_64) via cross-compilation,
  # assuming that kernel and glibc headers are present within
  # /usr/aarch64-linux-gnu/sys-root/:
  #
  make bin-arm64-linux/tests.linux CROSS=aarch64-linux-gnu- && \
  qemu-aarch64 -L /usr/aarch64-linux-gnu/sys-root/ \
               ./bin-arm64-linux/tests.linux
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Move the platform-specific DHCP client architecture definitions to
header files of the form <ipxe/$(PLATFORM)/dhcparch.h>. This
simplifies the directory structure and allows the otherwise unused
arch/$(ARCH)/include/$(PLATFORM) to be removed from the include
directory search path, which avoids the confusing situation in which a
header file may potentially be accessed through more than one path.
For Linux userspace binaries on any architecture, use the EFI values
for that architecture by delegating to the EFI header file. This
avoids the need to explicitly select values for Linux userspace
binaries for each architecture.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The requirement to undo the implicit "-Dlinux" is not specific to the
x86 architecture. Move this out of the x86-specific Makefile.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Reduce duplication between i386 and x86_64 by providing a single
shared linker script that both architectures can include.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
When chainloading iPXE from a VLAN device, the MAC address within the
cached DHCPACK will match the MAC address of the trunk device created
by iPXE, and the cached DHCPACK will then end up being erroneously
applied to the trunk device. This tends to break outbound IPv4
routing, since both the trunk and VLAN devices will have the same
assigned IPv4 address.
Fix by recording the VLAN tag along with the cached DHCPACK, and
treating the VLAN tag as part of the filter used to match the cached
DHCPACK against candidate network devices.
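The extended filter can be sketched as below. The structures and
field names are hypothetical stand-ins for illustration, with a VLAN
tag of zero assumed to denote the untagged trunk device.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical record of a cached DHCPACK (illustrative only) */
struct cached_dhcp_sketch {
	uint8_t mac[6];		/* MAC address from the DHCPACK */
	unsigned int vlan;	/* VLAN tag recorded with it (0 = none) */
};

/* Hypothetical network device description (illustrative only) */
struct netdev_sketch {
	uint8_t mac[6];		/* Link-layer address */
	unsigned int vlan;	/* VLAN tag (0 = trunk device) */
};

/* Return non-zero if the cached DHCPACK applies to this device */
int cached_dhcp_matches ( const struct cached_dhcp_sketch *cache,
			  const struct netdev_sketch *netdev ) {
	/* The MAC address must match, as before */
	if ( memcmp ( cache->mac, netdev->mac, sizeof ( cache->mac ) ) != 0 )
		return 0;
	/* The VLAN tag must now match as well */
	if ( cache->vlan != netdev->vlan )
		return 0;
	return 1;
}
```

Since the trunk and VLAN devices share a MAC address, only the tag
comparison distinguishes them.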
Signed-off-by: Michael Brown <mcb30@ipxe.org>
bzimage_parse_cmdline() uses strcmp() to identify the named "vga=..."
kernel command line option values, which will give a false negative if
the option is not last on the command line.
Fix by temporarily changing the relevant command line separator (if
any) to a NUL terminator.
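The temporary-terminator technique can be sketched as follows. The
function name and its return codes are hypothetical; only the
approach of overwriting and then restoring the separator is taken
from the description above.

```c
#include <string.h>

/*
 * Hypothetical sketch: classify a "vga=" value.  "value" points just
 * after "vga=" within a writable command line.  Returns 1 for
 * "normal", 2 for "ext", 3 for "ask", or 0 for anything else (such
 * as a numeric mode).  The return codes are illustrative only.
 */
int vga_named_mode ( char *value ) {
	char *sep;
	char saved = '\0';
	int mode = 0;

	/* Temporarily terminate the value at the next separator, if any */
	sep = strchr ( value, ' ' );
	if ( sep ) {
		saved = *sep;
		*sep = '\0';
	}

	/* strcmp() now sees only the option value itself */
	if ( strcmp ( value, "normal" ) == 0 ) {
		mode = 1;
	} else if ( strcmp ( value, "ext" ) == 0 ) {
		mode = 2;
	} else if ( strcmp ( value, "ask" ) == 0 ) {
		mode = 3;
	}

	/* Restore the separator so the command line is left unchanged */
	if ( sep )
		*sep = saved;

	return mode;
}
```

The command line is identical before and after the call, so later
consumers of the string are unaffected.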
Debugged-by: Simon Rettberg <simon.rettberg@rz.uni-freiburg.de>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Pretty much all physical machines and off-the-shelf virtual machines
will provide a functional PCI BIOS. We therefore default to using
only the PCI BIOS, with no fallback to an alternative mechanism if the
PCI BIOS fails.
AWS EC2 provides the opportunity to experience some exceptions to this
rule. For example, the t3a.nano instances in eu-west-1 have no
functional PCI BIOS at all. As of commit 83516ba ("[cloud] Use
PCIAPI_DIRECT for cloud images") we therefore use direct Type 1
configuration space accesses in the images built and published for use
in the cloud.
Recent experience has discovered yet more variation in AWS EC2
instances. For example, some of the metal instance types have
multiple PCI host bridges and the direct Type 1 accesses therefore
see only a subset of the PCI devices.
Attempt to accommodate future such variations by making the PCI I/O
API selectable at runtime and choosing ECAM (if available), falling
back to the PCI BIOS (if available), then finally falling back to
direct Type 1 accesses.
This is implemented as a dedicated PCIAPI_CLOUD API, rather than by
having the PCI core select a suitable API at runtime (as was done for
timers in commit 302f1ee ("[time] Allow timer to be selected at
runtime")). The common case will remain that only the PCI BIOS API is
required, and we would prefer to retain the optimisations that come
from inlining the configuration space accesses in this common case.
Cloud images are (at present) disk images rather than ROM images, and
so the increased code size required for this design approach in the
PCIAPI_CLOUD case is acceptable.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allow pcibios_discover() to return an empty range if the INT 1A,B101
PCI BIOS installation check call fails.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allow pci_find_next() to discover devices beyond the first PCI
segment, by generalising pci_num_bus() (which implicitly assumes that
there is only a single PCI segment) to pci_discover() (which has the
ability to return an arbitrary contiguous chunk of PCI bus:dev.fn
address space).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allow for linked-in code to override the mechanism used to locate an
ACPI table, thereby opening up the possibility of ACPI self-tests.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Accumulate UTF-8 characters in fbcon_putchar(), and require the frame
buffer console's .glyph() method to accept Unicode character values.
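Byte-at-a-time UTF-8 accumulation can be sketched as below. This is a
simplified illustration with hypothetical names, not the iPXE
implementation: it does not reject overlong encodings, it cannot
distinguish a NUL byte from "sequence incomplete", and it discards
the byte that interrupts a malformed sequence.

```c
/* Hypothetical accumulator state (initialise to all zeroes) */
struct utf8_acc_sketch {
	unsigned int character;	/* Code point accumulated so far */
	unsigned int remaining;	/* Continuation bytes still expected */
};

/*
 * Feed one byte.  Returns the completed Unicode code point, 0 if
 * more bytes are required, or U+FFFD for malformed input.
 */
unsigned int utf8_feed_sketch ( struct utf8_acc_sketch *acc,
				unsigned char byte ) {
	if ( acc->remaining ) {
		/* Expecting a continuation byte of the form 10xxxxxx */
		if ( ( byte & 0xc0 ) == 0x80 ) {
			acc->character =
				( ( acc->character << 6 ) | ( byte & 0x3f ) );
			if ( --acc->remaining == 0 )
				return acc->character;
			return 0;
		}
		/* Malformed sequence: abandon it */
		acc->remaining = 0;
		return 0xfffd;
	}

	/* Plain ASCII passes straight through */
	if ( byte < 0x80 )
		return byte;

	/* Lead byte: record payload bits and expected length */
	if ( ( byte & 0xe0 ) == 0xc0 ) {
		acc->character = ( byte & 0x1f );
		acc->remaining = 1;
	} else if ( ( byte & 0xf0 ) == 0xe0 ) {
		acc->character = ( byte & 0x0f );
		acc->remaining = 2;
	} else if ( ( byte & 0xf8 ) == 0xf0 ) {
		acc->character = ( byte & 0x07 );
		acc->remaining = 3;
	} else {
		/* Stray continuation byte or invalid lead byte */
		return 0xfffd;
	}
	return 0;
}
```

A console putchar routine built this way would draw a glyph only
when a non-zero code point is returned.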
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Several keyboard layouts define ASCII characters as accessible only
via the AltGr modifier. Add support for this modifier to ensure that
all ASCII characters are accessible.
Experiments suggest that the BIOS console is likely to fail to
generate ASCII characters when the AltGr key is pressed. Work around
this limitation by accepting LShift+RShift (which will definitely
produce an ASCII character) as a synonym for AltGr.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Handle Ctrl and CapsLock key modifiers within key_remap(), to provide
consistent behaviour across different console types.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The key with scancode 86 appears in the position between left shift
and Z on a US keyboard, where it typically fails to exist entirely.
Most US keyboard maps define this nonexistent key as generating "\|",
with the notable exception of "loadkeys" which instead reports it as
generating "<>". Both of these mapping choices duplicate keys that
exist elsewhere in the map, which causes problems for our ASCII-based
remapping mechanism.
Work around these quirks by treating the key as generating "\|" with
the high bit set, and making it subject to remapping. Where the BIOS
generates "\|" as expected, this allows us to remap to the correct
ASCII value.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
To minimise code size, our keyboard mapping works on the basis of
allowing the BIOS to convert the keyboard scancode into an ASCII
character and then remapping the ASCII character.
This causes problems with keyboard layouts such as "fr" that swap the
shifted and unshifted digit keys, since the ASCII-based remapping will
spuriously remap the numeric keypad (which produces the same ASCII
values as the digit keys).
Fix by checking that the keyboard scancode is within the range of keys
that vary between keyboard mappings before attempting to remap the
ASCII character.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
In real mode, code segments are always writable. In protected mode,
code segments can never be writable. The precise implementation of
this attribute differs between CPU generations, with subtly different
behaviour arising on the transitions from protected mode to real mode.
At the point of transition (when the PE bit is cleared in CR0) the
hidden portion of the %cs descriptor will retain whatever attributes
were in place for the protected-mode code segment, including the fact
that the segment is not writable. The immediately following code will
perform a far control flow transfer (such as ljmp or lret) in order to
load a real-mode value into %cs.
On the Pentium and later CPUs, the retained protected-mode attributes
will be ignored for any accesses via %cs while the CPU is in real
mode. A write via %cs will therefore be allowed even though the
hidden portion of the %cs descriptor still describes a non-writable
segment.
On the 486 and earlier CPUs, the retained protected-mode attributes
will not be ignored for accesses via %cs. A write via %cs will
therefore cause a CPU fault. To obtain normal real-mode behaviour
(i.e. a writable %cs descriptor), special logic is added to the ljmp
instruction that populates the hidden portion of the %cs descriptor
with real-mode attributes when a far jump is executed in real mode.
The result is that writes via %cs will cause a CPU fault until the
first ljmp instruction is executed, after which writes via %cs will be
allowed as expected in real mode.
The transition code in libprefix.S currently uses lret to load a
real-mode value into %cs after clearing the PE bit. Experimentation
shows that only the ljmp instruction will work to load real-mode
attributes into the hidden portion of the %cs descriptor: other far
control flow transfers (such as lret, lcall, or int) do not do so.
When running on a 486 or earlier CPU, this results in code within
libprefix.S running with a non-writable code segment after a mode
transition, which in turn results in a CPU fault when real-mode code
in liba20.S attempts to write to %cs:enable_a20_method.
Fix by constructing and executing an ljmp instruction, to trigger the
relevant descriptor population logic on 486 and earlier CPUs. This
ljmp instruction is constructed on the stack, since the .prefix
section may be executing directly from ROM (or from memory that the
BIOS has write-protected in order to emulate an ISA ROM region) and so
cannot be modified.
Reported-by: Nikolai Zhubr <n-a-zhubr@yandex.ru>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
SBAT defines an encoding for security generation numbers stored as a
CSV file within a special ".sbat" section in the signed binary. If a
Secure Boot exploit is discovered then the generation number will be
incremented alongside the corresponding fix.
Platforms may then record the minimum generation number required for
any given product. This allows for an efficient revocation mechanism
that consumes minimal flash storage space (in contrast to the DBX
mechanism, which allows for only a single-digit number of revocation
events to ever take place across all possible signed binaries).
Add SBAT metadata to iPXE EFI binaries to support this mechanism.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allow for the DSDT/SSDT signature-scanning and value extraction code
to be reused for extracting a pass-through MAC address.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Commit cd3de55 ("[efi] Record cached DHCPACK from loaded image's
device handle, if present") added the ability for a chainloaded UEFI
iPXE to reuse an IPv4 address and DHCP options previously obtained by
a built-in PXE stack, without needing to perform a second DHCP
request.
Extend this to also record the cached ProxyDHCPOFFER and PXEBSACK
obtained from the EFI_PXE_BASE_CODE_PROTOCOL instance installed on the
loaded image's device handle, if present.
This allows a chainloaded UEFI iPXE to reuse a boot filename or other
options that were provided via a ProxyDHCP or PXE boot server
mechanism, rather than by standard DHCP.
Tested-by: Andreas Hammarskjöld <junior@2PintSoftware.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The ARM versions of the big-integer inline assembly functions include
constraints to indicate that the output value is modified by the
assembly code. These constraints are not present in the equivalent
code for the x86 versions.
As of GCC 11, this results in the compiler reporting that the output
values may be uninitialized.
Fix by including the relevant memory output constraints.
Reported-by: Christian Hesse <mail@eworm.de>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
iPXE will construct CPIO headers for images that have a non-empty
command line, thereby allowing raw images (without CPIO headers) to be
injected into a dynamically constructed initrd. This feature is
currently implemented within the BIOS-only bzImage format support.
Split out the CPIO header construction logic to allow for reuse in
other contexts such as in a UEFI build.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Avoid using the "rdtsc" instruction unless profiling is enabled. This
allows the non-debug build of the UNDI driver to be used on a CPU such
as a 486 that does not support the TSC.
Reported-by: Nikolai Zhubr <n-a-zhubr@yandex.ru>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The decompressor uses the i486 "bswap" instruction, but does not
require any instructions that exist only on i586 or above. Update the
".arch" directive to reflect the requirements of the code as
implemented.
Reported-by: Martin Habets <habetsm.xilinx@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The INT 13 extensions provide a mechanism for accessing disks using
linear (LBA) rather than C/H/S addressing. SAN protocols such as
iSCSI invariably support only linear addresses and so iPXE currently
provides LBA access to all SAN disks (with autodetection and emulation
of an appropriate geometry for C/H/S accesses).
Most BIOSes will not report support for INT 13 extensions for floppy
disk drives, and some operating systems may be confused by a floppy
drive that claims such support.
Minimise surprise by reporting the existence of support for INT 13
extensions only for non-floppy drive numbers. Continue to provide
support for all drive numbers, to avoid breaking operating systems
that may unconditionally use the INT 13 extensions without first
checking for support.
Reported-by: Valdo Toost <vtoost@hot.ee>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
There is no method for obtaining the number of PCI buses when using
PCIAPI_DIRECT, and we therefore currently scan all possible bus
numbers. This can cause a several-second startup delay in some
virtualised environments, since PCI configuration space access will
necessarily require the involvement of the hypervisor.
Ameliorate this situation by defaulting to scanning only a single bus,
and expanding the number of PCI buses to accommodate any subordinate
buses that are detected during enumeration.
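The expanding scan can be sketched as follows. Real configuration
space accesses are replaced here by a hypothetical lookup table in
which subordinate[bus] holds the highest subordinate bus number
reported by any bridge on that bus (or zero if there is none); the
function name is likewise an assumption.

```c
#include <stdint.h>

/*
 * Hypothetical sketch: return the number of buses to scan, starting
 * from a single bus and growing whenever a bridge advertises a
 * subordinate bus beyond the current limit.  "max" is the number of
 * entries in the table and caps the result.
 */
unsigned int count_buses_sketch ( const uint8_t *subordinate,
				  unsigned int max ) {
	unsigned int count = 1;	/* Default: scan only bus 0 */
	unsigned int bus;

	for ( bus = 0 ; bus < count ; bus++ ) {
		/* Expand the range to cover any subordinate bus */
		if ( subordinate[bus] >= count )
			count = ( subordinate[bus] + 1u );
		if ( count > max )
			count = max;
	}
	return count;
}
```

A system with no bridges is therefore scanned as a single bus, and
the cost of scanning grows only with the buses actually present.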
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The "used" attribute can be applied only to functions or variables,
which prevents the use of __asmcall as a type attribute.
Fix by removing "used" from the definition of __asmcall for i386 and
x86_64 architectures, and adding explicit __used annotations where
necessary.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The ACPI API currently expects platforms to provide access to a single
contiguous ACPI table. Some platforms (e.g. Linux userspace) do not
provide a convenient way to obtain the entire ACPI table, but do
provide access to individual tables.
All iPXE consumers of the ACPI API require access only to individual
tables.
Redefine the internal API to make acpi_find() an API method, with all
existing implementations delegating to the current RSDT-based
implementation.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
When building as a Linux userspace application, iPXE currently
implements its own system calls to the host kernel rather than relying
on the host's C library. The output binary is statically linked and
has no external dependencies.
This matches the general philosophy of other platforms on which iPXE
runs, since there are no external libraries available on either BIOS
or UEFI bare metal. However, it would be useful for the Linux
userspace application to be able to link against host libraries such
as libslirp.
Modify the build process to perform a two-stage link: first picking
out the requested objects in the usual way from blib.a but with
relocations left present, then linking again with a helper object to
create a standard hosted application. The helper object provides the
standard main() entry point and wrappers for the Linux system calls
required by the iPXE Linux drivers and interface code.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allow for the possibility of linking to platform libraries for the
Linux userspace build by adding an iPXE-specific symbol prefix.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Recent versions of the GNU assembler (observed with GNU as 2.35 on
Fedora 33) will produce a warning message:
  Warning: no instruction mnemonic suffix given and no register
  operands; using default for `bts'
The operand size affects only the potential range for the bit number.
Since we pass the bit number as an unsigned int, it is already
constrained to 32 bits for both i386 and x86_64.
Silence the assembler warning by specifying an explicit 32-bit operand
size (and thereby matching the choice that the assembler would
otherwise make automatically).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Assume that preservation of the %xmm registers is unnecessary during
installation of iPXE into memory, since this is an operation that by
its nature substantially disrupts large portions of the system anyway
(such as the E820 memory map). This assumption allows us to utilise
the existing CPUID code to check that FXSAVE/FXRSTOR are supported.
Test for support during the call to init_librm and store the flag for
use during subsequent calls to virt_call.
Reduce the scope of TIVOLI_VMM_WORKAROUND to affecting only the call
to check_fxsr(), to reduce #ifdef pollution in the remaining code.
Debugged-by: Johannes Heimansberg <git@jhe.dedyn.io>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The __asmcall declaration has no effect on a void function with no
parameters, but should be included for completeness since the function
is called directly from assembly code.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Provide a generic raw image prefix, which assumes that the iPXE image
has been loaded in its entirety on a paragraph boundary.
The resulting .raw image can be loaded via RPL using an rpld.conf file
such as:
  HOST {
    ethernet = 00:00:00:00:00:00/6;
    FILE {
      path="ipxe.raw";
      load=0x2000;
    };
    execute=0x2000;
  };
Debugged-by: Johannes Heimansberg <git@jhe.dedyn.io>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
A zero-length initrd file will currently cause an endless loop during
reshuffling as the empty image is repeatedly swapped with itself.
Fix by terminating the inner loop before considering an image as a
candidate to be swapped with itself.
Reported-by: Pico Mitchell <pico@randomapplications.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Split out the portions of cachedhcp.c that can be shared between BIOS
and UEFI (both of which can provide a buffer containing a previously
obtained DHCP packet, and neither of which provides a means to
determine the length of this DHCP packet).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some versions of GNU ld (observed with binutils 2.36 on Arch Linux)
introduce a .note.gnu.property section marked as loadable at a high
address and with non-empty contents. This adds approximately 128MB of
garbage to the BIOS .usb disk images.
Fix by using a custom linker script for the prefix-only binaries such
as the USB disk partition table and MBR, in order to allow unwanted
sections to be explicitly discarded.
Reported-by: Christian Hesse <mail@eworm.de>
Tested-by: Christian Hesse <mail@eworm.de>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The semantics of the assembler's .align directive vary by CPU
architecture. For the ARM builds, it specifies a power of two rather
than a number of bytes. This currently leads to the .einfo entries
(which do not appear in the final binary) having an alignment of 256
bytes for the ARM builds.
Fix by switching to the GNU-specific directive .balign, which is
consistent across architectures.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Several distributions include versions of gcc that are patched to
create position-independent executables by default. These have caused
multiple problems over the years: see e.g. commits fe61f6d ("[build]
Fix compilation when gcc is patched to default to -fPIE -Wl,-pie"),
5de1346 ("[build] Apply the "-fno-PIE -nopie" workaround only to i386
builds"), 7c395b0 ("[build] Use -no-pie on newer versions of gcc"),
and decee20 ("[build] Disable position-independent code for ARM64 EFI
builds").
The build system currently attempts to work around these mildly broken
patched versions of gcc for the i386 and arm64 architectures. This
misses the relatively obscure bin-x86_64-pcbios build platform, which
turns out to also require the same workaround.
Attempt to preempt the next such required workaround by moving the
existing i386 version to apply to all platforms and all architectures,
unless -fpie has been requested explicitly by another Makefile (as is
done by arch/x86_64/Makefile.efi).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add a few more ABSOLUTE() expressions to convince the FreeBSD linker
that already-absolute symbols are, in fact, absolute.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some versions of objcopy will spuriously complain when asked to
extract the .zinfo section since doing so will nominally alter the
load addresses of the (non-loadable) .bss.* sections.
Avoid these warnings by placing the .zinfo section at the very end of
the load memory address space.
Allocate non-overlapping load memory addresses for the (non-loadable)
.bss.* sections, in the hope of avoiding spurious warnings about
overlapping load addresses.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Consolidate the remaining logic common to initrd_init() and imgmem()
into a shared image_memory() function.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Generalise util/geniso, util/gensdsk, and util/genefidsk to create a
single script util/genfsimg that can be used to build either FAT
filesystem images or ISO images.
Extend the functionality to allow for building multi-architecture UEFI
bootable ISO images and combined BIOS+UEFI images.
For example:
  ./util/genfsimg -o combined.iso \
      bin-x86_64-efi/ipxe.efi \
      bin-arm64-efi/ipxe.efi \
      bin/ipxe.lkrn
would generate a hybrid image that could be used as a CDROM (or hard
disk or USB key) on legacy BIOS, x86_64 UEFI, or ARM64 UEFI.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The malloc_dma() function allocates memory with specified physical
alignment, and is typically (though not exclusively) used to allocate
memory for DMA.
Rename to malloc_phys() to more closely match the functionality, and
to create name space for functions that specifically allocate and map
DMA-capable buffers.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Define pci_ioremap() as a wrapper around ioremap() that could allow
for a non-zero address translation offset.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
This change fixes the offset used when retrieving the iPXE stack
pointer after a COM32 binary returns. The iPXE stack pointer is saved
at the top of the available memory, and the top of the stack for the
COM32 binary is then set just below it. However, seven more items are
pushed onto the COM32 stack before the entry point is invoked, so when
the COM32 binary returns, the location of the iPXE stack pointer is 28
(and not 24) bytes above the current stack pointer.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
This caused iPXE to reject images even when enough memory was
available.
Signed-off-by: David Decotigny <ddecotig@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The only remaining use case in iPXE for the CPU direction flag is in
__memcpy_reverse() where it is set to allow the use of "rep movsb" to
perform the memory copy. This matches the equivalent functionality in
the EDK2 codebase, which has functions such as InternalMemCopyMem that
also temporarily set the direction flag in order to use "rep movsb".
As noted in commit d2fb317 ("[crypto] Avoid temporarily setting
direction flag in bigint_is_geq()"), some UEFI implementations are
known to have buggy interrupt handlers that may reboot the machine if
a timer interrupt happens to occur while the direction flag is set.
Work around these buggy UEFI implementations by using the
(unoptimised) generic_memcpy_reverse() on i386 or x86_64 UEFI
platforms.
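A reverse copy that never touches the direction flag can be sketched
as a plain byte loop. The function name below is hypothetical and the
body is an illustration of the approach, not a copy of the iPXE
generic_memcpy_reverse() source.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Copy "len" bytes starting from the highest address and working
 * downwards.  This is safe for overlapping regions where the
 * destination lies above the source, and needs no "rep movsb".
 */
void * memcpy_reverse_sketch ( void *dest, const void *src, size_t len ) {
	uint8_t *d = ( ( ( uint8_t * ) dest ) + len );
	const uint8_t *s = ( ( ( const uint8_t * ) src ) + len );

	while ( len-- )
		*(--d) = *(--s);
	return dest;
}
```

The cost is a slower copy than the string instruction would provide,
which is the trade-off accepted for the affected platforms.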
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The UEFI specification states that the calling convention for IA-32
and x64 includes "Direction flag in EFLAGS is clear". This
specification covers only the calling convention used at the point of
calling functions annotated with EFIAPI. The specification explicitly
states that other functions (such as private functions or static
library calls) are not required to follow the UEFI calling
conventions.
The reference EDK2 implementation follows this specification. In
particular, the EDK2 interrupt handlers will clear the direction flag
before calling any EFIAPI functions, and will restore the direction
flag when returning from the interrupt handler. Some EDK2 private
library functions (most notably InternalMemCopyMem) may set the
direction flag temporarily in order to make efficient use of CPU
string operations.
The current implementation of iPXE's bigint_is_geq() for i386 and
x86_64 will similarly set the direction flag temporarily in order to
make efficient use of CPU string operations.
On some UEFI implementations (observed with a Getac RX10 tablet), a
timer interrupt that happens to occur while the direction flag is set
will reboot the machine. This very strongly indicates that the UEFI
timer interrupt handler is failing to clear the direction flag before
performing an affected operation (such as copying a block of memory).
Work around such buggy UEFI implementations by rewriting
bigint_is_geq() to avoid the use of string operations and so obviate
the requirement to temporarily set the direction flag.
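As an illustration only, a comparison that avoids string operations
could look like the following C sketch. The function name and the
element layout (32-bit elements, least significant element first) are
assumptions, and the real x86 implementation is not reproduced here.

```c
#include <stdint.h>

/* Hypothetical sketch: return non-zero if value >= reference.  Both
 * big integers are arrays of "size" 32-bit elements, assumed here to
 * be stored least significant element first.
 */
static int sketch_bigint_is_geq ( const uint32_t *value,
				  const uint32_t *reference,
				  unsigned int size ) {
	/* Compare from the most significant element downwards */
	while ( size-- ) {
		if ( value[size] != reference[size] )
			return ( value[size] > reference[size] );
	}
	/* All elements are equal */
	return 1;
}
```

Because the loop uses only ordinary indexed loads, there is no point
at which the direction flag needs to be set.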
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some versions of gcc (observed with the cross-compiling gcc 9.3.0 in
Ubuntu 20.04) default to enabling -fPIE. Experimentation shows that
this results in the emission of R_AARCH64_ADR_GOT_PAGE relocation
records for __stack_chk_guard. These relocation types are not
supported by elf2efi.c.
Fix by explicitly disabling position-independent code for ARM64 EFI
builds.
Debugged-by: Antony Messerli <antony@mes.ser.li>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Reduce the size of the USB disk image in the common case that
CONSOLE_INT13 is not enabled.
Originally-implemented-by: Romain Guyard <romain.guyard@mujin.co.jp>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
It is currently not possible to build the all-drivers iPXE binaries
for ARM, since there is no implementation for inb(), outb(), etc.
There is no common standard for accessing I/O space on ARM platforms,
and there are almost no ARM-compatible peripherals that actually
require I/O space accesses.
Provide dummy implementations that behave as though no device is
present (i.e. ignore writes, return all bits high for reads). This is
sufficient to allow the all-drivers binaries to link, and should cause
drivers to behave as though no I/O space peripherals are present in
the system.
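The behaviour described above can be sketched as follows. The
function names and the use of a plain integer port number are
assumptions made for illustration, not the actual iPXE I/O API.

```c
#include <stdint.h>

/* Hypothetical dummy port reads: behave as though no device responds,
 * so every read returns all bits high.
 */
static inline uint8_t sketch_inb ( unsigned int port ) {
	( void ) port;
	return 0xff;
}

static inline uint16_t sketch_inw ( unsigned int port ) {
	( void ) port;
	return 0xffff;
}

static inline uint32_t sketch_inl ( unsigned int port ) {
	( void ) port;
	return 0xffffffffUL;
}

/* Hypothetical dummy port write: silently discard the data */
static inline void sketch_outb ( uint8_t data, unsigned int port ) {
	( void ) data;
	( void ) port;
}
```

A driver probing for a legacy device would typically read all-ones
from its registers and conclude that the device is absent.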
Signed-off-by: Michael Brown <mcb30@ipxe.org>
PCI configuration space contains the prog-if field at offset 0x09,
the sub-class field at offset 0x0a, and the base-class field at offset
0x0b (i.e. the class code is stored in little-endian byte order). The
PCIR structure uses these fields in the same order.
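The field order can be illustrated with a short sketch that decodes
the dword at configuration space offset 0x08. The structure and
function names are hypothetical and are not taken from the iPXE
headers.

```c
#include <stdint.h>

/* Hypothetical structure mirroring the byte order described above */
struct sketch_pci_class {
	uint8_t prog_if;	/* Offset 0x09 */
	uint8_t sub_class;	/* Offset 0x0a */
	uint8_t base_class;	/* Offset 0x0b */
};

/* Decode the dword read from configuration space offset 0x08.  The
 * lowest byte (offset 0x08) is the revision ID; the three class code
 * bytes occupy the upper 24 bits.
 */
static struct sketch_pci_class sketch_pci_decode_class ( uint32_t dword ) {
	struct sketch_pci_class fields;

	fields.prog_if = ( ( dword >> 8 ) & 0xff );
	fields.sub_class = ( ( dword >> 16 ) & 0xff );
	fields.base_class = ( ( dword >> 24 ) & 0xff );
	return fields;
}
```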
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allow the subsystem IDs to be used when checking for PXE stacks with
broken interrupt support.
Suggested-by: Levi Hsieh <Levi.Hsieh@dell.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Commit 6149e0a ("[librm] Provide symbols for inline code placed into
other sections") may cause build failures due to duplicate label names
if the compiler chooses to duplicate inline assembly code.
Fix by using the "%=" special format string to include a
guaranteed-unique number within the label name.
The "%=" will be expanded only if constraints exist for the inline
assembly. This fix therefore requires that all REAL_CODE() fragments
use a (possibly empty) constraint list.
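A minimal sketch of the technique, assuming a GCC-compatible compiler.
The label and function names are hypothetical and this is not the
REAL_CODE() macro itself. The forced inlining stands in for the
compiler choosing to duplicate the assembly fragment.

```c
/* Hypothetical fragment: the (empty) constraint list makes this an
 * extended asm statement, so "%=" is expanded to a number that is
 * unique to each emitted copy of the statement.
 */
static inline __attribute__ (( always_inline )) void sketch_mark ( void ) {
	__asm__ __volatile__ ( "sketch_label_%=:" : : );
}

/* Both calls are inlined here, so the assembly is emitted twice
 * within one function.  Without "%=" the two copies would define the
 * same label and the assembler would reject the duplicate.
 */
static int sketch_mark_twice ( void ) {
	sketch_mark();
	sketch_mark();
	return 2;
}
```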
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Provide symbols constructed from the object name and line number for
code fragments placed into alternative sections, such as inline
REAL_CODE() assembly placed into .text16. This simplifies the
debugging task of finding the source code corresponding to a given
instruction pointer.
Note that we cannot use __FUNCTION__ since it is not a preprocessor
macro and so cannot be concatenated with string literals.
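The preprocessor-only construction can be sketched as below. All
macro names and the "demo_object" object name are hypothetical; the
point is only that __LINE__ can be stringified and concatenated with
adjacent string literals, which __FUNCTION__ cannot.

```c
/* Two-level stringification, so that the argument is macro-expanded
 * before being converted to a string literal.
 */
#define SKETCH_STR( x ) #x
#define SKETCH_XSTR( x ) SKETCH_STR ( x )

/* Hypothetical object name; a real build would supply this */
#ifndef SKETCH_OBJECT
#define SKETCH_OBJECT demo_object
#endif

/* String literal of the form "<object>.<line>", built entirely by
 * the preprocessor from adjacent string literals.
 */
#define SKETCH_SYMBOL_NAME \
	SKETCH_XSTR ( SKETCH_OBJECT ) "." SKETCH_XSTR ( __LINE__ )

/* Name for a fragment placed at this point in the source file */
static const char sketch_symbol_here[] = SKETCH_SYMBOL_NAME;
```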
Signed-off-by: Michael Brown <mcb30@ipxe.org>
If the underlying PXE stack reports an invalid IRQ number (above
IRQ_MAX), treat this as equivalent to an empty IRQ number and fall
back to using polling mode.
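The fallback can be sketched as follows. The names are hypothetical,
and the sketch assumes that IRQ 0 denotes an empty IRQ number and that
the highest valid IRQ is 15.

```c
/* Assumed limits for this sketch only */
#define SKETCH_IRQ_MAX 15	/* Highest valid IRQ number */
#define SKETCH_IRQ_EMPTY 0	/* "No IRQ reported" */

/* Treat an out-of-range IRQ as though no IRQ had been reported */
static unsigned int sketch_sanitise_irq ( unsigned int reported_irq ) {
	if ( reported_irq > SKETCH_IRQ_MAX )
		return SKETCH_IRQ_EMPTY;
	return reported_irq;
}

/* Polling mode is used whenever there is no usable IRQ */
static int sketch_use_polling ( unsigned int reported_irq ) {
	return ( sketch_sanitise_irq ( reported_irq ) == SKETCH_IRQ_EMPTY );
}
```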
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The existence of MMX and SSE is required by the System V x86_64 ABI
and so is assumed by gcc, but these registers are not preserved by our
own interrupt handlers and are unlikely to be preserved by other
context switch handlers in a boot firmware environment.
Explicitly prevent gcc from using MMX or SSE registers to avoid
potential problems due to silent register corruption.
We must remove the %xmm0-%xmm5 clobbers from the x86_64 version of
hv_call() since otherwise gcc will complain about unknown register
names. Theoretically, we should probably add code to explicitly
preserve the %xmm0-%xmm5 registers across a hypercall, in order to
guarantee to external code that these registers remain unchanged. In
practice this is difficult since SSE registers are disabled by
default: for background information see commits 71560d1 ("[librm]
Preserve FPU, MMX and SSE state across calls to virt_call()") and
dd9a14d ("[librm] Conditionalize the workaround for the Tivoli VMM's
SSE garbling").
Signed-off-by: Michael Brown <mcb30@ipxe.org>
We currently perform various min-entropy calculations using build-time
floating-point arithmetic. No floating-point code ends up in the
final binary, since the results are eventually converted to integers
and asserted to be compile-time constants.
Though this mechanism is undoubtedly cute, it inhibits us from using
"-mno-sse" to prevent the use of SSE registers by the compiler.
Fix by using fixed-point arithmetic instead.
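As a rough illustration of the approach, the sketch below represents
a fractional number of bits as an integer scaled by 2^16 and rounds
up a division using only integer arithmetic. The type, macro, and
function names are hypothetical and the scale factor is an assumption.

```c
#include <stdint.h>

/* Hypothetical fixed-point type: value multiplied by 2^16 */
typedef uint32_t sketch_fixed_t;
#define SKETCH_FIXED_SCALE ( 1U << 16 )

/* Construct a fixed-point value from an integer ratio, e.g. 13/10
 * to represent 1.3 bits per sample, without any floating point.
 */
#define SKETCH_FIXED_FROM_RATIO( num, den ) \
	( ( sketch_fixed_t ) ( ( ( uint64_t ) (num) * SKETCH_FIXED_SCALE ) \
			       / (den) ) )

/* Number of samples needed to collect "bits" bits of entropy when
 * each sample provides "per_sample" (fixed-point) bits, rounded up.
 */
static unsigned int sketch_samples_required ( unsigned int bits,
					      sketch_fixed_t per_sample ) {
	uint64_t scaled_bits = ( ( uint64_t ) bits * SKETCH_FIXED_SCALE );

	return ( ( unsigned int )
		 ( ( scaled_bits + per_sample - 1 ) / per_sample ) );
}
```

With constant arguments, a compiler can fold such expressions at
build time just as it could with the floating-point version.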
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allow the ACPI power management timer to be used if enabled via
TIMER_ACPI in config/timer.h. This provides an alternative timer on
systems where the standard 8254 PIT is unavailable or unreliable.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
When DEBUG=librm_mgmt is enabled, intercept CPU exceptions and provide
a register and stack dump, then drop to an emergency shell. Exiting
from the shell will almost certainly not work, but this provides an
opportunity to view the register and stack dump and carry out some
basic debugging.
Note that we can intercept only the first 8 CPU exceptions, since a
PXE ROM is not permitted to rebase the PIC.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Using "ld --oformat binary" for mbr.bin and usbdisk.bin seems to cause
segmentation faults on some versions of binutils (observed on Fedora
27). Work around this problem by using ld to create an intermediate
ELF object, followed by objcopy (via the existing %.tmp -> %.bin rule)
to create the final binary.
Note that we cannot simply use a single-stage "objcopy -O binary"
since this will not process the relocation records for x86_64: see
commit 1afcccd ("[build] Do not use "objcopy -O binary" for objects
with relocation records").
Reported-by: Brent S <bts@square-r00t.net>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add missing FILE_LICENCE declarations to x86_64 headers based on the
corresponding i386 headers (from which the x86_64 headers were
originally derived).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Ensure that all headers (PCI, UNDI, PnP, iPXE) are aligned to at least
four bytes, so that all accesses to header fields will be correctly
aligned even when reading directly from the expansion ROM BAR.
Reported-by: Peter von Konigsmark <peter@exablaze.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
We must not steal ownership of the Hyper-V connection from the Gen 2
UEFI firmware, since doing so will cause an immediate system crash
(most likely in the form of a reboot).
This problem was masked before commit a0f6e75 ("[hyperv] Do not fail
if guest OS ID MSR is already set"), since prior to that commit we
would always fail if we found any non-zero guest OS identity. We now
accept a non-zero previous guest OS identity in order to allow for
situations such as chainloading from iPXE to another iPXE, and as a
prerequisite for commit b91cc98 ("[hyperv] Cope with Windows Server
2016 enlightenments").
A proper fix would be to reverse engineer the UEFI protocols exposed
within the Hyper-V Gen 2 firmware and use these to bind to the VMBus
device representing the network connection (with the native Hyper-V
driver moved to become a BIOS-only feature).
As an interim solution, fail to initialise the native Hyper-V driver
if we detect the guest OS identity known to be used by the Gen 2 UEFI
firmware. This will cause the standard all-drivers build (ipxe.efi)
to fall back to using the SNP driver.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
EDK2 commit 6440385 ("MdePkg/Include: Add enumeration size checks to
Base.h") enforced the UEFI specification mandate that enums should
always be 32 bits. This revealed a latent bug in iPXE, which was not
being built with -fno-short-enums.
Fix by adding -fno-short-enums to CFLAGS for ARM32 EFI builds.
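The kind of check involved can be sketched as below. This is an
illustration with hypothetical names, not the EDK2 macro. On a
toolchain that defaults to short enums, the assertion would fail
unless -fno-short-enums is in effect.

```c
#include <stdint.h>

/* Hypothetical enum whose values would fit into a single byte */
enum sketch_small_enum {
	SKETCH_VALUE_A,
	SKETCH_VALUE_B,
};

/* Fail the build if the compiler has shrunk the enum below 32 bits */
_Static_assert ( sizeof ( enum sketch_small_enum ) == sizeof ( uint32_t ),
		 "enums must be 32 bits wide" );

/* Report the enum size, for inspection at run time */
static unsigned int sketch_enum_size ( void ) {
	return sizeof ( enum sketch_small_enum );
}
```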
Reported-by: Benjamin S. Allen <bsallen@alcf.anl.gov>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The -mabi option was added in GCC 4.9. Test for the existence of this
option to allow for building with earlier versions of GCC.
Reported-by: Benjamin S. Allen <bsallen@alcf.anl.gov>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
For some CPUID leaves (e.g. %eax=0x00000004), the result depends on
the input value of %ecx. Allow this subfunction number to be
specified as a parameter to the cpuid() wrapper.
The subfunction number is exposed via the ${cpuid/...} settings
mechanism using the syntax
  ${cpuid/<subfunction>.0x40.<register>.<function>}
e.g.
  ${cpuid/0.0x40.0.0x0000000b}
  ${cpuid/1.0x40.0.0x0000000b}
to retrieve the CPU topology information.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some distributions patch gcc to generate position independent
executables by default. We currently include a workaround to check
for this and to add -fno-PIE -nopie to CFLAGS if required.
Newer patched versions of gcc require -fno-PIE -no-pie instead. Check
for both variants.
Reported-by: Nathan Rennie-Waldock <nathan.renniewaldock@gmail.com>
Originally-fixed-by: Markos Chandras <mchandras@suse.de>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
When booting from a hard disk image (e.g. bin/ipxe.usb) within an
emulator such as QEMU, the disk may not exist beyond the end of the
image. Limit all reads to the length of the image to avoid spurious
errors when loading the iPXE image.
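The limit amounts to clamping each read against the image length, as
in the following sketch. The function name and the use of sector
counts are assumptions; the sketch is written in C purely for
illustration and does not reproduce the boot code.

```c
#include <stdint.h>

/* Hypothetical helper: return the number of sectors that may be read
 * starting at "lba" without reading beyond an image that is
 * "image_sectors" sectors long.
 */
static uint32_t sketch_limit_read ( uint64_t lba, uint32_t count,
				    uint64_t image_sectors ) {
	uint64_t remaining;

	/* Nothing to read at or beyond the end of the image */
	if ( lba >= image_sectors )
		return 0;

	/* Truncate the read at the end of the image */
	remaining = ( image_sectors - lba );
	return ( ( count > remaining ) ? ( ( uint32_t ) remaining ) : count );
}
```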
Signed-off-by: Michael Brown <mcb30@ipxe.org>
An "enlightened" external bootloader (such as Windows Server 2016's
winload.exe) may take ownership of the Hyper-V connection before all
INT 13 operations have been completed. When this happens, all VMBus
devices are implicitly closed and we are left with a non-functional
network connection.
Detect when our Hyper-V connection has been lost (by checking the
SynIC message page MSR). Reclaim ownership of the Hyper-V connection
and reestablish any VMBus devices, without disrupting any existing
iPXE state (such as IPv4 settings attached to the network device).
Windows Server 2016 will not cleanly take ownership of an active
Hyper-V connection. Experimentation shows that we can quiesce by
resetting only the SynIC message page MSR; this results in a
successful SAN boot (on a Windows 2012 R2 physical host). Choose to
quiesce by resetting (almost) all MSRs, in the hope that this will be
more robust against corner cases such as a stray synthetic interrupt
occurring during the handover.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some older operating systems (e.g. RHEL6) use a non-default filename
on the root disk and rely on setting an EFI variable to point to the
bootloader. This does not work when performing a SAN boot on a
machine where the EFI variable is not present.
Fix by allowing a non-default filename to be specified via the
"sanboot --filename" option or the "san-filename" setting. For
example:
  sanboot --filename \efi\redhat\grub.efi \
          iscsi:192.168.0.1::::iqn.2010-04.org.ipxe.demo:rhel6
or
  option ipxe.san-filename code 188 = string;
  option ipxe.san-filename "\\efi\\redhat\\grub.efi";
  option root-path "iscsi:192.168.0.1::::iqn.2010-04.org.ipxe.demo:rhel6";
Originally-implemented-by: Vishvananda Ishaya Abrams <vish.ishaya@oracle.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Describe all SAN devices via ACPI tables such as the iBFT. For tables
that can describe only a single device (i.e. the aBFT and sBFT), one
table is installed per device. For multi-device tables (i.e. the
iBFT), all devices are described in a single table.
An underlying SAN device connection may be closed at the time that we
need to construct an ACPI table. We therefore introduce the concept
of an "ACPI descriptor" which enables the SAN boot code to maintain an
opaque pointer to the underlying object, and an "ACPI model" which can
build tables from a list of such descriptors. This separates the
lifecycles of ACPI descriptions from the lifecycles of the block
device interfaces, and allows for construction of the ACPI tables even
if the block device interface has been closed.
For a multipath SAN device, iPXE will wait until sufficient
information is available to describe all devices but will not wait for
all paths to connect successfully. For example: with a multipath
iSCSI boot, iPXE will wait until at least one path has become available
and name resolution has completed on all other paths. We do this
since the iBFT has to include IP addresses rather than DNS names. We
will commence booting without waiting for the inactive paths to either
become available or close; this avoids unnecessary boot delays.
Note that the Linux kernel will refuse to accept an iBFT with more
than two NIC or target structures. We therefore describe only the
NICs that are actually required in order to reach the described
targets. Any iBFT with at most two targets is therefore guaranteed to
describe at most two NICs.
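The reasoning can be illustrated with a small sketch that collects
the distinct NICs needed by a list of targets. The function name and
the representation of a NIC as an integer index are hypothetical and
are not taken from the iBFT code.

```c
#include <stddef.h>

/* Hypothetical helper: "target_nic[i]" is the index of the NIC used
 * to reach target i.  Record each distinct NIC once in "nics" (which
 * must have room for "num_targets" entries) and return the number of
 * NICs recorded.
 */
static size_t sketch_required_nics ( const unsigned int *target_nic,
				     size_t num_targets,
				     unsigned int *nics ) {
	size_t num_nics = 0;
	size_t i;
	size_t j;

	for ( i = 0 ; i < num_targets ; i++ ) {
		/* Skip NICs that have already been recorded */
		for ( j = 0 ; j < num_nics ; j++ ) {
			if ( nics[j] == target_nic[i] )
				break;
		}
		if ( j == num_nics )
			nics[num_nics++] = target_nic[i];
	}
	return num_nics;
}
```

Since each target contributes at most one NIC, the result can never
exceed the number of targets, so two targets yield at most two NICs.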
Signed-off-by: Michael Brown <mcb30@ipxe.org>