The gcc 8 compiler introduces a warning for certain string
manipulation functions, flagging usages which _may_ not be intended.
An audit of the iPXE sources indicates all usages of strncat and
strncpy are as intended, so the warnings currently issued are not
helpful, especially if warnings are considered errors.
Fix by detecting gcc's support for -Wno-stringop-truncation and, if
detected, using that option to avoid the warning.
Signed-off-by: Bruce Rogers <brogers@suse.com>
Modified-by: Michael Brown <mcb30@ipxe.org>
Also-fixed-by: Christian Hesse <list@eworm.de>
Also-fixed-by: Roman Kagan <rkagan@virtuozzo.com>
Also-fixed-by: Bernhard M. Wiedemann <bwiedemann@suse.de>
Also-fixed-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
As pointedly documented in RFC7230 section 2.3, HTTP is a stateless
protocol: each request message can be understood in isolation from any
other requests or responses. Various authentication schemes such as
NTLM break this fundamental property of HTTP and rely on the same TCP
connection being reused.
Work around these broken authentication schemes by ensuring that the
most recently pooled connection is reused for the subsequent
authentication retry.
Reported-by: Andreas Hammarskjöld <junior@2PintSoftware.com>
Tested-by: Andreas Hammarskjöld <junior@2PintSoftware.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add the function mii_find() in order to locate the PHY address.
Signed-off-by: Sylvie Barlow <sylvie.c.barlow@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
We currently have no generic concept of a PHY address, since all
existing implementations simply hardcode the PHY address within the
MII access methods.
A bit-bashing MII interface will need to be provided with an explicit
PHY address in order to generate the correct waveform. Allow for this
by separating out the concept of a MII device (i.e. a specific PHY
address attached to a particular MII interface).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allow the subsystem IDs to be used when checking for PXE stacks with
broken interrupt support.
Suggested-by: Levi Hsieh <Levi.Hsieh@dell.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The relocation type R_ARM_REL32 is generated when building
bin-arm32-efi/snp.efi using gcc 6.3 and ld 2.28.
R_ARM_REL32 is a program counter (PC) relative 32 bit relocation so we
can ignore it like all other PC relative relocations.
Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
When booting some versions of the UEFI shell, our driver binding
protocol's Supported() entry point is called at TPL_NOTIFY for no
discernible reason. Attempting to raise to TPL_CALLBACK triggers an
immediate assertion failure in the firmware.
Since our Supported() method can run at any TPL, fix by simply not
attempting to raise the TPL within this method.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Release SNP devices to allow the SAN booted image to use our
EFI_SIMPLE_NETWORK_PROTOCOL instance, and to ensure that the image is
started at TPL_APPLICATION.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The cipherstream xfer_window_changed() message is used to retrigger
the TLS transmit state machine. If the transmit state machine is
idle, then the window change message will not be propagated to the
plainstream interface. This can potentially cause the plainstream
interface peer (e.g. httpcore) to block waiting for a window change
message that will never arrive.
Fix by ensuring that the window change message is propagated to the
plainstream interface if the transmit state machine is idle. (If the
transmit state machine is not idle then the plainstream window will be
zero anyway.)
Signed-off-by: Michael Brown <mcb30@ipxe.org>
In TLS terminology a session conceptually spans multiple individual
connections, and essentially represents the stored cryptographic state
(master secret and cipher suite) required to establish communication
without going through the certificate and key exchange handshakes.
Rename tls_session to tls_connection in order to make the name
tls_session available to represent the session state.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
A failure in tls_generate_random() will result in a call to ref_put()
before the received data list has been initialised, which will cause
free_tls() to attempt to traverse an uninitialised list.
Fix by ensuring that all fields referenced by free_tls() are
initialised before any of the potential failure paths.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Commit 6149e0a ("[librm] Provide symbols for inline code placed into
other sections") may cause build failures due to duplicate label names
if the compiler chooses to duplicate inline assembly code.
Fix by using the "%=" special format string to include a
guaranteed-unique number within the label name.
The "%=" will be expanded only if constraints exist for the inline
assembly. This fix therefore requires that all REAL_CODE() fragments
use a (possibly empty) constraint list.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Provide symbols constructed from the object name and line number for
code fragments placed into alternative sections, such as inline
REAL_CODE() assembly placed into .text16. This simplifies the
debugging task of finding the source code corresponding to a given
instruction pointer.
Note that we cannot use __FUNCTION__ since it is not a preprocessor
macro and so cannot be concatenated with string literals.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
If the underlying PXE stack reports an invalid IRQ number (above
IRQ_MAX), treat this as equivalent to an empty IRQ number and fall
back to using polling mode.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The existence of MMX and SSE is required by the System V x86_64 ABI
and so is assumed by gcc, but these registers are not preserved by our
own interrupt handlers and are unlikely to be preserved by other
context switch handlers in a boot firmware environment.
Explicitly prevent gcc from using MMX or SSE registers to avoid
potential problems due to silent register corruption.
We must remove the %xmm0-%xmm5 clobbers from the x86_64 version of
hv_call() since otherwise gcc will complain about unknown register
names. Theoretically, we should probably add code to explicitly
preserve the %xmm0-%xmm5 registers across a hypercall, in order to
guarantee to external code that these registers remain unchanged. In
practice this is difficult since SSE registers are disabled by
default: for background information see commits 71560d1 ("[librm]
Preserve FPU, MMX and SSE state across calls to virt_call()") and
dd9a14d ("[librm] Conditionalize the workaround for the Tivoli VMM's
SSE garbling").
Signed-off-by: Michael Brown <mcb30@ipxe.org>
We currently perform various min-entropy calculations using build-time
floating-point arithmetic. No floating-point code ends up in the
final binary, since the results are eventually converted to integers
and asserted to be compile-time constants.
Though this mechanism is undoubtedly cute, it inhibits us from using
"-mno-sse" to prevent the use of SSE registers by the compiler.
Fix by using fixed-point arithmetic instead.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
This is required to work around a bug in some firmware versions.
Signed-off-by: Ameer Mahagneh <ameerm@mellanox.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allow the ACPI power management timer to be used if enabled via
TIMER_ACPI in config/timer.h. This provides an alternative timer on
systems where the standard 8254 PIT is unavailable or unreliable.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some drivers are known to call the optional Map_Mem() callback without
first checking that the callback exists. Provide a usable basic
implementation of Map_Mem() along with the other callbacks that become
mandatory if Map_Mem() is provided.
Note that in theory the PCI I/O protocol is allowed to require
multiple calls to Map(), with each call handling only a subset of the
overall mapped range. However, the reference implementation in EDK2
assumes that a single Map() will always suffice, so we can probably
make the same simplifying assumption here.
Tested with the Intel E3522X2.EFI driver (which, incidentally, fails
to cleanly remove one of its mappings).
Originally-implemented-by: Maor Dickman <maord@mellanox.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The blocked link test in eth_slow_lacp_rx() is performed before the
actor TLV is copied to the partner TLV, and so must test the actor
state field rather than the partner state field.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some CAs provide non-functional OCSP servers, and some clients are
forced to operate on networks without access to the OCSP servers.
Allow the user to explicitly disable the use of OCSP checks by
undefining OCSP_CHECK in config/crypto.h.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Limit the profile sample count to INT_MAX to avoid both signed
overflow and a potential division by zero when updating the stored
mean value.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Mark the link as blocked if the LACP partner is not reporting itself
as being in sync, collecting, and distributing.
This matches the behaviour for STP: we mark the link as blocked if we
detect that the switch is actively blocking traffic, in order to
extend the DHCP discovery period and so prevent boot failures on
switches that take an excessively long time to enable ports.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Remove the global variable shomron_nodnic_supported, since it may have
different values for different PCI devices.
Originally-fixed-by: Mohammed Taha <mohammedt@mellanox.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
When DEBUG=librm_mgmt is enabled, intercept CPU exceptions and provide
a register and stack dump, then drop to an emergency shell. Exiting
from the shell will almost certainly not work, but this provides an
opportunity to view the register and stack dump and carry out some
basic debugging.
Note that we can intercept only the first 8 CPU exceptions, since a
PXE ROM is not permitted to rebase the PIC.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Commit c89a446 ("[efi] Run at TPL_CALLBACK to protect against UEFI
timers") introduced a regression in the EFI entropy gathering code.
When the EFI_RNG_PROTOCOL is not present, we fall back to using timer
interrupts (as for the BIOS build). Since timer interrupts are
disabled at TPL_CALLBACK, WaitForEvent() fails and no entropy can be
gathered.
Fix by dropping to TPL_APPLICATION while entropy gathering is enabled.
Reported-by: Andreas Hammarskjöld <junior@2PintSoftware.com>
Tested-by: Andreas Hammarskjöld <junior@2PintSoftware.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The iSCSI root path may contain a literal IPv6 address. Update the
parser to handle this address format correctly.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
As noted in the comments, UEFI manages to combines the all of the
worst aspects of both a polling design (inefficiency and inability to
sleep until something interesting happens) and of an interrupt-driven
design (the complexity of code that could be preempted at any time,
thanks to UEFI timers).
This causes problems in particular for UEFI USB keyboards: the
keyboard driver calls UsbAsyncInterruptTransfer() to set up a periodic
timer which is used to poll the USB bus. This poll may interrupt a
critical section within iPXE, typically resulting in list corruption
and either a hang or reboot.
Work around this problem by mirroring the BIOS design, in which we run
with interrupts disabled almost all of the time.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Reporting a completion via usb_complete() will pass control outside
the scope of xhci.c, and could potentially result in a further call to
xhci_event_poll() before returning from usb_complete(). Since we
currently update the event consumer counter only after calling
usb_complete(), this can result in duplicate completions and
consequent corruption of the submission TRB ring structures.
Fix by updating the event ring consumer counter before passing control
to usb_complete().
Reported-by: Andreas Hammarskjöld <junior@2PintSoftware.com>
Tested-by: Andreas Hammarskjöld <junior@2PintSoftware.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The i219 appears to have a seriously broken reset mechanism. After
any transmit or receive activity, resetting the card will break both
the transmit and receive datapaths until the next PCI bus reset.
The Linux and BSD drivers include a convoluted workaround authored by
Intel which involves setting a bit in the undocumented FEXTNVM11
register, then transmitting a dummy 512-byte packet containing garbage
data, then reconfiguring the receive descriptor prefetch thresholds
and temporarily reenabling the receive datapath. The comments in the
Intel fix do not even remotely match what the code actually does, and
the code accidentally leaves the transmitter enabled after use.
Experimentation suggests that an equivalent fix is to simply set the
undocumented bit in FEXTNVM11 before enabling the transmit or receive
descriptor rings.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Invalid protocol speed ID tables appear to be increasingly common in
the wild, to the point that it is infeasible to apply an explicit
XHCI_BAD_PSIV flag for each offending PCI device ID.
Fix by assuming an invalid PSI table as soon as any invalid value is
reported by the hardware.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some older versions of gcc (observed with gcc 4.7.2) report a spurious
uninitialised variable warning in ena_get_device_attributes(). Work
around this warning by manually inlining the relevant code (which has
only a single call site).
Reported-by: xbgmsharp <xbgmsharp@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The UNDI layer uses the NETDEV_IRQ_ENABLED flag to choose whether to
return PXENV_UNDI_ISR_OUT_OURS or PXENV_UNDI_ISR_OUT_NOT_OURS for a
given interrupt. For a network device that does not support
interrupts, the flag will never be set and so pxenv_undi_isr() will
always return PXENV_UNDI_ISR_OUT_NOT_OURS. This causes some NBPs
(such as lpxelinux.0) to hang.
Redefine NETDEV_IRQ_ENABLED as a simple administrative flag which can
be set even on network devices that do not support interrupts. This
allows pxenv_undi_isr() (which is the sole user of NETDEV_IRQ_ENABLED)
to function as expected by lpxelinux.0.
Signed-off-by: Martin Habets <mhabets@solarflare.com>
Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Most drivers do not utilise an MII interface, since the link state is
typically available directly from a memory-mapped register.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Using "ld --oformat binary" for mbr.bin and usbdisk.bin seems to cause
segmentation faults on some versions of binutils (observed on Fedora
27). Work around this problem by using ld to create an intermediate
ELF object, followed by objcopy (via the existing %.tmp -> %.bin rule)
to create the final binary.
Note that we cannot simply use a single-stage "objcopy -O binary"
since this will not process the relocation records for x86_64: see
commit 1afcccd ("[build] Do not use "objcopy -O binary" for objects
with relocation records").
Reported-by: Brent S <bts@square-r00t.net>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add missing FILE_LICENCE declarations to x86_64 headers based on the
corresponding i386 headers (from which the x86_64 headers were
originally derived).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The URIs printed as part of download progress messages are intended to
provide a quick visual progress indication to the user. Very long
query strings can render this visual indication useless in practice,
since the most important information (generally the URI host and path)
is drowned out by multiple lines of human-illegible URI-encoded data.
Omit the query string entirely from the download progress message.
For consistency and brevity, also omit the URI fragment along with the
username and password (which was previously redacted anyway).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The precise HTTP response status code is currently visible only at
DBGLVL_EXTRA. Allow for easier debugging by reporting the whole
status line at DBGLVL_LOG for any unsuccessful responses.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Xen 4.4 includes the device "device/suspend/event-channel" which does
not have a "backend" key. This currently causes the entire XenBus
device tree probe to fail.
Fix by skipping probe attempts for device types for which there is no
iPXE driver.
Debugged-by: Eytan Heidingsfeld <eytanh@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allow individual authentication schemes to parse WWW-Authenticate
headers that do not comply with RFC2617.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Servers may provide multiple WWW-Authenticate headers, each offering a
different authentication scheme. We currently fail the request as
soon as we encounter an unrecognised scheme, which prevents subsequent
offers from succeeding.
Fix by silently ignoring headers for schemes that we do not recognise.
If no schemes are recognised then the request will eventually fail
anyway due to the 401 response code.
If multiple schemes are supported, arbitrarily choose the scheme
appearing first within the response headers.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Relocation type R_ARM_V4BX requires no computation. It marks the
location of an ARMv4 branch exchange instruction.
Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
In fully self-contained deployments it may be desirable to build iPXE
with an empty CROSSCERT source to avoid talking to external services.
Add an explicit check for this case and make validator_start_download
fail immediately if the base URI is empty.
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some HP BIOSes (observed with a Z840) seem to attempt to connect our
drivers in the middle of our call to DisconnectController(). The
precise chain of events is unclear, but the symptom is that we see
several calls to our Supported() and Start() methods, followed by a
system lock-up.
Work around this dubious BIOS behaviour by explicitly failing calls to
our Start() method while we are in the middle of attempting to
disconnect drivers.
Reported-by: Jordan Wright <jordan.m.wright@disney.com>
Debugged-by: Adrian Lucrèce Céleste <adrianlucrececeleste@airmail.cc>
Debugged-by: Christian Nilsson <nikize@gmail.com>
Tested-by: Jordan Wright <jordan.m.wright@disney.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
When submitting binaries for UEFI Secure Boot signing, certain
known-dubious subsystems (such as 802.11 and NFS) must be excluded
from the build. Mark the directories containing these subsystems as
insecure, and allow the build target to include an explicit "security
flag" (a literal "-sb" appended to the build platform) to exclude
these source directories from the build process.
For example:
make bin-x86_64-efi-sb/ipxe.efi
will build iPXE with all code from the 802.11 and NFS subsystems
excluded from the build.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some UEFI BIOSes will deliberately break the implementation of
ConnectController() to return errors for devices that have been
"disabled" via the BIOS setup screen. (As an added bonus, such BIOSes
may return garbage EFI_STATUS values such as 0xff.)
Work around these broken UEFI BIOSes by ignoring failures and
continuing to attempt to connect any remaining handles.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The UEFI specification does not state whether or not a return value of
EFI_BUFFER_TOO_SMALL from the SNP Receive() method should follow the
usual EFI API behaviour of allowing the caller to retry the request
with an increased buffer size.
Examination of the SnpDxe driver in EDK2 suggests that Receive() will
just return the truncated packet (complete with any requested
link-layer header fields), so match this behaviour.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
We do not currently check the length of the caller's buffer for
received packets. This creates a potential buffer overrun when iPXE
is being used via the SNP or UNDI protocols.
Fix by checking the buffer length and correctly returning the required
length and an EFI_BUFFER_TOO_SMALL error.
Reported-by: Paul McMillan <paul.mcmillan@oracle.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Expose the underlying hardware address as a setting. For IPoIB
devices, this provides scripts with access to the Infiniband GUID.
Requested-by: Allen, Benjamin S. <bsallen@alcf.anl.gov>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Record and report the number of peers (calculated as the maximum
number of peers discovered for a block's segment at the time that the
block download is complete), and the percentage of blocks retrieved
from peers rather than from the origin server.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Checking for job progress is essentially a user interface activity,
and can safely be performed only once per timer tick (as is already
done with checking for keypresses).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some external code (such as the UEFI UNDI driver for the Realtek USB
NIC on a Microsoft Surface Book) will block during transmission
attempts and can take several seconds to report a transmit error. If
there is a large queue of pending transmissions, then the accumulated
time from a series of such failures can easily exceed the EFI watchdog
timeout, resulting in what appears to be a system lockup followed by a
reboot.
Work around this problem by immediately cancelling any pending
transmissions as soon as any transmit error occurs.
The only expected transmit error under normal operation is ENOBUFS
arising when the hardware transmit queue is full. By definition, this
can happen only for drivers that do not utilise deferred
transmissions, and so this new behaviour will not affect these
drivers.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The SnpDxe driver raises the task priority level to TPL_CALLBACK when
calling the UNDI entry point. This does not appear to be a documented
requirement, but we should probably match the behaviour of SnpDxe to
minimise surprises to third party code.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The tap driver can retrieve a potentially unlimited number of packets
in a single poll. This can lead to heap exhaustion under heavy load.
Fix by imposing an artificial receive quota (as already used in other
drivers without natural receive limits).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Calling discard_cache() is likely to result in a call to
free_memblock(), which will call valgrind_make_blocks_noaccess()
before returning. This causes valgrind to report an invalid read on
the next iteration through the loop in alloc_memblock().
Fix by explicitly calling valgrind_make_blocks_defined() after
discard_cache() returns. Also call valgrind_make_blocks_noaccess()
before calling discard_cache(), to guard against free list corruption
while executing cache discarders.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Ensure that all headers (PCI, UNDI, PnP, iPXE) are aligned to at least
four bytes, so that all accesses to header fields will be correctly
aligned even when reading directly from the expansion ROM BAR.
Reported-by: Peter von Konigsmark <peter@exablaze.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Setting BANNER_TIMEOUT to zero removes the only symbol reference to
shell.o, causing the "shell" command to become unavailable.
Add SHELL_CMD in config/general.h (enabled by default) which will
explicitly drag in shell.o regardless of the value of BANNER_TIMEOUT.
Reported-by: Julian Brost <julian@0x4a42.net>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
We must not steal ownership from the Gen 2 UEFI firmware, since doing
so will cause an immediate system crash (most likely in the form of a
reboot).
This problem was masked before commit a0f6e75 ("[hyperv] Do not fail
if guest OS ID MSR is already set"), since prior to that commit we
would always fail if we found any non-zero guest OS identity. We now
accept a non-zero previous guest OS identity in order to allow for
situations such as chainloading from iPXE to another iPXE, and as a
prerequisite for commit b91cc98 ("[hyperv] Cope with Windows Server
2016 enlightenments").
A proper fix would be to reverse engineer the UEFI protocols exposed
within the Hyper-V Gen 2 firmware and use these to bind to the VMBus
device representing the network connection, (with the native Hyper-V
driver moved to become a BIOS-only feature).
As an interim solution, fail to initialise the native Hyper-V driver
if we detect the guest OS identity known to be used by the Gen 2 UEFI
firmware. This will cause the standard all-drivers build (ipxe.efi)
to fall back to using the SNP driver.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
EDK2 commit 6440385 ("MdePkg/Include: Add enumeration size checks to
Base.h") enforced the UEFI specification mandate that enums should
always be 32 bits. This revealed a latent bug in iPXE, which does not
build with -fno-short-enums.
Fix by adding -fno-short-enums to CFLAGS for ARM32 EFI builds.
Reported-by: Benjamin S. Allen <bsallen@alcf.anl.gov>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The inline assembly used in include/errno.h to generate the einfo
blocks requires the ability to generate an immediate constant with no
immediate-value prefix (such as the dollar sign for x86 assembly).
We currently achieve this via the undocumented "%c0" form of operand.
This causes an "invalid operand prefix" error on GCC 4.8 for ARM64
builds.
Fix by switching to the equally undocumented "%a0" form of operand,
which appears to work correctly on all tested versions of GCC.
Reported-by: Benjamin S. Allen <bsallen@alcf.anl.gov>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The -mabi option was added in GCC 4.9. Test for the existence of this
option to allow for building with earlier versions of GCC.
Reported-by: Benjamin S. Allen <bsallen@alcf.anl.gov>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The UEFI specification has an implicit and demonstrably incorrect
requirement (in the Mem_IO() calling convention) that any UNDI network
device has at most one memory BAR and one I/O BAR.
Some UEFI platforms have been observed to report the existence of
non-existent additional I/O BARs, causing iPXE to select the wrong
BAR. This problem does not affect the SnpDxe driver, since that
driver will always choose the lowest numbered existent BAR of each
type.
Adjust iPXE's behaviour to match that of SnpDxe, i.e. to always select
the lowest numbered BAR(s).
Debugged-by: Andreas Hammarskjöld <junior@2PintSoftware.com>
Debugged-by: Adklei <adklei@realtek.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The LAN78xx datapath is essentially identical to that of the SMSC75xx.
Expose the transmit, poll, and bulk IN endpoint operations to allow
for reuse by the LAN78xx driver.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The LAN78xx PHY interrupt source and mask registers do not match those
used by the SMSC75xx and SMSC95xx.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Since we don't enable IOMMU at all, we can then simply enable the
IOMMU support by claiming the support of VIRITO_F_IOMMU_PLATFORM.
This fixes booting failure when iommu_platform is set from qemu cli.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The smsc75xx and smsc95xx drivers include a substantial amount of
identical functionality, varying only in the base address of register
sets. Abstract out this common functionality to allow code to be
shared between the drivers.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Support renegotiation with servers supporting RFC5746. This allows
for the use of per-directory client certificates.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
We currently use a zero language ID to retrieve strings such as the
ECM/NCM MAC address. This works on most hardware devices, but is
known to fail on some software emulated CDC-NCM devices.
Fix by using the first supported language ID, falling back to English
(0x0409) if any error occurs when fetching the list of supported
languages. This matches the behaviour of the Linux kernel.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Our ASN.1 parsing code uses a struct asn1_cursor, while the object
construction code uses a struct asn1_builder. These structures are
identical apart from the const modifier applied to the data pointer in
struct asn1_cursor.
Provide asn1_built() to safely typecast a struct asn1_builder to a
struct asn1_cursor, allowing constructed objects to be passed to
functions expecting a struct asn1_cursor.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
For some CPUID leaves (e.g. %eax=0x00000004), the result depends on
the input value of %ecx. Allow this subfunction number to be
specified as a parameter to the cpuid() wrapper.
The subfunction number is exposed via the ${cpuid/...} settings
mechanism using the syntax
${cpuid/<subfunction>.0x40.<register>.<function>}
e.g.
${cpuid/0.0x40.0.0x0000000b}
${cpuid/1.0x40.0.0x0000000b}
to retrieve the CPU topology information.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some distributions patch gcc to generate position independent
executables by default. We currently include a workaround to check
for this and to add -fno-PIE -nopie to CFLAGS if required.
Newer patched versions of gcc require -fno-PIE -no-pie instead. Check
for both variants.
Reported-by: Nathan Rennie-Waldock <nathan.renniewaldock@gmail.com>
Originally-fixed-by: Markos Chandras <mchandras@suse.de>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
When booting from a hard disk image (e.g. bin/ipxe.usb) within an
emulator such as QEMU, the disk may not exist beyond the end of the
image. Limit all reads to the length of the image to avoid spurious
errors when loading the iPXE image.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allow values to be read from ACPI tables using the syntax
${acpi/<signature>.<index>.0.<offset>.<length>}
where <signature> is the ACPI table signature as a 32-bit hexadecimal
number (e.g. 0x41504093 for the 'APIC' signature on the MADT), <index>
is the index into the array of tables matching this signature,
<offset> is the byte offset within the table, and <length> is the
field length in bytes.
Numeric values are returned in reverse byte order, since ACPI numeric
values are usually little-endian.
For example:
${acpi/0x41504943.0.0.0.0} - entire MADT table in raw hex
${acpi/0x41504943.0.0.0x0a.6:string} - MADT table OEM ID
${acpi/0x41504943.0.0.0x24.4:uint32} - local APIC address
Signed-off-by: Michael Brown <mcb30@ipxe.org>
When performing a SAN boot, the plainstream window size will be zero
(since this is the mechanism used internally to indicate that no data
should be fetched via the initial request). This zero value currently
propagates to the advertised TCP window size, which prevents the TLS
negotiation from completing.
Fix by ensuring that the cipherstream window is held open until TLS
negotiation is complete, and only then falling back to passing through
the plainstream window size.
Reported-by: John Wigley <johnwigley#ipxe@acorna.co.uk>
Tested-by: John Wigley <johnwigley#ipxe@acorna.co.uk>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Ensure that efi_systab is an undefined symbol in non-EFI builds. In
particular, this prevents users from incorrectly enabling IMAGE_EFI in
a BIOS build of iPXE.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The Xen network backend (xen-netback) suffered from a regression
between upstream Linux kernels 3.18 and 4.2 inclusive, which would
cause packet reception to fail unless at least 18 receive buffers were
available. This bug was fixed in kernel commit 1d5d485 ("xen-netback:
require fewer guest Rx slots when not using GSO").
Work around this bug in affected versions of xen-netback by providing
the requisite 18 receive buffers.
Reported-by: Taylor Schneider <tschneider@live.com>
Tested-by: Taylor Schneider <tschneider@live.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Commit 7cfdd76 ("[block] Describe all SAN devices via ACPI tables")
changed the definition of the iSCSI initiator IQN in the iBFT to
represent a common initiator IQN used for all iSCSI sessions, and
attempted to calculate this common initiator IQN by fetching the
common ${initiator-iqn} setting.
This fails when no explicit ${initiator-iqn} has been specified
(i.e. when an initiator IQN has instead been constructed from either
the hostname or system UUID), and results in an empty initiator IQN in
the iBFT.
Fix by using the initiator IQN of an arbitrary iSCSI session
present in the iBFT.
Debugged-by: Tal Aloni <tal.aloni.il@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
As of kernel 4.11, the LIO target will propose a value for
FirstBurstLength if the initiator did not do so. This is entirely
redundant in our case, since FirstBurstLength is defined by RFC 3720
to be
"Irrelevant when: ( InitialR2T=Yes and ImmediateData=No )"
and we already enforce both InitialR2T=Yes and ImmediateData=No in our
initial proposal. However, LIO (arguably correctly) complains when we
do not respond to its redundant proposal of an already-irrelevant
value.
Fix by always proposing the default value for FirstBurstLength.
Debugged-by: Patrick Seeburger <info@8bit.de>
Tested-by: Patrick Seeburger <info@8bit.de>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Use the PCI bus:dev.fn address in debug messages, falling back to the
EFI handle name only if we do not yet have enough information to
determine the bus:dev.fn address.
Include the vendor and device IDs in debug messages when no suitable
driver is found, to match the diagnostics available in a BIOS
environment.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
An "enlightened" external bootloader (such as Windows Server 2016's
winload.exe) may take ownership of the Hyper-V connection before all
INT 13 operations have been completed. When this happens, all VMBus
devices are implicitly closed and we are left with a non-functional
network connection.
Detect when our Hyper-V connection has been lost (by checking the
SynIC message page MSR). Reclaim ownership of the Hyper-V connection
and reestablish any VMBus devices, without disrupting any existing
iPXE state (such as IPv4 settings attached to the network device).
Windows Server 2016 will not cleanly take ownership of an active
Hyper-V connection. Experimentation shows that we can quiesce by
resetting only the SynIC message page MSR; this results in a
successful SAN boot (on a Windows 2012 R2 physical host). Choose to
quiesce by resetting (almost) all MSRs, in the hope that this will be
more robust against corner cases such as a stray synthetic interrupt
occurring during the handover.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
When performing a SAN boot via INT 13, there is no way for the
operating system to indicate that it has finished using the INT 13 SAN
device. We therefore have no opportunity to clean up state before the
loaded operating system's native drivers take over. This can cause
problems when booting Windows, which tends not to be forgiving of
unexpected system state.
Windows will typically write a flag to the SAN device as the last
action before transferring control to the native drivers. We can use
this as a heuristic to bring the system to a quiescent state (without
performing a full shutdown); this provides us an opportunity to
temporarily clean up state that could otherwise prevent a successful
Windows boot.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
On most Intel NICs, Auto-Speed Detection Enable (ASDE) can be used to
automatically detect the correct link speed by sampling the link using
the internal PHY. This feature is automatically inhibited when not
appropriate for the physical link (e.g. when using internal SerDes
mode on the 8254x).
On the i350 datasheet ASDE is a reserved bit, but the relevant
auto-speed detection hardware appears still to be present. However,
enabling ASDE on the i350 1000BASE-KX backplane NIC seems to cause an
immediate link failure. It is possible that the auto-speed detection
hardware is still present, is not connected to a physical link, and is
not inhibited from being applied in this mode.
Work around this problem by adding an INTEL_NO_ASDE flag bit
(analogous to INTEL_NO_PHY_RST), and applying this for the i350
backplane NIC.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
In situations where iPXE fails to reach link-up as expected, it is
useful to know the original values of the CTRL and STATUS registers
prior to our reset attempt.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some older operating systems (e.g. RHEL6) use a non-default filename
on the root disk and rely on setting an EFI variable to point to the
bootloader. This does not work when performing a SAN boot on a
machine where the EFI variable is not present.
Fix by allowing a non-default filename to be specified via the
"sanboot --filename" option or the "san-filename" setting. For
example:
sanboot --filename \efi\redhat\grub.efi \
iscsi:192.168.0.1::::iqn.2010-04.org.ipxe.demo:rhel6
or
option ipxe.san-filename code 188 = string;
option ipxe.san-filename "\\efi\\redhat\\grub.efi";
option root-path "iscsi:192.168.0.1::::iqn.2010-04.org.ipxe.demo:rhel6";
Originally-implemented-by: Vishvananda Ishaya Abrams <vish.ishaya@oracle.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Following changes were introduced:
- added GetBgxProp and GetLmacProp methods to ThunderxConfigProtocol
- replaced direct BOARD_CFG access with usage of introduced methods
- removed redundant BOARD_CFG
- changed GUID of ThunderxConfigProtocol, as this is not compatible
with previous version
- changed UINTN* to UINT64* buffer type to fix issue on 32-bit
platforms with MAC address
This change allows us to avoid alignment of BOARD_CFG definitions
every time it changes in UEFI.
Signed-off-by: Konrad Adamczyk <konrad.adamczyk@cavium.com>
Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The TEST UNIT READY command is issued automatically when the device is
opened, and is not the result of a command being issued by the caller.
This is required in order that a permanent TEST UNIT READY failure can
be used to identify unusable paths in a multipath SAN device.
Since the TEST UNIT READY command is not part of the caller's command
issuing process, it is not covered by any external retry loops (such
as the main retry loop in sandev_command()).
We must therefore be prepared to retry the TEST UNIT READY command
within the SCSI layer itself. We retry only the TEST UNIT READY
command so as not to multiply the number of potential retries for
normal commands (which are already retried by sandev_command()).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
HTTP implements xfer_window_changed() on the underlying server
connection using http_step(), which does not propagate the window
change notification to the data transfer interface. This breaks the
multipath-capable SAN boot code, which relies on the window change
notification to discover that the HTTP block device is ready for
commands to be issued.
Fix by sending xfer_window_changed() in http_step() once the
underlying connection has been determined to be ready.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Describe all SAN devices via ACPI tables such as the iBFT. For tables
that can describe only a single device (i.e. the aBFT and sBFT), one
table is installed per device. For multi-device tables (i.e. the
iBFT), all devices are described in a single table.
An underlying SAN device connection may be closed at the time that we
need to construct an ACPI table. We therefore introduce the concept
of an "ACPI descriptor" which enables the SAN boot code to maintain an
opaque pointer to the underlying object, and an "ACPI model" which can
build tables from a list of such descriptors. This separates the
lifecycles of ACPI descriptions from the lifecycles of the block
device interfaces, and allows for construction of the ACPI tables even
if the block device interface has been closed.
For a multipath SAN device, iPXE will wait until sufficient
information is available to describe all devices but will not wait for
all paths to connect successfully. For example: with a multipath
iSCSI boot iPXE will wait until at least one path has become available
and name resolution has completed on all other paths. We do this
since the iBFT has to include IP addresses rather than DNS names. We
will commence booting without waiting for the inactive paths to either
become available or close; this avoids unnecessary boot delays.
Note that the Linux kernel will refuse to accept an iBFT with more
than two NIC or target structures. We therefore describe only the
NICs that are actually required in order to reach the described
targets. Any iBFT with at most two targets is therefore guaranteed to
describe at most two NICs.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
For some block device protocols, the active path may continue to
receive xfer_window_changed() notifications during normal use. These
currently result in the active path being erroneously closed.
Fix by ignoring any xfer_window_changed() messages if this path is
already the active path.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
For multipath SAN devices, verify that the device is capable of being
opened (i.e. that all URIs are parseable and that at least one path is
alive) and thereafter retry indefinitely to reopen the device as
needed.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
When all SAN targets are completely unreachable, there will be a
natural delay between reopening attempts due to the network connection
timeout on the unreachable targets.
However, some SAN targets may accept connections instantly and report
a temporary unavailability by e.g. failing the TEST UNIT READY
command. If all targets are behaving this way then there will be no
natural delay, and we will attempt to saturate the network with
connection attempts.
Fix by introducing a small delay between attempts.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allow the SAN retry count to be configured via the ${san-retry}
setting, defaulting to the current value of 10 retries if not
specified.
Note that setting a retry count of zero is inadvisable, since iSCSI
targets in particular will often report spurious errors such as "power
on occurred" for the first few commands.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The INT13 console type (CONSOLE_INT13) autodetects at initialisation
time a magic partition to be used for logging iPXE console output. If
the INT13 drive number mapping is subsequently changed (e.g. because
iPXE was used to perform a SAN boot), then the console logging output
will be written to the incorrect disk.
Fix by recording the INT13 vector at initialisation time, and using
this original vector to emulate INT13 calls for all subsequent
accesses. This should be robust against drive remapping performed
either by ourselves or by another bootloader (e.g. a chainloaded
undionly.kpxe which then performs a SAN boot).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some partition tables have partitions that are not aligned to a
cylinder boundary, which confuses the current geometry guessing logic.
Enhance the existing logic to ensure that we never reduce our guesses
for the number of heads or sectors per track, and add extra logic to
calculate the exact number of sectors per track if we find a partition
that starts within cylinder zero.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add basic support for multipath block devices. The "sanboot" and
"sanhook" commands now accept a list of SAN URIs. We open all URIs
concurrently. The first connection to become available for issuing
block device commands is marked as the active path and used for all
subsequent commands; all other connections are then closed. Whenever
the active path fails, we reopen all URIs and repeat the process.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add a dummy SAN device which allows the "sanhook" command to be tested
even when no SAN booting capability is present on the platform. This
allows substantial portions of the SAN boot code to be run in Linux
under Valgrind.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
When the TEST UNIT READY command receives an error response, the
shutdown of the command's block data interface will result in
scsidev_ready() closing the SCSI device. This will subsequently
result in a duplicate call to scsicmd_close(), leading to an assertion
failure when list_del() is called for the second time.
Fix by removing the command from the list of outstanding commands
before shutting down the command's interfaces.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The eIPoIB translation layer needs to translate outbound ARP packets
from Ethernet to IPoIB. A 64-byte buffer (starting with the Ethernet
header) does not provide enough tailroom to expand to hold the two
20-byte IPoIB MAC addresses. The result is that an UNDI API user will
be unable to send ARP packets.
We could potentially shuffle the packet contents to reuse the space
occupied by the stripped Ethernet link-layer header, but this would
add complexity. Instead, fix by increasing the minimum allocation
size to 128 bytes.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
B0_CTST is a 24bit register according to the vendor driver (sk98lin).
A 16bit read on B0_CTST will always return 0 for Y2_VAUX_AVAIL
(1<<16), so use a 32bit read when testing Y2_VAUX_AVAIL.
[This patch is copied directly from the Linux kernel tree.]
Signed-off-by: Mike McCormack <mikem@ring3k.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The value of ( ( x & 0x0c00 ) | 0x0c00 ) is always 0x0c00 regardless
of the value of x, and so the read_csr() is redundant. (There are no
read side effects for this register, according to the datasheet.)
This line of code originated in Linux kernel 2.3.19pre1 as
a->write_csr(ioaddr, 80, a->read_csr(ioaddr, 80) | 0x0c00);
and was modified in kernel 2.3.41pre4 to read
a->write_csr(ioaddr, 80, (a->read_csr(ioaddr, 80) & 0x0C00) | 0x0c00);
In the absence of commit messages, the intention of the code is
unclear. However, the logic resulting in a fixed value of 0x0c00 has
remained unaltered for over 17 years, and can probably be assumed to
have the correct overall result.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Track the current and maximum heap usage, and display the maximum
during shutdown when DEBUG=malloc is enabled.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Our asprintf() implementation guarantees that strp will be NULL on
allocation failure, but this is not standard behaviour. Detect errors
by checking for a negative return value instead of a NULL pointer.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Avoid potential division by zero when performing the check against
multiplication overflow. (Note that if the width is zero then there
can be no overflow anyway, so it is then safe to bypass the check.)
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Any underlying errors arising during ib_create_cq() or ib_create_qp()
are lost since the functions simply return NULL on error. This makes
debugging harder, since a debug-enabled build is required to discover
the root cause of the error.
Fix by returning a status code from these functions, thereby allowing
any underlying errors to be propagated.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
For visual consistency with surrounding lines, the definitions of
DBG_MORE(), DBG_PAUSE(), etc include an unnecessary ##__VA_ARGS__
argument which is always elided. This confuses sparse, which
complains about DBG_MORE_IF() being called with more than one
argument.
Work around this problem by adding an unused variable argument list to
the single-argument macros DBG_MORE_IF() and DBG_PAUSE_IF().
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Report errors in eoib_duplicate() via netdev_tx_err() rather than
netdev_tx_complete_err(), since netdev_tx_complete_err() accepts only
valid I/O buffers that are currently in the network device's transmit
queue.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
When the area to be mapped straddles the 2GB boundary, the expression
(high+size) will overflow on the first loop iteration. Fix by using
(end-size), which cannot underflow.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
When the area to be mapped straddles the 2GB boundary, the expression
(high+size) will overflow on the first loop iteration. Fix by using
(end-size), which cannot underflow.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
As of commit 10d19bd ("[pxe] Always retrieve cached DHCPACK and apply
to relevant network device"), the UNDI driver has been the only user
of pxeparent_call(). Remove the unnecessary layer of abstraction by
refactoring this code back into undinet.c, and fix the ability of
undiisr.S to fall back to chaining to the original handler if we were
unable to unhook our own ISR.
This effectively reverts commit 337e1ed ("[pxe] Separate parent PXE
API caller from UNDINET driver").
Signed-off-by: Michael Brown <mcb30@ipxe.org>
We currently request cable detection in PXE_OPCODE_INITIALIZE to work
around buggy Emulex drivers (see commit c0b61ba ("[efi] Work around
bugs in Emulex NII driver")).
This causes problems with some other NII drivers (e.g. Mellanox),
which may time out if the underlying link is intrinsically slow to
come up.
Attempt to work around both problems simultaneously by requesting
cable detection only if the underlying NII driver does not support
link status reporting via PXE_OPCODE_GET_STATUS. (This is based on a
potentially incorrect assumption that the buggy Emulex drivers do not
claim to report link status via PXE_OPCODE_GET_STATUS.)
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Provide a basic proof of concept ACPI table description (e.g. iBFT for
iSCSI) for SAN devices in a UEFI environment, using a control flow
that is functionally identical to that used in a BIOS environment.
Originally-implemented-by: Vishvananda Ishaya Abrams <vish.ishaya@oracle.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Several files define the ARRAY_SIZE() macro as used in Linux. Provide
a common definition for this in include/compiler.h.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some iSCSI targets send NOP-In. Rather than closing the connection
when we receive one, it is more user friendly to log a debug message
and keep the connection open. Eventually, it would be nice if iPXE
supported replying to NOP-Ins, but we might as well keep the
connection open until the target disconnects us.
Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some VF data is not cleared with reset, so make sure to return all the
settings to default before configuring the VF.
This fixes an issue where network packets would fail to be received if
the VF was previously used by the linux ixgbevf driver.
Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
When a SCSI device is closed in error, the shutdown of the device's
block data interface will probably lead to any outstanding commands
being closed (by whichever object is currently connected to the block
data interface). However, commands remain in the list of outstanding
commands until the final reference is dropped. The result is that
scsidev_close() will make a second call to scsicmd_close() for each
command. This is harmless, but produces confusing debug messages.
Fix by treating the outstanding command list as holding an explicit
reference to each command, and removing the command from the list of
outstanding commands in scsicmd_close().
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The SCSI layer currently implements a retry loop in order to retry
commands that fail due to spurious "error" conditions such as "power
on occurred". Move this retry loop to the generic SAN device layer:
this allow for retries due to other transient error conditions such as
an iSCSI target having dropped the connection due to inactivity.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The concept of the SAN drive number is meaningful only in a BIOS
environment, where it represents the INT13 drive number (0x80 for the
first hard disk). We retain this concept in a UEFI environment to
allow for a simple way for iPXE commands to refer to SAN drives.
Centralise the concept of the default drive number, since it is shared
between all supported environments.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
According to ThunderX Errata G-17560, NIC_PF_CFG[ENA] bit should not
be cleared at exit. This allows other drivers to access the NIC regs
correctly.
Signed-off-by: Konrad Adamczyk <konrad.adamczyk@cavium.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
It is required to reset BGX context state for the LMAC using
BGX_CMR_CONFIG register.
This solves problem with network connectivity in Linux booted from
iPXE.
Signed-off-by: Bartosz Szczepanek <bartosz.szczepanek@cavium.com>
Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Use intfs_shutdown() and intfs_restart() to cleanly shut down multiple
interfaces that may loop back to the same object.
This fixes a regression introduced by commit daa8ed9 ("[interface]
Provide intf_reinit() to reinitialise nullified interfaces") which
broke the use of HTTP Basic and Digest authentication.
Reported-by: murmansk <murmansk@hotmail.com>
Reported-by: Brett Waldo <brettwaldo@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Shutting down (and optionally restarting) multiple interfaces is
fraught with problems if there are loops in the interface connectivity
(e.g. the HTTP content-decoded and transfer-decoded interfaces, which
will generally loop back to each other). Various workarounds
currently exist across the codebase, generally involving preceding
calls to intf_nullify() to avoid problems due to known loops.
Provide intfs_shutdown() and intfs_restart() to allow all of an
object's interfaces to be shut down (or restarted) in a single call,
without having to worry about potential external loops.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Expose the current wall-clock time (in seconds since the Epoch), since
this is often useful in captured boot logs and can also be useful when
checking unexpected X.509 certificate validation failures.
Use a :uint32 setting to avoid Y2K38 rollover, thereby ensuring that
this will eventually be somebody else's problem.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Originally-implemented-by: Malte zu Klampen <malte@pclab.ifg.uni-kiel.de>
Originally-implemented-by: Richard Moore <rich@richud.com>
Tested-by: Esben Storgaard Nielsen <esn@solar.dk>
Signed-off-by: Christian Nilsson <nikize@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
INT 13 calls return a status value via %ah, with CF set if %ah is
non-zero (indicating an error). Our wrappers zero the whole of %ax if
CF is clear, to allow C code (which has no easy access to CF) to
simply test for a non-zero status to detect an error.
The current code assigns the returned status to a uint8_t, effectively
testing %al rather than %ah. Fix by treating the returned status as a
uint16_t instead.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Avoid using a zero sector count to guess the disk geometry, since that
would result in a division by zero when calculating the number of
cylinders.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
When running on AMD platforms, the legacy hardware emulation is
extremely unreliable. In particular, the IRQ0 timer interrupt is
likely to simply stop working, resulting in a total failure of any
code that relies on timers (such as DHCP retransmission attempts).
Work around this by using the 10MHz time counter provided by Hyper-V
via an MSR. (This timer can be tested in KVM via the command-line
option "-cpu host,hv_time".)
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allow the active timer (providing udelay() and currticks()) to be
selected at runtime based on probing during the INIT_EARLY stage of
initialisation.
TICKS_PER_SEC is now a fixed compile-time constant for all builds, and
is independent of the underlying clock tick rate. We choose the value
1024 to allow multiplications and divisions on seconds to be converted
to bit shifts.
TICKS_PER_MS is defined as 1, allowing multiplications and divisions
on milliseconds to be omitted entirely. The 2% inaccuracy in this
definition is negligible when using the standard BIOS timer (running
at around 18.2Hz).
TIMER_RDTSC now checks for a constant TSC before claiming to be a
usable timer. (This timer can be tested in KVM via the command-line
option "-cpu host,+invtsc".)
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Separate out the concept of "hardware maximum supported frame length"
and "configured link MTU", and limit the latter according to the
former.
In networks where the DHCP-supplied link MTU is inconsistent with the
hardware or driver capabilities (e.g. a network using jumbo frames),
this will result in iPXE advertising a TCP MSS consistent with a size
that can actually be received.
Note that the term "MTU" is typically used to refer to the maximum
length excluding the link-layer headers; we adopt this usage.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The call to intf_close() may result in the original interface being
reopened. For example: when reading the capacity of a 2TB+ disk via
iSCSI, the SCSI layer will respond to the intf_close() from the READ
CAPACITY (10) command by immediately issuing a READ CAPACITY (16)
command. The iSCSI layer happens to reuse the same interface for the
new command (since it allows only a single concurrent command).
Currently, intf_shutdown() unplugs the interface after the call to
intf_close() returns. In the above scenario, this results in
unplugging the just-reopened interface.
Fix by transferring the interface destination (and its reference) to a
temporary interface, and so effectively performing the unplug before
making the call to intf_close().
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The x86_64 EDK2 headers include a #pragma to mark all subsequent
symbol declarations and references as hidden if position-independent
code is being generated. Since libgen.h is currently included only
after the EDK2 headers, this results in __xpg_basename() being
erroneously marked as having hidden visibility (if the compiler
defaults to building position-independent code); this eventually
results in a failure to link the elf2efi binary.
Fix by including libgen.h prior to including the EDK2 headers.
Originally-fixed-by: Doug Goldstein <cardoe@cardoe.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
In some high-end Azure instances (e.g. NC6) we may receive an
unsolicited VMBUS_OFFER_CHANNEL message for a PCIe pass-through device
some time after completing the bus enumeration. This currently causes
apparently random failures due to unexpected VMBus message types.
Fix by ignoring any unsolicited VMBus messages.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some problems arise only when running on a specific CPU type (e.g.
non-functional timer interrupts as observed in Azure AMD instances).
Include the CPU vendor and model within the sample cloud boot scripts,
to assist in debugging such problems.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Provide a settings applicator to modify netdev->max_pkt_len in
response to changes to the "mtu" setting (DHCP option 26).
Note that as with MAC address changes, drivers are permitted to
completely ignore any changes in the MTU value. The net result will
be that iPXE effectively uses the smaller of either the hardware
default MTU or the software configured MTU.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
For some unspecified "security" reason, the Google Compute Engine
metadata server will refuse any requests that do not include the
non-standard HTTP header "Metadata-Flavor: Google".
Attempt to autodetect such requests (by comparing the hostname against
"metadata.google.internal"), and add the "Metadata-Flavor: Google"
header if applicable.
Enable this feature in the CONFIG=cloud build, and include a sample
embedded script allowing iPXE to boot from a script configured as
metadata via e.g.
# Create shared boot image
make bin/ipxe.usb CONFIG=cloud EMBED=config/cloud/gce.ipxe
# Configure per-instance boot script
gcloud compute instances add-metadata <instance> \
--metadata-from-file ipxeboot=boot.ipxe
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some host implementations (notably Google Compute Platform) are known
to unconditionally write back VIRTIO_NET_HDR_F_DATA_VALID to
header->flags for received packets, regardless of the features
negotiated by the driver. This breaks the transmit datapath by
effectively setting an illegal flag for all subsequent transmitted
packets.
Work around this problem by using separate empty header buffers for
the receive and transmit queues.
Debugged-by: Ladi Prosek <lprosek@redhat.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
This code largely inspired by tap.c. Allows for testing iPXE on real
NICs from within Linux. For example:
make bin-x86_64-linux/af_packet.linux
valgrind ./bin-x86_64-linux/af_packet.linux --net af_packet,if=eth3
Tested as x86_64 and i386 binary.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Virtio 0.9 implementation was limited to the maximum virtqueue size of
MAX_QUEUE_NUM and the virtio-net driver would fail to initialize on hosts
exceeding this limit.
This commit lifts the restriction by allocating the queue memory based on
the actual queue size instead of using a fixed maximum. Note that virtio
1.0 still uses the MAX_QUEUE_NUM constant to cap the size (unfortunately
this functionality is not available in virtio 0.9).
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
This commit introduces virtnet_free_virtqueues called on all virtqueue
error and shutdown paths. vpm_find_vqs no longer cleans up after itself
and instead expects virtnet_free_virtqueues to be always called to undo
its effect.
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
vpm_find_vqs incorrectly accepted the host provided queue size with no
regard to iPXE's internal limitations. Virtio 1.0 makes it possible for
the driver to override the queue size to reduce memory requirements and
iPXE is a great use case for this feature.
Also removing the extra vq->vring.num assignment which is already
handled in vring_init.
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The ISC Kea DHCP server transmits its DHCPOFFER as a unicast packet
with a broadcast IPv4 destination address (255.255.255.255). This
combination is currently rejected by iPXE.
Fix by explicitly accepting the local network broadcast address
(255.255.255.255) as a valid unicast destination address.
Reported-by: Roy Ledochowski <roy.ledochowski@hpe.com>
Tested-by: Roy Ledochowski <roy.ledochowski@hpe.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Updates:
- Nodnic: Support for arm cq doorbell via the UAR BAR
- Ensure hardware is quiescent when no interface is open - WinPE WA
- Support for clear interrupt via BAR
- Nodnic: Support for send TX doorbells via the UAR BAR
- Added ConnectX-5EX device
- Added ConnectX-5 device
Signed-off-by: Raed Salem <raeds@mellanox.com>
Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
EFI provides no clean way for device drivers to shut down in
preparation for handover to a booted operating system. The platform
firmware simply doesn't bother to call the drivers' Stop() methods.
Instead, drivers must register an EVT_SIGNAL_EXIT_BOOT_SERVICES event
to be signalled when ExitBootServices() is called, and clean up
without any reference to the EFI driver model.
Unfortunately, all timers silently stop working when ExitBootServices()
is called. Even more unfortunately, and for no discernible reason,
this happens before any EVT_SIGNAL_EXIT_BOOT_SERVICES events are
signalled. The net effect of this entertaining design choice is that
any timeout loops on the shutdown path (e.g. for gracefully closing
outstanding TCP connections) may wait indefinitely.
There is no way to report failure from currticks(), since the API
lazily assumes that the host system continues to travel through time
in the usual direction. Work around EFI's violation of this
assumption by falling back to a simple free-running monotonic counter.
Debugged-by: Maor Dickman <maord@mellanox.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
When searching for an UNDI ROM to match against a PCI device, search
in order of increasing ROM address (within the 128kB BIOS option ROM
area). This is likely (though not guaranteed) to match the order of
the original enumeration performed by the BIOS, which is in turn
likely to match the order of enumeration on the PCI bus.
Since we load at most one UNDI ROM, the net result is that we increase
our chances of loading the ROM corresponding to the selected PCI
device (rather than loading a ROM corresponding to a higher-numbered
PCI device with the same vendor and device IDs.)
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The "progress" macro can be used only from within the .prefix section.
At the point of calling relocate(), we are running in .text16 and so
the near call to print_message() will end up calling a random function
somewhere in .text16.
Interestingly, this problem has remained unnoticed for some time. It
is rare to build with DEBUG=libprefix. In the few cases that it has
been used during development, the randomly selected function in
.text16 seems to have been a harmless no-op with no visible
side-effects (beyond the unnoticed failure to print the "relocate"
progress message).
Fix by removing the futile attempt to print a progress message before
calling relocate().
Reported-by: Raed Salem <raeds@mellanox.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Fix the <NULL> driver name reported by "ifstat" when using the undipci
driver (due to the unnecessary extra device node inserted as a child
of the PCI device).
Remove the "UNDI-" prefix from device names since the driver name is
also now visible via "ifstat", and tidy up the device name to match
the format used by standard PCI devices.
The output from "ifstat" now resembles:
iPXE> ifstat
net0: 52:54:00:12:34:56 using undipci on 0000:00:03.0
iPXE> ifstat
net0: 52:54:00:12:34:56 using undionly on 0000:00:03.0
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The UNDI loader entry point is very likely to be called after POST,
when there is a high chance that the PMM-allocated image source area
and decompression area have been reused by something else.
In particular, using an iPXE .iso to test a separate iPXE ROM's UNDI
loader entry point in a qemu VM is likely to crash. SeaBIOS allocates
PMM blocks from close to the top of memory and so these blocks have a
high chance of colliding with the runtime addresses subsequently
chosen by the non-ROM iPXE by scanning the INT 15,e820 memory map.
The standard romprefix.S has no choice about relying on the
PMM-allocated image source area, since it has no other way to retrieve
its compressed payload.
In mromprefix.S, the image source area functions only as an optional
buffer used to avoid repeated reads from the (potentially slow)
expansion ROM BAR by the decompression code. We can therefore always
set %esi=0 when calling install_prealloc from the UNDI loader entry
point, and simply fall back to reading directly from the expansion ROM
BAR.
We can always set %edi=0 when calling install_prealloc from the UNDI
loader entry point. This will behave as though the decompression area
PMM allocation failed, and will therefore use INT 15,88 to find a
temporary decompression area somewhere close to 64MB. This is by no
means guaranteed to be safe from collisions, but it's probably safer
on balance than the PMM-allocated address.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allocate base memory (by decreasing the free base memory counter)
before calling the UNDI loader entry point, to minimise surprises for
the UNDI loader code.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The command and data interfaces may be connected to the same object.
Nullify the data interface before shutting down the control interface
to avoid potential infinite loops.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Commit 71560d1 ("[librm] Preserve FPU, MMX and SSE state across calls
to virt_call()") added FXSAVE and FXRSTOR instructions to iPXE. In
KVM virtual machines, these instructions execute fine as long as the
host CPU supports the "unrestricted_guest" feature (that is, it can
virtualize big real mode natively). On older host CPUs however, KVM
has to emulate big real mode, and it currently doesn't implement
FXSAVE emulation.
Upstream QEMU rebuilt iPXE at commit 0418631 ("[thunderx] Fix
compilation with older versions of gcc") which is a descendant of
commit 71560d1 (see above).
This was done in QEMU commit ffdc5a2 ("ipxe: update submodule from
4e03af8ec to 041863191"). The resultant binaries were bundled with
the QEMU v2.7.0 release; see QEMU commit c52125a ("ipxe: update
prebuilt binaries").
This distributed the iPXE workaround for the Tivoli VMM bug to a
number of KVM users with old host CPUs, causing KVM emulation failures
(guest crashes) for them while netbooting.
Make the FXSAVE and FXRSTOR instructions conditional on a new feature
test macro called TIVOLI_VMM_WORKAROUND. Define the macro by default.
There is prior art for an assembly file including config/general.h:
see arch/x86/prefix/romprefix.S. Also, TIVOLI_VMM_WORKAROUND seems to
be a good fit for the "Obscure configuration options" section in
config/general.h.
Cc: Bandan Das <bsd@redhat.com>
Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Greg <rollenwiese@yahoo.com>
Cc: Michael Brown <mcb30@ipxe.org>
Cc: Michael Prokop <launchpad@michael-prokop.at>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Pickford <arch@netremedies.ca>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Ref: https://bugs.archlinux.org/task/50778
Ref: https://bugs.launchpad.net/qemu/+bug/1623276
Ref: https://bugzilla.proxmox.com/show_bug.cgi?id=1182
Ref: https://bugzilla.redhat.com/show_bug.cgi?id=1356762
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The initrd_addr_max field represents the highest byte address that may
be used to hold initrd images, and is therefore almost certainly not
aligned to a page boundary: a typical value might be 0x7fffffff.
Fix the address calculations to ensure that the initrd images are
always aligned to a page boundary.
Reported-by: Sitsofe Wheeler <sitsofe@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
AppleNetBoot.h is not taken from the EDK2 codebase and so cannot be
imported using include/ipxe/efi/import.pl. Mark as a native iPXE
header (by changing the include guard) to avoid breaking the import
process.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allow certificates to be marked as having been added explicitly at run
time. Such certificates will not be discarded via the certificate
store cache discarder.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Enable IMAGE_PNG (but not IMAGE_PNM) by default, and drag in the
relevant objects only when image_pixbuf() is present in the binary.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Enable both IMAGE_DER and IMAGE_PEM by default, and drag in the
relevant objects only when image_asn1() is present in the binary.
This allows "imgverify" to transparently use either DER or PEM
signature files.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
As of commit b1caa48 ("[crypto] Support SHA-{224,384,512} in X.509
certificates"), the list of supported cryptographic algorithms is
controlled by config/crypto.h.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add PEM-encoded ASN.1 as an image format. We accept as PEM any image
containing a line starting with a "-----BEGIN" boundary marker.
We allow for PEM files containing multiple ASN.1 objects, such as a
certificate chain produced by concatenating individual certificate
files.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add DER-encoded ASN.1 as an image format. There is no fixed signature
for DER files. We treat an image as DER if it comprises a single
valid SEQUENCE object covering the entire length of the image.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allow code to create a partial ASN.1 cursor containing only the type
and length bytes, so that asn1_start() may be used to determine the
length of a large ASN.1 blob without first allocating memory to hold
the entire blob.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The Windows drivers for VMBus devices are enumerated using the
instance UUID rather than the channel number. Include the instance
UUID within the iPXE device name to allow an iPXE network device to be
more easily associated with the corresponding Windows network device
when debugging.
Signed-off-by: Michael Brown <mcb30@ipxe.org>