At the moment '\s*' is silently interpreted as just 's*', but in the
future it will be an error:
sed: 1: "s/\.o\s*:/_DEPS +=/": RE error: trailing backslash (\)
cf. https://bugs.freebsd.org/229925
Signed-off-by: Tobias Kortkamp <t@tobik.me>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Split debug message since eth_ntoa() uses a static result buffer.
Originally-fixed-by: Michael Bazzinotti <bazz@bazz1.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Fix memcmp() to return proper standard positive/negative values for
unequal comparisons. Current implementation is backwards (i.e. the
functions are returning negative when should be positive and
vice-versa).
Currently most consumers of these functions only check the return value
for ==0 or !=0 and so we can safely change the implementation without
breaking things.
However, there is one call that checks the polarity of this function,
and that is prf_sha1() for wireless WPA 4-way handshake. Due to the
incorrect memcmp() polarity, the WPA handshake creates an incorrect
PTK, and the handshake would fail after step 2. Undoubtedly, the AP
noticed the supplicant failed the mic check. This commit fixes that
issue.
Similar to commit 3946aa9 ("[libc] Fix strcmp()/strncmp() to return
proper values").
Signed-off-by: Michael Bazzinotti <bazz@bazz1.com>
Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
This caused iPXE to reject images even when enough memory was
available.
Signed-off-by: David Decotigny <ddecotig@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
gensdsk currently creates a syslinux.cfg file that is invalid if the
filename ends in lkrn. Fix by setting the default target to label($b)
instead of filename($g).
Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
When no response is obtained from the first configured DNS server,
fall back to attempting the other configured servers.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
All implemented socket openers provide definitions for both IPv4 and
IPv6 using exactly the same opener method. Simplify the logic by
omitting the address family from the definition.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Claiming the SNP devices has the side effect of raising the TPL to
iPXE's normal operating level of TPL_CALLBACK (see the commit message
for c89a446 ("[efi] Run at TPL_CALLBACK to protect against UEFI
timers") for details). This must happen before executing any code
that relies upon the TPL having been raised to TPL_CALLBACK.
The call to efi_snp_claim() in efi_download_start() currently happens
only after the call to xfer_open(). Calling xfer_open() will
typically result in a retry timer being started, which will result in
a call to currticks() in order to initialise the timer. The call to
currticks() will drop to TPL_APPLICATION and restore to TPL_CALLBACK
in order to allow a timer tick to occur. Since this call happened
before the call to efi_snp_claim(), the restored TPL is incorrect.
This in turn results in efi_snp_claim() recording the incorrect
original TPL, causing efi_snp_release() to eventually restore the
incorrect TPL, causing the system to lock up when ExitBootServices()
is called at TPL_CALLBACK.
Fix by moving the call to efi_snp_claim() to the start of
efi_download_start().
Debugged-by: Jarrod Johnson <jjohnson2@lenovo.com>
Debugged-by: He He4 Huang <huanghe4@lenovo.com>
Debugged-by: James Wang <jameswang@ami.com.tw>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The NUL byte included within the stack cookie to act as a string
terminator should be placed at the lowest byte address within the
stack cookie, in order to avoid potentially including the stack cookie
value within an accidentally unterminated string.
Suggested-by: Pete Beck <pete.beck@ioactive.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Several of the values used to compute a stack cookie (in the absence
of a viable entropy source) will tend to have either all-zeroes or
all-ones in the higher order bits. Rotate the values in order to
distribute the (minimal) available entropy more evenly.
Suggested-by: Pete Beck <pete.beck@ioactive.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Generalise the bit rotation implementations to use a common macro, and
add roll() and rorl() to handle unsigned long values.
Each function will still compile down to a single instruction.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The only remaining use case in iPXE for the CPU direction flag is in
__memcpy_reverse() where it is set to allow the use of "rep movsb" to
perform the memory copy. This matches the equivalent functionality in
the EDK2 codebase, which has functions such as InternalMemCopyMem that
also temporarily set the direction flag in order to use "rep movsb".
As noted in commit d2fb317 ("[crypto] Avoid temporarily setting
direction flag in bigint_is_geq()"), some UEFI implementations are
known to have buggy interrupt handlers that may reboot the machine if
a timer interrupt happens to occur while the direction flag is set.
Work around these buggy UEFI implementations by using the
(unoptimised) generic_memcpy_reverse() on i386 or x86_64 UEFI
platforms.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The UEFI specification states that the calling convention for IA-32
and x64 includes "Direction flag in EFLAGS is clear". This
specification covers only the calling convention used at the point of
calling functions annotated with EFIAPI. The specification explicitly
states that other functions (such as private functions or static
library calls) are not required to follow the UEFI calling
conventions.
The reference EDK2 implementation follows this specification. In
particular, the EDK2 interrupt handlers will clear the direction flag
before calling any EFIAPI functions, and will restore the direction
flag when returning from the interrupt handler. Some EDK2 private
library functions (most notably InternalMemCopyMem) may set the
direction flag temporarily in order to make efficient use of CPU
string operations.
The current implementation of iPXE's bigint_is_geq() for i386 and
x86_64 will similarly set the direction flag temporarily in order to
make efficient use of CPU string operations.
On some UEFI implementations (observed with a Getac RX10 tablet), a
timer interrupt that happens to occur while the direction flag is set
will reboot the machine. This very strongly indicates that the UEFI
timer interrupt handler is failing to clear the direction flag before
performing an affected operation (such as copying a block of memory).
Work around such buggy UEFI implementations by rewriting
bigint_is_geq() to avoid the use of string operations and so obviate
the requirement to temporarily set the direction flag.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
A failure in device registration (e.g. due to a device with malformed
descriptors) will currently result in the port being disabled as part
of the error path. This in turn causes the hardware to detect the
device as newly connected, leading to an endless loop of failed device
registrations.
Fix by leaving the port enabled in the case of a registration failure.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
When connected to a USB3 port, the AX88179 seems to have an
approximately 50% chance of producing a USB transaction error on each
of its three endpoints after being closed and reopened. The root
cause is unclear, but rewriting the USB device configuration value
seems to clear whatever internal error state has accumulated.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Experimentation shows that the existing 20ms delay is insufficient,
and often results in device detection being deferred until after iPXE
has completed startup.
Fix by increasing the delay to 100ms.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The driver-private data for root hubs is already set immediately after
allocating the USB bus. There seems to be no reason to set it again
when opening the root hub.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The "disabled" port states for USB2 and USB3 are not directly
equivalent. In particular, a disabled USB3 port will not detect new
device connections. The result is that a USB3 device disconnected
from and reconnected to an xHCI root hub port will end up reconnecting
as a USB2 device.
Fix by setting the link state to RxDetect after disabling the port, as
is already done during initialisation.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The USB3 specification removes PORT_ENABLE from the list of features
that may be cleared via a CLEAR_FEATURE request. Experimentation
shows that omitting the attempt to clear PORT_ENABLE seems to result
in the correct hotplug behaviour.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Resetting the host endpoint may immediately restart any pending
transfers for that endpoint. If the device endpoint halt has not yet
been cleared, then this will probably result in a second failed
transfer.
This second failure may be detected within usb_endpoint_reset() while
waiting for usb_clear_feature() to complete. The endpoint will
subsequently be removed from the list of halted endpoints, causing the
second failure to be effectively ignored and leaving the host endpoint
in a permanently halted state.
Fix by deferring the host endpoint reset until after the device
endpoint is ready to accept new transfers.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The ASIX USB NICs are capable of autodetecting the Ethernet link speed
and reporting it via PLSR but will not automatically update the
relevant GM and PS bits in MSR. The result is that a non-gigabit link
will fail to send or receive any packets.
The interrupt endpoint used to report link state includes the values
of the PHY BMSR and LPA registers. These are not sufficient to
differentiate between 100Mbps and 1000Mbps, since the LPA_NPAGE bit
does not necessarily indicate that the link partner is advertising
1000Mbps.
Extend axge_check_link() to write the MSR value based on the link
speed read from PLSR, and simplify the interrupt endpoint handler to
merely trigger a call to axge_check_link().
Signed-off-by: Michael Brown <mcb30@ipxe.org>
As per commit c89a446 ("[efi] Run at TPL_CALLBACK to protect against
UEFI timers") we expect to run at TPL_CALLBACK almost all of the time.
Various code paths rely on this assumption. Code paths that need to
temporarily lower the TPL (e.g. for entropy gathering) will restore it
to TPL_CALLBACK.
The entropy gathering code will be run during DRBG initialisation,
which happens during the call to startup(). In the case of iPXE
compiled as an EFI application this code will run within the scope of
efi_snp_claim() and so will execute at TPL_CALLBACK as expected.
In the case of iPXE compiled as an EFI driver the code will
incorrectly run at TPL_APPLICATION since there is nothing within the
EFI driver entry point that raises (and restores) the TPL. The net
effect is that a build that includes the entropy-gathering code
(e.g. a build with HTTPS enabled) will return from the driver entry
point at TPL_CALLBACK, which causes a system lockup.
Fix by raising and restoring the TPL within the EFI driver entry
point.
Debugged-by: Ignat Korchagin <ignat@cloudflare.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The EFI_RNG_PROTOCOL on the Microsoft Surface Go does not generate
random numbers. Successive calls to GetRNG() without any intervening
I/O operations (such as writing to the console) will produce identical
results. Successive reboots will produce identical results.
It is unclear what the Microsoft Surface Go is attempting to use as an
entropy source, but it is demonstrably producing zero bits of entropy.
The failure is already detected by the ANS-mandated Repetition Count
Test performed as part of our GetEntropy implementation. This
currently results in the entropy source being marked as broken, with
the result that iPXE refuses to perform any operations that require a
working entropy source.
We cannot use the existing EFI driver blacklisting mechanism to unload
the broken driver, since the RngDxe driver is integrated into the
DxeCore image.
Work around the broken driver by checking for consecutive identical
results returned by EFI_RNG_PROTOCOL and falling back to the original
timer-based entropy source.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some versions of gcc (observed with the cross-compiling gcc 9.3.0 in
Ubuntu 20.04) default to enabling -fPIE. Experimentation shows that
this results in the emission of R_AARCH64_ADR_GOT_PAGE relocation
records for __stack_chk_guard. These relocation types are not
supported by elf2efi.c.
Fix by explicitly disabling position-independent code for ARM64 EFI
builds.
Debugged-by: Antony Messerli <antony@mes.ser.li>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
GCC 10 emits warnings for implicit conversions of enumerated types.
The flexboot_nodnic code defines nodnic_queue_pair_type with values
identical to those of ib_queue_pair_type, and implicitly casts between
them. Add an explicit cast to fix the warning.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
GCC 10 produces a spurious warning about an out-of-bounds array access
for the unsized raw dword array in union intelvf_msg.
Avoid the warning by embedding the zero-length array within a struct.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
gcc10 switched default behavior from -fcommon to -fno-common. Since
"__shared" relies on the legacy behavior, explicitly specify it.
Signed-off-by: Bruce Rogers <brogers@suse.com>
Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Various implementation quirks in OCSP servers make it impractical to
use anything other than SHA1 to construct the issuerNameHash and
issuerKeyHash identifiers in the request certID. For example: both
the OpenCA OCSP responder used by ipxe.org and the Boulder OCSP
responder used by LetsEncrypt will fail if SHA256 is used in the
request certID.
As of commit 6ffe28a ("[ocsp] Accept response certID with missing
hashAlgorithm parameters") we rely on asn1_digest_algorithm() to parse
the algorithm identifier in the response certID. This will fail if
SHA1 is disabled via config/crypto.h.
Fix by using a direct ASN.1 object comparison on the OID within the
algorithm identifier.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Enable -fstack-protector for EFI builds, where binary size is less
critical than for BIOS builds.
The stack cookie must be constructed immediately on entry, which
prohibits the use of any viable entropy source. Construct a cookie by
XORing together various mildly random quantities to produce a value
that will at least not be identical on each run.
On detecting a stack corruption, attempt to call Exit() with an
appropriate error. If that fails, then lock up the machine since
there is no other safe action that can be taken.
The old conditional check for support of -fno-stack-protector is
omitted since this flag dates back to GCC 4.1.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some devices (observed with a Getac RX10 tablet and docking station
containing an embedded AX88179 USB NIC) seem to be capable of
detecting link state only during the call to Initialize(), and will
occasionally erroneously report that the link is down for the first
few such calls.
Work around these devices by retrying the Initialize() call multiple
times, terminating early if a link is detected. The eventual absence
of a link is treated as a non-fatal error, since it is entirely
possible that the link really is down.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Disable the use of MD5 as an OID-identifiable algorithm. Note that
the MD5 algorithm implementation will still be present in the build,
since it is used implicitly by various cryptographic components such
as HTTP digest authentication; this commit removes it only from the
list of OID-identifiable algorithms.
It would be appropriate to similarly disable the use of SHA-1 by
default, but doing so would break the use of OCSP since several OCSP
responders (including the current version of openca-ocspd) are not
capable of interpreting the hashAlgorithm field and so will fail if
the client uses any algorithm other than the configured default.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
There are many ways in which the object for a cryptographic algorithm
may be included, even if not explicitly enabled in config/crypto.h.
For example: the MD5 algorithm is required by TLSv1.1 or earlier, by
iSCSI CHAP authentication, by HTTP digest authentication, and by NTLM
authentication.
In the current implementation, inclusion of an algorithm for any
reason will result in the algorithm's ASN.1 object identifier being
included in the "asn1_algorithms" table, which consequently allows the
algorithm to be used for any ASN1-identified purpose. For example: if
the MD5 algorithm is included in order to support HTTP digest
authentication, then iPXE would accept a (validly signed) TLS
certificate using an MD5 digest.
Split the ASN.1 object identifiers into separate files that are
required only if explicitly enabled in config/crypto.h. This allows
an algorithm to be omitted from the "asn1_algorithms" table even if
the algorithm implementation is dragged in for some other purpose.
The end result is that only the algorithms that are explicitly enabled
in config/crypto.h can be used for ASN1-identified purposes such as
signature verification.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The supported ciphers and digest algorithms may already be specified
via config/crypto.h. Extend this to allow a minimum TLS protocol
version to be specified.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some platforms (observed with an AMI BIOS on an Apollo Lake system)
will spuriously fail the call to ConnectController() when the UEFI
network stack is disabled. This appears to be a BIOS bug that also
affects attempts to connect any non-iPXE driver to the NIC controller
handle via the UEFI shell "connect" utility.
Work around this BIOS bug by falling back to calling our
efi_driver_start() directly if the call to ConnectController() fails.
This bypasses any BIOS policy in terms of deciding which driver to
connect but still cooperates with the UEFI driver model in terms of
handle ownership, since the use of EFI_OPEN_PROTOCOL_BY_DRIVER ensures
that the BIOS is aware of our ownership claim.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The URI parsing code for "host[:port]" checks that the final character
is not ']' in order to allow for IPv6 literals. If the entire
"host[:port]" portion of the URL is an empty string, then this will
access the preceding character. This does not result in accessing
invalid memory (since the string is guaranteed by construction to
always have a preceding character) and does not result in incorrect
behaviour (since if the string is empty then strrchr() is guaranteed
to return NULL), but it does make the code confusing to read.
Fix by inverting the order of the two tests.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
As described in the previous commit, work around a UEFI specification
bug that necessitates calling UnloadImage if the return value from
LoadImage is EFI_SECURITY_VIOLATION.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
iPXE currently assumes that any error returned from LoadImage()
indicates that the image was not loaded. This assumption was correct
at the time the code was written and remained correct for UEFI
specifications up to and including version 2.1.
In version 2.3, the UEFI specification broke API and ABI compatibility
by defining that a return value of EFI_SECURITY_VIOLATION would now
indicate that the image had been loaded and a valid image handle had
been created, but that the image should not be started.
The wording in version 2.2 is ambiguous, and does not define whether
or not a return value of EFI_SECURITY_VIOLATION indicates that a valid
image handle has been created.
Attempt to work around all of these incompatible and partially
undefined APIs by calling UnloadImage if we get a return value of
EFI_SECURITY_VIOLATION. Minimise the risk of passing an uninitialised
pointer to UnloadImage by setting ImageHandle to NULL prior to calling
LoadImage.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Reduce the size of the USB disk image in the common case that
CONSOLE_INT13 is not enabled.
Originally-implemented-by: Romain Guyard <romain.guyard@mujin.co.jp>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Eliminate an unnecessary variable-length stack allocation and memory
copy by allowing TFTP option processors to modify the option string
in-place.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
According to UEFI specification 2.8 p 24.1 we must set the
EFI_SIMPLE_NETWORK_RECEIVE_MULTICAST bit in the "Disable" mask, when
"ResetMCastFilter" is TRUE.
Signed-off-by: Ignat Korchagin <ignat@cloudflare.com>
Split-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Currently, if the SNP driver for whatever reason fails to enable
receive filters for multicast frames, it falls back to enabling just
unicast and broadcast filters. This breaks some IPv6 functionality as
the network card does not respond to neighbour solicitation requests.
Some cards refuse to enable EFI_SIMPLE_NETWORK_RECEIVE_MULTICAST, but
do support enabling EFI_SIMPLE_NETWORK_RECEIVE_PROMISCUOUS_MULTICAST,
so try it before falling back to just unicast+broadcast.
Signed-off-by: Ignat Korchagin <ignat@cloudflare.com>
Split-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allow a PeerDist hosted cache server to be specified via the
${peerhost} setting, e.g.:
# Use 192.168.0.1 as hosted cache server
set peerhost 192.168.0.1
Note that this simply treats the hosted cache server as a permanently
discovered peer for all segments.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allow the use of PeerDist content encoding to be enabled or disabled
via the ${peerdist} setting, e.g.:
# Disable PeerDist
set peerdist 0
Signed-off-by: Michael Brown <mcb30@ipxe.org>
On devices with no EEPROM or OTP, the MAC_CR register defaults to not
using automatic link speed detection, with the result that no packets
are successfully sent or received.
Fix by always enabling automatic speed and duplex detection, since
iPXE provides no mechanism for manual configuration of either link
speed or duplex.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
On at least some platforms (observed with a Raspberry Pi), any attempt
to perform USB transfers via EFI_USB_IO_PROTOCOL during EFI shutdown
will lock up the system. This is quite probably due to the already
documented failure of all EFI timers when ExitBootServices() is
called: see e.g. commit 5cf5ffea2 "[efi] Work around temporal anomaly
encountered during ExitBootServices()".
Work around this problem by refusing to poll endpoints if shutdown is
in progress, and by immediately failing any attempts to enqueue new
transfers.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The USB core reuses the I/O buffer space occupied by the USB setup
packet to hold the completion status for message transfers, assuming
that the message() method will always strip the setup packet before
returning. This assumption is correct for all of the hardware
controller drivers (XHCI, EHCI, and UHCI), since these drivers are
able to enqueue the transfer as a separate action from waiting for the
transfer to complete.
The EFI_USB_IO_PROTOCOL does not allow us to separate actions in this
way: there is only a single blocking method that both enqueues and
waits for completion. Our usbio driver therefore currently defers
stripping the setup packet until the control endpoint is polled.
This causes a bug if a message transfer is enqueued but never polled
and is subsequently cancelled, since the cancellation will be reported
with the I/O buffer still containing the setup packet. This breaks
the assumption that the setup packet has been stripped, and triggers
an assertion failure in usb_control_complete().
Fix by always stripping the setup packet in usbio_endpoint_message(),
and adjusting usbio_control_poll() to match.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Ensure that the configured RSA digestInfo prefixes are included in any
build that includes rsa.o (rather than relying on x509.o or tls.o also
being present in the final binary).
This allows the RSA self-tests to be run in isolation.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The restart of negotiation triggered by a HelloRequest currently does
not call tls_tx_resume() and so may end up leaving the connection in
an idle state in which the pending ClientHello is never sent.
Fix by calling tls_tx_resume() as part of tls_restart(), since the
call to tls_tx_resume() logically belongs alongside the code that sets
bits in tls->tx_pending.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Raw block downloads are expensive if the origin server uses HTTPS,
since each concurrent download will require local TLS resources
(including potentially large received encrypted data buffers).
Raw block downloads may also be prohibitively slow to initiate when
the origin server is using HTTPS and client certificates. Origin
servers for PeerDist downloads are likely to be running IIS, which has
a bug that breaks session resumption and requires each connection to
go through the full client certificate verification.
Limit the total number of concurrent raw block downloads to ameliorate
these problems.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Move the responsibility for starting the block download timers from
peerblk_expired() to peerblk_raw_open() and peerblk_retrieval_open(),
in preparation for adding the ability to defer calls to
peerblk_raw_open() via a block download queue.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add a build shortcut "rpi", allowing for e.g.
make CONFIG=rpi CROSS=aarch64-linux-gnu- bin-arm64-efi/rpi.efi
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The (very approximate) split between Makefile.housekeeping and
Makefile is that the former provides mechanism and the latter provides
policy.
Provide a section within Makefile as a home for predefined build
shortcuts such as the existing all-drivers build.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The WORKAROUND_CFLAGS list is constructed based on running tests on
the target compiler, and the results may not be valid for the host
compiler.
The only relevant workaround required for the host compiler is
-Wno-stringop-truncation, which is needed to avoid a spurious compiler
warning for a totally correct usage of strncpy() in util/elf2efi.c.
Duplicating the workaround tests for the host compiler is messy, as is
conditionally applying __attribute__((nonstring)). Fix instead by
disapplying WORKAROUND_CFLAGS for the host compiler, and using
memcpy() with an explicitly calculated length instead of strncpy() in
util/elf2efi.c.
Reported-by: Ignat Korchagin <ignat@cloudflare.com>
Reported-by: Christopher Clark <christopher.w.clark@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Compiling with gcc 9.1 generates lots of "taking address of packed
member of ... may result in an unaligned pointer value" warnings.
Some of these warnings are genuine, and indicate correctly that parts
of iPXE currently require the CPU (or runtime environment) to support
unaligned accesses. For example: the TCP/IP receive data path will
attempt to access 32-bit fields that may not be aligned to a 32-bit
boundary.
Other warnings are either spurious (such as when the pointer is to a
variable-length byte array, which can have no alignment requirement
anyway) or unhelpful (such as when the pointer is used solely to
provide a debug colour value for the DBGC() macro).
There appears to be no easy way to silence the spurious warnings.
Since the ability to perform unaligned accesses is already a
requirement for iPXE, work around the problem by silencing this class
of warnings.
Signed-off-by: Valentine Barshak <gvaxon@gmail.com>
Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Use '%p' directive, and print handle's address if the address is null
and the handle doesn't have a name. This fixes the following
compilation error:
interface/efi/efi_debug.c:334:3: error: '%s' directive
argument is null [-Werror=format-overflow=]
Signed-off-by: Valentine Barshak <gvaxon@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The Raspberry Pi NIC has no EEPROM to hold the MAC address. The
platform firmware (e.g. UEFI or U-Boot) will typically obtain the MAC
address from the VideoCore firmware and add it to the device tree,
which is then made available to subsequent programs such as iPXE or
the Linux kernel.
Add the ability to parse a flattened device tree and to extract the
MAC address.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
efidev_parent() currently assumes that any device with BUS_TYPE_EFI is
part of a struct efi_device. This assumption is not valid, since the
code in efi_device_info() may also create a device with BUS_TYPE_EFI.
Fix by searching through the list of registered EFI devices when
looking for a match, instead of relying on the bus type value.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
It is currently not possible to build the all-drivers iPXE binaries
for ARM, since there is no implementation for inb(), outb(), etc.
There is no common standard for accessing I/O space on ARM platforms,
and there are almost no ARM-compatible peripherals that actually
require I/O space accesses.
Provide dummy implementations that behave as though no device is
present (i.e. ignore writes, return all bits high for reads). This is
sufficient to allow the all-drivers binaries to link, and should cause
drivers to behave as though no I/O space peripherals are present in
the system.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Commit 1a7746603 ("[build] Fix use of inline assembly on GCC 4.8 ARM64
builds") switched from using "%c0" to "%a0" in order to avoid an
"invalid operand prefix" error on the ARM64 version of GCC 4.8.
It appears that the ARM64 version of GCC 8 now produces an "invalid
address mode" error for the "%a0" form, but is happy with the original
"%c0" form.
Switch back to using the "%c0" form, on the assumption that the
requirement for "%a0" was a temporary aberration.
Originally-fixed-by: John L. Jolly <jjolly@suse.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The physical function defaults to operating in "PXE mode" after a
power-on reset. In this mode, receive descriptors are fetched and
written back as single descriptors. In normal (non-PXE mode)
operation, receive descriptors are fetched and written back only as
complete cachelines unless an interrupt is raised.
There is no way to return to PXE mode from non-PXE mode, and there is
no way for the virtual function driver to operate in PXE mode.
Choose to operate in non-PXE mode. This requires us to trick the
hardware into believing that it is raising an interrupt, so that it
will not defer writing back receive descriptors until a complete
cacheline (i.e. four packets) have been consumed. We do so by
configuring the hardware to use MSI-X with a dummy target location in
place of the usual APIC register.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The virtual function driver will use the same transmit and receive
descriptor ring structures, but will not itself construct and program
the ring context. Split out ring creation and destruction from the
programming of the ring context, to allow code to be shared between
physical and virtual function drivers.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The virtual function transmit and receive ring tail register offsets
do not match those of the physical function. Allow the tail register
offsets to be specified separately.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The physical function driver does not allow the virtual function to
request the use of 16-byte receive descriptors. Switch to using
32-byte receive descriptors.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Provide a weak stub function for handling the "send to VF" event used
for communications between the physical and virtual function drivers.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The "send to PF" and "send to VF" admin queue descriptors (ab)use the
cookie field to hold the extended opcode and return code values.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
A virtual function reset is triggered via an admin queue command and
will reset the admin queue configuration registers. Allow the admin
queues to be reinitialised after such a reset, without requiring the
overhead (and potential failure paths) of freeing and reallocating the
queues.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
We currently use a single data buffer shared between all admin queue
descriptors. This works for the physical function driver since we
have at most one command in progress and only a single event (which
does not use a data buffer).
The communication path between the physical and virtual function
drivers uses the event data buffer, and there is no way to prevent a
solicited event (i.e. a response to a request) from being overwritten
by an unsolicited event (e.g. a link status change).
Provide individual data buffers for each admin event queue descriptor
(and for each admin command queue descriptor, for the sake of
consistency).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The register map for the virtual functions appears to have been
constructed using a random number generator.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The physical function driver does not allow the virtual function to
request that VLAN tags are left unstripped. Extract and use the VLAN
tag from the receive descriptor if present.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The Hermon driver uses vlan_find() to identify the appropriate VLAN
device for packets that are received with the VLAN tag already
stripped out by the hardware. Generalise this capability and expose
it for use by other network card drivers.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The Intel 40 Gigabit Ethernet virtual functions support only MSI-X
interrupts, and will write back completed interrupt descriptors only
when the device attempts to raise an interrupt (or when a complete
cacheline of receive descriptors has been completed).
We cannot actually use MSI-X interrupts within iPXE, since we never
have ownership of the APIC. However, an MSI-X interrupt is
fundamentally just a DMA write of a single dword to an arbitrary
address. We can therefore configure the device to "raise" an
interrupt by writing a meaningless value to an otherwise unused memory
location: this is sufficient to trigger the receive descriptor
writeback logic.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
One of the design goals of ASN.1 DER is to provide a canonical
serialization of a data structure, thereby allowing for equality of
values to be tested by simply comparing the serialized bytes.
Some OCSP servers will modify the request certID to omit the optional
(and null) "parameters" portion of the hashAlgorithm. This is
arguably legal but breaks the ability to perform a straightforward
bitwise comparison on the entire certID field between request and
response.
Fix by comparing the OID-identified hashAlgorithm separately from the
remaining certID fields.
Originally-fixed-by: Thilo Fromm <Thilo@kinvolk.io>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Provide increased visibility into the progress of TCP connections by
displaying an explicit "connecting" status message while waiting for
the TCP handshake to complete.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
TLS connections will almost always create background connections to
perform cross-signed certificate downloads and OCSP checks. There is
currently no direct visibility into which checks are taking place,
which makes troubleshooting difficult in the absence of either a
packet capture or a debug build.
Use the job progress message buffer to report the current cross-signed
certificate download or OCSP status check, where applicable.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Record the session ID (if any) provided by the server and attempt to
reuse it for any concurrent connections to the same server.
If multiple connections are initiated concurrently (e.g. when using
PeerDist) then defer sending the ClientHello for all but the first
connection, to allow time for the first connection to potentially
obtain a session ID (and thereby speed up the negotiation for all
remaining connections).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
On a Dell OptiPlex 7010, calling DisconnectController() on the LOM
device handle will lock up the system. Debugging shows that execution
is trapped in an infinite loop that is somehow trying to reconnect
drivers (without going via ConnectController()).
The problem can be reproduced in the UEFI shell with no iPXE code
present, by using the "disconnect" command. Experimentation shows
that the only fix is to unload (rather than just disconnect) the
"Ip4ConfigDxe" driver.
Add the concept of a blacklist of UEFI drivers that will be
automatically unloaded when iPXE runs as an application, and add the
Dell Ip4ConfigDxe driver to this blacklist.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The Option::ROM module recognizes and checks EFI header of image. The
disrom.pl utility dumps this header if is present.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The Option::ROM module now compares the Code Type in the PCIR header
to 0x00 (PC-AT) in order to check the presence of other header types
(PnP, UNDI, iPXE, etc). The validity of these headers are checked not
only by offset, but by range and signature checks also. The image
checksum and initial size also depends on Code Type.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
GCC 9 warns that abs() may truncate its signed long argument. Fix by
using labs() instead.
Reported-by: Martin Liška <mliska@suse.cz>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Fix strcmp() and strncmp() to return proper standard positive/negative
values for unequal strings. Current implementation is backwards
(i.e. the functions are returning negative when should be positive and
vice-versa).
Currently all consumers of these functions only check the return value
for ==0 or !=0 and so we can safely change the implementation without
breaking things.
Signed-off-by: Aaron Young <Aaron.Young@oracle.com>
Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Current (simplified):
1. InstallMultipleProtocolInterfaces
if err goto err_install_protocol_interface;
2. OpenProtocol(efi_nii_protocol_guid)
if err goto err_open_nii;
3. OpenProtocol(efi_nii31_protocol_guid)
if err goto err_open_nii31;
4. efi_child_add
if err goto err_efi_child_add;
...
err_efi_child_add:
CloseProtocol(efi_nii_protocol_guid) <= should be efi_nii31_protocol_guid
err_open_nii: <= should be err_open_nii31
CloseProtocol(efi_nii31_protocol_guid) <= should be efi_nii_protocol_guid
err_open_nii31: <= should be err_open_nii
UninstallMultipleProtocolInterfaces
Signed-off-by: Ignat Korchagin <ignat@cloudflare.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
PCI Configuration Space contains fields prog-if at the offset 0x09,
sub-class at the offset 0x0a and base-class at the offset 0x0b (it
respects little endian). PCIR structure uses these fields in the same
order.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Starting from binutils 2.31.0 (commit bd7ab16b) x86-64 assembler
generates R_X86_64_PLT32 instead of R_X86_64_PC32.
Acked-by: John Jolly <jjolly@suse.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The first adapters in this family are X2522-10, X2522-25, X2541 and
X2542.
These no longer use PCI BAR 0 for I/O, but use that for memory. In
other words, BAR 2 on SFN8xxx adapters now becomes BAR 0.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Devices that support jumbo frames will currently default to the
largest possible MTU. This assumption is valid for virtual adapters
such as virtio-net, where the MTU must have been configured by a
system administrator, but is unsafe in the general case of a physical
adapter.
Default to the standard Ethernet MTU, unless explicitly overridden
either by the driver or via the ${netX/mtu} setting.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Avoid calling rndis_halt() and rndis->op->close() twice if the call to
register_netdev() fails.
Reported-by: Roman Kagan <rkagan@virtuozzo.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some versions of gcc seem to silently accept an attempt to disable an
unrecognised warning (e.g. via -Wno-stringop-truncation) but will then
report the unrecognised warning if any other error occurs during the
build, resulting in a potentially misleading error message.
Avoid this potential confusion by using the positive-form tests in
order to determine the workaround CFLAGS.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The definition of version_response channel message in Linux doesn't
include version field, so the upcoming VMBus implementation in QEMU
doesn't set it either. Neither Windows nor Linux had any problem with
this.
The check against this field is redundant because the message is the
response to initiate_contact message containing the specific version
requested, so the response with version_supported=true is unambiguous.
Drop this check and don't rely on the field to be present in the
message.
Signed-off-by: Roman Kagan <rkagan@virtuozzo.com>
Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
register_netdev expects ->hw_addr and ->ll_addr to be already filled,
so move it towards the end of register_rndis, after the respective
fields have been successfully queried from the underlying device.
Signed-off-by: Roman Kagan <rkagan@virtuozzo.com>
Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The gcc 8 compiler introduces a warning for certain string
manipulation functions, flagging usages which _may_ not be intended.
An audit of the iPXE sources indicates all usages of strncat and
strncpy are as intended, so the warnings currently issued are not
helpful, especially if warnings are considered errors.
Fix by detecting gcc's support for -Wno-stringop-truncation and, if
detected, using that option to avoid the warning.
Signed-off-by: Bruce Rogers <brogers@suse.com>
Modified-by: Michael Brown <mcb30@ipxe.org>
Also-fixed-by: Christian Hesse <list@eworm.de>
Also-fixed-by: Roman Kagan <rkagan@virtuozzo.com>
Also-fixed-by: Bernhard M. Wiedemann <bwiedemann@suse.de>
Also-fixed-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
As pointedly documented in RFC7230 section 2.3, HTTP is a stateless
protocol: each request message can be understood in isolation from any
other requests or responses. Various authentication schemes such as
NTLM break this fundamental property of HTTP and rely on the same TCP
connection being reused.
Work around these broken authentication schemes by ensuring that the
most recently pooled connection is reused for the subsequent
authentication retry.
Reported-by: Andreas Hammarskjöld <junior@2PintSoftware.com>
Tested-by: Andreas Hammarskjöld <junior@2PintSoftware.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add the function mii_find() in order to locate the PHY address.
Signed-off-by: Sylvie Barlow <sylvie.c.barlow@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
We currently have no generic concept of a PHY address, since all
existing implementations simply hardcode the PHY address within the
MII access methods.
A bit-bashing MII interface will need to be provided with an explicit
PHY address in order to generate the correct waveform. Allow for this
by separating out the concept of a MII device (i.e. a specific PHY
address attached to a particular MII interface).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allow the subsystem IDs to be used when checking for PXE stacks with
broken interrupt support.
Suggested-by: Levi Hsieh <Levi.Hsieh@dell.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The relocation type R_ARM_REL32 is generated when building
bin-arm32-efi/snp.efi using gcc 6.3 and ld 2.28.
R_ARM_REL32 is a program counter (PC) relative 32 bit relocation so we
can ignore it like all other PC relative relocations.
Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
When booting some versions of the UEFI shell, our driver binding
protocol's Supported() entry point is called at TPL_NOTIFY for no
discernible reason. Attempting to raise to TPL_CALLBACK triggers an
immediate assertion failure in the firmware.
Since our Supported() method can run at any TPL, fix by simply not
attempting to raise the TPL within this method.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Release SNP devices to allow the SAN booted image to use our
EFI_SIMPLE_NETWORK_PROTOCOL instance, and to ensure that the image is
started at TPL_APPLICATION.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The cipherstream xfer_window_changed() message is used to retrigger
the TLS transmit state machine. If the transmit state machine is
idle, then the window change message will not be propagated to the
plainstream interface. This can potentially cause the plainstream
interface peer (e.g. httpcore) to block waiting for a window change
message that will never arrive.
Fix by ensuring that the window change message is propagated to the
plainstream interface if the transmit state machine is idle. (If the
transmit state machine is not idle then the plainstream window will be
zero anyway.)
Signed-off-by: Michael Brown <mcb30@ipxe.org>
In TLS terminology a session conceptually spans multiple individual
connections, and essentially represents the stored cryptographic state
(master secret and cipher suite) required to establish communication
without going through the certificate and key exchange handshakes.
Rename tls_session to tls_connection in order to make the name
tls_session available to represent the session state.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
A failure in tls_generate_random() will result in a call to ref_put()
before the received data list has been initialised, which will cause
free_tls() to attempt to traverse an uninitialised list.
Fix by ensuring that all fields referenced by free_tls() are
initialised before any of the potential failure paths.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Commit 6149e0a ("[librm] Provide symbols for inline code placed into
other sections") may cause build failures due to duplicate label names
if the compiler chooses to duplicate inline assembly code.
Fix by using the "%=" special format string to include a
guaranteed-unique number within the label name.
The "%=" will be expanded only if constraints exist for the inline
assembly. This fix therefore requires that all REAL_CODE() fragments
use a (possibly empty) constraint list.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Provide symbols constructed from the object name and line number for
code fragments placed into alternative sections, such as inline
REAL_CODE() assembly placed into .text16. This simplifies the
debugging task of finding the source code corresponding to a given
instruction pointer.
Note that we cannot use __FUNCTION__ since it is not a preprocessor
macro and so cannot be concatenated with string literals.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
If the underlying PXE stack reports an invalid IRQ number (above
IRQ_MAX), treat this as equivalent to an empty IRQ number and fall
back to using polling mode.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The existence of MMX and SSE is required by the System V x86_64 ABI
and so is assumed by gcc, but these registers are not preserved by our
own interrupt handlers and are unlikely to be preserved by other
context switch handlers in a boot firmware environment.
Explicitly prevent gcc from using MMX or SSE registers to avoid
potential problems due to silent register corruption.
We must remove the %xmm0-%xmm5 clobbers from the x86_64 version of
hv_call() since otherwise gcc will complain about unknown register
names. Theoretically, we should probably add code to explicitly
preserve the %xmm0-%xmm5 registers across a hypercall, in order to
guarantee to external code that these registers remain unchanged. In
practice this is difficult since SSE registers are disabled by
default: for background information see commits 71560d1 ("[librm]
Preserve FPU, MMX and SSE state across calls to virt_call()") and
dd9a14d ("[librm] Conditionalize the workaround for the Tivoli VMM's
SSE garbling").
Signed-off-by: Michael Brown <mcb30@ipxe.org>
We currently perform various min-entropy calculations using build-time
floating-point arithmetic. No floating-point code ends up in the
final binary, since the results are eventually converted to integers
and asserted to be compile-time constants.
Though this mechanism is undoubtedly cute, it inhibits us from using
"-mno-sse" to prevent the use of SSE registers by the compiler.
Fix by using fixed-point arithmetic instead.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
This is required to work around a bug in some firmware versions.
Signed-off-by: Ameer Mahagneh <ameerm@mellanox.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allow the ACPI power management timer to be used if enabled via
TIMER_ACPI in config/timer.h. This provides an alternative timer on
systems where the standard 8254 PIT is unavailable or unreliable.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some drivers are known to call the optional Map_Mem() callback without
first checking that the callback exists. Provide a usable basic
implementation of Map_Mem() along with the other callbacks that become
mandatory if Map_Mem() is provided.
Note that in theory the PCI I/O protocol is allowed to require
multiple calls to Map(), with each call handling only a subset of the
overall mapped range. However, the reference implementation in EDK2
assumes that a single Map() will always suffice, so we can probably
make the same simplifying assumption here.
Tested with the Intel E3522X2.EFI driver (which, incidentally, fails
to cleanly remove one of its mappings).
Originally-implemented-by: Maor Dickman <maord@mellanox.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The blocked link test in eth_slow_lacp_rx() is performed before the
actor TLV is copied to the partner TLV, and so must test the actor
state field rather than the partner state field.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some CAs provide non-functional OCSP servers, and some clients are
forced to operate on networks without access to the OCSP servers.
Allow the user to explicitly disable the use of OCSP checks by
undefining OCSP_CHECK in config/crypto.h.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Limit the profile sample count to INT_MAX to avoid both signed
overflow and a potential division by zero when updating the stored
mean value.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Mark the link as blocked if the LACP partner is not reporting itself
as being in sync, collecting, and distributing.
This matches the behaviour for STP: we mark the link as blocked if we
detect that the switch is actively blocking traffic, in order to
extend the DHCP discovery period and so prevent boot failures on
switches that take an excessively long time to enable ports.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Remove the global variable shomron_nodnic_supported, since it may have
different values for different PCI devices.
Originally-fixed-by: Mohammed Taha <mohammedt@mellanox.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
When DEBUG=librm_mgmt is enabled, intercept CPU exceptions and provide
a register and stack dump, then drop to an emergency shell. Exiting
from the shell will almost certainly not work, but this provides an
opportunity to view the register and stack dump and carry out some
basic debugging.
Note that we can intercept only the first 8 CPU exceptions, since a
PXE ROM is not permitted to rebase the PIC.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Commit c89a446 ("[efi] Run at TPL_CALLBACK to protect against UEFI
timers") introduced a regression in the EFI entropy gathering code.
When the EFI_RNG_PROTOCOL is not present, we fall back to using timer
interrupts (as for the BIOS build). Since timer interrupts are
disabled at TPL_CALLBACK, WaitForEvent() fails and no entropy can be
gathered.
Fix by dropping to TPL_APPLICATION while entropy gathering is enabled.
Reported-by: Andreas Hammarskjöld <junior@2PintSoftware.com>
Tested-by: Andreas Hammarskjöld <junior@2PintSoftware.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The iSCSI root path may contain a literal IPv6 address. Update the
parser to handle this address format correctly.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
As noted in the comments, UEFI manages to combines the all of the
worst aspects of both a polling design (inefficiency and inability to
sleep until something interesting happens) and of an interrupt-driven
design (the complexity of code that could be preempted at any time,
thanks to UEFI timers).
This causes problems in particular for UEFI USB keyboards: the
keyboard driver calls UsbAsyncInterruptTransfer() to set up a periodic
timer which is used to poll the USB bus. This poll may interrupt a
critical section within iPXE, typically resulting in list corruption
and either a hang or reboot.
Work around this problem by mirroring the BIOS design, in which we run
with interrupts disabled almost all of the time.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Reporting a completion via usb_complete() will pass control outside
the scope of xhci.c, and could potentially result in a further call to
xhci_event_poll() before returning from usb_complete(). Since we
currently update the event consumer counter only after calling
usb_complete(), this can result in duplicate completions and
consequent corruption of the submission TRB ring structures.
Fix by updating the event ring consumer counter before passing control
to usb_complete().
Reported-by: Andreas Hammarskjöld <junior@2PintSoftware.com>
Tested-by: Andreas Hammarskjöld <junior@2PintSoftware.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The i219 appears to have a seriously broken reset mechanism. After
any transmit or receive activity, resetting the card will break both
the transmit and receive datapaths until the next PCI bus reset.
The Linux and BSD drivers include a convoluted workaround authored by
Intel which involves setting a bit in the undocumented FEXTNVM11
register, then transmitting a dummy 512-byte packet containing garbage
data, then reconfiguring the receive descriptor prefetch thresholds
and temporarily reenabling the receive datapath. The comments in the
Intel fix do not even remotely match what the code actually does, and
the code accidentally leaves the transmitter enabled after use.
Experimentation suggests that an equivalent fix is to simply set the
undocumented bit in FEXTNVM11 before enabling the transmit or receive
descriptor rings.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Invalid protocol speed ID tables appear to be increasingly common in
the wild, to the point that it is infeasible to apply an explicit
XHCI_BAD_PSIV flag for each offending PCI device ID.
Fix by assuming an invalid PSI table as soon as any invalid value is
reported by the hardware.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some older versions of gcc (observed with gcc 4.7.2) report a spurious
uninitialised variable warning in ena_get_device_attributes(). Work
around this warning by manually inlining the relevant code (which has
only a single call site).
Reported-by: xbgmsharp <xbgmsharp@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The UNDI layer uses the NETDEV_IRQ_ENABLED flag to choose whether to
return PXENV_UNDI_ISR_OUT_OURS or PXENV_UNDI_ISR_OUT_NOT_OURS for a
given interrupt. For a network device that does not support
interrupts, the flag will never be set and so pxenv_undi_isr() will
always return PXENV_UNDI_ISR_OUT_NOT_OURS. This causes some NBPs
(such as lpxelinux.0) to hang.
Redefine NETDEV_IRQ_ENABLED as a simple administrative flag which can
be set even on network devices that do not support interrupts. This
allows pxenv_undi_isr() (which is the sole user of NETDEV_IRQ_ENABLED)
to function as expected by lpxelinux.0.
Signed-off-by: Martin Habets <mhabets@solarflare.com>
Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Most drivers do not utilise an MII interface, since the link state is
typically available directly from a memory-mapped register.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Using "ld --oformat binary" for mbr.bin and usbdisk.bin seems to cause
segmentation faults on some versions of binutils (observed on Fedora
27). Work around this problem by using ld to create an intermediate
ELF object, followed by objcopy (via the existing %.tmp -> %.bin rule)
to create the final binary.
Note that we cannot simply use a single-stage "objcopy -O binary"
since this will not process the relocation records for x86_64: see
commit 1afcccd ("[build] Do not use "objcopy -O binary" for objects
with relocation records").
Reported-by: Brent S <bts@square-r00t.net>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add missing FILE_LICENCE declarations to x86_64 headers based on the
corresponding i386 headers (from which the x86_64 headers were
originally derived).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The URIs printed as part of download progress messages are intended to
provide a quick visual progress indication to the user. Very long
query strings can render this visual indication useless in practice,
since the most important information (generally the URI host and path)
is drowned out by multiple lines of human-illegible URI-encoded data.
Omit the query string entirely from the download progress message.
For consistency and brevity, also omit the URI fragment along with the
username and password (which was previously redacted anyway).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The precise HTTP response status code is currently visible only at
DBGLVL_EXTRA. Allow for easier debugging by reporting the whole
status line at DBGLVL_LOG for any unsuccessful responses.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Xen 4.4 includes the device "device/suspend/event-channel" which does
not have a "backend" key. This currently causes the entire XenBus
device tree probe to fail.
Fix by skipping probe attempts for device types for which there is no
iPXE driver.
Debugged-by: Eytan Heidingsfeld <eytanh@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allow individual authentication schemes to parse WWW-Authenticate
headers that do not comply with RFC2617.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Servers may provide multiple WWW-Authenticate headers, each offering a
different authentication scheme. We currently fail the request as
soon as we encounter an unrecognised scheme, which prevents subsequent
offers from succeeding.
Fix by silently ignoring headers for schemes that we do not recognise.
If no schemes are recognised then the request will eventually fail
anyway due to the 401 response code.
If multiple schemes are supported, arbitrarily choose the scheme
appearing first within the response headers.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Relocation type R_ARM_V4BX requires no computation. It marks the
location of an ARMv4 branch exchange instruction.
Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
In fully self-contained deployments it may be desirable to build iPXE
with an empty CROSSCERT source to avoid talking to external services.
Add an explicit check for this case and make validator_start_download
fail immediately if the base URI is empty.
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some HP BIOSes (observed with a Z840) seem to attempt to connect our
drivers in the middle of our call to DisconnectController(). The
precise chain of events is unclear, but the symptom is that we see
several calls to our Supported() and Start() methods, followed by a
system lock-up.
Work around this dubious BIOS behaviour by explicitly failing calls to
our Start() method while we are in the middle of attempting to
disconnect drivers.
Reported-by: Jordan Wright <jordan.m.wright@disney.com>
Debugged-by: Adrian Lucrèce Céleste <adrianlucrececeleste@airmail.cc>
Debugged-by: Christian Nilsson <nikize@gmail.com>
Tested-by: Jordan Wright <jordan.m.wright@disney.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
When submitting binaries for UEFI Secure Boot signing, certain
known-dubious subsystems (such as 802.11 and NFS) must be excluded
from the build. Mark the directories containing these subsystems as
insecure, and allow the build target to include an explicit "security
flag" (a literal "-sb" appended to the build platform) to exclude
these source directories from the build process.
For example:
make bin-x86_64-efi-sb/ipxe.efi
will build iPXE with all code from the 802.11 and NFS subsystems
excluded from the build.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some UEFI BIOSes will deliberately break the implementation of
ConnectController() to return errors for devices that have been
"disabled" via the BIOS setup screen. (As an added bonus, such BIOSes
may return garbage EFI_STATUS values such as 0xff.)
Work around these broken UEFI BIOSes by ignoring failures and
continuing to attempt to connect any remaining handles.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The UEFI specification does not state whether or not a return value of
EFI_BUFFER_TOO_SMALL from the SNP Receive() method should follow the
usual EFI API behaviour of allowing the caller to retry the request
with an increased buffer size.
Examination of the SnpDxe driver in EDK2 suggests that Receive() will
just return the truncated packet (complete with any requested
link-layer header fields), so match this behaviour.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
We do not currently check the length of the caller's buffer for
received packets. This creates a potential buffer overrun when iPXE
is being used via the SNP or UNDI protocols.
Fix by checking the buffer length and correctly returning the required
length and an EFI_BUFFER_TOO_SMALL error.
Reported-by: Paul McMillan <paul.mcmillan@oracle.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Expose the underlying hardware address as a setting. For IPoIB
devices, this provides scripts with access to the Infiniband GUID.
Requested-by: Allen, Benjamin S. <bsallen@alcf.anl.gov>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Record and report the number of peers (calculated as the maximum
number of peers discovered for a block's segment at the time that the
block download is complete), and the percentage of blocks retrieved
from peers rather than from the origin server.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Checking for job progress is essentially a user interface activity,
and can safely be performed only once per timer tick (as is already
done with checking for keypresses).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some external code (such as the UEFI UNDI driver for the Realtek USB
NIC on a Microsoft Surface Book) will block during transmission
attempts and can take several seconds to report a transmit error. If
there is a large queue of pending transmissions, then the accumulated
time from a series of such failures can easily exceed the EFI watchdog
timeout, resulting in what appears to be a system lockup followed by a
reboot.
Work around this problem by immediately cancelling any pending
transmissions as soon as any transmit error occurs.
The only expected transmit error under normal operation is ENOBUFS
arising when the hardware transmit queue is full. By definition, this
can happen only for drivers that do not utilise deferred
transmissions, and so this new behaviour will not affect these
drivers.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The SnpDxe driver raises the task priority level to TPL_CALLBACK when
calling the UNDI entry point. This does not appear to be a documented
requirement, but we should probably match the behaviour of SnpDxe to
minimise surprises to third party code.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The tap driver can retrieve a potentially unlimited number of packets
in a single poll. This can lead to heap exhaustion under heavy load.
Fix by imposing an artificial receive quota (as already used in other
drivers without natural receive limits).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Calling discard_cache() is likely to result in a call to
free_memblock(), which will call valgrind_make_blocks_noaccess()
before returning. This causes valgrind to report an invalid read on
the next iteration through the loop in alloc_memblock().
Fix by explicitly calling valgrind_make_blocks_defined() after
discard_cache() returns. Also call valgrind_make_blocks_noaccess()
before calling discard_cache(), to guard against free list corruption
while executing cache discarders.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Ensure that all headers (PCI, UNDI, PnP, iPXE) are aligned to at least
four bytes, so that all accesses to header fields will be correctly
aligned even when reading directly from the expansion ROM BAR.
Reported-by: Peter von Konigsmark <peter@exablaze.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Setting BANNER_TIMEOUT to zero removes the only symbol reference to
shell.o, causing the "shell" command to become unavailable.
Add SHELL_CMD in config/general.h (enabled by default) which will
explicitly drag in shell.o regardless of the value of BANNER_TIMEOUT.
Reported-by: Julian Brost <julian@0x4a42.net>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
We must not steal ownership from the Gen 2 UEFI firmware, since doing
so will cause an immediate system crash (most likely in the form of a
reboot).
This problem was masked before commit a0f6e75 ("[hyperv] Do not fail
if guest OS ID MSR is already set"), since prior to that commit we
would always fail if we found any non-zero guest OS identity. We now
accept a non-zero previous guest OS identity in order to allow for
situations such as chainloading from iPXE to another iPXE, and as a
prerequisite for commit b91cc98 ("[hyperv] Cope with Windows Server
2016 enlightenments").
A proper fix would be to reverse engineer the UEFI protocols exposed
within the Hyper-V Gen 2 firmware and use these to bind to the VMBus
device representing the network connection, (with the native Hyper-V
driver moved to become a BIOS-only feature).
As an interim solution, fail to initialise the native Hyper-V driver
if we detect the guest OS identity known to be used by the Gen 2 UEFI
firmware. This will cause the standard all-drivers build (ipxe.efi)
to fall back to using the SNP driver.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
EDK2 commit 6440385 ("MdePkg/Include: Add enumeration size checks to
Base.h") enforced the UEFI specification mandate that enums should
always be 32 bits. This revealed a latent bug in iPXE, which does not
build with -fno-short-enums.
Fix by adding -fno-short-enums to CFLAGS for ARM32 EFI builds.
Reported-by: Benjamin S. Allen <bsallen@alcf.anl.gov>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The inline assembly used in include/errno.h to generate the einfo
blocks requires the ability to generate an immediate constant with no
immediate-value prefix (such as the dollar sign for x86 assembly).
We currently achieve this via the undocumented "%c0" form of operand.
This causes an "invalid operand prefix" error on GCC 4.8 for ARM64
builds.
Fix by switching to the equally undocumented "%a0" form of operand,
which appears to work correctly on all tested versions of GCC.
Reported-by: Benjamin S. Allen <bsallen@alcf.anl.gov>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The -mabi option was added in GCC 4.9. Test for the existence of this
option to allow for building with earlier versions of GCC.
Reported-by: Benjamin S. Allen <bsallen@alcf.anl.gov>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The UEFI specification has an implicit and demonstrably incorrect
requirement (in the Mem_IO() calling convention) that any UNDI network
device has at most one memory BAR and one I/O BAR.
Some UEFI platforms have been observed to report the existence of
non-existent additional I/O BARs, causing iPXE to select the wrong
BAR. This problem does not affect the SnpDxe driver, since that
driver will always choose the lowest numbered existent BAR of each
type.
Adjust iPXE's behaviour to match that of SnpDxe, i.e. to always select
the lowest numbered BAR(s).
Debugged-by: Andreas Hammarskjöld <junior@2PintSoftware.com>
Debugged-by: Adklei <adklei@realtek.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The LAN78xx datapath is essentially identical to that of the SMSC75xx.
Expose the transmit, poll, and bulk IN endpoint operations to allow
for reuse by the LAN78xx driver.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The LAN78xx PHY interrupt source and mask registers do not match those
used by the SMSC75xx and SMSC95xx.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Since we don't enable IOMMU at all, we can then simply enable the
IOMMU support by claiming the support of VIRITO_F_IOMMU_PLATFORM.
This fixes booting failure when iommu_platform is set from qemu cli.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The smsc75xx and smsc95xx drivers include a substantial amount of
identical functionality, varying only in the base address of register
sets. Abstract out this common functionality to allow code to be
shared between the drivers.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Support renegotiation with servers supporting RFC5746. This allows
for the use of per-directory client certificates.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
We currently use a zero language ID to retrieve strings such as the
ECM/NCM MAC address. This works on most hardware devices, but is
known to fail on some software emulated CDC-NCM devices.
Fix by using the first supported language ID, falling back to English
(0x0409) if any error occurs when fetching the list of supported
languages. This matches the behaviour of the Linux kernel.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Our ASN.1 parsing code uses a struct asn1_cursor, while the object
construction code uses a struct asn1_builder. These structures are
identical apart from the const modifier applied to the data pointer in
struct asn1_cursor.
Provide asn1_built() to safely typecast a struct asn1_builder to a
struct asn1_cursor, allowing constructed objects to be passed to
functions expecting a struct asn1_cursor.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
For some CPUID leaves (e.g. %eax=0x00000004), the result depends on
the input value of %ecx. Allow this subfunction number to be
specified as a parameter to the cpuid() wrapper.
The subfunction number is exposed via the ${cpuid/...} settings
mechanism using the syntax
${cpuid/<subfunction>.0x40.<register>.<function>}
e.g.
${cpuid/0.0x40.0.0x0000000b}
${cpuid/1.0x40.0.0x0000000b}
to retrieve the CPU topology information.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some distributions patch gcc to generate position independent
executables by default. We currently include a workaround to check
for this and to add -fno-PIE -nopie to CFLAGS if required.
Newer patched versions of gcc require -fno-PIE -no-pie instead. Check
for both variants.
Reported-by: Nathan Rennie-Waldock <nathan.renniewaldock@gmail.com>
Originally-fixed-by: Markos Chandras <mchandras@suse.de>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
When booting from a hard disk image (e.g. bin/ipxe.usb) within an
emulator such as QEMU, the disk may not exist beyond the end of the
image. Limit all reads to the length of the image to avoid spurious
errors when loading the iPXE image.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allow values to be read from ACPI tables using the syntax
${acpi/<signature>.<index>.0.<offset>.<length>}
where <signature> is the ACPI table signature as a 32-bit hexadecimal
number (e.g. 0x41504093 for the 'APIC' signature on the MADT), <index>
is the index into the array of tables matching this signature,
<offset> is the byte offset within the table, and <length> is the
field length in bytes.
Numeric values are returned in reverse byte order, since ACPI numeric
values are usually little-endian.
For example:
${acpi/0x41504943.0.0.0.0} - entire MADT table in raw hex
${acpi/0x41504943.0.0.0x0a.6:string} - MADT table OEM ID
${acpi/0x41504943.0.0.0x24.4:uint32} - local APIC address
Signed-off-by: Michael Brown <mcb30@ipxe.org>
When performing a SAN boot, the plainstream window size will be zero
(since this is the mechanism used internally to indicate that no data
should be fetched via the initial request). This zero value currently
propagates to the advertised TCP window size, which prevents the TLS
negotiation from completing.
Fix by ensuring that the cipherstream window is held open until TLS
negotiation is complete, and only then falling back to passing through
the plainstream window size.
Reported-by: John Wigley <johnwigley#ipxe@acorna.co.uk>
Tested-by: John Wigley <johnwigley#ipxe@acorna.co.uk>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Ensure that efi_systab is an undefined symbol in non-EFI builds. In
particular, this prevents users from incorrectly enabling IMAGE_EFI in
a BIOS build of iPXE.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The Xen network backend (xen-netback) suffered from a regression
between upstream Linux kernels 3.18 and 4.2 inclusive, which would
cause packet reception to fail unless at least 18 receive buffers were
available. This bug was fixed in kernel commit 1d5d485 ("xen-netback:
require fewer guest Rx slots when not using GSO").
Work around this bug in affected versions of xen-netback by providing
the requisite 18 receive buffers.
Reported-by: Taylor Schneider <tschneider@live.com>
Tested-by: Taylor Schneider <tschneider@live.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Commit 7cfdd76 ("[block] Describe all SAN devices via ACPI tables")
changed the definition of the iSCSI initiator IQN in the iBFT to
represent a common initiator IQN used for all iSCSI sessions, and
attempted to calculate this common initiator IQN by fetching the
common ${initiator-iqn} setting.
This fails when no explicit ${initiator-iqn} has been specified
(i.e. when an initiator IQN has instead been constructed from either
the hostname or system UUID), and results in an empty initiator IQN in
the iBFT.
Fix by using the initiator IQN of an arbitrary iSCSI session
present in the iBFT.
Debugged-by: Tal Aloni <tal.aloni.il@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
As of kernel 4.11, the LIO target will propose a value for
FirstBurstLength if the initiator did not do so. This is entirely
redundant in our case, since FirstBurstLength is defined by RFC 3720
to be
"Irrelevant when: ( InitialR2T=Yes and ImmediateData=No )"
and we already enforce both InitialR2T=Yes and ImmediateData=No in our
initial proposal. However, LIO (arguably correctly) complains when we
do not respond to its redundant proposal of an already-irrelevant
value.
Fix by always proposing the default value for FirstBurstLength.
Debugged-by: Patrick Seeburger <info@8bit.de>
Tested-by: Patrick Seeburger <info@8bit.de>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Use the PCI bus:dev.fn address in debug messages, falling back to the
EFI handle name only if we do not yet have enough information to
determine the bus:dev.fn address.
Include the vendor and device IDs in debug messages when no suitable
driver is found, to match the diagnostics available in a BIOS
environment.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
An "enlightened" external bootloader (such as Windows Server 2016's
winload.exe) may take ownership of the Hyper-V connection before all
INT 13 operations have been completed. When this happens, all VMBus
devices are implicitly closed and we are left with a non-functional
network connection.
Detect when our Hyper-V connection has been lost (by checking the
SynIC message page MSR). Reclaim ownership of the Hyper-V connection
and reestablish any VMBus devices, without disrupting any existing
iPXE state (such as IPv4 settings attached to the network device).
Windows Server 2016 will not cleanly take ownership of an active
Hyper-V connection. Experimentation shows that we can quiesce by
resetting only the SynIC message page MSR; this results in a
successful SAN boot (on a Windows 2012 R2 physical host). Choose to
quiesce by resetting (almost) all MSRs, in the hope that this will be
more robust against corner cases such as a stray synthetic interrupt
occurring during the handover.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
When performing a SAN boot via INT 13, there is no way for the
operating system to indicate that it has finished using the INT 13 SAN
device. We therefore have no opportunity to clean up state before the
loaded operating system's native drivers take over. This can cause
problems when booting Windows, which tends not to be forgiving of
unexpected system state.
Windows will typically write a flag to the SAN device as the last
action before transferring control to the native drivers. We can use
this as a heuristic to bring the system to a quiescent state (without
performing a full shutdown); this provides us an opportunity to
temporarily clean up state that could otherwise prevent a successful
Windows boot.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
On most Intel NICs, Auto-Speed Detection Enable (ASDE) can be used to
automatically detect the correct link speed by sampling the link using
the internal PHY. This feature is automatically inhibited when not
appropriate for the physical link (e.g. when using internal SerDes
mode on the 8254x).
On the i350 datasheet ASDE is a reserved bit, but the relevant
auto-speed detection hardware appears still to be present. However,
enabling ASDE on the i350 1000BASE-KX backplane NIC seems to cause an
immediate link failure. It is possible that the auto-speed detection
hardware is still present, is not connected to a physical link, and is
not inhibited from being applied in this mode.
Work around this problem by adding an INTEL_NO_ASDE flag bit
(analogous to INTEL_NO_PHY_RST), and applying this for the i350
backplane NIC.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
In situations where iPXE fails to reach link-up as expected, it is
useful to know the original values of the CTRL and STATUS registers
prior to our reset attempt.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some older operating systems (e.g. RHEL6) use a non-default filename
on the root disk and rely on setting an EFI variable to point to the
bootloader. This does not work when performing a SAN boot on a
machine where the EFI variable is not present.
Fix by allowing a non-default filename to be specified via the
"sanboot --filename" option or the "san-filename" setting. For
example:
sanboot --filename \efi\redhat\grub.efi \
iscsi:192.168.0.1::::iqn.2010-04.org.ipxe.demo:rhel6
or
option ipxe.san-filename code 188 = string;
option ipxe.san-filename "\\efi\\redhat\\grub.efi";
option root-path "iscsi:192.168.0.1::::iqn.2010-04.org.ipxe.demo:rhel6";
Originally-implemented-by: Vishvananda Ishaya Abrams <vish.ishaya@oracle.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Following changes were introduced:
- added GetBgxProp and GetLmacProp methods to ThunderxConfigProtocol
- replaced direct BOARD_CFG access with usage of introduced methods
- removed redundant BOARD_CFG
- changed GUID of ThunderxConfigProtocol, as this is not compatible
with previous version
- changed UINTN* to UINT64* buffer type to fix issue on 32-bit
platforms with MAC address
This change allows us to avoid alignment of BOARD_CFG definitions
every time it changes in UEFI.
Signed-off-by: Konrad Adamczyk <konrad.adamczyk@cavium.com>
Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The TEST UNIT READY command is issued automatically when the device is
opened, and is not the result of a command being issued by the caller.
This is required in order that a permanent TEST UNIT READY failure can
be used to identify unusable paths in a multipath SAN device.
Since the TEST UNIT READY command is not part of the caller's command
issuing process, it is not covered by any external retry loops (such
as the main retry loop in sandev_command()).
We must therefore be prepared to retry the TEST UNIT READY command
within the SCSI layer itself. We retry only the TEST UNIT READY
command so as not to multiply the number of potential retries for
normal commands (which are already retried by sandev_command()).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
HTTP implements xfer_window_changed() on the underlying server
connection using http_step(), which does not propagate the window
change notification to the data transfer interface. This breaks the
multipath-capable SAN boot code, which relies on the window change
notification to discover that the HTTP block device is ready for
commands to be issued.
Fix by sending xfer_window_changed() in http_step() once the
underlying connection has been determined to be ready.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Describe all SAN devices via ACPI tables such as the iBFT. For tables
that can describe only a single device (i.e. the aBFT and sBFT), one
table is installed per device. For multi-device tables (i.e. the
iBFT), all devices are described in a single table.
An underlying SAN device connection may be closed at the time that we
need to construct an ACPI table. We therefore introduce the concept
of an "ACPI descriptor" which enables the SAN boot code to maintain an
opaque pointer to the underlying object, and an "ACPI model" which can
build tables from a list of such descriptors. This separates the
lifecycles of ACPI descriptions from the lifecycles of the block
device interfaces, and allows for construction of the ACPI tables even
if the block device interface has been closed.
For a multipath SAN device, iPXE will wait until sufficient
information is available to describe all devices but will not wait for
all paths to connect successfully. For example: with a multipath
iSCSI boot iPXE will wait until at least one path has become available
and name resolution has completed on all other paths. We do this
since the iBFT has to include IP addresses rather than DNS names. We
will commence booting without waiting for the inactive paths to either
become available or close; this avoids unnecessary boot delays.
Note that the Linux kernel will refuse to accept an iBFT with more
than two NIC or target structures. We therefore describe only the
NICs that are actually required in order to reach the described
targets. Any iBFT with at most two targets is therefore guaranteed to
describe at most two NICs.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
For some block device protocols, the active path may continue to
receive xfer_window_changed() notifications during normal use. These
currently result in the active path being erroneously closed.
Fix by ignoring any xfer_window_changed() messages if this path is
already the active path.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
For multipath SAN devices, verify that the device is capable of being
opened (i.e. that all URIs are parseable and that at least one path is
alive) and thereafter retry indefinitely to reopen the device as
needed.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
When all SAN targets are completely unreachable, there will be a
natural delay between reopening attempts due to the network connection
timeout on the unreachable targets.
However, some SAN targets may accept connections instantly and report
a temporary unavailability by e.g. failing the TEST UNIT READY
command. If all targets are behaving this way then there will be no
natural delay, and we will attempt to saturate the network with
connection attempts.
Fix by introducing a small delay between attempts.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allow the SAN retry count to be configured via the ${san-retry}
setting, defaulting to the current value of 10 retries if not
specified.
Note that setting a retry count of zero is inadvisable, since iSCSI
targets in particular will often report spurious errors such as "power
on occurred" for the first few commands.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The INT13 console type (CONSOLE_INT13) autodetects at initialisation
time a magic partition to be used for logging iPXE console output. If
the INT13 drive number mapping is subsequently changed (e.g. because
iPXE was used to perform a SAN boot), then the console logging output
will be written to the incorrect disk.
Fix by recording the INT13 vector at initialisation time, and using
this original vector to emulate INT13 calls for all subsequent
accesses. This should be robust against drive remapping performed
either by ourselves or by another bootloader (e.g. a chainloaded
undionly.kpxe which then performs a SAN boot).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some partition tables have partitions that are not aligned to a
cylinder boundary, which confuses the current geometry guessing logic.
Enhance the existing logic to ensure that we never reduce our guesses
for the number of heads or sectors per track, and add extra logic to
calculate the exact number of sectors per track if we find a partition
that starts within cylinder zero.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add basic support for multipath block devices. The "sanboot" and
"sanhook" commands now accept a list of SAN URIs. We open all URIs
concurrently. The first connection to become available for issuing
block device commands is marked as the active path and used for all
subsequent commands; all other connections are then closed. Whenever
the active path fails, we reopen all URIs and repeat the process.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add a dummy SAN device which allows the "sanhook" command to be tested
even when no SAN booting capability is present on the platform. This
allows substantial portions of the SAN boot code to be run in Linux
under Valgrind.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
When the TEST UNIT READY command receives an error response, the
shutdown of the command's block data interface will result in
scsidev_ready() closing the SCSI device. This will subsequently
result in a duplicate call to scsicmd_close(), leading to an assertion
failure when list_del() is called for the second time.
Fix by removing the command from the list of outstanding commands
before shutting down the command's interfaces.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The eIPoIB translation layer needs to translate outbound ARP packets
from Ethernet to IPoIB. A 64-byte buffer (starting with the Ethernet
header) does not provide enough tailroom to expand to hold the two
20-byte IPoIB MAC addresses. The result is that an UNDI API user will
be unable to send ARP packets.
We could potentially shuffle the packet contents to reuse the space
occupied by the stripped Ethernet link-layer header, but this would
add complexity. Instead, fix by increasing the minimum allocation
size to 128 bytes.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
B0_CTST is a 24bit register according to the vendor driver (sk98lin).
A 16bit read on B0_CTST will always return 0 for Y2_VAUX_AVAIL
(1<<16), so use a 32bit read when testing Y2_VAUX_AVAIL.
[This patch is copied directly from the Linux kernel tree.]
Signed-off-by: Mike McCormack <mikem@ring3k.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The value of ( ( x & 0x0c00 ) | 0x0c00 ) is always 0x0c00 regardless
of the value of x, and so the read_csr() is redundant. (There are no
read side effects for this register, according to the datasheet.)
This line of code originated in Linux kernel 2.3.19pre1 as
a->write_csr(ioaddr, 80, a->read_csr(ioaddr, 80) | 0x0c00);
and was modified in kernel 2.3.41pre4 to read
a->write_csr(ioaddr, 80, (a->read_csr(ioaddr, 80) & 0x0C00) | 0x0c00);
In the absence of commit messages, the intention of the code is
unclear. However, the logic resulting in a fixed value of 0x0c00 has
remained unaltered for over 17 years, and can probably be assumed to
have the correct overall result.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Track the current and maximum heap usage, and display the maximum
during shutdown when DEBUG=malloc is enabled.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Our asprintf() implementation guarantees that strp will be NULL on
allocation failure, but this is not standard behaviour. Detect errors
by checking for a negative return value instead of a NULL pointer.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Avoid potential division by zero when performing the check against
multiplication overflow. (Note that if the width is zero then there
can be no overflow anyway, so it is then safe to bypass the check.)
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Any underlying errors arising during ib_create_cq() or ib_create_qp()
are lost since the functions simply return NULL on error. This makes
debugging harder, since a debug-enabled build is required to discover
the root cause of the error.
Fix by returning a status code from these functions, thereby allowing
any underlying errors to be propagated.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
For visual consistency with surrounding lines, the definitions of
DBG_MORE(), DBG_PAUSE(), etc include an unnecessary ##__VA_ARGS__
argument which is always elided. This confuses sparse, which
complains about DBG_MORE_IF() being called with more than one
argument.
Work around this problem by adding an unused variable argument list to
the single-argument macros DBG_MORE_IF() and DBG_PAUSE_IF().
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Report errors in eoib_duplicate() via netdev_tx_err() rather than
netdev_tx_complete_err(), since netdev_tx_complete_err() accepts only
valid I/O buffers that are currently in the network device's transmit
queue.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
When the area to be mapped straddles the 2GB boundary, the expression
(high+size) will overflow on the first loop iteration. Fix by using
(end-size), which cannot underflow.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
When the area to be mapped straddles the 2GB boundary, the expression
(high+size) will overflow on the first loop iteration. Fix by using
(end-size), which cannot underflow.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
As of commit 10d19bd ("[pxe] Always retrieve cached DHCPACK and apply
to relevant network device"), the UNDI driver has been the only user
of pxeparent_call(). Remove the unnecessary layer of abstraction by
refactoring this code back into undinet.c, and fix the ability of
undiisr.S to fall back to chaining to the original handler if we were
unable to unhook our own ISR.
This effectively reverts commit 337e1ed ("[pxe] Separate parent PXE
API caller from UNDINET driver").
Signed-off-by: Michael Brown <mcb30@ipxe.org>
We currently request cable detection in PXE_OPCODE_INITIALIZE to work
around buggy Emulex drivers (see commit c0b61ba ("[efi] Work around
bugs in Emulex NII driver")).
This causes problems with some other NII drivers (e.g. Mellanox),
which may time out if the underlying link is intrinsically slow to
come up.
Attempt to work around both problems simultaneously by requesting
cable detection only if the underlying NII driver does not support
link status reporting via PXE_OPCODE_GET_STATUS. (This is based on a
potentially incorrect assumption that the buggy Emulex drivers do not
claim to report link status via PXE_OPCODE_GET_STATUS.)
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Provide a basic proof of concept ACPI table description (e.g. iBFT for
iSCSI) for SAN devices in a UEFI environment, using a control flow
that is functionally identical to that used in a BIOS environment.
Originally-implemented-by: Vishvananda Ishaya Abrams <vish.ishaya@oracle.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Several files define the ARRAY_SIZE() macro as used in Linux. Provide
a common definition for this in include/compiler.h.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some iSCSI targets send NOP-In. Rather than closing the connection
when we receive one, it is more user friendly to log a debug message
and keep the connection open. Eventually, it would be nice if iPXE
supported replying to NOP-Ins, but we might as well keep the
connection open until the target disconnects us.
Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some VF data is not cleared with reset, so make sure to return all the
settings to default before configuring the VF.
This fixes an issue where network packets would fail to be received if
the VF was previously used by the linux ixgbevf driver.
Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
When a SCSI device is closed in error, the shutdown of the device's
block data interface will probably lead to any outstanding commands
being closed (by whichever object is currently connected to the block
data interface). However, commands remain in the list of outstanding
commands until the final reference is dropped. The result is that
scsidev_close() will make a second call to scsicmd_close() for each
command. This is harmless, but produces confusing debug messages.
Fix by treating the outstanding command list as holding an explicit
reference to each command, and removing the command from the list of
outstanding commands in scsicmd_close().
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The SCSI layer currently implements a retry loop in order to retry
commands that fail due to spurious "error" conditions such as "power
on occurred". Move this retry loop to the generic SAN device layer:
this allow for retries due to other transient error conditions such as
an iSCSI target having dropped the connection due to inactivity.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The concept of the SAN drive number is meaningful only in a BIOS
environment, where it represents the INT13 drive number (0x80 for the
first hard disk). We retain this concept in a UEFI environment to
allow for a simple way for iPXE commands to refer to SAN drives.
Centralise the concept of the default drive number, since it is shared
between all supported environments.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
According to ThunderX Errata G-17560, NIC_PF_CFG[ENA] bit should not
be cleared at exit. This allows other drivers to access the NIC regs
correctly.
Signed-off-by: Konrad Adamczyk <konrad.adamczyk@cavium.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
It is required to reset BGX context state for the LMAC using
BGX_CMR_CONFIG register.
This solves problem with network connectivity in Linux booted from
iPXE.
Signed-off-by: Bartosz Szczepanek <bartosz.szczepanek@cavium.com>
Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Use intfs_shutdown() and intfs_restart() to cleanly shut down multiple
interfaces that may loop back to the same object.
This fixes a regression introduced by commit daa8ed9 ("[interface]
Provide intf_reinit() to reinitialise nullified interfaces") which
broke the use of HTTP Basic and Digest authentication.
Reported-by: murmansk <murmansk@hotmail.com>
Reported-by: Brett Waldo <brettwaldo@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Shutting down (and optionally restarting) multiple interfaces is
fraught with problems if there are loops in the interface connectivity
(e.g. the HTTP content-decoded and transfer-decoded interfaces, which
will generally loop back to each other). Various workarounds
currently exist across the codebase, generally involving preceding
calls to intf_nullify() to avoid problems due to known loops.
Provide intfs_shutdown() and intfs_restart() to allow all of an
object's interfaces to be shut down (or restarted) in a single call,
without having to worry about potential external loops.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Expose the current wall-clock time (in seconds since the Epoch), since
this is often useful in captured boot logs and can also be useful when
checking unexpected X.509 certificate validation failures.
Use a :uint32 setting to avoid Y2K38 rollover, thereby ensuring that
this will eventually be somebody else's problem.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Originally-implemented-by: Malte zu Klampen <malte@pclab.ifg.uni-kiel.de>
Originally-implemented-by: Richard Moore <rich@richud.com>
Tested-by: Esben Storgaard Nielsen <esn@solar.dk>
Signed-off-by: Christian Nilsson <nikize@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
INT 13 calls return a status value via %ah, with CF set if %ah is
non-zero (indicating an error). Our wrappers zero the whole of %ax if
CF is clear, to allow C code (which has no easy access to CF) to
simply test for a non-zero status to detect an error.
The current code assigns the returned status to a uint8_t, effectively
testing %al rather than %ah. Fix by treating the returned status as a
uint16_t instead.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Avoid using a zero sector count to guess the disk geometry, since that
would result in a division by zero when calculating the number of
cylinders.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
When running on AMD platforms, the legacy hardware emulation is
extremely unreliable. In particular, the IRQ0 timer interrupt is
likely to simply stop working, resulting in a total failure of any
code that relies on timers (such as DHCP retransmission attempts).
Work around this by using the 10MHz time counter provided by Hyper-V
via an MSR. (This timer can be tested in KVM via the command-line
option "-cpu host,hv_time".)
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allow the active timer (providing udelay() and currticks()) to be
selected at runtime based on probing during the INIT_EARLY stage of
initialisation.
TICKS_PER_SEC is now a fixed compile-time constant for all builds, and
is independent of the underlying clock tick rate. We choose the value
1024 to allow multiplications and divisions on seconds to be converted
to bit shifts.
TICKS_PER_MS is defined as 1, allowing multiplications and divisions
on milliseconds to be omitted entirely. The 2% inaccuracy in this
definition is negligible when using the standard BIOS timer (running
at around 18.2Hz).
TIMER_RDTSC now checks for a constant TSC before claiming to be a
usable timer. (This timer can be tested in KVM via the command-line
option "-cpu host,+invtsc".)
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Separate out the concept of "hardware maximum supported frame length"
and "configured link MTU", and limit the latter according to the
former.
In networks where the DHCP-supplied link MTU is inconsistent with the
hardware or driver capabilities (e.g. a network using jumbo frames),
this will result in iPXE advertising a TCP MSS consistent with a size
that can actually be received.
Note that the term "MTU" is typically used to refer to the maximum
length excluding the link-layer headers; we adopt this usage.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The call to intf_close() may result in the original interface being
reopened. For example: when reading the capacity of a 2TB+ disk via
iSCSI, the SCSI layer will respond to the intf_close() from the READ
CAPACITY (10) command by immediately issuing a READ CAPACITY (16)
command. The iSCSI layer happens to reuse the same interface for the
new command (since it allows only a single concurrent command).
Currently, intf_shutdown() unplugs the interface after the call to
intf_close() returns. In the above scenario, this results in
unplugging the just-reopened interface.
Fix by transferring the interface destination (and its reference) to a
temporary interface, and so effectively performing the unplug before
making the call to intf_close().
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The x86_64 EDK2 headers include a #pragma to mark all subsequent
symbol declarations and references as hidden if position-independent
code is being generated. Since libgen.h is currently included only
after the EDK2 headers, this results in __xpg_basename() being
erroneously marked as having hidden visibility (if the compiler
defaults to building position-independent code); this eventually
results in a failure to link the elf2efi binary.
Fix by including libgen.h prior to including the EDK2 headers.
Originally-fixed-by: Doug Goldstein <cardoe@cardoe.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
In some high-end Azure instances (e.g. NC6) we may receive an
unsolicited VMBUS_OFFER_CHANNEL message for a PCIe pass-through device
some time after completing the bus enumeration. This currently causes
apparently random failures due to unexpected VMBus message types.
Fix by ignoring any unsolicited VMBus messages.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some problems arise only when running on a specific CPU type (e.g.
non-functional timer interrupts as observed in Azure AMD instances).
Include the CPU vendor and model within the sample cloud boot scripts,
to assist in debugging such problems.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Provide a settings applicator to modify netdev->max_pkt_len in
response to changes to the "mtu" setting (DHCP option 26).
Note that as with MAC address changes, drivers are permitted to
completely ignore any changes in the MTU value. The net result will
be that iPXE effectively uses the smaller of either the hardware
default MTU or the software configured MTU.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
For some unspecified "security" reason, the Google Compute Engine
metadata server will refuse any requests that do not include the
non-standard HTTP header "Metadata-Flavor: Google".
Attempt to autodetect such requests (by comparing the hostname against
"metadata.google.internal"), and add the "Metadata-Flavor: Google"
header if applicable.
Enable this feature in the CONFIG=cloud build, and include a sample
embedded script allowing iPXE to boot from a script configured as
metadata via e.g.
# Create shared boot image
make bin/ipxe.usb CONFIG=cloud EMBED=config/cloud/gce.ipxe
# Configure per-instance boot script
gcloud compute instances add-metadata <instance> \
--metadata-from-file ipxeboot=boot.ipxe
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some host implementations (notably Google Compute Platform) are known
to unconditionally write back VIRTIO_NET_HDR_F_DATA_VALID to
header->flags for received packets, regardless of the features
negotiated by the driver. This breaks the transmit datapath by
effectively setting an illegal flag for all subsequent transmitted
packets.
Work around this problem by using separate empty header buffers for
the receive and transmit queues.
Debugged-by: Ladi Prosek <lprosek@redhat.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
This code largely inspired by tap.c. Allows for testing iPXE on real
NICs from within Linux. For example:
make bin-x86_64-linux/af_packet.linux
valgrind ./bin-x86_64-linux/af_packet.linux --net af_packet,if=eth3
Tested as x86_64 and i386 binary.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Virtio 0.9 implementation was limited to the maximum virtqueue size of
MAX_QUEUE_NUM and the virtio-net driver would fail to initialize on hosts
exceeding this limit.
This commit lifts the restriction by allocating the queue memory based on
the actual queue size instead of using a fixed maximum. Note that virtio
1.0 still uses the MAX_QUEUE_NUM constant to cap the size (unfortunately
this functionality is not available in virtio 0.9).
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
This commit introduces virtnet_free_virtqueues called on all virtqueue
error and shutdown paths. vpm_find_vqs no longer cleans up after itself
and instead expects virtnet_free_virtqueues to be always called to undo
its effect.
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
vpm_find_vqs incorrectly accepted the host provided queue size with no
regard to iPXE's internal limitations. Virtio 1.0 makes it possible for
the driver to override the queue size to reduce memory requirements and
iPXE is a great use case for this feature.
Also removing the extra vq->vring.num assignment which is already
handled in vring_init.
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The ISC Kea DHCP server transmits its DHCPOFFER as a unicast packet
with a broadcast IPv4 destination address (255.255.255.255). This
combination is currently rejected by iPXE.
Fix by explicitly accepting the local network broadcast address
(255.255.255.255) as a valid unicast destination address.
Reported-by: Roy Ledochowski <roy.ledochowski@hpe.com>
Tested-by: Roy Ledochowski <roy.ledochowski@hpe.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Updates:
- Nodnic: Support for arm cq doorbell via the UAR BAR
- Ensure hardware is quiescent when no interface is open - WinPE WA
- Support for clear interrupt via BAR
- Nodnic: Support for send TX doorbells via the UAR BAR
- Added ConnectX-5EX device
- Added ConnectX-5 device
Signed-off-by: Raed Salem <raeds@mellanox.com>
Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
EFI provides no clean way for device drivers to shut down in
preparation for handover to a booted operating system. The platform
firmware simply doesn't bother to call the drivers' Stop() methods.
Instead, drivers must register an EVT_SIGNAL_EXIT_BOOT_SERVICES event
to be signalled when ExitBootServices() is called, and clean up
without any reference to the EFI driver model.
Unfortunately, all timers silently stop working when ExitBootServices()
is called. Even more unfortunately, and for no discernible reason,
this happens before any EVT_SIGNAL_EXIT_BOOT_SERVICES events are
signalled. The net effect of this entertaining design choice is that
any timeout loops on the shutdown path (e.g. for gracefully closing
outstanding TCP connections) may wait indefinitely.
There is no way to report failure from currticks(), since the API
lazily assumes that the host system continues to travel through time
in the usual direction. Work around EFI's violation of this
assumption by falling back to a simple free-running monotonic counter.
Debugged-by: Maor Dickman <maord@mellanox.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
When searching for an UNDI ROM to match against a PCI device, search
in order of increasing ROM address (within the 128kB BIOS option ROM
area). This is likely (though not guaranteed) to match the order of
the original enumeration performed by the BIOS, which is in turn
likely to match the order of enumeration on the PCI bus.
Since we load at most one UNDI ROM, the net result is that we increase
our chances of loading the ROM corresponding to the selected PCI
device (rather than loading a ROM corresponding to a higher-numbered
PCI device with the same vendor and device IDs.)
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The "progress" macro can be used only from within the .prefix section.
At the point of calling relocate(), we are running in .text16 and so
the near call to print_message() will end up calling a random function
somewhere in .text16.
Interestingly, this problem has remained unnoticed for some time. It
is rare to build with DEBUG=libprefix. In the few cases that it has
been used during development, the randomly selected function in
.text16 seems to have been a harmless no-op with no visible
side-effects (beyond the unnoticed failure to print the "relocate"
progress message).
Fix by removing the futile attempt to print a progress message before
calling relocate().
Reported-by: Raed Salem <raeds@mellanox.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Fix the <NULL> driver name reported by "ifstat" when using the undipci
driver (due to the unnecessary extra device node inserted as a child
of the PCI device).
Remove the "UNDI-" prefix from device names since the driver name is
also now visible via "ifstat", and tidy up the device name to match
the format used by standard PCI devices.
The output from "ifstat" now resembles:
iPXE> ifstat
net0: 52:54:00:12:34:56 using undipci on 0000:00:03.0
iPXE> ifstat
net0: 52:54:00:12:34:56 using undionly on 0000:00:03.0
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The UNDI loader entry point is very likely to be called after POST,
when there is a high chance that the PMM-allocated image source area
and decompression area have been reused by something else.
In particular, using an iPXE .iso to test a separate iPXE ROM's UNDI
loader entry point in a qemu VM is likely to crash. SeaBIOS allocates
PMM blocks from close to the top of memory and so these blocks have a
high chance of colliding with the runtime addresses subsequently
chosen by the non-ROM iPXE by scanning the INT 15,e820 memory map.
The standard romprefix.S has no choice about relying on the
PMM-allocated image source area, since it has no other way to retrieve
its compressed payload.
In mromprefix.S, the image source area functions only as an optional
buffer used to avoid repeated reads from the (potentially slow)
expansion ROM BAR by the decompression code. We can therefore always
set %esi=0 when calling install_prealloc from the UNDI loader entry
point, and simply fall back to reading directly from the expansion ROM
BAR.
We can always set %edi=0 when calling install_prealloc from the UNDI
loader entry point. This will behave as though the decompression area
PMM allocation failed, and will therefore use INT 15,88 to find a
temporary decompression area somewhere close to 64MB. This is by no
means guaranteed to be safe from collisions, but it's probably safer
on balance than the PMM-allocated address.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allocate base memory (by decreasing the free base memory counter)
before calling the UNDI loader entry point, to minimise surprises for
the UNDI loader code.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The command and data interfaces may be connected to the same object.
Nullify the data interface before shutting down the control interface
to avoid potential infinite loops.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Commit 71560d1 ("[librm] Preserve FPU, MMX and SSE state across calls
to virt_call()") added FXSAVE and FXRSTOR instructions to iPXE. In
KVM virtual machines, these instructions execute fine as long as the
host CPU supports the "unrestricted_guest" feature (that is, it can
virtualize big real mode natively). On older host CPUs however, KVM
has to emulate big real mode, and it currently doesn't implement
FXSAVE emulation.
Upstream QEMU rebuilt iPXE at commit 0418631 ("[thunderx] Fix
compilation with older versions of gcc") which is a descendant of
commit 71560d1 (see above).
This was done in QEMU commit ffdc5a2 ("ipxe: update submodule from
4e03af8ec to 041863191"). The resultant binaries were bundled with
the QEMU v2.7.0 release; see QEMU commit c52125a ("ipxe: update
prebuilt binaries").
This distributed the iPXE workaround for the Tivoli VMM bug to a
number of KVM users with old host CPUs, causing KVM emulation failures
(guest crashes) for them while netbooting.
Make the FXSAVE and FXRSTOR instructions conditional on a new feature
test macro called TIVOLI_VMM_WORKAROUND. Define the macro by default.
There is prior art for an assembly file including config/general.h:
see arch/x86/prefix/romprefix.S. Also, TIVOLI_VMM_WORKAROUND seems to
be a good fit for the "Obscure configuration options" section in
config/general.h.
Cc: Bandan Das <bsd@redhat.com>
Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Greg <rollenwiese@yahoo.com>
Cc: Michael Brown <mcb30@ipxe.org>
Cc: Michael Prokop <launchpad@michael-prokop.at>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Pickford <arch@netremedies.ca>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Ref: https://bugs.archlinux.org/task/50778
Ref: https://bugs.launchpad.net/qemu/+bug/1623276
Ref: https://bugzilla.proxmox.com/show_bug.cgi?id=1182
Ref: https://bugzilla.redhat.com/show_bug.cgi?id=1356762
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The initrd_addr_max field represents the highest byte address that may
be used to hold initrd images, and is therefore almost certainly not
aligned to a page boundary: a typical value might be 0x7fffffff.
Fix the address calculations to ensure that the initrd images are
always aligned to a page boundary.
Reported-by: Sitsofe Wheeler <sitsofe@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
AppleNetBoot.h is not taken from the EDK2 codebase and so cannot be
imported using include/ipxe/efi/import.pl. Mark as a native iPXE
header (by changing the include guard) to avoid breaking the import
process.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allow certificates to be marked as having been added explicitly at run
time. Such certificates will not be discarded via the certificate
store cache discarder.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Enable IMAGE_PNG (but not IMAGE_PNM) by default, and drag in the
relevant objects only when image_pixbuf() is present in the binary.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Enable both IMAGE_DER and IMAGE_PEM by default, and drag in the
relevant objects only when image_asn1() is present in the binary.
This allows "imgverify" to transparently use either DER or PEM
signature files.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
As of commit b1caa48 ("[crypto] Support SHA-{224,384,512} in X.509
certificates"), the list of supported cryptographic algorithms is
controlled by config/crypto.h.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add PEM-encoded ASN.1 as an image format. We accept as PEM any image
containing a line starting with a "-----BEGIN" boundary marker.
We allow for PEM files containing multiple ASN.1 objects, such as a
certificate chain produced by concatenating individual certificate
files.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add DER-encoded ASN.1 as an image format. There is no fixed signature
for DER files. We treat an image as DER if it comprises a single
valid SEQUENCE object covering the entire length of the image.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allow code to create a partial ASN.1 cursor containing only the type
and length bytes, so that asn1_start() may be used to determine the
length of a large ASN.1 blob without first allocating memory to hold
the entire blob.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The Windows drivers for VMBus devices are enumerated using the
instance UUID rather than the channel number. Include the instance
UUID within the iPXE device name to allow an iPXE network device to be
more easily associated with the corresponding Windows network device
when debugging.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Select the IPv6 source address and corresponding router (if any) using
a very simplified version of the algorithm from RFC6724:
- Ignore any source address that has a smaller scope than the
destination address. For example, do not use a link-local source
address when sending to a global destination address.
- If we have a source address which is on the same link as the
destination address, then use that source address.
- If we are left with multiple possible source addresses, then choose
the address with the smallest scope. For example, if we are sending
to a site-local destination address and we have both a global source
address and a site-local source address, then use the site-local
source address.
- If we are still left with multiple possible source addresses, then
choose the address with the longest matching prefix.
For the purposes of this algorithm, we treat RFC4193 Unique Local
Addresses as having organisation-local scope. Since we use only
link-local scope for our multicast transmissions, this approximation
should remain valid in all practical situations.
Originally-implemented-by: Thomas Bächler <thomas@archlinux.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Use the IPv6 settings to construct the routing table, in a matter
analogous to the construction of the IPv4 routing table.
This allows for manual assignment of IPv6 addresses via e.g.
set net0/ip6 2001:ba8:0:1d4::6950:5845
set net0/len6 64
set net0/gateway6 fe80::226:bff:fedd:d3c0
The prefix length ("len6") may be omitted, in which case a default
prefix length of 64 will be assumed.
Multiple IPv6 addresses may be assigned manually by implicitly
creating child settings blocks. For example:
set net0/ip6 2001:ba8:0:1d4::6950:5845
set net0.ula/ip6 fda4:2496:e992::6950:5845
Signed-off-by: Michael Brown <mcb30@ipxe.org>
A reasonable user expectation is that ${net0/ip6} should show the
"highest-priority" of the IPv6 addresses, even when multiple IPv6
addresses are active. The expected order of priority is likely to be
manually-assigned addresses first, then stateful DHCPv6 addresses,
then SLAAC addresses, and lastly link-local addresses.
Using ${priority} to enforce an ordering is undesirable since that
would affect the priority assigned to each of the net<N> blocks as a
whole, so use the sibling ordering capability instead.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allow settings blocks to provide an explicit default ordering between
siblings, with lower precedence than the existing ${priority} setting.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Originally-implemented-by: Hannes Reinecke <hare@suse.de>
Originally-implemented-by: Marin Hannache <git@mareo.fr>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Originally-implemented-by: Hannes Reinecke <hare@suse.de>
Originally-implemented-by: Marin Hannache <git@mareo.fr>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Expose the IPv6 address (or prefix) as ${ip6}, the prefix length as
${len6}, and the router address as ${gateway6}.
Originally-implemented-by: Hannes Reinecke <hare@suse.de>
Originally-implemented-by: Marin Hannache <git@mareo.fr>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The settings scope ipv6_scope refers specifically to IPv6 settings
that have a corresponding DHCPv6 option. Rename to dhcpv6_scope to
more accurately reflect this purpose.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
We currently perform IPv6 stateless address autoconfiguration (SLAAC)
in response to any router advertisement with the relevant flags set.
This can result in the local IPv6 source address changing midway
through a TCP connection, since our connections bind only to a local
port number and do not store a local network address.
In addition, this behaviour for SLAAC is inconsistent with that for
DHCPv4 and stateful DHCPv6, both of which will be performed only as a
result of an explicit autoconfiguration action (e.g. via the default
autoboot sequence, or the "ifconf" command).
Fix by ignoring router advertisements arriving outside the context of
an ongoing autoconfiguration attempt.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Commit db34436 ("[intel] Strip spurious VLAN tags received by virtual
function NICs") accidentally introduced two copies of the
intel[x]vf_mbox_queues() function. Remove the unintended copy.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The physical function may be configured to transparently insert a VLAN
tag into all transmitted packets. Unfortunately, it does not
equivalently strip this same VLAN tag from all received packets. This
behaviour may be observed in some Amazon EC2 instances with Enhanced
Networking enabled: transmissions work as expected but all packets
received by iPXE appear to have a spurious VLAN tag.
We can configure the receive queue to strip VLAN tags via the
RXDCTL.VME bit. We need to find out from the PF driver whether or not
we should do so.
There exists a "get queue configuration" mailbox message which
contains a field labelled IXGBE_VF_TRANS_VLAN in the Linux driver.
A comment in the Linux PF driver describes this field as "notify VF of
need for VLAN tag stripping, and correct queue". It will be filled
with a non-zero value if the PF is enforcing the use of a single VLAN
tag. It will also be filled with a non-zero value if the PF is using
multiple traffic classes.
The Linux VF driver seems to treat this field as being simply the
number of traffic classes, and gives it no VLAN-related
interpretation. The Linux VF driver instead handles the VLAN tag
stripping by simply assuming that any unrecognised VLAN tag ought to
be silently dropped.
We choose to strip and ignore the VLAN tag if the IXGBE_VF_TRANS_VLAN
field has a non-zero value.
Reported-by: Leonid Vasetsky <leonidv@velostrata.com>
Tested-by: Leonid Vasetsky <leonidv@velostrata.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
In a busy network (such as a public cloud), IPv4 addresses may be
recycled rapidly. When this happens, unidirectional traffic (such as
UDP syslog) will succeed, but bidirectional traffic (such as TCP
connections) may fail due to stale ARP cache entries on other nodes.
The remote ARP cache expiry timeout is likely to exceed iPXE's
connection timeout, meaning that boot attempts can fail before the
problem is automatically resolved.
Fix by sending gratuitous ARPs whenever an IPv4 address is changed, to
attempt to update stale remote ARP cache entries. Note that this is
not a guaranteed fix, since ARP is an unreliable protocol.
We avoid sending gratuitous ARPs unconditionally, since otherwise any
unrelated settings change (e.g. "set dns 192.168.0.1") would cause
unexpected gratuitous ARPs to be sent.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The ACPI power off sequence may not take effect immediately. Delay
for one second, to eliminate potentially confusing log messages such
as "Could not power off: Error 0x43902001 (http://ipx".
Reported-by: Leonid Vasetsky <leonidv@velostrata.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
On some platforms (observed in a small subset of Microsoft Azure
(Hyper-V) virtual machines), the RTC appears to be incapable of
generating an interrupt via the legacy PIC. The RTC status registers
show that a periodic interrupt has been asserted, but the PIC IRR
shows that IRQ8 remains inactive.
On such systems, iPXE will currently freeze during the "iPXE
initialising devices..." message.
Work around this problem by checking that RTC interrupts are being
raised before returning from rtc_entropy_enable(). If no interrupt is
seen within 100ms, then we assume that the RTC interrupt mechanism is
broken. In these circumstances, iPXE will continue to initialise but
any subsequent attempt to generate entropy will fail. In particular,
HTTPS connections will fail with an error indicating that no entropy
is available.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
In edk2, there are several drivers that associate HII forms (and
corresponding config access protocol instances) with each individual
network device. (In this context, "network device" means the EFI
handle on which the SNP protocol is installed, and on which the device
path ending with the MAC() node is installed also.) Such edk2 drivers
are, for example: Ip4Dxe, HttpBootDxe, VlanConfigDxe.
In UEFI, any given handle can carry at most one instance of a specific
protocol (see e.g. the specification of the InstallProtocolInterface()
boot service). This implies that the class of drivers mentioned above
can't install their EFI_HII_CONFIG_ACCESS_PROTOCOL instances on the
SNP handle directly -- they would conflict with each other.
Accordingly, each of those edk2 drivers creates a "private" child
handle under the SNP handle, and installs its config access protocol
(and corresponding HII package list) on its child handle.
The device path for the child handle is traditionally derived by
appending a Hardware Vendor Device Path node after the MAC() node.
The VenHw() nodes in question consist of a GUID (by definition), and
no trailing data (by choice). The purpose of these VenHw() nodes is
only that all the child nodes can be uniquely identified by device
path.
At the moment iPXE does not follow this pattern. It doesn't run into
a conflict when it installs its EFI_HII_CONFIG_ACCESS_PROTOCOL
directly on the SNP handle, but that's only because iPXE is the sole
driver not following the pattern. This behavior seems risky (one
might call it a "latent bug"); better align iPXE with the edk2 custom.
Cc: Michael Brown <mcb30@ipxe.org>
Cc: Gary Lin <glin@suse.com>
Cc: Ladi Prosek <lprosek@redhat.com>
Ref: http://thread.gmane.org/gmane.comp.bios.edk2.devel/13494/focus=13532
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Ladi Prosek <lprosek@redhat.com>
Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
As with assertions, profiling is enabled for objects built with any
debug level (including an explicit debug level of zero).
Allow profiling to be globally enabled or disabled by adding PROFILE=1
or PROFILE=0 respectively to the build command line.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Assertions are enabled for objects built with any debug level
(including an explicit debug level of zero). It is sometimes useful
to be able to enable assertions across all objects; this currently
requires manually hacking include/assert.h.
Allow assertions to be globally enabled by adding ASSERT=1 to the
build command line. For example:
make bin/8086100e.mrom ASSERT=1
Similarly, allow assertions to be globally disabled by adding ASSERT=0
to the build command line. If no ASSERT=... is specified on the
build command line, then only objects mentioned in DEBUG=... will have
assertions enabled (as is currently the case).
Note than globally enabling assertions imposes a relatively heavy
runtime penalty, primarily due to the various sanity checks performed
by list_add(), list_for_each_entry(), etc.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Extend the DEBUG=... syntax to allow debug messages to be compiled in
but disabled by default. For example:
make bin/undionly.kpxe DEBUG=netdevice:3:1
would compile in the messages as for DEBUG=netdevice:3, but would set
the debug level mask so that only the DEBUG=netdevice:1 messages would
be displayed.
This allows for external code to selectively enable the additional
debug messages at runtime, without being overwhelmed by unwanted
initial noise. For example, a developer of a new protocol may want to
temporarily enable tracing of all packets received: this can be done
by building with DEBUG=netdevice:3:1 and using
// temporarily enable per-packet messages
DBG_ENABLE_OBJECT ( netdevice, DBGLVL_EXTRA );
...
// disable per-packet messages
DBG_DISABLE_OBJECT ( netdevice, DBGLVL_EXTRA );
Note that unlike the usual DBG_ENABLE() and DBG_DISABLE() macros,
DBG_ENABLE_OBJECT() and DBG_DISABLE_OBJECT() will not be removed via
dead code elimination if debugging is disabled in the specified
object. In particular, this means that using either of these macros
will always result in a symbol reference to the specified object.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The DBG_ENABLE() and DBG_DISABLE() macros currently affect the debug
level of all objects that were built with debugging enabled. This is
undesirable, since it is common to use different debug levels in each
object.
Make the debug level mask a per-object variable. DBG_ENABLE() and
DBG_DISABLE() now control only the debug level for the containing
object (which is consistent with the intended usage across the
existing codebase). DBG_ENABLE_OBJECT() and DBG_DISABLE_OBJECT() may
be used to control the debug level for a specified object. For
example:
// Enable DBG() messages from tcpip.c
DBG_ENABLE_OBJECT ( tcpip, DBGLVL_LOG );
Note that the existence of debug messages continues to be gated by the
DEBUG=... list specified on the build command line. If an object was
built without the relevant debug level, then DBG_ENABLE_OBJECT() will
have no effect on that object at runtime (other than to explicitly
drag in the object via a symbol reference).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
A redirection failure is fatal, but provides no opportunity for the
caller of xfer_[v]redirect() to report the failure since the interface
will already have been disconnected. Fix by sending intf_close() from
within the default xfer_vredirect() handler.
Debugged-by: Robin Smidsrød <robin@smidsrod.no>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The vendor class identifier strings in DHCP_ARCH_VENDOR_CLASS_ID are
out of sync with the (correct) client architecture values in
DHCP_ARCH_CLIENT_ARCHITECTURE.
Fix by removing all definitions of DHCP_ARCH_VENDOR_CLASS_ID, and
instead generating the vendor class identifier string automatically
based on DHCP_ARCH_CLIENT_ARCHITECTURE and DHCP_ARCH_CLIENT_NDI.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
RFC3315 defines DHCPv6 option 16 (vendor class identifier) but does
not define any direct relationship with the roughly equivalent DHCPv4
option 60.
The PXE specification predates IPv6, and the UEFI specification is
expectedly vague on the subject. Examination of the reference EDK2
codebase suggests that the DHCPv6 vendor class identifier will be
formatted in accordance with RFC3315, using a single vendor-class-data
item in which the opaque-data field is the string as would appear in
DHCPv4 option 60.
RFC3315 requires the vendor class identifier to specify an IANA
enterprise number, as a way of disambiguating the vendor-class-data
namespace. The EDK2 code uses the value 343, described as:
// TODO: IANA TBD: temporarily using Intel's
Since this "TODO" has been present since at least 2010, it is probably
safe to assume that it has now become a de facto standard.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
RFC5970 defines DHCPv6 options 61 (client system architecture type)
and 62 (client network interface identifier), with contents equivalent
to DHCPv4 options 93 and 94 respectively.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
DHCPv4 and DHCPv6 share some values in common for the architecture-
specific options (such as the client system architecture type), but
use different encapsulations: DHCPv4 has a single byte for the option
length while DHCPv6 has a 16-bit field for the option length.
Move the containing DHCP_OPTION() and related wrappers from the
individual dhcp_arch.h files to dhcp.c, thus allowing for the
architecture-specific values to be reused in dhcpv6.c.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some BIOSes (observed with an HP Gen9) seem to spuriously enable
interrupts at the PIC. This causes problems with NBPs such as GRUB
which use the UNDI API (thereby enabling interrupts on the NIC)
without first hooking an interrupt service routine. In this
situation, the interrupt will end up being handled by the default BIOS
ISR, which will typically just send an EOI and return. Since nothing
in this handler causes the NIC to deassert the interrupt, this will
result in an interrupt storm.
Entertainingly, some BIOSes are immune to this problem because the
default ISR sends the EOI only to the slave PIC; this effectively
disables the interrupt.
Work around this problem by disabling the interrupt on the PIC before
invoking the PXE NBP. An NBP that expects to make use of interrupts
will need to be configuring the PIC anyway, so it is probably safe to
assume that it will explicitly reenable the interrupt.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
There seems to be no reason for the sti/cli pair used around each call
to INT 10. Remove these instructions, so that printing debug messages
from within an ISR does not temporarily reenable interrupts.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The HII IFR structures are allocated via realloc() rather than
zalloc(), and so are not automatically zeroed. This results in the
presence of uninitialised and invalid data, causing crashes elsewhere
in the UEFI firmware.
Fix by explicitly zeroing the newly allocated portion of any IFR
structure in efi_ifr_op().
Debugged-by: Laszlo Ersek <lersek@redhat.com>
Debugged-by: Gary Lin <glin@suse.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The SNP device path includes the network device's MAC address within
the MAC_ADDR_DEVICE_PATH.MacAddress field. We check that the
link-layer address will fit within this field, and then perform the
copy using the length of the destination buffer.
At 32 bytes, the MacAddress field is actually larger than the current
maximum iPXE link-layer address. The copy therefore overflows the
source buffer, resulting in trailing garbage bytes being appended to
the device path's MacAddress. This is invisible in debug messages,
since the DevicePathToText protocol will render only the length
implied by the interface type.
Fix by copying only the actual length of the link-layer address (which
we have already verified will not overflow the destination buffer).
Debugged-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
iPXE debug logging doesn't support %u. This commit replaces it with
%d in virtio-pci debug format strings.
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some of the regions may end up being unmapped, either because they are
optional or because the attempt to map them has failed. Region types
starting at 0 didn't make it easy to test for this condition.
This commit bumps all valid region types up by 1 with 0 having the
implicit 'unmapped' meaning.
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Provide a mechanism to allow an arbitrary adjustment to be applied to
all subsequent calls to time().
Note that the underlying clock source (e.g. the RTC clock) will not be
changed; only the time as reported within iPXE will be affected.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
ARM64 has a weaker memory order model than x86. The missing memory
barrier caused phy initialization notification to be delayed beyond
the link-wait timeout (15 secs).
Signed-off-by: Leendert van Doorn <leendert@paramecium.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
In some circumstances, intermediate devices may lose state in a way
that temporarily prevents the successful delivery of packets from a
TCP peer. For example, a firewall may drop a NAT forwarding table
entry.
Since iPXE spends most of its time downloading files (and hence purely
receiving data, sending only TCP ACKs), this can easily happen in a
situation in which there is no reason for iPXE's TCP stack to generate
any retransmissions. The temporary loss of connectivity can therefore
effectively become permanent.
Work around this problem by sending TCP keepalives after a period of
inactivity on an established connection.
TCP keepalives usually send a single garbage byte in sequence number
space that has already been ACKed by the peer. Since we do not need
to elicit a response from the peer, we instead send pure ACKs (with no
garbage data) in order to keep the transmit code path simple.
Originally-implemented-by: Ladi Prosek <lprosek@redhat.com>
Debugged-by: Ladi Prosek <lprosek@redhat.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Extend the 16-bit PCI bus:dev.fn address to a 32-bit seg🚌dev.fn
address, assuming a segment value of zero in contexts where multiple
segments are unsupported by the underlying data structures (e.g. in
the iBFT or BOFM tables).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The non-cryptographic RNG implemented by random() has the property
that a seed value of zero will result in a generated sequence of
all-zero values. This situation can arise if currticks() returns zero
at start of day.
Work around this problem by falling back to a fixed non-zero seed if
necessary.
This has no effect on the separate DRBG used by cryptographic code.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Fix build error with perl >= 5.23.2:
Can't redeclare "my" in "my" at ./util/parserom.pl line 160
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Robin Smidsrød <robin@smidsrod.no>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Mac OS X uses non-standard EFI protocols to obtain the DHCP packets
from the UEFI firmware.
Originally-implemented-by: Michael Kuron <m.kuron@gmx.de>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
There has been a longstanding disagreement between RFC4578 and the
IANA "Processor Architecture Types" registry. RFC4578 section 2.1
defines type 7 as "EFI BC" and type 9 as "EFI x86-64"; the IANA
registry quotes RFC4578 as its source but has these values erroneously
swapped. The EDK2 codebase uses the IANA values.
As of March 2016, RFC4578 has been modified by an errata to match the
values as recorded in the IANA registry.
Fix our definitions to match the consensus values.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some UEFI keyboard drivers are blissfully unaware of the existence of
either Ctrl key, and will report "Ctrl-<key>" as just "<key>". This
breaks substantial portions of the iPXE user interface.
Work around these broken UEFI drivers by allowing "ESC <key>" to be
used as a substitute for "Ctrl-<key>".
Tested-by: Dreamcat4 <dreamcat4@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some HTTP/2 servers send the header "Connection: upgrade, close". This
currently causes iPXE to fail due to the unrecognised "upgrade" token.
Fix by ignoring any unrecognised tokens in the "Connection" header.
Reported-by: Ján ONDREJ (SAL) <ondrejj@salstar.sk>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
This backport is from linux kernel upstream commit 83d6f1f ("ath9k:
fix buffer overrun for ar9287").
Signed-off-by: Christian Hesse <mail@eworm.de>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The raw cycle counter at PMCCNTR_EL0 works in qemu but seems to always
read as zero on physical hardware (tested on Juno r1 and Cavium
ThunderX), even after ensuring that PMCR_EL0.E and PMCNTENSET_EL0.C
are both enabled.
Use CNTVCT_EL0 instead; this seems to count at a lower resolution
(tens of CPU cycles), but is usable for profiling.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The UEFI specification requires the EFI_SIMPLE_NETWORK_PROTOCOL
GetStatus() method to set TxBuf to NULL if there are no transmit
buffers to recycle.
Some implementations (observed with Lan9118Dxe in EDK2) fill in TxBuf
only when there is a transmit buffer to recycle, which leads to large
numbers of "spurious TX completion" errors.
Work around this problem by initialising TxBuf to NULL before calling
the GetStatus() method.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Do not assume that an architecture-specific optimised memcpy() will
have the same properties as generic_memcpy() in terms of handling
overlapping regions.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
When building for 64-bit ARM, some symbol references may be resolved
via an "adrp" instruction (to obtain the start of the 4kB page
containing the symbol) and a separate 12-bit offset. For example
(taken from the GNU assembler documentation):
adrp x0, foo
ldr x0, [x0, #:lo12:foo]
We occasionally refer to symbols defined via mechanisms that are not
directly visible to gcc. For example:
extern char some_magic_symbol[];
__asm__ ( ".equ some_magic_symbol, some_magic_expression" );
The subsequent use of the ":lo12:" prefix on such magically-defined
symbols triggers an assertion failure in the assembler.
This problem seems to affect only "private_key_len" in the current
codebase. Fix by storing this value as static data; this avoids the
need to provide the value as a literal within the instruction stream,
and so avoids the problematic use of the ":lo12:" prefix.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
We currently use the EFI_CPU_ARCH_PROTOCOL's GetTimerValue() method to
generate the currticks() timer, calibrated against a 1ms delay from
the boot services Stall() method.
This does not work on ARM platforms, where GetTimerValue() is an empty
stub which just returns EFI_UNSUPPORTED.
Fix by instead creating a periodic timer event, and using this event
to increment a current tick counter.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Require architecture-specific code to make a deliberate choice to use
the unoptimised generic_tcpip_continue_chksum() function, if there is
no optimised version available.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The dependency on zlib seems to have been introduced in commit 3dd7ce1
("[efi] Allow building with non-system libbfd") as an indirect
requirement of either libbfd or libiberty when building on Mac OS X.
Since we no longer use either of these libraries, remove the
unnecessary link against zlib.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Parse the intermediate ELF file directly instead of using libbfd, in
order to allow for cross-compiled ELF objects.
As a side bonus, this eliminates libbfd as a build requirement.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The IBM Tivoli Provisioning Manager for OS Deployment (also known as
TPMfOSD, Rembo-ia32, or Rembo Auto-Deploy) has a serious bug in some
older versions (observed with v5.1.1.0, apparently fixed by v7.1.1.0)
which can lead to arbitrary data corruption.
As mentioned in commit 87723a0 ("[libflat] Test A20 gate without
switching to flat real mode"), Tivoli's NBP sets up a VMM and makes
calls to the PXE stack in VM86 mode. This appears to be some kind of
attempt to run PXE API calls inside a sandbox. The VMM is fairly
sophisticated: for example, it handles our attempts to switch into
protected mode and patches our GDT so that our protected-mode code
runs in ring 1 instead of ring 0. However, it neglects to apply any
memory protections. In particular, it does not enable paging and
leaves us with 4GB segment limits. We can therefore trivially break
out of the sandbox by simply overwriting the GDT (or by modifying any
of Tivoli's VMM code or data structures).
When we attempt to execute privileged instructions (such as "lidt"),
the CPU raises an exception and control is passed to the Tivoli VMM.
This may result in a call to Tivoli's memcpy() function.
Tivoli's memcpy() function includes optimisations which use the SSE
registers %xmm0-%xmm3 to speed up aligned memory copies.
Unfortunately, the Tivoli VMM's exception handler does not save or
restore %xmm0-%xmm3. The net effect of this bug in the Tivoli VMM is
that any privileged instruction (such as "lidt") issued by iPXE may
result in unexpected corruption of the %xmm0-%xmm3 registers.
Even more unfortunately, this problem affects the code path taken in
response to a hardware interrupt from the NIC, since that code path
will call PXENV_UNDI_ISR. The net effect therefore becomes that any
NIC hardware interrupt (e.g. due to a received packet) may result in
unexpected corruption of the %xmm0-%xmm3 registers.
If a packet arrives while Tivoli is in the middle of using its
memcpy() function, then the unexpected corruption of the %xmm0-%xmm3
registers will result in unexpected corruption in the destination
buffer. The net effect therefore becomes that any received packet may
result in a 16-byte block of corruption somewhere in any data that
Tivoli copied using its memcpy() function.
We can work around this bug in the Tivoli VMM by saving and restoring
the %xmm0-%xmm3 registers across calls to virt_call(). To work around
the problem, we need to save registers before attempting to execute
any privileged instructions, and ensure that we attempt no further
privileged instructions after restoring the registers.
This is less simple than it may sound. We can use the "movups"
instruction to save and restore individual registers, but this will
itself generate an undefined opcode exception if SSE is not currently
enabled according to the flags in %cr0 and %cr4. We can't access %cr0
or %cr4 before attempting the "movups" instruction, because access a
control register is itself a privileged instruction (which may
therefore trigger corruption of the registers that we're trying to
save).
The best solution seems to be to use the "fxsave" and "fxrstor"
instructions. If SSE is not enabled, then these instructions may fail
to save and restore the SSE register contents, but will not generate
an undefined opcode exception. (If SSE is not enabled, then we don't
really care about preserving the SSE register contents anyway.)
The use of "fxsave" and "fxrstor" introduces an implicit assumption
that the CPU supports SSE instructions (even though we make no
assumption about whether or not SSE is currently enabled). SSE was
introduced in 1999 with the Pentium III (and added by AMD in 2001),
and is an architectural requirement for x86_64. Experimentation with
current versions of gcc suggest that it may generate SSE instructions
even when using "-m32", unless an explicit "-march=i386" or "-mno-sse"
is used to inhibit this. It therefore seems reasonable to assume that
SSE will be supported on any hardware that might realistically be used
with new iPXE builds.
As a side benefit of this change, the MMX register %mm0 will now be
preserved across virt_call() even in an i386 build of iPXE using a
driver that requires readq()/writeq(), and the SSE registers
%xmm0-%xmm5 will now be preserved across virt_call() even in an x86_64
build of iPXE using the Hyper-V netvsc driver.
Experimentation suggests that this change adds around 10% to the
number of cycles required for a do-nothing virt_call(), most of which
are due to the extra bytes copied using "rep movsb". Since the number
of bytes copied is a compile-time constant local to librm.S, we could
potentially reduce this impact by ensuring that we always copy a whole
number of dwords and so can use "rep movsl" instead of "rep movsb".
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Commit 86f96a4 ("[tg3] Remove x86-specific inline assembly")
introduced a regression in _tg3_flag() in 64-bit builds, since any
flags in the upper 32 bits of a 64-bit unsigned long would be
discarded when truncating to a 32-bit int.
Debugged-by: Shane Thompson <shane.thompson@aeontech.com.au>
Tested-by: Shane Thompson <shane.thompson@aeontech.com.au>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some PXE NBPs are known to make PXE API calls with very little space
available on the real-mode stack. For example, the Rembo-ia32 NBP
from some versions of IBM's Tivoli Provisioning Manager for Operating
System Deployment (TPMfOSD) will issue calls with the real-mode stack
placed at 0000:03d2; this is at the end of the interrupt vector table
and leaves only 498 bytes of stack space available before overwriting
the hardware IRQ vectors. This limits the amount of state that we can
preserve before transitioning to protected mode.
Work around these challenging conditions by preserving everything
other than the initial register dump in a temporary static buffer
within our real-mode data segment, and copying the contents of this
buffer to the protected-mode stack.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Return success (rather than failure) after an image format has been
correctly identified.
This has no practical effect, since the return value from
image_probe() is deliberately never used, but avoids a somewhat
surprising and misleading "format not recognised" error message when
debugging is enabled.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
On some architectures (such as ARM), gcc will insert implicit calls to
memset(). Handle these using the same mechanism as for the implicit
calls to memcpy() used by x86.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add a build configuration option NET_PROTO_LACP to control whether or
not LACP support is included for Ethernet devices.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
This commit makes virtio-net support devices with VEN 0x1af4 and DEV
0x1041, which is how non-transitional (modern-only) virtio-net devices
are exposed on the PCI bus.
Transitional devices supporting both the old 0.9.5 and new 1.0 version
of the virtio spec are driven using the new protocol. Legacy devices
are driven using the old protocol, same as before this commit.
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
This commit adds support for driving virtio 1.0 PCI devices. In
addition to various helpers, a number of vpm_ functions are introduced
to be used instead of their legacy vp_ counterparts when accessing
virtio 1.0 (aka modern) devices.
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Virtio 1.0 introduces new constants and data structures, common to all
devices as well as specific to virtio-net. This commit adds a subset
of these to be able to drive the virtio-net 1.0 network device.
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
PCI devices may support more capabilities of the same type (for
example PCI_CAP_ID_VNDR) and there was no way to discover all of them.
This commit adds a new API pci_find_next_capability which provides
this functionality. It would typically be used like so:
for (pos = pci_find_capability(pci, PCI_CAP_ID_VNDR);
pos > 0;
pos = pci_find_next_capability(pci, pos, PCI_CAP_ID_VNDR)) {
...
}
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The EFI_HII_CONFIG_ACCESS_PROTOCOL's ExtractConfig() method is passed
a request string which includes the parameters being queried plus an
apparently meaningless blob of information (the ConfigHdr), and is
expected to include this same meaningless blob of information in the
results string.
Neither the specification nor the existing EDK2 code (including the
nominal reference implementation in the DriverSampleDxe driver)
provide any reason for the existence of this meaningless blob of
information. It appears to be consumed in its entirety by the
EFI_HII_CONFIG_ROUTING_PROTOCOL, and to contain zero bits of
information by the time it reaches an EFI_HII_CONFIG_ACCESS_PROTOCOL
instance. It would potentially allow for multiple configuration data
sets to be handled by a single EFI_HII_CONFIG_ACCESS_PROTOCOL
instance, in a style alien to the rest of the UEFI specification
(which implicitly assumes that the instance pointer is always
sufficient to uniquely identify the instance).
iPXE currently handles this by simply copying the ConfigHdr from the
request string to the results string, and otherwise ignoring it. This
approach is also used by some code in EDK2, such as OVMF's PlatformDxe
driver.
As of EDK2 commit 8a45f80 ("MdeModulePkg: Make HII configuration
settings available to OS runtime"), this causes an assertion failure
inside EDK2. The failure arises when iPXE is handled a NULL request
string, and responds (as per the specification) with a results string
including all settings. Since there is no meaningless blob to copy
from the request string, there is no corresponding meaningless blob in
the results string. This now causes an assertion failure in
HiiDatabaseDxe's HiiConfigRoutingExportConfig().
The same failure does not affect the OVMF PlatformDxe driver, which
simply passes the request string to the HII BlockToConfig() utility
function. The BlockToConfig() function returns EFI_INVALID_PARAMETER
when passed a null request string, and PlatformDxe propagates this
error directly to the caller.
Fix by matching the behaviour of OVMF's PlatformDxe driver: explicitly
return EFI_INVALID_PARAMETER if the request string is NULL or empty.
This violates the specification (insofar as it is feasible to
determine what the specification actually requires), but causes
correct behaviour with the EDK2 codebase.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The existing code intends to print NULL strings as "<NULL>" (for the
sake of debug messages), but the logic is incorrect when handling
wide-character strings. Fix the logic and add applicable unit tests.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
There is no way for the hardware to give us an invalid length in the
LRH, since it must have parsed this length field in order to perform
header splitting. However, this is difficult to prove conclusively.
Add an unnecessary length check to explicitly reject any packets
larger than the posted receive I/O buffer.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
There is no way for the hardware to give us an invalid length in the
LRH, since it must have parsed this length field in order to perform
header splitting. However, this is difficult to prove conclusively.
Add an unnecessary length check to explicitly reject any packets
larger than the posted receive I/O buffer.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
It is possible for the preloaded UNDI device to end up with no
specified bus type, since it may not be recognised as either a PCI or
an ISAPnP device. This will result in a bus type value of zero, which
currently results in NULL being treated as a string pointer by
netdev_fetch_bustype().
Fix by returning ENOENT if an unknown bus type is specified.
Reported-by: Todd Stansell <todd@stansell.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Provide a build option CROSSCERT in config/crypto.h to allow the
default cross-signed certificate source to be configured at build
time. The ${crosscert} setting may still be used to reconfigure the
cross-signed certificate source at runtime.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some versions of gcc complain that "'__bswap_variable_32' is static
but used in inline function 'golan_check_rc_and_cmd_status' which is
not static".
Fix by making golan_check_rc_and_cmd_status() a static inline.
Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some end-user configurations have been observed in which the first NBP
(such as GRUB2) uses the UNDI API and then transfers control to a
second NBP (such as pxelinux) which uses the UDP API. The first NBP
closes the network device using PXENV_UNDI_CLOSE, which renders the
UDP API unable to transmit or receive packets.
The correct behaviour under these circumstances is (as often) simply
not documented by the PXE specification. Testing with the Intel PXE
stack suggests that PXENV_UDP_OPEN will implicitly reopen the network
device if necessary, so match this behaviour.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The DHCP option 175.189 has been defined (by us) since 2006 as
containing the drive number to be used for a SAN boot, but has never
been automatically used as such by iPXE.
Use this option (if specified) to override the default SAN drive
number.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Interpret the maximum drive number (0xff for hard disks, 0x7f for
floppy disks) as meaning "use natural drive number".
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The mbr.bin and usbdisk.bin standalone blobs are currently generated
using "objcopy -O binary", which does not process relocation records.
For the i386 build, this does not matter since the section start
address is zero and so the ".rel" relocation records are effectively
no-ops anyway.
For the x86_64 build, the ".rela" relocation records are not no-ops,
since the addend is included as part of the relocation record (rather
than inline). Using "objcopy -O binary" will silently discard the
relocation records, with the result that all symbols are effectively
given a value of zero.
Fix by using "ld --oformat binary" instead of "objcopy -O binary" to
generate mbr.bin and usbdisk.bin.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The Infiniband specification (volume 1, section 11.4.1.2 "Post Receive
Request") notes that for UD QPs, the GRH will be placed in the first
40 bytes of the receive buffer if present. (If no GRH is present,
which is normal, then the first 40 bytes of the receive buffer will be
unused.)
Mellanox hardware performs this placement automatically: other headers
will be stripped (and their values returned via the CQE), but the
first 40 bytes of the data buffer will be consumed by the (probably
non-existent) GRH.
This does not fit neatly into iPXE's internal abstraction, which
expects the data buffer to represent just the data payload with the
addresses from the GRH (if present) passed as additional parameters to
ib_complete_recv().
The end result of this discrepancy is that attempts to receive
full-sized 2048-byte IPoIB packets on Mellanox hardware will fail.
Fix by allocating a separate ring buffer to hold the received GRHs.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The intention of the existing code (as documented in its own comments)
is that it should be possible to override the list of trusted root
certificates using a "trust" setting held in non-volatile stored
options. However, the rootcert_init() function currently executes
before any devices have been probed, and so will not be able to
retrieve any such non-volatile stored options.
Fix by executing rootcert_init() only after devices have been probed.
Since startup functions may be executed multiple times (unlike
initialisation functions), add an explicit flag to preserve the
property that rootcert_init() should run only once.
As before, if an explicit root of trust is specified at build time,
then any runtime "trust" setting will be ignored.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Provide access to local files via the "file://" URI scheme. There are
three syntaxes:
- An opaque URI with a relative path (e.g. "file:script.ipxe").
This will be interpreted as a path relative to the iPXE binary.
- A hierarchical URI with a non-network absolute path
(e.g. "file:/boot/script.ipxe"). This will be interpreted as a
path relative to the root of the filesystem from which the iPXE
binary was loaded.
- A hierarchical URI with a network path in which the authority is a
volume label (e.g. "file://bootdisk/script.ipxe"). This will be
interpreted as a path relative to the root of the filesystem with
the specified volume label.
Note that the potentially desirable shell mappings (e.g. "fs0:" and
"blk0:") are concepts internal to the UEFI shell binary, and do not
seem to be exposed in any way to external executables. The old
EFI_SHELL_PROTOCOL (which did provide access to these mappings) is no
longer installed by current versions of the UEFI shell.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
On some architectures (such as ARM) the "@" character is used as a
comment delimiter. A section type argument such as "@progbits"
therefore becomes "%progbits".
This is further complicated by the fact that the "%" character has
special meaning for inline assembly when input or output operands are
used, in which cases "@progbits" becomes "%%progbits".
Allow the section type character(s) to be defined via Makefile
variables.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
This driver is the original source of the current readq() and writeq()
implementations for 32-bit iPXE. Switch to using the now-centralised
definitions, to avoid including architecture-specific code in an
otherwise architecture-independent driver.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Commit 196f0f2 ("[librm] Convert prot_call() to a real-mode near
call") introduced a regression in which any deliberate modification to
the low 16 bits of the CPU flags (in struct i386_all_regs) would be
overwritten with the original flags value at the time of entry to
prot_call().
The regression arose because the alignment requirements of the
protected-mode stack necessitated the insertion of two bytes of
padding immediately below the prot_call() return address. The
solution chosen was to extend the existing "pushfl / popfl" pair to
"pushfw;pushfl / popfl;popfw". The extra "pushfw / popfw" appears at
first glance to be a no-op, but fails to take into account the fact
that the flags restored by popfl may have been deliberately modified
by the protected-mode function.
Fix by replacing "pushfw / popfw" with "pushw %ss / popw %ss". While
%ss does appear within struct i386_all_regs, any modification to the
stored value has always been ignored by prot_call() anyway.
The most visible symptom of this regression was that SAN booting would
fail since every INT 13 call would be chained to the original INT 13
vector.
Reported-by: Vishvananda Ishaya <vishvananda@gmail.com>
Reported-by: Jamie Thompson <forum.ipxe@jamie-thompson.co.uk>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
There is no practical way to generate an underlength ARP packet since
an ARP packet is always padded up to the minimum Ethernet frame length
(or dropped by the receiving Ethernet hardware if incorrectly padded),
but the absence of an explicit check causes warnings from some
analysis tools.
Fix by adding an explicit check on the I/O buffer length.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The assumption in asn1_type() that an ASN.1 cursor will always contain
a type byte is incorrect. A cursor that has been cleanly invalidated
via asn1_invalidate_cursor() will contain a type byte, but there are
other ways in which to arrive at a zero-length cursor.
Fix by explicitly checking the cursor length in asn1_type(). This
allows asn1_invalidate_cursor() to be reduced to simply zeroing the
length field.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Many TLS records contain variable-length fields. We currently
validate the overall record length, but do so only after reading the
length of the variable-length field. If the record is too short to
even contain the length field, then we may read uninitialised data
from beyond the end of the record.
This is harmless in practice (since the subsequent overall record
length check would fail regardless of the value read from the
uninitialised length field), but causes warnings from some analysis
tools.
Fix by validating that the overall record length is sufficient to
contain the length field before reading from the length field.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Several UEFI platforms are known to return EFI_NOT_FOUND when asked to
retrieve the system default font information via GetFontInfo(). Work
around these broken platforms by iterating over the glyphs to find the
maximum height used by a printable character.
Originally-fixed-by: Jonathan Dieter <jdieter@lesbg.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some EoIB implementations utilise an EoIB-to-Ethernet gateway device
that does not perform a FullMember join to the multicast group for the
EoIB broadcast domain. This has various exciting side-effects, such
as requiring every EoIB node to send every broadcast packet twice.
As an added bonus, the gateway may also break the EoIB MAC address to
GID mapping protocol by sending Ethernet-sourced packets from the
wrong QPN.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some EoIB implementations require each individual EoIB node to create
the multicast group for the EoIB broadcast domain.
It is left as an exercise for the interested reader to determine how
such an implementation might ever allow the parameters of such a
multicast group to be changed without requiring a simultaneous upgrade
of every driver on every operating system on every machine currently
attached to the fabric.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some EoIB implementations transmit a vendor-proprietary heartbeat
packet on the same multicast group used to provide the EoIB broadcast
domain.
Silently ignore these heartbeat packets, to avoid cluttering up the
network interface error statistics.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
EoIB is a fairly simple protocol in which raw Ethernet frames
(excluding the CRC) are encapsulated within Infiniband Unreliable
Datagrams, with a four-byte fixed EoIB header (which conveys no actual
information). The Ethernet broadcast domain is provided by a
multicast group, similar to the IPoIB IPv4 multicast group.
The mapping from Ethernet MAC addresses to Infiniband address vectors
is achieved by snooping incoming traffic and building a peer cache
which can then be used to map a MAC address into a port GID. The
address vector is completed using a path record lookup, as for IPoIB.
Note that this requires every packet to include a GRH.
Add basic support for EoIB devices. This driver is substantially
derived from the IPoIB driver. There is currently no mechanism for
automatically creating EoIB devices.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add a build configuration option VNIC_IPOIB to control whether or not
IPoIB support is included for Infiniband devices.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Commit e62e52b ("[ipoib] Simplify test for received broadcast
packets") relies upon the multicast LID being present in the
destination address vector as passed to ipoib_complete_recv().
Unfortunately, this information is not present in many Infiniband
devices' completion queue entries.
Fix by testing instead for the presence of a multicast GID.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
When running the 64-bit BIOS version of iPXE, restrict external memory
allocations to the low 4GB to ensure that allocations (such as for
initrds) fall within our identity-mapped memory region, and will be
accessible to the potentially 32-bit operating system.
Move largest_memblock() back to memtop_umalloc.c, since this change
imposes a restriction that applies only to BIOS builds.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
When a CMRC connection is closed, the deferred shutdown process calls
ib_destroy_qp(). This will cause the receive work queue entries to
complete in error (since they are being cancelled), which will in turn
reschedule the deferred shutdown process. This eventually leads to
ib_destroy_conn() being called on a connection that has already been
freed.
Fix by explicitly cancelling any pending shutdown process after the
shutdown process has completed.
Ironically, this almost exactly reverts commit 019d4c1 ("[infiniband]
Use a one-shot process for CMRC shutdown"); prior to the introduction
of one-shot processes the only way to achieve a one-shot process was
for the process to cancel itself.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add support for running the BIOS version of iPXE in 64-bit long mode.
A 64-bit BIOS version of iPXE can be built using e.g.
make bin-x86_64-pcbios/ipxe.usb
make bin-x86_64-pcbios/8086100e.mrom
The 64-bit BIOS version should appear to function identically to the
normal 32-bit BIOS version. The physical memory layout is unaltered:
iPXE is still relocated to the top of the available 32-bit address
space. The code is linked to a virtual address of 0xffffffffeb000000
(in the negative 2GB as required by -mcmodel=kernel), with 4kB pages
created to cover the whole of .textdata. 2MB pages are created to
cover the whole of the 32-bit address space.
The 32-bit portions of the code run with VIRTUAL_CS and VIRTUAL_DS
configured such that truncating a 64-bit virtual address gives a
32-bit virtual address pointing to the same physical location.
The stack pointer remains as a physical address when running in long
mode (although the .stack section is accessible via the negative 2GB
virtual address); this is done in order to simplify the handling of
interrupts occurring while executing a portion of 32-bit code with
flat physical addressing via PHYS_CODE().
Interrupts may be enabled in either 64-bit long mode, 32-bit protected
mode with virtual addresses, 32-bit protected mode with physical
addresses, or 16-bit real mode. Interrupts occurring in any mode
other than real mode will be reflected down to real mode and handled
by whichever ISR is hooked into the BIOS interrupt vector table.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
In a 64-bit build, the entirety of the 32-bit address space is
identity-mapped and so any valid physical address may immediately be
used as a virtual address. Conversely, a virtual address that is
already within the 32-bit address space may immediately be used as a
physical address.
A valid virtual address that lies outside the 32-bit address space
must be an address within .textdata, and so can be converted to a
physical address by adding virt_offset.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The physical locations of .textdata, .text16 and .data16 are constant
from the point of view of C code. Mark the relevant variables as
constant to allow gcc to optimise out redundant reads.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
No callers of prot_to_phys, phys_to_prot, or intr_to_prot require the
flags to be preserved. Remove the unnecessary pushfl/popfl pairs.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add a phys_call() wrapper function (analogous to the existing
real_call() wrapper function) for calling code with flat physical
addressing, and use this wrapper within the PHYS_CODE() macro.
Move the relevant functionality inside librm.S, where it more
naturally belongs.
The COMBOOT code currently uses explicit calls to _virt_to_phys and
_phys_to_virt. These will need to be rewritten if our COMBOOT support
is ever generalised to be able to run in a 64-bit build.
Specifically:
- com32_exec_loop() should be restructured to use PHYS_CODE()
- com32_wrapper.S should be restructured to use an equivalent of
prot_call(), passing parameters via a struct i386_all_regs
- there appears to be no need for com32_wrapper.S to switch between
external and internal stacks; this could be omitted to simplify
the design.
For now, librm.S continues to expose _virt_to_phys and _phys_to_virt
for use by com32.c and com32_wrapper.S. Similarly, librm.S continues
to expose _intr_to_virt for use by gdbidt.S.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some older versions of binutils have issues with both the use of
PROVIDE() and the interpretation of numeric literals within a section
description.
Work around these older versions by defining the required numeric
literals outside of any section description, and by automatically
determining whether or not to generate extra space for page tables
rather than relying on LDFLAGS.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The bulk of the iPXE binary (the .textdata section) is physically
relocated at runtime to the top of the 32-bit address space in order
to allow space for an OS to be loaded. The relocation is achieved
with the assistance of segmentation: we adjust the code and data
segment bases so that the link-time addresses remain valid.
Segmentation is not available (for normal code and data segments) in
long mode. We choose to compile the C code with -mcmodel=kernel and
use a link-time address of 0xffffffffeb000000. This choice allows us
to identity-map the entirety of the 32-bit address space, and to alias
our chosen link-time address to the physical location of our .textdata
section. (This requires the .textdata section to always be aligned to
a page boundary.)
We simultaneously choose to set the 32-bit virtual address segment
bases such that the link-time addresses may simply be truncated to 32
bits in order to generate a valid 32-bit virtual address. This allows
symbols in .textdata to be trivially accessed by both 32-bit and
64-bit code.
There is no (sensible) way in 32-bit assembly code to generate the
required R_X86_64_32S relocation records for these truncated symbols.
However, subtracting the fixed constant 0xffffffff00000000 has the
same effect as truncation, and can be represented in a standard
R_X86_64_32 relocation record. We define the VIRTUAL() macro to
abstract away this truncation operation, and apply it to all
references by 32-bit (or 16-bit) assembly code to any symbols within
the .textdata section.
We define "virt_offset" for a 64-bit build as "the value to be added
to an address within .textdata in order to obtain its physical
address". With this definition, the low 32 bits of "virt_offset" can
be treated by 32-bit code as functionally equivalent to "virt_offset"
in a 32-bit build.
We define "text16" and "data16" for a 64-bit build as the physical
addresses of the .text16 and .data16 sections. Since a physical
address within the 32-bit address space may be used directly as a
64-bit virtual address (thanks to the identity map), this definition
provides the most natural access to variables in .text16 and .data16.
Note that this requires a minor adjustment in prot_to_real(), which
accesses .text16 using 32-bit virtual addresses.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Long-mode operation will require page tables, which are too large to
sensibly fit in our .data16 segment in base memory.
Add a portion of init_librm() running in 32-bit protected mode to
provide access to high memory. Use this portion of init_librm() to
initialise the .textdata variables "virt_offset", "text16", and
"data16", eliminating the redundant (re)initialisation currently
performed on every mode transition as part of real_to_prot().
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Use the standard "pushl $function ; pushw %cs ; call prot_call"
sequence everywhere that prot_call() is used.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
On a 64-bit CPU, any modification of a register by 32-bit or 16-bit
code will destroy the invisible upper 32 bits of the corresponding
64-bit register. For example: a 32-bit "pushl %eax" followed by a
"popl %eax" will zero the upper half of %rax. This differs from the
treatment of upper halves of 32-bit registers by 16-bit code: a
"pushw %ax" followed by a "popw %ax" will leave the upper 16 bits of
%eax unmodified.
Inline assembly generated using REAL_CODE() or PHYS_CODE() will
therefore have to preserve the upper halves of all registers, to avoid
clobbering registers that gcc expects to be preserved.
Output operands from REAL_CODE() and PHYS_CODE() assembly may
therefore contain undefined values in the upper 32 bits.
Fix by using explicit variable widths (e.g. uint32_t) for
non-discarded output operands, to ensure that undefined values in the
upper 32 bits of 64-bit registers are ignored.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Move most arch/i386 files to arch/x86, and adjust the contents of the
Makefiles and the include/bits/*.h headers to reflect the new
locations.
This patch makes no substantive code changes, as can be seen using a
rename-aware diff (e.g. "git show -M5").
This patch does not make the pcbios platform functional for x86_64; it
merely allows it to compile without errors.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Commit c64747d ("[librm] Speed up real-to-protected mode transition
under KVM") rounded down the .text16 segment address calculated in
alloc_basemem() to a multiple of 64 bytes in order to speed up mode
transitions under KVM.
This creates a potential discrepancy between alloc_basemem() and
free_basemem(), meaning that free_basemem() may free less memory than
was allocated by alloc_basemem().
Fix by padding the calculated sizes of both .text16 and .data16 to a
multiple of 64 bytes at build time.
Debugged-by: Yossef Efraim <yossefe@mellanox.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Guard against various corner cases (such as zero-length buffers, zero
alignments, and integer overflow when rounding up allocation lengths
and alignments) and ensure that the struct io_buffer is correctly
aligned even when the caller requests a non-zero alignment for the I/O
buffer itself.
Add self-tests to verify that the resulting alignments and lengths are
correct for a range of allocations.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Commit f3fbb5f ("[malloc] Avoid integer overflow for excessively large
memory allocations") fixed signed integer overflow issues caused by
the use of ssize_t, but did not guard against unsigned integer
overflow.
Add explicit checks for unsigned integer overflow where needed. As a
side bonus, erroneous calls to malloc_dma() with an (illegal) size of
zero will now fail cleanly.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
ath_rx_init() demonstrates some serious confusion over how to use
pointers, resulting in (uint32_t*)NULL being used as a temporary
variable. This does not end well.
The broken code in question is performing manual alignment of I/O
buffers, which can now be achieved more simply using alloc_iob_raw().
Fix by removing ath_rxbuf_alloc() entirely.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The various early-exit paths in parse_uri() accidentally bypass the
URI field decoding. The result is that opaque or relative URIs do not
undergo URI field decoding, resulting in double-encoding when the URIs
are subsequently used. For example:
#!ipxe
set mac ${macstring}
imgfetch /boot/by-mac/${mac:uristring}
would result in an HTTP GET such as
GET /boot/by-mac/00%253A0c%253A29%253Ac5%253A39%253Aa1 HTTP/1.1
rather than the expected
GET /boot/by-mac/00%3A0c%3A29%3Ac5%3A39%3Aa1 HTTP/1.1
Fix by ensuring that URI decoding is always applied regardless of the
URI format.
Reported-by: Andrew Widdersheim <awiddersheim@inetu.net>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
TFTP URIs are intrinsically problematic, since:
- TFTP servers may use either normal slashes or backslashes as a
directory separator,
- TFTP servers allow filenames to be specified using relative paths
(with no initial directory separator),
- TFTP filenames present in a DHCP filename field may use special
characters such as "?" or "#" that prevent parsing as a generic URI.
As of commit 7667536 ("[uri] Refactor URI parsing and formatting"), we
have directly constructed TFTP URIs from DHCP next-server and filename
pairs, avoiding the generic URI parser. This eliminated the problems
related to special characters, but indirectly made it impossible to
parse a "tftp://..." URI string into a TFTP URI with a non-absolute
path.
Re-introduce the convention of requiring an extra slash in a
"tftp://..." URI string in order to specify a TFTP URI with an initial
slash in the filename. For example:
tftp://192.168.0.1/boot/pxelinux.0 => RRQ "boot/pxelinux.0"
tftp://192.168.0.1//boot/pxelinux.0 => RRQ "/boot/pxelinux.0"
This is ugly, but there seems to be no other sensible way to provide
the ability to specify all possible TFTP filenames.
A side-effect of this change is that format_uri() will no longer add a
spurious initial "/" when formatting a relative URI string. This
improves the console output when fetching an image specified via a
relative URI.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The OCSP responder URI included within an X.509 certificate may or may
not include a trailing slash. We currently rely on the fact that
format_uri() incorrectly inserts an initial slash, which we include
unconditionally within the OCSP request URI.
Switch to using uri_encode() directly, and insert a slash only if the
X.509 certificate's OCSP responder URI does not already include a
trailing slash.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Commit 53d2d9e ("[uri] Generalise tftp_uri() to pxe_uri()") introduced
a regression in which an NFS root path would no longer be treated as
an unsupported root path, causing a boot with an NFS root path to fail
with a "Could not open SAN device" error.
Reported-by: David Evans <dave.evans55@googlemail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some protocols (such as ARP) may modify the received packet and re-use
the same I/O buffer for transmission of a reply. The SMSC95XX
transmit header is larger than the receive header: the re-used I/O
buffer therefore does not have sufficient headroom for the transmit
header, and the ARP reply will therefore fail to be transmitted. This
is essentially the same problem as in commit 2e72d10 ("[ncm] Reserve
headroom in received packets").
Fix by reserving sufficient space at the start of each received packet
to allow for the difference between the lengths of the transmit and
receive headers.
This problem is not caught by the current driver development test
suite (documented at http://ipxe.org/dev/driver), since even the large
file transfer tests tend to completely sufficiently quickly that there
is no need for the server to ever send an ARP request. The failure
shows up only when using a very slow protocol such as RFC7440-enhanced
TFTP (as used by Windows Deployment Services).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The LED pins are configured by default as GPIO inputs. While it is
conceivable that a board might actually use these pins as GPIOs, no
such board is known to exist.
The Linux smsc95xx driver configures these pins unconditionally as LED
outputs. Assume that it is safe to do likewise.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Expose the network interface name (e.g. "net0") as a setting. This
allows a script to obtain the name of the most recently opened network
interface via ${netX/ifname}.
Signed-off-by: Andrew Widdersheim <amwiddersheim@gmail.com>
Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add a named CONFIG=cloud configuration, which enables console types
useful for obtaining output from virtual machines in public clouds
such as AWS EC2.
An image suitable for use in AWS EC2 can be built using
make bin/ipxe.usb CONFIG=cloud EMBED=config/cloud/aws.ipxe
The embedded script will direct iPXE to download and execute the EC2
"user-data" file, which is always available to an EC2 VM via the URI
http://169.254.169.254/latest/user-data (regardless of the VPC
networking settings). The boot can therefore be controlled by
modifying the per-instance user data, without having to modify the
boot disk image.
Console output can be obtained via syslog (with a syslog server
configured in the user-data script), via the AWS "System Log" (after
the instance has been stopped), or as a last resort from the log
partition on the boot disk.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The three nominally-disambiguated ENOTSUP errors accidentally all used
the same error disambiguator, rendering them identical. Fix by
changing all three values. We avoid reusing the 0x01 disambiguator
value, since that remains ambiguous in older binaries.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some BIOS console redirection capabilities do not work well with the
colourised debug messages used by iPXE. We already allow the range of
colours to be controlled via the DBGCOL=... build parameter. Extend
this syntax to allow DBGCOL=0 to be used to mean "disable colours".
Requested-by: Wissam Shoukair <wissams@mellanox.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Provide a debug function check_bios_interrupts() to look for changes
to the interrupt vector table. This can be useful when investigating
the behaviour (including crashes) of external PXE NBPs.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
For historical reasons, iPXE sets the current working URI to the root
of the TFTP server whenever the TFTP server address is changed. This
was originally implemented in the hope of allowing a DHCP-provided
TFTP filename to be treated simply as a relative URI. This usage
turns out to be impractical since DHCP-provided TFTP filenames may
include characters which would have special significance to the URI
parser, and so the DHCP next-server+filename combination is now
handled by the dedicated pxe_uri() function instead.
The practice of setting the current working URI to the root of the
TFTP server is potentially helpful for interactive uses of iPXE,
allowing a user to type e.g.
iPXE> dhcp
Configuring (net0 52:54:00:12:34:56)... ok
iPXE> chain pxelinux.0
and have the URI "pxelinux.0" interpreted as being relative to the
root of the TFTP server provided via DHCP.
The current implementation of tftp_apply_settings() has an unintended
flaw. When the "dhcp" command is used to renew a DHCP lease (or to
pick up potentially modified DHCP options), the old settings block
will be unregistered before the new settings block is registered.
This causes tftp_apply_settings() to believe that the TFTP server has
been changed twice (to 0.0.0.0 and back again), and so the current
working URI will always be set to the root of the TFTP server, even if
the DHCP response provides exactly the same TFTP server as previously.
Fix by doing nothing in tftp_apply_settings() whenever there is no
TFTP server address.
Debugged-by: Andrew Widdersheim <awiddersheim@inetu.net>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Update the image's recorded URI when a download redirection occurs.
This ensures that URIs relative to a redirected download are resolved
correctly.
In particular, this allows for the use of relative URIs in scripts
that are themselves downloaded via a redirection, such as the HTTP 301
redirection used to fix up URIs pointing to directories but omitting
the trailing slash (e.g. "http://boot.ipxe.org/demo", which will be
redirected to "http://boot.ipxe.org/demo/").
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Resolve redirection URIs as being relative to the original HTTP
request URI, rather than treating them as being implicitly relative to
the current working URI.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Commit 5de45cd ("[romprefix] Report a pessimistic runtime size
estimate") set the PCI3.0 "runtime size" field equal to the worst-case
runtime size, on the basis that there is no guarantee that PMM
allocation will succeed and hence no guarantee that we will be able to
shrink the ROM image.
On a PCI3.0 system where PMM allocation would succeed, this can cause
the BIOS to unnecessarily refuse to initialise the iPXE ROM due to a
perceived shortage of option ROM space.
Fix by reporting the best-case runtime size via the PCI header, and
checking that we have sufficient runtime space (if applicable). This
allows iPXE ROMs to initialise on PCI3.0 systems that might otherwise
fail due to a shortage of option ROM space.
This may cause iPXE ROMs to fail to initialise on PCI3.0 systems where
PMM is broken. (Pre-PCI3.0 systems are unaffected since there must
already have been sufficient option ROM space available for the
initialisation entry point to be called.)
On balance, it seems preferable to avoid breaking "good" systems
(PCI3.0 with working PMM) at the cost of potentially breaking "bad"
systems (PCI3.0 with broken PMM).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The GuestRPC mechanism (used for VMWARE_SETTINGS and CONSOLE_VMWARE)
does not use any real-mode code and so can be exposed in both 64-bit
and 32-bit builds.
Reported-by: Matthew Helton <mwhelton@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allow the use of the iPXE DRBG implementation in BSD-licensed
projects.
Requested-by: Sean Davis <dive@hq.endersgame.net>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allow the use of the iPXE DRBG implementation in BSD-licensed
projects.
Requested-by: Sean Davis <dive@hq.endersgame.net>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The SMSC95xx devices tend to be used in embedded systems with a
variety of ad-hoc mechanisms for storing the MAC address.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
When USB network card drivers are used, the BIOS' legacy USB
capability is necessarily disabled since there is no way to share the
host controller between the BIOS and iPXE.
Commit 3726722 ("[usb] Add basic support for USB keyboards") added
support allowing a USB keyboard to be used within iPXE. However,
external code such as a PXE NBP has no way to utilise this support,
and so a USB keyboard cannot be used to control a PXE NBP loaded from
a USB network card.
Add support for injecting keypresses from any iPXE console into the
BIOS keyboard buffer. This allows external code such as a PXE NBP to
function with a USB keyboard even after the BIOS' legacy USB
capability has been disabled.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The build system allows for additional drivers (or other objects) to
be specified using build targets such as
make bin/intel--realtek.usb
make bin/8086100e--8086100f.mrom
This currently fails if the base target is the "bin/ipxe.*" all-drivers
target, e.g.
make bin/ipxe--acm.usb
Fix the build target parsing logic to allow additional drivers (or
other objects) to be included on top of the base all-drivers target.
This can be used to include USB network card drivers, which are not
yet included by default in the all-drivers build.
Reported-by: Andrew Sloma <asloma@lenovo.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some xHCI controllers (such as qemu's emulated xHCI controller) do not
correctly handle zero-length packets that are part of a TRB chain.
The zero-length TRB ends up being squashed and does not result in a
zero-length packet as seen by the device.
Work around this problem by marking the zero-length packet as
belonging to a separate transfer descriptor.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some hubs (e.g. the Avocent Corp. Virtual Hub on a Lenovo x3550
Integrated Management Module) have been observed to require more than
the standard 200ms for ports to stabilise, with the result that
devices appear to disconnect and immediately reconnect during the
initial bus enumeration.
Work around this problem by allowing specific hubs an extra 500ms of
settling time.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Record the speed of a USB device based on the port's speed at the time
that the device was enabled. This allows us to remember the device's
speed even after the device has been disconnected (and so the port's
current speed has changed).
In particular, this allows us to correctly identify the transaction
translator for a low-speed or full-speed device after the device has
been disconnected.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The usb_message() and usb_stream() functions currently check for
port->speed==USB_SPEED_NONE to determine whether or not a device has
been unplugged. This test will give a false negative result if a new
device has been plugged in before the hotplug mechanism has finished
handling the removal of the old device.
Fix by checking instead the port->disconnected flag, which is now
cleared only after completing the removal of the old device.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Provide BIT_QWORD_PTR() to allow for easy extraction of non-endian
fields (e.g. Infiniband GUIDs) without unnecessary byte swapping.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Tested using QEMU and usbredir to expose the LAN9512 chip present on a
Raspberry Pi.
There is a known issue with the LAN9512: an extra two bytes are
appended to every transmitted packet. These two bytes comprise:
{ 0x00, 0x08 } if packet length == 0 (mod 8)
{ CRC[0], 0x00 } if packet length == 7 (mod 8)
{ CRC[0], CRC[1] } otherwise
The extra bytes are appended whether the Ethernet CRC is generated
manually or added automatically by the hardware. The issue occurs
with the Linux kernel driver as well as the iPXE driver. It appears
to be an undocumented hardware errata.
TCP/IP traffic is not affected, since the IP header length field
causes the extraneous bytes to be discarded by the receiver. However,
protocols that rely on the length of the Ethernet frame (such as FCoE
or iPXE's "lotest" protocol) will be unusable on this hardware.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
On some models (notably ICH), the PHY reset mechanism appears to be
broken. In particular, the PHY_CTRL register will be correctly loaded
from NVM but the values will not be propagated to the "OEM bits" PHY
register. This typically has the effect of dropping the link speed to
10Mbps.
Since the original version of this driver in commit 945e428 ("[intel]
Replace driver for Intel Gigabit NICs"), we have always worked around
this problem by skipping the PHY reset if the link is already up.
Enhance this workaround by explicitly checking for known-broken PCI
IDs.
Reported-by: Robin Smidsrød <robin@smidsrod.no>
Tested-by: Robin Smidsrød <robin@smidsrod.no>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
iPXE does not call shutdown() before invoking a COMBOOT executable,
since the executable is allowed to make API calls back into iPXE. If
a background picture is used, then the console will not be restored to
text mode before invoking the COMBOOT executable. This can cause
undefined behaviour.
Fix by adding an explicit call to console_reset() immediately before
invoking a COMBOOT or COM32 executable, analogous to the call made to
console_reset() immediately before invokving a PXE NBP.
Debugged-by: Andrew Widdersheim <awiddersheim@inetu.net>
Tested-by: Andrew Widdersheim <awiddersheim@inetu.net>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
For switches which remain permanently in the non-forwarding state (or
which erroneously report a non-forwarding state), ensure that iPXE
will eventually give up waiting for the link to become unblocked.
Originally-fixed-by: Wissam Shoukair <wissams@mellanox.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
If we detect (via STP) that a switch port is in a non-forwarding
state, then the link is marked as being temporarily blocked and DHCP
discovery will be deferred until the link becomes unblocked.
The timer used to decide when to give up waiting for ProxyDHCPOFFERs
is currently based on the time that DHCP discovery was started, and
makes no allowances for any time spent waiting for the link to become
unblocked. Consequently, if STP is used then the timeout for
ProxyDHCPOFFERs becomes essentially zero.
Fix by resetting the recorded start time whenever DHCP discovery is
deferred due to a blocked link.
Debugged-by: Sebastian Roth <sebastian.roth@zoho.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The name "vesafb" is intrinsically specific to a BIOS environment.
Generalise the build configuration option CONSOLE_VESAFB to
CONSOLE_FRAMEBUFFER, in preparation for adding EFI framebuffer
support.
Existing configurations using CONSOLE_VESAFB will continue to work.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Avoid accidentally dereferencing a NULL cipher context pointer for
plaintext blocks (which are usually messages with a block length of
zero, indicating a missing block).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
UEFI UNDI is a hideously ugly lump of poorly specified garbage bolted
on as an appendix of the UEFI specification. My personal favourite
line from the UNDI 'specification' is section E.2.2, which states
"Basically, the rule is: Do it right, or don't do it at all". The
author appears to believe that such exhortations are a viable
substitute for documenting what it is that the wretched reader is
supposed to, in fact, do.
(Second favourite is the section listing the pros and cons of various
driver types. This fails to identify a single con for the mythical
"Hardware UNDI", a design so insanely intrinsically slow that it
appears to have been the inspiration for the EFI_USB_IO_PROTOCOL.)
UNDI is functionally isomorphic to the substantially less preposterous
EFI_SIMPLE_NETWORK_PROTOCOL. Provide an UNDI interface (as a thin
wrapper around the existing SNP interface) to allow for use by
third-party software that has made poor life choices.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Calling EDK2's OpenProtocol() with attributes BY_DRIVER|EXCLUSIVE will
call DisconnectController() in a loop to attempt to dislodge any
existing openers with attributes BY_DRIVER. The loop will continue
indefinitely until either no such openers remain, or until
DisconnectController() returns an error.
If our driver binding protocol's Stop() method is ever called to
disconnect a device that we are not in fact driving, then return
EFI_DEVICE_ERROR rather than EFI_SUCCESS, in order to break this
potentially infinite loop.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The Microsoft PE/COFF specification defines the MajorLinkerVersion and
MinorLinkerVersion fields as "The linker major version number" and
"The linker minor version number" respectively, and has nothing more
to say on the matter. These fields have no significance: they do not
affect the interpretation of the remainder of the file, but merely
provide diagnostic information for interested humans to read.
Apparently, versions 2.4 and earlier of the Microsoft linker produced
binaries so incorrigibly cursed that even to attempt to parse such a
binary would risk summoning a plague of enraged spiders. To protect
users from unwanted arachnids, ImageHlp.dll's MapAndLoad() function
will helpfully fail to map and/or load a 32-bit binary unless the
linker version field indicates version 2.5 or later. (64-bit binaries
are exempt from such helpfulness.)
Work around the broken Microsoft ImageHlp.dll library by providing a
linker version number that will satisfy the arbitrary whims of the
MapAndLoad() function.
This mirrors wimboot commit 670c7e2 ("[efi] Work around broken 32-bit
PE executable parsing in ImageHlp.dll").
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Use INT 1a,564e to notify the BIOS of each network device that we
detect. This provides an opportunity for the BIOS to implement
platform policy such as changing the MAC address by issuing a call to
PXENV_UNDI_SET_STATION_ADDRESS.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Invoke INT 1a,564e whenever a PXE stack is activated, passing the
address of the PXENV+ structure in %es:%bx. This is designed to allow
a BIOS to be notified when a PXE stack has been installed, providing
an opportunity for start-of-day commands such as setting the MAC
address according to a policy chosen by the BIOS.
PXE defines INT 1a,5650 as a means of locating the PXENV+ structure:
this call returns %ax=0x564e and the address of the PXENV+ structure
in %es:%bx. We choose INT 1a,564e as a fairly natural notification
call, using the parameters as would be returned by INT 1a,5650.
The full calling convention (documented as per section 3.1 of the PXE
specification) is:
INT 1a,564e - PXE installation notification
Enter:
%ax = 0x564e
%es = 16-bit segment address of the PXENV+ structure
%bx = 16-bit offset of the PXENV+ structure
Exit:
%edx may be trashed (as is the case for INT 1a,5650)
All other register contents must be preserved
CF is cleared
IF is preserved
All other flags are undefined
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Avoid dragging in unnecessary iPXE header files such as <ipxe/uuid.h>
and <ipxe/tables.h> when building host utilities, and ensure that
FILE_LICENCE() (present in the imported EDK2 headers) expands to a
no-op.
Reported-by: Michael Tautschnig <mt@debian.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Commit 7d36a1b ("[build] Explicitly link efilink against -liberty")
introduced a dependency on libiberty to cope with old versions of
libbfd. This commit dates from 2008 and seems to apply only to what
are now extremely old versions of libbfd (prior to binutils 2.12).
There are systems (such as current Debian) which do not include
libiberty within the binutils packages. On such systems, our build
dependency on libiberty represents a pointless hurdle.
Remove the explicit dependency on libiberty, hoping that there are no
modern systems where this will cause a problem.
Suggested-by: Ben Hildred <42656e@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allow the UEFI platform firmware to provide drivers for unrecognised
devices, by exposing our own implementation of EFI_USB_IO_PROTOCOL.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Make the class ID a property of the USB driver (rather than a property
of the USB device ID), and allow USB drivers to specify a wildcard ID
for any of the three component IDs (class, subclass, or protocol).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Generate a score for each possible USB device configuration based on
the available driver support, and select the configuration with the
highest score. This will allow us to prefer ECM over RNDIS (for
devices which support both) and will allow us to meaningfully select a
configuration even when we have drivers available for all functions
(e.g. when exposing unused functions via EFI_USB_IO_PROTOCOL).
Signed-off-by: Michael Brown <mcb30@ipxe.org>