This commit introduces virtnet_free_virtqueues called on all virtqueue
error and shutdown paths. vpm_find_vqs no longer cleans up after itself
and instead expects virtnet_free_virtqueues to be always called to undo
its effect.
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
vpm_find_vqs incorrectly accepted the host-provided queue size with no
regard to iPXE's internal limitations. Virtio 1.0 makes it possible for
the driver to override the queue size to reduce memory requirements and
iPXE is a great use case for this feature.
Also remove the extra vq->vring.num assignment, which is already
handled in vring_init.
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The ISC Kea DHCP server transmits its DHCPOFFER as a unicast packet
with a broadcast IPv4 destination address (255.255.255.255). This
combination is currently rejected by iPXE.
Fix by explicitly accepting the local network broadcast address
(255.255.255.255) as a valid unicast destination address.
Reported-by: Roy Ledochowski <roy.ledochowski@hpe.com>
Tested-by: Roy Ledochowski <roy.ledochowski@hpe.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Updates:
- Nodnic: Support for arm cq doorbell via the UAR BAR
- Ensure hardware is quiescent when no interface is open - WinPE WA
- Support for clear interrupt via BAR
- Nodnic: Support for send TX doorbells via the UAR BAR
- Added ConnectX-5EX device
- Added ConnectX-5 device
Signed-off-by: Raed Salem <raeds@mellanox.com>
Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
EFI provides no clean way for device drivers to shut down in
preparation for handover to a booted operating system. The platform
firmware simply doesn't bother to call the drivers' Stop() methods.
Instead, drivers must register an EVT_SIGNAL_EXIT_BOOT_SERVICES event
to be signalled when ExitBootServices() is called, and clean up
without any reference to the EFI driver model.
Unfortunately, all timers silently stop working when ExitBootServices()
is called. Even more unfortunately, and for no discernible reason,
this happens before any EVT_SIGNAL_EXIT_BOOT_SERVICES events are
signalled. The net effect of this entertaining design choice is that
any timeout loops on the shutdown path (e.g. for gracefully closing
outstanding TCP connections) may wait indefinitely.
There is no way to report failure from currticks(), since the API
lazily assumes that the host system continues to travel through time
in the usual direction. Work around EFI's violation of this
assumption by falling back to a simple free-running monotonic counter.
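A minimal sketch of the fallback idea follows. It is not the actual
iPXE code: hw_ticks, boot_services_exited and sketch_currticks() are
hypothetical names used only for illustration.

```c
/* Hypothetical stand-ins for platform state; not real iPXE or EFI names */
unsigned long hw_ticks;          /* value read from a working timer */
int boot_services_exited;        /* set once timers can no longer be trusted */

static unsigned long last_ticks; /* last value handed to the caller */

/* Return a tick count that never goes backwards */
unsigned long sketch_currticks ( void ) {
    if ( boot_services_exited ) {
        /* Timers are dead: advance by one tick on every call */
        last_ticks++;
    } else if ( hw_ticks > last_ticks ) {
        last_ticks = hw_ticks;
    }
    return last_ticks;
}
```

With this shape, a timeout loop that polls the tick count is guaranteed
to terminate after a finite number of polls, even though the elapsed
"time" no longer corresponds to wall-clock time.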
Debugged-by: Maor Dickman <maord@mellanox.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
When searching for an UNDI ROM to match against a PCI device, search
in order of increasing ROM address (within the 128kB BIOS option ROM
area). This is likely (though not guaranteed) to match the order of
the original enumeration performed by the BIOS, which is in turn
likely to match the order of enumeration on the PCI bus.
Since we load at most one UNDI ROM, the net result is that we increase
our chances of loading the ROM corresponding to the selected PCI
device (rather than loading a ROM corresponding to a higher-numbered
PCI device with the same vendor and device IDs.)
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The "progress" macro can be used only from within the .prefix section.
At the point of calling relocate(), we are running in .text16 and so
the near call to print_message() will end up calling a random function
somewhere in .text16.
Interestingly, this problem has remained unnoticed for some time. It
is rare to build with DEBUG=libprefix. In the few cases that it has
been used during development, the randomly selected function in
.text16 seems to have been a harmless no-op with no visible
side-effects (beyond the unnoticed failure to print the "relocate"
progress message).
Fix by removing the futile attempt to print a progress message before
calling relocate().
Reported-by: Raed Salem <raeds@mellanox.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Fix the <NULL> driver name reported by "ifstat" when using the undipci
driver (due to the unnecessary extra device node inserted as a child
of the PCI device).
Remove the "UNDI-" prefix from device names since the driver name is
also now visible via "ifstat", and tidy up the device name to match
the format used by standard PCI devices.
The output from "ifstat" now resembles:
iPXE> ifstat
net0: 52:54:00:12:34:56 using undipci on 0000:00:03.0
iPXE> ifstat
net0: 52:54:00:12:34:56 using undionly on 0000:00:03.0
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The UNDI loader entry point is very likely to be called after POST,
when there is a high chance that the PMM-allocated image source area
and decompression area have been reused by something else.
In particular, using an iPXE .iso to test a separate iPXE ROM's UNDI
loader entry point in a qemu VM is likely to crash. SeaBIOS allocates
PMM blocks from close to the top of memory and so these blocks have a
high chance of colliding with the runtime addresses subsequently
chosen by the non-ROM iPXE by scanning the INT 15,e820 memory map.
The standard romprefix.S has no choice about relying on the
PMM-allocated image source area, since it has no other way to retrieve
its compressed payload.
In mromprefix.S, the image source area functions only as an optional
buffer used to avoid repeated reads from the (potentially slow)
expansion ROM BAR by the decompression code. We can therefore always
set %esi=0 when calling install_prealloc from the UNDI loader entry
point, and simply fall back to reading directly from the expansion ROM
BAR.
We can always set %edi=0 when calling install_prealloc from the UNDI
loader entry point. This will behave as though the decompression area
PMM allocation failed, and will therefore use INT 15,88 to find a
temporary decompression area somewhere close to 64MB. This is by no
means guaranteed to be safe from collisions, but it's probably safer
on balance than the PMM-allocated address.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allocate base memory (by decreasing the free base memory counter)
before calling the UNDI loader entry point, to minimise surprises for
the UNDI loader code.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The control and data interfaces may be connected to the same object.
Nullify the data interface before shutting down the control interface
to avoid potential infinite loops.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Commit 71560d1 ("[librm] Preserve FPU, MMX and SSE state across calls
to virt_call()") added FXSAVE and FXRSTOR instructions to iPXE. In
KVM virtual machines, these instructions execute fine as long as the
host CPU supports the "unrestricted_guest" feature (that is, it can
virtualize big real mode natively). On older host CPUs however, KVM
has to emulate big real mode, and it currently doesn't implement
FXSAVE emulation.
Upstream QEMU rebuilt iPXE at commit 0418631 ("[thunderx] Fix
compilation with older versions of gcc") which is a descendant of
commit 71560d1 (see above).
This was done in QEMU commit ffdc5a2 ("ipxe: update submodule from
4e03af8ec to 041863191"). The resultant binaries were bundled with
the QEMU v2.7.0 release; see QEMU commit c52125a ("ipxe: update
prebuilt binaries").
This distributed the iPXE workaround for the Tivoli VMM bug to a
number of KVM users with old host CPUs, causing KVM emulation failures
(guest crashes) for them while netbooting.
Make the FXSAVE and FXRSTOR instructions conditional on a new feature
test macro called TIVOLI_VMM_WORKAROUND. Define the macro by default.
There is prior art for an assembly file including config/general.h:
see arch/x86/prefix/romprefix.S. Also, TIVOLI_VMM_WORKAROUND seems to
be a good fit for the "Obscure configuration options" section in
config/general.h.
Cc: Bandan Das <bsd@redhat.com>
Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Greg <rollenwiese@yahoo.com>
Cc: Michael Brown <mcb30@ipxe.org>
Cc: Michael Prokop <launchpad@michael-prokop.at>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Pickford <arch@netremedies.ca>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Ref: https://bugs.archlinux.org/task/50778
Ref: https://bugs.launchpad.net/qemu/+bug/1623276
Ref: https://bugzilla.proxmox.com/show_bug.cgi?id=1182
Ref: https://bugzilla.redhat.com/show_bug.cgi?id=1356762
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The initrd_addr_max field represents the highest byte address that may
be used to hold initrd images, and is therefore almost certainly not
aligned to a page boundary: a typical value might be 0x7fffffff.
Fix the address calculations to ensure that the initrd images are
always aligned to a page boundary.
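As an illustration of the arithmetic (a sketch only, assuming 4kB
pages and an image that fits below the limit; the function and macro
names are hypothetical):

```c
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096ULL /* assumed page size */

/* Highest page-aligned start address for an image of the given length */
uint64_t initrd_load_addr ( uint64_t addr_max, uint64_t len ) {
    /* addr_max is the highest usable byte, so the exclusive end
     * of the usable region is addr_max + 1.
     */
    uint64_t end = ( addr_max + 1 );

    /* Place the image as high as possible, then round down */
    return ( ( end - len ) & ~( SKETCH_PAGE_SIZE - 1 ) );
}
```

For example, with addr_max = 0x7fffffff and len = 0x1234, the unaligned
start would be 0x7fffedcc and the aligned start is 0x7fffe000.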
Reported-by: Sitsofe Wheeler <sitsofe@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
AppleNetBoot.h is not taken from the EDK2 codebase and so cannot be
imported using include/ipxe/efi/import.pl. Mark as a native iPXE
header (by changing the include guard) to avoid breaking the import
process.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allow certificates to be marked as having been added explicitly at run
time. Such certificates will not be discarded via the certificate
store cache discarder.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Enable IMAGE_PNG (but not IMAGE_PNM) by default, and drag in the
relevant objects only when image_pixbuf() is present in the binary.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Enable both IMAGE_DER and IMAGE_PEM by default, and drag in the
relevant objects only when image_asn1() is present in the binary.
This allows "imgverify" to transparently use either DER or PEM
signature files.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
As of commit b1caa48 ("[crypto] Support SHA-{224,384,512} in X.509
certificates"), the list of supported cryptographic algorithms is
controlled by config/crypto.h.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add PEM-encoded ASN.1 as an image format. We accept as PEM any image
containing a line starting with a "-----BEGIN" boundary marker.
We allow for PEM files containing multiple ASN.1 objects, such as a
certificate chain produced by concatenating individual certificate
files.
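The detection rule can be sketched as follows. This is an illustrative
standalone function, not the iPXE implementation; looks_like_pem() is
a hypothetical name.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Check whether any line of the image starts with "-----BEGIN" */
bool looks_like_pem ( const char *data, size_t len ) {
    static const char marker[] = "-----BEGIN";
    const size_t marker_len = ( sizeof ( marker ) - 1 );
    size_t offset = 0;
    const char *eol;

    while ( offset < len ) {
        /* Compare against the marker at the start of this line */
        if ( ( ( len - offset ) >= marker_len ) &&
             ( memcmp ( data + offset, marker, marker_len ) == 0 ) )
            return true;

        /* Move to the start of the next line, if any */
        eol = memchr ( data + offset, '\n', ( len - offset ) );
        if ( ! eol )
            break;
        offset = ( ( size_t ) ( eol - data ) + 1 );
    }
    return false;
}
```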
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add DER-encoded ASN.1 as an image format. There is no fixed signature
for DER files. We treat an image as DER if it comprises a single
valid SEQUENCE object covering the entire length of the image.
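A rough sketch of such a check is shown below. It examines only the
outermost type and length bytes and does not validate nested objects;
looks_like_der() is a hypothetical name and this is not the iPXE code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_ASN1_SEQUENCE 0x30

/* Check for a single SEQUENCE whose length covers the whole image */
bool looks_like_der ( const uint8_t *data, size_t len ) {
    size_t header_len;
    size_t body_len;
    size_t count;
    size_t i;

    if ( ( len < 2 ) || ( data[0] != SKETCH_ASN1_SEQUENCE ) )
        return false;

    if ( data[1] < 0x80 ) {
        /* Short form: length is held in a single byte */
        body_len = data[1];
        header_len = 2;
    } else {
        /* Long form: low seven bits give the number of length bytes.
         * A count of zero (indefinite length) is not valid DER.
         */
        count = ( data[1] & 0x7f );
        if ( ( count == 0 ) || ( count > sizeof ( body_len ) ) ||
             ( len < ( 2 + count ) ) )
            return false;
        body_len = 0;
        for ( i = 0 ; i < count ; i++ )
            body_len = ( ( body_len << 8 ) | data[ 2 + i ] );
        header_len = ( 2 + count );
    }

    /* The object must cover exactly the remainder of the image */
    return ( body_len == ( len - header_len ) );
}
```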
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allow code to create a partial ASN.1 cursor containing only the type
and length bytes, so that asn1_start() may be used to determine the
length of a large ASN.1 blob without first allocating memory to hold
the entire blob.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The Windows drivers for VMBus devices are enumerated using the
instance UUID rather than the channel number. Include the instance
UUID within the iPXE device name to allow an iPXE network device to be
more easily associated with the corresponding Windows network device
when debugging.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Select the IPv6 source address and corresponding router (if any) using
a very simplified version of the algorithm from RFC6724:
- Ignore any source address that has a smaller scope than the
destination address. For example, do not use a link-local source
address when sending to a global destination address.
- If we have a source address which is on the same link as the
destination address, then use that source address.
- If we are left with multiple possible source addresses, then choose
the address with the smallest scope. For example, if we are sending
to a site-local destination address and we have both a global source
address and a site-local source address, then use the site-local
source address.
- If we are still left with multiple possible source addresses, then
choose the address with the longest matching prefix.
For the purposes of this algorithm, we treat RFC4193 Unique Local
Addresses as having organisation-local scope. Since we use only
link-local scope for our multicast transmissions, this approximation
should remain valid in all practical situations.
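The selection rules above can be sketched as a single pass over the
candidate source addresses. This is an illustration under simplifying
assumptions, not the iPXE implementation: the structure, the scope
values and select_source() are all hypothetical, and the on-link and
prefix-match properties are assumed to have been computed already.

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative scope values; larger means wider scope */
enum sketch_scope {
    SCOPE_LINK_LOCAL = 0x2,
    SCOPE_SITE_LOCAL = 0x5,
    SCOPE_ORG_LOCAL = 0x8,
    SCOPE_GLOBAL = 0xe,
};

struct src_candidate {
    unsigned int scope;     /* scope of this source address */
    bool on_link;           /* on the same link as the destination */
    unsigned int match_len; /* prefix bits in common with destination */
};

/* Choose a source address, or NULL if no candidate is usable */
const struct src_candidate *
select_source ( const struct src_candidate *list, size_t count,
                unsigned int dest_scope ) {
    const struct src_candidate *best = NULL;
    const struct src_candidate *c;
    size_t i;

    for ( i = 0 ; i < count ; i++ ) {
        c = &list[i];
        /* Ignore sources with a smaller scope than the destination */
        if ( c->scope < dest_scope )
            continue;
        if ( ! best ) {
            best = c;
            continue;
        }
        /* Prefer a source on the same link as the destination */
        if ( c->on_link != best->on_link ) {
            if ( c->on_link )
                best = c;
            continue;
        }
        /* Prefer the smallest remaining scope */
        if ( c->scope != best->scope ) {
            if ( c->scope < best->scope )
                best = c;
            continue;
        }
        /* Prefer the longest matching prefix */
        if ( c->match_len > best->match_len )
            best = c;
    }
    return best;
}
```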
Originally-implemented-by: Thomas Bächler <thomas@archlinux.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Use the IPv6 settings to construct the routing table, in a manner
analogous to the construction of the IPv4 routing table.
This allows for manual assignment of IPv6 addresses via e.g.
set net0/ip6 2001:ba8:0:1d4::6950:5845
set net0/len6 64
set net0/gateway6 fe80::226:bff:fedd:d3c0
The prefix length ("len6") may be omitted, in which case a default
prefix length of 64 will be assumed.
Multiple IPv6 addresses may be assigned manually by implicitly
creating child settings blocks. For example:
set net0/ip6 2001:ba8:0:1d4::6950:5845
set net0.ula/ip6 fda4:2496:e992::6950:5845
Signed-off-by: Michael Brown <mcb30@ipxe.org>
A reasonable user expectation is that ${net0/ip6} should show the
"highest-priority" of the IPv6 addresses, even when multiple IPv6
addresses are active. The expected order of priority is likely to be
manually-assigned addresses first, then stateful DHCPv6 addresses,
then SLAAC addresses, and lastly link-local addresses.
Using ${priority} to enforce an ordering is undesirable since that
would affect the priority assigned to each of the net<N> blocks as a
whole, so use the sibling ordering capability instead.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allow settings blocks to provide an explicit default ordering between
siblings, with lower precedence than the existing ${priority} setting.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Originally-implemented-by: Hannes Reinecke <hare@suse.de>
Originally-implemented-by: Marin Hannache <git@mareo.fr>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Originally-implemented-by: Hannes Reinecke <hare@suse.de>
Originally-implemented-by: Marin Hannache <git@mareo.fr>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Expose the IPv6 address (or prefix) as ${ip6}, the prefix length as
${len6}, and the router address as ${gateway6}.
Originally-implemented-by: Hannes Reinecke <hare@suse.de>
Originally-implemented-by: Marin Hannache <git@mareo.fr>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The settings scope ipv6_scope refers specifically to IPv6 settings
that have a corresponding DHCPv6 option. Rename to dhcpv6_scope to
more accurately reflect this purpose.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
We currently perform IPv6 stateless address autoconfiguration (SLAAC)
in response to any router advertisement with the relevant flags set.
This can result in the local IPv6 source address changing midway
through a TCP connection, since our connections bind only to a local
port number and do not store a local network address.
In addition, this behaviour for SLAAC is inconsistent with that for
DHCPv4 and stateful DHCPv6, both of which will be performed only as a
result of an explicit autoconfiguration action (e.g. via the default
autoboot sequence, or the "ifconf" command).
Fix by ignoring router advertisements arriving outside the context of
an ongoing autoconfiguration attempt.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Commit db34436 ("[intel] Strip spurious VLAN tags received by virtual
function NICs") accidentally introduced two copies of the
intel[x]vf_mbox_queues() function. Remove the unintended copy.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The physical function may be configured to transparently insert a VLAN
tag into all transmitted packets. Unfortunately, it does not
equivalently strip this same VLAN tag from all received packets. This
behaviour may be observed in some Amazon EC2 instances with Enhanced
Networking enabled: transmissions work as expected but all packets
received by iPXE appear to have a spurious VLAN tag.
We can configure the receive queue to strip VLAN tags via the
RXDCTL.VME bit. We need to find out from the PF driver whether or not
we should do so.
There exists a "get queue configuration" mailbox message which
contains a field labelled IXGBE_VF_TRANS_VLAN in the Linux driver.
A comment in the Linux PF driver describes this field as "notify VF of
need for VLAN tag stripping, and correct queue". It will be filled
with a non-zero value if the PF is enforcing the use of a single VLAN
tag. It will also be filled with a non-zero value if the PF is using
multiple traffic classes.
The Linux VF driver seems to treat this field as being simply the
number of traffic classes, and gives it no VLAN-related
interpretation. The Linux VF driver instead handles the VLAN tag
stripping by simply assuming that any unrecognised VLAN tag ought to
be silently dropped.
We choose to strip and ignore the VLAN tag if the IXGBE_VF_TRANS_VLAN
field has a non-zero value.
Reported-by: Leonid Vasetsky <leonidv@velostrata.com>
Tested-by: Leonid Vasetsky <leonidv@velostrata.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
In a busy network (such as a public cloud), IPv4 addresses may be
recycled rapidly. When this happens, unidirectional traffic (such as
UDP syslog) will succeed, but bidirectional traffic (such as TCP
connections) may fail due to stale ARP cache entries on other nodes.
The remote ARP cache expiry timeout is likely to exceed iPXE's
connection timeout, meaning that boot attempts can fail before the
problem is automatically resolved.
Fix by sending gratuitous ARPs whenever an IPv4 address is changed, to
attempt to update stale remote ARP cache entries. Note that this is
not a guaranteed fix, since ARP is an unreliable protocol.
We avoid sending gratuitous ARPs unconditionally, since otherwise any
unrelated settings change (e.g. "set dns 192.168.0.1") would cause
unexpected gratuitous ARPs to be sent.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The ACPI power off sequence may not take effect immediately. Delay
for one second, to eliminate potentially confusing log messages such
as "Could not power off: Error 0x43902001 (http://ipx".
Reported-by: Leonid Vasetsky <leonidv@velostrata.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
On some platforms (observed in a small subset of Microsoft Azure
(Hyper-V) virtual machines), the RTC appears to be incapable of
generating an interrupt via the legacy PIC. The RTC status registers
show that a periodic interrupt has been asserted, but the PIC IRR
shows that IRQ8 remains inactive.
On such systems, iPXE will currently freeze during the "iPXE
initialising devices..." message.
Work around this problem by checking that RTC interrupts are being
raised before returning from rtc_entropy_enable(). If no interrupt is
seen within 100ms, then we assume that the RTC interrupt mechanism is
broken. In these circumstances, iPXE will continue to initialise but
any subsequent attempt to generate entropy will fail. In particular,
HTTPS connections will fail with an error indicating that no entropy
is available.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
In edk2, there are several drivers that associate HII forms (and
corresponding config access protocol instances) with each individual
network device. (In this context, "network device" means the EFI
handle on which the SNP protocol is installed, and on which the device
path ending with the MAC() node is installed also.) Such edk2 drivers
are, for example: Ip4Dxe, HttpBootDxe, VlanConfigDxe.
In UEFI, any given handle can carry at most one instance of a specific
protocol (see e.g. the specification of the InstallProtocolInterface()
boot service). This implies that the class of drivers mentioned above
can't install their EFI_HII_CONFIG_ACCESS_PROTOCOL instances on the
SNP handle directly -- they would conflict with each other.
Accordingly, each of those edk2 drivers creates a "private" child
handle under the SNP handle, and installs its config access protocol
(and corresponding HII package list) on its child handle.
The device path for the child handle is traditionally derived by
appending a Hardware Vendor Device Path node after the MAC() node.
The VenHw() nodes in question consist of a GUID (by definition), and
no trailing data (by choice). The purpose of these VenHw() nodes is
only that all the child nodes can be uniquely identified by device
path.
At the moment iPXE does not follow this pattern. It doesn't run into
a conflict when it installs its EFI_HII_CONFIG_ACCESS_PROTOCOL
directly on the SNP handle, but that's only because iPXE is the sole
driver not following the pattern. This behavior seems risky (one
might call it a "latent bug"); better align iPXE with the edk2 custom.
Cc: Michael Brown <mcb30@ipxe.org>
Cc: Gary Lin <glin@suse.com>
Cc: Ladi Prosek <lprosek@redhat.com>
Ref: http://thread.gmane.org/gmane.comp.bios.edk2.devel/13494/focus=13532
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Ladi Prosek <lprosek@redhat.com>
Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
As with assertions, profiling is enabled for objects built with any
debug level (including an explicit debug level of zero).
Allow profiling to be globally enabled or disabled by adding PROFILE=1
or PROFILE=0 respectively to the build command line.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Assertions are enabled for objects built with any debug level
(including an explicit debug level of zero). It is sometimes useful
to be able to enable assertions across all objects; this currently
requires manually hacking include/assert.h.
Allow assertions to be globally enabled by adding ASSERT=1 to the
build command line. For example:
make bin/8086100e.mrom ASSERT=1
Similarly, allow assertions to be globally disabled by adding ASSERT=0
to the build command line. If no ASSERT=... is specified on the
build command line, then only objects mentioned in DEBUG=... will have
assertions enabled (as is currently the case).
Note that globally enabling assertions imposes a relatively heavy
runtime penalty, primarily due to the various sanity checks performed
by list_add(), list_for_each_entry(), etc.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Extend the DEBUG=... syntax to allow debug messages to be compiled in
but disabled by default. For example:
make bin/undionly.kpxe DEBUG=netdevice:3:1
would compile in the messages as for DEBUG=netdevice:3, but would set
the debug level mask so that only the DEBUG=netdevice:1 messages would
be displayed.
This allows for external code to selectively enable the additional
debug messages at runtime, without being overwhelmed by unwanted
initial noise. For example, a developer of a new protocol may want to
temporarily enable tracing of all packets received: this can be done
by building with DEBUG=netdevice:3:1 and using
// temporarily enable per-packet messages
DBG_ENABLE_OBJECT ( netdevice, DBGLVL_EXTRA );
...
// disable per-packet messages
DBG_DISABLE_OBJECT ( netdevice, DBGLVL_EXTRA );
Note that unlike the usual DBG_ENABLE() and DBG_DISABLE() macros,
DBG_ENABLE_OBJECT() and DBG_DISABLE_OBJECT() will not be removed via
dead code elimination if debugging is disabled in the specified
object. In particular, this means that using either of these macros
will always result in a symbol reference to the specified object.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The DBG_ENABLE() and DBG_DISABLE() macros currently affect the debug
level of all objects that were built with debugging enabled. This is
undesirable, since it is common to use different debug levels in each
object.
Make the debug level mask a per-object variable. DBG_ENABLE() and
DBG_DISABLE() now control only the debug level for the containing
object (which is consistent with the intended usage across the
existing codebase). DBG_ENABLE_OBJECT() and DBG_DISABLE_OBJECT() may
be used to control the debug level for a specified object. For
example:
// Enable DBG() messages from tcpip.c
DBG_ENABLE_OBJECT ( tcpip, DBGLVL_LOG );
Note that the existence of debug messages continues to be gated by the
DEBUG=... list specified on the build command line. If an object was
built without the relevant debug level, then DBG_ENABLE_OBJECT() will
have no effect on that object at runtime (other than to explicitly
drag in the object via a symbol reference).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
A redirection failure is fatal, but provides no opportunity for the
caller of xfer_[v]redirect() to report the failure since the interface
will already have been disconnected. Fix by sending intf_close() from
within the default xfer_vredirect() handler.
Debugged-by: Robin Smidsrød <robin@smidsrod.no>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The vendor class identifier strings in DHCP_ARCH_VENDOR_CLASS_ID are
out of sync with the (correct) client architecture values in
DHCP_ARCH_CLIENT_ARCHITECTURE.
Fix by removing all definitions of DHCP_ARCH_VENDOR_CLASS_ID, and
instead generating the vendor class identifier string automatically
based on DHCP_ARCH_CLIENT_ARCHITECTURE and DHCP_ARCH_CLIENT_NDI.
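For illustration, the string has the form
"PXEClient:Arch:xxxxx:UNDI:yyyzzz" as described in the PXE
specification. The sketch below builds it at run time with snprintf();
this is an assumption made for the sake of a self-contained example
(the commit does not say how the string is generated), and
format_vendor_class() is a hypothetical name.

```c
#include <stdio.h>

/* Build the vendor class identifier from architecture and NDI version */
int format_vendor_class ( char *buf, size_t len, unsigned int arch,
                          unsigned int ndi_major, unsigned int ndi_minor ) {
    return snprintf ( buf, len, "PXEClient:Arch:%05u:UNDI:%03u%03u",
                      arch, ndi_major, ndi_minor );
}
```

Deriving the string from the same values used for the numeric options
means that the two can no longer drift out of sync.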
Signed-off-by: Michael Brown <mcb30@ipxe.org>
RFC3315 defines DHCPv6 option 16 (vendor class identifier) but does
not define any direct relationship with the roughly equivalent DHCPv4
option 60.
The PXE specification predates IPv6, and the UEFI specification is
expectedly vague on the subject. Examination of the reference EDK2
codebase suggests that the DHCPv6 vendor class identifier will be
formatted in accordance with RFC3315, using a single vendor-class-data
item in which the opaque-data field is the string as would appear in
DHCPv4 option 60.
RFC3315 requires the vendor class identifier to specify an IANA
enterprise number, as a way of disambiguating the vendor-class-data
namespace. The EDK2 code uses the value 343, described as:
// TODO: IANA TBD: temporarily using Intel's
Since this "TODO" has been present since at least 2010, it is probably
safe to assume that it has now become a de facto standard.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
RFC5970 defines DHCPv6 options 61 (client system architecture type)
and 62 (client network interface identifier), with contents equivalent
to DHCPv4 options 93 and 94 respectively.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
DHCPv4 and DHCPv6 share some values in common for the architecture-
specific options (such as the client system architecture type), but
use different encapsulations: DHCPv4 has a single byte for the option
length while DHCPv6 has a 16-bit field for the option length.
Move the containing DHCP_OPTION() and related wrappers from the
individual dhcp_arch.h files to dhcp.c, thus allowing for the
architecture-specific values to be reused in dhcpv6.c.
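The difference in encapsulation can be sketched as below. The encoder
functions are hypothetical and exist only to show the two header
layouts; the caller is assumed to provide a large enough buffer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* DHCPv4: one-byte tag followed by one-byte length */
size_t dhcpv4_encode ( uint8_t *buf, uint8_t tag,
                       const void *data, uint8_t len ) {
    buf[0] = tag;
    buf[1] = len;
    memcpy ( &buf[2], data, len );
    return ( 2 + ( size_t ) len );
}

/* DHCPv6: 16-bit code followed by 16-bit length, network byte order */
size_t dhcpv6_encode ( uint8_t *buf, uint16_t code,
                       const void *data, uint16_t len ) {
    buf[0] = ( uint8_t ) ( code >> 8 );
    buf[1] = ( uint8_t ) ( code & 0xff );
    buf[2] = ( uint8_t ) ( len >> 8 );
    buf[3] = ( uint8_t ) ( len & 0xff );
    memcpy ( &buf[4], data, len );
    return ( 4 + ( size_t ) len );
}
```

The same two-byte architecture value can then be wrapped as DHCPv4
option 93 or as DHCPv6 option 61 without duplicating the value itself.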
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some BIOSes (observed with an HP Gen9) seem to spuriously enable
interrupts at the PIC. This causes problems with NBPs such as GRUB
which use the UNDI API (thereby enabling interrupts on the NIC)
without first hooking an interrupt service routine. In this
situation, the interrupt will end up being handled by the default BIOS
ISR, which will typically just send an EOI and return. Since nothing
in this handler causes the NIC to deassert the interrupt, this will
result in an interrupt storm.
Entertainingly, some BIOSes are immune to this problem because the
default ISR sends the EOI only to the slave PIC; this effectively
disables the interrupt.
Work around this problem by disabling the interrupt on the PIC before
invoking the PXE NBP. An NBP that expects to make use of interrupts
will need to be configuring the PIC anyway, so it is probably safe to
assume that it will explicitly reenable the interrupt.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
There seems to be no reason for the sti/cli pair used around each call
to INT 10. Remove these instructions, so that printing debug messages
from within an ISR does not temporarily reenable interrupts.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The HII IFR structures are allocated via realloc() rather than
zalloc(), and so are not automatically zeroed. This results in the
presence of uninitialised and invalid data, causing crashes elsewhere
in the UEFI firmware.
Fix by explicitly zeroing the newly allocated portion of any IFR
structure in efi_ifr_op().
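The general pattern is sketched below using the standard C library.
grow_zeroed() is a hypothetical helper, not the code in efi_ifr_op().

```c
#include <stdlib.h>
#include <string.h>

/* Grow a buffer via realloc(), zeroing only the newly added portion */
void * grow_zeroed ( void *old, size_t old_len, size_t new_len ) {
    void *buf = realloc ( old, new_len );

    /* realloc() preserves the old contents but leaves the new tail
     * uninitialised, so clear it explicitly.
     */
    if ( buf && ( new_len > old_len ) )
        memset ( ( ( char * ) buf + old_len ), 0, ( new_len - old_len ) );
    return buf;
}
```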
Debugged-by: Laszlo Ersek <lersek@redhat.com>
Debugged-by: Gary Lin <glin@suse.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The SNP device path includes the network device's MAC address within
the MAC_ADDR_DEVICE_PATH.MacAddress field. We check that the
link-layer address will fit within this field, and then perform the
copy using the length of the destination buffer.
At 32 bytes, the MacAddress field is actually larger than the current
maximum iPXE link-layer address. The copy therefore overflows the
source buffer, resulting in trailing garbage bytes being appended to
the device path's MacAddress. This is invisible in debug messages,
since the DevicePathToText protocol will render only the length
implied by the interface type.
Fix by copying only the actual length of the link-layer address (which
we have already verified will not overflow the destination buffer).
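A sketch of the corrected copy follows. The names are hypothetical,
and zeroing the unused remainder of the field is an assumption made
here for clarity rather than something stated above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAC_FIELD_LEN 32 /* size of the destination field */

/* Copy a link-layer address into a fixed-size MAC address field */
int fill_mac_field ( uint8_t *field, const uint8_t *ll_addr,
                     size_t ll_addr_len ) {
    /* Refuse addresses that would overflow the destination */
    if ( ll_addr_len > MAC_FIELD_LEN )
        return -1;

    /* Copy only the source length; never read MAC_FIELD_LEN bytes
     * from a shorter source buffer.
     */
    memset ( field, 0, MAC_FIELD_LEN );
    memcpy ( field, ll_addr, ll_addr_len );
    return 0;
}
```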
Debugged-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
iPXE debug logging doesn't support %u. This commit replaces it with
%d in virtio-pci debug format strings.
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some of the regions may end up being unmapped, either because they are
optional or because the attempt to map them has failed. Region types
starting at 0 didn't make it easy to test for this condition.
This commit bumps all valid region types up by 1 with 0 having the
implicit 'unmapped' meaning.
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Provide a mechanism to allow an arbitrary adjustment to be applied to
all subsequent calls to time().
Note that the underlying clock source (e.g. the RTC clock) will not be
changed; only the time as reported within iPXE will be affected.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
ARM64 has a weaker memory order model than x86. The missing memory
barrier caused phy initialization notification to be delayed beyond
the link-wait timeout (15 secs).
Signed-off-by: Leendert van Doorn <leendert@paramecium.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
In some circumstances, intermediate devices may lose state in a way
that temporarily prevents the successful delivery of packets from a
TCP peer. For example, a firewall may drop a NAT forwarding table
entry.
Since iPXE spends most of its time downloading files (and hence purely
receiving data, sending only TCP ACKs), this can easily happen in a
situation in which there is no reason for iPXE's TCP stack to generate
any retransmissions. The temporary loss of connectivity can therefore
effectively become permanent.
Work around this problem by sending TCP keepalives after a period of
inactivity on an established connection.
TCP keepalives usually send a single garbage byte in sequence number
space that has already been ACKed by the peer. Since we do not need
to elicit a response from the peer, we instead send pure ACKs (with no
garbage data) in order to keep the transmit code path simple.
Originally-implemented-by: Ladi Prosek <lprosek@redhat.com>
Debugged-by: Ladi Prosek <lprosek@redhat.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Extend the 16-bit PCI bus:dev.fn address to a 32-bit seg:bus:dev.fn
address, assuming a segment value of zero in contexts where multiple
segments are unsupported by the underlying data structures (e.g. in
the iBFT or BOFM tables).
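One plausible packing of the 32-bit address is sketched below. The
macro names and the exact bit layout are illustrative assumptions; the
layout is chosen so that a segment of zero yields the same value as
the legacy 16-bit bus:dev.fn encoding.

```c
#include <stdint.h>

/* Pack segment (16 bits), bus (8), device (5) and function (3) */
#define SKETCH_BUSDEVFN( seg, bus, dev, fn )            \
    ( ( ( uint32_t ) (seg) << 16 ) |                    \
      ( ( uint32_t ) (bus) << 8 ) |                     \
      ( ( uint32_t ) (dev) << 3 ) |                     \
      ( ( uint32_t ) (fn) << 0 ) )

/* Extract the individual fields again */
#define SKETCH_SEG( busdevfn ) ( ( (busdevfn) >> 16 ) & 0xffff )
#define SKETCH_BUS( busdevfn ) ( ( (busdevfn) >> 8 ) & 0xff )
#define SKETCH_DEV( busdevfn ) ( ( (busdevfn) >> 3 ) & 0x1f )
#define SKETCH_FN( busdevfn )  ( ( (busdevfn) >> 0 ) & 0x07 )
```

For example, device 0000:00:03.0 packs to 0x00000018, which is
identical to its legacy 16-bit representation.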
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The non-cryptographic RNG implemented by random() has the property
that a seed value of zero will result in a generated sequence of
all-zero values. This situation can arise if currticks() returns zero
at start of day.
Work around this problem by falling back to a fixed non-zero seed if
necessary.
This has no effect on the separate DRBG used by cryptographic code.
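The failure mode and the fallback can be sketched as follows. This uses
a xorshift generator as a stand-in (it shares the property that a zero
state stays zero forever); it is not the generator used by random(),
and FALLBACK_SEED, rng_seed() and rng_next() are hypothetical names.

```c
#include <stdint.h>

/* Hypothetical fixed non-zero seed used when the tick count is zero */
#define FALLBACK_SEED 0x2545f491u

static uint32_t rng_state;

/* Seed from a tick count, falling back to a fixed non-zero value */
static void rng_seed ( uint32_t ticks ) {
	rng_state = ( ticks ? ticks : FALLBACK_SEED );
}

/* Stand-in generator: a zero state produces only zeros */
static uint32_t rng_next ( void ) {
	rng_state ^= ( rng_state << 13 );
	rng_state ^= ( rng_state >> 17 );
	rng_state ^= ( rng_state << 5 );
	return rng_state;
}
```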
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Fix build error with perl >= 5.23.2:
Can't redeclare "my" in "my" at ./util/parserom.pl line 160

Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Robin Smidsrød <robin@smidsrod.no>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Mac OS X uses non-standard EFI protocols to obtain the DHCP packets
from the UEFI firmware.
Originally-implemented-by: Michael Kuron <m.kuron@gmx.de>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
There has been a longstanding disagreement between RFC4578 and the
IANA "Processor Architecture Types" registry. RFC4578 section 2.1
defines type 7 as "EFI BC" and type 9 as "EFI x86-64"; the IANA
registry quotes RFC4578 as its source but has these values erroneously
swapped. The EDK2 codebase uses the IANA values.
As of March 2016, RFC4578 has been modified by an erratum to match the
values as recorded in the IANA registry.
Fix our definitions to match the consensus values.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some UEFI keyboard drivers are blissfully unaware of the existence of
either Ctrl key, and will report "Ctrl-<key>" as just "<key>". This
breaks substantial portions of the iPXE user interface.
Work around these broken UEFI drivers by allowing "ESC <key>" to be
used as a substitute for "Ctrl-<key>".
Tested-by: Dreamcat4 <dreamcat4@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some HTTP/2 servers send the header "Connection: upgrade, close". This
currently causes iPXE to fail due to the unrecognised "upgrade" token.
Fix by ignoring any unrecognised tokens in the "Connection" header.
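A rough sketch of tolerant token parsing, assuming a comma-separated
header value. The flag names and parse_connection() are hypothetical
and do not reflect the actual iPXE HTTP code.

```c
#include <stddef.h>
#include <string.h>
#include <strings.h>

/* Hypothetical flags for the tokens we understand */
#define CONN_CLOSE	0x01
#define CONN_KEEPALIVE	0x02

/* Parse a "Connection" header value, ignoring unrecognised tokens */
static unsigned int parse_connection ( const char *value ) {
	unsigned int flags = 0;
	const char *token = value;
	size_t len;

	while ( *token ) {
		/* Skip separators and whitespace, then measure the token */
		token += strspn ( token, ", \t" );
		len = strcspn ( token, ", \t" );
		if ( ( len == 5 ) &&
		     ( strncasecmp ( token, "close", len ) == 0 ) ) {
			flags |= CONN_CLOSE;
		} else if ( ( len == 10 ) &&
			    ( strncasecmp ( token, "keep-alive",
					    len ) == 0 ) ) {
			flags |= CONN_KEEPALIVE;
		}
		/* Any other token (e.g. "upgrade") is silently skipped */
		token += len;
	}
	return flags;
}
```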
Reported-by: Ján ONDREJ (SAL) <ondrejj@salstar.sk>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
This backport is from Linux kernel upstream commit 83d6f1f ("ath9k:
fix buffer overrun for ar9287").
Signed-off-by: Christian Hesse <mail@eworm.de>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The raw cycle counter at PMCCNTR_EL0 works in qemu but seems to always
read as zero on physical hardware (tested on Juno r1 and Cavium
ThunderX), even after ensuring that PMCR_EL0.E and PMCNTENSET_EL0.C
are both enabled.
Use CNTVCT_EL0 instead; this seems to count at a lower resolution
(tens of CPU cycles), but is usable for profiling.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The UEFI specification requires the EFI_SIMPLE_NETWORK_PROTOCOL
GetStatus() method to set TxBuf to NULL if there are no transmit
buffers to recycle.
Some implementations (observed with Lan9118Dxe in EDK2) fill in TxBuf
only when there is a transmit buffer to recycle, which leads to large
numbers of "spurious TX completion" errors.
Work around this problem by initialising TxBuf to NULL before calling
the GetStatus() method.
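The workaround can be illustrated with a mock in place of the real
protocol. Here mock_get_status() is a hypothetical stand-in for a
non-compliant GetStatus() that writes its output only when a buffer is
available; it is not the UEFI API.

```c
#include <stddef.h>

/* Stand-in for a non-compliant GetStatus(): leaves *txbuf untouched
 * when there is nothing to recycle.
 */
static int mock_get_status ( void *completed, void **txbuf ) {
	if ( completed )
		*txbuf = completed;
	return 0;
}

/* Poll for a completed transmit buffer */
static void * poll_tx ( void *completed ) {
	void *txbuf;

	/* Initialise first, so that an untouched value reads as
	 * "no buffer to recycle" rather than as stack garbage.
	 */
	txbuf = NULL;
	mock_get_status ( completed, &txbuf );
	return txbuf;
}
```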
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Do not assume that an architecture-specific optimised memcpy() will
have the same properties as generic_memcpy() in terms of handling
overlapping regions.
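One way to avoid the assumption is for the overlap-safe copy to choose
its own copy direction rather than delegating to memcpy(). The
following is a sketch only; sketch_memmove() is a hypothetical name and
is not the iPXE implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* Overlap-safe copy that does not rely on any property of memcpy() */
static void * sketch_memmove ( void *dest, const void *src, size_t len ) {
	unsigned char *d = dest;
	const unsigned char *s = src;

	if ( ( uintptr_t ) d < ( uintptr_t ) s ) {
		/* Destination below source: copy forwards */
		while ( len-- )
			*(d++) = *(s++);
	} else if ( ( uintptr_t ) d > ( uintptr_t ) s ) {
		/* Destination above source: copy backwards */
		d += len;
		s += len;
		while ( len-- )
			*(--d) = *(--s);
	}
	return dest;
}
```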
Signed-off-by: Michael Brown <mcb30@ipxe.org>
When building for 64-bit ARM, some symbol references may be resolved
via an "adrp" instruction (to obtain the start of the 4kB page
containing the symbol) and a separate 12-bit offset. For example
(taken from the GNU assembler documentation):
    adrp x0, foo
    ldr  x0, [x0, #:lo12:foo]
We occasionally refer to symbols defined via mechanisms that are not
directly visible to gcc. For example:
    extern char some_magic_symbol[];
    __asm__ ( ".equ some_magic_symbol, some_magic_expression" );
The subsequent use of the ":lo12:" prefix on such magically-defined
symbols triggers an assertion failure in the assembler.
This problem seems to affect only "private_key_len" in the current
codebase. Fix by storing this value as static data; this avoids the
need to provide the value as a literal within the instruction stream,
and so avoids the problematic use of the ":lo12:" prefix.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
We currently use the EFI_CPU_ARCH_PROTOCOL's GetTimerValue() method to
generate the currticks() timer, calibrated against a 1ms delay from
the boot services Stall() method.
This does not work on ARM platforms, where GetTimerValue() is an empty
stub which just returns EFI_UNSUPPORTED.
Fix by instead creating a periodic timer event, and using this event
to increment a current tick counter.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Require architecture-specific code to make a deliberate choice to use
the unoptimised generic_tcpip_continue_chksum() function, if there is
no optimised version available.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The dependency on zlib seems to have been introduced in commit 3dd7ce1
("[efi] Allow building with non-system libbfd") as an indirect
requirement of either libbfd or libiberty when building on Mac OS X.
Since we no longer use either of these libraries, remove the
unnecessary link against zlib.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Parse the intermediate ELF file directly instead of using libbfd, in
order to allow for cross-compiled ELF objects.
As a side bonus, this eliminates libbfd as a build requirement.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The IBM Tivoli Provisioning Manager for OS Deployment (also known as
TPMfOSD, Rembo-ia32, or Rembo Auto-Deploy) has a serious bug in some
older versions (observed with v5.1.1.0, apparently fixed by v7.1.1.0)
which can lead to arbitrary data corruption.
As mentioned in commit 87723a0 ("[libflat] Test A20 gate without
switching to flat real mode"), Tivoli's NBP sets up a VMM and makes
calls to the PXE stack in VM86 mode. This appears to be some kind of
attempt to run PXE API calls inside a sandbox. The VMM is fairly
sophisticated: for example, it handles our attempts to switch into
protected mode and patches our GDT so that our protected-mode code
runs in ring 1 instead of ring 0. However, it neglects to apply any
memory protections. In particular, it does not enable paging and
leaves us with 4GB segment limits. We can therefore trivially break
out of the sandbox by simply overwriting the GDT (or by modifying any
of Tivoli's VMM code or data structures).
When we attempt to execute privileged instructions (such as "lidt"),
the CPU raises an exception and control is passed to the Tivoli VMM.
This may result in a call to Tivoli's memcpy() function.
Tivoli's memcpy() function includes optimisations which use the SSE
registers %xmm0-%xmm3 to speed up aligned memory copies.
Unfortunately, the Tivoli VMM's exception handler does not save or
restore %xmm0-%xmm3. The net effect of this bug in the Tivoli VMM is
that any privileged instruction (such as "lidt") issued by iPXE may
result in unexpected corruption of the %xmm0-%xmm3 registers.
Even more unfortunately, this problem affects the code path taken in
response to a hardware interrupt from the NIC, since that code path
will call PXENV_UNDI_ISR. The net effect therefore becomes that any
NIC hardware interrupt (e.g. due to a received packet) may result in
unexpected corruption of the %xmm0-%xmm3 registers.
If a packet arrives while Tivoli is in the middle of using its
memcpy() function, then the unexpected corruption of the %xmm0-%xmm3
registers will result in unexpected corruption in the destination
buffer. The net effect therefore becomes that any received packet may
result in a 16-byte block of corruption somewhere in any data that
Tivoli copied using its memcpy() function.
We can work around this bug in the Tivoli VMM by saving and restoring
the %xmm0-%xmm3 registers across calls to virt_call(). For this to be
effective, we need to save the registers before attempting to execute
any privileged instructions, and ensure that we attempt no further
privileged instructions after restoring the registers.
This is less simple than it may sound. We can use the "movups"
instruction to save and restore individual registers, but this will
itself generate an undefined opcode exception if SSE is not currently
enabled according to the flags in %cr0 and %cr4. We can't access %cr0
or %cr4 before attempting the "movups" instruction, because accessing
a control register is itself a privileged instruction (which may
therefore trigger corruption of the registers that we're trying to
save).
The best solution seems to be to use the "fxsave" and "fxrstor"
instructions. If SSE is not enabled, then these instructions may fail
to save and restore the SSE register contents, but will not generate
an undefined opcode exception. (If SSE is not enabled, then we don't
really care about preserving the SSE register contents anyway.)
The use of "fxsave" and "fxrstor" introduces an implicit assumption
that the CPU supports SSE instructions (even though we make no
assumption about whether or not SSE is currently enabled). SSE was
introduced in 1999 with the Pentium III (and added by AMD in 2001),
and is an architectural requirement for x86_64. Experimentation with
current versions of gcc suggests that it may generate SSE instructions
even when using "-m32", unless an explicit "-march=i386" or "-mno-sse"
is used to inhibit this. It therefore seems reasonable to assume that
SSE will be supported on any hardware that might realistically be used
with new iPXE builds.
As a side benefit of this change, the MMX register %mm0 will now be
preserved across virt_call() even in an i386 build of iPXE using a
driver that requires readq()/writeq(), and the SSE registers
%xmm0-%xmm5 will now be preserved across virt_call() even in an x86_64
build of iPXE using the Hyper-V netvsc driver.
Experimentation suggests that this change adds around 10% to the
number of cycles required for a do-nothing virt_call(), most of which
are due to the extra bytes copied using "rep movsb". Since the number
of bytes copied is a compile-time constant local to librm.S, we could
potentially reduce this impact by ensuring that we always copy a whole
number of dwords and so can use "rep movsl" instead of "rep movsb".
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Commit 86f96a4 ("[tg3] Remove x86-specific inline assembly")
introduced a regression in _tg3_flag() in 64-bit builds, since any
flags in the upper 32 bits of a 64-bit unsigned long would be
discarded when truncating to a 32-bit int.
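The class of bug can be reproduced in isolation. This is a sketch
rather than the tg3 code: the function names are hypothetical, and
uint64_t is used so that the example behaves the same wherever
unsigned long happens to be 32 bits wide. The narrowing conversion is
implementation-defined in C; the truncation shown is what gcc and
clang do.

```c
#include <stdint.h>

/* Buggy pattern: the masked 64-bit value is narrowed to int, so any
 * flag in bits 32-63 is lost and the result reads as "not set".
 */
static int flag_truncated ( uint64_t word, unsigned int bit ) {
	return ( int ) ( word & ( 1ULL << bit ) );
}

/* Fixed pattern: reduce to a boolean before narrowing */
static int flag_tested ( uint64_t word, unsigned int bit ) {
	return ( ( word & ( 1ULL << bit ) ) != 0 );
}
```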
Debugged-by: Shane Thompson <shane.thompson@aeontech.com.au>
Tested-by: Shane Thompson <shane.thompson@aeontech.com.au>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some PXE NBPs are known to make PXE API calls with very little space
available on the real-mode stack. For example, the Rembo-ia32 NBP
from some versions of IBM's Tivoli Provisioning Manager for Operating
System Deployment (TPMfOSD) will issue calls with the real-mode stack
placed at 0000:03d2; this is at the end of the interrupt vector table
and leaves only 498 bytes of stack space available before overwriting
the hardware IRQ vectors. This limits the amount of state that we can
preserve before transitioning to protected mode.
Work around these challenging conditions by preserving everything
other than the initial register dump in a temporary static buffer
within our real-mode data segment, and copying the contents of this
buffer to the protected-mode stack.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Return success (rather than failure) after an image format has been
correctly identified.
This has no practical effect, since the return value from
image_probe() is deliberately never used, but avoids a somewhat
surprising and misleading "format not recognised" error message when
debugging is enabled.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
On some architectures (such as ARM), gcc will insert implicit calls to
memset(). Handle these using the same mechanism as for the implicit
calls to memcpy() used by x86.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add a build configuration option NET_PROTO_LACP to control whether or
not LACP support is included for Ethernet devices.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
This commit makes virtio-net support devices with VEN 0x1af4 and DEV
0x1041, which is how non-transitional (modern-only) virtio-net devices
are exposed on the PCI bus.
Transitional devices supporting both the old 0.9.5 and new 1.0 version
of the virtio spec are driven using the new protocol. Legacy devices
are driven using the old protocol, same as before this commit.
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
This commit adds support for driving virtio 1.0 PCI devices. In
addition to various helpers, a number of vpm_ functions are introduced
to be used instead of their legacy vp_ counterparts when accessing
virtio 1.0 (aka modern) devices.
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Virtio 1.0 introduces new constants and data structures, common to all
devices as well as specific to virtio-net. This commit adds a subset
of these to be able to drive the virtio-net 1.0 network device.
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
PCI devices may support multiple capabilities of the same type (for
example PCI_CAP_ID_VNDR) and there was no way to discover all of them.
This commit adds a new API pci_find_next_capability which provides
this functionality. It would typically be used like so:
    for (pos = pci_find_capability(pci, PCI_CAP_ID_VNDR);
         pos > 0;
         pos = pci_find_next_capability(pci, pos, PCI_CAP_ID_VNDR)) {
        ...
    }
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The EFI_HII_CONFIG_ACCESS_PROTOCOL's ExtractConfig() method is passed
a request string which includes the parameters being queried plus an
apparently meaningless blob of information (the ConfigHdr), and is
expected to include this same meaningless blob of information in the
results string.
Neither the specification nor the existing EDK2 code (including the
nominal reference implementation in the DriverSampleDxe driver)
provide any reason for the existence of this meaningless blob of
information. It appears to be consumed in its entirety by the
EFI_HII_CONFIG_ROUTING_PROTOCOL, and to contain zero bits of
information by the time it reaches an EFI_HII_CONFIG_ACCESS_PROTOCOL
instance. It would potentially allow for multiple configuration data
sets to be handled by a single EFI_HII_CONFIG_ACCESS_PROTOCOL
instance, in a style alien to the rest of the UEFI specification
(which implicitly assumes that the instance pointer is always
sufficient to uniquely identify the instance).
iPXE currently handles this by simply copying the ConfigHdr from the
request string to the results string, and otherwise ignoring it. This
approach is also used by some code in EDK2, such as OVMF's PlatformDxe
driver.
As of EDK2 commit 8a45f80 ("MdeModulePkg: Make HII configuration
settings available to OS runtime"), this causes an assertion failure
inside EDK2. The failure arises when iPXE is handed a NULL request
string, and responds (as per the specification) with a results string
including all settings. Since there is no meaningless blob to copy
from the request string, there is no corresponding meaningless blob in
the results string. This now causes an assertion failure in
HiiDatabaseDxe's HiiConfigRoutingExportConfig().
The same failure does not affect the OVMF PlatformDxe driver, which
simply passes the request string to the HII BlockToConfig() utility
function. The BlockToConfig() function returns EFI_INVALID_PARAMETER
when passed a NULL request string, and PlatformDxe propagates this
error directly to the caller.
Fix by matching the behaviour of OVMF's PlatformDxe driver: explicitly
return EFI_INVALID_PARAMETER if the request string is NULL or empty.
This violates the specification (insofar as it is feasible to
determine what the specification actually requires), but causes
correct behaviour with the EDK2 codebase.
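The shape of the check is roughly as follows. This is a sketch with
stand-in types and status codes (char16_sketch,
SKETCH_INVALID_PARAMETER and sketch_extract_config() are hypothetical)
rather than the real EFI definitions.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the EFI 16-bit character type and status codes */
typedef uint16_t char16_sketch;
#define SKETCH_SUCCESS			0
#define SKETCH_INVALID_PARAMETER	2

/* Reject a NULL or empty request string before doing anything else */
static int sketch_extract_config ( const char16_sketch *request ) {
	if ( ( request == NULL ) || ( request[0] == 0 ) )
		return SKETCH_INVALID_PARAMETER;
	/* ... copy the ConfigHdr and append the requested settings ... */
	return SKETCH_SUCCESS;
}
```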
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The existing code intends to print NULL strings as "<NULL>" (for the
sake of debug messages), but the logic is incorrect when handling
wide-character strings. Fix the logic and add applicable unit tests.
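The exact defect is not described here, but the intended behaviour
amounts to substituting a placeholder of the matching character width.
A sketch, with hypothetical helper names:

```c
#include <stddef.h>
#include <wchar.h>

/* Placeholder for a NULL narrow string */
static const char * str_or_null ( const char *string ) {
	return ( string ? string : "<NULL>" );
}

/* Placeholder for a NULL wide-character string: this must itself be
 * a wide string, since the formatter will read it one wchar_t at a
 * time.
 */
static const wchar_t * wstr_or_null ( const wchar_t *wstring ) {
	return ( wstring ? wstring : L"<NULL>" );
}
```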
Signed-off-by: Michael Brown <mcb30@ipxe.org>