When the TEST UNIT READY command receives an error response, the
shutdown of the command's block data interface will result in
scsidev_ready() closing the SCSI device. This will subsequently
result in a duplicate call to scsicmd_close(), leading to an assertion
failure when list_del() is called for the second time.
Fix by removing the command from the list of outstanding commands
before shutting down the command's interfaces.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The eIPoIB translation layer needs to translate outbound ARP packets
from Ethernet to IPoIB. A 64-byte buffer (starting with the Ethernet
header) does not provide enough tailroom to expand to hold the two
20-byte IPoIB MAC addresses. The result is that an UNDI API user will
be unable to send ARP packets.
We could potentially shuffle the packet contents to reuse the space
occupied by the stripped Ethernet link-layer header, but this would
add complexity. Instead, fix by increasing the minimum allocation
size to 128 bytes.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
B0_CTST is a 24bit register according to the vendor driver (sk98lin).
A 16bit read on B0_CTST will always return 0 for Y2_VAUX_AVAIL
(1<<16), so use a 32bit read when testing Y2_VAUX_AVAIL.
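The truncation can be reproduced without hardware. The following is a minimal sketch: sketch_read16() and sketch_read32() are hypothetical stand-ins for the driver's register accessors, and only the bit position (1<<16) is taken from the text above.

```c
#include <stdint.h>

/* Bit position taken from the text above */
#define Y2_VAUX_AVAIL ( 1UL << 16 )

/* Hypothetical stand-ins for register reads: each takes the full
 * register contents and returns what a read of that width delivers.
 */
static inline uint16_t sketch_read16 ( uint32_t reg ) {
	return ( ( uint16_t ) reg );
}

static inline uint32_t sketch_read32 ( uint32_t reg ) {
	return reg;
}

/* 16-bit read: bit 16 has already been discarded, so always 0 */
int vaux_avail_16 ( uint32_t reg ) {
	return ( ( sketch_read16 ( reg ) & Y2_VAUX_AVAIL ) != 0 );
}

/* 32-bit read: bit 16 survives and can be tested */
int vaux_avail_32 ( uint32_t reg ) {
	return ( ( sketch_read32 ( reg ) & Y2_VAUX_AVAIL ) != 0 );
}
```

With register contents of 0x00010000, vaux_avail_16() reports 0 while vaux_avail_32() reports 1.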
[This patch is copied directly from the Linux kernel tree.]
Signed-off-by: Mike McCormack <mikem@ring3k.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The value of ( ( x & 0x0c00 ) | 0x0c00 ) is always 0x0c00 regardless
of the value of x, and so the read_csr() is redundant. (There are no
read side effects for this register, according to the datasheet.)
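The identity is easy to confirm exhaustively. This is an illustrative sketch only: the function names are hypothetical and the 16-bit value width is an assumption.

```c
#include <stdint.h>

/* The expression under discussion, applied to an arbitrary old value */
uint16_t csr80_new_value ( uint16_t old ) {
	return ( ( old & 0x0c00 ) | 0x0c00 );
}

/* Return 1 if the result is 0x0c00 for every possible 16-bit input */
int csr80_is_constant ( void ) {
	uint32_t x;

	for ( x = 0 ; x <= 0xffff ; x++ ) {
		if ( csr80_new_value ( ( uint16_t ) x ) != 0x0c00 )
			return 0;
	}
	return 1;
}
```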
This line of code originated in Linux kernel 2.3.19pre1 as
  a->write_csr(ioaddr, 80, a->read_csr(ioaddr, 80) | 0x0c00);
and was modified in kernel 2.3.41pre4 to read
  a->write_csr(ioaddr, 80, (a->read_csr(ioaddr, 80) & 0x0C00) | 0x0c00);
In the absence of commit messages, the intention of the code is
unclear. However, the logic resulting in a fixed value of 0x0c00 has
remained unaltered for over 17 years, and can probably be assumed to
have the correct overall result.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Track the current and maximum heap usage, and display the maximum
during shutdown when DEBUG=malloc is enabled.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Our asprintf() implementation guarantees that strp will be NULL on
allocation failure, but this is not standard behaviour. Detect errors
by checking for a negative return value instead of a NULL pointer.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Avoid potential division by zero when performing the check against
multiplication overflow. (Note that if the width is zero then there
can be no overflow anyway, so it is then safe to bypass the check.)
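The general shape of such a check is sketched below. The function name and the second operand ("height") are assumptions made for illustration; only the zero-width bypass is taken from the text above.

```c
#include <stddef.h>
#include <stdint.h>

/* Return non-zero if ( width * height ) fits within a size_t */
int dimensions_ok ( size_t width, size_t height ) {
	size_t total = ( width * height );

	/* A zero width cannot overflow, and must not reach the division */
	if ( width == 0 )
		return 1;

	/* Overflow occurred if dividing back does not recover the height */
	return ( ( total / width ) == height );
}
```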
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Any underlying errors arising during ib_create_cq() or ib_create_qp()
are lost since the functions simply return NULL on error. This makes
debugging harder, since a debug-enabled build is required to discover
the root cause of the error.
Fix by returning a status code from these functions, thereby allowing
any underlying errors to be propagated.
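The calling convention can be sketched as follows. The structure and function below are hypothetical simplifications and do not reproduce the real ib_create_cq() prototype.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, heavily simplified completion queue */
struct sketch_cq {
	unsigned int num_cqes;
};

/* Return a status code and pass the new object back via a pointer,
 * rather than collapsing every failure into a NULL return value.
 */
int sketch_create_cq ( unsigned int num_cqes, struct sketch_cq **new_cq ) {
	struct sketch_cq *cq;

	/* Each failure path now has a distinguishable cause */
	if ( num_cqes == 0 )
		return -EINVAL;
	cq = malloc ( sizeof ( *cq ) );
	if ( ! cq )
		return -ENOMEM;

	cq->num_cqes = num_cqes;
	*new_cq = cq;
	return 0;
}
```

A caller can then propagate the status code unchanged instead of having to invent one when it sees NULL.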
Signed-off-by: Michael Brown <mcb30@ipxe.org>
For visual consistency with surrounding lines, the definitions of
DBG_MORE(), DBG_PAUSE(), etc include an unnecessary ##__VA_ARGS__
argument which is always elided. This confuses sparse, which
complains about DBG_MORE_IF() being called with more than one
argument.
Work around this problem by adding an unused variable argument list to
the single-argument macros DBG_MORE_IF() and DBG_PAUSE_IF().
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Report errors in eoib_duplicate() via netdev_tx_err() rather than
netdev_tx_complete_err(), since netdev_tx_complete_err() accepts only
valid I/O buffers that are currently in the network device's transmit
queue.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
When the area to be mapped straddles the 2GB boundary, the expression
(high+size) will overflow on the first loop iteration. Fix by using
(end-size), which cannot underflow.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
As of commit 10d19bd ("[pxe] Always retrieve cached DHCPACK and apply
to relevant network device"), the UNDI driver has been the only user
of pxeparent_call(). Remove the unnecessary layer of abstraction by
refactoring this code back into undinet.c, and fix the ability of
undiisr.S to fall back to chaining to the original handler if we were
unable to unhook our own ISR.
This effectively reverts commit 337e1ed ("[pxe] Separate parent PXE
API caller from UNDINET driver").
Signed-off-by: Michael Brown <mcb30@ipxe.org>
We currently request cable detection in PXE_OPCODE_INITIALIZE to work
around buggy Emulex drivers (see commit c0b61ba ("[efi] Work around
bugs in Emulex NII driver")).
This causes problems with some other NII drivers (e.g. Mellanox),
which may time out if the underlying link is intrinsically slow to
come up.
Attempt to work around both problems simultaneously by requesting
cable detection only if the underlying NII driver does not support
link status reporting via PXE_OPCODE_GET_STATUS. (This is based on a
potentially incorrect assumption that the buggy Emulex drivers do not
claim to report link status via PXE_OPCODE_GET_STATUS.)
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Provide a basic proof of concept ACPI table description (e.g. iBFT for
iSCSI) for SAN devices in a UEFI environment, using a control flow
that is functionally identical to that used in a BIOS environment.
Originally-implemented-by: Vishvananda Ishaya Abrams <vish.ishaya@oracle.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Several files define the ARRAY_SIZE() macro as used in Linux. Provide
a common definition for this in include/compiler.h.
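For reference, the conventional form of this macro is sketched below. The exact definition placed in include/compiler.h is not quoted in this message, and the example table is hypothetical.

```c
#include <stddef.h>

/* Number of elements in a true array (not a pointer), as used in Linux */
#define ARRAY_SIZE( array ) ( sizeof ( array ) / sizeof ( (array)[0] ) )

/* Hypothetical table used only to demonstrate the macro */
static const unsigned int sketch_rates[] = { 10, 100, 1000, 10000 };

/* Count the table entries without hard-coding the length */
size_t sketch_rate_count ( void ) {
	return ARRAY_SIZE ( sketch_rates );
}
```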
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some iSCSI targets send NOP-In. Rather than closing the connection
when we receive one, it is more user friendly to log a debug message
and keep the connection open. Eventually, it would be nice if iPXE
supported replying to NOP-Ins, but we might as well keep the
connection open until the target disconnects us.
Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some VF data is not cleared with reset, so make sure to return all the
settings to default before configuring the VF.
This fixes an issue where network packets would fail to be received if
the VF was previously used by the Linux ixgbevf driver.
Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
When a SCSI device is closed in error, the shutdown of the device's
block data interface will probably lead to any outstanding commands
being closed (by whichever object is currently connected to the block
data interface). However, commands remain in the list of outstanding
commands until the final reference is dropped. The result is that
scsidev_close() will make a second call to scsicmd_close() for each
command. This is harmless, but produces confusing debug messages.
Fix by treating the outstanding command list as holding an explicit
reference to each command, and removing the command from the list of
outstanding commands in scsicmd_close().
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The SCSI layer currently implements a retry loop in order to retry
commands that fail due to spurious "error" conditions such as "power
on occurred". Move this retry loop to the generic SAN device layer:
this allows for retries due to other transient error conditions such as
an iSCSI target having dropped the connection due to inactivity.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The concept of the SAN drive number is meaningful only in a BIOS
environment, where it represents the INT13 drive number (0x80 for the
first hard disk). We retain this concept in a UEFI environment to
allow for a simple way for iPXE commands to refer to SAN drives.
Centralise the concept of the default drive number, since it is shared
between all supported environments.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
According to ThunderX Errata G-17560, the NIC_PF_CFG[ENA] bit should
not be cleared at exit. This allows other drivers to access the NIC
registers correctly.
Signed-off-by: Konrad Adamczyk <konrad.adamczyk@cavium.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The BGX context state for the LMAC must be reset using the
BGX_CMR_CONFIG register.
This solves a problem with network connectivity in Linux when booted
from iPXE.
Signed-off-by: Bartosz Szczepanek <bartosz.szczepanek@cavium.com>
Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Use intfs_shutdown() and intfs_restart() to cleanly shut down multiple
interfaces that may loop back to the same object.
This fixes a regression introduced by commit daa8ed9 ("[interface]
Provide intf_reinit() to reinitialise nullified interfaces") which
broke the use of HTTP Basic and Digest authentication.
Reported-by: murmansk <murmansk@hotmail.com>
Reported-by: Brett Waldo <brettwaldo@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Shutting down (and optionally restarting) multiple interfaces is
fraught with problems if there are loops in the interface connectivity
(e.g. the HTTP content-decoded and transfer-decoded interfaces, which
will generally loop back to each other). Various workarounds
currently exist across the codebase, generally involving preceding
calls to intf_nullify() to avoid problems due to known loops.
Provide intfs_shutdown() and intfs_restart() to allow all of an
object's interfaces to be shut down (or restarted) in a single call,
without having to worry about potential external loops.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Expose the current wall-clock time (in seconds since the Epoch), since
this is often useful in captured boot logs and can also be useful when
checking unexpected X.509 certificate validation failures.
Use a :uint32 setting to avoid Y2K38 rollover, thereby ensuring that
this will eventually be somebody else's problem.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Originally-implemented-by: Malte zu Klampen <malte@pclab.ifg.uni-kiel.de>
Originally-implemented-by: Richard Moore <rich@richud.com>
Tested-by: Esben Storgaard Nielsen <esn@solar.dk>
Signed-off-by: Christian Nilsson <nikize@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
INT 13 calls return a status value via %ah, with CF set if %ah is
non-zero (indicating an error). Our wrappers zero the whole of %ax if
CF is clear, to allow C code (which has no easy access to CF) to
simply test for a non-zero status to detect an error.
The current code assigns the returned status to a uint8_t, effectively
testing %al rather than %ah. Fix by treating the returned status as a
uint16_t instead.
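The effect of the narrower type can be modelled in plain C. This is a sketch only: sketch_int13_ax() is a hypothetical model of the wrapper's result rather than the real wrapper.

```c
#include <stdint.h>

/* Model of the wrapper's result: %ax is zeroed if CF is clear;
 * otherwise %ah holds the status and %al holds whatever was returned.
 */
uint16_t sketch_int13_ax ( int carry, uint8_t ah, uint8_t al ) {
	return ( carry ? ( ( uint16_t ) ( ( ah << 8 ) | al ) ) : 0 );
}

/* Buggy: storing the status in a uint8_t keeps only %al */
int error_seen_u8 ( uint16_t ax ) {
	uint8_t status = ( ( uint8_t ) ax );

	return ( status != 0 );
}

/* Fixed: a uint16_t status also covers %ah */
int error_seen_u16 ( uint16_t ax ) {
	uint16_t status = ax;

	return ( status != 0 );
}
```

With CF set, %ah=0x01 and %al=0x00, the uint8_t version misses the error while the uint16_t version detects it.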
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Avoid using a zero sector count to guess the disk geometry, since that
would result in a division by zero when calculating the number of
cylinders.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
When running on AMD platforms, the legacy hardware emulation is
extremely unreliable. In particular, the IRQ0 timer interrupt is
likely to simply stop working, resulting in a total failure of any
code that relies on timers (such as DHCP retransmission attempts).
Work around this by using the 10MHz time counter provided by Hyper-V
via an MSR. (This timer can be tested in KVM via the command-line
option "-cpu host,hv_time".)
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allow the active timer (providing udelay() and currticks()) to be
selected at runtime based on probing during the INIT_EARLY stage of
initialisation.
TICKS_PER_SEC is now a fixed compile-time constant for all builds, and
is independent of the underlying clock tick rate. We choose the value
1024 to allow multiplications and divisions on seconds to be converted
to bit shifts.
TICKS_PER_MS is defined as 1, allowing multiplications and divisions
on milliseconds to be omitted entirely. The 2% inaccuracy in this
definition is negligible when using the standard BIOS timer (running
at around 18.2Hz).
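The arithmetic behind these two constants can be sketched as follows. The helper names and the shift constant are hypothetical; only the values 1024 and 1 come from the text above.

```c
/* Values taken from the text above */
#define TICKS_PER_SEC 1024
#define TICKS_PER_MS 1

/* Hypothetical helper: log2 ( TICKS_PER_SEC ) */
#define SKETCH_TICKS_SHIFT 10

/* Seconds convert to and from ticks with a single shift */
unsigned long secs_to_ticks ( unsigned long secs ) {
	return ( secs << SKETCH_TICKS_SHIFT );
}

unsigned long ticks_to_secs ( unsigned long ticks ) {
	return ( ticks >> SKETCH_TICKS_SHIFT );
}

/* Milliseconds are used as ticks directly */
unsigned long ms_to_ticks ( unsigned long ms ) {
	return ( ms * TICKS_PER_MS );
}

/* Error (in parts per thousand) from treating one tick as one
 * millisecond: 1000ms is really 1024 ticks, not 1000 ticks.
 */
unsigned int ms_error_per_mille ( void ) {
	return ( ( ( TICKS_PER_SEC - ( 1000 * TICKS_PER_MS ) ) * 1000 ) /
		 TICKS_PER_SEC );
}
```

The error works out at 23 parts per thousand, which is roughly the 2% figure quoted above.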
TIMER_RDTSC now checks for a constant TSC before claiming to be a
usable timer. (This timer can be tested in KVM via the command-line
option "-cpu host,+invtsc".)
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Separate out the concept of "hardware maximum supported frame length"
and "configured link MTU", and limit the latter according to the
former.
In networks where the DHCP-supplied link MTU is inconsistent with the
hardware or driver capabilities (e.g. a network using jumbo frames),
this will result in iPXE advertising a TCP MSS consistent with a size
that can actually be received.
Note that the term "MTU" is typically used to refer to the maximum
length excluding the link-layer headers; we adopt this usage.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The call to intf_close() may result in the original interface being
reopened. For example: when reading the capacity of a 2TB+ disk via
iSCSI, the SCSI layer will respond to the intf_close() from the READ
CAPACITY (10) command by immediately issuing a READ CAPACITY (16)
command. The iSCSI layer happens to reuse the same interface for the
new command (since it allows only a single concurrent command).
Currently, intf_shutdown() unplugs the interface after the call to
intf_close() returns. In the above scenario, this results in
unplugging the just-reopened interface.
Fix by transferring the interface destination (and its reference) to a
temporary interface, and so effectively performing the unplug before
making the call to intf_close().
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The x86_64 EDK2 headers include a #pragma to mark all subsequent
symbol declarations and references as hidden if position-independent
code is being generated. Since libgen.h is currently included only
after the EDK2 headers, this results in __xpg_basename() being
erroneously marked as having hidden visibility (if the compiler
defaults to building position-independent code); this eventually
results in a failure to link the elf2efi binary.
Fix by including libgen.h prior to including the EDK2 headers.
Originally-fixed-by: Doug Goldstein <cardoe@cardoe.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
In some high-end Azure instances (e.g. NC6) we may receive an
unsolicited VMBUS_OFFER_CHANNEL message for a PCIe pass-through device
some time after completing the bus enumeration. This currently causes
apparently random failures due to unexpected VMBus message types.
Fix by ignoring any unsolicited VMBus messages.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some problems arise only when running on a specific CPU type (e.g.
non-functional timer interrupts as observed in Azure AMD instances).
Include the CPU vendor and model within the sample cloud boot scripts,
to assist in debugging such problems.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Provide a settings applicator to modify netdev->max_pkt_len in
response to changes to the "mtu" setting (DHCP option 26).
Note that as with MAC address changes, drivers are permitted to
completely ignore any changes in the MTU value. The net result will
be that iPXE effectively uses the smaller of either the hardware
default MTU or the software configured MTU.
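The net result described above amounts to taking a minimum, as in this sketch. The function name is hypothetical, and treating a configured value of zero as "no setting" is an assumption.

```c
#include <stddef.h>

/* Effective MTU: the smaller of the hardware default and the
 * configured value.  A configured value of zero is treated as
 * "not configured" (an assumption made for this sketch).
 */
size_t effective_mtu ( size_t hw_mtu, size_t configured_mtu ) {
	if ( configured_mtu == 0 )
		return hw_mtu;
	return ( ( configured_mtu < hw_mtu ) ? configured_mtu : hw_mtu );
}
```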
Signed-off-by: Michael Brown <mcb30@ipxe.org>
For some unspecified "security" reason, the Google Compute Engine
metadata server will refuse any requests that do not include the
non-standard HTTP header "Metadata-Flavor: Google".
Attempt to autodetect such requests (by comparing the hostname against
"metadata.google.internal"), and add the "Metadata-Flavor: Google"
header if applicable.
Enable this feature in the CONFIG=cloud build, and include a sample
embedded script allowing iPXE to boot from a script configured as
metadata via e.g.
  # Create shared boot image
  make bin/ipxe.usb CONFIG=cloud EMBED=config/cloud/gce.ipxe
  # Configure per-instance boot script
  gcloud compute instances add-metadata <instance> \
         --metadata-from-file ipxeboot=boot.ipxe
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some host implementations (notably Google Compute Platform) are known
to unconditionally write back VIRTIO_NET_HDR_F_DATA_VALID to
header->flags for received packets, regardless of the features
negotiated by the driver. This breaks the transmit datapath by
effectively setting an illegal flag for all subsequent transmitted
packets.
Work around this problem by using separate empty header buffers for
the receive and transmit queues.
Debugged-by: Ladi Prosek <lprosek@redhat.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
This code is largely inspired by tap.c. It allows iPXE to be tested
on real NICs from within Linux. For example:
  make bin-x86_64-linux/af_packet.linux
  valgrind ./bin-x86_64-linux/af_packet.linux --net af_packet,if=eth3
Tested as both x86_64 and i386 binaries.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The virtio 0.9 implementation was limited to a maximum virtqueue size of
MAX_QUEUE_NUM, and the virtio-net driver would fail to initialize on hosts
exceeding this limit.
This commit lifts the restriction by allocating the queue memory based on
the actual queue size instead of using a fixed maximum. Note that virtio
1.0 still uses the MAX_QUEUE_NUM constant to cap the size (unfortunately,
the ability to override the queue size is not available in virtio 0.9).
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
This commit introduces virtnet_free_virtqueues(), which is called on all
virtqueue error and shutdown paths. vpm_find_vqs() no longer cleans up
after itself and instead expects virtnet_free_virtqueues() always to be
called to undo its effect.
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
vpm_find_vqs incorrectly accepted the host provided queue size with no
regard to iPXE's internal limitations. Virtio 1.0 makes it possible for
the driver to override the queue size to reduce memory requirements and
iPXE is a great use case for this feature.
Also remove the extra vq->vring.num assignment, which is already
handled in vring_init().
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The ISC Kea DHCP server transmits its DHCPOFFER as a unicast packet
with a broadcast IPv4 destination address (255.255.255.255). This
combination is currently rejected by iPXE.
Fix by explicitly accepting the local network broadcast address
(255.255.255.255) as a valid unicast destination address.
Reported-by: Roy Ledochowski <roy.ledochowski@hpe.com>
Tested-by: Roy Ledochowski <roy.ledochowski@hpe.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Updates:
- Nodnic: Support for arm cq doorbell via the UAR BAR
- Ensure hardware is quiescent when no interface is open - WinPE WA
- Support for clear interrupt via BAR
- Nodnic: Support for send TX doorbells via the UAR BAR
- Added ConnectX-5EX device
- Added ConnectX-5 device
Signed-off-by: Raed Salem <raeds@mellanox.com>
Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
EFI provides no clean way for device drivers to shut down in
preparation for handover to a booted operating system. The platform
firmware simply doesn't bother to call the drivers' Stop() methods.
Instead, drivers must register an EVT_SIGNAL_EXIT_BOOT_SERVICES event
to be signalled when ExitBootServices() is called, and clean up
without any reference to the EFI driver model.
Unfortunately, all timers silently stop working when ExitBootServices()
is called. Even more unfortunately, and for no discernible reason,
this happens before any EVT_SIGNAL_EXIT_BOOT_SERVICES events are
signalled. The net effect of this entertaining design choice is that
any timeout loops on the shutdown path (e.g. for gracefully closing
outstanding TCP connections) may wait indefinitely.
There is no way to report failure from currticks(), since the API
lazily assumes that the host system continues to travel through time
in the usual direction. Work around EFI's violation of this
assumption by falling back to a simple free-running monotonic counter.
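One possible shape for such a fallback is sketched below. All names are hypothetical, and the real timer event handling is reduced to a single function that increments a counter.

```c
/* Tick counter, normally advanced by a periodic timer event */
static unsigned long sketch_jiffies;

/* Set once shutdown has begun and timer events can no longer fire */
static int sketch_shutdown_in_progress;

/* Called from the (hypothetical) periodic timer event */
void sketch_timer_tick ( void ) {
	sketch_jiffies++;
}

/* Record that shutdown has begun */
void sketch_begin_shutdown ( void ) {
	sketch_shutdown_in_progress = 1;
}

/* Current tick count: during shutdown, advance the counter on every
 * call so that timeout loops still observe time moving forwards.
 */
unsigned long sketch_currticks ( void ) {
	if ( sketch_shutdown_in_progress )
		sketch_jiffies++;
	return sketch_jiffies;
}
```

The counter no longer measures real time during shutdown, but any loop that waits for a fixed number of ticks is guaranteed to terminate.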
Debugged-by: Maor Dickman <maord@mellanox.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
When searching for an UNDI ROM to match against a PCI device, search
in order of increasing ROM address (within the 128kB BIOS option ROM
area). This is likely (though not guaranteed) to match the order of
the original enumeration performed by the BIOS, which is in turn
likely to match the order of enumeration on the PCI bus.
Since we load at most one UNDI ROM, the net result is that we increase
our chances of loading the ROM corresponding to the selected PCI
device (rather than loading a ROM corresponding to a higher-numbered
PCI device with the same vendor and device IDs).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The "progress" macro can be used only from within the .prefix section.
At the point of calling relocate(), we are running in .text16 and so
the near call to print_message() will end up calling a random function
somewhere in .text16.
Interestingly, this problem has remained unnoticed for some time. It
is rare to build with DEBUG=libprefix. In the few cases that it has
been used during development, the randomly selected function in
.text16 seems to have been a harmless no-op with no visible
side-effects (beyond the unnoticed failure to print the "relocate"
progress message).
Fix by removing the futile attempt to print a progress message before
calling relocate().
Reported-by: Raed Salem <raeds@mellanox.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Fix the <NULL> driver name reported by "ifstat" when using the undipci
driver (due to the unnecessary extra device node inserted as a child
of the PCI device).
Remove the "UNDI-" prefix from device names since the driver name is
also now visible via "ifstat", and tidy up the device name to match
the format used by standard PCI devices.
The output from "ifstat" now resembles:
  iPXE> ifstat
  net0: 52:54:00:12:34:56 using undipci on 0000:00:03.0
  iPXE> ifstat
  net0: 52:54:00:12:34:56 using undionly on 0000:00:03.0
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The UNDI loader entry point is very likely to be called after POST,
when there is a high chance that the PMM-allocated image source area
and decompression area have been reused by something else.
In particular, using an iPXE .iso to test a separate iPXE ROM's UNDI
loader entry point in a qemu VM is likely to crash. SeaBIOS allocates
PMM blocks from close to the top of memory and so these blocks have a
high chance of colliding with the runtime addresses subsequently
chosen by the non-ROM iPXE by scanning the INT 15,e820 memory map.
The standard romprefix.S has no choice about relying on the
PMM-allocated image source area, since it has no other way to retrieve
its compressed payload.
In mromprefix.S, the image source area functions only as an optional
buffer used to avoid repeated reads from the (potentially slow)
expansion ROM BAR by the decompression code. We can therefore always
set %esi=0 when calling install_prealloc from the UNDI loader entry
point, and simply fall back to reading directly from the expansion ROM
BAR.
We can always set %edi=0 when calling install_prealloc from the UNDI
loader entry point. This will behave as though the decompression area
PMM allocation failed, and will therefore use INT 15,88 to find a
temporary decompression area somewhere close to 64MB. This is by no
means guaranteed to be safe from collisions, but it's probably safer
on balance than the PMM-allocated address.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allocate base memory (by decreasing the free base memory counter)
before calling the UNDI loader entry point, to minimise surprises for
the UNDI loader code.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The command and data interfaces may be connected to the same object.
Nullify the data interface before shutting down the control interface
to avoid potential infinite loops.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Commit 71560d1 ("[librm] Preserve FPU, MMX and SSE state across calls
to virt_call()") added FXSAVE and FXRSTOR instructions to iPXE. In
KVM virtual machines, these instructions execute fine as long as the
host CPU supports the "unrestricted_guest" feature (that is, it can
virtualize big real mode natively). On older host CPUs however, KVM
has to emulate big real mode, and it currently doesn't implement
FXSAVE emulation.
Upstream QEMU rebuilt iPXE at commit 0418631 ("[thunderx] Fix
compilation with older versions of gcc") which is a descendant of
commit 71560d1 (see above).
This was done in QEMU commit ffdc5a2 ("ipxe: update submodule from
4e03af8ec to 041863191"). The resultant binaries were bundled with
the QEMU v2.7.0 release; see QEMU commit c52125a ("ipxe: update
prebuilt binaries").
This distributed the iPXE workaround for the Tivoli VMM bug to a
number of KVM users with old host CPUs, causing KVM emulation failures
(guest crashes) for them while netbooting.
Make the FXSAVE and FXRSTOR instructions conditional on a new feature
test macro called TIVOLI_VMM_WORKAROUND. Define the macro by default.
There is prior art for an assembly file including config/general.h:
see arch/x86/prefix/romprefix.S. Also, TIVOLI_VMM_WORKAROUND seems to
be a good fit for the "Obscure configuration options" section in
config/general.h.
Cc: Bandan Das <bsd@redhat.com>
Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Greg <rollenwiese@yahoo.com>
Cc: Michael Brown <mcb30@ipxe.org>
Cc: Michael Prokop <launchpad@michael-prokop.at>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Pickford <arch@netremedies.ca>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Ref: https://bugs.archlinux.org/task/50778
Ref: https://bugs.launchpad.net/qemu/+bug/1623276
Ref: https://bugzilla.proxmox.com/show_bug.cgi?id=1182
Ref: https://bugzilla.redhat.com/show_bug.cgi?id=1356762
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The initrd_addr_max field represents the highest byte address that may
be used to hold initrd images, and is therefore almost certainly not
aligned to a page boundary: a typical value might be 0x7fffffff.
Fix the address calculations to ensure that the initrd images are
always aligned to a page boundary.
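The rounding involved can be sketched as follows. The function name, the 4kB page size and the return convention are assumptions, and this is not the actual calculation used in the code.

```c
#include <stdint.h>

/* Assumed page size for this sketch */
#define SKETCH_PAGE_SIZE 0x1000ULL

/* Find the highest page-aligned start address at which an image of
 * "len" bytes fits with its last byte at or below "addr_max".
 * Returns 0 on success, or -1 if the image cannot fit.
 */
int initrd_place ( uint64_t addr_max, uint64_t len, uint64_t *start ) {
	/* Reject empty images and images larger than the usable space */
	if ( ( len == 0 ) || ( ( len - 1 ) > addr_max ) )
		return -1;

	/* Highest possible start address, then round down to a page */
	*start = ( ( addr_max - ( len - 1 ) ) & ~( SKETCH_PAGE_SIZE - 1 ) );
	return 0;
}
```

With addr_max=0x7fffffff and a 0x1234-byte image, the unaligned start address would be 0x7fffedcc; rounding down gives 0x7fffe000.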
Reported-by: Sitsofe Wheeler <sitsofe@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
AppleNetBoot.h is not taken from the EDK2 codebase and so cannot be
imported using include/ipxe/efi/import.pl. Mark as a native iPXE
header (by changing the include guard) to avoid breaking the import
process.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allow certificates to be marked as having been added explicitly at run
time. Such certificates will not be discarded via the certificate
store cache discarder.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Enable IMAGE_PNG (but not IMAGE_PNM) by default, and drag in the
relevant objects only when image_pixbuf() is present in the binary.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Enable both IMAGE_DER and IMAGE_PEM by default, and drag in the
relevant objects only when image_asn1() is present in the binary.
This allows "imgverify" to transparently use either DER or PEM
signature files.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
As of commit b1caa48 ("[crypto] Support SHA-{224,384,512} in X.509
certificates"), the list of supported cryptographic algorithms is
controlled by config/crypto.h.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add PEM-encoded ASN.1 as an image format. We accept as PEM any image
containing a line starting with a "-----BEGIN" boundary marker.
We allow for PEM files containing multiple ASN.1 objects, such as a
certificate chain produced by concatenating individual certificate
files.
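The detection rule can be sketched as a scan over line starts. This is a minimal illustration with hypothetical names, assuming lines are terminated by "\n"; it does not parse the PEM contents.

```c
#include <stddef.h>
#include <string.h>

/* Boundary marker prefix, as quoted in the text above */
#define SKETCH_PEM_BEGIN "-----BEGIN"

/* Return 1 if any line in the buffer starts with the marker */
int looks_like_pem ( const char *data, size_t len ) {
	size_t marker_len = strlen ( SKETCH_PEM_BEGIN );
	size_t offset = 0;
	const char *eol;

	while ( offset < len ) {
		/* "offset" always points at the start of a line here */
		if ( ( ( len - offset ) >= marker_len ) &&
		     ( memcmp ( ( data + offset ), SKETCH_PEM_BEGIN,
				marker_len ) == 0 ) )
			return 1;

		/* Skip to the start of the next line, if any */
		eol = memchr ( ( data + offset ), '\n', ( len - offset ) );
		if ( ! eol )
			break;
		offset = ( ( ( size_t ) ( eol - data ) ) + 1 );
	}
	return 0;
}
```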
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add DER-encoded ASN.1 as an image format. There is no fixed signature
for DER files. We treat an image as DER if it comprises a single
valid SEQUENCE object covering the entire length of the image.
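The outer check can be sketched as below. The names are hypothetical, and the sketch inspects only the outermost type and length bytes: it does not validate the contents of the SEQUENCE or reject non-minimal length encodings.

```c
#include <stddef.h>
#include <stdint.h>

/* ASN.1 type byte for a constructed SEQUENCE */
#define SKETCH_ASN1_SEQUENCE 0x30

/* Return 1 if the buffer is exactly one SEQUENCE header plus contents */
int looks_like_der ( const uint8_t *data, size_t len ) {
	size_t offset = 2;
	size_t content_len;
	unsigned int len_bytes;

	/* Need at least a type byte and one length byte */
	if ( ( len < 2 ) || ( data[0] != SKETCH_ASN1_SEQUENCE ) )
		return 0;

	if ( data[1] & 0x80 ) {
		/* Long form: low 7 bits give the number of length bytes */
		len_bytes = ( data[1] & 0x7f );
		if ( ( len_bytes == 0 ) ||
		     ( len_bytes > sizeof ( content_len ) ) ||
		     ( ( len - offset ) < len_bytes ) )
			return 0;
		content_len = 0;
		while ( len_bytes-- )
			content_len = ( ( content_len << 8 ) | data[offset++] );
	} else {
		/* Short form: the length byte is the length */
		content_len = data[1];
	}

	/* The contents must cover exactly the remainder of the image */
	return ( content_len == ( len - offset ) );
}
```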
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allow code to create a partial ASN.1 cursor containing only the type
and length bytes, so that asn1_start() may be used to determine the
length of a large ASN.1 blob without first allocating memory to hold
the entire blob.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The Windows drivers for VMBus devices are enumerated using the
instance UUID rather than the channel number. Include the instance
UUID within the iPXE device name to allow an iPXE network device to be
more easily associated with the corresponding Windows network device
when debugging.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Select the IPv6 source address and corresponding router (if any) using
a very simplified version of the algorithm from RFC6724:
- Ignore any source address that has a smaller scope than the
  destination address. For example, do not use a link-local source
  address when sending to a global destination address.
- If we have a source address which is on the same link as the
  destination address, then use that source address.
- If we are left with multiple possible source addresses, then choose
  the address with the smallest scope. For example, if we are sending
  to a site-local destination address and we have both a global source
  address and a site-local source address, then use the site-local
  source address.
- If we are still left with multiple possible source addresses, then
  choose the address with the longest matching prefix.
For the purposes of this algorithm, we treat RFC4193 Unique Local
Addresses as having organisation-local scope. Since we use only
link-local scope for our multicast transmissions, this approximation
should remain valid in all practical situations.
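One way to read the rules above is as a lexicographic comparison: on-link first, then smallest scope, then longest matching prefix. The following sketch implements that reading. The structure, function names and tie-breaking details are hypothetical, and "on the same link" is approximated as "matches the source address's own prefix".

```c
#include <stddef.h>
#include <stdint.h>

/* Scope values (larger means wider) */
enum { SCOPE_LINK = 2, SCOPE_SITE = 5, SCOPE_ORG = 8, SCOPE_GLOBAL = 14 };

/* Hypothetical candidate source address */
struct sketch_src {
	uint8_t addr[16];
	unsigned int prefix_len;
};

/* Classify an address; RFC4193 ULAs count as organisation-local */
unsigned int sketch_scope ( const uint8_t *addr ) {
	if ( ( addr[0] == 0xfe ) && ( ( addr[1] & 0xc0 ) == 0x80 ) )
		return SCOPE_LINK;
	if ( ( addr[0] == 0xfe ) && ( ( addr[1] & 0xc0 ) == 0xc0 ) )
		return SCOPE_SITE;
	if ( ( addr[0] & 0xfe ) == 0xfc )
		return SCOPE_ORG;
	return SCOPE_GLOBAL;
}

/* Count the leading bits shared by two addresses */
unsigned int sketch_match_len ( const uint8_t *a, const uint8_t *b ) {
	unsigned int bits = 0;
	unsigned int i;
	uint8_t diff;

	for ( i = 0 ; i < 16 ; i++ ) {
		diff = ( a[i] ^ b[i] );
		if ( diff == 0 ) {
			bits += 8;
			continue;
		}
		/* Count matching high bits within the first differing byte */
		for ( ; ! ( diff & 0x80 ) ; diff <<= 1 )
			bits++;
		break;
	}
	return bits;
}

/* Select a source address for "dest", or NULL if none is usable */
const struct sketch_src * sketch_select ( const struct sketch_src *srcs,
					  size_t count, const uint8_t *dest ) {
	const struct sketch_src *best = NULL;
	unsigned int dest_scope = sketch_scope ( dest );
	unsigned int best_on_link = 0, best_scope = 0, best_match = 0;
	unsigned int scope, match, on_link;
	size_t i;
	int better;

	for ( i = 0 ; i < count ; i++ ) {
		scope = sketch_scope ( srcs[i].addr );
		match = sketch_match_len ( srcs[i].addr, dest );
		on_link = ( match >= srcs[i].prefix_len );

		/* Rule 1: ignore sources with a smaller scope */
		if ( scope < dest_scope )
			continue;

		/* Rules 2-4: on-link, smallest scope, longest match */
		if ( ! best )
			better = 1;
		else if ( on_link != best_on_link )
			better = ( on_link > best_on_link );
		else if ( scope != best_scope )
			better = ( scope < best_scope );
		else
			better = ( match > best_match );
		if ( better ) {
			best = &srcs[i];
			best_on_link = on_link;
			best_scope = scope;
			best_match = match;
		}
	}
	return best;
}
```

Given link-local, ULA and global candidates, a global destination selects the global source, and an off-link ULA destination selects the ULA source in preference to the global one.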
Originally-implemented-by: Thomas Bächler <thomas@archlinux.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Use the IPv6 settings to construct the routing table, in a manner
analogous to the construction of the IPv4 routing table.
This allows for manual assignment of IPv6 addresses via e.g.
  set net0/ip6 2001:ba8:0:1d4::6950:5845
  set net0/len6 64
  set net0/gateway6 fe80::226:bff:fedd:d3c0
The prefix length ("len6") may be omitted, in which case a default
prefix length of 64 will be assumed.
Multiple IPv6 addresses may be assigned manually by implicitly
creating child settings blocks. For example:
  set net0/ip6 2001:ba8:0:1d4::6950:5845
  set net0.ula/ip6 fda4:2496:e992::6950:5845
Signed-off-by: Michael Brown <mcb30@ipxe.org>
A reasonable user expectation is that ${net0/ip6} should show the
"highest-priority" of the IPv6 addresses, even when multiple IPv6
addresses are active. The expected order of priority is likely to be
manually-assigned addresses first, then stateful DHCPv6 addresses,
then SLAAC addresses, and lastly link-local addresses.
Using ${priority} to enforce an ordering is undesirable since that
would affect the priority assigned to each of the net<N> blocks as a
whole, so use the sibling ordering capability instead.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allow settings blocks to provide an explicit default ordering between
siblings, with lower precedence than the existing ${priority} setting.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Originally-implemented-by: Hannes Reinecke <hare@suse.de>
Originally-implemented-by: Marin Hannache <git@mareo.fr>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Originally-implemented-by: Hannes Reinecke <hare@suse.de>
Originally-implemented-by: Marin Hannache <git@mareo.fr>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Expose the IPv6 address (or prefix) as ${ip6}, the prefix length as
${len6}, and the router address as ${gateway6}.
Originally-implemented-by: Hannes Reinecke <hare@suse.de>
Originally-implemented-by: Marin Hannache <git@mareo.fr>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The settings scope ipv6_scope refers specifically to IPv6 settings
that have a corresponding DHCPv6 option. Rename to dhcpv6_scope to
more accurately reflect this purpose.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
We currently perform IPv6 stateless address autoconfiguration (SLAAC)
in response to any router advertisement with the relevant flags set.
This can result in the local IPv6 source address changing midway
through a TCP connection, since our connections bind only to a local
port number and do not store a local network address.
In addition, this behaviour for SLAAC is inconsistent with that for
DHCPv4 and stateful DHCPv6, both of which will be performed only as a
result of an explicit autoconfiguration action (e.g. via the default
autoboot sequence, or the "ifconf" command).
Fix by ignoring router advertisements arriving outside the context of
an ongoing autoconfiguration attempt.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Commit db34436 ("[intel] Strip spurious VLAN tags received by virtual
function NICs") accidentally introduced two copies of the
intel[x]vf_mbox_queues() function. Remove the unintended copy.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The physical function may be configured to transparently insert a VLAN
tag into all transmitted packets. Unfortunately, it does not
equivalently strip this same VLAN tag from all received packets. This
behaviour may be observed in some Amazon EC2 instances with Enhanced
Networking enabled: transmissions work as expected but all packets
received by iPXE appear to have a spurious VLAN tag.
We can configure the receive queue to strip VLAN tags via the
RXDCTL.VME bit. We need to find out from the PF driver whether or not
we should do so.
There exists a "get queue configuration" mailbox message which
contains a field labelled IXGBE_VF_TRANS_VLAN in the Linux driver.
A comment in the Linux PF driver describes this field as "notify VF of
need for VLAN tag stripping, and correct queue". It will be filled
with a non-zero value if the PF is enforcing the use of a single VLAN
tag. It will also be filled with a non-zero value if the PF is using
multiple traffic classes.
The Linux VF driver seems to treat this field as being simply the
number of traffic classes, and gives it no VLAN-related
interpretation. The Linux VF driver instead handles the VLAN tag
stripping by simply assuming that any unrecognised VLAN tag ought to
be silently dropped.
We choose to strip and ignore the VLAN tag if the IXGBE_VF_TRANS_VLAN
field has a non-zero value.
Reported-by: Leonid Vasetsky <leonidv@velostrata.com>
Tested-by: Leonid Vasetsky <leonidv@velostrata.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
In a busy network (such as a public cloud), IPv4 addresses may be
recycled rapidly. When this happens, unidirectional traffic (such as
UDP syslog) will succeed, but bidirectional traffic (such as TCP
connections) may fail due to stale ARP cache entries on other nodes.
The remote ARP cache expiry timeout is likely to exceed iPXE's
connection timeout, meaning that boot attempts can fail before the
problem is automatically resolved.
Fix by sending gratuitous ARPs whenever an IPv4 address is changed, to
attempt to update stale remote ARP cache entries. Note that this is
not a guaranteed fix, since ARP is an unreliable protocol.
We avoid sending gratuitous ARPs unconditionally, since otherwise any
unrelated settings change (e.g. "set dns 192.168.0.1") would cause
unexpected gratuitous ARPs to be sent.
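As a rough illustration of what a gratuitous ARP looks like on the
wire, the following sketch builds the 28-byte ARP payload for Ethernet
and IPv4, with the sender and target protocol addresses both set to
our own address. The function name is hypothetical, and the choice of
an ARP request (rather than a reply) is an assumption made for the
example, not a statement of what iPXE transmits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: fill pkt (at least 28 bytes) with a gratuitous
 * ARP request announcing that "ip" is reachable at "mac".
 */
static size_t build_gratuitous_arp ( uint8_t *pkt, const uint8_t mac[6],
				     const uint8_t ip[4] ) {
	uint8_t *p = pkt;

	assert ( pkt != NULL );
	*p++ = 0x00; *p++ = 0x01;	/* hardware type: Ethernet */
	*p++ = 0x08; *p++ = 0x00;	/* protocol type: IPv4 */
	*p++ = 6;			/* hardware address length */
	*p++ = 4;			/* protocol address length */
	*p++ = 0x00; *p++ = 0x01;	/* operation: request (assumed) */
	memcpy ( p, mac, 6 ); p += 6;	/* sender hardware address */
	memcpy ( p, ip, 4 ); p += 4;	/* sender protocol address */
	memset ( p, 0, 6 ); p += 6;	/* target hardware address */
	memcpy ( p, ip, 4 ); p += 4;	/* target protocol address */
	return ( ( size_t ) ( p - pkt ) );
}
```

A caller would invoke this only when the IPv4 address has actually
changed, in line with the reasoning above.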
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The ACPI power off sequence may not take effect immediately. Delay
for one second, to eliminate potentially confusing log messages such
as "Could not power off: Error 0x43902001 (http://ipx".
Reported-by: Leonid Vasetsky <leonidv@velostrata.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
On some platforms (observed in a small subset of Microsoft Azure
(Hyper-V) virtual machines), the RTC appears to be incapable of
generating an interrupt via the legacy PIC. The RTC status registers
show that a periodic interrupt has been asserted, but the PIC IRR
shows that IRQ8 remains inactive.
On such systems, iPXE will currently freeze during the "iPXE
initialising devices..." message.
Work around this problem by checking that RTC interrupts are being
raised before returning from rtc_entropy_enable(). If no interrupt is
seen within 100ms, then we assume that the RTC interrupt mechanism is
broken. In these circumstances, iPXE will continue to initialise but
any subsequent attempt to generate entropy will fail. In particular,
HTTPS connections will fail with an error indicating that no entropy
is available.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
In edk2, there are several drivers that associate HII forms (and
corresponding config access protocol instances) with each individual
network device. (In this context, "network device" means the EFI
handle on which the SNP protocol is installed, and on which the device
path ending with the MAC() node is installed also.) Such edk2 drivers
are, for example: Ip4Dxe, HttpBootDxe, VlanConfigDxe.
In UEFI, any given handle can carry at most one instance of a specific
protocol (see e.g. the specification of the InstallProtocolInterface()
boot service). This implies that the class of drivers mentioned above
can't install their EFI_HII_CONFIG_ACCESS_PROTOCOL instances on the
SNP handle directly -- they would conflict with each other.
Accordingly, each of those edk2 drivers creates a "private" child
handle under the SNP handle, and installs its config access protocol
(and corresponding HII package list) on its child handle.
The device path for the child handle is traditionally derived by
appending a Hardware Vendor Device Path node after the MAC() node.
The VenHw() nodes in question consist of a GUID (by definition), and
no trailing data (by choice). The purpose of these VenHw() nodes is
only that all the child nodes can be uniquely identified by device
path.
At the moment iPXE does not follow this pattern. It doesn't run into
a conflict when it installs its EFI_HII_CONFIG_ACCESS_PROTOCOL
directly on the SNP handle, but that's only because iPXE is the sole
driver not following the pattern. This behavior seems risky (one
might call it a "latent bug"); better align iPXE with the edk2 custom.
Cc: Michael Brown <mcb30@ipxe.org>
Cc: Gary Lin <glin@suse.com>
Cc: Ladi Prosek <lprosek@redhat.com>
Ref: http://thread.gmane.org/gmane.comp.bios.edk2.devel/13494/focus=13532
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Ladi Prosek <lprosek@redhat.com>
Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
As with assertions, profiling is enabled for objects built with any
debug level (including an explicit debug level of zero).
Allow profiling to be globally enabled or disabled by adding PROFILE=1
or PROFILE=0 respectively to the build command line.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Assertions are enabled for objects built with any debug level
(including an explicit debug level of zero). It is sometimes useful
to be able to enable assertions across all objects; this currently
requires manually hacking include/assert.h.
Allow assertions to be globally enabled by adding ASSERT=1 to the
build command line. For example:
make bin/8086100e.mrom ASSERT=1
Similarly, allow assertions to be globally disabled by adding ASSERT=0
to the build command line. If no ASSERT=... is specified on the
build command line, then only objects mentioned in DEBUG=... will have
assertions enabled (as is currently the case).
Note that globally enabling assertions imposes a relatively heavy
runtime penalty, primarily due to the various sanity checks performed
by list_add(), list_for_each_entry(), etc.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Extend the DEBUG=... syntax to allow debug messages to be compiled in
but disabled by default. For example:
make bin/undionly.kpxe DEBUG=netdevice:3:1
would compile in the messages as for DEBUG=netdevice:3, but would set
the debug level mask so that only the DEBUG=netdevice:1 messages would
be displayed.
This allows for external code to selectively enable the additional
debug messages at runtime, without being overwhelmed by unwanted
initial noise. For example, a developer of a new protocol may want to
temporarily enable tracing of all packets received: this can be done
by building with DEBUG=netdevice:3:1 and using
// temporarily enable per-packet messages
DBG_ENABLE_OBJECT ( netdevice, DBGLVL_EXTRA );
...
// disable per-packet messages
DBG_DISABLE_OBJECT ( netdevice, DBGLVL_EXTRA );
Note that unlike the usual DBG_ENABLE() and DBG_DISABLE() macros,
DBG_ENABLE_OBJECT() and DBG_DISABLE_OBJECT() will not be removed via
dead code elimination if debugging is disabled in the specified
object. In particular, this means that using either of these macros
will always result in a symbol reference to the specified object.
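The interaction between the compiled-in level and the runtime mask can
be modelled roughly as below. This is an illustrative sketch only:
the structure and function names are hypothetical, and it assumes that
the numeric levels act as a bitmask (1 for basic messages, 2 for extra
messages), so that DEBUG=netdevice:3:1 compiles in both but initially
shows only the first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_LVL_LOG   0x1	/* assumed bit for basic messages */
#define SKETCH_LVL_EXTRA 0x2	/* assumed bit for extra messages */

/* Hypothetical model of a single object's debug state */
struct sketch_dbg_object {
	unsigned int compiled;	/* levels compiled in at build time */
	unsigned int mask;	/* levels currently enabled at runtime */
};

/* A message is shown only if it was compiled in and is enabled */
static bool sketch_dbg_visible ( const struct sketch_dbg_object *obj,
				 unsigned int level ) {
	assert ( obj != NULL );
	return ( ( obj->compiled & obj->mask & level ) != 0 );
}

static void sketch_dbg_enable ( struct sketch_dbg_object *obj,
				unsigned int level ) {
	assert ( obj != NULL );
	obj->mask |= level;
}

static void sketch_dbg_disable ( struct sketch_dbg_object *obj,
				 unsigned int level ) {
	assert ( obj != NULL );
	obj->mask &= ~level;
}
```

In this model, enabling a level that was never compiled in changes the
mask but produces no output.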
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The DBG_ENABLE() and DBG_DISABLE() macros currently affect the debug
level of all objects that were built with debugging enabled. This is
undesirable, since it is common to use different debug levels in each
object.
Make the debug level mask a per-object variable. DBG_ENABLE() and
DBG_DISABLE() now control only the debug level for the containing
object (which is consistent with the intended usage across the
existing codebase). DBG_ENABLE_OBJECT() and DBG_DISABLE_OBJECT() may
be used to control the debug level for a specified object. For
example:
// Enable DBG() messages from tcpip.c
DBG_ENABLE_OBJECT ( tcpip, DBGLVL_LOG );
Note that the existence of debug messages continues to be gated by the
DEBUG=... list specified on the build command line. If an object was
built without the relevant debug level, then DBG_ENABLE_OBJECT() will
have no effect on that object at runtime (other than to explicitly
drag in the object via a symbol reference).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
A redirection failure is fatal, but provides no opportunity for the
caller of xfer_[v]redirect() to report the failure since the interface
will already have been disconnected. Fix by sending intf_close() from
within the default xfer_vredirect() handler.
Debugged-by: Robin Smidsrød <robin@smidsrod.no>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The vendor class identifier strings in DHCP_ARCH_VENDOR_CLASS_ID are
out of sync with the (correct) client architecture values in
DHCP_ARCH_CLIENT_ARCHITECTURE.
Fix by removing all definitions of DHCP_ARCH_VENDOR_CLASS_ID, and
instead generating the vendor class identifier string automatically
based on DHCP_ARCH_CLIENT_ARCHITECTURE and DHCP_ARCH_CLIENT_NDI.
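For illustration, the string can be derived from the numeric values
along the following lines. This sketch formats the string at runtime
with snprintf(), which is an assumption made for readability and not
necessarily how the build constructs it; the helper name and its
parameters are hypothetical. The layout follows the PXE convention
"PXEClient:Arch:xxxxx:UNDI:yyyzzz".

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: format the vendor class identifier from the
 * client architecture and the UNDI major/minor version numbers.
 * Returns the length that the full string requires (as snprintf()).
 */
static int sketch_vendor_class_id ( char *buf, size_t len,
				    unsigned int arch,
				    unsigned int undi_major,
				    unsigned int undi_minor ) {
	assert ( buf != NULL );
	return snprintf ( buf, len, "PXEClient:Arch:%05u:UNDI:%03u%03u",
			  arch, undi_major, undi_minor );
}
```

Deriving the string from the same numbers used for the architecture
option means the two can no longer drift apart.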
Signed-off-by: Michael Brown <mcb30@ipxe.org>
RFC3315 defines DHCPv6 option 16 (vendor class identifier) but does
not define any direct relationship with the roughly equivalent DHCPv4
option 60.
The PXE specification predates IPv6, and the UEFI specification is
expectedly vague on the subject. Examination of the reference EDK2
codebase suggests that the DHCPv6 vendor class identifier will be
formatted in accordance with RFC3315, using a single vendor-class-data
item in which the opaque-data field is the string as would appear in
DHCPv4 option 60.
RFC3315 requires the vendor class identifier to specify an IANA
enterprise number, as a way of disambiguating the vendor-class-data
namespace. The EDK2 code uses the value 343, described as:
// TODO: IANA TBD: temporarily using Intel's
Since this "TODO" has been present since at least 2010, it is probably
safe to assume that it has now become a de facto standard.
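The resulting option layout can be sketched as follows: a 16-bit
option code (16), a 16-bit option length, the 32-bit enterprise
number, and then a single vendor-class-data item consisting of a
16-bit length followed by the opaque string. The function and macro
names here are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_DHCPV6_VENDOR_CLASS 16	/* RFC3315 option code */
#define SKETCH_ENTERPRISE_NUMBER 343	/* value used by EDK2 */

static void sketch_put_be16 ( uint8_t *p, uint16_t value ) {
	p[0] = ( value >> 8 );
	p[1] = ( value & 0xff );
}

static void sketch_put_be32 ( uint8_t *p, uint32_t value ) {
	sketch_put_be16 ( &p[0], ( uint16_t ) ( value >> 16 ) );
	sketch_put_be16 ( &p[2], ( uint16_t ) ( value & 0xffff ) );
}

/* Hypothetical helper: encode the option into buf.  Returns the
 * total number of bytes written, or zero if the option does not fit.
 */
static size_t sketch_build_vendor_class ( uint8_t *buf, size_t buf_len,
					  const char *id ) {
	size_t id_len;
	size_t total;

	assert ( ( buf != NULL ) && ( id != NULL ) );
	id_len = strlen ( id );
	total = ( 4 /* header */ + 4 /* enterprise */ + 2 + id_len );
	if ( ( id_len > ( 0xffff - 6 ) ) || ( total > buf_len ) )
		return 0;
	sketch_put_be16 ( &buf[0], SKETCH_DHCPV6_VENDOR_CLASS );
	sketch_put_be16 ( &buf[2], ( uint16_t ) ( 4 + 2 + id_len ) );
	sketch_put_be32 ( &buf[4], SKETCH_ENTERPRISE_NUMBER );
	sketch_put_be16 ( &buf[8], ( uint16_t ) id_len );
	memcpy ( &buf[10], id, id_len );
	return total;
}
```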
Signed-off-by: Michael Brown <mcb30@ipxe.org>
RFC5970 defines DHCPv6 options 61 (client system architecture type)
and 62 (client network interface identifier), with contents equivalent
to DHCPv4 options 93 and 94 respectively.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
DHCPv4 and DHCPv6 share some values in common for the architecture-
specific options (such as the client system architecture type), but
use different encapsulations: DHCPv4 has a single byte for the option
length while DHCPv6 has a 16-bit field for the option length.
Move the containing DHCP_OPTION() and related wrappers from the
individual dhcp_arch.h files to dhcp.c, thus allowing for the
architecture-specific values to be reused in dhcpv6.c.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some BIOSes (observed with an HP Gen9) seem to spuriously enable
interrupts at the PIC. This causes problems with NBPs such as GRUB
which use the UNDI API (thereby enabling interrupts on the NIC)
without first hooking an interrupt service routine. In this
situation, the interrupt will end up being handled by the default BIOS
ISR, which will typically just send an EOI and return. Since nothing
in this handler causes the NIC to deassert the interrupt, this will
result in an interrupt storm.
Entertainingly, some BIOSes are immune to this problem because the
default ISR sends the EOI only to the slave PIC; this effectively
disables the interrupt.
Work around this problem by disabling the interrupt on the PIC before
invoking the PXE NBP. An NBP that expects to make use of interrupts
will need to be configuring the PIC anyway, so it is probably safe to
assume that it will explicitly reenable the interrupt.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
There seems to be no reason for the sti/cli pair used around each call
to INT 10. Remove these instructions, so that printing debug messages
from within an ISR does not temporarily reenable interrupts.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The HII IFR structures are allocated via realloc() rather than
zalloc(), and so are not automatically zeroed. This results in the
presence of uninitialised and invalid data, causing crashes elsewhere
in the UEFI firmware.
Fix by explicitly zeroing the newly allocated portion of any IFR
structure in efi_ifr_op().
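The general pattern is sketched below using the standard C library.
The helper name is hypothetical and the code is not taken from
efi_ifr_op(); it simply shows zeroing only the newly added tail after
a successful realloc().

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: grow a buffer from old_len to new_len bytes,
 * zeroing the newly allocated portion.  Returns NULL on failure, in
 * which case the original buffer remains valid.
 */
static void * sketch_grow_zeroed ( void *ptr, size_t old_len,
				   size_t new_len ) {
	uint8_t *new_ptr;

	assert ( new_len >= old_len );
	new_ptr = realloc ( ptr, new_len );
	if ( ! new_ptr )
		return NULL;
	/* realloc() preserves the old contents but leaves the tail
	 * uninitialised, so clear it explicitly.
	 */
	memset ( ( new_ptr + old_len ), 0, ( new_len - old_len ) );
	return new_ptr;
}
```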
Debugged-by: Laszlo Ersek <lersek@redhat.com>
Debugged-by: Gary Lin <glin@suse.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The SNP device path includes the network device's MAC address within
the MAC_ADDR_DEVICE_PATH.MacAddress field. We check that the
link-layer address will fit within this field, and then perform the
copy using the length of the destination buffer.
At 32 bytes, the MacAddress field is actually larger than the current
maximum iPXE link-layer address. The copy therefore overflows the
source buffer, resulting in trailing garbage bytes being appended to
the device path's MacAddress. This is invisible in debug messages,
since the DevicePathToText protocol will render only the length
implied by the interface type.
Fix by copying only the actual length of the link-layer address (which
we have already verified will not overflow the destination buffer).
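A minimal sketch of the corrected copy is shown below. The structure
and function names are hypothetical stand-ins for the real device path
types; the point is that the length check uses the destination size
while the copy uses the source length.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the 32-byte MacAddress field */
struct sketch_mac_path {
	uint8_t mac[32];
};

/* Copy a link-layer address into the path.  Returns 0 on success, or
 * -1 if the address would not fit.
 */
static int sketch_fill_mac ( struct sketch_mac_path *path,
			     const uint8_t *ll_addr, size_t ll_addr_len ) {
	assert ( ( path != NULL ) && ( ll_addr != NULL ) );
	if ( ll_addr_len > sizeof ( path->mac ) )
		return -1;
	memset ( path->mac, 0, sizeof ( path->mac ) );
	/* Use the source length: sizeof ( path->mac ) here would read
	 * beyond the end of ll_addr.
	 */
	memcpy ( path->mac, ll_addr, ll_addr_len );
	return 0;
}
```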
Debugged-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
iPXE debug logging doesn't support %u. This commit replaces it with
%d in virtio-pci debug format strings.
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some of the regions may end up being unmapped, either because they are
optional or because the attempt to map them has failed. Region types
starting at 0 didn't make it easy to test for this condition.
This commit bumps all valid region types up by 1 with 0 having the
implicit 'unmapped' meaning.
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Provide a mechanism to allow an arbitrary adjustment to be applied to
all subsequent calls to time().
Note that the underlying clock source (e.g. the RTC clock) will not be
changed; only the time as reported within iPXE will be affected.
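Conceptually this amounts to adding a stored offset to whatever the
underlying clock reports, as in the sketch below. All names are
hypothetical, and the cumulative behaviour of the adjustment is an
assumption made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical offset (in seconds) applied to all reported times */
static signed long sketch_time_offset;

/* Apply a further adjustment; the clock source itself is untouched */
static void sketch_time_adjust ( signed long delta ) {
	sketch_time_offset += delta;
}

/* Convert a raw clock reading into the reported time */
static time_t sketch_adjusted ( time_t raw ) {
	assert ( raw != ( ( time_t ) -1 ) );
	return ( raw + sketch_time_offset );
}

/* Reported current time, based on the standard library clock */
static time_t sketch_time_now ( void ) {
	return sketch_adjusted ( time ( NULL ) );
}
```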
Signed-off-by: Michael Brown <mcb30@ipxe.org>
ARM64 has a weaker memory order model than x86. The missing memory
barrier caused phy initialization notification to be delayed beyond
the link-wait timeout (15 secs).
Signed-off-by: Leendert van Doorn <leendert@paramecium.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
In some circumstances, intermediate devices may lose state in a way
that temporarily prevents the successful delivery of packets from a
TCP peer. For example, a firewall may drop a NAT forwarding table
entry.
Since iPXE spends most of its time downloading files (and hence purely
receiving data, sending only TCP ACKs), this can easily happen in a
situation in which there is no reason for iPXE's TCP stack to generate
any retransmissions. The temporary loss of connectivity can therefore
effectively become permanent.
Work around this problem by sending TCP keepalives after a period of
inactivity on an established connection.
TCP keepalives usually send a single garbage byte in sequence number
space that has already been ACKed by the peer. Since we do not need
to elicit a response from the peer, we instead send pure ACKs (with no
garbage data) in order to keep the transmit code path simple.
Originally-implemented-by: Ladi Prosek <lprosek@redhat.com>
Debugged-by: Ladi Prosek <lprosek@redhat.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Extend the 16-bit PCI bus:dev.fn address to a 32-bit seg:bus:dev.fn
address, assuming a segment value of zero in contexts where multiple
segments are unsupported by the underlying data structures (e.g. in
the iBFT or BOFM tables).
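One plausible packing is sketched below, with the segment occupying
the upper 16 bits so that truncating to 16 bits yields the legacy
bus:dev.fn value. The exact bit layout and all names here are
assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical packing: seg[31:16] bus[15:8] dev[7:3] fn[2:0] */
static uint32_t sketch_pci_busdevfn ( uint16_t seg, uint8_t bus,
				      uint8_t dev, uint8_t fn ) {
	assert ( ( dev < 32 ) && ( fn < 8 ) );
	return ( ( ( uint32_t ) seg << 16 ) | ( ( uint32_t ) bus << 8 ) |
		 ( ( uint32_t ) dev << 3 ) | fn );
}

static uint16_t sketch_pci_seg ( uint32_t busdevfn ) {
	return ( ( uint16_t ) ( busdevfn >> 16 ) );
}

static uint8_t sketch_pci_bus ( uint32_t busdevfn ) {
	return ( ( uint8_t ) ( ( busdevfn >> 8 ) & 0xff ) );
}

static uint8_t sketch_pci_dev ( uint32_t busdevfn ) {
	return ( ( uint8_t ) ( ( busdevfn >> 3 ) & 0x1f ) );
}

static uint8_t sketch_pci_fn ( uint32_t busdevfn ) {
	return ( ( uint8_t ) ( busdevfn & 0x07 ) );
}
```

A consumer limited to 16-bit addresses can keep only the low 16 bits,
which is equivalent to assuming a segment of zero.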
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The non-cryptographic RNG implemented by random() has the property
that a seed value of zero will result in a generated sequence of
all-zero values. This situation can arise if currticks() returns zero
at start of day.
Work around this problem by falling back to a fixed non-zero seed if
necessary.
This has no effect on the separate DRBG used by cryptographic code.
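The fallback can be illustrated with the sketch below. It does not
reproduce iPXE's generator: a 32-bit xorshift is used as a stand-in
because it shares the property that a zero state produces only zeros.
The function names and the fallback constant are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_FALLBACK_SEED 0x12345678UL	/* arbitrary non-zero */

static uint32_t sketch_rng_state = SKETCH_FALLBACK_SEED;

/* Seed the generator, substituting a fixed seed for zero */
static void sketch_srandom ( uint32_t seed ) {
	sketch_rng_state = ( seed ? seed : SKETCH_FALLBACK_SEED );
}

/* Stand-in generator (xorshift32); not suitable for cryptography */
static uint32_t sketch_random ( void ) {
	uint32_t x = sketch_rng_state;

	assert ( x != 0 );
	x ^= ( x << 13 );
	x ^= ( x >> 17 );
	x ^= ( x << 5 );
	sketch_rng_state = x;
	return x;
}
```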
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Fix build error with perl >= 5.23.2:
Can't redeclare "my" in "my" at ./util/parserom.pl line 160
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Robin Smidsrød <robin@smidsrod.no>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Mac OS X uses non-standard EFI protocols to obtain the DHCP packets
from the UEFI firmware.
Originally-implemented-by: Michael Kuron <m.kuron@gmx.de>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
There has been a longstanding disagreement between RFC4578 and the
IANA "Processor Architecture Types" registry. RFC4578 section 2.1
defines type 7 as "EFI BC" and type 9 as "EFI x86-64"; the IANA
registry quotes RFC4578 as its source but has these values erroneously
swapped. The EDK2 codebase uses the IANA values.
As of March 2016, RFC4578 has been modified by an erratum to match the
values as recorded in the IANA registry.
Fix our definitions to match the consensus values.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some UEFI keyboard drivers are blissfully unaware of the existence of
either Ctrl key, and will report "Ctrl-<key>" as just "<key>". This
breaks substantial portions of the iPXE user interface.
Work around these broken UEFI drivers by allowing "ESC <key>" to be
used as a substitute for "Ctrl-<key>".
Tested-by: Dreamcat4 <dreamcat4@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some HTTP/2 servers send the header "Connection: upgrade, close". This
currently causes iPXE to fail due to the unrecognised "upgrade" token.
Fix by ignoring any unrecognised tokens in the "Connection" header.
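A tolerant parser for this header might look like the sketch below.
It is not the iPXE implementation: the flag names and functions are
hypothetical, and only the "close" and "keep-alive" tokens are
recognised, with everything else silently skipped.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_CONN_CLOSE      0x1
#define SKETCH_CONN_KEEP_ALIVE 0x2

/* Case-insensitive comparison of a token against a lower-case name */
static int sketch_token_is ( const char *token, size_t len,
			     const char *name ) {
	size_t i;

	if ( strlen ( name ) != len )
		return 0;
	for ( i = 0 ; i < len ; i++ ) {
		if ( tolower ( ( unsigned char ) token[i] ) != name[i] )
			return 0;
	}
	return 1;
}

/* Parse a "Connection" header value into a set of flags */
static unsigned int sketch_parse_connection ( const char *value ) {
	unsigned int flags = 0;
	const char *p = value;
	size_t len;

	assert ( value != NULL );
	while ( *p ) {
		/* Skip separators, then measure the next token */
		p += strspn ( p, ", \t" );
		len = strcspn ( p, ", \t" );
		if ( sketch_token_is ( p, len, "close" ) ) {
			flags |= SKETCH_CONN_CLOSE;
		} else if ( sketch_token_is ( p, len, "keep-alive" ) ) {
			flags |= SKETCH_CONN_KEEP_ALIVE;
		}
		/* Any other token (e.g. "upgrade") is ignored */
		p += len;
	}
	return flags;
}
```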
Reported-by: Ján ONDREJ (SAL) <ondrejj@salstar.sk>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
This backport is from linux kernel upstream commit 83d6f1f ("ath9k:
fix buffer overrun for ar9287").
Signed-off-by: Christian Hesse <mail@eworm.de>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The raw cycle counter at PMCCNTR_EL0 works in qemu but seems to always
read as zero on physical hardware (tested on Juno r1 and Cavium
ThunderX), even after ensuring that PMCR_EL0.E and PMCNTENSET_EL0.C
are both enabled.
Use CNTVCT_EL0 instead; this seems to count at a lower resolution
(tens of CPU cycles), but is usable for profiling.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The UEFI specification requires the EFI_SIMPLE_NETWORK_PROTOCOL
GetStatus() method to set TxBuf to NULL if there are no transmit
buffers to recycle.
Some implementations (observed with Lan9118Dxe in EDK2) fill in TxBuf
only when there is a transmit buffer to recycle, which leads to large
numbers of "spurious TX completion" errors.
Work around this problem by initialising TxBuf to NULL before calling
the GetStatus() method.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Do not assume that an architecture-specific optimised memcpy() will
have the same properties as generic_memcpy() in terms of handling
overlapping regions.
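The safe approach is to choose the copy direction explicitly rather
than rely on whatever memcpy() happens to do, roughly as sketched
below. The function name is hypothetical and the byte-at-a-time loops
are for clarity only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical overlap-safe copy that never delegates to memcpy() */
static void * sketch_memmove ( void *dest, const void *src, size_t len ) {
	uint8_t *d = dest;
	const uint8_t *s = src;

	assert ( ( len == 0 ) || ( ( dest != NULL ) && ( src != NULL ) ) );
	if ( ( uintptr_t ) d < ( uintptr_t ) s ) {
		/* Destination is lower: copy forwards */
		while ( len-- )
			*(d++) = *(s++);
	} else if ( ( uintptr_t ) d > ( uintptr_t ) s ) {
		/* Destination is higher: copy backwards */
		d += len;
		s += len;
		while ( len-- )
			*(--d) = *(--s);
	}
	return dest;
}
```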
Signed-off-by: Michael Brown <mcb30@ipxe.org>
When building for 64-bit ARM, some symbol references may be resolved
via an "adrp" instruction (to obtain the start of the 4kB page
containing the symbol) and a separate 12-bit offset. For example
(taken from the GNU assembler documentation):
adrp x0, foo
ldr x0, [x0, #:lo12:foo]
We occasionally refer to symbols defined via mechanisms that are not
directly visible to gcc. For example:
extern char some_magic_symbol[];
__asm__ ( ".equ some_magic_symbol, some_magic_expression" );
The subsequent use of the ":lo12:" prefix on such magically-defined
symbols triggers an assertion failure in the assembler.
This problem seems to affect only "private_key_len" in the current
codebase. Fix by storing this value as static data; this avoids the
need to provide the value as a literal within the instruction stream,
and so avoids the problematic use of the ":lo12:" prefix.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
We currently use the EFI_CPU_ARCH_PROTOCOL's GetTimerValue() method to
generate the currticks() timer, calibrated against a 1ms delay from
the boot services Stall() method.
This does not work on ARM platforms, where GetTimerValue() is an empty
stub which just returns EFI_UNSUPPORTED.
Fix by instead creating a periodic timer event, and using this event
to increment a current tick counter.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Require architecture-specific code to make a deliberate choice to use
the unoptimised generic_tcpip_continue_chksum() function, if there is
no optimised version available.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The dependency on zlib seems to have been introduced in commit 3dd7ce1
("[efi] Allow building with non-system libbfd") as an indirect
requirement of either libbfd or libiberty when building on Mac OS X.
Since we no longer use either of these libraries, remove the
unnecessary link against zlib.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Parse the intermediate ELF file directly instead of using libbfd, in
order to allow for cross-compiled ELF objects.
As a side bonus, this eliminates libbfd as a build requirement.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The IBM Tivoli Provisioning Manager for OS Deployment (also known as
TPMfOSD, Rembo-ia32, or Rembo Auto-Deploy) has a serious bug in some
older versions (observed with v5.1.1.0, apparently fixed by v7.1.1.0)
which can lead to arbitrary data corruption.
As mentioned in commit 87723a0 ("[libflat] Test A20 gate without
switching to flat real mode"), Tivoli's NBP sets up a VMM and makes
calls to the PXE stack in VM86 mode. This appears to be some kind of
attempt to run PXE API calls inside a sandbox. The VMM is fairly
sophisticated: for example, it handles our attempts to switch into
protected mode and patches our GDT so that our protected-mode code
runs in ring 1 instead of ring 0. However, it neglects to apply any
memory protections. In particular, it does not enable paging and
leaves us with 4GB segment limits. We can therefore trivially break
out of the sandbox by simply overwriting the GDT (or by modifying any
of Tivoli's VMM code or data structures).
When we attempt to execute privileged instructions (such as "lidt"),
the CPU raises an exception and control is passed to the Tivoli VMM.
This may result in a call to Tivoli's memcpy() function.
Tivoli's memcpy() function includes optimisations which use the SSE
registers %xmm0-%xmm3 to speed up aligned memory copies.
Unfortunately, the Tivoli VMM's exception handler does not save or
restore %xmm0-%xmm3. The net effect of this bug in the Tivoli VMM is
that any privileged instruction (such as "lidt") issued by iPXE may
result in unexpected corruption of the %xmm0-%xmm3 registers.
Even more unfortunately, this problem affects the code path taken in
response to a hardware interrupt from the NIC, since that code path
will call PXENV_UNDI_ISR. The net effect therefore becomes that any
NIC hardware interrupt (e.g. due to a received packet) may result in
unexpected corruption of the %xmm0-%xmm3 registers.
If a packet arrives while Tivoli is in the middle of using its
memcpy() function, then the unexpected corruption of the %xmm0-%xmm3
registers will result in unexpected corruption in the destination
buffer. The net effect therefore becomes that any received packet may
result in a 16-byte block of corruption somewhere in any data that
Tivoli copied using its memcpy() function.
We can work around this bug in the Tivoli VMM by saving and restoring
the %xmm0-%xmm3 registers across calls to virt_call(). To work around
the problem, we need to save registers before attempting to execute
any privileged instructions, and ensure that we attempt no further
privileged instructions after restoring the registers.
This is less simple than it may sound. We can use the "movups"
instruction to save and restore individual registers, but this will
itself generate an undefined opcode exception if SSE is not currently
enabled according to the flags in %cr0 and %cr4. We can't access %cr0
or %cr4 before attempting the "movups" instruction, because accessing a
control register is itself a privileged instruction (which may
therefore trigger corruption of the registers that we're trying to
save).
The best solution seems to be to use the "fxsave" and "fxrstor"
instructions. If SSE is not enabled, then these instructions may fail
to save and restore the SSE register contents, but will not generate
an undefined opcode exception. (If SSE is not enabled, then we don't
really care about preserving the SSE register contents anyway.)
The use of "fxsave" and "fxrstor" introduces an implicit assumption
that the CPU supports SSE instructions (even though we make no
assumption about whether or not SSE is currently enabled). SSE was
introduced in 1999 with the Pentium III (and added by AMD in 2001),
and is an architectural requirement for x86_64. Experimentation with
current versions of gcc suggests that it may generate SSE instructions
even when using "-m32", unless an explicit "-march=i386" or "-mno-sse"
is used to inhibit this. It therefore seems reasonable to assume that
SSE will be supported on any hardware that might realistically be used
with new iPXE builds.
As a side benefit of this change, the MMX register %mm0 will now be
preserved across virt_call() even in an i386 build of iPXE using a
driver that requires readq()/writeq(), and the SSE registers
%xmm0-%xmm5 will now be preserved across virt_call() even in an x86_64
build of iPXE using the Hyper-V netvsc driver.
Experimentation suggests that this change adds around 10% to the
number of cycles required for a do-nothing virt_call(), most of which
are due to the extra bytes copied using "rep movsb". Since the number
of bytes copied is a compile-time constant local to librm.S, we could
potentially reduce this impact by ensuring that we always copy a whole
number of dwords and so can use "rep movsl" instead of "rep movsb".
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Commit 86f96a4 ("[tg3] Remove x86-specific inline assembly")
introduced a regression in _tg3_flag() in 64-bit builds, since any
flags in the upper 32 bits of a 64-bit unsigned long would be
discarded when truncating to a 32-bit int.
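The class of bug can be reproduced in isolation as shown below. These
functions are hypothetical and are not the tg3 code; they contrast a
test that narrows the masked value to 32 bits with one that compares
the full 64-bit value against zero.

```c
#include <assert.h>
#include <stdint.h>

/* Buggy pattern: the masked value is narrowed to 32 bits before
 * being tested, so flags in bits 32-63 always appear clear.
 */
static int sketch_flag_truncating ( uint64_t flags, unsigned int bit ) {
	uint32_t truncated;

	assert ( bit < 64 );
	truncated = ( uint32_t ) ( flags & ( 1ULL << bit ) );
	return ( truncated != 0 );
}

/* Fixed pattern: reduce to a boolean while still 64 bits wide */
static int sketch_flag ( uint64_t flags, unsigned int bit ) {
	assert ( bit < 64 );
	return ( ( flags & ( 1ULL << bit ) ) != 0 );
}
```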
Debugged-by: Shane Thompson <shane.thompson@aeontech.com.au>
Tested-by: Shane Thompson <shane.thompson@aeontech.com.au>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some PXE NBPs are known to make PXE API calls with very little space
available on the real-mode stack. For example, the Rembo-ia32 NBP
from some versions of IBM's Tivoli Provisioning Manager for Operating
System Deployment (TPMfOSD) will issue calls with the real-mode stack
placed at 0000:03d2; this is at the end of the interrupt vector table
and leaves only 498 bytes of stack space available before overwriting
the hardware IRQ vectors. This limits the amount of state that we can
preserve before transitioning to protected mode.
Work around these challenging conditions by preserving everything
other than the initial register dump in a temporary static buffer
within our real-mode data segment, and copying the contents of this
buffer to the protected-mode stack.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Return success (rather than failure) after an image format has been
correctly identified.
This has no practical effect, since the return value from
image_probe() is deliberately never used, but avoids a somewhat
surprising and misleading "format not recognised" error message when
debugging is enabled.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
On some architectures (such as ARM), gcc will insert implicit calls to
memset(). Handle these using the same mechanism as for the implicit
calls to memcpy() used by x86.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add a build configuration option NET_PROTO_LACP to control whether or
not LACP support is included for Ethernet devices.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
This commit makes virtio-net support devices with VEN 0x1af4 and DEV
0x1041, which is how non-transitional (modern-only) virtio-net devices
are exposed on the PCI bus.
Transitional devices supporting both the old 0.9.5 and new 1.0 version
of the virtio spec are driven using the new protocol. Legacy devices
are driven using the old protocol, same as before this commit.
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
This commit adds support for driving virtio 1.0 PCI devices. In
addition to various helpers, a number of vpm_ functions are introduced
to be used instead of their legacy vp_ counterparts when accessing
virtio 1.0 (aka modern) devices.
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Virtio 1.0 introduces new constants and data structures, common to all
devices as well as specific to virtio-net. This commit adds a subset
of these to be able to drive the virtio-net 1.0 network device.
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
PCI devices may support multiple capabilities of the same type (for
example PCI_CAP_ID_VNDR) and there was no way to discover all of them.
This commit adds a new API pci_find_next_capability which provides
this functionality. It would typically be used like so:
  for (pos = pci_find_capability(pci, PCI_CAP_ID_VNDR);
       pos > 0;
       pos = pci_find_next_capability(pci, pos, PCI_CAP_ID_VNDR)) {
    ...
  }
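To show how such a walk behaves, the sketch below implements the same
pair of operations over an in-memory copy of a 256-byte configuration
space. The function names are hypothetical and do not match the real
API signatures; the sketch assumes the standard layout (capabilities
pointer at offset 0x34, with each capability holding its ID at offset
0 and the next pointer at offset 1) and omits the status register
check that a real driver would perform.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_CAP_PTR 0x34	/* capabilities pointer register */
#define SKETCH_CAP_ID_VNDR 0x09	/* vendor-specific capability ID */

/* Walk the list starting at pos, returning the offset of the first
 * capability with a matching ID, or zero if there is none.
 */
static int sketch_find_from ( const uint8_t *cfg, uint8_t pos,
			      uint8_t cap ) {
	int ttl = 48;	/* guard against malformed circular lists */

	assert ( cfg != NULL );
	while ( ttl-- && ( pos >= 0x40 ) ) {
		pos &= 0xfc;
		if ( cfg[pos] == 0xff )
			break;
		if ( cfg[pos] == cap )
			return pos;
		pos = cfg[pos + 1];
	}
	return 0;
}

static int sketch_find_capability ( const uint8_t *cfg, uint8_t cap ) {
	return sketch_find_from ( cfg, cfg[SKETCH_CAP_PTR], cap );
}

static int sketch_find_next_capability ( const uint8_t *cfg, int pos,
					 uint8_t cap ) {
	assert ( ( pos >= 0x40 ) && ( pos < 0xff ) );
	return sketch_find_from ( cfg, cfg[pos + 1], cap );
}
```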
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The EFI_HII_CONFIG_ACCESS_PROTOCOL's ExtractConfig() method is passed
a request string which includes the parameters being queried plus an
apparently meaningless blob of information (the ConfigHdr), and is
expected to include this same meaningless blob of information in the
results string.
Neither the specification nor the existing EDK2 code (including the
nominal reference implementation in the DriverSampleDxe driver)
provide any reason for the existence of this meaningless blob of
information. It appears to be consumed in its entirety by the
EFI_HII_CONFIG_ROUTING_PROTOCOL, and to contain zero bits of
information by the time it reaches an EFI_HII_CONFIG_ACCESS_PROTOCOL
instance. It would potentially allow for multiple configuration data
sets to be handled by a single EFI_HII_CONFIG_ACCESS_PROTOCOL
instance, in a style alien to the rest of the UEFI specification
(which implicitly assumes that the instance pointer is always
sufficient to uniquely identify the instance).
iPXE currently handles this by simply copying the ConfigHdr from the
request string to the results string, and otherwise ignoring it. This
approach is also used by some code in EDK2, such as OVMF's PlatformDxe
driver.
As of EDK2 commit 8a45f80 ("MdeModulePkg: Make HII configuration
settings available to OS runtime"), this causes an assertion failure
inside EDK2. The failure arises when iPXE is handed a NULL request
string, and responds (as per the specification) with a results string
including all settings. Since there is no meaningless blob to copy
from the request string, there is no corresponding meaningless blob in
the results string. This now causes an assertion failure in
HiiDatabaseDxe's HiiConfigRoutingExportConfig().
The same failure does not affect the OVMF PlatformDxe driver, which
simply passes the request string to the HII BlockToConfig() utility
function. The BlockToConfig() function returns EFI_INVALID_PARAMETER
when passed a null request string, and PlatformDxe propagates this
error directly to the caller.
Fix by matching the behaviour of OVMF's PlatformDxe driver: explicitly
return EFI_INVALID_PARAMETER if the request string is NULL or empty.
This violates the specification (insofar as it is feasible to
determine what the specification actually requires), but causes
correct behaviour with the EDK2 codebase.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The existing code intends to print NULL strings as "<NULL>" (for the
sake of debug messages), but the logic is incorrect when handling
wide-character strings. Fix the logic and add applicable unit tests.
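A minimal sketch of the intended behaviour, assuming two hypothetical
helpers (these are not the names used in the formatting code): the
placeholder must match the character width of the string being
printed.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Substitute a narrow placeholder for a NULL narrow string */
const char *printable_str(const char *s)
{
	return s ? s : "<NULL>";
}

/* Substitute a wide placeholder for a NULL wide-character string */
const wchar_t *printable_wstr(const wchar_t *ws)
{
	return ws ? ws : L"<NULL>";
}
```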
Signed-off-by: Michael Brown <mcb30@ipxe.org>
There is no way for the hardware to give us an invalid length in the
LRH, since it must have parsed this length field in order to perform
header splitting. However, this is difficult to prove conclusively.
Add an unnecessary length check to explicitly reject any packets
larger than the posted receive I/O buffer.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
It is possible for the preloaded UNDI device to end up with no
specified bus type, since it may not be recognised as either a PCI or
an ISAPnP device. This will result in a bus type value of zero, which
currently results in NULL being treated as a string pointer by
netdev_fetch_bustype().
Fix by returning ENOENT if an unknown bus type is specified.
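The shape of the fix can be sketched as a bounds-checked table lookup.
The enum values, table, and function name below are hypothetical and
chosen only for illustration.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical bus type values; zero means "no bus type specified" */
enum { BUS_NONE = 0, BUS_PCI = 1, BUS_ISAPNP = 2 };

static const char *bus_names[] = {
	[BUS_PCI] = "PCI",
	[BUS_ISAPNP] = "ISAPnP",
};

/* Look up a bus type name, failing cleanly for unknown values */
int fetch_bustype(unsigned int bus_type, const char **name)
{
	size_t count = sizeof(bus_names) / sizeof(bus_names[0]);

	/* Reject out-of-range values and unnamed table slots */
	if (bus_type >= count || bus_names[bus_type] == NULL)
		return -ENOENT;
	*name = bus_names[bus_type];
	return 0;
}
```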
Reported-by: Todd Stansell <todd@stansell.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Provide a build option CROSSCERT in config/crypto.h to allow the
default cross-signed certificate source to be configured at build
time. The ${crosscert} setting may still be used to reconfigure the
cross-signed certificate source at runtime.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some versions of gcc complain that "'__bswap_variable_32' is static
but used in inline function 'golan_check_rc_and_cmd_status' which is
not static".
Fix by making golan_check_rc_and_cmd_status() a static inline.
Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some end-user configurations have been observed in which the first NBP
(such as GRUB2) uses the UNDI API and then transfers control to a
second NBP (such as pxelinux) which uses the UDP API. The first NBP
closes the network device using PXENV_UNDI_CLOSE, which renders the
UDP API unable to transmit or receive packets.
The correct behaviour under these circumstances is (as often) simply
not documented by the PXE specification. Testing with the Intel PXE
stack suggests that PXENV_UDP_OPEN will implicitly reopen the network
device if necessary, so match this behaviour.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The DHCP option 175.189 has been defined (by us) since 2006 as
containing the drive number to be used for a SAN boot, but has never
been automatically used as such by iPXE.
Use this option (if specified) to override the default SAN drive
number.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Interpret the maximum drive number (0xff for hard disks, 0x7f for
floppy disks) as meaning "use natural drive number".
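One way to express this rule, as a sketch with a hypothetical helper
name: masking off the hard-disk bit (0x80) lets a single comparison
cover both 0xff and 0x7f.

```c
/* Resolve a requested drive number.  "natural" is whatever drive
 * number would be assigned if nothing had been requested.
 */
unsigned int resolve_drive(unsigned int requested, unsigned int natural)
{
	/* 0x7f (floppy) and 0xff (hard disk) both mean "use natural" */
	if ((requested & 0x7f) == 0x7f)
		return natural;
	return requested;
}
```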
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The mbr.bin and usbdisk.bin standalone blobs are currently generated
using "objcopy -O binary", which does not process relocation records.
For the i386 build, this does not matter since the section start
address is zero and so the ".rel" relocation records are effectively
no-ops anyway.
For the x86_64 build, the ".rela" relocation records are not no-ops,
since the addend is included as part of the relocation record (rather
than inline). Using "objcopy -O binary" will silently discard the
relocation records, with the result that all symbols are effectively
given a value of zero.
Fix by using "ld --oformat binary" instead of "objcopy -O binary" to
generate mbr.bin and usbdisk.bin.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The Infiniband specification (volume 1, section 11.4.1.2 "Post Receive
Request") notes that for UD QPs, the GRH will be placed in the first
40 bytes of the receive buffer if present. (If no GRH is present,
which is normal, then the first 40 bytes of the receive buffer will be
unused.)
Mellanox hardware performs this placement automatically: other headers
will be stripped (and their values returned via the CQE), but the
first 40 bytes of the data buffer will be consumed by the (probably
non-existent) GRH.
This does not fit neatly into iPXE's internal abstraction, which
expects the data buffer to represent just the data payload with the
addresses from the GRH (if present) passed as additional parameters to
ib_complete_recv().
The end result of this discrepancy is that attempts to receive
full-sized 2048-byte IPoIB packets on Mellanox hardware will fail.
Fix by allocating a separate ring buffer to hold the received GRHs.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The intention of the existing code (as documented in its own comments)
is that it should be possible to override the list of trusted root
certificates using a "trust" setting held in non-volatile stored
options. However, the rootcert_init() function currently executes
before any devices have been probed, and so will not be able to
retrieve any such non-volatile stored options.
Fix by executing rootcert_init() only after devices have been probed.
Since startup functions may be executed multiple times (unlike
initialisation functions), add an explicit flag to preserve the
property that rootcert_init() should run only once.
As before, if an explicit root of trust is specified at build time,
then any runtime "trust" setting will be ignored.
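The run-once property can be sketched with a simple flag. All names
here are hypothetical, and the counter merely stands in for the real
initialisation work.

```c
static int trust_done;          /* set once initialisation has run */
static unsigned int trust_runs; /* counts how often the work was done */

/* Startup function: may be invoked many times, does its work once */
void trust_startup(void)
{
	if (trust_done)
		return;
	trust_done = 1;
	trust_runs++; /* placeholder for the actual initialisation */
}

/* Report how many times the initialisation work was performed */
unsigned int trust_run_count(void)
{
	return trust_runs;
}
```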
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Provide access to local files via the "file://" URI scheme. There are
three syntaxes:
- An opaque URI with a relative path (e.g. "file:script.ipxe").
This will be interpreted as a path relative to the iPXE binary.
- A hierarchical URI with a non-network absolute path
(e.g. "file:/boot/script.ipxe"). This will be interpreted as a
path relative to the root of the filesystem from which the iPXE
binary was loaded.
- A hierarchical URI with a network path in which the authority is a
volume label (e.g. "file://bootdisk/script.ipxe"). This will be
interpreted as a path relative to the root of the filesystem with
the specified volume label.
Note that the potentially desirable shell mappings (e.g. "fs0:" and
"blk0:") are concepts internal to the UEFI shell binary, and do not
seem to be exposed in any way to external executables. The old
EFI_SHELL_PROTOCOL (which did provide access to these mappings) is no
longer installed by current versions of the UEFI shell.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
On some architectures (such as ARM) the "@" character is used as a
comment delimiter. A section type argument such as "@progbits"
therefore becomes "%progbits".
This is further complicated by the fact that the "%" character has
special meaning for inline assembly when input or output operands are
used, in which cases "@progbits" becomes "%%progbits".
Allow the section type character(s) to be defined via Makefile
variables.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
This driver is the original source of the current readq() and writeq()
implementations for 32-bit iPXE. Switch to using the now-centralised
definitions, to avoid including architecture-specific code in an
otherwise architecture-independent driver.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Commit 196f0f2 ("[librm] Convert prot_call() to a real-mode near
call") introduced a regression in which any deliberate modification to
the low 16 bits of the CPU flags (in struct i386_all_regs) would be
overwritten with the original flags value at the time of entry to
prot_call().
The regression arose because the alignment requirements of the
protected-mode stack necessitated the insertion of two bytes of
padding immediately below the prot_call() return address. The
solution chosen was to extend the existing "pushfl / popfl" pair to
"pushfw;pushfl / popfl;popfw". The extra "pushfw / popfw" appears at
first glance to be a no-op, but fails to take into account the fact
that the flags restored by popfl may have been deliberately modified
by the protected-mode function.
Fix by replacing "pushfw / popfw" with "pushw %ss / popw %ss". While
%ss does appear within struct i386_all_regs, any modification to the
stored value has always been ignored by prot_call() anyway.
The most visible symptom of this regression was that SAN booting would
fail since every INT 13 call would be chained to the original INT 13
vector.
Reported-by: Vishvananda Ishaya <vishvananda@gmail.com>
Reported-by: Jamie Thompson <forum.ipxe@jamie-thompson.co.uk>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
There is no practical way to generate an underlength ARP packet since
an ARP packet is always padded up to the minimum Ethernet frame length
(or dropped by the receiving Ethernet hardware if incorrectly padded),
but the absence of an explicit check causes warnings from some
analysis tools.
Fix by adding an explicit check on the I/O buffer length.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The assumption in asn1_type() that an ASN.1 cursor will always contain
a type byte is incorrect. A cursor that has been cleanly invalidated
via asn1_invalidate_cursor() will contain a type byte, but there are
other ways in which to arrive at a zero-length cursor.
Fix by explicitly checking the cursor length in asn1_type(). This
allows asn1_invalidate_cursor() to be reduced to simply zeroing the
length field.
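A minimal sketch of the length-checked accessor, using a hypothetical
cursor structure and an assumed "end" sentinel value of zero:

```c
#include <stddef.h>
#include <stdint.h>

#define DER_END 0x00 /* assumed sentinel meaning "no type present" */

/* Hypothetical cursor: a pointer into DER data plus remaining length */
struct der_cursor {
	const void *data;
	size_t len;
};

/* Return the type byte, or DER_END if the cursor is empty */
unsigned int der_type(const struct der_cursor *cursor)
{
	const uint8_t *type = cursor->data;

	return (cursor->len >= sizeof(*type)) ? *type : DER_END;
}

/* Invalidation needs only to zero the length */
void der_invalidate(struct der_cursor *cursor)
{
	cursor->len = 0;
}
```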
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Many TLS records contain variable-length fields. We currently
validate the overall record length, but do so only after reading the
length of the variable-length field. If the record is too short to
even contain the length field, then we may read uninitialised data
from beyond the end of the record.
This is harmless in practice (since the subsequent overall record
length check would fail regardless of the value read from the
uninitialised length field), but causes warnings from some analysis
tools.
Fix by validating that the overall record length is sufficient to
contain the length field before reading from the length field.
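The ordering of the two checks can be sketched as follows, assuming a
field laid out as a 16-bit big-endian length followed by that many
bytes. The function name and layout are illustrative assumptions, not
the TLS code itself.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Parse a variable-length field from the start of a record */
int parse_var_field(const uint8_t *rec, size_t rec_len,
		    const uint8_t **field, size_t *field_len)
{
	size_t len;

	/* First: is the record long enough to hold the length field? */
	if (rec_len < 2)
		return -EINVAL;
	len = ((size_t)rec[0] << 8) | rec[1];

	/* Second: is the record long enough to hold the field itself? */
	if (len > rec_len - 2)
		return -EINVAL;

	*field = rec + 2;
	*field_len = len;
	return 0;
}
```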
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Several UEFI platforms are known to return EFI_NOT_FOUND when asked to
retrieve the system default font information via GetFontInfo(). Work
around these broken platforms by iterating over the glyphs to find the
maximum height used by a printable character.
Originally-fixed-by: Jonathan Dieter <jdieter@lesbg.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some EoIB implementations utilise an EoIB-to-Ethernet gateway device
that does not perform a FullMember join to the multicast group for the
EoIB broadcast domain. This has various exciting side-effects, such
as requiring every EoIB node to send every broadcast packet twice.
As an added bonus, the gateway may also break the EoIB MAC address to
GID mapping protocol by sending Ethernet-sourced packets from the
wrong QPN.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some EoIB implementations require each individual EoIB node to create
the multicast group for the EoIB broadcast domain.
It is left as an exercise for the interested reader to determine how
such an implementation might ever allow the parameters of such a
multicast group to be changed without requiring a simultaneous upgrade
of every driver on every operating system on every machine currently
attached to the fabric.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some EoIB implementations transmit a vendor-proprietary heartbeat
packet on the same multicast group used to provide the EoIB broadcast
domain.
Silently ignore these heartbeat packets, to avoid cluttering up the
network interface error statistics.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
EoIB is a fairly simple protocol in which raw Ethernet frames
(excluding the CRC) are encapsulated within Infiniband Unreliable
Datagrams, with a four-byte fixed EoIB header (which conveys no actual
information). The Ethernet broadcast domain is provided by a
multicast group, similar to the IPoIB IPv4 multicast group.
The mapping from Ethernet MAC addresses to Infiniband address vectors
is achieved by snooping incoming traffic and building a peer cache
which can then be used to map a MAC address into a port GID. The
address vector is completed using a path record lookup, as for IPoIB.
Note that this requires every packet to include a GRH.
Add basic support for EoIB devices. This driver is substantially
derived from the IPoIB driver. There is currently no mechanism for
automatically creating EoIB devices.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add a build configuration option VNIC_IPOIB to control whether or not
IPoIB support is included for Infiniband devices.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Commit e62e52b ("[ipoib] Simplify test for received broadcast
packets") relies upon the multicast LID being present in the
destination address vector as passed to ipoib_complete_recv().
Unfortunately, this information is not present in many Infiniband
devices' completion queue entries.
Fix by testing instead for the presence of a multicast GID.
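As a sketch of such a test: Infiniband multicast GIDs begin with the
byte 0xff, so the check reduces to a single comparison. The union and
function names below are hypothetical.

```c
#include <stdint.h>

/* Hypothetical 128-bit GID representation */
union gid_sketch {
	uint8_t bytes[16];
};

/* A GID is multicast if its first byte is 0xff */
int gid_is_multicast(const union gid_sketch *gid)
{
	return (gid->bytes[0] == 0xff);
}
```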
Signed-off-by: Michael Brown <mcb30@ipxe.org>
When running the 64-bit BIOS version of iPXE, restrict external memory
allocations to the low 4GB to ensure that allocations (such as for
initrds) fall within our identity-mapped memory region, and will be
accessible to the potentially 32-bit operating system.
Move largest_memblock() back to memtop_umalloc.c, since this change
imposes a restriction that applies only to BIOS builds.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
When a CMRC connection is closed, the deferred shutdown process calls
ib_destroy_qp(). This will cause the receive work queue entries to
complete in error (since they are being cancelled), which will in turn
reschedule the deferred shutdown process. This eventually leads to
ib_destroy_conn() being called on a connection that has already been
freed.
Fix by explicitly cancelling any pending shutdown process after the
shutdown process has completed.
Ironically, this almost exactly reverts commit 019d4c1 ("[infiniband]
Use a one-shot process for CMRC shutdown"); prior to the introduction
of one-shot processes the only way to achieve a one-shot process was
for the process to cancel itself.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add support for running the BIOS version of iPXE in 64-bit long mode.
A 64-bit BIOS version of iPXE can be built using e.g.
  make bin-x86_64-pcbios/ipxe.usb
  make bin-x86_64-pcbios/8086100e.mrom
The 64-bit BIOS version should appear to function identically to the
normal 32-bit BIOS version. The physical memory layout is unaltered:
iPXE is still relocated to the top of the available 32-bit address
space. The code is linked to a virtual address of 0xffffffffeb000000
(in the negative 2GB as required by -mcmodel=kernel), with 4kB pages
created to cover the whole of .textdata. 2MB pages are created to
cover the whole of the 32-bit address space.
The 32-bit portions of the code run with VIRTUAL_CS and VIRTUAL_DS
configured such that truncating a 64-bit virtual address gives a
32-bit virtual address pointing to the same physical location.
The stack pointer remains as a physical address when running in long
mode (although the .stack section is accessible via the negative 2GB
virtual address); this is done in order to simplify the handling of
interrupts occurring while executing a portion of 32-bit code with
flat physical addressing via PHYS_CODE().
Interrupts may be enabled in any of 64-bit long mode, 32-bit protected
mode with virtual addresses, 32-bit protected mode with physical
addresses, or 16-bit real mode. Interrupts occurring in any mode
other than real mode will be reflected down to real mode and handled
by whichever ISR is hooked into the BIOS interrupt vector table.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
In a 64-bit build, the entirety of the 32-bit address space is
identity-mapped and so any valid physical address may immediately be
used as a virtual address. Conversely, a virtual address that is
already within the 32-bit address space may immediately be used as a
physical address.
A valid virtual address that lies outside the 32-bit address space
must be an address within .textdata, and so can be converted to a
physical address by adding virt_offset.
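The two cases can be sketched as below. The function names are
hypothetical, and virt_offset is passed as a parameter here purely to
keep the example self-contained.

```c
#include <stdint.h>

/* Convert a virtual address to a physical address (64-bit build) */
uint64_t virt_to_phys64(uint64_t virt, uint64_t virt_offset)
{
	/* Anything within the 32-bit address space is identity-mapped */
	if (virt <= 0xffffffffULL)
		return virt;
	/* Otherwise it lies within .textdata; addition wraps modulo 2^64 */
	return virt + virt_offset;
}

/* Any valid physical address may be used directly as a virtual address */
uint64_t phys_to_virt64(uint64_t phys)
{
	return phys;
}
```

For example, with .textdata linked at 0xffffffffeb000000 and relocated
to an assumed physical address of 0x7f000000, virt_offset is the
(wrapped) difference between the two.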
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The physical locations of .textdata, .text16 and .data16 are constant
from the point of view of C code. Mark the relevant variables as
constant to allow gcc to optimise out redundant reads.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
No callers of prot_to_phys, phys_to_prot, or intr_to_prot require the
flags to be preserved. Remove the unnecessary pushfl/popfl pairs.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add a phys_call() wrapper function (analogous to the existing
real_call() wrapper function) for calling code with flat physical
addressing, and use this wrapper within the PHYS_CODE() macro.
Move the relevant functionality inside librm.S, where it more
naturally belongs.
The COMBOOT code currently uses explicit calls to _virt_to_phys and
_phys_to_virt. These will need to be rewritten if our COMBOOT support
is ever generalised to be able to run in a 64-bit build.
Specifically:
- com32_exec_loop() should be restructured to use PHYS_CODE()
- com32_wrapper.S should be restructured to use an equivalent of
prot_call(), passing parameters via a struct i386_all_regs
- there appears to be no need for com32_wrapper.S to switch between
external and internal stacks; this could be omitted to simplify
the design.
For now, librm.S continues to expose _virt_to_phys and _phys_to_virt
for use by com32.c and com32_wrapper.S. Similarly, librm.S continues
to expose _intr_to_virt for use by gdbidt.S.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some older versions of binutils have issues with both the use of
PROVIDE() and the interpretation of numeric literals within a section
description.
Work around these older versions by defining the required numeric
literals outside of any section description, and by automatically
determining whether or not to generate extra space for page tables
rather than relying on LDFLAGS.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The bulk of the iPXE binary (the .textdata section) is physically
relocated at runtime to the top of the 32-bit address space in order
to allow space for an OS to be loaded. The relocation is achieved
with the assistance of segmentation: we adjust the code and data
segment bases so that the link-time addresses remain valid.
Segmentation is not available (for normal code and data segments) in
long mode. We choose to compile the C code with -mcmodel=kernel and
use a link-time address of 0xffffffffeb000000. This choice allows us
to identity-map the entirety of the 32-bit address space, and to alias
our chosen link-time address to the physical location of our .textdata
section. (This requires the .textdata section to always be aligned to
a page boundary.)
We simultaneously choose to set the 32-bit virtual address segment
bases such that the link-time addresses may simply be truncated to 32
bits in order to generate a valid 32-bit virtual address. This allows
symbols in .textdata to be trivially accessed by both 32-bit and
64-bit code.
There is no (sensible) way in 32-bit assembly code to generate the
required R_X86_64_32S relocation records for these truncated symbols.
However, subtracting the fixed constant 0xffffffff00000000 has the
same effect as truncation, and can be represented in a standard
R_X86_64_32 relocation record. We define the VIRTUAL() macro to
abstract away this truncation operation, and apply it to all
references by 32-bit (or 16-bit) assembly code to any symbols within
the .textdata section.
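The equivalence between subtraction and truncation can be checked
with plain integer arithmetic. This sketch uses hypothetical helper
names and is not the VIRTUAL() macro itself; it holds for any address
in the top 4GB of the 64-bit address space.

```c
#include <stdint.h>

#define HIGH_BASE 0xffffffff00000000ULL

/* Truncate a 64-bit address to its low 32 bits */
uint32_t truncate_addr(uint64_t addr)
{
	return (uint32_t)addr;
}

/* Subtract the fixed constant instead of truncating */
uint64_t subtract_base(uint64_t addr)
{
	return addr - HIGH_BASE;
}
```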
We define "virt_offset" for a 64-bit build as "the value to be added
to an address within .textdata in order to obtain its physical
address". With this definition, the low 32 bits of "virt_offset" can
be treated by 32-bit code as functionally equivalent to "virt_offset"
in a 32-bit build.
We define "text16" and "data16" for a 64-bit build as the physical
addresses of the .text16 and .data16 sections. Since a physical
address within the 32-bit address space may be used directly as a
64-bit virtual address (thanks to the identity map), this definition
provides the most natural access to variables in .text16 and .data16.
Note that this requires a minor adjustment in prot_to_real(), which
accesses .text16 using 32-bit virtual addresses.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Long-mode operation will require page tables, which are too large to
sensibly fit in our .data16 segment in base memory.
Add a portion of init_librm() running in 32-bit protected mode to
provide access to high memory. Use this portion of init_librm() to
initialise the .textdata variables "virt_offset", "text16", and
"data16", eliminating the redundant (re)initialisation currently
performed on every mode transition as part of real_to_prot().
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Use the standard "pushl $function ; pushw %cs ; call prot_call"
sequence everywhere that prot_call() is used.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
On a 64-bit CPU, any modification of a register by 32-bit or 16-bit
code will destroy the invisible upper 32 bits of the corresponding
64-bit register. For example: a 32-bit "pushl %eax" followed by a
"popl %eax" will zero the upper half of %rax. This differs from the
treatment of upper halves of 32-bit registers by 16-bit code: a
"pushw %ax" followed by a "popw %ax" will leave the upper 16 bits of
%eax unmodified.
Inline assembly generated using REAL_CODE() or PHYS_CODE() will
therefore have to preserve the upper halves of all registers, to avoid
clobbering registers that gcc expects to be preserved.
Output operands from REAL_CODE() and PHYS_CODE() assembly may
therefore contain undefined values in the upper 32 bits.
Fix by using explicit variable widths (e.g. uint32_t) for
non-discarded output operands, to ensure that undefined values in the
upper 32 bits of 64-bit registers are ignored.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Move most arch/i386 files to arch/x86, and adjust the contents of the
Makefiles and the include/bits/*.h headers to reflect the new
locations.
This patch makes no substantive code changes, as can be seen using a
rename-aware diff (e.g. "git show -M5").
This patch does not make the pcbios platform functional for x86_64; it
merely allows it to compile without errors.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Commit c64747d ("[librm] Speed up real-to-protected mode transition
under KVM") rounded down the .text16 segment address calculated in
alloc_basemem() to a multiple of 64 bytes in order to speed up mode
transitions under KVM.
This creates a potential discrepancy between alloc_basemem() and
free_basemem(), meaning that free_basemem() may free less memory than
was allocated by alloc_basemem().
Fix by padding the calculated sizes of both .text16 and .data16 to a
multiple of 64 bytes at build time.
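A sketch of why padding the sizes suffices, assuming (for this
example only) that the top of free base memory is itself 64-byte
aligned: carving padded sizes from an aligned top always yields an
aligned address, so no rounding down is needed at allocation time and
the allocated and freed lengths agree. The helper names are
hypothetical.

```c
#include <stdint.h>

#define SEG_ALIGN 64u

/* Pad a size up to a multiple of 64 bytes */
uint32_t pad_size(uint32_t size)
{
	return (size + SEG_ALIGN - 1) & ~(SEG_ALIGN - 1);
}

/* Carve a region of "size" bytes from immediately below "top" */
uint32_t carve_below(uint32_t top, uint32_t size)
{
	return top - pad_size(size);
}
```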
Debugged-by: Yossef Efraim <yossefe@mellanox.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Guard against various corner cases (such as zero-length buffers, zero
alignments, and integer overflow when rounding up allocation lengths
and alignments) and ensure that the struct io_buffer is correctly
aligned even when the caller requests a non-zero alignment for the I/O
buffer itself.
Add self-tests to verify that the resulting alignments and lengths are
correct for a range of allocations.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Commit f3fbb5f ("[malloc] Avoid integer overflow for excessively large
memory allocations") fixed signed integer overflow issues caused by
the use of ssize_t, but did not guard against unsigned integer
overflow.
Add explicit checks for unsigned integer overflow where needed. As a
side bonus, erroneous calls to malloc_dma() with an (illegal) size of
zero will now fail cleanly.
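The kind of check involved can be sketched as follows. This is a
hypothetical helper showing an overflow-safe round-up, not the
allocator code itself.

```c
#include <stddef.h>
#include <stdint.h>

/* Round "len" up to a multiple of "align" (which must be a power of
 * two).  Returns 0 on success, or -1 if the alignment is invalid or
 * the rounded length would overflow size_t.
 */
int align_up_checked(size_t len, size_t align, size_t *out)
{
	size_t mask;

	if (align == 0 || (align & (align - 1)))
		return -1;
	mask = align - 1;

	/* Check before adding, since unsigned addition wraps silently */
	if (len > SIZE_MAX - mask)
		return -1;

	*out = (len + mask) & ~mask;
	return 0;
}
```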
Signed-off-by: Michael Brown <mcb30@ipxe.org>
ath_rx_init() demonstrates some serious confusion over how to use
pointers, resulting in (uint32_t*)NULL being used as a temporary
variable. This does not end well.
The broken code in question is performing manual alignment of I/O
buffers, which can now be achieved more simply using alloc_iob_raw().
Fix by removing ath_rxbuf_alloc() entirely.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The various early-exit paths in parse_uri() accidentally bypass the
URI field decoding. The result is that opaque or relative URIs do not
undergo URI field decoding, resulting in double-encoding when the URIs
are subsequently used. For example:
  #!ipxe
  set mac ${macstring}
  imgfetch /boot/by-mac/${mac:uristring}
would result in an HTTP GET such as
  GET /boot/by-mac/00%253A0c%253A29%253Ac5%253A39%253Aa1 HTTP/1.1
rather than the expected
  GET /boot/by-mac/00%3A0c%3A29%3Ac5%3A39%3Aa1 HTTP/1.1
Fix by ensuring that URI decoding is always applied regardless of the
URI format.
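The double-encoding effect is easy to reproduce with a generic
percent-encoder. The sketch below uses a hypothetical pct_encode()
helper, not the iPXE URI code: encoding an already-encoded string
turns each "%" into "%25".

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Percent-encode everything except unreserved characters and "/".
 * Output is truncated (but always NUL-terminated) if it does not fit.
 */
void pct_encode(const char *in, char *out, size_t out_len)
{
	size_t used = 0;

	if (out_len == 0)
		return;
	for (; *in; in++) {
		unsigned char c = (unsigned char)*in;

		if (isalnum(c) || strchr("-._~/", c)) {
			if (used + 1 >= out_len)
				break;
			out[used++] = (char)c;
		} else {
			if (used + 3 >= out_len)
				break;
			used += (size_t)snprintf(out + used, out_len - used,
						 "%%%02X", c);
		}
	}
	out[used] = '\0';
}
```

Encoding the field once and then decoding it during parsing leaves a
single layer of encoding on the wire; skipping the decode step leaves
two.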
Reported-by: Andrew Widdersheim <awiddersheim@inetu.net>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
TFTP URIs are intrinsically problematic, since:
- TFTP servers may use either normal slashes or backslashes as a
directory separator,
- TFTP servers allow filenames to be specified using relative paths
(with no initial directory separator),
- TFTP filenames present in a DHCP filename field may use special
characters such as "?" or "#" that prevent parsing as a generic URI.
As of commit 7667536 ("[uri] Refactor URI parsing and formatting"), we
have directly constructed TFTP URIs from DHCP next-server and filename
pairs, avoiding the generic URI parser. This eliminated the problems
related to special characters, but indirectly made it impossible to
parse a "tftp://..." URI string into a TFTP URI with a non-absolute
path.
Re-introduce the convention of requiring an extra slash in a
"tftp://..." URI string in order to specify a TFTP URI with an initial
slash in the filename. For example:
  tftp://192.168.0.1/boot/pxelinux.0  => RRQ "boot/pxelinux.0"
  tftp://192.168.0.1//boot/pxelinux.0 => RRQ "/boot/pxelinux.0"
This is ugly, but there seems to be no other sensible way to provide
the ability to specify all possible TFTP filenames.
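Under this convention, the mapping from URI path to TFTP filename
amounts to dropping exactly one leading slash. A sketch, with a
hypothetical helper name:

```c
/* Map the path portion of a "tftp://" URI to the filename sent in
 * the RRQ: the first slash only separates authority from path.
 */
const char *tftp_filename(const char *uri_path)
{
	if (uri_path[0] == '/')
		return uri_path + 1;
	return uri_path;
}
```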
A side-effect of this change is that format_uri() will no longer add a
spurious initial "/" when formatting a relative URI string. This
improves the console output when fetching an image specified via a
relative URI.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The OCSP responder URI included within an X.509 certificate may or may
not include a trailing slash. We currently rely on the fact that
format_uri() incorrectly inserts an initial slash, which we include
unconditionally within the OCSP request URI.
Switch to using uri_encode() directly, and insert a slash only if the
X.509 certificate's OCSP responder URI does not already include a
trailing slash.
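The conditional separator can be sketched as below. The function name
is hypothetical, and the request is assumed to have been URI-encoded
already by the caller.

```c
#include <stdio.h>
#include <string.h>

/* Build "<responder>/<request>", adding the slash only when the
 * responder URI does not already end with one.  Returns the value
 * of snprintf().
 */
int ocsp_request_uri(char *buf, size_t len, const char *responder,
		     const char *encoded_request)
{
	size_t rlen = strlen(responder);
	const char *sep =
		(rlen && responder[rlen - 1] == '/') ? "" : "/";

	return snprintf(buf, len, "%s%s%s", responder, sep,
			encoded_request);
}
```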
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Commit 53d2d9e ("[uri] Generalise tftp_uri() to pxe_uri()") introduced
a regression in which an NFS root path would no longer be treated as
an unsupported root path, causing a boot with an NFS root path to fail
with a "Could not open SAN device" error.
Reported-by: David Evans <dave.evans55@googlemail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some protocols (such as ARP) may modify the received packet and re-use
the same I/O buffer for transmission of a reply. The SMSC95XX
transmit header is larger than the receive header: the re-used I/O
buffer therefore does not have sufficient headroom for the transmit
header, and the ARP reply will therefore fail to be transmitted. This
is essentially the same problem as in commit 2e72d10 ("[ncm] Reserve
headroom in received packets").
Fix by reserving sufficient space at the start of each received packet
to allow for the difference between the lengths of the transmit and
receive headers.
This problem is not caught by the current driver development test
suite (documented at http://ipxe.org/dev/driver), since even the large
file transfer tests tend to complete sufficiently quickly that there
is no need for the server to ever send an ARP request. The failure
shows up only when using a very slow protocol such as RFC7440-enhanced
TFTP (as used by Windows Deployment Services).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The LED pins are configured by default as GPIO inputs. While it is
conceivable that a board might actually use these pins as GPIOs, no
such board is known to exist.
The Linux smsc95xx driver configures these pins unconditionally as LED
outputs. Assume that it is safe to do likewise.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Expose the network interface name (e.g. "net0") as a setting. This
allows a script to obtain the name of the most recently opened network
interface via ${netX/ifname}.
Signed-off-by: Andrew Widdersheim <amwiddersheim@gmail.com>
Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add a named CONFIG=cloud configuration, which enables console types
useful for obtaining output from virtual machines in public clouds
such as AWS EC2.
An image suitable for use in AWS EC2 can be built using
  make bin/ipxe.usb CONFIG=cloud EMBED=config/cloud/aws.ipxe
The embedded script will direct iPXE to download and execute the EC2
"user-data" file, which is always available to an EC2 VM via the URI
http://169.254.169.254/latest/user-data (regardless of the VPC
networking settings). The boot can therefore be controlled by
modifying the per-instance user data, without having to modify the
boot disk image.
Console output can be obtained via syslog (with a syslog server
configured in the user-data script), via the AWS "System Log" (after
the instance has been stopped), or as a last resort from the log
partition on the boot disk.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The three nominally-disambiguated ENOTSUP errors accidentally all used
the same error disambiguator, rendering them identical. Fix by
changing all three values. We avoid reusing the 0x01 disambiguator
value, since that remains ambiguous in older binaries.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some BIOS console redirection capabilities do not work well with the
colourised debug messages used by iPXE. We already allow the range of
colours to be controlled via the DBGCOL=... build parameter. Extend
this syntax to allow DBGCOL=0 to be used to mean "disable colours".
Requested-by: Wissam Shoukair <wissams@mellanox.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Provide a debug function check_bios_interrupts() to look for changes
to the interrupt vector table. This can be useful when investigating
the behaviour (including crashes) of external PXE NBPs.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
For historical reasons, iPXE sets the current working URI to the root
of the TFTP server whenever the TFTP server address is changed. This
was originally implemented in the hope of allowing a DHCP-provided
TFTP filename to be treated simply as a relative URI. This usage
turns out to be impractical since DHCP-provided TFTP filenames may
include characters which would have special significance to the URI
parser, and so the DHCP next-server+filename combination is now
handled by the dedicated pxe_uri() function instead.
The practice of setting the current working URI to the root of the
TFTP server is potentially helpful for interactive uses of iPXE,
allowing a user to type e.g.
iPXE> dhcp
Configuring (net0 52:54:00:12:34:56)... ok
iPXE> chain pxelinux.0
and have the URI "pxelinux.0" interpreted as being relative to the
root of the TFTP server provided via DHCP.
The current implementation of tftp_apply_settings() has an unintended
flaw. When the "dhcp" command is used to renew a DHCP lease (or to
pick up potentially modified DHCP options), the old settings block
will be unregistered before the new settings block is registered.
This causes tftp_apply_settings() to believe that the TFTP server has
been changed twice (to 0.0.0.0 and back again), and so the current
working URI will always be set to the root of the TFTP server, even if
the DHCP response provides exactly the same TFTP server as previously.
Fix by doing nothing in tftp_apply_settings() whenever there is no
TFTP server address.
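A minimal sketch of the fixed behaviour, assuming that the TFTP
server address is held as a 32-bit value with zero meaning "none".
The names tftp_apply_settings_sketch(), last_tftp_server and
cwuri_changes are hypothetical and are not taken from the iPXE
source; the counter merely stands in for "set the current working
URI".

```c
#include <stdint.h>

/* Hypothetical state: last TFTP server address acted upon (0 = none) */
uint32_t last_tftp_server;

/* Hypothetical counter standing in for "set the current working URI" */
unsigned int cwuri_changes;

/*
 * Sketch of the fixed logic, called whenever settings are applied.
 * "server" is the currently configured TFTP server (0 = none).
 */
void tftp_apply_settings_sketch(uint32_t server)
{
	/* Do nothing while there is no TFTP server address.  This
	 * covers the window in which the old DHCP settings block has
	 * been unregistered and the new one is not yet registered.
	 */
	if (server == 0)
		return;

	/* Change the working URI only if the server really changed */
	if (server != last_tftp_server) {
		last_tftp_server = server;
		cwuri_changes++;
	}
}
```

With this logic, applying the sequence (server A, no server, server A)
changes the working URI only once, which is the behaviour expected
from a DHCP lease renewal.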
Debugged-by: Andrew Widdersheim <awiddersheim@inetu.net>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Update the image's recorded URI when a download redirection occurs.
This ensures that URIs relative to a redirected download are resolved
correctly.
In particular, this allows for the use of relative URIs in scripts
that are themselves downloaded via a redirection, such as the HTTP 301
redirection used to fix up URIs pointing to directories but omitting
the trailing slash (e.g. "http://boot.ipxe.org/demo", which will be
redirected to "http://boot.ipxe.org/demo/").
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Resolve redirection URIs as being relative to the original HTTP
request URI, rather than treating them as being implicitly relative to
the current working URI.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Commit 5de45cd ("[romprefix] Report a pessimistic runtime size
estimate") set the PCI3.0 "runtime size" field equal to the worst-case
runtime size, on the basis that there is no guarantee that PMM
allocation will succeed and hence no guarantee that we will be able to
shrink the ROM image.
On a PCI3.0 system where PMM allocation would succeed, this can cause
the BIOS to unnecessarily refuse to initialise the iPXE ROM due to a
perceived shortage of option ROM space.
Fix by reporting the best-case runtime size via the PCI header, and
checking that we have sufficient runtime space (if applicable). This
allows iPXE ROMs to initialise on PCI3.0 systems that might otherwise
fail due to a shortage of option ROM space.
This may cause iPXE ROMs to fail to initialise on PCI3.0 systems where
PMM is broken. (Pre-PCI3.0 systems are unaffected since there must
already have been sufficient option ROM space available for the
initialisation entry point to be called.)
On balance, it seems preferable to avoid breaking "good" systems
(PCI3.0 with working PMM) at the cost of potentially breaking "bad"
systems (PCI3.0 with broken PMM).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The GuestRPC mechanism (used for VMWARE_SETTINGS and CONSOLE_VMWARE)
does not use any real-mode code and so can be exposed in both 64-bit
and 32-bit builds.
Reported-by: Matthew Helton <mwhelton@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allow the use of the iPXE DRBG implementation in BSD-licensed
projects.
Requested-by: Sean Davis <dive@hq.endersgame.net>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The SMSC95xx devices tend to be used in embedded systems with a
variety of ad-hoc mechanisms for storing the MAC address.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
When USB network card drivers are used, the BIOS' legacy USB
capability is necessarily disabled since there is no way to share the
host controller between the BIOS and iPXE.
Commit 3726722 ("[usb] Add basic support for USB keyboards") added
support allowing a USB keyboard to be used within iPXE. However,
external code such as a PXE NBP has no way to utilise this support,
and so a USB keyboard cannot be used to control a PXE NBP loaded from
a USB network card.
Add support for injecting keypresses from any iPXE console into the
BIOS keyboard buffer. This allows external code such as a PXE NBP to
function with a USB keyboard even after the BIOS' legacy USB
capability has been disabled.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The build system allows for additional drivers (or other objects) to
be specified using build targets such as
make bin/intel--realtek.usb
make bin/8086100e--8086100f.mrom
This currently fails if the base target is the "bin/ipxe.*" all-drivers
target, e.g.
make bin/ipxe--acm.usb
Fix the build target parsing logic to allow additional drivers (or
other objects) to be included on top of the base all-drivers target.
This can be used to include USB network card drivers, which are not
yet included by default in the all-drivers build.
Reported-by: Andrew Sloma <asloma@lenovo.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some xHCI controllers (such as qemu's emulated xHCI controller) do not
correctly handle zero-length packets that are part of a TRB chain.
The zero-length TRB ends up being squashed and does not result in a
zero-length packet as seen by the device.
Work around this problem by marking the zero-length packet as
belonging to a separate transfer descriptor.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some hubs (e.g. the Avocent Corp. Virtual Hub on a Lenovo x3550
Integrated Management Module) have been observed to require more than
the standard 200ms for ports to stabilise, with the result that
devices appear to disconnect and immediately reconnect during the
initial bus enumeration.
Work around this problem by allowing specific hubs an extra 500ms of
settling time.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Record the speed of a USB device based on the port's speed at the time
that the device was enabled. This allows us to remember the device's
speed even after the device has been disconnected (and so the port's
current speed has changed).
In particular, this allows us to correctly identify the transaction
translator for a low-speed or full-speed device after the device has
been disconnected.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The usb_message() and usb_stream() functions currently check for
port->speed==USB_SPEED_NONE to determine whether or not a device has
been unplugged. This test will give a false negative result if a new
device has been plugged in before the hotplug mechanism has finished
handling the removal of the old device.
Fix by checking instead the port->disconnected flag, which is now
cleared only after completing the removal of the old device.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Provide BIT_QWORD_PTR() to allow for easy extraction of non-endian
fields (e.g. Infiniband GUIDs) without unnecessary byte swapping.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Tested using QEMU and usbredir to expose the LAN9512 chip present on a
Raspberry Pi.
There is a known issue with the LAN9512: an extra two bytes are
appended to every transmitted packet. These two bytes comprise:
{ 0x00, 0x08 } if packet length == 0 (mod 8)
{ CRC[0], 0x00 } if packet length == 7 (mod 8)
{ CRC[0], CRC[1] } otherwise
The extra bytes are appended whether the Ethernet CRC is generated
manually or added automatically by the hardware. The issue occurs
with the Linux kernel driver as well as the iPXE driver. It appears
to be an undocumented hardware erratum.
TCP/IP traffic is not affected, since the IP header length field
causes the extraneous bytes to be discarded by the receiver. However,
protocols that rely on the length of the Ethernet frame (such as FCoE
or iPXE's "lotest" protocol) will be unusable on this hardware.
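The table of extra bytes given above can be expressed as a small
helper, which may be useful when predicting what a receiver will see.
This is only a sketch: the function name lan9512_extra_bytes() is
hypothetical, and it assumes that CRC[0] and CRC[1] denote the first
two bytes of the Ethernet CRC as passed in by the caller.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Fill "extra" with the two bytes that, according to the table
 * above, are appended to a transmitted packet of length "len".
 * "crc" holds the four Ethernet CRC bytes (ordering is an assumption).
 */
void lan9512_extra_bytes(size_t len, const uint8_t crc[4],
			 uint8_t extra[2])
{
	switch (len % 8) {
	case 0:
		/* Packet length == 0 (mod 8) */
		extra[0] = 0x00;
		extra[1] = 0x08;
		break;
	case 7:
		/* Packet length == 7 (mod 8) */
		extra[0] = crc[0];
		extra[1] = 0x00;
		break;
	default:
		/* All other packet lengths */
		extra[0] = crc[0];
		extra[1] = crc[1];
		break;
	}
}
```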
Signed-off-by: Michael Brown <mcb30@ipxe.org>
On some models (notably ICH), the PHY reset mechanism appears to be
broken. In particular, the PHY_CTRL register will be correctly loaded
from NVM but the values will not be propagated to the "OEM bits" PHY
register. This typically has the effect of dropping the link speed to
10Mbps.
Since the original version of this driver in commit 945e428 ("[intel]
Replace driver for Intel Gigabit NICs"), we have always worked around
this problem by skipping the PHY reset if the link is already up.
Enhance this workaround by explicitly checking for known-broken PCI
IDs.
Reported-by: Robin Smidsrød <robin@smidsrod.no>
Tested-by: Robin Smidsrød <robin@smidsrod.no>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
iPXE does not call shutdown() before invoking a COMBOOT executable,
since the executable is allowed to make API calls back into iPXE. If
a background picture is used, then the console will not be restored to
text mode before invoking the COMBOOT executable. This can cause
undefined behaviour.
Fix by adding an explicit call to console_reset() immediately before
invoking a COMBOOT or COM32 executable, analogous to the call made to
console_reset() immediately before invoking a PXE NBP.
Debugged-by: Andrew Widdersheim <awiddersheim@inetu.net>
Tested-by: Andrew Widdersheim <awiddersheim@inetu.net>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
For switches which remain permanently in the non-forwarding state (or
which erroneously report a non-forwarding state), ensure that iPXE
will eventually give up waiting for the link to become unblocked.
Originally-fixed-by: Wissam Shoukair <wissams@mellanox.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
If we detect (via STP) that a switch port is in a non-forwarding
state, then the link is marked as being temporarily blocked and DHCP
discovery will be deferred until the link becomes unblocked.
The timer used to decide when to give up waiting for ProxyDHCPOFFERs
is currently based on the time that DHCP discovery was started, and
makes no allowances for any time spent waiting for the link to become
unblocked. Consequently, if STP is used then the timeout for
ProxyDHCPOFFERs becomes essentially zero.
Fix by resetting the recorded start time whenever DHCP discovery is
deferred due to a blocked link.
Debugged-by: Sebastian Roth <sebastian.roth@zoho.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The name "vesafb" is intrinsically specific to a BIOS environment.
Generalise the build configuration option CONSOLE_VESAFB to
CONSOLE_FRAMEBUFFER, in preparation for adding EFI framebuffer
support.
Existing configurations using CONSOLE_VESAFB will continue to work.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Avoid accidentally dereferencing a NULL cipher context pointer for
plaintext blocks (which are usually messages with a block length of
zero, indicating a missing block).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
UEFI UNDI is a hideously ugly lump of poorly specified garbage bolted
on as an appendix of the UEFI specification. My personal favourite
line from the UNDI 'specification' is section E.2.2, which states
"Basically, the rule is: Do it right, or don't do it at all". The
author appears to believe that such exhortations are a viable
substitute for documenting what it is that the wretched reader is
supposed to, in fact, do.
(Second favourite is the section listing the pros and cons of various
driver types. This fails to identify a single con for the mythical
"Hardware UNDI", a design so insanely intrinsically slow that it
appears to have been the inspiration for the EFI_USB_IO_PROTOCOL.)
UNDI is functionally isomorphic to the substantially less preposterous
EFI_SIMPLE_NETWORK_PROTOCOL. Provide an UNDI interface (as a thin
wrapper around the existing SNP interface) to allow for use by
third-party software that has made poor life choices.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Calling EDK2's OpenProtocol() with attributes BY_DRIVER|EXCLUSIVE will
call DisconnectController() in a loop to attempt to dislodge any
existing openers with attributes BY_DRIVER. The loop will continue
indefinitely until either no such openers remain, or until
DisconnectController() returns an error.
If our driver binding protocol's Stop() method is ever called to
disconnect a device that we are not in fact driving, then return
EFI_DEVICE_ERROR rather than EFI_SUCCESS, in order to break this
potentially infinite loop.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The Microsoft PE/COFF specification defines the MajorLinkerVersion and
MinorLinkerVersion fields as "The linker major version number" and
"The linker minor version number" respectively, and has nothing more
to say on the matter. These fields have no significance: they do not
affect the interpretation of the remainder of the file, but merely
provide diagnostic information for interested humans to read.
Apparently, versions 2.4 and earlier of the Microsoft linker produced
binaries so incorrigibly cursed that even to attempt to parse such a
binary would risk summoning a plague of enraged spiders. To protect
users from unwanted arachnids, ImageHlp.dll's MapAndLoad() function
will helpfully fail to map and/or load a 32-bit binary unless the
linker version field indicates version 2.5 or later. (64-bit binaries
are exempt from such helpfulness.)
Work around the broken Microsoft ImageHlp.dll library by providing a
linker version number that will satisfy the arbitrary whims of the
MapAndLoad() function.
This mirrors wimboot commit 670c7e2 ("[efi] Work around broken 32-bit
PE executable parsing in ImageHlp.dll").
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Use INT 1a,564e to notify the BIOS of each network device that we
detect. This provides an opportunity for the BIOS to implement
platform policy such as changing the MAC address by issuing a call to
PXENV_UNDI_SET_STATION_ADDRESS.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Invoke INT 1a,564e whenever a PXE stack is activated, passing the
address of the PXENV+ structure in %es:%bx. This is designed to allow
a BIOS to be notified when a PXE stack has been installed, providing
an opportunity for start-of-day commands such as setting the MAC
address according to a policy chosen by the BIOS.
PXE defines INT 1a,5650 as a means of locating the PXENV+ structure:
this call returns %ax=0x564e and the address of the PXENV+ structure
in %es:%bx. We choose INT 1a,564e as a fairly natural notification
call, using the parameters as would be returned by INT 1a,5650.
The full calling convention (documented as per section 3.1 of the PXE
specification) is:
  INT 1a,564e - PXE installation notification
    Enter:
      %ax = 0x564e
      %es = 16-bit segment address of the PXENV+ structure
      %bx = 16-bit offset of the PXENV+ structure
    Exit:
      %edx may be trashed (as is the case for INT 1a,5650)
      All other register contents must be preserved
      CF is cleared
      IF is preserved
      All other flags are undefined
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Avoid dragging in unnecessary iPXE header files such as <ipxe/uuid.h>
and <ipxe/tables.h> when building host utilities, and ensure that
FILE_LICENCE() (present in the imported EDK2 headers) expands to a
no-op.
Reported-by: Michael Tautschnig <mt@debian.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Commit 7d36a1b ("[build] Explicitly link efilink against -liberty")
introduced a dependency on libiberty to cope with old versions of
libbfd. This commit dates from 2008 and seems to apply only to what
are now extremely old versions of libbfd (prior to binutils 2.12).
There are systems (such as current Debian) which do not include
libiberty within the binutils packages. On such systems, our build
dependency on libiberty represents a pointless hurdle.
Remove the explicit dependency on libiberty, hoping that there are no
modern systems where this will cause a problem.
Suggested-by: Ben Hildred <42656e@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allow the UEFI platform firmware to provide drivers for unrecognised
devices, by exposing our own implementation of EFI_USB_IO_PROTOCOL.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Make the class ID a property of the USB driver (rather than a property
of the USB device ID), and allow USB drivers to specify a wildcard ID
for any of the three component IDs (class, subclass, or protocol).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Generate a score for each possible USB device configuration based on
the available driver support, and select the configuration with the
highest score. This will allow us to prefer ECM over RNDIS (for
devices which support both) and will allow us to meaningfully select a
configuration even when we have drivers available for all functions
(e.g. when exposing unused functions via EFI_USB_IO_PROTOCOL).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The decision on whether or not a zero-length packet needs to be
transmitted is independent of the host controller and belongs in the
USB core.
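As an illustration of why the decision needs no knowledge of the host
controller, a sketch of such a test is shown below. It assumes the
usual USB convention that a transfer is terminated by a short packet,
so that a zero-length packet is required only when the caller asks
for termination and the length is an exact multiple of the endpoint's
maximum packet size. The function name usb_needs_zlp_sketch() and its
parameters are hypothetical.

```c
#include <stddef.h>

/*
 * Decide whether a zero-length packet must follow a transfer.
 *
 * "len" is the transfer length, "mtu" is the endpoint's maximum
 * packet size (must be non-zero), and "terminate" is non-zero if the
 * caller wants the transfer to be explicitly terminated.
 */
int usb_needs_zlp_sketch(size_t len, size_t mtu, int terminate)
{
	/* A final short packet already terminates the transfer, so a
	 * zero-length packet is needed only for exact multiples of
	 * the maximum packet size (including a length of zero).
	 */
	return (terminate && ((len % mtu) == 0));
}
```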
Signed-off-by: Michael Brown <mcb30@ipxe.org>
TCP/IP checksum fields are one's complement values and therefore have
two possible representations of zero: positive zero (0x0000) and
negative zero (0xffff).
In RFC768, UDP over IPv4 exploits this redundancy to repurpose the
positive representation of zero (0x0000) to mean "no checksum
calculated"; checksums are optional for UDP over IPv4.
In RFC2460, checksums are made mandatory for UDP over IPv6. The
wording of the RFC is such that the UDP header is mandated to use only
the negative representation of zero (0xffff), rather than simply
requiring the checksum to be correct but allowing for either
representation of zero to be used.
In RFC1071, an example algorithm is given for calculating the TCP/IP
checksum. This algorithm happens to produce only the positive
representation of zero (0x0000); this is an artifact of the way that
unsigned arithmetic is used to calculate a signed one's complement
sum (and its final negation).
A common misconception has developed (exemplified in RFC1624) that
this artifact is part of the specification. Many people have assumed
that the checksum field should never contain the negative
representation of zero (0xffff).
A sensible receiver will calculate the checksum over the whole packet
and verify that the result is zero (in whichever representation of
zero happens to be generated by the receiver's algorithm). Such a
receiver will not care which representation of zero happens to be used
in the checksum field.
However, there are receivers in existence which will verify the
received checksum the hard way: by calculating the checksum over the
remainder of the packet and comparing the result against the checksum
field. If the representation of zero used by the receiver's algorithm
does not match the representation of zero used by the transmitter (and
so placed in the checksum field), and if the receiver does not
explicitly allow for both representations to compare as equal, then
the receiver may reject packets with a valid checksum.
For UDP, the combined RFCs effectively mandate that we should generate
only the negative representation of zero in the checksum field.
For IP, TCP and ICMP, the RFCs do not mandate which representation of
zero should be used, but the misconceptions which have grown up around
RFC1071 and RFC1624 suggest that it would be least surprising to
generate only the positive representation of zero in the checksum
field.
Fix by ensuring that all of our checksum algorithms generate only the
positive representation of zero, and explicitly inverting this in the
case of transmitted UDP packets.
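The following sketch illustrates the two representations of zero. It
is not the iPXE implementation: it is a plain byte-wise version of
the style of algorithm described in RFC1071, and the function names
are hypothetical. It assumes packet-sized inputs (so that the 32-bit
accumulator cannot overflow).

```c
#include <stddef.h>
#include <stdint.h>

/* One's complement sum of big-endian 16-bit words, folded to 16 bits */
uint16_t ones_sum16(const uint8_t *data, size_t len)
{
	uint32_t sum = 0;

	while (len > 1) {
		sum += (((uint32_t) data[0] << 8) | data[1]);
		data += 2;
		len -= 2;
	}
	if (len)
		sum += ((uint32_t) data[0] << 8); /* Pad odd final byte */
	while (sum >> 16)
		sum = ((sum & 0xffff) + (sum >> 16)); /* End-around carry */
	return (uint16_t) sum;
}

/* Checksum as placed in an IP, TCP or ICMP header.  For any input
 * containing a non-zero byte, a zero result is always 0x0000.
 */
uint16_t inet_checksum_sketch(const uint8_t *data, size_t len)
{
	return (uint16_t) ~ones_sum16(data, len);
}

/* Checksum as placed in a transmitted UDP header: positive zero
 * (0x0000) is explicitly replaced by negative zero (0xffff).
 */
uint16_t udp_tx_checksum_sketch(const uint8_t *data, size_t len)
{
	uint16_t csum = inet_checksum_sketch(data, len);

	return (csum ? csum : 0xffff);
}

/* "Sensible receiver" check: checksum the whole packet (including
 * the checksum field) and verify that the result is zero.
 */
int checksum_ok_sketch(const uint8_t *packet, size_t len)
{
	return (inet_checksum_sketch(packet, len) == 0x0000);
}
```

A receiver using checksum_ok_sketch() accepts a packet regardless of
whether the checksum field holds 0x0000 or 0xffff, since both values
contribute zero to the one's complement sum.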
Reported-by: Wissam Shoukair <wissams@mellanox.com>
Tested-by: Wissam Shoukair <wissams@mellanox.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allow iPXE to coexist with other USB device drivers, by attaching to
the EFI_USB_IO_PROTOCOL instances provided by the UEFI platform
firmware.
The EFI_USB_IO_PROTOCOL is an unsurprisingly badly designed
abstraction of a USB device. The poor design choices intrinsic in the
UEFI specification prevent efficient operation as a network device,
with the result that devices operated using the EFI_USB_IO_PROTOCOL
operate approximately two orders of magnitude slower than devices
operated using our native EHCI or xHCI host controller drivers.
Since the performance is so abysmally slow, and since the underlying
problems are due to fundamental architectural mistakes in the UEFI
specification, support for the EFI_USB_IO_PROTOCOL host controller
driver is left as disabled by default. Users are advised to use the
native iPXE host controller drivers instead.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Many UEFI NBPs expect to find an EFI_PXE_BASE_CODE_PROTOCOL installed
in addition to the EFI_SIMPLE_NETWORK_PROTOCOL. Most NBPs use the
EFI_PXE_BASE_CODE_PROTOCOL only to retrieve the cached DHCP packets.
This implementation has been tested with grub.efi, shim.efi,
syslinux.efi, and wdsmgfw.efi. Some methods (such as Discover() and
Arp()) are not used by any known NBP and so have not (yet) been
implemented.
Usage notes for the tested bootstraps are:
- grub.efi uses EFI_PXE_BASE_CODE_PROTOCOL only to retrieve the
cached DHCP packet, and uses no other methods.
- shim.efi uses EFI_PXE_BASE_CODE_PROTOCOL to retrieve the cached
DHCP packet and to retrieve the next NBP via the Mtftp() method.
If shim.efi was downloaded via HTTP (or other non-TFTP protocol)
then shim.efi will blindly call Mtftp() with an HTTP URI as the
filename: this allows the next NBP (e.g. grubx64.efi) to also be
transparently retrieved by HTTP.
shim.efi can also use the EFI_SIMPLE_FILE_SYSTEM_PROTOCOL to
retrieve files previously loaded by "imgfetch" or similar commands
in iPXE. The current implementation of shim.efi will use the
EFI_SIMPLE_FILE_SYSTEM_PROTOCOL only if it does not find an
EFI_PXE_BASE_CODE_PROTOCOL; this patch therefore prevents this
usage of our EFI_SIMPLE_FILE_SYSTEM_PROTOCOL. This logic could be
trivially reversed in shim.efi if needed.
- syslinux.efi uses EFI_PXE_BASE_CODE_PROTOCOL only to retrieve the
cached DHCP packet. Versions 6.03 and earlier have a bug which
may cause syslinux.efi to attach to the wrong NIC if there are
multiple NICs in the system (or if the UEFI firmware supports
IPv6).
- wdsmgfw.efi (ab)uses EFI_PXE_BASE_CODE_PROTOCOL to retrieve the
cached DHCP packets, and to send and retrieve UDP packets via the
UdpWrite() and UdpRead() methods. (This was presumably done in
order to minimise the amount of benefit obtainable by switching to
UEFI, by replicating all of the design mistakes present in the
original PXE specification.)
The EFI_DOWNGRADE_UX configuration option remains available for now,
until this implementation has received more widespread testing.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Merge the functionality of parse_next_server_and_filename() and
tftp_uri() into a single pxe_uri(), which takes a server address
(IPv4/IPv6/none) and a filename, and produces a URI using the rule:
- if the filename is a hierarchical absolute URI (i.e. includes a
scheme such as "http://" or "tftp://") then use that URI and ignore
the server address,
- otherwise, if the server address is recognised (according to
sa_family) then construct a TFTP URI based on the server address,
port, and filename,
- otherwise fail.
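The rule above can be sketched roughly as follows. This is a
simplified illustration and not the real pxe_uri(): it takes the
server address as a preformatted string (NULL meaning "none") rather
than a socket address, detects a scheme by simply searching for
"://", and performs no escaping of special characters in the
filename. The name pxe_uri_sketch() and its parameters are
hypothetical.

```c
#include <stdio.h>
#include <string.h>

/*
 * Build a URI string in "buf" from a server address and a filename.
 *
 * "server" is a textual server address or NULL if there is none;
 * "port" is the TFTP port, or zero to use the default.
 * Returns 0 on success or -1 on failure.
 */
int pxe_uri_sketch(const char *server, unsigned int port,
		   const char *filename, char *buf, size_t len)
{
	int n;

	if (strstr(filename, "://") != NULL) {
		/* Filename already includes a scheme: use it as-is
		 * and ignore the server address.
		 */
		n = snprintf(buf, len, "%s", filename);
	} else if (server != NULL) {
		/* Construct a TFTP URI from server, port and filename */
		if (port) {
			n = snprintf(buf, len, "tftp://%s:%u/%s",
				     server, port, filename);
		} else {
			n = snprintf(buf, len, "tftp://%s/%s",
				     server, filename);
		}
	} else {
		/* No usable server address: fail */
		return -1;
	}

	/* Treat truncation as a failure */
	return (((n >= 0) && ((size_t) n < len)) ? 0 : -1);
}
```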
Signed-off-by: Michael Brown <mcb30@ipxe.org>
We currently do not populate the ciaddr field in the constructed PXE
Boot Server ACK packet. This causes a WDS server to respond with a
broadcast packet, which is then ignored by wdsmgfw.efi since it does
not match the specified IP address filter.
Fix by populating ciaddr within the constructed PXE Boot Server ACK
packet.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Our SNP ReceiveFilters() method is a no-op, since we always (if
possible) use promiscuous mode for all network cards. The method
currently returns EFI_NOT_READY if the SNP interfaces are claimed for
use by iPXE, as with all other SNP methods.
The WDS bootstrap wdsmgfw.efi attempts to use both the PXE Base Code
protocol and the Simple Network Protocol simultaneously. This is
fundamentally broken, since use of the PXE Base Code protocol requires
us to disable the use of SNP (by claiming the interfaces for use by
iPXE), otherwise MnpDxe swallows all of the received packets before
our PXE Base Code's UdpRead() method is able to return them.
The root cause of this problem is that, as with BIOS PXE, the network
booting portions of the UEFI specification are less of a specification
and more of an application note sketchily describing how the original
hacked-together Intel implementation works. No sane design would ever
have included the UdpWrite() and UdpRead() methods.
Work around these fundamental conceptual flaws by unconditionally
returning success from efi_snp_receive_filters().
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some distributions (observed with Ubuntu 15.04) place ldlinux.c32 in a
separate directory from isolinux.bin. Search for these files
separately, and allow an alternative location of ldlinux.c32 to be
provided via LDLINUX_C32=... on the make command line.
Reported-by: Adrian Koshka <adriankoshcha@teknik.io>
Tested-by: Adrian Koshka <adriankoshcha@teknik.io>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The debug directory size specified in the data directory should cover
only the EFI_IMAGE_DEBUG_DIRECTORY_ENTRY structure, not the whole of
the .debug section.
Reported-by: Andreas Hammarskjöld <junior@2PintSoftware.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add definitions of protocols observed to be used by wdsmgfw.efi, and
add a handle name type for ConIn, ConOut, and StdErr.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Commit edf74df ("[pxe] Always reconstruct packet for
PXENV_GET_CACHED_INFO") fixed the problems caused by returning stale
DHCP packets (e.g. from an earlier boot attempt using a different
network device), but broke interoperability with NBPs such as WDS
which may overwrite our cached (fake) DHCP packets and expect the
modified packets to be returned by a subsequent call to
PXENV_GET_CACHED_INFO.
Fix by constructing the fake DHCP packets immediately before
transferring control to a PXE NBP. Calls to PXENV_GET_CACHED_INFO
will now never modify the cached packets.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add debug wrappers for more boot services functions, and print
symbolic values rather than raw numbers where possible.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The raw EFI_HANDLE value is almost never useful to know, and simply
adds noise to the already verbose debug messages. Improve the
legibility of debug messages by using only the name generated by
efi_handle_name().
Signed-off-by: Michael Brown <mcb30@ipxe.org>
We attempt to mimic the behaviour of Intel's PXE ROM by skipping the
separate ProxyDHCPREQUEST if the ProxyDHCPOFFER already contains a
boot filename or a PXE boot menu.
Experimentation reveals that Intel's PXE ROM will also check for a
non-empty next-server address alongside the boot filename. Update our
test to match this behaviour.
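The updated test can be sketched as below. The structure and field
names are hypothetical simplifications (a real implementation would
inspect the DHCP packet and its options directly); only the shape of
the decision is intended to be illustrative.

```c
#include <stdint.h>

/* Hypothetical summary of the relevant parts of a ProxyDHCPOFFER */
struct proxy_offer_sketch {
	uint32_t next_server;	/* Next-server address (0 = empty) */
	int has_filename;	/* Non-zero if a boot filename is present */
	int has_boot_menu;	/* Non-zero if a PXE boot menu is present */
};

/*
 * Return non-zero if the separate ProxyDHCPREQUEST may be skipped.
 */
int offer_has_pxeopts_sketch(const struct proxy_offer_sketch *offer)
{
	/* A boot filename counts only alongside a next-server address */
	if (offer->next_server && offer->has_filename)
		return 1;

	/* A PXE boot menu is sufficient by itself */
	if (offer->has_boot_menu)
		return 1;

	return 0;
}
```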
Reported-by: Wissam Shoukair <wissams@mellanox.com>
Tested-by: Wissam Shoukair <wissams@mellanox.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Commit 09b057c ("[settings] Remove "uristring" setting type") removed
support for URI-encoded settings via the "uristring" setting type, on
the basis that such encoding was no longer necessary to avoid problems
with the command line parser.
Other valid use cases for the "uristring" setting type do exist: for
example, a password containing a '/' character expanded via
chain http://username:${password:uristring}@server.name/boot.php
Restore the existence of the "uristring" setting, avoiding the
potentially large stack allocations that were used in the old code
prior to commit 09b057c ("[settings] Remove "uristring" setting
type").
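To illustrate the kind of encoding involved, the sketch below
percent-encodes every character outside the RFC3986 "unreserved" set
into a caller-provided buffer, avoiding any large stack allocation.
It is not the iPXE implementation, which may choose a different set
of characters to encode; the name uri_encode_sketch() is
hypothetical.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Percent-encode "in" into "out" (of size "len", including the NUL).
 * Returns 0 on success or -1 if the buffer is too small.
 */
int uri_encode_sketch(const char *in, char *out, size_t len)
{
	static const char hex[] = "0123456789ABCDEF";
	size_t used = 0;

	if (len == 0)
		return -1;

	for (; *in; in++) {
		unsigned char c = (unsigned char) *in;
		int unreserved = (isalnum(c) || (c == '-') || (c == '.') ||
				  (c == '_') || (c == '~'));
		size_t need = (unreserved ? 1 : 3);

		/* Always leave room for the terminating NUL */
		if ((used + need + 1) > len)
			return -1;

		if (unreserved) {
			out[used++] = (char) c;
		} else {
			out[used++] = '%';
			out[used++] = hex[c >> 4];
			out[used++] = hex[c & 0x0f];
		}
	}
	out[used] = '\0';
	return 0;
}
```

With this encoding, a password such as "pa/ss" expands to "pa%2Fss"
and so can no longer be mistaken for a path separator within the URI.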
Requested-by: Robin Smidsrød <robin@smidsrod.no>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
When booting without an embedded script, display the imgstat()
information immediately before executing the downloaded image. This
allows potentially useful diagnostic information (such as the detected
image type) to be observed by the user without needing to enter the
iPXE shell and manually download the image.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The current usage pattern of image_probe() is a legacy from the time
before commit 34b6ecb ("[image] Simplify image management") when
loading an image to its executable location in memory was a separate
action from actually executing the image.
Call image_probe() as soon as an image is registered. This allows
"imgstat" to display image type information for all images and allows
image-consuming code to assume that image->type is already set
correctly.
Ignore failures if image_probe() does not recognise the image, since
we do expect to handle unrecognised images (initrds, modules, etc).
Unrecognised images will be left with a NULL image->type, which
image-consuming code can easily check.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allow the return status from an embedded image to propagate out to the
eventual return status from main(). When running under Linux, this
allows the pass/fail result of unit tests to be observable without
having to visually inspect the console output.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
A relatively common user mistake is to attempt to boot an EFI
executable (such as grub.efi) using a BIOS version of iPXE.
Unfortunately there are no signature checks that we can use to
unambiguously identify a PXE NBP, since a PXE NBP is just raw machine
code. We therefore have to accept anything sufficiently small to fit
into base memory as a valid PXE NBP.
We can detect that a file might be an EFI executable by checking for
the initial "MZ" signature bytes. This does not necessarily preclude
the file from also being a PXE NBP (since it would be possible to
create a hybrid binary which acts as both an EFI executable and a PXE
NBP, similar to the way in which wimboot and the Linux kernel are
hybrid binaries which act as both an EFI executable and a bzImage).
If the initial "MZ" signature bytes are present, then attempt to warn
the user by setting the image type to "PXE-NBP (may be EFI?)". We
can't (sensibly) prevent the user from accidentally running an EFI
executable as a PXE NBP, but we can at least make it easier for the
user to identify their mistake.
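The check itself is trivial, as the sketch below shows. The function
name is hypothetical, and the plain "PXE-NBP" name used for the
ordinary case is an assumption; only the "PXE-NBP (may be EFI?)"
string is taken from the description above.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Choose an image type name for something accepted as a PXE NBP,
 * based on the first "len" bytes of image data.
 */
const char *pxe_type_name_sketch(const uint8_t *data, size_t len)
{
	/* An initial "MZ" signature suggests (but does not prove)
	 * that the file is really an EFI executable.
	 */
	if ((len >= 2) && (data[0] == 'M') && (data[1] == 'Z'))
		return "PXE-NBP (may be EFI?)";

	return "PXE-NBP";
}
```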
Inspired-by: Robin Smidsrød <robin@smidsrod.no>
Inspired-by: Wissam Shoukair <wissams@mellanox.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some ProxyDHCP servers and PXE boot servers do not specify a DHCP
server identifier via option 54. We currently work around this in a
variety of ad-hoc ways:
- if a ProxyDHCPACK has no server identifier then we treat it as
having the correct server identifier,
- if a boot server ACK has no server identifier then we use the
packet's source IP address as the server identifier.
Introduce the concept of a DHCP server pseudo-identifier, defined as
being:
- the server identifier (option 54), or
- if there is no server identifier, then the next-server address
(siaddr),
- if there is no server identifier or next-server address, then the
DHCP packet's source IP address.
Use the pseudo-identifier in place of the server identifier when
handling ProxyDHCP and PXE boot server responses.
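The fallback order described above can be sketched as follows. This is a minimal illustration rather than the actual iPXE code: the function name dhcp_pseudo_id_sketch() and the convention that a zero address means "not present" are assumptions made for the example.

```c
#include <stdint.h>

/* Hypothetical sketch: choose a DHCP server pseudo-identifier.
 *
 * All parameters are IPv4 addresses.  This sketch assumes that a
 * value of zero means "not present".
 */
static uint32_t dhcp_pseudo_id_sketch ( uint32_t server_id, uint32_t siaddr,
					uint32_t src ) {

	/* Prefer the server identifier (option 54) */
	if ( server_id )
		return server_id;

	/* Otherwise fall back to the next-server address (siaddr) */
	if ( siaddr )
		return siaddr;

	/* Otherwise fall back to the packet's source IP address */
	return src;
}
```

A caller would then compare pseudo-identifiers wherever it previously compared server identifiers.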
Originally-fixed-by: Wissam Shoukair <wissams@mellanox.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The Infiniband link status change callback ipoib_link_state_changed()
may be called while the IPoIB device is closed, in which case there
will not be an IPoIB queue pair to be joined to the IPv4 broadcast
group. This leads to NULL pointer dereferences in ib_mcast_attach()
and ib_mcast_detach().
Fix by not attempting to join (or leave) the broadcast group unless we
actually have an IPoIB queue pair.
Signed-off-by: Wissam Shoukair <wissams@mellanox.com>
Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Rewrite the HTTP core to allow for the addition of arbitrary content
encoding mechanisms, such as PeerDist and gzip.
The core now exposes http_open() which can be used to create requests
with an explicitly selected HTTP method, an optional requested content
range, and an optional request body. A simple wrapper provides the
preexisting behaviour of creating either a GET request or an
application/x-www-form-urlencoded POST request (if the URI includes
parameters).
The HTTP SAN interface is now implemented using the generic block
device translator. Individual blocks are requested using http_open()
to create a range request.
Server connections are now managed via a connection pool; this allows
for multiple requests to the same server (e.g. for SAN blocks) to be
completely unaware of each other. Repeated HTTPS connections to the
same server can reuse a pooled connection, avoiding the per-connection
overhead of establishing a TLS session (which can take several seconds
if using a client certificate).
Support for HTTP SAN booting and for the Basic and Digest
authentication schemes is now optional and can be controlled via the
SANBOOT_PROTO_HTTP, HTTP_AUTH_BASIC, and HTTP_AUTH_DIGEST build
configuration options in config/general.h.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
UEFI platforms may provide a watchdog timer, which will reboot the
machine if an operating system takes more than five minutes to load.
This can cause long-lived iPXE downloads (or interactive shell
sessions) to unexpectedly reboot.
Fix by resetting the watchdog timer every ten seconds while the iPXE
main processing loop continues to run.
Reported-by: Bradley B Williams <bradleybwilliams@swbell.net>
Reported-by: John Clark <john.r.clark.3@gmail.com>
Reported-by: wdriever@gmail.com
Reported-by: Charlie Beima <cbeima@indiana.edu>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add support for SHA-224, SHA-384, and SHA-512 as digest algorithms in
X.509 certificates, and allow the choice of public-key, cipher, and
digest algorithms to be configured at build time via config/crypto.h.
Originally-implemented-by: Tufan Karadere <tufank@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The current implementation handles big-endian 24-bit integers (which
occur in several TLS record types) by treating them as big-endian
32-bit integers which are shifted by 8 bits. This can result in
"Invalid read" errors when running under valgrind, if the 24-bit field
happens to be exactly at the end of an I/O buffer.
Fix by ensuring that we touch only the three bytes which comprise the
24-bit integer.
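A byte-wise accessor of the kind described might look like the sketch below. The helper names be24_read_sketch() and be24_write_sketch() are hypothetical and are not taken from the iPXE source.

```c
#include <stdint.h>

/* Hypothetical helper: read a big-endian 24-bit integer, touching
 * only the three bytes that make up the field.
 */
static uint32_t be24_read_sketch ( const uint8_t *bytes ) {
	return ( ( ( uint32_t ) bytes[0] << 16 ) |
		 ( ( uint32_t ) bytes[1] << 8 ) |
		 ( ( uint32_t ) bytes[2] ) );
}

/* Hypothetical helper: write a big-endian 24-bit integer, again
 * touching only three bytes.
 */
static void be24_write_sketch ( uint8_t *bytes, uint32_t value ) {
	bytes[0] = ( ( value >> 16 ) & 0xff );
	bytes[1] = ( ( value >> 8 ) & 0xff );
	bytes[2] = ( value & 0xff );
}
```

Since no fourth byte is ever accessed, a field located at the very end of a buffer can be read without straying past the end of the allocation.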
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Check for existence of the UART in uart_select(), not just in
uart_init(). This allows uart_select() to refuse to set a non-working
address in uart->base, which in turn means that the serial console
code will not attempt to use a non-existent UART.
Reported-by: Torgeir Wulfsberg <Torgeir.Wulfsberg@kongsberg.com>
Reported-by: Ján ONDREJ (SAL) <ondrejj@salstar.sk>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
When the ability for iPXE to handle multiple serial ports was added,
the choice was made that the singular serial port referred to by
COMBOOT calls should mean the port used for the serial console. This
unintentionally caused IMAGE_COMBOOT to also enable CONSOLE_SERIAL.
Fix by providing a weak-symbol version of the serial console which
will be used if serial console support was not explicitly enabled.
Reported-by: Torgeir Wulfsberg <Torgeir.Wulfsberg@kongsberg.com>
Reported-by: Ján ONDREJ (SAL) <ondrejj@salstar.sk>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
We do not set up any kind of virtual addressing before invoking an
ELFBoot image. Reject if the image's program headers indicate that
virtual addresses are not equal to physical addresses.
This avoids problems when loading some RHEL5 kernels, which seem to
include ELFBoot headers using virtual addressing. With this change,
these kernels are no longer detected as ELFBoot, and so may be
(correctly) detected as bzImage instead.
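The identity-mapping check can be sketched as below. The structure phdr_sketch is a hypothetical, cut-down stand-in for an ELF program header (holding only the two address fields), and elfboot_addresses_ok() is an illustrative name rather than an iPXE function.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, cut-down program header holding only the two
 * address fields relevant to this check.
 */
struct phdr_sketch {
	uint32_t vaddr;		/* Virtual address */
	uint32_t paddr;		/* Physical address */
};

/* Return nonzero only if every segment has its virtual address
 * equal to its physical address.
 */
static int elfboot_addresses_ok ( const struct phdr_sketch *phdrs,
				  size_t count ) {
	size_t i;

	for ( i = 0 ; i < count ; i++ ) {
		if ( phdrs[i].vaddr != phdrs[i].paddr )
			return 0;
	}
	return 1;
}
```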
Reported-by: Torgeir.Wulfsberg@kongsberg.com
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allow line buffer to accumulate multiple lines, with buffered_line()
returning each freshly-completed line as it is encountered. This
allows buffered lines to be subsequently processed as a group.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
VLAN and 802.11 devices use a network device operations structure that
wraps an underlying structure. For example, the vlan_operations
structure wraps the network device operations structure of the
underlying trunk device. This can cause false positives from the
current implementation of netdev_irq_supported(), which will always
report that VLAN devices support interrupts since it has no visibility
into the support provided by the underlying trunk device.
Fix by allowing network devices to explicitly flag that interrupts are
not supported, despite the presence of an irq() method.
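One possible shape for such a flag is sketched below. All of the names here (netdev_sketch, NETDEV_SKETCH_IRQ_UNSUPPORTED, and so on) are hypothetical and chosen for illustration; the real iPXE structures differ.

```c
#include <stddef.h>

struct netdev_sketch;

/* Hypothetical operations table containing only the irq() method */
struct netdev_ops_sketch {
	void ( * irq ) ( struct netdev_sketch *netdev, int enable );
};

/* Hypothetical state flag: interrupts are explicitly unsupported */
#define NETDEV_SKETCH_IRQ_UNSUPPORTED 0x0001

struct netdev_sketch {
	struct netdev_ops_sketch *op;
	unsigned int state;
};

/* Wrapper irq() method of the kind a VLAN device would provide */
static void wrapper_irq_sketch ( struct netdev_sketch *netdev, int enable ) {
	( void ) netdev;
	( void ) enable;
}

/* Interrupts are supported only if an irq() method exists and the
 * device has not explicitly flagged interrupts as unsupported.
 */
static int netdev_irq_supported_sketch ( const struct netdev_sketch *netdev ) {
	return ( ( netdev->op->irq != NULL ) &&
		 ( ! ( netdev->state & NETDEV_SKETCH_IRQ_UNSUPPORTED ) ) );
}
```

A wrapping device would set the flag whenever its underlying device does not support interrupts, even though the wrapper's own irq() method is always present.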
Originally-fixed-by: Wissam Shoukair <wissams@mellanox.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
iscsi_tx_done() is missing "break" statements at the end of each case.
(Fortunately, this happens not to cause a bug in practice, since
iscsi_login_request_done() is effectively a no-op when completing a
data-out PDU.)
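The effect of a missing "break" can be seen in the following standalone sketch, which simply counts how many completion handlers would run. The enumeration and function names are hypothetical and do not correspond to the real iSCSI code.

```c
/* Hypothetical PDU types for illustration only */
enum pdu_sketch {
	PDU_DATA_OUT_SKETCH,
	PDU_LOGIN_SKETCH,
	PDU_OTHER_SKETCH,
};

/* Count the handlers that run when "break" is omitted */
static unsigned int handlers_without_break ( enum pdu_sketch pdu ) {
	unsigned int handlers = 0;

	switch ( pdu ) {
	case PDU_DATA_OUT_SKETCH:
		handlers++;	/* Data-out handler runs */
		/* No "break", so execution continues into the next case */
		/* fall through */
	case PDU_LOGIN_SKETCH:
		handlers++;	/* Login request handler runs as well */
		/* fall through */
	default:
		break;
	}
	return handlers;
}

/* Count the handlers that run when each case ends with "break" */
static unsigned int handlers_with_break ( enum pdu_sketch pdu ) {
	unsigned int handlers = 0;

	switch ( pdu ) {
	case PDU_DATA_OUT_SKETCH:
		handlers++;	/* Data-out handler runs */
		break;
	case PDU_LOGIN_SKETCH:
		handlers++;	/* Login request handler runs */
		break;
	default:
		break;
	}
	return handlers;
}
```

In the first version a data-out completion runs two handlers; in the second it runs exactly one.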
Reported-by: Wissam Shoukair <wissams@mellanox.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Extend the IPv6 concept of "scope ID" (indicating the network device
index) to IPv4 socket addresses, so that IPv4 multicast transmissions
may specify the transmitting network device.
The scope ID is not (currently) exposed via the string representation
of the socket address, since IPv4 does not use the IPv6 concept of
link-local addresses (which could legitimately be specified in a URI).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Redefine various IPv4 address constants and testing macros to avoid
unnecessary byte swapping at runtime, and slightly rename the macros
to prevent code from accidentally using the old definitions.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Avoid using zero as a network device index, so that a zero
sin6_scope_id can be used to mean "unspecified" (rather than
unintentionally meaning "net0").
Signed-off-by: Michael Brown <mcb30@ipxe.org>
When an IPv6 socket address string specifies a link-local or multicast
address but does not specify the requisite network device name
(e.g. "fe80::69ff:fe50:5845" rather than "fe80::69ff:fe50:5845%net0"),
assume the use of "netX".
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Remove AXTLS headers now that no AXTLS code remains, with many thanks
to the AXTLS project for use of their cryptography code over the past
several years.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Replace the AES implementation from AXTLS with a dedicated iPXE
implementation which is slightly smaller and around 1000% faster.
This implementation has been verified using the existing self-tests
based on the NIST AES test vectors.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Generalise the existing support for performing CBC-mode block cipher
tests, and update the code to use okx() for neater reporting of test
results.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
xfer_buffer() uses intf_get_dest_op() to obtain the destination
interface for xfer_deliver(), in order to check that this is the same
interface which provides xfer_buffer(). The return value from
intf_get_dest_op() (which contains the actual method implementing
xfer_deliver()) is not used.
On some gcc versions, this triggers a "value computed is not used"
warning, since the explicit type cast included within the
intf_get_dest_op() macro is treated as a "value computed".
Fix by explicitly casting the result of intf_get_dest_op() to void.
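The pattern is illustrated by the sketch below. The macro intf_get_dest_sketch() is a deliberately simplified, hypothetical stand-in for intf_get_dest_op(): it stores the destination interface via a pointer and evaluates to a type-cast method pointer.

```c
#include <stddef.h>

typedef void ( * generic_fn_sketch ) ( void );
typedef int ( * deliver_fn_sketch ) ( int );

/* Hypothetical interface with a destination and a single method */
struct intf_sketch {
	struct intf_sketch *dest;
	generic_fn_sketch deliver;
};

/* Hypothetical macro: fill in the destination interface and
 * evaluate to the destination's method, cast to the requested type.
 */
#define intf_get_dest_sketch( intf, type, dest_ptr )			\
	( *(dest_ptr) = (intf)->dest,					\
	  ( ( type ) (intf)->dest->deliver ) )

/* Check that an interface's destination is the expected interface */
static int same_dest_sketch ( struct intf_sketch *intf,
			      struct intf_sketch *expected ) {
	struct intf_sketch *dest;

	/* Only the side effect (filling in "dest") is wanted here.
	 * The explicit cast to void marks the discarded value as
	 * deliberate.
	 */
	( void ) intf_get_dest_sketch ( intf, deliver_fn_sketch, &dest );

	return ( dest == expected );
}
```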
Reported-by: Matthew Helton <mwhelton@gmail.com>
Reported-by: James A. Peltier <jpeltier@sfu.ca>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Reduce the cost of implementing object methods which convey no
information beyond the fact that the method has been called.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Provide profile_custom() as a trivial wrapper around profile_update()
to allow for the use of the profiling infrastructure by code using
timers other than the default profile_timestamp() provider.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Provide an inject_corruption() function that can be used to randomly
corrupt data bytes with configurable probabilities.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Provide a generic inject_fault() function that can be used to inject
random faults with configurable probabilities.
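A probabilistic fault injector of this general kind could be sketched as below. This is an assumption-laden illustration: the name inject_fault_sketch(), the use of rand(), and the "one in rate" interpretation of the probability are choices made for the example and are not taken from the iPXE implementation.

```c
#include <stdlib.h>

/* Hypothetical sketch: return nonzero (meaning "inject a fault
 * here") with a probability of roughly one in "rate".  A rate of
 * zero disables fault injection entirely.
 */
static int inject_fault_sketch ( unsigned int rate ) {

	/* Fault injection disabled */
	if ( rate == 0 )
		return 0;

	/* Fault with probability of approximately 1/rate */
	return ( ( ( unsigned int ) rand() % rate ) == 0 );
}
```

A code path under test could then fail artificially with something like "if ( inject_fault_sketch ( 16 ) ) return -1;", exercising error handling that is otherwise rarely reached.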
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add a named configuration for qemu, based on the config.ipxe.general.h
file taken from the current qemu repository and enabling the option to
work around the missing EFI_PXE_BASE_CODE_PROTOCOL.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
iPXE does not currently provide EFI_PXE_BASE_CODE_PROTOCOL: this
causes failures when chainloading bootloaders such as shim.efi which
assume that this protocol will be present.
Provide the ability to work around these problems via the build
configuration option EFI_DOWNGRADE_UX. If this option is enabled,
then we will not install our usual EFI_LOAD_FILE_PROTOCOL
implementation, thereby allowing the platform firmware to install its
own EFI_PXE_BASE_CODE_PROTOCOL implementation on top of our
EFI_SIMPLE_NETWORK_PROTOCOL handle.
A somewhat major side-effect of this workaround is that almost all
iPXE features will be disabled.
This configuration option will be removed in future when support for
EFI_PXE_BASE_CODE_PROTOCOL is added.
Requested-by: Laszlo Ersek <lersek@redhat.com>
Requested-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Fix the TxBuf value filled in by GetStatus() to report the transmit
buffer address as required by the (now clarified) specification.
Simplify "interrupt" handling in GetStatus() to report only that one
or more packets have been transmitted or received; there is no need to
report one GetStatus() "interrupt" per packet.
Simplify receive handling to dequeue received packets immediately from
the network device into an internal list (thereby avoiding the hacks
previously used to determine when to report new packet arrivals).
Originally-fixed-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Expose the high bit of the VGA text attribute byte via the ANSI SGR
parameters 5 ("blink on") and 25 ("blink off").
Note that some video cards (and virtual machines) may display a high
intensity background colour instead of blinking text.
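The mapping from SGR parameter to attribute bit can be sketched as follows. The function and constant names are hypothetical; the only facts relied upon are that the high bit (0x80) of the VGA text attribute byte is the blink bit and that SGR parameters 5 and 25 mean "blink on" and "blink off".

```c
#include <stdint.h>

/* High bit of the VGA text attribute byte (hypothetical name) */
#define VGA_ATTR_BLINK_SKETCH 0x80

/* Apply a single SGR parameter to a VGA text attribute byte */
static uint8_t sgr_apply_blink_sketch ( uint8_t attr, unsigned int sgr ) {

	switch ( sgr ) {
	case 5:
		/* Blink on: set the high bit */
		return ( ( uint8_t ) ( attr | VGA_ATTR_BLINK_SKETCH ) );
	case 25:
		/* Blink off: clear the high bit */
		return ( ( uint8_t ) ( attr & ~VGA_ATTR_BLINK_SKETCH ) );
	default:
		/* Other parameters are not handled by this sketch */
		return attr;
	}
}
```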
Signed-off-by: Christian Nilsson <nikize@gmail.com>
Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Multicast MAC addresses will never have REMAC cache entries, and the
corresponding multicast IPoIB MAC address cannot be obtained simply by
issuing an ARP request.
For the trivial volume of multicast packets that we expect to send in
any realistic scenario, the simplest solution is to send them as
broadcasts instead.
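The substitution can be sketched at the Ethernet address level as below. The helper names are hypothetical, and the real eIPoIB code operates on its own data structures; the sketch relies only on the fact that a multicast Ethernet address has the least significant bit of its first byte set.

```c
#include <stdint.h>
#include <string.h>

/* Length of an Ethernet MAC address (hypothetical constant name) */
#define ETH_ALEN_SKETCH 6

/* A multicast address has the low bit of the first byte set */
static int is_multicast_sketch ( const uint8_t *mac ) {
	return ( mac[0] & 0x01 );
}

/* Rewrite a multicast destination as the broadcast address,
 * leaving unicast destinations untouched.
 */
static void multicast_to_broadcast_sketch ( uint8_t *dest ) {
	if ( is_multicast_sketch ( dest ) )
		memset ( dest, 0xff, ETH_ALEN_SKETCH );
}
```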
Reported-by: Wissam Shoukair <wissams@mellanox.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
We currently do not wait for a received FIN before exiting to boot a
loaded OS. In the common case of booting from an HTTP server, this
means that the TCP connection is left consuming resources on the
server side: the server will retransmit the FIN several times before
giving up.
Fix by initiating a graceful close of all TCP connections and waiting
(for up to one second) for all connections to finish closing
gracefully (i.e. for the outgoing FIN to have been sent and ACKed, and
for the incoming FIN to have been received and ACKed at least once).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Older, out-of-tree Xen kernel modules (such as those provided with
SuSE Linux Enterprise Server 11) do not clear the leftover "event
pending" bit when opening an event channel. Consequently, no event is
ever delivered to indicate that there is information in the XenStore
ring buffer, and the system hangs shortly after loading the
xen-platform-pci kernel module.
Work around this problem by always waiting for the XenStore event
channel to be signalled, and clearing the event before processing the
received data.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The only way to map an eIPoIB MAC address (REMAC) to an IPoIB MAC
address is to intercept an incoming ARP request or reply.
If we do not have an REMAC cache entry for a particular destination
MAC address, then we cannot transmit the packet. This can arise in at
least two situations:
- An external program (e.g. a PXE NBP using the UNDI API) may attempt
to transmit to a destination MAC address that has been obtained by
some method other than ARP.
- Memory pressure may have caused REMAC cache entries to be
discarded. This is fairly likely on a busy network, since REMAC
cache entries are created for all received (broadcast) ARP
requests. (We can't sensibly avoid creating these cache entries,
since they are required in order to send an ARP reply, and when we
are being used via the UNDI API we may have no knowledge of which
IP addresses are "ours".)
Attempt to ameliorate the situation by generating a semi-spurious ARP
request whenever we find a missing REMAC cache entry. This will
hopefully trigger an ARP reply, which would then provide us with the
information required to populate the REMAC cache.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
As with the neighbour cache, discarding an REMAC cache entry is
potentially very disruptive.
Originally-fixed-by: Wissam Shoukair <wissams@mellanox.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Avoid accidentally returning stale packets (e.g. for a previously
attempted network device) by always constructing a fresh DHCP packet.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
If the link is blocked (e.g. due to a Spanning Tree Protocol port not
yet forwarding packets) then defer DHCP discovery until the link
becomes unblocked.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
A fairly common end-user problem is that the default configuration of
a switch may leave the port in a non-forwarding state for a
substantial length of time (tens of seconds) after link up. This can
cause iPXE to time out and give up attempting to boot.
We cannot force the switch to start forwarding packets sooner, since
any attempt to send a Spanning Tree Protocol bridge PDU may cause the
switch to disable our port (if the switch happens to have the Bridge
PDU Guard feature enabled for the port).
For non-ancient versions of the Spanning Tree Protocol, we can detect
whether or not the port is currently forwarding and use this to inform
the network device core that the link is currently blocked.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
When Spanning Tree Protocol (STP) is used, there may be a substantial
delay (tens of seconds) from the time that the link goes up to the
time that the port starts forwarding packets.
Add a generic concept of a "blocked link" (i.e. a link which is up but
which is not expected to communicate successfully), and allow "ifstat"
to indicate when a link is blocked.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
In some Ethernet framing variants the two-byte protocol field is used
as a length, with the Ethernet header being followed by an IEEE 802.2
LLC header. The first two bytes of the LLC header are the DSAP and
SSAP.
If the received Ethernet packet appears to use this framing, then
interpret the two-byte DSAP and SSAP as being the network-layer
protocol. This allows support for receiving Spanning Tree Protocol
frames (which use an LLC header with {DSAP,SSAP}=0x4242) to be added
without requiring a full LLC protocol layer.
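The framing decision can be sketched as below. The function name is hypothetical, and the threshold of 0x0600 (the conventional boundary between length values and EtherType values) is an assumption about how "appears to use this framing" might be tested; the actual iPXE check may differ. The caller is assumed to have verified that at least 16 bytes are present.

```c
#include <stdint.h>

/* Values at or above this are EtherTypes; smaller values are
 * treated as lengths (hypothetical constant name).
 */
#define ETH_TYPE_MIN_SKETCH 0x0600

/* Return the network-layer protocol for a raw Ethernet frame.
 *
 * The frame must contain at least 16 bytes: the 14-byte Ethernet
 * header followed by the first two bytes of any LLC header.
 */
static uint16_t eth_net_proto_sketch ( const uint8_t *frame ) {
	uint16_t field = ( ( uint16_t ) ( ( frame[12] << 8 ) | frame[13] ) );

	/* Ordinary framing: the field is the protocol */
	if ( field >= ETH_TYPE_MIN_SKETCH )
		return field;

	/* Length framing: use {DSAP,SSAP} as the protocol */
	return ( ( uint16_t ) ( ( frame[14] << 8 ) | frame[15] ) );
}
```

With this approach a Spanning Tree Protocol frame yields the protocol value 0x4242, which can be registered like any other network-layer protocol.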
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The size of the .mrom payload (the second PCI ROM image) is defined in
its PCI header. The code type for the .mrom payload image is
deliberately set to an invalid value (0xff) to ensure that no BIOS
tries to parse anything in the image other than the PCI header.
Since the code type is not set to 0x00 ("Intel x86, PC-AT
compatible"), bytes 0x02-0x17 should not be interpreted by the BIOS as
being in the standard ISA expansion ROM format. In particular, the
byte at offset 0x02 does not represent the length of the ROM image (in
512-byte blocks).
However, some Dell BIOSes seem to erroneously use the byte at offset
0x02 to determine the length of the .mrom payload when walking the
list of PCI ROM images. Since this byte is currently set to zero,
this can lead to the BIOS getting stuck in an infinite loop during
POST. (This problem may not arise if the .mrom payload is the final
image in the ROM, since the BIOS will then have no reason to attempt
to locate the next image.)
One possible workaround would be to put the real payload size in this
byte, but doing so would constrain the .mrom payload size to 128kB
(see commit 8049a52 ("[mromprefix] Allow for .mrom images larger than
128kB") for more details).
Another possible workaround would be to put the real payload size as a
word in bytes 0x02-0x03 (as is done for EFI ROMs). This would not
constrain the .mrom payload size, but a payload size which happened to
be exactly 128kB would result in a zero value in the byte at offset
0x02 and so could still result in infinite loops on BIOSes with this
bug.
We choose to place a fixed value of 0x01 in the byte at offset 0x02.
This should at least prevent the BIOS from getting stuck in an
infinite loop. (The BIOS may walk into the middle of the .mrom
payload, where it will almost certainly not find a valid {0x55,0xaa}
signature or a valid PCIR header, and will therefore hopefully abort
processing.)
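The arithmetic behind the rejected workarounds can be checked with a short sketch. The function names are hypothetical; the only inputs are the 512-byte block size and the payload sizes discussed above.

```c
#include <stdint.h>

/* Payload length in 512-byte blocks, as a 16-bit word */
static uint16_t rom_blocks_sketch ( uint32_t payload_len ) {
	return ( ( uint16_t ) ( payload_len / 512 ) );
}

/* Value that would land in the byte at offset 0x02 if the block
 * count were stored as a little-endian word at offsets 0x02-0x03.
 */
static uint8_t rom_offset_02_sketch ( uint32_t payload_len ) {
	return ( ( uint8_t ) ( rom_blocks_sketch ( payload_len ) & 0xff ) );
}
```

A 64kB payload gives 128 blocks and hence a byte value of 0x80, but a payload of exactly 128kB gives 256 blocks (0x0100), whose low byte is zero. A fixed value of 0x01 avoids this corner case.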
Reported-by: Wissam Shoukair <wissams@mellanox.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
We currently shrink the TCP window permanently if we are ever forced
(by a low-memory condition) to discard a previously received TCP
packet. This behaviour was intended to reduce the number of
retransmissions in a lossy network, since lost packets might
potentially result in the entire window contents being retransmitted.
Since commit e0fc8fe ("[tcp] Implement support for TCP Selective
Acknowledgements (SACK)") the cost of lost packets has been reduced by
around one order of magnitude, and the reduction in the window size
(which affects the maximum throughput) is now the more significant
cost.
Remove the code which reduces the TCP maximum window size when a
received packet is discarded.
Reported-by: Wissam Shoukair <wissams@mellanox.com>
Tested-by: Wissam Shoukair <wissams@mellanox.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some HP BIOSes (observed with an HP ProLiant m710p Server Cartridge)
have a bug in the implementation of INT 1a,b101: they blithely assume
that real-mode code is able to read from anywhere in the 32-bit memory
space.
This problem affects the call to INT 1a,b101 made from within
pcibios_num_bus() (which uses REAL_CODE() and hence executes in
genuine real mode) but does not affect the call made from within
romprefix.S (since with a PMM BIOS, that call executes in flat real
mode anyway).
Work around the problem by explicitly calling flatten_real_mode()
before invoking INT 1a,b101. This is a rarely-used code path, and so
the extra overhead of emulating instructions in some VM configurations
(see commit 6d4deee ("[librm] Use genuine real mode to accelerate
operation in virtual machines") for more details) is negligible.
Reported-by: Wissam Shoukair <wissams@mellanox.com>
Debugged-by: Wissam Shoukair <wissams@mellanox.com>
Debugged-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some Intel Skylake platforms (observed on a prototype Lenovo ThinkPad)
report the list of available USB3 protocol speed ID values as {1,2,3}
but then report a port's speed using ID value 4.
The value 4 happens to be the default value for SuperSpeed (when no
protocol speed ID value list is explicitly defined), and the hardware
seems to function correctly if we simply ignore its protocol speed ID
table and assume that it uses the default values.
Fix by adding a "broken PSI values" quirk for this controller.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
gcc 4.8.2 fails to report this erroneous comparison unless assertions
are enabled.
Reported-by: Mary-Ann Johnson <MaryAnn.Johnson@displaylink.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Commit dc19e63 ("[build] Construct all-drivers list based on driver
class") accidentally excluded the USB bus drivers from the list of
files parsed in order to create PCI 3.0 device ID lists.
Fix by returning $(DRIVERS) to its previous definition as a list of
all driver files, and use only $(DRIVERS_ipxe) to contain the
filtered list containing only those drivers which we want to include
in the "all-drivers" build.
Reported-by: Mary-Ann Johnson <MaryAnn.Johnson@displaylink.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The xHCI slot ID is one-based, not zero-based. Fix the length of the
xhci->slot[] array to account for this, and add assertions to check
that the hardware returns a valid slot ID in response to the Enable
Slot command.
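The off-by-one can be illustrated with the sketch below, in which the slot table is indexed directly by slot ID. The function names are hypothetical and the table holds generic pointers rather than real xHCI slot structures.

```c
#include <assert.h>
#include <stdlib.h>

/* Slot IDs run from 1 to "slots" inclusive */
static int slot_id_valid_sketch ( unsigned int id, unsigned int slots ) {
	return ( ( id >= 1 ) && ( id <= slots ) );
}

/* Allocate a table indexed directly by slot ID.  One extra entry
 * is needed since index zero is never a valid slot ID.
 */
static void ** slot_table_alloc_sketch ( unsigned int slots ) {
	return calloc ( ( slots + 1 ), sizeof ( void * ) );
}

/* Record a slot, checking that the ID is within range */
static void slot_table_set_sketch ( void **table, unsigned int slots,
				    unsigned int id, void *slot ) {
	assert ( slot_id_valid_sketch ( id, slots ) );
	table[id] = slot;
}
```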
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Deferral of a packet for neighbour discovery is not really an error.
If we fail to discover a neighbour then the failure will eventually be
reported by the call to neighbour_destroy() when any outstanding I/O
buffers are discarded.
The current behaviour breaks PXE booting on FreeBSD, which seems to
treat the error return from PXENV_UDP_WRITE as a fatal error and so
never proceeds to poll PXENV_UDP_READ (and hence never allows iPXE to
receive the ARP reply and send the deferred UDP packet).
Change neighbour_tx() to return success when deferring a packet. This
fixes interoperability with FreeBSD and removes transient neighbour
cache misses from the "ifstat" error output, while leaving genuine
neighbour discovery failures visible via "ifstat" (once neighbour
discovery times out, or the interface is closed).
Debugged-by: Wissam Shoukair <wissams@mellanox.com>
Tested-by: Wissam Shoukair <wissams@mellanox.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
When jumbo frames are enabled, the Linux ixgbe physical function
driver will disable the virtual function's receive datapath by
default, and will enable it only if the virtual function negotiates
API version 1.1 (or higher) and explicitly selects an MTU.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Several popular public cloud providers do not provide any sensible
mechanism for obtaining debug output from an OS which is failing to
boot. For example, Amazon EC2 provides the "Get System Log" facility,
which occasionally deigns to report a random subset of the characters
emitted via the VM's serial port, but usually returns only a blank
screen. (Amazingly, this is still superior to the debugging
facilities provided by Azure.)
Work around these shortcomings by adding a console type which sends
output to a magically detected raw disk partition, and including such
a partition within any iPXE .usb-format image.
To use this facility:
- build an iPXE .usb image with CONSOLE_INT13 enabled
- boot the cloud VM from this image
- after the boot fails, attach the VM's boot disk to a second VM
- from this second VM, use "less -f -R /dev/sdb3" (or similar) to
view the iPXE output.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Virtual functions use a mailbox to communicate with the physical
function driver: this covers functionality such as obtaining the MAC
address.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Intel virtual function NICs almost work with the use of "legacy"
transmit and receive descriptors (which are backwards compatible right
back to the original Intel Gigabit NICs).
Unfortunately the "TX switching" feature (which allows for VM<->VM
traffic to be looped back within the NIC itself) does not work when a
legacy TX descriptor is used: the packet is instead sent onto the
wire.
Fix by allowing for the use of an "advanced" TX descriptor (containing
exactly the same information as is found in the "legacy" descriptor).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The recorded disconnections (in port->disconnected) will currently be
left uncleared if usb_attached() returns an error (e.g. because there
are no drivers for a particular USB device). This is incorrect
behaviour: the disconnection has been handled and the record should be
cleared until the next physical disconnection is detected (via the CSC
bit).
The problem is masked for EHCI, UHCI, and USB hubs, since these will
report a changed port (via usb_port_changed()) only when the
underlying hardware reports a change. xHCI will call
usb_port_changed() in response to any port status event, at which
point the stale value of port->disconnected will be erroneously acted
upon. This can lead to an endless loop of repeatedly enumerating the
same device when a driverless device is attached to an xHCI root hub
port.
Fix by unconditionally clearing port->disconnected in usb_hotplugged().
Reported-by: Robin Smidsrød <robin@smidsrod.no>
Tested-by: Robin Smidsrød <robin@smidsrod.no>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The action of registering a new hub can itself happen in only two
ways: either a new USB hub has been created (in which case we are
already inside a call to usb_hotplug()), or a new root hub has been
created.
In the former case, we do not need to issue a further call to
usb_hotplug(), since the hub's ports will all be marked as changed and
so will be handled after the return from register_usb_hub() anyway.
Calling usb_hotplug() within register_usb_hub() leads to a confusing
order of events, such as:
- root hub port 1 detects a change
- root hub port 2 detects a change
- usb_hotplug() is called
- root hub port 1 finds a USB hub
- usb_hotplug() is called
- this inner call to usb_hotplug() handles root hub port 2
Fix by calling usb_hotplug() only from usb_step() and from
register_usb_bus(). This avoids recursive calls to usb_hotplug() and
ensures that devices are enumerated in the order of detection.
Tested-by: Robin Smidsrød <robin@smidsrod.no>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
When USB network card drivers are used, the BIOS' legacy USB
capability is necessarily disabled since there is no way to share the
host controller between the BIOS and iPXE. This currently results in
USB keyboards becoming non-functional in USB-enabled builds of iPXE.
Fix by adding basic support for USB keyboards, enabled by default in
iPXE builds which include USB support.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
When an EHCI hotplug action results in the controller disowning the
port, it will result in a hotplug action on the corresponding UHCI or
OHCI controller. Allow such hotplug actions to be carried out as part
of the same call to usb_step() or register_usb_bus(), by maintaining a
single central list of changed ports.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The USB core will currently fail to detect disconnections if a new
device has attached by the time the port is examined in
usb_hotplug().
Fix by recording the fact that a disconnection has taken place
whenever the "connection status changed" (CSC) bit is observed to be
set. (Whether the change represents a disconnection or a
reconnection, it indicates that the port has spent some time
disconnected.)
Note that the time at which a disconnection can be detected varies by
hub type. In particular: root hubs can observe the CSC bit when
polling, and so will record the disconnection before calling
usb_port_changed(), but USB hubs read the port status (and hence the
CSC bit) only during the call to hub_speed(), long after the call to
usb_port_changed().
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Rename PCI_CLASS() (which constructs a struct pci_class_id) to
PCI_CLASS_ID(), and provide PCI_CLASS() as a macro which constructs
the 24-bit scalar value of a PCI class code.
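The 24-bit scalar is the concatenation of the base class, subclass, and programming interface bytes. A sketch of such a macro is shown below; the name PCI_CLASS_SKETCH() is hypothetical and its parameter list is an assumption rather than a copy of the iPXE definition.

```c
#include <stdint.h>

/* Hypothetical sketch: construct a 24-bit PCI class code from its
 * base class, subclass, and programming interface bytes.
 */
#define PCI_CLASS_SKETCH( base, sub, progif )				\
	( ( ( ( uint32_t ) (base) ) << 16 ) |				\
	  ( ( ( uint32_t ) (sub) ) << 8 ) |				\
	  ( ( uint32_t ) (progif) ) )
```

For example, an xHCI controller (serial bus controller 0x0c, USB 0x03, programming interface 0x30) has the scalar class code 0x0c0330.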
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The USB API currently assumes that host controllers will have
immediate data buffer space available in which to store the setup
packet. This is true for xHCI, partially true for EHCI (which happens
to have 12 bytes of padding in each transfer descriptor due to
alignment requirements), and not true at all for UHCI.
Include the setup packet within the I/O buffer passed to the host
controller's message() method, thereby eliminating the requirement for
host controllers to provide immediate data buffers.
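The resulting buffer layout can be sketched as below: the 8-byte setup packet occupies the start of the buffer and any data stage follows it. The structure and function names are hypothetical, and the sketch uses a plain byte array rather than an iPXE I/O buffer. Byte-order conversion of the 16-bit fields is omitted for brevity.

```c
#include <stdint.h>
#include <string.h>

/* USB setup packet (8 bytes); hypothetical structure name */
struct usb_setup_sketch {
	uint8_t request_type;
	uint8_t request;
	uint16_t value;
	uint16_t index;
	uint16_t len;
};

/* Place the setup packet at the start of the message buffer,
 * followed by any data.  Returns the total length used, or zero
 * if the buffer is too small.
 */
static size_t usb_message_fill_sketch ( uint8_t *buf, size_t buf_len,
					const struct usb_setup_sketch *setup,
					const void *data, size_t data_len ) {
	size_t total = ( sizeof ( *setup ) + data_len );

	if ( total > buf_len )
		return 0;
	memcpy ( buf, setup, sizeof ( *setup ) );
	if ( data_len )
		memcpy ( ( buf + sizeof ( *setup ) ), data, data_len );
	return total;
}
```

A host controller driver receiving such a buffer can point a descriptor at the first eight bytes for the setup stage, with no need for its own immediate data storage.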
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The "vram" setting returns the (Base64-encoded) contents of video RAM,
and can be used to capture a screenshot. For example: after running
memtest.0 and encountering an error, the output can be captured and
sent to a remote server for later diagnosis:
#!ipxe
chain -a http://server/memtest.0 && goto ok || goto bad
:bad
params
param errno ${errno}
param vram ${vram}
chain -a http://server/report.php##params
:ok
Inspired-by: Christian Nilsson <nikize@gmail.com>
Originally-implemented-by: Christian Nilsson <nikize@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The current API for Base16 (and Base64) encoding requires the caller
to always provide sufficient buffer space. This prevents the use of
the generic encoding/decoding functionality in some situations, such
as in formatting the hex setting types.
Implement a generic hex_encode() (based on the existing
format_hex_setting()), implement base16_encode() and base16_decode()
in terms of the more generic hex_encode() and hex_decode(), and update
all callers to provide the additional buffer length parameter.
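An encoder taking an explicit buffer length might behave like the sketch below. The function name hex_encode_sketch(), its parameter order, and its snprintf()-style return value (the length that would have been written given enough space) are assumptions made for illustration and may not match the real hex_encode().

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: hex-encode raw data into a buffer of limited
 * size.  The output is always NUL-terminated if the buffer length
 * is nonzero.  Returns the encoded length (excluding the NUL) that
 * would be produced given unlimited space.
 */
static size_t hex_encode_sketch ( const void *raw, size_t raw_len,
				  char *buf, size_t buf_len ) {
	static const char digits[] = "0123456789abcdef";
	const uint8_t *bytes = raw;
	size_t used = 0;
	size_t i;

	for ( i = 0 ; i < raw_len ; i++ ) {
		/* Store each digit only if room remains for the NUL */
		if ( ( used + 1 ) < buf_len )
			buf[used] = digits[ bytes[i] >> 4 ];
		used++;
		if ( ( used + 1 ) < buf_len )
			buf[used] = digits[ bytes[i] & 0x0f ];
		used++;
	}

	/* Terminate the (possibly truncated) string */
	if ( buf_len )
		buf[ ( used < buf_len ) ? used : ( buf_len - 1 ) ] = '\0';

	return used;
}
```

A caller can pass a zero-length buffer to discover the required size, which is what allows the same routine to serve callers that cannot size their buffers in advance.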
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The settings self-tests include tests for the "ipv6" setting type.
When IPv6 support is not included, this setting type exists (since it
is referred to by some dual-stack code, such as dns.c) but is
non-functional.
Force IPv6 support to be included within a settings self-test build
using an explicit REQUIRE_OBJECT() macro.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The same change was made in the Linux kernel in commit 7067e701
("ath9k_hw: remove confusing logic inversion in an ANI variable") by
Felix Fietkau.
Additionally this fixes "error: logical not is only applied to the
left hand side of comparison" with GCC 5.1.0.
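The reason for the compiler's suspicion can be shown with a small sketch. An expression such as "!a != b" is parsed as "( !a ) != b", which differs from "! ( a != b )" for some inputs. The function names below are hypothetical and the example is unrelated to the actual ath9k code.

```c
/* What the compiler actually evaluates for "!a != b" */
static int parsed_as_sketch ( int a, int b ) {
	return ( ( ! a ) != b );
}

/* What a reader might assume was meant */
static int might_be_read_as_sketch ( int a, int b ) {
	return ( ! ( a != b ) );
}
```

With a equal to 0 and b equal to 2 the first form yields 1 while the second yields 0. Removing the inversion altogether, as the change described above does, avoids the ambiguity.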
Signed-off-by: Christian Hesse <mail@eworm.de>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
This fixes "initialization discards 'const' qualifier from pointer
target type" warnings with GCC 5.1.0.
Signed-off-by: Christian Hesse <mail@eworm.de>
Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
I218-LM (rev 3) is found in the Lenovo ThinkPad X250. The remaining
device IDs are from linux/drivers/net/ethernet/intel/e1000e/hw.h.
Signed-off-by: Christian Hesse <mail@eworm.de>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The USB bus drivers (ehci.c and xhci.c) have PCI device ID tables and
hence PCI_ROM() lines, but should probably not be included in the
all-drivers build on this basis, since they do nothing useful unless a
USB network driver is also present.
Fix by constructing the all-drivers list based on the driver class
(i.e. the portion of the source path immediately after "drivers/").
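
The classification rule can be illustrated in C as below. This is only
a sketch of the rule itself: the helper name is hypothetical, and the
actual build system may implement the rule quite differently.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the path component following "drivers/".
 * Returns 0 on success, or -1 if the path has no driver class.
 */
int driver_class_sketch ( const char *path, char *buf, size_t buf_len ) {
	static const char marker[] = "drivers/";
	const char *start;
	size_t len;

	assert ( ( path != NULL ) && ( buf != NULL ) );
	start = strstr ( path, marker );
	if ( ! start )
		return -1;
	start += strlen ( marker );
	/* The class is everything up to the next path separator */
	len = strcspn ( start, "/" );
	if ( ( len == 0 ) || ( start[len] != '/' ) || ( len >= buf_len ) )
		return -1;
	memcpy ( buf, start, len );
	buf[len] = '\0';
	return 0;
}
```

With this rule, "drivers/usb/ehci.c" has class "usb" and
"drivers/net/intel.c" has class "net", so a list restricted to selected
classes can omit the USB bus drivers.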
Signed-off-by: Michael Brown <mcb30@ipxe.org>
On some RTL8169 onboard NICs (observed with a Lenovo ThinkPad 11e),
the EEPROM is not merely not present: any attempt to read from the
non-existent EEPROM will crash and reboot the system.
The equivalent code to read from the EEPROM was removed from the Linux
r8169 driver in 2009 with a comment suggesting that it was similarly
found to be unreliable on some systems.
Fix by accessing the EEPROM only on RTL8139 NICs, and assuming that
the MAC address will always be correctly preset on RTL8169 NICs.
Reported-by: Evan Prohaska <eprohaska@edkey.org>
Tested-by: Evan Prohaska <eprohaska@edkey.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The emulated Intel 82545em in some versions of VMware (observed with
ESXi v5.1) seems to sometimes fail to set the RXT0 bit in the
interrupt cause register (ICR), causing iPXE to stop receiving
packets. Work around this problem (for the 82545em only) by always
polling the receive queue regardless of the state of the ICR.
Reported-by: Slava Bendersky <volga629@networklab.ca>
Tested-by: Slava Bendersky <volga629@networklab.ca>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The assembler on OpenBSD 5.7 seems not to correctly handle the
combinations of .struct and .previous used in unlzma.S, and ends up
complaining about an "attempt to allocate data in absolute section".
Work around this problem by explicitly resetting the section after the
data structure definitions.
Reported-by: Jiri B <jirib@devio.us>
Tested-by: Jiri B <jirib@devio.us>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
PCI v3.0 supports a "device list" which allows the ROM to claim
support for multiple PCI device IDs (but only a single vendor ID).
Add support for building such ROMs by scanning the build target
element list and incorporating any device IDs into the ROM's device
list header. For example:
make bin/8086153a--8086153b.mrom
would build a ROM claiming support for both 8086:153a and 8086:153b.
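
The parsing rule implied by this example can be sketched in C as
follows. The helper name is hypothetical, the sketch assumes that every
element is an eight-digit vendor and device ID pair (real build targets
may also contain other element types), and the build system may well
perform this step in another language.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: parse "vvvvdddd--vvvvdddd..." into a vendor ID
 * and a list of device IDs.  Returns the device count, or -1 on error.
 */
int parse_rom_ids_sketch ( const char *target, uint16_t *vendor,
			   uint16_t *devices, size_t max ) {
	size_t count = 0;
	char hex[9];
	unsigned long id;
	unsigned int i;

	assert ( target && vendor && devices );
	while ( *target ) {
		/* Each element must be exactly eight hex digits */
		for ( i = 0 ; i < 8 ; i++ ) {
			if ( ! isxdigit ( ( unsigned char ) target[i] ) )
				return -1;
		}
		memcpy ( hex, target, 8 );
		hex[8] = '\0';
		id = strtoul ( hex, NULL, 16 );
		/* The device list permits only a single vendor ID */
		if ( ( count > 0 ) && ( ( id >> 16 ) != *vendor ) )
			return -1;
		if ( count >= max )
			return -1;
		*vendor = ( id >> 16 );
		devices[count++] = ( id & 0xffff );
		target += 8;
		/* Elements are separated by "--" */
		if ( *target ) {
			if ( ( strncmp ( target, "--", 2 ) != 0 ) ||
			     ( ! target[2] ) )
				return -1;
			target += 2;
		}
	}
	return count;
}
```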
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Entropy gathering via timer ticks is slow under UEFI (of the order of
20-30 seconds on some machines). Use the EFI_RNG_PROTOCOL if
available, to speed up the process of entropy gathering.
Note that some implementations (including EDK2) will fail if we
request fewer than 32 random bytes at a time, and that the RNG
protocol provides no guarantees about the amount of entropy provided
by a call to GetRNG(). We take the (hopefully pessimistic) view that
a 32-byte block returned by GetRNG() will contain at least the 1.3
bits of entropy claimed by min_entropy_per_sample().
Signed-off-by: Michael Brown <mcb30@ipxe.org>
At least one NII implementation (in a Microsoft Surface tablet) seems
to fail to report the absence (sic) of TX completions properly. Work
around this by checking for TX completions only when we expect to see
one.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some NII implementations will fail the GET_STATUS operation if we
request the media status. Fix by doing so only if GET_INIT_INFO
reported that media status is supported.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Our current behaviour when booting as a ROM is to autoboot only from
devices which are attached via the PCI bus:dev.fn address passed to
the ROM's initialisation vector.
Add a build configuration option AUTOBOOT_ROM_FILTER (enabled by
default) to control this behaviour. This allows for ROMs to be built
which will attempt to boot from any detected device, even if not
attached via the original PCI bus:dev.fn address. (This is
particularly useful when building combined EHCI/xHCI ROMs for USB
network boot, since the BIOS may request a boot via the EHCI
controller but the xHCI driver will reroute the root hub ports to the
xHCI controller.)
Signed-off-by: Michael Brown <mcb30@ipxe.org>
In theory USB3 ports do not require a reset to enable the port.
Experimentation shows that this is sometimes required, particularly
when rerouting ports from EHCI to xHCI and switching speeds.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Running util/parserom.pl on all source files (637) one by one takes
approximately 35 seconds because of the startup cost of each invocation.
With the utility rewritten to support multiple source files it now takes
approximately 1 second to scan all source files for ROM declarations.
The --exclude-driver and --exclude-driver-class options have been added,
making it possible to skip certain source files entirely.
In addition, a --debug option has been added to make progress easier to
trace, and a --help option has been added to show usage information.
Signed-off-by: Robin Smidsrød <robin@smidsrod.no>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
We hook the UEFI ExitBootServices() event and use it to trigger a call
to shutdown_boot(). This does not automatically cause drivers to be
disconnected from their devices, since device enumeration is now
handled by the UEFI core rather than by iPXE. (Under the old and
dubiously compatible device model, iPXE used to perform its own device
enumeration and so the call to shutdown_boot() would indeed have
caused drivers to be disconnected.)
Fix by replicating parts of the dummy "EFI root device" from
efiprefix.c to efidrvprefix.c, so that the call to shutdown_boot()
will call efi_driver_disconnect_all().
Originally-fixed-by: Laszlo Ersek <lersek@redhat.com>
Tested-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
SHA-512/224 is almost identical to SHA-512, with differing initial
hash values and a truncated output length.
This implementation has been verified using the NIST SHA-512/224 test
vectors.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
SHA-512/256 is almost identical to SHA-512, with differing initial
hash values and a truncated output length.
This implementation has been verified using the NIST SHA-512/256 test
vectors.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
SHA-384 is almost identical to SHA-512, with differing initial hash
values and a truncated output length.
This implementation has been verified using the NIST SHA-384 test
vectors.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
SHA-224 is almost identical to SHA-256, with differing initial hash
values and a truncated output length.
This implementation has been verified using the NIST SHA-224 test
vectors.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Update the digest self-tests to use okx(), and centralise concepts and
data shared between tests for multiple algorithms to reduce duplicated
code.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
None of the x86_64 builds currently have any way of invoking these
functions. They are included only to avoid introducing unnecessary
architecture-specific dependencies into the self-test suite.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Commit 8ab4b00 ("[libc] Rewrite setjmp() and longjmp()") introduced a
regression in which the saved values of %ebx, %esi, and %edi were all
accidentally restored into %esp. The result is that the second and
subsequent returns from setjmp() would effectively corrupt %ebx, %esi,
%edi, and the stack pointer %esp.
Use of setjmp() and longjmp() is generally discouraged: our only use
occurs as part of the implementation of PXENV_RESTART_TFTP, since the
PXE API effectively mandates its use here. The call to setjmp()
occurs at the start of pxe_start_nbp(), where there are almost
certainly no values held in %ebx, %esi, or %edi. The corruption of
these registers therefore had no visible effect on program execution.
The corruption of %esp would have been visible on return from
pxe_start_nbp(), but there are no known PXE NBPs which first call
PXENV_RESTART_TFTP and subsequently attempt to return to the PXE base
code. The effect on program execution was therefore similar to that
of moving the stack to a pseudo-random location in the 32-bit address
space; this will often allow execution to complete successfully since
there is a high chance that the pseudo-random location will be unused.
The regression therefore went undetected for around one month.
Fix by restoring the correct registers from the saved jmp_buf
structure.
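
For readers unfamiliar with the "second and subsequent returns"
behaviour, the following sketch uses the standard C library setjmp()
and longjmp() (not the iPXE implementation) to show a single setjmp()
call site returning several times. The function names are
hypothetical.

```c
#include <assert.h>
#include <setjmp.h>

static jmp_buf restart_point;

/* Jump back to the most recent setjmp(), which then returns 1 */
void restart ( void ) {
	longjmp ( restart_point, 1 );
}

/* Return to the same setjmp() call site "wanted" additional times */
int count_restarts ( int wanted ) {
	/* volatile: modified between setjmp() and longjmp() */
	volatile int restarts = 0;

	assert ( wanted >= 0 );
	if ( setjmp ( restart_point ) != 0 )
		restarts++;
	if ( restarts < wanted )
		restart();
	return restarts;
}
```

Every return after the first relies on the registers and stack pointer
saved in the jmp_buf, which is why restoring them into the wrong
registers affects only the second and subsequent returns.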
Signed-off-by: Michael Brown <mcb30@ipxe.org>
xHCI provides a somewhat convoluted mechanism for specifying details
of a transaction translator. Hubs must be marked as such in the
device slot context. The only opportunity to do so is as part of a
Configure Endpoint command, which can be executed only when opening
the hub's interrupt endpoint.
We add a mechanism for host controllers to intercept the opening of
hub devices, providing xHCI with an opportunity to update the internal
device slot structure for the corresponding USB device to indicate
that the device is a hub. We then include the hub-specific details in
the input context whenever any Configure Endpoint command is issued.
When a device is opened, we record the device slot and port for its
transaction translator (if any), and supply these as part of the
Address Device command.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Support low-speed and full-speed devices attached to a USB2 hub. Such
devices use a transaction translator (TT) within the USB2 hub, which
asynchronously initiates transactions on the lower-speed bus and
returns the result via a split completion on the high-speed bus.
We make the simplifying assumption that there will never be more than
sixteen active interrupt endpoints behind a single transaction
translator; this assumption allows us to schedule all periodic start
splits in microframe 0 and all periodic split completions in
microframes 2 and 3. (We do not handle isochronous endpoints.)
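
EHCI describes periodic split scheduling using one bit per microframe.
Under that assumption, the fixed schedule described above can be
sketched as a pair of bitmasks. The helper names are hypothetical and
are not taken from the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Bit for one of the eight microframes within a 1ms frame */
uint8_t uframe_bit ( unsigned int uframe ) {
	assert ( uframe < 8 );
	return ( 1U << uframe );
}

/* All periodic start splits are issued in microframe 0 */
uint8_t split_start_mask ( void ) {
	return uframe_bit ( 0 );
}

/* All periodic split completions are issued in microframes 2 and 3 */
uint8_t split_complete_mask ( void ) {
	return ( uframe_bit ( 2 ) | uframe_bit ( 3 ) );
}
```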
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The current endpoint reset logic defers the reset until the caller
attempts to enqueue a new transfer to that endpoint. This is
insufficient when dealing with endpoints behind a transaction
translator, since the transaction translator is a resource shared
between multiple endpoints.
We cannot reset the endpoint as part of the completion handling, since
that would introduce recursive calls to usb_poll(). Instead, we
add the endpoint to a list of halted endpoints, and perform the reset
on the next call to usb_step().
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The endpoint may already have enqueued TRBs at the time that
xhci_endpoint_reset() is called. Ring the doorbell to resume
processing these TRBs immediately, rather than waiting until the next
call to xhci_endpoint_message() or xhci_endpoint_stream().
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Several of the USB timeouts were chosen on the principle of "pick an
arbitrary but ridiculously large value, just to be safe". It turns
out that some of the timeouts permitted by the USB specification are
even larger: for example, control transactions are allowed to take up
to five seconds to complete.
Fix up these USB timeout values to match those found in the USB2
specification.
Debugged-by: Robin Smidsrød <robin@smidsrod.no>
Tested-by: Robin Smidsrød <robin@smidsrod.no>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
xHCI (and EHCI) nominally provide a mechanism for releasing ownership
of the host controller back to the BIOS, which can then potentially
restore legacy USB keyboard functionality.
This is a rarely used code path, since most operating systems claim
ownership and never attempt to later return to the BIOS. On some
systems (observed with a Lenovo X1 Carbon), this code path leads to
obscure and interesting bugs: if the xHCI and EHCI controllers are
both claimed and later released back to the BIOS, then a subsequent
call to INT 16,0305 to set the keyboard repeat rate to a non-default
value will lock the system.
Obscure though this sequence of operations may sound, it is exactly
what happens when using iPXE to boot a Linux kernel via a USB network
card. There is old and probably unwanted code in Linux's
arch/x86/boot/main.c which sets the keyboard repeat rate (with the
accompanying comment "Set keyboard repeat rate (why?)"). When booting
Linux via a USB network card on a Lenovo X1 Carbon, the system
therefore locks up immediately after jumping to the kernel's entry
point.
Work around this problem by preventing the release of ownership back
to the BIOS if it is known that we are shutting down to boot an OS.
This should allow legacy USB keyboard functionality to be restored if
the user chooses to exit iPXE, while avoiding the rarely used code
paths (and corresponding BIOS bugs) if the user chooses instead to
boot an OS.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
When using iPXE as an option ROM for a PCI USB controller (e.g. via
qemu's "-device nec-usb-xhci,romfile=..." syntax), the ROM prefix will
set the PCI bus:dev.fn address of the USB controller as the PCI
autoboot device. This will cause iPXE to fail to boot from any
detected USB network devices, since they will not match the autoboot
bus type (or location).
Fix by allowing the autoboot bus type and location to match against
the network device or any of its parent devices. This allows the
match to succeed for USB network devices attached to the selected PCI
USB controller.
Reported-by: Dan Ellis <Dan.Ellis@displaylink.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
If the BIOS fails to gracefully release ownership of the xHCI
controller, we can forcibly claim it by disabling all SMIs via the
USB legacy support control/status register.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
RX FIFO overflow is almost inevitable since the (usable) USB2 bus
bandwidth is approximately one quarter of the Ethernet bandwidth.
Avoid flooding the console with RX FIFO overflow messages in a
standard debug build.
With TCP SACK implemented, the RX FIFO overflow no longer causes a
catastrophic drop in throughput. Experimentation shows that HTTP
downloads now progress at a fairly smooth 250Mbps, which is around the
maximum speed attainable for a USB2 NIC.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The TCP Selective Acknowledgement option (specified in RFC2018)
provides a mechanism for the receiver to indicate packets that have
been received out of order (e.g. due to earlier dropped packets).
iPXE often operates in environments in which there is a high
probability of packet loss. For example, the legacy USB keyboard
emulation in some BIOSes involves polling the USB bus from within a
system management interrupt: this introduces an invisible delay of
around 500us which is long enough for around 40 full-length packets to
be dropped. Similarly, almost all 1Gbps USB2 devices will eventually
end up dropping packets because the USB2 bus does not provide enough
bandwidth to sustain a 1Gbps stream, and most devices will not provide
enough internal buffering to hold a full TCP window's worth of
received packets.
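
The figure of around 40 packets can be checked with a short
calculation. The sketch below assumes a 1Gbps link and counts 1538
bytes on the wire per full-length frame (1500 bytes of payload plus
Ethernet header, frame check sequence, preamble and inter-frame gap);
both values are assumptions made for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Number of whole frames that arrive during a delay */
unsigned int frames_during_delay ( unsigned int delay_us,
				   unsigned int link_mbps,
				   unsigned int wire_bytes ) {
	uint64_t bits;

	assert ( wire_bytes > 0 );
	/* Microseconds multiplied by megabits per second gives bits */
	bits = ( ( uint64_t ) delay_us * link_mbps );
	return ( bits / ( 8 * wire_bytes ) );
}
```

For a 500us delay this gives 500000 bits divided by 12304 bits per
frame, which is 40 whole frames.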
Add support for sending TCP Selective Acknowledgements. This provides
the sender with more detailed information about which packets have
been lost, and so allows for a more efficient retransmission strategy.
We include a SACK-permitted option in our SYN packet, since
experimentation shows that at least Linux peers will not include a
SACK-permitted option in the SYN-ACK packet if one was not present in
the initial SYN. (RFC2018 does not seem to mandate this behaviour,
but it is consistent with the approach taken in RFC1323.) We ignore
any received SACK options; this is safe to do since SACK is only ever
advisory and we never have to send non-trivial amounts of data.
Since our TCP receive queue is a candidate for cache discarding under
low memory conditions, we may end up discarding data that has been
reported as received via a SACK option. This is permitted by RFC2018.
We follow the stricture that SACK blocks must not report data which is
no longer held by the receiver: previously-reported blocks are
validated against the current receive queue before being included
within the current SACK block list.
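
The validation rule can be sketched as follows. This is not the iPXE
implementation: the structure and function names are hypothetical, and
the sketch assumes that the receive queue has already been summarised
as a list of contiguous held ranges.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Half-open range [left, right) in TCP sequence space */
struct seq_range {
	uint32_t left;
	uint32_t right;
};

/* True if a is at or before b, tolerating sequence number wraparound */
int seq_le ( uint32_t a, uint32_t b ) {
	return ( ( int32_t ) ( b - a ) >= 0 );
}

/* Keep only those previously-reported blocks that are still fully
 * covered by a held range.  Returns the number of blocks kept.
 */
size_t prune_sack_blocks ( struct seq_range *blocks, size_t count,
			   const struct seq_range *held, size_t held_count ) {
	size_t kept = 0;
	size_t i;
	size_t j;

	assert ( ( blocks != NULL ) || ( count == 0 ) );
	assert ( ( held != NULL ) || ( held_count == 0 ) );
	for ( i = 0 ; i < count ; i++ ) {
		for ( j = 0 ; j < held_count ; j++ ) {
			if ( seq_le ( held[j].left, blocks[i].left ) &&
			     seq_le ( blocks[i].right, held[j].right ) ) {
				blocks[kept++] = blocks[i];
				break;
			}
		}
	}
	return kept;
}
```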
Experiments in a qemu VM using forced packet drops (by setting
NETDEV_DISCARD_RATE to 32) show that implementing SACK improves
throughput by around 400%.
Experiments with a USB2 NIC (an SMSC7500) show that implementing SACK
improves throughput by around 700%, increasing the download rate from
35Mbps up to 250Mbps (which is approximately the usable bandwidth
limit for USB2).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Several of the assembly files in arch/i386/prefix were missed by the
automated relicensing tool due to missing licence declarations, code
dating back to the initial git revision, etc. Manual review shows
that these files may be relicensed.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
This driver is functional but any downloads via a TCP-based protocol
tend to perform poorly. The 1Gbps Ethernet line rate is substantially
higher than the 480Mbps (in practice around 280Mbps) provided by USB2,
and the device has only 32kB of internal buffer memory. Our 256kB TCP
receive window therefore rapidly overflows the RX FIFO, leading to
multiple dropped packets (usually within the same TCP window) and
hence a low overall throughput.
Reducing the TCP window size so that the RX FIFO does not overflow
greatly increases throughput, but is not a general-purpose solution.
Further investigation is required to determine how other OSes
(e.g. Linux) cope with this scenario. It is possible that
implementing TCP SACK would provide some benefit.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Most devices expose at least the link up/down status via a bit in a
MAC register, since the MAC generally already needs to know whether or
not the link is up. Some devices (e.g. the SMSC75xx USB NIC) expose
this information to software only via the MII registers.
Provide a generic mii_check_link() implementation to check the BMSR
and report the link status via netdev_link_{up,down}().
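
A minimal sketch of such a check is shown below. The register number
and link status bit are the standard MII definitions; the callback
type, the function names and the fake PHY are hypothetical. Reading
the BMSR twice is a common precaution against the latched link status
bit and is an assumption of this sketch, not something stated above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MII_BMSR	0x01	/* Basic mode status register */
#define BMSR_LSTATUS	0x0004	/* Link status */

/* Hypothetical read callback: returns register value or negative error */
typedef int ( * mii_read_fn ) ( void *priv, unsigned int reg );

/* Returns 1 if the link is up, 0 if down, or a negative error */
int mii_link_is_up_sketch ( mii_read_fn read, void *priv ) {
	int bmsr;

	assert ( read != NULL );
	/* First read clears any latched link-down indication */
	bmsr = read ( priv, MII_BMSR );
	if ( bmsr < 0 )
		return bmsr;
	/* Second read reflects the current link state */
	bmsr = read ( priv, MII_BMSR );
	if ( bmsr < 0 )
		return bmsr;
	return ( ( bmsr & BMSR_LSTATUS ) ? 1 : 0 );
}

/* Fake PHY for demonstration: returns two preset BMSR values in turn */
struct fake_phy {
	uint16_t bmsr[2];
	unsigned int reads;
};

int fake_phy_read ( void *priv, unsigned int reg ) {
	struct fake_phy *phy = priv;

	if ( reg != MII_BMSR )
		return -1;
	return phy->bmsr[ phy->reads++ & 1 ];
}
```

A driver would then report the result via netdev_link_up() or
netdev_link_down() as appropriate.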
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Microsoft IIS supports only MD5-sess for Digest authentication.
Requested-by: Andreas Hammarskjöld <junior@2PintSoftware.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
iPXE already sends RX notifications to the backend when needed, but
does not set the "feature-rx-notify" flag. As of XenServer 6.5, this
flag is mandatory and omitting it will cause the backend to fail.
Fix by setting the "feature-rx-notify" flag, to inform the backend
that we will send notifications.
Reported-by: Shalom Bhooshi <shalom.bhooshi@citrix.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Restore the original values of XUSB2PR and USB3PSSEN, in case we are
booting an OS with no support for xHCI.
Suggested-by: Dan Ellis <Dan.Ellis@displaylink.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Intel PCH controllers default to routing USB2 ports to EHCI rather
than xHCI, and default to disabling SuperSpeed connections.
Manipulate the PCI configuration space registers as necessary to
reroute ports and enable SuperSpeed.
Originally-fixed-by: Dan Ellis <Dan.Ellis@displaylink.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Relicense files with kind permission from
Stefan Hajnoczi <stefanha@redhat.com>
alongside the contributors who have already granted such relicensing
permission.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Rewrite (and relicense) the header files which are included in all
builds of iPXE (including non-Linux builds).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
At some point in the past few years, binutils became more aggressive
at removing unused symbols. To function as a symbol requirement, a
relocation record must now be in a section marked with @progbits and
must not be in a section which gets discarded during the link (either
via --gc-sections or via /DISCARD/).
Update REQUIRE_SYMBOL() to generate relocation records meeting these
criteria. To minimise the impact upon the final binary size, we use
existing symbols (specified via the REQUIRING_SYMBOL() macro) as the
relocation targets where possible. We use R_386_NONE or R_X86_64_NONE
relocation types to prevent any actual unwanted relocation taking
place. Where no suitable symbol exists for REQUIRING_SYMBOL() (such
as in config.c), the macro PROVIDE_REQUIRING_SYMBOL() can be used to
generate a one-byte-long symbol to act as the relocation target.
If there are versions of binutils for which this approach fails, then
the fallback will probably involve killing off REQUEST_SYMBOL(),
redefining REQUIRE_SYMBOL() to use the current definition of
REQUEST_SYMBOL(), and postprocessing the linked ELF file with
something along the lines of "nm -u | wc -l" to check that there are
no undefined symbols remaining.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The valgrind headers are not x86-specific; they detect the CPU
architecture and contain inline assembly for multiple architectures.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Unregistering a child settings block can have almost arbitrary
effects, due to the call to apply_settings(). Avoid potentially
dereferencing a stale pointer by using list_first_entry() rather than
list_for_each_entry_safe() to iterate over the list of child settings.
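
The difference between the two iteration styles can be sketched with a
self-contained list. The list implementation and function names below
are simplified stand-ins for list.h and for the settings code, and the
side effect in unregister_child() is contrived to model the "almost
arbitrary effects" described above.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly-linked list, standing in for list.h */
struct node {
	struct node *next;
	struct node *prev;
};

struct child {
	struct node list;
	int id;
};

void node_add_tail ( struct node *entry, struct node *head ) {
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

void node_del ( struct node *entry ) {
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	entry->next = entry->prev = entry;
}

/* Hypothetical unregister with a side effect: it also removes the
 * following sibling, so a cached "next" pointer would become stale.
 */
void unregister_child ( struct node *head, struct child *child ) {
	struct node *sibling = child->list.next;

	node_del ( &child->list );
	if ( sibling != head )
		node_del ( sibling );
}

/* Re-read the first entry each time instead of caching a "next" pointer */
int unregister_all ( struct node *head ) {
	int calls = 0;

	assert ( head != NULL );
	while ( head->next != head ) {
		struct child *first = ( struct child * )
			( ( char * ) head->next -
			  offsetof ( struct child, list ) );

		unregister_child ( head, first );
		calls++;
	}
	return calls;
}
```

Because the loop consults only the list head on each pass, it remains
correct however many entries each unregistration removes.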
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The code in list.h was originally taken from the Linux kernel many
years ago, but has been rewritten to the point that no original code
remains, and may therefore be relicensed.
The functions and data structures remain largely API-compatible, to
facilitate the conversion of Linux network drivers to iPXE.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
These files cannot be automatically relicensed by util/relicense.pl
since they either contain unusual but trivial contributions (such as
the addition of __nonnull function attributes), or contain lines
dating back to the initial git revision (and so require manual
knowledge of the code's origin).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Relicense files with kind permission from the following contributors:
Alex Williamson <alex.williamson@redhat.com>
Eduardo Habkost <ehabkost@redhat.com>
Greg Jednaszewski <jednaszewski@gmail.com>
H. Peter Anvin <hpa@zytor.com>
Marin Hannache <git@mareo.fr>
Robin Smidsrød <robin@smidsrod.no>
Shao Miller <sha0.miller@gmail.com>
Thomas Horsten <thomas@horsten.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Relicense files authored by Dan Lynch while working as an employee of
Fen Systems Ltd., with permission from Fen Systems Ltd.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The UBDL relicensing tool (util/relicense.pl) is designed to identify
files which may be relicensed under a dual GPL+UBDL licence. It uses
git-blame to identify the author of each line (using the -M and -C
options to track lines moved or copied between files), and relicenses
files for which all authors have given permission.
The relicensing tool will ignore certain types of lines identified by
git-blame:
- empty lines
- comments
- standalone opening or closing braces
- "#include ..."
- "return 0;"
- "return rc;"
- "PCI_ROM(...)"
- "FILE_LICENCE(...)"
These lines either contain no meaningful content (e.g. empty lines),
contain only non-copyrightable facts (e.g. PCI ROM IDs) or are
sufficiently common within the codebase that git-blame is likely to
misattribute their origin (e.g. "return 0").
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add the text for the Unmodified Binary Distribution Licence. This
Licence allows for the distribution of unmodified binaries built from
publicly available source code, without imposing the obligations of
the GNU General Public License upon anyone who chooses to distribute
only the unmodified binaries built from that source code. See the
licence text for the precise terms and conditions.
Add the licence GPL2_OR_LATER_OR_UBDL to the set of licences which can
be declared using FILE_LICENCE(), and add the corresponding support to
licence.pl.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add the standard warranty disclaimer and Free Software Foundation
address paragraphs to the licence text where these are not currently
present.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The code in lzma_literal() checks to see if we are at the start of the
compressed input data in order to determine whether or not a most
recent output byte exists. This check is incorrect, since
initialisation of the decompressor will always consume the first five
bytes of the compressed input data.
Fix by instead checking whether or not we are at the start of the
output data stream. This is, in any case, a more logical check.
This issue was masked during development and testing since virtual
machines tend to zero the initial contents of RAM; the spuriously-read
"most recent output byte" is therefore likely to already be a zero
when running in a virtual machine.
Reported-by: Robin Smidsrød <robin@smidsrod.no>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The 0xe9 debug port exists only on virtual machines. Provide an
option to print debug output on the BIOS console, to allow for
debugging on real hardware.
Note that this option can be used only if the decompressor is called
in flat real mode; the easiest way to achieve this is to build with
DEBUG=libprefix.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allow the decompressor the option of generating debugging output via
the BIOS console by calling it in flat real mode (rather than 16-bit
protected mode) when libprefix.S is built with debugging enabled.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
LZMA performs an extra normalisation after decompression is complete,
which does not affect the output but may consume an extra byte from
the input (and so may affect which byte is identified as being the
start of the next block).
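
The normalisation step of a conventional LZMA range decoder can be
sketched as below, showing how it may consume one input byte without
producing any output. The structure and function names are
hypothetical and are not taken from the iPXE decompressor.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical range decoder state */
struct rc_sketch {
	const uint8_t *in;	/* Compressed input */
	size_t consumed;	/* Input bytes consumed so far */
	uint32_t range;		/* Current range */
	uint32_t code;		/* Current code value */
};

/* Normalise the decoder.  Returns the number of bytes consumed (0 or 1) */
unsigned int rc_normalise_sketch ( struct rc_sketch *rc ) {
	assert ( ( rc != NULL ) && ( rc->in != NULL ) );
	/* Nothing to do while the range still has at least 24 bits */
	if ( rc->range >= 0x01000000UL )
		return 0;
	/* Shift in one more input byte */
	rc->range <<= 8;
	rc->code = ( ( rc->code << 8 ) | rc->in[ rc->consumed++ ] );
	return 1;
}
```

Whether or not the final normalisation consumes a byte therefore
depends on the value of the range at the end of the stream.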
Reported-by: Robin Smidsrød <robin@smidsrod.no>
Tested-by: Robin Smidsrød <robin@smidsrod.no>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
iPXE uses DHCP timeouts loosely based on values recommended by the
specification, but often abbreviated to reduce timeouts for reliable
and/or simple network topologies. Extract the DHCP timing parameters
to config/dhcp.h and document them. The resulting default iPXE
behavior is exactly the same, but downstreams are now afforded the
opportunity to implement spec-compliant behavior via config file
overrides.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
LZMA provides significantly better compression (by ~15%) than the
current NRV2B algorithm.
We use a raw LZMA stream (aka LZMA1) to avoid the need for code to
parse the LZMA2 block headers. We use parameters {lc=2,lp=0,pb=0} to
reduce the stack space required by the decompressor to acceptable
levels (around 8kB). Using lc=3 or pb=2 would give marginally better
compression, but at the cost of substantially increasing the required
stack space.
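
The effect of lc on memory use can be estimated from the standard LZMA
layout, in which there are 2^(lc+lp) literal coders of 0x300
probabilities each. The sketch below assumes 16-bit probabilities and
counts only the literal tables; the remaining tables (some of which
scale with pb) are not included, and the function name is
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes needed for the literal probability tables alone */
size_t lzma_literal_probs_bytes ( unsigned int lc, unsigned int lp ) {
	/* LZMA limits lc to 8 and lp to 4 */
	assert ( ( lc <= 8 ) && ( lp <= 4 ) );
	return ( ( ( size_t ) 0x300 << ( lc + lp ) ) * sizeof ( uint16_t ) );
}
```

Under these assumptions the literal tables occupy 6144 bytes for lc=2
and 12288 bytes for lc=3, so each increment of lc doubles the largest
single component of the decompressor state.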
The build process now requires the liblzma headers to be present on
the build system, since we do not include a copy of an LZMA compressor
within the iPXE source tree. The decompressor is written from scratch
(based on XZ Embedded) and is entirely self-contained within the
iPXE source.
The branch-call-jump (BCJ) filter used to improve the compressibility
is specific to iPXE. We choose not to use liblzma's built-in BCJ
filter since the algorithm is complex and undocumented. Our BCJ
filter achieves approximately the same results (on typical iPXE
binaries) with a substantially simpler algorithm.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some decompression algorithms (e.g. LZMA) require large amounts of
temporary stack space, which may not be made available by all
prefixes. Use .bss16 as a temporary stack for the duration of the
calls to install_block (switching back to the external stack before we
start making calls into code which might access variables in .bss16),
and allow the decompressor to define a global symbol to force a
minimum value on the size of .bss16.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Other hypervisors (e.g. KVM) may provide an unusable subset of the
Hyper-V features, and our attempts to use these non-existent features
cause the guest to reboot.
Fix by explicitly checking for the Hyper-V features that we use.
Reported-by: Ján ONDREJ (SAL) <ondrejj@salstar.sk>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The implementation of strtoul() has a partially unknown provenance.
Rewrite this code to avoid potential licensing uncertainty.
Since we now use -ffunction-sections, there is no need to place
strtoull() in a separate file from strtoul().
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The implementation of inet_aton() has an unknown provenance. Rewrite
this code to avoid potential licensing uncertainty.
Also move the code from core/misc.c to its logical home in net/ipv4.c,
and add a few extra test cases.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
When a command times out, abort it (via the Command Abort bit in the
Command Ring Control Register) so that subsequent commands may execute
as expected.
This improves robustness when a device fails to respond to the Set
Address command, since the subsequent Disable Slot command will now
succeed.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
If the Disable Slot command fails then the hardware may continue to
write to the slot context. Leak the memory used by the slot context
to avoid future memory corruption.
This situation has been observed in practice when a Set Address
command fails, causing the command ring to become temporarily
unresponsive.
Note that there is no need to similarly leak memory on the failure
path in xhci_device_open(), since in the event of a failure the
hardware is never informed of the slot context address.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The 8254 timer code (used to implement udelay()) has an unknown
provenance. Rewrite this code to avoid potential licensing
uncertainty.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
As with memcpy(), we can reduce the code size (by an average of 0.2%)
by giving the compiler more visibility into what memset() is doing,
and by avoiding the "rep" prefix on short fixed-length sequences of
string operations.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some of the C library string functions have an unknown provenance.
Reimplement all such functions to avoid potential licensing
uncertainty.
Remove the inline-assembler versions of strlen(), memswap(), and
strncmp(); these save a minimal amount of space (around 40 bytes in
total) and are not performance-critical.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Provide a generic framework for allocating, refilling, and optionally
recycling I/O buffers used by bulk IN and interrupt endpoints.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Hardened versions of gcc default to building position-independent
code, which breaks our i386 build. Our build process therefore
detects such platforms and automatically adds "-fno-PIE -nopie" to the
gcc command line.
On x86_64, we choose to build position-independent code (in order to
reduce the final binary size and, in particular, the number of
relocations required for UEFI binaries). The workaround therefore
breaks the build process for x86_64 binaries on such platforms.
Fix by moving the workaround to the i386-specific portion of the
Makefile.
Reported-by: Jan Kundrát <jkt@kde.org>
Debugged-by: Jan Kundrát <jkt@kde.org>
Debugged-by: Marin Hannache <git@mareo.fr>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
UEFI binaries may be relocated to any location within the 64-bit
address space. We compile as position-independent code with hidden
visibility, which should force all relocation records to be either
PC-relative (in which case no PE relocations are required) or full
64-bit relocations. There should be no R_X86_64_32 relocation
records, since that would imply an invalid assumption that code could
not be relocated above 4GB.
Remove support for R_X86_64_32 relocation records from util/elf2efi.c,
so that any such records result in a build failure rather than a
potential runtime failure.
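As a rough sketch of the resulting policy (this is not the code in
util/elf2efi.c; the classify_reloc() helper and the reloc_action names
are hypothetical, and only the relocation type numbers are taken from
the x86-64 ELF ABI), each relocation record might be classified as
follows:

```c
/* Relocation type numbers from the x86-64 ELF ABI */
enum {
	R_X86_64_NONE = 0,
	R_X86_64_64 = 1,
	R_X86_64_PC32 = 2,
	R_X86_64_PLT32 = 4,
	R_X86_64_32 = 10,
};

/* Hypothetical outcomes for a single ELF relocation record */
enum reloc_action {
	RELOC_SKIP,	/* PC-relative: no PE relocation required */
	RELOC_DIR64,	/* Full 64-bit relocation: emit a PE relocation */
	RELOC_FAIL,	/* Unsupported: fail the build */
};

/* Hypothetical helper: decide how to handle a relocation type */
enum reloc_action classify_reloc ( unsigned int type ) {
	switch ( type ) {
	case R_X86_64_NONE:
	case R_X86_64_PC32:
	case R_X86_64_PLT32:
		return RELOC_SKIP;
	case R_X86_64_64:
		return RELOC_DIR64;
	default:
		/* Includes R_X86_64_32, which would assume that the
		 * code cannot be relocated above 4GB.
		 */
		return RELOC_FAIL;
	}
}
```

Treating R_X86_64_32 as a failure at build time is what turns a
potential runtime failure into a build failure.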
Reported-by: Jan Kundrát <jkt@kde.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
When building hvmloader for Xen tools, the iPXE objects are also
linked into the binary. Unfortunately the linker will place them in
the order found in the archive. Since this order is random, the
resulting hvmloader binary differs when built from identical sources
on different build hosts. To help create a reproducible binary, sort
the elements in blib.a before passing them to $(AR).
Signed-off-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Commit a60f2dd ("[usb] Try multiple USB device configurations")
changed the behaviour of register_usb() such that if no drivers are
found then the device will be closed and the memory used will be
freed.
If a port status change subsequently occurs while the device is still
physically attached, then usb_hotplug() will see this as a new device
having been attached, since there is no device recorded as being
currently attached to the port. This can lead to spurious hotplug
events (or even endless loops of hotplug events, if the process of
opening and closing the device happens to generate a port status
change).
Fix by using a separate flag to indicate that a device is physically
attached (even if we have no corresponding struct usb_device).
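A minimal sketch of the idea, assuming a simplified port structure (the
names usb_port_sketch, port_event and port_status_change() are
hypothetical and do not correspond to the real iPXE structures):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified view of a hub port */
struct usb_port_sketch {
	int attached;	/* A device is physically attached */
	void *usb;	/* Driven device (NULL if no driver was found) */
};

enum port_event {
	PORT_NO_CHANGE,
	PORT_ATTACHED,
	PORT_DETACHED,
};

/* Decide which hotplug event (if any) a port status change implies */
enum port_event port_status_change ( struct usb_port_sketch *port,
				     int connected ) {

	assert ( port != NULL );

	/* Compare against the physical attachment flag, not against
	 * the presence of a driven device.
	 */
	if ( connected && ( ! port->attached ) ) {
		port->attached = 1;
		return PORT_ATTACHED;
	}
	if ( ( ! connected ) && port->attached ) {
		port->attached = 0;
		return PORT_DETACHED;
	}
	return PORT_NO_CHANGE;
}
```

With this arrangement, a status change on a port holding an undriven
but still attached device produces no new attachment event.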
Reported-by: Dan Ellis <Dan.Ellis@displaylink.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Use PRODUCT_SHORT_NAME instead of a hardcoded "iPXE" for strings which
are typically shown in the user interface.
Note that this only allows for customisation of the user interface.
Where the "iPXE" string serves a technical purpose (such as in the
HTTP User-Agent), the string cannot be customised.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some xHCI controllers (observed with a Renesas Electronics PCIe USB3
card) seem to require a delay after forcing the link state of USB3
ports to RxDetect. Omitting this delay causes strange behaviour
including system lockups.
Add an unconditional 20ms delay after writing the port link states.
This seems to be sufficient to avoid the problem.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some USB endpoints require that a short packet be used to terminate
transfers, since they have no other way to determine message
boundaries. If the message length happens to be an exact multiple of
the USB packet size, then this requires the use of an additional
zero-length packet.
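For illustration only, the condition can be sketched as below. The
helper name needs_zlp() and its parameters are hypothetical, and the
sketch assumes that an empty transfer is itself sent as a zero-length
packet and so needs no additional one:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: decide whether a transfer of len bytes needs
 * an additional zero-length packet to mark the end of the message.
 */
int needs_zlp ( size_t len, size_t mtu ) {

	/* The packet size must be non-zero */
	assert ( mtu != 0 );

	/* A non-empty transfer that exactly fills its final packet
	 * leaves the endpoint with no short packet to detect.
	 */
	return ( ( len != 0 ) && ( ( len % mtu ) == 0 ) );
}
```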
Signed-off-by: Michael Brown <mcb30@ipxe.org>
USB Communications Device Class devices may use a union functional
descriptor to group several interfaces into a function.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Iterate over a USB device's available configurations until we find one
for which we have working drivers.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some protocols (such as ARP) may modify the received packet and re-use
the same I/O buffer for transmission of a reply. To allow this,
reserve sufficient headroom at the start of each received packet
buffer for our transmit datapath headers.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some devices return multiple packets in a single poll. Handle such
devices gracefully by enqueueing received PXE UDP packets (along with
a pseudo-header to hold the IPv4 addresses and port numbers) and
dequeueing them on subsequent calls to PXENV_UDP_READ.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Fetching the TFTP file size is currently implemented via a custom
"tftpsize://" protocol hack. Generalise this approach to instead
close the TFTP connection whenever the parent data-transfer interface
is closed.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some devices have a very small number of internal buffers, and rely on
being able to pack multiple packets into each buffer. Using 2048-byte
buffers on such devices produces throughput of around 100Mbps. Using
a small number of much larger buffers (e.g. 32kB) increases the
throughput to around 780Mbps. (The full 1Gbps is not reached because
the high RTT induced by the use of multi-packet buffers causes us to
saturate our 256kB TCP window.)
Since allocation of large buffers is very likely to fail, allocate the
buffer set only once when the device is opened and recycle buffers
immediately after use. Received data is now always copied to
per-packet buffers.
If allocation of large buffers fails, fall back to allocating a larger
number of smaller buffers. This will give reduced performance, but
the device will at least still be functional.
Share code between the interrupt and bulk IN endpoint handlers, since
the buffer handling is now very similar.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allow drivers to specify a supported PCI class code. To save space in
the final binary, make this an attribute of the driver rather than an
attribute of a PCI device ID list entry.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
We require the ability to disconnect from and reconnect to VMBus; if
we don't have this then there is no (viable) way for a loaded
operating system to continue to use any VMBus devices. (There is also
a small but non-zero risk that the host will continue to write to our
interrupt and monitor pages, since the VMBUS_UNLOAD message in earlier
versions is essentially a no-op.)
This requires us to ensure that the host supports protocol version 3.0
(VMBUS_VERSION_WIN8_1). However, we can't actually _use_ protocol
version 3.0, since doing so causes an iSCSI-booted Windows Server 2012
R2 VM to crash due to a NULL pointer dereference in vmbus.sys.
To work around this problem, we first ensure that we can connect using
protocol v3.0, then disconnect and reconnect using the oldest known
protocol.
This deliberately prevents the use of the iPXE native Hyper-V drivers
on older versions of Hyper-V, where we could use our drivers but in so
doing would break the loaded operating system.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Windows Server 2012 R2 generates an RNDIS_INDICATE_STATUS_MSG with a
status code of 0x40020006. This status code does not appear to be
documented anywhere within the sphere of human knowledge.
Explicitly ignore this status code in order to avoid unnecessarily
cluttering the display when RNDIS debugging is enabled.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The (undocumented) VMBus protocol seems to allow for transfer
page-based packets where the data payload is split into an arbitrary
set of ranges within the transfer page set.
The RNDIS protocol includes a length field within the header of each
message, and it is known from observation that multiple RNDIS messages
can be concatenated into a single VMBus message.
iPXE currently assumes that the transfer page range boundaries are
entirely arbitrary, and uses the RNDIS header length to determine the
RNDIS message boundaries.
Windows Server 2012 R2 generates an RNDIS_INDICATE_STATUS_MSG for an
undocumented and unknown status code (0x40020006) with a malformed
RNDIS header length: the length does not cover the StatusBuffer
portion of the message. This causes iPXE to report a malformed RNDIS
message and to discard any further RNDIS messages within the same
VMBus message.
The Linux Hyper-V driver assumes that the transfer page range
boundaries correspond to RNDIS message boundaries, and so does not
notice the malformed length field in the RNDIS header.
Match the behaviour of the Linux Hyper-V driver: assume that the
transfer page range boundaries correspond to the RNDIS message
boundaries and ignore the RNDIS header length. This avoids triggering
the "malformed packet" error and also avoids unnecessary data copying:
since we now have one I/O buffer per RNDIS message, there is no longer
any need to use iob_split().
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Empirical observation suggests that 32 is a sensible size to minimise
the number of deferred packet transmissions without overflowing the
VMBus transmit ring buffer.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allow for elision of transmitted TCP ACKs by handling all received
VMBus messages in each network device poll operation.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
On Windows Server 2012 R2, closing and reopening the device will
sometimes result in a non-functional RX datapath. The root cause is
unknown. Clearing the receive filter before closing the device seems
to fix the problem.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
On Windows Server 2012 R2, the receive buffer teardown completion
message seems to occasionally be deferred until after the VMBus
channel has been closed. This happens even if there are no packets
currently in the receive buffer.
Work around this problem by separating the revocation and teardown of
the receive buffer, and deferring the teardown until after the VMBus
channel has been closed.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The Hyper-V RNDIS implementation on Windows Server 2012 R2 requires
that we send an explicit RNDIS initialisation message in order to get
a working RX datapath.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
RNDIS devices may provide multiple packets encapsulated into a single
message. Provide an API to allow the RNDIS driver to split an I/O
buffer into smaller portions.
The current implementation will always copy the underlying data,
rather than splitting the buffer in situ.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Check the integrity of the free memory block list before and after any
modifications to the list. We check that certain invariants are
preserved:
- the list is a well-formed doubly linked list
- all blocks are at least MIN_MEMBLOCK_SIZE
- no block extends beyond the end of our address space
- blocks remain sorted in ascending order of address
- no blocks are adjacent (i.e. any adjacent blocks have been merged)
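The invariants listed above could be checked along the following lines.
This is a sketch under simplifying assumptions rather than the actual
allocator code: the block structure, the MIN_BLOCK_SIZE value and
check_free_blocks() are hypothetical, the list is assumed to be
circular with a dummy head entry, and all pointers are assumed to be
non-NULL:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for MIN_MEMBLOCK_SIZE */
#define MIN_BLOCK_SIZE 16

/* Hypothetical free block descriptor */
struct block {
	struct block *prev;
	struct block *next;
	uintptr_t start;	/* Start address of the block */
	size_t size;		/* Length of the block */
};

/* Return non-zero if the free list rooted at head is consistent */
int check_free_blocks ( const struct block *head ) {
	const struct block *b;
	uintptr_t prev_end = 0;
	uintptr_t end;
	int first = 1;

	assert ( head != NULL );

	/* The dummy head must be correctly linked */
	if ( ( head->next->prev != head ) || ( head->prev->next != head ) )
		return 0;

	for ( b = head->next ; b != head ; b = b->next ) {
		/* Well-formed doubly linked list */
		if ( ( b->next->prev != b ) || ( b->prev->next != b ) )
			return 0;
		/* Minimum block size */
		if ( b->size < MIN_BLOCK_SIZE )
			return 0;
		/* No wraparound past the end of the address space */
		end = ( b->start + b->size );
		if ( end < b->start )
			return 0;
		/* Ascending order, with a gap between neighbours */
		if ( ( ! first ) && ( b->start <= prev_end ) )
			return 0;
		prev_end = end;
		first = 0;
	}
	return 1;
}
```

Requiring a strict gap between neighbours covers both the ordering
check and the check that adjacent blocks have been merged.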
Signed-off-by: Michael Brown <mcb30@ipxe.org>
We currently compare the entirety of the KeyHash object (including the
ASN.1 tag and length byte) against the raw SHA-1 hash of the
certificate's public key. This causes OCSP validation to fail for any
responses which identify the responder by key hash rather than by
name, and hence prevents the use of X.509 certificates where any
certificate in the chain has an OCSP responder which chooses to
identify itself via its key hash.
Fix by adding the missing asn1_enter() required to enter the ASN.1
octet string containing the key hash.
Also add a corresponding test case including an OCSP response where
the responder is identified by key hash, to ensure that this
functionality cannot be broken in future.
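To illustrate why the comparison failed, the following sketch compares
a DER-encoded KeyHash against a raw SHA-1 digest. It is not the iPXE
implementation: keyhash_matches() is a hypothetical helper that handles
only the short-form length used by a 20-byte hash, where the real code
relies on asn1_enter():

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SHA1_LEN 20
#define ASN1_OCTET_STRING 0x04

/* Hypothetical helper: return non-zero if the DER-encoded KeyHash
 * (an OCTET STRING) contains exactly the given SHA-1 digest.
 */
int keyhash_matches ( const uint8_t *der, size_t der_len,
		      const uint8_t *digest ) {

	assert ( ( der != NULL ) && ( digest != NULL ) );

	/* Expect tag byte, length byte, then the hash itself */
	if ( der_len != ( 2 + SHA1_LEN ) )
		return 0;
	if ( der[0] != ASN1_OCTET_STRING )
		return 0;
	if ( der[1] != SHA1_LEN )
		return 0;

	/* Compare only the contents of the octet string */
	return ( memcmp ( ( der + 2 ), digest, SHA1_LEN ) == 0 );
}
```

Comparing from the first byte of the object instead would include the
tag and length bytes, and so could never match the raw digest.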
Debugged-by: Brian Rak <brak@gameservers.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The i350 (and possibly other Intel NICs) have a non-trivial
correspondence between the PCI function number and the external
physical port number. For example, the i350 has a "LAN Function Sel"
bit within the EEPROM which can invert the mapping so that function 0
becomes port 3, function 1 becomes port 2, etc.
Unfortunately the MAC addresses within the EEPROM are indexed by
physical port number rather than PCI function number. The end result
is that when anything other than the default mapping is used, iPXE
will use the wrong address as the base MAC address.
Fix by using the autoloaded MAC address if it is valid, and falling
back to reading the MAC address directly from the EEPROM only if no
autoloaded address is available.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add the "-c <count>" option to the "ping" command, allowing for
automatic termination after a specified number of packets.
When a number of packets is specified:
- if a serious error (i.e. length mismatch or content mismatch)
occurs, then the ping will be immediately terminated with the relevant
status code;
- if at least one response is received successfully, and all errors
are non-serious (i.e. timeouts or out-of-sequence responses), then
the ping will be terminated after the final response (or timeout)
with a success status;
- if no responses are received successfully, then the ping will be
terminated after the final timeout with ETIMEDOUT.
If no number of packets is specified, then the ping will continue
until manually interrupted.
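The termination rules above can be summarised by the following sketch.
The function name ping_exit_status() and its parameters are
hypothetical, and the sketch assumes the iPXE convention of negative
error codes:

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical summary of the final status for "ping -c <count>".
 *
 * serious_rc is the status of the first serious error (length or
 * content mismatch), or zero if none occurred.  received is the
 * number of responses received successfully.
 */
int ping_exit_status ( int serious_rc, unsigned int received ) {

	/* Error codes are assumed to be negative */
	assert ( serious_rc <= 0 );

	/* A serious error terminates the ping immediately */
	if ( serious_rc != 0 )
		return serious_rc;

	/* At least one good response counts as success */
	if ( received > 0 )
		return 0;

	/* Nothing was ever received */
	return -ETIMEDOUT;
}
```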
Originally-implemented-by: Cedric Levasseur <cyr-ius@ipocus.net>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
End users almost certainly don't care whether the underlying interface
is SNP or NII/UNDI. Try to minimise surprise and unnecessary
documentation by including the NII driver whenever the SNP driver is
requested.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
iPXE itself exposes a dummy NII protocol with no UNDI. Avoid
potentially dereferencing a NULL pointer by checking for a non-zero
UNDI address.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some UEFI network drivers provide a software UNDI interface which is
exposed via the Network Interface Identifier Protocol (NII), rather
than providing a Simple Network Protocol (SNP).
The UEFI platform firmware will usually include the SnpDxe driver,
which attaches to NII and provides an SNP interface. The SNP
interface is usually provided on the same handle as the underlying NII
device. This causes problems for our EFI driver model: when
efi_driver_connect() detaches existing drivers from the handle it will
cause the SNP interface to be uninstalled, and so our SNP driver will
not be able to attach to the handle. The platform firmware will
eventually reattach the SnpDxe driver and may attach us to the SNP
handle, but we have no way to prevent other drivers from attaching
first.
Fix by providing a driver which can attach directly to the NII
protocol, using the software UNDI interface to drive the network
device.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The snpnet driver uses netdev_tx_defer() and so must ensure that space
in the (single-entry) transmit descriptor ring is freed up before
calling netdev_tx_complete().
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The EDK2 codebase uses -malign-double for 32-bit builds, which causes
64-bit integers to be naturally aligned. This affects the layout of
some structures (including EFI_BLOCK_IO_MEDIA).
This mirrors wimboot commit 7b8f39d ("[build] Fix building of 32-bit
UEFI version").
Signed-off-by: Michael Brown <mcb30@ipxe.org>
As of commit 03f0c23 ("[ipoib] Expose Ethernet-compatible eIPoIB
link-layer addresses and headers"), all link layers have used
addresses which fit within the DHCP chaddr field. The dhcp_chaddr()
function was therefore made obsolete by this commit, but was
accidentally left present (though unused) in the source code.
Remove the dhcp_chaddr() function and the only remaining use of it,
unnecessarily introduced in commit 08bcc0f ("[dhcp] Check for matching
chaddr in received DHCP packets").
Reported-by: Wissam Shoukair <wissams@mellanox.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
On large networks a DHCP XID collision is possible. Fix by explicitly
checking the chaddr in received DHCP packets.
Originally-fixed-by: Wissam Shoukair <wissams@mellanox.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some UEFI platforms will fail the call to LoadImage() with
EFI_INVALID_PARAMETER if we do not provide a device path (even though
we are providing a non-NULL SourceBuffer).
Fix by providing an empty device path for the call to LoadImage() in
efi_image_probe().
The call to LoadImage() in efi_image_exec() already constructs and
provides a device path (based on the most recently opened SNP device),
and so does not require this fix.
Reported-by: NICOLAS CATTIE <nicolas.cattie@mpsa.com>
Tested-by: NICOLAS CATTIE <nicolas.cattie@mpsa.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add the ID for the LM variant and differentiate it from the I217-V.
Signed-off-by: Jan Kiszka <jan.kiszka@web.de>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add utility for constructing EFI fat binaries (dual 32/64-bit
binaries, usable only on Apple EFI systems).
This utility is not part of the standard build process. To use it:
make util/efifatbin bin-i386-efi/ipxe.efi bin-x86_64-efi/ipxe.efi
and then
./util/efifatbin bin-*-efi/ipxe.efi fat-ipxe.efi
Requested-by: Brandon Penglase <bpenglase-ipxe@spaceservices.net>
Tested-by: Brandon Penglase <bpenglase-ipxe@spaceservices.net>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allow a straightforward "make clean" or "make veryclean" to apply to
all binary directories (using the shell pattern "bin{,-*}").
Individual binary directories can be cleaned using e.g.
make bin clean
make bin-x86_64-efi clean
Reported-by: Robin Smidsrød <robin@smidsrod.no>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
We currently require information about the underlying PCI device to
populate the snpnet device's name and description. If the underlying
device is not a PCI device, this will fail and prevent the device from
being registered.
Fix by falling back to populating the device description with
information based on the EFI handle, if no PCI device information is
available.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some UEFI systems (observed with a Hyper-V virtual machine) do not
provide EFI_PCI_ROOT_BRIDGE_IO_PROTOCOL. Make this an optional
protocol (and fail any attempts to access PCI configuration space via
the root bridge if the protocol is missing).
Reported-by: Colin Blacker <Colin.Blacker@computerplanet.co.uk>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Under UEFI, reads from PCI configuration space may fail. If this
happens, we should return all-ones (which will mimic the behaviour of
an absent PCI device).
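A minimal sketch of this behaviour is shown below. The cfg_read_t type,
cfg_read_dword() and the broken_read() stub are all hypothetical; the
stub merely stands in for a firmware call that fails:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical raw configuration space read: returns zero on success */
typedef int ( * cfg_read_t ) ( unsigned int offset, uint32_t *value );

/* Read a dword, substituting all-ones if the underlying read fails */
uint32_t cfg_read_dword ( cfg_read_t read, unsigned int offset ) {
	uint32_t value;

	assert ( read != NULL );

	if ( read ( offset, &value ) != 0 )
		return 0xffffffffUL;
	return value;
}

/* Stand-in for a firmware read that always fails */
int broken_read ( unsigned int offset, uint32_t *value ) {
	( void ) offset;
	( void ) value;
	return -1;
}
```

A caller probing the vendor ID then sees 0xffff, which is the value
conventionally interpreted as "no device present".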
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some systems will install a child of the SNP device and use this as
our loaded image's device handle, duplicating the installation of the
underlying SNP protocol onto the child device handle. On such
systems, we want to end up driving the parent device (and
disconnecting any other drivers, such as MNP, which may be attached to
the parent device).
Fix by recording the SNP protocol instance at initialisation time, and
using this to match against device handles (rather than simply
comparing the handles themselves).
Reported-by: Jarrod Johnson <jarrod.b.johnson@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Propagate our modified EFI system table to any images loaded by the
image that we wrap, thereby allowing us to observe boot services calls
made by all subsequent EFI images.
Also show details of intercepted ExitBootServices() calls. When
wrapping is used, exiting boot services will almost certainly fail,
but this at least allows us to see when it happens.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The .mrom payload has a code type of 0xff and so the initialisation
length field (single byte at offset 0x02) does not need to be
present. Use only the PCI header's image length field, which allows
the .mrom payload to be up to 32MB in size.
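The size limits follow from the widths of the two fields, both of which
count 512-byte blocks. As a worked example (rom_max_len() is a
hypothetical helper written for this illustration):

```c
#include <assert.h>
#include <stdint.h>

/* Both length fields are expressed in units of 512 bytes */
#define ROM_BLOCK_LEN 512

/* Hypothetical helper: largest length expressible by a length field
 * of the given width in bits.
 */
uint32_t rom_max_len ( unsigned int field_bits ) {

	assert ( ( field_bits > 0 ) && ( field_bits <= 16 ) );

	return ( ( ( 1UL << field_bits ) - 1 ) * ROM_BLOCK_LEN );
}
```

The single-byte initialisation length field gives rom_max_len ( 8 ),
which is 130560 bytes (just under 128kB), while the 16-bit image length
field gives rom_max_len ( 16 ), which is 33553920 bytes (just under
32MB).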
Inspired-by: Swift Geek <swiftgeek@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
mromprefix.S currently uses the initialisation length field (single
byte at offset 0x02) to determine the length of a ROM image within a
multi-image ROM BAR. For PCI ROM images with a code type other than
0, the initialisation length field may not be present.
Fix by using the PCI header's image length field instead.
Inspired-by: Swift Geek <swiftgeek@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Option::ROM currently uses the initialisation length field (single
byte at offset 0x02) to determine the length of a ROM image within a
multi-image ROM file. For PCI ROM images with a code type other than
0, the initialisation length field may not be present.
Fix by using the PCI header's image length field instead. Note that
this does not prevent us from correctly handling ISA ROMs, since ISA
ROMs do not support multiple images within a single ROM BAR anyway.
Inspired-by: Swift Geek <swiftgeek@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
With extremely unlucky timing, it is possible to interrupt a build and
cause make to delete config/named.h (and possibly any local
configuration headers).
Mark config/named.h and all local configuration headers as .PRECIOUS
to prevent make from ever deleting them.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The build process has for a long time assumed that every ROM is a PCI
ROM, and will always include the PCI header and PCI-related
functionality (such as checking the PCI BIOS version, including the
PCI bus:dev.fn address within the ROM product name string, etc.).
While real ISA cards are no longer in use, some virtualisation
environments (notably VirtualBox) have support only for ISA ROMs.
This can cause problems: in particular, VirtualBox will call our
initialisation entry point with random garbage in %ax, which we then
treat as the PCI bus:dev.fn address of the autoboot device: this
generally prevents the default boot sequence from using any network
devices.
Create .isarom and .pcirom prefixes which can be used to explicitly
specify the type of ROM to be created. (Note that the .mrom prefix
always implies a PCI ROM, since the .mrom mechanism relies on
reconfiguring PCI BARs.)
Make .rom a magic prefix which will automatically select the
appropriate PCI or ISA ROM prefix for ROMs defined via a PCI_ROM() or
ISA_ROM() macro. To maintain backwards compatibility, we default to
building a PCI ROM for anything which is not directly derived from a
PCI_ROM() or ISA_ROM() macro (e.g. bin/intel.rom).
Add a selection of targets to "make everything" to ensure that the
(relatively obscure) ISA ROM build process is included within the
per-commit QA checks.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Since some PnP BIOSes fail to set %es:di to point to the PnP signature
on entry, we identify a PnP BIOS by scanning through the top 64kB of
base memory looking for the PnP structure. We therefore don't
actually use the values of %es:di provided to the initialisation entry
point, and so there is no need to preserve them.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
ICH8 devices have an erratum which requires us to reconfigure the
packet buffer size (PBS) register, and correspondingly adjust the
packet buffer allocation (PBA) register. The "Intel I/O Controller
Hub ICH8/9/10 and 82566/82567/82562V Software Developer's Manual"
notes for the PBS register that:
10.4.20 Packet Buffer Size - PBS (01008h; R/W)
Note: The default setting of this register is 20 KB and is
incorrect. This register must be programmed to 16 KB.
Initial value: 0014h
0018h (ICH9/ICH10)
It is unclear from this comment precisely which devices require the
workaround to be applied. We currently attempt to err on the side of
caution: if we detect an initial value of either 0x14 or 0x18 then the
workaround will be applied. If the workaround is applied
unnecessarily, then the effect should be just that we use less than
the full amount of the available packet buffer memory.
Unfortunately this approach does not play nicely with other device
drivers. For example, the Linux e1000e driver will rewrite PBA while
assuming that PBS still contains the default value, which can result
in inconsistent values between the two registers, and a corresponding
inability to transmit or receive packets. Even more unfortunately,
the contents of PBS and PBA are not reset by anything less than a
power cycle, meaning that this error condition will survive a hardware
reset.
The Linux driver (written and maintained by Intel) applies the PBS/PBA
errata workaround only for devices in the ICH8 family, identified via
the PCI device ID. Adopt a similar approach, using the PCI_ROM()
driver data field to indicate when the workaround is required.
Reported-by: Donald Bindner <dbindner@truman.edu>
Debugged-by: Donald Bindner <dbindner@truman.edu>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allow named configurations to be specified via the CONFIG=... build
parameter. For headers in config/*.h which support named
configurations, the following files will be included when building
with CONFIG=<name>:
- config/defaults/<platform>.h (e.g. config/defaults/pcbios.h)
- config/<header>.h
- config/<name>/<header>.h (only if the directory config/<name> exists)
- config/local/<header>.h (autocreated if necessary)
- config/local/<name>/<header>.h (autocreated if necessary)
This mechanism allows for predefined named configurations to be
checked in to the source tree, as a directory config/<name> containing
all of the required header files.
The mechanism also allows for users to define multiple local
configurations, by creating header files in the directory
config/local/<name>.
Note that the config/*.h files which are used only to configure
internal iPXE APIs (e.g. config/ioapi.h) cannot be modified via a
named configuration. This avoids rebuilding the entire iPXE codebase
whenever switching to a different named configuration.
Inspired-by: Robin Smidsrød <robin@smidsrod.no>
Tested-by: Robin Smidsrød <robin@smidsrod.no>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Under some circumstances (e.g. if iPXE itself is booted via iSCSI, or
after an unclean reboot), the backend may not be in the expected
InitWait state when iPXE starts up.
There is no generic reset mechanism for Xenbus devices. Recent
versions of xen-netback will gracefully perform all of the required
steps if the frontend sets its state to Initialising. Older versions
(such as that found in XenServer 6.2.0) require the frontend to
transition through Closed before reaching Initialising.
Add a reset mechanism for netfront devices which does the following:
- read current backend state
- if backend state is anything other than InitWait, then set the
frontend state to Closed and wait for the backend to also reach
Closed
- set the frontend state to Initialising and wait for the backend to
reach InitWait.
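The sequence above can be sketched as a function that computes which
frontend states need to be written for a given backend state. The
netfront_reset_plan() helper is hypothetical and omits the waiting
steps; the state values are those of the standard Xenbus state
enumeration:

```c
#include <assert.h>
#include <stddef.h>

/* Xenbus device states */
enum xenbus_state {
	XenbusStateUnknown = 0,
	XenbusStateInitialising = 1,
	XenbusStateInitWait = 2,
	XenbusStateInitialised = 3,
	XenbusStateConnected = 4,
	XenbusStateClosing = 5,
	XenbusStateClosed = 6,
};

/* Hypothetical helper: fill plan[] with the frontend states to set
 * (in order) and return the number of entries used.  The caller is
 * expected to wait for the backend to follow after each step.
 */
size_t netfront_reset_plan ( enum xenbus_state backend,
			     enum xenbus_state plan[2] ) {
	size_t count = 0;

	assert ( plan != NULL );

	/* Go via Closed unless the backend is already in InitWait */
	if ( backend != XenbusStateInitWait )
		plan[count++] = XenbusStateClosed;

	/* Always finish by (re)entering Initialising */
	plan[count++] = XenbusStateInitialising;

	return count;
}
```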
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Using version 1 grant tables limits guests to using 16TB of grantable
RAM, and prevents the use of subpage grants. Some versions of the Xen
hypervisor refuse to allow the grant table version to be set after the
first grant references have been created, so the loaded operating
system may be stuck with whatever choice we make here. We therefore
currently use version 2 grant tables, since they give the most
flexibility to the loaded OS.
Current versions (7.2.0) of the Windows PV drivers have no support for
version 2 grant tables, and will merrily create version 1 entries in
what the hypervisor believes to be a version 2 table. This causes
some confusion.
Avoid this problem by attempting to use version 1 tables, since
otherwise we may render Windows unable to boot.
Play nicely with other potential bootloaders by accepting either
version 1 or version 2 grant tables (if we are unable to set our
requested version).
Note that the use of version 1 tables on a 64-bit system introduces a
possible failure path in which a frame number cannot fit into the
32-bit field within the v1 structure. This in turn introduces
additional failure paths into netfront_transmit() and
netfront_refill_rx().
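The new failure path amounts to a range check when storing a frame
number. The following is an illustrative sketch only:
grant_v1_set_frame() is a hypothetical helper, and the choice of
-ERANGE as the error code is an assumption:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: store a frame number in a 32-bit version 1
 * grant table field, failing if the frame number does not fit.
 */
int grant_v1_set_frame ( uint32_t *field, uint64_t frame ) {

	assert ( field != NULL );

	/* Frame numbers above 32 bits cannot be represented */
	if ( frame > UINT32_MAX )
		return -ERANGE;

	*field = ( uint32_t ) frame;
	return 0;
}
```

With 4kB pages, a 32-bit frame number covers 2^32 frames, which
corresponds to the 16TB limit mentioned above.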
Signed-off-by: Michael Brown <mcb30@ipxe.org>
At some point during XenServer development history, the Windows PV
drivers changed to using a PCI device ID of 5853:0002 rather than
5853:0001. Current (7.2.0) drivers will bind to either 5853:0001 or
5853:0002, and the general approach taken by the world at large
(including Amazon EC2) seems to be to use only 5853:0001.
However, the current version of XenServer (6.2.0) will create the
platform device as 5853:0002 (via the platform:device_id VM parameter)
for any VMs created using the built-in templates for Windows Vista or
later.
Accept either PCI ID, since the underlying device is identical.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The behavior observed in the Apple EFI (1.10) ReceiveFilters() call
is:
- failure if any of the PROMISCUOUS or MULTICAST filters are
included
- success if only UNICAST is included; however, the result is
UNICAST|BROADCAST
- success if only UNICAST and BROADCAST are included
- if UNICAST or UNICAST|BROADCAST is used, but the previous call
tried (and failed) to set UNICAST|BROADCAST|MULTICAST, then the
result is UNICAST|BROADCAST|MULTICAST
Work around this apparently broken SNP implementation by trying
ReceiveFilterMask, then falling back to UNICAST|BROADCAST|MULTICAST,
then UNICAST|BROADCAST, and finally UNICAST.
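The fallback order can be sketched as follows. This is not the snpnet
code: set_rx_filters(), the set_filters_t callback type and the
picky_firmware() stub are hypothetical, and the stub only approximates
the first of the behaviours listed above. The filter bit values are
those defined by the UEFI specification:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Receive filter bits as defined by the UEFI specification */
#define RX_UNICAST 0x01
#define RX_MULTICAST 0x02
#define RX_BROADCAST 0x04
#define RX_PROMISCUOUS 0x08
#define RX_PROMISCUOUS_MULTICAST 0x10

/* Hypothetical wrapper around the firmware call: zero on success */
typedef int ( * set_filters_t ) ( uint32_t filters );

/* Try progressively smaller filter sets until one is accepted */
int set_rx_filters ( set_filters_t set, uint32_t mask, uint32_t *chosen ) {
	const uint32_t attempts[] = {
		mask,
		( RX_UNICAST | RX_BROADCAST | RX_MULTICAST ),
		( RX_UNICAST | RX_BROADCAST ),
		RX_UNICAST,
	};
	size_t i;

	assert ( ( set != NULL ) && ( chosen != NULL ) );

	for ( i = 0 ; i < ( sizeof ( attempts ) /
			    sizeof ( attempts[0] ) ) ; i++ ) {
		if ( set ( attempts[i] ) == 0 ) {
			*chosen = attempts[i];
			return 0;
		}
	}
	return -1;
}

/* Stand-in firmware that rejects multicast and promiscuous filters */
int picky_firmware ( uint32_t filters ) {
	if ( filters & ( RX_MULTICAST | RX_PROMISCUOUS |
			 RX_PROMISCUOUS_MULTICAST ) )
		return -1;
	return 0;
}
```

Against the stub, a full mask of 0x1f is rejected twice before
UNICAST|BROADCAST is accepted.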
Modified-by: Michael Brown <mcb30@ipxe.org>
Tested-by: Curtis Larsen <larsen@dixie.edu>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some EFI 1.10 systems (observed on an Apple iMac) do not allow us to
open the device path protocol with an attribute of
EFI_OPEN_PROTOCOL_BY_DRIVER and so we cannot maintain a safe,
long-lived pointer to the device path. Work around this by instead
opening the device path protocol with an attribute of
EFI_OPEN_PROTOCOL_GET_PROTOCOL whenever we need to use it.
Debugged-by: Curtis Larsen <larsen@dixie.edu>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
According to the UEFI specification, the MCastFilter parameter (which
we currently pass as NULL, along with a zero MCastFilterCnt) is
optional only if ResetMCastFilter is true.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The ComponentName and ComponentName2 protocols differ only in the
standard which is used for language name codes.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Try very hard to avoid ever doing something invalid while attempting
to generate a debug message.
Debugged-by: Curtis Larsen <larsen@dixie.edu>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Dump the existing openers of a protocol whenever we are unable to open
a protocol using attributes of BY_DRIVER, EXCLUSIVE, or
BY_CHILD_CONTROLLER.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
efi_file_install() and efi_download_install() are both used to install
onto existing handles. There is therefore no need to allow for each
of their calls to InstallMultipleProtocolInterfaces() to create a new
handle.
By passing the handle directly (rather than a pointer to the handle),
we avoid potential confusion (and erroneous debug message colours).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The EFI headers define EFI_HANDLE as a void pointer, which renders
type checking on anything dealing with EFI handles somewhat useless.
Work around this bizarre sabotage attempt by redefining EFI_HANDLE as
a pointer to an anonymous structure.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Using efi_devpath_text() is marginally more efficient if we already
have the device path protocol available, but the mild increase in
efficiency is not worth compromising the clarity of the pattern:
DBGC ( device, "THING %p %s ...", device, efi_handle_name ( device ) );
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Provide a function efi_handle_name() (as a generalisation of
efi_handle_devpath_text()) which tries various methods to produce a
human-readable name for an EFI handle.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
HII seems to fail on several systems. Since it is non-essential,
treat HII problems as non-fatal.
Debugged-by: Curtis Larsen <larsen@dixie.edu>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Reject network devices which appear to be duplicates of those already
available via a different underlying hardware device. On a Xen PV-HVM
system, this allows us to filter out the emulated PCI NICs (which
would otherwise appear alongside the netfront NICs).
Note that we cannot use the Xen facility to "unplug" the emulated PCI
NICs, since there is no guarantee that the OS we subsequently load
will have a native netfront driver.
We permit devices with the same MAC address if they are attached to
the same underlying hardware device (e.g. VLAN devices).
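The duplicate test can be sketched as below. The nic structure and
is_duplicate() are hypothetical simplifications: a device is treated as
a duplicate only if its MAC address matches an existing device that
sits on different underlying hardware:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAC_LEN 6

/* Hypothetical, simplified network device */
struct nic {
	const void *hw;		/* Underlying hardware device */
	uint8_t mac[MAC_LEN];	/* Link-layer address */
};

/* Return non-zero if candidate duplicates an existing device */
int is_duplicate ( const struct nic *existing, size_t count,
		   const struct nic *candidate ) {
	size_t i;

	assert ( candidate != NULL );

	for ( i = 0 ; i < count ; i++ ) {
		/* Same MAC address on different hardware: duplicate */
		if ( ( memcmp ( existing[i].mac, candidate->mac,
				MAC_LEN ) == 0 ) &&
		     ( existing[i].hw != candidate->hw ) )
			return 1;
	}
	return 0;
}
```

A VLAN device shares both the MAC address and the hardware device of
its parent, and so is not rejected by this test.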
Inspired-by: Marin Hannache <git@mareo.fr>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
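The duplicate-device rule described above can be sketched roughly as
follows. This is only an illustration: struct sketch_netdev and
sketch_is_duplicate() are hypothetical names, and the real check may
compare more than the MAC address.

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_MAC_LEN 6

/* Hypothetical stand-in for a registered network device */
struct sketch_netdev {
	const void *hw;                      /* underlying hardware device */
	unsigned char mac[SKETCH_MAC_LEN];   /* link-layer address */
};

/* Return 1 if "candidate" looks like a duplicate of an existing device */
int sketch_is_duplicate ( const struct sketch_netdev *existing, size_t count,
			  const struct sketch_netdev *candidate ) {
	size_t i;

	for ( i = 0 ; i < count ; i++ ) {
		/* Different MAC address: not a duplicate of this device */
		if ( memcmp ( existing[i].mac, candidate->mac,
			      SKETCH_MAC_LEN ) != 0 )
			continue;
		/* Same MAC on the same hardware device (e.g. VLAN): allowed */
		if ( existing[i].hw == candidate->hw )
			continue;
		/* Same MAC via a different hardware device: reject */
		return 1;
	}
	return 0;
}
```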
Some EFI 1.10 implementations (observed with a mid-2011 iMac) seem to
fail to fill in the DeviceHandle for our loaded images. It is
plausible that these implementations fill in the DeviceHandle only if
loading the image from a device path (rather than directly from a
memory buffer).
Work around this problem by filling in DeviceHandle if the firmware
leaves it empty.
We cannot sensibly fill in FilePath, because we have no way of knowing
whether or not the firmware will treat this as a pointer to be freed
when the image returns.
Reported-by: Curtis Larsen <larsen@dixie.edu>
Tested-by: Curtis Larsen <larsen@dixie.edu>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
If the StartImage() call returns with no error, then the image must
have been started and returned successfully. It either unloaded
itself, or it intended to remain loaded (e.g. it was a driver). We
therefore do not unload successful images.
If there was an error, we attempt to unload the image. This may not
work. In particular, there is no way to tell whether an error
returned from StartImage() was due to being unable to start the image
(in which case we probably should call UnloadImage()), or due to the
image itself returning an error (in which case we probably should not
call UnloadImage()). We therefore ignore any failures from the
UnloadImage() call itself.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
We currently treat network devices as available for use via the SNP
API only if RX queue processing has been frozen. (This is similar in
spirit to the way that RX queue processing is frozen for the network
device currently exposed via the PXE API.)
The default state of a freshly created network device is for the RX
queue to not be frozen, and thus to be unavailable for use via SNP.
This causes problems when devices are added through code paths other
than _efidrv_start() (which explicitly releases devices for use via
SNP).
We don't actually need to freeze RX queue processing, since calls via
the SNP API will always use netdev_poll() rather than net_poll(), and
so will never trigger the RX queue processing code path anyway.
We can therefore simplify the code to use a single global flag to
indicate whether network devices are claimed for use by iPXE or
available for use via SNP. Using a global flag allows the default
state for dynamically created network devices to behave sensibly.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add basic support for Xen PV-HVM domains (detected via the Xen
platform PCI device with IDs 5853:0001), including support for
accessing configuration via XenStore and enumerating devices via
XenBus.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Import selected headers from the xen/include/public directory of the
Xen repository at git://xenbits.xen.org/xen.git
The script ./include/xen/import.pl can be used to automatically import
any required headers and their dependencies (in a similar fashion to
./include/ipxe/efi/import.pl). Trailing whitespace is stripped and an
appropriate FILE_LICENCE declaration is added to each header file.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Commit 24bbaf6 ("[lotest] Allow loopback testing on shared networks")
introduced a regression in which loopback testing packets would be
accepted from any network device. This produces unexpected results,
such as VLAN loopback testing succeeding even when incorrectly using
the underlying trunk device as either transmitter or receiver.
Fix by discarding any loopback testing packets which arrive on a
network device other than the current loopback testing receiver.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The $(BIN)/version.%.o target will fail if iPXE is built within a
non-git repository, e.g. when the user downloaded and extracted an
archive containing iPXE sources, *and* if any parent directory of the
iPXE sources is a git repository (or even contains a directory named
".git"). This is because git will by default ascend the directory
tree and look for ".git".
The problem typically manifests on source based distributions, see for
example https://bugs.gentoo.org/show_bug.cgi?id=482804
Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some switches do not allow an individual link (as defined in IEEE Std
802.3ad-2000 section 43.3.5) to work alone in a link aggregation group
as described in section 43.3.6. This is verified on Dell's
PowerConnect M6220, based on the Broadcom Strata XGS-IV chipset.
Set the LACP_STATE_AGGREGATABLE flag in the actor.state field to
announce link aggregation in the response LACPDU, which will have the
switch enable the link aggregation group and allow frames to pass.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
When a 32-bit iPXE binary is running on a system which allocates PCI
memory BARs above 4GB, our PCI subsystem will return the base address
for any such BARs as zero (with a warning message if DEBUG=pci is
enabled). Currently, ioremap() will happily map an address pointing
to the start of physical memory, providing no sensible indication of
failure.
Fix by always returning NULL if we are asked to ioremap() a zero bus
address.
With a totally flat memory model (e.g. under EFI), this provides an
accurate failure indication since no PCI peripheral will be mapped to
the zero bus address.
With the librm memory model, there is the possibility of a spurious
NULL return from ioremap() if the bus address happens to be equal to
virt_offset. Under the current virtual memory map, the NULL virtual
address will always be the start of .textdata, and so this problem
cannot occur; a NULL return from ioremap() will always be an accurate
failure indication.
Debugged-by: Anton D. Kachalov <mouse@yandex-team.ru>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
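A minimal sketch of the zero-address check described above, assuming a
flat (identity-mapped) memory model. sketch_ioremap() and
sketch_raw_ioremap() are hypothetical names and not the real
ioremap() implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the platform mapping routine (identity map) */
static void * sketch_raw_ioremap ( uintptr_t bus_addr, size_t len ) {
	( void ) len;
	return ( ( void * ) bus_addr );
}

/* Map a bus address, treating a zero bus address as a failure */
void * sketch_ioremap ( uintptr_t bus_addr, size_t len ) {
	/* A zero bus address indicates a BAR that could not be represented */
	if ( bus_addr == 0 )
		return NULL;
	return sketch_raw_ioremap ( bus_addr, len );
}
```

A caller can then treat a NULL return as an ordinary mapping failure
rather than silently accessing the start of physical memory.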
On some older EFI 1.10 implementations (observed with an old iMac), we
must use the (now obsolete) EFI_CONSOLE_CONTROL_PROTOCOL to switch the
console into text mode.
Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The EFI_CONSOLE_CONTROL_PROTOCOL does not exist in the current UEFI
specification, but is required to enable text output on some older EFI
1.10 implementations (observed on an old iMac).
The header is not present in any of the standard include directories,
but can still be found in the EDK2 codebase as part of
EdkCompatibilityPkg.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
When building with DEBUG=efi_wrap, print details of calls made by the
loaded image to selected boot services functions.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The EFI FAT filesystem driver has a bug: if a block device contains no
FAT filesystem but does have an EFI_SIMPLE_FILE_SYSTEM_PROTOCOL
instance, the FAT driver will assume that it must have previously
installed the EFI_SIMPLE_FILE_SYSTEM_PROTOCOL. This causes the FAT
driver to claim control of our device, and to refuse to stop driving
it, which prevents us from later uninstalling correctly.
Work around this bug by opening the disk I/O protocol ourselves,
thereby preventing the FAT driver from opening it.
Note that the alternative approach of opening the block I/O protocol
(and thereby in theory preventing DiskIo from attaching to the block
I/O protocol) causes an endless loop of calls to our DRIVER_STOP
method when starting the EFI shell. I have no idea why this is.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
When functioning as an EFI driver, drivers can be disconnected and
reconnected multiple times (e.g. via the EFI shell "connect" command,
or by running an executable such as ipxe.efi which will temporarily
disconnect existing drivers).
Minimise surprise by resetting the network device index to zero
whenever the last device is unregistered. This is not foolproof, but
it does handle the common case of having all devices unregistered and
then reregistered in the original order.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
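The index reset described above might look roughly like this. The
counters and function names are hypothetical; the sketch only models
the bookkeeping, not device registration itself.

```c
/* Hypothetical bookkeeping for device naming */
static unsigned int sketch_next_index;   /* index for the next device */
static unsigned int sketch_registered;   /* currently registered devices */

/* Register a device and return the index it was given */
unsigned int sketch_register ( void ) {
	sketch_registered++;
	return sketch_next_index++;
}

/* Unregister a device; reset the index once no devices remain */
void sketch_unregister ( void ) {
	if ( sketch_registered && ( --sketch_registered == 0 ) )
		sketch_next_index = 0;
}
```

If only some devices are unregistered, the index keeps counting
upwards, which is why the approach is not foolproof.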
Rewrite the SNP NIC driver to use non-blocking and deferrable
transmissions, to provide link status detection, to provide
information about the underlying (PCI) hardware device, and to avoid
unnecessary I/O buffer allocations during receive polling.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Commit 8290a95 ("[build] Expose build timestamp, build name, and
product names") introduced a regression in the build process which
resulted in broken final binaries which had names based on object
files (e.g. "undionly.kpxe" or "intel.rom") rather than on device IDs
(e.g. "8086100e.mrom").
The underlying problem is the -DOBJECT=<name> macro which is used to
generate the obj_<name> symbols used to select objects required for
the final binary. The macro definition is derived from the initial
portion (up to the first dot) of the object being built. In the case
of e.g. undionly.kpxe.version.o, this gives -DOBJECT=undionly. This
results in undionly.kpxe.version.o claiming to be the "undionly"
object; the real "undionly" object will therefore never get dragged in
to the build.
Fix by renaming $(BIN)/%.version.o to $(BIN)/version.%.o, so that the
object is always built with -DOBJECT=version (as might be expected,
since it is built from core/version.c).
Final binaries which have names based on device IDs (such as
"8086100e.mrom") are not affected by this problem, since the object
name "8086100e" will not conflict with that of the underlying "intel"
object.
This problem was not detected by the per-commit smoke testing
procedure, which happens to use the binary bin/8086100e.mrom.
Reported-by: Christian Hesse <list@eworm.de>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
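The "initial portion up to the first dot" rule can be modelled as
below. The real derivation happens in the Makefile; this C function
(a hypothetical sketch_object_name()) only illustrates why the two
file naming schemes produce different OBJECT values.

```c
#include <stddef.h>
#include <string.h>

/* Copy the part of "filename" before the first dot into "buf" */
void sketch_object_name ( const char *filename, char *buf, size_t len ) {
	size_t n = strcspn ( filename, "." );

	if ( len == 0 )
		return;
	/* Truncate if the destination buffer is too small */
	if ( n >= len )
		n = ( len - 1 );
	memcpy ( buf, filename, n );
	buf[n] = '\0';
}
```

With this rule, "undionly.kpxe.version.o" yields "undionly", while
"version.undionly.kpxe.o" yields "version".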
Provide a single instance of EFI_DRIVER_BINDING_PROTOCOL (attached to
our image handle); this matches the expectations scattered throughout
the EFI specification.
Open the underlying hardware device using EFI_OPEN_PROTOCOL_BY_DRIVER
and EFI_OPEN_PROTOCOL_EXCLUSIVE, to prevent other drivers from
attaching to the same device.
Do not automatically connect to devices when being loaded as a driver;
leave this task to the platform firmware (or to the user, if loading
directly from the EFI shell).
When running as an application, forcibly disconnect any existing
drivers from devices that we want to control, and reconnect them on
exit.
Provide a meaningful driver version number (based on the build
timestamp), to allow platform firmware to automatically load newer
versions of iPXE drivers if multiple drivers are present.
Include device paths within debug messages where possible, to aid in
debugging.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Expose the build timestamp (measured in seconds since the Epoch) and
the build name (e.g. "rtl8139.rom" or "ipxe.efi"), and provide the
product name and product short name in a single centralised location.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
When multiple iPXE binaries are running concurrently (e.g. in the case
of undionly.kpxe using an underlying iPXE driver via the UNDI
interface) it would be helpful to be able to visually distinguish
debug messages from each binary.
Allow the range of debug colours used to be customised via the
DBGCOL=... build parameter. For example:
# Restrict to colours 31-33 (red, green, yellow)
make DBGCOL=31-33
# Restrict to colours 34-36 (blue, magenta, cyan)
make DBGCOL=34-36
Signed-off-by: Michael Brown <mcb30@ipxe.org>
If iPXE is used as a git submodule then the ../.git/index file will
not exist, and the build will fail. Fix by checking that the git
index file exists before adding it as a build dependency.
Signed-off-by: Peter Lemenkov <lemenkov@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
With blade servers, the chassis serial number (exposed via ${serial})
may not be unique. Expose ${board-serial} as a named setting to
provide easy access to a more meaningful serial number.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The VF might not have been assigned a MAC address upon startup, and will
end up with a random MAC address during probe(). With this patch the
MAC address can be changed later on.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
If the VF doesn't have a MAC address assigned we should create a
random MAC address.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
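One common way to generate such a random MAC address is sketched
below. It assumes the usual convention of clearing the multicast bit
and setting the locally administered bit; sketch_random_mac() is a
hypothetical name, and rand() stands in for whatever random source the
driver actually uses.

```c
#include <stdint.h>
#include <stdlib.h>

/* Fill "mac" with a random, locally administered, unicast address */
void sketch_random_mac ( uint8_t mac[6] ) {
	unsigned int i;

	for ( i = 0 ; i < 6 ; i++ )
		mac[i] = ( ( uint8_t ) ( rand() & 0xff ) );
	mac[0] &= 0xfe;   /* clear the multicast bit */
	mac[0] |= 0x02;   /* set the locally administered bit */
}
```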
The iBFT includes an "origin" field to indicate the source of the IP
address. We use the heuristic of assuming that the source should be
"manual" if the IP address originates directly from the network device
settings block, and "DHCP" otherwise. This is an imperfect guess, but
is likely to be correct in most common situations.
Originally-implemented-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Parse the sense data to extract the response code, the sense key, the
additional sense code, and the additional sense code qualifier.
Originally-implemented-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
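As an illustration of the fields involved, the following sketch
extracts them from a raw sense buffer. It assumes the standard SPC
layouts (fixed format for response codes 0x70/0x71, descriptor format
for 0x72/0x73); struct sketch_sense and sketch_parse_sense() are
hypothetical names and the actual parser may be structured differently.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the parsed fields */
struct sketch_sense {
	uint8_t code;   /* response code */
	uint8_t key;    /* sense key */
	uint8_t asc;    /* additional sense code */
	uint8_t ascq;   /* additional sense code qualifier */
};

/* Parse sense data; returns 0 on success, -1 if short or unrecognised */
int sketch_parse_sense ( const uint8_t *data, size_t len,
			 struct sketch_sense *sense ) {
	if ( len < 1 )
		return -1;
	sense->code = ( data[0] & 0x7f );
	switch ( sense->code ) {
	case 0x70:
	case 0x71:
		/* Fixed format: key in byte 2, ASC/ASCQ in bytes 12/13 */
		if ( len < 14 )
			return -1;
		sense->key = ( data[2] & 0x0f );
		sense->asc = data[12];
		sense->ascq = data[13];
		return 0;
	case 0x72:
	case 0x73:
		/* Descriptor format: key in byte 1, ASC/ASCQ in bytes 2/3 */
		if ( len < 4 )
			return -1;
		sense->key = ( data[1] & 0x0f );
		sense->asc = data[2];
		sense->ascq = data[3];
		return 0;
	default:
		return -1;
	}
}
```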
iPXE currently calls cpu_nap() while performing DHCP, in order to
reduce CPU utilisation on virtual machines. Under mild broadcast load
(~100 packets per second), this can cause received packets to be
dropped because the receive descriptor ring is overrun before the next
18Hz timer interrupt wakes up the CPU. The result is that DHCP is
likely to intermittently fail on networks with appreciable amounts of
broadcast (or multicast) traffic.
This behaviour was introduced in the series of commits which
generalised the "dhcp" command to the "ifconf" command. The earlier
code (which did not handle IPv6 configuration) had no call to
cpu_nap() and so did not suffer from this problem.
Fix by removing the call to cpu_nap() in ifpoller_progress(). This
has the undesirable side effect that CPU utilisation will remain at
100% while waiting for DHCP to complete (which can take several
seconds, if we have to wait around for potential ProxyDHCP offers to
arrive).
Reported-by: Alex Davies <adavies@jumptrading.com>
Reported-by: Christoffer Stokbæk <christoffers@easyspeedy.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some external code (observed with FreeBSD's bootloader) will continue
to make INT 13 calls after reconfiguring the 8259 PIC to change the
vector offsets for IRQs. If an IRQ (e.g. the timer IRQ) subsequently
occurs while iPXE is in protected mode, this will cause a general
protection fault since the corresponding IDT entry is empty.
A general protection fault is INT 0x0d, which happens to overlap with
the original IRQ5. We therefore do have an ISR set up to handle a
general protection fault, but this ISR simply reflects the interrupt
down to the real-mode INT 0x0d and then attempts to return. Since our
ISR is expecting a hardware interrupt rather than a general protection
fault, it doesn't remove the error code from the stack before issuing
the iret instruction; it therefore attempts to return to a garbage
address. Since the segment part of this address is likely to be
invalid, a second general protection fault occurs. This cycle
continues until we run out of stack space and triple fault.
Fix by reflecting all INTs down to real mode. This actually reduces
the code size by four bytes (but increases the bss size by almost
2kB).
Reported-by: Brian Rak <dn@devicenull.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
If ipv6_tx() is called with a non-NULL network device, a NULL or
unspecified source address, and a destination address which does not
match any routing table entry, then it will attempt to copy the source
address from a NULL pointer.
I don't think that there is currently any code path which could
trigger this behaviour, but we should probably ensure that it can
never happen.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Destination multicast addresses require a sin6_scope_id, which should
therefore be transcribed to a network device name by ipv6_sock_ntoa().
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The transmitting network device is specified via the destination
address, not the source address. There is no reason to set
sin6_scope_id on the source address.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Setting sin6_scope_id to a non-zero value will cause the check against
the "empty socket address" in udp_demux() to fail, and incoming DHCPv6
responses on interfaces other than net0 will be rejected with a
spurious "No UDP connection listening on port 546" error.
The transmitting network device is specified via the destination
address, not the source address. Fix by simply not setting
sin6_scope_id on the client socket address.
Reported-by: Anton D. Kachalov <mouse@yandex-team.ru>
Tested-by: Anton D. Kachalov <mouse@yandex-team.ru>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Fix an erroneous htonl() in the definition of IN6_IS_ADDR_LINKLOCAL(),
and add self-tests for the IN6_IS_ADDR_xxx() family of macros.
Reported-by: Marin Hannache <git@mareo.fr>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
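For reference, the address classes being tested can be expressed
byte-wise, which sidesteps byte-order conversions entirely. These
helpers are hypothetical and are not the IN6_IS_ADDR_xxx() macros
themselves; they assume a 16-byte address in network byte order.

```c
#include <stdint.h>

/* Link-local unicast: fe80::/10 */
int sketch_is_linklocal ( const uint8_t *addr ) {
	return ( ( addr[0] == 0xfe ) && ( ( addr[1] & 0xc0 ) == 0x80 ) );
}

/* Multicast: ff00::/8 */
int sketch_is_multicast ( const uint8_t *addr ) {
	return ( addr[0] == 0xff );
}
```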
Some UEFI systems (observed with a Mac Pro) do not provide a loaded
image device path protocol. We don't currently use the loaded image
device path protocol for anything beyond printing a debug message, so
simply remove the code which attempts to fetch it.
Reported-by: Matt Woodward <pxematt@woodwardcc.com>
Tested-by: Matt Woodward <pxematt@woodwardcc.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some UEFI systems (observed with a Mac Pro) do not provide
EFI_HII_DATABASE_PROTOCOL. We can continue to function without
providing access to network device settings via HII, so make this
protocol optional and fall back to simply not providing any HII
protocols.
Reported-by: Matt Woodward <pxematt@woodwardcc.com>
Tested-by: Matt Woodward <pxematt@woodwardcc.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some UEFI systems (observed with a Mac Pro) do not provide
EFI_DEVICE_PATH_TO_TEXT_PROTOCOL. Since we use this protocol only for
debug messages, make it optional and fall back to printing the raw
device path bytes.
Reported-by: Matt Woodward <pxematt@woodwardcc.com>
Tested-by: Matt Woodward <pxematt@woodwardcc.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Get the NFS URI manipulation code out of nfs_open.c. The resulting
code is now much more readable.
Signed-off-by: Marin Hannache <git@mareo.fr>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
strndup() may be called on a string which is not NUL-terminated. Use
strnlen() instead of strlen() to ensure that we do not read beyond the
end of such a string.
Add self-tests for strndup(), including a test case with an
unterminated string.
Originally-fixed-by: Marin Hannache <git@mareo.fr>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
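A minimal sketch of the bounded-length approach follows.
sketch_strnlen() is a local stand-in for strnlen() (used here only to
keep the example self-contained), and sketch_strndup() is a
hypothetical name rather than the actual implementation.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for strnlen(): never reads beyond "max" bytes of "src" */
size_t sketch_strnlen ( const char *src, size_t max ) {
	size_t len = 0;

	while ( ( len < max ) && src[len] )
		len++;
	return len;
}

/* Duplicate at most "max" characters of a possibly unterminated string */
char * sketch_strndup ( const char *src, size_t max ) {
	size_t len = sketch_strnlen ( src, max );
	char *dup = malloc ( len + 1 );

	if ( dup ) {
		memcpy ( dup, src, len );
		dup[len] = '\0';
	}
	return dup;
}
```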
Avoid generating syntactically invalid log messages by ensuring that
invalid characters are not present in the hostname. In particular,
ensure that any whitespace is stripped, since whitespace functions as
a field separator for syslog messages.
Reported-by: Alex Davies <adavies@jumptrading.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
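The whitespace stripping could be sketched as below. This is an
assumption about the approach (dropping whitespace characters
outright); sketch_strip_hostname() is a hypothetical name, and the
real code may reject or replace other invalid characters as well.

```c
#include <ctype.h>
#include <stddef.h>

/* Copy "src" to "dst", omitting whitespace; returns resulting length */
size_t sketch_strip_hostname ( const char *src, char *dst, size_t len ) {
	size_t used = 0;

	if ( len == 0 )
		return 0;
	for ( ; *src ; src++ ) {
		/* Whitespace would act as a syslog field separator */
		if ( isspace ( ( unsigned char ) *src ) )
			continue;
		/* Leave room for the terminating NUL */
		if ( ( used + 1 ) >= len )
			break;
		dst[used++] = *src;
	}
	dst[used] = '\0';
	return used;
}
```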
As of commit d28bb51 ("[tcp] Defer sending ACKs until all received
packets have been processed"), increasing the RX ring size will
increase the number of received packets per transmitted ACK (since
each poll will process up to one complete receive ring). Under KVM,
this can make a substantial (up to ~200%) difference to the overall
download speed, since transmissions are very expensive.
Increase the ring fill level from four to eight packets: this
increases the download speed by around 50% at a cost of around 8kB of
heap space. Further speedups are possible by increasing the ring size
further, but it would be preferable to find alternative methods which
do not use noticeable amounts of heap space.
Tested-by: Robin Smidsrød <robin@smidsrod.no>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
An invalid free() was ironically introduced by fixing another invalid
free() in commit 7aa69c4 ("[nfs] Fix an invalid free() when loading a
symlink").
Signed-off-by: Marin Hannache <git@mareo.fr>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The bzImage boot protocol allows the real-mode code to be loaded at
any segment within base memory. (The fact that both iPXE and recent
versions of Syslinux will load the real-mode code at 1000:0000 is a
coincidence; it is not guaranteed by the specification.)
Fix by making the code relocatable.
Reported-by: Andrew Stuart <andrew@shopcusa.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Rework geniso and genliso to provide a single merged utility for
generating ISO images.
Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The .lkrn prefix currently provides a zImage kernel with unused setup
sectors and the whole iPXE binary placed within the "protected mode
kernel" portion of the zImage.
The work carried out years ago to create the .mrom format provides a
mechanism allowing the iPXE binary to be split into a small real-mode
header and a larger payload. This neatly matches the way that a
bzImage is loaded: the "setup sectors" can contain the header and the
"protected mode kernel" can contain the payload.
This removes the size restrictions on an iPXE .lkrn image (and hence
on derived image formats such as .iso).
Also remove obsolete copyright information, since none of the original
code or functionality now remains.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
When running inside a virtual machine (or when using the UNDI driver),
transmitting packets can be expensive. When we receive several
packets in one poll (e.g. because a slow BIOS timer interrupt routine
has caused us to fall behind in processing), we can safely send just a
single ACK to cover all of the received packets. This reduces the
time spent transmitting and allows us to clear the backlog much
faster.
Various RFCs (starting with RFC1122) state that there should be an ACK
for at least every second segment. We choose not to enforce this
rule. Under normal operation each poll should find at most one
received packet, and we will then not delay any ACKs. We delay
(i.e. omit) ACKs only when under sufficiently heavy load that we are
finding multiple packets per poll; under these conditions it is
important to clear the backlog quickly since any delay may lead to
dropped packets.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
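The deferral described above amounts to marking an ACK as pending
while packets are processed and sending a single cumulative ACK once
the poll completes. A simplified sketch, with hypothetical names and
with out-of-order handling omitted:

```c
#include <stdint.h>

/* Hypothetical, heavily simplified connection state */
struct sketch_tcp {
	uint32_t rcv_next;        /* next expected sequence number */
	int ack_pending;          /* an ACK is owed to the peer */
	unsigned int acks_sent;   /* count of ACKs "transmitted" */
};

/* Process one received in-order segment, deferring the ACK */
void sketch_rx ( struct sketch_tcp *tcp, uint32_t seq, uint32_t len ) {
	if ( seq != tcp->rcv_next )
		return;   /* out-of-order handling omitted */
	tcp->rcv_next += len;
	tcp->ack_pending = 1;
}

/* Called once after all packets found by a poll have been processed */
void sketch_poll_done ( struct sketch_tcp *tcp ) {
	if ( tcp->ack_pending ) {
		/* One cumulative ACK (for rcv_next) covers every segment */
		tcp->acks_sent++;
		tcp->ack_pending = 0;
	}
}
```

When each poll finds at most one packet, this degenerates to one ACK
per segment, so no ACK is delayed under normal operation.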
Commit 8540300 ("[build] Disable ccache for all relevant build
targets") attempted to generalise the rule for $(BIN)/version.o to
$(BIN)/version.% in order to apply the dependency to all relevant
build targets (debug objects, assembly listings, etc).
This generalisation appears to work for the ccache override
directives, but seems to cause make (at least, GNU make 4.0) to simply
ignore the dependency upon the git index.
Since version.c contains only some string constants, there is unlikely
to be a substantive need for its debug objects, assembly listings,
etc. Restore the previous form of the dependency and accept that
hypothetical builds with e.g. DEBUG=version will not be handled
correctly.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
When profiling, exclude any time spent inside the hypervisor
responding to our MMIO accesses. This substantially reduces the
variance accumulated on many other profilers.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Interrupt processing adds noise to profiling results. Allow
interrupts (from within protected mode) to be profiled separately,
with time spent within the interrupt handler being excluded from any
other profiling currently in progress.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
PXENV_UNDI_ISR calls may implicitly refill the underlying receive
ring, and so could continue to retrieve packets indefinitely. Place
an upper limit on the number of calls to PXENV_UNDI_ISR per call to
undinet_poll().
Signed-off-by: Michael Brown <mcb30@ipxe.org>
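A bounded polling loop of this kind might be sketched as follows. The
quota value and all names are hypothetical; the callback stands in
for a PXENV_UNDI_ISR call that reports whether a packet was retrieved.

```c
/* Hypothetical upper limit on ISR calls per poll */
#define SKETCH_RX_QUOTA 4

/* Callback returning nonzero if a packet was retrieved */
typedef int ( * sketch_isr_t ) ( void *ctx );

/* Example source that never runs dry (the ring is refilled implicitly) */
int sketch_endless_isr ( void *ctx ) {
	unsigned int *count = ctx;

	( *count )++;
	return 1;
}

/* Poll for packets, giving up after SKETCH_RX_QUOTA ISR calls */
unsigned int sketch_poll ( sketch_isr_t isr, void *ctx ) {
	unsigned int packets;

	for ( packets = 0 ; packets < SKETCH_RX_QUOTA ; packets++ ) {
		if ( ! isr ( ctx ) )
			break;
	}
	return packets;
}
```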
When making a call from real mode to protected mode, we save and
restore the global and interrupt descriptor table registers. The
restore currently takes place after returning to real mode, which
generates two EXCEPTION_NMIs and corresponding VM exits when running
under KVM on an Intel CPU.
Avoid the VM exits by restoring the descriptor table registers inside
prot_to_real, while still running in protected mode.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Ensure that all segment registers have zero in the low two bits before
transitioning to protected mode. This allows the CPU state to
immediately be deemed to be "valid", and eliminates the need for any
further emulated instructions.
Load the protected-mode interrupt descriptor table after switching to
protected mode, since this avoids triggering an EXCEPTION_NMI and
corresponding VM exit.
This reduces the time taken by real_to_prot under KVM by around 50%.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
On an Intel CPU supporting VMX, KVM will emulate instructions while
the CPU state remains "invalid". In real mode, the CPU state is
defined to be "invalid" if any segment register has a base which is
not equal to (sreg<<4) or a limit which is not equal to 64kB.
We don't actually use the base stored in the REAL_DS descriptor for
any significant purpose. Change the base stored in this descriptor to
be equal to (REAL_DS<<4). A segment register loaded with REAL_DS is
then automatically valid in both real and protected modes. This
allows KVM to stop emulating instructions much sooner.
The only use of REAL_DS for memory accesses currently occurs in the
indirect ljmp within prot_to_real. Change this to a direct ljmp,
storing rm_cs in .text16 as part of the ljmp instruction. This
removes the only memory access via REAL_DS (thereby allowing for the
above descriptor base address hack), and also simplifies the ljmp
instruction (which will still have to be emulated).
Load the real-mode interrupt descriptor table register before
switching to real mode, since this avoids triggering an EXCEPTION_NMI
and corresponding VM exit.
This reduces the time taken by prot_to_real under KVM by around 65%.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
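The validity condition quoted above can be written as a simple
predicate. This sketch assumes that a 64kB limit is represented by
the limit value 0xffff; the function name and the selector values
used with it are hypothetical.

```c
#include <stdint.h>

/* Real-mode validity: base must equal (sreg << 4), limit must be 64kB */
int sketch_rm_segment_valid ( uint16_t sreg, uint32_t base,
			      uint32_t limit ) {
	return ( ( base == ( ( uint32_t ) sreg << 4 ) ) &&
		 ( limit == 0xffff ) );
}
```

For example, a descriptor for a selector value of 0x0018 would need a
base of 0x180 for a register loaded with that selector to be
considered valid in both modes.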
The mode-transition code involves paths which switch back and forth
between the .text and .text16 sections. At present, only the start of
each function is labelled, which makes it difficult to decode
addresses within the parts of the function existing in a different
section.
Add explicit labels at the start of each section change, so that
addresses can be meaningfully decoded to the nearest label.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Now that we can handle interrupts while in protected mode, there is no
need to switch to real mode just to halt the CPU.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The currticks() function is called at least once per TCP packet, and
so is performance-critical. Switching to real mode just to allow the
timer interrupt to fire is expensive when running inside a virtual
machine, and imposes a significant performance cost.
Fix by enabling interrupts without switching to real mode. This
results in an approximately 100% increase in download speed when
running under KVM.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
We now have the ability to handle interrupts while in protected mode,
and so no longer need to set up a dedicated interrupt descriptor table
while running COM32 executables.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
When running in a virtual machine, switching to real mode may be
expensive. Allow interrupts to be enabled while in protected mode and
reflected down to the real-mode interrupt handlers.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allow for an explicit debug level of zero, which will enable
assertions and profiling (i.e. anything controlled by NDEBUG) without
generating any debug messages.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
We currently use flat real mode wherever real mode is required. This
guarantees that we will not surprise some unsuspecting external caller
which has carefully set up flat real mode by suddenly reducing the
segment limits to 64kB.
However, operating in flat real mode imposes a severe performance
penalty in some virtualisation environments, since some CPUs cannot
fully virtualise flat real mode and so the hypervisor must fall back
to emulation. In particular, operating under KVM on a pre-Westmere
Intel CPU will be at least an order of magnitude slower, to the point
that there is a visible teletype effect when printing anything to the
BIOS console. (Older versions of KVM used to cheat and ignore the
"flat" part of flat real mode, which masked the problem.)
Switch (back) to using genuine real mode with 64kB segment limits
instead of flat real mode. Hopefully this won't break anything.
Add an explicit switch to flat real mode before returning to the BIOS
from the ROM prefix, since we know that a PMM BIOS will call the ROM
initialisation point (and potentially the BEV) in flat real mode.
As noted in previous commit messages, it is not possible to restore
the real-mode segment limits after a transition to protected mode,
since there is no way to know which protected-mode segment descriptor
was originally used to initialise the limit portion of the segment
register.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Inside a virtual machine, writing the RX ring tail pointer may incur a
substantial overhead of processing inside the hypervisor. Minimise
this overhead by writing the tail pointer once per batch of
descriptors, rather than once per descriptor.
Profiling under qemu-kvm (version 1.6.2) shows that this reduces the
amount of time taken to refill the RX descriptor ring by around 90%.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
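The batching can be sketched as a refill loop which defers the tail
register write until the loop has finished. The ring structure, sizes
and names below are hypothetical, and the register write is simulated
by a counter.

```c
#define SKETCH_RING_SIZE 8   /* hypothetical number of descriptors */
#define SKETCH_RING_FILL 4   /* hypothetical target fill level */

struct sketch_rx_ring {
	unsigned int prod;          /* descriptors handed to hardware */
	unsigned int cons;          /* descriptors completed */
	unsigned int tail;          /* last value "written" to the register */
	unsigned int tail_writes;   /* number of simulated register writes */
};

/* Refill the ring, writing the tail pointer at most once */
void sketch_refill ( struct sketch_rx_ring *ring ) {
	unsigned int refilled = 0;

	while ( ( ring->prod - ring->cons ) < SKETCH_RING_FILL ) {
		/* Populating the descriptor itself is omitted here */
		ring->prod++;
		refilled++;
	}
	/* One (expensive) register write per batch, not per descriptor */
	if ( refilled ) {
		ring->tail = ( ring->prod % SKETCH_RING_SIZE );
		ring->tail_writes++;
	}
}
```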
Operations which are negligible on physical hardware (such as issuing
a posted write to the transmit ring tail register) may involve
substantial amounts of processing within the hypervisor if running in
a virtual machine.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
As observed in commit 082cedb ("[build] Fix __libgcc attribute for
recent gcc versions"), recent versions of gcc have changed the
semantics of -mrtd as applied to the implicit arithmetic functions.
It is possible for tests to succeed even if our assumptions about
gcc's interpretation of -mrtd are incorrect. In particular, if gcc
chooses to utilise a frame pointer in the calling function, then it
can tolerate a temporarily incorrect stack pointer (since the stack
pointer will shortly afterwards be restored from the frame pointer
anyway).
Add tests designed specifically to check that our implementations of
the implicit arithmetic functions manipulate the stack pointer as
expected by gcc.
The effect of these tests can be observed by temporarily reverting
commit 082cedb ("[build] Fix __libgcc attribute for recent gcc
versions"): without this fix in place, the tests will fail on gcc 4.7
and later.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
We observed some time ago (in commit 4ce8d61 "Import various libgcc
functions from syslinux") that gcc seems to treat calls to the
implicit arithmetic functions (e.g. __udivdi3()) as being affected by
-mregparm but unaffected by -mrtd.
This seems to be no longer the case with current gcc versions, which
treat calls to these functions as being affected by both -mregparm and
-mrtd, as expected.
There is nothing obvious in the gcc changelogs to indicate precisely
when this happened. From experimentation with available gcc versions,
the change occurred sometime between v4.6.3 and v4.7.2. We assume
that only versions up to v4.6.x require the special treatment.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
On a 32-bit system, 64-bit division is implemented using the libgcc
functions provided in __udivmoddi4.c etc. Calls to these functions
are generated automatically by gcc, with a calling convention that is
somewhat empirical in nature. Add these self-tests primarily as a
check that we are using the correct calling convention.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Escape sequences received via the serial console can fail since the
cpu_nap() in getchar_timeout() can delay processing for more than the
time it takes for a single character to arrive.
Fix by enabling the UART FIFOs.
Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
It is unclear from the datasheets whether or not the TX ring can be
completely filled (i.e. whether writing the tail value as equal to the
current head value will cause the ring to be treated as completely
full or completely empty). It is very plausible that this edge case
could differ in behaviour between real hardware and the many
implementations of an emulated Intel NIC found in various virtual
machines. Err on the side of caution and always leave at least one
ring entry empty.
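The "leave one entry empty" rule can be sketched as follows. This is
an illustration only: the structure, the ring size and the function
names are assumptions and do not reflect the driver's actual code.

```c
#define RING_SIZE 8U	/* hypothetical number of descriptors */

/* Hypothetical ring state using free-running counters */
struct tx_ring {
	unsigned int prod;	/* descriptors queued so far */
	unsigned int cons;	/* descriptors completed so far */
};

/* Number of descriptors currently owned by the hardware */
static unsigned int ring_fill ( const struct tx_ring *ring ) {
	return ( ring->prod - ring->cons );
}

/* Return nonzero if another descriptor may be queued.  The limit is
 * RING_SIZE - 1, so the tail can never be advanced to equal the head.
 */
int ring_can_queue ( const struct tx_ring *ring ) {
	return ( ring_fill ( ring ) < ( RING_SIZE - 1 ) );
}
```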
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Expand the concept of the X.509 cache to provide the functionality of
a certificate store. Certificates in the store will be automatically
used to complete certificate chains where applicable.
The certificate store may be prepopulated at build time using the
CERT=... build command line option. For example:
make bin/ipxe.usb CERT=mycert1.crt,mycert2.crt
Certificates within the certificate store are not implicitly trusted;
the trust list is specified using TRUST=... as before. For example:
make bin/ipxe.usb CERT=root.crt TRUST=root.crt
This can be used to embed the full trusted root certificate within the
iPXE binary, which is potentially useful in an HTTPS-only environment
in which there is no HTTP server from which to automatically download
cross-signed certificates or other certificate chain fragments.
This usage of CERT= extends the existing use of CERT= to specify the
client certificate. The client certificate is now identified
automatically by checking for a match against the private key. For
example:
make bin/ipxe.usb CERT=root.crt,client.crt TRUST=root.crt KEY=client.key
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Ensure that any generated files (such as DER forms of X.509
certificates) are rebuilt if the Makefile changes.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The build process currently attempts to disable ccache for files using
the .incbin directive, but the rule fails to apply to anything beyond
the simple object target. Fix by applying to all relevant build
targets (including debug objects, assembly listings, and so on).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
iPXE currently allocates a copy of the certificate's common name as a
string. This string is used by the TLS and CMS code to check
certificate names against an expected name, and also appears in
debugging messages.
Provide a function x509_check_name() to centralise certificate name
checking (in preparation for adding subjectAlternativeName support),
and a function x509_name() to provide a name to be used in debugging
messages, and remove the dynamically allocated string.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Certificate authorities are not required to send the certificate used
to sign the OCSP response if the response is signed by the original
issuer.
Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
At least one HTTP server (Google's OCSP responder) has been observed
to generate a Content-Length header with trailing whitespace.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some BIOSes (observed with a ProLiant DL360p Gen8 SE) perform no range
checking whatsoever on the parameters passed to INT10,06 and will
therefore happily write to an area beyond the end of video RAM. The
area immediately following the video RAM tends to be the VGA BIOS ROM
image. Overwriting the VGA BIOS leads to an interesting variety of
crashes and reboots.
Fix by specifying an exact width and height to be cleared, rather than
passing in large values and relying upon the BIOS to truncate them to
the appropriate range.
Reported-by: Alex Davies <adavies@jumptrading.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
On an Asus Z87-K motherboard with an onboard 8168 NIC, booting into
Windows 7 and then warm rebooting into iPXE results in a broken RX
datapath: packets can be transmitted successfully but garbage is
received. A cold reboot clears the problem.
A dump of the PHY registers reveals only one difference: in the
failure case the bits ADVERTISE_PAUSE_CAP and ADVERTISE_PAUSE_ASYM are
cleared. Explicitly setting these bits does not fix the problem.
A dump of the MAC registers reveals a few differences, of which the
most obvious culprit is the undocumented bit 24 of the Receive
Configuration Register (RCR), which is set in the failure case.
Explicitly clearing this bit does fix the problem.
Reported-by: Sebastian Nielsen <ipxe@sebbe.eu>
Reported-by: Oliver Rath <rath@mglug.de>
Debugged-by: Sebastian Nielsen <ipxe@sebbe.eu>
Tested-by: Sebastian Nielsen <ipxe@sebbe.eu>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some UEFI builds will set up a timer to continuously poll any SNP
devices. This can drain packets from the network device's receive
queue before iPXE gets a chance to process them.
Use netdev_rx_[un]freeze() to explicitly indicate when we expect our
network devices to be driven via the external SNP API (as we do with
the UNDI API on the standard BIOS build), and disable the SNP API
except when receive queue processing is frozen.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
EFIRC() uses PLATFORM_TO_ERRNO(), which evaluates its argument twice
(and can't trivially use a braced-group expression or an inline
function to avoid this, since it gets used outside of function
context).
The expression "EFIRC(main())" will therefore end up calling main()
twice, which is not the intended behaviour. Every other instance of
EFIRC() is of the simple form "EFIRC(rc)", so fix by converting this
instance to match.
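The double evaluation can be illustrated with a stand-in macro. Note
that TWICE() below is hypothetical and is not the real definition of
EFIRC() or PLATFORM_TO_ERRNO(); it merely shares the property of
expanding its argument twice.

```c
/* Hypothetical macro that expands its argument twice */
#define TWICE( x ) ( ( (x) != 0 ) ? (x) : 0 )

/* Number of times counted() has been called */
static int calls;

/* Function with a visible side effect */
static int counted ( void ) {
	calls++;
	return 1;
}

/* Pass the function call directly: it is evaluated twice */
int calls_when_passed_directly ( void ) {
	calls = 0;
	( void ) TWICE ( counted() );
	return calls;
}

/* Store the result in a variable first: one evaluation only */
int calls_when_passed_via_variable ( void ) {
	int rc;

	calls = 0;
	rc = counted();
	( void ) TWICE ( rc );
	return calls;
}
```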
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allow for extraneous packets to be received during loopback testing,
and so permit loopback tests to be performed when ports are connected
to a switch (rather than requiring ports to be directly connected with
a loopback cable).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Inhibit implicit sign-extension of characters with the top bit set
(e.g. accented characters), which confuses the mucurses library by
colliding with the bits used to store character attributes and
colours.
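A minimal sketch of the collision, assuming a hypothetical layout in
which the character occupies the low eight bits and attributes are
stored in the bits above (the names and bit positions are
illustrative, not those used by mucurses):

```c
#define ATTR_BOLD 0x100U	/* hypothetical attribute bit */

/* Combine character and attributes without masking: a negative
 * character value is sign-extended and sets every upper bit.
 */
unsigned int cell_sign_extended ( signed char c, unsigned int attrs ) {
	return ( ( unsigned int ) c | attrs );
}

/* Combine character and attributes after converting to unsigned
 * char, so that only the low eight bits are occupied.
 */
unsigned int cell_masked ( signed char c, unsigned int attrs ) {
	return ( ( unsigned char ) c | attrs );
}
```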
Reported-by: Marc Delisle <Marc.Delisle@cegepsherbrooke.qc.ca>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
On a 64-bit build, EFI_STATUS codes are 64-bit quantities, with the
"error/warning" bit located in bit 63.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
iPXE will detect timeout failures in several situations: network
link-up, DHCP, TCP connection attempts, unacknowledged TCP data, etc.
This does not cover all possible circumstances. For example, if a
connection to a web server is successfully established and the web
server acknowledges the HTTP request but never sends any data in
response, then no timeout will be triggered. There is no timeout
defined within the HTTP specifications, and the underlying TCP
connection will not generate a timeout since it has no way to know
that the HTTP layer is expecting to receive data from the server.
Add a "--timeout" parameter to "imgfetch", "chain", etc. If no
progress is made (i.e. no data is downloaded) within the timeout
period, then the download will be aborted.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Redefine the timeout parameter from "time since start of job" to "time
since progress was last made". This does not affect any existing
behaviour, since none of the existing users of the timeout parameter
provides progress indication.
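The "time since progress was last made" definition can be sketched as
below. The structure and function names are hypothetical, and times
are expressed in arbitrary ticks.

```c
/* Hypothetical progress timer */
struct progress_timer {
	unsigned long last_progress;	/* time of most recent progress */
	unsigned long timeout;		/* zero means "no timeout" */
};

/* Record that progress has been made, restarting the timeout */
void progress_made ( struct progress_timer *timer, unsigned long now ) {
	timer->last_progress = now;
}

/* Return nonzero if no progress was made within the timeout period */
int progress_timed_out ( const struct progress_timer *timer,
			 unsigned long now ) {
	return ( timer->timeout &&
		 ( ( now - timer->last_progress ) >= timer->timeout ) );
}
```

A job that never reports progress never calls progress_made(), so for
such a job the timeout is still measured from the start of the job.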
Signed-off-by: Michael Brown <mcb30@ipxe.org>
A web server may return a 503 Service Unavailable response along with
a Retry-After header to direct the client to retry the request at a
later time.
The Retry-After header may be a number of seconds, or a full HTTP
timestamp (e.g. "Fri, 7 Mar 2014 17:22:14 GMT"). We have no
reasonable way of parsing a full HTTP timestamp; if the server chooses
to use this format then we simply retry after a fixed 5-second delay.
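One way to sketch this fallback, using only the standard C library
(the function name and the constant are hypothetical, and the real
header parser may differ): treat the value as a delay in seconds if it
parses completely as a decimal number, and otherwise use the fixed
delay.

```c
#include <stdlib.h>

#define RETRY_DEFAULT_SECONDS 5UL	/* fixed fallback delay */

/* Hypothetical helper: convert a Retry-After value to a delay */
unsigned long retry_after_seconds ( const char *value ) {
	char *end;
	unsigned long seconds;

	seconds = strtoul ( value, &end, 10 );
	/* Anything other than a complete decimal number (such as a
	 * full HTTP timestamp) falls back to the fixed delay.
	 */
	if ( ( end == value ) || ( *end != '\0' ) )
		return RETRY_DEFAULT_SECONDS;
	return seconds;
}
```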
As per RFC 2616, in the absence of a Retry-After header we treat a
status code of 503 Service Unavailable as being equivalent to 500
Internal Server Error, and immediately fail the request.
Requested-by: Suresh Sundriyal <ssundriy@vmware.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
iPXE uses currticks() (along with the MAC address(es) of any network
devices) to seed the (non-cryptographic) random number generator. The
current implementation of linux_currticks() ensures that the first
call to currticks() will always return zero; this results in identical
random number sequences on each run of iPXE on a given machine. This
can cause odd-looking behaviour due to e.g. the reuse of local TCP
port numbers.
Fix by effectively rounding down the start time recorded by
linux_currticks() to the nearest whole second; this makes it unlikely
that consecutive runs of iPXE will use the exact same RNG sequence.
(Note that none of this affects the cryptographic RNG, which uses
/dev/random as a source of entropy.)
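The effect of the rounding can be sketched as follows. This is an
assumption-laden illustration rather than the actual
linux_currticks() code: the tick rate, the structure and the function
name are all hypothetical.

```c
#include <stdint.h>

#define TICKS_PER_SEC 1000U	/* hypothetical tick rate */

/* Wall-clock time split into seconds and microseconds */
struct wall_time {
	uint64_t sec;
	uint32_t usec;
};

/* Ticks elapsed since the start time, where only the whole-second
 * part of the start time is used (i.e. the start is rounded down).
 */
uint64_t ticks_since_start ( const struct wall_time *start,
			     const struct wall_time *now ) {
	uint64_t ticks;

	ticks = ( ( now->sec - start->sec ) * TICKS_PER_SEC );
	ticks += ( now->usec / ( 1000000U / TICKS_PER_SEC ) );
	return ticks;
}
```

With this arrangement the first call returns the sub-second part of
the start time rather than zero, so the seed varies between runs.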
Signed-off-by: Michael Brown <mcb30@ipxe.org>
iPXE currently ignores ACKs which do not acknowledge any new data.
(In particular, it does not stop the retransmission timer; this is
done to prevent an immediate retransmission if a duplicate ACK is
received while the transmit queue is non-empty.)
If a peer provides a window size of zero and later sends a duplicate
ACK to update the window size, this update will therefore be ignored
and iPXE will never be able to transmit data.
Fix by updating the window size even for ACKs which do not acknowledge
new data.
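A simplified sketch of the fixed behaviour is shown below. The
structure and function names are hypothetical, and the sketch assumes
that the ACK number never moves backwards (a real TCP stack must also
validate old or out-of-range ACKs).

```c
#include <stdint.h>

/* Hypothetical minimal transmit-side connection state */
struct tcp_tx_state {
	uint32_t snd_una;	/* oldest unacknowledged sequence number */
	uint32_t snd_win;	/* window most recently advertised by peer */
};

/* Process a received ACK.  Returns nonzero if new data was
 * acknowledged.  The window is updated in either case.
 */
int tcp_rx_ack_sketch ( struct tcp_tx_state *tcp, uint32_t ack,
			uint32_t win ) {
	uint32_t acked = ( ack - tcp->snd_una );

	/* Update window size even for duplicate ACKs */
	tcp->snd_win = win;

	/* Duplicate ACK: nothing further to do */
	if ( acked == 0 )
		return 0;

	tcp->snd_una = ack;
	return 1;
}
```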
Reported-by: Wissam Shoukair <wissams@mellanox.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
When opening a VLAN device, vlan_open() will call netdev_open() on the
trunk device. This will result in a call to netdev_notify(), which
will cause vlan_notify() to call vlan_sync() on the original VLAN
device, which will see that the trunk device is now open but the VLAN
device apparently isn't (since it has not yet been flagged as open by
netdev_open()). The upshot is a second attempt to open the VLAN
device, which will result in an erroneous second call to vlan_open().
This convoluted chain of events then terminates harmlessly since
vlan_open() calls netdev_open() on the trunk device, which just
returns immediately since the trunk device is by now flagged as being
already open.
Prevent this from happening by having netdev_open() flag the device as
open prior to calling the device's open() method, and reflagging it as
closed if the open() method fails.
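The ordering can be sketched with a toy device model. All names below
are hypothetical; reentrant_open() stands in for an open() method
that indirectly triggers a second attempt to open the same device.

```c
/* Hypothetical device with an open flag and an open() method */
struct toy_dev {
	int is_open;
	int open_calls;
	int ( * open ) ( struct toy_dev *dev );
};

/* Open device: flag as open before calling open(), and reflag as
 * closed if open() fails.
 */
int toy_dev_open ( struct toy_dev *dev ) {
	int rc;

	/* Do nothing if device is already (being) opened */
	if ( dev->is_open )
		return 0;

	dev->is_open = 1;
	if ( ( rc = dev->open ( dev ) ) != 0 ) {
		dev->is_open = 0;
		return rc;
	}
	return 0;
}

/* open() method that re-enters toy_dev_open() on the same device */
int reentrant_open ( struct toy_dev *dev ) {
	dev->open_calls++;
	return toy_dev_open ( dev );
}

/* open() method that always fails */
int failing_open ( struct toy_dev *dev ) {
	dev->open_calls++;
	return -1;
}
```

Because the flag is set first, the nested call returns immediately and
the open() method runs only once.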
Originally-fixed-by: Wissam Shoukair <wissams@mellanox.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Commit c429bf0 ("[romprefix] Store boot bus:dev.fn address as autoboot
device location") introduced a regression by using register %cx to
temporarily hold the PCI bus:dev.fn address, despite the fact that %cx
was already being used to hold the stored BIOS stack segment.
Consequently, when returning to the BIOS after a failed or cancelled
boot attempt, iPXE would end up calling INT 18 with the stack segment
set equal to the PCI bus:dev.fn address. Writing to essentially
random areas of memory tends to upset even the more robust BIOSes.
Fix by using register %ax to temporarily hold the PCI bus:dev.fn
address.
Reported-by: Anton D. Kachalov <mouse@yandex-team.ru>
Tested-by: Anton D. Kachalov <mouse@yandex-team.ru>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
iPXE currently pads initrd images to a multiple of 4kB and inserts
zero padding between images, as required by some versions of the Linux
kernel. The overall length reported via the ramdisk_size field in the
bzImage header includes this zero padding.
This causes problems when using memdisk to load a gzip-compressed disk
image. memdisk treats the ramdisk_size field as containing the exact
length of the initrd image, and uses this length to locate the 8-byte
gzip footer. This will generally cause memdisk to fail to decompress
the disk image.
Fix by reporting the exact length of the initrd image set, including
any padding inserted between images but excluding any padding added at
the end of the final image.
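The length calculation can be sketched as follows, assuming the 4kB
alignment mentioned above. The function names are hypothetical and
this is not the actual bzImage loader code.

```c
#include <stddef.h>

#define INITRD_ALIGN 4096U	/* alignment applied between images */

/* Round a length up to the next multiple of INITRD_ALIGN */
static size_t initrd_align ( size_t len ) {
	return ( ( len + INITRD_ALIGN - 1 ) &
		 ~( ( size_t ) INITRD_ALIGN - 1 ) );
}

/* Total length of an initrd image set: padding is inserted before
 * each image after the first, but not after the final image.
 */
size_t initrd_total_len ( const size_t *lens, unsigned int count ) {
	size_t total = 0;
	unsigned int i;

	for ( i = 0 ; i < count ; i++ ) {
		total = initrd_align ( total );
		total += lens[i];
	}
	return total;
}
```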
Reported-by: Levente LEVAI <levail@aviatronic.hu>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
iPXE currently advertises a fixed MSS of 1460, which is correct only
for IPv4 over Ethernet. For IPv6 over Ethernet, the value should be
1440 (allowing for the larger IPv6 header). For non-Ethernet link
layers, the value should reflect the MTU of the underlying network
device.
Use tcpip_mtu() to calculate the transport-layer MTU associated with
the peer address, and calculate the MSS to allow for an optionless TCP
header as per RFC 6691.
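The arithmetic can be sketched as below. Here transport_mtu() is a
hypothetical stand-in for tcpip_mtu(), whose real signature is not
shown in this message, and the sketch assumes that the link MTU is
larger than the combined header lengths.

```c
#include <stddef.h>

#define IPV4_HDR_LEN 20U	/* IPv4 header without options */
#define IPV6_HDR_LEN 40U	/* fixed IPv6 header */
#define TCP_HDR_LEN 20U		/* TCP header without options */

/* Hypothetical stand-in: MTU available to the transport layer */
size_t transport_mtu ( size_t link_mtu, size_t net_hdr_len ) {
	return ( link_mtu - net_hdr_len );
}

/* MSS allowing for an optionless TCP header */
size_t tcp_mss ( size_t mtu ) {
	return ( mtu - TCP_HDR_LEN );
}
```

For a 1500-byte Ethernet MTU this gives 1460 for IPv4 and 1440 for
IPv6, matching the values quoted above.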
As a side benefit, we can now fail a connection immediately with a
meaningful error message if we have no route to the destination
address.
Reported-by: Anton D. Kachalov <mouse@yandex-team.ru>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Provide the function tcpip_mtu() to allow external code to determine
the (transport-layer) maximum transmission unit for a given socket
address.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Provide the function tcpip_netdev() to allow external code to
determine the transmitting network device for a given socket address.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
qemu can load an option ROM which is not associated with a particular
PCI device using the "-option-rom" syntax. Under these circumstances,
we should ignore the PCI bus:dev.fn address that we expect to find in
%ax on entry to the initialisation vector.
Fix by using the PCI bus:dev.fn address only if it is non-zero. Since
00:00.0 will always be the host bridge, it can never be the address of
a network card.
Reported-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Per the BIOS Boot Specification, the initialization phase of the ROM
is called with the PFA (PCI Function Address) in the %ax register.
The intention is that the ROM code will store that device address
somewhere and use it for booting from that device when the Boot Entry
Vector (BEV) is called. iPXE does store the PFA, but doesn't use it
to select the boot network device. This renders BIOS IPL lists fairly
ineffective.
Fix by using the BBS-specified bus:dev.fn address as the autoboot
device location.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
iPXE will currently attempt to boot from every network device for
which it has a driver. Where a system has more than one network
device supported by iPXE, this renders BIOS IPL lists ineffective.
Allow an autoboot device location to be specified. If such a location
is specified, then only devices matching that location will be used as
part of the automatic boot sequence. If no such location is
specified, then all devices will be used.
Note that this does not affect the "autoboot" command, which will
continue to use all devices.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
iPXE currently prints a "Press Ctrl-B" banner twice: once when the ROM
is first called for initialisation and again if we attempt to boot
from the ROM. This slows boot, especially when the NIC is not the
primary boot device. Tools such as libguestfs make use of QEMU VMs
for performing maintenance on disk images and may make use of NICs in
the VM for network support. If iPXE introduces a static init-time
delay, that directly translates to increased runtime for the tools.
Fix by allowing the ROM banner timeout to be configured independently
of the main banner timeout.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add support for parsing of URIs containing literal IPv6 addresses
(e.g. "http://[fe80::69ff:fe50:5845%25net0]/boot.ipxe").
Duplicate URIs by directly copying the relevant fields, rather than by
formatting and reparsing a URI string. This relaxes the requirements
on the URI formatting code and allows it to focus on generating
human-readable URIs (e.g. by not escaping ':' characters within
literal IPv6 addresses). As a side-effect, this allows relative URIs
containing parameter lists (e.g. "../boot.php##params") to function
as expected.
Add validity check for FTP paths to ensure that only printable
characters are accepted (since FTP is a human-readable line-based
protocol with no support for character escaping).
Construct TFTP next-server+filename URIs directly, rather than parsing
a constructed "tftp://..." string.
Add self-tests for URI functions.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Commit b5f5f73 ("[cmdline] Expand settings within each command-line
token individually") effectively rendered the "uristring" setting type
obsolete, since strings containing whitespace no longer break the
command line parser. The concept of the "uristring" type is not well
defined, since URI escaping rules depend on which portion of a URI is
being escaped.
Remove the "uristring" type, converting it into an alias for the
"string" setting type so as to avoid breaking existing scripts.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
When resizing DHCP options, iPXE currently calculates the length to be
copied by subtracting the destination pointer from the end of buffer
pointer. This works and guarantees not to write beyond the end of the
buffer, but may end up reading beyond the end of the buffer.
Fix by calculating the required length exactly.
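An exact-length calculation might look like the sketch below. The
buffer layout, parameter names and function name are assumptions made
for illustration; the real option-handling code is structured
differently.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical resize of a region within a packed buffer.
 *
 * buf/alloc_len describe the allocated buffer, *used_len the bytes
 * currently in use.  The region at offset changes from old_len to
 * new_len bytes.  Only the bytes actually in use after the region
 * are moved, so nothing beyond used_len is ever read.
 */
int resize_region ( unsigned char *buf, size_t alloc_len,
		    size_t *used_len, size_t offset,
		    size_t old_len, size_t new_len ) {
	size_t tail_len = ( *used_len - ( offset + old_len ) );
	size_t new_used_len = ( *used_len - old_len + new_len );

	/* Refuse to grow beyond the allocated buffer */
	if ( new_used_len > alloc_len )
		return -1;

	memmove ( ( buf + offset + new_len ),
		  ( buf + offset + old_len ), tail_len );
	*used_len = new_used_len;
	return 0;
}
```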
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Commit d4c0226 ("[dns] Support DNS search lists") introduced a
regression when handling CNAME records resolving to names longer than
the original name. The "end of name" offset stored in dns->offset was
not updated to reflect the length of the new name, causing
dns_question() to append the (empty) search suffix at an incorrect
offset within the name buffer, resulting in a mangled DNS name.
Where a CNAME record resolved to a name shorter than or equal in
length to the original name, the mangling would occur in
an unused portion of the name buffer. In the common case of a name
server returning the A (or AAAA) record along with the CNAME record,
this would cause name resolution to succeed despite the mangling. (If
the name server did not return the A or AAAA record along with the
CNAME record, then the mangling would be revealed by the subsequent
invalid query packet.)
Reported-by: Nicolas Sylvain <nsylvain@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Update the DNS resolver to support DNS search lists (as provided by
DHCP option 119, DHCPv6 option 24, or NDP option 31).
Add validation code to ensure that parsing of DNS packets does not
overrun the input, get stuck in infinite loops, or (worse) write
beyond the end of allocated buffers.
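The style of validation described can be sketched for the simple case
of measuring a (possibly compressed) DNS name. The function name and
the hop limit are hypothetical; this is not the resolver's actual
code.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_POINTER_HOPS 32U	/* hypothetical compression limit */

/* Return the uncompressed length of the name at offset (including
 * length bytes and the terminating root label), or -1 if the name
 * is malformed, overruns the packet, or loops.
 */
int dns_name_len ( const uint8_t *pkt, size_t pkt_len, size_t offset ) {
	unsigned int hops = 0;
	int len = 0;
	uint8_t byte;

	for ( ; ; ) {
		/* Never read beyond the end of the packet */
		if ( offset >= pkt_len )
			return -1;
		byte = pkt[offset];

		if ( ( byte & 0xc0 ) == 0xc0 ) {
			/* Compression pointer: bound the hop count */
			if ( ( offset + 1 ) >= pkt_len )
				return -1;
			if ( ++hops > MAX_POINTER_HOPS )
				return -1;
			offset = ( ( ( size_t ) ( byte & 0x3f ) << 8 ) |
				   pkt[ offset + 1 ] );
			continue;
		}
		if ( byte & 0xc0 )
			return -1;	/* reserved label type */
		if ( byte == 0 )
			return ( len + 1 );	/* root label */
		if ( ( offset + 1 + byte ) > pkt_len )
			return -1;	/* label overruns packet */
		len += ( 1 + byte );
		offset += ( 1 + byte );
	}
}
```

The loop always terminates: each label strictly advances the offset
towards the end of the packet, and the number of pointer hops is
capped.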
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Rename the "--bpp" option to "--depth", to free up the single-letter
option "-b" for "--bottom" in preparation for adding margin support.
This does not break backwards compatibility with documented features,
since the "console" command has not yet been documented.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allow for an arbitrary margin to be specified in the console
configuration. If the actual screen size does not match the requested
screen size, then update any margins specified so that they remain in
the same place relative to the requested screen size. If margins are
unspecified (i.e. zero), then leave them as zero.
The underlying assumption here is that any specified margins are
likely to describe an area within a background picture, and so should
remain in the same place relative to that background picture.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Centre the background picture on the console, to give a more
consistent result when the aspect ratio does not match the requested
width and height.
Once drawn for the first time, nothing should ever overwrite the
margins of the display. We can therefore eliminate the logic used to
redraw only the margin areas, and use much simpler code to draw the
complete initial background image.
Simplify the redrawing logic further by making the background picture
buffer equal in size to the frame buffer. In the common case of a
background picture which is designed to fill the screen, this wastes
no extra memory, and the combined code simplifications reduce the size
of fbcon.o by approximately 15%.
Redefine the concept of "margin" to match the intuitive definition
(i.e. the size of the gap, rather than the position of the boundary
line).
Signed-off-by: Michael Brown <mcb30@ipxe.org>