Enable -fstack-protector for EFI builds, where binary size is less
critical than for BIOS builds.
The stack cookie must be constructed immediately on entry, which
prohibits the use of any viable entropy source. Construct a cookie by
XORing together various mildly random quantities to produce a value
that will at least not be identical on each run.
On detecting a stack corruption, attempt to call Exit() with an
appropriate error. If that fails, then lock up the machine since
there is no other safe action that can be taken.
The old conditional check for support of -fno-stack-protector is
omitted since this flag dates back to GCC 4.1.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Raw block downloads are expensive if the origin server uses HTTPS,
since each concurrent download will require local TLS resources
(including potentially large received encrypted data buffers).
Raw block downloads may also be prohibitively slow to initiate when
the origin server is using HTTPS and client certificates. Origin
servers for PeerDist downloads are likely to be running IIS, which has
a bug that breaks session resumption and requires each connection to
go through the full client certificate verification.
Limit the total number of concurrent raw block downloads to ameliorate
these problems.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The Raspberry Pi NIC has no EEPROM to hold the MAC address. The
platform firmware (e.g. UEFI or U-Boot) will typically obtain the MAC
address from the VideoCore firmware and add it to the device tree,
which is then made available to subsequent programs such as iPXE or
the Linux kernel.
Add the ability to parse a flattened device tree and to extract the
MAC address.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The Hermon driver uses vlan_find() to identify the appropriate VLAN
device for packets that are received with the VLAN tag already
stripped out by the hardware. Generalise this capability and expose
it for use by other network card drivers.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The Intel 40 Gigabit Ethernet virtual functions support only MSI-X
interrupts, and will write back completed interrupt descriptors only
when the device attempts to raise an interrupt (or when a complete
cacheline of receive descriptors has been completed).
We cannot actually use MSI-X interrupts within iPXE, since we never
have ownership of the APIC. However, an MSI-X interrupt is
fundamentally just a DMA write of a single dword to an arbitrary
address. We can therefore configure the device to "raise" an
interrupt by writing a meaningless value to an otherwise unused memory
location: this is sufficient to trigger the receive descriptor
writeback logic.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
One of the design goals of ASN.1 DER is to provide a canonical
serialization of a data structure, thereby allowing for equality of
values to be tested by simply comparing the serialized bytes.
Some OCSP servers will modify the request certID to omit the optional
(and null) "parameters" portion of the hashAlgorithm. This is
arguably legal but breaks the ability to perform a straightforward
bitwise comparison on the entire certID field between request and
response.
Fix by comparing the OID-identified hashAlgorithm separately from the
remaining certID fields.
Originally-fixed-by: Thilo Fromm <Thilo@kinvolk.io>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Record the session ID (if any) provided by the server and attempt to
reuse it for any concurrent connections to the same server.
If multiple connections are initiated concurrently (e.g. when using
PeerDist) then defer sending the ClientHello for all but the first
connection, to allow time for the first connection to potentially
obtain a session ID (and thereby speed up the negotiation for all
remaining connections).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
On a Dell OptiPlex 7010, calling DisconnectController() on the LOM
device handle will lock up the system. Debugging shows that execution
is trapped in an infinite loop that is somehow trying to reconnect
drivers (without going via ConnectController()).
The problem can be reproduced in the UEFI shell with no iPXE code
present, by using the "disconnect" command. Experimentation shows
that the only fix is to unload (rather than just disconnect) the
"Ip4ConfigDxe" driver.
Add the concept of a blacklist of UEFI drivers that will be
automatically unloaded when iPXE runs as an application, and add the
Dell Ip4ConfigDxe driver to this blacklist.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add the function mii_find() in order to locate the PHY address.
Signed-off-by: Sylvie Barlow <sylvie.c.barlow@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
We currently have no generic concept of a PHY address, since all
existing implementations simply hardcode the PHY address within the
MII access methods.
A bit-bashing MII interface will need to be provided with an explicit
PHY address in order to generate the correct waveform. Allow for this
by separating out the concept of a MII device (i.e. a specific PHY
address attached to a particular MII interface).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
In TLS terminology a session conceptually spans multiple individual
connections, and essentially represents the stored cryptographic state
(master secret and cipher suite) required to establish communication
without going through the certificate and key exchange handshakes.
Rename tls_session to tls_connection in order to make the name
tls_session available to represent the session state.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
We currently perform various min-entropy calculations using build-time
floating-point arithmetic. No floating-point code ends up in the
final binary, since the results are eventually converted to integers
and asserted to be compile-time constants.
Though this mechanism is undoubtedly cute, it inhibits us from using
"-mno-sse" to prevent the use of SSE registers by the compiler.
Fix by using fixed-point arithmetic instead.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allow the ACPI power management timer to be used if enabled via
TIMER_ACPI in config/timer.h. This provides an alternative timer on
systems where the standard 8254 PIT is unavailable or unreliable.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some CAs provide non-functional OCSP servers, and some clients are
forced to operate on networks without access to the OCSP servers.
Allow the user to explicitly disable the use of OCSP checks by
undefining OCSP_CHECK in config/crypto.h.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Mark the link as blocked if the LACP partner is not reporting itself
as being in sync, collecting, and distributing.
This matches the behaviour for STP: we mark the link as blocked if we
detect that the switch is actively blocking traffic, in order to
extend the DHCP discovery period and so prevent boot failures on
switches that take an excessively long time to enable ports.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allow individual authentication schemes to parse WWW-Authenticate
headers that do not comply with RFC2617.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Record and report the number of peers (calculated as the maximum
number of peers discovered for a block's segment at the time that the
block download is complete), and the percentage of blocks retrieved
from peers rather than from the origin server.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
We must not steal ownership from the Gen 2 UEFI firmware, since doing
so will cause an immediate system crash (most likely in the form of a
reboot).
This problem was masked before commit a0f6e75 ("[hyperv] Do not fail
if guest OS ID MSR is already set"), since prior to that commit we
would always fail if we found any non-zero guest OS identity. We now
accept a non-zero previous guest OS identity in order to allow for
situations such as chainloading from iPXE to another iPXE, and as a
prerequisite for commit b91cc98 ("[hyperv] Cope with Windows Server
2016 enlightenments").
A proper fix would be to reverse engineer the UEFI protocols exposed
within the Hyper-V Gen 2 firmware and use these to bind to the VMBus
device representing the network connection, (with the native Hyper-V
driver moved to become a BIOS-only feature).
As an interim solution, fail to initialise the native Hyper-V driver
if we detect the guest OS identity known to be used by the Gen 2 UEFI
firmware. This will cause the standard all-drivers build (ipxe.efi)
to fall back to using the SNP driver.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Since we don't enable IOMMU at all, we can then simply enable the
IOMMU support by claiming the support of VIRITO_F_IOMMU_PLATFORM.
This fixes booting failure when iommu_platform is set from qemu cli.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The smsc75xx and smsc95xx drivers include a substantial amount of
identical functionality, varying only in the base address of register
sets. Abstract out this common functionality to allow code to be
shared between the drivers.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Support renegotiation with servers supporting RFC5746. This allows
for the use of per-directory client certificates.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
We currently use a zero language ID to retrieve strings such as the
ECM/NCM MAC address. This works on most hardware devices, but is
known to fail on some software emulated CDC-NCM devices.
Fix by using the first supported language ID, falling back to English
(0x0409) if any error occurs when fetching the list of supported
languages. This matches the behaviour of the Linux kernel.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Our ASN.1 parsing code uses a struct asn1_cursor, while the object
construction code uses a struct asn1_builder. These structures are
identical apart from the const modifier applied to the data pointer in
struct asn1_cursor.
Provide asn1_built() to safely typecast a struct asn1_builder to a
struct asn1_cursor, allowing constructed objects to be passed to
functions expecting a struct asn1_cursor.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allow values to be read from ACPI tables using the syntax
${acpi/<signature>.<index>.0.<offset>.<length>}
where <signature> is the ACPI table signature as a 32-bit hexadecimal
number (e.g. 0x41504093 for the 'APIC' signature on the MADT), <index>
is the index into the array of tables matching this signature,
<offset> is the byte offset within the table, and <length> is the
field length in bytes.
Numeric values are returned in reverse byte order, since ACPI numeric
values are usually little-endian.
For example:
${acpi/0x41504943.0.0.0.0} - entire MADT table in raw hex
${acpi/0x41504943.0.0.0x0a.6:string} - MADT table OEM ID
${acpi/0x41504943.0.0.0x24.4:uint32} - local APIC address
Signed-off-by: Michael Brown <mcb30@ipxe.org>
An "enlightened" external bootloader (such as Windows Server 2016's
winload.exe) may take ownership of the Hyper-V connection before all
INT 13 operations have been completed. When this happens, all VMBus
devices are implicitly closed and we are left with a non-functional
network connection.
Detect when our Hyper-V connection has been lost (by checking the
SynIC message page MSR). Reclaim ownership of the Hyper-V connection
and reestablish any VMBus devices, without disrupting any existing
iPXE state (such as IPv4 settings attached to the network device).
Windows Server 2016 will not cleanly take ownership of an active
Hyper-V connection. Experimentation shows that we can quiesce by
resetting only the SynIC message page MSR; this results in a
successful SAN boot (on a Windows 2012 R2 physical host). Choose to
quiesce by resetting (almost) all MSRs, in the hope that this will be
more robust against corner cases such as a stray synthetic interrupt
occurring during the handover.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
When performing a SAN boot via INT 13, there is no way for the
operating system to indicate that it has finished using the INT 13 SAN
device. We therefore have no opportunity to clean up state before the
loaded operating system's native drivers take over. This can cause
problems when booting Windows, which tends not to be forgiving of
unexpected system state.
Windows will typically write a flag to the SAN device as the last
action before transferring control to the native drivers. We can use
this as a heuristic to bring the system to a quiescent state (without
performing a full shutdown); this provides us an opportunity to
temporarily clean up state that could otherwise prevent a successful
Windows boot.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some older operating systems (e.g. RHEL6) use a non-default filename
on the root disk and rely on setting an EFI variable to point to the
bootloader. This does not work when performing a SAN boot on a
machine where the EFI variable is not present.
Fix by allowing a non-default filename to be specified via the
"sanboot --filename" option or the "san-filename" setting. For
example:
sanboot --filename \efi\redhat\grub.efi \
iscsi:192.168.0.1::::iqn.2010-04.org.ipxe.demo:rhel6
or
option ipxe.san-filename code 188 = string;
option ipxe.san-filename "\\efi\\redhat\\grub.efi";
option root-path "iscsi:192.168.0.1::::iqn.2010-04.org.ipxe.demo:rhel6";
Originally-implemented-by: Vishvananda Ishaya Abrams <vish.ishaya@oracle.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Describe all SAN devices via ACPI tables such as the iBFT. For tables
that can describe only a single device (i.e. the aBFT and sBFT), one
table is installed per device. For multi-device tables (i.e. the
iBFT), all devices are described in a single table.
An underlying SAN device connection may be closed at the time that we
need to construct an ACPI table. We therefore introduce the concept
of an "ACPI descriptor" which enables the SAN boot code to maintain an
opaque pointer to the underlying object, and an "ACPI model" which can
build tables from a list of such descriptors. This separates the
lifecycles of ACPI descriptions from the lifecycles of the block
device interfaces, and allows for construction of the ACPI tables even
if the block device interface has been closed.
For a multipath SAN device, iPXE will wait until sufficient
information is available to describe all devices but will not wait for
all paths to connect successfully. For example: with a multipath
iSCSI boot iPXE will wait until at least one path has become available
and name resolution has completed on all other paths. We do this
since the iBFT has to include IP addresses rather than DNS names. We
will commence booting without waiting for the inactive paths to either
become available or close; this avoids unnecessary boot delays.
Note that the Linux kernel will refuse to accept an iBFT with more
than two NIC or target structures. We therefore describe only the
NICs that are actually required in order to reach the described
targets. Any iBFT with at most two targets is therefore guaranteed to
describe at most two NICs.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allow the SAN retry count to be configured via the ${san-retry}
setting, defaulting to the current value of 10 retries if not
specified.
Note that setting a retry count of zero is inadvisable, since iSCSI
targets in particular will often report spurious errors such as "power
on occurred" for the first few commands.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add basic support for multipath block devices. The "sanboot" and
"sanhook" commands now accept a list of SAN URIs. We open all URIs
concurrently. The first connection to become available for issuing
block device commands is marked as the active path and used for all
subsequent commands; all other connections are then closed. Whenever
the active path fails, we reopen all URIs and repeat the process.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add a dummy SAN device which allows the "sanhook" command to be tested
even when no SAN booting capability is present on the platform. This
allows substantial portions of the SAN boot code to be run in Linux
under Valgrind.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The eIPoIB translation layer needs to translate outbound ARP packets
from Ethernet to IPoIB. A 64-byte buffer (starting with the Ethernet
header) does not provide enough tailroom to expand to hold the two
20-byte IPoIB MAC addresses. The result is that an UNDI API user will
be unable to send ARP packets.
We could potentially shuffle the packet contents to reuse the space
occupied by the stripped Ethernet link-layer header, but this would
add complexity. Instead, fix by increasing the minimum allocation
size to 128 bytes.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Track the current and maximum heap usage, and display the maximum
during shutdown when DEBUG=malloc is enabled.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Any underlying errors arising during ib_create_cq() or ib_create_qp()
are lost since the functions simply return NULL on error. This makes
debugging harder, since a debug-enabled build is required to discover
the root cause of the error.
Fix by returning a status code from these functions, thereby allowing
any underlying errors to be propagated.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The concept of the SAN drive number is meaningful only in a BIOS
environment, where it represents the INT13 drive number (0x80 for the
first hard disk). We retain this concept in a UEFI environment to
allow for a simple way for iPXE commands to refer to SAN drives.
Centralise the concept of the default drive number, since it is shared
between all supported environments.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Shutting down (and optionally restarting) multiple interfaces is
fraught with problems if there are loops in the interface connectivity
(e.g. the HTTP content-decoded and transfer-decoded interfaces, which
will generally loop back to each other). Various workarounds
currently exist across the codebase, generally involving preceding
calls to intf_nullify() to avoid problems due to known loops.
Provide intfs_shutdown() and intfs_restart() to allow all of an
object's interfaces to be shut down (or restarted) in a single call,
without having to worry about potential external loops.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allow the active timer (providing udelay() and currticks()) to be
selected at runtime based on probing during the INIT_EARLY stage of
initialisation.
TICKS_PER_SEC is now a fixed compile-time constant for all builds, and
is independent of the underlying clock tick rate. We choose the value
1024 to allow multiplications and divisions on seconds to be converted
to bit shifts.
TICKS_PER_MS is defined as 1, allowing multiplications and divisions
on milliseconds to be omitted entirely. The 2% inaccuracy in this
definition is negligible when using the standard BIOS timer (running
at around 18.2Hz).
TIMER_RDTSC now checks for a constant TSC before claiming to be a
usable timer. (This timer can be tested in KVM via the command-line
option "-cpu host,+invtsc".)
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Separate out the concept of "hardware maximum supported frame length"
and "configured link MTU", and limit the latter according to the
former.
In networks where the DHCP-supplied link MTU is inconsistent with the
hardware or driver capabilities (e.g. a network using jumbo frames),
this will result in iPXE advertising a TCP MSS consistent with a size
that can actually be received.
Note that the term "MTU" is typically used to refer to the maximum
length excluding the link-layer headers; we adopt this usage.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Provide a settings applicator to modify netdev->max_pkt_len in
response to changes to the "mtu" setting (DHCP option 26).
Note that as with MAC address changes, drivers are permitted to
completely ignore any changes in the MTU value. The net result will
be that iPXE effectively uses the smaller of either the hardware
default MTU or the software configured MTU.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
This code largely inspired by tap.c. Allows for testing iPXE on real
NICs from within Linux. For example:
make bin-x86_64-linux/af_packet.linux
valgrind ./bin-x86_64-linux/af_packet.linux --net af_packet,if=eth3
Tested as x86_64 and i386 binary.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Virtio 0.9 implementation was limited to the maximum virtqueue size of
MAX_QUEUE_NUM and the virtio-net driver would fail to initialize on hosts
exceeding this limit.
This commit lifts the restriction by allocating the queue memory based on
the actual queue size instead of using a fixed maximum. Note that virtio
1.0 still uses the MAX_QUEUE_NUM constant to cap the size (unfortunately
this functionality is not available in virtio 0.9).
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
vpm_find_vqs incorrectly accepted the host provided queue size with no
regard to iPXE's internal limitations. Virtio 1.0 makes it possible for
the driver to override the queue size to reduce memory requirements and
iPXE is a great use case for this feature.
Also removing the extra vq->vring.num assignment which is already
handled in vring_init.
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
EFI provides no clean way for device drivers to shut down in
preparation for handover to a booted operating system. The platform
firmware simply doesn't bother to call the drivers' Stop() methods.
Instead, drivers must register an EVT_SIGNAL_EXIT_BOOT_SERVICES event
to be signalled when ExitBootServices() is called, and clean up
without any reference to the EFI driver model.
Unfortunately, all timers silently stop working when ExitBootServices()
is called. Even more unfortunately, and for no discernible reason,
this happens before any EVT_SIGNAL_EXIT_BOOT_SERVICES events are
signalled. The net effect of this entertaining design choice is that
any timeout loops on the shutdown path (e.g. for gracefully closing
outstanding TCP connections) may wait indefinitely.
There is no way to report failure from currticks(), since the API
lazily assumes that the host system continues to travel through time
in the usual direction. Work around EFI's violation of this
assumption by falling back to a simple free-running monotonic counter.
Debugged-by: Maor Dickman <maord@mellanox.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
AppleNetBoot.h is not taken from the EDK2 codebase and so cannot be
imported using include/ipxe/efi/import.pl. Mark as a native iPXE
header (by changing the include guard) to avoid breaking the import
process.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allow certificates to be marked as having been added explicitly at run
time. Such certificates will not be discarded via the certificate
store cache discarder.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Enable IMAGE_PNG (but not IMAGE_PNM) by default, and drag in the
relevant objects only when image_pixbuf() is present in the binary.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add PEM-encoded ASN.1 as an image format. We accept as PEM any image
containing a line starting with a "-----BEGIN" boundary marker.
We allow for PEM files containing multiple ASN.1 objects, such as a
certificate chain produced by concatenating individual certificate
files.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add DER-encoded ASN.1 as an image format. There is no fixed signature
for DER files. We treat an image as DER if it comprises a single
valid SEQUENCE object covering the entire length of the image.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allow code to create a partial ASN.1 cursor containing only the type
and length bytes, so that asn1_start() may be used to determine the
length of a large ASN.1 blob without first allocating memory to hold
the entire blob.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The Windows drivers for VMBus devices are enumerated using the
instance UUID rather than the channel number. Include the instance
UUID within the iPXE device name to allow an iPXE network device to be
more easily associated with the corresponding Windows network device
when debugging.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Select the IPv6 source address and corresponding router (if any) using
a very simplified version of the algorithm from RFC6724:
- Ignore any source address that has a smaller scope than the
destination address. For example, do not use a link-local source
address when sending to a global destination address.
- If we have a source address which is on the same link as the
destination address, then use that source address.
- If we are left with multiple possible source addresses, then choose
the address with the smallest scope. For example, if we are sending
to a site-local destination address and we have both a global source
address and a site-local source address, then use the site-local
source address.
- If we are still left with multiple possible source addresses, then
choose the address with the longest matching prefix.
For the purposes of this algorithm, we treat RFC4193 Unique Local
Addresses as having organisation-local scope. Since we use only
link-local scope for our multicast transmissions, this approximation
should remain valid in all practical situations.
Originally-implemented-by: Thomas Bächler <thomas@archlinux.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Use the IPv6 settings to construct the routing table, in a matter
analogous to the construction of the IPv4 routing table.
This allows for manual assignment of IPv6 addresses via e.g.
set net0/ip6 2001:ba8:0:1d4::6950:5845
set net0/len6 64
set net0/gateway6 fe80::226:bff:fedd:d3c0
The prefix length ("len6") may be omitted, in which case a default
prefix length of 64 will be assumed.
Multiple IPv6 addresses may be assigned manually by implicitly
creating child settings blocks. For example:
set net0/ip6 2001:ba8:0:1d4::6950:5845
set net0.ula/ip6 fda4:2496:e992::6950:5845
Signed-off-by: Michael Brown <mcb30@ipxe.org>
A reasonable user expectation is that ${net0/ip6} should show the
"highest-priority" of the IPv6 addresses, even when multiple IPv6
addresses are active. The expected order of priority is likely to be
manually-assigned addresses first, then stateful DHCPv6 addresses,
then SLAAC addresses, and lastly link-local addresses.
Using ${priority} to enforce an ordering is undesirable since that
would affect the priority assigned to each of the net<N> blocks as a
whole, so use the sibling ordering capability instead.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allow settings blocks to provide an explicit default ordering between
siblings, with lower precedence than the existing ${priority} setting.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Originally-implemented-by: Hannes Reinecke <hare@suse.de>
Originally-implemented-by: Marin Hannache <git@mareo.fr>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Expose the IPv6 address (or prefix) as ${ip6}, the prefix length as
${len6}, and the router address as ${gateway6}.
Originally-implemented-by: Hannes Reinecke <hare@suse.de>
Originally-implemented-by: Marin Hannache <git@mareo.fr>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The settings scope ipv6_scope refers specifically to IPv6 settings
that have a corresponding DHCPv6 option. Rename to dhcpv6_scope to
more accurately reflect this purpose.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
In edk2, there are several drivers that associate HII forms (and
corresponding config access protocol instances) with each individual
network device. (In this context, "network device" means the EFI
handle on which the SNP protocol is installed, and on which the device
path ending with the MAC() node is installed also.) Such edk2 drivers
are, for example: Ip4Dxe, HttpBootDxe, VlanConfigDxe.
In UEFI, any given handle can carry at most one instance of a specific
protocol (see e.g. the specification of the InstallProtocolInterface()
boot service). This implies that the class of drivers mentioned above
can't install their EFI_HII_CONFIG_ACCESS_PROTOCOL instances on the
SNP handle directly -- they would conflict with each other.
Accordingly, each of those edk2 drivers creates a "private" child
handle under the SNP handle, and installs its config access protocol
(and corresponding HII package list) on its child handle.
The device path for the child handle is traditionally derived by
appending a Hardware Vendor Device Path node after the MAC() node.
The VenHw() nodes in question consist of a GUID (by definition), and
no trailing data (by choice). The purpose of these VenHw() nodes is
only that all the child nodes can be uniquely identified by device
path.
At the moment iPXE does not follow this pattern. It doesn't run into
a conflict when it installs its EFI_HII_CONFIG_ACCESS_PROTOCOL
directly on the SNP handle, but that's only because iPXE is the sole
driver not following the pattern. This behavior seems risky (one
might call it a "latent bug"); better align iPXE with the edk2 custom.
Cc: Michael Brown <mcb30@ipxe.org>
Cc: Gary Lin <glin@suse.com>
Cc: Ladi Prosek <lprosek@redhat.com>
Ref: http://thread.gmane.org/gmane.comp.bios.edk2.devel/13494/focus=13532
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Ladi Prosek <lprosek@redhat.com>
Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
As with assertions, profiling is enabled for objects built with any
debug level (including an explicit debug level of zero).
Allow profiling to be globally enabled or disabled by adding PROFILE=1
or PROFILE=0 respectively to the build command line.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The vendor class identifier strings in DHCP_ARCH_VENDOR_CLASS_ID are
out of sync with the (correct) client architecture values in
DHCP_ARCH_CLIENT_ARCHITECTURE.
Fix by removing all definitions of DHCP_ARCH_VENDOR_CLASS_ID, and
instead generating the vendor class identifier string automatically
based on DHCP_ARCH_CLIENT_ARCHITECTURE and DHCP_ARCH_CLIENT_NDI.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
RFC3315 defines DHCPv6 option 16 (vendor class identifier) but does
not define any direct relationship with the roughly equivalent DHCPv4
option 60.
The PXE specification predates IPv6, and the UEFI specification is
expectedly vague on the subject. Examination of the reference EDK2
codebase suggests that the DHCPv6 vendor class identifier will be
formatted in accordance with RFC3315, using a single vendor-class-data
item in which the opaque-data field is the string as would appear in
DHCPv4 option 60.
RFC3315 requires the vendor class identifier to specify an IANA
enterprise number, as a way of disambiguating the vendor-class-data
namespace. The EDK2 code uses the value 343, described as:
// TODO: IANA TBD: temporarily using Intel's
Since this "TODO" has been present since at least 2010, it is probably
safe to assume that it has now become a de facto standard.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
RFC5970 defines DHCPv6 options 61 (client system architecture type)
and 62 (client network interface identifier), with contents equivalent
to DHCPv4 options 93 and 94 respectively.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some of the regions may end up being unmapped, either because they are
optional or because the attempt to map them has failed. Region types
starting at 0 didn't make it easy to test for this condition.
This commit bumps all valid region types up by 1 with 0 having the
implicit 'unmapped' meaning.
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Provide a mechanism to allow an arbitrary adjustment to be applied to
all subsequent calls to time().
Note that the underlying clock source (e.g. the RTC clock) will not be
changed; only the time as reported within iPXE will be affected.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
In some circumstances, intermediate devices may lose state in a way
that temporarily prevents the successful delivery of packets from a
TCP peer. For example, a firewall may drop a NAT forwarding table
entry.
Since iPXE spends most of its time downloading files (and hence purely
receiving data, sending only TCP ACKs), this can easily happen in a
situation in which there is no reason for iPXE's TCP stack to generate
any retransmissions. The temporary loss of connectivity can therefore
effectively become permanent.
Work around this problem by sending TCP keepalives after a period of
inactivity on an established connection.
TCP keepalives usually send a single garbage byte in sequence number
space that has already been ACKed by the peer. Since we do not need
to elicit a response from the peer, we instead send pure ACKs (with no
garbage data) in order to keep the transmit code path simple.
Originally-implemented-by: Ladi Prosek <lprosek@redhat.com>
Debugged-by: Ladi Prosek <lprosek@redhat.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Extend the 16-bit PCI bus:dev.fn address to a 32-bit seg🚌dev.fn
address, assuming a segment value of zero in contexts where multiple
segments are unsupported by the underlying data structures (e.g. in
the iBFT or BOFM tables).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Mac OS X uses non-standard EFI protocols to obtain the DHCP packets
from the UEFI firmware.
Originally-implemented-by: Michael Kuron <m.kuron@gmx.de>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
There has been a longstanding disagreement between RFC4578 and the
IANA "Processor Architecture Types" registry. RFC4578 section 2.1
defines type 7 as "EFI BC" and type 9 as "EFI x86-64"; the IANA
registry quotes RFC4578 as its source but has these values erroneously
swapped. The EDK2 codebase uses the IANA values.
As of March 2016, RFC4578 has been modified by an errata to match the
values as recorded in the IANA registry.
Fix our definitions to match the consensus values.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
We currently use the EFI_CPU_ARCH_PROTOCOL's GetTimerValue() method to
generate the currticks() timer, calibrated against a 1ms delay from
the boot services Stall() method.
This does not work on ARM platforms, where GetTimerValue() is an empty
stub which just returns EFI_UNSUPPORTED.
Fix by instead creating a periodic timer event, and using this event
to increment a current tick counter.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Require architecture-specific code to make a deliberate choice to use
the unoptimised generic_tcpip_continue_chksum() function, if there is
no optimised version available.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
This commit adds support for driving virtio 1.0 PCI devices. In
addition to various helpers, a number of vpm_ functions are introduced
to be used instead of their legacy vp_ counterparts when accessing
virtio 1.0 (aka modern) devices.
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Virtio 1.0 introduces new constants and data structures, common to all
devices as well as specific to virtio-net. This commit adds a subset
of these to be able to drive the virtio-net 1.0 network device.
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
PCI devices may support more capabilities of the same type (for
example PCI_CAP_ID_VNDR) and there was no way to discover all of them.
This commit adds a new API pci_find_next_capability which provides
this functionality. It would typically be used like so:
for (pos = pci_find_capability(pci, PCI_CAP_ID_VNDR);
pos > 0;
pos = pci_find_next_capability(pci, pos, PCI_CAP_ID_VNDR)) {
...
}
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The DHCP option 175.189 has been defined (by us) since 2006 as
containing the drive number to be used for a SAN boot, but has never
been automatically used as such by iPXE.
Use this option (if specified) to override the default SAN drive
number.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Interpret the maximum drive number (0xff for hard disks, 0x7f for
floppy disks) as meaning "use natural drive number".
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Provide access to local files via the "file://" URI scheme. There are
three syntaxes:
- An opaque URI with a relative path (e.g. "file:script.ipxe").
This will be interpreted as a path relative to the iPXE binary.
- A hierarchical URI with a non-network absolute path
(e.g. "file:/boot/script.ipxe"). This will be interpreted as a
path relative to the root of the filesystem from which the iPXE
binary was loaded.
- A hierarchical URI with a network path in which the authority is a
volume label (e.g. "file://bootdisk/script.ipxe"). This will be
interpreted as a path relative to the root of the filesystem with
the specified volume label.
Note that the potentially desirable shell mappings (e.g. "fs0:" and
"blk0:") are concepts internal to the UEFI shell binary, and do not
seem to be exposed in any way to external executables. The old
EFI_SHELL_PROTOCOL (which did provide access to these mappings) is no
longer installed by current versions of the UEFI shell.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
There is no practical way to generate an underlength ARP packet since
an ARP packet is always padded up to the minimum Ethernet frame length
(or dropped by the receiving Ethernet hardware if incorrectly padded),
but the absence of an explicit check causes warnings from some
analysis tools.
Fix by adding an explicit check on the I/O buffer length.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The assumption in asn1_type() that an ASN.1 cursor will always contain
a type byte is incorrect. A cursor that has been cleanly invalidated
via asn1_invalidate_cursor() will contain a type byte, but there are
other ways in which to arrive at a zero-length cursor.
Fix by explicitly checking the cursor length in asn1_type(). This
allows asn1_invalidate_cursor() to be reduced to simply zeroing the
length field.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some EoIB implementations utilise an EoIB-to-Ethernet gateway device
that does not perform a FullMember join to the multicast group for the
EoIB broadcast domain. This has various exciting side-effects, such
as requiring every EoIB node to send every broadcast packet twice.
As an added bonus, the gateway may also break the EoIB MAC address to
GID mapping protocol by sending Ethernet-sourced packets from the
wrong QPN.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some EoIB implementations require each individual EoIB node to create
the multicast group for the EoIB broadcast domain.
It is left as an exercise for the interested reader to determine how
such an implementation might ever allow the parameters of such a
multicast group to be changed without requiring a simultaneous upgrade
of every driver on every operating system on every machine currently
attached to the fabric.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
EoIB is a fairly simple protocol in which raw Ethernet frames
(excluding the CRC) are encapsulated within Infiniband Unreliable
Datagrams, with a four-byte fixed EoIB header (which conveys no actual
information). The Ethernet broadcast domain is provided by a
multicast group, similar to the IPoIB IPv4 multicast group.
The mapping from Ethernet MAC addresses to Infiniband address vectors
is achieved by snooping incoming traffic and building a peer cache
which can then be used to map a MAC address into a port GID. The
address vector is completed using a path record lookup, as for IPoIB.
Note that this requires every packet to include a GRH.
Add basic support for EoIB devices. This driver is substantially
derived from the IPoIB driver. There is currently no mechanism for
automatically creating EoIB devices.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Commit e62e52b ("[ipoib] Simplify test for received broadcast
packets") relies upon the multicast LID being present in the
destination address vector as passed to ipoib_complete_recv().
Unfortunately, this information is not present in many Infiniband
devices' completion queue entries.
Fix by testing instead for the presence of a multicast GID.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Record the speed of a USB device based on the port's speed at the time
that the device was enabled. This allows us to remember the device's
speed even after the device has been disconnected (and so the port's
current speed has changed).
In particular, this allows us to correctly identify the transaction
translator for a low-speed or full-speed device after the device has
been disconnected.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Provide BIT_QWORD_PTR() to allow for easy extraction of non-endian
fields (e.g. Infiniband GUIDs) without unnecessary byte swapping.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Tested using QEMU and usbredir to expose the LAN9512 chip present on a
Raspberry Pi.
There is a known issue with the LAN9512: an extra two bytes are
appended to every transmitted packet. These two bytes comprise:
{ 0x00, 0x08 } if packet length == 0 (mod 8)
{ CRC[0], 0x00 } if packet length == 7 (mod 8)
{ CRC[0], CRC[1] } otherwise
The extra bytes are appended whether the Ethernet CRC is generated
manually or added automatically by the hardware. The issue occurs
with the Linux kernel driver as well as the iPXE driver. It appears
to be an undocumented hardware errata.
TCP/IP traffic is not affected, since the IP header length field
causes the extraneous bytes to be discarded by the receiver. However,
protocols that rely on the length of the Ethernet frame (such as FCoE
or iPXE's "lotest" protocol) will be unusable on this hardware.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allow the UEFI platform firmware to provide drivers for unrecognised
devices, by exposing our own implementation of EFI_USB_IO_PROTOCOL.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Make the class ID a property of the USB driver (rather than a property
of the USB device ID), and allow USB drivers to specify a wildcard ID
for any of the three component IDs (class, subclass, or protocol).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Generate a score for each possible USB device configuration based on
the available driver support, and select the configuration with the
highest score. This will allow us to prefer ECM over RNDIS (for
devices which support both) and will allow us to meaningfully select a
configuration even when we have drivers available for all functions
(e.g. when exposing unused functions via EFI_USB_IO_PROTOCOL).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The decision on whether or not a zero-length packet needs to be
transmitted is independent of the host controller and belongs in the
USB core.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
TCP/IP checksum fields are one's complement values and therefore have
two possible representations of zero: positive zero (0x0000) and
negative zero (0xffff).
In RFC768, UDP over IPv4 exploits this redundancy to repurpose the
positive representation of zero (0x0000) to mean "no checksum
calculated"; checksums are optional for UDP over IPv4.
In RFC2460, checksums are made mandatory for UDP over IPv4. The
wording of the RFC is such that the UDP header is mandated to use only
the negative representation of zero (0xffff), rather than simply
requiring the checksum to be correct but allowing for either
representation of zero to be used.
In RFC1071, an example algorithm is given for calculating the TCP/IP
checksum. This algorithm happens to produce only the positive
representation of zero (0x0000); this is an artifact of the way that
unsigned arithmetic is used to calculate a signed one's complement
sum (and its final negation).
A common misconception has developed (exemplified in RFC1624) that
this artifact is part of the specification. Many people have assumed
that the checksum field should never contain the negative
representation of zero (0xffff).
A sensible receiver will calculate the checksum over the whole packet
and verify that the result is zero (in whichever representation of
zero happens to be generated by the receiver's algorithm). Such a
receiver will not care which representation of zero happens to be used
in the checksum field.
However, there are receivers in existence which will verify the
received checksum the hard way: by calculating the checksum over the
remainder of the packet and comparing the result against the checksum
field. If the representation of zero used by the receiver's algorithm
does not match the representation of zero used by the transmitter (and
so placed in the checksum field), and if the receiver does not
explicitly allow for both representations to compare as equal, then
the receiver may reject packets with a valid checksum.
For UDP, the combined RFCs effectively mandate that we should generate
only the negative representation of zero in the checksum field.
For IP, TCP and ICMP, the RFCs do not mandate which representation of
zero should be used, but the misconceptions which have grown up around
RFC1071 and RFC1624 suggest that it would be least surprising to
generate only the positive representation of zero in the checksum
field.
Fix by ensuring that all of our checksum algorithms generate only the
positive representation of zero, and explicitly inverting this in the
case of transmitted UDP packets.
Reported-by: Wissam Shoukair <wissams@mellanox.com>
Tested-by: Wissam Shoukair <wissams@mellanox.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allow iPXE to coexist with other USB device drivers, by attaching to
the EFI_USB_IO_PROTOCOL instances provided by the UEFI platform
firmware.
The EFI_USB_IO_PROTOCOL is an unsurprisingly badly designed
abstraction of a USB device. The poor design choices intrinsic in the
UEFI specification prevent efficient operation as a network device,
with the result that devices operated using the EFI_USB_IO_PROTOCOL
operate approximately two orders of magnitude slower than devices
operated using our native EHCI or xHCI host controller drivers.
Since the performance is so abysmally slow, and since the underlying
problems are due to fundamental architectural mistakes in the UEFI
specification, support for the EFI_USB_IO_PROTOCOL host controller
driver is left as disabled by default. Users are advised to use the
native iPXE host controller drivers instead.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Many UEFI NBPs expect to find an EFI_PXE_BASE_CODE_PROTOCOL installed
in addition to the EFI_SIMPLE_NETWORK_PROTOCOL. Most NBPs use the
EFI_PXE_BASE_CODE_PROTOCOL only to retrieve the cached DHCP packets.
This implementation has been tested with grub.efi, shim.efi,
syslinux.efi, and wdsmgfw.efi. Some methods (such as Discover() and
Arp()) are not used by any known NBP and so have not (yet) been
implemented.
Usage notes for the tested bootstraps are:
- grub.efi uses EFI_PXE_BASE_CODE_PROTOCOL only to retrieve the
cached DHCP packet, and uses no other methods.
- shim.efi uses EFI_PXE_BASE_CODE_PROTOCOL to retrieve the cached
DHCP packet and to retrieve the next NBP via the Mtftp() method.
If shim.efi was downloaded via HTTP (or other non-TFTP protocol)
then shim.efi will blindly call Mtftp() with an HTTP URI as the
filename: this allows the next NBP (e.g. grubx64.efi) to also be
transparently retrieved by HTTP.
shim.efi can also use the EFI_SIMPLE_FILE_SYSTEM_PROTOCOL to
retrieve files previously loaded by "imgfetch" or similar commands
in iPXE. The current implementation of shim.efi will use the
EFI_SIMPLE_FILE_SYSTEM_PROTOCOL only if it does not find an
EFI_PXE_BASE_CODE_PROTOCOL; this patch therefore prevents this
usage of our EFI_SIMPLE_FILE_SYSTEM_PROTOCOL. This logic could be
trivially reversed in shim.efi if needed.
- syslinux.efi uses EFI_PXE_BASE_CODE_PROTOCOL only to retrieve the
cached DHCP packet. Versions 6.03 and earlier have a bug which
may cause syslinux.efi to attach to the wrong NIC if there are
multiple NICs in the system (or if the UEFI firmware supports
IPv6).
- wdsmgfw.efi (ab)uses EFI_PXE_BASE_CODE_PROTOCOL to retrieve the
cached DHCP packets, and to send and retrieve UDP packets via the
UdpWrite() and UdpRead() methods. (This was presumably done in
order to minimise the amount of benefit obtainable by switching to
UEFI, by replicating all of the design mistakes present in the
original PXE specification.)
The EFI_DOWNGRADE_UX configuration option remains available for now,
until this implementation has received more widespread testing.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Merge the functionality of parse_next_server_and_filename() and
tftp_uri() into a single pxe_uri(), which takes a server address
(IPv4/IPv6/none) and a filename, and produces a URI using the rule:
- if the filename is a hierarchical absolute URI (i.e. includes a
scheme such as "http://" or "tftp://") then use that URI and ignore
the server address,
- otherwise, if the server address is recognised (according to
sa_family) then construct a TFTP URI based on the server address,
port, and filename
- otherwise fail.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add definitions of protocols observed to be used by wdsmgfw.efi, and
add a handle name type for ConIn, ConOut, and StdErr.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add debug wrappers for more boot services functions, and print
symbolic values rather than raw numbers where possible.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Commit 09b057c ("[settings] Remove "uristring" setting type") removed
support for URI-encoded settings via the "uristring" setting type, on
the basis that such encoding was no longer necessary to avoid problems
with the command line parser.
Other valid use cases for the "uristring" setting type do exist: for
example, a password containing a '/' character expanded via
chain http://username:${password:uristring}@server.name/boot.php
Restore the existence of the "uristring" setting, avoiding the
potentially large stack allocations that were used in the old code
prior to commit 09b057c ("[settings] Remove "uristring" setting
type").
Requested-by: Robin Smidsrød <robin@smidsrod.no>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The current usage pattern of image_probe() is a legacy from the time
before commit 34b6ecb ("[image] Simplify image management") when
loading an image to its executable location in memory was a separate
action from actually executing the image.
Call image_probe() as soon as an image is registered. This allows
"imgstat" to display image type information for all images and allows
image-consuming code to assume that image->type is already set
correctly.
Ignore failures if image_probe() does not recognise the image, since
we do expect to handle unrecognised images (initrds, modules, etc).
Unrecognised images will be left with a NULL image->type, which
image-consuming code can easily check.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Rewrite the HTTP core to allow for the addition of arbitrary content
encoding mechanisms, such as PeerDist and gzip.
The core now exposes http_open() which can be used to create requests
with an explicitly selected HTTP method, an optional requested content
range, and an optional request body. A simple wrapper provides the
preexisting behaviour of creating either a GET request or an
application/x-www-form-urlencoded POST request (if the URI includes
parameters).
The HTTP SAN interface is now implemented using the generic block
device translator. Individual blocks are requested using http_open()
to create a range request.
Server connections are now managed via a connection pool; this allows
for multiple requests to the same server (e.g. for SAN blocks) to be
completely unaware of each other. Repeated HTTPS connections to the
same server can reuse a pooled connection, avoiding the per-connection
overhead of establishing a TLS session (which can take several seconds
if using a client certificate).
Support for HTTP SAN booting and for the Basic and Digest
authentication schemes is now optional and can be controlled via the
SANBOOT_PROTO_HTTP, HTTP_AUTH_BASIC, and HTTP_AUTH_DIGEST build
configuration options in config/general.h.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
UEFI platforms may provide a watchdog timer, which will reboot the
machine if an operating system takes more than five minutes to load.
This can cause long-lived iPXE downloads (or interactive shell
sessions) to unexpectedly reboot.
Fix by resetting the watchdog timer every ten seconds while the iPXE
main processing loop continues to run.
Reported-by: Bradley B Williams <bradleybwilliams@swbell.net>
Reported-by: John Clark <john.r.clark.3@gmail.com>
Reported-by: wdriever@gmail.com
Reported-by: Charlie Beima <cbeima@indiana.edu>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add support for SHA-224, SHA-384, and SHA-512 as digest algorithms in
X.509 certificates, and allow the choice of public-key, cipher, and
digest algorithms to be configured at build time via config/crypto.h.
Originally-implemented-by: Tufan Karadere <tufank@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Check for existence of the UART in uart_select(), not just in
uart_init(). This allows uart_select() to refuse to set a non-working
address in uart->base, which in turns means that the serial console
code will not attempt to use a non-existent UART.
Reported-by: Torgeir Wulfsberg <Torgeir.Wulfsberg@kongsberg.com>
Reported-by: Ján ONDREJ (SAL) <ondrejj@salstar.sk>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
We do not set up any kind of virtual addressing before invoking an
ELFBoot image. Reject if the image's program headers indicate that
virtual addresses are not equal to physical addresses.
This avoids problems when loading some RHEL5 kernels, which seem to
include ELFBoot headers using virtual addressing. With this change,
these kernels are no longer detected as ELFBoot, and so may be
(correctly) detected as bzImage instead.
Reported-by: Torgeir.Wulfsberg@kongsberg.com
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allow line buffer to accumulate multiple lines, with buffered_line()
returning each freshly-completed line as it is encountered. This
allows buffered lines to be subsequently processed as a group.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
VLAN and 802.11 devices use a network device operations structure that
wraps an underlying structure. For example, the vlan_operations
structure wraps the network device operations structure of the
underlying trunk device. This can cause false positives from the
current implementation of netdev_irq_supported(), which will always
report that VLAN devices support interrupts since it has no visibility
into the support provided by the underlying trunk device.
Fix by allowing network devices to explicitly flag that interrupts are
not supported, despite the presence of an irq() method.
Originally-fixed-by: Wissam Shoukair <wissams@mellanox.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Extend the IPv6 concept of "scope ID" (indicating the network device
index) to IPv4 socket addresses, so that IPv4 multicast transmissions
may specify the transmitting network device.
The scope ID is not (currently) exposed via the string representation
of the socket address, since IPv4 does not use the IPv6 concept of
link-local addresses (which could legitimately be specified in a URI).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Redefine various IPv4 address constants and testing macros to avoid
unnecessary byte swapping at runtime, and slightly rename the macros
to prevent code from accidentally using the old definitions.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
When an IPv6 socket address string specifies a link-local or multicast
address but does not specify the requisite network device name
(e.g. "fe80::69ff:fe50:5845" rather than "fe80::69ff:fe50:5845%net0"),
assume the use of "netX".
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Replace the AES implementation from AXTLS with a dedicated iPXE
implementation which is slightly smaller and around 1000% faster.
This implementation has been verified using the existing self-tests
based on the NIST AES test vectors.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Reduce the cost of implementing object methods which convey no
information beyond the fact that the method has been called.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Provide profile_custom() as a trivial wrapper around profile_update()
to allow for the use of the profiling infrastructure by code using
timers other than the default profile_timestamp() provider.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Provide an inject_corruption() function that can be used to randomly
corrupt data bytes with configurable probabilities.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Provide a generic inject_fault() function that can be used to inject
random faults with configurable probabilities.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Fix the TxBuf value filled in by GetStatus() to report the transmit
buffer address as required by the (now clarified) specification.
Simplify "interrupt" handling in GetStatus() to report only that one
or more packets have been transmitted or received; there is no need to
report one GetStatus() "interrupt" per packet.
Simplify receive handling to dequeue received packets immediately from
the network device into an internal list (thereby avoiding the hacks
previously used to determine when to report new packet arrivals).
Originally-fixed-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
We currently do not wait for a received FIN before exiting to boot a
loaded OS. In the common case of booting from an HTTP server, this
means that the TCP connection is left consuming resources on the
server side: the server will retransmit the FIN several times before
giving up.
Fix by initiating a graceful close of all TCP connections and waiting
(for up to one second) for all connections to finish closing
gracefully (i.e. for the outgoing FIN to have been sent and ACKed, and
for the incoming FIN to have been received and ACKed at least once).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The only way to map an eIPoIB MAC address (REMAC) to an IPoIB MAC
address is to intercept an incoming ARP request or reply.
If we do not have an REMAC cache entry for a particular destination
MAC address, then we cannot transmit the packet. This can arise in at
least two situations:
- An external program (e.g. a PXE NBP using the UNDI API) may attempt
to transmit to a destination MAC address that has been obtained by
some method other than ARP.
- Memory pressure may have caused REMAC cache entries to be
discarded. This is fairly likely on a busy network, since REMAC
cache entries are created for all received (broadcast) ARP
requests. (We can't sensibly avoid creating these cache entries,
since they are required in order to send an ARP reply, and when we
are being used via the UNDI API we may have no knowledge of which
IP addresses are "ours".)
Attempt to ameliorate the situation by generating a semi-spurious ARP
request whenever we find a missing REMAC cache entry. This will
hopefully trigger an ARP reply, which would then provide us with the
information required to populate the REMAC cache.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
A fairly common end-user problem is that the default configuration of
a switch may leave the port in a non-forwarding state for a
substantial length of time (tens of seconds) after link up. This can
cause iPXE to time out and give up attempting to boot.
We cannot force the switch to start forwarding packets sooner, since
any attempt to send a Spanning Tree Protocol bridge PDU may cause the
switch to disable our port (if the switch happens to have the Bridge
PDU Guard feature enabled for the port).
For non-ancient versions of the Spanning Tree Protocol, we can detect
whether or not the port is currently forwarding and use this to inform
the network device core that the link is currently blocked.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
When Spanning Tree Protocol (STP) is used, there may be a substantial
delay (tens of seconds) from the time that the link goes up to the
time that the port starts forwarding packets.
Add a generic concept of a "blocked link" (i.e. a link which is up but
which is not expected to communicate successfully), and allow "ifstat"
to indicate when a link is blocked.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Virtual functions use a mailbox to communicate with the physical
function driver: this covers functionality such as obtaining the MAC
address.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
When USB network card drivers are used, the BIOS' legacy USB
capability is necessarily disabled since there is no way to share the
host controller between the BIOS and iPXE. This currently results in
USB keyboards becoming non-functional in USB-enabled builds of iPXE.
Fix by adding basic support for USB keyboards, enabled by default in
iPXE builds which include USB support.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
When an EHCI hotplug action results in the controller disowning the
port, it will result in a hotplug action on the corresponding UHCI or
OHCI controller. Allow such hotplug actions to be carried out as part
of the same call to usb_step() or usb_register_bus(), by maintaining a
single central list of changed ports.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The USB core will currently fail to detect disconnections if a new
device has attached by the time the port is examined in
usb_hotplug().
Fix by recording the fact that a disconnection has taken place
whenever the "connection status changed" (CSC) bit is observed to be
set. (Whether the change represents a disconnection or a
reconnection, it indicates that the port has experienced some time of
being disconnected.)
Note that the time at which a disconnection can be detected varies by
hub type. In particular: root hubs can observe the CSC bit when
polling, and so will record the disconnection before calling
usb_port_changed(), but USB hubs read the port status (and hence the
CSC bit) only during the call to hub_speed(), long after the call to
usb_port_changed().
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Rename PCI_CLASS() (which constructs a struct pci_class_id) to
PCI_CLASS_ID(), and provide PCI_CLASS() as a macro which constructs
the 24-bit scalar value of a PCI class code.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The USB API currently assumes that host controllers will have
immediate data buffer space available in which to store the setup
packet. This is true for xHCI, partially true for EHCI (which happens
to have 12 bytes of padding in each transfer descriptor due to
alignment requirements), and not true at all for UHCI.
Include the setup packet within the I/O buffer passed to the host
controller's message() method, thereby eliminating the requirement for
host controllers to provide immediate data buffers.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The current API for Base16 (and Base64) encoding requires the caller
to always provide sufficient buffer space. This prevents the use of
the generic encoding/decoding functionality in some situations, such
as in formatting the hex setting types.
Implement a generic hex_encode() (based on the existing
format_hex_setting()), implement base16_encode() and base16_decode()
in terms of the more generic hex_encode() and hex_decode(), and update
all callers to provide the additional buffer length parameter.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Entropy gathering via timer ticks is slow under UEFI (of the order of
20-30 seconds on some machines). Use the EFI_RNG_PROTOCOL if
available, to speed up the process of entropy gathering.
Note that some implementations (including EDK2) will fail if we
request fewer than 32 random bytes at a time, and that the RNG
protocol provides no guarantees about the amount of entropy provided
by a call to GetRNG(). We take the (hopefully pessimistic) view that
a 32-byte block returned by GetRNG() will contain at least the 1.3
bits of entropy claimed by min_entropy_per_sample().
Signed-off-by: Michael Brown <mcb30@ipxe.org>
SHA-512/224 is almost identical to SHA-512, with differing initial
hash values and a truncated output length.
This implementation has been verified using the NIST SHA-512/224 test
vectors.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
SHA-512/256 is almost identical to SHA-512, with differing initial
hash values and a truncated output length.
This implementation has been verified using the NIST SHA-512/256 test
vectors.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
SHA-384 is almost identical to SHA-512, with differing initial hash
values and a truncated output length.
This implementation has been verified using the NIST SHA-384 test
vectors.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
SHA-224 is almost identical to SHA-256, with differing initial hash
values and a truncated output length.
This implementation has been verified using the NIST SHA-224 test
vectors.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
xHCI provides a somewhat convoluted mechanism for specifying details
of a transaction translator. Hubs must be marked as such in the
device slot context. The only opportunity to do so is as part of a
Configure Endpoint command, which can be executed only when opening
the hub's interrupt endpoint.
We add a mechanism for host controllers to intercept the opening of
hub devices, providing xHCI with an opportunity to update the internal
device slot structure for the corresponding USB device to indicate
that the device is a hub. We then include the hub-specific details in
the input context whenever any Configure Endpoint command is issued.
When a device is opened, we record the device slot and port for its
transaction translator (if any), and supply these as part of the
Address Device command.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The current endpoint reset logic defers the reset until the caller
attempts to enqueue a new transfer to that endpoint. This is
insufficient when dealing with endpoints behind a transaction
translator, since the transaction translator is a resource shared
between multiple endpoints.
We cannot reset the endpoint as part of the completion handling, since
that would introduce recursive calls to usb_poll(). Instead, we
add the endpoint to a list of halted endpoints, and perform the reset
on the next call to usb_step().
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Several of the USB timeouts were chosen on the principle of "pick an
arbitrary but ridiculously large value, just to be safe". It turns
out that some of the timeouts permitted by the USB specification are
even larger: for example, control transactions are allowed to take up
to five seconds to complete.
Fix up these USB timeout values to match those found in the USB2
specification.
Debugged-by: Robin Smidsrød <robin@smidsrod.no>
Tested-by: Robin Smidsrød <robin@smidsrod.no>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The TCP Selective Acknowledgement option (specified in RFC2018)
provides a mechanism for the receiver to indicate packets that have
been received out of order (e.g. due to earlier dropped packets).
iPXE often operates in environments in which there is a high
probability of packet loss. For example, the legacy USB keyboard
emulation in some BIOSes involves polling the USB bus from within a
system management interrupt: this introduces an invisible delay of
around 500us which is long enough for around 40 full-length packets to
be dropped. Similarly, almost all 1Gbps USB2 devices will eventually
end up dropping packets because the USB2 bus does not provide enough
bandwidth to sustain a 1Gbps stream, and most devices will not provide
enough internal buffering to hold a full TCP window's worth of
received packets.
Add support for sending TCP Selective Acknowledgements. This provides
the sender with more detailed information about which packets have
been lost, and so allows for a more efficient retransmission strategy.
We include a SACK-permitted option in our SYN packet, since
experimentation shows that at least Linux peers will not include a
SACK-permitted option in the SYN-ACK packet if one was not present in
the initial SYN. (RFC2018 does not seem to mandate this behaviour,
but it is consistent with the approach taken in RFC1323.) We ignore
any received SACK options; this is safe to do since SACK is only ever
advisory and we never have to send non-trivial amounts of data.
Since our TCP receive queue is a candidate for cache discarding under
low memory conditions, we may end up discarding data that has been
reported as received via a SACK option. This is permitted by RFC2018.
We follow the stricture that SACK blocks must not report data which is
no longer held by the receiver: previously-reported blocks are
validated against the current receive queue before being included
within the current SACK block list.
Experiments in a qemu VM using forced packet drops (by setting
NETDEV_DISCARD_RATE to 32) show that implementing SACK improves
throughput by around 400%.
Experiments with a USB2 NIC (an SMSC7500) show that implementing SACK
improves throughput by around 700%, increasing the download rate from
35Mbps up to 250Mbps (which is approximately the usable bandwidth
limit for USB2).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
This driver is functional but any downloads via a TCP-based protocol
tend to perform poorly. The 1Gbps Ethernet line rate is substantially
higher than the 480Mbps (in practice around 280Mbps) provided by USB2,
and the device has only 32kB of internal buffer memory. Our 256kB TCP
receive window therefore rapidly overflows the RX FIFO, leading to
multiple dropped packets (usually within the same TCP window) and
hence a low overall throughput.
Reducing the TCP window size so that the RX FIFO does not overflow
greatly increases throughput, but is not a general-purpose solution.
Further investigation is required to determine how other OSes
(e.g. Linux) cope with this scenario. It is possible that
implementing TCP SACK would provide some benefit.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Most devices expose at least the link up/down status via a bit in a
MAC register, since the MAC generally already needs to know whether or
not the link is up. Some devices (e.g. the SMSC75xx USB NIC) expose
this information to software only via the MII registers.
Provide a generic mii_check_link() implementation to check the BMSR
and report the link status via netdev_link_{up,down}().
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Relicense files with kind permission from
Stefan Hajnoczi <stefanha@redhat.com>
alongside the contributors who have already granted such relicensing
permission.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Rewrite (and relicense) the header files which are included in all
builds of iPXE (including non-Linux builds).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The code in list.h was originally taken from the Linux kernel many
years ago, but has been rewritten to the point that no original code
remains, and may therefore be relicensed.
The functions and data structures remain largely API-compatible, to
facilitate the conversion of Linux network drivers to iPXE.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
These files cannot be automatically relicensed by util/relicense.pl
since they either contain unusual but trivial contributions (such as
the addition of __nonnull function attributes), or contain lines
dating back to the initial git revision (and so require manual
knowledge of the code's origin).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Relicence files with kind permission from the following contributors:
Alex Williamson <alex.williamson@redhat.com>
Eduardo Habkost <ehabkost@redhat.com>
Greg Jednaszewski <jednaszewski@gmail.com>
H. Peter Anvin <hpa@zytor.com>
Marin Hannache <git@mareo.fr>
Robin Smidsrød <robin@smidsrod.no>
Shao Miller <sha0.miller@gmail.com>
Thomas Horsten <thomas@horsten.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
iPXE uses DHCP timeouts loosely based on values recommended by the
specification, but often abbreviated to reduce timeouts for reliable
and/or simple network topologies. Extract the DHCP timing parameters
to config/dhcp.h and document them. The resulting default iPXE
behavior is exactly the same, but downstreams are now afforded the
opportunity to implement spec-compliant behavior via config file
overrides.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The implementation of strtoul() has a partially unknown provenance.
Rewrite this code to avoid potential licensing uncertainty.
Since we now use -ffunction-sections, there is no need to place
strtoull() in a separate file from strtoul().
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Provide a generic framework for allocating, refilling, and optionally
recycling I/O buffers used by bulk IN and interrupt endpoints.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Commit a60f2dd ("[usb] Try multiple USB device configurations")
changed the behaviour of register_usb() such that if no drivers are
found then the device will be closed and the memory used will be
freed.
If a port status change subsequently occurs while the device is still
physically attached, then usb_hotplug() will see this as a new device
having been attached, since there is no device recorded as being
currently attached to the port. This can lead to spurious hotplug
events (or even endless loops of hotplug events, if the process of
opening and closing the device happens to generate a port status
change).
Fix by using a separate flag to indicate that a device is physically
attached (even if we have no corresponding struct usb_device).
Reported-by: Dan Ellis <Dan.Ellis@displaylink.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some USB endpoints require that a short packet be used to terminate
transfers, since they have no other way to determine message
boundaries. If the message length happens to be an exact multiple of
the USB packet size, then this requires the use of an additional
zero-length packet.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
USB Communications Device Class devices may use a union functional
descriptor to group several interfaces into a function.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Iterate over a USB device's available configurations until we find one
for which we have working drivers.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allow drivers to specify a supported PCI class code. To save space in
the final binary, make this an attribute of the driver rather than an
attribute of a PCI device ID list entry.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
We require the ability to disconnect from and reconnect to VMBus; if
we don't have this then there is no (viable) way for a loaded
operating system to continue to use any VMBus devices. (There is also
a small but non-zero risk that the host will continue to write to our
interrupt and monitor pages, since the VMBUS_UNLOAD message in earlier
versions is essentially a no-op.)
This requires us to ensure that the host supports protocol version 3.0
(VMBUS_VERSION_WIN8_1). However, we can't actually _use_ protocol
version 3.0, since doing so causes an iSCSI-booted Windows Server 2012
R2 VM to crash due to a NULL pointer dereference in vmbus.sys.
To work around this problem, we first ensure that we can connect using
protocol v3.0, then disconnect and reconnect using the oldest known
protocol.
This deliberately prevents the use of the iPXE native Hyper-V drivers
on older versions of Hyper-V, where we could use our drivers but in so
doing would break the loaded operating system.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Windows Server 2012 R2 generates an RNDIS_INDICATE_STATUS_MSG with a
status code of 0x4002006. This status code does not appear to be
documented anywhere within the sphere of human knowledge.
Explicitly ignore this status code in order to avoid unnecessarily
cluttering the display when RNDIS debugging is enabled.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The (undocumented) VMBus protocol seems to allow for transfer
page-based packets where the data payload is split into an arbitrary
set of ranges within the transfer page set.
The RNDIS protocol includes a length field within the header of each
message, and it is known from observation that multiple RNDIS messages
can be concatenated into a single VMBus message.
iPXE currently assumes that the transfer page range boundaries are
entirely arbitrary, and uses the RNDIS header length to determine the
RNDIS message boundaries.
Windows Server 2012 R2 generates an RNDIS_INDICATE_STATUS_MSG for an
undocumented and unknown status code (0x40020006) with a malformed
RNDIS header length: the length does not cover the StatusBuffer
portion of the message. This causes iPXE to report a malformed RNDIS
message and to discard any further RNDIS messages within the same
VMBus message.
The Linux Hyper-V driver assumes that the transfer page range
boundaries correspond to RNDIS message boundaries, and so does not
notice the malformed length field in the RNDIS header.
Match the behaviour of the Linux Hyper-V driver: assume that the
transfer page range boundaries correspond to the RNDIS message
boundaries and ignore the RNDIS header length. This avoids triggering
the "malformed packet" error and also avoids unnecessary data copying:
since we now have one I/O buffer per RNDIS message, there is no longer
any need to use iob_split().
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The Hyper-V RNDIS implementation on Windows Server 2012 R2 requires
that we send an explicit RNDIS initialisation message in order to get
a working RX datapath.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
RNDIS devices may provide multiple packets encapsulated into a single
message. Provide an API to allow the RNDIS driver to split an I/O
buffer into smaller portions.
The current implementation will always copy the underlying data,
rather than splitting the buffer in situ.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add the "-c <count>" option to the "ping" command, allowing for
automatic termination after a specified number of packets.
When a number of packets is specified:
- if a serious error (i.e. length mismatch or content mismatch)
occurs, then the ping will be immediately terminated with the relevant
status code;
- if at least one response is received successfully, and all errors
are non-serious (i.e. timeouts or out-of-sequence responses), then
the ping will be terminated after the final response (or timeout)
with a success status;
- if no responses are received successfully, then the ping will be
terminated after the final timeout with ETIMEDOUT.
If no number of packets is specified, then the ping will continue
until manually interrupted.
Originally-implemented-by: Cedric Levasseur <cyr-ius@ipocus.net>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some UEFI network drivers provide a software UNDI interface which is
exposed via the Network Interface Identifier Protocol (NII), rather
than providing a Simple Network Protocol (SNP).
The UEFI platform firmware will usually include the SnpDxe driver,
which attaches to NII and provides an SNP interface. The SNP
interface is usually provided on the same handle as the underlying NII
device. This causes problems for our EFI driver model: when
efi_driver_connect() detaches existing drivers from the handle it will
cause the SNP interface to be uninstalled, and so our SNP driver will
not be able to attach to the handle. The platform firmware will
eventually reattach the SnpDxe driver and may attach us to the SNP
handle, but we have no way to prevent other drivers from attaching
first.
Fix by providing a driver which can attach directly to the NII
protocol, using the software UNDI interface to drive the network
device.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Under UEFI, reads from PCI configuration space may fail. If this
happens, we should return all-ones (which will mimic the behaviour of
an absent PCI device).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Propagate our modified EFI system table to any images loaded by the
image that we wrap, thereby allowing us to observe boot services calls
made by all subsequent EFI images.
Also show details of intercepted ExitBootServices() calls. When
wrapping is used, exiting boot services will almost certainly fail,
but this at least allows us to see when it happens.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Under some circumstances (e.g. if iPXE itself is booted via iSCSI, or
after an unclean reboot), the backend may not be in the expected
InitWait state when iPXE starts up.
There is no generic reset mechanism for Xenbus devices. Recent
versions of xen-netback will gracefully perform all of the required
steps if the frontend sets its state to Initialising. Older versions
(such as that found in XenServer 6.2.0) require the frontend to
transition through Closed before reaching Initialising.
Add a reset mechanism for netfront devices which does the following:
- read current backend state
- if backend state is anything other than InitWait, then set the
frontend state to Closed and wait for the backend to also reach
Closed
- set the frontend state to Initialising and wait for the backend to
reach InitWait.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Using version 1 grant tables limits guests to using 16TB of grantable
RAM, and prevents the use of subpage grants. Some versions of the Xen
hypervisor refuse to allow the grant table version to be set after the
first grant references have been created, so the loaded operating
system may be stuck with whatever choice we make here. We therefore
currently use version 2 grant tables, since they give the most
flexibility to the loaded OS.
Current versions (7.2.0) of the Windows PV drivers have no support for
version 2 grant tables, and will merrily create version 1 entries in
what the hypervisor believes to be a version 2 table. This causes
some confusion.
Avoid this problem by attempting to use version 1 tables, since
otherwise we may render Windows unable to boot.
Play nicely with other potential bootloaders by accepting either
version 1 or version 2 grant tables (if we are unable to set our
requested version).
Note that the use of version 1 tables on a 64-bit system introduces a
possible failure path in which a frame number cannot fit into the
32-bit field within the v1 structure. This in turn introduces
additional failure paths into netfront_transmit() and
netfront_refill_rx().
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some EFI 1.10 systems (observed on an Apple iMac) do not allow us to
open the device path protocol with an attribute of
EFI_OPEN_PROTOCOL_BY_DRIVER and so we cannot maintain a safe,
long-lived pointer to the device path. Work around this by instead
opening the device path protocol with an attribute of
EFI_OPEN_PROTOCOL_GET_PROTOCOL whenever we need to use it.
Debugged-by: Curtis Larsen <larsen@dixie.edu>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
efi_file_install() and efi_download_install() are both used to install
onto existing handles. There is therefore no need to allow for each
of their calls to InstallMultipleProtocolInterfaces() to create a new
handle.
By passing the handle directly (rather than a pointer to the handle),
we avoid potential confusion (and erroneous debug message colours).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The EFI headers define EFI_HANDLE as a void pointer, which renders
type checking on anything dealing with EFI handles somewhat useless.
Work around this bizarre sabotage attempt by redefining EFI_HANDLE as
a pointer to an anonymous structure.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Provide a function efi_handle_name() (as a generalisation of
efi_handle_devpath_text()) which tries various methods to produce a
human-readable name for an EFI handle.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Reject network devices which appear to be duplicates of those already
available via a different underlying hardware device. On a Xen PV-HVM
system, this allows us to filter out the emulated PCI NICs (which
would otherwise appear alongside the netfront NICs).
Note that we cannot use the Xen facility to "unplug" the emulated PCI
NICs, since there is no guarantee that the OS we subsequently load
will have a native netfront driver.
We permit devices with the same MAC address if they are attached to
the same underlying hardware device (e.g. VLAN devices).
Inspired-by: Marin Hannache <git@mareo.fr>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
We currently treat network devices as available for use via the SNP
API only if RX queue processing has been frozen. (This is similar in
spirit to the way that RX queue processing is frozen for the network
device currently exposed via the PXE API.)
The default state of a freshly created network device is for the RX
queue to not be frozen, and thus to be unavailable for use via SNP.
This causes problems when devices are added through code paths other
than _efidrv_start() (which explicitly releases devices for use via
SNP).
We don't actually need to freeze RX queue processing, since calls via
the SNP API will always use netdev_poll() rather than net_poll(), and
so will never trigger the RX queue processing code path anyway.
We can therefore simplify the code to use a single global flag to
indicate whether network devices are claimed for use by iPXE or
available for use via SNP. Using a global flag allows the default
state for dynamically created network devices to behave sensibly.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add basic support for Xen PV-HVM domains (detected via the Xen
platform PCI device with IDs 5853:0001), including support for
accessing configuration via XenStore and enumerating devices via
XenBus.
Signed-off-by: Michael Brown <mcb30@ipxe.org>