The Windows drivers for VMBus devices are enumerated using the
instance UUID rather than the channel number. Include the instance
UUID within the iPXE device name to allow an iPXE network device to be
more easily associated with the corresponding Windows network device
when debugging.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Select the IPv6 source address and corresponding router (if any) using
a very simplified version of the algorithm from RFC6724:
- Ignore any source address that has a smaller scope than the
destination address. For example, do not use a link-local source
address when sending to a global destination address.
- If we have a source address which is on the same link as the
destination address, then use that source address.
- If we are left with multiple possible source addresses, then choose
the address with the smallest scope. For example, if we are sending
to a site-local destination address and we have both a global source
address and a site-local source address, then use the site-local
source address.
- If we are still left with multiple possible source addresses, then
choose the address with the longest matching prefix.
For the purposes of this algorithm, we treat RFC4193 Unique Local
Addresses as having organisation-local scope. Since we use only
link-local scope for our multicast transmissions, this approximation
should remain valid in all practical situations.
Originally-implemented-by: Thomas Bächler <thomas@archlinux.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Use the IPv6 settings to construct the routing table, in a matter
analogous to the construction of the IPv4 routing table.
This allows for manual assignment of IPv6 addresses via e.g.
set net0/ip6 2001:ba8:0:1d4::6950:5845
set net0/len6 64
set net0/gateway6 fe80::226:bff:fedd:d3c0
The prefix length ("len6") may be omitted, in which case a default
prefix length of 64 will be assumed.
Multiple IPv6 addresses may be assigned manually by implicitly
creating child settings blocks. For example:
set net0/ip6 2001:ba8:0:1d4::6950:5845
set net0.ula/ip6 fda4:2496:e992::6950:5845
Signed-off-by: Michael Brown <mcb30@ipxe.org>
A reasonable user expectation is that ${net0/ip6} should show the
"highest-priority" of the IPv6 addresses, even when multiple IPv6
addresses are active. The expected order of priority is likely to be
manually-assigned addresses first, then stateful DHCPv6 addresses,
then SLAAC addresses, and lastly link-local addresses.
Using ${priority} to enforce an ordering is undesirable since that
would affect the priority assigned to each of the net<N> blocks as a
whole, so use the sibling ordering capability instead.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allow settings blocks to provide an explicit default ordering between
siblings, with lower precedence than the existing ${priority} setting.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Originally-implemented-by: Hannes Reinecke <hare@suse.de>
Originally-implemented-by: Marin Hannache <git@mareo.fr>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Expose the IPv6 address (or prefix) as ${ip6}, the prefix length as
${len6}, and the router address as ${gateway6}.
Originally-implemented-by: Hannes Reinecke <hare@suse.de>
Originally-implemented-by: Marin Hannache <git@mareo.fr>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The settings scope ipv6_scope refers specifically to IPv6 settings
that have a corresponding DHCPv6 option. Rename to dhcpv6_scope to
more accurately reflect this purpose.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
In edk2, there are several drivers that associate HII forms (and
corresponding config access protocol instances) with each individual
network device. (In this context, "network device" means the EFI
handle on which the SNP protocol is installed, and on which the device
path ending with the MAC() node is installed also.) Such edk2 drivers
are, for example: Ip4Dxe, HttpBootDxe, VlanConfigDxe.
In UEFI, any given handle can carry at most one instance of a specific
protocol (see e.g. the specification of the InstallProtocolInterface()
boot service). This implies that the class of drivers mentioned above
can't install their EFI_HII_CONFIG_ACCESS_PROTOCOL instances on the
SNP handle directly -- they would conflict with each other.
Accordingly, each of those edk2 drivers creates a "private" child
handle under the SNP handle, and installs its config access protocol
(and corresponding HII package list) on its child handle.
The device path for the child handle is traditionally derived by
appending a Hardware Vendor Device Path node after the MAC() node.
The VenHw() nodes in question consist of a GUID (by definition), and
no trailing data (by choice). The purpose of these VenHw() nodes is
only that all the child nodes can be uniquely identified by device
path.
At the moment iPXE does not follow this pattern. It doesn't run into
a conflict when it installs its EFI_HII_CONFIG_ACCESS_PROTOCOL
directly on the SNP handle, but that's only because iPXE is the sole
driver not following the pattern. This behavior seems risky (one
might call it a "latent bug"); better align iPXE with the edk2 custom.
Cc: Michael Brown <mcb30@ipxe.org>
Cc: Gary Lin <glin@suse.com>
Cc: Ladi Prosek <lprosek@redhat.com>
Ref: http://thread.gmane.org/gmane.comp.bios.edk2.devel/13494/focus=13532
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Ladi Prosek <lprosek@redhat.com>
Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
As with assertions, profiling is enabled for objects built with any
debug level (including an explicit debug level of zero).
Allow profiling to be globally enabled or disabled by adding PROFILE=1
or PROFILE=0 respectively to the build command line.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The vendor class identifier strings in DHCP_ARCH_VENDOR_CLASS_ID are
out of sync with the (correct) client architecture values in
DHCP_ARCH_CLIENT_ARCHITECTURE.
Fix by removing all definitions of DHCP_ARCH_VENDOR_CLASS_ID, and
instead generating the vendor class identifier string automatically
based on DHCP_ARCH_CLIENT_ARCHITECTURE and DHCP_ARCH_CLIENT_NDI.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
RFC3315 defines DHCPv6 option 16 (vendor class identifier) but does
not define any direct relationship with the roughly equivalent DHCPv4
option 60.
The PXE specification predates IPv6, and the UEFI specification is
expectedly vague on the subject. Examination of the reference EDK2
codebase suggests that the DHCPv6 vendor class identifier will be
formatted in accordance with RFC3315, using a single vendor-class-data
item in which the opaque-data field is the string as would appear in
DHCPv4 option 60.
RFC3315 requires the vendor class identifier to specify an IANA
enterprise number, as a way of disambiguating the vendor-class-data
namespace. The EDK2 code uses the value 343, described as:
// TODO: IANA TBD: temporarily using Intel's
Since this "TODO" has been present since at least 2010, it is probably
safe to assume that it has now become a de facto standard.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
RFC5970 defines DHCPv6 options 61 (client system architecture type)
and 62 (client network interface identifier), with contents equivalent
to DHCPv4 options 93 and 94 respectively.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some of the regions may end up being unmapped, either because they are
optional or because the attempt to map them has failed. Region types
starting at 0 didn't make it easy to test for this condition.
This commit bumps all valid region types up by 1 with 0 having the
implicit 'unmapped' meaning.
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Provide a mechanism to allow an arbitrary adjustment to be applied to
all subsequent calls to time().
Note that the underlying clock source (e.g. the RTC clock) will not be
changed; only the time as reported within iPXE will be affected.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
In some circumstances, intermediate devices may lose state in a way
that temporarily prevents the successful delivery of packets from a
TCP peer. For example, a firewall may drop a NAT forwarding table
entry.
Since iPXE spends most of its time downloading files (and hence purely
receiving data, sending only TCP ACKs), this can easily happen in a
situation in which there is no reason for iPXE's TCP stack to generate
any retransmissions. The temporary loss of connectivity can therefore
effectively become permanent.
Work around this problem by sending TCP keepalives after a period of
inactivity on an established connection.
TCP keepalives usually send a single garbage byte in sequence number
space that has already been ACKed by the peer. Since we do not need
to elicit a response from the peer, we instead send pure ACKs (with no
garbage data) in order to keep the transmit code path simple.
Originally-implemented-by: Ladi Prosek <lprosek@redhat.com>
Debugged-by: Ladi Prosek <lprosek@redhat.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Extend the 16-bit PCI bus:dev.fn address to a 32-bit seg🚌dev.fn
address, assuming a segment value of zero in contexts where multiple
segments are unsupported by the underlying data structures (e.g. in
the iBFT or BOFM tables).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Mac OS X uses non-standard EFI protocols to obtain the DHCP packets
from the UEFI firmware.
Originally-implemented-by: Michael Kuron <m.kuron@gmx.de>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
There has been a longstanding disagreement between RFC4578 and the
IANA "Processor Architecture Types" registry. RFC4578 section 2.1
defines type 7 as "EFI BC" and type 9 as "EFI x86-64"; the IANA
registry quotes RFC4578 as its source but has these values erroneously
swapped. The EDK2 codebase uses the IANA values.
As of March 2016, RFC4578 has been modified by an errata to match the
values as recorded in the IANA registry.
Fix our definitions to match the consensus values.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
We currently use the EFI_CPU_ARCH_PROTOCOL's GetTimerValue() method to
generate the currticks() timer, calibrated against a 1ms delay from
the boot services Stall() method.
This does not work on ARM platforms, where GetTimerValue() is an empty
stub which just returns EFI_UNSUPPORTED.
Fix by instead creating a periodic timer event, and using this event
to increment a current tick counter.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Require architecture-specific code to make a deliberate choice to use
the unoptimised generic_tcpip_continue_chksum() function, if there is
no optimised version available.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
This commit adds support for driving virtio 1.0 PCI devices. In
addition to various helpers, a number of vpm_ functions are introduced
to be used instead of their legacy vp_ counterparts when accessing
virtio 1.0 (aka modern) devices.
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Virtio 1.0 introduces new constants and data structures, common to all
devices as well as specific to virtio-net. This commit adds a subset
of these to be able to drive the virtio-net 1.0 network device.
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
PCI devices may support more capabilities of the same type (for
example PCI_CAP_ID_VNDR) and there was no way to discover all of them.
This commit adds a new API pci_find_next_capability which provides
this functionality. It would typically be used like so:
for (pos = pci_find_capability(pci, PCI_CAP_ID_VNDR);
pos > 0;
pos = pci_find_next_capability(pci, pos, PCI_CAP_ID_VNDR)) {
...
}
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The DHCP option 175.189 has been defined (by us) since 2006 as
containing the drive number to be used for a SAN boot, but has never
been automatically used as such by iPXE.
Use this option (if specified) to override the default SAN drive
number.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Interpret the maximum drive number (0xff for hard disks, 0x7f for
floppy disks) as meaning "use natural drive number".
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Provide access to local files via the "file://" URI scheme. There are
three syntaxes:
- An opaque URI with a relative path (e.g. "file:script.ipxe").
This will be interpreted as a path relative to the iPXE binary.
- A hierarchical URI with a non-network absolute path
(e.g. "file:/boot/script.ipxe"). This will be interpreted as a
path relative to the root of the filesystem from which the iPXE
binary was loaded.
- A hierarchical URI with a network path in which the authority is a
volume label (e.g. "file://bootdisk/script.ipxe"). This will be
interpreted as a path relative to the root of the filesystem with
the specified volume label.
Note that the potentially desirable shell mappings (e.g. "fs0:" and
"blk0:") are concepts internal to the UEFI shell binary, and do not
seem to be exposed in any way to external executables. The old
EFI_SHELL_PROTOCOL (which did provide access to these mappings) is no
longer installed by current versions of the UEFI shell.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
There is no practical way to generate an underlength ARP packet since
an ARP packet is always padded up to the minimum Ethernet frame length
(or dropped by the receiving Ethernet hardware if incorrectly padded),
but the absence of an explicit check causes warnings from some
analysis tools.
Fix by adding an explicit check on the I/O buffer length.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The assumption in asn1_type() that an ASN.1 cursor will always contain
a type byte is incorrect. A cursor that has been cleanly invalidated
via asn1_invalidate_cursor() will contain a type byte, but there are
other ways in which to arrive at a zero-length cursor.
Fix by explicitly checking the cursor length in asn1_type(). This
allows asn1_invalidate_cursor() to be reduced to simply zeroing the
length field.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some EoIB implementations utilise an EoIB-to-Ethernet gateway device
that does not perform a FullMember join to the multicast group for the
EoIB broadcast domain. This has various exciting side-effects, such
as requiring every EoIB node to send every broadcast packet twice.
As an added bonus, the gateway may also break the EoIB MAC address to
GID mapping protocol by sending Ethernet-sourced packets from the
wrong QPN.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some EoIB implementations require each individual EoIB node to create
the multicast group for the EoIB broadcast domain.
It is left as an exercise for the interested reader to determine how
such an implementation might ever allow the parameters of such a
multicast group to be changed without requiring a simultaneous upgrade
of every driver on every operating system on every machine currently
attached to the fabric.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
EoIB is a fairly simple protocol in which raw Ethernet frames
(excluding the CRC) are encapsulated within Infiniband Unreliable
Datagrams, with a four-byte fixed EoIB header (which conveys no actual
information). The Ethernet broadcast domain is provided by a
multicast group, similar to the IPoIB IPv4 multicast group.
The mapping from Ethernet MAC addresses to Infiniband address vectors
is achieved by snooping incoming traffic and building a peer cache
which can then be used to map a MAC address into a port GID. The
address vector is completed using a path record lookup, as for IPoIB.
Note that this requires every packet to include a GRH.
Add basic support for EoIB devices. This driver is substantially
derived from the IPoIB driver. There is currently no mechanism for
automatically creating EoIB devices.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Commit e62e52b ("[ipoib] Simplify test for received broadcast
packets") relies upon the multicast LID being present in the
destination address vector as passed to ipoib_complete_recv().
Unfortunately, this information is not present in many Infiniband
devices' completion queue entries.
Fix by testing instead for the presence of a multicast GID.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Record the speed of a USB device based on the port's speed at the time
that the device was enabled. This allows us to remember the device's
speed even after the device has been disconnected (and so the port's
current speed has changed).
In particular, this allows us to correctly identify the transaction
translator for a low-speed or full-speed device after the device has
been disconnected.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Provide BIT_QWORD_PTR() to allow for easy extraction of non-endian
fields (e.g. Infiniband GUIDs) without unnecessary byte swapping.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Tested using QEMU and usbredir to expose the LAN9512 chip present on a
Raspberry Pi.
There is a known issue with the LAN9512: an extra two bytes are
appended to every transmitted packet. These two bytes comprise:
{ 0x00, 0x08 } if packet length == 0 (mod 8)
{ CRC[0], 0x00 } if packet length == 7 (mod 8)
{ CRC[0], CRC[1] } otherwise
The extra bytes are appended whether the Ethernet CRC is generated
manually or added automatically by the hardware. The issue occurs
with the Linux kernel driver as well as the iPXE driver. It appears
to be an undocumented hardware errata.
TCP/IP traffic is not affected, since the IP header length field
causes the extraneous bytes to be discarded by the receiver. However,
protocols that rely on the length of the Ethernet frame (such as FCoE
or iPXE's "lotest" protocol) will be unusable on this hardware.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allow the UEFI platform firmware to provide drivers for unrecognised
devices, by exposing our own implementation of EFI_USB_IO_PROTOCOL.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Make the class ID a property of the USB driver (rather than a property
of the USB device ID), and allow USB drivers to specify a wildcard ID
for any of the three component IDs (class, subclass, or protocol).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Generate a score for each possible USB device configuration based on
the available driver support, and select the configuration with the
highest score. This will allow us to prefer ECM over RNDIS (for
devices which support both) and will allow us to meaningfully select a
configuration even when we have drivers available for all functions
(e.g. when exposing unused functions via EFI_USB_IO_PROTOCOL).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The decision on whether or not a zero-length packet needs to be
transmitted is independent of the host controller and belongs in the
USB core.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
TCP/IP checksum fields are one's complement values and therefore have
two possible representations of zero: positive zero (0x0000) and
negative zero (0xffff).
In RFC768, UDP over IPv4 exploits this redundancy to repurpose the
positive representation of zero (0x0000) to mean "no checksum
calculated"; checksums are optional for UDP over IPv4.
In RFC2460, checksums are made mandatory for UDP over IPv4. The
wording of the RFC is such that the UDP header is mandated to use only
the negative representation of zero (0xffff), rather than simply
requiring the checksum to be correct but allowing for either
representation of zero to be used.
In RFC1071, an example algorithm is given for calculating the TCP/IP
checksum. This algorithm happens to produce only the positive
representation of zero (0x0000); this is an artifact of the way that
unsigned arithmetic is used to calculate a signed one's complement
sum (and its final negation).
A common misconception has developed (exemplified in RFC1624) that
this artifact is part of the specification. Many people have assumed
that the checksum field should never contain the negative
representation of zero (0xffff).
A sensible receiver will calculate the checksum over the whole packet
and verify that the result is zero (in whichever representation of
zero happens to be generated by the receiver's algorithm). Such a
receiver will not care which representation of zero happens to be used
in the checksum field.
However, there are receivers in existence which will verify the
received checksum the hard way: by calculating the checksum over the
remainder of the packet and comparing the result against the checksum
field. If the representation of zero used by the receiver's algorithm
does not match the representation of zero used by the transmitter (and
so placed in the checksum field), and if the receiver does not
explicitly allow for both representations to compare as equal, then
the receiver may reject packets with a valid checksum.
For UDP, the combined RFCs effectively mandate that we should generate
only the negative representation of zero in the checksum field.
For IP, TCP and ICMP, the RFCs do not mandate which representation of
zero should be used, but the misconceptions which have grown up around
RFC1071 and RFC1624 suggest that it would be least surprising to
generate only the positive representation of zero in the checksum
field.
Fix by ensuring that all of our checksum algorithms generate only the
positive representation of zero, and explicitly inverting this in the
case of transmitted UDP packets.
Reported-by: Wissam Shoukair <wissams@mellanox.com>
Tested-by: Wissam Shoukair <wissams@mellanox.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allow iPXE to coexist with other USB device drivers, by attaching to
the EFI_USB_IO_PROTOCOL instances provided by the UEFI platform
firmware.
The EFI_USB_IO_PROTOCOL is an unsurprisingly badly designed
abstraction of a USB device. The poor design choices intrinsic in the
UEFI specification prevent efficient operation as a network device,
with the result that devices operated using the EFI_USB_IO_PROTOCOL
operate approximately two orders of magnitude slower than devices
operated using our native EHCI or xHCI host controller drivers.
Since the performance is so abysmally slow, and since the underlying
problems are due to fundamental architectural mistakes in the UEFI
specification, support for the EFI_USB_IO_PROTOCOL host controller
driver is left as disabled by default. Users are advised to use the
native iPXE host controller drivers instead.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Many UEFI NBPs expect to find an EFI_PXE_BASE_CODE_PROTOCOL installed
in addition to the EFI_SIMPLE_NETWORK_PROTOCOL. Most NBPs use the
EFI_PXE_BASE_CODE_PROTOCOL only to retrieve the cached DHCP packets.
This implementation has been tested with grub.efi, shim.efi,
syslinux.efi, and wdsmgfw.efi. Some methods (such as Discover() and
Arp()) are not used by any known NBP and so have not (yet) been
implemented.
Usage notes for the tested bootstraps are:
- grub.efi uses EFI_PXE_BASE_CODE_PROTOCOL only to retrieve the
cached DHCP packet, and uses no other methods.
- shim.efi uses EFI_PXE_BASE_CODE_PROTOCOL to retrieve the cached
DHCP packet and to retrieve the next NBP via the Mtftp() method.
If shim.efi was downloaded via HTTP (or other non-TFTP protocol)
then shim.efi will blindly call Mtftp() with an HTTP URI as the
filename: this allows the next NBP (e.g. grubx64.efi) to also be
transparently retrieved by HTTP.
shim.efi can also use the EFI_SIMPLE_FILE_SYSTEM_PROTOCOL to
retrieve files previously loaded by "imgfetch" or similar commands
in iPXE. The current implementation of shim.efi will use the
EFI_SIMPLE_FILE_SYSTEM_PROTOCOL only if it does not find an
EFI_PXE_BASE_CODE_PROTOCOL; this patch therefore prevents this
usage of our EFI_SIMPLE_FILE_SYSTEM_PROTOCOL. This logic could be
trivially reversed in shim.efi if needed.
- syslinux.efi uses EFI_PXE_BASE_CODE_PROTOCOL only to retrieve the
cached DHCP packet. Versions 6.03 and earlier have a bug which
may cause syslinux.efi to attach to the wrong NIC if there are
multiple NICs in the system (or if the UEFI firmware supports
IPv6).
- wdsmgfw.efi (ab)uses EFI_PXE_BASE_CODE_PROTOCOL to retrieve the
cached DHCP packets, and to send and retrieve UDP packets via the
UdpWrite() and UdpRead() methods. (This was presumably done in
order to minimise the amount of benefit obtainable by switching to
UEFI, by replicating all of the design mistakes present in the
original PXE specification.)
The EFI_DOWNGRADE_UX configuration option remains available for now,
until this implementation has received more widespread testing.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Merge the functionality of parse_next_server_and_filename() and
tftp_uri() into a single pxe_uri(), which takes a server address
(IPv4/IPv6/none) and a filename, and produces a URI using the rule:
- if the filename is a hierarchical absolute URI (i.e. includes a
scheme such as "http://" or "tftp://") then use that URI and ignore
the server address,
- otherwise, if the server address is recognised (according to
sa_family) then construct a TFTP URI based on the server address,
port, and filename
- otherwise fail.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add definitions of protocols observed to be used by wdsmgfw.efi, and
add a handle name type for ConIn, ConOut, and StdErr.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add debug wrappers for more boot services functions, and print
symbolic values rather than raw numbers where possible.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Commit 09b057c ("[settings] Remove "uristring" setting type") removed
support for URI-encoded settings via the "uristring" setting type, on
the basis that such encoding was no longer necessary to avoid problems
with the command line parser.
Other valid use cases for the "uristring" setting type do exist: for
example, a password containing a '/' character expanded via
chain http://username:${password:uristring}@server.name/boot.php
Restore the existence of the "uristring" setting, avoiding the
potentially large stack allocations that were used in the old code
prior to commit 09b057c ("[settings] Remove "uristring" setting
type").
Requested-by: Robin Smidsrød <robin@smidsrod.no>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The current usage pattern of image_probe() is a legacy from the time
before commit 34b6ecb ("[image] Simplify image management") when
loading an image to its executable location in memory was a separate
action from actually executing the image.
Call image_probe() as soon as an image is registered. This allows
"imgstat" to display image type information for all images and allows
image-consuming code to assume that image->type is already set
correctly.
Ignore failures if image_probe() does not recognise the image, since
we do expect to handle unrecognised images (initrds, modules, etc).
Unrecognised images will be left with a NULL image->type, which
image-consuming code can easily check.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Rewrite the HTTP core to allow for the addition of arbitrary content
encoding mechanisms, such as PeerDist and gzip.
The core now exposes http_open() which can be used to create requests
with an explicitly selected HTTP method, an optional requested content
range, and an optional request body. A simple wrapper provides the
preexisting behaviour of creating either a GET request or an
application/x-www-form-urlencoded POST request (if the URI includes
parameters).
The HTTP SAN interface is now implemented using the generic block
device translator. Individual blocks are requested using http_open()
to create a range request.
Server connections are now managed via a connection pool; this allows
for multiple requests to the same server (e.g. for SAN blocks) to be
completely unaware of each other. Repeated HTTPS connections to the
same server can reuse a pooled connection, avoiding the per-connection
overhead of establishing a TLS session (which can take several seconds
if using a client certificate).
Support for HTTP SAN booting and for the Basic and Digest
authentication schemes is now optional and can be controlled via the
SANBOOT_PROTO_HTTP, HTTP_AUTH_BASIC, and HTTP_AUTH_DIGEST build
configuration options in config/general.h.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
UEFI platforms may provide a watchdog timer, which will reboot the
machine if an operating system takes more than five minutes to load.
This can cause long-lived iPXE downloads (or interactive shell
sessions) to unexpectedly reboot.
Fix by resetting the watchdog timer every ten seconds while the iPXE
main processing loop continues to run.
Reported-by: Bradley B Williams <bradleybwilliams@swbell.net>
Reported-by: John Clark <john.r.clark.3@gmail.com>
Reported-by: wdriever@gmail.com
Reported-by: Charlie Beima <cbeima@indiana.edu>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add support for SHA-224, SHA-384, and SHA-512 as digest algorithms in
X.509 certificates, and allow the choice of public-key, cipher, and
digest algorithms to be configured at build time via config/crypto.h.
Originally-implemented-by: Tufan Karadere <tufank@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Check for existence of the UART in uart_select(), not just in
uart_init(). This allows uart_select() to refuse to set a non-working
address in uart->base, which in turns means that the serial console
code will not attempt to use a non-existent UART.
Reported-by: Torgeir Wulfsberg <Torgeir.Wulfsberg@kongsberg.com>
Reported-by: Ján ONDREJ (SAL) <ondrejj@salstar.sk>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
We do not set up any kind of virtual addressing before invoking an
ELFBoot image. Reject if the image's program headers indicate that
virtual addresses are not equal to physical addresses.
This avoids problems when loading some RHEL5 kernels, which seem to
include ELFBoot headers using virtual addressing. With this change,
these kernels are no longer detected as ELFBoot, and so may be
(correctly) detected as bzImage instead.
Reported-by: Torgeir.Wulfsberg@kongsberg.com
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allow line buffer to accumulate multiple lines, with buffered_line()
returning each freshly-completed line as it is encountered. This
allows buffered lines to be subsequently processed as a group.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
VLAN and 802.11 devices use a network device operations structure that
wraps an underlying structure. For example, the vlan_operations
structure wraps the network device operations structure of the
underlying trunk device. This can cause false positives from the
current implementation of netdev_irq_supported(), which will always
report that VLAN devices support interrupts since it has no visibility
into the support provided by the underlying trunk device.
Fix by allowing network devices to explicitly flag that interrupts are
not supported, despite the presence of an irq() method.
Originally-fixed-by: Wissam Shoukair <wissams@mellanox.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Extend the IPv6 concept of "scope ID" (indicating the network device
index) to IPv4 socket addresses, so that IPv4 multicast transmissions
may specify the transmitting network device.
The scope ID is not (currently) exposed via the string representation
of the socket address, since IPv4 does not use the IPv6 concept of
link-local addresses (which could legitimately be specified in a URI).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Redefine various IPv4 address constants and testing macros to avoid
unnecessary byte swapping at runtime, and slightly rename the macros
to prevent code from accidentally using the old definitions.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
When an IPv6 socket address string specifies a link-local or multicast
address but does not specify the requisite network device name
(e.g. "fe80::69ff:fe50:5845" rather than "fe80::69ff:fe50:5845%net0"),
assume the use of "netX".
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Replace the AES implementation from AXTLS with a dedicated iPXE
implementation which is slightly smaller and around 1000% faster.
This implementation has been verified using the existing self-tests
based on the NIST AES test vectors.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Reduce the cost of implementing object methods which convey no
information beyond the fact that the method has been called.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Provide profile_custom() as a trivial wrapper around profile_update()
to allow for the use of the profiling infrastructure by code using
timers other than the default profile_timestamp() provider.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Provide an inject_corruption() function that can be used to randomly
corrupt data bytes with configurable probabilities.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Provide a generic inject_fault() function that can be used to inject
random faults with configurable probabilities.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Fix the TxBuf value filled in by GetStatus() to report the transmit
buffer address as required by the (now clarified) specification.
Simplify "interrupt" handling in GetStatus() to report only that one
or more packets have been transmitted or received; there is no need to
report one GetStatus() "interrupt" per packet.
Simplify receive handling to dequeue received packets immediately from
the network device into an internal list (thereby avoiding the hacks
previously used to determine when to report new packet arrivals).
Originally-fixed-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
We currently do not wait for a received FIN before exiting to boot a
loaded OS. In the common case of booting from an HTTP server, this
means that the TCP connection is left consuming resources on the
server side: the server will retransmit the FIN several times before
giving up.
Fix by initiating a graceful close of all TCP connections and waiting
(for up to one second) for all connections to finish closing
gracefully (i.e. for the outgoing FIN to have been sent and ACKed, and
for the incoming FIN to have been received and ACKed at least once).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The only way to map an eIPoIB MAC address (REMAC) to an IPoIB MAC
address is to intercept an incoming ARP request or reply.
If we do not have an REMAC cache entry for a particular destination
MAC address, then we cannot transmit the packet. This can arise in at
least two situations:
- An external program (e.g. a PXE NBP using the UNDI API) may attempt
to transmit to a destination MAC address that has been obtained by
some method other than ARP.
- Memory pressure may have caused REMAC cache entries to be
discarded. This is fairly likely on a busy network, since REMAC
cache entries are created for all received (broadcast) ARP
requests. (We can't sensibly avoid creating these cache entries,
since they are required in order to send an ARP reply, and when we
are being used via the UNDI API we may have no knowledge of which
IP addresses are "ours".)
Attempt to ameliorate the situation by generating a semi-spurious ARP
request whenever we find a missing REMAC cache entry. This will
hopefully trigger an ARP reply, which would then provide us with the
information required to populate the REMAC cache.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
A fairly common end-user problem is that the default configuration of
a switch may leave the port in a non-forwarding state for a
substantial length of time (tens of seconds) after link up. This can
cause iPXE to time out and give up attempting to boot.
We cannot force the switch to start forwarding packets sooner, since
any attempt to send a Spanning Tree Protocol bridge PDU may cause the
switch to disable our port (if the switch happens to have the Bridge
PDU Guard feature enabled for the port).
For non-ancient versions of the Spanning Tree Protocol, we can detect
whether or not the port is currently forwarding and use this to inform
the network device core that the link is currently blocked.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
When Spanning Tree Protocol (STP) is used, there may be a substantial
delay (tens of seconds) from the time that the link goes up to the
time that the port starts forwarding packets.
Add a generic concept of a "blocked link" (i.e. a link which is up but
which is not expected to communicate successfully), and allow "ifstat"
to indicate when a link is blocked.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Virtual functions use a mailbox to communicate with the physical
function driver: this covers functionality such as obtaining the MAC
address.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
When USB network card drivers are used, the BIOS' legacy USB
capability is necessarily disabled since there is no way to share the
host controller between the BIOS and iPXE. This currently results in
USB keyboards becoming non-functional in USB-enabled builds of iPXE.
Fix by adding basic support for USB keyboards, enabled by default in
iPXE builds which include USB support.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
When an EHCI hotplug action results in the controller disowning the
port, it will result in a hotplug action on the corresponding UHCI or
OHCI controller. Allow such hotplug actions to be carried out as part
of the same call to usb_step() or usb_register_bus(), by maintaining a
single central list of changed ports.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The USB core will currently fail to detect disconnections if a new
device has attached by the time the port is examined in
usb_hotplug().
Fix by recording the fact that a disconnection has taken place
whenever the "connection status changed" (CSC) bit is observed to be
set. (Whether the change represents a disconnection or a
reconnection, it indicates that the port has experienced some time of
being disconnected.)
Note that the time at which a disconnection can be detected varies by
hub type. In particular: root hubs can observe the CSC bit when
polling, and so will record the disconnection before calling
usb_port_changed(), but USB hubs read the port status (and hence the
CSC bit) only during the call to hub_speed(), long after the call to
usb_port_changed().
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Rename PCI_CLASS() (which constructs a struct pci_class_id) to
PCI_CLASS_ID(), and provide PCI_CLASS() as a macro which constructs
the 24-bit scalar value of a PCI class code.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The USB API currently assumes that host controllers will have
immediate data buffer space available in which to store the setup
packet. This is true for xHCI, partially true for EHCI (which happens
to have 12 bytes of padding in each transfer descriptor due to
alignment requirements), and not true at all for UHCI.
Include the setup packet within the I/O buffer passed to the host
controller's message() method, thereby eliminating the requirement for
host controllers to provide immediate data buffers.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The current API for Base16 (and Base64) encoding requires the caller
to always provide sufficient buffer space. This prevents the use of
the generic encoding/decoding functionality in some situations, such
as in formatting the hex setting types.
Implement a generic hex_encode() (based on the existing
format_hex_setting()), implement base16_encode() and base16_decode()
in terms of the more generic hex_encode() and hex_decode(), and update
all callers to provide the additional buffer length parameter.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Entropy gathering via timer ticks is slow under UEFI (of the order of
20-30 seconds on some machines). Use the EFI_RNG_PROTOCOL if
available, to speed up the process of entropy gathering.
Note that some implementations (including EDK2) will fail if we
request fewer than 32 random bytes at a time, and that the RNG
protocol provides no guarantees about the amount of entropy provided
by a call to GetRNG(). We take the (hopefully pessimistic) view that
a 32-byte block returned by GetRNG() will contain at least the 1.3
bits of entropy claimed by min_entropy_per_sample().
Signed-off-by: Michael Brown <mcb30@ipxe.org>
SHA-512/224 is almost identical to SHA-512, with differing initial
hash values and a truncated output length.
This implementation has been verified using the NIST SHA-512/224 test
vectors.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
SHA-512/256 is almost identical to SHA-512, with differing initial
hash values and a truncated output length.
This implementation has been verified using the NIST SHA-512/256 test
vectors.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
SHA-384 is almost identical to SHA-512, with differing initial hash
values and a truncated output length.
This implementation has been verified using the NIST SHA-384 test
vectors.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
SHA-224 is almost identical to SHA-256, with differing initial hash
values and a truncated output length.
This implementation has been verified using the NIST SHA-224 test
vectors.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
xHCI provides a somewhat convoluted mechanism for specifying details
of a transaction translator. Hubs must be marked as such in the
device slot context. The only opportunity to do so is as part of a
Configure Endpoint command, which can be executed only when opening
the hub's interrupt endpoint.
We add a mechanism for host controllers to intercept the opening of
hub devices, providing xHCI with an opportunity to update the internal
device slot structure for the corresponding USB device to indicate
that the device is a hub. We then include the hub-specific details in
the input context whenever any Configure Endpoint command is issued.
When a device is opened, we record the device slot and port for its
transaction translator (if any), and supply these as part of the
Address Device command.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The current endpoint reset logic defers the reset until the caller
attempts to enqueue a new transfer to that endpoint. This is
insufficient when dealing with endpoints behind a transaction
translator, since the transaction translator is a resource shared
between multiple endpoints.
We cannot reset the endpoint as part of the completion handling, since
that would introduce recursive calls to usb_poll(). Instead, we
add the endpoint to a list of halted endpoints, and perform the reset
on the next call to usb_step().
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Several of the USB timeouts were chosen on the principle of "pick an
arbitrary but ridiculously large value, just to be safe". It turns
out that some of the timeouts permitted by the USB specification are
even larger: for example, control transactions are allowed to take up
to five seconds to complete.
Fix up these USB timeout values to match those found in the USB2
specification.
Debugged-by: Robin Smidsrød <robin@smidsrod.no>
Tested-by: Robin Smidsrød <robin@smidsrod.no>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The TCP Selective Acknowledgement option (specified in RFC2018)
provides a mechanism for the receiver to indicate packets that have
been received out of order (e.g. due to earlier dropped packets).
iPXE often operates in environments in which there is a high
probability of packet loss. For example, the legacy USB keyboard
emulation in some BIOSes involves polling the USB bus from within a
system management interrupt: this introduces an invisible delay of
around 500us which is long enough for around 40 full-length packets to
be dropped. Similarly, almost all 1Gbps USB2 devices will eventually
end up dropping packets because the USB2 bus does not provide enough
bandwidth to sustain a 1Gbps stream, and most devices will not provide
enough internal buffering to hold a full TCP window's worth of
received packets.
Add support for sending TCP Selective Acknowledgements. This provides
the sender with more detailed information about which packets have
been lost, and so allows for a more efficient retransmission strategy.
We include a SACK-permitted option in our SYN packet, since
experimentation shows that at least Linux peers will not include a
SACK-permitted option in the SYN-ACK packet if one was not present in
the initial SYN. (RFC2018 does not seem to mandate this behaviour,
but it is consistent with the approach taken in RFC1323.) We ignore
any received SACK options; this is safe to do since SACK is only ever
advisory and we never have to send non-trivial amounts of data.
Since our TCP receive queue is a candidate for cache discarding under
low memory conditions, we may end up discarding data that has been
reported as received via a SACK option. This is permitted by RFC2018.
We follow the stricture that SACK blocks must not report data which is
no longer held by the receiver: previously-reported blocks are
validated against the current receive queue before being included
within the current SACK block list.
Experiments in a qemu VM using forced packet drops (by setting
NETDEV_DISCARD_RATE to 32) show that implementing SACK improves
throughput by around 400%.
Experiments with a USB2 NIC (an SMSC7500) show that implementing SACK
improves throughput by around 700%, increasing the download rate from
35Mbps up to 250Mbps (which is approximately the usable bandwidth
limit for USB2).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
This driver is functional but any downloads via a TCP-based protocol
tend to perform poorly. The 1Gbps Ethernet line rate is substantially
higher than the 480Mbps (in practice around 280Mbps) provided by USB2,
and the device has only 32kB of internal buffer memory. Our 256kB TCP
receive window therefore rapidly overflows the RX FIFO, leading to
multiple dropped packets (usually within the same TCP window) and
hence a low overall throughput.
Reducing the TCP window size so that the RX FIFO does not overflow
greatly increases throughput, but is not a general-purpose solution.
Further investigation is required to determine how other OSes
(e.g. Linux) cope with this scenario. It is possible that
implementing TCP SACK would provide some benefit.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Most devices expose at least the link up/down status via a bit in a
MAC register, since the MAC generally already needs to know whether or
not the link is up. Some devices (e.g. the SMSC75xx USB NIC) expose
this information to software only via the MII registers.
Provide a generic mii_check_link() implementation to check the BMSR
and report the link status via netdev_link_{up,down}().
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Relicense files with kind permission from
Stefan Hajnoczi <stefanha@redhat.com>
alongside the contributors who have already granted such relicensing
permission.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Rewrite (and relicense) the header files which are included in all
builds of iPXE (including non-Linux builds).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The code in list.h was originally taken from the Linux kernel many
years ago, but has been rewritten to the point that no original code
remains, and may therefore be relicensed.
The functions and data structures remain largely API-compatible, to
facilitate the conversion of Linux network drivers to iPXE.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
These files cannot be automatically relicensed by util/relicense.pl
since they either contain unusual but trivial contributions (such as
the addition of __nonnull function attributes), or contain lines
dating back to the initial git revision (and so require manual
knowledge of the code's origin).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Relicence files with kind permission from the following contributors:
Alex Williamson <alex.williamson@redhat.com>
Eduardo Habkost <ehabkost@redhat.com>
Greg Jednaszewski <jednaszewski@gmail.com>
H. Peter Anvin <hpa@zytor.com>
Marin Hannache <git@mareo.fr>
Robin Smidsrød <robin@smidsrod.no>
Shao Miller <sha0.miller@gmail.com>
Thomas Horsten <thomas@horsten.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
iPXE uses DHCP timeouts loosely based on values recommended by the
specification, but often abbreviated to reduce timeouts for reliable
and/or simple network topologies. Extract the DHCP timing parameters
to config/dhcp.h and document them. The resulting default iPXE
behavior is exactly the same, but downstreams are now afforded the
opportunity to implement spec-compliant behavior via config file
overrides.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The implementation of strtoul() has a partially unknown provenance.
Rewrite this code to avoid potential licensing uncertainty.
Since we now use -ffunction-sections, there is no need to place
strtoull() in a separate file from strtoul().
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Provide a generic framework for allocating, refilling, and optionally
recycling I/O buffers used by bulk IN and interrupt endpoints.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Commit a60f2dd ("[usb] Try multiple USB device configurations")
changed the behaviour of register_usb() such that if no drivers are
found then the device will be closed and the memory used will be
freed.
If a port status change subsequently occurs while the device is still
physically attached, then usb_hotplug() will see this as a new device
having been attached, since there is no device recorded as being
currently attached to the port. This can lead to spurious hotplug
events (or even endless loops of hotplug events, if the process of
opening and closing the device happens to generate a port status
change).
Fix by using a separate flag to indicate that a device is physically
attached (even if we have no corresponding struct usb_device).
Reported-by: Dan Ellis <Dan.Ellis@displaylink.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some USB endpoints require that a short packet be used to terminate
transfers, since they have no other way to determine message
boundaries. If the message length happens to be an exact multiple of
the USB packet size, then this requires the use of an additional
zero-length packet.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
USB Communications Device Class devices may use a union functional
descriptor to group several interfaces into a function.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Iterate over a USB device's available configurations until we find one
for which we have working drivers.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allow drivers to specify a supported PCI class code. To save space in
the final binary, make this an attribute of the driver rather than an
attribute of a PCI device ID list entry.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
We require the ability to disconnect from and reconnect to VMBus; if
we don't have this then there is no (viable) way for a loaded
operating system to continue to use any VMBus devices. (There is also
a small but non-zero risk that the host will continue to write to our
interrupt and monitor pages, since the VMBUS_UNLOAD message in earlier
versions is essentially a no-op.)
This requires us to ensure that the host supports protocol version 3.0
(VMBUS_VERSION_WIN8_1). However, we can't actually _use_ protocol
version 3.0, since doing so causes an iSCSI-booted Windows Server 2012
R2 VM to crash due to a NULL pointer dereference in vmbus.sys.
To work around this problem, we first ensure that we can connect using
protocol v3.0, then disconnect and reconnect using the oldest known
protocol.
This deliberately prevents the use of the iPXE native Hyper-V drivers
on older versions of Hyper-V, where we could use our drivers but in so
doing would break the loaded operating system.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Windows Server 2012 R2 generates an RNDIS_INDICATE_STATUS_MSG with a
status code of 0x4002006. This status code does not appear to be
documented anywhere within the sphere of human knowledge.
Explicitly ignore this status code in order to avoid unnecessarily
cluttering the display when RNDIS debugging is enabled.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The (undocumented) VMBus protocol seems to allow for transfer
page-based packets where the data payload is split into an arbitrary
set of ranges within the transfer page set.
The RNDIS protocol includes a length field within the header of each
message, and it is known from observation that multiple RNDIS messages
can be concatenated into a single VMBus message.
iPXE currently assumes that the transfer page range boundaries are
entirely arbitrary, and uses the RNDIS header length to determine the
RNDIS message boundaries.
Windows Server 2012 R2 generates an RNDIS_INDICATE_STATUS_MSG for an
undocumented and unknown status code (0x40020006) with a malformed
RNDIS header length: the length does not cover the StatusBuffer
portion of the message. This causes iPXE to report a malformed RNDIS
message and to discard any further RNDIS messages within the same
VMBus message.
The Linux Hyper-V driver assumes that the transfer page range
boundaries correspond to RNDIS message boundaries, and so does not
notice the malformed length field in the RNDIS header.
Match the behaviour of the Linux Hyper-V driver: assume that the
transfer page range boundaries correspond to the RNDIS message
boundaries and ignore the RNDIS header length. This avoids triggering
the "malformed packet" error and also avoids unnecessary data copying:
since we now have one I/O buffer per RNDIS message, there is no longer
any need to use iob_split().
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The Hyper-V RNDIS implementation on Windows Server 2012 R2 requires
that we send an explicit RNDIS initialisation message in order to get
a working RX datapath.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
RNDIS devices may provide multiple packets encapsulated into a single
message. Provide an API to allow the RNDIS driver to split an I/O
buffer into smaller portions.
The current implementation will always copy the underlying data,
rather than splitting the buffer in situ.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add the "-c <count>" option to the "ping" command, allowing for
automatic termination after a specified number of packets.
When a number of packets is specified:
- if a serious error (i.e. length mismatch or content mismatch)
occurs, then the ping will be immediately terminated with the relevant
status code;
- if at least one response is received successfully, and all errors
are non-serious (i.e. timeouts or out-of-sequence responses), then
the ping will be terminated after the final response (or timeout)
with a success status;
- if no responses are received successfully, then the ping will be
terminated after the final timeout with ETIMEDOUT.
If no number of packets is specified, then the ping will continue
until manually interrupted.
Originally-implemented-by: Cedric Levasseur <cyr-ius@ipocus.net>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some UEFI network drivers provide a software UNDI interface which is
exposed via the Network Interface Identifier Protocol (NII), rather
than providing a Simple Network Protocol (SNP).
The UEFI platform firmware will usually include the SnpDxe driver,
which attaches to NII and provides an SNP interface. The SNP
interface is usually provided on the same handle as the underlying NII
device. This causes problems for our EFI driver model: when
efi_driver_connect() detaches existing drivers from the handle it will
cause the SNP interface to be uninstalled, and so our SNP driver will
not be able to attach to the handle. The platform firmware will
eventually reattach the SnpDxe driver and may attach us to the SNP
handle, but we have no way to prevent other drivers from attaching
first.
Fix by providing a driver which can attach directly to the NII
protocol, using the software UNDI interface to drive the network
device.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Under UEFI, reads from PCI configuration space may fail. If this
happens, we should return all-ones (which will mimic the behaviour of
an absent PCI device).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Propagate our modified EFI system table to any images loaded by the
image that we wrap, thereby allowing us to observe boot services calls
made by all subsequent EFI images.
Also show details of intercepted ExitBootServices() calls. When
wrapping is used, exiting boot services will almost certainly fail,
but this at least allows us to see when it happens.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Under some circumstances (e.g. if iPXE itself is booted via iSCSI, or
after an unclean reboot), the backend may not be in the expected
InitWait state when iPXE starts up.
There is no generic reset mechanism for Xenbus devices. Recent
versions of xen-netback will gracefully perform all of the required
steps if the frontend sets its state to Initialising. Older versions
(such as that found in XenServer 6.2.0) require the frontend to
transition through Closed before reaching Initialising.
Add a reset mechanism for netfront devices which does the following:
- read current backend state
- if backend state is anything other than InitWait, then set the
frontend state to Closed and wait for the backend to also reach
Closed
- set the frontend state to Initialising and wait for the backend to
reach InitWait.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Using version 1 grant tables limits guests to using 16TB of grantable
RAM, and prevents the use of subpage grants. Some versions of the Xen
hypervisor refuse to allow the grant table version to be set after the
first grant references have been created, so the loaded operating
system may be stuck with whatever choice we make here. We therefore
currently use version 2 grant tables, since they give the most
flexibility to the loaded OS.
Current versions (7.2.0) of the Windows PV drivers have no support for
version 2 grant tables, and will merrily create version 1 entries in
what the hypervisor believes to be a version 2 table. This causes
some confusion.
Avoid this problem by attempting to use version 1 tables, since
otherwise we may render Windows unable to boot.
Play nicely with other potential bootloaders by accepting either
version 1 or version 2 grant tables (if we are unable to set our
requested version).
Note that the use of version 1 tables on a 64-bit system introduces a
possible failure path in which a frame number cannot fit into the
32-bit field within the v1 structure. This in turn introduces
additional failure paths into netfront_transmit() and
netfront_refill_rx().
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some EFI 1.10 systems (observed on an Apple iMac) do not allow us to
open the device path protocol with an attribute of
EFI_OPEN_PROTOCOL_BY_DRIVER and so we cannot maintain a safe,
long-lived pointer to the device path. Work around this by instead
opening the device path protocol with an attribute of
EFI_OPEN_PROTOCOL_GET_PROTOCOL whenever we need to use it.
Debugged-by: Curtis Larsen <larsen@dixie.edu>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
efi_file_install() and efi_download_install() are both used to install
onto existing handles. There is therefore no need to allow for each
of their calls to InstallMultipleProtocolInterfaces() to create a new
handle.
By passing the handle directly (rather than a pointer to the handle),
we avoid potential confusion (and erroneous debug message colours).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The EFI headers define EFI_HANDLE as a void pointer, which renders
type checking on anything dealing with EFI handles somewhat useless.
Work around this bizarre sabotage attempt by redefining EFI_HANDLE as
a pointer to an anonymous structure.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Provide a function efi_handle_name() (as a generalisation of
efi_handle_devpath_text()) which tries various methods to produce a
human-readable name for an EFI handle.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Reject network devices which appear to be duplicates of those already
available via a different underlying hardware device. On a Xen PV-HVM
system, this allows us to filter out the emulated PCI NICs (which
would otherwise appear alongside the netfront NICs).
Note that we cannot use the Xen facility to "unplug" the emulated PCI
NICs, since there is no guarantee that the OS we subsequently load
will have a native netfront driver.
We permit devices with the same MAC address if they are attached to
the same underlying hardware device (e.g. VLAN devices).
Inspired-by: Marin Hannache <git@mareo.fr>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
We currently treat network devices as available for use via the SNP
API only if RX queue processing has been frozen. (This is similar in
spirit to the way that RX queue processing is frozen for the network
device currently exposed via the PXE API.)
The default state of a freshly created network device is for the RX
queue to not be frozen, and thus to be unavailable for use via SNP.
This causes problems when devices are added through code paths other
than _efidrv_start() (which explicitly releases devices for use via
SNP).
We don't actually need to freeze RX queue processing, since calls via
the SNP API will always use netdev_poll() rather than net_poll(), and
so will never trigger the RX queue processing code path anyway.
We can therefore simplify the code to use a single global flag to
indicate whether network devices are claimed for use by iPXE or
available for use via SNP. Using a global flag allows the default
state for dynamically created network devices to behave sensibly.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add basic support for Xen PV-HVM domains (detected via the Xen
platform PCI device with IDs 5853:0001), including support for
accessing configuration via XenStore and enumerating devices via
XenBus.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The EFI_CONSOLE_CONTROL_PROTOCOL does not exist in the current UEFI
specification, but is required to enable text output on some older EFI
1.10 implementations (observed on an old iMac).
The header is not present in any of the standard include directories,
but can still be found in the EDK2 codebase as part of
EdkCompatibilityPkg.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
When building with DEBUG=efi_wrap, print details of calls made by the
loaded image to selected boot services functions.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The EFI FAT filesystem driver has a bug: if a block device contains no
FAT filesystem but does have an EFI_SIMPLE_FILE_SYSTEM_PROTOCOL
instance, the FAT driver will assume that it must have previously
installed the EFI_SIMPLE_FILE_SYSTEM_PROTOCOL. This causes the FAT
driver to claim control of our device, and to refuse to stop driving
it, which prevents us from later uninstalling correctly.
Work around this bug by opening the disk I/O protocol ourselves,
thereby preventing the FAT driver from opening it.
Note that the alternative approach of opening the block I/O protocol
(and thereby in theory preventing DiskIo from attaching to the block
I/O protocol) causes an endless loop of calls to our DRIVER_STOP
method when starting the EFI shell. I have no idea why this is.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Provide a single instance of EFI_DRIVER_BINDING_PROTOCOL (attached to
our image handle); this matches the expectations scattered throughout
the EFI specification.
Open the underlying hardware device using EFI_OPEN_PROTOCOL_BY_DRIVER
and EFI_OPEN_PROTOCOL_EXCLUSIVE, to prevent other drivers from
attaching to the same device.
Do not automatically connect to devices when being loaded as a driver;
leave this task to the platform firmware (or to the user, if loading
directly from the EFI shell).
When running as an application, forcibly disconnect any existing
drivers from devices that we want to control, and reconnect them on
exit.
Provide a meaningful driver version number (based on the build
timestamp), to allow platform firmware to automatically load newer
versions of iPXE drivers if multiple drivers are present.
Include device paths within debug messages where possible, to aid in
debugging.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Expose the build timestamp (measured in seconds since the Epoch) and
the build name (e.g. "rtl8139.rom" or "ipxe.efi"), and provide the
product name and product short name in a single centralised location.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
With blade servers, the chassis serial number (exposed via ${serial})
may not be unique. Expose ${board-serial} as a named setting to
provide easy access to a more meaningful serial number.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The iBFT includes an "origin" field to indicate the source of the IP
address. We use the heuristic of assuming that the source should be
"manual" if the IP address originates directly from the network device
settings block, and "DHCP" otherwise. This is an imperfect guess, but
is likely to be correct in most common situations.
Originally-implemented-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Parse the sense data to extract the reponse code, the sense key, the
additional sense code, and the additional sense code qualifier.
Originally-implemented-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Fix an erroneous htonl() in the definition of IN6_IS_ADDR_LINKLOCAL(),
and add self-tests for the IN6_IS_ADDR_xxx() family of macros.
Reported-by: Marin Hannache <git@mareo.fr>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Get the NFS URI manipulation code out of nfs_open.c. The resulting
code is now much more readable.
Signed-off-by: Marin Hannache <git@mareo.fr>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Interrupt processing adds noise to profiling results. Allow
interrupts (from within protected mode) to be profiled separately,
with time spent within the interrupt handler being excluded from any
other profiling currently in progress.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Expand the concept of the X.509 cache to provide the functionality of
a certificate store. Certificates in the store will be automatically
used to complete certificate chains where applicable.
The certificate store may be prepopulated at build time using the
CERT=... build command line option. For example:
make bin/ipxe.usb CERT=mycert1.crt,mycert2.crt
Certificates within the certificate store are not implicitly trusted;
the trust list is specified using TRUST=... as before. For example:
make bin/ipxe.usb CERT=root.crt TRUST=root.crt
This can be used to embed the full trusted root certificate within the
iPXE binary, which is potentially useful in an HTTPS-only environment
in which there is no HTTP server from which to automatically download
cross-signed certificates or other certificate chain fragments.
This usage of CERT= extends the existing use of CERT= to specify the
client certificate. The client certificate is now identified
automatically by checking for a match against the private key. For
example:
make bin/ipxe.usb CERT=root.crt,client.crt TRUST=root.crt KEY=client.key
Signed-off-by: Michael Brown <mcb30@ipxe.org>
iPXE currently allocates a copy the certificate's common name as a
string. This string is used by the TLS and CMS code to check
certificate names against an expected name, and also appears in
debugging messages.
Provide a function x509_check_name() to centralise certificate name
checking (in preparation for adding subjectAlternativeName support),
and a function x509_name() to provide a name to be used in debugging
messages, and remove the dynamically allocated string.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some UEFI builds will set up a timer to continuously poll any SNP
devices. This can drain packets from the network device's receive
queue before iPXE gets a chance to process them.
Use netdev_rx_[un]freeze() to explicitly indicate when we expect our
network devices to be driven via the external SNP API (as we do with
the UNDI API on the standard BIOS build), and disable the SNP API
except when receive queue processing is frozen.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
On a 64-bit build, EFI_STATUS codes are 64-bit quantities, with the
"error/warning" bit located in bit 63.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
iPXE currently advertises a fixed MSS of 1460, which is correct only
for IPv4 over Ethernet. For IPv6 over Ethernet, the value should be
1440 (allowing for the larger IPv6 header). For non-Ethernet link
layers, the value should reflect the MTU of the underlying network
device.
Use tcpip_mtu() to calculate the transport-layer MTU associated with
the peer address, and calculate the MSS to allow for an optionless TCP
header as per RFC 6691.
As a side benefit, we can now fail a connection immediately with a
meaningful error message if we have no route to the destination
address.
Reported-by: Anton D. Kachalov <mouse@yandex-team.ru>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Provide the function tcpip_mtu() to allow external code to determine
the (transport-layer) maximum transmission unit for a given socket
address.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Provide the function tcpip_netdev() to allow external code to
determine the transmitting network device for a given socket address.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add support for parsing of URIs containing literal IPv6 addresses
(e.g. "http://[fe80::69ff:fe50:5845%25net0]/boot.ipxe").
Duplicate URIs by directly copying the relevant fields, rather than by
formatting and reparsing a URI string. This relaxes the requirements
on the URI formatting code and allows it to focus on generating
human-readable URIs (e.g. by not escaping ':' characters within
literal IPv6 addresses). As a side-effect, this allows relative URIs
containing parameter lists (e.g. "../boot.php##params") to function
as expected.
Add validity check for FTP paths to ensure that only printable
characters are accepted (since FTP is a human-readable line-based
protocol with no support for character escaping).
Construct TFTP next-server+filename URIs directly, rather than parsing
a constructed "tftp://..." string,
Add self-tests for URI functions.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Update the DNS resolver to support DNS search lists (as provided by
DHCP option 119, DHCPv6 option 24, or NDP option 31).
Add validation code to ensure that parsing of DNS packets does not
overrun the input, get stuck in infinite loops, or (worse) write
beyond the end of allocated buffers.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Rename the "--bpp" option to "--depth", to free up the single-letter
option "-b" for "--bottom" in preparation for adding margin support.
This does not break backwards compatibility with documented features,
since the "console" command has not yet been documented.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Centre the background picture on the console, to give a more
consistent result when the aspect ratio does not match the requested
width and height.
Once drawn for the first time, nothing should ever overwrite the
margins of the display. We can therefore eliminate the logic used to
redraw only the margin areas, and use much simpler code to draw the
complete initial background image.
Simplify the redrawing logic further by making the background picture
buffer equal in size to the frame buffer. In the common case of a
background picture which is designed to fill the screen, this wastes
no extra memory, and the combined code simplifications reduce the size
of fbcon.o by approximately 15%.
Redefine the concept of "margin" to match the intuitive definition
(i.e. the size of the gap, rather than the position of the boundary
line).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allow test reports to specify an explicit file name and line number
using the extended okx() macro. This allows large blocks of test
report code such as tcpip_random_ok() to be implemented as functions
rather than macros.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The magic basic colour can be remapped at runtime from COLOR_NORMAL_BG
(usually blue) to COLOR_DEFAULT (which will be transparent as a
background colour on the framebuffer console).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add a centralised concept of colours and colour pairs (using the
default colour pairs as configured via config/colour.h). A colour
pair consists of a pair of colour indices.
Add the ability to redefine both a colour pair and an individual
colour index, with minimal overhead if this feature is not required
(e.g. because the relevant shell commands are not present in the
build).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Provide a mechanism for consoles to update the recorded console width
and height, and use this width and height to provide the curses COLS
and LINES variables.
We choose not to use ANSI escape sequences to obtain the width and
height, for two reasons:
- iPXE's model is that all output is sent to all consoles; we could
therefore end up with multiple consoles reporting conflicting widths
and heights
- when a serial console is in use, we probably don't want to resize
the output shown on the BIOS console to match the size of the serial
console, since it's likely that the serial console is in use only
for debugging.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allow for equivalent IPv4 and IPv6 settings (which requires equivalent
settings to be adjacent within the settings list).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Note that IANA has not yet assigned a DHCPv6 option code for the
syslog server. When a code is assigned, the definition of
DHCPV6_LOG_SERVERS should be updated. Until then, an IPv6 address of
a syslog server can be configured manually using e.g.
set syslog6 3ffe:302:11:2::8309
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Our policy is to prefer IPv6 addreses to IPv4 addresses, but to
request IPv6 addresses only if we have an IPv6 address for the name
server itself.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allow for the existence of references to IPv6 setting types without
dragging in the whole IPv6 stack, by placing the definition of
setting_type_ipv6 in core/settings.c and providing weak stub methods
for parse_ipv6_setting() and format_ipv6_setting().
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The fetch_setting() family of functions may currently modify the
definition of the specified setting (e.g. to add missing type
information). Clean up this interface by requiring callers to provide
an explicit buffer to contain the completed definition of the fetched
setting, if required.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The ANSI escape sequences to show and hide the cursor take the form
"<ESC>[?25h" and "<ESC>[?25l" respectively. iPXE currently treats the
'?' character as the final byte. Fix by explicitly treating '?' as an
intermediate byte.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allow for IPv6 routing table entries to be created for an on-link
prefix where a local address has not yet been assigned to the network
device.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add support for the stateful and stateless variants of the DHCPv6
protocol. The resulting settings block is registered as
"net<x>.dhcpv6", and DHCPv6 options can be obtained using
e.g. "${net0.dhcpv6/23:ipv6}" to obtain the IPv6 DNS server address.
IPv6 addresses obtained via stateful DHCPv6 are not yet applied to the
network device.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Include IPv6 within the generic network device configurator
mechanism. The IPv6 configurator will send a router solicitation and
wait for a router advertisement to be received. (As per RFC4861
section 6.3.7, we do this even if advertisements have been received
prior to sending the router solicitation.)
Signed-off-by: Michael Brown <mcb30@ipxe.org>
iPXE supports multiple mechanisms for network device configuration:
DHCPv4 for IPv4, FIP for FCoE, and SLAAC for IPv6. At present, DHCPv4
requires an explicit action (e.g. a "dhcp" command), FIP is initiated
implicitly upon opening a network device, and SLAAC takes place
whenever a RA happens to be received.
Add a generic concept of a network device configurator, which provides
a common interface to triggering configuration and to reporting the
result of the configuration process.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some background jobs have a meaningful ongoing status code (e.g. the
current link status for a job waiting for a network link to come up).
Allow this to be exposed via the job_progress() method.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Parsing a timeout value (specified in milliseconds) into an internal
timeout value measured in timer ticks is a common operation. Provide
a parse_timeout() value to carry out this conversion automatically.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
When chainloading, always retrieve the cached DHCPACK packet from the
underlying PXE stack, and apply it as the original contents of the
"net<X>.dhcp" settings block. This allows cached DHCP settings to be
used for any chainloaded iPXE binary (not just undionly.kkpxe).
This change eliminates the undocumented "use-cached" setting. Issuing
the "dhcp" command will now always result in a fresh DHCP request.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add generic pinger mechanism (analogous to the generic downloader
mechanism) which opens a ping socket, transmits ping requests, and
passes information about ping replies to a callback function.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Merge common functionality between IPv4 and IPv6 ICMP echo handling,
and add support for transmitting ICMP echo requests and delivering
ICMP echo replies to a (not yet implemented) ping_rx() function.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The path MTU is currently hardcoded to 1460 bytes, which fails to
allow space for TCP options. Sending a maximum-sized datagram (which
is viable when using HTTP POST) will therefore fail since the Ethernet
MTU will be exceeded.
Reduce the hardcoded path MTU to produce a maximum datagram of 1280
bytes, which is the size required of data link layers by IPv6. It is
a reasonable assumption that all intermediary data link layers will be
able to convey this packet without fragmentation, even for IPv4.
Note that this reduction has a minimal impact upon download
throughput, since it affects only the transmit data path.
Originally-fixed-by: Suresh Sundriyal <ssundriy@vmware.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Replace the existing partially-implemented IPv6 stack with a fresh
implementation.
This implementation is not yet complete. The IPv6 transmit and
receive datapaths are functional (including fragment reassembly and
parsing of arbitrary extension headers). NDP neighbour solicitations
and advertisements are supported. ICMPv6 echo is supported.
At present, only link-local addresses may be used, and there is no way
to specify an IPv6 address as part of a URI (either directly or via
a DNS lookup).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Split the protocol-independent portions of arp.c into a separate file
neighbour.c, to allow for sharing of functionality between IPv4+ARP
and IPv6+NDP.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
IPv6 link-local socket addresses require some way to specify a local
network device. We cannot simply use a pointer to the network device,
since a struct sockaddr_in6 may be long-lived and has no way to hold a
reference to the network device.
Using a network device index allows a socket address to cleanly refer
to a network device without worrying about whether or not that device
continues to exist.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Give tap devices a meaningful name, and avoid segmentation faults when
attempting to retrieve ${net0/bustype} by assigning a new bus type for
tap devices.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allow for configurable provision of built-in settings by placing them
in a linker table rather than an array.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
HTTP POST requires the ability to associate a parameter list with a
URI. There is no standardised syntax for this. Use a non-standard
URI syntax to incorporate the specification of a parameter list within
a URI:
URI = [ absoluteURI | relativeURI ]
[ "#" fragment ] [ "##params" [ "=" paramsName ] ]
e.g.
http://boot.ipxe.org/demo/boot.php##paramshttp://boot.ipxe.org/demo/boot.php##params=mylist
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allow memory map entries to be read using the syntax
${memmap/<region>.<properties>.<scale>}
where <region> is the index of the memory region, <properties> is a
bitmask where bit 0 represents the start address and bit 1 represents
the length (allowing the end address to be encoded by having both bits
0 and 1 set), and <scale> is the number of bits by which to shift the
result.
This allows for several values of interest to be encoded. For
example:
${memmap/<region>.1.0:hexraw} # 64-bit start address of <region>
${memmap/<region>.2.0:hexraw} # 64-bit length of <region>, in bytes
${memmap/<region>.3.0:hexraw} # 64-bit end address of <region>
${memmap/<region>.2.10:int32} # Length of <region>, in kB
${memmap/<region>.2.20:int32} # Length of <region>, in MB
The numeric encoding is slightly more sophisticated than described
here, allowing a single encoding to cover multiple regions. (See the
source code for details.) The primary use case for this feature is to
provide the total system memory size (in MB) via the "memsize"
predefined setting.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Adrian Jamróz <adrian.jamroz@gmail.com>
Modified-by: Thomas Miletich <thomas.miletich@gmail.com>
Signed-off-by: Thomas Miletich <thomas.miletich@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Replace the old via-rhine driver with a new version using the iPXE
API.
Includes fixes by Thomas Miletich for:
- MMIO access
- Link detection
- RX completion in RX overflow case
- Reset and EEPROM reloading
- CRC stripping
- Missing cpu_to_le32() calls
- Missing memory barriers
Signed-off-by: Adrian Jamróz <adrian.jamroz@gmail.com>
Modified-by: Thomas Miletich <thomas.miletich@gmail.com>
Tested-by: Thomas Miletich <thomas.miletich@gmail.com>
Tested-by: Robin Smidsrød <robin@smidsrod.no>
Modified-by: Michael Brown <mcb30@ipxe.org>
Tested-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add a facility for settings blocks to act as symbolic links to other
settings blocks, and reimplement the "netX" virtual settings block
using this facility.
The primary advantage of this approach is that unscoped settings such
as ${mac} and ${filename} will now reflect the settings obtained from
the most recently opened network device: in most cases, this will mean
the settings obtained from the most recent DHCP attempt. This should
improve conformance to the principle of least astonishment.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allow values to be read from PCI configuration space using the syntax
${pci/<busdevfn>.<offset>.<length>}
where <busdevfn> is the bus:dev.fn address of the PCI device
(expressed as a single integer, as returned by ${net0/busloc}),
<offset> is the offset within PCI configuration space, and <length> is
the length within PCI configuration space.
Values are returned in reverse byte order, since PCI configuration
space is little-endian by definition.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allow network device's "busloc" setting to be formatted as a PCI
bus:dev.fn address using e.g. ${net0/busloc:busdevfn}.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Provide a generic hex_decode() routine which can be shared between the
Base16 code and the "hex" and "hexhyp" settings parsers.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
RFC2560 mandates that a valid OCSP response will contain exactly one
relevant certificate. However, some OCSP responders include
extraneous certificates. iPXE currently assumes that the first
certificate in the OCSP response is the relevant certificate; OCSP
checks will therefore fail if the responder includes the extraneous
certificates before the relevant certificate.
Fix by using the responder ID to identify the relevant certificate.
Reported-by: Christian Stroehmeier <stroemi@mail.uni-paderborn.de>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Extend the syntax for numerical SMBIOS settings from
smbios/<type>.<offset>.<length>
to
smbios/[<instance>.]<type>.<offset>.<length>
Where SMBIOS provides multiple structures with the same <type>, this
extended syntax allows for access to structures other than the first.
If <instance> is omitted then it will default to zero, giving access
to the first instance (and so matching existing behaviour).
The 16-bit SMBIOS handle (which is an alternative way to disambiguate
multiple instances of the same type of structure) can be accessed, if
required, using
smbios/<instance>.<type>.2.2:uint16
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Create an explicit concept of "settings scope" and eliminate the magic
values used for numerical setting tags.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Devices with small transmit descriptor rings may temporarily run out
of space. Provide netdev_tx_defer() to allow drivers to defer packets
for retransmission as soon as a descriptor becomes available.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Exploit the redefinition of iPXE error codes to include a "platform
error code" to allow for meaningful conversion of EFI_STATUS values to
iPXE errors and vice versa.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The low 8 bits of an iPXE error code are currently defined as the
closest equivalent PXE error code. Generalise this scheme to
platforms other than PC-BIOS by extending this definition to "closest
equivalent platform error code". This allows for the possibility of
returning meaningful errors via EFI APIs.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Abstract out the ability to reboot the system to a separate reboot()
function (with platform-specific implementations), add an EFI
implementation, and make the existing "reboot" command available under
EFI.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
When iPXE is used as a UEFI driver, the UEFI PXE base code currently
provides the TCP/IP stack, network protocols, and user interface.
This represents a substantial downgrade from the standard BIOS iPXE
user experience.
Fix by installing our own EFI_LOAD_FILE_PROTOCOL implementation which
initiates the standard iPXE boot procedure. This upgrades the UEFI
iPXE user experience to match the standard BIOS iPXE user experience.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Expose iPXE's images as a UEFI file system, allowing the booted image
to access all images downloaded by iPXE.
This functionality is complementary to the custom iPXE download
protocol. The iPXE download protocol allows a booted image to utilise
iPXE to download arbitrary URIs, but requires the booted image to
specifically support the custom iPXE download protocol. The new
functionality limits the booted image to accessing only files that
were already downloaded by iPXE (e.g. as part of a script), but can
work with any generic UEFI image (e.g. the UEFI shell). Both
protocols are provided simultaneously, and are attached to the SNP
device handle.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The PXE TFTP API allows the caller to request a particular TFTP block
size. Since mid-2008, iPXE has appended a "?blksize=xxx" parameter to
the TFTP URI constructed internally; nothing has ever parsed this
parameter. Nobody seems to have cared that this parameter has been
ignored for almost five years.
Fix by using xfer_window(), which provides a fairly natural way to
convey the block size information from the PXE TFTP API to the TFTP
protocol layer.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The iBFT has a VLAN field that should be filled in. Add the
vlan_tag() function to extract the VLAN tag of a network device.
Since VLAN support is optional, define a weak function that returns 0
when iPXE is built without VLAN support.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allow non-data records to be split across multiple received I/O
buffers, to accommodate large certificate chains.
Reported-by: Nicola Volpini <Nicola.Volpini@kambi.com>
Tested-by: Nicola Volpini <Nicola.Volpini@kambi.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
serial_console_init() would enable serial console support without
knowing if the serial driver succeeded or not. As a result, the
serial console would interfere with a normal keyboard on a system
lacking serial support.
Reported-by: Jan ONDREJ (SAL) <ondrejj(at)salstar.sk>
Signed-off-by: Shao Miller <sha0.miller@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
TLS servers are not obliged to implement the RFC3546 maximum fragment
length extension, and many common servers (including OpenSSL, as used
in Apache's mod_ssl) do not do so. iPXE may therefore have to cope
with TLS records of up to 16kB. Allocations for 16kB have a
non-negligible chance of failing, causing the TLS connection to abort.
Fix by maintaining the received record as a linked list of I/O
buffers, rather than a single contiguous buffer. To reduce memory
pressure, we also decrypt in situ, and deliver the decrypted data via
xfer_deliver_iob() rather than xfer_deliver_raw().
Signed-off-by: Michael Brown <mcb30@ipxe.org>
When fetching a named setting using a name that does not explicitly
specify a type, default to using the type stored when the setting was
created, rather than always defaulting to "string". This allows the
behaviour of user-defined settings to match the behaviour of
predefined settings (which have a sensible default type).
For example:
set server:ipv4 192.168.0.1
echo ${server}
will now print "192.168.0.1", rather than trying to print out the raw
IPv4 address bytes as a string.
The downside of this change is that existing tricks for printing
special characters within scripts may require (backwards-compatible)
modification. For example, the "clear screen" sequence:
set esc:hex 1b
set cls ${esc}[2J
echo ${cls}
will now have to become
set esc:hex 1b
set cls ${esc:string}[2J # Must now explicitly specify ":string"
echo ${cls}
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Almost all clients of the raw-packet interfaces (UNDI and SNP) can
handle only Ethernet link layers. Expose an Ethernet-compatible link
layer to local clients, while remaining compatible with IPoIB on the
wire. This requires manipulation of ARP (but not DHCP) packets within
the IPoIB driver.
This is ugly, but it's the only viable way to allow IPoIB devices to
be driven via the raw-packet interfaces.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allow for allocation of memory blocks having a specified offset from a
specified physical alignment, such as being 12 bytes before a 2kB
boundary.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The script include/ipxe/efi/import.pl relies on a particular format
for the #include guard in order to detect EFI headers that are not
imported.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
A window size of 256kB should be sufficient to allow for
full-bandwidth transfers over a Gigabit LAN, and for acceptable
transfer speeds over other typical links.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Discarding the active ARP cache entry in the middle of a download will
substantially disrupt the TCP stream. Try to minimise any such
disruption by treating ARP cache entries as expensive, and discarding
them only when nothing else is available to discard.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
iPXE currently aligns all I/O buffers on a 2kB boundary. This is
overkill for transmitted packets, which are typically much smaller
than 2kB.
Align I/O buffers on their own size. This reduces the alignment
requirement for small buffers, while preserving the guarantee that I/O
buffers will never cross boundaries that might cause problems for some
DMA engines.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The default maximum plaintext fragment length for TLS is 16kB, which
is a substantial amount of memory for iPXE to have to allocate for a
temporary decryption buffer.
Reduce the memory footprint of TLS connections by requesting a maximum
fragment length of 2kB.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The maximum unscaled TCP window (64kB) implies a maximum bandwidth of
around 300kB/s on a WAN link with an RTT of 200ms. Add support for
the TCP window scaling option to remove this upper limit.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Calculating the TCP/IP checksum on received packets accounts for a
substantial fraction of the response latency.
Signed-off-by: Michael Brown <mcb30@ipxe.org>