Some RTL8169 cards (observed with an RTL8169SC) crash and burn if DAC
is enabled, even if only 32-bit addresses are used. Observed
behaviour includes system lockups and repeated transmission of garbage
data onto the wire.
This seems to be a known problem. The Linux r8169 driver disables DAC
by default and provides a "use_dac" module parameter.
There appears to be no known test for determining whether or not DAC
will work. As a workaround, enable DAC only if we are built as as
64-bit binary. This at least eliminates the problem in the common
case of a 32-bit build, which will never use 64-bit addresses anyway.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some bits in the C+ Command register are always one. Testing for the
presence of the register must allow for this.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some RTL8169 cards (observed with an RTL8169SC) power up with
TCR.MXDMA set to 16 bytes. While this does not prevent proper
operation, it almost certainly degrades performance.
Fix by explicitly setting TCR.MXDMA to "unlimited".
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some RTL8169 cards (observed with an RTL8169SC) power up with invalid
values in RCR.RXFTH and RCR.MXDMA, causing receive DMA to fail. Fix
by setting explicit values for both fields.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some RTL8169 cards (observed with an RTL8169SC) power up with garbage
values in the ring address registers, and do not clear the registers
on reset.
Fix by always setting the high dword of the ring address registers.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Commit 501527d ("[http] Treat any unexpected connection close as an
error") introduced a regression causing HTTP SAN booting to fail. At
the end of the response to the HEAD request, the call to http_done()
would erroneously believe that the server had disconnected in the
middle of the HTTP headers.
Fix by treating the header block from a HEAD request as a trailer
block. This fixes the problem and also simplifies the logic in
http_rx_header().
Reported-by: Shao Miller <shao.miller@yrdsb.edu.on.ca>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The FTP SIZE command allows us to get the size of a particular file,
as a consequence, we can now show proper transfer progression while
fetching a file using the FTP protocol.
Signed-off-by: Marin Hannache <git@mareo.fr>
Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
iPXE currently checks that the server has not closed the connection
mid-stream (i.e. in the middle of a chunked transfer, or before the
specified Content-Length has been received), but does not check that
the server got as far as starting to send data. Consequently, if the
server closes the connection before any data is transferred (e.g. if
the server gives up waiting while iPXE performs the validation steps
for TLS), then iPXE will treat this as a successful transfer of a
zero-length file.
Fix by checking the RX connection state, and forcing an error if the
server has closed the connection at an unexpected point.
Originally-fixed-by: Marin Hannache <mareo@mareo.fr>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The UNDI ROM header does contain a checksum byte. Apparently no-one
cares about this, since iPXE has left it as zero for years without
anyone noticing.
Since Option::ROM now understands the UNDI ROM header, we may as well
fix up the checksum byte for the sake of completeness.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Option::ROM currently understands only JMP NEAR and JMP SHORT
instructions in the initialisation entry point. At least one Broadcom
option ROM has been observed to use a CALL NEAR instruction.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
COMBOOT images are detected by looking for a ".com" or ".cbt" filename
extension. There are widely-used files with a ".com" extension, such
as "wdsnbp.com", which are PXE images rather than COMBOOT images.
Avoid false detection of PXE images as COMBOOT images by accepting
only a ".cbt" extension as indicating a COMBOOT image.
Interestingly, this bug has been present for a long time but was
frequently concealed because the filename was truncated to fit the
fixed-length "name" field in struct image. (PXE binaries ending in
".com" tend to be related to Windows deployment products and so often
use pathnames including backslashes, which iPXE doesn't recognise as a
path separator and so treats as part of a very long filename.)
Commit 1c127a6 ("[image] Simplify image management commands and
internal API") made the image name a variable-length field, and so
exposed this flaw in the COMBOOT image detection algorithm.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allow the file mode to be specified using a "mode=" command line
parameter. For example:
initrd http://web/boot/bootlocal.sh /opt/bootlocal.sh mode=755
Requested-by: Bryce Zimmerman <bryce.zimmerman@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some AMI BIOSes apparently break in exciting ways when asked for PMM
allocations for sizes that are not multiples of 4kB.
Fix by rounding up the image source area to the nearest 4kB. (The
temporary decompression area is already rounded up to the nearest
128kB, to facilitate sharing between multiple iPXE ROMs.)
Reported-by: Itay Gazit <itayg@mellanox.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Change the DMA alignment from 4096 bytes to 16 bytes, to conserve
available DMA memory. The hardware doesn't have any specific
alignment requirements.
Signed-off-by: Thomas Miletich <thomas.miletich@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Reduce CPU usage while waiting for user input. This is particularly
important for virtual machines, where CPU is a shared resource.
Reported-by: Alessandro Salvatori <alessandro@embrane.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Cygwin's assembler treats '/' as a comment character.
Reported-by: Steve Goodrich <steve.goodrich@se-eng.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Similarly to FreeBSD, OpenBSD requires the object format to be
specified as elf_i386_obsd rather than elf_i386.
Reported-by: Jiri B <jirib@devio.us>
Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
PCI3.0 allows us to report a "runtime size" which can be smaller than
the actual ROM size. On systems that support PMM our runtime size
will be small (~2.5kB), which helps to conserve the limited option ROM
space. However, there is no guarantee that the PMM allocation will
succeed, and so we need to report the worst-case runtime size in the
PCI header.
Move the "shrunk ROM size" field from the PCI header to a new "iPXE
ROM header", allowing it to be accessed by ROM-manipulation utilities
such as disrom.pl.
Reported-by: Anton D. Kachalov <mouse@yandex-team.ru>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Datasheet pp. 41-42 defines 'rx packet length' as upper word of
'status' dword field of the receive descriptor table.
http://www.smsc.com/media/Downloads_Archive/discontinued/83c171.pdf
Tested on SMC EtherPower II.
Signed-off-by: Alexey Smazhenko <darkover@corbina.com.ua>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
gcc 4.7 produces a spurious warning about an array subscript being out
of bounds. Use a pointer dereference instead of an array lookup to
inhibit this spurious warning.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Simplify the process of booting by ensuring that old images are not
left registered after an unsuccessful autoboot attempt.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Expose image tail-recursion to iPXE scripts via the "--replace"
option. This functions similarly to exec() under Unix: the
currently-executing script is replaced with the new image (as opposed
to running the new image as a subroutine).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The script include/ipxe/efi/import.pl relies on a particular format
for the #include guard in order to detect EFI headers that are not
imported.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
PXENV_FILE_CMDLINE is an iPXE extension, and will not be supported by
most PXE stacks. Do not report any errors to the user, since in
almost all cases the error will mean simply "not loaded by iPXE".
Reported-by: Patrick Domack <patrickdk@patrickdk.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The EFI_CPU_IO_PROTOCOL is not available on all EFI platforms. In
particular, it is not available under OVMF, as used for qemu.
Since the EFI_CPU_IO_PROTOCOL is an abomination of unnecessary
complexity, banish it and use raw I/O instead.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Attempt to restore the network device to the state it was in prior to
calling the NBP. This simplifies the task of taking follow-up action
in an iPXE script.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allow scripts to report errors in more detail by exposing the most
recent error via the ${errno} setting. For example:
chain ${filename} || goto failed
...
:failed
imgfree http://192.168.0.1/ipxe_error.php?error=${errno}
Note that ${errno} is valid only immediately after executing a failed
command.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some BIOSes (observed on a Supermicro system with an AMI BIOS) seem to
use the area immediately below 0x7c00 to store data related to the
boot process. This data is currently liable to be overwritten by the
temporary stack used while decompressing and installing iPXE.
Try to avoid any such problems by placing the temporary stack
immediately after the loaded iPXE binary. Any memory used by the
stack could then potentially have been overwritten anyway by a larger
binary.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
On i350 the datasheet contradicts itself in stating that the default
value of RXDCTL.ENABLE for queue zero is both set (according to the
"Receive Initialization" section) and unset (according to the "Receive
Descriptor Control - RXDCTL" section). Empirical evidence suggests
that the default value is unset.
Explicitly enable both transmit and receive queues to avoid any
ambiguity.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
On 82576 (and probably others), the datasheet states that "the tail
register of the queue (RDT[n]) should not be bumped until the queue is
enabled". There is some confusion over exactly what constitutes
"enabled": the initialisation blurb says that we should "poll the
RXDCTL register until the ENABLE bit is set", while the description
for the RXDCTL register says that the ENABLE bit is set by default
(for queue zero). Empirical evidence suggests that the ENABLE bit
reads as set immediately after writing to RCTL.EN, and so polling is
not necessary.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The setup_move_size field is not defined in protocol versions earlier
than 2.00 (and is obsolete in versions later than 2.01). In binaries
using versions earlier than 2.00, the relevant location is likely to
contain executable code.
Interestingly, this bug has been present since support for pre-2.00
protocol versions was added in 2009, and has been unexpectedly
modifying the memtest86+ code fragment:
mov $0x92, %dx
inb %dx, %al
Fortuitously, the modification exactly overwrote the value loaded into
%dx, and so the net effect was limited to causing Fast Gate A20
detection to always fail.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
A window size of 256kB should be sufficient to allow for
full-bandwidth transfers over a Gigabit LAN, and for acceptable
transfer speeds over other typical links.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The maximum TCP throughput is fundamentally limited by the amount of
available receive buffer space. Increase the heap size from 128kB to
512kB to allow the use of larger TCP windows.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Whenever memory pressure causes a queued packet to be discarded (and
so retransmitted), reduce the maximum TCP window to a size that would
have prevented the discard.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Discarding the active ARP cache entry in the middle of a download will
substantially disrupt the TCP stream. Try to minimise any such
disruption by treating ARP cache entries as expensive, and discarding
them only when nothing else is available to discard.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
I/O buffers are allocated on aligned boundaries. The I/O buffer
descriptor (the struct io_buffer) is currently attached to the end of
the I/O buffer. When the size of the buffer is close to its
alignment, this can waste large amounts of aligned memory.
For example, a network card using 2048-byte receive buffers will end
up allocating 2072 bytes on a 2048-byte boundary. This effectively
wastes 50% of the available memory.
Improve the situation by allocating the descriptor separately from the
main I/O buffer if inline allocation would cause the total allocated
size to cross the alignment boundary.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The current logic is to process at most one received packet per call
to net_poll(), on the basis that refilling the hardware descriptor
ring should be delayed as little as possible. However, this limits
the rate at which packets can be processed and ultimately ends up
adding latency which, in turn, limits the achievable throughput.
With temporary modifications in place to essentially remove all
resource constraints (heap size increased to 16MB, RX descriptor ring
increased to 64 descriptors) and a TCP window size of 1MB, the
throughput on a gigabit (i.e. 119MBps) network can be observed to fall
off exponentially from around 115MBps to around 75MBps. Changing
net_poll() to process all received packets results in a steady
119MBps throughput.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Commit 196751c ("[build] Enable warnings when building utilities")
revealed a previously hidden compiler warning in util/nrv2b.c
regarding an out-of-bounds array subscript in the code
#if defined(SWD_BEST_OFF)
if (s->best_pos[2] == 0)
s->best_pos[2] = key + 1;
#endif
where best_pos[] is defined by
#define SWD_BEST_OFF 1
#if defined(SWD_BEST_OFF)
unsigned int best_off[ SWD_BEST_OFF ];
unsigned int best_pos[ SWD_BEST_OFF ];
#endif
With SWD_BEST_OFF set to 1, it can be proven that all code paths
referring to s->best_off[] and s->best_pos[] will never be executed,
with the exception of the two lines above. Since these two lines
alone can have no effect on execution, we can safely undefine
SWD_BEST_OFF.
Verified by comparing md5sums of bin/undionly.kpxe before and after
the change.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Each ARP cache entry maintains a transmission queue, which is sent out
as soon as the link-layer address is known. If multiple packets are
queued, then it is possible for memory pressure to cause the ARP cache
discarder to be invoked during transmission of the first packet, which
may cause the ARP cache entry to be deleted before the second packet
can be sent. This results in an invalid pointer dereference.
Avoid this problem by reference-counting ARP cache entries and
ensuring that an extra reference is held while processing the
transmission queue, and by using list_first_entry() rather than
list_for_each_entry_safe() to traverse the queue.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Commit ea61075 ("[tcp] Add support for TCP window scaling") introduced
a potential NULL pointer dereference by referring to the connection's
send window scale before checking whether or not the connection is
known.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
iPXE currently aligns all I/O buffers on a 2kB boundary. This is
overkill for transmitted packets, which are typically much smaller
than 2kB.
Align I/O buffers on their own size. This reduces the alignment
requirement for small buffers, while preserving the guarantee that I/O
buffers will never cross boundaries that might cause problems for some
DMA engines.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The default maximum plaintext fragment length for TLS is 16kB, which
is a substantial amount of memory for iPXE to have to allocate for a
temporary decryption buffer.
Reduce the memory footprint of TLS connections by requesting a maximum
fragment length of 2kB.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The maximum unscaled TCP window (64kB) implies a maximum bandwidth of
around 300kB/s on a WAN link with an RTT of 200ms. Add support for
the TCP window scaling option to remove this upper limit.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The undinet driver always has to make a copy of the received frame
into an I/O buffer. Align this copy sensibly so that subsequent
operations are as fast as possible.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Checking for keypresses takes a non-negligible amount of time, and
measurably affects our RTT. Minimise the impact by checking for
keypresses only once per timer tick.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The generic TCP/IP checksum implementation requires approximately 10
CPU clocks per byte (as measured using the TSC). Improve this to
approximately 0.5 CPU clocks per byte by using "lodsl ; adcl" in an
unrolled loop.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Calculating the TCP/IP checksum on received packets accounts for a
substantial fraction of the response latency.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The "rep" prefix can be used with an iteration count of zero, which
allows the variable-length memcpy() to be implemented without using
any conditional jumps.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
A reasonably large (512MB) file transferred via HTTP over Gigabit
Ethernet should complete in around 4.6 seconds. Increase the
resolution of the "time" command to tenths of a second, to allow such
transfers to be meaningfully measured.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Use hw pointer in PCI driver data as expected by sky2_remove().
Signed-off-by: Valentine Barshak <gvaxon@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
iPXE has no concept of the local time zone, mainly because there is no
viable way to obtain time zone information in the absence of local
state. This causes potential problems with newly-issued certificates
and certificates that are about to expire.
Avoid such problems by allowing an error margin of around 12 hours on
certificate validity periods, similar to the error margin already
allowed for OCSP response timestamps.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
FCoE requires the use of multiple local unicast link-layer addresses.
To avoid the complexity of managing multiple addresses, iPXE operates
in promiscuous mode. As a consequence, any unicast packets with
non-matching IPv4 addresses are rejected at the IPv4 layer (rather
than at the link layer).
This can cause problems when issuing a second DHCP request: if the
address chosen by the DHCP server does not match the existing address,
then the DHCP response will itself be rejected.
Fix by requesting a broadcast response from the DHCP server if the
network interface already has any IPv4 addresses.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
PMM defines the return code 0xffffffff as meaning "unsupported
function". It's hard to imagine a PMM BIOS that doesn't support
pmmAllocate(), but apparently such things do exist.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
A .mrom image currently assumes that it is the first image within the
expansion ROM BAR, which may not be correct when multiple images are
present.
Fix by scanning through the BAR until we locate an image matching our
build ID.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The header of a .mrom image declares its length to be only a few
kilobytes; the remainder is accessed via a sideband mechanism. This
makes it difficult to append an additional ROM image, such as an EFI
ROM.
Add a second, dummy ROM header covering the payload portion of the
.mrom image, allowing consumers to locate any appended ROM images in
the usual way.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Avoid potential confusion in the documentation by using a
vendor-neutral name for the extended (AMD-defined) feature set.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add "sync" command (loosely based on the Unix "sync"), which will wait
for any pending operations to complete.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
iPXE is fundamentally asynchronous in operation: some operations
continue in the background even after the foreground has continued to
a new task. For example, the closing FIN/ACK exchanges of a TCP
connection will take place in the background after an HTTP download
has completed.
Signed-off-by: Michael Brown <mcb30@ipxe.org>