A management interface is the component through which both local and
remote management agents are accessed.
This new implementation of a management interface allows for the user
to react to timed-out transactions, and also allows for cancellation
of in-progress transactions.
Some BIOSes support the BIOS Boot Specification (BBS) but fail to set
%es:%di correctly when calling the option ROM initialisation entry
point. This causes gPXE to identify the BIOS as non-PnP (and so
non-BBS), leaving the user unable to control the boot order.
Fix by scanning for the $PnP signature ourselves, rather than relying
on the BIOS having passed in %es:%di correctly.
Tested-by: Helmut Adrigan <helmut.adrigan@chello.at>
The IBA specification refers to management "interfaces" and "agents".
The interface is the component that connects to the queue pair and
sends and receives MADs; the agent is the component that constructs
the reply to the MAD.
Rename the IB_{QPN,QKEY,QPT} constants as a first step towards making
this separation in gPXE.
The function __intel_new_proc_init() (called implicitly when building
using icc) is marked with __attribute__((cdecl)). This breaks
building on x86_64, where cdecl is meaningless.
Fix by replacing with the existing __libgcc macro, which is already
defined to be "__attribute__((cdecl))" for i386 builds and empty for
x86_64 builds.
The MIT and ISC licenses are legally equivalent to the bsd2 license,
but with slightly different verbiage.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Michael Brown <mcb30@etherboot.org>
pxe_api.h is just a description of API functions, it's actively
undesirable to have more implementations than necessary. Allowing it
under the MIT license lets the Syslinux libraries use it.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Michael Brown <mcb30@etherboot.org>
In several places, we currently use size_t to represent a difference
between TCP sequence numbers. This can cause compiler warnings
relating to printf format specifiers, since the result of
(uint32_t+size_t) may be an unsigned long on some compilers.
Fix by using uint32_t for all variables that represent a difference
between TCP sequence numbers.
Tested-by: Joshua Oreman <oremanj@xenon.get-linux.org>
The geniso, genliso and gensdsk scripts contain hard-coded temporary
directory names, and so could potentially collide with each other when
run as part of a concurrent build (e.g. "make -j 4").
Fix by using mktemp to generate suitable temporary directory names.
We add a syslinux floppy disk type using parts of the genliso script.
This floppy image cat be dd'ed to a physical floppy or used in
instances where a virtual floppy with an mountable DOS filesystem is
useful.
We also modify the genliso script to only generate .liso images
rather than creating images depending on how it is called.
Signed-off-by: Michael Brown <mcb30@etherboot.org>
This is required for all modern 802.11 devices, and allows drivers
to be written for them with minimally more effort than is required
for a wired NIC.
Signed-off-by: Michael Brown <mcb30@etherboot.org>
Modified-by: Michael Brown <mcb30@etherboot.org>
The Linux IB Communication Manager will always send MADs to QP1,
rather than back to the originating QP. On Hermon, QP1 is by default
handled by the embedded firmware. We can change this, but the cost is
that we have to handle both QP0 and QP1 (i.e. we have to provide SMA
as well as GMA service in software), and we have to use MLX queues
rather than standard UD queues (i.e. we have to construct the UD
datagrams by hand).
There doesn't seem to be any viable way around this situation, ugly
though it is.
Queue pairs are now assumed to be created in the INIT state, with a
call to ib_modify_qp() required to bring the queue pair to the RTS
state.
ib_modify_qp() no longer takes a modification list; callers should
modify the relevant queue pair parameters (e.g. qkey) directly and
then call ib_modify_qp() to synchronise the changes to the hardware.
The packet sequence number is now a property of the queue pair, rather
than of the device.
Each queue pair may have an associated address vector. For RC queue
pairs, this is the address vector that will be programmed in to the
hardware as the remote address. For UD queue pairs, it will be used
as the default address vector if none is supplied to ib_post_send().
Now that MAD handlers no longer return a status code, we can allow
them to return a pointer to a MAD structure if and only if they want
to send a response. This provides a more natural and flexible
approach than using a "response method" field within the handler's
descriptor.
MAD handlers have to set the status fields within the MAD itself
anyway, in order to provide a meaningful response MAD; the additional
gPXE return status code is just noise.
Note that we probably don't need to ever explicitly set the status to
IB_MGMT_STATUS_OK, since it should already have this value from the
request. (By not explicitly setting the status in this way, we can
safely have ib_sma_set_xxx() call ib_sma_get_xxx() in order to
generate the GetResponse MAD without worrying that ib_sma_get_xxx()
will clear any error status set by ib_sma_set_xxx().)
Most IB hardware seems not to allow allocation of the genuine QPNs 0
and 1, so allow for the externally-visible QPN (as constructed and
parsed by ib_packet, where used) to differ from the real
hardware-allocated QPN.
The queue key is stored as a property of the queue pair, and so can
optionally be added by the Infiniband core at the time of calling
ib_post_send(), rather than always having to be specified by the
caller.
This allows IPoIB to avoid explicitly keeping track of the data queue
key.
Now that path record lookups are handled entirely via
ib_resolve_path(), the only role of the IPoIB peer cache is as a
lookup table for MAC addresses. Update the code structure and
comments to reflect this.
The IPoIB broadcast MAC address varies according to the partition key.
Now that the broadcast MAC address is a property of the network device
rather than of the link layer, we can expose this real MAC address
directly.
The broadcast LID is now identified via a path record lookup; this is
marginally inefficient (since it was present in the MCMemberRecord
GetResponse), but avoids the need to special-case broadcasts when
constructing the address vector in ipoib_transmit().
Generalise the subnet management agent into a general management agent
capable of sending and responding to MADs, including support for
retransmissions as necessary.
Currently, all Infiniband users must create a process for polling
their completion queues (or rely on a regular hook such as
netdev_poll() in ipoib.c).
Move instead to a model whereby the Infiniband core maintains a single
process calling ib_poll_eq(), and polling the event queue triggers
polls of the applicable completion queues. (At present, the
Infiniband core simply polls all of the device's completion queues.)
Polling a completion queue will now implicitly refill all attached
receive work queues; this is analogous to the way that netdev_poll()
implicitly refills the RX ring.
Infiniband users no longer need to create a process just to poll their
completion queues and refill their receive rings.
IPoIB and the SMA have separate constants for the packet size to be
used to I/O buffer allocations. Merge these into the single
IB_MAX_PAYLOAD_SIZE constant.
(Various other points in the Infiniband stack have hard-coded
assumptions of a 2048-byte payload; we don't currently support
variable MTUs.)
IPoIB has a link-layer broadcast address that varies according to the
partition key. We currently go through several contortions to pretend
that the link-layer address is a fixed constant; by making the
broadcast address a property of the network device rather than the
link-layer protocol it will be possible to simplify IPoIB's broadcast
handling.
Move the icky call to step() from aoe.c to ata.c; this takes it at
least one step further away from where it really doesn't belong.
Unfortunately, AoE has the ugly aoe_discover() mechanism which means
that we still have a step() loop in aoe.c for now; this needs to be
replaced at some future point.
Objects typically call xfer_close() as part of their response to a
close() message. If the initiating object has already nullified the
xfer interface then this isn't a problem, but it can lead to
unexpected behaviour when the initiating object is aiming to reuse the
connection and so does not nullify the interface.
Fix by always temporarily nullifying the interface during xfer_close()
(as was already being done by xfer_vreopen() in order to work around
this specific problem).
Reported-by: infernix <infernix@infernix.net>
Tested-by: infernix <infernix@infernix.net>
These commands can be used to activate or deactivate the PXE API (on a
specifiable network interface).
This is currently of limited use, since most image formats will call
shutdown() before booting the image, meaning that the underlying net
device gets shut down during remove_devices() anyway.
ifcommon_exec() was long-ago marked as __attribute__((regparm(2))) in
order to minimise the size of functions that call into it. Since
then, gPXE has added -mregparm=3 as a general compilation option, and
this "optimisation" is now counter-productive.
Change (and simplify) the prototype to minimise code size given the
current compilation conditions.
pxe_init_structures() fills in the fields of the !PXE and PXENV+
structures that aren't known until gPXE starts up. Once gPXE is
started, these values will never change.
Make pxe_init_structures() an initialisation function so that PXE
users don't have to worry about calling it.
It is possible that the UNDI ISR may be triggered before netdev_tx()
returns control to pxenv_undi_transmit(). This means that
pxenv_undi_isr() may see a zero undi_tx_count, and so not check for TX
completions. This is not a significant problem, since it will check
for TX completions on the next call to pxenv_undi_isr() anyway; it
just means that the NBP will see a spurious IRQ that was apparently
caused by nothing.
Fix by updating the undi_tx_count before calling netdev_tx(), so that
pxenv_undi_isr() can decrement it and report the TX completion.
Symantec Ghost requires working multicast support. gPXE configures
all (sufficiently supported) network adapters into "receive all
multicasts" mode, which means that PXENV_UNDI_SET_MCAST_ADDRESS is
actually a no-op, but the current implementation returns
PXENV_STATUS_UNSUPPORTED instead.
Fix by making PXENV_UNDI_SET_MCAST_ADDRESS return success. For good
measure, also implement PXENV_UNDI_GET_MCAST_ADDRESS, since the
relevant functionality is now exposed by the net device core.
Note that this will silently fail if the gPXE driver for the NIC being
used fails to configure the NIC in "receive all multicasts" mode.
The PXE debugging messages have remained pretty much unaltered since
Etherboot 5.4, and are now difficult to read in comparison to most of
the rest of gPXE.
Bring the pxe_undi debug messages up to normal gPXE standards.
This keeps code size down, since the wireless interface management
commands have the same command-line interface and overall structure as
the wired commands.
Signed-off-by: Michael Brown <mcb30@etherboot.org>
With the addition of link status codes, we can now display a detailed
error indication if iflinkwait() fails.
Putting the error output in iflinkwait avoids code duplication, and
gains symmetry with the other interface management routines; ifopen()
already prints an error directly if it cannot open its interface.
Modified-by: Michael Brown <mcb30@etherboot.org>
Signed-off-by: Michael Brown <mcb30@etherboot.org>
Expand the NETDEV_LINK_UP bit into a link_rc status code field,
allowing specific reasons for link failure to be reported via
"ifstat".
Originally-authored-by: Joshua Oreman <oremanj@rwcr.net>
The Symantec UNDI DOS driver fails when run on top of gPXE because we
return our interface type as "gPXE" rather than one of the predefined
NDIS interface type strings.
Fix by returning the standard "DIX+802.3" string; this isn't
necessarily always accurate, but it's highly unlikely that anything
trying to use the UNDI API would understand our IPoIB link-layer
pseudo-header anyway.
The Intel DOS UNDI driver fails when run on top of gPXE because we do
not fill in the ServiceFlags field in PXENV_UNDI_GET_IFACE_INFO.
Fix by filling in the ServiceFlags field with reasonable values
indicating our approximate feature capabilities.
The 3Com DOS UNDI driver fails when run on top of gPXE for two
reasons: firstly because PXENV_UNDI_SET_PACKET_FILTER is unsupported,
and secondly because gPXE enters the NBP without enabling interrupts
on the NIC, and the 3Com driver never calls PXENV_UNDI_OPEN.
Fix by always returning success from PXENV_UNDI_SET_PACKET_FILTER
(which is no worse than the current situation, since we already ignore
the receive packet filter in PXENV_UNDI_OPEN), and by forcibly
enabling interrupts on the NIC within PXENV_UNDI_TRANSMIT. The latter
is something of a hack, but avoids the need to implement a complete
base-code ISR that we would otherwise need if we were to enter the NBP
with interrupts enabled.
Commit 558c1a4 ("[tcp] Improve robustness in the presence of duplicated
received packets") introduced a regression in that an old duplicate
ACK received while in the ESTABLISHED state would pass through normal
ACK processing, including updating tcp->snd_seq.
Fix by ensuring that ACK processing ignores all duplicate ACKs.
All TCP errors or unusual events should now generate a debugging
message at DBGLVL_LOG, with enough information (SEQ and ACK numbers)
to be able to identify the corresponding packet (or missing packet) in
a network trace from the remote end.
This makes it possible to leave TCP debugging enabled in order to see
interesting TCP events, without flooding the console with at least one
message per packet.
This resolves potential difficulties occurring when more than one script
is used. Total cost: 88 bytes uncompressed.
Signed-off-by: Michael Brown <mcb30@etherboot.org>
In order to construct outgoing link-layer frames or parse incoming
ones properly, some protocols (such as 802.11) need more state than is
available in the existing variables passed to the link-layer protocol
handlers. To remedy this, add struct net_device *netdev as the first
argument to each of these functions, so that more information can be
fetched from the link layer-private part of the network device.
Updated all three call sites (netdevice.c, efi_snp.c, pxe_undi.c) and
both implementations (ethernet.c, ipoib.c) of ll_protocol to use the
new argument.
Signed-off-by: Michael Brown <mcb30@etherboot.org>
The 93C66 is identical to the 93C56 in programming interface and
addressing, but twice as large in data storage (4096 bits). It's
used in some RTL8185 wireless cards.
Signed-off-by: Michael Brown <mcb30@etherboot.org>
gPXE responds to duplicated ACKs with an immediate retransmission,
which can lead to a sorceror's apprentice syndrome. It also responds
to out-of-range (or old duplicate) ACKs with a RST, which can cause
valid connections to be dropped.
Fix the sorceror's apprentice syndrome by leaving the retransmission
timer running (and so inhibiting the immediate retransmission) when we
receive a potential duplicate ACK. This seems to match the behaviour
of Linux observed via wireshark traces.
Fix the RST issue by sending RST only on out-of-range ACKs that occur
before the connection is fully established, as per RFC 793.
These problems were exposed during development of the 802.11 wireless
link layer; the 802.11 protocol has a failure mode that can easily
cause duplicated packets. The fixes were tested in a controlled way
by faking large numbers of duplicated packets in the rtl8139 driver.
Originally-fixed-by: Joshua Oreman <oremanj@rwcr.net>
setting_cmp() compares by option tag and then by name. Empty names
will always match, which gives us a false positive.
Fix by explicitly checking for empty names.
Modified-by: Michael Brown <mcb30@etherboot.org>
Signed-off-by: Michael Brown <mcb30@etherboot.org>
MAX_LL_HEADER_LEN is erroneously set to 6 rather than 14, resulting
in possible data corruption whenever we send an ARP packet.
Fix value and add a comment explaining why MAX_LL_ADDR_LEN is greater
than MAX_LL_HEADER_LEN.
Reported-by: Joshua Oreman <oremanj@rwcr.net>
Windows text editors such as Notepad tend to use CRLF line endings,
which breaks gPXE's signature detection for script images. Since
scripts are usually very small, they end up falling back to being
detected as valid PXE executable images (since there are no signature
checks for PXE executables). Executing text files as x86 machine code
tends not to work well.
Fix by allowing for any isspace() character to terminate the "#!gpxe"
signature, and by ensuring that CR characters get stripped during
command line parsing.
Suggested-by: Shao Miller <Shao.Miller@yrdsb.edu.on.ca>
Several SPI chips will respond to an SPI read command with a dummy
zero bit immediately prior to the first real data bit. This can be
used to autodetect the address length, provided that the command
length and data length are already known, and that the MISO data line
is tied high.
Tested-by: Thomas Miletich <thomas.miletich@gmail.com>
Debugged-by: Thomas Miletich <thomas.miletich@gmail.com>
gcc 4.4 defaults to using .cfi assembler directives for debugging
information, which causes unneeded .eh_frame sections to be generated.
These sections are already stripped out by our linker script, so don't
affect the final build, but do distort the output of "size" when run
on individual .o files; the .eh_frame size is included within the size
reported for .text. This makes it difficult to accurately judge the
effects of source code changes upon object code size.
Fix by adding -fno-dwarf2-cfi-asm to CFLAGS if we detect that this
option is supported by the gcc that we are compiling with.
Tested-by: Daniel Verkamp <daniel@drv.nu>
If the ProxyDHCPOFFER already includes PXE options (i.e. option 60 is
set to "PXEClient" and option 43 is present) then assume that the
ProxyDHCPREQUEST can be sent to port 67, rather than port 4011. This
is a reasonable assumption, since in that case the ProxyDHCP server
has already demonstrated by responding to the DHCPDISCOVER that it is
listening on port 67. (If the ProxyDHCP server were not listening on
port 67, then the standard DHCP server would have been configured to
respond with option 60 set to "PXEClient" but no option 43 present.)
The PXE specification is ambiguous on this point; the specified
behaviour covers only the cases in which option 43 is *not* present in
the ProxyDHCPOFFER. In these cases, we will continue to send the
ProxyDHCPREQUEST to port 4011.
This change is required in order to allow us to interoperate with
dnsmasq, which listens only on port 67. (dnsmasq relies on
unspecified behaviour of the Intel PXE stack, which it seems will
retain the ProxyDHCPOFFER as an options source and never issue a
ProxyDHCPREQUEST, thereby enabling dnsmasq to omit listening on port
4011.)
IBM Tivoli PXE Server 5.1.0.3 is reported to send trailing garbage
bytes at the end of the OACK packet, which causes gPXE to reject the
packet and abort the TFTP transfer.
Work around the problem by processing as much as possible of the OACK,
and treating name/value parsing errors as non-fatal.
Reported-by: Shao Miller <Shao.Miller@yrdsb.edu.on.ca>
We currently send all boot server discovery requests to port 4011.
Section 2.2.1 of the PXE spec states that boot server discovery
packets should be "sent broadcast (port 67), multicast (port 4011), or
unicast (port 4011)". Adjust our behaviour so that any boot server
discovery packets that are sent to the broadcast address are directed
to port 67 rather than port 4011.
This is required for operation with dnsmasq as a PXE server, since
dnsmasq listens only on port 67, and relies upon this (specified)
behaviour.
This change may break some setups using the (itself very broken) Linux
PXE server from kano.org.uk. This server will, in its default
configuration, listen only on port 4011. It never constructs a boot
server list (PXE_BOOT_SERVERS, option 43.8), and uses the wrong
definitions for the discovery control bits (PXE_DISCOVERY_CONTROL,
option 43.6). The upshot is that it will always instruct the client
to perform multicast and broadcast discovery only. In setups lacking
a valid multicast route on the server side, this used to work because
gPXE would eventually give up on the (non-responsive) multicast
address and send a broadcast request to port 4011, which the Linux PXE
server would respond to. Now that gPXE correctly sends this broadcast
request to port 67 instead, it is never seen by the Linux PXE server,
and the boot fails. The fix is to either (a) set up a multicast route
correctly on the server side before starting the PXE server, or (b)
edit /etc/pxe.conf to contain the server's unicast address in the
"multicast_address" field (a hack that happens to work).
Suggested-by: Simon Kelley <simon@thekelleys.org.uk>
This prevents gPXE from wasting time attempting to contact a ProxyDHCP
server on port 4011 if the DHCP response already contains the relevant
PXE options. This behaviour is hinted at (though not explicitly
specified) in the PXE spec, and seems to match what the Intel client
does.
Suggested-by: Simon Kelley <simon@thekelleys.org.uk>
It is now possible to run e.g.
make bin/rtl8139.dsk.licence
in order to see a licensing assessment for any given gPXE build. The
assessment will either produce a single overall licence for the build
(based on combining all the licences used within the source files for
that build), or will exit with an error stating why a licence
assessment is not possible (for example, if there are files involved
that do not yet contain an explicit FILE_LICENCE() declaration).
For partly historical reasons, various files in the gPXE source tree
are licensed under different, though compatible, terms. Most of the
code is licensed under GPLv2 with the "or later" clause, but there are
exceptions such as:
The string.h file, which derives from Linux and is licensed as
Public Domain.
The EFI header files, which are taken from the EDK2 source tree and
are licensed under BSD.
The 3c90x driver, which has a custom GPL-like licence text.
Introduce a FILE_LICENCE() macro to make licensing more explicit.
This macro should be applied exactly once to each source (.c, .S or
.h) file. It will cause a corresponding zero-sized common symbol to
be added to any .o files generated from that source file (and hence to
any final gPXE binaries generated from that source file). Determining
the applicable licences to generated files can then be done using e.g.
$ objdump -t bin/process.o | grep __licence
00000000 O *COM* 00000001 .hidden __licence_gpl2_or_later
indicating that bin/process.o is covered entirely by the GPLv2
with the "or later" clause, or
$ objdump -t bin/rtl8139.dsk.tmp | grep __licence
00033e8c g O .bss.textdata 00000000 .hidden __licence_gpl2_only
00033e8c g O .bss.textdata 00000000 .hidden __licence_gpl2_or_later
00033e8c g O .bss.textdata 00000000 .hidden __licence_public_domain
indicating that bin/rtl8139.dsk includes both code licensed under
GPLv2 (both with and without the "or later" clause) and code licensed
as Public Domain.
Determining the result of licence combinations is currently left as an
exercise for the reader.
Etherboot 5.4 erroneously treats PXENV_UNLOAD_STACK as the "final
shutdown" call, and unhooks INT15. When using gPXE's undionly.kpxe,
this results in gPXE overwriting the portion of Etherboot located in
high memory, because it is no longer hidden from the system memory map
at the time that gPXE loads.
Work around this by explicitly testing for Etherboot as the underlying
PXE stack (as is already done in undinet.c) and skipping the call to
PXENV_UNLOAD_STACK if necessary.
Commit b149a99 ([build] Round up SUBx deltas) introduced a
signed/unsigned issue that affects gPXE images built on 32-bit hosts.
The zbin fixup utility performed an unsigned division, which led to
.usb images with an incorrect number of sectors to load.
The issue snuck by on 64-bit hosts since uint32_t is promoted to long.
On 32-bit hosts it is promoted to unsigned long.
Modified-by: Michael Brown <mcb30@etherboot.org>
Signed-off-by: Michael Brown <mcb30@etherboot.org>
NetBSD kernels are multiboot ELF kernels with an entry point
incorrectly specified as a virtual address rather than a physical
address.
Work around this by looking for the segment that could plausibly
contain the entry point address (interpreted as either a physical or
virtual address), and using that to determine the eventual physical
entry point.
In the event of any ambiguity, precedence is given to interpretation
of the entry point as a physical address.
Solaris kernels are multiboot images with the "raw" flag set,
indicating that the loader should use the raw address fields within
the multiboot header rather than looking for an ELF header. However,
the Solaris kernel contains garbage data in the raw address fields,
and requires us to use the ELF header instead.
Work around this by always using the ELF header if present. This
renders the "raw" flag somewhat redundant.
You can now type e.g.
make bin/rtl8139.rom.sizes
in order to see the (uncompressed) sizes of all of the object files
linked in to bin/rtl8139.rom. This should make it easier to identify
relevant code bloat.
The build mechanism currently allows for multiple objects per source
file. The only remaining user of this is unnrv2b.S. Replace this
usage with a separate unnrv2b16.S wrapper file, as is currently used
for e.g. pxeprefix.S and kpxeprefix.S.
You can now type e.g.
make bin/rtl8139.rom.deps
to see a list of the source files included in the build of
bin/rtl8139.rom. This is intended to assist with copyright vetting.
Other new debugging targets include
make bin/rtl8139.rom.objs
to see a list of object files linked in to bin/rtl8139.rom, and
make bin/rtl8139.rom.nodeps
to see a list of the source files that are *not* required for the
build of bin/rtl8139.rom.
Some utilities that expect a floppy disk image (e.g. iLO?) may test
for a file of the correct size. Reinstate the .pdsk image format in
order to provide this if needed.
QEMU will silently round down a disk or ROM image file to the nearest
512 bytes. Fix by always padding .rom, .dsk and .hd images to the
nearest 512-byte boundary.
Originally-fixed-by: Stefan Hajnoczi <stefanha@gmail.com>
This replaces the gdbstub's polite NAK behavior with retransmission of
the current outstanding reply packet. It solves situations where gdb
and gPXE's gdbstub get out of sync due to the lack of flow control in
the gdb protocol spec.
Signed-off-by: Michael Brown <mcb30@etherboot.org>
The zbin compressor fixup utility rounds down file sizes before
calculating their difference. This produces incorrect values and may
cause truncated gPXE images to be loaded at boot.
The following example explains the problem:
ilen = 48 bytes (uncompressed input file)
olen = 17 bytes (compressed output file)
divisor = 16 bytes (paragraph granularity)
fixmeup = 3 paragraphs (value to fix up)
olen / divisor - ilen / divisor
= 1 - 3
= -2 paragraphs (old delta calculation)
( align ( olen, divisor ) - align ( ilen, divisor ) ) / divisor
= 2 - 3
= -1 paragraphs (new delta calculation)
If we perform the SUBx operation with old delta:
fixmeup + -2 = 1 paragraph gets loaded by the prefix
With the new delta:
fixmeup + -1 = 2 paragraphs get loaded by the prefix
The old delta calculation removes the last paragraph; the prefix will
load a truncated copy of gPXE into memory. We need to load 2
paragraphs since olen is 17 bytes. Loading only 1 paragraph (16
bytes) would truncate the last byte.
Signed-off-by: Michael Brown <mcb30@etherboot.org>
Using "lret $2" to return from an interrupt causes interrupts to be
disabled in the calling program, since the INT instruction will have
disabled interrupts. Instead, patch CF on the stack and use iret to
return.
Interestingly, the original PC BIOS had this bug in at least one
place.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Michael Brown <mcb30@etherboot.org>
The "seq" command is GNU-specific; a BSD userland will not have it.
Use POSIX-conforming "awk" instead.
Reported-by: Joshua Oreman <oremanj@rwcr.net>
Suggested-by: Stefan Hajnoczi <stefanha@gmail.com>
On Mac OS X, it is necessary to build binutils manually; the system
does not provide bfd.h or the libbfd or libiberty libraries.
Originally-fixed-by: Joshua Oreman <oremanj@rwcr.net>
The Mac compiler treats "#pragma pack()" as gcc's "#pragma pack(pop)",
and so dies if the pragma pack stack is empty. Adding a "#pragma
pack(1)" immediately beforehand is enough to keep the Mac compiler
happy.
The combination of "#pragma pack(1)", "#pragma pack()" won't actually
achieve anything on a Mac, but it will at least build. (With gcc, the
"#pragma pack()" overrides any previous pragmas, so is still useful.)
Suggested-by: Joshua Oreman <oremanj@rwcr.net>
Some builds of the GNU assembler will treat a '/' character as a
comment delimiter. Adding "--divide" will cause it to be treated as a
division operator, as we expect. The "--divide" option is not
available in all gas versions, so apply it only conditionally.
Suggested-by: Joshua Oreman <oremanj@rwcr.net>
prep_segment() can sometimes fail because an image requests memory
that is already in use by gPXE. This will happen if
e.g. undionly.kpxe is used to boot memtest86; the memtest86 image is
an old-format kernel that needs to be loaded at 9000:0000, but this
area of memory may well already be in use by the underlying PXE stack.
Add a human-friendly error message, so that the cause is more
immediately visible.
This allows gPXE to load memtest86, which is packaged as an old kernel.
Split all code that directly touches the kernel headers out into
bzimage_parse_header() and bzimage_update_header(), to reduce code
size and offset the cost of supporting older kernels.
Total cost of this feature: 11 bytes (uncompressed).
bin/embedded.o has a build dependency on bin/.embedded.list, which
gets generated automatically by the Makefile. However, if the
EMBEDDED_IMAGE list is empty, bin/.embedded.list will never be
created, and so bin/embedded.o will be rebuilt every time due to a
missing dependency.
Fix by forcing bin/.embedded.list to be created even if the list is
empty.
The pcnet32 driver mismanages its RX buffers, with the result that
packets get corrupted if more than one packet arrives between calls to
poll().
Originally-fixed-by: Bill Lortz <Bill.Lortz@premier.org>
Reviewed-by: Stefan Hajnoczi <stefanha@gmail.com>
Tested-by: Stefan Hajnoczi <stefanha@gmail.com>
Also adds the MAC_ADDR_CORRECT flag, to indicate whether or not the
MAC address needs to be fixed up by the driver.
Signed-off-by: Michael Brown <mcb30@etherboot.org>
Reviewed-by: Stefan Hajnoczi <stefanha@gmail.com>
Reviewed-by: Thomas Miletich <thomas.miletich@gmail.com>
Modified-by: Michael Brown <mcb30@etherboot.org>
Signed-off-by: Michael Brown <mcb30@etherboot.org>
This is a major rewrite of the legacy etherboot 3c90x driver using the
gPXE API for much improved performance over the legacy driver it
replaces.
This driver has been tested on 3c905, 3c905B, and 3c905C cards.
Reviewed-by: Stefan Hajnoczi <stefanha@gmail.com>
Reviewed-by: Marty Connor <mdc@etherboot.org>
Tested-by: Marty Connor <mdc@etherboot.org>
Tested-by: Daniel Verkamp <daniel@drv.nu>
Signed-off-by: Marty Connor <mdc@etherboot.org>
Eliminate the potential for mismatches between table names and the
table entry data type by incorporating the data type into the
definition of the table, rather than specifying it explicitly in each
table accessor method.
Intel's C compiler (icc) chokes on the zero-length arrays that we
currently use as part of the mechanism for accessing linker table
entries. Abstract away the zero-length arrays, to make a port to icc
easier.
Introduce macros such as for_each_table_entry() to simplify the common
case of iterating over all entries in a linker table.
Represent table names as #defined string constants rather than
unquoted literals; this avoids visual confusion between table names
and C variable or type names, and also allows us to force a
compilation error in the event of incorrect table names.
Some firewall devices seem to regard SYN,PSH as an invalid flag
combination and reject the packet. Fix by setting PSH only if SYN is
not set.
Reported-by: DSE Incorporated <dseinc@gmail.com>
The parsing of the !PXE and PXENV+ structures share a fair bit of
code; merge the common code to save a few bytes.
Signed-off-by: Michael Brown <mcb30@etherboot.org>
Allow for settings blocks to be created on demand. This allows for
constructions such as
set defaults/filename http://bootserver/bootfile
set defaults/priority 0xff
dhcp net0
chain ${filename}
which will boot from the DHCP-provided filename, or from
"http://bootserver/bootfile" if the DHCP server does not provide a
filename.
(Note that "priority" gets interpreted as a signed integer, so setting
"defaults/priority" to 0xff will cause the "defaults" settings block
to have an effective priority of -1.)
Following the example of the Linux driver, we add a check and delay to
make sure that the NIC has finished resetting before the driver issues
any additional commands.
Signed-off-by: Marty Connor <mdc@etherboot.org>
Having a default script containing
#!gpxe
autoboot
can cause problems when entering commands to load and start a kernel
manually; the default script image will still be present when the
kernel is started and so will be treated as an initrd. It is possible
to work around this by typing "imgfree" before any other commands, but
this is counter-intuitive.
Fix by allowing the embedded image list to be empty (in which case we
just call autoboot()), and making this the default.
Reported by alkisg@gmail.com.
Search for the PXE entry points (via the !PXE or PXENV+ structures)
through all known combinations of search methods. Furthermore, if we
find a PXENV+ structure, attempt to use it to find the !PXE structure
if at all possible.
Avoid passing credentials in the iBFT that were available but not
required for login. This works around a problem in the Microsoft
iSCSI initiator, which will refuse to initiate sessions if the CHAP
password is fewer than 12 characters, even if the target ends up not
asking for CHAP authentication.
It is a programming error, not a runtime error, if we attempt to use
block ciphers with an incorrect blocksize, so use an assert() rather
than an error status return.
The various types of cryptographic algorithm are fundamentally
different, and it was probably a mistake to try to handle them via a
single common type.
pubkey_algorithm is a placeholder type for now.
The PXE 1.x spec specifies that on NBP entry or on return from INT
1Ah AX=5650h, EDX shall point to the physical address of the PXENV+
structure. The PXE 2.x spec drops this requirement, simply stating
that EDX is clobbered. Given the principle "be conservative in what
you send, liberal in what you accept", however, we should implement
this anyway.
Certain combinations of PXE stack and BIOS result in a broken INT 18
call, which will leave the system displaying a "PRESS ANY KEY TO
REBOOT" message instead of proceeding to the next boot device. On
these systems, returning via the PXE stack is the only way to continue
to the next boot device. Returning via the PXE stack works only if we
haven't already blown away the PXE base code in pxeprefix.S.
In most circumstances, we do want to blow away the PXE base code.
Base memory is a limited resource, and it is desirable to reclaim as
much as possible. When we perform an iSCSI boot, we need to place the
iBFT above the 512kB mark, because otherwise it may not be detected by
the loaded OS; this may not be possible if the PXE base code is still
occupying that memory.
Introduce a new prefix type .kkpxe which will preserve both the PXE
base code and the UNDI driver (as compared to .kpxe, which preserves
the UNDI driver but uninstalls the PXE base code). This prefix type
can be used on systems that are known to experience the specific
problem of INT 18 being broken, or in builds (such as gpxelinux.0) for
which it is particularly important to know that returning to the BIOS
will work.
Written by H. Peter Anvin <hpa@zytor.com> and Stefan Hajnoczi
<stefanha@gmail.com>, minor structural alterations by Michael Brown
<mcb30@etherboot.org>.
COMBOOT images use INTs to issue API calls; these end up making calls
into gPXE from real mode, and so temporarily change the real-mode
stack pointer. When our COMBOOT code uses a longjmp() to implement
the various "exit COMBOOT image" API calls, this leaves the real-mode
stack pointer stuck with its temporary value, which causes problems if
we eventually try to exit out of gPXE back to the BIOS.
Fix by adding rmsetjmp() and rmlongjmp() calls (analogous to
sigsetjmp()/siglongjmp()); these save and restore the additional state
needed for real-mode calls to function correctly.
Multi-level menus via COMBOOT rely on the COMBOOT program being able
to exit and invoke a new COMBOOT program (the next menu). This works,
but rapidly (within about five iterations) runs out of space in gPXE's
internal stack, since each new image is executed in a new function
context.
Fix by allowing tail recursion between images; an image can now
specify a replacement image for itself, and image_exec() will perform
the necessary tail recursion.
The version of the GNU assembler shipped with Fedora 10
(2.18.50.0.9-8.fc10) complains about character literals in some of our
assembly code. Changing $'x' to $( 'x' ) seems to fix the problem.
Yes, the whitespace is required; using just $('x') does not work.
Reported by Kevin O'Connor <kevin@koconnor.net>.
This patch extends the embedded image feature to allow multiple
embedded images instead of just one.
gPXE now always boots the first embedded image on startup instead of
doing the hardcoded DHCP boot (aka autoboot).
Based heavily upon a patch by Stefan Hajnoczi <stefanha@gmail.com>.
There are code paths other than PMM allocation that can result in our
changing the ROM checksum. For example, we attempt to update our
product string to incorporate the PCI bus:dev.fn number. In a system
that does not support PMM, we could therefore end up with an incorrect
checksum.
Fix by attempting to update the checksum unconditionally.
As reported by Stefan, commit 13d09e6 ("[i386] Simplify linker script
and standardise linker-defined symbol names") breaks gdb, readelf and
associated utilities.
This is caused by the .stack section overwriting a block in the middle
of the .debug_info section (despite being included in the
.bss.textdata section in the output file, which apparently has the
correct attributes for a .bss section).
Fixed by adding explicit flags and type to the stack section
declaration.
The documentation in xfer.h and xfer.c does not say that the metadata
parameter is optional in calls such as xfer_deliver_iob_meta() and the
deliver_iob() method. However, some code in net/ is prepared to
accept a NULL pointer, and xfer_deliver_as_iob() passes a NULL pointer
directly to the deliver_iob() method.
Fix this mess of conflicting assumptions by making everything assume
that the metadata parameter is mandatory, and fixing
xfer_deliver_as_iob() to pass in a dummy metadata structure (as is
already done in xfer_deliver_iob()).
If it happens that _textdata_memsz ends up being an exact multiple of
4kB, then this will cause the .textdata section (after relocation) to
start on a page boundary. This means that the hidden memory region
(which is rounded down to the nearest page boundary) will start
exactly at virtual address 0, i.e. UNULL. This means that
init_eheap() will erroneously assume that it has failed to allocate a
an external heap, since it typically ends up choosing the area that
lies immediately below .textdata, which in this case will be the
region with top==UNULL.
A subsequent error is that memtop_urealloc() passes through the error
return status -ENOMEM to the caller, which (rightly) assumes that the
result represents a valid userptr_t address.
Fixed by using alternative tests for heap non-existence, and by
returning UNULL in case of an error from init_eheap().
fetchf_uristring() was failing to handle error values from
fetch_setting(), resulting in its attempting to allocate extremely
large temporary buffers on the stack (and so overrunning the stack and
locking up the machine).
Problem reported by Shao Miller <Shao.Miller@yrdsb.edu.on.ca>.
This previously unsupported NIC variant was was found to work using
the current driver:
PCI_ROM(0x13f0, 0x0200, "ip100a", "IC+ IP100A"),
Signed-off-by: Marty Connor <mdc@etherboot.org>
The PXE spec dictates the rather ugly feature that we have to present
a DHCP-specified prompt string to the user, then wait to see if they
press F8 before displaying the menu.
This seems to me to be a significant retrograde step from the current
situation of displaying the menu with the timeout counting down
against the default selected boot option, but apparently the lack of
the "Press F8" prompt causes some confusion.
Various combinations of options 43.6, 43.7 and 43.8 dictate which
servers we send Boot Server Discovery requests to, and which servers
we should accept responses from. Obey these options, and remove the
explicit specification of a single Boot Server from start_pxebs() and
dependent functions.
There are many functions that take ownership of the I/O buffer they
are passed as a parameter. The caller should not retain a pointer to
the I/O buffer. Use iob_disown() to automatically nullify the
caller's pointer, e.g.:
xfer_deliver_iob ( xfer, iob_disown ( iobuf ) );
This will ensure that iobuf is set to NULL for any code after the call
to xfer_deliver_iob().
iob_disown() is currently used only in places where it simplifies the
code, by avoiding an extra line explicitly setting the I/O buffer
pointer to NULL. It should ideally be used with each call to any
function that takes ownership of an I/O buffer. (The SSA
optimisations will ensure that use of iob_disown() gets optimised away
in cases where the caller makes no further use of the I/O buffer
pointer anyway.)
If gcc ever introduces an __attribute__((free)), indicating that use
of a function argument after a function call should generate a
warning, then we should use this to identify all applicable function
call sites, and add iob_disown() as necessary.
A TFTP DATA packet with a block number of zero (representing a
negative offset within the file) could potentially cause problems.
Fixed by explicitly rejecting such packets.
Identified by Stefan Hajnoczi <stefanha@gmail.com>.
The DHCP client code now implements only the mechanism of the DHCP and
PXE Boot Server protocols. Boot Server Discovery can be initiated
manually using the "pxebs" command. The menuing code is separated out
into a user-level function on a par with boot_root_path(), and is
entered in preference to a normal filename boot if the DHCP vendor
class is "PXEClient" and the PXE boot menu option exists.
Automatically unregister any settings with the same name (and position
within the settings tree) as a newly registered settings block.
This functionality is generalised out from dhcp.c.
Some targets send a spurious CHECK CONDITION message in response to
the first SCSI command. We issue (and ignore the status of) an
arbitary harmless SCSI command (a READ CAPACITY (10)) in order to draw
out this response.
The Solaris Comstar target seems to send more than one spurious CHECK
CONDITION response. Attempt up to SCSI_MAX_DUMMY_READ_CAP dummy READ
CAPACITY (10) commands before assuming that error responses are
meaningful.
Problem reported by Kristof Van Doorsselaere <kvandoor@aserver.com>
and Shiva Shankar <802.11e@gmail.com>.
Try to qualify relative names in the DNS resolver using the DHCP Domain
Name. For example:
DHCP Domain Name: etherboot.org
(Relative) Name: www
yields:
www.etherboot.org
Only names with no dots ('.') will be modified. A name with one or more
dots is unchanged.
pxe_tftp.c assumes that the first seek on its data-transfer interface
represents the block size. Apart from being an ugly hack, this will
also screw up file size calculation for files smaller than one block.
The proper solution would be to extend the data-transfer interface to
support the reporting of stat()-like data. This is not going to
happen until the cost of adding interface methods is reduced (a fix I
have planned since June 2008).
In the meantime, abuse the xfer_window() method to return the block
size, since it is not being used for anything else and is vaguely
justifiable.
Astonishingly, having returned the incorrect TFTP blocksize via
PXENV_TFTP_OPEN for almost a year seems not to have affected any of
the test cases run during that time; this bug was found only when
someone tried running the heavily-patched version of pxegrub found in
OpenSolaris.
PXE dictates a mechanism for boot menuing, involving prompting the
user with a variable message, waiting for a predefined keypress,
displaying a boot menu, and waiting for a selection.
This breaks the currently desirable abstraction that DHCP is a process
that can happen in the background without any user interaction.
F8 is represented by the ANSI escape sequence "^[[19~", which is not
representable as a KEY_xxx constant using the current encoding scheme.
Adapt the encoding scheme to allow F8 to be represented, since PXE
requires that we may need to prompt the user to press F8.
Remove the lazy assumption that ProxyDHCP == "DHCP with option 60 set
to PXEClient", and explicitly separate the notion of ProxyDHCP from
the notion of packets containing PXE options.
It is possible to configure a DHCP server to hand out PXE options
without a ProxyDHCP server present. This requires setting option 60
to "PXEClient", which will cause gPXE to attempt ProxyDHCP.
We assume in several places that dhcp->proxydhcpack is set to the
DHCPACK packet containing option 60 set to "PXEClient". When we
transition into ProxyDHCPREQUEST, set dhcp->proxydhcpack=dhcp->dhcpack
so that this assumption holds true.
We ought to rename several references to "proxydhcp" to something more
accurate, such as "pxedhcp". Treating a single DHCP response as
potentially both DHCPOFFER and ProxyDHCPOFFER does make the code
smaller, but the variable names get confusing.
Pick out the first boot menu item from the boot menu (option 43.9) and
pass it to the boot server as the boot menu item (option 43.71).
Also improve DHCP debug messages to include more details of the
packets being transmitted.
Apparently this can cause a major speedup on some iSCSI targets, which
will otherwise wait for a timer to expire before responding. It
doesn't seem to hurt other simple TCP test cases (e.g. HTTP
downloads).
Problem and solution identified by Shiva Shankar <802.11e@gmail.com>
Some PXE configurations require us to perform a third DHCP transaction
(in addition to the real DHCP transaction and the ProxyDHCP
transaction) in order to retrieve information from a "Boot Server".
This is an experimental implementation, since the actual behaviour is
not well specified in the PXE spec.
When sending to a multicast address, it may be necessary to specify
the source address explicitly, since the multicast destination address
does not provide enough information to deduce the source address via
the miniroute table.
Allow the source address specified via the data-xfer metadata to be
passed down through the TCP/IP stack to the IPv4 layer, which can use
it as a default source address.
Move all the DHCP state transition logic into a single function
dhcp_next_state(). This will make it easier to add support for PXE
Boot Servers, since it abstracts away the difference between "mark
DHCP as complete" and "transition to boot server discovery".
The Linux PXE server (http://www.kano.org.uk/projects/pxe) does not
set the server identifier in its ProxyDHCP responses. If the server
ID is missing, do not treat this as an error.
This resolves the "vague and unsettling memory" mentioned in commit
fdb8481d ("[dhcp] Verify server identifier on ProxyDHCPACKs").
Note that we already accept ProxyDHCPOFFERs without a server
identifier; they get treated as potential BOOTP packets.
At some point, it seems that someone decided to change the GUID for
the EFI_NETWORK_INTERFACE_IDENTIFIER_PROTOCOL. Current EFI builds
ignore the older GUID, older EFI builds ignore the newer GUID, so we
have to expose both.
Include a minimal component name protocol so that the driver name
shows up as something other than "<UNKNOWN>" in the driver list, and a
device path protocol so that the network interface shows up as a
separate device in the device list, rather than being attached
directly to the PCI device.
Incidentally, the EFI component name protocol reaches new depths for
signal-to-noise ratio in program code. A typical instance within the
EFI development kit will use an additional 300 lines of code to
provide slightly less functionality than GNU gettext achieves with
three additional characters.
The UEFI specification does not mention ROM checksums, and reassigns
the field typically used as a checksum byte. The UEFI shell
"loadpcirom" utility does not verify ROM checksums, but it seems that
some UEFI BIOSes do.
Some devices take a very long time to initialise. This can make it
difficult to visually distinguish between the error cases of failing
to start executing C code and failing to initialise a device.
Add a "gPXE initialising devices..." message. The trailing ellipsis
indicates to the user that this may take some time, and the presence
of the message indicates to the developer that relocation etc. all
succeeded.
elf2efi converts a suitable ELF executable (containing relocation
information, and with appropriate virtual addresses) into an EFI
executable. It is less tightly coupled with the gPXE build process
and, in particular, does not require the use of a hand-crafted PE
image header in efiprefix.S.
elf2efi correctly handles .bss sections, which significantly reduces
the size of the gPXE EFI executable.
Conventional usage of the various struct sockaddr_xxx types involves
liberal use of casting, which tends to trigger strict-aliasing
warnings from gcc. Avoid these now and in future by marking all the
relevant types with __attribute__((may_alias)).
The check for unresolved symbols does not explicitly specify an output
architecture format, and so causes a warning when building an i386 EFI
binary on an x86_64 platform. This warning is harmless, and
specifying the output architecture in multiple places is cumbersome,
so just inhibit the warning.
At POST time some BIOSes return invalid e820 maps even though
they indicate that the data is valid. We add a check that the first
region returned by e820 is RAM type and declare the map to be invalid
if it is not.
This extends the sanity checks from 8b20e5d ("[pcbios] Sanity-check
the INT15,e820 and INT15,e801 memory maps").
Driver was storing the result of pci_bar_start() and pci_bar_size() in
an int, rather than an unsigned long.
(Bug was introduced in the vendor's tree in commit eac85cd "Port
etherfabric driver to net_device api".)
adjust_pci_device() has historically enabled bus-mastering and I/O
cycles, but has never previously needed to enable memory cycles. Some
EFI systems seem not to enable memory cycles by default, so add that
to the list of PCI command register bits that we force on.
When compiling for the Linux kernel, PCI_BASE_ADDRESS_0 == 0, and
PCI_BASE_ADDRESS_1 == 1. This is not so when compiling for gPXE. We
must use the symbolic names rather than integers to get the correct
values.
Bug identified and patch supplied by:
George Chou <george.chou@advantech.com>
Currently the only supported platform for x86_64 is EFI.
Building an EFI64 gPXE requires a version of gcc that supports
__attribute__((ms_abi)). This currently means a development build of
gcc; the feature should be present when gcc 4.4 is released.
In the meantime; you can grab a suitable gcc tree from
git://git.etherboot.org/scm/people/mcb30/gcc/.git
The patch file supplied for commit 3a799e9 ("[hermon] Add PCI ID for
ConnectX QDR card") accidentally marked drivers/infiniband/hermon.c as
being executable.
EFI provides a copy of the SMBIOS table accessible via the EFI system
table, which we should use instead of manually scanning through the
F000:0000 segment.
EFI passes in copies of SMBIOS and other system configuration tables
via the EFI system table. Allow configuration tables to be requested
using a mechanism similar to the current method for requesting EFI
protocols.
On non-BBS systems, we have to hook INT 19 in order to be able to boot
from the gPXE ROM at all. However, doing this unconditionally will
prevent the user from booting via any other devices.
Previously, the INT 19 entry point would prompt the user to press B in
order to boot from gPXE, which makes it impossible to perform an
unattended network boot. We now prompt the user to press N to skip
booting from gPXE, which allows for unattended operation.
This should be a better match for most real-world scenarios. Most
modern systems support BBS and so are unaffected by this change. Very
old (non-BBS) systems tend not to have PXE ROMs by default anyway; if
the user has added a gPXE ROM then they probably do want to boot from
the network. Newer non-BBS systems are essentially limited to IBM
servers, which will recapture the INT 19 vector anyway and implement
their own boot-ordering selection mechanism.
This driver is based on Stefan Hajnoczi's summer work, which
is in turn based on version 1.01 of the linux b44 driver.
I just assembled the pieces and fixed/added a few pieces
here and there to make it work for my hardware.
The most major limitation is that this driver won't work
on systems with >1GB RAM due to the card not having enough
address bits for that and gPXE not working around this
limitation.
Still, other than that the driver works well enough for
at least 2 users :) and the above limitation can always
be fixed when somebody wants it bad enough :)
Signed-off-by: Pantelis Koukousoulas <pktoss@gmail.com>
Remove the assortment of miscellaneous hacks to guess the "network
boot device", and replace them each with a call to last_opened_netdev().
It still isn't guaranteed correct, but it won't be any worse than
before, and it will at least be consistent.
There are currently four places within the codebase that use a
heuristic to guess the "boot network device", with varying degrees of
success. Add a feature to the net device core to maintain a list of
open network devices, in order of opening, and provide a function
last_opened_netdev() to retrieve the most recently opened net device.
This should do a better job than the current assortment of
guess_boot_netdev() functions.
The AoE spec does not specify that the source MAC address of a
received packet actually matches the MAC address of the AoE target.
In principle an AoE server can respond to an AoE request on any
interface available to it, which may not be an address configured to
accept AoE requests.
This issue is resolved by implementing AoE device discovery. The
purpose of AoE discovery is to find out which addresses an AoE target
can use for requests. An AoE configuration command is sent when the
AoE attach is attempted. The AoE target must respond to that
configuration query from an interface that can accept requests.
Based on a patch from Ryan Thomas <ryan@coraid.com>
EFI_STATUS is defined as an INTN, which maps to UINT32 (i.e. unsigned
int) on i386 and UINT64 (i.e. unsigned long) on x86_64. This would
require a cast each time the error status is printed.
Add efi_strerror() to avoid this ickiness and simultaneously enable
prettier reporting of EFI status codes.
This brings us in to line with Linux definitions, and also simplifies
adding x86_64 support since both platforms have 2-byte shorts, 4-byte
ints and 8-byte long longs.
Code paths that automatically allocate memory from the FBMS at 40:13
should also free it, if possible.
Freeing this memory will not be possible if either
1. The FBMS has been modified since our allocation, or
2. We have not been able to unhook one or more BIOS interrupt vectors.
_filesz was incorrectly forced to be aligned up to MAX_ALIGN. In a
non-compressed build, this would cause a build failure unless _filesz
happened to already be aligned to MAX_ALIGN.
The return path in directed route SMPs lists the egress ports in order
from SM to node, rather than from node to SM.
To write to the correct offset within the return path, we need to
parse the hop pointer. This is held within the class-specific data
portion of the MAD header, which was previously unused by us and
defined to be a uint16_t. Define this field to be a union type; this
requires some rearrangement of ib_mad.h and corresponding changes to
ipoib.c.
The only way that PMM allows us to request a block in a region with
A20=0 is to ask for a block with an alignment of 2MB. Due to the PMM
API design, the only way we can do this is to ask for a block with a
size of 2MB.
Unfortunately, some BIOSes will hit problems if we allocate a 2MB
block. In particular, it may not be possible to enter the BIOS setup
screen; the BIOS setup code attempts a PMM allocation, fails, and
hangs the machine.
We now try allocating only as much as we need via PMM. If the
allocated block has A20=1, we free the allocated block, double the
allocation size, and try again. Repeat until either we obtain a block
with A20=0 or allocation fails. (This is guaranteed to terminate by
the time we reach an allocation size of 2MB.)
These cards very nearly support our current IB Verbs model. There is
one minor difference: multicast packets will always be delivered by
the hardware to QP0, so the driver has to redirect them to the
appropriate QP. This means that QP owners may see receive completions
for buffers that they never posted. Nothing in our current codebase
will break because of this.
This can be used with cards that require the driver to construct and
parse packet headers manually. Headers are optionally handled
out-of-line from the packet payload, since some such cards will split
received headers into a separate ring buffer.
Some Infiniband cards will not be as accommodating as the Arbel and
Hermon cards in providing enough space for us to push a fake extra
header at the start of the received packet. We must therefore make do
with squeezing enough information to identify source and destination
addresses into the two bytes of padding within a genuine IPoIB
link-layer header.
Not all Infiniband cards have embedded subnet management agents.
Split out the code that communicates with such an embedded SMA into a
separate ib_smc.c file, and have drivers call ib_smc_update()
explicitly when they suspect that the answers given by the embedded
SMA may have changed.
Receive completion handlers now get passed an address vector
containing the information extracted from the packet headers
(including the GRH, if present), and only the payload remains in the
I/O buffer.
This breaks the symmetry between transmit and receive completions, so
remove the ib_completer_t type and use an ib_completion_queue_operations
structure instead.
Rename the "destination QPN" and "destination LID" fields in struct
ib_address_vector to reflect its new dual usage.
Since the ib_completion structure now contains only an IB status code,
("syndrome") replace it with a generic gPXE integer status code.
Avoid leaking I/O buffers in ib_destroy_qp() by completing any
outstanding work queue entries with a generic error code. This
requires the completion handlers to be available to ib_destroy_qp(),
which is done by making them static configuration parameters of the CQ
(set by ib_create_cq()) rather than being provided on each call to
ib_poll_cq().
This mimics the functionality of netdev_{tx,rx}_flush(). The netdev
flush functions would previously have been catching any I/O buffers
leaked by the IPoIB data queue (though not by the IPoIB metadata
queue).
Add the simplified ne2k_isa driver. It is just a selective copy+paste
of the relevant parts from ns8390.c plus a little trivial hacking to
make it actually work.
It is true that the code is pretty ugly, but:
a) ns8390.c is worse
b) It is only 372 lines and no #ifdefs
c) It works both in qemu/bochs and in real hardware
and we all know it is easier to cleanup working code
Hope someone will find the time to rewrite this driver properly,
but until then at least for me this is an ok solution.
Signed-off-by: Pantelis Koukousoulas <pktoss@gmail.com>
netdev_rx_err() and netdev_tx_complete_err() get passed the error
code, but currently use it only in debug messages.
Retain error numbers and frequencey counts for up to
NETDEV_MAX_UNIQUE_ERRORS (4) different errors for each of TX and RX.
This allows the "ifstat" command to report the reasons for TX/RX
errors in most cases, even in non-debug builds.
Halting the PEGs breaks platforms where there is sideband access to
the NIC (e.g. HP machines using iLO). (We have to retain the
unhalting code because on some other platforms (e.g. IBM blades with
BOFM) the pre-PXE firmware must halt the PEGs to avoid issues with the
BIOS rereading via the expansion ROM BAR.)
The retry timer needs to be running as soon as we know that we are
trying to transmit a command. If transmission fails because of a
temporary error condition, then the timer will allow us to retry the
transmission later.
This fixes a regression introduced in commit 612f4e7:
[settings] Avoid returning uninitialised data on error in fetch_xxx_setting()
in which the memset() was moved from fetch_string_setting() to
fetch_setting(), in order that it would be useful for non-string
setting types. However, this neglects to take into account the fact
that fetch_string_setting() shrinks its buffer by one byte (to allow
for the NUL) before calling fetch_setting().
Restore the memset() in fetch_string_setting(), so that the
terminating NUL is guaranteed to actually be a NUL.
With a 16-bit operand, lgdt/lidt will load only a 24-bit base address,
ignoring the high-order bits. This meant that we could fail to fully
restore the GDT across a call into gPXE, if the GDT happened to be
located above the 16MB mark.
Not all of our lgdt/lidt instructions require a data32 prefix (for
example, reloading the real-mode IDT can never require a 32-bit base
address), but by adding them everywhere we will hopefully not forget
the necessary ones in future.
This is something of an ugly hack to accommodate an OEM requirement.
The NIC has only one expansion ROM BAR, rather than one per port. To
allow individual ports to be selectively enabled/disabled for PXE boot
(as required), we must therefore leave the expansion ROM always
enabled, and place the per-port enable/disable logic within the gPXE
driver.
Some hardware vendors have been known to remove all gPXE-related
branding from ROMs that they build. While this is not prohibited by
the GPL, it is a little impolite.
Add a facility for adding branding messages via two #defines
(PRODUCT_NAME and PRODUCT_SHORT_NAME) in config/general.h. This
should accommodate all known OEM-mandated branding requirements.
Vendors with branding requirements that cannot be satisfied by using
PRODUCT_NAME and/or PRODUCT_SHORT_NAME should contact us so that we
can extended this facility as necessary.
The Phantom firmware selectively disables PCI functions based on the
board type, with the end result that we see one PCI function for each
network port. This allows us to eliminate the code for reading from
flash and, more importantly, removes knowledge of the board type magic
number from the gPXE driver.
This function is a major kludge, but can be made slightly more
accurate by ignoring net devices that aren't open. Eventually it
needs to be removed entirely.
Settings can be constructed using a dotted-decimal notation, to allow
for access to unnamed settings. The default interpretation is as a
DHCP option number (with encapsulated options represented as
"<encapsulating option>.<encapsulated option>".
In several contexts (e.g. SMBIOS, Phantom CLP), it is useful to
interpret the dotted-decimal notation as referring to non-DHCP
options. In this case, it becomes necessary for these contexts to
ignore standard DHCP options, otherwise we end up trying to, for
example, retrieve the boot filename from SMBIOS.
Allow settings blocks to specify a "tag magic". When dotted-decimal
notation is used to construct a setting, the tag magic value of the
originating settings block will be ORed in to the tag number.
Store/fetch methods can then check for the magic number before
interpreting arbitrarily-numbered settings.
This extends the sanity checks on the runtime segment address provided
in %bx, first implemented in commit 5600955.
We now allow the ROM to be placed anywhere above a000:0000 (rather
than c000:0000, as before), since this is the region allowed by the
PCI 3 spec. If the BIOS asks us to place the runtime image such that
it would overlap with the init-time image (which is explicitly
prohibited by the PCI 3 spec), then we assume that the BIOS is faulty
and ignore the provided runtime segment address.
Testing on a SuperMicro BIOS providing overlapping segment addresses
shows that ignoring the provided runtime segment address is safe to do
in these circumstances.
This interface provides access to firmware settings (e.g. MAC address)
that will apply to all drivers loaded for the duration of the current
system boot.
A hardware bug means that reads through the expansion ROM BAR can
return corrupted data if the PEGs are running. This breaks platforms
that re-read the expansion ROM after invoking gPXE code, such as IBM
blade servers.
Halt PEGs during driver shutdown, and unhalt PEGs during driver
startup if we detect that this is not the first startup since
power-on.
A DOS-style full path name such as "C:\Program Files\tftpboot\nbp.0"
satisfies the syntax requirements for a URI with a scheme of "C" and
an opaque portion of "\Program Files\tftpboot\nbp.0".
Add a check in parse_uri() to ignore schemes that are apparently only
a single character long; this avoids interpreting DOS-style paths in
this way, and shouldn't affect any practical URI scheme.
Most other Phantom drivers define a register space in terms of a 64M
virtual address space. While this doesn't map in any meaningful way
to the actual addresses used on the latest cards, it makes maintenance
easier if we do the same.
Someone at Dell must have a full-time job designing ways to screw up
implementations of INT 15,e820. This latest gem is courtesy of a Dell
Xanadu system, which arbitrarily decides to obliterate the contents of
%esi.
Preserve %esi, %edi and %ebp across calls to INT 15,e820, in case
someone tries a variation on this trick in future.
Callers (e.g. usr/autoboot.c) may not check the return values from
fetch_xxx_setting(), assuming that in error cases the returned setting
value will be "empty" (for some sensible value of "empty").
In particular, if the DHCP server did not specify a next-server
address, this would result in gPXE using uninitialised data for the
TFTP server IP address.
FreeBSD requires the object format to be specified as elf_i386_fbsd,
rather than elf_i386.
Based on a patch from Eygene Ryabinkin <rea-fbsd@codelabs.ru>
Some PCI 3 BIOSes seem to provide a garbage value in %bx, which should
contain the runtime segment address. Perform a basic sanity check: we
reject the segment if it is below the start of option ROM space. If
the sanity check fails, we assume that the BIOS was not expecting us
to be a PCI 3 ROM, and we just leave our image in situ.
The section name seems to have significance for some versions of
binutils.
There is no way to instruct gcc that sections such as .bss16 contain
uninitialised data; it will emit them with contents explicitly set to
zero. We therefore have to rely on the linker script to force these
sections to become uninitialised-data sections. We do this by marking
them as NOLOAD; this seems to be the closest semantic equivalent in the
linker script language.
However, this gets ignored by some versions of ld (including 2.17 as
shipped with Debian Etch), which mark the resulting sections with
(CONTENTS,ALLOC,LOAD,DATA). Combined with the fact that this version of
ld seems to ignore the specified LMA for these sections, this means that
they end up overlapping other sections, and so parts of .prefix (for
example) get obliterated by .data16's bss section.
Rename the .bss sections from .section_bss to .bss.section; this seems to
cause these versions of ld to treat them as uninitialised data.
Not fully understood, but it seems that the LMA of bss sections matters
for some newer binutils builds. Force all bss sections to have an LMA
at the end of the file, so that they don't interfere with other
sections.
The symptom was that objcopy -O binary -j .zinfo would extract the
.zinfo section from bin/xxx.tmp as a blob of the correct length, but
with zero contents. This would then cause the [ZBIN] stage of the
build to fail.
Also explicitly state that .zinfo(.*) sections have @progbits, in case
some future assembler or linker variant decides to omit them.
The virtnet_transmit() logic for waiting the packet to be transmitted is
reversed: we can't wait the packet to be transmitted if we didn't kick()
the ring yet. The vring_more_used() while loop logic is reversed also,
that explains why the code works today.
The current code risks trying to free a buffer from the used ring
when none was available, that will happen most times because KVM
doesn't handle the packet immediately on kick(). Luckily it was working
because it was unlikely to have a buffer still queued for transmit when
virtnet_transmit() was called.
Also, adds a BUG_ON() to vring_get_buf(), to catch cases where we try
to free a buffer from the used ring when there was none available.
Patch for Etherboot. gPXE has the same problem on the code, but I hadn't
a chance to test gPXE using virtio-net yet.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@gmail.com>
EFI requires us to be able to specify the source address for
individual transmitted packets, and to be able to extract the
destination address on received packets.
Take advantage of this to rationalise the push() and pull() methods so
that push() takes a (dest,source,proto) tuple and pull() returns a
(dest,source,proto) tuple.
Multicast hashing is an ugly overlap between network and link layers.
EFI requires us to provide access to this functionality, so move it
out of ipv4.c and expose it as a method of the link layer.
Some versions of ld choke on the "AT ( _xxx_lma )" in efi.lds with an
error saying "nonconstant expression for load base". Since these were
only explicitly setting the LMA to the address that it would have had
anyway, they can be safely omitted.
We have EFI APIs for CPU I/O, PCI I/O, timers, console I/O, user
access and user memory allocation.
EFI executables are created using the vanilla GNU toolchain, with the
EXE header handcrafted in assembly and relocations generated by a
custom efilink utility.
monojob_wait() was holding a reference to the completed job, meaning that
various objects would not be freed until the next job was plugged in to
the monojob interface.
The userptr_t is now the fundamental type that gets used for conversions.
For example, virt_to_phys() is implemented in terms of virt_to_user() and
user_to_phys().
-Wformat-nonliteral is not enabled by -Wall and needs to be explicitly
specified.
Modified the few files that use nonliteral format strings to work with
this new setting in place.
Inspired by a patch from Carl Karsten <carl@personnelware.com> and an
identical patch from Rorschach <r0rschach@lavabit.com>.
The intention is to include near-verbatim copies of the EFI headers
required by gPXE. This is achieved using the import.pl script in
src/include/gpxe/efi.
Note that import.pl will modify any #include lines in each imported
header to reflect its new location within the gPXE tree. It will also
tidy up the file by removing carriage return characters and trailing
whitespace.
Reduce the number of sections within the linker script to match the
number of practical sections within the output file.
Define _section, _msection, _esection, _section_filesz, _section_memsz,
and _section_lma for each section, replacing the mixture of symbols that
previously existed.
In particular, replace _text and _end with _textdata and _etextdata, to
make it explicit within code that uses these symbols that the .text and
.data sections are always treated as a single contiguous block.
Allow for the build CPU architecture and platform to be specified as part
of the make command goals. For example:
make bin/rtl8139.rom # Standard i386 PC-BIOS build
make bin-efi/rtl8139.efi # i386 EFI build
The generic syntax is "bin[-[arch-]platform]", with the default
architecture being "i386" (regardless of the host architecture) and the
default platform being "pcbios".
Non-path targets such as "srcs" can be specified using e.g.
make bin-efi srcs
Note that this changeset is merely Makefile restructuring to allow the
build architecture and platform to be determined by the make command
goals, and to export these to compiled code via the ARCH and PLATFORM
defines. It doesn't actually introduce any new build platforms.
Some devices (e.g. the Atmel AT24C11) have no concept of a device
address; they respond to every device address and use this value as
the word address. Some other devices use part of the device address
field to extend the word address field.
Generalise the i2c bit-bashing support to handle this by defining the
device address length and word address length as properties of an i2c
device. The word address is assumed to overflow into the device
address field if the address used exceeds the width of the word
address field.
Also add a bus reset mechanism. i2c chips don't usually have a reset
line, so rebooting the host will not clear any bizarre state that the
chip may be in. We reset the bus by clocking SCL until we see SDA
high, at which point we know we can generate a start condition and
have it seen by all devices. We then generate a stop condition to
leave the bus in a known state prior to use.
Finally, add some extra debugging messages to i2c_bit.c.
Some older versions of gcc don't complain about unknown compiler flags
unless you ask them to actually compile; asking them to merely
preprocess won't trigger the error.
Fix the -fno-stack-protector test by making it attempt to compile an
empty file, rather than preprocess an empty file.
The usefulness of DBGLVL_IO is limited by the fact that many cards
require large numbers of uninteresting I/O reads/writes at device
probe time, typically when driving a bit-bashing I2C/SPI bus to read
the MAC address.
This patch adds the DBG_DISABLE() and DBG_ENABLE() macros, which can
be used to temporarily disable and re-enable selected debug levels.
Note that debug levels must still be enabled in the build in order to
function at all: you can't use DBG_ENABLE(DBGLVL_IO) in an object
built with DEBUG=object:1 and expect it to do anything.
?= in a Makefile means that that variable can be overridden by the
environment. This is confusing to users, especially with a generic
name like "ARCH".
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Although the E820 API allows for a caller to provide only a 20-byte
buffer, there exists at least one combination (HP BIOS, 32-bit WinPE)
that relies on information found only in the "extended attributes"
field, which requires a 24-byte buffer.
Allow for up to a 64-byte E820 buffer, in the hope of coping with
future idiocies like this one.
The ACPI specification defines an additional 4-byte field at offset 20
for an E820 memory map entry. This field is presumably optional,
since generally E820 gets given only a 20-byte buffer to fill.
However, the bits of this optional field are defined as:
bit 0 : region is enabled
bit 1 : region is non-volatile memory rather than RAM
so it seems as though callers that pass in only a 20-byte buffer may
be missing out on some rather important information.
Our INT 15,e820 code was setting %es=%ss (as part of the "look ahead
in the memory map" logic), but failing to restore %es afterwards.
This is a serious bug, but wasn't affecting many platforms because
almost all callers seem to set %es=%ss anyway.
Use individual page mappings rather than a single whole-region
mapping, to avoid the waste of memory that occurs due to the
constraint that each mapped block must be aligned on its own size.
Some BIOSes require us to pass in not only the continuation value (in
%ebx) as returned by the previous call to INT 15,e820 but also the
unmodified buffer (at %es:%di) as returned by the previous call to INT
15,e820. Apparently, someone thought it would be a worthwhile
optimisation to fill in only the low dword of the "length" field and
the low byte of the "type field", assuming that the buffer would
remain unaltered from the previous call.
This problem was being triggered by the "peek ahead" logic in
get_mangled_e820(), which would read the next entry into a temporary
buffer in order to be able to guarantee terminating the map with
%ebx=0 rather than CF=1. (Terminating with CF=1 upsets some Windows
flavours, despite being documented legal behaviour.)
Work around this problem by always fetching directly into our e820
cache; that way we can guarantee that the underlying call always sees
the previous buffer contents (and the same buffer address).
We were accidentally allocating only half the required amount of
memory (given the alignment method) for the firmware buffer, leading
to conflicts between the firmware buffer and gPXE code/data segments.
We were accidentally allocating only half the required amount of
memory (given the alignment method) for the firmware buffer, leading
to conflicts between the firmware buffer and gPXE code/data segments.
We seem to be having issues with various E820 memory maps. These
problems are often difficult to reproduce, requiring access to the
specific system exhibiting the problem.
Add a facility for hooking in a fake E820 map generator, using an
arbitrary map defined in a C array, solely in order to be able to test
the map-mangling code against arbitrary E820 maps.
In particular, allow BANNER_TIMEOUT=0 to inhibit the prompt banners
altogether.
Ironically, this request comes from the same OEM that originally
required the prompts to be present during POST.
Some really moronic BIOSes bring up the PXE stack via the UNDI loader
entry point during POST, and then don't bother to unload it before
overwriting the code and data segments. If this happens, we really
don't want to leave INT 15 hooked, because that will cause any loaded
OS to die horribly as soon as it attempts to fetch the system memory
map.
We use a heuristic to detect whether or not we are being loaded at the
top of free base memory. If we determine that we are being loaded at
some other arbitrary location in base memory, then we assume that it's
not safe to hook INT 15.
This allows settings to be expanded in a way that is safe to include
within a URI string, such as
kernel http://10.0.0.1/boot.php?mf=${manufacturer:uristring}
where the ${manufacturer} setting may contain characters that are not
permitted (or have reserved purposes) within a URI.
Since whitespace characters will be URI-encoded (e.g. "%20" for a
space character), this also works around the problem that spaces
within an expanded setting would cause the shell to split command-line
arguments incorrectly.
On non-BBS systems we hook INT 19, since there is no other way we can
guarantee gaining control of the flow of execution. If we end up
doing this, prompt the user before attempting boot, since forcibly
capturing INT 19 is rather antisocial.
It is possible for the BIOS to use the UNDI API to bring up the NIC
prior to system boot. If this happens, UNM_NIC_REG_CMDPEG_STATE will
contain the value 0xf00f (UNM_NIC_REG_CMDPEG_STATE_INITIALIZE_ACK),
and we should skip initialising the command PEG.
The firmware will now determine the right port mode on all cards, so
the PXE driver doesn't have to set it. (Setting the port mode
apparently breaks some newer cards.)
At least one Dell system calls the UNDI loader entry point with the
BIOS console disabled. The serial console is active only after a call
to initialise(), so move the debug message in undi_loader() so that it
can be displayed via the serial console.
If the INT 15,e820 memory map reports a region [0,0), this confuses
the "truncate to even megabytes" logic, which ends up rounding the
region 'down' to [0,fff00000).
Fix by ensuring that the region's end address is at least 1, before we
subtract 1 to obtain the "last byte in region" address.
INT 15,e801 is capable of returning a memory range that extends to
4GB, so allow for this in the debug message that shows the data
returned by INT 15,e801.
The domain etherboot.org was actually registered on 2000-01-09, not
2000-09-01. (To put it another way, it was registered on 1/9/2000 (US
date format) rather than 1/9/2000 (sensible date format); this may
illuminate the cause of the error.)
"iqn.2000-09.org.etherboot:" is still valid as per RFC3720, but may be
surprising to users, so change it to something less unexpected.
Thanks to the anonymous contributor for pointing this one out.
Apparently some BIOSes will place option ROMs on 512-byte boundaries.
While this is against specification, it doesn't actually hurt
anything, so we may as well increase our scan granularity to 512
bytes.
Contributed by Luca <lucarx76@gmail.com>
Wyse Streaming Manager server (WLDRM13.BIN) assumes that the PXENV+
entry point is at UNDI_CS:0000; apparently, somebody at Wyse has
difficulty distinguishing between the words "may" and "must"...
Add a dummy entry point at UNDI_CS:0000, which just jumps to the
correct entry point.
The multiboot specification states that, for raw images, if
load_end_addr is zero then it should be interpreted as meaning "use
the entire file", and if bss_end_addr is zero it should be interpreted
as meaning "no bss".
Must check that argument to a fclose() is not NULL -- we can get to the
'err' label when file was not opened. fclose(NULL) is known to produce
core dump on some platforms and we don't want zbin to fail so loudly.
Signed-off-by: Eygene Ryabinkin <rea-fbsd@codelabs.ru>
Explicitly state that we are using 32-bit addressing in 16-bit code.
GNU as 2.15 (FreeBSD/amd64 7-STABLE) got confused that 32-bit registers
are used in the code that was declared as 16-bit. Add explicit modifier
'addr32' to make assembler happy.
Signed-off-by: Eygene Ryabinkin <rea-fbsd@codelabs.ru>
IBM's iSCSI Firmware Initiator checks the UNDIROMID pointer in the
!PXE structure that gets created by the UNDI loader. We didn't
previously fill this value in.
Commit f58cc3f introduced a temporary workaround for a bug in current
prototype silicon, but failed to apply it to all eight PCI functions
within the device.
Option::ROM was assuming that ROM images using a short jump
instruction for the init entry point would have a zero byte at offset
5; this is not necessarily true.
Include PMM allocation result in POST banner.
Include full product string in "starting execution" message.
Also mark ourselves as supporting DDIM in PnP header, for
completeness.
On a system that doesn't support BBS, we end up hooking INT19 to gain
control of the boot process. If the system is PCI3.0, we must take
care to use the runtime value for %cs, rather than the POST-time
value, otherwise we end up pointing INT19 to the temporary option ROM
POST scratch area.
Determine the network-layer packet type and fill it in for UNDI
clients. This is required by some NBPs such as emBoot's winBoot/i.
This change requires refactoring the link-layer portions of the
gPXE netdevice API, so that it becomes possible to strip the
link-layer header without passing the packet up the network stack.
Some dumb NBPs (e.g. emBoot's winBoot/i) never call PXENV_UNDI_ISR
with FuncFlag=PXENV_UNDI_ISR_START; they just sit in a tight polling
loop merrily violating the PXE spec with repeated calls to
PXENV_UNDI_ISR_IN_PROCESS. Force a extra calls to netdev_poll() to
cope with these out-of-spec clients.
Allow for an arbitrary number of splits of the system memory map via
INT 15,e820.
Features of the new map-mangling algorithm include:
Supports random access to e820 map entries.
Requires only sequential access support from the underlying e820
map, even if our caller uses random access.
Empty regions will always be stripped.
Always terminates with %ebx=0, even if the underlying map terminates
with CF=1.
Allows for an arbitrary number of hidden regions, with underlying
regions split into as many subregions as necessary.
Total size increase to achieve this is 193 bytes.
Define a list of N allowed memory regions, and split each underlying
e820 region into up to N subregions. Strip resulting empty regions
out of the map, avoiding using the "return with CF set to strip last
empty region" trick, because it seems that bootmgr.exe in Win2k8 gets
upset if the memory map is terminated with CF set.
This is an intermediate checkin that defines a single allowed memory
region covering the entire 64-bit address space, and uses the existing
map-mangling code on top of the new region-splitting code. This
sanitises the memory map to the point that Win2k8 is able to boot even
on a system that defines a final zero-length region at the 4GB mark.
I'm checking this in because it may be useful for future debugging
efforts to be able to run with the existing and known-working map
mangling code together with the map sanitisation capabilities of the
new map mangling code.
Add support for manipulating the jump instruction that forms the
option ROM initialisation entry point, so that mergerom.pl can treat
it just like other entry points.
Add support for merging the initialisation entry point (and IBM BOFM
table) to mergerom.pl; this is another slightly icky but unfortunately
necessary GPL vs. NDA workaround. When mergerom.pl replaces an entry
point in the original ROM, it now fills in the corresponding entry
point in the merged ROM with the original value; this allows (for
example) a merged initialisation entry point to do some processing and
then jump back to the original entry point.
fetch_string_setting() was subtracting one from the length of the
to-be-NUL-terminated buffer in order to obtain the length of the
unterminated buffer to be passed to fetch_setting(). This works
extremely well unless the length of the to-be-NUL-terminated buffer is
zero, at which point we end up giving fetch_setting() a buffer of
length -1UL, thereby inviting it to overwrite as much memory as it
wants...
The ProxyDHCPREQUEST is a unicast packet, so the first request will
almost always be lost due to not having the IP address in the ARP
cache. If the minimum retry time is set to one second (as per commit
ff2b6a5), then ProxyDHCP will time out and give up before managing to
successfully transmit a request.
The DHCP timers need to be reworked anyway, so this mild hack is
acceptable for now.
New min_timeout and max_timeout fields in struct retry_timer allow
users of this timer to set their own desired minimum and maximum
timeouts, without being constrained to a single global minimum and
maximum. Users of the timer can still elect to use the default global
values by leaving the min_timeout and max_timeout fields as 0.
WinPE seems to have a bug that causes it to always use the TFTP server
IP address and filename from the ProxyDHCPACK packet, even if the
ProxyDHCPACK packet doesn't exist. This causes it to end up
attempting to fetch a file such as
tftp://0.0.0.0/bootmgr.exe
If we don't have a ProxyDHCPACK to use, we pretend that it was a copy
of the DHCPACK packet. This works around the problem, and hopefully
won't surprise any NBPs.
Altiris erroneously cares about the ordering of DHCP options, and will
get confused if we don't construct them in the order it expects.
This is observed (so far) only when attempting to deploy 64-bit Win2k3.
This patch adds support for the virtio-net adapter provided by KVM.
Written by Laurent Vivier <Laurent.Vivier@bull.net> for Etherboot.
Wrapped as legacy driver for gPXE by Stefan Hajnoczi
<stefanha@gmail.com>.
When we boot from a DHCP-supplied filename, we previously relied on
the fact that the current working URI is set to tftp://[next-server]/
in order to resolve the filename into a full tftp:// URI. However,
this process will eliminate the distinction between filenames with and
without initial slashes:
cwuri="tftp://10.0.0.1/" filename="vmlinuz" => URI="tftp://10.0.0.1/vmlinuz"
cwuri="tftp://10.0.0.1/" filename="/vmlinuz" => URI="tftp://10.0.0.1/vmlinuz"
This distinction is important for some TFTP servers. We now
explicitly construct a string of the form
"tftp://[next-server]/filename"
so that a filename with an initial slash will result in a URI
containing a double-slash, e.g.
"tftp://10.0.0.1//vmlinuz"
The TFTP code always strips a single initial slash, and so ends up
presenting the correct path to the server.
URIs entered explicitly by users at the command line must include a
double slash if they want an initial slash presented to the TFTP
server:
"kernel tftp://10.0.0.1/vmlinuz" => filename="vmlinuz"
"kernel tftp://10.0.0.1//vmlinuz" => filename="/vmlinuz"
This utility is required as a workaround for legal restrictions on
including GPL and non-GPL code within the same expansion ROM image.
While this is not encouraged, we are prepared to accept that
concatenation of ROM images and updating of the ROM header data
structures can be classed as "mere aggregation" within the terms of
the GPL.
If in any doubt, assume that you cannot include GPL and non-GPL code
within the same expansion ROM image. Contact the Etherboot team for
clarification on your specific circumstances.
When an error reply (not 1xx, 2xx or 3xx) was received, ftp_reply()
invoked ftp_done() to close connections, but did not return, and the
rest of code in this function could try to send commands to the closed
control connection.
Signed-off-by: Sergey Vlasov <vsu@altlinux.ru>
Based on a patch contributed by Sergey Vlasov <vsu@altlinux.ru> :
In my testing with "qemu -net user" the 226 response to RETR was
often received earlier than final packets of the data connection;
this caused the received file to become truncated without any error
indication. Fix this by adding an intermediate state FTP_TRANSFER
between FTP_RETR and FTP_QUIT, so that the transfer is considered to
be complete only when both the end of data connection is encountered
and the final reply to the RETR command is received.
H. Peter Anvin <hpa@zytor.com> sent word that Sergey Vlasov
<vsu@altlinux.ru> discovered gPXE lkrn images fail to load in SYSLINUX
3.70 because we have initrd_addr_max zeroed. This patch sets the same
value as the Linux kernel.
Also change the header jmp instruction to use a hardcoded opcode value
like Linux does. Just in case the assembler decides to use a three-byte
instruction instead of the desired two-byte jmp.
Allow settings to be expanded in arbitrary commands, such as
kernel http://10.0.0.1/boot.php?uuid=${uuid}
Also add the "echo" command, as being the easiest way to test this
features.
Print one dot per second while waiting in monojob.c (e.g. for DHCP,
for file downloads, etc.), to inform user that the system has not
locked up.
Patch contributed by Andrew Schran <aschran@google.com>, minor
modification by me.
This change allows the time for which shell banners are displayed to
be configured in the config.h file. The ability to access the shell
can also be effectively disabled by setting this timeout to zero.
In tg3_chip_reset(), the PCI_EXPRESS change is taken from the Linux
tg3 driver. I am not sure what exactly it does (it is not documented
in the Linux driver), but it is necessary for the NIC to work
correctly.
Use "-include" rather than "include" for the generated Makefile
fragments, in order to suppress the long list of warnings that
otherwise appears at the start of a clean build.
Contributed by Edward Waugh <ewaugh@netxen.com>
Add yet another ugly hack to iscsiboot.c, this time to allow the user to
inhibit the shutdown/removal of the iSCSI INT13 device (and the network
devices, since they are required for the iSCSI device to function).
On the plus side, the fact that shutdown() now takes flags to
differentiate between shutdown-for-exit and shutdown-for-boot means that
another ugly hack (to allow returning via the PXE stack on BIOSes that
have broken INT 18 calls) will be easier.
I feel dirty.
Conjecture: The hardware issues 64-bit DMA writes of status descriptors,
which some PCI bridges seem to split into two 32-bit writes in reverse
order (i.e. dword 1 first). This means that we sometimes observe a
partial status descriptor. Add an explicit check to ensure that the
descriptor is complete before processing it.
Also ensure that the RDS consumer counter is incremented only when we
know that we have actually consumed an RX descriptor.
Shifting all INT13 drive numbers causes problems on systems that use a
sparse drive number space (e.g. qemu BIOS, which uses 0xe0 for the CD-ROM
drive).
The strategy now is:
Each drive is assigned a "natural" drive number, being the next
available drive number in the system (based on the BIOS drive count).
Each drive is accessed using its specified drive number. If the
specified drive number is -1, the natural drive number will be used.
Accesses to the specified drive number will be delivered to the
emulated drive, masking out any preexisting drive using this number.
Accesses to the natural drive number, if different, will be remapped to
the masked-out drive.
The overall upshot is that, for examples:
System has no drives. Emulated INT13 drive gets natural number 0x80
and specified number 0x80. Accesses to drive 0x80 go to the emulated
drive, and there is no remapping.
System has one drive. Emulated INT13 drive gets natural number 0x81
and specified number 0x80. Accesses to drive 0x80 go to the emulated
drive. Accesses to drive 0x81 get remapped to the original drive 0x80.
Verifying server ID and DHCP transaction ID is insufficient to
differentiate between DHCPACK and ProxyDHCPACK when the DHCP server and
Proxy DHCP server are the same machine.
If there is more than one loaded image, refuse to automatically select
the image to execute. There are at least two possible cases, with
different "correct" answers:
1. User loads image A by mistake, then loads image B and types "boot".
User wants to execute image B.
2. User loads image A, then loads image B (which patches image A), then
types "boot". User wants to execute image A.
If a user actually wants to load multiple images, they must explicitly
specify which image is to be executed.
Clearing the LOADED flag actually prevents users from doing clever things
such as loading an image, then loading a patch image, then executing the
first image. (image_exec() checks for IMAGE_LOADED, so this sequence of
operations will fail if the LOADED flag gets cleared.)
This reverts commit 14c080020f.
Loading an image may overwrite part or all of any previously-loaded
images, so we should clear the LOADED flag for all images prior to
attempting to load a new image.
We can just treat all non-kernel images as initrds, which matches our
behaviour for multiboot kernels. This allows us to eliminate initrd as
an image type, and treat the "initrd" command as just another synonym for
"imgfetch".
dhcppkt_store() is supposed to clear the setting if passed NULL for the
setting data. In the case of fixed-location fields (e.g. client IP
address), this requires setting the content of the field to all-zeros.
__from_data16 and __from_text16 now take a pointer to a
.data16/.text16 variable, and return the real-mode offset within the
appropriate segment. This matches the use case for every occurrence
of these macros, and prevents potential future bugs such as that fixed
in commit d51d80f. (The bug arose essentially because "&pointer" is
still syntactically valid.)
__from_data16 takes the value pointed to, rather than the pointer
itself. This was silently causing gPXE to return a dud buffer pointer
when the caller did not supply a buffer for PXENV_GET_CACHED_INFO.
Perform the same test for a matching DHCP_SERVER_IDENTIFIER on
ProxyDHCPACKs as we do for DHCPACKs. Otherwise, a retransmitted
DHCPACK can end up being treated as the ProxyDHCPACK.
I have a vague and unsettling memory that this test was deliberately
omitted, but I can't remember why, and can't find anything in the VC
logs.
ns8390.c can produce four different drivers (one PCI, three ISA.) The
ISA driver requires setting a few macros; do that by setting defines
in stub files instead of using src/Config.
Currently, all the ISA drivers are broken (they were not enabled by
default), so #if 0 them out.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
When the 16-bit segment registers are accessed using 32-bit instructions
the high order bytes are undefined on older CPUs. We now explicitly
zero the high order bytes when snapshotting the CPU state. This ensures
that the GDB stub reports consistent values for the segment registers.
This commit implements GDB over UDP. Using UDP is more complex than
serial and has required some restructuring.
The GDB stub is now built using one or both of GDBSERIAL and GDBUDP
config.h options.
To enter the debugger, execute the gPXE shell command:
gdbstub <transport> [<options>...]
Where <transport> is "serial" or "udp". For "udp", the name of a
configured network device is required:
gdbstub udp net0
The GDB stub listens on UDP port 43770 by default.
Commit fd0aef9 introduced a typo that caused PMM detection to start at
paragraph 0xe00 rather than 0xe000. (Detection would still work, since it
would scan until it ran out of base memory, but it would end up scanning
an unnecessarily large portion of base memory.)
Spotted by Sebastian Herbszt <herbszt@gmx.de>.
Send a null command, specifically "pulse outputs" with no outputs
selected, to the KBC after changing A20. This was apparently done by DOS,
presumably as a synchronization hack, and the authors of the UHCI spec
thought it was inherent. Therefore, there are systems out there (e.g. HP
DL360 G5) which will stop responsing to "legacy USB" unless they see the
null command, 0xFF, written to port 0x64 at the end of the A20 toggling
sequence.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Some drivers that still use the legacy-driver wrapper (tg3 in particular)
apparently do not specify their alignment constraints properly. This
hack forces any __shared data to be maximally aligned.
Note that this provides only 16-byte alignment; it is not possible to
request alignment to any greater than 16 bytes using
__attribute__((aligned)), since the relocation code will preserve only 16
byte alignment (and operation under -DKEEP_IT_REAL cannot preserve more
that 16 byte alignment).
Idea proposed by Tim Hockin <thockin@google.com>
When the BIOS doesn't support BBS, hooking INT 19 is the only way to add
ourselves as a boot device. If we have to do this, we should at least
try to chain to the original INT 19 vector if our boot fails.
Idea suggested by Andrew Schran <aschran@google.com>
2.6.22+ kernels have an extra field in the bzimage_header structure to
indicate the maximum permitted command-line length. Use this if it is
available.
A bug in read_smbios_string() was causing the starting offset of the
SMBIOS structure to be added twice, resulting in completely the wrong
strings being returned.
Bug identified by Martin Herweg <m.herweg@gmx.de>
Add the definition of SLAM_MAX_BLOCKS_PER_NACK, which is roughly
equivalent to a TCP window size; it represents the maximum number of
packets that will be requested in a single NACK.
Note that, to keep the code size down, we still limit ourselves to
requesting only a single range per NACK; if the missing-block list is
discontiguous then we may request fewer than SLAM_MAX_BLOCKS_PER_NACK
blocks.
Avoid calling cpu_nap() until after we have determined that there is
no input ready to read. This avoids delaying for one timer interrupt
(~50ms) in the case of
if ( iskey() )
char = getkey()
which happens to be present in monojob.c, which is where we spend most
of our time looping (e.g. during any download).
This should eliminate the irritating tendency of gPXE to lose
keypresses.
Discovered on a Dell system where the serial port seems to send in a
constant stream of 0xff characters; this wouldn't be a problem in
itself except that each one triggers the 50ms delay (as mentioned
above), which really kills performance.
On any fast network, or with any driver that may drop packets
(e.g. Infiniband, which has very small RX rings), the traditional
usage of the SLAM protocol will result in enormous numbers of packet
drops and a consequent large number of retransmissions.
By adapting the client behaviour, we can force the server to act more
like a multicast TFTP server, with flow control provided by a single
master client.
This behaviour should interoperate with any traditional SLAM client
(e.g. Etherboot 5.4) on the network. The SLAM protocol isn't actually
documented anywhere, so it's hard to define either behaviour as
compliant or otherwise.
A missing test for dhcp->dhcpoffer in dhcp_timer_expired() was causing
the client to transition to DHCPREQUEST after timing out on waiting
for ProxyDHCP even if no DHCPOFFERs had been received.
In a SLAM NACK packet, if we run out of space to represent the
missing-block list, then indicate all remaining blocks as missing.
This avoids the need to wait for the one-second timeout before
receiving the blocks that otherwise wouldn't have been requested due
to running out of space.
Shorter NACK packets take less time to construct and spew out less
debug output, and there's a limit to how useful it is to send a
complete missing-block list anyway; if the loss rate is high then
we're going to have to retransmit an updated missing-block list
anyway.
Also add pretty debugging output to show the list of requested blocks.
We never set up specific multicast filters; native drivers will ask
the card to receive all multicast packets. The only way to achieve
this via the UNDI API is to enable promiscuous mode.
UDP sockets can be used for multicast, at which point it becomes
plausible that we could receive packets that aren't destined for us
but that still match on a port number.
Delete ELF as a generic image type. The method for invoking an
ELF-based image (as well as any tables that must be set up to allow it
to boot) will always depend on the specific architecture. core/elf.c
now only provides the elf_load() function, to avoid duplicating
functionality between ELF-based image types.
Add arch/i386/image/elfboot.c, to handle the generic case of 32-bit
x86 ELF images. We don't currently set up any multiboot tables, ELF
notes, etc. This seems to be sufficient for loading kernels generated
using both wraplinux and coreboot's mkelfImage.
Note that while Etherboot 5.4 allowed ELF images to return, we don't.
There is no callback mechanism for the loaded image to shut down gPXE,
which means that we have to shut down before invoking the image. This
means that we lose device state, protection against being trampled on,
etc. It is not safe to continue afterwards.
Revert "Use .SECONDARY instead of .PRECIOUS for bin/%.tmp targets."
This reverts commit de29e5a39c.
.SECONDARY doesn't seem to work properly with the target patterns of
implicit rules. In particular, a "make clean ; make bin/rtl8139.dsk"
will correctly leave the bin/rtl8139.dsk.tmp file present when .PRECIOUS
is used, but not when .SECONDARY is used.
This is slightly irritating since we don't want the
"do-not-delete-if-interrupted" semantics of .PRECIOUS, but it seems to be
the best compromise.
During development it is often handy to change the config.h options from
their defaults, for example to enable debugging features.
To prevent accidental commits of debugging config.h changes, mdc
suggested having a config-local.h that is excluded from source control.
This file acts as a temporary config.h and can override any of the
defaults.
This commit is an attempt to implement the config-local.h feature.
The config.h file now has the following as its last line:
/* @TRYSOURCE config-local.h */
The @TRYSOURCE directive causes config-local.h to be included at that
point in the file. If config-local.h does not exist, no error will be
printed and parsing will continue as normal. Therefore, mkconfig.pl is
"trying" to "source" config-local.h.
The GDBSYM config.h option was an attempt at QEMU GDB debugging. I have
removed the code since it is unused and may confuse people wanting to
use the GDB stub.
Maintain state for the advertised window length, and only ever increase
it (instead of calculating it afresh on each transmit). This avoids
triggering "treason uncloaked" messages on Linux peers.
Respond to zero-length TCP keepalives (i.e. empty data packets
transmitted outside the window). Even if the peer wouldn't otherwise
expect an ACK (because its packet consumed no sequence space), force an
ACK if it was outside the window.
We don't yet generate TCP keepalives. It could be done, but it's unclear
what benefit this would have. (Linux, for example, doesn't start sending
keepalives until the connection has been idle for two hours.)
When the embedded image is a script, the unregister_image() performed by
image/script.c corrupts memory, since image/embedded.c omitted the call
to register_image().
This is the first bug fixed using Stefan Hajnoczi's gdb stub for gPXE.
Return the most appropriate of EACCES, EPERM, ENODEV, ENOTSUP, EIO or
EINVAL depending on the exact error returned by the target, rather than
just always returning EPERM.
Also, ensure that error strings exist for these errors.