iPXE allocates its first PMM block using the image source length,
which is rounded up to the nearest 16-byte paragraph. It then copies
in data of a length calculated from the ROM size, which is
theoretically less than or equal to the image source length, but is
rounded up to the nearest 512-byte sector. This can result in copying
beyond the end of the allocated PMM block, which can corrupt the PMM
data structures (and other essentially arbitrary areas of memory).
Fix by rounding up the image source length to the nearest 512-byte
sector before using it as the PMM allocation length.
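A minimal sketch of the length arithmetic above. The helper names (round_up, copy_len, old_alloc_len, new_alloc_len) are hypothetical and do not come from the iPXE prefix code. With paragraph rounding, a 1000-byte image gets a 1008-byte block but a 1024-byte copy; with sector rounding, the block can never be shorter than the copy.

```c
#include <assert.h>
#include <stddef.h>

#define PARAGRAPH_SIZE 16	/* 16-byte real-mode paragraph */
#define SECTOR_SIZE 512		/* 512-byte sector */

/* Round len up to a multiple of align (align must be a power of two) */
static size_t round_up ( size_t len, size_t align ) {
	assert ( ( align & ( align - 1 ) ) == 0 );
	return ( ( len + align - 1 ) & ~( align - 1 ) );
}

/* Hypothetical: length actually copied, derived from the ROM size */
static size_t copy_len ( size_t rom_len ) {
	return round_up ( rom_len, SECTOR_SIZE );
}

/* Old PMM allocation length: paragraph-rounded image source length */
static size_t old_alloc_len ( size_t image_len ) {
	return round_up ( image_len, PARAGRAPH_SIZE );
}

/* Fixed PMM allocation length: sector-rounded image source length */
static size_t new_alloc_len ( size_t image_len ) {
	return round_up ( image_len, SECTOR_SIZE );
}
```

Since the ROM size is at most the image source length, rounding both to the same 512-byte granularity guarantees that the copy fits inside the allocation.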
Reported-by: Alex Williamson <alex.williamson@redhat.com>
Reported-by: Jarrod Johnson <jarrod.b.johnson@gmail.com>
Reported-by: Itay Gazit <itayg@mellanox.co.il>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
INT 16,01 will discard some extended keystrokes on some BIOSes, making
it impossible for iPXE to detect keypresses such as F12. Fix by using
INT 16,11 instead.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some prefixes (e.g. .lkrn) allow a command line to be passed in to
iPXE. At present, this command line is ignored.
If a command line is provided, treat it as an embedded script (without
an explicit "#!ipxe" magic marker). This allows for patterns of
invocation such as
title iPXE
kernel /boot/ipxe.lkrn dhcp && \
sanboot iscsi:10.0.4.1::::iqn.2010-04.org.ipxe.dolphin:storage
Here GRUB is instructed to load ipxe.lkrn with an embedded script
equivalent to
#!ipxe
dhcp
sanboot iscsi:10.0.4.1::::iqn.2010-04.org.ipxe.dolphin:storage
This can be used to effectively vary the embedded script without
having to rebuild ipxe.lkrn.
Originally-implemented-by: Dave Hansen <dave@sr71.net>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The function keys F5-F12 all conform to the same ANSI pattern as the
other "special" keys that we currently recognise. Add these key
definitions, and shrink the representation of the ANSI sequences in
bios_console.c to compensate.
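As an illustration of the shared "ESC [ n ~" pattern, the sketch below maps the parameter n to F5-F12 using the common VT220/xterm numbering (15, 17, 18, 19, 20, 21, 23, 24). It assumes that iPXE follows this same convention; the table and parse_fkey() are hypothetical and are not the bios_console.c representation.

```c
#include <assert.h>
#include <stddef.h>

/* Parameter n in "ESC [ n ~" for F5-F12 (VT220/xterm convention) */
static const struct {
	unsigned int param;
	int key;	/* function key number */
} fkeys[] = {
	{ 15, 5 }, { 17, 6 }, { 18, 7 }, { 19, 8 },
	{ 20, 9 }, { 21, 10 }, { 23, 11 }, { 24, 12 },
};

/* Parse a complete "ESC [ <digits> ~" sequence; return the function
 * key number (5-12), or 0 if the sequence is not recognised */
static int parse_fkey ( const char *seq ) {
	unsigned int param = 0;
	size_t i;

	if ( ( seq[0] != '\033' ) || ( seq[1] != '[' ) )
		return 0;
	seq += 2;
	if ( ( *seq < '0' ) || ( *seq > '9' ) )
		return 0;
	while ( ( *seq >= '0' ) && ( *seq <= '9' ) )
		param = ( ( param * 10 ) + ( *(seq++) - '0' ) );
	if ( ( seq[0] != '~' ) || ( seq[1] != '\0' ) )
		return 0;
	for ( i = 0 ; i < ( sizeof ( fkeys ) / sizeof ( fkeys[0] ) ) ; i++ ) {
		if ( fkeys[i].param == param )
			return fkeys[i].key;
	}
	return 0;
}
```

Note the gaps at 16 and 22 in the numbering; a table lookup handles these without special cases.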
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Refactor the {load,exec} image operations as {probe,exec}. This makes
the probe mechanism cleaner, eliminates some forward declarations,
avoids holding magic state in image->priv, eliminates the possibility
of screwing up between the "load" and "exec" stages, and makes the
documentation simpler since the concept of "loading" (as distinct from
"executing") no longer needs to be explained.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The online documentation (e.g. http://ipxe.org/cmd/ifopen), though not
yet complete, is far more comprehensive than could be provided within
the iPXE binary. Save around 200 bytes (compressed) by removing the
command descriptions from the interactive help, and instead referring
users directly to the web page describing the relevant command.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
We currently use INT 13,00 as an opportunity to reopen the underlying
block device, which works well for callers such as DOS that will use
INT 13,00 in response to any disk errors. However, some callers (such
as Windows Server 2008) do not attempt to reset the disk, and so any
failures become effectively permanent.
Fix this by automatically reopening the underlying block device
whenever we might want to access it.
This makes direct installation of Windows to an iSCSI target much more
reliable.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The "size" bit (aka the D/B bit) should (as far as I can tell) be
irrelevant for accesses to a non-code, non-stack, expand-upwards
segment. However, VirtualBox fails on some accesses via this segment
if this bit is not set.
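For reference, the D/B bit is bit 54 of a segment descriptor (bit 6 of byte 6). The sketch below encodes a flat 4GB writable data segment with and without it; the base, limit and access byte (0x92: present, DPL 0, writable data) are assumptions for illustration, not a copy of the iPXE GDT.

```c
#include <assert.h>
#include <stdint.h>

#define DESC_FLAG_G  0x8	/* granularity: limit counted in 4kB units */
#define DESC_FLAG_DB 0x4	/* "size" (D/B) bit */

/* Encode a legacy x86 segment descriptor from its fields */
static uint64_t make_desc ( uint32_t base, uint32_t limit, uint8_t access,
			    uint8_t flags ) {
	uint64_t desc = 0;

	desc |= ( limit & 0xffffULL );				/* limit 15:0 */
	desc |= ( ( uint64_t ) ( base & 0xffffff ) << 16 );	/* base 23:0 */
	desc |= ( ( uint64_t ) access << 40 );			/* access byte */
	desc |= ( ( uint64_t ) ( ( limit >> 16 ) & 0xf ) << 48 );/* limit 19:16 */
	desc |= ( ( uint64_t ) ( flags & 0xf ) << 52 );		/* G, D/B, L, AVL */
	desc |= ( ( uint64_t ) ( ( base >> 24 ) & 0xff ) << 56 );/* base 31:24 */
	return desc;
}
```

The two resulting descriptors differ only in the D/B bit: 0x00cf92000000ffff with the bit set, 0x008f92000000ffff without.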
This change allows iPXE to boot under VirtualBox without having to
disable VT-x/AMD-V support.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Building the Linux-specific code (tap.o et al) requires external
headers that have proven to be extremely variable across systems,
causing frequent build failures.
Until this situation is rectified, remove the Linux-specific code from
the default (non-Linux) build.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some binutils versions will drag in an object to satisfy the entry
symbol; some won't. Try to cope with this exciting variety of
behaviour by ensuring that all entry symbols are unique.
Remove the explicit inclusion of the prefix object on the linker
command line, since the entry symbol now provides all the information
needed to identify the prefix.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Commit 623469d ("[build] Eliminate unused sections at link-time")
introduced a regression in several build formats, in which the prefix
would end up being garbage-collected out of existence. Fix by
ensuring that an entry symbol exists in each possible prefix, and is
required by the linker script.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Use -ffunction-sections, -fdata-sections, and --gc-sections to
automatically prune out any unreferenced sections.
This saves around 744 bytes (uncompressed) from the rtl8139.rom build.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
EFI performs its own PCI bus enumeration. Respect this, and start
controlling devices only when instructed to do so by EFI.
As a side benefit, we should now correctly create multiple SNP
instances for multi-port devices.
This should also fix the problem of failing to enumerate devices
because the PCI bridges have not yet been enabled at the time the iPXE
driver is loaded.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Merge the "bus" and "devfn" fields into a single "busdevfn" field, to
match the format used by the majority of external code.
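A sketch of the conventional packing that a single "busdevfn" value follows: bus in bits 15:8, device in bits 7:3, function in bits 2:0. The macro names here are illustrative assumptions and are not necessarily those used in iPXE.

```c
#include <assert.h>
#include <stdint.h>

/* Pack bus, device (slot) and function into one 16-bit value */
#define PCI_BUSDEVFN( bus, slot, func ) \
	( ( uint16_t ) ( ( ( bus ) << 8 ) | ( ( slot ) << 3 ) | ( func ) ) )

/* Unpack the individual fields again */
#define PCI_BUS( busdevfn )  ( ( ( busdevfn ) >> 8 ) & 0xff )
#define PCI_SLOT( busdevfn ) ( ( ( busdevfn ) >> 3 ) & 0x1f )
#define PCI_FUNC( busdevfn ) ( ( busdevfn ) & 0x07 )
```

This is, for example, the same layout that PCI BIOS calls take in BX (BH = bus, BL = device/function), so the value can be passed through without reassembly.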
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some BIOSes can report multiple memory regions which may be adjacent
and the same type. Since only the first region is used in the
mboot.c32 layer, it's possible to run out of memory when loading all of
the boot modules. One may get around this problem by having iPXE
merge these memory regions internally.
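A minimal sketch of the merge, assuming a simplified region structure (start, exclusive end, type) loosely modelled on E820 entries, and assuming the input is already sorted and non-overlapping. The structure and function are hypothetical, not iPXE's memory map code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified memory map entry (hypothetical) */
struct memory_region {
	uint64_t start;
	uint64_t end;		/* exclusive */
	unsigned int type;
};

/* Merge adjacent regions of the same type in place; return new count.
 * Assumes regions are sorted by start address and do not overlap. */
static size_t merge_regions ( struct memory_region *regions, size_t count ) {
	size_t out = 0;
	size_t i;

	for ( i = 0 ; i < count ; i++ ) {
		if ( ( out > 0 ) &&
		     ( regions[ out - 1 ].end == regions[i].start ) &&
		     ( regions[ out - 1 ].type == regions[i].type ) ) {
			/* Contiguous and same type: extend previous region */
			regions[ out - 1 ].end = regions[i].end;
		} else {
			/* Start a new output region */
			regions[ out++ ] = regions[i];
		}
	}
	return out;
}
```

After merging, a consumer that looks only at the first usable region sees the whole contiguous block rather than just its first fragment.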
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Remove the concept of shutdown exit flags, and replace it with a
counter used to keep track of exposed interfaces that require devices
to remain active.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
libflat no longer has anything to do with flat real mode; it handles
only the A20 gate. Update the library name to match.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Flat real mode will have been set up as a side-effect of the
protected-mode call invoked during install_block() for .text16.early;
there is no need to do so explicitly.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Flat real mode works perfectly on real hardware, but seems to cause
problems for some hypervisors. Revert to using 16-bit protected mode
(and returning to real mode with 4GB limits, so as not to break PMM
BIOSes).
Allow the code specific to the .mrom format to continue to assume that
flat real mode works, since this format is specific to real hardware.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The PXE debugging messages have remained pretty much unaltered since
Etherboot 5.4, and are now difficult to read in comparison to most of
the rest of iPXE.
Bring the pxe_udp debug messages up to normal iPXE standards.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Earlier versions of the PXE specification do not have the SubVendor_ID
and SubDevice_ID fields, and some NBPs may not provide space for them.
Avoid overwriting the contents of these fields, just in case.
This is similar to the problem with the BufferLimit field in
PXENV_GET_CACHED_INFO.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Changes were made to files where the licence text within the files
themselves confirms that the files are GPL version 2 or later.
Signed-off-by: Shao Miller <shao.miller@yrdsb.edu.on.ca>
Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Use the real-mode address ffff:0010 to access the linear address
0x100000, and so test whether or not the A20 gate is enabled without
requiring a switch into flat real mode (or some other addressing
mode).
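The address arithmetic behind this test: ffff:0010 is 0xffff * 16 + 0x10 = 0x100000. With A20 enabled, that is genuinely the first byte above 1MB; with A20 disabled, address line 20 is forced low and the access aliases linear 0x000000. The helper names below are hypothetical and only model the arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/* Linear address of a real-mode segment:offset pair */
static uint32_t real_to_linear ( uint16_t segment, uint16_t offset ) {
	return ( ( ( uint32_t ) segment << 4 ) + offset );
}

/* Model the effect of the A20 gate: when disabled, bit 20 is masked off */
static uint32_t apply_a20 ( uint32_t linear, int a20_enabled ) {
	return ( a20_enabled ? linear : ( linear & ~( ( uint32_t ) 1 << 20 ) ) );
}
```

Comparing memory seen through ffff:0010 with memory seen through 0000:0000 therefore reveals the state of the gate without leaving real mode.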
This speeds up CPU mode transitions, and also avoids breaking the NBP
from IBM's Tivoli Provisioning Manager for Operating System
Deployment. This NBP makes some calls to iPXE in VM86 mode rather
than true real mode and does not correctly emulate our transition into
flat real mode.
Interestingly, Tivoli's VMM *does* allow us to switch into protected
mode (though it patches our GDT so that we execute in ring 1 rather
than ring 0). However, paging is still disabled and we have a 4GB
segment limit. Being in ring 1 does not, therefore, restrict us in
any meaningful way; this has been verified by deliberately writing
garbage over Tivoli's own GDT (at address 0x02201010) during a
nominally VM86-mode PXE API call. It's unclear precisely what
protection this VMM is supposed to be offering.
Suggested-by: Joshua Oreman <oremanj@rwcr.net>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some network cards do not generate interrupts when operated via the
UNDI API. Allow for this by waiting for the ISR to be triggered only
if the PXE stack advertises that it supports interrupts. When the PXE
stack does not advertise interrupt support, we skip the call to
PXENV_UNDI_ISR_IN_START and just poll the device using
PXENV_UNDI_ISR_IN_PROCESS. This matches the observed behaviour of at
least one other PXE NBP (emBoot's winBoot/i), so there is a reasonable
chance of this working.
Originally-implemented-by: Muralidhar Appalla <Muralidhar.Appalla@emulex.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The disk signature is used by some OSes (notably Windows) to identify
the boot disk, so it's useful debugging information to have.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Support the extensions mandated by EDD 4.0, including:
o the ability to specify a flat physical address in a disk address
packet,
o the ability to specify a sector count greater than 127 in a disk
address packet,
o support for all functions within the Fixed Disk Access and EDD
Support subsets,
o the ability to describe a device using EDD Device Path Information.
This implementation is based on draft revision 3 of the EDD 4.0
specification, with reference to the EDD 3.0 specification. It is
possible that this implementation may need to change in order to
conform to the final published EDD 4.0 specification.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The block device interface used in gPXE predates the invention of even
the old gPXE data-transfer interface, let alone the current iPXE
generic asynchronous interface mechanism. Bring this old code up to
date, with the following benefits:
o Block device commands can be cancelled by the requestor. The INT 13
layer uses this to provide a global timeout on all INT 13 calls,
with the result that an unexpected passive failure mode (such as
an iSCSI target ACKing the request but never sending a response)
will lead to a timeout that gets reported back to the INT 13 user,
rather than simply freezing the system.
o INT 13,00 (reset drive) is now able to reset the underlying block
device. INT 13 users, such as DOS, that use INT 13,00 as a method
for error recovery now have a chance of recovering.
o All block device commands are tagged, with a numerical tag that
will show up in debugging output and in packet captures; this will
allow easier interpretation of bug reports that include both
sources of information.
o The extremely ugly hacks used to generate the boot firmware tables
have been eradicated and replaced with a generic acpi_describe()
method (exploiting the ability of iPXE interfaces to pass through
methods to an underlying interface). The ACPI tables are now
built in a shared data block within .bss16, rather than each
requiring dedicated space in .data16.
o The architecture-independent concept of a SAN device has been
exposed to the iPXE core through the sanboot API, which provides
calls to hook, unhook, boot, and describe SAN devices. This
allows for much more flexible usage patterns (such as hooking an
empty SAN device and then running an OS installer via TFTP).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
iPXE has never supported SEEK_END; the usage of "whence" offers only
the options of SEEK_SET and SEEK_CUR and so is effectively a boolean
flag. Further flags will be required to support additional metadata
required by the Fibre Channel network model, so repurpose the "whence"
field as a generic "flags" field.
xfer_seek() has always been used with SEEK_SET, so remove the "whence"
field altogether from its argument list.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Declarations without the accompanying __table_entry cause misalignment
of the table entries when using gcc 4.5. Fix by adding the
appropriate __table_entry macro or (where possible) by removing
unnecessary forward declarations.
Signed-off-by: Piotr Jaroszyński <p.jaroszynski@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Support qemu-like arguments for network setup:
--net driver_name[,setting=value]*
and global settings:
--settings setting=value[,setting=value]*
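A minimal sketch of parsing the "--net driver_name[,setting=value]*" syntax. The structure, the parse_net_arg() function and the "if" setting name used in the example are hypothetical; this is not the actual Linux platform option parser.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_SETTINGS 8

/* Parsed form of "driver_name[,setting=value]*" (hypothetical) */
struct net_arg {
	const char *driver;
	size_t count;
	struct {
		const char *name;
		const char *value;
	} settings[MAX_SETTINGS];
};

/* Parse arg in place (',' and '=' are overwritten with NULs).
 * Returns 0 on success, -1 on malformed input. */
static int parse_net_arg ( char *arg, struct net_arg *net ) {
	char *token;
	char *next;
	char *eq;

	net->count = 0;

	/* Driver name runs up to the first comma */
	next = strchr ( arg, ',' );
	if ( next )
		*(next++) = '\0';
	if ( ! *arg )
		return -1;
	net->driver = arg;

	/* Each remaining comma-separated token must be "setting=value" */
	while ( ( token = next ) != NULL ) {
		next = strchr ( token, ',' );
		if ( next )
			*(next++) = '\0';
		eq = strchr ( token, '=' );
		if ( ( ! eq ) || ( eq == token ) || ( net->count == MAX_SETTINGS ) )
			return -1;
		*eq = '\0';
		net->settings[net->count].name = token;
		net->settings[net->count].value = ( eq + 1 );
		net->count++;
	}
	return 0;
}
```

For example, "tap,if=tap0" yields the driver "tap" with a single setting if=tap0. The same token loop would serve for the "--settings" form, minus the leading driver name.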
Signed-off-by: Piotr Jaroszyński <p.jaroszynski@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add makefiles, ld scripts and default config for the Linux platform for
both i386 and x86_64.
Signed-off-by: Piotr Jaroszyński <p.jaroszynski@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The pcbios-specific get_memmap() is used by the b44 driver, making
all-drivers builds fail on other platforms. Move it to the I/O API
group and provide a dummy implementation on EFI.
Signed-off-by: Piotr Jaroszyński <p.jaroszynski@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
COM32 binaries generally expect to run with interrupts
enabled. Syslinux does so, and COM32 programs will execute cli/sti
pairs when running a critical section, to provide mutual exclusion
against BIOS interrupt handlers. Previously, under iPXE, the IDT was
not valid, so any interrupt (e.g. a timer tick) would generally cause
the machine to triple fault.
This change introduces code to:
- Create a valid IDT at the same location that syslinux uses
- Create an "interrupt jump buffer", which contains small pieces of
code that simply record the vector number and jump to a common
handler
- Thunk down to real mode and execute the BIOS's interrupt handler
whenever an interrupt is received in a COM32 program
- Switch IDTs and enable/disable interrupts when context switching to
and from COM32 binaries
Testing done:
- Booted VMware ESX using a COM32 multiboot loader (mboot.c32)
- Built with GDBSERIAL enabled, and tested breakpoints on int22 and
com32_irq
- Put the following code in a COM32 program:
asm volatile ( "sti" );
while ( 1 );
Before this change, the machine would triple fault
immediately. After this change, it hangs as expected. Under Bochs,
it is possible to see the interrupt handler run, and the current
time in the BIOS data area gets incremented.
Signed-off-by: Stefan Hajnoczi <stefanha@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
An assembly version of memswap() is in an x86 word-length-agnostic
header file, but it used 32-bit registers to store pointers, leading
to memory errors when responding to ARP queries on 64-bit systems.
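For comparison, a portable C version sidesteps the register-width problem entirely, because the compiler chooses full-width registers for pointers on both 32-bit and 64-bit builds. This is an illustrative sketch, not the replacement that was actually committed.

```c
#include <assert.h>
#include <stddef.h>

/* Swap the contents of two non-overlapping memory areas byte by byte */
static void * memswap ( void *first, void *second, size_t len ) {
	unsigned char *a = first;
	unsigned char *b = second;
	unsigned char tmp;

	while ( len-- ) {
		tmp = *a;
		*(a++) = *b;
		*(b++) = tmp;
	}
	return first;
}
```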
Signed-off-by: Joshua Oreman <oremanj@rwcr.net>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The existence and usage of the BEV entry point is covered by the PnP
spec, not the BBS spec; the BBS spec merely describes a policy for
selecting the boot device order. iPXE should therefore check only for
a PnP BIOS in order to decide whether or not to hook INT19.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Commit ea12dc0 ("[build] Avoid hard-coding the path to perl")
introduced a build failure for fully clean trees (e.g. after running
"make veryclean"), since the dependency upon $(PARSEROM) now includes
a dependency upon "perl" (which doesn't exist) rather than upon
"/usr/bin/perl" (which does exist).
There should of course be no dependency upon the perl binary at all;
the dependency should be upon "./util/parserom.pl" alone.
Fix by removing the $(PERL) from the definition of Perl-based utility
paths, and adding $(PERL) at the point of usage.
Reported-by: Stefan Hajnoczi <stefanha@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Remove data-xfer as an interface type, and replace data-xfer
interfaces with generic interfaces supporting the data-xfer methods.
Filter interfaces (as used by the TLS layer) are handled using the
generic pass-through interface capability. A side-effect of this is
that deliver_raw() no longer exists as a data-xfer method. (In
practice this doesn't lose any efficiency, since there are no
instances within the current codebase where xfer_deliver_raw() is used
to pass data to an interface supporting the deliver_raw() method.)
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Remove name-resolution as an interface type, and replace
name-resolution interfaces with generic interfaces supporting the
resolv_done() method.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
strerror() has not been able to use the PXE-only error table since
commit 9aa61ad ("Add per-file error identifiers") back in 2007.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Most of iPXE uses __attribute__((packed)) anyway, and PACKED conflicts
with an identically-named macro in the upstream EFI header files.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
See RFC 4578 for details.
Signed-off-by: Joshua Oreman <oremanj@rwcr.net>
Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The linker chooses to look for _start first and always picks
efidrvprefix.o to satisfy it (probably because it's earlier in the
archive), which causes a multiple-definition error when the linker
later has to pick efiprefix.o for other symbols.
Fix by using EFI-specific TGT_LD_FLAGS with an explicit entry point.
Signed-off-by: Piotr Jaroszyński <p.jaroszynski@gmail.com>
Signed-off-by: Joshua Oreman <oremanj@rwcr.net>
Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
This removes the need for inline safety wrappers, marginally reducing
the size penalty of weak functions, and works around an apparent
binutils bug that causes undefined weak symbols to not actually be
NULL when compiling with -fPIE (as EFI builds do).
A bug in versions of binutils prior to 2.16 (released in 2005) will
cause same-file weak definitions to not work with those
toolchains. Update the README to reflect our new dependency on
binutils >= 2.16.
Signed-off-by: Joshua Oreman <oremanj@rwcr.net>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
COMBOOT API calls set the carry flag on failure. This was not being
propagated because the COMBOOT interrupt handler used iret to return
with EFLAGS restored from the stack. This patch propagates CF before
returning from the interrupt.
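Since iret pops EFLAGS from the stack, the carry flag (bit 0 of EFLAGS) has to be patched into the saved copy before returning. The actual fix is in assembly; the C sketch below, with a hypothetical propagate_cf() helper, only shows the bit manipulation involved.

```c
#include <assert.h>
#include <stdint.h>

#define EFLAGS_CF 0x00000001U	/* carry flag is bit 0 of EFLAGS */

/* Copy CF from the API call's result into the EFLAGS image that iret
 * will restore, leaving all other saved flags untouched */
static uint32_t propagate_cf ( uint32_t saved_eflags, uint32_t result_eflags ) {
	return ( ( saved_eflags & ~EFLAGS_CF ) | ( result_eflags & EFLAGS_CF ) );
}
```

Preserving the remaining saved flags matters because IF in particular must be restored exactly as the caller had it.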
Reported-by: Geoff Lywood <glywood@vmware.com>
Signed-off-by: Stefan Hajnoczi <stefanha@gmail.com>
Signed-off-by: Marty Connor <mdc@etherboot.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Microsoft WDS can end up calling PXENV_RESTART_TFTP to execute a
second-stage NBP which then exits. Specifically, wdsnbp.com uses
PXENV_RESTART_TFTP to execute pxeboot.com, which will exit if the user
does not press F12. iPXE currently treats PXENV_RESTART_TFTP as a
normal PXE API call, and so attempts to return to wdsnbp.com, which
has just been vaporised by pxeboot.com.
Use rmsetjmp/rmlongjmp to preserve the stack state as of the initial
NBP execution, and to restore this state immediately prior to
executing the NBP loaded via PXENV_RESTART_TFTP. This matches the
behaviour in the PXE spec (which says that "if TFTP is restarted,
control is never returned to the caller"), and allows pxeboot.com to
exit relatively cleanly back to iPXE.
As with all usage of setjmp/longjmp, there may be subtle corner case
bugs due to not gracefully unwinding any state accumulated by the time
of the longjmp call, but this seems to be the only viable way to
provide the specified behaviour.
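The control flow can be sketched with standard C setjmp()/longjmp() standing in for rmsetjmp/rmlongjmp, which operate on the real-mode stack. The function names below are hypothetical. The key property is that the TFTP restart never returns to its caller; execution resumes from the state saved when the first NBP was started.

```c
#include <assert.h>
#include <setjmp.h>

/* State saved at the initial NBP execution (stand-in for rmsetjmp) */
static jmp_buf nbp_jump;
static int nbp_count;	/* static, so it survives the longjmp */

/* Hypothetical PXENV_RESTART_TFTP: never returns to its caller */
static void restart_tftp ( void ) {
	longjmp ( nbp_jump, 1 );
}

/* Hypothetical first-stage NBP that chains to a second stage */
static void first_stage ( void ) {
	restart_tftp ();
	/* Not reached: this stack frame has been abandoned */
	assert ( 0 );
}

/* Run the first-stage NBP; if it restarts TFTP, run the newly loaded
 * NBP from the original stack state. Returns the number of NBPs run. */
static int run_nbp ( void ) {
	nbp_count = 0;
	if ( setjmp ( nbp_jump ) == 0 ) {
		nbp_count++;
		first_stage ();
	} else {
		/* Arrived via longjmp: execute the NBP loaded by the restart */
		nbp_count++;
	}
	return nbp_count;
}
```

Any state accumulated inside first_stage() is simply discarded rather than unwound, which is exactly the corner-case risk noted above.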
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add an infrastructure allowing the prefix to provide an open_payload()
method for obtaining out-of-band access to the whole iPXE image. Add
a mechanism within this infrastructure that allows raw access to the
expansion ROM BAR by temporarily borrowing an address from a suitable
memory BAR on the same PCI card.
For cards that have a memory BAR that is at least as large as their
expansion ROM BAR, this allows large iPXE ROMs to be supported even on
systems where PMM fails, or where option ROM space pressure makes it
impossible to use PMM shrinking. The BIOS sees only a stub ROM of
approximately 3kB in size; the remainder (which can be well over 64kB)
is loaded only at the time iPXE is invoked.
As a nice side-effect, an iPXE .mrom image will continue to work even
if its PMM-allocated areas are overwritten between initialisation and
invocation.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The only remaining useful function of makerom.pl is to correct the ROM
and PnP checksums; the PCI IDs are set at link time, and padding is
performed using padimg.pl.
Option::ROM already provides a facility for correcting the checksums,
so we may as well just use this instead.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
It is common for system memory maps to be grotesquely unreliable
during POST. Many sanity checks have been added to the memory map
reading code, but these do not catch all problems.
Skip relocation entirely if called during POST. This should avoid the
problems typically encountered, at the cost of slightly disrupting the
memory map of an operating system booted via iPXE when iPXE was
entered during POST. Since this is a very rare special case (used,
for example, when reflashing an experimental ROM that would otherwise
prevent the system from completing POST), this is an acceptable cost.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some BIOSes (at least some AMI BIOSes) tend to refuse to allocate a
single area large enough to hold both the iPXE image source and the
temporary decompression area, despite promising a largest available
PMM memory block of several megabytes. This causes ROM image
shrinking to fail on these BIOSes, with undesirable consequences:
other option ROMs may be disabled due to shortage of option ROM space,
and the iPXE ROM may itself be corrupted by a further BIOS bug (again,
observed on an AMI BIOS) which causes large ROMs to end up overlapping
reserved areas of memory. This can potentially render a system
unbootable via any means.
Increase the chances of a successful PMM allocation by dropping the
alignment requirement (which is redundant now that we can enable A20
from within the prefix); this allows us to reduce the allocation size
from 2MB down to only the required size.
Increase the chances still further by using two separate allocations:
one to hold the image source (i.e. the copy of the ROM before being
shrunk) and the other to act as the decompression area. This allows
ROM image shrinking to take place even on systems that fail to
allocate enough memory for the temporary decompression area.
Improve the behaviour of iPXE in systems with multiple iPXE ROMs by
sharing PMM allocations where possible. Image source areas can be
shared with any iPXE ROMs with a matching build identifier, and the
temporary decompression area can be shared with any iPXE ROMs with the
same uncompressed size (rounded up to the nearest 128kB).
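A minimal sketch of the two sharing rules, assuming a hypothetical per-ROM summary holding a 32-bit build identifier and the uncompressed size. None of these names come from the iPXE prefix code.

```c
#include <assert.h>
#include <stdint.h>

#define DECOMPRESS_ROUND ( 128 * 1024 )	/* 128kB granularity */

/* Hypothetical summary of one iPXE ROM's PMM requirements */
struct rom_pmm_needs {
	uint32_t build_id;		/* build identifier */
	uint32_t uncompressed_len;	/* uncompressed size in bytes */
};

/* Size of the temporary decompression area, rounded up to 128kB */
static uint32_t decompress_area_len ( uint32_t uncompressed_len ) {
	return ( ( uncompressed_len + DECOMPRESS_ROUND - 1 ) &
		 ~( ( uint32_t ) DECOMPRESS_ROUND - 1 ) );
}

/* Image source areas may be shared only between identical builds */
static int can_share_image_source ( const struct rom_pmm_needs *a,
				    const struct rom_pmm_needs *b ) {
	return ( a->build_id == b->build_id );
}

/* Decompression areas may be shared when the rounded sizes match */
static int can_share_decompress_area ( const struct rom_pmm_needs *a,
				       const struct rom_pmm_needs *b ) {
	return ( decompress_area_len ( a->uncompressed_len ) ==
		 decompress_area_len ( b->uncompressed_len ) );
}
```

Rounding to 128kB lets different builds of broadly similar size share a decompression area even though their image sources cannot be shared.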
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Use INT 15,88 to find a suitable temporary decompression area, rather
than a fixed address. This hopefully gives us a better chance of not
treading on any PMM-allocated areas, in BIOSes where PMM support
exists but tends not to give us the large blocks that we ask for.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Always call INT 15,88 even if we don't use the result. This allows
DEBUG=memmap to show the complete result set returned by all of the
INT 15 memory-map calls.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Randomly generate a 32-bit build identifier that can be used to
identify identical iPXE ROMs when multiple such ROMs are present in a
system (e.g. when a multi-function NIC exposes the same iPXE ROM image
via each function's expansion ROM BAR).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The existing "iPXE starting execution" message indicates that the BEV
(or INT19) was invoked, but gives no indication on whether or not the
iPXE source was successfully retrieved (e.g. from PMM). Split the
"starting execution" message into "starting execution...ok"; the "ok"
indicates that the main iPXE body was successfully decompressed and
relocated.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Now that we can use odd megabytes, there is no particular need to use
an even megabyte as the fallback temporary load point.
Note that the old warnings about avoiding 2MB pre-date our ability to
cooperate with other PXE ROMs by using PMM.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
iPXE is now capable of operating in odd megabytes of memory, so remove
the obsolete code enforcing an even-megabyte constraint.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Use the shared code in libflat to perform the A20 transitions
automatically on each transition from real to protected mode. This
allows us to remove all explicit calls to gateA20_set().
The old warnings about avoiding automatically enabling A20 are
essentially redundant; they date back to the time when we would always
start hammering the keyboard controller without first checking to see
if gate A20 was already enabled (which it almost always is).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
iPXE currently insists on residing in an even megabyte. This imposes
undesirably severe constraints upon our PMM allocation strategy, and
limits our options for mechanisms to access ROMs greater than 64kB in
size.
Add A20 handling code to libflat so that prefixes are able to access
memory even in odd megabytes.
The algorithms and tuning parameters in the new A20 handling code are
based upon a mixture of the existing iPXE A20 code and the A20 code
from the 2.6.32 Linux kernel.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The flatten_real_mode routine is not needed until after decompressing
.text16.early, and currently performs various contortions to
compensate for the fact that .prefix may not be writable. Move
flatten_real_mode to .text16.early to save on (compressed) binary size
and simplify the code.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add a section .text16.early which is always kept inline with the
prefix. This will allow for some code sharing between the .prefix and
.text16 sections.
Note that the simple solution of just prepending the .prefix section
to the .text16 section will not work, because a bug in Wyse Streaming
Manager server (WLDRM13.BIN) requires us to place a dummy PXENV+ entry
point at the start of .text16.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Use flat real mode rather than 16-bit protected mode for access to
high memory during installation. This simplifies the code by reducing
the number of CPU modes we need to think about, and also increases the
amount of code in common between the normal and (somewhat
hypothetical) KEEP_IT_REAL methods of operation.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
When returning to real mode, set 4GB segment limits instead of 64kB
limits. This change improves our chances of successfully returning to
a PMM-capable BIOS after entering iPXE during POST; the BIOS will
have set up flat real mode before calling our initialisation point,
and may be disconcerted if we then return in genuine real mode.
This change is unlikely to break anything, since any code that might
potentially access beyond 64kB must use addr32 prefixes to do so; if
this is the case then it is almost certainly code written to expect
flat real mode anyway.
Note that it is not possible to restore the real-mode segment limits
to their original values, since it is not possible to know which
protected-mode segment descriptor was originally used to initialise
the limit portion of the segment register.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The .hrom prefix provides an experimental mechanism for reducing
option ROM space usage on systems where PMM allocation fails, by
pretending that PMM allocation succeeded and gave us an address fixed
at compilation time. This is unreliable, and potentially dangerous.
In particular, when multiple gPXE ROMs are present in a system, each
gPXE ROM will assume ownership of the same fixed address, resulting in
undefined behaviour.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The .xrom prefix provides an experimental mechanism for loading ROM
images greater than 64kB in size by mapping the expansion ROM BAR in
at a hopefully-unused address. This is unreliable, and potentially
dangerous. In particular, there is no guarantee that any PCI bridges
between the CPU and the device will respond to accesses for the
"unused" memory region that is chosen, and it is possible that the
process of scanning for the "unused" memory region may end up issuing
reads to other PCI devices. If this ends up trampling on a register
with read side-effects belonging to an unrelated PCI device, this may
cause undefined behaviour.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Access to the gpxe.org and etherboot.org domains and associated
resources has been revoked by the registrant of the domain. Work
around this problem by renaming the project from gPXE to iPXE, and
updating URLs to match.
Also update README, LOG and COPYRIGHTS to remove obsolete information.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
gPXE currently overwrites the filename stored in the cached DHCP
packets when a call to PXENV_TFTP_READ_FILE or PXENV_RESTART_TFTP is
made. This code has existed for many years as a workaround for RIS,
which seemed to require that this be done.
pxe_set_cached_filename() causes problems with the Bootix NBP, and a
recent test demonstrates that RIS will complete successfully even with
pxe_set_cached_filename() removed. There have been many changes to
the DHCP and PXE logic since this code was first added, and it is
quite plausible that it was masking a bug that no longer exists.
Reported-by: Alex Zeffertt <alex.zeffertt@eu.citrix.com>
Debugged-by: Shao Miller <Shao.Miller@yrdsb.edu.on.ca>
Signed-off-by: Michael Brown <mcb30@etherboot.org>
Current gPXE code always returns "OURS" in response to
PXENV_UNDI_ISR:START. This is harmless for non-shared interrupt
lines, and avoids the complexity of trying to determine whether or not
we really did cause the interrupt. (This is a non-trivial
determination; some drivers don't have interrupt support and hook the
system timer interrupt instead, for example.)
A problem occurs when we have a shared interrupt line, the other
device asserts an interrupt, and the controlling ISR does not chain to
the other device's ISR when we return "OURS". Under these
circumstances, the other device's ISR never executes, and so the
interrupt remains asserted, causing an interrupt storm.
Work around this by returning "OURS" if and only if our net device's
interrupt is currently recorded as being enabled. Since we always
disable interrupts as a result of a call to PXENV_UNDI_ISR:START, this
guarantees that we will eventually (on the second call) return "NOT
OURS", allowing the other ISR to be called. Under normal operation,
including a non-shared interrupt situation, this change will make no
difference since PXENV_UNDI_ISR:START would be called only when
interrupts were enabled anyway.
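The rule can be modelled in a few lines of C. The structure and function names below are hypothetical simplifications, not the pxe_undi code. Because START always disables our interrupt, a second START while the shared line is still asserted is guaranteed to report "NOT OURS".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified view of our net device's interrupt state */
struct isr_netdev_model {
	bool irq_enabled;
};

enum isr_result { ISR_NOT_OURS = 0, ISR_OURS = 1 };

/* PXENV_UNDI_ISR:START as described above: claim the interrupt only if
 * our interrupt is recorded as enabled, then disable it */
static enum isr_result undi_isr_start ( struct isr_netdev_model *netdev ) {
	enum isr_result result;

	result = ( netdev->irq_enabled ? ISR_OURS : ISR_NOT_OURS );
	netdev->irq_enabled = false;
	return result;
}
```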
Signed-off-by: Michael Brown <mcb30@etherboot.org>
In the actual SYSLINUX suite's comboot implementation, the version
string is prefixed by CR LF, and the copyright string has a leading
space. Some tools (specifically HDT) assume these padding characters
exist, so we should probably return strings in a similar format.
Signed-off-by: Michael Brown <mcb30@etherboot.org>
Loading multiple UNDI instances would be useful in systems that have
several network cards with vendor PXE ROMs. However, we cannot rely on
UNDI ROMs working correctly with multiple instances loaded
simultaneously.
The gPXE UNDI driver supports the following multi-NIC configurations:
1. Chainloading undionly.kpxe on a specific NIC.
2. Loading the UNDI driver for the first probed device and ignoring all
other UNDI devices in the system.
This patch refuses to probe additional UNDI devices so there can never
be multiple instances of UNDI loaded.
Signed-off-by: Stefan Hajnoczi <stefanha@gmail.com>
Signed-off-by: Marty Connor <mdc@etherboot.org>
The .elf, .elfd, .lmelf, and .lmelfd prefixes were brought over from
legacy Etherboot and do not build in gPXE. This patch removes the
ELF prefixes.
Signed-off-by: Stefan Hajnoczi <stefanha@gmail.com>
Signed-off-by: Marty Connor <mdc@etherboot.org>
The unfinished .exe prefix was brought over from legacy Etherboot.
There has been no demand for .exe images so this patch removes the
prefix.
Signed-off-by: Stefan Hajnoczi <stefanha@gmail.com>
Signed-off-by: Marty Connor <mdc@etherboot.org>
The DOS .com prefix was brought over from legacy Etherboot but does not
build. There has been no demand for .com images so this patch removes
the prefix.
Signed-off-by: Stefan Hajnoczi <stefanha@gmail.com>
Signed-off-by: Marty Connor <mdc@etherboot.org>
The .lkrn prefix allows gPXE to be loaded as a Linux bzImage. The
.bImage prefix was carried over from legacy Etherboot and does not build.
This patch removes the .bImage prefix; use .lkrn instead.
Signed-off-by: Stefan Hajnoczi <stefanha@gmail.com>
Signed-off-by: Marty Connor <mdc@etherboot.org>
It might be the case that we wish to chain to an NBP without
being "in the way". We now implement a hook in our exit path
for gPXE *.*pxe build targets. The hook is a pointer to a
SEG16:OFF16 which we try to jump to during exit. By default,
this pointer results in the usual exit path.
We also implement the "pxenv_file_exit_hook" PXE API routine
to allow the user to specify an alternate SEG16:OFF16 to jump
to during exit.
Unfortunately, this additional PXE extension has a cost
in code size. Fortunately, a look at the size difference
for a gPXE .rom build target shows zero size difference
after compression.
The routine is documented in doc/pxe_extensions as follows:
FILE EXIT HOOK
Op-Code: PXENV_FILE_EXIT_HOOK (00e7h)
Input: Far pointer to a t_PXENV_FILE_EXIT_HOOK parameter
structure that has been initialized by the caller.
Output: PXENV_EXIT_SUCCESS or PXENV_EXIT_FAILURE must be
returned in AX. The Status field in the parameter
structure must be set to one of the values represented
by the PXENV_STATUS_xxx constants.
Description: Modify the exit path to jump to the specified code.
Only valid for pxeprefix-based builds.
typedef struct s_PXENV_FILE_EXIT_HOOK {
PXENV_STATUS_t Status;
SEGOFF16_t Hook;
} t_PXENV_FILE_EXIT_HOOK;
Set before calling API service:
Hook: The SEG16:OFF16 of the code to jump to.
Returned from API service:
Status: See PXENV_STATUS_xxx constants.
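As an illustration, a caller might prepare the parameter structure as shown below. This is a sketch only. The PXENV_STATUS_t and SEGOFF16_t definitions are assumptions that follow the PXE specification's layout, with the offset stored before the segment. The helper names are hypothetical, and the mechanism for actually issuing the API call is not shown. The sketch also shows how a SEG16:OFF16 pair maps to a physical address.

```c
#include <stdint.h>

/* Assumed definitions for this sketch (PXE-style layout). */
typedef uint16_t PXENV_STATUS_t;
typedef struct s_SEGOFF16 {
	uint16_t offset;
	uint16_t segment;
} SEGOFF16_t;

typedef struct s_PXENV_FILE_EXIT_HOOK {
	PXENV_STATUS_t Status;
	SEGOFF16_t Hook;
} t_PXENV_FILE_EXIT_HOOK;

/* A real-mode SEG16:OFF16 address corresponds to the physical address
 * ( segment * 16 ) + offset. */
static uint32_t segoff_to_phys ( SEGOFF16_t segoff ) {
	return ( ( ( uint32_t ) segoff.segment << 4 ) + segoff.offset );
}

/* Hypothetical helper: fill in the input field before calling
 * PXENV_FILE_EXIT_HOOK.  Status is an output and is left alone. */
static void prepare_exit_hook ( t_PXENV_FILE_EXIT_HOOK *params,
				uint16_t segment, uint16_t offset ) {
	params->Hook.segment = segment;
	params->Hook.offset = offset;
}
```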
Requested-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Shao Miller <shao.miller@yrdsb.edu.on.ca>
Signed-off-by: Marty Connor <mdc@etherboot.org>
The standard option ROM format provides a header indicating the size
of the entire ROM, which the BIOS will reserve space for, load, and
call as necessary. However, this space is strictly limited to 128k for
all ROMs. gPXE ameliorates this somewhat by reserving space for itself
in high memory and relocating the majority of its code there, but on
systems prior to PCI3 enough space must still be present to load the
ROM in the first place. Even on PCI3 systems, the BIOS often limits the
size of ROM it will load to a bit over 64kB.
These space problems can be solved by providing an artificially small
size in the ROM header: just enough to let the prefix code (at the
beginning of the ROM image) be loaded by the BIOS. To the BIOS, the
gPXE ROM will appear to be only a few kilobytes; it can then load
the rest of itself by accessing the ROM directly using the PCI
interface reserved for that task.
There are a few problems with this approach. First, gPXE needs to find
an unmapped region in memory to map the ROM so it can read from it;
this is done using the crude but effective approach of scanning high
memory (over 0xF0000000) for a sufficiently large region of all-ones
(0xFF) reads. (In x86 architecture, all-ones is returned for accesses
to memory regions that no mapped device can satisfy.) This is not
provably valid in all situations, but has worked well in practice.
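The scan described above can be sketched over an ordinary buffer. In the real prefix the memory would be a view of physical addresses above 0xF0000000. The function name, the window size, and the alignment step are illustrative assumptions, not the values gPXE uses.

```c
#include <stddef.h>
#include <stdint.h>

/* Sketch: find the first 'len'-byte window in 'mem', starting at an
 * 'align'-byte boundary, whose bytes all read back as 0xFF (and so
 * appear to be unclaimed by any device).  Returns the window's offset,
 * or -1 if none is found.  'align' must be nonzero. */
static long find_all_ones ( const uint8_t *mem, size_t mem_len,
			    size_t len, size_t align ) {
	size_t start, i;

	for ( start = 0 ; start + len <= mem_len ; start += align ) {
		for ( i = 0 ; i < len ; i++ ) {
			if ( mem[ start + i ] != 0xff )
				break;
		}
		if ( i == len )
			return ( long ) start;
	}
	return -1;
}
```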
More importantly, this type of ROM access can only work if the PCI ROM
BAR exists at all. NICs on physical add-in PCI cards generally must
have the BAR in order for the BIOS to be able to load their ROM, but
ISA cards and LAN-on-Motherboard cards will both fail to load gPXE
using this scheme.
Due to these uncertainties, it is recommended that .xrom only be used
when a regular .rom image is infeasible due to crowded option ROM
space. However, when it works, it could allow loading gPXE images
as large as any flash chip one could find (128kB or even larger).
Signed-off-by: Marty Connor <mdc@etherboot.org>
For extremely tight space requirements and specific applications, it is
sometimes desirable to create gPXE images that cannot provide the PXE API
functionality to client programs. Add a configuration header option,
PXE_STACK, which can be undefined to omit the PXE stack. Also add PXE_MENU
to control the PXE boot menu, which most uses of gPXE do not need.
Signed-off-by: Marty Connor <mdc@etherboot.org>
If we don't unload the PXE stack before executing gPXE, automatically
take advantage of the cached DHCPACK that the underlying/parent PXE
stack can provide. If that cached DHCPACK contains option 175.178, or
the user sets the use-cached setting before invoking DHCP, the real
DHCP request will be skipped and the cached DHCPACK will be used for
network configuration. Otherwise, the cached settings block is thrown
away as soon as a fresh one is acquired.
Signed-off-by: Marty Connor <mdc@etherboot.org>
Calling the parent PXE stack (the stack that loaded us, for
undionly.kkpxe) can be useful for more than UNDI calls; for instance,
it lets us get cached DHCP packets to avoid re-DHCP when working with
embedded images.
Signed-off-by: Marty Connor <mdc@etherboot.org>
pxenv_tftp_get_fsize is an API call that PXE clients can call to
obtain the size of a remote file. It is implemented by starting a TFTP
transfer with pxe_tftp_open(), waiting for the response and then
stopping the transfer with pxe_tftp_close(). This leaves the session
hanging on the TFTP server and it will try to resend the packet
repeatedly (verified with tftpd-hpa) until it times out.
This patch adds a method "tftpsize" that will abort the transfer after
the first packet is received from the server. This will terminate the
session on the server and is the same behaviour as Intel's PXE ROM
exhibits.
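The reference to qemu handling "the ERROR packet" suggests that the transfer is aborted by sending a TFTP ERROR packet. A minimal sketch of building one with the RFC 1350 layout follows. The error code and message are illustrative assumptions, and the helper name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TFTP_OPCODE_ERROR 5	/* RFC 1350 ERROR opcode */

/* Sketch: build a TFTP ERROR packet (2-byte opcode, 2-byte error code,
 * NUL-terminated message; integers in network byte order).  Returns the
 * packet length, or 0 if 'buf' is too small. */
static size_t tftp_build_error ( uint8_t *buf, size_t buf_len,
				 uint16_t code, const char *msg ) {
	size_t msg_len = ( strlen ( msg ) + 1 );	/* include the NUL */
	size_t total = ( 4 + msg_len );

	if ( total > buf_len )
		return 0;
	buf[0] = 0;
	buf[1] = TFTP_OPCODE_ERROR;
	buf[2] = ( uint8_t ) ( code >> 8 );
	buf[3] = ( uint8_t ) ( code & 0xff );
	memcpy ( &buf[4], msg, msg_len );
	return total;
}
```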
Together with a qemu patch to handle the ERROR packet (submitted to
qemu's mailing list), this resolves a specific issue where booting
pxegrub with qemu's TFTP server would be slow or hang.
I've tested this against qemu's tftp server and against my normal boot
infrastructure (tftpd-hpa). Booting pxegrub and loading extra files
now produces a trace similar to Intel's PXE client and there are no
spurious retransmits from tftpd any more.
Signed-off-by: Thomas Horsten <thomas@horsten.com>
Signed-off-by: Milan Plzik <milan.plzik@gmail.com>
Signed-off-by: Stefan Hajnoczi <stefanha@gmail.com>
Signed-off-by: Marty Connor <mdc@etherboot.org>
When the "keep-san" option is used, the function is exited without
unregistering the stack-allocated int13h drive. To prevent a dangling
pointer to the stack, these structs should be heap-allocated.
Signed-off-by: Stefan Hajnoczi <stefanha@gmail.com>
Signed-off-by: Marty Connor <mdc@etherboot.org>
gPXE currently takes advantage of the feature of PCI3.0 that allows
option ROMs to relocate the bulk of their code to high memory and so
take up only a small amount of space in the option ROM area. Currently,
the relocation can only take place if the BIOS's implementation of PMM
can be made to return blocks aligned to an even megabyte, because of
the A20 gate. AMI BIOSes, in particular, will not return allocations
that gPXE can use.
Ameliorate the situation somewhat by adding a prefix, .hrom, that works
identically to .rom except in the case that PMM allocation fails. Where
.rom would give up and place itself entirely in option ROM space, .hrom
moves to a block (assumed free) at HIGHMEM_LOADPOINT = 4MB. This allows
for the use of larger gPXE ROMs than would otherwise be possible.
Because there is no way to check that the area at HIGHMEM_LOADPOINT is
really free, other devices using that memory during the boot process
will cause failure for gPXE, the other device, or both. In practice
such conflicts will likely not occur, but this prefix should still be
considered EXPERIMENTAL.
Signed-off-by: Marty Connor <mdc@etherboot.org>
The disk partition prefix code in hdprefix.S reads the gPXE image in
tracks, not individual sectors. This means it will attempt to read
beyond the end of the image if the .hd image type is not padded to 32
KB.
This issue affects virtualization software which may execute a .hd or
.usb image file directly - effectively running a machine with a tiny
disk containing just the gPXE image. Boot will fail when gPXE tries to
read beyond the end of disk.
The Multiboot memory map needs to be built after unhiding gPXE and
downloaded images from memory. Solaris faults during boot when trying
to access the ramdisk, which is hidden from the memory map while gPXE is
executing. This issue is fixed by using the memory map from after gPXE
unhides itself.
Reported-by: Moinak Ghosh <moinakg@belenix.org>
Signed-off-by: Stefan Hajnoczi <stefanha@gmail.com>
The get_underlying_e820 function should return with CF unset on success.
Reported-by: Timothy Stack <tstack@vmware.com>
Signed-off-by: Marty Connor <mdc@etherboot.org>
REQUIRE_SYMBOL() formerly used a formulation of symbol requirement
that would allow a link to succeed despite lacking a required symbol,
because it did not introduce any relocations. Fix by renaming it to
REQUEST_SYMBOL() (since the soft-requirement behavior can be useful)
and adding a REQUIRE_SYMBOL() that truly requires.
Add EXPORT_SYMBOL() and IMPORT_SYMBOL() for REQUEST_SYMBOL()-like
behavior that allows one to make use of the symbol, by combining a
weak external on the symbol itself with a REQUEST_SYMBOL() of a second
symbol.
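The "weak external" half of this mechanism can be shown with plain GCC attributes. This is a generic sketch of how an undefined weak reference behaves, not gPXE's macro definitions. The link succeeds even though nothing defines the symbol, and the symbol's address compares equal to NULL. The symbol and function names are hypothetical.

```c
#include <stddef.h>

/* Weak external: if no object file defines this symbol, the link still
 * succeeds and its address resolves to NULL. */
extern void optional_feature_hook ( void ) __attribute__ (( weak ));

/* Use the feature only if something actually linked it in. */
static int try_optional_feature ( void ) {
	if ( optional_feature_hook == NULL )
		return 0;
	optional_feature_hook();
	return 1;
}
```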
Signed-off-by: Marty Connor <mdc@etherboot.org>
Some BIOSes (observed with an AMI BIOS on a SunFire X2200) seem to
reset the BIOS drive counter at 40:75 after a failed boot attempt.
This causes problems when attempting a Windows direct-to-iSCSI
installation: bootmgr.exe calls INT 13,0800 and gets told that there
are no hard disks, so never bothers to read the MBR in order to obtain
the boot disk signature. The Windows iSCSI initiator will detect the
iBFT and connect to the target, and everything will appear to work
except for the error message "This computer's hardware may not support
booting to this disk. Ensure that the disk's controller is enabled in
the computer's BIOS menu."
Fix by checking the BIOS drive counter on every INT 13 call, and
updating it whenever necessary.
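A minimal sketch of the per-call check follows. It assumes that a plain array stands in for the BIOS Data Area at segment 0x40, so that offset 0x75 holds the hard disk count. It also assumes that gPXE knows the highest drive number it has registered (0x80, 0x81, and so on). The helper name is hypothetical.

```c
#include <stdint.h>

/* Offset of the hard disk count within the BIOS Data Area (40:75). */
#define BDA_NUM_DRIVES 0x75

/* Hypothetical helper, called on every INT 13 entry: make sure the
 * counter covers every hard disk we have registered, in case the BIOS
 * has reset it since we hooked in.  Never lowers the count. */
static void check_bios_drive_count ( uint8_t *bda, uint8_t highest_drive ) {
	uint8_t needed = ( uint8_t ) ( ( highest_drive & 0x7f ) + 1 );

	if ( bda[BDA_NUM_DRIVES] < needed )
		bda[BDA_NUM_DRIVES] = needed;
}
```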
The case of an unsupported SAN protocol will currently not result in
any error message. Fix by printing the error message at the top level
using strerror(), rather than using hard-coded error messages in the
error paths.
IPoIB has a 20-byte link-layer address, of which only eight bytes
represent anything relating to a "hardware address".
The PXE and EFI SNP APIs expect the permanent address to be the same
size as the link-layer address, so fill in the "permanent address"
field with the initial link layer address (as generated by
register_netdev() based upon the real hardware address).
The hardware address is an intrinsic property of the hardware, while
the link-layer address can be changed at runtime. This separation is
exposed via APIs such as PXE and EFI, but is currently elided by gPXE.
Expose the hardware and link-layer addresses as separate properties
within a net device. Drivers should now fill in hw_addr, which will
be used to initialise ll_addr at the time of calling
register_netdev().
The option ROM header contains a one-byte field indicating the number
of 512-byte sectors in the ROM image. Currently it is linked to
contain the number of uncompressed sectors, with an instruction to the
compressor to correct it. This causes link failure when the
uncompressed size of the ROM image is over 128k.
Fix by replacing the SUBx compressor fixup with an ADDx fixup that
adds the total compressed output length, scaled as requested, to an
addend stored in the field where the final length value will be
placed. This is similar to the behavior of ELF relocations, and
ensures that an overflow error will not be generated unless the
compressed size is still too large for the field.
This also allows us to do away with the _filesz_pgh and _filesz_sect
calculations exported by the linker script.
Output tested bitwise identical to the old SUBx mechanism on hd, dsk,
lkrn, and rom prefixes, on both 32-bit and 64-bit processors.
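The arithmetic of an ADDx-style fixup on a one-byte field can be sketched as follows. The field starts out holding the addend. The compressed length is scaled to the requested unit and added. An error is reported only if the final value does not fit. The function name and the choice to round the scaled length up are assumptions made for the sketch.

```c
#include <stdint.h>

/* Sketch: add the compressed output length, scaled to 'scale'-byte
 * units (rounded up), to the addend already stored in '*field'.
 * Returns 0 on success, or -1 (leaving the field untouched) if the
 * result would overflow the one-byte field. */
static int apply_add_fixup ( uint8_t *field, uint32_t compressed_len,
			     uint32_t scale ) {
	uint32_t scaled = ( ( compressed_len + scale - 1 ) / scale );
	uint32_t value = ( *field + scaled );

	if ( value > 0xff )
		return -1;	/* compressed image still too large */
	*field = ( uint8_t ) value;
	return 0;
}
```

For example, a 70000-byte compressed image with a 512-byte scale and an addend of zero gives 137 sectors. An uncompressed size of over 128k no longer matters.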
Modified-by: Michael Brown <mcb30@etherboot.org>
Signed-off-by: Michael Brown <mcb30@etherboot.org>
The SRP Boot Firmware Table serves a similar role to the iSCSI and AoE
Boot Firmware Tables; it provides information required by the loaded
OS in order to establish a connection back to the SRP boot device.
SRP is the SCSI RDMA Protocol. It allows for a method of SAN booting
whereby the target is responsible for reading and writing data using
Remote DMA directly to the initiator's memory. The software initiator
merely sends and receives SCSI commands; it never has to touch the
actual data.
Some BIOSes support the BIOS Boot Specification (BBS) but fail to set
%es:%di correctly when calling the option ROM initialisation entry
point. This causes gPXE to identify the BIOS as non-PnP (and so
non-BBS), leaving the user unable to control the boot order.
Fix by scanning for the $PnP signature ourselves, rather than relying
on the BIOS having passed in %es:%di correctly.
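A sketch of such a scan over a copy of the BIOS segment follows. It assumes the PnP BIOS installation check structure layout: the "$PnP" signature on a 16-byte boundary, a length byte at offset 5, and all bytes of the structure summing to zero. The function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Sketch: scan 'rom' (e.g. a copy of F000:0000-F000:FFFF) on 16-byte
 * boundaries for a "$PnP" header with a valid checksum.  Returns the
 * offset of the header, or -1 if none is found. */
static long find_pnp_header ( const uint8_t *rom, size_t rom_len ) {
	size_t off, i;

	for ( off = 0 ; off + 6 <= rom_len ; off += 16 ) {
		const uint8_t *hdr = &rom[off];
		uint8_t len, sum = 0;

		if ( memcmp ( hdr, "$PnP", 4 ) != 0 )
			continue;
		len = hdr[5];
		if ( ( len == 0 ) || ( off + len > rom_len ) )
			continue;
		for ( i = 0 ; i < len ; i++ )
			sum += hdr[i];	/* wraps modulo 256 */
		if ( sum == 0 )
			return ( long ) off;
	}
	return -1;
}
```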
Tested-by: Helmut Adrigan <helmut.adrigan@chello.at>
pxe_api.h is just a description of API functions, and it is actively
undesirable to have more implementations of it than necessary. Allowing it
under the MIT license lets the Syslinux libraries use it.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Michael Brown <mcb30@etherboot.org>
We add a syslinux floppy disk type using parts of the genliso script.
This floppy image can be dd'ed to a physical floppy or used in
instances where a virtual floppy with a mountable DOS filesystem is
useful.
We also modify the genliso script to only generate .liso images
rather than creating images depending on how it is called.
Signed-off-by: Michael Brown <mcb30@etherboot.org>
IPoIB has a link-layer broadcast address that varies according to the
partition key. We currently go through several contortions to pretend
that the link-layer address is a fixed constant; by making the
broadcast address a property of the network device rather than the
link-layer protocol it will be possible to simplify IPoIB's broadcast
handling.
These commands can be used to activate or deactivate the PXE API (on a
specifiable network interface).
This is currently of limited use, since most image formats will call
shutdown() before booting the image, meaning that the underlying net
device gets shut down during remove_devices() anyway.
pxe_init_structures() fills in the fields of the !PXE and PXENV+
structures that aren't known until gPXE starts up. Once gPXE is
started, these values will never change.
Make pxe_init_structures() an initialisation function so that PXE
users don't have to worry about calling it.
It is possible that the UNDI ISR may be triggered before netdev_tx()
returns control to pxenv_undi_transmit(). This means that
pxenv_undi_isr() may see a zero undi_tx_count, and so not check for TX
completions. This is not a significant problem, since it will check
for TX completions on the next call to pxenv_undi_isr() anyway; it
just means that the NBP will see a spurious IRQ that was apparently
caused by nothing.
Fix by updating the undi_tx_count before calling netdev_tx(), so that
pxenv_undi_isr() can decrement it and report the TX completion.
Symantec Ghost requires working multicast support. gPXE configures
all (sufficiently supported) network adapters into "receive all
multicasts" mode, which means that PXENV_UNDI_SET_MCAST_ADDRESS is
actually a no-op, but the current implementation returns
PXENV_STATUS_UNSUPPORTED instead.
Fix by making PXENV_UNDI_SET_MCAST_ADDRESS return success. For good
measure, also implement PXENV_UNDI_GET_MCAST_ADDRESS, since the
relevant functionality is now exposed by the net device core.
Note that this will silently fail if the gPXE driver for the NIC being
used fails to configure the NIC in "receive all multicasts" mode.
The PXE debugging messages have remained pretty much unaltered since
Etherboot 5.4, and are now difficult to read in comparison to most of
the rest of gPXE.
Bring the pxe_undi debug messages up to normal gPXE standards.
The Symantec UNDI DOS driver fails when run on top of gPXE because we
return our interface type as "gPXE" rather than one of the predefined
NDIS interface type strings.
Fix by returning the standard "DIX+802.3" string; this isn't
necessarily always accurate, but it's highly unlikely that anything
trying to use the UNDI API would understand our IPoIB link-layer
pseudo-header anyway.
The Intel DOS UNDI driver fails when run on top of gPXE because we do
not fill in the ServiceFlags field in PXENV_UNDI_GET_IFACE_INFO.
Fix by filling in the ServiceFlags field with reasonable values
indicating our approximate feature capabilities.
The 3Com DOS UNDI driver fails when run on top of gPXE for two
reasons: firstly because PXENV_UNDI_SET_PACKET_FILTER is unsupported,
and secondly because gPXE enters the NBP without enabling interrupts
on the NIC, and the 3Com driver never calls PXENV_UNDI_OPEN.
Fix by always returning success from PXENV_UNDI_SET_PACKET_FILTER
(which is no worse than the current situation, since we already ignore
the receive packet filter in PXENV_UNDI_OPEN), and by forcibly
enabling interrupts on the NIC within PXENV_UNDI_TRANSMIT. The latter
is something of a hack, but avoids the need to implement a complete
base-code ISR that we would otherwise need if we were to enter the NBP
with interrupts enabled.
In order to construct outgoing link-layer frames or parse incoming
ones properly, some protocols (such as 802.11) need more state than is
available in the existing variables passed to the link-layer protocol
handlers. To remedy this, add struct net_device *netdev as the first
argument to each of these functions, so that more information can be
fetched from the link-layer-private part of the network device.
Updated all three call sites (netdevice.c, efi_snp.c, pxe_undi.c) and
both implementations (ethernet.c, ipoib.c) of ll_protocol to use the
new argument.
Signed-off-by: Michael Brown <mcb30@etherboot.org>
Etherboot 5.4 erroneously treats PXENV_UNLOAD_STACK as the "final
shutdown" call, and unhooks INT15. When using gPXE's undionly.kpxe,
this results in gPXE overwriting the portion of Etherboot located in
high memory, because it is no longer hidden from the system memory map
at the time that gPXE loads.
Work around this by explicitly testing for Etherboot as the underlying
PXE stack (as is already done in undinet.c) and skipping the call to
PXENV_UNLOAD_STACK if necessary.
Solaris kernels are multiboot images with the "raw" flag set,
indicating that the loader should use the raw address fields within
the multiboot header rather than looking for an ELF header. However,
the Solaris kernel contains garbage data in the raw address fields,
and requires us to use the ELF header instead.
Work around this by always using the ELF header if present. This
renders the "raw" flag somewhat redundant.
The build mechanism currently allows for multiple objects per source
file. The only remaining user of this is unnrv2b.S. Replace this
usage with a separate unnrv2b16.S wrapper file, as is currently used
for e.g. pxeprefix.S and kpxeprefix.S.
Some utilities that expect a floppy disk image (e.g. iLO?) may test
for a file of the correct size. Reinstate the .pdsk image format in
order to provide this if needed.
QEMU will silently round down a disk or ROM image file to a multiple of
512 bytes. Fix by always padding .rom, .dsk and .hd images up to the
next 512-byte boundary.
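The padding amounts to rounding the image length up to a power-of-two boundary. A minimal sketch with a hypothetical helper name follows. The same helper also covers padding a .hd image to 32 KB.

```c
#include <stddef.h>

/* Round 'len' up to the next multiple of 'align', which must be a
 * power of two (e.g. 512 for sectors, 32768 for 32 KB). */
static size_t pad_len ( size_t len, size_t align ) {
	return ( ( len + align - 1 ) & ~( align - 1 ) );
}
```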
Originally-fixed-by: Stefan Hajnoczi <stefanha@gmail.com>
Using "lret $2" to return from an interrupt causes interrupts to be
disabled in the calling program, since the INT instruction will have
disabled interrupts. Instead, patch CF on the stack and use iret to
return.
Interestingly, the original PC BIOS had this bug in at least one
place.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Michael Brown <mcb30@etherboot.org>
This allows gPXE to load memtest86, which is packaged as an old kernel.
Split all code that directly touches the kernel headers out into
bzimage_parse_header() and bzimage_update_header(), to reduce code
size and offset the cost of supporting older kernels.
Total cost of this feature: 11 bytes (uncompressed).
The parsing of the !PXE and PXENV+ structures share a fair bit of
code; merge the common code to save a few bytes.
Signed-off-by: Michael Brown <mcb30@etherboot.org>
Search for the PXE entry points (via the !PXE or PXENV+ structures)
through all known combinations of search methods. Furthermore, if we
find a PXENV+ structure, attempt to use it to find the !PXE structure
if at all possible.
Avoid passing credentials in the iBFT that were available but not
required for login. This works around a problem in the Microsoft
iSCSI initiator, which will refuse to initiate sessions if the CHAP
password is fewer than 12 characters, even if the target ends up not
asking for CHAP authentication.
The PXE 1.x spec specifies that on NBP entry or on return from INT
1Ah AX=5650h, EDX shall point to the physical address of the PXENV+
structure. The PXE 2.x spec drops this requirement, simply stating
that EDX is clobbered. Given the principle "be conservative in what
you send, liberal in what you accept", however, we should implement
this anyway.
Certain combinations of PXE stack and BIOS result in a broken INT 18
call, which will leave the system displaying a "PRESS ANY KEY TO
REBOOT" message instead of proceeding to the next boot device. On
these systems, returning via the PXE stack is the only way to continue
to the next boot device. Returning via the PXE stack works only if we
haven't already blown away the PXE base code in pxeprefix.S.
In most circumstances, we do want to blow away the PXE base code.
Base memory is a limited resource, and it is desirable to reclaim as
much as possible. When we perform an iSCSI boot, we need to place the
iBFT above the 512kB mark, because otherwise it may not be detected by
the loaded OS; this may not be possible if the PXE base code is still
occupying that memory.
Introduce a new prefix type .kkpxe which will preserve both the PXE
base code and the UNDI driver (as compared to .kpxe, which preserves
the UNDI driver but uninstalls the PXE base code). This prefix type
can be used on systems that are known to experience the specific
problem of INT 18 being broken, or in builds (such as gpxelinux.0) for
which it is particularly important to know that returning to the BIOS
will work.
Written by H. Peter Anvin <hpa@zytor.com> and Stefan Hajnoczi
<stefanha@gmail.com>, minor structural alterations by Michael Brown
<mcb30@etherboot.org>.
COMBOOT images use INTs to issue API calls; these end up making calls
into gPXE from real mode, and so temporarily change the real-mode
stack pointer. When our COMBOOT code uses a longjmp() to implement
the various "exit COMBOOT image" API calls, this leaves the real-mode
stack pointer stuck with its temporary value, which causes problems if
we eventually try to exit out of gPXE back to the BIOS.
Fix by adding rmsetjmp() and rmlongjmp() calls (analogous to
sigsetjmp()/siglongjmp()); these save and restore the additional state
needed for real-mode calls to function correctly.
Multi-level menus via COMBOOT rely on the COMBOOT program being able
to exit and invoke a new COMBOOT program (the next menu). This works,
but rapidly (within about five iterations) runs out of space in gPXE's
internal stack, since each new image is executed in a new function
context.
Fix by allowing tail recursion between images; an image can now
specify a replacement image for itself, and image_exec() will perform
the necessary tail recursion.
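The tail recursion can be sketched as a loop. Each image runs from the same stack frame, and the stack depth therefore does not grow as menus chain. The structure and function names are hypothetical simplifications, and the stand-in "exec" function just selects the next menu.

```c
#include <stddef.h>

/* Hypothetical, simplified image structure for this sketch. */
struct sketch_image {
	struct sketch_image *next_menu;	  /* what this "menu" will select */
	struct sketch_image *replacement; /* set during exec to chain onward */
};

static int images_run;

/* Stand-in for executing one COMBOOT image: it asks to be replaced by
 * the next menu, if there is one. */
static int sketch_exec_one ( struct sketch_image *image ) {
	images_run++;
	image->replacement = image->next_menu;
	return 0;
}

/* Execute an image and then any replacement it requests, iteratively
 * rather than recursively. */
static int sketch_image_exec ( struct sketch_image *image ) {
	int rc = 0;

	while ( image ) {
		image->replacement = NULL;
		rc = sketch_exec_one ( image );
		image = image->replacement;
	}
	return rc;
}
```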
The version of the GNU assembler shipped with Fedora 10
(2.18.50.0.9-8.fc10) complains about character literals in some of our
assembly code. Changing $'x' to $( 'x' ) seems to fix the problem.
Yes, the whitespace is required; using just $('x') does not work.
Reported by Kevin O'Connor <kevin@koconnor.net>.
There are code paths other than PMM allocation that can result in our
changing the ROM checksum. For example, we attempt to update our
product string to incorporate the PCI bus:dev.fn number. In a system
that does not support PMM, we could therefore end up with an incorrect
checksum.
Fix by attempting to update the checksum unconditionally.
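An option ROM image must sum to zero modulo 256. Recomputing the checksum after any modification is therefore cheap and safe to do unconditionally. A minimal sketch follows. The location of the checksum byte is passed in as a parameter, because its position in gPXE's header is not shown here, and the helper name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Recompute the designated checksum byte so that all 'len' bytes of
 * the ROM image sum to zero (mod 256). */
static void fix_rom_checksum ( uint8_t *rom, size_t len,
			       size_t checksum_offset ) {
	uint8_t sum = 0;
	size_t i;

	rom[checksum_offset] = 0;
	for ( i = 0 ; i < len ; i++ )
		sum += rom[i];
	rom[checksum_offset] = ( uint8_t ) ( 0 - sum );
}
```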
As reported by Stefan, commit 13d09e6 ("[i386] Simplify linker script
and standardise linker-defined symbol names") breaks gdb, readelf and
associated utilities.
This is caused by the .stack section overwriting a block in the middle
of the .debug_info section (despite being included in the
.bss.textdata section in the output file, which apparently has the
correct attributes for a .bss section).
Fixed by adding explicit flags and type to the stack section
declaration.
If it happens that _textdata_memsz ends up being an exact multiple of
4kB, then this will cause the .textdata section (after relocation) to
start on a page boundary. This means that the hidden memory region
(which is rounded down to the nearest page boundary) will start
exactly at virtual address 0, i.e. UNULL. This means that
init_eheap() will erroneously assume that it has failed to allocate an
external heap, since it typically ends up choosing the area that
lies immediately below .textdata, which in this case will be the
region with top==UNULL.
A subsequent error is that memtop_urealloc() passes through the error
return status -ENOMEM to the caller, which (rightly) assumes that the
result represents a valid userptr_t address.
Fixed by using alternative tests for heap non-existence, and by
returning UNULL in case of an error from init_eheap().
There are many functions that take ownership of the I/O buffer they
are passed as a parameter. The caller should not retain a pointer to
the I/O buffer. Use iob_disown() to automatically nullify the
caller's pointer, e.g.:
xfer_deliver_iob ( xfer, iob_disown ( iobuf ) );
This will ensure that iobuf is set to NULL for any code after the call
to xfer_deliver_iob().
iob_disown() is currently used only in places where it simplifies the
code, by avoiding an extra line explicitly setting the I/O buffer
pointer to NULL. It should ideally be used with each call to any
function that takes ownership of an I/O buffer. (The SSA
optimisations will ensure that use of iob_disown() gets optimised away
in cases where the caller makes no further use of the I/O buffer
pointer anyway.)
If gcc ever introduces an __attribute__((free)), indicating that use
of a function argument after a function call should generate a
warning, then we should use this to identify all applicable function
call sites, and add iob_disown() as necessary.
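One way to write such a macro is with a GCC statement expression. The expression yields the pointer and nulls the caller's variable within the same expression. This is a sketch under that assumption, not necessarily gPXE's exact definition. The io_buffer structure here is a simplified stand-in, and consume_iob() is hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>

/* Simplified stand-in for gPXE's struct io_buffer. */
struct io_buffer {
	size_t len;
};

/* Sketch of an iob_disown()-style macro (GCC/Clang statement
 * expression): evaluates to the buffer pointer and sets the caller's
 * variable to NULL. */
#define iob_disown( iobuf ) ( {					\
	struct io_buffer *iob_disown_tmp = (iobuf);		\
	(iobuf) = NULL;						\
	iob_disown_tmp; } )

/* Hypothetical function that takes ownership of (here, frees) a buffer. */
static void consume_iob ( struct io_buffer *iobuf ) {
	free ( iobuf );
}
```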
The DHCP client code now implements only the mechanism of the DHCP and
PXE Boot Server protocols. Boot Server Discovery can be initiated
manually using the "pxebs" command. The menuing code is separated out
into a user-level function on a par with boot_root_path(), and is
entered in preference to a normal filename boot if the DHCP vendor
class is "PXEClient" and the PXE boot menu option exists.
pxe_tftp.c assumes that the first seek on its data-transfer interface
represents the block size. Apart from being an ugly hack, this will
also screw up file size calculation for files smaller than one block.
The proper solution would be to extend the data-transfer interface to
support the reporting of stat()-like data. This is not going to
happen until the cost of adding interface methods is reduced (a fix I
have planned since June 2008).
In the meantime, abuse the xfer_window() method to return the block
size, since it is not being used for anything else and is vaguely
justifiable.
Astonishingly, having returned the incorrect TFTP blocksize via
PXENV_TFTP_OPEN for almost a year seems not to have affected any of
the test cases run during that time; this bug was found only when
someone tried running the heavily-patched version of pxegrub found in
OpenSolaris.
elf2efi converts a suitable ELF executable (containing relocation
information, and with appropriate virtual addresses) into an EFI
executable. It is less tightly coupled with the gPXE build process
and, in particular, does not require the use of a hand-crafted PE
image header in efiprefix.S.
elf2efi correctly handles .bss sections, which significantly reduces
the size of the gPXE EFI executable.
The check for unresolved symbols does not explicitly specify an output
architecture format, and so causes a warning when building an i386 EFI
binary on an x86_64 platform. This warning is harmless, and
specifying the output architecture in multiple places is cumbersome,
so just inhibit the warning.
At POST time some BIOSes return invalid e820 maps even though
they indicate that the data is valid. We add a check that the first
region returned by e820 is RAM type and declare the map to be invalid
if it is not.
This extends the sanity checks from 8b20e5d ("[pcbios] Sanity-check
the INT15,e820 and INT15,e801 memory maps").
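The added check can be sketched as follows. The entry structure is a simplified assumption, and type 1 is the conventional e820 "usable RAM" type. The map is rejected if it is empty or if its first region is not RAM.

```c
#include <stddef.h>
#include <stdint.h>

#define E820_TYPE_RAM 1	/* "usable RAM" in the INT 15,e820 convention */

/* Simplified e820 map entry for this sketch. */
struct e820_entry {
	uint64_t start;
	uint64_t len;
	uint32_t type;
};

/* Treat the map as invalid unless its first region is RAM. */
static int e820_map_plausible ( const struct e820_entry *map,
				size_t count ) {
	return ( ( count > 0 ) && ( map[0].type == E820_TYPE_RAM ) );
}
```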
Currently the only supported platform for x86_64 is EFI.
Building an EFI64 gPXE requires a version of gcc that supports
__attribute__((ms_abi)). This currently means a development build of
gcc; the feature should be present when gcc 4.4 is released.
In the meantime, you can grab a suitable gcc tree from
git://git.etherboot.org/scm/people/mcb30/gcc/.git
EFI provides a copy of the SMBIOS table accessible via the EFI system
table, which we should use instead of manually scanning through the
F000:0000 segment.
On non-BBS systems, we have to hook INT 19 in order to be able to boot
from the gPXE ROM at all. However, doing this unconditionally will
prevent the user from booting via any other devices.
Previously, the INT 19 entry point would prompt the user to press B in
order to boot from gPXE, which makes it impossible to perform an
unattended network boot. We now prompt the user to press N to skip
booting from gPXE, which allows for unattended operation.
This should be a better match for most real-world scenarios. Most
modern systems support BBS and so are unaffected by this change. Very
old (non-BBS) systems tend not to have PXE ROMs by default anyway; if
the user has added a gPXE ROM then they probably do want to boot from
the network. Newer non-BBS systems are essentially limited to IBM
servers, which will recapture the INT 19 vector anyway and implement
their own boot-ordering selection mechanism.
Remove the assortment of miscellaneous hacks to guess the "network
boot device", and replace them each with a call to last_opened_netdev().
It still isn't guaranteed correct, but it won't be any worse than
before, and it will at least be consistent.
This brings us into line with Linux definitions, and also simplifies
adding x86_64 support since both platforms have 2-byte shorts, 4-byte
ints and 8-byte long longs.
Code paths that automatically allocate memory from the FBMS at 40:13
should also free it, if possible.
Freeing this memory will not be possible if either
1. The FBMS has been modified since our allocation, or
2. We have not been able to unhook one or more BIOS interrupt vectors.
_filesz was incorrectly forced to be aligned up to MAX_ALIGN. In a
non-compressed build, this would cause a build failure unless _filesz
happened to already be aligned to MAX_ALIGN.
The only way that PMM allows us to request a block in a region with
A20=0 is to ask for a block with an alignment of 2MB. Due to the PMM
API design, the only way we can do this is to ask for a block with a
size of 2MB.
Unfortunately, some BIOSes will hit problems if we allocate a 2MB
block. In particular, it may not be possible to enter the BIOS setup
screen; the BIOS setup code attempts a PMM allocation, fails, and
hangs the machine.
We now try allocating only as much as we need via PMM. If the
allocated block has A20=1, we free the allocated block, double the
allocation size, and try again. Repeat until either we obtain a block
with A20=0 or allocation fails. (This is guaranteed to terminate by
the time we reach an allocation size of 2MB.)
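The retry loop above can be sketched in C as follows. This is a minimal illustration under stated assumptions, not gPXE's code: `pmm_alloc_fn`, `pmm_free_fn`, `alloc_a20_clear()` and the `fake_alloc()` stand-in (which hands out blocks aligned to their own power-of-two size, as the 2MB-alignment behaviour described above implies) are hypothetical names invented for the sketch. The check covers the whole needed range rather than just the start address.

```c
#include <stdint.h>

#define A20_BIT      0x00100000UL  /* address bit 20 (the 1MB bit) */
#define PMM_MAX_SIZE 0x00200000UL  /* 2MB: the final request size */

/* Hypothetical PMM wrappers: alloc returns a physical address, or 0 */
typedef uint32_t (*pmm_alloc_fn)(uint32_t size);
typedef void (*pmm_free_fn)(uint32_t addr);

/* Obtain 'needed' bytes lying entirely in A20=0 memory, doubling the
 * request size after each A20=1 result. Returns 0 on failure. */
static uint32_t alloc_a20_clear(uint32_t needed, pmm_alloc_fn alloc,
                                pmm_free_fn release)
{
    uint32_t size;

    /* A contiguous range of more than 1MB cannot avoid A20=1 */
    if (needed == 0 || needed > A20_BIT)
        return 0;
    for (size = needed; size <= PMM_MAX_SIZE; size *= 2) {
        uint32_t addr = alloc(size);
        if (addr == 0)
            return 0;                   /* allocation failed: give up */
        if (!(addr & A20_BIT) && !((addr + needed - 1) & A20_BIT))
            return addr;                /* usable block with A20=0 */
        release(addr);                  /* A20=1: free, double, retry */
    }
    return 0;
}

/* Demo stand-in for PMM (assumption): returns the first address at or
 * above fake_base that is aligned to the requested power-of-two size */
static uint32_t fake_base = 0x00100000;
static uint32_t fake_alloc(uint32_t size)
{
    return (fake_base + size - 1) & ~(size - 1);
}
static void fake_free(uint32_t addr)
{
    (void)addr;
}
```

For example, with `fake_base` at 1MB, `alloc_a20_clear(0x10000, fake_alloc, fake_free)` is handed 0x100000 (A20=1) for every request from 64kB up to 1MB, and only the 2MB request lands at 0x200000. The explicit 2MB bound on the loop makes termination obvious even if an allocator did not align small blocks.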
With a 16-bit operand, lgdt/lidt will load only a 24-bit base address,
ignoring the high-order bits. This meant that we could fail to fully
restore the GDT across a call into gPXE, if the GDT happened to be
located above the 16MB mark.
Not all of our lgdt/lidt instructions require a data32 prefix (for
example, reloading the real-mode IDT can never require a 32-bit base
address), but by adding them everywhere we will hopefully not forget
the necessary ones in future.
Some hardware vendors have been known to remove all gPXE-related
branding from ROMs that they build. While this is not prohibited by
the GPL, it is a little impolite.
Add a facility for adding branding messages via two #defines
(PRODUCT_NAME and PRODUCT_SHORT_NAME) in config/general.h. This
should accommodate all known OEM-mandated branding requirements.
Vendors with branding requirements that cannot be satisfied by using
PRODUCT_NAME and/or PRODUCT_SHORT_NAME should contact us so that we
can extend this facility as necessary.
This function is a major kludge, but can be made slightly more
accurate by ignoring net devices that aren't open. Eventually it
needs to be removed entirely.
Settings can be constructed using a dotted-decimal notation, to allow
for access to unnamed settings. The default interpretation is as a
DHCP option number (with encapsulated options represented as
"<encapsulating option>.<encapsulated option>").
In several contexts (e.g. SMBIOS, Phantom CLP), it is useful to
interpret the dotted-decimal notation as referring to non-DHCP
options. In this case, it becomes necessary for these contexts to
ignore standard DHCP options, otherwise we end up trying to, for
example, retrieve the boot filename from SMBIOS.
Allow settings blocks to specify a "tag magic". When dotted-decimal
notation is used to construct a setting, the tag magic value of the
originating settings block will be ORed into the tag number.
Store/fetch methods can then check for the magic number before
interpreting arbitrarily-numbered settings.
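A sketch of how a tag magic might be applied and checked, assuming one byte per dotted-decimal component (so that "43.1" becomes 0x2b01) and a hypothetical magic value held in the top byte. `parse_dotted_tag()`, `smbios_accepts()`, `SMBIOS_TAG_MAGIC` and `TAG_MAGIC_MASK` are illustrative names; the real magic values are not given above.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical values: the real magic numbers are not given above */
#define SMBIOS_TAG_MAGIC 0x5b000000UL
#define TAG_MAGIC_MASK   0xff000000UL

/* Parse "a[.b[.c]]" into a tag, one byte per component, then OR in the
 * originating settings block's tag magic. Returns 0 if invalid. */
static uint32_t parse_dotted_tag(const char *name, uint32_t tag_magic)
{
    uint32_t tag = 0;
    int parts = 0;
    char *end;

    if (*name == '\0')
        return 0;
    while (*name) {
        unsigned long part = strtoul(name, &end, 10);
        if (end == name || part > 0xff || ++parts > 3)
            return 0;
        tag = (tag << 8) | (uint32_t)part;
        if (*end == '.')
            end++;
        else if (*end != '\0')
            return 0;
        name = end;
    }
    return tag | tag_magic;
}

/* A store/fetch method for a magic-tagged block accepts only tags that
 * carry its own magic, so plain DHCP tags (e.g. 67) are ignored */
static int smbios_accepts(uint32_t tag)
{
    return (tag & TAG_MAGIC_MASK) == SMBIOS_TAG_MAGIC;
}
```

With this scheme a plain DHCP option number such as 67 (the boot filename) never reaches the SMBIOS fetch method, because its top byte is zero.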
This extends the sanity checks on the runtime segment address provided
in %bx, first implemented in commit 5600955.
We now allow the ROM to be placed anywhere above a000:0000 (rather
than c000:0000, as before), since this is the region allowed by the
PCI 3 spec. If the BIOS asks us to place the runtime image such that
it would overlap with the init-time image (which is explicitly
prohibited by the PCI 3 spec), then we assume that the BIOS is faulty
and ignore the provided runtime segment address.
Testing on a SuperMicro BIOS providing overlapping segment addresses
shows that ignoring the provided runtime segment address is safe to do
in these circumstances.
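The two checks described above (PCI 3 placement range, and no overlap with the init-time image) amount to something like the following. The function name and the paragraph-based parameters are assumptions for illustration; segments and lengths are in 16-byte paragraphs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Sanity-check the runtime segment supplied in %bx. All values are in
 * 16-byte paragraphs (segment units). */
static bool runtime_segment_ok(uint32_t runtime_seg, uint32_t runtime_paras,
                               uint32_t init_seg, uint32_t init_paras)
{
    /* PCI 3 allows option ROM placement anywhere from a000:0000 up */
    if (runtime_seg < 0xa000)
        return false;
    /* PCI 3 forbids overlapping the init-time image; if the BIOS asks
     * for this anyway, assume it is faulty and ignore the address */
    if (runtime_seg < init_seg + init_paras &&
        init_seg < runtime_seg + runtime_paras)
        return false;
    return true;
}
```

If the check fails, the caller would simply leave the image where it was placed for initialisation.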
Someone at Dell must have a full-time job designing ways to screw up
implementations of INT 15,e820. This latest gem is courtesy of a Dell
Xanadu system, which arbitrarily decides to obliterate the contents of
%esi.
Preserve %esi, %edi and %ebp across calls to INT 15,e820, in case
someone tries a variation on this trick in future.
FreeBSD requires the object format to be specified as elf_i386_fbsd,
rather than elf_i386.
Based on a patch from Eygene Ryabinkin <rea-fbsd@codelabs.ru>
Some PCI 3 BIOSes seem to provide a garbage value in %bx, which should
contain the runtime segment address. Perform a basic sanity check: we
reject the segment if it is below the start of option ROM space. If
the sanity check fails, we assume that the BIOS was not expecting us
to be a PCI 3 ROM, and we just leave our image in situ.
The section name seems to have significance for some versions of
binutils.
There is no way to instruct gcc that sections such as .bss16 contain
uninitialised data; it will emit them with contents explicitly set to
zero. We therefore have to rely on the linker script to force these
sections to become uninitialised-data sections. We do this by marking
them as NOLOAD; this seems to be the closest semantic equivalent in the
linker script language.
However, this gets ignored by some versions of ld (including 2.17 as
shipped with Debian Etch), which mark the resulting sections with
(CONTENTS,ALLOC,LOAD,DATA). Combined with the fact that this version of
ld seems to ignore the specified LMA for these sections, this means that
they end up overlapping other sections, and so parts of .prefix (for
example) get obliterated by .data16's bss section.
Rename the .bss sections from .section_bss to .bss.section; this seems to
cause these versions of ld to treat them as uninitialised data.
Not fully understood, but it seems that the LMA of bss sections matters
for some newer binutils builds. Force all bss sections to have an LMA
at the end of the file, so that they don't interfere with other
sections.
The symptom was that objcopy -O binary -j .zinfo would extract the
.zinfo section from bin/xxx.tmp as a blob of the correct length, but
with zero contents. This would then cause the [ZBIN] stage of the
build to fail.
Also explicitly state that .zinfo(.*) sections have @progbits, in case
some future assembler or linker variant decides to omit them.
Some versions of ld choke on the "AT ( _xxx_lma )" in efi.lds with an
error saying "nonconstant expression for load base". Since these were
only explicitly setting the LMA to the address that it would have had
anyway, they can be safely omitted.
We have EFI APIs for CPU I/O, PCI I/O, timers, console I/O, user
access and user memory allocation.
EFI executables are created using the vanilla GNU toolchain, with the
EXE header handcrafted in assembly and relocations generated by a
custom efilink utility.
The userptr_t is now the fundamental type that gets used for conversions.
For example, virt_to_phys() is implemented in terms of virt_to_user() and
user_to_phys().
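A minimal sketch of that layering, under a hypothetical flat memory model in which a `userptr_t` is simply a virtual address and physical = virtual + `virt_offset`. Only the three function names come from the text above; the types, signatures and memory model are assumptions.

```c
#include <stdint.h>

/* Hypothetical flat model for this sketch */
typedef uintptr_t userptr_t;
static uintptr_t virt_offset;   /* physical = virtual + virt_offset */

static userptr_t virt_to_user(volatile const void *addr)
{
    return (userptr_t)addr;
}

static unsigned long user_to_phys(userptr_t userptr, long offset)
{
    return (unsigned long)(userptr + offset + virt_offset);
}

/* virt_to_phys() expressed purely through the userptr_t conversions */
static unsigned long virt_to_phys(volatile const void *addr)
{
    return user_to_phys(virt_to_user(addr), 0);
}
```

Making `userptr_t` the pivot means a platform only has to supply the userptr_t conversions; the virtual/physical helpers follow from them.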
Reduce the number of sections within the linker script to match the
number of practical sections within the output file.
Define _section, _msection, _esection, _section_filesz, _section_memsz,
and _section_lma for each section, replacing the mixture of symbols that
previously existed.
In particular, replace _text and _end with _textdata and _etextdata, to
make it explicit within code that uses these symbols that the .text and
.data sections are always treated as a single contiguous block.
Allow for the build CPU architecture and platform to be specified as part
of the make command goals. For example:
make bin/rtl8139.rom # Standard i386 PC-BIOS build
make bin-efi/rtl8139.efi # i386 EFI build
The generic syntax is "bin[-[arch-]platform]", with the default
architecture being "i386" (regardless of the host architecture) and the
default platform being "pcbios".
Non-path targets such as "srcs" can be specified using e.g.
make bin-efi srcs
Note that this changeset is merely Makefile restructuring to allow the
build architecture and platform to be determined by the make command
goals, and to export these to compiled code via the ARCH and PLATFORM
defines. It doesn't actually introduce any new build platforms.
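The "bin[-[arch-]platform]" grammar and its defaults can be illustrated with a small C parser. This is not how the Makefile does it (make uses its own pattern handling); `parse_bin_dir()` is a hypothetical helper that only shows which directory names map to which ARCH/PLATFORM pair.

```c
#include <stdio.h>
#include <string.h>

/* Split "bin[-[arch-]platform]" into arch and platform, applying the
 * defaults "i386" and "pcbios". Returns 0 on success, -1 otherwise. */
static int parse_bin_dir(const char *dir, char *arch, size_t arch_len,
                         char *platform, size_t platform_len)
{
    const char *rest, *dash;

    snprintf(arch, arch_len, "%s", "i386");
    snprintf(platform, platform_len, "%s", "pcbios");
    if (strcmp(dir, "bin") == 0)
        return 0;                       /* plain "bin": both defaults */
    if (strncmp(dir, "bin-", 4) != 0 || dir[4] == '\0')
        return -1;
    rest = dir + 4;
    dash = strchr(rest, '-');
    if (dash) {                         /* "bin-arch-platform" */
        if (dash == rest || dash[1] == '\0')
            return -1;
        snprintf(arch, arch_len, "%.*s", (int)(dash - rest), rest);
        snprintf(platform, platform_len, "%s", dash + 1);
    } else {                            /* "bin-platform" */
        snprintf(platform, platform_len, "%s", rest);
    }
    return 0;
}
```

Under this reading, "bin-efi" and "bin-i386-efi" name the same build directory.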
Although the E820 API allows for a caller to provide only a 20-byte
buffer, there exists at least one combination (HP BIOS, 32-bit WinPE)
that relies on information found only in the "extended attributes"
field, which requires a 24-byte buffer.
Allow for up to a 64-byte E820 buffer, in the hope of coping with
future idiocies like this one.
The ACPI specification defines an additional 4-byte field at offset 20
for an E820 memory map entry. This field is presumably optional,
since generally E820 gets given only a 20-byte buffer to fill.
However, the bits of this optional field are defined as:
bit 0 : region is enabled
bit 1 : region is non-volatile memory rather than RAM
so it seems as though callers that pass in only a 20-byte buffer may
be missing out on some rather important information.
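A sketch of the 24-byte entry layout and of how a caller might honour the two attribute bits. The struct and macro names are illustrative, `__attribute__((packed))` assumes gcc or clang, and treating non-volatile memory as unusable RAM is a policy assumption of this sketch. If the BIOS returned only 20 bytes, the attributes are treated as absent.

```c
#include <stddef.h>
#include <stdint.h>

/* One INT 15,e820 entry: 20 original bytes plus the ACPI extended
 * attributes dword at offset 20 */
struct e820_entry {
    uint64_t start;
    uint64_t len;
    uint32_t type;
    uint32_t attrs;   /* valid only if the BIOS returned >= 24 bytes */
} __attribute__((packed));

#define E820_TYPE_RAM         1
#define E820_ATTR_ENABLED     0x00000001UL   /* bit 0 */
#define E820_ATTR_NONVOLATILE 0x00000002UL   /* bit 1 */

/* Is this entry usable RAM? returned_len is the byte count (in %ecx)
 * reported by the BIOS for this entry. */
static int e820_is_usable_ram(const struct e820_entry *e,
                              uint32_t returned_len)
{
    if (e->type != E820_TYPE_RAM)
        return 0;
    if (returned_len >= 24) {
        if (!(e->attrs & E820_ATTR_ENABLED))
            return 0;     /* region disabled: ignore it */
        if (e->attrs & E820_ATTR_NONVOLATILE)
            return 0;     /* non-volatile memory, not ordinary RAM */
    }
    return 1;
}
```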
Our INT 15,e820 code was setting %es=%ss (as part of the "look ahead
in the memory map" logic), but failing to restore %es afterwards.
This is a serious bug, but wasn't affecting many platforms because
almost all callers seem to set %es=%ss anyway.
Some BIOSes require us to pass in not only the continuation value (in
%ebx) as returned by the previous call to INT 15,e820 but also the
unmodified buffer (at %es:%di) as returned by the previous call to INT
15,e820. Apparently, someone thought it would be a worthwhile
optimisation to fill in only the low dword of the "length" field and
the low byte of the "type" field, assuming that the buffer would
remain unaltered from the previous call.
This problem was being triggered by the "peek ahead" logic in
get_mangled_e820(), which would read the next entry into a temporary
buffer in order to be able to guarantee terminating the map with
%ebx=0 rather than CF=1. (Terminating with CF=1 upsets some Windows
flavours, despite being documented legal behaviour.)
Work around this problem by always fetching directly into our e820
cache; that way we can guarantee that the underlying call always sees
the previous buffer contents (and the same buffer address).
We seem to be having issues with various E820 memory maps. These
problems are often difficult to reproduce, requiring access to the
specific system exhibiting the problem.
Add a facility for hooking in a fake E820 map generator, using an
arbitrary map defined in a C array, solely in order to be able to test
the map-mangling code against arbitrary E820 maps.
In particular, allow BANNER_TIMEOUT=0 to inhibit the prompt banners
altogether.
Ironically, this request comes from the same OEM that originally
required the prompts to be present during POST.
Some really moronic BIOSes bring up the PXE stack via the UNDI loader
entry point during POST, and then don't bother to unload it before
overwriting the code and data segments. If this happens, we really
don't want to leave INT 15 hooked, because that will cause any loaded
OS to die horribly as soon as it attempts to fetch the system memory
map.
We use a heuristic to detect whether or not we are being loaded at the
top of free base memory. If we determine that we are being loaded at
some other arbitrary location in base memory, then we assume that it's
not safe to hook INT 15.
On non-BBS systems we hook INT 19, since there is no other way we can
guarantee gaining control of the flow of execution. If we end up
doing this, prompt the user before attempting boot, since forcibly
capturing INT 19 is rather antisocial.
If the INT 15,e820 memory map reports a region [0,0), this confuses
the "truncate to even megabytes" logic, which ends up rounding the
region 'down' to [0,fff00000).
Fix by ensuring that the region's end address is at least 1, before we
subtract 1 to obtain the "last byte in region" address.
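The failure mode is plain unsigned wrap-around in 32-bit arithmetic. In the sketch below the truncation rule (round the last byte down to a megabyte boundary) is hypothetical, chosen only because it reproduces the [0,fff00000) symptom described above; `truncated_end()` and its `guard` flag are invented for illustration.

```c
#include <stdint.h>

#define ONE_MB 0x100000u

/* Hypothetical truncation: round the region's last byte down to a
 * megabyte boundary and use that as the new end address. With
 * guard != 0, apply the fix: force end >= 1 before subtracting. */
static uint32_t truncated_end(uint32_t end, int guard)
{
    uint32_t last;

    if (guard && end < 1)
        end = 1;
    last = end - 1;               /* wraps to 0xffffffff when end == 0 */
    return last & ~(ONE_MB - 1);
}
```

Without the guard, an end of 0 becomes 0xfff00000; with it, the region stays [0,0).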
INT 15,e801 is capable of returning a memory range that extends to
4GB, so allow for this in the debug message that shows the data
returned by INT 15,e801.
Apparently some BIOSes will place option ROMs on 512-byte boundaries.
While this is against specification, it doesn't actually hurt
anything, so we may as well increase our scan granularity to 512
bytes.
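A sketch of a 512-byte-granularity scan over a buffer standing in for option ROM space. It looks for the standard 0x55 0xAA signature and skips each ROM found using the length byte at offset 2 (in 512-byte units). `scan_option_roms()` is a hypothetical helper; a real scan would read memory from c000:0000 onwards.

```c
#include <stddef.h>
#include <stdint.h>

#define ROM_SCAN_STEP 512   /* scan granularity, in bytes */

/* Scan 'len' bytes of (a copy of) option ROM space starting at physical
 * address 'base'. Records up to 'max_found' ROM addresses in 'found'
 * and returns the total number of ROMs seen. */
static size_t scan_option_roms(const uint8_t *mem, size_t len, uint32_t base,
                               uint32_t *found, size_t max_found)
{
    size_t count = 0;
    size_t off = 0;

    while (off + 3 <= len) {
        size_t rom_len = 0;

        if (mem[off] == 0x55 && mem[off + 1] == 0xaa) {
            if (count < max_found)
                found[count] = base + (uint32_t)off;
            count++;
            rom_len = (size_t)mem[off + 2] * 512;  /* size byte */
        }
        /* Skip a whole ROM if one was found, else advance one step */
        off += rom_len ? rom_len : ROM_SCAN_STEP;
    }
    return count;
}
```

A ROM starting at offset 0x600 (512-byte aligned but not 2kB aligned) is found by this scan but would be missed by one that steps in 2kB increments.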
Contributed by Luca <lucarx76@gmail.com>
Wyse Streaming Manager server (WLDRM13.BIN) assumes that the PXENV+
entry point is at UNDI_CS:0000; apparently, somebody at Wyse has
difficulty distinguishing between the words "may" and "must"...
Add a dummy entry point at UNDI_CS:0000, which just jumps to the
correct entry point.
The multiboot specification states that, for raw images, if
load_end_addr is zero then it should be interpreted as meaning "use
the entire file", and if bss_end_addr is zero it should be interpreted
as meaning "no bss".
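A sketch of the length calculation under these rules, using the multiboot specification's definition of the load offset (the header's file offset minus `header_addr - load_addr`). `struct mb_addrs` and `mb_raw_lengths()` are illustrative names.

```c
#include <stdint.h>

/* Address fields of a multiboot header (used when flags bit 16 is set) */
struct mb_addrs {
    uint32_t header_addr;
    uint32_t load_addr;
    uint32_t load_end_addr;
    uint32_t bss_end_addr;
};

/* Compute the file offset to load from, the bytes to load (filesz) and
 * the total memory footprint (memsz). Returns 0, or -1 if invalid. */
static int mb_raw_lengths(const struct mb_addrs *h, uint32_t header_offset,
                          uint32_t file_len, uint32_t *offset,
                          uint32_t *filesz, uint32_t *memsz)
{
    uint32_t delta;

    if (h->header_addr < h->load_addr)
        return -1;
    delta = h->header_addr - h->load_addr;
    if (delta > header_offset || header_offset - delta > file_len)
        return -1;
    *offset = header_offset - delta;

    if (h->load_end_addr == 0) {
        *filesz = file_len - *offset;      /* "use the entire file" */
    } else {
        if (h->load_end_addr < h->load_addr ||
            h->load_end_addr - h->load_addr > file_len - *offset)
            return -1;
        *filesz = h->load_end_addr - h->load_addr;
    }

    if (h->bss_end_addr == 0) {
        *memsz = *filesz;                  /* "no bss" */
    } else {
        if (h->bss_end_addr < h->load_addr ||
            h->bss_end_addr - h->load_addr < *filesz)
            return -1;
        *memsz = h->bss_end_addr - h->load_addr;
    }
    return 0;
}
```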
Explicitly state that we are using 32-bit addressing in 16-bit code.
GNU as 2.15 (FreeBSD/amd64 7-STABLE) got confused because 32-bit
registers are used in code that was declared as 16-bit. Add an explicit
'addr32' modifier to make the assembler happy.
Signed-off-by: Eygene Ryabinkin <rea-fbsd@codelabs.ru>
IBM's iSCSI Firmware Initiator checks the UNDIROMID pointer in the
!PXE structure that gets created by the UNDI loader. We didn't
previously fill this value in.
Include PMM allocation result in POST banner.
Include full product string in "starting execution" message.
Also mark ourselves as supporting DDIM in PnP header, for
completeness.
On a system that doesn't support BBS, we end up hooking INT19 to gain
control of the boot process. If the system is PCI3.0, we must take
care to use the runtime value for %cs, rather than the POST-time
value, otherwise we end up pointing INT19 to the temporary option ROM
POST scratch area.
Allow for an arbitrary number of splits of the system memory map via
INT 15,e820.
Features of the new map-mangling algorithm include:
Supports random access to e820 map entries.
Requires only sequential access support from the underlying e820
map, even if our caller uses random access.
Empty regions will always be stripped.
Always terminates with %ebx=0, even if the underlying map terminates
with CF=1.
Allows for an arbitrary number of hidden regions, with underlying
regions split into as many subregions as necessary.
Total size increase to achieve this is 193 bytes.
Define a list of N allowed memory regions, and split each underlying
e820 region into up to N subregions. Strip resulting empty regions
out of the map, avoiding using the "return with CF set to strip last
empty region" trick, because it seems that bootmgr.exe in Win2k8 gets
upset if the memory map is terminated with CF set.
This is an intermediate checkin that defines a single allowed memory
region covering the entire 64-bit address space, and uses the existing
map-mangling code on top of the new region-splitting code. This
sanitises the memory map to the point that Win2k8 is able to boot even
on a system that defines a final zero-length region at the 4GB mark.
I'm checking this in because it may be useful for future debugging
efforts to be able to run with the existing and known-working map
mangling code together with the map sanitisation capabilities of the
new map mangling code.
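The splitting step amounts to intersecting each underlying region with every allowed region and dropping empty results. A minimal sketch, assuming half-open [start,end) regions and a sorted, non-overlapping allowed list; `struct region` and `split_region()` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

struct region {
    uint64_t start;   /* first byte */
    uint64_t end;     /* one past the last byte */
};

/* Split one underlying e820 region against n allowed regions, writing
 * up to n non-empty subregions to 'out'. Returns the number written. */
static size_t split_region(struct region in, const struct region *allowed,
                           size_t n, struct region *out)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++) {
        uint64_t s = in.start > allowed[i].start ? in.start : allowed[i].start;
        uint64_t e = in.end < allowed[i].end ? in.end : allowed[i].end;

        if (s < e) {              /* strip empty subregions */
            out[count].start = s;
            out[count].end = e;
            count++;
        }
    }
    return count;
}
```

With a single allowed region spanning the whole address space, as in this intermediate checkin, the only effect is to strip empty regions such as a zero-length one at the 4GB mark.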
H. Peter Anvin <hpa@zytor.com> sent word that Sergey Vlasov
<vsu@altlinux.ru> discovered gPXE lkrn images fail to load in SYSLINUX
3.70 because we have initrd_addr_max zeroed. This patch sets the same
value as the Linux kernel.
Also change the header jmp instruction to use a hardcoded opcode value
like Linux does, just in case the assembler decides to use a three-byte
instruction instead of the desired two-byte jmp.
Add yet another ugly hack to iscsiboot.c, this time to allow the user to
inhibit the shutdown/removal of the iSCSI INT13 device (and the network
devices, since they are required for the iSCSI device to function).
On the plus side, the fact that shutdown() now takes flags to
differentiate between shutdown-for-exit and shutdown-for-boot means that
another ugly hack (to allow returning via the PXE stack on BIOSes that
have broken INT 18 calls) will be easier.
I feel dirty.
Shifting all INT13 drive numbers causes problems on systems that use a
sparse drive number space (e.g. qemu BIOS, which uses 0xe0 for the CD-ROM
drive).
The strategy now is:
Each drive is assigned a "natural" drive number, being the next
available drive number in the system (based on the BIOS drive count).
Each drive is accessed using its specified drive number. If the
specified drive number is -1, the natural drive number will be used.
Accesses to the specified drive number will be delivered to the
emulated drive, masking out any preexisting drive using this number.
Accesses to the natural drive number, if different, will be remapped to
the masked-out drive.
The overall upshot is that, for example:
System has no drives. Emulated INT13 drive gets natural number 0x80
and specified number 0x80. Accesses to drive 0x80 go to the emulated
drive, and there is no remapping.
System has one drive. Emulated INT13 drive gets natural number 0x81
and specified number 0x80. Accesses to drive 0x80 go to the emulated
drive. Accesses to drive 0x81 get remapped to the original drive 0x80.
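The routing rule can be sketched as follows. `struct int13_emul`, `int13_assign()`, `int13_route()` and `TARGET_EMULATED` are hypothetical names, and the natural drive number is taken as 0x80 plus the BIOS hard disk count, per the description above.

```c
#include <stdint.h>

#define TARGET_EMULATED 0x100  /* hypothetical "handle it ourselves" marker */

struct int13_emul {
    uint8_t natural;  /* next free drive number when INT 13 is hooked */
    uint8_t drive;    /* number by which the emulated drive is accessed */
};

/* Assign numbers: natural = 0x80 + BIOS hard disk count; a specified
 * number of -1 means "use the natural number" */
static struct int13_emul int13_assign(unsigned int bios_disks, int specified)
{
    struct int13_emul em;

    em.natural = (uint8_t)(0x80 + bios_disks);
    em.drive = (specified < 0) ? em.natural : (uint8_t)specified;
    return em;
}

/* Route a request for drive 'dl' either to the emulated drive or on to
 * the BIOS, possibly under a remapped drive number */
static int int13_route(const struct int13_emul *em, uint8_t dl)
{
    if (dl == em->drive)
        return TARGET_EMULATED;   /* masks any existing drive */
    if (dl == em->natural)
        return em->drive;         /* reach the masked-out drive */
    return dl;                    /* passed through unchanged */
}
```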
We can just treat all non-kernel images as initrds, which matches our
behaviour for multiboot kernels. This allows us to eliminate initrd as
an image type, and treat the "initrd" command as just another synonym for
"imgfetch".
__from_data16 and __from_text16 now take a pointer to a
.data16/.text16 variable, and return the real-mode offset within the
appropriate segment. This matches the use case for every occurrence
of these macros, and prevents potential future bugs such as that fixed
in commit d51d80f. (The bug arose essentially because "&pointer" is
still syntactically valid.)
When the 16-bit segment registers are accessed using 32-bit instructions
the high order bytes are undefined on older CPUs. We now explicitly
zero the high order bytes when snapshotting the CPU state. This ensures
that the GDB stub reports consistent values for the segment registers.
Commit fd0aef9 introduced a typo that caused PMM detection to start at
paragraph 0xe00 rather than 0xe000. (Detection would still work, since it
would scan until it ran out of base memory, but it would end up scanning
an unnecessarily large portion of base memory.)
Spotted by Sebastian Herbszt <herbszt@gmx.de>.
Send a null command, specifically "pulse outputs" with no outputs
selected, to the KBC after changing A20. This was apparently done by DOS,
presumably as a synchronization hack, and the authors of the UHCI spec
thought it was inherent. Therefore, there are systems out there (e.g. HP
DL360 G5) which will stop responding to "legacy USB" unless they see the
null command, 0xFF, written to port 0x64 at the end of the A20 toggling
sequence.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
When the BIOS doesn't support BBS, hooking INT 19 is the only way to add
ourselves as a boot device. If we have to do this, we should at least
try to chain to the original INT 19 vector if our boot fails.
Idea suggested by Andrew Schran <aschran@google.com>
2.6.22+ kernels have an extra field in the bzimage_header structure to
indicate the maximum permitted command-line length. Use this if it is
available.
A bug in read_smbios_string() was causing the starting offset of the
SMBIOS structure to be added twice, resulting in completely the wrong
strings being returned.
Bug identified by Martin Herweg <m.herweg@gmx.de>
We never set up specific multicast filters; native drivers will ask
the card to receive all multicast packets. The only way to achieve
this via the UNDI API is to enable promiscuous mode.
Delete ELF as a generic image type. The method for invoking an
ELF-based image (as well as any tables that must be set up to allow it
to boot) will always depend on the specific architecture. core/elf.c
now only provides the elf_load() function, to avoid duplicating
functionality between ELF-based image types.
Add arch/i386/image/elfboot.c, to handle the generic case of 32-bit
x86 ELF images. We don't currently set up any multiboot tables, ELF
notes, etc. This seems to be sufficient for loading kernels generated
using both wraplinux and coreboot's mkelfImage.
Note that while Etherboot 5.4 allowed ELF images to return, we don't.
There is no callback mechanism for the loaded image to shut down gPXE,
which means that we have to shut down before invoking the image. This
means that we lose device state, protection against being trampled on,
etc. It is not safe to continue afterwards.
The GDBSYM config.h option was an attempt at QEMU GDB debugging. I have
removed the code since it is unused and may confuse people wanting to
use the GDB stub.
The ROM prefix now prompts the user to enter the gPXE shell during POST;
this allows for configuring gPXE without needing to attempt to boot from
it. (It also slows down system boot by three seconds per gPXE ROM, but
hey.)
This is apparently a certain OEM's requirement for option ROMs.
Add ability for network devices to flag link up/down state to the
networking core.
Autobooting code will now wait for link-up before attempting DHCP.
IPoIB reflects the Infiniband link state as the network device link state
(which is not strictly correct; we also need a successful IPoIB IPv4
broadcast group join), but is probably more informative.
PXE is a catch-all image format with no signature checks. If an
unsupported image file is loaded, it will be treated as a PXE image. In
most cases, the image will be too large to be loaded as a PXE image (which
has to fit in base memory), so the error returned to the user will be that
the segment could not fit within the memory region.
Add an explicit check to pxe_image.c to reject images larger than base
memory with ENOEXEC.
Add ENOEXEC to the error string table.
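A minimal version of such a check. The function name is hypothetical, and base memory is passed in as a kilobyte count (the unit in which the BIOS reports it) rather than being read from the BIOS data area.

```c
#include <errno.h>
#include <stddef.h>

/* Reject any would-be PXE image larger than base memory. Since PXE is a
 * catch-all format, this is a cheap way to spot an unsupported file.
 * base_mem_kb is the base memory size in kB. */
static int pxe_check_size(size_t image_len, unsigned int base_mem_kb)
{
    size_t base_mem_len = (size_t)base_mem_kb * 1024;

    if (image_len > base_mem_len)
        return -ENOEXEC;
    return 0;
}
```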
Allow for settings to be described by something other than a DHCP option
tag if desirable. Currently used only for the MAC address setting.
Separate out fake DHCP packet creation code from dhcp.c to fakedhcp.c.
Remove notion of settings from dhcppkt.c.
Rationalise dhcp.c to use settings API only for final registration of the
DHCP options, rather than using {store,fetch}_setting throughout.
Add dedicated functions create_dhcpdiscover(), create_dhcpack() and
create_proxydhcpack() for use by external code such as the PXE preboot
code.
Register ProxyDHCP options under the global scope "proxydhcp".
Unregister previously-acquired DHCP and ProxyDHCP settings when DHCP
succeeds.
When PMM is used, the gPXE image source will no longer be in base memory.
Decompression of .text16 and .data16 can therefore no longer be done in
real mode.
Use BBS installation check to see if we need to hook INT19 even on a PnP
BIOS.
Verify that $PnP signature is paragraph-aligned; bochs/qemu BIOS provides
a dummy $PnP signature with no valid entry point, and deliberately
unaligns the signature to indicate that it is not properly valid.
Print message if INT19 is hooked.
Attempt to use PMM even if BBS check failed.
ROM initialisation vector now attempts to allocate a 2MB block using
PMM. If successful, it copies the ROM image to this block, then
shrinks the ROM image to allow for more option ROMs. If unsuccessful,
it leaves the ROM as-is.
ROM BEV now attempts to return to the BIOS, resorting to INT 18 only
if the BIOS stack has been corrupted.
This allows pxelinux to execute arbitrary gPXE commands. This is
remarkably unsafe (not least because some of the commands will assume
full ownership of memory and do nasty things like edit the e820 map
underneath the calling pxelinux), but it does allow access to the
"sanboot" command.
Replace a printf with a DBG in timer_rtdsc.c
Replace a printf in timer.c with assert
Return proper error codes from timer drivers
Signed-off-by: Alexey Zaytsev <alexey.zaytsev@gmail.com>
Timer subsystem initialization code in core/timer.c
Split the BIOS and RDTSC timer drivers from i386_timer.c
Split arch/i386/firmware/pcbios/bios.c into the RDTSC
timer driver and arch/i386/core/nap.c
Split the headers properly:
include/unistd.h - delay functions to be used by the
gPXE core and drivers.
include/gpxe/timer.h - the timer subsystem interface
to be used by the timer drivers,
and currticks() to be used by
the core gPXE subsystems.
include/latch.h - removed.
include/timer.h - scheduled for removal. Some drivers
are using currticks(), which is
only for core subsystems.
Signed-off-by: Alexey Zaytsev <alexey.zaytsev@gmail.com>
As written, if the UNDI ISR call clobbers the upper halves of
any of the GPRs (which by convention it is permitted to do, and by
paranoia should be expected to do) then nothing in the interrupt
handler will recover the state.
Additionally, save/restore %fs and %gs out of sheer paranoia - it's a
cheap enough operation, and may prevent problems due to poorly written
UNDI stacks.
Since we don't know what the UNDI code does, it is safest to
save/restore %eflags even though the lower half of %eflags is
automatically saved by the interrupt itself.
_textdata_link_addr, _load_addr and _max_align in the linker scripts.
A bug in some versions of ld causes segfaults if the DEFINED() macro
is used in a linker script *and* the -Map option to ld is present.
We don't currently need to override any of these values; if we need to
do so in future then the solution will probably be to always specify
the values on the ld command line, and have the linker script not
define them at all.
memory map. (We achieve this by setting CF on the last entry if it is
zero-length; this avoids the need to look ahead at each entry to see
whether the *next* entry would be both the last entry and zero-length).
This fixes the "0kB base memory" error message upon starting Windows
2003 on a SunFire X2100.
byte, rather than the number of permissible bytes (i.e. subtract one
from the value under the previous definition to get the value under
the new definition).
This avoids integer overflow on 64-bit kernels, where
bzhdr.initrd_addr_max may be 0xffffffffffffffff; under the old
behaviour we set mem_limit equal to initrd_addr_max+1, which meant it
ended up as zero. Kernel loads would fail with ENOBUFS.
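The overflow, and a bounds check written against the new "highest permissible byte" definition, can be sketched as follows. The function names are hypothetical.

```c
#include <stdint.h>

/* Old definition: number of permissible bytes = initrd_addr_max + 1 */
static uint64_t mem_limit_old(uint64_t initrd_addr_max)
{
    return initrd_addr_max + 1;   /* wraps to 0 for an all-ones input */
}

/* New definition: address of the highest permissible byte */
static uint64_t mem_limit_new(uint64_t initrd_addr_max)
{
    return initrd_addr_max;
}

/* Does [addr, addr+len) lie at or below mem_limit (new definition)?
 * Avoids computing addr + len, which could itself overflow. */
static int initrd_fits(uint64_t addr, uint64_t len, uint64_t mem_limit)
{
    if (len == 0)
        return 1;
    return addr <= mem_limit && (len - 1) <= (mem_limit - addr);
}
```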
Experimentation reveals that gcc ignores -mrtd for the implicit
arithmetic functions (e.g. __udivdi3), but not for the implicit
memcpy() and memset() functions. Mark the implicit arithmetic
functions with __attribute__((cdecl)) to compensate for this.
(Note: we cannot mark them with __cdecl, because we define __cdecl to
incorporate regparm(0) as well.)
us to round down the size for the relocation copy to the nearest 64kB
(+0x10 bytes); this just happened to work on most machines because the
last 64kB of the image is all-zeroes anyway (it's the .bss).
link-time check for section overlaps. (In order to avoid wasting
space in the executable image, .bss16 will overlap with the following
section, which is .text).
number of (potentially very slow) gateA20_set operations.
Die with a fatal error if we are unable to set gate A20; if this fails
then we are bound to experience memory corruption at a later stage,
and I'd prefer to pick it up early.
the UNDI stack.
Ignore obviously invalid length combinations (as returned by
e.g. VMware's PXE stack).
Limit to one packet per poll to avoid memory exhaustion.
Always send EOI; do not chain to BIOS's default interrupt handler.
They are just too unpredictable; at least VMware's seems to kill the
machine if you go anywhere near it.
Disable interrupts after return from PXENV_UNDI_ISR, just in case some
dumb PXE stack enables them.
safe dropping of the netdev ref by the driver while other refs still
exist.
Add netdev_irq() method. Net device open()/close() methods should no
longer enable or disable IRQs.
Remove rx_quota; it wasn't used anywhere and added too much complexity
to implementing correct interrupt-masking behaviour in pxe_undi.c.
Use generic fields in struct device_description rather than assuming
that the struct device * is contained within a pci_device or
isapnp_device; this assumption is broken when using the undionly
driver.
Add PXENV_UNDI_SET_STATION_ADDRESS.
entirely self-hosted (which avoids problems when building the same
tree on multiple systems - e.g. when you have /home NFS-mounted).
Also saves around 50 bytes in total - not sure why.
clue what the "previous" interrupt handler will do, which could range
from "just an iret" to "disable the interrupt"), and that means that
we have to take responsibility for ACKing all interrupts. Joy.
refer to them by name from the command line, or build them into a
multiboot module list.
Use the setting of image->type to disambiguate between "not my image"
and "bad image"; this avoids relying on specific values of the error code.
names.
Add "dev" pointer in struct net_device to tie network interfaces back to a
hardware device.
Force natural alignment of data types in __table() macros. This seems to
prevent gcc from taking the unilateral decision to occasionally increase
their alignment (which screws up the table packing).
real_call(), rather than moving it to the RM stack and back again.
This allows the real-mode function to completely destroy the stack
contents, provided that it manages to return to real_call().