If it happens that _textdata_memsz ends up being an exact multiple of
4kB, then this will cause the .textdata section (after relocation) to
start on a page boundary. This means that the hidden memory region
(which is rounded down to the nearest page boundary) will start
exactly at virtual address 0, i.e. UNULL. This means that
init_eheap() will erroneously assume that it has failed to allocate a
an external heap, since it typically ends up choosing the area that
lies immediately below .textdata, which in this case will be the
region with top==UNULL.
A subsequent error is that memtop_urealloc() passes through the error
return status -ENOMEM to the caller, which (rightly) assumes that the
result represents a valid userptr_t address.
Fixed by using alternative tests for heap non-existence, and by
returning UNULL in case of an error from init_eheap().
There are many functions that take ownership of the I/O buffer they
are passed as a parameter. The caller should not retain a pointer to
the I/O buffer. Use iob_disown() to automatically nullify the
caller's pointer, e.g.:
xfer_deliver_iob ( xfer, iob_disown ( iobuf ) );
This will ensure that iobuf is set to NULL for any code after the call
to xfer_deliver_iob().
iob_disown() is currently used only in places where it simplifies the
code, by avoiding an extra line explicitly setting the I/O buffer
pointer to NULL. It should ideally be used with each call to any
function that takes ownership of an I/O buffer. (The SSA
optimisations will ensure that use of iob_disown() gets optimised away
in cases where the caller makes no further use of the I/O buffer
pointer anyway.)
If gcc ever introduces an __attribute__((free)), indicating that use
of a function argument after a function call should generate a
warning, then we should use this to identify all applicable function
call sites, and add iob_disown() as necessary.
The DHCP client code now implements only the mechanism of the DHCP and
PXE Boot Server protocols. Boot Server Discovery can be initiated
manually using the "pxebs" command. The menuing code is separated out
into a user-level function on a par with boot_root_path(), and is
entered in preference to a normal filename boot if the DHCP vendor
class is "PXEClient" and the PXE boot menu option exists.
pxe_tftp.c assumes that the first seek on its data-transfer interface
represents the block size. Apart from being an ugly hack, this will
also screw up file size calculation for files smaller than one block.
The proper solution would be to extend the data-transfer interface to
support the reporting of stat()-like data. This is not going to
happen until the cost of adding interface methods is reduced (a fix I
have planned since June 2008).
In the meantime, abuse the xfer_window() method to return the block
size, since it is not being used for anything else and is vaguely
justifiable.
Astonishingly, having returned the incorrect TFTP blocksize via
PXENV_TFTP_OPEN for almost a year seems not to have affected any of
the test cases run during that time; this bug was found only when
someone tried running the heavily-patched version of pxegrub found in
OpenSolaris.
elf2efi converts a suitable ELF executable (containing relocation
information, and with appropriate virtual addresses) into an EFI
executable. It is less tightly coupled with the gPXE build process
and, in particular, does not require the use of a hand-crafted PE
image header in efiprefix.S.
elf2efi correctly handles .bss sections, which significantly reduces
the size of the gPXE EFI executable.
The check for unresolved symbols does not explicitly specify an output
architecture format, and so causes a warning when building an i386 EFI
binary on an x86_64 platform. This warning is harmless, and
specifying the output architecture in multiple places is cumbersome,
so just inhibit the warning.
At POST time some BIOSes return invalid e820 maps even though
they indicate that the data is valid. We add a check that the first
region returned by e820 is RAM type and declare the map to be invalid
if it is not.
This extends the sanity checks from 8b20e5d ("[pcbios] Sanity-check
the INT15,e820 and INT15,e801 memory maps").
Currently the only supported platform for x86_64 is EFI.
Building an EFI64 gPXE requires a version of gcc that supports
__attribute__((ms_abi)). This currently means a development build of
gcc; the feature should be present when gcc 4.4 is released.
In the meantime; you can grab a suitable gcc tree from
git://git.etherboot.org/scm/people/mcb30/gcc/.git
EFI provides a copy of the SMBIOS table accessible via the EFI system
table, which we should use instead of manually scanning through the
F000:0000 segment.
On non-BBS systems, we have to hook INT 19 in order to be able to boot
from the gPXE ROM at all. However, doing this unconditionally will
prevent the user from booting via any other devices.
Previously, the INT 19 entry point would prompt the user to press B in
order to boot from gPXE, which makes it impossible to perform an
unattended network boot. We now prompt the user to press N to skip
booting from gPXE, which allows for unattended operation.
This should be a better match for most real-world scenarios. Most
modern systems support BBS and so are unaffected by this change. Very
old (non-BBS) systems tend not to have PXE ROMs by default anyway; if
the user has added a gPXE ROM then they probably do want to boot from
the network. Newer non-BBS systems are essentially limited to IBM
servers, which will recapture the INT 19 vector anyway and implement
their own boot-ordering selection mechanism.
Remove the assortment of miscellaneous hacks to guess the "network
boot device", and replace them each with a call to last_opened_netdev().
It still isn't guaranteed correct, but it won't be any worse than
before, and it will at least be consistent.
This brings us in to line with Linux definitions, and also simplifies
adding x86_64 support since both platforms have 2-byte shorts, 4-byte
ints and 8-byte long longs.
Code paths that automatically allocate memory from the FBMS at 40:13
should also free it, if possible.
Freeing this memory will not be possible if either
1. The FBMS has been modified since our allocation, or
2. We have not been able to unhook one or more BIOS interrupt vectors.
_filesz was incorrectly forced to be aligned up to MAX_ALIGN. In a
non-compressed build, this would cause a build failure unless _filesz
happened to already be aligned to MAX_ALIGN.
The only way that PMM allows us to request a block in a region with
A20=0 is to ask for a block with an alignment of 2MB. Due to the PMM
API design, the only way we can do this is to ask for a block with a
size of 2MB.
Unfortunately, some BIOSes will hit problems if we allocate a 2MB
block. In particular, it may not be possible to enter the BIOS setup
screen; the BIOS setup code attempts a PMM allocation, fails, and
hangs the machine.
We now try allocating only as much as we need via PMM. If the
allocated block has A20=1, we free the allocated block, double the
allocation size, and try again. Repeat until either we obtain a block
with A20=0 or allocation fails. (This is guaranteed to terminate by
the time we reach an allocation size of 2MB.)
With a 16-bit operand, lgdt/lidt will load only a 24-bit base address,
ignoring the high-order bits. This meant that we could fail to fully
restore the GDT across a call into gPXE, if the GDT happened to be
located above the 16MB mark.
Not all of our lgdt/lidt instructions require a data32 prefix (for
example, reloading the real-mode IDT can never require a 32-bit base
address), but by adding them everywhere we will hopefully not forget
the necessary ones in future.
Some hardware vendors have been known to remove all gPXE-related
branding from ROMs that they build. While this is not prohibited by
the GPL, it is a little impolite.
Add a facility for adding branding messages via two #defines
(PRODUCT_NAME and PRODUCT_SHORT_NAME) in config/general.h. This
should accommodate all known OEM-mandated branding requirements.
Vendors with branding requirements that cannot be satisfied by using
PRODUCT_NAME and/or PRODUCT_SHORT_NAME should contact us so that we
can extended this facility as necessary.
This function is a major kludge, but can be made slightly more
accurate by ignoring net devices that aren't open. Eventually it
needs to be removed entirely.
Settings can be constructed using a dotted-decimal notation, to allow
for access to unnamed settings. The default interpretation is as a
DHCP option number (with encapsulated options represented as
"<encapsulating option>.<encapsulated option>".
In several contexts (e.g. SMBIOS, Phantom CLP), it is useful to
interpret the dotted-decimal notation as referring to non-DHCP
options. In this case, it becomes necessary for these contexts to
ignore standard DHCP options, otherwise we end up trying to, for
example, retrieve the boot filename from SMBIOS.
Allow settings blocks to specify a "tag magic". When dotted-decimal
notation is used to construct a setting, the tag magic value of the
originating settings block will be ORed in to the tag number.
Store/fetch methods can then check for the magic number before
interpreting arbitrarily-numbered settings.
This extends the sanity checks on the runtime segment address provided
in %bx, first implemented in commit 5600955.
We now allow the ROM to be placed anywhere above a000:0000 (rather
than c000:0000, as before), since this is the region allowed by the
PCI 3 spec. If the BIOS asks us to place the runtime image such that
it would overlap with the init-time image (which is explicitly
prohibited by the PCI 3 spec), then we assume that the BIOS is faulty
and ignore the provided runtime segment address.
Testing on a SuperMicro BIOS providing overlapping segment addresses
shows that ignoring the provided runtime segment address is safe to do
in these circumstances.
Someone at Dell must have a full-time job designing ways to screw up
implementations of INT 15,e820. This latest gem is courtesy of a Dell
Xanadu system, which arbitrarily decides to obliterate the contents of
%esi.
Preserve %esi, %edi and %ebp across calls to INT 15,e820, in case
someone tries a variation on this trick in future.
FreeBSD requires the object format to be specified as elf_i386_fbsd,
rather than elf_i386.
Based on a patch from Eygene Ryabinkin <rea-fbsd@codelabs.ru>
Some PCI 3 BIOSes seem to provide a garbage value in %bx, which should
contain the runtime segment address. Perform a basic sanity check: we
reject the segment if it is below the start of option ROM space. If
the sanity check fails, we assume that the BIOS was not expecting us
to be a PCI 3 ROM, and we just leave our image in situ.
The section name seems to have significance for some versions of
binutils.
There is no way to instruct gcc that sections such as .bss16 contain
uninitialised data; it will emit them with contents explicitly set to
zero. We therefore have to rely on the linker script to force these
sections to become uninitialised-data sections. We do this by marking
them as NOLOAD; this seems to be the closest semantic equivalent in the
linker script language.
However, this gets ignored by some versions of ld (including 2.17 as
shipped with Debian Etch), which mark the resulting sections with
(CONTENTS,ALLOC,LOAD,DATA). Combined with the fact that this version of
ld seems to ignore the specified LMA for these sections, this means that
they end up overlapping other sections, and so parts of .prefix (for
example) get obliterated by .data16's bss section.
Rename the .bss sections from .section_bss to .bss.section; this seems to
cause these versions of ld to treat them as uninitialised data.
Not fully understood, but it seems that the LMA of bss sections matters
for some newer binutils builds. Force all bss sections to have an LMA
at the end of the file, so that they don't interfere with other
sections.
The symptom was that objcopy -O binary -j .zinfo would extract the
.zinfo section from bin/xxx.tmp as a blob of the correct length, but
with zero contents. This would then cause the [ZBIN] stage of the
build to fail.
Also explicitly state that .zinfo(.*) sections have @progbits, in case
some future assembler or linker variant decides to omit them.
Some versions of ld choke on the "AT ( _xxx_lma )" in efi.lds with an
error saying "nonconstant expression for load base". Since these were
only explicitly setting the LMA to the address that it would have had
anyway, they can be safely omitted.
We have EFI APIs for CPU I/O, PCI I/O, timers, console I/O, user
access and user memory allocation.
EFI executables are created using the vanilla GNU toolchain, with the
EXE header handcrafted in assembly and relocations generated by a
custom efilink utility.
The userptr_t is now the fundamental type that gets used for conversions.
For example, virt_to_phys() is implemented in terms of virt_to_user() and
user_to_phys().
Reduce the number of sections within the linker script to match the
number of practical sections within the output file.
Define _section, _msection, _esection, _section_filesz, _section_memsz,
and _section_lma for each section, replacing the mixture of symbols that
previously existed.
In particular, replace _text and _end with _textdata and _etextdata, to
make it explicit within code that uses these symbols that the .text and
.data sections are always treated as a single contiguous block.
Allow for the build CPU architecture and platform to be specified as part
of the make command goals. For example:
make bin/rtl8139.rom # Standard i386 PC-BIOS build
make bin-efi/rtl8139.efi # i386 EFI build
The generic syntax is "bin[-[arch-]platform]", with the default
architecture being "i386" (regardless of the host architecture) and the
default platform being "pcbios".
Non-path targets such as "srcs" can be specified using e.g.
make bin-efi srcs
Note that this changeset is merely Makefile restructuring to allow the
build architecture and platform to be determined by the make command
goals, and to export these to compiled code via the ARCH and PLATFORM
defines. It doesn't actually introduce any new build platforms.
Although the E820 API allows for a caller to provide only a 20-byte
buffer, there exists at least one combination (HP BIOS, 32-bit WinPE)
that relies on information found only in the "extended attributes"
field, which requires a 24-byte buffer.
Allow for up to a 64-byte E820 buffer, in the hope of coping with
future idiocies like this one.
The ACPI specification defines an additional 4-byte field at offset 20
for an E820 memory map entry. This field is presumably optional,
since generally E820 gets given only a 20-byte buffer to fill.
However, the bits of this optional field are defined as:
bit 0 : region is enabled
bit 1 : region is non-volatile memory rather than RAM
so it seems as though callers that pass in only a 20-byte buffer may
be missing out on some rather important information.
Our INT 15,e820 code was setting %es=%ss (as part of the "look ahead
in the memory map" logic), but failing to restore %es afterwards.
This is a serious bug, but wasn't affecting many platforms because
almost all callers seem to set %es=%ss anyway.
Some BIOSes require us to pass in not only the continuation value (in
%ebx) as returned by the previous call to INT 15,e820 but also the
unmodified buffer (at %es:%di) as returned by the previous call to INT
15,e820. Apparently, someone thought it would be a worthwhile
optimisation to fill in only the low dword of the "length" field and
the low byte of the "type field", assuming that the buffer would
remain unaltered from the previous call.
This problem was being triggered by the "peek ahead" logic in
get_mangled_e820(), which would read the next entry into a temporary
buffer in order to be able to guarantee terminating the map with
%ebx=0 rather than CF=1. (Terminating with CF=1 upsets some Windows
flavours, despite being documented legal behaviour.)
Work around this problem by always fetching directly into our e820
cache; that way we can guarantee that the underlying call always sees
the previous buffer contents (and the same buffer address).
We seem to be having issues with various E820 memory maps. These
problems are often difficult to reproduce, requiring access to the
specific system exhibiting the problem.
Add a facility for hooking in a fake E820 map generator, using an
arbitrary map defined in a C array, solely in order to be able to test
the map-mangling code against arbitrary E820 maps.
In particular, allow BANNER_TIMEOUT=0 to inhibit the prompt banners
altogether.
Ironically, this request comes from the same OEM that originally
required the prompts to be present during POST.
Some really moronic BIOSes bring up the PXE stack via the UNDI loader
entry point during POST, and then don't bother to unload it before
overwriting the code and data segments. If this happens, we really
don't want to leave INT 15 hooked, because that will cause any loaded
OS to die horribly as soon as it attempts to fetch the system memory
map.
We use a heuristic to detect whether or not we are being loaded at the
top of free base memory. If we determine that we are being loaded at
some other arbitrary location in base memory, then we assume that it's
not safe to hook INT 15.
On non-BBS systems we hook INT 19, since there is no other way we can
guarantee gaining control of the flow of execution. If we end up
doing this, prompt the user before attempting boot, since forcibly
capturing INT 19 is rather antisocial.
If the INT 15,e820 memory map reports a region [0,0), this confuses
the "truncate to even megabytes" logic, which ends up rounding the
region 'down' to [0,fff00000).
Fix by ensuring that the region's end address is at least 1, before we
subtract 1 to obtain the "last byte in region" address.
INT 15,e801 is capable of returning a memory range that extends to
4GB, so allow for this in the debug message that shows the data
returned by INT 15,e801.
Apparently some BIOSes will place option ROMs on 512-byte boundaries.
While this is against specification, it doesn't actually hurt
anything, so we may as well increase our scan granularity to 512
bytes.
Contributed by Luca <lucarx76@gmail.com>
Wyse Streaming Manager server (WLDRM13.BIN) assumes that the PXENV+
entry point is at UNDI_CS:0000; apparently, somebody at Wyse has
difficulty distinguishing between the words "may" and "must"...
Add a dummy entry point at UNDI_CS:0000, which just jumps to the
correct entry point.
The multiboot specification states that, for raw images, if
load_end_addr is zero then it should be interpreted as meaning "use
the entire file", and if bss_end_addr is zero it should be interpreted
as meaning "no bss".
Explicitly state that we are using 32-bit addressing in 16-bit code.
GNU as 2.15 (FreeBSD/amd64 7-STABLE) got confused that 32-bit registers
are used in the code that was declared as 16-bit. Add explicit modifier
'addr32' to make assembler happy.
Signed-off-by: Eygene Ryabinkin <rea-fbsd@codelabs.ru>
IBM's iSCSI Firmware Initiator checks the UNDIROMID pointer in the
!PXE structure that gets created by the UNDI loader. We didn't
previously fill this value in.
Include PMM allocation result in POST banner.
Include full product string in "starting execution" message.
Also mark ourselves as supporting DDIM in PnP header, for
completeness.
On a system that doesn't support BBS, we end up hooking INT19 to gain
control of the boot process. If the system is PCI3.0, we must take
care to use the runtime value for %cs, rather than the POST-time
value, otherwise we end up pointing INT19 to the temporary option ROM
POST scratch area.
Allow for an arbitrary number of splits of the system memory map via
INT 15,e820.
Features of the new map-mangling algorithm include:
Supports random access to e820 map entries.
Requires only sequential access support from the underlying e820
map, even if our caller uses random access.
Empty regions will always be stripped.
Always terminates with %ebx=0, even if the underlying map terminates
with CF=1.
Allows for an arbitrary number of hidden regions, with underlying
regions split into as many subregions as necessary.
Total size increase to achieve this is 193 bytes.
Define a list of N allowed memory regions, and split each underlying
e820 region into up to N subregions. Strip resulting empty regions
out of the map, avoiding using the "return with CF set to strip last
empty region" trick, because it seems that bootmgr.exe in Win2k8 gets
upset if the memory map is terminated with CF set.
This is an intermediate checkin that defines a single allowed memory
region covering the entire 64-bit address space, and uses the existing
map-mangling code on top of the new region-splitting code. This
sanitises the memory map to the point that Win2k8 is able to boot even
on a system that defines a final zero-length region at the 4GB mark.
I'm checking this in because it may be useful for future debugging
efforts to be able to run with the existing and known-working map
mangling code together with the map sanitisation capabilities of the
new map mangling code.
H. Peter Anvin <hpa@zytor.com> sent word that Sergey Vlasov
<vsu@altlinux.ru> discovered gPXE lkrn images fail to load in SYSLINUX
3.70 because we have initrd_addr_max zeroed. This patch sets the same
value as the Linux kernel.
Also change the header jmp instruction to use a hardcoded opcode value
like Linux does. Just in case the assembler decides to use a three-byte
instruction instead of the desired two-byte jmp.
Add yet another ugly hack to iscsiboot.c, this time to allow the user to
inhibit the shutdown/removal of the iSCSI INT13 device (and the network
devices, since they are required for the iSCSI device to function).
On the plus side, the fact that shutdown() now takes flags to
differentiate between shutdown-for-exit and shutdown-for-boot means that
another ugly hack (to allow returning via the PXE stack on BIOSes that
have broken INT 18 calls) will be easier.
I feel dirty.
Shifting all INT13 drive numbers causes problems on systems that use a
sparse drive number space (e.g. qemu BIOS, which uses 0xe0 for the CD-ROM
drive).
The strategy now is:
Each drive is assigned a "natural" drive number, being the next
available drive number in the system (based on the BIOS drive count).
Each drive is accessed using its specified drive number. If the
specified drive number is -1, the natural drive number will be used.
Accesses to the specified drive number will be delivered to the
emulated drive, masking out any preexisting drive using this number.
Accesses to the natural drive number, if different, will be remapped to
the masked-out drive.
The overall upshot is that, for examples:
System has no drives. Emulated INT13 drive gets natural number 0x80
and specified number 0x80. Accesses to drive 0x80 go to the emulated
drive, and there is no remapping.
System has one drive. Emulated INT13 drive gets natural number 0x81
and specified number 0x80. Accesses to drive 0x80 go to the emulated
drive. Accesses to drive 0x81 get remapped to the original drive 0x80.
We can just treat all non-kernel images as initrds, which matches our
behaviour for multiboot kernels. This allows us to eliminate initrd as
an image type, and treat the "initrd" command as just another synonym for
"imgfetch".
__from_data16 and __from_text16 now take a pointer to a
.data16/.text16 variable, and return the real-mode offset within the
appropriate segment. This matches the use case for every occurrence
of these macros, and prevents potential future bugs such as that fixed
in commit d51d80f. (The bug arose essentially because "&pointer" is
still syntactically valid.)
When the 16-bit segment registers are accessed using 32-bit instructions
the high order bytes are undefined on older CPUs. We now explicitly
zero the high order bytes when snapshotting the CPU state. This ensures
that the GDB stub reports consistent values for the segment registers.
Commit fd0aef9 introduced a typo that caused PMM detection to start at
paragraph 0xe00 rather than 0xe000. (Detection would still work, since it
would scan until it ran out of base memory, but it would end up scanning
an unnecessarily large portion of base memory.)
Spotted by Sebastian Herbszt <herbszt@gmx.de>.
Send a null command, specifically "pulse outputs" with no outputs
selected, to the KBC after changing A20. This was apparently done by DOS,
presumably as a synchronization hack, and the authors of the UHCI spec
thought it was inherent. Therefore, there are systems out there (e.g. HP
DL360 G5) which will stop responsing to "legacy USB" unless they see the
null command, 0xFF, written to port 0x64 at the end of the A20 toggling
sequence.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
When the BIOS doesn't support BBS, hooking INT 19 is the only way to add
ourselves as a boot device. If we have to do this, we should at least
try to chain to the original INT 19 vector if our boot fails.
Idea suggested by Andrew Schran <aschran@google.com>
2.6.22+ kernels have an extra field in the bzimage_header structure to
indicate the maximum permitted command-line length. Use this if it is
available.
A bug in read_smbios_string() was causing the starting offset of the
SMBIOS structure to be added twice, resulting in completely the wrong
strings being returned.
Bug identified by Martin Herweg <m.herweg@gmx.de>
We never set up specific multicast filters; native drivers will ask
the card to receive all multicast packets. The only way to achieve
this via the UNDI API is to enable promiscuous mode.
Delete ELF as a generic image type. The method for invoking an
ELF-based image (as well as any tables that must be set up to allow it
to boot) will always depend on the specific architecture. core/elf.c
now only provides the elf_load() function, to avoid duplicating
functionality between ELF-based image types.
Add arch/i386/image/elfboot.c, to handle the generic case of 32-bit
x86 ELF images. We don't currently set up any multiboot tables, ELF
notes, etc. This seems to be sufficient for loading kernels generated
using both wraplinux and coreboot's mkelfImage.
Note that while Etherboot 5.4 allowed ELF images to return, we don't.
There is no callback mechanism for the loaded image to shut down gPXE,
which means that we have to shut down before invoking the image. This
means that we lose device state, protection against being trampled on,
etc. It is not safe to continue afterwards.
The GDBSYM config.h option was an attempt at QEMU GDB debugging. I have
removed the code since it is unused and may confuse people wanting to
use the GDB stub.
The ROM prefix now prompts the user to enter the gPXE shell during POST;
this allows for configuring gPXE without needing to attempt to boot from
it. (It also slows down system boot by three seconds per gPXE ROM, but
hey.)
This is apparently a certain OEM's requirement for option ROMs.
Add ability for network devices to flag link up/down state to the
networking core.
Autobooting code will now wait for link-up before attempting DHCP.
IPoIB reflects the Infiniband link state as the network device link state
(which is not strictly correct; we also need a succesful IPoIB IPv4
broadcast group join), but is probably more informative.
PXE is a catch-all image format with no signature checks. If an
unsupported image file is loaded, it will be treated as a PXE image. In
most cases, the image will be too large to be loaded as a PXE image (which
has to fit in base memory), so the error returned to the user will be that
the segment could not fit within the memory region.
Add an explicit check to pxe_image.c to reject images larger than base
memory with ENOEXEC.
Add ENOEXEC to the error string table.
Allow for settings to be described by something other than a DHCP option
tag if desirable. Currently used only for the MAC address setting.
Separate out fake DHCP packet creation code from dhcp.c to fakedhcp.c.
Remove notion of settings from dhcppkt.c.
Rationalise dhcp.c to use settings API only for final registration of the
DHCP options, rather than using {store,fetch}_setting throughout.
Add dedicated functions create_dhcpdiscover(), create_dhcpack() and
create_proxydhcpack() for use by external code such as the PXE preboot
code.
Register ProxyDHCP options under the global scope "proxydhcp".
Unregister previously-acquired DHCP and ProxyDHCP settings when DHCP
succeeds.
When PMM is used, the gPXE image source will no longer be in base memory.
Decompression of .text16 and .data16 can therefore no longer be done in
real mode.
Use BBS installation check to see if we need to hook INT19 even on a PnP
BIOS.
Verify that $PnP signature is paragraph-aligned; bochs/qemu BIOS provides
a dummy $PnP signature with no valid entry point, and deliberately
unaligns the signature to indicate that it is not properly valid.
Print message if INT19 is hooked.
Attempt to use PMM even if BBS check failed.
ROM initialisation vector now attempts to allocate a 2MB block using
PMM. If successful, it copies the ROM image to this block, then
shrinks the ROM image to allow for more option ROMs. If unsuccessful,
it leaves the ROM as-is.
ROM BEV now attempts to return to the BIOS, resorting to INT 18 only
if the BIOS stack has been corrupted.
This allows pxelinux to execute arbitrary gPXE commands. This is
remarkably unsafe (not least because some of the commands will assume
full ownership of memory and do nasty things like edit the e820 map
underneath the calling pxelinux), but it does allow access to the
"sanboot" command.
Replace a printf with a DBG in timer_rtdsc.c
Replace a printf in timer.c with assert
Return proper error codes from timer drivers
Signed-off-by: Alexey Zaytsev <alexey.zaytsev@gmail.com>
Timer subsystem initialization code in core/timer.c
Split the BIOS and RTDSC timer drivers from i386_timer.c
Split arch/i386/firmware/pcbios/bios.c into the RTSDC
timer driver and arch/i386/core/nap.c
Split the headers properly:
include/unistd.h - delay functions to be used by the
gPXE core and drivers.
include/gpxe/timer.h - the fimer subsystem interface
to be used by the timer drivers
and currticks() to be used by
the code gPXE subsystems.
include/latch.h - removed
include/timer.h - scheduled for removal. Some driver
are using currticks, which is
only for core subsystems.
Signed-off-by: Alexey Zaytsev <alexey.zaytsev@gmail.com>
As written, if the if the UNDI ISR call clobbers the upper halves of
any of the GPRs (which by convention it is permitted to do, and by
paranoia should be expected to do) then nothing in the interrupt
handler will recover the state.
Additionally, save/restore %fs and %gs out of sheer paranoia - it's a
cheap enough operation, and may prevent problems due to poorly written
UNDI stacks.
Since we don't know what the UNDI code does, it is safest to
save/restore %eflags even though the lower half of %eflags is
automatically saved by the interrupt itself.
As written, if the if the UNDI ISR call clobbers the upper halves of
any of the GPRs (which by convention it is permitted to do, and by
paranoia should be expected to do) then nothing in the interrupt
handler will recover the state.
Additionally, save/restore %fs and %gs out of sheer paranoia - it's a
cheap enough operation, and may prevent problems due to poorly written
UNDI stacks.
_textdata_link_addr, _load_addr and _max_align in the linker scripts.
A bug in some versions of ld causes segfaults if the DEFINED() macro
is used in a linker script *and* the -Map option to ld is present.
We don't currently need to override any of these values; if we need to
do so in future then the solution will probably be to always specify
the values on the ld command line, and have the linker script not
define them at all.
memory map. (We achieve this by setting CF on the last entry if it is
zero-length; this avoids the need to look ahead to see at each entry
if the *next* entry would be both the last entry and zero-length).
This fixes the "0kB base memory" error message upon starting Windows
2003 on a SunFire X2100.
byte, rather than the number of permissible bytes (i.e. subtract one
from the value under the previous definition to get the value under
the new definition).
This avoids integer overflow on 64-bit kernels, where
bzhdr.initrd_addr_max may be 0xffffffffffffffff; under the old
behaviour we set mem_limit equal to initrd_addr_max+1, which meant it
ended up as zero. Kernel loads would fail with ENOBUFS.
Experimentation reveals that gcc ignores -mrtd for the implicit
arithmetic functions (e.g. __udivdi3), but not for the implicit
memcpy() and memset() functions. Mark the implicit arithmetic
functions with __attribute__((cdecl)) to compensate for this.
(Note: we cannot mark with with __cdecl, because we define __cdecl to
incorporate regparm(0) as well.)
us to round down the size for the relocation copy to the nearest 64kB
(+0x10 bytes); this just happened to work on most machines because the
last 64kB of the image is all-zeroes anyway (it's the .bss).
link-time check for section overlaps. (In order to avoid wasting
space in the executable image, .bss16 will overlap with the following
section, which is .text).
number of (potentially very slow) gateA20_set operations.
Die with a fatal error if we are unable to set gate A20; if this fails
then we are bound to experience memory corruption at a later stage,
and I'd prefer to pick it up early.
the UNDI stack.
Ignore obviously invalid length combinations (as returned by
e.g. VMWare's PXE stack).
Limit to one packet per poll to avoid memory exhaustion.
Always send EOI; do not chain to BIOS's default interrupt handler.
They are just too unpredictable; at least VMware's seems to kill the
machine if you go anywhere near it.
Disable interrupts after return from PXENV_UNDI_ISR, just in case some
dumb PXE stack enables them.
safe dropping of the netdev ref by the driver while other refs still
exist.
Add netdev_irq() method. Net device open()/close() methods should no
longer enable or disable IRQs.
Remove rx_quota; it wasn't used anywhere and added too much complexity
to implementing correct interrupt-masking behaviour in pxe_undi.c.
Use generic fields in struct device_description rather than assuming
that the struct device * is contained within a pci_device or
isapnp_device; this assumption is broken when using the undionly
driver.
Add PXENV_UNDI_SET_STATION_ADDRESS.
entirely self-hosted (which avoids problems when building the same
tree on multiple systems - e.g. when you have /home NFS-mounted).
Also saves around 50 bytes in total - not sure why.
clue what the "previous" interrupt handler will do, which could range
from "just an iret" to "disable the interrupt"), and that means that
we have to take responsibility for ACKing all interrupts. Joy.
refer to them by name from the command line, or build them into a
multiboot module list.
Use setting image->type to disambiguate between "not my image" and "bad
image"; this avoids relying on specific values of the error code.
names.
Add "dev" pointer in struct net_device to tie network interfaces back to a
hardware device.
Force natural alignment of data types in __table() macros. This seems to
prevent gcc from taking the unilateral decision to occasionally increase
their alignment (which screws up the table packing).
real_call(), rather than moving it to the RM stack and back again.
This allows the real-mode function to completely destroy the stack
contents, provided that it manages to return to real_call().
UNDI ROM code etc. when you just want a "undi.kpxe"-type image).
This driver cannot be used in conjunction with any other driver (it will
crash), or in any other format than .kpxe (it just won't find any network
devices).
memory (unless we get an error while stopping the base code). Leave UNDI
resident (though stopped) for .kpxe.
Still need to add code to record the device identification parameters
prior to stopping UNDI.
function prefix "undinet_" and the variable name "undinic" in undinet.c,
so that we can reserve the variable name "undi" for a struct undi_device.
The idea is that we preserve the Etherboot 5.4 convention that the "UNDI"
code refers to our using an underlying UNDI stack, while the "PXE" code
refers to our providing a PXE API.
UNDI_GET_INFORMATION calls into drivers/net/undi.c. undi_probe() now
gets given a pxe_device representing a PXE stack that has been loaded
into memory but not initialised in any way.
causing the serial console to ignore input, because it happened to end up
linked with serial_ischar() at address 0, which core/console.c decided was
invalid).
technically be necessary, because the "enable A20" command requires
only that the keyboard controller is ready to accept input (i.e. that
its input buffer is empty), and shouldn't also require that the
keyboard is ready to send output (i.e. that its output buffer is also
empty). See http://www.smsc.com/main/tools/io-bios/42i.pdf section
3.1 ("Command Invocation") for a justification.
gateA20_set() is called on every real-mode transition (in case some
idiot piece of external code such as Intel's PXE stack decided it
would be fun to re-disable A20), so draining the keyboard buffer means
that we end up losing keypresses on some systems. In particular, this
makes typing at the command line almost impossible, and causes
Etherboot to ignore Ctrl-Alt-Del.
We should really implement a gateA20_test() function to verify that
gate A20 has been correctly enabled, and think about adding other
commonly-used methods such as Fast Gate A20.
defined in vsprintf.h. (This may change, since vsprintf.h is a
non-standard name, but for now it's the one to use.)
There should be no need to include vsprintf.h just for DBG() statements,
since include/compiler.h forces it in for a debug build anyway.
skip past an empty region, otherwise we end up generating an infinitely
long e820 map. (Yes, there *are* real systems that provide e820 maps
with a zero-length region at the end...)
Updated PXE API dispatcher to use copy_{to,from}_user, and moved to
arch/i386 since the implementation is quite architecture-dependent.
(The individual PXE API calls can be largely
architecture-independent.)
Allow our functions to return a non-zero, non-error status (since the
INT 13 Extensions Check has to return the API version in the register
that is otherwise always used for the error code).
Report a non-zero API version from the INT 13 Extensions Check; GRUB
now uses extended reads.
Change semantics; relocate() now just finds a suitable location; it
doesn't actually perform the relocation itself. Code in libprefix does
the copy in flat real mode.
the only one we actually use). This allows REAL_EXEC fragments to
contain proper references to constraints (e.g. "%w0"), rather than having
to force the use of specific registers.
Note that the "num_constraints" parameter is now completely obsolete, and
that we can probably reduce the syntax to something like
__asm__ __volatile__ ( REAL_CODE ( "asm statements" )
: output constraints
: input constraints
: clobber );
which would look much more natural, and avoid the need to always specify
a clobber list.
Add userptr_t to libkir.h, to allow it to at least compile.
We now split e820 regions around ourselves, rather than just
truncating the e820 region. This avoids the worst-case scenario of
losing all memory over 4GB.
It's more important to get the memory map right now that we're
expecting to still be loaded when the OS starts in several situations
(e.g. Linux with UNDI driver, any OS with iSCSI/AoE boot, etc.).
Tidied up debug messages; the log now contains one line per INT 13
operation, looking like
INT 13,08 (80): Get drive parameters
INT 13,02 (80): Read: C/H/S 0/47/14 = LBA 0xb9e <-> 1084:0000 (count 106)
the kernel), which encapsulates the information needed to refer to an
external buffer. Under normal operation, this can just be a void *
equivalent, but under -DKEEP_IT_REAL it would be a segoff_t equivalent.
Use this concept to avoid the need for bounce buffers in int13.c,
which reduces memory usage and opens up the possibility of using
multi-sector reads.
Extend the block-device API and the SCSI block device implementation
to support multi-sector reads.
Update iscsi.c to use user buffers.
Move the obsolete portions of realmode.h to old_realmode.h.
MS-DOS now boots an order of magnitude faster over iSCSI (~10 seconds
from power-up to C:> prompt in bochs).
typical build will now include 880 bytes of PCI support code, compared to
2327 bytes in Etherboot 5.4.
(There is a slight cost of around 5 extra bytes per access to a
non-constant config space address; this should be an overall win.
Driver-specific accesses will usually be to constant addresses, for
which there is no additional cost.)
Generic PCI code now handles 64-bit BARs correctly when setting
"membase"; drivers should need to call pci_bar_start() only if they want
to use BARs other than the first memory or I/O BAR.
Split rarely-used PCI functions out into pciextra.c.
Core PCI code is now 662 bytes (down from 1308 bytes in Etherboot 5.4).
284 bytes of this saving comes from the pci/pciextra split.
Cosmetic changes to lots of drivers (e.g. vendor_id->vendor in order to
match the names used in Linux).
Use .text16.data section with "aw" attributes, to avoid section type
conflicts when placing both code and data into .text16.
Add __from_{text16,data16}.
between the low half stored in the static variable rm_sp, and the high
half stored on the prot_call() stack, because:
Just using the stack would screw up when a prot_call()ed routine
executes a real_call(); it would have no way to find the current top of
the RM stack.
Extending rm_sp to rm_esp would not be safe, because the guarantee that
rm_sp must return to the correct value by the time an external
real-mode call returns applies only to %sp, not to %esp.
from protected-mode code.
Set up %ds to point to .data16 in prot_to_real, so that code specified
via REAL_EXEC() and friends can access variables in .data16.
Move most real-mode librm variables from .text16 to .data16.
I want to get to the point where any header in include/ reflects a
standard user-level header (e.g. a POSIX header), while everything that's
specific to gPXE lives in include/gpxe/. Headers that reflect a Linux
header (e.g. if_ether.h) should also be in include/gpxe/, with the same
name as the Linux header and, preferably, the same names used for the
definitions.