Commit Graph

1323 Commits (a80124456ee9ade7a63b69f167c3b1348c8d93c0)

Author SHA1 Message Date
Petr Borsodi ba0d5aa993 [pci] Correct invalid base-class/sub-class/prog-if order in PCIR
PCI Configuration Space contains fields prog-if at the offset 0x09,
sub-class at the offset 0x0a and base-class at the offset 0x0b (it
respects little endian).  PCIR structure uses these fields in the same
order.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2019-01-15 13:08:44 +00:00
Michael Brown d6f02c72c9 [undi] Include subsystem IDs in broken interrupt device check
Allow the subsystem IDs to be used when checking for PXE stacks with
broken interrupt support.

Suggested-by: Levi Hsieh <Levi.Hsieh@dell.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2018-04-18 16:57:05 +01:00
Michael Brown bc85368cdd [librm] Ensure that inline code symbols are unique
Commit 6149e0a ("[librm] Provide symbols for inline code placed into
other sections") may cause build failures due to duplicate label names
if the compiler chooses to duplicate inline assembly code.

Fix by using the "%=" special format string to include a
guaranteed-unique number within the label name.

The "%=" will be expanded only if constraints exist for the inline
assembly.  This fix therefore requires that all REAL_CODE() fragments
use a (possibly empty) constraint list.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2018-03-21 17:00:53 +02:00
Michael Brown 6149e0af3c [librm] Provide symbols for inline code placed into other sections
Provide symbols constructed from the object name and line number for
code fragments placed into alternative sections, such as inline
REAL_CODE() assembly placed into .text16.  This simplifies the
debugging task of finding the source code corresponding to a given
instruction pointer.

Note that we cannot use __FUNCTION__ since it is not a preprocessor
macro and so cannot be concatenated with string literals.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2018-03-21 14:44:04 +02:00
Michael Brown 0600ffeb30 [undi] Treat invalid IRQ numbers as non-fatal errors
If the underlying PXE stack reports an invalid IRQ number (above
IRQ_MAX), treat this as equivalent to an empty IRQ number and fall
back to using polling mode.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2018-03-21 10:28:05 +02:00
Michael Brown 1df3b53051 [build] Prevent use of MMX and SSE registers
The existence of MMX and SSE is required by the System V x86_64 ABI
and so is assumed by gcc, but these registers are not preserved by our
own interrupt handlers and are unlikely to be preserved by other
context switch handlers in a boot firmware environment.

Explicitly prevent gcc from using MMX or SSE registers to avoid
potential problems due to silent register corruption.

We must remove the %xmm0-%xmm5 clobbers from the x86_64 version of
hv_call() since otherwise gcc will complain about unknown register
names.  Theoretically, we should probably add code to explicitly
preserve the %xmm0-%xmm5 registers across a hypercall, in order to
guarantee to external code that these registers remain unchanged.  In
practice this is difficult since SSE registers are disabled by
default: for background information see commits 71560d1 ("[librm]
Preserve FPU, MMX and SSE state across calls to virt_call()") and
dd9a14d ("[librm] Conditionalize the workaround for the Tivoli VMM's
SSE garbling").

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2018-03-20 22:01:08 +02:00
Michael Brown 0d35411f88 [rng] Use fixed-point calculations for min-entropy quantities
We currently perform various min-entropy calculations using build-time
floating-point arithmetic.  No floating-point code ends up in the
final binary, since the results are eventually converted to integers
and asserted to be compile-time constants.

Though this mechanism is undoubtedly cute, it inhibits us from using
"-mno-sse" to prevent the use of SSE registers by the compiler.

Fix by using fixed-point arithmetic instead.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2018-03-20 20:56:01 +02:00
Michael Brown 3ec2079ce2 [time] Add support for the ACPI power management timer
Allow the ACPI power management timer to be used if enabled via
TIMER_ACPI in config/timer.h.  This provides an alternative timer on
systems where the standard 8254 PIT is unavailable or unreliable.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2018-03-20 17:26:49 +02:00
Michael Brown 89e31f8491 [librm] Add facility to provide register and stack dump for CPU exceptions
When DEBUG=librm_mgmt is enabled, intercept CPU exceptions and provide
a register and stack dump, then drop to an emergency shell.  Exiting
from the shell will almost certainly not work, but this provides an
opportunity to view the register and stack dump and carry out some
basic debugging.

Note that we can intercept only the first 8 CPU exceptions, since a
PXE ROM is not permitted to rebase the PIC.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2018-03-18 14:59:34 +02:00
Michael Brown 2bb4ec1f54 [build] Avoid use of "ld --oformat binary"
Using "ld --oformat binary" for mbr.bin and usbdisk.bin seems to cause
segmentation faults on some versions of binutils (observed on Fedora
27).  Work around this problem by using ld to create an intermediate
ELF object, followed by objcopy (via the existing %.tmp -> %.bin rule)
to create the final binary.

Note that we cannot simply use a single-stage "objcopy -O binary"
since this will not process the relocation records for x86_64: see
commit 1afcccd ("[build] Do not use "objcopy -O binary" for objects
with relocation records").

Reported-by: Brent S <bts@square-r00t.net>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2018-01-02 21:26:40 +01:00
Michael Brown 00c5b958c5 [legal] Add missing FILE_LICENCE declarations
Add missing FILE_LICENCE declarations to x86_64 headers based on the
corresponding i386 headers (from which the x86_64 headers were
originally derived).

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2017-12-29 11:57:00 +00:00
Michael Brown 75acb3c775 [romprefix] Avoid unaligned accesses within ROM headers
Ensure that all headers (PCI, UNDI, PnP, iPXE) are aligned to at least
four bytes, so that all accesses to header fields will be correctly
aligned even when reading directly from the expansion ROM BAR.

Reported-by: Peter von Konigsmark <peter@exablaze.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2017-08-30 10:15:25 +01:00
Michael Brown 936657832f [hyperv] Do not steal ownership from the Gen 2 UEFI firmware
We must not steal ownership from the Gen 2 UEFI firmware, since doing
so will cause an immediate system crash (most likely in the form of a
reboot).

This problem was masked before commit a0f6e75 ("[hyperv] Do not fail
if guest OS ID MSR is already set"), since prior to that commit we
would always fail if we found any non-zero guest OS identity.  We now
accept a non-zero previous guest OS identity in order to allow for
situations such as chainloading from iPXE to another iPXE, and as a
prerequisite for commit b91cc98 ("[hyperv] Cope with Windows Server
2016 enlightenments").

A proper fix would be to reverse engineer the UEFI protocols exposed
within the Hyper-V Gen 2 firmware and use these to bind to the VMBus
device representing the network connection, (with the native Hyper-V
driver moved to become a BIOS-only feature).

As an interim solution, fail to initialise the native Hyper-V driver
if we detect the guest OS identity known to be used by the Gen 2 UEFI
firmware.  This will cause the standard all-drivers build (ipxe.efi)
to fall back to using the SNP driver.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2017-07-28 21:30:43 +01:00
Michael Brown 8866c919f8 [build] Fix ARM32 EFI builds with current EDK2 headers
EDK2 commit 6440385 ("MdePkg/Include: Add enumeration size checks to
Base.h") enforced the UEFI specification mandate that enums should
always be 32 bits.  This revealed a latent bug in iPXE, which does not
build with -fno-short-enums.

Fix by adding -fno-short-enums to CFLAGS for ARM32 EFI builds.

Reported-by: Benjamin S. Allen <bsallen@alcf.anl.gov>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2017-07-28 15:46:06 +01:00
Michael Brown b6fc8be2c4 [build] Conditionalise use of -mabi=lp64 for ARM64 builds
The -mabi option was added in GCC 4.9.  Test for the existence of this
option to allow for building with earlier versions of GCC.

Reported-by: Benjamin S. Allen <bsallen@alcf.anl.gov>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2017-07-28 12:49:40 +01:00
Michael Brown a6a5825f8d [cpuid] Allow input %ecx value to be specified
For some CPUID leaves (e.g. %eax=0x00000004), the result depends on
the input value of %ecx.  Allow this subfunction number to be
specified as a parameter to the cpuid() wrapper.

The subfunction number is exposed via the ${cpuid/...} settings
mechanism using the syntax

  ${cpuid/<subfunction>.0x40.<register>.<function>}

e.g.

  ${cpuid/0.0x40.0.0x0000000b}
  ${cpuid/1.0x40.0.0x0000000b}

to retrieve the CPU topology information.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2017-06-16 01:17:48 +01:00
Michael Brown 7c395b0e21 [build] Use -no-pie on newer versions of gcc
Some distributions patch gcc to generate position independent
executables by default.  We currently include a workaround to check
for this and to add -fno-PIE -nopie to CFLAGS if required.

Newer patched versions of gcc require -fno-PIE -no-pie instead.  Check
for both variants.

Reported-by: Nathan Rennie-Waldock <nathan.renniewaldock@gmail.com>
Originally-fixed-by: Markos Chandras <mchandras@suse.de>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2017-06-14 12:33:16 +01:00
Michael Brown 84e25513b1 [hdprefix] Avoid attempts to read beyond the end of the disk
When booting from a hard disk image (e.g. bin/ipxe.usb) within an
emulator such as QEMU, the disk may not exist beyond the end of the
image.  Limit all reads to the length of the image to avoid spurious
errors when loading the iPXE image.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2017-06-14 12:14:54 +01:00
Michael Brown 933e6dadc0 [acpi] Make acpi_find_rsdt() a per-platform method
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2017-05-23 18:34:39 +01:00
Michael Brown b91cc983da [hyperv] Cope with Windows Server 2016 enlightenments
An "enlightened" external bootloader (such as Windows Server 2016's
winload.exe) may take ownership of the Hyper-V connection before all
INT 13 operations have been completed.  When this happens, all VMBus
devices are implicitly closed and we are left with a non-functional
network connection.

Detect when our Hyper-V connection has been lost (by checking the
SynIC message page MSR).  Reclaim ownership of the Hyper-V connection
and reestablish any VMBus devices, without disrupting any existing
iPXE state (such as IPv4 settings attached to the network device).

Windows Server 2016 will not cleanly take ownership of an active
Hyper-V connection.  Experimentation shows that we can quiesce by
resetting only the SynIC message page MSR; this results in a
successful SAN boot (on a Windows 2012 R2 physical host).  Choose to
quiesce by resetting (almost) all MSRs, in the hope that this will be
more robust against corner cases such as a stray synthetic interrupt
occurring during the handover.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2017-04-28 16:20:47 +01:00
Michael Brown 276d618ca9 [hyperv] Remove redundant return status code from mapping functions
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2017-04-28 16:20:35 +01:00
Michael Brown a0f6e75532 [hyperv] Do not fail if guest OS ID MSR is already set
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2017-04-26 20:15:39 +01:00
Michael Brown dd976cb50d [block] Provide sandev_read() and sandev_write() as global symbols
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2017-04-26 20:15:39 +01:00
Michael Brown 84d406ccf4 [block] Allow use of a non-default EFI SAN boot filename
Some older operating systems (e.g. RHEL6) use a non-default filename
on the root disk and rely on setting an EFI variable to point to the
bootloader.  This does not work when performing a SAN boot on a
machine where the EFI variable is not present.

Fix by allowing a non-default filename to be specified via the
"sanboot --filename" option or the "san-filename" setting.  For
example:

  sanboot --filename \efi\redhat\grub.efi \
          iscsi:192.168.0.1::::iqn.2010-04.org.ipxe.demo:rhel6

or

  option ipxe.san-filename code 188 = string;
  option ipxe.san-filename "\\efi\\redhat\\grub.efi";
  option root-path "iscsi:192.168.0.1::::iqn.2010-04.org.ipxe.demo:rhel6";

Originally-implemented-by: Vishvananda Ishaya Abrams <vish.ishaya@oracle.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2017-04-12 15:58:05 +01:00
Michael Brown 5f85cbb9ee [build] Avoid implicit-fallthrough warnings on GCC 7
Reported-by: Vinson Lee <vlee@freedesktop.org>
Reported-by: Liang Yan <lyan@suse.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2017-03-29 10:36:45 +03:00
Michael Brown 7cfdd769aa [block] Describe all SAN devices via ACPI tables
Describe all SAN devices via ACPI tables such as the iBFT.  For tables
that can describe only a single device (i.e. the aBFT and sBFT), one
table is installed per device.  For multi-device tables (i.e. the
iBFT), all devices are described in a single table.

An underlying SAN device connection may be closed at the time that we
need to construct an ACPI table.  We therefore introduce the concept
of an "ACPI descriptor" which enables the SAN boot code to maintain an
opaque pointer to the underlying object, and an "ACPI model" which can
build tables from a list of such descriptors.  This separates the
lifecycles of ACPI descriptions from the lifecycles of the block
device interfaces, and allows for construction of the ACPI tables even
if the block device interface has been closed.

For a multipath SAN device, iPXE will wait until sufficient
information is available to describe all devices but will not wait for
all paths to connect successfully.  For example: with a multipath
iSCSI boot iPXE will wait until at least one path has become available
and name resolution has completed on all other paths.  We do this
since the iBFT has to include IP addresses rather than DNS names.  We
will commence booting without waiting for the inactive paths to either
become available or close; this avoids unnecessary boot delays.

Note that the Linux kernel will refuse to accept an iBFT with more
than two NIC or target structures.  We therefore describe only the
NICs that are actually required in order to reach the described
targets.  Any iBFT with at most two targets is therefore guaranteed to
describe at most two NICs.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2017-03-28 19:12:48 +03:00
Michael Brown c73af29fe2 [int13con] Avoid overwriting random portions of SAN boot disks
The INT13 console type (CONSOLE_INT13) autodetects at initialisation
time a magic partition to be used for logging iPXE console output.  If
the INT13 drive number mapping is subsequently changed (e.g. because
iPXE was used to perform a SAN boot), then the console logging output
will be written to the incorrect disk.

Fix by recording the INT13 vector at initialisation time, and using
this original vector to emulate INT13 calls for all subsequent
accesses.  This should be robust against drive remapping performed
either by ourselves or by another bootloader (e.g. a chainloaded
undionly.kpxe which then performs a SAN boot).

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2017-03-27 10:50:59 +03:00
Michael Brown ebceb8ad8a [int13] Improve geometry guessing for unaligned partitions
Some partition tables have partitions that are not aligned to a
cylinder boundary, which confuses the current geometry guessing logic.

Enhance the existing logic to ensure that we never reduce our guesses
for the number of heads or sectors per track, and add extra logic to
calculate the exact number of sectors per track if we find a partition
that starts within cylinder zero.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2017-03-26 21:03:50 +03:00
Michael Brown bb5a54b79a [block] Add basic multipath support
Add basic support for multipath block devices.  The "sanboot" and
"sanhook" commands now accept a list of SAN URIs.  We open all URIs
concurrently.  The first connection to become available for issuing
block device commands is marked as the active path and used for all
subsequent commands; all other connections are then closed.  Whenever
the active path fails, we reopen all URIs and repeat the process.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2017-03-26 16:06:02 +03:00
Michael Brown 7495813792 [video_subr] Use memmove() for overlapping memory copy
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2017-03-22 15:13:06 +02:00
Michael Brown d25e7daf47 [librm] Fail gracefully if asked to ioremap() a zero length
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2017-03-21 14:17:18 +02:00
Michael Brown 7692a8ff02 [undi] Move PXE API caller back into UNDI driver
As of commit 10d19bd ("[pxe] Always retrieve cached DHCPACK and apply
to relevant network device"), the UNDI driver has been the only user
of pxeparent_call().  Remove the unnecessary layer of abstraction by
refactoring this code back into undinet.c, and fix the ability of
undiisr.S to fall back to chaining to the original handler if we were
unable to unhook our own ISR.

This effectively reverts commit 337e1ed ("[pxe] Separate parent PXE
API caller from UNDINET driver").

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2017-03-19 15:57:24 +00:00
Michael Brown e790366c7c [int13] Refactor to use centralised SAN device abstraction
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2017-03-07 13:40:35 +00:00
Michael Brown e7ee2eda4b [block] Centralise "san-drive" setting
The concept of the SAN drive number is meaningful only in a BIOS
environment, where it represents the INT13 drive number (0x80 for the
first hard disk).  We retain this concept in a UEFI environment to
allow for a simple way for iPXE commands to refer to SAN drives.

Centralise the concept of the default drive number, since it is shared
between all supported environments.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2017-03-07 13:40:35 +00:00
Michael Brown f8cf3ceb0b [int13] Test correct return status from INT 13 calls
INT 13 calls return a status value via %ah, with CF set if %ah is
non-zero (indicating an error).  Our wrappers zero the whole of %ax if
CF is clear, to allow C code (which has no easy access to CF) to
simply test for a non-zero status to detect an error.

The current code assigns the returned status to a uint8_t, effectively
testing %al rather than %ah.  Fix by treating the returned status as a
uint16_t instead.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2017-01-26 09:45:19 +00:00
Michael Brown fcf7751565 [int13] Avoid potential division by zero
Avoid using a zero sector count to guess the disk geometry, since that
would result in a division by zero when calculating the number of
cylinders.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2017-01-26 09:31:40 +00:00
Michael Brown f3ba0fb5fd [hyperv] Provide timer based on the 10MHz time reference count MSR
When running on AMD platforms, the legacy hardware emulation is
extremely unreliable.  In particular, the IRQ0 timer interrupt is
likely to simply stop working, resulting in a total failure of any
code that relies on timers (such as DHCP retransmission attempts).

Work around this by using the 10MHz time counter provided by Hyper-V
via an MSR.  (This timer can be tested in KVM via the command-line
option "-cpu host,hv_time".)

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2017-01-26 08:19:26 +00:00
Michael Brown 302f1eeb80 [time] Allow timer to be selected at runtime
Allow the active timer (providing udelay() and currticks()) to be
selected at runtime based on probing during the INIT_EARLY stage of
initialisation.

TICKS_PER_SEC is now a fixed compile-time constant for all builds, and
is independent of the underlying clock tick rate.  We choose the value
1024 to allow multiplications and divisions on seconds to be converted
to bit shifts.

TICKS_PER_MS is defined as 1, allowing multiplications and divisions
on milliseconds to be omitted entirely.  The 2% inaccuracy in this
definition is negligible when using the standard BIOS timer (running
at around 18.2Hz).

TIMER_RDTSC now checks for a constant TSC before claiming to be a
usable timer.  (This timer can be tested in KVM via the command-line
option "-cpu host,+invtsc".)

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2017-01-26 08:17:37 +00:00
Michael Brown d37e025b81 [cpuid] Provide cpuid_supported() to test for supported functions
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2017-01-25 20:57:18 +00:00
Michael Brown bd6255c7be [pic8259] Fix definitions for "read IRR" and "read ISR" commands
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2017-01-25 07:37:04 +00:00
David Decotigny b6f524388b [af_packet] Add new AF_PACKET driver for Linux
This code largely inspired by tap.c.  Allows for testing iPXE on real
NICs from within Linux.  For example:

  make bin-x86_64-linux/af_packet.linux
  valgrind ./bin-x86_64-linux/af_packet.linux --net af_packet,if=eth3

Tested as x86_64 and i386 binary.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2017-01-22 14:02:54 +00:00
Michael Brown dfbbc16ae3 [build] Add %.vhd target for building VM bootable disk images
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2017-01-22 11:22:11 +00:00
Michael Brown e09331a4c6 [undi] Try matching UNDI ROMs in BIOS enumeration order
When searching for an UNDI ROM to match against a PCI device, search
in order of increasing ROM address (within the 128kB BIOS option ROM
area).  This is likely (though not guaranteed) to match the order of
the original enumeration performed by the BIOS, which is in turn
likely to match the order of enumeration on the PCI bus.

Since we load at most one UNDI ROM, the net result is that we increase
our chances of loading the ROM corresponding to the selected PCI
device (rather than loading a ROM corresponding to a higher-numbered
PCI device with the same vendor and device IDs.)

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2016-12-07 07:25:44 +00:00
Michael Brown 80c482c0ed [prefix] Include diagnostic information within progress messages
Include some relevant diagnostic infomation within the progress
messages generated via DEBUG=libprefix.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2016-12-06 09:38:33 +00:00
Michael Brown ce81601181 [prefix] Remove impossible progress message
The "progress" macro can be used only from within the .prefix section.
At the point of calling relocate(), we are running in .text16 and so
the near call to print_message() will end up calling a random function
somewhere in .text16.

Interestingly, this problem has remained unnoticed for some time.  It
is rare to build with DEBUG=libprefix.  In the few cases that it has
been used during development, the randomly selected function in
.text16 seems to have been a harmless no-op with no visible
side-effects (beyond the unnoticed failure to print the "relocate"
progress message).

Fix by removing the futile attempt to print a progress message before
calling relocate().

Reported-by: Raed Salem <raeds@mellanox.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2016-12-06 07:44:10 +00:00
Michael Brown 6997d3c2fa [undi] Clean up driver and device name information
Fix the <NULL> driver name reported by "ifstat" when using the undipci
driver (due to the unnecessary extra device node inserted as a child
of the PCI device).

Remove the "UNDI-" prefix from device names since the driver name is
also now visible via "ifstat", and tidy up the device name to match
the format used by standard PCI devices.

The output from "ifstat" now resembles:

  iPXE> ifstat
  net0: 52:54:00:12:34:56 using undipci on 0000:00:03.0

  iPXE> ifstat
  net0: 52:54:00:12:34:56 using undionly on 0000:00:03.0

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2016-12-05 15:48:37 +00:00
Michael Brown cc40fcbf8b [romprefix] Avoid using PMM-allocated memory in UNDI loader entry point
The UNDI loader entry point is very likely to be called after POST,
when there is a high chance that the PMM-allocated image source area
and decompression area have been reused by something else.

In particular, using an iPXE .iso to test a separate iPXE ROM's UNDI
loader entry point in a qemu VM is likely to crash.  SeaBIOS allocates
PMM blocks from close to the top of memory and so these blocks have a
high chance of colliding with the runtime addresses subsequently
chosen by the non-ROM iPXE by scanning the INT 15,e820 memory map.

The standard romprefix.S has no choice about relying on the
PMM-allocated image source area, since it has no other way to retrieve
its compressed payload.

In mromprefix.S, the image source area functions only as an optional
buffer used to avoid repeated reads from the (potentially slow)
expansion ROM BAR by the decompression code.  We can therefore always
set %esi=0 when calling install_prealloc from the UNDI loader entry
point, and simply fall back to reading directly from the expansion ROM
BAR.

We can always set %edi=0 when calling install_prealloc from the UNDI
loader entry point.  This will behave as though the decompression area
PMM allocation failed, and will therefore use INT 15,88 to find a
temporary decompression area somewhere close to 64MB.  This is by no
means guaranteed to be safe from collisions, but it's probably safer
on balance than the PMM-allocated address.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2016-12-05 15:21:45 +00:00
Michael Brown 8138ea190d [undi] Allocate base memory before calling UNDI loader entry point
Allocate base memory (by decreasing the free base memory counter)
before calling the UNDI loader entry point, to minimise surprises for
the UNDI loader code.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2016-12-05 08:53:44 +00:00
Laszlo Ersek dd9a14de35 [librm] Conditionalize the workaround for the Tivoli VMM's SSE garbling
Commit 71560d1 ("[librm] Preserve FPU, MMX and SSE state across calls
to virt_call()") added FXSAVE and FXRSTOR instructions to iPXE.  In
KVM virtual machines, these instructions execute fine as long as the
host CPU supports the "unrestricted_guest" feature (that is, it can
virtualize big real mode natively).  On older host CPUs however, KVM
has to emulate big real mode, and it currently doesn't implement
FXSAVE emulation.

Upstream QEMU rebuilt iPXE at commit 0418631 ("[thunderx] Fix
compilation with older versions of gcc") which is a descendant of
commit 71560d1 (see above).

This was done in QEMU commit ffdc5a2 ("ipxe: update submodule from
4e03af8ec to 041863191").  The resultant binaries were bundled with
the QEMU v2.7.0 release; see QEMU commit c52125a ("ipxe: update
prebuilt binaries").

This distributed the iPXE workaround for the Tivoli VMM bug to a
number of KVM users with old host CPUs, causing KVM emulation failures
(guest crashes) for them while netbooting.

Make the FXSAVE and FXRSTOR instructions conditional on a new feature
test macro called TIVOLI_VMM_WORKAROUND.  Define the macro by default.

There is prior art for an assembly file including config/general.h:
see arch/x86/prefix/romprefix.S.  Also, TIVOLI_VMM_WORKAROUND seems to
be a good fit for the "Obscure configuration options" section in
config/general.h.

Cc: Bandan Das <bsd@redhat.com>
Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Greg <rollenwiese@yahoo.com>
Cc: Michael Brown <mcb30@ipxe.org>
Cc: Michael Prokop <launchpad@michael-prokop.at>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Pickford <arch@netremedies.ca>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Ref: https://bugs.archlinux.org/task/50778
Ref: https://bugs.launchpad.net/qemu/+bug/1623276
Ref: https://bugzilla.proxmox.com/show_bug.cgi?id=1182
Ref: https://bugzilla.redhat.com/show_bug.cgi?id=1356762
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2016-11-08 17:37:59 +00:00
Michael Brown aa11f5deda [bzimage] Fix page alignment of initrd images
The initrd_addr_max field represents the highest byte address that may
be used to hold initrd images, and is therefore almost certainly not
aligned to a page boundary: a typical value might be 0x7fffffff.

Fix the address calculations to ensure that the initrd images are
always aligned to a page boundary.

Reported-by: Sitsofe Wheeler <sitsofe@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2016-10-29 00:32:33 +01:00
Michael Brown df85901768 [acpi] Allow time for ACPI power off to take effect
The ACPI power off sequence may not take effect immediately.  Delay
for one second, to eliminate potentially confusing log messages such
as "Could not power off: Error 0x43902001 (http://ipx".

Reported-by: Leonid Vasetsky <leonidv@velostrata.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2016-07-11 21:23:03 +01:00
Michael Brown e19c0a8fd2 [acpi] Add support for ACPI power off
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2016-07-11 14:05:18 +01:00
Michael Brown 74222cd2c1 [rng] Check for functioning RTC interrupt
On some platforms (observed in a small subset of Microsoft Azure
(Hyper-V) virtual machines), the RTC appears to be incapable of
generating an interrupt via the legacy PIC.  The RTC status registers
show that a periodic interrupt has been asserted, but the PIC IRR
shows that IRQ8 remains inactive.

On such systems, iPXE will currently freeze during the "iPXE
initialising devices..." message.

Work around this problem by checking that RTC interrupts are being
raised before returning from rtc_entropy_enable().  If no interrupt is
seen within 100ms, then we assume that the RTC interrupt mechanism is
broken.  In these circumstances, iPXE will continue to initialise but
any subsequent attempt to generate entropy will fail.  In particular,
HTTPS connections will fail with an error indicating that no entropy
is available.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2016-07-10 20:42:53 +01:00
Michael Brown aeb6203811 [dhcp] Automatically generate vendor class identifier string
The vendor class identifier strings in DHCP_ARCH_VENDOR_CLASS_ID are
out of sync with the (correct) client architecture values in
DHCP_ARCH_CLIENT_ARCHITECTURE.

Fix by removing all definitions of DHCP_ARCH_VENDOR_CLASS_ID, and
instead generating the vendor class identifier string automatically
based on DHCP_ARCH_CLIENT_ARCHITECTURE and DHCP_ARCH_CLIENT_NDI.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2016-07-04 15:07:05 +01:00
Michael Brown 3d9f094022 [dhcp] Allow for variable encapsulation of architecture-specific options
DHCPv4 and DHCPv6 share some values in common for the architecture-
specific options (such as the client system architecture type), but
use different encapsulations: DHCPv4 has a single byte for the option
length while DHCPv6 has a 16-bit field for the option length.

Move the containing DHCP_OPTION() and related wrappers from the
individual dhcp_arch.h files to dhcp.c, thus allowing for the
architecture-specific values to be reused in dhcpv6.c.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2016-07-04 13:15:05 +01:00
Michael Brown 3bb61c33c2 [pxe] Disable interrupts on the PIC before starting NBP
Some BIOSes (observed with an HP Gen9) seem to spuriously enable
interrupts at the PIC.  This causes problems with NBPs such as GRUB
which use the UNDI API (thereby enabling interrupts on the NIC)
without first hooking an interrupt service routine.  In this
situation, the interrupt will end up being handled by the default BIOS
ISR, which will typically just send an EOI and return.  Since nothing
in this handler causes the NIC to deassert the interrupt, this will
result in an interrupt storm.

Entertainingly, some BIOSes are immune to this problem because the
default ISR sends the EOI only to the slave PIC; this effectively
disables the interrupt.

Work around this problem by disabling the interrupt on the PIC before
invoking the PXE NBP.  An NBP that expects to make use of interrupts
will need to be configuring the PIC anyway, so it is probably safe to
assume that it will explicitly reenable the interrupt.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2016-07-03 12:52:20 +01:00
Michael Brown c22da4b8ba [bios] Do not enable interrupts when printing to the console
There seems to be no reason for the sti/cli pair used around each call
to INT 10.  Remove these instructions, so that printing debug messages
from within an ISR does not temporarily reenable interrupts.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2016-07-03 12:50:19 +01:00
Michael Brown f76210961c [pci] Support systems with multiple PCI root bridges
Extend the 16-bit PCI bus:dev.fn address to a 32-bit seg🚌dev.fn
address, assuming a segment value of zero in contexts where multiple
segments are unsupported by the underlying data structures (e.g. in
the iBFT or BOFM tables).

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2016-06-09 09:36:28 +01:00
Michael Brown 31d4a7b8db [arm] Use correct DHCP client architecture values
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2016-05-26 13:43:33 +01:00
Michael Brown 276d7c31c5 [undi] Work around broken HP EliteBook 745 G3 PXE ROM
Reported-by: Arturino Mazzei <mazzeia@hotmail.com>
Tested-by: Arturino Mazzei <mazzeia@hotmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2016-05-13 13:22:06 +01:00
Michael Brown 601706688b [arm] Use CNTVCT_EL0 as profiling timestamp
The raw cycle counter at PMCCNTR_EL0 works in qemu but seems to always
read as zero on physical hardware (tested on Juno r1 and Cavium
ThunderX), even after ensuring that PMCR_EL0.E and PMCNTENSET_EL0.C
are both enabled.

Use CNTVCT_EL0 instead; this seems to count at a lower resolution
(tens of CPU cycles), but is usable for profiling.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2016-05-12 11:16:41 +01:00
Michael Brown 47931a4de5 [arm] Add optimised TCP/IP checksumming for 64-bit ARM
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2016-05-11 08:16:36 +01:00
Michael Brown 95716ece91 [arm] Add optimised string functions for 64-bit ARM
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2016-05-11 08:15:52 +01:00
Michael Brown 17c6f322ee [arm] Add support for 64-bit ARM (Aarch64)
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2016-05-08 00:20:20 +01:00
Michael Brown edea3a434c [arm] Split out 32-bit-specific code to arch/arm32
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2016-05-08 00:18:35 +01:00
Michael Brown 1a16f67a28 [arm] Add support for 32-bit ARM
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2016-05-06 12:08:44 +01:00
Michael Brown 57d0ea7c46 [efi] Generalise EFI entropy generation to non-x86 CPUs
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2016-05-04 14:34:24 +01:00
Michael Brown 1e066431a4 [tcpip] Do not fall back to using unoptimised TCP/IP checksumming
Require architecture-specific code to make a deliberate choice to use
the unoptimised generic_tcpip_continue_chksum() function, if there is
no optimised version available.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2016-05-04 13:38:25 +01:00
Michael Brown 71560d1854 [librm] Preserve FPU, MMX and SSE state across calls to virt_call()
The IBM Tivoli Provisioning Manager for OS Deployment (also known as
TPMfOSD, Rembo-ia32, or Rembo Auto-Deploy) has a serious bug in some
older versions (observed with v5.1.1.0, apparently fixed by v7.1.1.0)
which can lead to arbitrary data corruption.

As mentioned in commit 87723a0 ("[libflat] Test A20 gate without
switching to flat real mode"), Tivoli's NBP sets up a VMM and makes
calls to the PXE stack in VM86 mode.  This appears to be some kind of
attempt to run PXE API calls inside a sandbox.  The VMM is fairly
sophisticated: for example, it handles our attempts to switch into
protected mode and patches our GDT so that our protected-mode code
runs in ring 1 instead of ring 0.  However, it neglects to apply any
memory protections.  In particular, it does not enable paging and
leaves us with 4GB segment limits.  We can therefore trivially break
out of the sandbox by simply overwriting the GDT (or by modifying any
of Tivoli's VMM code or data structures).

When we attempt to execute privileged instructions (such as "lidt"),
the CPU raises an exception and control is passed to the Tivoli VMM.
This may result in a call to Tivoli's memcpy() function.

Tivoli's memcpy() function includes optimisations which use the SSE
registers %xmm0-%xmm3 to speed up aligned memory copies.

Unfortunately, the Tivoli VMM's exception handler does not save or
restore %xmm0-%xmm3.  The net effect of this bug in the Tivoli VMM is
that any privileged instruction (such as "lidt") issued by iPXE may
result in unexpected corruption of the %xmm0-%xmm3 registers.

Even more unfortunately, this problem affects the code path taken in
response to a hardware interrupt from the NIC, since that code path
will call PXENV_UNDI_ISR.  The net effect therefore becomes that any
NIC hardware interrupt (e.g. due to a received packet) may result in
unexpected corruption of the %xmm0-%xmm3 registers.

If a packet arrives while Tivoli is in the middle of using its
memcpy() function, then the unexpected corruption of the %xmm0-%xmm3
registers will result in unexpected corruption in the destination
buffer.  The net effect therefore becomes that any received packet may
result in a 16-byte block of corruption somewhere in any data that
Tivoli copied using its memcpy() function.

We can work around this bug in the Tivoli VMM by saving and restoring
the %xmm0-%xmm3 registers across calls to virt_call().  To work around
the problem, we need to save registers before attempting to execute
any privileged instructions, and ensure that we attempt no further
privileged instructions after restoring the registers.

This is less simple than it may sound.  We can use the "movups"
instruction to save and restore individual registers, but this will
itself generate an undefined opcode exception if SSE is not currently
enabled according to the flags in %cr0 and %cr4.  We can't access %cr0
or %cr4 before attempting the "movups" instruction, because access a
control register is itself a privileged instruction (which may
therefore trigger corruption of the registers that we're trying to
save).

The best solution seems to be to use the "fxsave" and "fxrstor"
instructions.  If SSE is not enabled, then these instructions may fail
to save and restore the SSE register contents, but will not generate
an undefined opcode exception.  (If SSE is not enabled, then we don't
really care about preserving the SSE register contents anyway.)

The use of "fxsave" and "fxrstor" introduces an implicit assumption
that the CPU supports SSE instructions (even though we make no
assumption about whether or not SSE is currently enabled).  SSE was
introduced in 1999 with the Pentium III (and added by AMD in 2001),
and is an architectural requirement for x86_64.  Experimentation with
current versions of gcc suggest that it may generate SSE instructions
even when using "-m32", unless an explicit "-march=i386" or "-mno-sse"
is used to inhibit this.  It therefore seems reasonable to assume that
SSE will be supported on any hardware that might realistically be used
with new iPXE builds.

As a side benefit of this change, the MMX register %mm0 will now be
preserved across virt_call() even in an i386 build of iPXE using a
driver that requires readq()/writeq(), and the SSE registers
%xmm0-%xmm5 will now be preserved across virt_call() even in an x86_64
build of iPXE using the Hyper-V netvsc driver.

Experimentation suggests that this change adds around 10% to the
number of cycles required for a do-nothing virt_call(), most of which
are due to the extra bytes copied using "rep movsb".  Since the number
of bytes copied is a compile-time constant local to librm.S, we could
potentially reduce this impact by ensuring that we always copy a whole
number of dwords and so can use "rep movsl" instead of "rep movsb".

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2016-05-02 14:02:31 +01:00
Michael Brown 2d42d3cff6 [librm] Reduce real-mode stack consumption in virt_call()
Some PXE NBPs are known to make PXE API calls with very little space
available on the real-mode stack.  For example, the Rembo-ia32 NBP
from some versions of IBM's Tivoli Provisioning Manager for Operating
System Deployment (TPMfOSD) will issue calls with the real-mode stack
placed at 0000:03d2; this is at the end of the interrupt vector table
and leaves only 498 bytes of stack space available before overwriting
the hardware IRQ vectors.  This limits the amount of state that we can
preserve before transitioning to protected mode.

Work around these challenging conditions by preserving everything
other than the initial register dump in a temporary static buffer
within our real-mode data segment, and copying the contents of this
buffer to the protected-mode stack.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2016-04-29 11:48:39 +01:00
Michael Brown 5e5450c2d0 [comboot] Support COMBOOT in 64-bit builds
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2016-04-15 15:31:36 +01:00
Michael Brown c4e8c40227 [prefix] Use CRC32 to verify each block prior to decompression
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2016-03-24 16:52:26 +00:00
Michael Brown 3df598849b [pxe] Implicitly open network device in PXENV_UDP_OPEN
Some end-user configurations have been observed in which the first NBP
(such as GRUB2) uses the UNDI API and then transfers control to a
second NBP (such as pxelinux) which uses the UDP API.  The first NBP
closes the network device using PXENV_UNDI_CLOSE, which renders the
UDP API unable to transmit or receive packets.

The correct behaviour under these circumstances is (as often) simply
not documented by the PXE specification.  Testing with the Intel PXE
stack suggests that PXENV_UDP_OPEN will implicitly reopen the network
device if necessary, so match this behaviour.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2016-03-22 17:33:21 +00:00
Michael Brown c32b07b81b [int13] Allow default drive to be specified via "san-drive" setting
The DHCP option 175.189 has been defined (by us) since 2006 as
containing the drive number to be used for a SAN boot, but has never
been automatically used as such by iPXE.

Use this option (if specified) to override the default SAN drive
number.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2016-03-22 09:55:09 +00:00
Michael Brown ab5b3abbba [int13] Allow drive to be hooked using the natural drive number
Interpret the maximum drive number (0xff for hard disks, 0x7f for
floppy disks) as meaning "use natural drive number".

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2016-03-22 09:55:09 +00:00
Michael Brown 311a5732c8 [gdb] Add support for x86_64
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2016-03-22 08:44:32 +00:00
Michael Brown 1afcccd5fd [build] Do not use "objcopy -O binary" for objects with relocation records
The mbr.bin and usbdisk.bin standalone blobs are currently generated
using "objcopy -O binary", which does not process relocation records.

For the i386 build, this does not matter since the section start
address is zero and so the ".rel" relocation records are effectively
no-ops anyway.

For the x86_64 build, the ".rela" relocation records are not no-ops,
since the addend is included as part of the relocation record (rather
than inline).  Using "objcopy -O binary" will silently discard the
relocation records, with the result that all symbols are effectively
given a value of zero.

Fix by using "ld --oformat binary" instead of "objcopy -O binary" to
generate mbr.bin and usbdisk.bin.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2016-03-21 17:49:58 +00:00
Michael Brown 04ef198d2f [efi] Move architecture-independent EFI prefixes to interface/efi
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2016-03-17 14:51:14 +00:00
Michael Brown dbc9e591a5 [test] Move i386-specific tests to arch/i386/tests
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2016-03-17 14:32:00 +00:00
Michael Brown c14971bf88 [xen] Use generic test_and_clear_bit() function
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2016-03-16 22:46:05 +00:00
Michael Brown 9bab13a772 [hyperv] Use generic set_bit() function
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2016-03-16 22:33:41 +00:00
Michael Brown c867b5ab1f [bitops] Add generic atomic bit test, set, and clear functions
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2016-03-16 22:33:40 +00:00
Michael Brown 1f65ed53da [build] Allow assembler section type character to vary by architecture
On some architectures (such as ARM) the "@" character is used as a
comment delimiter.  A section type argument such as "@progbits"
therefore becomes "%progbits".

This is further complicated by the fact that the "%" character has
special meaning for inline assembly when input or output operands are
used, in which cases "@progbits" becomes "%%progbits".

Allow the section type character(s) to be defined via Makefile
variables.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2016-03-13 11:20:53 +00:00
Michael Brown a8037ee131 [efi] Centralise architecture-independent EFI Makefile and linker script
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2016-03-12 21:47:13 +00:00
Michael Brown cc9f31ee0c [librm] Do not unconditionally preserve flags across virt_call()
Commit 196f0f2 ("[librm] Convert prot_call() to a real-mode near
call") introduced a regression in which any deliberate modification to
the low 16 bits of the CPU flags (in struct i386_all_regs) would be
overwritten with the original flags value at the time of entry to
prot_call().

The regression arose because the alignment requirements of the
protected-mode stack necessitated the insertion of two bytes of
padding immediately below the prot_call() return address.  The
solution chosen was to extend the existing "pushfl / popfl" pair to
"pushfw;pushfl / popfl;popfw".  The extra "pushfw / popfw" appears at
first glance to be a no-op, but fails to take into account the fact
that the flags restored by popfl may have been deliberately modified
by the protected-mode function.

Fix by replacing "pushfw / popfw" with "pushw %ss / popw %ss".  While
%ss does appear within struct i386_all_regs, any modification to the
stored value has always been ignored by prot_call() anyway.

The most visible symptom of this regression was that SAN booting would
fail since every INT 13 call would be chained to the original INT 13
vector.

Reported-by: Vishvananda Ishaya <vishvananda@gmail.com>
Reported-by: Jamie Thompson <forum.ipxe@jamie-thompson.co.uk>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2016-03-12 12:39:17 +00:00
Michael Brown d3db00ecf9 [pcbios] Restrict external memory allocations to the low 4GB
When running the 64-bit BIOS version of iPXE, restrict external memory
allocations to the low 4GB to ensure that allocations (such as for
initrds) fall within our identity-mapped memory region, and will be
accessible to the potentially 32-bit operating system.

Move largest_memblock() back to memtop_umalloc.c, since this change
imposes a restriction that applies only to BIOS builds.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2016-03-08 13:25:09 +00:00
Michael Brown 99b5216b1c [librm] Support ioremap() for addresses above 4GB in a 64-bit build
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2016-02-26 15:34:28 +00:00
Michael Brown 5bd8427d3d [ioapi] Split ioremap() out to a separate IOMAP API
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2016-02-26 15:33:40 +00:00
Michael Brown 6143057430 [librm] Add support for running in 64-bit long mode
Add support for running the BIOS version of iPXE in 64-bit long mode.
A 64-bit BIOS version of iPXE can be built using e.g.

  make bin-x86_64-pcbios/ipxe.usb
  make bin-x86_64-pcbios/8086100e.mrom

The 64-bit BIOS version should appear to function identically to the
normal 32-bit BIOS version.  The physical memory layout is unaltered:
iPXE is still relocated to the top of the available 32-bit address
space.  The code is linked to a virtual address of 0xffffffffeb000000
(in the negative 2GB as required by -mcmodel=kernel), with 4kB pages
created to cover the whole of .textdata.  2MB pages are created to
cover the whole of the 32-bit address space.

The 32-bit portions of the code run with VIRTUAL_CS and VIRTUAL_DS
configured such that truncating a 64-bit virtual address gives a
32-bit virtual address pointing to the same physical location.

The stack pointer remains as a physical address when running in long
mode (although the .stack section is accessible via the negative 2GB
virtual address); this is done in order to simplify the handling of
interrupts occurring while executing a portion of 32-bit code with
flat physical addressing via PHYS_CODE().

Interrupts may be enabled in either 64-bit long mode, 32-bit protected
mode with virtual addresses, 32-bit protected mode with physical
addresses, or 16-bit real mode.  Interrupts occurring in any mode
other than real mode will be reflected down to real mode and handled
by whichever ISR is hooked into the BIOS interrupt vector table.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2016-02-24 03:10:12 +00:00
Michael Brown e2cf3138f0 [librm] Rename prot_call() to virt_call()
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2016-02-22 00:50:32 +00:00
Michael Brown 4c1f2486e6 [librm] Support userptr_t in 64-bit builds
In a 64-bit build, the entirety of the 32-bit address space is
identity-mapped and so any valid physical address may immediately be
used as a virtual address.  Conversely, a virtual address that is
already within the 32-bit address space may immediately be used as a
physical address.

A valid virtual address that lies outside the 32-bit address space
must be an address within .textdata, and so can be converted to a
physical address by adding virt_offset.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2016-02-21 11:37:37 +00:00
Michael Brown b6ebafe1bb [librm] Mark virt_offset, text16, data16, rm_cs, and rm_ds as constant
The physical locations of .textdata, .text16 and .data16 are constant
from the point of view of C code.  Mark the relevant variables as
constant to allow gcc to optimise out redundant reads.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2016-02-21 11:13:04 +00:00
Michael Brown 5fbfe50ccb [librm] Do not preserve flags unnecessarily
No callers of prot_to_phys, phys_to_prot, or intr_to_prot require the
flags to be preserved.  Remove the unnecessary pushfl/popfl pairs.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2016-02-21 01:01:28 +00:00
Michael Brown ea203e4fe1 [librm] Add phys_call() wrapper for calling code with physical addressing
Add a phys_call() wrapper function (analogous to the existing
real_call() wrapper function) for calling code with flat physical
addressing, and use this wrapper within the PHYS_CODE() macro.

Move the relevant functionality inside librm.S, where it more
naturally belongs.

The COMBOOT code currently uses explicit calls to _virt_to_phys and
_phys_to_virt.  These will need to be rewritten if our COMBOOT support
is ever generalised to be able to run in a 64-bit build.
Specifically:

  - com32_exec_loop() should be restructured to use PHYS_CODE()

  - com32_wrapper.S should be restructured to use an equivalent of
    prot_call(), passing parameters via a struct i386_all_regs

  - there appears to be no need for com32_wrapper.S to switch between
    external and internal stacks; this could be omitted to simplify
    the design.

For now, librm.S continues to expose _virt_to_phys and _phys_to_virt
for use by com32.c and com32_wrapper.S.  Similarly, librm.S continues
to expose _intr_to_virt for use by gdbidt.S.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2016-02-20 23:09:36 +00:00
Michael Brown a4923354e3 [build] Fix building on older versions of binutils
Some older versions of binutils have issues with both the use of
PROVIDE() and the interpretation of numeric literals within a section
description.

Work around these older versions by defining the required numeric
literals outside of any section description, and by automatically
determining whether or not to generate extra space for page tables
rather than relying on LDFLAGS.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2016-02-19 20:03:30 +00:00
Michael Brown 163f8acba0 [librm] Generate page tables for 64-bit builds
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2016-02-19 03:21:38 +00:00
Michael Brown d1562c38a6 [librm] Prepare for long-mode memory map
The bulk of the iPXE binary (the .textdata section) is physically
relocated at runtime to the top of the 32-bit address space in order
to allow space for an OS to be loaded.  The relocation is achieved
with the assistance of segmentation: we adjust the code and data
segment bases so that the link-time addresses remain valid.

Segmentation is not available (for normal code and data segments) in
long mode.  We choose to compile the C code with -mcmodel=kernel and
use a link-time address of 0xffffffffeb000000.  This choice allows us
to identity-map the entirety of the 32-bit address space, and to alias
our chosen link-time address to the physical location of our .textdata
section.  (This requires the .textdata section to always be aligned to
a page boundary.)

We simultaneously choose to set the 32-bit virtual address segment
bases such that the link-time addresses may simply be truncated to 32
bits in order to generate a valid 32-bit virtual address.  This allows
symbols in .textdata to be trivially accessed by both 32-bit and
64-bit code.

There is no (sensible) way in 32-bit assembly code to generate the
required R_X86_64_32S relocation records for these truncated symbols.
However, subtracting the fixed constant 0xffffffff00000000 has the
same effect as truncation, and can be represented in a standard
R_X86_64_32 relocation record.  We define the VIRTUAL() macro to
abstract away this truncation operation, and apply it to all
references by 32-bit (or 16-bit) assembly code to any symbols within
the .textdata section.

We define "virt_offset" for a 64-bit build as "the value to be added
to an address within .textdata in order to obtain its physical
address".  With this definition, the low 32 bits of "virt_offset" can
be treated by 32-bit code as functionally equivalent to "virt_offset"
in a 32-bit build.

We define "text16" and "data16" for a 64-bit build as the physical
addresses of the .text16 and .data16 sections.  Since a physical
address within the 32-bit address space may be used directly as a
64-bit virtual address (thanks to the identity map), this definition
provides the most natural access to variables in .text16 and .data16.
Note that this requires a minor adjustment in prot_to_real(), which
accesses .text16 using 32-bit virtual addresses.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2016-02-19 02:58:09 +00:00
Michael Brown bfe6e3e90e [relocate] Preserve page alignment during relocation
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2016-02-19 02:39:56 +00:00
Michael Brown 6eb1c927a3 [librm] Transition to protected mode within init_librm()
Long-mode operation will require page tables, which are too large to
sensibly fit in our .data16 segment in base memory.

Add a portion of init_librm() running in 32-bit protected mode to
provide access to high memory.  Use this portion of init_librm() to
initialise the .textdata variables "virt_offset", "text16", and
"data16", eliminating the redundant (re)initialisation currently
performed on every mode transition as part of real_to_prot().

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2016-02-19 01:01:27 +00:00
Michael Brown 31b5c2e753 [librm] Provide an abstraction wrapper for prot_call
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2016-02-18 23:23:38 +00:00