OpenBSD cvs log

created 2022-08-14T03:21:41Z
begin 2022-08-12T00:00:00Z
end 2022-08-13T00:00:00Z
path src/sys
commits 18

date 2022-08-12T00:32:59Z
author krw
files src/sys/kern/subr_disk.c
message Coverity says multiplying two uint32_t's and assigning the result to
a uint64_t may not produce the (humanly) obvious result.

Cast one of them to a (uint64_t) in the hope of invoking the
appropriate int promotion god.

CID 1519495
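
To illustrate the promotion issue the message above describes, here is a minimal sketch. It assumes a platform where int is 32 bits wide, so uint32_t operands are multiplied in 32-bit unsigned arithmetic. The function names mul_narrow() and mul_wide() are hypothetical and do not come from subr_disk.c.

```c
#include <stdint.h>

/*
 * Hypothetical example: the product is computed in 32 bits, wraps
 * modulo 2^32, and only then is widened to 64 bits on return.
 */
uint64_t
mul_narrow(uint32_t a, uint32_t b)
{
	return a * b;
}

/*
 * Casting one operand first makes the whole multiplication happen
 * in 64 bits, so the full product is preserved.
 */
uint64_t
mul_wide(uint32_t a, uint32_t b)
{
	return (uint64_t)a * b;
}
```

With a = b = 65536, mul_narrow() returns 0 because 2^32 wraps to zero in 32 bits, while mul_wide() returns 4294967296.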

date 2022-08-12T02:20:36Z
author cheloha
files src/sys/arch/amd64/amd64/cpu.c
src/sys/arch/amd64/amd64/tsc.c
src/sys/arch/amd64/include/cpu.h
src/sys/arch/amd64/include/cpuvar.h
src/sys/kern/kern_tc.c
src/sys/sys/timetc.h
message amd64: simplify TSC synchronization testing

Computing a per-CPU TSC skew value is error-prone, especially on
multisocket machines and VMs. My best guess is that larger latencies
appear to the current skew measurement test as TSC desync, and so the
TSC is demoted to a kernel timecounter on these machines or marked
non-monotonic.

This patch eliminates per-CPU TSC skew values. Instead of trying to
measure and correct for TSC desync we only try to detect desync, which
is less error-prone. This approach should allow a wider variety of
machines to use the TSC as a timecounter when running OpenBSD.

In the new sync test, both CPUs repeatedly try to detect whether their
TSC is trailing the other CPU's TSC. The upside to this approach is
that it yields no false positives. The downside to this approach is
that it takes more time than the current skew measurement test. Each
test round takes 1ms, and we run up to two rounds per CPU, so this
patch slows boot down by up to 2ms per AP.
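
The lag check in the paragraph above can be sketched as follows. This is one reading of the description, not the committed tsc.c code: it assumes each sample pairs a counter value that the other CPU published with a local counter value read afterwards. The helper tsc_max_lag() and its parameters are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper. remote[i] is a counter value the other CPU
 * published BEFORE this CPU read local[i]. With synchronized
 * counters a later read can never be smaller than an earlier one,
 * so remote[i] > local[i] proves the local counter is trailing.
 * Returns the largest lag observed, or 0 if no lag was seen.
 */
uint64_t
tsc_max_lag(const uint64_t *remote, const uint64_t *local, size_t n)
{
	uint64_t lag, max = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (remote[i] > local[i]) {
			lag = remote[i] - local[i];
			if (lag > max)
				max = lag;
		}
	}
	return max;
}
```

Under these assumptions, extra latency between the two reads only makes local[i] larger. Latency can therefore hide a small desync but cannot create a false positive, which matches the trade-off described in the message.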

If any CPU fails the sync test, the TSC is marked non-monotonic and a
different timecounter is activated. The TC_USER flag remains intact.
There is no middle ground where we fall back to only using the TSC in
the kernel.

Before running the test, we check for the IA32_TSC_ADJUST register and
reset it if necessary. This is a trivial way to work around firmware
bugs that desync the TSC before we reach the kernel. Unfortunately,
at the moment this register appears to only be available on Intel
processors. I cannot find an equivalent but differently-named MSR for
AMD processors.

Because there is no per-CPU skew value, there is also no concept of
TSC drift anymore.

Miscellaneous notes:

- This patch adds a new timecounter utility function, tc_reset_quality().
It is used after a sync test failure to mark the TSC non-monotonic.

- I have left TSC_DEBUG enabled for now. Unsure if we should leave it
enabled for release or not. If we disable it we no longer run the
sync test after failing it once. Running the test even after failure
provides information about the desync on every CPU.

- Taking 1ms per test round is fairly conservative. We can experiment
with and discuss shorter test rounds. My main goal with a relatively
long test round is ensuring VMs actually run the test. It would be
bad if a hypervisor interrupted the test for so long that it concealed
desync.

- The use of two test rounds is mostly a diagnostic tool: it would be
very strange if a CPU passed the first round but failed the second.
If we ever saw this in the wild it would indicate something odd.

- Most of the desync seen in test reports is on Ryzen CPUs. I
believe, but cannot prove, that this is due to a widespread
firmware bug on AMD motherboards. Hopefully AMD and/or the
downstream vendors fix it.

- Fixing TSC desync by writing the TSC directly with WRMSR is very
difficult. The TSC is a moving target incrementing very quickly and
compensating for WRMSR overhead is non-trivial. We can experiment
with this, but my confidence is low that we can make it work reliably.

Prompted by deraadt@ and kettenis@ in 2021. Shepherded along by
deraadt@ throughout. Reprompted by Yuichiro Naito several times.
With input from Yuichiro Naito, naddy@, sthen@, dv@, and deraadt@.

Tested by florian@, gnezdo@, sthen@, Josh Rickmar, dv@, Mohamed Aslan,
Hrvoje Popovski, Yuichiro Naito, semarie@, mlarkin@, asou@, jmatthew@,
Renato Aguiar, and Timo Myyra.

Patch v1: https://marc.info/?l=openbsd-tech&m=164330092208035&w=2
Patch v2: https://marc.info/?l=openbsd-tech&m=164558519712957&w=2
Patch v3: https://marc.info/?l=openbsd-tech&m=165698681018991&w=2
Patch v4: https://marc.info/?l=openbsd-tech&m=165835507113680&w=2
Patch v5: https://marc.info/?l=openbsd-tech&m=165923705118770&w=2

"just commit it" deraadt@

date 2022-08-12T08:31:06Z
author jsg
files src/sys/arch/m88k/m88k/trap.c
message use string literal for format string
ok miod@

date 2022-08-12T08:34:43Z
author jsg
files src/sys/arch/hppa/hppa/db_disasm.c
message use string literal for format string
ok deraadt@ miod@

date 2022-08-12T12:08:54Z
author bluhm
files src/sys/netinet6/ip6_input.c
message On successful return, ip6_check_rh0hdr() leaves *offp unmodified.
The IPv6 routing header type 0 check should modify *offp only in
case of an error, so that the generated icmp6 packet has the correct
pointer.
OK sashan@
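
The convention described above, where an in/out offset is written only on the error path, can be sketched as follows. This is a hypothetical byte scanner, not ip6_check_rh0hdr(): the names scan_records(), REC_END and REC_BAD are invented for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define REC_END	0x00	/* hypothetical terminator byte */
#define REC_BAD	0xff	/* hypothetical forbidden byte */

/*
 * Scan buf starting at *offp. On success return 0 and leave *offp
 * untouched. On error return -1 and set *offp to the offset of the
 * offending byte, so the caller can report exactly where it was.
 */
int
scan_records(const uint8_t *buf, size_t len, size_t *offp)
{
	size_t off;

	for (off = *offp; off < len; off++) {
		if (buf[off] == REC_END)
			return 0;	/* success: *offp unchanged */
		if (buf[off] == REC_BAD) {
			*offp = off;	/* error: point at the bad byte */
			return -1;
		}
	}
	return 0;
}
```

Walking with a local copy of the offset keeps the caller's value intact unless an error has to be reported.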

date 2022-08-12T13:36:19Z
author aoyama
files src/sys/arch/luna88k/dev/lunafb.c
message Make the 1bpp Xorg server work again on 1bpp framebuffer hardware.

The recent xenocara wsfb driver can handle LUNA's framebuffer "offset", but
it requires mmap() to map one more page when the offset is used.

Noticed and tested on the nono emulator with a 1bpp setting.

date 2022-08-12T14:30:52Z
author visa
files src/sys/kern/vfs_bio.c
src/sys/kern/vfs_subr.c
src/sys/msdosfs/msdosfs_vfsops.c
message Put more struct vnode fields under splbio().

Buffer cache related struct vnode fields can be accessed in interrupt
context. Be more consistent with the use of splbio().

OK mpi@

date 2022-08-12T14:30:53Z
author visa
files src/sys/nfs/nfs_vfsops.c
src/sys/nfs/nfs_vnops.c
src/sys/sys/vnode.h
src/sys/ufs/ext2fs/ext2fs_inode.c
src/sys/ufs/ext2fs/ext2fs_vfsops.c
src/sys/ufs/ffs/ffs_vfsops.c
message Put more struct vnode fields under splbio().

Buffer cache related struct vnode fields can be accessed in interrupt
context. Be more consistent with the use of splbio().

OK mpi@

date 2022-08-12T14:49:15Z
author bluhm
files src/sys/netinet/ip_input.c
src/sys/netinet/ip_var.h
src/sys/netinet6/ip6_input.c
src/sys/netinet6/ip6_var.h
message There are some places in ip and ip6 input where operations fail
because memory is exhausted. Use a generic idropped counter for those.
OK mvs@

date 2022-08-12T16:38:09Z
author mvs
files src/sys/net/if_pflow.c
message Fix race between pflow_output_process() and pflow_clone_destroy().

Unlink the pflow(4) interface from `pflowif_list' before starting destruction
to prevent pflow_output_process() from being rescheduled. Also wait until any
running pflow_output_process() task has finished.

Problem reported and fix tested by Hrvoje Popovski.

ok bluhm@

date 2022-08-12T16:38:50Z
author mvs
files src/sys/net/if_pflow.c
src/sys/net/if_pflow.h
message Remove unused fields from 'pflow_softc' structure.

ok bluhm@

date 2022-08-12T16:42:54Z
author bluhm
files src/sys/net/if_media.c
message Fix non-working continue in do { } while (0) loop.
OK mvs@ jca@
CID 1519492
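
The pitfall named in the message above can be shown with a small sketch. Inside do { } while (0), a continue jumps to the loop condition, which is false, so the loop ends instead of repeating. The functions below are hypothetical and are not the if_media.c code; the log does not show how the actual fix was made.

```c
#include <stdbool.h>

/*
 * Hypothetical example: returns how many times the body ran.
 * "continue" jumps to the "while (0)" test, which is false, so the
 * body runs exactly once even when a retry was intended.
 */
int
count_passes_do_while(bool retry)
{
	int passes = 0;

	do {
		passes++;
		if (retry && passes < 3)
			continue;
	} while (0);
	return passes;
}

/*
 * One possible alternative: a real loop with an explicit break,
 * where "continue" does restart the body.
 */
int
count_passes_for(bool retry)
{
	int passes = 0;

	for (;;) {
		passes++;
		if (retry && passes < 3)
			continue;
		break;
	}
	return passes;
}
```

With retry set, count_passes_do_while() returns 1 while count_passes_for() returns 3.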

date 2022-08-12T17:04:16Z
author bluhm
files src/sys/netinet/ip_output.c
message Remove differences between ip_fragment() and ip6_fragment(). They
do nearly the same thing, so they should look similar.
OK sashan@

date 2022-08-12T17:04:17Z
author bluhm
files src/sys/netinet6/ip6_output.c
message Remove differences between ip_fragment() and ip6_fragment(). They
do nearly the same thing, so they should look similar.
OK sashan@

date 2022-08-12T17:19:52Z
author miod
files src/sys/arch/alpha/alpha/trap.c
src/sys/arch/hppa/hppa/trap.c
message Make sure we don't pass uninitialized siginfo values to trapsignal(); from
clang via jsg@, ok jsg@

date 2022-08-12T20:05:49Z
author krw
files src/sys/kern/subr_disk.c
message Revert to the more laissez-faire pre-r1.249 checks for valid MBR
partitions.

miod@ (re)discovered an off-by-one in some device size
calculations. Whether this is the ancient misbehaviour of some
devices confusing the number of sectors with the highest valid
sector address, or something newer, is unclear.

Should fix miod@'s octeon boot disk.
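
The off-by-one mentioned above can be illustrated with a sketch. It assumes a device that reports its highest valid sector address where a sector count is expected, so the reported size is one sector short. The functions part_fits_strict() and part_fits_lenient() are hypothetical and do not reproduce the actual subr_disk.c checks.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical strict check: the partition must lie entirely within
 * disksize sectors. Written to avoid overflow in start + size.
 */
bool
part_fits_strict(uint64_t start, uint64_t size, uint64_t disksize)
{
	return size > 0 && start < disksize && size <= disksize - start;
}

/*
 * Hypothetical lenient check: tolerate one sector of slack, in case
 * "reported" is really the highest valid sector address rather than
 * the number of sectors.
 */
bool
part_fits_lenient(uint64_t start, uint64_t size, uint64_t reported)
{
	if (reported == UINT64_MAX)
		return part_fits_strict(start, size, reported);
	return part_fits_strict(start, size, reported + 1);
}
```

For a 1000-sector disk whose device reports 999, a partition covering sectors 64 through 999 fails the strict check but passes the lenient one.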

date 2022-08-12T20:17:46Z
author stsp
files src/sys/arch/amd64/stand/efi32/efidev.c
src/sys/arch/amd64/stand/efi64/efidev.c
src/sys/arch/amd64/stand/efiboot/efidev.c
src/sys/arch/amd64/stand/libsa/biosdev.c
src/sys/arch/amd64/stand/libsa/softraid_amd64.c
src/sys/lib/libsa/softraid.c
message add support for booting from RAID 1C softraid(4) volumes on amd64

Only boot-loader changes are needed. Both installboot(8) and
the kernel already do what is required to make this work.

ok kn@

Tested:
biosboot on vmm: kn, stsp
biosboot and efiboot on server hardware: stsp

date 2022-08-12T20:18:58Z
author stsp
files src/sys/arch/amd64/stand/boot/conf.c
src/sys/arch/amd64/stand/cdboot/conf.c
src/sys/arch/amd64/stand/efi32/conf.c
src/sys/arch/amd64/stand/efi64/conf.c
src/sys/arch/amd64/stand/efiboot/conf.c
src/sys/arch/amd64/stand/pxeboot/conf.c
message Crank amd64 boot loader version numbers for softraid(4) RAID 1C boot support.