Script '/data/test/perform/bin/cvsbuild.pl -h root@ot15 -d 2025-01-13T19:57:29Z -P /home/bluhm/mvs-somove-lock.diff align' started at 2025-01-14T01:53:53Z. Command 'ssh root@ot15 sysctl kern.version hw.machine hw.ncpu hw.ncpuonline' started. Command 'ssh root@ot15 sysctl kern.version hw.machine hw.ncpu hw.ncpuonline' finished. Command 'ssh root@ot15 cd /usr/src && cvs -qR up -C sys' started. Command 'ssh root@ot15 cd /usr/src && cvs -qR up -C sys' finished. Command 'ssh root@ot15 cd /usr/src/sys && patch -ENufF0 -p0' started. Hmm... Looks like a unified diff to me... The text leading up to this was: -------------------------- |On Tue, Dec 24, 2024 at 08:53:14PM +0300, Vitaliy Makkoveev wrote: |> On Tue, Dec 24, 2024 at 07:59:38PM +0300, Vitaliy Makkoveev wrote: |> > > On 24 Dec 2024, at 19:49, Alexander Bluhm wrote: |> > > |> > > On Sat, Dec 21, 2024 at 08:16:05AM +0300, Vitaliy Makkoveev wrote: |> > >> This diff only unlocks sosend() path, all the rest is still locked |> > >> exclusively. |> > >> |> > >> Even for tcp(4) case, sosend() only checks `so_snd' free space and |> > >> sleeps if necessary, actual buffer handling happening within exclusively |> > >> locked PCB layer. In the PCB layer we don't need to protect read-only |> > >> access to `so_snd' because corresponding read-write access is serialized |> > >> by socket lock. |> > >> |> > >> sosend() needs to be serialized with somove(), so sotask() takes |> > >> sblock() on `so_snd' of spliced socket. This discards previous |> > >> protection of tcp(4) spliced sockets where solock() was used to prevent |> > >> concurrent unsplicing. This scheme was used to avoid sleep in sofree(). |> > >> |> > >> However, sleep is sofree() is possible. We have two cases: |> > >> |> > >> 1. Socket was not yet accept(2)ed. Such sockets can't be accessed from |> > >> the userland and can't be spliced. sofree() could be called only from |> > >> PCB layer and don't need to sleep. |> > >> |> > >> 2. Socket was accepted. While called form PCB layer, sofree() ignores it |> > >> because SS_NOFDREF bit is not set. Socket remains spliced, but without |> > >> PCB. Sockets without PCB can't be accessed from PCB layer, only from |> > >> userland. So, soclose()/sofree() thread needs to wait concurrent |> > >> soclose()/sofree() thread for spliced socket as is is already done for |> > >> udp(4) case. In such case it is safe to release solock() and sleep |> > >> within sofree(). |> > >> |> > >> Note, I intentionally left all "if (dosolock)" dances as is. |> > > |> > > While running your diff though my performnce tests, I got this panic |> > > twice. I have never seen it before, so I guess it is related. |> > > |> > > panic: softclock: invalid to_clock: 19668320 |> > > Stopped at db_enter+0x14: popq %rbp |> > > TID PID UID PRFLAGS PFLAGS CPU COMMAND |> > > 120146 91986 89 0x1100012 0 1 relayd |> > > 65124 13759 89 0x1100012 0 3 relayd |> > > 33160 19438 0 0x14000 0x200 2 softnet0 |> > > db_enter() at db_enter+0x14 |> > > panic(ffffffff82883be3) at panic+0xdd |> > > softclock(0) at softclock+0x1d5 |> > > softintr_dispatch(0) at softintr_dispatch+0xe6 |> > > Xsoftclock() at Xsoftclock+0x27 |> > > acpicpu_idle() at acpicpu_idle+0x2b9 |> > > sched_idle(ffffffff83593ff0) at sched_idle+0x298 |> > > end trace frame: 0x0, count: 8 |> > > https://www.openbsd.org/ddb.html describes the minimum info required in bug |> > > reports. Insufficient info makes it difficult to find and fix bugs. |> > > |> > > ddb{0}> x/s version |> > > version: OpenBSD 7.6-current (GENERIC.MP) #cvs : D2024.12.23.00.00.00: Tue Dec 24 14:53:39 CET 2024 |> > > root@ot15.obsd-lab.genua.de:/usr/src/sys/arch/amd64/compile/GENERIC.MP |> > > |> > > ddb{0}> show panic |> > > *cpu0: softclock: invalid to_clock: 19668320 |> > |> > Interesting. Which test triggers this? I suspect sosplice. |> |> This diff contains only sosplice() relared hunks. Regardless `so_snd' |> unlocking, our current sosend() releases solock() around uiomove(), so |> concurrent somove() thread could break us. This diff prevents this by |> taking sblock() on `so_snd' of spliced socket. |> |> I performed couple of kern/sosplice runs without any issue. Alexander, |> could you test it too? I suspect the problem could be in timeout |> reinitialisation in sofree(), but I can't trigger it during regress run. |> | |This is the updated diff. It uses barriers instead of `ssp_idleto' |timeout and `ssp_task' task redefinition. | |Index: sys/kern/uipc_socket.c |=================================================================== |RCS file: /cvs/src/sys/kern/uipc_socket.c,v |diff -u -p -r1.347 uipc_socket.c |--- kern/uipc_socket.c 19 Dec 2024 22:11:35 -0000 1.347 |+++ kern/uipc_socket.c 24 Dec 2024 19:12:56 -0000 -------------------------- Patching file kern/uipc_socket.c using Plan A... Hunk #1 failed at 62. Hunk #2 failed at 325. Hunk #3 failed at 353. Hunk #4 failed at 445. Hunk #5 failed at 496. Hunk #6 failed at 1502. Hunk #7 failed at 1556. Hunk #8 failed at 1582. Hunk #9 failed at 1601. 9 out of 9 hunks failed--saving rejects to kern/uipc_socket.c.rej done Command 'ssh root@ot15 cd /usr/src/sys && patch -ENufF0 -p0' failed: 256 at /data/test/perform/bin/Machine.pm line 187. Command 'ssh root@ot15 cd /usr/src/sys && patch -ENufF0 -p0' failed: 256 at /data/test/perform/bin/Machine.pm line 187.