created | 2021-03-14T00:51:07Z |
---|---|
begin | 2021-03-06T00:00:00Z |
end | 2021-03-07T00:00:00Z |
path | src/sys |
commits | 4 |
date | 2021-03-06T09:20:49Z | |||
---|---|---|---|---|
author | jsg | |||
files | src/sys/dev/rnd.c | log | diff | annotate |
 | src/sys/kern/genassym.sh | log | diff | annotate |
 | src/sys/kern/uipc_mbuf.c | log | diff | annotate |
message | ansi |
date | 2021-03-06T09:20:50Z | |||
---|---|---|---|---|
author | jsg | |||
files | src/sys/arch/i386/isa/isa_machdep.c | log | diff | annotate |
 | src/sys/arch/i386/pci/pci_intr_fixup.c | log | diff | annotate |
 | src/sys/dev/microcode/adw/adwmcode.c | log | diff | annotate |
 | src/sys/lib/libkern/muldi3.c | log | diff | annotate |
message | ansi |
date | 2021-03-06T19:25:27Z | |||
---|---|---|---|---|
author | patrick | |||
files | src/sys/arch/arm64/dev/smmu.c | log | diff | annotate |
message |
One major issue discussed in research papers is reducing the overhead of IOVA allocation. As far as I can see, the current "best solution" is to cache IOVA ranges in per-CPU magazines. I don't think we have this issue at all, thanks to bus_dmamap_create(9): the map is created ahead of time, and we know the maximum size of the DMA transfer. Since with smmu(4) we have IOVA per domain, allocating IOVA 'early' is essentially free. Pagetable mapping also incurs a performance penalty, because we allocate pagetable entry descriptors through pools. Since we have the IOVA early, we can allocate those early as well. This allocation is a bit more expensive, but it can be optimized further. All this means that there is no allocation overhead in hot code paths. The "only" things remaining are assigning IOVA to the segments, adjusting the pagetable mappings, and flushing the IOTLB on unload. Maybe there's a way to do a combined flush for NICs: we give a list of mbufs to the network stack, so we could do the IOTLB invalidation only once, right before we hand the mbuf list over to the upper layers. |
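
The split between map creation and map load described in this message can be illustrated with a small userland sketch. This is not the smmu(4) code: every name here (`sk_domain`, `sk_pted`, `sk_map_create`, `sk_map_load`, `sk_map_unload`, `sk_map_destroy`) is hypothetical, the IOVA allocator is assumed to be a simple per-domain bump pointer, and the IOTLB flush is modelled as a counter. It only shows that, once the IOVA range and the descriptors are reserved at create time, load and unload need no allocation.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define SK_PAGE_SIZE 4096u

/* Hypothetical per-domain state: a bump allocator over the IOVA space. */
struct sk_domain {
	uint64_t next_iova;	/* next unused I/O virtual address */
	unsigned flushes;	/* stand-in for IOTLB invalidations issued */
};

/* Hypothetical pagetable entry descriptor, one per page of the map. */
struct sk_pted {
	uint64_t iova;
	uint64_t pa;
	int valid;
};

struct sk_map {
	struct sk_domain *dom;
	uint64_t iova_base;
	size_t npages;
	struct sk_pted *pted;
};

/* Slow path: reserve IOVA and descriptors for the maximum transfer size. */
struct sk_map *
sk_map_create(struct sk_domain *dom, size_t maxsize)
{
	struct sk_map *map;
	size_t i;

	if (maxsize == 0 || (map = malloc(sizeof(*map))) == NULL)
		return NULL;
	map->dom = dom;
	map->npages = (maxsize + SK_PAGE_SIZE - 1) / SK_PAGE_SIZE;
	map->pted = calloc(map->npages, sizeof(*map->pted));
	if (map->pted == NULL) {
		free(map);
		return NULL;
	}
	/* The IOVA range is private to the domain, so a bump is enough here. */
	map->iova_base = dom->next_iova;
	dom->next_iova += (uint64_t)map->npages * SK_PAGE_SIZE;
	for (i = 0; i < map->npages; i++)
		map->pted[i].iova = map->iova_base + i * SK_PAGE_SIZE;
	return map;
}

/* Hot path: no allocation; pa is assumed to be page aligned. */
int
sk_map_load(struct sk_map *map, uint64_t pa, size_t len)
{
	size_t i, n = (len + SK_PAGE_SIZE - 1) / SK_PAGE_SIZE;

	if (n > map->npages)
		return -1;
	for (i = 0; i < n; i++) {
		map->pted[i].pa = pa + i * SK_PAGE_SIZE;
		map->pted[i].valid = 1;
	}
	return 0;
}

/* Hot path: drop the mappings and count one IOTLB flush per unload. */
void
sk_map_unload(struct sk_map *map)
{
	size_t i;

	for (i = 0; i < map->npages; i++)
		map->pted[i].valid = 0;
	map->dom->flushes++;
}

void
sk_map_destroy(struct sk_map *map)
{
	free(map->pted);
	free(map);
}
```

In this sketch a driver would call `sk_map_create()` once at attach time and then only `sk_map_load()` and `sk_map_unload()` per transfer. The combined flush suggested for NICs would correspond to incrementing `flushes` once for a whole batch of unloads rather than once per map.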
date | 2021-03-06T19:30:07Z | |||
---|---|---|---|---|
author | patrick | |||
files | src/sys/arch/arm64/dev/smmu.c | log | diff | annotate |
message |
Since with the current design there's one device per domain, and one domain per pagetable, there's no need for a backpointer to the domain in the pagetable entry descriptor. There can't be any other domain. Also, since there's no list, no list entry member is needed either. This halves the size of the early allocation. I think it's possible to reduce it even further and not need a pagetable entry descriptor at all, but I need to think about that a bit more. |
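
The effect of dropping the backpointer and the list entry can be sketched as a before/after pair of structs. The layouts below are assumptions for illustration, not the actual descriptor in smmu.c: the field names, the two-pointer list entry (modelled on the queue(3) `LIST_ENTRY` shape) and the remaining `va`/`pte` members are all hypothetical, so the size ratio here need not match the "half" stated in the message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct sk_domain;	/* opaque stand-in for the SMMU domain */

/* Before: the descriptor links into a list and points back to its domain. */
struct sk_pted_old {
	struct {
		struct sk_pted_old *le_next;
		struct sk_pted_old **le_prev;
	} list;				/* list entry: unused without a list */
	struct sk_domain *dom;		/* backpointer: implied by the pagetable */
	uint64_t va;
	uint64_t pte;
};

/* After: only the per-mapping data remains. */
struct sk_pted_new {
	uint64_t va;
	uint64_t pte;
};

/* The trimmed descriptor must be strictly smaller than the old one. */
static_assert(sizeof(struct sk_pted_new) < sizeof(struct sk_pted_old),
    "removing members should shrink the descriptor");

/* Bytes reserved up front for a map covering npages pages. */
size_t
sk_early_bytes_old(size_t npages)
{
	return npages * sizeof(struct sk_pted_old);
}

size_t
sk_early_bytes_new(size_t npages)
{
	return npages * sizeof(struct sk_pted_new);
}
```

Because one descriptor is reserved per page of every DMA map, any per-descriptor saving is multiplied by the total number of pages across all maps created at attach time.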