I am playing with some changes to suspend / resume code and I see some
behavior that I didn't quite expect. Here is a sketch of the code.
- AP:  atomic_add_rel_int(&suspend_count, 1) // lock add $1, addr
- AP:  while (1) cpu_spinwait()
- BSP: while (suspend_count < expect_count) cpu_spinwait()
- BSP: enter ACPI suspend
What I see is that, after the resume, suspend_count may hold a value
different from what it had before the suspend (expect_count). In fact,
it has been zero in every test I have done so far.
If I move wbinvd to a position just after the suspend_count increment,
then the post-resume value is consistently the expected value.
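For clarity, the AP-side ordering that produces the expected post-resume value would then be the following (a sketch only; wbinvd is the privileged write-back-and-invalidate instruction, and whether it is the right tool here is exactly what I am asking about):

- AP: atomic_add_rel_int(&suspend_count, 1)
- AP: wbinvd // write this AP's dirty lines back to memory
- AP: while (1) cpu_spinwait()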
It appears that the APs' updates to suspend_count may never make it to
main memory unless the APs' caches are flushed afterwards. This is the
part that I find surprising.
I will double-check the code and the test to be more confident that what
I described is what actually happens.
The hardware is AMD and the architecture is amd64.
I understand that on modern hardware even lock-prefixed stores do not
have to go to main memory immediately, as they can be handled entirely
by the cache coherency protocol. For AMD that protocol is MOESI. My
guess is that one of the APs' caches holds the line in the Owned
(dirty) state while the BSP has it only in the Shared state, so a cache
flush on the BSP did nothing to write the dirty value back, and the
owning AP never got a chance to write it back before the suspend.
Just wanted to share this and get some feedback on whether the theory is
plausible.