Allow sharing return types with the rest of the code base. For example,
we use 'u128 = std::array<u64, 2>', meanwhile Google's code uses
'uint128 = std::pair<u64, u64>'.
While we are at it, use size_t instead of std::size_t.
Previous to this commit, the tests were using operator[] from
unordered_map to query elements but this silently inserts empty elements
when they don't exist. If all threads were executed without concurrency,
this wouldn't be an issue, but the same unordered_map could be written
from two threads at the same time. This is a data race and makes some
previously inserted elements invisible for a short period of time,
causing them to insert and return an empty element. This default
constructed element (a zero) was used to index an array of fibers that
asserted when one of them was nullptr, shutting the test session off.
To address this issue, lock on thread id reads and writes. This could be
a shared mutex to allow concurrent reads, but the definition of
std::this_thread::get_id is fuzzy when using non-standard techniques
like fibers. I opted to use a standard mutex.
While we are at it, fix the included headers.