Research

I am currently looking into lock-free programming and into hardware (H/W) semantics that could ease the effort of developing lock-free code without compromising on performance and scalability guarantees. A large body of lock-free literature argues for stronger atomicity guarantees, such as MCAS (multi-word compare-and-swap), because even though HTM (hardware transactional memory) may satisfy the atomicity and non-blocking requirements, it is inefficient. My PhD work aims to bring the two together by providing a fast, lightweight HTM that addresses these issues effectively. In my research, I found that lock-free programming needs a set of hardware atomics, not just MCAS. In the absence of such instructions, we improve HTM, and in particular its underlying coherence mechanism, so that all the needed atomics can be modeled using HTM as a building block.
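To make the MCAS requirement concrete, here is a minimal Python sketch of the semantics only, under stated assumptions. `ToyHTM`, `atomically`, `mcas`, and `fetch_add` are hypothetical names, and a single global lock stands in for the hardware. The sketch therefore models the all-or-nothing outcome that an MCAS instruction or an HTM transaction would provide, but not speculative execution, conflict detection, or lock-freedom. It also shows, in miniature, how more than one atomic can be expressed on top of the same transactional building block.

```python
import threading


class ToyHTM:
    """Hypothetical stand-in for hardware transactional memory.

    A global lock makes each transaction body atomic. Real HTM runs bodies
    speculatively and aborts on conflicts; this sketch only models the
    all-or-nothing outcome, not performance or non-blocking progress.
    """

    def __init__(self, size):
        self.words = [0] * size          # toy word-addressable memory
        self._lock = threading.Lock()    # assumption: lock models atomicity

    def atomically(self, body):
        # Execute body(words) as one indivisible "transaction".
        with self._lock:
            return body(self.words)


def mcas(htm, ops):
    """Multi-word CAS built on the transaction primitive.

    ops is a list of (addr, expected, new) tuples: either every word
    matches its expected value and all are updated, or nothing is written.
    """
    def body(words):
        if any(words[a] != e for a, e, _ in ops):
            return False                 # mismatch: return before any write
        for a, _, n in ops:
            words[a] = n
        return True
    return htm.atomically(body)


def fetch_add(htm, addr, delta):
    """Another atomic expressed with the same building block."""
    def body(words):
        old = words[addr]
        words[addr] = old + delta
        return old
    return htm.atomically(body)


htm = ToyHTM(4)
htm.words[:] = [1, 2, 3, 4]
ok = mcas(htm, [(0, 1, 10), (2, 3, 30)])    # both words match: both written
bad = mcas(htm, [(1, 2, 20), (3, 99, 40)])  # second mismatches: no writes
old = fetch_add(htm, 1, 5)
print(ok, bad, old, htm.words)              # True False 2 [10, 7, 30, 4]
```

The global lock is chosen only to keep the sketch short and correct; the point of the hardware proposals above is to obtain the same all-or-nothing behavior without serializing every operation.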

Experimental setup:

Lock-free microbenchmarks developed:
- https://github.com/mahita649/lfbench_suite
Base model: gem5's ARM transactional memory model (gem5v20_arm-tme)

Publications are listed here

In the past, I have explored various HTM implementations, and HTM in the context of persistent memory in particular. While the two share very similar guarantees with respect to consistency, atomicity, and durability, the optimal ways to achieve them while preserving performance can be very different. For example, for HTM, version management using “undo” logging has lower overhead and delivers better performance, whereas persistent memory performs better with “redo” logging (as implemented in DHTM).
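The difference between the two version-management schemes can be sketched in Python. `UndoLogTx` and `RedoLogTx` are hypothetical, single-threaded, volatile models (a list stands in for memory, and nothing is actually persisted); they show only where new values live before commit and what commit and abort have to do.

```python
class UndoLogTx:
    """Eager versioning: write in place, log old values; abort restores them."""

    def __init__(self, mem):
        self.mem = mem
        self.undo = []                      # (addr, old_value) entries

    def read(self, addr):
        return self.mem[addr]               # new values are already in place

    def write(self, addr, val):
        self.undo.append((addr, self.mem[addr]))
        self.mem[addr] = val                # update memory immediately

    def commit(self):
        self.undo.clear()                   # cheap: nothing to copy out

    def abort(self):
        for addr, old in reversed(self.undo):
            self.mem[addr] = old            # roll back newest-first
        self.undo.clear()


class RedoLogTx:
    """Lazy versioning: buffer new values in a log; commit copies them out."""

    def __init__(self, mem):
        self.mem = mem
        self.redo = {}                      # addr -> new value

    def read(self, addr):
        return self.redo.get(addr, self.mem[addr])  # check the log first

    def write(self, addr, val):
        self.redo[addr] = val               # memory untouched until commit

    def commit(self):
        for addr, val in self.redo.items():
            self.mem[addr] = val            # copy buffered values in place
        self.redo.clear()

    def abort(self):
        self.redo.clear()                   # cheap: memory never modified


mem = [0, 0]
tx = UndoLogTx(mem)
tx.write(0, 5)
seen_during = list(mem)   # undo logging has already updated memory in place
tx.abort()
print(seen_during, mem)   # [5, 0] [0, 0]

mem2 = [0, 0]
tx2 = RedoLogTx(mem2)
tx2.write(1, 7)
before = list(mem2)       # redo logging defers the update to commit
tx2.commit()
print(before, mem2)       # [0, 0] [0, 7]
```

With undo logging, commit only discards the log, which suits volatile HTM where commits are the common case. With redo logging, memory is untouched until commit, so an aborted transaction, or a crash before commit, leaves the in-place data intact; the cost is that reads must consult the log and commit must copy values out. This is one commonly cited reason redo logging fits persistent memory, although the sketch does not model persist ordering.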

Additionally, I am interested in advanced parallel architecture topics such as cache coherence and memory consistency, synchronization, and interconnection networks, and I have been using simulators such as gem5 and Sniper to conduct preliminary experiments for my research.