BASH: Bandwidth Adaptive Snooping

The paper “BASH” is special to me as it was one of the early papers that introduced me to the concept of “hybrids”, a very common trend in research. In my opinion, around the time this paper came out, the widely used systems were small to medium scale servers: clusters of around 64 processors/nodes and not more. The research question on the authors’ minds must have been finding a unified coherence solution for any server in the market; small systems benefit from snooping, and as size grows, directories scale better. How to unify the two is what they try to address.

Authors: Milo M. K. Martin, Daniel J. Sorin, Mark D. Hill, and David A. Wood, Computer Sciences Department, University of Wisconsin-Madison

Source and date: HPCA 2002

Unification of snooping and directory:

The idea is to use a snooping-based protocol when the system is made up of a small number of processors (fewer sharers, everyone can talk directly, and cache-to-cache transfers are faster) and to move to a directory-like implementation as the system grows (more nodes are talking to one another, and a coordinator/moderator is needed to get everyone’s point across without deadlock or starvation). BASH merges these two and gives a combined solution.

The hypothesis is that bandwidth utilization can serve as a hint for how many processors are active in the system, and the protocol can adapt accordingly. The adaptation should not oscillate but should change gradually to accommodate the workload’s needs; hence, they propose a utilization counter that increments by 1 when the channel is busy and decrements by 3 when the channel is idle. Based on a threshold on this counter, the system chooses between a snooping MOSI protocol and a GS320-style directory protocol.
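To make the adaptation mechanism concrete, here is a minimal sketch of the utilization counter as described above. The +1/-3 step sizes and the threshold comparison come from the paper’s description; the class name, the saturating bound, and the per-cycle sampling interface are my own illustrative assumptions, not the paper’s hardware.

```python
class UtilizationMonitor:
    """Tracks interconnect utilization; picks snooping vs. directory mode."""

    def __init__(self, threshold, max_count=1023):
        self.threshold = threshold   # mode flips when count crosses this
        self.max_count = max_count   # saturating counter bound (assumed)
        self.count = 0

    def sample(self, channel_busy):
        """Call once per cycle with the channel's busy/idle status."""
        if channel_busy:
            self.count = min(self.count + 1, self.max_count)  # busy: +1
        else:
            self.count = max(self.count - 3, 0)               # idle: -3

    def mode(self):
        """High utilization favors directory (unicast); low favors snooping."""
        return "directory" if self.count >= self.threshold else "snooping"
```

A nice property of the asymmetric decrement is that the counter falls quickly once the channel goes idle, so the system drifts back toward cheap broadcast snooping as soon as bandwidth frees up.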

These protocols have subtle differences in their implementation, but the differences can be handled easily in hardware. BASH uses two networks (snooping MOSI needs two, one for requests and one for responses; a directory typically needs three, but here the ordered request network doubles as the directory’s forwarding lane). It uses self-request snooping as the ordering marker, and retries instead of forwarded requests and NACKs, to avoid deadlocks.
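Below is a hedged sketch of the per-request path this implies: in snooping mode the requester broadcasts and snoops its own request to find its place in the total order; in directory mode the request is unicast to a GS320-style directory, with the requester retrying instead of relying on forwarded requests or NACKs. The OrderedNetwork and Directory interfaces are illustrative stand-ins I invented for the sketch (it reuses the UtilizationMonitor above), not the paper’s design.

```python
class OrderedNetwork:
    """Totally ordered request network; a broadcast's position is its order."""
    def __init__(self):
        self.order = []

    def broadcast(self, addr):
        self.order.append(addr)        # every controller observes this in order
        return len(self.order) - 1     # self-snoop: position marks the ordering

class Directory:
    """GS320-style directory; a busy line forces the requester to retry."""
    def __init__(self):
        self.busy = set()

    def try_request(self, addr):
        return addr not in self.busy

def issue_request(monitor, addr, net, directory):
    if monitor.mode() == "snooping":
        # Broadcast and snoop our own request; seeing it on the ordered
        # network tells us where we sit in the global order.
        return net.broadcast(addr)
    # Directory mode: unicast; if the line is busy, retry rather than
    # depend on forwarded requests/NACKs (the deadlock-avoidance choice
    # noted above).
    while not directory.try_request(addr):
        pass  # real hardware would back off for a bounded time, not spin
    return None
```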

Authors’ evaluation of the solution:

Section 4 of the paper provides detailed simulation results supporting their hypothesis. These results show that BASH is able to switch between broadcasting and unicasting effectively as the workload varies. Their modelling of network/transaction delays (cache-to-cache vs. memory-to-cache, etc.), based on the systems and models available at that point in time, is reasonable (leaning pessimistic rather than optimistic, so the results are good indicators of the success of their proposal).

They also discuss the various aspects they took into consideration when evaluating (livelock/deadlock, scalability, complexity, and verification of the methodology), which cover most of the questions that need answering.

Issues left open:

  1. The paper’s verification section says that their verification was thorough for protocol correctness but could not really test for deadlocks/livelocks effectively. In a real system, some genuinely pathological deadlock scenarios could arise and hurt badly.
  2. We note later in the paper that the exact threshold doesn’t really matter much (55%, 75%, and 90% perform almost identically), which I think indicates that the protocol needs more fine-tuning; it switches effectively for the small and large cases, but for middle-range bandwidths a new angle is needed to decide which protocol to adopt (the authors also acknowledge this in their future-work section).
  3. Even though this method looks lucrative and simple enough to implement, an actual hardware implementation might prove to be a physical-design (PD) and verification challenge, and these aspects have not been explored.