01signal.com

Crossing clock domains with data

This page is the last in a series of three pages on clock domains.

When one bit is not enough

Quite often, the signal that is required to cross clock domains is a data word, and not a single bit. The straightforward solution in this case is the FPGA vendor's dual-clock FIFO, as already suggested. Sometimes this isn't an option, however. Besides, someone had to implement that FIFO to begin with.

So the goal is to make a vector signal appear validly on another clock domain. Let's first start with a naïve and incorrect example for a clock domain crossing, just to explain why it isn't that easy:

reg [7:0] foo, bar, bar_metaguard;

always @(posedge clk1)
  foo <= foo + 1;

always @(posedge clk2)
  begin
    bar_metaguard <= foo; // This will fail sometimes!
    bar <= bar_metaguard;
  end

This is exactly like the simple metastability guard example on the previous page, but the registers are 8-bit vectors, and @foo increments instead of being toggled.

So why is this wrong? The problem is the differences in the routing delays between the eight bits' paths from @foo to @bar_metaguard. When @foo changes, some of the bits' changes may arrive with legal timing to some of @bar_metaguard's corresponding flip-flops, and others won't.

So even if none of @bar_metaguard's 8 bits get metastable, there can be a situation where @foo changes, and only some bits of @bar_metaguard sample the new value, and others don't. For example, if @foo changes from 0xff to 0x00, @bar_metaguard could sample, say, 0x2e, because some bits sampled @foo's value before the change, and others after it. This incorrect value will be visible on @bar a @clk2 cycle later.

To solve this problem, it's first necessary to define the need: Is @bar required to contain a valid value all the time, or is it intended to occasionally pass information from one clock domain to another? I'll discuss these two options separately.

Option #1: Continuous sampling

If the destination word (@bar in the example) is expected to continuously sample the word on the other clock domain (@foo), and always contain a legal and meaningful value, there's only one way to ensure this: Make sure that on each @clk1 cycle, only one of @foo's bits changes (or none). This way, each change is either sampled or missed by @bar_metaguard, and either way it reflects one of the values that @foo had.

So if @foo in the example above was incremented in Gray code rather than plain binary code, it would work perfectly fine: The essence of Gray coding is that only one bit changes each time the word is incremented, so @bar would be guaranteed to always carry a consistent and meaningful value.
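As a minimal sketch of this idea, the counter below keeps a plain binary count inside @clk1's domain and lets only a registered Gray coded copy cross to @clk2. The module name and the names @bin, @next_bin, @foo_gray and @bar_gray are made up for this illustration. Note the assumption that @foo_gray is a flip-flop output: if the Gray code were generated by combinatorial logic right before the crossing, glitches could make several bits change at once.

```verilog
module gray_counter_cdc (
  input            clk1,
  input            clk2,
  output reg [7:0] bar_gray    // Gray coded value in clk2's domain
);
  reg [7:0] bin = 0;           // Plain binary count, stays in clk1's domain
  reg [7:0] foo_gray = 0;      // Registered Gray code, the word that crosses
  reg [7:0] bar_metaguard;

  wire [7:0] next_bin = bin + 1;

  always @(posedge clk1)
    begin
      bin <= next_bin;
      // Binary to Gray: XOR each bit with the bit above it
      foo_gray <= next_bin ^ (next_bin >> 1);
    end

  always @(posedge clk2)
    begin
      // Only one bit of foo_gray changes on each clk1 cycle, so every
      // sample is a value that foo_gray actually had
      bar_metaguard <= foo_gray;
      bar_gray <= bar_metaguard;
    end
endmodule
```

The value in @bar_gray is still Gray coded, so it needs to be converted back to binary in @clk2's domain if it's used for arithmetic.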

The most common use of this method is inside dual-clock FIFOs, where Gray code is used to pass the read and write addresses in the FIFO's RAM across the two clock domains. As each side knows the other side's current address (subject to the delay of the metastability guards), it can also calculate how many elements there are in the FIFO, and hence produce e.g. empty and full signals.
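The read side of such a FIFO could look roughly like the following sketch. It's an illustration and not any vendor's implementation: The module name, the port names and the parameter W are assumptions. The write pointer arrives in Gray code straight from registers on the write side, passes a metastability guard, and is converted to binary for calculating the fill level.

```verilog
module fifo_fill_level #(
  parameter W = 4               // Pointer width (hypothetical)
) (
  input          rd_clk,
  input  [W-1:0] wr_ptr_gray,   // Gray coded write pointer, write clock domain
  input  [W-1:0] rd_ptr_bin,    // Read pointer, plain binary, rd_clk domain
  output [W-1:0] fill,
  output         empty
);
  reg [W-1:0] wr_gray_metaguard, wr_gray_sync;
  reg [W-1:0] wr_ptr_bin;
  integer     i;

  always @(posedge rd_clk)
    begin
      wr_gray_metaguard <= wr_ptr_gray;
      wr_gray_sync <= wr_gray_metaguard;
    end

  // Gray to binary: bit i is the XOR of Gray bits i and above
  always @*
    for (i = 0; i < W; i = i + 1)
      wr_ptr_bin[i] = ^(wr_gray_sync >> i);

  assign fill = wr_ptr_bin - rd_ptr_bin;  // Wraps modulo 2**W
  assign empty = (fill == 0);
endmodule
```

Because the write pointer is seen with a delay, @fill may be lower than the real number of elements, which errs on the safe side for @empty. Since the subtraction wraps, this sketch can't tell a completely full FIFO from an empty one. A common remedy is to make the pointers one bit wider than the RAM's address.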

But wait, what happens if @clk1 has a higher frequency than @clk2? That doesn't matter if we're fine with @bar skipping some of @foo's values. For example, if @foo is a Gray coded counter, some count steps will be skipped when looking at @bar, but all its values will be correct in the sense that they did appear in @foo at some point in time.

So in this sense, it can be easier to work with a vector than a single bit: Missing the toggling of a single bit because the destination clock is slower could mean missing that anything happened at all, but with a word that correctly crosses the clock domain with this method, it may not matter that intermediate states were lost.

A petty comment on path reordering

For this method to work, there's an underlying assumption, which is almost certainly met without doing anything special for it. And still, let's consider this theoretical example: Say that @clk1 runs at 500 MHz, and that one of @foo's paths to @bar_metaguard has a routing delay of 4 ns, and a second path a delay of 1 ns. A routing delay of 4 ns is of course extremely unlikely to be seen, but let's see what can happen: One of the bits toggles and the change begins the journey that takes 4 ns. On the following clock cycle, 2 ns later, the other bit toggles, and reaches @bar_metaguard 1 ns later. But that's 1 ns earlier than the first bit's arrival. Hence @bar_metaguard can sample the entire word with a value that @foo never had.

As routing delays are typically much shorter than in this example, this is not expected to happen in reality. Nevertheless, any delay is theoretically possible. To eliminate this possibility altogether, a constraint like the following can be used (given in Vivado format):

set_max_delay -datapath_only -from [ get_pins -hier -filter {name=~*/C} ] -to [ get_pins -hier -filter {name=~*_metaguard*/D} ] 1.5

This constraint somewhat resembles the one given in the previous page, but it's a whole different story: As the relevant paths cross clock domains of unrelated clocks, it's meaningless to take the clocks' skews and jitters into account. This is what the -datapath_only flag says: Never mind the time it takes for the clocks to reach the flip-flops. Just measure the path.

What makes this constraint confusing is that the path starts at the clock pin of the source flip-flop, and ends at the data input (D) pin. The stopwatch hence starts when the source flip-flop gets its clock and ends when the updated signal arrives at the destination, and this is required to satisfy its setup time. This path hence includes both sides' timing specifications and requirements.

By restricting all these paths to 1.5 ns, as in this constraint, no path can exceed this time limit, and hence the skew between path delays is limited to this figure as well. So even with a 2 ns clock period on @clk1, the reordering scenario is impossible. Which, once again, is extremely unlikely anyhow, but this is the way to ensure that.

Option #2: Occasional update of the register

The restriction that only one bit may toggle on each clock is often too restrictive. When the data is updated occasionally, another technique can be used. For the following example, assume that @do_update is asserted (i.e. has value '1') only once in several clocks, and that it's used to indicate that the value in @foo should be updated with @new_value:

reg [7:0] foo, bar;
reg       toggle, toggle_metaguard, toggle_a, toggle_b;
reg       new_bar;

always @(posedge clk1)
  if (do_update)
    begin
      foo <= new_value;
      toggle <= !toggle;
    end

always @(posedge clk2)
  begin
    toggle_metaguard <= toggle;
    toggle_a <= toggle_metaguard;
    toggle_b <= toggle_a;

    if (toggle_a != toggle_b)
      bar <= foo; // No metastability guard, because foo is stable
    new_bar <= (toggle_a != toggle_b); // Not necessary, just side info
  end

For now, ignore @new_bar. I'll come to that later.

So this is how it works: @foo is updated only when @do_update is asserted, and @toggle is negated along with that.

On @clk2's clock domain, @toggle_metaguard samples @toggle as a metastability guard. On the following clock cycle, this is copied into @toggle_a. The value in @foo is copied directly into @bar in the next clock cycle, following the change in @toggle_a, because of the "if" statement.

The fact that @bar and @foo are in different clock domains has no significance, because @foo has been stable for well more than enough time to meet the timing requirements.

Why am I so sure about that? This time I have a good reason, and it goes like this: The whole procedure starts when @toggle_metaguard changed value because @toggle did. Had @bar sampled @foo at the same @clk2 cycle, it would have been unsafe, but with some luck maybe it would have been OK. But then there's another @clk2 cycle until @toggle_metaguard's new value gets to @toggle_a. But @bar isn't updated even then, only on the next @clk2 cycle.

So there are at least two @clk2 clock periods' worth of time from the moment @toggle's new value was available for sampling by @toggle_metaguard, and until @bar samples @foo. Compared with any flip-flop's setup time, that's an eternity. That said, the same set_max_delay as shown in the previous page can be applied on the paths to @bar, even though it's very unlikely to be necessary, for the same reasons as above.

The Achilles' heel of this method is that @do_update must be asserted rarely enough to ensure that @foo remains stable when it's sampled by @bar. A reasonable minimal time between such updates is the time corresponding to four @clk2 cycles. So the calculation is how many @clk1 cycles that corresponds to, rounded up to the nearest integer. If @clk1 is four times slower than @clk2 (or slower), that's not a restriction at all. Otherwise, there must be some mechanism in logic that ensures @do_update doesn't get asserted more often than allowed.
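One possible mechanism of this sort is a hold-off counter in @clk1's domain, as in the sketch below. The module name, @update_request, @busy, @holdoff and MIN_GAP are hypothetical names for this illustration. MIN_GAP is assumed to be the number of @clk1 cycles that spans four @clk2 cycles, rounded up. For example, if @clk1 is 200 MHz and @clk2 is 100 MHz, four @clk2 cycles are 40 ns, which is 8 cycles of @clk1.

```verilog
module update_holdoff #(
  parameter MIN_GAP = 8         // Hypothetical: clk1 cycles between updates
) (
  input            clk1,
  input            update_request,  // Hypothetical: request to update foo
  input      [7:0] new_value,
  output reg [7:0] foo,
  output reg       toggle,
  output           busy
);
  reg [7:0] holdoff = 0;        // Limits MIN_GAP to 256 in this sketch
  initial toggle = 0;

  assign busy = (holdoff != 0);

  // do_update is allowed only when the hold-off period has elapsed
  wire do_update = update_request && !busy;

  always @(posedge clk1)
    if (do_update)
      begin
        foo <= new_value;
        toggle <= !toggle;
        holdoff <= MIN_GAP - 1;
      end
    else if (busy)
      holdoff <= holdoff - 1;
endmodule
```

With this arrangement, two updates are at least MIN_GAP cycles of @clk1 apart. A request that arrives while @busy is high is ignored, so the logic that drives @update_request should hold it until @busy is low.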

The truth is that in real-life designs, when the update rate is very slow, clock domains are sometimes crossed carelessly without any protection like the toggle register. So it's just @foo copied into @bar continuously, and who cares what happens when @foo changes once in a long while. More often than not, this is the result of neglecting the whole clock domain issue, because hey, it works. Until it doesn't occasionally.

Speaking of being sloppy, note that @toggle and its related registers are neither reset nor assigned an initial value in the example above. This is usually fine, because odds are that the synthesizer assigns them all an initial value of 0. And even if these registers don't have the same value initially, the result is one unnecessary sampling of @foo, after which this transient response is over. It might be a good idea to reset these registers nevertheless.
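For what it's worth, this is a sketch of how the toggle registers could be given both initial values and resets. The module name and the reset signals @rst1 and @rst2 are assumptions: Each is taken to be synchronous to its own clock domain. The data registers @foo and @bar are left out for brevity.

```verilog
module toggle_regs_with_reset (
  input      clk1,
  input      clk2,
  input      rst1,      // Hypothetical reset, synchronous to clk1
  input      rst2,      // Hypothetical reset, synchronous to clk2
  input      do_update,
  output reg new_bar
);
  reg toggle = 0;
  reg toggle_metaguard = 0, toggle_a = 0, toggle_b = 0;
  initial new_bar = 0;

  always @(posedge clk1)
    if (rst1)
      toggle <= 0;
    else if (do_update)
      toggle <= !toggle;

  always @(posedge clk2)
    if (rst2)
      begin
        toggle_metaguard <= 0;
        toggle_a <= 0;
        toggle_b <= 0;
        new_bar <= 0;
      end
    else
      begin
        toggle_metaguard <= toggle;
        toggle_a <= toggle_metaguard;
        toggle_b <= toggle_a;
        new_bar <= (toggle_a != toggle_b);
      end
endmodule
```

If only one of the two sides is reset while @toggle is '1', the registers disagree again and a false update occurs, so the two resets are best asserted together, with @rst2 released last.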

More advanced variants

So far, I've presented three simple examples. They are the basis for several other mechanisms.

First, I promised to say something about @new_bar in the example above. So it's just a register that goes '1' for one clock along with @bar having a new value. Nothing odd with this, but note that @bar and @new_bar reflect @foo and @do_update in the other clock domain. So this is a way to pass commands and status messages across a clock domain (have I mentioned that a FIFO should be used instead, when possible?).

Another interesting expansion of the last example is that instead of the foo-bar pair of registers, there can be a dual port RAM. This is a method for passing buffers of data across clock domains. Suppose that logic in the @clk1 clock domain writes data into the RAM, and eventually fills one half of it. As it goes on to fill the second half, it toggles the @toggle register. This is passed to the @clk2 clock domain as shown above. Instead of updating @bar, as shown above, the logic consumes the data in the first half of the RAM. This is how this simple toggle register can synchronize a double-buffered read-write data flow. In fact, the role of @toggle isn't just to change value, but for this usage it also informs the other side which buffer half is currently being written to.
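The double buffer could be sketched as follows, assuming a RAM of 32 words that is split into two halves of 16 words each. The module name, the port names, @wr_addr, @rd_offset and @half_ready are all invented for this illustration, and so is the choice to read from the half opposite to @toggle_b.

```verilog
module double_buffer_cdc (
  input            clk1,
  input            wr_en,
  input      [7:0] wr_data,
  input            clk2,
  input      [3:0] rd_offset,   // Word to read within the readable half
  output reg [7:0] rd_data,
  output reg       half_ready   // High for one clk2 cycle per filled half
);
  reg [7:0] ram [0:31];         // Two halves of 16 words each
  reg [4:0] wr_addr = 0;
  reg       toggle = 0;         // Also tells which half is being written to
  reg       toggle_metaguard = 0, toggle_a = 0, toggle_b = 0;

  always @(posedge clk1)
    if (wr_en)
      begin
        ram[wr_addr] <= wr_data;
        wr_addr <= wr_addr + 1;
        if (wr_addr[3:0] == 4'hf)  // Writing the last word of a half
          toggle <= !toggle;
      end

  always @(posedge clk2)
    begin
      toggle_metaguard <= toggle;
      toggle_a <= toggle_metaguard;
      toggle_b <= toggle_a;
      half_ready <= (toggle_a != toggle_b);

      // Read from the half that isn't being written to
      rd_data <= ram[{~toggle_b, rd_offset}];
    end
endmodule
```

As with the single register, this works only if the writing side is slow enough: The logic on @clk2 must finish reading a half before the writing side fills the other half and returns to overwrite it. Nothing in this sketch enforces that.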

Even though this double-buffer mechanism might sound appealing, it's still best to use it only when a FIFO can't do the job. For example, when the data from the double buffer is read in a different order than it's written.

When toggling one bit is too slow

When just toggling a bit limits the pace of passing information between the clock domains, it can be replaced with a word in Gray code. By incrementing the number in Gray code, it can pass the resynchronization logic correctly, and hence inform the destination clock domain about the number of events that have taken place.

For example, in a dual-clock FIFO, this word is incremented by the logic that writes data into the FIFO's dual-port RAM for every data element written. This allows the logic that reads data from the same RAM to keep track of how many words it's safe to read. As with the example of occasional update of a register above, the resynchronization logic's delay ensures that the read operation always occurs sufficiently later, in this case from the RAM and not just a register.

Summary

In the end, it boils down to this: When crossing clock domains, there's always resynchronization logic in place. The data word that passes this resynchronization is limited, so only one bit can change value on each clock cycle of the source clock (@clk1 in the examples). Otherwise, illegal data may arrive at the destination.

In some applications, this setting is good enough, but when this one-bit-at-a-time limitation is too restrictive, the data can instead pass between the clock domains with a vector register or through RAM, without any resynchronization logic applied to the data itself. What makes this work is that the data word is ensured to be stable when it's sampled at the destination, by virtue of logic that maintains a minimal time gap between the write and read operations of the data. This logic is however based upon clock domain resynchronization, which obeys the one-bit limitation, possibly by using Gray code.

So resynchronization logic and this one-bit rule are always there when crossing unrelated clock domains safely. It's just a matter of how they're applied.

Copyright © 2021-2022. All rights reserved. (42e6e8c4)