01signal.com

Asynchronous resets on FPGA: Not as easy as many believe

This page is the first of three in a series about resets in FPGAs. Because so many designers use asynchronous resets without realizing that they don't really work as expected, this first page explains why the topic isn't as easy as it may seem.

Does your reset really work?

Consider this example of resetting a state machine, which is wrong:

   always @(posedge clk or negedge resetn)
     if (!resetn)
       state <= ST_START;
     else
       case (state)
         ST_START:
           begin
              state <= ST_NEXT;
              // ... do some stuff maybe? ...
           end

         ST_NEXT:
           begin
              // ... do something ...
           end
       endcase

So what's wrong with the reset here, you may wonder? This is exactly how the textbooks show it! An active-low asynchronous reset that brings the @state register to its initial state. What could go wrong?

For the sake of discussion, let's assume that the @resetn signal by itself is OK. In other words, it isn't connected directly to a pushbutton or something similar, but rather comes from a reset-generating chip or is generated internally by the FPGA. One way or another, it's asserted stably and long enough, and then it's deasserted. Still wrong.

And when I say wrong, I mean the kind of wrong that makes the FPGA behave weirdly every now and then for no apparent reason.

So what's the problem? Well, because the reset is asserted long enough, it will surely bring @state into its initial state. But what happens when it's deasserted (becomes '1') in the example above? That's when the relevant flip-flops should start responding to rising edges of @clk.

However, it takes a little time for a flip-flop to recover from the reset signal and start sampling data on rising clock edges. The minimal time required between the reset's deassertion and the clock edge is known as the flip-flop's recovery time. And since the reset is asynchronous by definition, it may be deasserted at any time relative to @clk.

If the first rising edge of @clk arrives too soon after the deassertion of the reset, it will be ignored. That's perfectly fine if all flip-flops that are connected to @clk do that. But not all flip-flops are exactly the same, and some get the clock edge a bit earlier than others, and some get the reset deassertion later than others.

So it boils down to this: With some bad luck, the reset signal might deassert close enough to the clock's rising edge, so that some flip-flops respond to the first rising edge, and others ignore it. Blame the tolerance of flip-flops on the same chip, blame the clock and reset skews, the bottom line is that some flip-flops are one clock cycle ahead of the others.

To understand how bad this is, consider the example above. If the synthesizer recognizes that this is a state machine, there's a good chance that it will implement the state variable with one-hot encoding. In other words, it assigns a single-bit register for each state, which is asserted when the state machine is in the related state. Let's say that the synthesizer assigned a register called hot_state_0 for the ST_START state, and hot_state_1 for ST_NEXT. Clearly, the reset asserts hot_state_0 and deasserts hot_state_1.

Now note that the state machine moves unconditionally from ST_START to ST_NEXT. Accordingly, hot_state_0 is deasserted on the first clock after the reset is released, and hot_state_1 is asserted.

But what if the reset is deasserted with unlucky timing, so that some flip-flops miss the first clock edge and others don't? One possibility is that hot_state_0 misses the first clock, but hot_state_1 responds to it. As a result, both are asserted, which is an illegal condition with one-hot encoding. If it's the other way around, both registers become deasserted, so in fact all one-hot registers of the state machine are deasserted. Either way, the state machine may never be able to recover to a legal state.
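To make the second failure mode concrete, here is a minimal simulation sketch, assuming that the synthesizer's one-hot implementation boils down to the two registers named above. The module name onehot_model and the ports resetn_0 and resetn_1 are hypothetical: each state flip-flop gets its own copy of the reset, so that a testbench can release them on opposite sides of a clock edge, standing in for the skew described earlier. The next-state logic that a real synthesizer produces may differ.

```verilog
// Hypothetical model of one possible one-hot implementation of the
// ST_START -> ST_NEXT state machine above. Not actual synthesizer output.
// Each flip-flop has its own reset input, so that a testbench can model
// reset skew by releasing them at different times.
module onehot_model (
  input      clk,
  input      resetn_0,     // reset as seen by the ST_START flip-flop
  input      resetn_1,     // reset as seen by the ST_NEXT flip-flop
  output reg hot_state_0,  // asserted in ST_START
  output reg hot_state_1   // asserted in ST_NEXT
);
  // ST_START is entered only by reset, and left unconditionally
  always @(posedge clk or negedge resetn_0)
    if (!resetn_0)
      hot_state_0 <= 1'b1;
    else
      hot_state_0 <= 1'b0;

  // ST_NEXT is entered from ST_START, and never left
  always @(posedge clk or negedge resetn_1)
    if (!resetn_1)
      hot_state_1 <= 1'b0;
    else
      hot_state_1 <= hot_state_0 | hot_state_1;
endmodule
```

If a testbench releases resetn_0 just before a rising edge of @clk and resetn_1 just after it, hot_state_0 drops on that edge while hot_state_1 is still held in reset. From then on, both next-state expressions evaluate to zero, so the machine stays with no state asserted for good. With this particular next-state logic, the opposite case (both asserted) happens to clear itself one clock later; other implementations of the logic may not be that forgiving.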

How will this look in practice? It depends on the application, of course, but odds are that something will not work correctly until the FPGA is reset again. Finding the reason for this behavior might be extremely difficult, because this problem will appear randomly, and not necessarily often. It's also likely to behave differently with each FPGA design build, and possibly from one FPGA device to another. In short, it's the kind of bug that can drive you nuts. It may not sound all that bad when discussing the source of the problem, but when such an instability occurs in real life, it feels like the FPGA is haunted.

But I do this all the time, and it works!

Indeed. In the vast majority of cases, it doesn't matter all that much if some flip-flops miss the first clock after reset.

The main reason that the state machine example above can fail is that it leaves the initial state on the first clock cycle. Most state machines in real-life designs have some condition for leaving the initial state, which typically isn't met during the first few clock cycles. So one gets away with this mistake.

Another example is a simple counter:

   reg [15:0] counter;

   always @(posedge clk or negedge resetn)
     if (!resetn)
       counter <= 0;
     else
       counter <= counter + 1;

In this case, @counter consists of 16 flip-flops, each receiving the next count value at its data input, and @resetn at its asynchronous reset input.

When @resetn is asserted, @counter gets the value 0, and the next count value is 1. Hence all flip-flops except counter[0] remain zero, whether they miss the first clock edge or not. So the counter starts counting correctly either way, at worst one clock cycle late. In the vast majority of cases where code like this appears, one clock cycle earlier or later doesn't matter.

This is however a different story:

   reg [15:0] counter;

   always @(posedge clk or negedge resetn)
     if (!resetn)
       counter <= 0;
     else
       counter <= counter - 1;

A small difference, but a significant one: If the counter starts at zero and counts down, its next count value is 0xffff. In other words, all flip-flops must change value on the first clock after the reset. Hence if some respond to the first clock edge after the reset and others don't, the counter effectively starts from an arbitrary initial value.
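The following sketch models this, assuming (purely for illustration) that the reset's deassertion reaches the lower byte of @counter before a clock edge, and the upper byte after it. The module name down_counter_model and the split reset ports resetn_lo and resetn_hi are hypothetical; in a real design, the skew would be spread arbitrarily over the 16 flip-flops.

```verilog
// Hypothetical model of the down-counter above, with the reset split
// into one copy per byte so that a testbench can model reset skew.
module down_counter_model (
  input         clk,
  input         resetn_lo,  // reset as seen by counter[7:0]
  input         resetn_hi,  // reset as seen by counter[15:8]
  output [15:0] counter
);
  reg  [7:0]  lo, hi;
  wire [15:0] next_count = {hi, lo} - 16'd1;  // next value, from all 16 bits

  assign counter = {hi, lo};

  always @(posedge clk or negedge resetn_lo)
    if (!resetn_lo)
      lo <= 8'd0;
    else
      lo <= next_count[7:0];

  always @(posedge clk or negedge resetn_hi)
    if (!resetn_hi)
      hi <= 8'd0;
    else
      hi <= next_count[15:8];
endmodule
```

With resetn_lo released before a rising edge and resetn_hi after it, the first count is 16'h00ff instead of 16'hffff, and the counter carries on from there (16'h00fe, 16'h00fd, ...). With an increment instead of the subtraction, the same skew is harmless, because 0 + 1 only changes counter[0]: at worst, the count starts one clock cycle late.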

But who resets a counter with zero and then counts down?

So here's a more realistic example: A clock enable signal that makes the logic run at half the clock rate (and hence allows for a multicycle path, if so required):

   reg en;

   always @(posedge clk or negedge resetn)
     if (!resetn)
       en <= 0;
     else
       en <= !en;

   always @(posedge clk)
     if (en)
       begin
          // ... do something ...
       end

Apparently nothing can go wrong here: The clock enable @en is a single register, so it doesn't really matter when it starts toggling. Or does it? The thing is that a clock enable signal tends to have a high fan-out, so the synthesizer might duplicate it to avoid exceeding the fan-out limit. In my anecdotal experiment with Vivado's synthesizer, each of the duplicated @en registers relied on its own output. Hence if they don't begin toggling on the same clock cycle, they will keep outputting opposite values indefinitely.
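Here is a sketch of what such duplication amounts to, written out by hand as two replicas named en_a and en_b, each feeding back its own output as described above. The names, and the separate reset inputs that let a testbench model reset skew, are hypothetical; this is not literal synthesizer output.

```verilog
// Hypothetical model of a duplicated toggling clock enable: two replicas
// (en_a, en_b), each fed back from its own output. Separate reset inputs
// model the reset skew between the two flip-flops.
module dup_en_model (
  input      clk,
  input      resetn_a,  // reset as seen by replica en_a
  input      resetn_b,  // reset as seen by replica en_b
  output reg en_a,
  output reg en_b
);
  always @(posedge clk or negedge resetn_a)
    if (!resetn_a)
      en_a <= 1'b0;
    else
      en_a <= !en_a;  // replica A toggles based on its own output only

  always @(posedge clk or negedge resetn_b)
    if (!resetn_b)
      en_b <= 1'b0;
    else
      en_b <= !en_b;  // replica B toggles based on its own output only
endmodule
```

Release resetn_a just before a rising edge of @clk and resetn_b just after it, and en_a toggles to 1 on that edge while en_b is still held at 0. Since each replica only looks at itself, nothing ever brings them back into agreement: from then on, en_a and en_b are always opposite.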

If such an accident happens, the logic is likely to malfunction completely. So if you insist on an asynchronous reset, at least make sure that the clock enable originates from a single register, possibly as in

   reg pre_en; // Apply some don't-touch synthesis directive on this
   reg en;

   always @(posedge clk or negedge resetn)
     if (!resetn)
       pre_en <= 0;
     else
       pre_en <= !pre_en;

   always @(posedge clk or negedge resetn)
     if (!resetn)
       en <= 0;
     else
       en <= pre_en;

   always @(posedge clk)
     if (en)
       begin
          // ... do something ...
       end

The trick is to toggle @pre_en, which has a low fan-out and possibly some attribute that tells the synthesizer not to fiddle with it. This way, it's surely a single register. All @en registers sample this register, so they won't present opposite values. As for the first clock after reset, it doesn't matter if it's missed by some @en flip-flops, because the value they would sample is zero anyhow.
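Extending the same kind of hypothetical model to this arrangement shows why it holds up: the replicas en_a and en_b both sample the single @pre_en register, so they agree on every clock edge. The (* dont_touch = "yes" *) attribute is Vivado's way of asking synthesis to leave a register alone; other tools use different attributes, so treat it as an example only. The module name, the replica names and the per-register reset inputs are again assumptions, used only to model reset skew.

```verilog
// Hypothetical model of the pre_en arrangement above, with @en written
// out as two replicas (en_a, en_b) and a separate reset input for each
// register so that a testbench can model reset skew.
module single_src_en_model (
  input      clk,
  input      resetn_p,  // reset as seen by pre_en
  input      resetn_a,  // reset as seen by replica en_a
  input      resetn_b,  // reset as seen by replica en_b
  output reg en_a,
  output reg en_b
);
  // Vivado's DONT_TOUCH attribute; other tools use other attribute names
  (* dont_touch = "yes" *) reg pre_en;

  // The only register that toggles: low fan-out, kept as a single copy
  always @(posedge clk or negedge resetn_p)
    if (!resetn_p)
      pre_en <= 1'b0;
    else
      pre_en <= !pre_en;

  // Both replicas sample the same source, so out of reset they can't drift apart
  always @(posedge clk or negedge resetn_a)
    if (!resetn_a)
      en_a <= 1'b0;
    else
      en_a <= pre_en;

  always @(posedge clk or negedge resetn_b)
    if (!resetn_b)
      en_b <= 1'b0;
    else
      en_b <= pre_en;
endmodule
```

With the same skewed release as before (resetn_b let go just after a clock edge, the others just before it), en_b misses the first edge, but en_a samples zero from @pre_en on that edge anyway, so en_a and en_b toggle in lockstep from the start.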

So the bottom line is that using an asynchronous reset incorrectly will usually be just fine, mainly because the logic happens to be tolerant of the clock-reset timing uncertainty. Nevertheless, applying asynchronous resets carelessly can lead to occasional misbehavior that can be extremely difficult to solve.

Timing constraint between reset and clock

The seemingly obvious way to avoid the uncertain timing relation between the reset's release and the next rising clock edge is to apply a timing constraint to the reset signal. However, by doing so, the reset signal becomes synchronous, as it must then be clocked by the same clock; otherwise, a timing constraint with respect to that clock has no meaning.

Whether the Verilog code pattern for an asynchronous reset is used makes no difference. Neither does it matter whether the flip-flop's asynchronous reset input is used, or whether the flip-flop is configured to treat its reset input as asynchronous: If the reset is clocked and constrained, it's effectively synchronous, and you might as well use the Verilog pattern for a synchronous reset directly:

   always @(posedge clk)
     if (!resetn)
       state <= ST_START;
     else
       case (state)
         [ ... ]
       endcase

That said, Intel's YouTube video on timing closure promotes using clocked and constrained asynchronous resets in order to utilize the FPGA's dedicated global routing resources. I find this quite odd, because even global routing resources can have a significant delay, in particular on large FPGAs. But there are surely some scenarios where this makes sense.

Actually, there's another reason for keeping the asynchronous reset even though it's effectively synchronous, which is discussed on the next page: It allows the assertion of the reset signal to propagate asynchronously, which comes in handy in simulations as well as in tests for ASICs.
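The next page covers this method properly. As a rough preview, here is one common form of it, often called a reset synchronizer: the incoming reset is asserted asynchronously, but released only on a clock edge, after passing through two flip-flops. The module and port names (reset_sync, async_resetn, sync_resetn) are hypothetical, and the two-stage depth is a typical choice rather than a requirement.

```verilog
// One common form of a reset synchronizer (a sketch; names are hypothetical).
// Assertion of async_resetn reaches sync_resetn immediately, without a clock.
// Deassertion reaches sync_resetn only on a rising edge of clk, after
// passing through both flip-flops.
module reset_sync (
  input  clk,
  input  async_resetn,  // raw reset, may be released at any time
  output sync_resetn    // feeds the asynchronous reset inputs of the design
);
  reg [1:0] sync;

  always @(posedge clk or negedge async_resetn)
    if (!async_resetn)
      sync <= 2'b00;            // reset assertion propagates asynchronously
    else
      sync <= {sync[0], 1'b1};  // sync[1] gives sync[0] a cycle to settle

  assign sync_resetn = sync[1];
endmodule
```

The logic driven by sync_resetn can then keep the asynchronous-reset code pattern used throughout this page. The paths from sync[1] to those flip-flops' reset inputs are exactly the kind of paths that must actually be covered by the timing analysis, as discussed next.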

If this synchronized asynchronous reset method is used, it's important to ensure that the timing constraint is enforced on the reset signal paths going to the flip-flops' asynchronous reset inputs (timing reports usually list these as recovery and removal checks). The fact that the reset is generated by a flip-flop clocked by the same clock as the destination flip-flop doesn't by itself ensure that the path between them is timed. Most timing tools ignore any path ending at asynchronous inputs by default, so odds are that an explicit tool setting is necessary. Be sure to verify in the timing reports that these paths are indeed constrained. This is an easy one to miss.

This wraps up the first page in this series on resets. The next page discusses the different options for resets and for FPGA initialization.

Copyright © 2021-2022. All rights reserved. (59ca02e6)