01signal.com

Asynchronous resets on FPGA: Not as easy as many believe

This page is the first of three in a series about resets in FPGAs. Because so many designers use asynchronous resets without realizing that they don't really work as expected, this first page explains why the topic isn't as simple as it may seem.

Does your reset really work?

Consider this example of how to reset a state machine, which is wrong:

   always @(posedge clk or negedge resetn)
     if (!resetn)
       state <= ST_START;
     else
       case (state)
         ST_START:
           begin
              state <= ST_NEXT;
              // [ ... do some stuff maybe? ... ]
           end

         ST_NEXT:
           begin
              // [ ... do something ... ]
           end
       endcase

So what's wrong with the reset here, you may wonder? This is exactly how the textbooks show it! An active-low asynchronous reset that brings the register implementing @state to its initial state. What could go wrong?

For the sake of discussion, let's assume that the @resetn signal by itself is OK. In other words, it isn't connected directly to a pushbutton or something like that. Perhaps a chip designed to generate a reset signal was used, or the reset is generated internally by the FPGA. One way or another, we'll assume that the reset becomes active in a stable manner, stays active long enough, and then becomes inactive. It's still wrong.

And when I say wrong, I mean the kind of wrong that makes the FPGA behave weirdly every now and then for no apparent reason.

So what's the problem? Well, because the reset is active long enough, it will surely bring @state into its initial state. But what happens when it becomes inactive (i.e. changes back to '1' in the example above)? That's when the relevant flip-flops should start responding to rising edges of @clk.

However, it takes a little time for a flip-flop to recover from the reset signal and start sampling its data input on rising clock edges. And since the reset is asynchronous by definition, it may be deactivated at any time relative to @clk.

If the first rising edge of @clk arrives too soon after the reset is deactivated, the flip-flop ignores this rising edge. That's perfectly fine if all flip-flops connected to @clk do the same. But not all flip-flops are exactly alike: Some get the clock edge a bit earlier than others, and some see the reset's deactivation later than others.

Truth be told, this explanation is a bit simplistic. For a more accurate view of the problem, refer to the part about Recovery and Removal on the page that explains the basics of timing.

So it boils down to this: With some bad luck, the reset becomes inactive so close to the clock's rising edge that some flip-flops respond to the first rising edge and others ignore it. In fact, some flip-flops may take extra time to decide what to do. Blame the differences between flip-flops on the same chip, or blame the clock skew and reset skew: The bottom line is that some flip-flops end up one clock cycle ahead of the others.

To understand how bad this is, consider the example above. If the synthesizer recognizes that this is a state machine, there's a good chance that it will implement the state variable with one-hot encoding. In other words, it assigns a single-bit register to each state, and each of these registers is active when the state machine is in the related state. Let's say that the synthesizer assigned a register called hot_state_0 to the state named ST_START, and hot_state_1 to ST_NEXT. Clearly, the reset activates hot_state_0 and deactivates hot_state_1.

Now note that the state machine moves unconditionally from ST_START to ST_NEXT. Accordingly, hot_state_0 becomes inactive on the first clock after the reset is released, and hot_state_1 becomes active.
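To make this concrete, here's a hand-written sketch of what a one-hot implementation of the example above might boil down to. It's an illustration under assumptions, not actual synthesizer output: The module name is made up, and since the original elides what ST_NEXT does, it's assumed that ST_NEXT never leaves.

```verilog
// Hypothetical, hand-written equivalent of a one-hot encoding of the
// state machine above. Assumption: ST_NEXT never leaves (the original
// elides what it does). Real synthesizer output will look different.
module onehot_sketch (
  input  wire clk,
  input  wire resetn,       // active-low asynchronous reset
  output reg  hot_state_0,  // 1 while in ST_START
  output reg  hot_state_1   // 1 while in ST_NEXT
);
  always @(posedge clk or negedge resetn)
    if (!resetn)
      begin
        hot_state_0 <= 1;  // the reset selects ST_START...
        hot_state_1 <= 0;  // ...and clears the other state's register
      end
    else
      begin
        // Both flip-flops change their value on the first clock edge
        // after reset: ST_START always moves on to ST_NEXT, which stays.
        hot_state_0 <= 0;
        hot_state_1 <= hot_state_0 | hot_state_1;
      end
endmodule
```

Note that both flip-flops change their value on the first clock edge after reset. If hot_state_0 catches this edge but hot_state_1 misses it, both registers become zero, and since nothing in these equations sets either of them again, this sketch remains stuck until the next reset. A simulation won't reveal this, because all simulated flip-flops respond to the same edge.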

But what if the reset becomes inactive with unlucky timing, so that some flip-flops miss the first clock edge and others don't? One possibility is that hot_state_0 misses the first clock, but hot_state_1 responds to it. As a result, both become active, which is an illegal condition with one-hot encoding. If it's the other way around, both registers become inactive, so in fact all one-hot registers of the state machine are inactive. Either way, the state machine may never be able to recover to a legal state.

How will this look in practice? It depends on the application, of course, but odds are that something will not work correctly until the FPGA is reset again. Finding the reason for this behavior might be extremely difficult, because the problem appears randomly, and not necessarily often. It's also likely to behave differently from one compilation of the FPGA design to another, and maybe from one electronic board to another. In short, it's the kind of bug that can drive you nuts. It may not sound all that bad here, because the cause of the problem is the very topic of this discussion. But when such an instability occurs in real life, the cause could be just about anything, and it often feels like the FPGA is haunted.

But I do this all the time, and it works!

Indeed. In the vast majority of cases, it doesn't matter all that much if some flip-flops miss the first clock after reset.

The main reason that the state machine example above can fail is that it leaves the initial state on the first clock cycle. Most state machines in real-life designs leave the initial state only when some condition is met, so they typically stay in this state during the first few clock cycles after reset. That's how one gets away with this mistake.
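Here's a sketch of such a state machine, assuming a hypothetical input @start as the condition for leaving ST_START. The asynchronous reset is exactly as before. The difference is that as long as @start is low, no flip-flop changes its value on the first clock edge after reset, so it doesn't matter which flip-flops miss it.

```verilog
// Hypothetical variant of the first example: ST_START is left only when
// a made-up input, start, is high. The asynchronous reset is unchanged.
module wait_fsm_sketch (
  input  wire clk,
  input  wire resetn,  // active-low asynchronous reset
  input  wire start,   // hypothetical condition for leaving ST_START
  output reg  state
);
  localparam ST_START = 1'b0,
             ST_NEXT  = 1'b1;

  always @(posedge clk or negedge resetn)
    if (!resetn)
      state <= ST_START;
    else
      case (state)
        ST_START:
          // While start is low, nothing changes on the first edge after
          // reset, so a flip-flop that misses this edge does no harm.
          if (start)
            state <= ST_NEXT;
        ST_NEXT:
          state <= ST_NEXT;  // [ ... do something ... ]
      endcase
endmodule
```

This protection depends on @start being low when the reset is released. If @start may already be high at that point, the original problem is back.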

But here's another example. A simple counter:

   reg [15:0] counter;

   always @(posedge clk or negedge resetn)
     if (!resetn)
       counter <= 0;
     else
       counter <= counter + 1;

In this case, @counter consists of 16 flip-flops. Each of these flip-flops receives the value of the counter for the next clock cycle at its data input, as well as the @resetn at its asynchronous reset input.

When @resetn is active, @counter gets the value 0, and the value of the counter for the next clock cycle is 1. Hence all flip-flops except counter[0] remain zero, whether they miss the first clock edge or not, and the counter starts counting correctly either way. In the vast majority of cases where code like this is written, it doesn't matter whether the counter missed the first clock cycle or not.

This is however a different story:

   reg [15:0] counter;

   always @(posedge clk or negedge resetn)
     if (!resetn)
       counter <= 0;
     else
       counter <= counter - 1;

A small difference, but a significant one: If the counter starts at zero and counts down, the value of the counter at the next clock cycle is 0xffff. In other words, all flip-flops must change their value on the first clock after the reset. Hence if some respond to the first clock edge after the reset, and others don't, the counter can start from virtually any value.
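To see why the direction of counting matters, here's a hypothetical combinational model (for reasoning or simulation, not for use in a design) of a register's value after the first clock edge that follows the reset's release. Each bit of @missed marks a flip-flop that ignored this edge. All names here are made up for the illustration.

```verilog
// Hypothetical model, not design logic: the value that a W-bit register
// holds right after the first clock edge following the reset's release,
// when the flip-flops flagged in 'missed' ignore that edge.
module first_edge_model #(parameter W = 16) (
  input  wire [W-1:0] current,     // value forced by the reset
  input  wire [W-1:0] next_value,  // value the logic computes for the next cycle
  input  wire [W-1:0] missed,      // 1 = this flip-flop ignored the first edge
  output wire [W-1:0] result
);
  // Bits that caught the edge take next_value; bits that missed it keep
  // the value that the reset put in them.
  assign result = (next_value & ~missed) | (current & missed);
endmodule
```

For the up-counter, @current is 0 and @next_value is 1, so @result is either 0 or 1 regardless of @missed: A legal count that is at most one cycle late. For the down-counter, @next_value is 16'hffff, so @result is simply the bitwise inverse of @missed. For example, if the eight lower flip-flops miss the edge (@missed is 16'h00ff), the counter starts from 16'hff00.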

But who resets a counter with the value zero and then counts down?

So here's a more realistic example: A clock enable signal that makes the logic behave as if the clock's frequency were halved (and hence allows for a multi-cycle path if so required):

   reg en;

   always @(posedge clk or negedge resetn)
     if (!resetn)
       en <= 0;
     else
       en <= !en;

   always @(posedge clk)
     if (en)
       begin
          // [ ... do something ... ]
       end

One might think that nothing can go wrong here: The clock enable, @en, is a single register, so it doesn't really matter when it starts toggling. Or does it...? The thing is that a clock enable signal tends to have a high fan-out, so the synthesizer might duplicate its register to avoid exceeding the fan-out limit.

My anecdotal experiment with Vivado's synthesizer showed that each of the duplicated registers implementing @en relied on its own output. In other words, there wasn't a single signal that all these flip-flops used to decide on their next value. Rather, there were many independent flip-flops, each toggling on every rising edge of the clock. Hence if these flip-flops don't begin toggling on the same clock cycle, their outputs remain different indefinitely.
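The result can be pictured roughly as follows. This is a hand-written, hypothetical illustration with only two copies, named @en_copy_a and @en_copy_b. It's not an actual Vivado netlist, and a real design may have many more copies.

```verilog
// Hypothetical illustration of fan-out duplication of the 'en' register:
// two copies, each fed back only from its own output. The names are made
// up; an actual netlist will differ.
module dup_en_sketch (
  input  wire clk,
  input  wire resetn,     // active-low asynchronous reset
  output reg  en_copy_a,
  output reg  en_copy_b
);
  always @(posedge clk or negedge resetn)
    if (!resetn)
      en_copy_a <= 0;
    else
      en_copy_a <= !en_copy_a;  // depends only on its own output

  always @(posedge clk or negedge resetn)
    if (!resetn)
      en_copy_b <= 0;
    else
      en_copy_b <= !en_copy_b;  // depends only on its own output
endmodule
```

In simulation, the two copies always agree, because all flip-flops respond to the same clock edges. On hardware, if @en_copy_b misses the first edge after reset, it lags @en_copy_a by one clock cycle for good, since nothing in this logic ever compares the two.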

If such an accident happens, the logic is likely to malfunction completely. So if you insist on an asynchronous reset, at least make sure that all clock enables rely on a single source, for example:

   reg pre_en; // Apply some don't-touch synthesis directive to this register
   reg en;

   always @(posedge clk or negedge resetn)
     if (!resetn)
       pre_en <= 0;
     else
       pre_en <= !pre_en;

   always @(posedge clk or negedge resetn)
     if (!resetn)
       en <= 0;
     else
       en <= pre_en;

   always @(posedge clk)
     if (en)
       begin
          // [ ... do something ... ]
       end

The trick is that @pre_en is the register that decides on the next value. This flip-flop has a low fan-out, and possibly an attribute that tells the synthesizer not to tamper with it, so it's surely a single register. All flip-flops that implement @en rely on @pre_en, so they all agree on the value of @en on the next clock. As for the first clock after reset, it doesn't matter if some of the flip-flops miss it, because the value of @en on the first clock cycle is zero anyhow.

So the bottom line is that using an asynchronous reset incorrectly usually works just fine, mainly because most logic tolerates the uncertainty about when the reset becomes inactive relative to the clock. Nevertheless, applying asynchronous resets carelessly can lead to occasional misbehavior that can be extremely difficult to solve.

Using a timing constraint between reset and clock

The seemingly obvious way to avoid the uncertain timing relation between the reset's deactivation and the clock's rising edge is to put a timing constraint on the reset signal. However, by doing so, the reset signal becomes synchronous.

But synchronous with which clock? It's often convenient to have one global asynchronous reset for the entire logic design. This reset is created by logic that is synchronous with one specific clock. If this reset is used with logic that is synchronous with a different clock, we have a clock domain crossing. This is a topic by itself, but the most important point here is that the tools may ignore the timing of the relevant paths.

So the starting point for using timing constraints on asynchronous resets is that there must be a separate asynchronous reset for each clock, or if you insist, a separate asynchronous reset for each group of related clocks. Otherwise, a timing constraint has no meaning. If that sounds weird, it's because the asynchronous reset isn't asynchronous anymore.

The fact that the Verilog code uses the pattern of an asynchronous reset makes no difference. Nor does it matter whether the flip-flop's asynchronous reset input is used, or whether the flip-flop is configured to treat its reset input as asynchronous: If the reset is synchronous with a clock, and a timing constraint is used, then the reset is practically synchronous. In this case, you might as well use the Verilog pattern for a synchronous reset directly:

   always @(posedge clk)
     if (!resetn)
       state <= ST_START;
[ ... ]
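For completeness, here's a sketch of the entire first example rewritten with a synchronous reset. It assumes that @resetn is synchronous with @clk and covered by a timing constraint. The module name and the one-bit state encoding are made up, and the parts that the original elides remain as comments.

```verilog
// Sketch: the first example with a synchronous reset. Assumptions: resetn
// is synchronous with clk and timed; module name and encoding are made up.
module sync_reset_fsm_sketch (
  input  wire clk,
  input  wire resetn,  // active low, synchronous with clk
  output reg  state
);
  localparam ST_START = 1'b0,
             ST_NEXT  = 1'b1;

  // resetn isn't in the sensitivity list: It's sampled on rising clock
  // edges like any other synchronous input.
  always @(posedge clk)
    if (!resetn)
      state <= ST_START;
    else
      case (state)
        ST_START:
          state <= ST_NEXT;  // [ ... do some stuff maybe? ... ]
        ST_NEXT:
          state <= ST_NEXT;  // [ ... do something ... ]
      endcase
endmodule
```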

That said, Intel's YouTube video on timing closure recommends asynchronous resets that are synchronous with a clock, along with timing constraints. The reason for choosing an asynchronous reset is to utilize the FPGA's dedicated resources for global routing. I find this quite odd, because even global routing can have a significant delay, in particular on large FPGAs. But there are surely some scenarios where this makes sense.

Actually, there's another reason for keeping the asynchronous reset even when it's effectively synchronous, which is discussed on the next page: It allows the activation of the reset signal to propagate asynchronously, which is useful in simulations as well as in tests for ASICs.

If this method of a synchronized asynchronous reset is used, it's important to ensure that the timing constraint is enforced on all signal paths that go to the flip-flops' asynchronous reset inputs. The fact that the reset is generated by a flip-flop clocked by the same clock as the destination flip-flop doesn't by itself ensure that the path between them is timed. By default, some timing tools ignore any path that ends at an asynchronous input, so this enforcement may need to be enabled explicitly. Be sure to verify in the timing reports that these paths are indeed covered by the timing constraints. This is an easy one to miss.

This wraps up the first page in this series about resets. The next page discusses the different options for resets and initialization of the FPGA.

Copyright © 2021-2024. All rights reserved. (6f913017)