01signal: Improving timing on FIFOs by adding registers

Overview

This page, which is the last in a series about FIFOs, shows how to modify an existing FIFO into another. First, how to turn a "standard FIFO" into an FWFT FIFO, and then the other way around. This is followed by some more advanced methods to improve the FIFO's timing, i.e. make it work at higher frequencies.

From a practical point of view, it's quite pointless reading through this page unless you have a problem that is related to a FIFO with achieving timing constraints. The content of this page is difficult, and not necessary for common use of FIFOs. It might nevertheless be worth the effort as an exercise, for the purpose of training the muscles you'll use while designing logic, in particular logic that processes data.

Not directly related, there's another page showing how create a very deep FIFO with the help of external memory (typically DDR memory, but anything that wrapped with an AXI interface will do). The nice thing about this trick is that even though this huge FIFO can be as deep as the external memory that it uses, all this is transparent to the application logic: It has the same interface as a baseline FIFO.

I should mention that I wrote the Verilog code on this page many years ago, so the coding style differs slightly from today.

Standard FIFO to FWFT FIFO

Just to quickly recap on FWFT FIFOs from the earlier page: With a "standard FIFO", a low @empty port means that valid data will be presented on the FIFO's output after @rd_en is high on a rising clock edge. A FWFT FIFO presents the data on the output as soon as it's available, so a low @empty signal means that the data on the output is valid.

The meaning of @rd_en is different as well: For a "standard FIFO" it means "bring me data". On a FWFT FIFO it's something like "I just used the data, bring me the next data if you have it".

So this is the module that turns a "standard FIFO" into a FWFT FIFO. Not surprisingly, it merely manipulates @rd_en and @empty. The rest of the signals are just passed through.

module basic_fwft_fifo(rst, 
                       rd_clk, rd_en, dout, empty,                 
                       wr_clk, wr_en, din, full);

   parameter width = 8;
   
   input                 rst;
   input                 rd_clk;
   input                 rd_en;
   input                 wr_clk;
   input                 wr_en;
   input [(width-1):0]   din;
   output                empty;
   output                full;
   output [(width-1):0]  dout;

   reg                   dout_valid;
   wire                  fifo_rd_en, fifo_empty;

   // orig_fifo is just a normal (non-FWFT) synchronous or asynchronous FIFO
   fifo orig_fifo
      (
       .rst(rst),       
       .rd_clk(rd_clk),
       .rd_en(fifo_rd_en),
       .dout(dout),
       .empty(fifo_empty),
       .wr_clk(wr_clk),
       .wr_en(wr_en),
       .din(din),
       .full(full)
       );

   assign fifo_rd_en = !fifo_empty && (!dout_valid || rd_en);
   assign empty = !dout_valid;

   always @(posedge rd_clk or posedge rst)
      if (rst)
         dout_valid <= 0;
      else
         begin
            if (fifo_rd_en)
               dout_valid <= 1;
            else if (rd_en)
               dout_valid <= 0;
         end 
endmodule

I would normally explain the code here, but that would just repeat the explanations given for the FWFT FIFO on the previous page.

FWFT FIFO to standard FIFO

This is really simple. Since a low @empty from an FWFT FIFO means that data is present on the output port, create a register which samples this data when @rd_en is high.

So it's just this:

module standard_fifo(rst, 
                     rd_clk, rd_en, dout, empty,                 
                     wr_clk, wr_en, din, full);

   parameter width = 8;
   
   input                 rst;
   input                 rd_clk;
   input                 rd_en;
   input                 wr_clk;
   input                 wr_en;
   input [(width-1):0]   din;
   output                empty;
   output                full;
   output [(width-1):0]  dout;

   reg [(width-1):0]     dout;
   wire [(width-1):0]    dout_w;
   
   always @(posedge rd_clk)
     if (rd_en && !empty)
       dout <= dout_w;
   
   fwft_fifo wrapper
     (
      .wr_clk(wr_clk),
      .rd_clk(rd_clk),
      .rst(rst),
      .din(din),
      .wr_en(wr_en),
      .rd_en(rd_en && !empty),
      .dout(dout_w),
      .full(full),
      .empty(empty)
      );   
endmodule

Note that only @dout is manipulated. @empty is passed through as is: If it's high, @dout_w is invalid, so @dout can't sample a value from it.

Tricks for improving timing

Welcome to the grand finale of these four pages on FIFOs. Which is definitely the most difficult part to read.

So every now and then, when trying to figure out why an FPGA design doesn't achieve the timing constraints (i.e. reach the desired clock frequency), it turns out that the critical path starts and/or ends at a FIFO. Let's first go over the cases that are easy to solve, and finish with the hard nut.

When @empty and/or @full are in the critical path

The @empty signal and @full signal may appear in the critical path, in particular if @wr_en and @rd_en are combinatorial functions of these. That's mainly because these signals are often not just for request a write operation or read operation from the FIFO, but they also act as enable signals for the application logic that consumes or produces data: If the data didn't flow, the logic freezes as well.

Therefore, there are often many logic equations that rely on @wr_en and @rd_en, and often the logic functions are quite complicated. The result is a high fanout. All this boils down to a problematic propagation delay.

@empty and @full are outputs of flip-flops in any properly written FIFO, so there isn't much to improve with that. But because the FPGA's software often delivers the FIFO as a synthesized netlist, it's impossible (or at least difficult) to duplicate these registers for the sake of reducing their fanout. Also, there can be a large physical distance on the FPGA's logic fabric between these registers and the application logic that uses their output values. On large FPGAs, this can make the crucial contribution to the paths' delay.

The fix to this problem has already been given on this page, when discussing @almost_empty and @almost_full. By using these ports, the outputs of @wr_en and @rd_en can be registers. This solves the problem with the combinatorial function, and also allows controlling the fanout of these signals. On top of that, this also helps the tools to place these registers closer to the logic that consume their values, so it helps reducing the propagation delay.

When @wr_en and/or @din are in the critical path

This situation is definitely the easiest to solve. Just add a layer of registers. Something like

always @(posedge wr_clk)
  begin
    wr_en_reg <= wr_en;
    din_reg <= din_reg;
  end

and then connect @wr_en_reg and @din_reg to the FIFO instead. To prevent the FIFO from overflow, @almost_full should be used instead of @full. Or more generally speaking, the threshold for filling the FIFO should be subtracted by one.

When @rd_en and/or @dout are in the critical path

Now we're getting serious. Not only is this a relatively difficult problem to solve, but it's also the most likely to occur. There are a few reasons for that:

The @rd_en signal is used in combinatorial logic inside the FIFO, among others while calculating the next read address. So the FIFO itself contributes an amount of delay.
The application logic that generates @rd_en is often a combinatorial function that can be complicated. This logic function may consists of state registers and several flags that coming from different parts of the logic fabric.

As for @dout:

The FIFO's data output is often connected directly to a block RAM. Compared with an FPGA's flip-flop, these RAMs have a significantly worse clock-to-output timing. If the FIFO is implemented with several RAMs, their data outputs are inserted into a multiplexer, so @dout is the result of this combinatorial logic. This adds even more delay.
The application logic may use the value of @dout in combinatorial functions.
The physical distance on the FPGA between the FIFO's RAMs and the application logic can add delay on large FPGAs.

So the goal is to eliminate the combinatorial path between @rd_en and the FIFO's logic, and to do the same with @dout.

Detaching only @dout's combinatorial path

I don't really mean to present this as a solution, but the discussion might be helpful to understand this as a preparation for grasping the next step. If this just confuses you, skip this section.

So suppose we only wanted to detach @dout's combinatorial path. Note that the wrapper module that converts an FWFT FIFO into a "standard FIFO" (as shown above), does exactly that: It adds a register, and by doing so, it ends @dout's combinatorial path. But that requires an FWFT FIFO as the starting point.

But there was wrapper module that converts a "standard FIFO" into FWFT FIFO. So maybe convert the FIFO back and forth? Or write a single module that does the equivalent? Either way, a solution of this form worsens the situation with @rd_en.

Yet, this solution is worth looking closer at: The conversion to an FWFT FIFO merely consisted of keeping track on when the wrapped FIFO's @dout was valid, and hold @fifo_rd_en high when @dout wasn't valid (and/or when the external @rd_en was high).

The conversion back to a "standard FIFO" was done by copying the value of the wrapped FIFO's @dout into a register when @rd_en was high.

So all in all, the first mechanism kept the wrapped FIFO's @dout valid when possible, and the second mechanism copied @dout into another register when the external @rd_en requested that.

But this doesn't solve the problem with @rd_en's combinatorial path: In order to allow continuous reading, a word must be read from the original FIFO on each clock that the external @rd_en is high. Otherwise the FWFT's @dout turns invalid because it has been consumed but not updated. Hence this internal FIFO's @rd_en must be a combinatorial function of the external @rd_en. If we want to change this, another register needs to be added to @dout's path, as shown next.

Detaching both combinatorial paths with reg_fifo

Without further ado, this is the reg_fifo module, which detaches the combinatorial paths for @rd_en and @dout:

module reg_fifo(rst, 
                rd_clk, rd_en, dout, empty,                 
                wr_clk, wr_en, din, full);
   
   parameter width = 8;
   
   input                 rst;
   input                 rd_clk;
   input                 rd_en;
   input                 wr_clk;
   input                 wr_en;
   input [(width-1):0]   din;
   output                empty;
   output                full;
   output [(width-1):0]  dout;

   reg                   fifo_valid, middle_valid;
   reg [(width-1):0]     dout, middle_dout;

   wire [(width-1):0]    fifo_dout;
   wire                  fifo_empty, fifo_rd_en;
   wire                  will_update_middle, will_update_dout;

   // orig_fifo is "standard" (non-FWFT) FIFO
   fifo orig_fifo
      (
       .rst(rst),       
       .rd_clk(rd_clk),
       .rd_en(fifo_rd_en),
       .dout(fifo_dout),
       .empty(fifo_empty),
       .wr_clk(wr_clk),
       .wr_en(wr_en),
       .din(din),
       .full(full)
       );

   assign will_update_middle = fifo_valid && (middle_valid == will_update_dout);
   assign will_update_dout = rd_en && !empty;
   assign fifo_rd_en = !fifo_empty && !(middle_valid && fifo_valid);
   assign empty = !(fifo_valid || middle_valid);

   always @(posedge rd_clk)
      if (rst)
         begin
            fifo_valid <= 0;
            middle_valid <= 0;
            dout <= 0;
            middle_dout <= 0;
         end
      else
         begin
            if (will_update_middle)
               middle_dout <= fifo_dout;
            
            if (will_update_dout)
               dout <= middle_valid ? middle_dout : fifo_dout;
            
            if (fifo_rd_en)
               fifo_valid <= 1;
            else if (will_update_middle || will_update_dout)
               fifo_valid <= 0;
            
            if (will_update_middle)
               middle_valid <= 1;
            else if (will_update_dout)
               middle_valid <= 0;
         end 
endmodule

The first thing to note is that @dout is a register that is defined in this module, and that @rd_en causes an update of this register. It's important not to get these two confused with the similar signals that are connected to the internal FIFO, which are @fifo_dout and @fifo_rd_en.

Now to how this module works.

Understanding the pipeline

Just like the converter to a FWFT FIFO, there is an instantiation of a regular FIFO, orig_fifo. The logic in the reg_fifo module attempts to keep @fifo_dout's value valid by reading a word from orig_fifo when @fifo_dout doesn't contain a valid value. But in addition to that, there's a second register, which is called @middle_dout. The logic attempts to keep this register valid as well, by taking the value of @fifo_dout, when possible.

So it's possible to look at @fifo_dout, @middle_dout and @dout as a pipeline that moves the data in orig_fifo forward.

There are two registers that keep track of when these pipeline stages are valid: @fifo_valid is high when @fifo_dout is valid, and @middle_valid is high when @middle_dout is valid.

The purpose of this pipeline is the ability to bypass its middle stage: When @rd_en is high (and @empty is low), @dout takes its new value from either @middle_dout or @fifo_dout, but it always prefers @middle_dout. In other words, if @middle_dout is valid, @dout uses @middle_dout, otherwise it uses @fifo_dout instead. How this is the key to detaching @rd_en's combinatorial path is explained later on.

So first, let's look at the details of the implementation. There are two separate pathways from the FIFO's data output to fifo_reg's output register. They are shown separately to the left and right in this drawing:

If none of the two pipeline stages (@fifo_dout and @middle_dout) is valid, @empty is high to indicate that there's nowhere to take data from:

assign empty = !(fifo_valid || middle_valid);

The attempt to keep these pipeline stages valid is reflected by

assign fifo_rd_en = !fifo_empty && !(middle_valid && fifo_valid);

which says that if any of the two pipeline stages is invalid, read from orig_fifo, if possible. If @fifo_dout is already valid, its value is copied into @middle_dout at the same time as @fifo_dout is updated (more about this below).

Now let's look at the definitions of the @will_update_* pair:

assign will_update_middle = fifo_valid && (middle_valid == will_update_dout);
assign will_update_dout = rd_en && !empty;

First pay attention to that @will_update_dout equals @rd_en plus a safety guard against reading from reg_fifo it when @empty is high.

Next, we have @will_update_middle, which controls the update of @middle_out, as follows:

always @(posedge rd_clk)
  if (will_update_middle)
     middle_dout <= fifo_dout;

Looking at @will_update_middle's definition above, there are two conditions for updating @middle_dout: One is that the value of @fifo_dout is valid, which is quite obvious, and then there's the expression saying (middle_valid == will_update_dout). Let's break this expression down to the four possible options, as it explains how the whole machinery works. Keep in mind that all of this plays a role only when @fifo_dout is valid:

@middle_valid == 0 and @will_update_dout == 0. @middle_dout isn't valid, and @fifo_dout's value will not be copied into @dout. So update @middle_dout from @fifo_dout.
@middle_valid == 0 and @will_update_dout == 1. @middle_dout isn't valid, but @dout will be updated, so obviously it will be updated from @fifo_dout. Because @fifo_dout's value is consumed, it can't be copied into @middle_out, so do nothing.
@middle_valid == 1 and @will_update_dout == 0. @middle_dout is valid, and nothing is copied into @dout. Both pipeline stages are valid, and will remain so anyhow. So do nothing.
@middle_valid == 1 and @will_update_dout == 1. @middle_dout is valid, and will be copied into @dout. So update @middle_dout from @fifo_dout in order to remain valid.

Note that @fifo_rd_en is low when @middle_valid and @fifo_valid are high at the same time. As a result, no data is fetched from orig_fifo when the scenarios of the two last cases occur.

In particular, when both of the two pipeline stages are valid and @rd_en is high, @fifo_dout's value is copied into @middle_dout. And because @fifo_rd_en is low, @fifo_valid will change to low on the following clock cycle. Which is fine, because @middle_valid will remain high so it can supply data on this following clock cycle if that is needed. On the clock cycle after that @fifo_valid will be high again (if there's data in orig_fifo).

So why isn't @fifo_rd_en defined to keep @fifo_dout valid in this specific situation? Because then @fifo_rd_en would have to be a combinatorial function of @rd_en, which is exactly what this dual-stage pipeline is designed to avoid.

With this at hand, it's time to look at how @dout is defined. Except for reset, the definition is:

always @(posedge rd_clk)
  if (will_update_dout)
    dout <= middle_valid ? middle_dout : fifo_dout;

Substituting @will_update_dout with its definition, it becomes:

always @(posedge rd_clk)
  if (rd_en && !empty)
    dout <= middle_valid ? middle_dout : fifo_dout;

This is similar to the conversion from a FWFT FIFO to a "standard FIFO", only there are two sources to choose from: If @middle_dout contains a valid value, it's taken. Otherwise, @fifo_dout. If neither is valid, @empty is high, so nothing happens anyhow.

Why does this help? As for the output timing, @dout is clearly a register. Regarding @rd_en, note that @fifo_rd_en depends only on @middle_valid and @fifo_valid, which are both registers. Plus @fifo_empty, which is the output of orig_fifo itself (this combinatorial path is inevitable). Accordingly, @fifo_rd_en doesn't depend on the external @rd_en, hence there's no combinatorial path from @rd_en to orig_fifo.

Tracking the validity of the pipeline stages' registers

Just to complete the picture: The two *_valid flags tell us whether the related register contains valid data. Regarding @fifo_valid:

 if (fifo_rd_en)
   fifo_valid <= 1;
 else if (will_update_middle || will_update_dout)
   fifo_valid <= 0;

This is like the definition of @dout_valid in the conversion from a "standard FIFO" to a FWFT FIFO above: When @fifo_rd_en is high on a rising edge, @fifo_valid becomes high as a result of that. Which makes sense, because if we read data from orig_fifo, this FIFO's output is consequently considered valid. But if @fifo_rd_en was low, and data was copied into one of @middle_dout or @dout, we don't consider @fifo_dout valid anymore: The FIFO's output has just been used, and the FIFO didn't replace it with new data.

@middle_valid follows the same logic:

 if (will_update_middle)
   middle_valid <= 1;
 else if (will_update_dout)
   middle_valid <= 0;

When @will_update_middle is high, data is copied into @middle_dout, so @middle_valid becomes high as well. Otherwise, and if the data in @middle_dout is copied to @dout, @middle_valid changes to low. @will_update_dout is a sufficient condition for this, because recall from above that @dout prefers copying from @middle_dout, when possible.

Does it work at all?

This module is so complicated, that it requires an almost formal proof that it works. So one way to answer this question is to ask how many of the two pipelines stages, @fifo_dout and @middle_dout, are valid. This value isn't defined in the reg_fifo module, but it could have been defined as

wire [1:0] valid_count;
assign valid_count = fifo_valid + middle_valid;

This imaginary @valid_count can obviously take the values 0, 1 or 2. It counts up or down as follows:

@valid_count counts up by one on every clock cycle that @fifo_rd_en is high and (rd_en && !empty) is false.
@valid_count counts down by one on every clock cycle that @fifo_rd_en is low and (rd_en && !empty) is true.
Otherwise, @valid_count doesn't change.

Take a look on the logic equations, and convince yourself that these three statements are correct.

So let's see what happens when there's data in orig_fifo and the application logic wants to read continuously:

reg_fifo's logic tries to push up @valid_count towards 2 by reading from orig_fifo. On the other hand, @empty is low when @valid_count is not zero, hence @rd_en is allowed to go high as soon as @valid_count is 1. So when @valid_count is 1, @fifo_rd_en will be high because @valid_count isn't 2.

But @valid_count won't reach the value of 2, because @rd_en prevents it from that by staying high. So the data flows with both @fifo_rd_en and @rd_en held high, and @valid_count remaining on 1. Except for the beginning, the data is copied from @fifo_dout to @dout.

This break-even is broken when orig_fifo becomes empty: In this case @valid_count becomes to zero because @fifo_rd_en is not allowed to be high anymore. Another tie-breaker is when the FIFO isn't empty, and @rd_en becomes low because the application logic doesn't want to read more: In this case, @valid_count rises to 2, and remains there.

But later on, when @rd_en becomes high again, @valid_count goes down to 1, and only at that point does @fifo_rd_en change to high (unless orig_fifo is empty).

Once again, @valid_count is just a theoretical signal that isn't implemented in the module. This explanation was hopefully helpful for understanding why the two extra pipeline stages guarantee a continuous flow of data.

Usage notes

This module can be used as a drop-in replacement for the "standard FIFO" that it wraps. From a functional point of view, nothing changes. orig_fifo will however see a slight change in the behavior of @rd_en, @dout and @empty, but doesn't matter as long as orig_fifo is behaves correctly as a FIFO (which is a safe assumption). The ports that relate to writing data are passed through untouched, so there's no change whatsoever with these.

Since the module adds a couple of pipelines stages, orig_fifo's fill counters may present lower values than the total number of words that are stored in reg_fifo (i.e. the number of words that are stored in orig_fifo as well as the pipeline stages counted together). Hence if @almost_empty or similar ports are enabled on orig_fifo, they may present a pessimistic picture.

A slight drawback of reg_fifo is that its @empty output isn't a register, but rather a combinatorial function of two registers. This isn't optimal for timing, but with a minimal impact in most use cases. This can be fixed by defining combinatorial registers like @next_fifo_valid and @next_middle_valid in the same spirit as @next_words_in_ram, as shown on this page. It's not implemented here, mainly because reg_fifo is complicated enough as is.

A FWFT FIFO with improved timing

To finish this topic, below is the module that does the same thing as reg_fifo, but gives the application logic an FWFT FIFO instead. Note that it's based upon a standard FIFO, not a FWFT FIFO. So don't get confused with this...

The transition from reg_fifo to this module is to a large extent the same as I've already discussed regarding FWFT FIFOs.

module fwft_reg_fifo(rst, 
                     rd_clk, rd_en, dout, empty,                 
                     wr_clk, wr_en, din, full);

   parameter width = 8;
   
   input                 rst;
   input                 rd_clk;
   input                 rd_en;
   input                 wr_clk;
   input                 wr_en;
   input [(width-1):0]   din;
   output                empty;
   output                full;
   output [(width-1):0]  dout;

   reg                   middle_valid, dout_valid;
   reg [(width-1):0]     dout, middle_dout;

   wire [(width-1):0]    fifo_dout;
   wire                  fifo_empty, fifo_rd_en;
   wire                  will_update_middle, will_update_dout;

   // orig_fifo is "standard" (non-FWFT) FIFO
   fifo orig_fifo
      (
       .rst(rst),       
       .rd_clk(rd_clk),
       .rd_en(fifo_rd_en),
       .dout(fifo_dout),
       .empty(fifo_empty),
       .wr_clk(wr_clk),
       .wr_en(wr_en),
       .din(din),
       .full(full)
       );

   assign will_update_middle = !fifo_empty && (middle_valid == will_update_dout);
   assign will_update_dout = (middle_valid || !fifo_empty) && (rd_en || !dout_valid);
   assign fifo_rd_en = !fifo_empty && !(middle_valid && dout_valid);
   assign empty = !dout_valid;

   always @(posedge rd_clk)
      if (rst)
         begin
            middle_valid <= 0;
            dout_valid <= 0;
            dout <= 0;
            middle_dout <= 0;
         end
      else
         begin
            if (will_update_middle)
               middle_dout <= fifo_dout;
            
            if (will_update_dout)
               dout <= middle_valid ? middle_dout : fifo_dout;
                        
            if (will_update_middle)
               middle_valid <= 1;
            else if (will_update_dout)
               middle_valid <= 0;
            
            if (will_update_dout)
               dout_valid <= 1;
            else if (rd_en)
               dout_valid <= 0;
         end 
endmodule

This wraps up this series about FIFOs.