01signal.com

Improving timing for FIFO by adding registers

Overview

This page, which is the last in a series about FIFOs, shows how to convert one kind of FIFO into another: First, how to turn a "standard" FIFO into an FWFT FIFO, and then the other way around. This is followed by some more advanced methods to improve the FIFO's timing, i.e. to make it work at higher frequencies.

From a practical point of view, it's quite pointless reading through this page unless you have a FIFO-related issue with meeting timing constraints. It might nevertheless be worth the effort, for the purpose of training the muscles you'll use while designing logic, in particular logic that processes data.

Not directly related, there's another page showing how to create a very deep FIFO with the help of external memory (typically DDR memory, but anything wrapped with an AXI interface will do). The nice thing about this trick is that even though this huge FIFO can be as deep as the external memory it uses, all this is transparent to the application logic: It has the same interface as a baseline FIFO.

I should mention that I wrote the Verilog code on this page many years ago, so the coding style differs slightly from today.

Standard FIFO to FWFT

Just to quickly recap on FWFT FIFOs from the earlier page: With a "standard" FIFO, a low @empty port means that valid data will be presented on the FIFO's output after @rd_en is sampled high on a rising clock edge. A FWFT FIFO presents the data on the output as soon as it's available, so a low @empty flag means that the data on the output is valid.

The meaning of @rd_en is different as well: For a "standard" FIFO it means "bring me data". On a FWFT it's something like "I just grabbed the data, bring me the next one if you have it".
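
To make the difference concrete, here is a minimal sketch of application logic that reads from a FWFT FIFO. The module name fwft_reader and the signals @ready, @data and @data_valid are made up for this example; only @rd_en, @empty and @dout correspond to the FIFO's ports. Note that @data samples @dout on the same clock edge on which the FIFO samples @rd_en, which is the "I just grabbed the data" part.

```verilog
// Sketch only: a consumer of a FWFT FIFO. All names except rd_en,
// empty and dout are hypothetical.
module fwft_reader(clk, ready, empty, dout, rd_en, data, data_valid);

   parameter width = 8;

   input                 clk;
   input                 ready;      // Application logic can take a word
   input                 empty;      // From the FWFT FIFO
   input [(width-1):0]   dout;       // From the FWFT FIFO
   output                rd_en;      // To the FWFT FIFO
   output [(width-1):0]  data;
   output                data_valid;

   reg [(width-1):0]     data;
   reg                   data_valid;

   // dout is already valid when empty is low, so take it right away
   assign rd_en = ready && !empty;

   initial data_valid = 0;

   always @(posedge clk)
      begin
         data_valid <= rd_en;
         if (rd_en)
            data <= dout;   // Same edge on which the FIFO sees rd_en
      end
endmodule
```

With a "standard" FIFO, the same clock edge would only request the data, and @dout would become valid after that edge.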

So this is the module that turns a "standard" FIFO into a FWFT. Not surprisingly, it merely manipulates @rd_en and @empty. The rest of the signals are just passed through.

module basic_fwft_fifo(rst, 
                       rd_clk, rd_en, dout, empty,                 
                       wr_clk, wr_en, din, full);

   parameter width = 8;
   
   input                 rst;
   input                 rd_clk;
   input                 rd_en;
   input                 wr_clk;
   input                 wr_en;
   input [(width-1):0]   din;
   output                empty;
   output                full;
   output [(width-1):0]  dout;

   reg                   dout_valid;
   wire                  fifo_rd_en, fifo_empty;

   // orig_fifo is just a normal (non-FWFT) synchronous or asynchronous FIFO
   fifo orig_fifo
      (
       .rst(rst),       
       .rd_clk(rd_clk),
       .rd_en(fifo_rd_en),
       .dout(dout),
       .empty(fifo_empty),
       .wr_clk(wr_clk),
       .wr_en(wr_en),
       .din(din),
       .full(full)
       );

   assign fifo_rd_en = !fifo_empty && (!dout_valid || rd_en);
   assign empty = !dout_valid;

   always @(posedge rd_clk or posedge rst)
      if (rst)
         dout_valid <= 0;
      else
         begin
            if (fifo_rd_en)
               dout_valid <= 1;
            else if (rd_en)
               dout_valid <= 0;
         end 
endmodule

I would normally explain the code here, but that would just repeat the explanations given for the FWFT FIFO on the previous page.
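
For trying out the wrappers on this page in a simulator, something has to play the role of the wrapped "fifo" module. The following is a rough behavioral stand-in, not a FIFO for real use: It assumes that @wr_clk and @rd_clk are driven by the same clock, its @empty and @full are plain combinatoric functions of the pointers, and the depth_bits parameter is my own addition.

```verilog
// Simulation-only stand-in for the wrapped "standard" FIFO (a sketch).
// Assumption: wr_clk and rd_clk are the same clock.
// depth_bits is a hypothetical parameter: the FIFO holds 2^depth_bits words.
module fifo(rst,
            rd_clk, rd_en, dout, empty,
            wr_clk, wr_en, din, full);

   parameter width = 8;
   parameter depth_bits = 4;

   input                 rst;
   input                 rd_clk;
   input                 rd_en;
   input                 wr_clk;
   input                 wr_en;
   input [(width-1):0]   din;
   output                empty;
   output                full;
   output [(width-1):0]  dout;

   reg [(width-1):0]     mem [0:((1 << depth_bits)-1)];
   reg [(width-1):0]     dout;
   reg [depth_bits:0]    wr_ptr, rd_ptr; // Extra MSB tells full from empty

   // Equal pointers: empty. Equal except for the MSB: full.
   assign empty = (wr_ptr == rd_ptr);
   assign full = (wr_ptr == (rd_ptr ^ (1 << depth_bits)));

   always @(posedge wr_clk or posedge rst)
      if (rst)
         wr_ptr <= 0;
      else if (wr_en && !full)
         begin
            mem[wr_ptr[(depth_bits-1):0]] <= din;
            wr_ptr <= wr_ptr + 1;
         end

   // "Standard" behavior: dout is updated as a result of rd_en
   always @(posedge rd_clk or posedge rst)
      if (rst)
         rd_ptr <= 0;
      else if (rd_en && !empty)
         begin
            dout <= mem[rd_ptr[(depth_bits-1):0]];
            rd_ptr <= rd_ptr + 1;
         end
endmodule
```

Writes while @full is high and reads while @empty is high are silently ignored by this model, which is a choice I made for the sketch, not necessarily how a given FIFO behaves.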

FWFT to standard FIFO

This is really simple. Since a low @empty from an FWFT FIFO means that data is present on the output port, create a register which samples this data when @rd_en is asserted.

So it's just this:

module standard_fifo(rst, 
                     rd_clk, rd_en, dout, empty,                 
                     wr_clk, wr_en, din, full);

   parameter width = 8;
   
   input                 rst;
   input                 rd_clk;
   input                 rd_en;
   input                 wr_clk;
   input                 wr_en;
   input [(width-1):0]   din;
   output                empty;
   output                full;
   output [(width-1):0]  dout;

   reg [(width-1):0]     dout;
   wire [(width-1):0]    dout_w;

   always @(posedge rd_clk)
      if (rd_en && !empty)
         dout <= dout_w;

   fwft_fifo wrapper
      (
       .wr_clk(wr_clk),
       .rd_clk(rd_clk),
       .rst(rst),
       .din(din),
       .wr_en(wr_en),
       .rd_en(rd_en && !empty),
       .dout(dout_w),
       .full(full),
       .empty(empty)
       );
endmodule

Note that only @dout is manipulated. @empty is passed through as is: If it's high, @dout_w is invalid, so @dout can't sample a value from it.

FIFO tricks for improving timing

Welcome to the grand finale of these four pages on FIFOs, which is definitely the most difficult part to read.

So every now and then, when trying to figure out why an FPGA design doesn't meet timing (i.e. reach the desired clock frequency), it turns out that the critical path starts and/or ends at a FIFO. Let's first go over the cases that are easy to solve, and finish with the hard nut.

@empty and/or @full in the critical path

The @empty and @full signals may appear in the critical path, in particular if @wr_en and @rd_en are combinatoric functions of these. That's mainly because the write-enable and read-enable signals aren't just enabling write and read, but they often also act as enable signals for the application logic that consumes or produces data: If the data doesn't flow, the logic freezes as well.

Therefore, the logic equations relying on @wr_en and @rd_en often get quite complicated, and there can be quite a few of them, ending up with an impressive fanout. All this boils down to a significant propagation delay.

Any properly written FIFO will have the @empty and @full signals coming directly from registers, so there isn't much to improve in that sense. But since the FIFO is often delivered by the FPGA software as an already synthesized netlist, these registers can't be (or aren't easily) duplicated for the sake of reducing their fanout. Also, their physical location on the FPGA might be far away on the logic fabric from the logic consuming the signals, because they have to be close to the FIFO's logic too. On large FPGAs, this can make the crucial contribution to the paths' delay.

The fix to this problem has already been given on this page, when discussing @almost_empty and @almost_full. Using these ports, @wr_en and @rd_en are output directly by registers defined in the application logic. This solves the combinatoric relationship, allows controlling the fanout of these signals, and odds are the tools place these registers closer to the logic as well.

@wr_en and/or @din in the critical path

Definitely the easiest to solve. Just add a layer of registers. Something like

always @(posedge wr_clk)
  begin
    wr_en_reg <= wr_en;
    din_reg <= din;
  end

and then connect @wr_en_reg and @din_reg to the FIFO instead. To prevent the FIFO from overflowing, @almost_full should be used instead of @full. Or more generally speaking, the fill threshold of whatever signal is used for this purpose should be taken down by one.
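
Put together, the write side might look something like the sketch below. The module name wr_reg_stage and the @data_ready signal are made up for the example. It assumes that @almost_full is high when the FIFO has room for at most one more word, and that it's updated on the clock following a write.

```verilog
// Sketch only: a registered write path in front of a FIFO.
// Assumption: almost_full is high when there is room for one word or less.
// data_ready is a hypothetical "I have a word on din" signal.
module wr_reg_stage(wr_clk, data_ready, din, almost_full,
                    wr_en_reg, din_reg);

   parameter width = 8;

   input                 wr_clk;
   input                 data_ready;
   input [(width-1):0]   din;
   input                 almost_full;  // From the FIFO
   output                wr_en_reg;    // To the FIFO's wr_en
   output [(width-1):0]  din_reg;      // To the FIFO's din

   reg                   wr_en_reg;
   reg [(width-1):0]     din_reg;

   // The decision relies on almost_full, not full: One write may still
   // be waiting in wr_en_reg when this decision is made.
   wire                  wr_en = data_ready && !almost_full;

   initial wr_en_reg = 0;

   always @(posedge wr_clk)
      begin
         wr_en_reg <= wr_en;
         din_reg <= din;
      end
endmodule
```

The worst case is when @almost_full goes high while @wr_en_reg is already high: That last write fills the FIFO completely, but @wr_en is already low by then, so nothing follows it.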

@rd_en and/or @dout in the critical path

Now we're getting serious. Not only is this a relatively difficult problem to solve, but it's also the most likely to occur. There are a few reasons for that:

As for @dout:

So the goal is to cut the combinatoric path between @rd_en and the underlying FIFO's logic, and do the same with @dout.

Detaching @dout only

I don't really mean to present this as a solution, but it might be helpful to understand this as a preparation for grasping the next step. If this just confuses you, skip this section.

So suppose we only wanted to detach @dout. Note that the wrapper module shown above, that converts an FWFT FIFO into a "standard" one, does exactly that, as it adds a register. But that requires an FWFT to begin with.

But then there was the wrapper module converting a "standard" FIFO into FWFT. So convert back and forth, and call it a day? Or write a single module that does the equivalent? Either way, a solution of this form worsens the situation with @rd_en.

Yet, this back-and-forth solution is worth looking closer at: The conversion to an FWFT FIFO merely consisted of keeping track of when the wrapped FIFO's @dout was valid, and driving @fifo_rd_en high when it wasn't (on top of when the external @rd_en was high).

The conversion back to a "standard" FIFO was done by copying the wrapped @dout's value into a register when @rd_en was high.

So all in all, the first mechanism kept the wrapped FIFO's @dout valid when possible, and the second one copied it into another register when requested by the external @rd_en.

But this doesn't allow detaching the combinatoric dependency on @rd_en: In order to allow continuous reading, a word must be read from the internal FIFO (the one wrapped twice) on each clock that the external @rd_en is high, or else the FWFT's @dout turns invalid as it has been consumed but not updated. Hence this internal FIFO's @rd_en must be a combinatoric function of the external one. If we want to change this, another register needs to be added to the @dout path, as shown next.

Detaching @dout and @rd_en with reg_fifo

Without further ado, this is the reg_fifo module, which detaches the combinatoric dependency for @rd_en and @dout:

module reg_fifo(rst, 
                rd_clk, rd_en, dout, empty,                 
                wr_clk, wr_en, din, full);
   
   parameter width = 8;
   
   input                 rst;
   input                 rd_clk;
   input                 rd_en;
   input                 wr_clk;
   input                 wr_en;
   input [(width-1):0]   din;
   output                empty;
   output                full;
   output [(width-1):0]  dout;

   reg                   fifo_valid, middle_valid;
   reg [(width-1):0]     dout, middle_dout;

   wire [(width-1):0]    fifo_dout;
   wire                  fifo_empty, fifo_rd_en;
   wire                  will_update_middle, will_update_dout;

   // orig_fifo is "standard" (non-FWFT) FIFO
   fifo orig_fifo
      (
       .rst(rst),       
       .rd_clk(rd_clk),
       .rd_en(fifo_rd_en),
       .dout(fifo_dout),
       .empty(fifo_empty),
       .wr_clk(wr_clk),
       .wr_en(wr_en),
       .din(din),
       .full(full)
       );

   assign will_update_middle = fifo_valid && (middle_valid == will_update_dout);
   assign will_update_dout = rd_en && !empty;
   assign fifo_rd_en = !fifo_empty && !(middle_valid && fifo_valid);
   assign empty = !(fifo_valid || middle_valid);

   always @(posedge rd_clk)
      if (rst)
         begin
            fifo_valid <= 0;
            middle_valid <= 0;
            dout <= 0;
            middle_dout <= 0;
         end
      else
         begin
            if (will_update_middle)
               middle_dout <= fifo_dout;
            
            if (will_update_dout)
               dout <= middle_valid ? middle_dout : fifo_dout;
            
            if (fifo_rd_en)
               fifo_valid <= 1;
            else if (will_update_middle || will_update_dout)
               fifo_valid <= 0;
            
            if (will_update_middle)
               middle_valid <= 1;
            else if (will_update_dout)
               middle_valid <= 0;
         end 
endmodule

The ports of the wrapped FIFO (orig_fifo) are connected to the signals named @fifo_rd_en and @fifo_dout in the reg_fifo module. The reg_fifo module's own corresponding ports are @rd_en and @dout.

Now to how the reg_fifo module works.

Understanding the pipeline structure

Just like the converter to FWFT, reg_fifo instantiates a regular FIFO, orig_fifo, and attempts to keep its @fifo_dout's value valid by reading a word from the orig_fifo when it's not. But then there's a second register, @middle_dout, which it also attempts to keep valid, by grabbing the value of @fifo_dout, when possible.

So one can view @fifo_dout, @middle_dout and @dout as a pipeline, which just passes the data through.

There are two registers that keep track of the validity of the two pipeline stages, @fifo_dout and @middle_dout: @fifo_valid and @middle_valid, respectively. Each is high when its related pipeline stage is valid.

The point of this pipeline is the ability to bypass its middle stage: When @rd_en is high (and @empty low), @dout fetches the value from @middle_dout if it's valid, but if it's not, it takes it from @fifo_dout instead. This is the key to detaching the combinatoric relationship with @rd_en, as explained later on.

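So there are two separate paths for FIFO data to the output register @dout: through @middle_dout when it holds valid data, and directly from @fifo_dout when it doesn't.

[Figure: Data flow with extra registers]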

If neither of the two pipeline stages, @fifo_dout and @middle_dout, is valid, @empty goes high to indicate that there's nowhere to grab data from:

assign empty = !(fifo_valid || middle_valid);

The attempt to keep these stages valid is reflected by

assign fifo_rd_en = !fifo_empty && !(middle_valid && fifo_valid);

which says that if either of the two stages is invalid, read from orig_fifo, if possible. The logic is set up so that if @fifo_dout is already valid, its value is piped into @middle_dout in the nick of time.

Now let's look at the definitions of the @will_update_* pair:

assign will_update_middle = fifo_valid && (middle_valid == will_update_dout);
assign will_update_dout = rd_en && !empty;

First, note that @will_update_dout is simply @rd_en, guarded against being asserted when @empty is high.

Next, on to @will_update_middle, which controls the update of @middle_dout, not surprisingly:

always @(posedge rd_clk)
  if (will_update_middle)
     middle_dout <= fifo_dout;

Looking at @will_update_middle's definition above, there are two conditions for updating @middle_dout: One is that the value of @fifo_dout is valid, which is quite obvious, and then there's this (middle_valid == will_update_dout) expression. Let's break it down to the four possible options, as it explains how the whole machinery works. Keep in mind that all of this plays a role only when @fifo_dout is valid:

- @middle_valid low, @will_update_dout low: @middle_dout is free and nothing is being read, so @fifo_dout's value is copied into @middle_dout.
- @middle_valid low, @will_update_dout high: @dout takes its word directly from @fifo_dout (the bypass), so @middle_dout is not updated.
- @middle_valid high, @will_update_dout low: Both stages hold valid data and nothing is consumed, so nothing moves.
- @middle_valid high, @will_update_dout high: @dout takes the word in @middle_dout, and @fifo_dout's value is copied into @middle_dout to replace it.

Note that @fifo_rd_en is low when @middle_valid and @fifo_valid are high at the same time. As a result, no data is fetched from orig_fifo when the scenarios of the two last bullets occur.

In particular, when both stages are valid and @rd_en is high, @fifo_dout's value is piped into @middle_dout, and since @fifo_rd_en is low, @fifo_valid will go low on the following clock. Which is fine, because @middle_valid will remain high so it can supply data on this following clock if needed, and on the clock cycle following that, @fifo_valid will be high again (if there's data in orig_fifo).

So why isn't @fifo_rd_en defined to keep @fifo_dout valid in this specific situation? Because it would require @fifo_rd_en to be a combinatoric function of @rd_en, which is exactly what this dual-stage pipeline is designed to avoid.

With this at hand, it's time to look at how @dout is defined, which boils down to this (minus reset):

always @(posedge rd_clk)
if (will_update_dout) dout <= middle_valid ? middle_dout : fifo_dout;

Substituting @will_update_dout with its definition, it becomes:

always @(posedge rd_clk)
if (rd_en && !empty) dout <= middle_valid ? middle_dout : fifo_dout;

which is similar to the FWFT to regular FIFO conversion, only there are two sources to choose from: If @middle_dout contains a valid value, it's taken. If not, @fifo_dout. If neither is valid, @empty is high, so nothing happens anyhow.

Why does this help? As for the output timing, @dout is clearly a register. Regarding the detachment of @rd_en, note that data is read from orig_fifo unless both @fifo_dout and @middle_dout are valid. And of course, unless orig_fifo is empty. This condition doesn't depend on the external @rd_en, hence there's no combinatoric relation with it.

The pipeline stages' validity registers

Just to complete the picture: The two *_valid flags tell us whether the respective register contains valid data. Regarding @fifo_valid,

 if (fifo_rd_en)
   fifo_valid <= 1;
 else if (will_update_middle || will_update_dout)
   fifo_valid <= 0;

this is pretty much like @dout_valid as defined for the wrapper from a "standard" to FWFT FIFO above: When @fifo_rd_en is high on a rising edge, @fifo_valid goes high as a result: If we just pulled data from orig_fifo, its current output is considered valid. But if @fifo_rd_en was low, and data was sampled by one of @middle_dout or @dout, we don't consider @fifo_dout valid anymore, since its data has been consumed.

@middle_valid goes by the same logic:

 if (will_update_middle)
   middle_valid <= 1;
 else if (will_update_dout)
   middle_valid <= 0;

When @will_update_middle is high, data is sampled into @middle_dout, so @middle_valid goes high as well. If not, it will go low when @dout samples data from it (recall that @middle_dout is @dout's preferred source to sample from).

Does it work at all?

One way to answer this is to ask how many of the two stages, @fifo_dout and @middle_dout, are valid. This value isn't defined in the reg_fifo module, but it could have been as

wire [1:0] valid_count;
assign valid_count = fifo_valid + middle_valid;

This imaginary @valid_count can obviously take the values 0, 1 or 2. It counts up or down as follows:

- It goes up by one when @fifo_rd_en is high and @will_update_dout is low: A word arrives from orig_fifo and none is handed to @dout.
- It goes down by one when @fifo_rd_en is low and @will_update_dout is high: A word is handed to @dout and none arrives from orig_fifo.
- It stays the same otherwise, i.e. when both are high or both are low.

Take a look at the logic equations, and convince yourself that these three bullets are correct.
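
If you'd rather let a simulator do the convincing, the counting rule can be written as a single expression: The next value of @valid_count is the current one, plus @fifo_rd_en, minus @will_update_dout. The following checker is a sketch for simulation only. The module name valid_count_check and its internal signals are made up, and it assumes that it's hooked up to reg_fifo's internal signals (e.g. by instantiating it inside reg_fifo).

```verilog
// Simulation-only sketch, not part of reg_fifo. All names that don't
// appear in reg_fifo are hypothetical.
module valid_count_check(rd_clk, rst, fifo_valid, middle_valid,
                         fifo_rd_en, will_update_dout);

   input       rd_clk;
   input       rst;
   input       fifo_valid;
   input       middle_valid;
   input       fifo_rd_en;
   input       will_update_dout;

   wire [1:0]  valid_count = fifo_valid + middle_valid;
   reg [1:0]   expected;
   reg         armed;      // High when "expected" is worth comparing

   initial armed = 0;

   always @(posedge rd_clk)
      begin
         // Compare with the prediction made on the previous clock
         if (armed && (valid_count !== expected))
            $display("valid_count mismatch at %0t: got %0d, expected %0d",
                     $time, valid_count, expected);

         // +1 for a word read from orig_fifo, -1 for a word handed to dout
         expected <= valid_count + fifo_rd_en - will_update_dout;
         armed <= !rst;   // No prediction for clocks spent in reset
      end
endmodule
```

The checker is expected to stay silent for any pattern of @rd_en and any fill level of orig_fifo, as long as reg_fifo is out of reset.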

So let's see what happens when there's data in orig_fifo and the application logic wants to read continuously:

reg_fifo's logic tries to push @valid_count up towards 2 by reading from orig_fifo. On the other hand, @empty is low when @valid_count is not zero, hence @rd_en is allowed to go high as soon as @valid_count is 1. So when @valid_count is 1, @fifo_rd_en will be high because @valid_count isn't 2, but it won't reach 2, because @rd_en is high. So the data flows with both @fifo_rd_en and @rd_en held high, and @valid_count remaining at 1. Except for the beginning, the data is copied from @fifo_dout to @dout.

This tie is broken when orig_fifo becomes empty, in which case @valid_count drops to zero because @fifo_rd_en is not allowed to be high anymore. Another tie breaker is when the FIFO isn't empty, and @rd_en goes low because the application logic doesn't want to read more, in which case @valid_count rises to 2, and remains there.

But later on, when @rd_en becomes high again, @valid_count goes down to 1, and only at that point is @fifo_rd_en brought high (unless orig_fifo is empty).

Once again, @valid_count is just a theoretical signal that isn't implemented in the module. It was hopefully helpful for understanding why the two extra pipeline stages guarantee a continuous flow of data.

Usage notes

This module can be used as a drop-in replacement for the "standard" FIFO it wraps, and from a functional point of view, nothing changes. There will however be a slight change in the signal patterns of @rd_en, @dout and @empty, within the correct behavior of a FIFO. The ports relating to writing are passed through untouched, so there's no change whatsoever with these.

Since the module adds a couple of pipeline stages, orig_fifo's fill counters may present lower values than the total number of words stored in orig_fifo and the pipeline stages counted together. Hence if @almost_empty or similar ports are enabled on orig_fifo, they may present an overly pessimistic picture.

A slight drawback of reg_fifo is that its @empty output isn't a register, but rather a combinatoric function of two registers. This isn't optimal for timing, but the impact is minimal in most use cases. This can be fixed by defining the combinatoric registers @next_fifo_valid and @next_middle_valid in the same spirit as @next_words_in_ram as shown on this page. It's not implemented here, mostly because reg_fifo is complicated enough as is.
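
For those who need it anyhow, this is a rough sketch of how a registered @empty might look. The module name reg_fifo_regempty is made up, and the code is reg_fifo with the two validity flags computed one clock ahead. Treat it as a starting point rather than as a verified design.

```verilog
// Sketch only: reg_fifo with empty driven directly by a register.
module reg_fifo_regempty(rst,
                         rd_clk, rd_en, dout, empty,
                         wr_clk, wr_en, din, full);

   parameter width = 8;

   input                 rst;
   input                 rd_clk;
   input                 rd_en;
   input                 wr_clk;
   input                 wr_en;
   input [(width-1):0]   din;
   output                empty;
   output                full;
   output [(width-1):0]  dout;

   reg                   fifo_valid, middle_valid, empty;
   reg                   next_fifo_valid, next_middle_valid; // Combinatoric
   reg [(width-1):0]     dout, middle_dout;

   wire [(width-1):0]    fifo_dout;
   wire                  fifo_empty, fifo_rd_en;
   wire                  will_update_middle, will_update_dout;

   // orig_fifo is "standard" (non-FWFT) FIFO
   fifo orig_fifo
      (
       .rst(rst),
       .rd_clk(rd_clk),
       .rd_en(fifo_rd_en),
       .dout(fifo_dout),
       .empty(fifo_empty),
       .wr_clk(wr_clk),
       .wr_en(wr_en),
       .din(din),
       .full(full)
       );

   assign will_update_middle = fifo_valid && (middle_valid == will_update_dout);
   assign will_update_dout = rd_en && !empty;
   assign fifo_rd_en = !fifo_empty && !(middle_valid && fifo_valid);

   // Same update rules as in reg_fifo, but as combinatoric signals
   always @(*)
      begin
         next_fifo_valid = fifo_valid;
         if (fifo_rd_en)
            next_fifo_valid = 1;
         else if (will_update_middle || will_update_dout)
            next_fifo_valid = 0;

         next_middle_valid = middle_valid;
         if (will_update_middle)
            next_middle_valid = 1;
         else if (will_update_dout)
            next_middle_valid = 0;
      end

   always @(posedge rd_clk)
      if (rst)
         begin
            fifo_valid <= 0;
            middle_valid <= 0;
            empty <= 1;
            dout <= 0;
            middle_dout <= 0;
         end
      else
         begin
            fifo_valid <= next_fifo_valid;
            middle_valid <= next_middle_valid;
            // Always equals !(fifo_valid || middle_valid) after the edge
            empty <= !(next_fifo_valid || next_middle_valid);

            if (will_update_middle)
               middle_dout <= fifo_dout;

            if (will_update_dout)
               dout <= middle_valid ? middle_dout : fifo_dout;
         end
endmodule
```

The price is one more level of logic between @rd_en and a register's input, so this is worth doing only if @empty really shows up in the critical path.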

A FWFT with detached @dout and @rd_en

To wrap this up, this is the module that does the same timing improvement as reg_fifo, but exposes a FWFT FIFO instead. Note that it's based upon a standard FIFO, not a FWFT, so don't get confused by this.

The transition from reg_fifo to this module is pretty much the same as I've already discussed regarding FWFT FIFOs.

module fwft_reg_fifo(rst, 
                     rd_clk, rd_en, dout, empty,                 
                     wr_clk, wr_en, din, full);

   parameter width = 8;
   
   input                 rst;
   input                 rd_clk;
   input                 rd_en;
   input                 wr_clk;
   input                 wr_en;
   input [(width-1):0]   din;
   output                empty;
   output                full;
   output [(width-1):0]  dout;

   reg                   fifo_valid, middle_valid, dout_valid;
   reg [(width-1):0]     dout, middle_dout;

   wire [(width-1):0]    fifo_dout;
   wire                  fifo_empty, fifo_rd_en;
   wire                  will_update_middle, will_update_dout;

   // orig_fifo is "standard" (non-FWFT) FIFO
   fifo orig_fifo
      (
       .rst(rst),       
       .rd_clk(rd_clk),
       .rd_en(fifo_rd_en),
       .dout(fifo_dout),
       .empty(fifo_empty),
       .wr_clk(wr_clk),
       .wr_en(wr_en),
       .din(din),
       .full(full)
       );

   // fifo_valid, not !fifo_empty, tells whether fifo_dout holds a valid word
   assign will_update_middle = fifo_valid && (middle_valid == will_update_dout);
   assign will_update_dout = (middle_valid || fifo_valid) && (rd_en || !dout_valid);
   assign fifo_rd_en = !fifo_empty && !(middle_valid && dout_valid && fifo_valid);
   assign empty = !dout_valid;

   always @(posedge rd_clk)
      if (rst)
         begin
            fifo_valid <= 0;
            middle_valid <= 0;
            dout_valid <= 0;
            dout <= 0;
            middle_dout <= 0;
         end
      else
         begin
            if (will_update_middle)
               middle_dout <= fifo_dout;
            
            if (will_update_dout)
               dout <= middle_valid ? middle_dout : fifo_dout;
            
            if (fifo_rd_en)
               fifo_valid <= 1;
            else if (will_update_middle || will_update_dout)
               fifo_valid <= 0;
            
            if (will_update_middle)
               middle_valid <= 1;
            else if (will_update_dout)
               middle_valid <= 0;
            
            if (will_update_dout)
               dout_valid <= 1;
            else if (rd_en)
               dout_valid <= 0;
         end 
endmodule

This wraps up this series on FIFOs.

Copyright © 2021-2022. All rights reserved. (42e6e8c4)