01signal.com

Source-synchronous inputs

Introduction

This page discusses source-synchronous data inputs. With this technique, the data inputs are synchronous with a clock that the external component generates and sends along with the data.

Source synchronous clocking with inputs only

This method is often used simply because this is the way the external component works. Another good reason is that the source of the data is physically far away from the FPGA. It's also possible that there is a cable and connector between the FPGA and the other side. A typical example is a digital camera that sends pixel data.

A possible difficulty with a source-synchronous clock is that it may not be active all the time. The clock signal may also be affected by a dangling physical connection or excessive noise. It's also possible that the clock doesn't have a steady clock period, or that the jitter is excessively high. This happens often when the source of the data (and hence also the clock) is a microprocessor's I/O peripheral.

Note that there is a separate page that discusses the relations between clock and data in general.

Coping with an unstable clock

The most important guideline with source-synchronous inputs is that the clock should not be connected directly to logic elements inside the FPGA. Rather, the clock that is used inside the FPGA should be a clean clock, which is generated by a PLL that is inside the FPGA.

If an external clock is connected directly to logic inside the FPGA, there may be weird problems: A bad clock creates unexpected behaviors that don't look like a problem with a clock. Excessive jitter and glitches may violate the timing requirements that ensure the design's reliable operation. The consequence is that virtually anything can happen, including situations that are impossible according to the Verilog code. It's therefore easy to mistakenly think that the problem is a bug in the FPGA design.

It's natural to (mistakenly) think that problems with a clock only cause a loss of clock cycles, and consequently some data elements will be missing. When a bad clock causes other problems, the attempts to solve this problem are often focused on the parts of the logic design that seem most related. That can waste a lot of time.

The only situation where an external clock can be used directly by the logic fabric is when this clock is guaranteed to be stable and clean. If this clock isn't stable when the FPGA begins to operate, this issue requires treatment: As long as the clock is unstable, a reset must be applied to the logic that relies on this clock.

Possible strategies

There are four main strategies for synchronizing with the external clock. These strategies are discussed below, in no particular order.

Strategy #1: Using a PLL

With this strategy, the external clock is connected to the input of a PLL on the FPGA. This PLL's output clock is used for the logic elements. A reset signal is applied to these logic elements when the PLL is not locked. This solution ensures that the logic elements rely on a stable clock: When the PLL's output clock isn't stable, the logic elements are deactivated by the reset.
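The reset that depends on the PLL's lock indicator can be sketched as follows. This is only an illustration under a few assumptions: The PLL itself is instantiated elsewhere (it's vendor-specific), and the names @pll_clk, @pll_locked and @rst are hypothetical. The reset becomes active immediately when the lock is lost, and it is released synchronously a few clock cycles after the PLL has locked.

```verilog
module pll_reset (
   input  pll_clk,     // Output clock of the PLL (assumed name)
   input  pll_locked,  // The PLL's lock indicator (assumed name)
   output rst          // Active-high reset for logic clocked by pll_clk
);
   // All ones means that reset is active. This is also the initial state.
   reg [2:0] rst_shift = 3'b111;

   // Asynchronous assertion when lock is lost, synchronous release:
   // @rst goes low three clock cycles after @pll_locked goes high.
   always @(posedge pll_clk or negedge pll_locked)
     if (!pll_locked)
       rst_shift <= 3'b111;
     else
       rst_shift <= { rst_shift[1:0], 1'b0 };

   assign rst = rst_shift[2];
endmodule
```

The shift register is one possible way to release the reset synchronously with @pll_clk. The number of stages is an arbitrary choice for this sketch.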

The PLL also makes it easier to achieve the timing requirements, compared with connecting the external clock directly: The PLL compensates for the delay between the clock pin and the FPGA's internal clock.

However, note that imperfections of the external clock can cause excessive jitter at the PLL's output. The PLL's lock detector may continue to indicate that the PLL operates properly, even though the clock that it produces is unusually noisy. There is no simple solution to this situation. One possibility is to change the timing constraints for the logic that depends on the PLL's output. For example, the clock's jitter in the timing constraints can be increased to a value that the PLL will probably not exceed (because a loss of lock would occur).

There is a similarity between this strategy and system synchronous clocking: In both scenarios, the external clock is connected to a PLL, and the output of this PLL is used inside the FPGA. The timing constraints are hence written in the same way as for a system synchronous clock.

Note that the PLL usually aligns its output clock with the external clock in a way that is optimal for a system synchronous clock. The optimal alignment for a source synchronous clock may be slightly different. In both possibilities, the clocks aren't perfectly aligned. Rather, there's an intentional small time difference between the clocks' edges. This time difference makes it easier to meet the timing requirements of the I/O registers. Some PLLs can be configured to align the clocks for optimal performance with a source synchronous clock.

This strategy is the easiest one to implement, compared with the other strategies that are listed here. It's suitable for relatively high clock frequencies. But for frequencies that are close to the maximum of what the I/O is capable of, this strategy will probably not work.

Strategy #2: Synchronous sampling

This strategy is useful only for relatively low clock frequencies, but it is definitely the best way to cope with a misbehaving external clock.

The idea behind this strategy is that the external clock is treated as a data signal: It is sampled with a register using an internal clock that is stable and independent of the external clock. In parallel, the data signals are sampled with the same internal clock. This internal clock is also used for all logic that implements the synchronous sampling.

The frequency of the internal clock must be significantly higher than the external clock. This allows the logic inside the FPGA to detect changes in the external clock by looking at the value of the register that is sampling the external clock pin.

Let's understand how this works: Consider the situation when this register was low during the previous clock cycle of the internal clock, but now this register is high. This is the result of a rising edge that has occurred on the external clock. In this situation, let's look at the registers that are sampling the data inputs. These registers contain the values that were present when a rising edge of the external clock occurred.

Now let's say that the logic writes the value in these registers into a FIFO in this situation. The result is equivalent to sampling the data inputs in response to rising edges of the clock, and writing these values to the FIFO.

This Verilog code illustrates the idea. The external clock is @data_clk.

module top (
   input stable_clk,

   input data_clk,
   input [7:0] data
);

   reg [7:0] data_samp, data_samp_d;
   reg       data_clk_samp, data_clk_samp_d;
  
   always @(posedge stable_clk)
     begin
        data_samp <= data;
        data_clk_samp <= data_clk;

        data_samp_d <= data_samp;
        data_clk_samp_d <= data_clk_samp;
     end

   data_fifo fifo_i
     (
      .wr_clk(stable_clk),
      .din(data_samp_d),
      .wr_en(data_clk_samp && !data_clk_samp_d),

       [ ... other ports connected here ... ]
      );
endmodule

The important part is the FIFO's wr_en: "data_clk_samp && !data_clk_samp_d" is high in response to a rising edge of @data_clk. That causes the value of @data_samp_d to be written into the FIFO.

Note that only @stable_clk is used as a clock by the logic. @data_clk is treated like a regular I/O input.

IOB registers should be used to minimize the timing differences between the FPGA's input ports. Also, the timing constraints should be written for the purpose of ensuring that IOB registers are used.

Note however that this Verilog code should not be used in a real application, because the timing requirements of @data_samp and @data_clk_samp are not guaranteed: The input ports are asynchronous in relation to @stable_clk. This is solved with metastability guards which are absent from this example for the sake of simplicity.
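As a rough sketch of how metastability guards could be added, here is a variation of the same logic with one extra register stage between the registers that sample the pins and the logic that uses their values. The module name, the names that end with _guard and the output ports @sample_data and @sample_valid are assumptions made for this illustration. The outputs are intended to be connected to the FIFO's din and wr_en. Whether one extra stage is enough depends on the FPGA and the clock frequency.

```verilog
module sync_sampler (
   input            stable_clk,
   input            data_clk,
   input [7:0]      data,
   output reg [7:0] sample_data,  // Connect to the FIFO's din
   output reg       sample_valid  // Connect to the FIFO's wr_en
);
   // Stage 1: Registers that sample the pins (candidates for IOB registers)
   reg [7:0] data_samp;
   reg       data_clk_samp;

   // Stage 2: Metastability guards
   reg [7:0] data_guard;
   reg       data_clk_guard;

   // Stage 3: Delayed copies, used for detecting the clock's rising edge
   reg [7:0] data_guard_d;
   reg       data_clk_guard_d;

   always @(posedge stable_clk)
     begin
        data_samp <= data;
        data_clk_samp <= data_clk;

        data_guard <= data_samp;
        data_clk_guard <= data_clk_samp;

        data_guard_d <= data_guard;
        data_clk_guard_d <= data_clk_guard;

        // Same rule as before, but only guarded values are used
        sample_valid <= data_clk_guard && !data_clk_guard_d;
        sample_data <= data_guard_d;
     end
endmodule
```

The only effect of the extra stages on the result is a delay of a few clock cycles of @stable_clk. The relation between the detected clock edge and the data that is selected remains the same.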

The timing analysis of this strategy is more complicated: Normally, there is a constant time difference between the clock edge and the moment that the data signals are sampled. This is because the clock that is used for sampling is synchronous with the data signals. But with synchronous sampling, the sampling of @data is done with @stable_clk. Hence the moment of sampling has nothing to do with the data signals' own timing. Instead, the logic selects only the values of @data_samp_d that are close to a rising clock edge.

So there is a random time difference between the moment of the data clock's rising edge and the moment that the actual sampling occurs. The amount of randomness can be estimated by looking at the worst case timing scenario: A change in @data_clk is not detected because the timing requirements of the register weren't met. The detection of the clock edge is therefore postponed to the next clock cycle of @stable_clk. But what if the timing was on the limit? In that case, it was a matter of luck whether the clock edge was detected or not. Hence the magnitude of the uncertainty is approximately one clock period of @stable_clk. This is the maximal difference that the randomness can cause.
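To get a feeling for the numbers, here is the same estimate written as a formula, followed by an example. The frequencies are hypothetical and were chosen only for the sake of illustration. The symbols are assumptions too: T_stable and T_data are the clock periods of @stable_clk and @data_clk, and Δt_sample is the width of the uncertainty in the moment of sampling. Jitter and PCB trace differences are ignored.

```latex
\Delta t_{\text{sample}} \approx T_{\text{stable}} = \frac{1}{f_{\text{stable}}}

% Example with assumed frequencies:
f_{\text{stable}} = 100~\text{MHz},\quad f_{\text{data}} = 20~\text{MHz}
\;\Rightarrow\;
\Delta t_{\text{sample}} \approx 10~\text{ns},\quad
T_{\text{data}} = 50~\text{ns},\quad
\frac{\Delta t_{\text{sample}}}{T_{\text{data}}} \approx 0.2
```

So in this example, the moment of sampling can move within about one fifth of the data clock's period.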

A more accurate calculation needs to take the jitters of both clocks into account. The differences between the PCB traces should also be counted in. So there are a lot of things to bear in mind.

Even though the timing calculation is complicated, there are two simple rules of thumb:

An accurate timing calculation will usually reveal that the minimal frequency of @stable_clk is not high enough. But if the frequency is five or six times higher than @data_clk (or more), there is probably no need to make the timing calculation at all: The margins are large enough. But don't forget that there is still a need to add metastability guards to the Verilog code above.

Synchronous sampling is hence an excellent solution when the data rate is low in relation to the clock frequency that the FPGA can support: The data clock doesn't have to be stable. In fact, the exact frequency of this clock doesn't have to be known in advance. It's enough that the frequency is below a certain limit. If this clock becomes inactive for a brief moment, the only result is that data isn't collected during that specific time period. The damage from any malfunctions of this clock is limited to the disruption of the data flow. This will lead to a visible malfunction of the system, but this malfunction looks like a problem with the clock, and not like anything else.

So even though the sampling of the data signals has an inherent randomness, synchronous sampling is a reliable and robust solution for a source synchronous input. The only real disadvantage is the limitation on the data rate.

It's possible to use DDR input registers in order to double the sampling frequency with the same @stable_clk. The processing of the data signals is more complicated this way, but the principle is the same as in the Verilog code above. With this technique, it's often possible to work with data rates as high as 200 MHz.

Strategy #3: Phase shifting

This strategy is usually chosen when the data rate is close to the maximum that the FPGA can support.

The usual ways for guaranteeing the timing requirements will not work at such data rates: It will be impossible to achieve the timing constraints. However, a reliable sampling of the data signals is still possible.

The problem with timing constraints is that they ensure the timing requirements through a calculation: Such a calculation includes several uncertain parameters, e.g. differences in the manufacturing of the FPGA. When these parameters are taken into account, the calculation will not yield a timing solution that ensures reliable sampling. When the data rate is very high, there is no timing surplus to waste on these uncertainties.

But for a specific FPGA chip, these parameters are constant. The solution is hence to search for the correct timing while the FPGA is working. In practice, this means that a state machine inside the FPGA adjusts the delay between the data clock and the moment of sampling. So an adaptive mechanism is used to find the optimal timing, rather than relying on calculations. This mechanism is called phase shifting.

This strategy is often used with the data signals from DDR SDRAM memories. In this application, the goal is always to reach a data rate as high as possible. Hence the I/O ports' capabilities are pushed to their limit. As a result, phase shifting is the only way to ensure that the data inputs are sampled reliably: After the DDR memory has been initialized, a special sequence of data is written to the memory (the data signals to the memory are source-synchronous outputs, so there is no difficulty with the timing). The FPGA then repeatedly reads from the same part in the DDR memory. A state machine finds the optimal timing by gradually altering the delay of the sampling. The expected input data is known, because it's the same sequence of data that was previously written. The state machine can therefore easily evaluate how reliable the data is, and find the optimal delay.
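The search for the optimal delay can be sketched with a small state machine. This is a simplified illustration, not the way any specific memory controller does it, and everything about its interface is an assumption: There are 32 possible delay settings (@tap), a different part of the logic performs a read test when @test_start is high, and this logic reports the result with @test_done and @test_ok. The state machine tries all delay settings, and then selects the middle of the longest range of settings for which the data was read correctly.

```verilog
module phase_calib (
   input            clk,
   input            rst,         // Synchronous reset, restarts the search
   input            test_done,   // High for one cycle when a read test ends
   input            test_ok,     // Valid with test_done: Data was as expected
   output reg       test_start,  // High for one cycle to request a read test
   output reg [4:0] tap,         // Delay setting for the sampling (0 to 31)
   output reg       done,        // The search has finished
   output reg       failed       // Valid with done: No delay setting worked
);
   localparam ST_START = 2'd0, ST_WAIT = 2'd1, ST_PICK = 2'd2, ST_DONE = 2'd3;

   reg [1:0] state;
   reg [5:0] run_len;    // Length of the current run of good settings
   reg [5:0] best_len;   // Length of the longest run found so far
   reg [4:0] best_start; // First setting of the longest run

   always @(posedge clk)
     if (rst)
       begin
          state <= ST_START;
          tap <= 0;
          test_start <= 0;
          done <= 0;
          failed <= 0;
          run_len <= 0;
          best_len <= 0;
          best_start <= 0;
       end
     else
       case (state)
         ST_START: // Request a read test with the current delay setting
           begin
              test_start <= 1;
              state <= ST_WAIT;
           end

         ST_WAIT: // Wait for the test's result and update the statistics
           begin
              test_start <= 0;

              if (test_done)
                begin
                   if (test_ok)
                     begin
                        run_len <= run_len + 6'd1;
                        if (run_len + 6'd1 > best_len)
                          begin
                             best_len <= run_len + 6'd1;
                             best_start <= tap - run_len[4:0];
                          end
                     end
                   else
                     run_len <= 0;

                   if (tap == 5'd31)
                     state <= ST_PICK;
                   else
                     begin
                        tap <= tap + 5'd1;
                        state <= ST_START;
                     end
                end
           end

         ST_PICK: // Select the middle of the longest run of good settings
           begin
              tap <= best_start + ((best_len - 6'd1) >> 1);
              failed <= (best_len == 0); // @tap is meaningless in this case
              done <= 1;
              state <= ST_DONE;
           end

         ST_DONE: ; // Stay here until reset
       endcase
endmodule
```

Choosing the middle of the range leaves a similar margin on both sides, so small changes in the delays (e.g. due to temperature) are less likely to cause errors. A real implementation usually applies a separate delay to each data signal or group of signals.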

An interesting feature of this mechanism is that the data clock can be ignored: The DDR memory is synchronized with the clock that the FPGA generates. This clock is part of the source-synchronous outputs that are received by the DDR memory. It's therefore guaranteed that the data clock of the source-synchronous inputs has exactly the same frequency as the clock that is generated by the FPGA. The phase shifting mechanism can therefore rely on the internal clock instead of the clock that arrives along with the data. It doesn't matter that there's an unknown delay between these two clocks: The state machine finds the optimal timing regardless of this delay.

In fact, this is how this mechanism is usually implemented for DDR memories: Even though DDR memories have a source-synchronous clock (called a data strobe), the usual implementation of the phase shifting mechanism ignores this signal. The rationale is that it's more important to ensure that the data signals arrive reliably than to be aligned with the strobe.

So using phase shifting can eliminate the need for a data clock. An internal clock can be relied upon instead, if its frequency is guaranteed to be exactly the same as the data rate.

With this strategy, the purpose of the timing constraints is the same as with regular IOB registers.

Strategy #4: Using the clock directly

This strategy is apparently the most straightforward one: The external clock is connected directly to flip-flops inside the FPGA. Something like this:

module top (
   input data_clk,
   input [7:0] data
);

   reg [7:0] data_samp;
  
   always @(posedge data_clk)
     begin
       data_samp <= data;

      [ ... ]
     end
endmodule

As mentioned above, this strategy is usually not a good idea. The primary reason is that if @data_clk has a glitch, all logic that depends on this clock becomes unpredictable.

It can be reasonable to use an external clock this way if it's guaranteed to be clean and stable. But even if this is guaranteed, it may be difficult to achieve the timing requirements because of the delay between the clock pin and the flip-flops. Some FPGAs have special clock resources for the purpose of reducing this delay. This may require using a dedicated clock input pin and restrict the I/O ports to specific regions on the FPGA.

When the clock is used directly, the timing constraints are written in the same way as for a system synchronous clock.

The straightforward alternative for this strategy is to use a PLL, as suggested above. If the data clock isn't stable, synchronous sampling should be considered.

Summary

It's not a coincidence that a large part of this page is dedicated to synchronous sampling: This is the recommended method whenever the data rate is within what this method is capable of.

If other strategies are considered, it's important to pay attention to the data clock's stability. If a clock deviates from its allowed behavior (i.e. the clock period and maximal jitter), this can result in a permanent and irrecoverable malfunction of the related logic. Only a reset will bring the logic back to normal operation.

Using a PLL for generating a reliable clock improves the situation considerably: The PLL's output is reliable as long as its lock detector indicates so. Otherwise, a reset should be applied to the logic that relies on this clock.

Copyright © 2021-2023. All rights reserved. (4c701b97)