01signal: More about the clock period constraint

This page belongs to a series of pages about timing. After a brief introduction to the theory behind timing constraints and the first page about the clock period constraint, it's time to look at a few realistic scenarios with this constraint.

The next step: Using a PLL

In the example that was shown on the previous page, the external clock pin was connected directly to the logic. In most real-life designs, some kind of PLL is used to create the clock that is used by the logic. The most obvious reason for doing this is that the logic needs a different frequency than the external clock's frequency. But a PLL can also clean up imperfections in the external clock, in particular jitter.

A PLL can be added to the design with Verilog code as follows:

module top(
    input clk,
    input foo,
    output reg bar_reg
);
    reg foo_reg;
    reg bar;
    wire pll_clk;
   
   clk_wiz_0 pll_i
   (.clk_in1(clk),
    .clk_out1(pll_clk));

always @(posedge pll_clk)
  begin
    foo_reg <= foo;
    bar <= !foo_reg;
    bar_reg <= bar;
  end
endmodule

This is like the previous example, but this time the flip-flops' clock is @pll_clk instead of @clk. The PLL is generated by Vivado's Clocking Wizard IP, which is used in the Verilog code as a module with the name clk_wiz_0.

For this example, the Clocking Wizard has been configured to accept a 250 MHz reference clock, and create a 125 MHz clock on the output port (i.e. @clk_out1). In order to keep the example simple, clk_wiz_0 doesn't have a reset input or a "locked" output. In most real-life designs it's recommended to enable these ports and to use them.

Another thing about clk_wiz_0 is that it has the phase alignment option enabled, so it aligns @pll_clk's clock edges with the clock edges of @clk. This option is useful when the design has I/O ports that must meet timing requirements relative to the external clock: Thanks to the phase alignment, the timing relationship between the external clock and the internal clock becomes predictable.

To be accurate about Xilinx' terminology, Xilinx' FPGAs have two types of PLLs: One type is called PLL and the second type is called MMCM. The difference is irrelevant for the sake of this example. clk_wiz_0 is an MMCM, but for clarity I shall refer to it with the term PLL.

It's worth mentioning again that everything that is said here about the PLL applies to all FPGAs on the market. This example is shown with Vivado, but a PLL that does exactly the same as clk_wiz_0 can be generated for all FPGAs.

The timing constraint with a PLL

The most important thing to know about timing constraints with a PLL is that there is nothing special to do about it. The timing constraint is written with relation to the external pin (@clk in this example), and if the PLL produces a clock with another frequency, it's the tools' job to take that into consideration.

It's worth saying again: There should never be a need to write an additional timing constraint because a PLL is used. If you ever feel the need to do so, there's a good chance that something is wrong with your design, and "fixing" the problem with an additional timing constraint will not solve the real problem.

So as before, the timing constraint is just this:

create_clock -period 4.000 -name clk [get_ports clk]

Those who use Quartus should be aware of the pitfall that is described on this page.

The timing report with a PLL

We shall now look at the timing report of the same path as in the example on the previous page. The only difference is that the PLL has been added, as shown above. Here the analysis of the relevant path is shown in the same order as it appears in the timing report. So first, this is the summary of the analysis:

Slack (MET) :             7.288ns  (required time - arrival time)
  Source:                 foo_reg_reg/C
                            (rising edge-triggered cell FDRE clocked by clk_out1_clk_wiz_0  {rise@0.000ns fall@4.000ns period=8.000ns})
  Destination:            bar_reg__0/D
                            (rising edge-triggered cell FDRE clocked by clk_out1_clk_wiz_0  {rise@0.000ns fall@4.000ns period=8.000ns})
  Path Group:             clk_out1_clk_wiz_0
  Path Type:              Setup (Max at Slow Process Corner)
  Requirement:            8.000ns  (clk_out1_clk_wiz_0 rise@8.000ns - clk_out1_clk_wiz_0 rise@0.000ns)
  Data Path Delay:        0.669ns  (logic 0.382ns (57.100%)  route 0.287ns (42.900%))
  Logic Levels:           1  (LUT1=1)
  Clock Path Skew:        -0.048ns (DCD - SCD + CPR)
    Destination Clock Delay (DCD):    -0.715ns = ( 7.285 - 8.000 ) 
    Source Clock Delay      (SCD):    -0.616ns
    Clock Pessimism Removal (CPR):    0.051ns
  Clock Uncertainty:      0.062ns  ((TSJ^2 + DJ^2)^1/2) / 2 + PE
    Total System Jitter     (TSJ):    0.071ns
    Discrete Jitter          (DJ):    0.103ns
    Phase Error              (PE):    0.000ns
  Clock Net Delay (Source):      1.389ns (routing 0.002ns, distribution 1.387ns)
  Clock Net Delay (Destination): 1.218ns (routing 0.002ns, distribution 1.216ns)

There are several notable differences. First, the Requirement is 8 ns, instead of 4 ns before. This is expected, since the PLL's output is 125 MHz. The tools created an additional timing constraint automatically for this output, with a clock period that equals 8 ns. As the clock period has become longer by 4 ns and the data path remains unchanged, the slack increased by approximately 4 ns.

Another indication of the automatic timing constraint is that the Path Group says "clk_out1_clk_wiz_0" in this report. It was "clk" before. In fact, it says "clk_out1_clk_wiz_0" in this report in all places where it was "clk" before. The representation of the automatic timing constraints in the timing report is discussed below, in the context of multiple clocks.

A smaller consequence of the PLL is that the Clock Uncertainty has gone up to 0.062 ns: It was 0.035 ns in the previous example. This is because the Discrete Jitter is now 0.103 ns, and not zero.
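The formula that is printed in the Clock Uncertainty row can be checked with a few lines of Python. This is only a sketch of the arithmetic, not of how Vivado works internally. The function name is made up for this illustration, and the calculation without the PLL assumes that the Total System Jitter was the same 0.071 ns in the previous example.

```python
import math

def clock_uncertainty(tsj, dj, pe=0.0):
    """Clock uncertainty in ns, per the formula printed in the report:
    sqrt(TSJ^2 + DJ^2) / 2 + PE."""
    return math.sqrt(tsj**2 + dj**2) / 2 + pe

# Values from the report above (with the PLL).
with_pll = clock_uncertainty(tsj=0.071, dj=0.103, pe=0.0)

# Assumption: without the PLL, TSJ is unchanged and DJ is zero.
without_pll = clock_uncertainty(tsj=0.071, dj=0.0, pe=0.0)

print(f"with PLL:    {with_pll:.4f} ns")
print(f"without PLL: {without_pll:.4f} ns")
```

Both results agree with the reported 0.062 ns and 0.035 ns to within the last printed digit.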

Now let's move on to the timing analysis itself. This time, the Source Clock Path, the Data Path and the Destination Clock Path are shown together, exactly as they appear in the real report:

    Location             Delay type                Incr(ns)  Path(ns)    Netlist Resource(s)
  -------------------------------------------------------------------    -------------------
                         (clock clk_out1_clk_wiz_0 rise edge)
                                                      0.000     0.000 r  
    AG12                                              0.000     0.000 r  clk (IN)
                         net (fo=0)                   0.000     0.000    pll_i/inst/clkin1_ibuf/I
    AG12                 INBUF (Prop_INBUF_HRIO_PAD_O)
                                                      0.738     0.738 r  pll_i/inst/clkin1_ibuf/INBUF_INST/O
                         net (fo=1, routed)           0.105     0.843    pll_i/inst/clkin1_ibuf/OUT
    AG12                 IBUFCTRL (Prop_IBUFCTRL_HRIO_I_O)
                                                      0.049     0.892 r  pll_i/inst/clkin1_ibuf/IBUFCTRL_INST/O
                         net (fo=1, routed)           0.975     1.867    pll_i/inst/clk_in1_clk_wiz_0
    MMCME3_ADV_X1Y0      MMCME3_ADV (Prop_MMCME3_ADV_CLKIN1_CLKOUT0)
                                                     -4.474    -2.607 r  pll_i/inst/mmcme3_adv_inst/CLKOUT0
                         net (fo=1, routed)           0.501    -2.106    pll_i/inst/clk_out1_clk_wiz_0
    BUFGCE_X1Y0          BUFGCE (Prop_BUFCE_BUFGCE_I_O)
                                                      0.101    -2.005 r  pll_i/inst/clkout1_buf/O
    X2Y0 (CLOCK_ROOT)    net (fo=3, routed)           1.389    -0.616    pll_clk
    SLICE_X49Y58         FDRE                                         r  foo_reg_reg/C
  -------------------------------------------------------------------    -------------------
    SLICE_X49Y58         FDRE (Prop_EFF2_SLICEL_C_Q)
                                                      0.138    -0.478 f  foo_reg_reg/Q
                         net (fo=1, routed)           0.241    -0.237    foo_reg
    SLICE_X49Y58         LUT1 (Prop_D5LUT_SLICEL_I0_O)
                                                      0.244     0.007 r  bar__0_i_1/O
                         net (fo=1, routed)           0.046     0.053    p_0_in
    SLICE_X49Y58         FDRE                                         r  bar_reg__0/D
  -------------------------------------------------------------------    -------------------

                         (clock clk_out1_clk_wiz_0 rise edge)
                                                      8.000     8.000 r  
    AG12                                              0.000     8.000 r  clk (IN)
                         net (fo=0)                   0.000     8.000    pll_i/inst/clkin1_ibuf/I
    AG12                 INBUF (Prop_INBUF_HRIO_PAD_O)
                                                      0.515     8.515 r  pll_i/inst/clkin1_ibuf/INBUF_INST/O
                         net (fo=1, routed)           0.066     8.581    pll_i/inst/clkin1_ibuf/OUT
    AG12                 IBUFCTRL (Prop_IBUFCTRL_HRIO_I_O)
                                                      0.034     8.615 r  pll_i/inst/clkin1_ibuf/IBUFCTRL_INST/O
                         net (fo=1, routed)           0.873     9.488    pll_i/inst/clk_in1_clk_wiz_0
    MMCME3_ADV_X1Y0      MMCME3_ADV (Prop_MMCME3_ADV_CLKIN1_CLKOUT0)
                                                     -3.934     5.554 r  pll_i/inst/mmcme3_adv_inst/CLKOUT0
                         net (fo=1, routed)           0.422     5.976    pll_i/inst/clk_out1_clk_wiz_0
    BUFGCE_X1Y0          BUFGCE (Prop_BUFCE_BUFGCE_I_O)
                                                      0.091     6.067 r  pll_i/inst/clkout1_buf/O
    X2Y0 (CLOCK_ROOT)    net (fo=3, routed)           1.218     7.285    pll_clk
    SLICE_X49Y58         FDRE                                         r  bar_reg__0/C
                         clock pessimism              0.051     7.336    
                         clock uncertainty           -0.062     7.274    
    SLICE_X49Y58         FDRE (Setup_DFF2_SLICEL_C_D)
                                                      0.067     7.341    bar_reg__0
  -------------------------------------------------------------------
                         required time                          7.341    
                         arrival time                          -0.053    
  -------------------------------------------------------------------
                         slack                                  7.288
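The arithmetic of this report can be retraced with the short Python sketch below. All numbers are copied from the report, and the variable names are chosen for this illustration only. The setup row is added with the sign that is printed in the report.

```python
# All values are in ns and are copied from the report above.
launch_edge = 0.000

source_clock_delay = -0.616   # clock arrival at foo_reg_reg/C
data_path_delay = 0.669       # logic 0.382 + route 0.287
arrival_time = launch_edge + source_clock_delay + data_path_delay

destination_clock_arrival = 7.285  # clock arrival at bar_reg__0/C (edge at 8 ns)
clock_pessimism = 0.051
clock_uncertainty = 0.062
setup_row = 0.067             # the Setup_DFF2_SLICEL_C_D row, as printed

required_time = (destination_clock_arrival + clock_pessimism
                 - clock_uncertainty + setup_row)

slack = required_time - arrival_time

print(f"arrival time:  {arrival_time:.3f} ns")
print(f"required time: {required_time:.3f} ns")
print(f"slack:         {slack:.3f} ns")
```

The three results match the last three rows of the report: 0.053 ns, 7.341 ns and 7.288 ns.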

Compared with the previous example, there is only one difference in the paths: The PLL (which appears as MMCME3_ADV_X1Y0 in the report) has been inserted between the external clock pin and the global clock buffer.

This PLL has a dramatic effect: In the source clock path, the PLL's delay is −4.474 ns, and in the destination clock path the same delay is −3.934 ns. This negative delay represents the fact that the PLL adjusts the clock edge so that the global clock is slightly earlier than the clock at the input pin. Note that the clock's total delay at the flip-flop's clock input (i.e. at the end of the global clock network) is −0.616 ns in the source clock path. In the destination clock path, the clock arrives at 7.285 ns. This is 0.715 ns earlier than the second clock edge (at 8 ns).

In other words, the external clock pin and the global clock (that is delivered to the logic) have almost the same time difference in both clock paths. Even though one clock path was calculated with maximal delays, and the second clock path was calculated with minimal delays, the total result is almost the same.

This is not a coincidence: The PLL uses the global clock as its feedback, so the phase of the global clock buffer's input is adjusted to maintain the relation with the clock input pin. The differences between the minimal delays and the maximal delays are compensated by the PLL. The timing calculations reflect this by the fact that the difference between the fastest and slowest cases is just 0.1 ns (0.715 ns compared with 0.616 ns). This difference in the previous example was much bigger because there was no PLL (0.575 ns, see "Clock pessimism removal" on the previous page).

Why the global clock is adjusted to about 0.6 ns before the external input, and not to any other value, is a different story. In many cases, this choice makes it easier to achieve timing constraints that are related to I/O pins, so the tools make it automatically. Nevertheless, all FPGAs have the option to manipulate this delay.

Two related clocks

Quite often, a logic design requires more than one clock. The existence of several clocks in an FPGA design is a topic of its own, which is discussed in the introduction to clock domains. The two topics, clock domains and timing, are inseparable, so it's recommended to read through that introduction (possibly briefly) before continuing here. Because of the close relation between these two topics, there is some overlap between that introduction and this series of pages.

In the discussion below, I shall often use expressions like "the signal X is synchronous with @clk". This means that X is the output of a flip-flop that changes value only in response to a rising edge of a clock that has the name "clk" (except for asynchronous resets, but that's irrelevant). Naturally, if two signals are the outputs of flip-flops that respond to the same clock, these two signals are "synchronous with the same clock".

We shall now look at two clocks that are generated by the same PLL. This is interesting because in most cases, these two clocks are considered related clocks. To understand why this is interesting, let's say that there is one flip-flop that is synchronous with one of these clocks, and another flip-flop is synchronous with the second clock. In this case, it's fine to connect signals between these two flip-flops as if they were synchronous with the same clock. The FPGA tools ensure that the timing requirements are met in this case.

For a better understanding of how to work with multiple clocks, there is a separate page about related clocks and unrelated clocks. That page is the introduction to clock domain crossing. Here we focus on the timing aspects of related clocks, and in particular on the timing report in relation to such clocks.

In order to produce the timing reports below, a different PLL (i.e. a Clocking Wizard IP) was used in the example. The name of this new PLL is clk_wiz_1. The following Verilog code was used:

module top(
    input clk,
    input foo,
    output reg bar_reg
);
    reg foo_reg;
    reg bar;
    wire pll_clk_8, pll_clk_6;
   
   clk_wiz_1 pll_i
   (.clk_in1(clk),
    .clk_out1(pll_clk_8),
    .clk_out2(pll_clk_6));

always @(posedge pll_clk_8)
  foo_reg <= foo;
   
always @(posedge pll_clk_6)
  begin
    bar <= !foo_reg;
    bar_reg <= bar;
  end
endmodule

As before, the input of this PLL (i.e. @clk) is 250 MHz, but it has two outputs: One is connected to @pll_clk_8, which runs at 125 MHz (i.e. the clock period is 8 ns, hence the signal's name). This is exactly like the previous example's @pll_clk. The second output is connected to @pll_clk_6, which has a clock period that equals 6 ns, which is approximately 166.67 MHz.

Because @pll_clk_8 and @pll_clk_6 are generated by the same PLL, they are related clocks.

clk_wiz_1 has the phase alignment option enabled too, in particular to remain consistent with the previous example. And in case there was any doubt about it: There is still only one timing constraint, which is the same as before:

create_clock -period 4.000 -name clk [get_ports clk]

Understanding the clock summary

The information about all clocks is summarized at the beginning of the timing report. I bring this up only now, because the clock summary is more interesting when there is more than one clock. All FPGA tools generate a summary of this sort in the report, and it's always a good idea to look at this part:

-------------------------------------------------------------------------
| Clock Summary
| -------------
-------------------------------------------------------------------------

Clock                 Waveform(ns)         Period(ns)      Frequency(MHz)
-----                 ------------         ----------      --------------
clk                   {0.000 2.000}        4.000           250.000         
  clk_out1_clk_wiz_1  {0.000 4.000}        8.000           125.000         
  clk_out2_clk_wiz_1  {0.000 3.000}        6.000           166.667         
  clkfbout_clk_wiz_1  {0.000 2.000}        4.000           250.000

The main advantage of this part is that it's easy to understand, and it's also easy to check for the most common mistake regarding timing constraints: Whether each clock has the correct frequency.

This clock summary shows that the timing constraint has been interpreted correctly: There is one external clock defined with the name clk, and its clock period is 4 ns. In addition, there are three derived clocks: clk_out1_clk_wiz_1, clk_out2_clk_wiz_1 and clkfbout_clk_wiz_1. As their names imply, they were generated automatically because of the PLL with the name clk_wiz_1.
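When a design has many clocks, this check can be automated. The following Python sketch parses the rows of the clock summary shown above and compares the frequencies with the intended ones. The regular expression, the function name and the table of expected frequencies are assumptions that were made for this illustration. The exact layout of the summary may differ between versions of the tools.

```python
import re

CLOCK_SUMMARY = """\
clk                   {0.000 2.000}        4.000           250.000
  clk_out1_clk_wiz_1  {0.000 4.000}        8.000           125.000
  clk_out2_clk_wiz_1  {0.000 3.000}        6.000           166.667
  clkfbout_clk_wiz_1  {0.000 2.000}        4.000           250.000
"""

# Columns: name, {rise fall}, period (ns), frequency (MHz)
ROW = re.compile(r"^\s*(\S+)\s+\{(\S+)\s+(\S+)\}\s+(\S+)\s+(\S+)\s*$")

def parse_clock_summary(text):
    """Map each clock name to a tuple: (period in ns, frequency in MHz)."""
    clocks = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            name, _rise, _fall, period, freq = match.groups()
            clocks[name] = (float(period), float(freq))
    return clocks

# Hypothetical table of what the designer intended (in MHz).
EXPECTED_MHZ = {
    "clk": 250.0,
    "clk_out1_clk_wiz_1": 125.0,
    "clk_out2_clk_wiz_1": 166.667,
}

clocks = parse_clock_summary(CLOCK_SUMMARY)
for name, expected in EXPECTED_MHZ.items():
    period_ns, freq_mhz = clocks[name]
    # The two columns must agree: f [MHz] = 1000 / T [ns]
    assert abs(1000.0 / period_ns - freq_mhz) < 0.001, name
    assert abs(freq_mhz - expected) < 0.001, name
```

If one of the clocks has an unexpected frequency, the script stops with an AssertionError that shows the name of that clock.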

The first two clocks (clk_out1_clk_wiz_1 and clk_out2_clk_wiz_1) are the two outputs of the PLL. The third clock (clkfbout_clk_wiz_1) is used by the PLL to adjust the phase of the global clocks to the external clock. clkfbout_clk_wiz_1 has the same frequency as @clk.

When the phase alignment option is enabled, the PLL's feedback clock (clkfbout_clk_wiz_1) is connected to a global clock buffer. As PLLs always synchronize between their input clock and the feedback clock (by aligning these clocks, or maintaining a fixed delay between them), the fact that clkfbout_clk_wiz_1 is a global clock also ensures a known delay between the input clock and the other output clocks.

It's important to note that clk_out1_clk_wiz_1, clk_out2_clk_wiz_1 and clkfbout_clk_wiz_1 do not define clocks that exist in reality. These are just symbols of the clocks that are used by the software for timing calculations. One thing that shows that these clocks are theoretical is their waveforms, which are shown in the clock summary: These waveforms reflect the duty cycle of each of these clocks. But it means nothing that all four clocks (clk and the three theoretical clocks) have a rising edge at exactly 0 ns. In particular, it doesn't mean that these four clocks are perfectly aligned. When clocks are in fact aligned, this is visible in the timing calculations and not in these waveforms, and even then the real alignment is not perfect.

How these theoretical clocks are used is discussed below, in relation to the timing calculation that involves two of these clocks.

Another important thing about these theoretical clocks is that there is no explicit timing constraint that creates them: A definition of clk_out1_clk_wiz_* doesn't exist anywhere in the design's sources or in the files that are created by the tools. In most cases, it's incorrect to attempt to write such a timing constraint, because the delays of the clock path will not be accounted for correctly this way. If such constraints are written, the tools will calculate the relative timing between clocks incorrectly, and hence the paths between related clock domains will not be calculated correctly either.

So once again, only one timing constraint is needed. Based on this constraint, the tools automatically generate the timing constraints for all of the PLL's output clocks.

The timing report with two related clocks

As evident from the Verilog code above, @foo_reg is synchronous with @pll_clk_8, and @bar is synchronous with @pll_clk_6. Hence, the statement that updates @bar involves a clock domain crossing:

bar <= !foo_reg;

The two clocks are related clocks, so the tools ensure that @bar's timing requirements are met by virtue of this calculation (for t_su):

Slack (MET) :             0.475ns  (required time - arrival time)
  Source:                 foo_reg_reg/C
                            (rising edge-triggered cell FDRE clocked by clk_out1_clk_wiz_1  {rise@0.000ns fall@4.000ns period=8.000ns})
  Destination:            bar_reg__0/D
                            (rising edge-triggered cell FDRE clocked by clk_out2_clk_wiz_1  {rise@0.000ns fall@3.000ns period=6.000ns})
  Path Group:             clk_out2_clk_wiz_1
  Path Type:              Setup (Max at Slow Process Corner)
  Requirement:            2.000ns  (clk_out2_clk_wiz_1 rise@18.000ns - clk_out1_clk_wiz_1 rise@16.000ns)
  Data Path Delay:        1.160ns  (logic 0.307ns (26.466%)  route 0.853ns (73.534%))
  Logic Levels:           1  (LUT1=1)
  Clock Path Skew:        -0.250ns (DCD - SCD + CPR)
    Destination Clock Delay (DCD):    -0.681ns = ( 17.319 - 18.000 ) 
    Source Clock Delay      (SCD):    -0.600ns = ( 15.400 - 16.000 ) 
    Clock Pessimism Removal (CPR):    -0.169ns
  Clock Uncertainty:      0.182ns  ((TSJ^2 + DJ^2)^1/2) / 2 + PE
    Total System Jitter     (TSJ):    0.071ns
    Discrete Jitter          (DJ):    0.103ns
    Phase Error              (PE):    0.120ns
  Clock Net Delay (Source):      1.369ns (routing 0.002ns, distribution 1.367ns)
  Clock Net Delay (Destination): 1.208ns (routing 0.002ns, distribution 1.206ns)

    Location             Delay type                Incr(ns)  Path(ns)    Netlist Resource(s)
  -------------------------------------------------------------------    -------------------
                         (clock clk_out1_clk_wiz_1 rise edge)
                                                     16.000    16.000 r  
    AG12                                              0.000    16.000 r  clk (IN)
                         net (fo=0)                   0.000    16.000    pll_i/inst/clkin1_ibuf/I
    AG12                 INBUF (Prop_INBUF_HRIO_PAD_O)
                                                      0.738    16.738 r  pll_i/inst/clkin1_ibuf/INBUF_INST/O
                         net (fo=1, routed)           0.105    16.843    pll_i/inst/clkin1_ibuf/OUT
    AG12                 IBUFCTRL (Prop_IBUFCTRL_HRIO_I_O)
                                                      0.049    16.892 r  pll_i/inst/clkin1_ibuf/IBUFCTRL_INST/O
                         net (fo=1, routed)           0.975    17.867    pll_i/inst/clk_in1_clk_wiz_1
    MMCME3_ADV_X1Y0      MMCME3_ADV (Prop_MMCME3_ADV_CLKIN1_CLKOUT0)
                                                     -4.438    13.429 r  pll_i/inst/mmcme3_adv_inst/CLKOUT0
                         net (fo=1, routed)           0.501    13.930    pll_i/inst/clk_out1_clk_wiz_1
    BUFGCE_X1Y1          BUFGCE (Prop_BUFCE_BUFGCE_I_O)
                                                      0.101    14.031 r  pll_i/inst/clkout1_buf/O
    X2Y0 (CLOCK_ROOT)    net (fo=1, routed)           1.369    15.400    pll_clk_8
    SLICE_X49Y58         FDRE                                         r  foo_reg_reg/C
  -------------------------------------------------------------------    -------------------
    SLICE_X49Y58         FDRE (Prop_EFF_SLICEL_C_Q)
                                                      0.139    15.539 f  foo_reg_reg/Q
                         net (fo=1, routed)           0.807    16.346    foo_reg
    SLICE_X49Y58         LUT1 (Prop_D5LUT_SLICEL_I0_O)
                                                      0.168    16.514 r  bar__0_i_1/O
                         net (fo=1, routed)           0.046    16.560    p_0_in
    SLICE_X49Y58         FDRE                                         r  bar_reg__0/D
  -------------------------------------------------------------------    -------------------

                         (clock clk_out2_clk_wiz_1 rise edge)
                                                     18.000    18.000 r  
    AG12                                              0.000    18.000 r  clk (IN)
                         net (fo=0)                   0.000    18.000    pll_i/inst/clkin1_ibuf/I
    AG12                 INBUF (Prop_INBUF_HRIO_PAD_O)
                                                      0.515    18.515 r  pll_i/inst/clkin1_ibuf/INBUF_INST/O
                         net (fo=1, routed)           0.066    18.581    pll_i/inst/clkin1_ibuf/OUT
    AG12                 IBUFCTRL (Prop_IBUFCTRL_HRIO_I_O)
                                                      0.034    18.615 r  pll_i/inst/clkin1_ibuf/IBUFCTRL_INST/O
                         net (fo=1, routed)           0.873    19.488    pll_i/inst/clk_in1_clk_wiz_1
    MMCME3_ADV_X1Y0      MMCME3_ADV (Prop_MMCME3_ADV_CLKIN1_CLKOUT1)
                                                     -3.890    15.598 r  pll_i/inst/mmcme3_adv_inst/CLKOUT1
                         net (fo=1, routed)           0.422    16.020    pll_i/inst/clk_out2_clk_wiz_1
    BUFGCE_X1Y0          BUFGCE (Prop_BUFCE_BUFGCE_I_O)
                                                      0.091    16.111 r  pll_i/inst/clkout2_buf/O
    X2Y0 (CLOCK_ROOT)    net (fo=2, routed)           1.208    17.319    pll_clk_6
    SLICE_X49Y58         FDRE                                         r  bar_reg__0/C
                         clock pessimism             -0.169    17.150    
                         clock uncertainty           -0.182    16.967    
    SLICE_X49Y58         FDRE (Setup_DFF2_SLICEL_C_D)
                                                      0.067    17.034    bar_reg__0
  -------------------------------------------------------------------
                         required time                         17.034    
                         arrival time                         -16.560    
  -------------------------------------------------------------------
                         slack                                  0.475

As mentioned before, a timing calculation is an imaginary experiment, where an imaginary stopwatch starts along with clock edges. In the previous example, this stopwatch started at 0 ns. But in this calculation it starts with the clock edge at 16 ns. This is because the clock periods of the two clocks aren't equal. The calculation is made on the worst-case combination between the first clock and the second clock.

The first clock's rising edges are at 0 ns, 8 ns, 16 ns, 24 ns and so on. The second clock's rising edges are at 0 ns, 6 ns, 12 ns, 18 ns, 24 ns and so on. This pattern repeats itself every 24 ns. So the smallest time gap from a rising edge of the first clock to the next rising edge of the second clock is between the first clock's 16 ns and the second clock's 18 ns. This is the situation that this timing calculation examines.
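The search for this worst-case pair of clock edges can be illustrated with a short Python sketch. This is a simplified model that was written for this page, and not the tools' actual algorithm. It assumes that both ideal clocks have a rising edge at 0 ns, and that the clock periods are integers.

```python
from math import gcd

def tightest_setup_gap(launch_period, capture_period):
    """Return (gap, launch_edge, capture_edge) for the smallest positive
    time from a launch edge to the next capture edge.

    The periods are integers (e.g. in ns). Both ideal clocks are assumed
    to have a rising edge at time 0.
    """
    # The pattern of clock edges repeats after the least common multiple.
    hyperperiod = launch_period * capture_period // gcd(launch_period,
                                                        capture_period)
    best = None
    for launch in range(0, hyperperiod, launch_period):
        # The first capture edge that is strictly after the launch edge
        capture = (launch // capture_period + 1) * capture_period
        gap = capture - launch
        if best is None or gap < best[0]:
            best = (gap, launch, capture)
    return best

print(tightest_setup_gap(8, 6))  # The clocks of this example: (2, 16, 18)
print(tightest_setup_gap(8, 4))  # Hypothetical 8 ns and 4 ns pair: (4, 0, 4)
```

The first call reproduces the Requirement row of the report: A gap of 2 ns, between the clock edges at 16 ns and 18 ns. The second call shows a hypothetical pair of clocks, where one clock period is a multiple of the other. In that case, the gap is never shorter than the shorter clock period.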

This means that the data path is limited to approximately 2 ns, which is equivalent to a clock of 500 MHz. This is a very strict requirement, but because there is only one LUT in the data path, and both flip-flops are in the same slice, it was easy to achieve this goal. However, this example shows why it's important to select frequencies that work well together for related clocks. Often one clock's frequency is chosen as a multiple of the other clock's frequency. Doing this avoids scenarios like the 2 ns gap of this example.

By the way, if it's impossible to choose frequencies that work well together, the solution is to treat the clocks as unrelated clocks.

The derived clocks

I mentioned earlier that clk_out1_clk_wiz_1 and clk_out2_clk_wiz_1 are theoretical clocks, and this is how this fact is reflected in the timing report: Note that both paths are calculated with the external pin (AG12) as the starting point. How could that make sense? In reality, the signal on this pin is the reference clock, and not any of these two clocks.

So let's take clk_out1_clk_wiz_1 for example: The idea behind the calculation is to pretend that there is a clock with 125 MHz at the external pin. In reality, a clock with 125 MHz exists only at the output of the PLL, i.e. the output of MMCME3_ADV_X1Y0. But instead of involving the PLL's frequency manipulation in the timing calculation, a theoretical clock is used. This clock has an ideal waveform that starts at 0 ns. The PLL is treated as if it doesn't change the frequency of any clock, but only adds delays.

So the theoretical clock (e.g. clk_out1_clk_wiz_1) defines the waveform of the clock (frequency, duty cycle, jitter etc.). However, this theoretical clock doesn't define the timing relations with other clocks that exist as real signals on the FPGA.

So how can we see that @pll_clk_6 and @pll_clk_8 are in fact aligned? The answer lies in comparing between the actual timing and the ideal timing of these clocks. For example, clk_out1_clk_wiz_1's theoretical clock edge is at 16 ns in the calculation above. On the other hand, clk_out2_clk_wiz_1's clock edge is at 18 ns, which is later by 2 ns. Now let's compare the clock edges of the outputs of the global clock tree: The first clock's clock edge arrives at 15.400 ns, according to the timing report. For the second clock edge, the report says 17.319 ns. So according to the calculation, the time difference is 1.919 ns instead of 2 ns. That is only 0.081 ns less than the ideal difference. So the clocks are definitely aligned.

Let's compare with the previous example, where there was only one clock: The ideal time difference between two clock edges was the clock period, i.e. 8 ns. But according to the relevant timing report (see above), the first clock edge arrived at −0.616 ns, and the second clock edge at 7.285 ns. The ideal time difference was 8 ns, but it was actually 7.901 ns. So the second clock edge was earlier than expected by 0.099 ns.

So with two related clocks, the deviation from the ideal time difference is 0.081 ns. With only one clock, this deviation was roughly the same, 0.099 ns. In both cases, the reason for this deviation is that the calculation for the first clock edge is made with maximal delays, and minimal delays are used for the second clock edge.
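This comparison is plain arithmetic on numbers from the two timing reports. For those who prefer to see it written out, here is a minimal Python sketch. The function name and its parameters were made up for this illustration.

```python
def edge_spacing_deviation(ideal_launch, ideal_capture,
                           actual_launch, actual_capture):
    """Return how much shorter (in ns) the actual time between the two
    clock edges is, compared with the ideal time between them."""
    ideal = ideal_capture - ideal_launch
    actual = actual_capture - actual_launch
    return ideal - actual

# One clock: Ideal edges at 0 ns and 8 ns, arriving at -0.616 ns and 7.285 ns
one_clock = edge_spacing_deviation(0.0, 8.0, -0.616, 7.285)

# Two related clocks: Ideal edges at 16 ns and 18 ns, arriving at
# 15.400 ns and 17.319 ns
two_clocks = edge_spacing_deviation(16.0, 18.0, 15.400, 17.319)

print(f"one clock:  {one_clock:.3f} ns")
print(f"two clocks: {two_clocks:.3f} ns")
```

The results are 0.099 ns and 0.081 ns, which are the two numbers that are compared above.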

The conclusion is hence that @pll_clk_6 and @pll_clk_8 are aligned with roughly the same precision as if they were one clock.

Note that these clocks would have been mutually aligned even without the option that enables the PLL's phase alignment: The outputs of a PLL are usually aligned anyhow. This alignment is ensured by using global clock buffers that have nearly the same delays, regardless of their fanout. In other words, it doesn't matter how many logic elements each of these clock buffers is connected to, the delay from the PLL's output to the destination is approximately the same.

What we learned about related clocks

The analysis of this timing report shows that a path between two related clock domains is roughly equivalent to a path inside a clock domain. And yet, it's not exactly the same. In particular, the Clock Uncertainty has gone up from 0.062 ns to 0.182 ns, as each clock has its own jitter, and the alignment isn't perfect either.

This timing report also showed how the alignment of the two clocks is reflected in the timing calculations.

When there are clock domain crossings in an FPGA design, it's a good idea to examine the interactions between these clocks in the timing report. The purpose is to make sure that the tools treat the clocks like we expect. These are the two most important things to check:

If the logic considers two clocks as related clocks: Verify that there are timing calculations on all paths between these clocks.
If the logic considers two clocks as unrelated clocks: Verify that no timing calculations are made on paths between these two clocks.

Vivado can create a Clock Interaction Report, which graphically displays the design's clock domain crossings, and how each such crossing is treated by the tools. Other FPGA tools have a similar feature, e.g. Quartus Pro's CDC Viewer. When possible, it's recommended to create and examine this report.

Another thing to look at is whether the clocks are aligned. If the design mistakenly uses an unaligned clock, it may create unnecessary difficulties in achieving the timing constraints. For example, if there's a path between @clk and @pll_clk_8, the tools will enforce the timing constraints, and hence make sure that this path works reliably. Note however that @clk is the reference clock that goes into the PLL, and @pll_clk_8 is the output of this PLL. Hence these two clocks are not aligned. As a result, the tools may work unnecessarily hard in order to achieve the timing constraints for this path. Other paths may fail to achieve their timing constraints because of this unnecessary effort.

And once again, the topic of clock domains is covered by this series of pages.

t_hold is important too

All timing calculations that I've shown until now were related to the t_su requirement. It's natural to focus on this requirement, because when the tools fail to achieve the timing constraints, it's almost always because at least one path failed to achieve the requirement for t_su.

It's nevertheless important to keep t_hold in mind: In order to meet this requirement the tools sometimes slow down the data path artificially by adding routing delay. So even though the t_hold requirement is rarely mentioned as the reason for a failure to achieve the timing constraints, this requirement can be the hidden reason for the failure.

In fact, something related happened with the routing delay of the wire that connects between the two flip-flops: In all timing reports that involved only one clock, this wire was inside the same slice (SLICE_X49Y58). Hence the delay of this wire was 0.241 ns in all these reports. But when two clocks were involved in the path, this delay went up to 0.807 ns.

The explanation is that 0.241 ns is the minimal delay that can be achieved between two flip-flops that are placed in the same slice. The longer delay (0.807 ns) is the result of a different routing of this wire. This didn't happen by accident: The tools deliberately made this extra long routing in order to meet the t_hold requirement. This is also reflected in the "Data Path Delay" row in the report: The delay for logic is just 26.5%, and the rest is routing. That alone is a sign that something has happened. More about this below.

Analysis of the t_hold requirement

Recall from the theoretical page in this series of pages that t_hold is the amount of time that the data input on the second flip-flop must be stable after the clock edge. Let's describe the situation where t_hold is violated: A clock edge arrives at the first flip-flop, and the second flip-flop also receives a clock edge at approximately the same time (note that these clock edges can come from the same clock or from different clocks). In response to its clock edge, the first flip-flop updates its output after a delay (clock-to-output). But the updated value reaches the second flip-flop too early. As a result, this flip-flop doesn't sample the previous value reliably: The new value arrived before the second flip-flop had the time to finish reacting to its clock edge. In other words, the t_hold requirement is violated.

When the same clock is used with both flip-flops, this can happen mainly because of clock skew: If the clock edge arrives earlier at the first flip-flop, it's possible that the first flip-flop changes its output too early. That opens the possibility that the updated signal reaches the second flip-flop fast enough to violate the t_hold requirement. Note that it's the same clock edge that reaches both flip-flops, so neither the clock's frequency nor the amount of clock jitter has any significance. Only a difference in delay (i.e. clock skew) plays a role.
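The role of clock skew in the single-clock case can be illustrated with a small calculation. This is only a sketch: The function name hold_slack and all the numbers in the example calls are made up for this illustration, and are not taken from any timing report. Note that the clock's period doesn't appear anywhere in the calculation.

```python
def hold_slack(launch_clk_delay, clk_to_q, route_delay, capture_clk_delay, t_hold):
    """Hold slack (ns) for a path between two flip-flops on the same clock.

    A positive result means that the new data arrives late enough,
    so the t_hold requirement is met.
    """
    arrival = launch_clk_delay + clk_to_q + route_delay
    required = capture_clk_delay + t_hold
    return arrival - required

# Illustrative values only (ns). No clock skew: both flip-flops see the
# clock edge at the same time, and the requirement is met.
no_skew = hold_slack(0.0, 0.05, 0.10, 0.0, 0.06)

# The clock edge reaches the second flip-flop 0.2 ns later than the first
# one. The data now changes too early relative to the second flip-flop's
# clock edge, so the slack becomes negative.
with_skew = hold_slack(0.0, 0.05, 0.10, 0.2, 0.06)

print(f"no skew: {no_skew:.3f} ns, with skew: {with_skew:.3f} ns")
```

With these assumed numbers, the only way to restore a positive slack is to increase the routing delay (or reduce the skew), which is the kind of correction that the tools make.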

But for the timing analysis below, we shall stay with the last example, which involves two clocks: @pll_clk_8 and @pll_clk_6. A timing analysis with one clock will be similar, but not as interesting, because it's too easy to meet the t_hold requirement with one clock.

So the timing report that we shall look at is made for two related clocks. These two clocks have different frequencies, and as before, there are different combinations between the time of the first clock's clock edge and the second clock's clock edge. But unlike the calculation for t_su, the worst case for t_hold is when both clock edges are at 0 ns: Violations of t_hold occur when the two clock edges are nearly simultaneous, so no other combination is worse than this.
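The combinations of clock edges can be listed with a few lines of Python. This is a sketch of the idea only, assuming ideal clocks with periods of 8 ns and 6 ns that both have a rising edge at 0 ns. The function name edge_relations is invented for this illustration, and the timing tools use more elaborate rules to derive their requirements.

```python
from math import gcd

def edge_relations(launch_period, capture_period):
    """Examine all pairs of rising edges of two ideal, aligned clocks.

    Periods are integers in ns. Returns the smallest positive time
    difference from a launch edge to a capture edge, and whether there
    is a pair of edges that occur at the same time.
    """
    # The pattern of edges repeats after the least common multiple
    span = launch_period * capture_period // gcd(launch_period, capture_period)
    launch_edges = range(0, span, launch_period)
    capture_edges = range(0, span + capture_period, capture_period)
    gaps = [c - l for l in launch_edges for c in capture_edges]
    smallest_positive_gap = min(g for g in gaps if g > 0)
    edges_coincide = 0 in gaps
    return smallest_positive_gap, edges_coincide

print(edge_relations(8, 6))
```

For these two periods, the smallest positive gap is 2 ns (a launch edge at 16 ns followed by a capture edge at 18 ns), which is the tightest combination for t_su. The edges also coincide at 0 ns, which is the combination that matters for t_hold.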

So with these insights at hand, let's look at the timing report:

Slack (MET) :             0.093ns  (arrival time - required time)
  Source:                 foo_reg_reg/C
                            (rising edge-triggered cell FDRE clocked by clk_out1_clk_wiz_1  {rise@0.000ns fall@4.000ns period=8.000ns})
  Destination:            bar_reg__0/D
                            (rising edge-triggered cell FDRE clocked by clk_out2_clk_wiz_1  {rise@0.000ns fall@3.000ns period=6.000ns})
  Path Group:             clk_out2_clk_wiz_1
  Path Type:              Hold (Min at Fast Process Corner)
  Requirement:            0.000ns  (clk_out2_clk_wiz_1 rise@0.000ns - clk_out1_clk_wiz_1 rise@0.000ns)
  Data Path Delay:        0.458ns  (logic 0.104ns (22.707%)  route 0.354ns (77.293%))
  Logic Levels:           1  (LUT1=1)
  Clock Path Skew:        0.127ns (DCD - SCD - CPR)
    Destination Clock Delay (DCD):    -0.542ns
    Source Clock Delay      (SCD):    -0.248ns
    Clock Pessimism Removal (CPR):    -0.421ns
  Clock Uncertainty:      0.182ns  ((TSJ^2 + DJ^2)^1/2) / 2 + PE
    Total System Jitter     (TSJ):    0.071ns
    Discrete Jitter          (DJ):    0.103ns
    Phase Error              (PE):    0.120ns
  Clock Net Delay (Source):      0.495ns (routing 0.002ns, distribution 0.493ns)
  Clock Net Delay (Destination): 0.576ns (routing 0.002ns, distribution 0.574ns)

    Location             Delay type                Incr(ns)  Path(ns)    Netlist Resource(s)
  -------------------------------------------------------------------    -------------------
                         (clock clk_out1_clk_wiz_1 rise edge)
                                                      0.000     0.000 r  
    AG12                                              0.000     0.000 r  clk (IN)
                         net (fo=0)                   0.000     0.000    pll_i/inst/clkin1_ibuf/I
    AG12                 INBUF (Prop_INBUF_HRIO_PAD_O)
                                                      0.339     0.339 r  pll_i/inst/clkin1_ibuf/INBUF_INST/O
                         net (fo=1, routed)           0.025     0.364    pll_i/inst/clkin1_ibuf/OUT
    AG12                 IBUFCTRL (Prop_IBUFCTRL_HRIO_I_O)
                                                      0.015     0.379 r  pll_i/inst/clkin1_ibuf/IBUFCTRL_INST/O
                         net (fo=1, routed)           0.405     0.784    pll_i/inst/clk_in1_clk_wiz_1
    MMCME3_ADV_X1Y0      MMCME3_ADV (Prop_MMCME3_ADV_CLKIN1_CLKOUT0)
                                                     -1.721    -0.937 r  pll_i/inst/mmcme3_adv_inst/CLKOUT0
                         net (fo=1, routed)           0.167    -0.770    pll_i/inst/clk_out1_clk_wiz_1
    BUFGCE_X1Y1          BUFGCE (Prop_BUFCE_BUFGCE_I_O)
                                                      0.027    -0.743 r  pll_i/inst/clkout1_buf/O
    X2Y0 (CLOCK_ROOT)    net (fo=1, routed)           0.495    -0.248    pll_clk_8
    SLICE_X49Y58         FDRE                                         r  foo_reg_reg/C
  -------------------------------------------------------------------    -------------------
    SLICE_X49Y58         FDRE (Prop_EFF_SLICEL_C_Q)
                                                      0.049    -0.199 f  foo_reg_reg/Q
                         net (fo=1, routed)           0.343     0.144    foo_reg
    SLICE_X49Y58         LUT1 (Prop_D5LUT_SLICEL_I0_O)
                                                      0.055     0.199 r  bar__0_i_1/O
                         net (fo=1, routed)           0.011     0.210    p_0_in
    SLICE_X49Y58         FDRE                                         r  bar_reg__0/D
  -------------------------------------------------------------------    -------------------

                         (clock clk_out2_clk_wiz_1 rise edge)
                                                      0.000     0.000 r  
    AG12                                              0.000     0.000 r  clk (IN)
                         net (fo=0)                   0.000     0.000    pll_i/inst/clkin1_ibuf/I
    AG12                 INBUF (Prop_INBUF_HRIO_PAD_O)
                                                      0.595     0.595 r  pll_i/inst/clkin1_ibuf/INBUF_INST/O
                         net (fo=1, routed)           0.042     0.637    pll_i/inst/clkin1_ibuf/OUT
    AG12                 IBUFCTRL (Prop_IBUFCTRL_HRIO_I_O)
                                                      0.022     0.659 r  pll_i/inst/clkin1_ibuf/IBUFCTRL_INST/O
                         net (fo=1, routed)           0.457     1.116    pll_i/inst/clk_in1_clk_wiz_1
    MMCME3_ADV_X1Y0      MMCME3_ADV (Prop_MMCME3_ADV_CLKIN1_CLKOUT1)
                                                     -2.474    -1.358 r  pll_i/inst/mmcme3_adv_inst/CLKOUT1
                         net (fo=1, routed)           0.209    -1.149    pll_i/inst/clk_out2_clk_wiz_1
    BUFGCE_X1Y0          BUFGCE (Prop_BUFCE_BUFGCE_I_O)
                                                      0.031    -1.118 r  pll_i/inst/clkout2_buf/O
    X2Y0 (CLOCK_ROOT)    net (fo=2, routed)           0.576    -0.542    pll_clk_6
    SLICE_X49Y58         FDRE                                         r  bar_reg__0/C
                         clock pessimism              0.421    -0.121    
                         clock uncertainty            0.182     0.061    
    SLICE_X49Y58         FDRE (Hold_DFF2_SLICEL_C_D)
                                                      0.056     0.117    bar_reg__0
  -------------------------------------------------------------------
                         required time                         -0.117    
                         arrival time                           0.210    
  -------------------------------------------------------------------
                         slack                                  0.093

First, there are some obvious differences: The Path Type is Hold, and not Setup as before. That's expected. It then says "Min at Fast Process Corner", which is the opposite of what it said for the setup path: This calculation uses the minimal delays, and not the maximal delays for the data path. This is suitable for a worst case calculation regarding the situation where the data input changes too early.

The Requirement is 0 ns, which is typical for a hold path. The frequencies of the clocks don't play a role here: The scenario that is examined is when both clocks have a rising edge at the same time.

As for the clock paths, note that the delays of each component on the source clock path are consistently shorter than the same delays of the destination clock path (and not longer). Once again, this is consistent with the purpose of this calculation, because the worst case is when the data changes too early, relative to the arrival of the clock edge to the second flip-flop. Multi-corner timing analysis was explained on the previous page.
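The comparison between the two clock paths can be checked by copying the Incr(ns) column of each clock path from the report. The following Python sketch sums each path, compares the elements pairwise, and repeats the calculation behind the Clock Path Skew row (DCD - SCD - CPR). The variable names are chosen for this illustration only.

```python
# Incr(ns) values copied from the two clock paths in the hold report above,
# from the input pin to the flip-flop's clock input
source_clock = [0.339, 0.025, 0.015, 0.405, -1.721, 0.167, 0.027, 0.495]
dest_clock = [0.595, 0.042, 0.022, 0.457, -2.474, 0.209, 0.031, 0.576]
CPR = -0.421  # Clock Pessimism Removal, as printed in the report's header

scd = sum(source_clock)  # Source Clock Delay
dcd = sum(dest_clock)    # Destination Clock Delay

# Compare magnitudes, because the MMCM's entry is negative
source_is_shorter = all(abs(s) < abs(d) for s, d in zip(source_clock, dest_clock))

skew = dcd - scd - CPR
print(f"SCD = {scd:.3f} ns, DCD = {dcd:.3f} ns, skew = {skew:.3f} ns")
print("All source elements shorter:", source_is_shorter)
```

The sums agree with the SCD, DCD and Clock Path Skew rows in the report's header (-0.248 ns, -0.542 ns and 0.127 ns).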

But the most important thing about this timing report is that the slack is small: Only 0.093 ns. This is often an indication that the tools had to make an effort in order to meet the requirement. However, note that a small slack doesn't necessarily indicate that there was a problem that needed solving.
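The slack itself can be recalculated from the numbers in the report. The sketch below follows the report's own structure (arrival time minus required time), and also evaluates the formula that is printed in the Clock Uncertainty row. All values are copied from the report. The variable names are chosen for this illustration only.

```python
from math import sqrt

# All values in ns, copied from the hold report above
SCD = -0.248                              # source clock edge at foo_reg_reg/C
DATA_PATH = [0.049, 0.343, 0.055, 0.011]  # clock-to-Q, net, LUT1, net
DCD = -0.542                              # destination clock edge at bar_reg__0/C
CLOCK_PESSIMISM = 0.421
CLOCK_UNCERTAINTY = 0.182
HOLD_TIME = 0.056                         # the Hold_DFF2_SLICEL_C_D row
TSJ, DJ, PE = 0.071, 0.103, 0.120         # jitter and phase error

arrival = SCD + sum(DATA_PATH)
required = DCD + CLOCK_PESSIMISM + CLOCK_UNCERTAINTY + HOLD_TIME
slack = arrival - required

# The formula from the Clock Uncertainty row. The inputs are rounded in
# the report, so the result can differ from 0.182 in the last digit.
uncertainty_from_formula = sqrt(TSJ**2 + DJ**2) / 2 + PE

print(f"arrival = {arrival:.3f} ns, required = {required:.3f} ns")
print(f"slack = {slack:.3f} ns")
```

Note that for a hold path, the uncertainty and the hold time are added to the required time. A larger uncertainty hence means that the data must arrive later in order to meet the requirement.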

When the timing analysis is made for t_hold, a small slack is quite common. So why is this path suspicious? Primarily because of the larger routing delay between two flip-flops on the same slice (SLICE_X49Y58), as mentioned above. It's likely that during the early stages of place and route, the software detected a failure to meet the t_hold requirement between these two flip-flops. So how was this corrected?

A failure to meet a t_hold requirement means that the data signal arrives too early, relative to the clock edge of the second flip-flop. This is corrected by artificially adding delay to the data path. As a result, the data input changes its value a little later at the second flip-flop. The tools probably made the wiring between the two flip-flops longer. This increases the routing delay, and solves the problem with t_hold. Such an increase of the delay could have created a problem to meet the t_su requirement, but in this case there was no such problem: The timing constraint was achieved for this path (i.e. the requirements for both t_su and t_hold were met).

Nevertheless, this example shows how the need to solve a problem with t_hold can create a seemingly unrelated problem with t_su. This is something to keep in mind when a path's ratio between logic delay and routing delay is really low. There are of course other possible reasons for a large routing delay, in particular a high fan-out. But when the fan-out is low (as in this case, where the fan-out is 1), it's worth asking if the large routing delay was inserted deliberately by the tools in order to solve a t_hold problem.
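This rule of thumb can be written as a small check. The function below is a hypothetical helper and not a feature of any tool. The thresholds (a logic share below 30% and a fan-out of no more than 2) are arbitrary assumptions that are chosen only for the sake of this illustration.

```python
def suspect_hold_fix(logic_ns, route_ns, max_fanout,
                     logic_share_limit=0.30, fanout_limit=2):
    """Return True if a path's large routing delay is worth a second look.

    The path is flagged when logic is a small part of the data path
    delay even though the fan-out is low, so the large routing delay
    can't be explained by a high fan-out. The limits are arbitrary.
    """
    logic_share = logic_ns / (logic_ns + route_ns)
    return logic_share < logic_share_limit and max_fanout <= fanout_limit

# The Data Path Delay row of the report above: logic 0.104 ns,
# route 0.354 ns, and a fan-out of 1 on both nets
print(suspect_hold_fix(0.104, 0.354, max_fanout=1))

# The same delays with a high fan-out are not flagged, because the
# fan-out is a plausible explanation for the routing delay
print(suspect_hold_fix(0.104, 0.354, max_fanout=50))
```

A flagged path isn't necessarily a problem. It's just a path where it's worth comparing the routing delay with the delay of similar paths.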

But why was there a problem with t_hold only in the case with two clocks? The answer is that with two clocks, there is a larger uncertainty about when the clock edges arrive at each of the two flip-flops. Several factors contribute to this uncertainty, in particular clock skew and jitter. Because the calculation for t_hold is made from 0 ns to 0 ns, this calculation is more sensitive to small uncertainties about when the clock edges arrive.

Therefore, the uncertainty that is related to two related clocks often requires corrective measures in order to meet the t_hold requirement. These corrections are made automatically by the tools, of course. It's nevertheless important to be aware of these corrections when there are problems with achieving the timing constraints.

Summary

The last two pages in this series have shown a few examples of timing reports which were all the result of one specific timing constraint. All these timing reports were also related to a specific and simple example of logic. Hopefully, these examples helped to establish an understanding of the basics of timing.

At this point, it's recommended to look at the timing reports of your own design, in order to see how the same principles apply to paths that contain more than one LUT.

The next page begins the discussion about timing problems and how to solve them.
