The clock period constraint and its timing analysis

This page belongs to a series of pages about timing. The previous page went through the basics of the theory behind timing constraints. The next step is to see how this theory is applied.


This page introduces the most important timing constraint, which defines the frequency of a clock. This is followed by a detailed example of a timing report's analysis of a path.

The details of the timing report's analysis can appear to be an advanced topic, but this is not the case: The purpose of timing constraints is to control the tool's timing analysis of the design, and set the accurate requirements that must be met. So in order to understand timing constraints, it's necessary to look into the timing analysis. And the timing analysis is shown in the timing reports.

Also, just writing timing constraints is far from enough. It's important to be able to verify that these constraints work correctly on the design. Such a verification is only possible with a deep understanding of the timing report. Without such understanding, it's quite easy to mistakenly rely on timing constraints that don't fulfill their purpose, and then wonder why the logic design doesn't work properly.

All FPGA tools generate timing reports as textual files, and these are the reports shown here. The tools also have a graphical interface for viewing the exact same information. This graphical interface is sometimes helpful and sometimes confusing, so working with the textual reports is an important first step.

The period constraint

The most useful timing constraint is the period constraint: It informs the tools about the frequency of a clock signal.

For example, suppose that the logic design consists of only this Verilog module:

module top(
  input clk,
  input foo,
  output reg bar_reg);

  reg foo_reg;
  reg bar;

  always @(posedge clk)
    begin
      foo_reg <= foo;
      bar <= !foo_reg;
      bar_reg <= bar;
    end
endmodule

It's clear from the Verilog that @clk is used as a clock, and that this signal is an external port (i.e. it's connected to a physical pin on the FPGA).

If @clk's frequency is 250 MHz (i.e. its period is 4 ns), a timing constraint like the following is required:

create_clock -period 4 -name clk [get_ports clk]

This command means the following: "There is a clock at the I/O port that is called @clk. This clock's period is 4 ns. If there are other timing constraints that refer to this clock, they will use the name 'clk' for this purpose".
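The value given to -period is simply the inverse of the clock's frequency. As a minimal sketch of this conversion (the function name period_ns is only an illustration, and isn't part of any tool):

```python
def period_ns(freq_mhz: float) -> float:
    """Return the clock period in nanoseconds for a frequency given in MHz."""
    return 1000.0 / freq_mhz


# The 250 MHz clock of this example: this is the number for create_clock's -period.
print(period_ns(250))
```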


Timing analysis

We shall now look at Vivado's timing analysis for the setup requirement of the path that starts at @foo_reg and ends at @bar. In other words, this is the path that is the result of this Verilog expression:

bar <= !foo_reg;

The timing analysis that is shown here consists of three parts, which appear in the timing report in this order:

  1. A header, which contains the summary of the path's timing analysis, and some additional information.
  2. A calculation of the time it takes from a clock edge until a stable logic state is present at the input of the second flip-flop (@bar in this example).
  3. A calculation that finds when the input of the second flip-flop must be stable in order to meet the timing requirement.

These three parts are shown and described separately below. Note that in the timing report, there is no visible separator between these parts. In particular, it's not obvious from the timing report where the second part ends and the third part begins. It's up to the reader of this report to recognize the beginning of the third part from its content.

Even though it's Vivado's timing report that is shown here, the same methodology is adopted by Quartus and several other FPGA tools. The information that is written in other tools' timing reports is usually presented slightly differently. However, the theory behind these reports is the same. So going through this example is helpful for other FPGA tools as well.

We shall skip the first part in the timing report, and come back to this part after discussing the other two parts. It's easier to explain the report in this order. But first, a few general words.

The timing analysis' strategy

Unlike the simple static timing analysis that was shown on the previous page, a real timing analysis must take the clock's imperfections into account. In order to do so, the delay calculation starts from the clock's origin, for example at the physical input pin of the clock. This is different from the simple data path delay calculation (as shown on the previous page), which only takes the data path's delay into account.

So the theoretical experiment is now as follows: Start a stopwatch together with a clock edge at the clock's origin. Let this stopwatch continue as this clock edge travels to the first flip-flop and activates it. Continue to measure the time as this flip-flop updates its output, and follow the updated signal to its destination. Stop the stopwatch when this signal reaches the second flip-flop.

The next step is to check if the result is good. For the tsu requirement, this means that the signal arrived soon enough, relative to the next clock edge.

But this is a clock edge on the second flip-flop. When will it arrive?

To find that out, the stopwatch is started again, together with the next clock edge at the clock's origin. The stopwatch is stopped when this second clock edge reaches the second flip-flop. So the starting point is the same, but the clock edge is later and the destination of the clock edge is different.

After this theoretical experiment, we know when the clock edge arrives at the second flip-flop. In order to meet the tsu requirement, this flip-flop's input must be stable early enough with relation to this clock edge.

The stopwatch of the first theoretical experiment shows when the data is stable at the second flip-flop. The stopwatch of the second experiment shows when the clock edge arrives at the same flip-flop. All that is left is to compare these two numbers, and check if the difference is larger than required by tsu.

Uncertainties that are related to the clock can be taken into account with this method, because it allows applying the worst-case scenario in both theoretical experiments. For a setup time calculation, the upper limit of all delays is used in the calculation of the first experiment with the stopwatch. As a result, the calculation shows the latest possible time that the signal can arrive at the input of the second flip-flop. But for the second experiment with the stopwatch, the lower limit of all delays is used. The result is the earliest possible time that the second clock edge can arrive at the second flip-flop.

So the calculation is made for the scenario where the signal arrives as late as it can be, and the clock edge arrives as early as possible. If the setup time requirement is met under these conditions, there's no doubt this requirement is always met.

For a hold time calculation, it's the other way around: Minimal delays are used for the first experiment, and maximal delays on the second experiment.
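The choice of limits for the two experiments can be sketched as follows. The delay ranges below are hypothetical numbers in picoseconds, made up only for illustration, and the names launch_path, capture_clock and total are assumptions as well:

```python
# Hypothetical (min, max) delays in picoseconds; not taken from any timing report.
launch_path = [(500, 700), (100, 140), (200, 260)]  # first experiment: clock origin to data input
capture_clock = [(500, 700), (90, 130)]             # second experiment: clock origin to clock input

MIN, MAX = 0, 1


def total(path, limit):
    """Sum the delays of a path, using each element's minimal or maximal delay."""
    return sum(element[limit] for element in path)


# Setup: the data as late as possible, the clock edge as early as possible.
setup_launch = total(launch_path, MAX)
setup_capture = total(capture_clock, MIN)

# Hold: the other way around.
hold_launch = total(launch_path, MIN)
hold_capture = total(capture_clock, MAX)

print("setup:", setup_launch, setup_capture)
print("hold: ", hold_launch, hold_capture)
```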

The source path calculation

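As mentioned above, the first part of the timing report is shown later on. So we shall now look at the second part in the timing analysis: The source path calculation. It consists of two segments:

  1. The Source Clock Path, from the FPGA's external clock pin to the clock input of the first flip-flop (@foo_reg).
  2. The Data Path, from the clock input of the first flip-flop to the data input of the second flip-flop (@bar).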

Together, these two segments calculate the time from the rising edge at the FPGA's external clock pin until the data reaches the second flip-flop.

This is the relevant part in the timing report:

    Location             Delay type                Incr(ns)  Path(ns)    Netlist Resource(s)
  -------------------------------------------------------------------    -------------------
                         (clock clk rise edge)        0.000     0.000 r  
    AG12                                              0.000     0.000 r  clk (IN)
                         net (fo=0)                   0.000     0.000    clk_IBUF_inst/I
    AG12                 INBUF (Prop_INBUF_HRIO_PAD_O)
                                                      0.738     0.738 r  clk_IBUF_inst/INBUF_INST/O
                         net (fo=1, routed)           0.105     0.843    clk_IBUF_inst/OUT
    AG12                 IBUFCTRL (Prop_IBUFCTRL_HRIO_I_O)
                                                      0.049     0.892 r  clk_IBUF_inst/IBUFCTRL_INST/O
                         net (fo=1, routed)           0.839     1.731    clk_IBUF
                                                      0.101     1.832 r  clk_IBUF_BUFG_inst/O
    X2Y0 (CLOCK_ROOT)    net (fo=3, routed)           1.389     3.221    clk_IBUF_BUFG
    SLICE_X49Y58         FDRE                                         r  foo_reg_reg/C
  -------------------------------------------------------------------    -------------------
    SLICE_X49Y58         FDRE (Prop_EFF2_SLICEL_C_Q)
                                                      0.138     3.359 f  foo_reg_reg/Q
                         net (fo=1, routed)           0.241     3.600    foo_reg
    SLICE_X49Y58         LUT1 (Prop_D5LUT_SLICEL_I0_O)
                                                      0.244     3.844 r  bar__0_i_1/O
                         net (fo=1, routed)           0.046     3.890    p_0_in
    SLICE_X49Y58         FDRE                                         r  bar_reg__0/D
  -------------------------------------------------------------------    -------------------

Each row in this part represents a logic element or a wire. When the "Delay type" is "net", the row corresponds to a wire. All delays of this sort are routing delays, i.e. the time it takes the signal to propagate from one logic element to another.

The column named "Incr" shows how much delay each element in the path contributes. The "Path" column shows the total delay up to that point.
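The relation between the two columns is a running sum. As a small sketch, the "Path" column of the rows above the horizontal line can be reproduced from the "Incr" column (the values are copied from the report and written in picoseconds, so that the arithmetic is exact):

```python
from itertools import accumulate

# "Incr" values of the rows above the horizontal line, in picoseconds.
incr_ps = [0, 0, 0, 738, 105, 49, 839, 101, 1389]

# Each "Path" value is the sum of all "Incr" values up to and including that row.
path_ps = list(accumulate(incr_ps))

for incr, path in zip(incr_ps, path_ps):
    print(f"{incr / 1000:9.3f} {path / 1000:9.3f}")
```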

The horizontal line in the middle marks the end of the Source Clock Path and the beginning of the Data Path. Now let's take a closer look at the delays of these two paths.

The first seven delays are related to the input pin, which is AG12 (this is the physical position of this pin). According to this report, the input pin and the logic that is related to this pin contribute a total delay of 1.731 ns.

The global clock buffer contributes another 0.101 ns from its input to its output. This is followed by the routing delay of the global clock, which is a relatively large number: 1.389 ns. This is the time it takes the clock signal to reach @foo_reg's clock input. The reason for this large delay is that a global clock buffer and a clock tree are used to distribute this signal. These routing resources are intended to distribute a clock to large parts of the FPGA with the same delay to all destinations. Hence there's a large delay, even when the clock reaches just a few destinations, as in this example (the fan-out of the clock is just 3).

At this point, the clock has finally reached the slice that contains the flip-flop, which is SLICE_X49Y58 in this example. So this is the end of the Source Clock Path. The horizontal line in the report indicates the beginning of the Data Path. In the example of the previous page, this is where the simple static timing analysis begins.

In the Netlist Resources column to the right, it says "foo_reg_reg/C" on the row above the horizontal line, and it says "foo_reg_reg/Q" just after this row. So the row after the horizontal line is definitely the clock to output delay of @foo_reg's flip-flop (from C to Q). This delay is 0.138 ns. "FDRE" means Flip-flop with Data, Reset and Enable.

This is followed by a routing delay to the LUT (0.241 ns), the propagation delay inside the LUT (0.244 ns) and the routing delay to the second flip-flop (0.046 ns). The routing delays are exceptionally small, because everything is packed into a single slice.

To summarize this part, it took 3.221 ns for the clock edge to travel from the external pin to the clock input of the first flip-flop. It then took 0.669 ns until the updated signal arrived at the second flip-flop's input (3.890 − 3.221 = 0.669 ns). In total, the time from the clock edge on the external pin to a stable signal at the final destination (the second flip-flop's data input) was 3.890 ns (at most, this is a worst-case calculation).

So it's now time to ask if this was soon enough. Is the setup time requirement met?

Destination Clock Path

The purpose of this second calculation is to find out how quickly the clock edge can travel from the external pin to the second flip-flop's clock input.

Note that the clock path's destination is the second flip-flop. This should not be confused with the previous calculation's clock path, where the destination was the first flip-flop.

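There are other notable differences: This calculation starts at 4.000 ns rather than at zero, because it follows the next clock edge. Also, the minimal delays are used rather than the maximal ones, so the delay of each element is smaller than in the previous calculation.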

The third part of the timing analysis (Destination Clock Path) is as follows:

                         (clock clk rise edge)        4.000     4.000 r  
    AG12                                              0.000     4.000 r  clk (IN)
                         net (fo=0)                   0.000     4.000    clk_IBUF_inst/I
    AG12                 INBUF (Prop_INBUF_HRIO_PAD_O)
                                                      0.515     4.515 r  clk_IBUF_inst/INBUF_INST/O
                         net (fo=1, routed)           0.066     4.581    clk_IBUF_inst/OUT
    AG12                 IBUFCTRL (Prop_IBUFCTRL_HRIO_I_O)
                                                      0.034     4.615 r  clk_IBUF_inst/IBUFCTRL_INST/O
                         net (fo=1, routed)           0.722     5.337    clk_IBUF
                                                      0.091     5.428 r  clk_IBUF_BUFG_inst/O
    X2Y0 (CLOCK_ROOT)    net (fo=3, routed)           1.218     6.646    clk_IBUF_BUFG
    SLICE_X49Y58         FDRE                                         r  bar_reg__0/C
                         clock pessimism              0.527     7.173    
                         clock uncertainty           -0.035     7.138    
    SLICE_X49Y58         FDRE (Setup_DFF2_SLICEL_C_D)
                                                      0.067     7.205    bar_reg__0
                         required time                          7.205    
                         arrival time                          -3.890    
                         slack                                  3.315

The tools chose to place @foo_reg and @bar on the same slice, so in this specific case, the clock path of this analysis is the same as in the first analysis. Accordingly, it's easy to see that the sequence of logic elements is exactly the same as in the previous analysis, until the horizontal line. After this sequence, there are two timing parameters that make adjustments to the calculation: Clock pessimism and clock uncertainty. These are discussed separately at the bottom of this page.

On the row that is titled "clock uncertainty", we have the earliest time at which the second clock edge can arrive: 7.138 ns after the first clock edge. The setup time requirement is that the data signal must be stable before this clock edge. tsu specifies how much. So in order to get the latest time at which the data must be stable, tsu is subtracted from the clock's arrival time.

In the example above, tsu is negative and equals −0.067 ns. The final result is hence 7.138 − (−0.067) = 7.205 ns. In other words, the data must be stable at the second flip-flop no later than 7.205 ns after the first clock edge.

In the previous calculation, the result was that the data is stable 3.890 ns after this clock edge, at worst. So that is good enough, with a margin: The difference between the requirement and what is guaranteed is 7.205 − 3.890 = 3.315 ns. In other words, the slack is 3.315 ns.
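The arithmetic of the last rows can be written out as a short sketch. All numbers are copied from the two calculations above and are given in picoseconds. The variable names are only for illustration:

```python
period_ps = 4000            # the second clock edge starts at 4.000 ns
dest_clock_delay_ps = 2646  # 6.646 - 4.000: clock pin to second flip-flop, minimal delays
clock_pessimism_ps = 527    # added back to the clock's arrival time
clock_uncertainty_ps = 35   # subtracted from the clock's arrival time
tsu_ps = -67                # the setup time, which is negative in this example
arrival_ps = 3890           # result of the source path calculation

# Earliest arrival of the second clock edge at the second flip-flop.
clock_edge_ps = (period_ps + dest_clock_delay_ps
                 + clock_pessimism_ps - clock_uncertainty_ps)

# The data must be stable tsu before this clock edge.
required_ps = clock_edge_ps - tsu_ps

slack_ps = required_ps - arrival_ps

print(clock_edge_ps, required_ps, slack_ps)
```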

Summary of the timing analysis

It's now time to go back to the first part of the path's timing report. This part comes before the calculations that were shown above, and it summarizes the main results. In addition, this header clarifies a few values that appear in this calculation.

So keep in mind that this is how a timing analysis of a path begins:

Slack (MET) :             3.315ns  (required time - arrival time)
  Source:                 foo_reg_reg/C
                            (rising edge-triggered cell FDRE clocked by clk  {rise@0.000ns fall@2.000ns period=4.000ns})
  Destination:            bar_reg__0/D
                            (rising edge-triggered cell FDRE clocked by clk  {rise@0.000ns fall@2.000ns period=4.000ns})
  Path Group:             clk
  Path Type:              Setup (Max at Slow Process Corner)
  Requirement:            4.000ns  (clk rise@4.000ns - clk rise@0.000ns)
  Data Path Delay:        0.669ns  (logic 0.382ns (57.100%)  route 0.287ns (42.900%))
  Logic Levels:           1  (LUT1=1)
  Clock Path Skew:        -0.048ns (DCD - SCD + CPR)
    Destination Clock Delay (DCD):    2.646ns = ( 6.646 - 4.000 ) 
    Source Clock Delay      (SCD):    3.221ns
    Clock Pessimism Removal (CPR):    0.527ns
  Clock Uncertainty:      0.035ns  ((TSJ^2 + TIJ^2)^1/2 + DJ) / 2 + PE
    Total System Jitter     (TSJ):    0.071ns
    Total Input Jitter      (TIJ):    0.000ns
    Discrete Jitter          (DJ):    0.000ns
    Phase Error              (PE):    0.000ns
  Clock Net Delay (Source):      1.389ns (routing 0.002ns, distribution 1.387ns)
  Clock Net Delay (Destination): 1.218ns (routing 0.002ns, distribution 1.216ns)

This part consists of many different pieces of information, so let's go through them one by one, starting from the first row:

Slack (MET) :             3.315ns  (required time - arrival time)

This says that the timing constraint was achieved (met), and that there was time left over (slack), 3.315 ns.

  Source:                 foo_reg_reg/C
                            (rising edge-triggered cell FDRE clocked by clk  {rise@0.000ns fall@2.000ns period=4.000ns})
  Destination:            bar_reg__0/D
                            (rising edge-triggered cell FDRE clocked by clk  {rise@0.000ns fall@2.000ns period=4.000ns})

These rows say which data path is examined. This is defined by the start position and end position of this path (the Source and Destination). The data path starts at foo_reg_reg/C, which is the clock input of @foo_reg. This path ends at bar_reg__0/D, which is the data input of @bar.

Another thing is that "clk" is mentioned, and its waveform is described. Note that "clk" refers to the name that was given to the clock with create_clock. In this example "clk" is also the name of the clock signal. But if another name is used in the "-name" parameter of the timing constraint, that name appears in the timing report, regardless of the name of the signal. This is true for all places where "clk" is mentioned in this example of the timing report.

  Path Group:             clk

The Path Group is "clk" here, which indicates that the reason for examining this path is the timing constraint with the same name.

  Path Type:              Setup (Max at Slow Process Corner)

The Path Type is Setup. This means that the setup time requirement is examined. "Max at Slow Process Corner" means that the maximal delays were used in the calculation for the data path (the first calculation). The meaning of Corner in this context is explained below.

  Requirement:            4.000ns  (clk rise@4.000ns - clk rise@0.000ns)

Recall that there's an imaginary stopwatch that is started for each of the two theoretical experiments that are described above. The "Requirement" is the difference between the two times at which this stopwatch is started. In this case it's the clock period that is required by the timing constraint, which is 4 ns.

  Data Path Delay:        0.669ns  (logic 0.382ns (57.100%)  route 0.287ns (42.900%))

The Data Path Delay is the sum of all delays after the horizontal line in the first calculation (i.e. the delays of the data path).

In this row, the delay is also shown separately for logic and route. This shows how much time is spent on logic elements and how much time is spent on the wires between these logic elements. The common rule of thumb is that approximately 60% of the delay should be on logic, and the rest on routing. Hence the path in this example shows the normal situation.

If the routing's delay takes a significantly larger proportion, this could indicate a problem, in particular if the path fails to achieve the timing constraint. More on this on the next page, in the section that discusses the analysis of thold.
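The numbers of the Data Path Delay row can be reproduced from the rows after the horizontal line in the first calculation. The following is a sketch with the delays copied from the report, in picoseconds. The classification of each row as "logic" or "route" follows its Delay type:

```python
# Rows of the Data Path: (kind, delay in picoseconds).
data_path = [
    ("logic", 138),  # clock-to-output of the first flip-flop
    ("route", 241),  # net to the LUT
    ("logic", 244),  # propagation delay of the LUT
    ("route", 46),   # net to the second flip-flop
]

logic_ps = sum(delay for kind, delay in data_path if kind == "logic")
route_ps = sum(delay for kind, delay in data_path if kind == "route")
total_ps = logic_ps + route_ps

logic_pct = 100 * logic_ps / total_ps
route_pct = 100 * route_ps / total_ps

print(f"total {total_ps / 1000:.3f}ns  "
      f"logic {logic_ps / 1000:.3f}ns ({logic_pct:.3f}%)  "
      f"route {route_ps / 1000:.3f}ns ({route_pct:.3f}%)")
```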

  Logic Levels:           1  (LUT1=1)

The data path consists of a single LUT, hence the number of logic levels is 1.

  Clock Path Skew:        -0.048ns (DCD - SCD + CPR)
    Destination Clock Delay (DCD):    2.646ns = ( 6.646 - 4.000 ) 
    Source Clock Delay      (SCD):    3.221ns
    Clock Pessimism Removal (CPR):    0.527ns

The Clock Path Skew is the time difference between when the same clock edge arrives at the two flip-flops. This is a calculated worst case, which includes the fact that the maximal delays are used in one clock path, and the minimal delays on the other clock path. See the section named "Clock pessimism removal" below, which explains the arithmetic behind these rows.
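The formula in the report's first row of this group (DCD - SCD + CPR) can be checked directly with the three values listed under it. A sketch in picoseconds:

```python
dcd_ps = 6646 - 4000  # Destination Clock Delay: 2.646 ns
scd_ps = 3221         # Source Clock Delay
cpr_ps = 527          # Clock Pessimism Removal

skew_ps = dcd_ps - scd_ps + cpr_ps

print(f"Clock Path Skew: {skew_ps / 1000:.3f}ns")
```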

  Clock Uncertainty:      0.035ns  ((TSJ^2 + TIJ^2)^1/2 + DJ) / 2 + PE
    Total System Jitter     (TSJ):    0.071ns
    Total Input Jitter      (TIJ):    0.000ns
    Discrete Jitter          (DJ):    0.000ns
    Phase Error              (PE):    0.000ns

See the section named "Clock uncertainty" below. These rows show how the clock uncertainty was calculated. The result of this calculation, 0.035 ns, is unrealistically optimistic. This happened because no jitter was specified in the timing constraint, so the tools assumed that the jitter is zero. In addition to that, the external clock is used directly (i.e. without a PLL inside the FPGA), so there is almost no source for jitter to account for.

Even though it's a mistake to not specify the external clock's jitter in the timing constraint, this is how it is usually done. This is rarely a source of problems, because the jitter is usually small compared with other things that are taken into account in the timing calculation.
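The formula on the Clock Uncertainty row can be evaluated as in the sketch below, assuming that "^1/2" in the report means a square root. With the rounded values that are printed in the report, the result is 0.0355 ns, which agrees with the reported 0.035 ns up to the rounding of the printed TSJ:

```python
import math


def clock_uncertainty(tsj: float, tij: float, dj: float, pe: float) -> float:
    """Evaluate ((TSJ^2 + TIJ^2)^1/2 + DJ) / 2 + PE, all values in ns."""
    return (math.sqrt(tsj ** 2 + tij ** 2) + dj) / 2 + pe


# Values as printed in the report above.
uncertainty_ns = clock_uncertainty(tsj=0.071, tij=0.0, dj=0.0, pe=0.0)
print(uncertainty_ns)
```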

  Clock Net Delay (Source):      1.389ns (routing 0.002ns, distribution 1.387ns)
  Clock Net Delay (Destination): 1.218ns (routing 0.002ns, distribution 1.216ns)

These are the delays of the clock signal itself, from the output of the clock buffer to each of the flip-flops' clock inputs. In this example, there isn't much information to take from these numbers.

Multi-corner timing analysis

The delay of all logic elements depends on a few unknown parameters, for example the temperature and the supply voltage. Also, the term "process" is used to refer to inaccuracies during the production of the FPGA. Even though every FPGA is tested in order to ensure that it conforms to the datasheet, there is still some uncertainty regarding each logic element's behavior.

The FPGA tools perform the timing analysis for each path in a few extreme scenarios, which are called "corners". For example, one scenario can be the lowest temperature that is allowed, combined with the fastest process (i.e. when the FPGA happened to be manufactured with low delays). A second scenario can be the highest temperature combined with the fastest process. The third and fourth scenarios repeat this with the slowest process.

So in this example, the timing analysis is carried out for the extreme conditions of two parameters: Temperature and process. This is called a four-corner timing analysis. But this is just one of many possibilities to perform multi-corner timing analysis. Each FPGA tool carries out the timing analysis in a different way.

But regardless of which corners the tools choose to examine, the worst case is always considered the result of the timing analysis. In other words, the tools calculate the slack for each corner, and it's the lowest slack that counts.
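Choosing the result among the corners is just a matter of finding the minimum. In the sketch below, the corner names and the slack values are hypothetical, and are not related to the report on this page:

```python
# Hypothetical slacks (in ns) of one path, as calculated for each of four corners.
corner_slack_ns = {
    "fast process, low temperature": 3.9,
    "fast process, high temperature": 3.7,
    "slow process, low temperature": 3.4,
    "slow process, high temperature": 3.2,
}

# The corner with the lowest slack determines the result of the timing analysis.
worst_corner = min(corner_slack_ns, key=corner_slack_ns.get)
worst_slack_ns = corner_slack_ns[worst_corner]

print(worst_corner, worst_slack_ns)
```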

The tools are programmed to perform an adequate multi-corner timing analysis for each FPGA, so there is no need for a deep understanding of this topic. But when reading a timing report, it's important to check if it relates to a single corner, or if it's the summary of all corners (i.e. worst case). There is a possibility for confusion, in particular with Quartus.

Clock pessimism removal and Clock uncertainty are discussed next. These are relatively advanced topics. If you're not interested in the finer details of the timing calculation, it's fine to skip to the next page in this series.

Clock pessimism removal

Because of the coincidence that both flip-flops are on the same slice, it's possible to compare the calculations of the clock path delays (i.e. the time until the clock edge reached the slice). In the second calculation, this time is 2.646 ns, because the calculation starts at 4 ns and ends at 6.646 ns, so 6.646 − 4 = 2.646 ns. In the previous calculation, the result was 3.221 ns. This is a difference of 0.575 ns.

The reason for this difference is that the maximal delays were used on the first calculation, and the minimal delays were used in the second calculation. The difference between the minimal delays and maximal delays represents the fact that the delays are not known accurately, because of natural inaccuracies in the manufacturing process. So if the clock path to the first flip-flop is completely different from the clock path to the second flip-flop, the worst case must be taken into account. This worst case is when all delays of the first path are the maximal allowed, and all delays of the second path are the minimal allowed. This is called clock pessimism.

Note that this has nothing to do with temperature or differences between one physical FPGA and another: Both calculations are made for the same temperature and manufacturing process. This difference is because each delay in the segment has a tolerance that is within the FPGA's specification.

But why is clock pessimism part of the calculation when the clock path is identical in both paths? The answer is that it's a mistake to do this. This is why there is a row with the title "clock pessimism" in the Destination Clock Path. This row adds 0.527 ns to the delay in order to compensate for this mistake. The title should actually be "clock pessimism removal" (CPR).

So the idea behind clock pessimism removal is to eliminate the unnecessary differences between the minimal delay and maximal delay on the parts of the clock path that are identical in both calculations. The tools compare the clock paths of both calculations, and find the segment that is common to both paths. The sum of all differences in the delays (in the shared segment) is the clock pessimism that should be removed.

As mentioned above, the difference between the clock paths was 0.575 ns. But the clock pessimism that was applied was just 0.527 ns. So the reduction was smaller by 0.048 ns. The reason is that inside the slice there is a small segment of the clock path that isn't common to both calculations: One wire inside the slice goes to the first flip-flop and the second wire goes to the second flip-flop. The delays of these two wires can be different.
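These numbers can be traced row by row. The sketch below takes the non-zero delays of the clock path from the two calculations (maximal delays from the first one, minimal delays from the second one), in picoseconds. The row labels are shortened for illustration:

```python
# (row, maximal delay, minimal delay) in picoseconds, copied from the two clock paths.
clock_path = [
    ("INBUF", 738, 515),
    ("net to IBUFCTRL", 105, 66),
    ("IBUFCTRL", 49, 34),
    ("net to clock buffer", 839, 722),
    ("global clock buffer", 101, 91),
    ("clock net", 1389, 1218),
]

# The pessimism of each row is the gap between its maximal and minimal delay.
differences = [max_ps - min_ps for _, max_ps, min_ps in clock_path]
total_difference_ps = sum(differences)

cpr_ps = 527  # the "clock pessimism" row in the report
not_removed_ps = total_difference_ps - cpr_ps

print(differences, total_difference_ps, not_removed_ps)
```

The part that isn't removed, 0.048 ns, has the same size as the Clock Path Skew in the report's header.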

Clock uncertainty

In the timing calculation, 0.035 ns was subtracted from the calculation of the destination clock path. This makes the timing calculation stricter.

Clock uncertainty accounts for everything that is random about the time between each two clock edges. This randomness is called clock jitter, and is the result of various sources of noise and random behavior of the electronics.

The reduction of 0.035 ns from the calculation represents the fact that even though the clock period is defined as 4 ns in the timing constraint (create_clock), this clock period is random in reality. As a result, the time between two clock edges can be shorter. How much shorter? In this calculation, it has been assumed that the time between two clock edges will never be less than 3.965 ns (4−0.035 = 3.965 ns).

Is this assumption substantiated? That's a difficult question, as the estimation of clock jitter is a complicated subject which goes well beyond the scope of this discussion about timing constraints. It's nevertheless recommended to study this topic further, as clock jitter can be the source of various problems in a logic design, regardless of timing constraints.

This concludes the first page out of two about the clock period constraint. There is more to learn on the next page...

Copyright © 2021-2024. All rights reserved. (6f913017)