Understanding Partial Reconfiguration with Vivado

This is the first post in a series of four on Partial Reconfiguration, or Dynamic Function eXchange (DFX) with Xilinx' Vivado. The intention of this post is to explain the main concepts of this topic. This prepares the ground for the next post, which outlines the practical steps for setting up an FPGA project with Partial Reconfiguration.


Partial reconfiguration is a technique that allows replacing the logic of some parts of the FPGA, while its other parts are working normally. This reconfiguration consists of feeding the FPGA with a bitstream, exactly like the initial bitstream that programs its functionality on powerup. However, the bitstream for Partial Reconfiguration doesn't cause the FPGA to halt. Rather, it targets specific logic elements and updates the memory cells that control their behavior. It's a hot replacement of specific logic blocks.

Xilinx FPGAs have supported this feature since Virtex-4 (and Intel FPGAs since Series V).

This post walks through the concepts behind Partial Reconfiguration without getting into the hands-on technical details, as a preparation for the next post, which does exactly that. Because everything is related to everything in this topic, it's important to understand the whole framework before breaking it down into individual actions.

Explain that again?

Let's start with the old school case. Say that there's a module instantiated somewhere in the design hierarchy. Say, in Verilog:

   moduleA reconfig_ins
 [ ... ]

or in VHDL:

  reconfig_ins : moduleA
    port map(
      clk        => clk,
      this       => this_w,
      that       => that_w,
 [ ... ]

Obviously, there's some moduleA.v or moduleA.vhd, or some IP called moduleA, that populates the instantiation (along with its submodules). So we implement the project and obtain a bitstream file, and program the FPGA with it. So far, the usual stuff.

But now, say that we write moduleB instead of moduleA in the code above, and implement the logic to obtain a bitstream file. For this to work, there must be a moduleB.v or moduleB.vhd, or an IP called moduleB in the design.

We now have two bitstream files, which differ in what logic populates the reconfig_ins instance. To switch between them, we need to program the entire FPGA with the desired bitstream, which interrupts the FPGA's operation while it's being programmed.

Partial Reconfiguration is a technique that allows switching from one version to another without this interruption: The FPGA keeps working normally, while the logic in reconfig_ins changes from moduleA to moduleB, and vice versa. Almost needless to say, this isn't possible by just implementing the two designs separately.

moduleA and moduleB are referred to as Reconfigurable Modules (RM), meaning the logic that can be injected into the FPGA by virtue of Partial Reconfiguration.


There are several reasons for using Partial Reconfiguration, such as:

The partial bitstream

If you have some experience with FPGA design, odds are that you're used to a simple routine: Make some edits to the design's sources (and IPs), kick off the implementation tools, check that it went through OK, and then load the bitstream into the FPGA through JTAG. Or alternatively, load some flash device with some image of the bitstream, and recycle the FPGA board's power.

Because this is what we're all used to, it's easy to mistake the bitstream for just a lot of data that fills the entire FPGA with some mystical information on how each logic element should behave. In reality, a bitstream consists of a series of commands that the FPGA executes sequentially as it's loaded. Indeed, the common bitstream effectively loads the entire FPGA with information, but this is done with several commands that control the configuration's progress and, even more importantly, determine which logic elements the configuration data is targeted at.
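To make the "series of commands" point concrete, here is a minimal Python sketch that walks the 32-bit words of a bitstream and decodes the packet headers. It assumes the 7-series packet format as documented in Xilinx UG470 (Type 1 and Type 2 headers, preceded by the 0xAA995566 sync word). Only a handful of register addresses are named, and the sample words are hand-made for illustration, not taken from a real bitstream. The thing to notice is the write to FAR (the Frame Address Register) ahead of the FDRI data: The frame address is what tells the configuration logic where the following data goes.

```python
import itertools

# Sketch of a 7-series configuration packet decoder, based on the packet
# format described in Xilinx UG470. Simplified: only a few registers are named.
SYNC_WORD = 0xAA995566
OPCODES = {0: "NOP", 1: "READ", 2: "WRITE", 3: "RESERVED"}
REGISTERS = {0: "CRC", 1: "FAR", 2: "FDRI", 4: "CMD", 12: "IDCODE"}


def decode_packets(words):
    """Yield (opcode, register, payload) for each non-NOP packet after sync."""
    it = iter(words)
    for w in it:  # skip dummy/bus-width words up to the sync word
        if w == SYNC_WORD:
            break
    register = None
    for header in it:
        header_type = header >> 29
        opcode = OPCODES[(header >> 27) & 0x3]
        if header_type == 1:
            addr = (header >> 13) & 0x3FFF
            count = header & 0x7FF
            if opcode != "NOP":
                register = REGISTERS.get(addr, f"REG{addr}")
        elif header_type == 2:
            # A Type 2 packet carries a long payload for the register
            # selected by the preceding Type 1 packet.
            count = header & 0x7FFFFFF
        else:
            raise ValueError(f"unexpected header 0x{header:08X}")
        payload = list(itertools.islice(it, count))
        if len(payload) != count:
            raise ValueError("truncated packet")
        if opcode != "NOP":
            yield opcode, register, payload


# Hand-made word sequence (not from a real bitstream): set the frame
# address, issue a command, then write three words of frame data.
example = [0xFFFFFFFF, SYNC_WORD, 0x20000000,  # dummy word, sync, NOP
           0x30002001, 0x00400000,             # Type 1 write to FAR, 1 word
           0x30008001, 0x00000001,             # Type 1 write to CMD, 1 word
           0x30004000,                         # Type 1 write to FDRI, 0 words,
           0x50000003, 0xA, 0xB, 0xC]          # then Type 2 with 3 data words

for packet in decode_packets(example):
    print(packet)
```

A real bitstream contains many more register writes (and, at the end, thousands of frame words). The frame address plus data structure is nevertheless what makes a bitstream that touches only part of the device possible.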

And since the bitstream itself says which logic elements are affected, it's possible to make a bitstream that alters some logic elements, and leaves others intact — and that's the cornerstone for Partial Reconfiguration.

Having said that, the partial bitstream must be compatible with the logic that is already loaded in the FPGA. It's not just a matter of avoiding overwriting the wrong logic elements: The initial and partial bitstreams are tightly coupled, in particular because the initial bitstream uses logic and routing resources within the area that the partial bitstream alters. When the partial bitstream is set up correctly with respect to the initial bitstream, this delicate dance goes unnoticed. If not, the FPGA will most likely go crazy, including the functionality that should have remained untouched.

Loading the partial bitstream

A partial bitstream can be delivered to the FPGA through any configuration interface, as long as the process can take place while the FPGA is running. This includes the JTAG interface, so a partial .bit file can be loaded with the Hardware Manager as usual. Even more interesting, it can be done from within the FPGA's own logic, by using the dedicated Internal Configuration Access Port (ICAP). When used this way, this port is suitable only for Partial Reconfiguration, since the part of the FPGA's logic fabric that feeds it must remain intact throughout the process.

The ICAP is just an interface to the FPGA's configuration subsystem, and doesn't dictate anything about the source of the bitstream. Accordingly, there is no limitation on how the bitstream data reaches the FPGA, or where and how it's stored. It just has to somehow be available to the piece of FPGA logic that feeds the ICAP.

For example, Xillybus offers a simple means for sending a bitstream file to the ICAP from a computer over a PCIe or USB 3.x interface, if the board has such.
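Whatever logic feeds the ICAP has to present the bitstream as words in the order the port expects. The Python sketch below does this preparation on the host side. The following assumptions are not from this post:

- a 32-bit ICAP data width;
- a .bin file (raw configuration data without the .bit file's header);
- bit swapping within each byte, as UG470 describes for the 7-series parallel configuration interfaces. This is assumed here to apply to the ICAP as it does to SelectMAP.

Whether the swap is done in software or in the FPGA logic is a design choice. Check the device's documentation before relying on either.

```python
# Sketch: turn raw .bin bytes into 32-bit words for an ICAP-feeding block.
# Assumptions: 32-bit ICAP width and per-byte bit swapping (see lead-in).

def reverse_bits(byte):
    """Reverse the bit order within one byte (bit 0 <-> bit 7 and so on)."""
    result = 0
    for i in range(8):
        if byte & (1 << i):
            result |= 1 << (7 - i)
    return result


def bin_to_icap_words(data, bit_swap=True):
    """Split .bin bytes into big-endian 32-bit words, optionally bit-swapped."""
    if len(data) % 4:
        raise ValueError("length must be a multiple of 4 bytes")
    words = []
    for i in range(0, len(data), 4):
        chunk = data[i:i + 4]
        if bit_swap:
            chunk = bytes(reverse_bits(b) for b in chunk)
        words.append(int.from_bytes(chunk, "big"))
    return words


# The sync word as it appears in a .bin file, and as the ICAP would see it.
print(hex(bin_to_icap_words(bytes.fromhex("AA995566"))[0]))  # prints 0x5599aa66
```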

Static logic

To get Partial Reconfiguration right, one must also pay attention to its counterpart: The static logic. This is a general term for the parts of the FPGA design that must remain untouched, and are hence present from the initial bitstream configuration onward.

This logic is static in two aspects. The first is functional: It consists of the parts of the FPGA design (HDL and IP) that function without interruption from the FPGA's initial configuration onward. The second aspect, which is no less important, is that the placement of this logic is limited to sites in the logic fabric that are allocated as static. So there are certain physical sites and resources that are off limits for any later manipulations.

In a real-life design, it's not enough that the static logic remains untouched: It must also continue to function properly while weird things happen in the FPGA. As there are almost certainly nets connecting the static logic with the logic that is being altered, it's up to the FPGA designer to make sure everything remains smooth. The third post in this series discusses this.

Separating static and reconfigurable logic

For Partial Reconfiguration to be even possible, there must be a strict separation between the static and reconfigurable logic. In particular, the physical logic elements on the FPGA must be separated, so that none of the sites that contain static logic are affected as the FPGA is reconfigured.

To understand what this calls for, let's first look at what we are all used to.

Recall that the regular FPGA implementation flow starts with a synthesis of the HDL design. Note that instantiation of modules in HDL doesn't imply any separation between them. On the contrary: The synthesizer treats instantiations as a description of how the logic should work, but is free to consider the entire design as one big, flat piece of logic. Optimizations across module boundaries are allowed, desired and very common. For example, if a register in module X happens to be equivalent to some completely unrelated register in module Y, one of them is removed and the remaining one is used in both modules (unless the synthesizer is specifically told to refrain from that).

Once the HDL has been synthesized, the synthesized netlist is mixed with those of the design's IPs (if any).

Next, this big chunk of logic elements is placed and routed all over the FPGA's logic fabric in a way that meets the timing constraints and other goals. Logic belonging to different parts of the design can be packed into the same slice, or placed in opposite parts of the FPGA. Even the smallest change in the design can cause a dramatically different placement. This is chaotic but harmless, since each implementation is independent, and who cares how the logic is scattered on the FPGA's logic fabric.

Back to Partial Reconfiguration: As just mentioned, to make it possible at all, there must be a clear distinction between static and reconfigurable logic. To ensure this, a technique called Hierarchical Design is employed. The idea is to look at the entire design as a collection of components, like physical components on a PCB. One side of this is that each component (ehm, instantiated module) is assigned a certain area on the logic fabric. And since each component must be kept distinct, it clearly makes sense to synthesize it separately, just like a physical component is manufactured separately.

So let's connect this concept with Partial Reconfiguration, which boils down to two main differences in the design flow:


Vivado's term for a floorplanning unit is a Pblock, which is just a placeholder of information inside Vivado. There are Tcl commands to create a Pblock, add logic cells to it, and then add sets of FPGA logic sites. Vivado interprets this as a placement constraint, saying that the logic cells that were added to the Pblock may be placed only in the sites that were assigned to it. So at the end of the day, Pblocks are just like other constraints in the XDC constraint file.

Pblocks are often defined using Vivado's GUI by opening the synthesized or implemented design, and drawing rectangular regions on the device's graphical representation. This creates a Pblock that includes all logic elements in the drawn rectangle of the types allowed for floorplanning (for that FPGA family). So Vivado translates the rectangle into ranges of logic elements.

It's equally fine to set these ranges manually by editing the XDC file. Also, the region is allowed to consist of several rectangles, so its shape can be more complex than a single rectangle. However, Xilinx' documentation (UG909) suggests keeping the shapes simple to avoid routing difficulties.

This is an example of an XDC file for Kintex-7:

create_pblock pblock_pr_block_ins
add_cells_to_pblock [get_pblocks pblock_pr_block_ins] [get_cells -quiet [list pr_block_ins]]
resize_pblock [get_pblocks pblock_pr_block_ins] -add {SLICE_X118Y0:SLICE_X153Y99 SLICE_X118Y250:SLICE_X145Y349 SLICE_X0Y0:SLICE_X117Y349}
resize_pblock [get_pblocks pblock_pr_block_ins] -add {DSP48_X5Y100:DSP48_X5Y139 DSP48_X5Y0:DSP48_X5Y39 DSP48_X0Y0:DSP48_X4Y139}
resize_pblock [get_pblocks pblock_pr_block_ins] -add {RAMB18_X4Y0:RAMB18_X6Y39 RAMB18_X4Y100:RAMB18_X5Y139 RAMB18_X0Y0:RAMB18_X3Y139}
resize_pblock [get_pblocks pblock_pr_block_ins] -add {RAMB36_X4Y0:RAMB36_X6Y19 RAMB36_X4Y50:RAMB36_X5Y69 RAMB36_X0Y0:RAMB36_X3Y69}

And the image below shows what it looks like in the implemented design. The reconfigurable partition (named pblock_pr_block_ins, which contains almost no logic in this example) is drawn in purple. Its shape is created as a union of three rectangles (the three ranges listed in each resize_pblock command above).

In this drawing, all placed logic is drawn in cyan. The vast majority of it belongs to the static region, and its confinement in a small area is evident.

Example of device view with Pblock partition

Note that only slices, RAMs and DSP48s are constrained. These are the only types of logic that Partial Reconfiguration can touch in 7-series FPGAs. Everything else, which in practice means anything but "pure logic", must belong to the static design. With Ultrascale devices and later, virtually everything can be reconfigured.
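To see what the resize_pblock ranges in the XDC example above amount to, here is a small Python sketch. It parses the range lists and sums the area of each rectangle per site type, and it rejects overlapping rectangles of the same type. It counts coordinate positions inside each rectangle, so the result is an upper bound: A real device may have holes in its site grid (for example, where hard blocks sit). The function names are made up for this illustration.

```python
import re

# Hypothetical helpers: parse resize_pblock-style ranges and count grid
# positions per site type. Result is an upper bound on actual sites.
RANGE_RE = re.compile(r"^(\w+?)_X(\d+)Y(\d+):\1_X(\d+)Y(\d+)$")


def parse_ranges(spec):
    """Parse '{TYPE_XaYb:TYPE_XcYd ...}' into (type, x0, y0, x1, y1) tuples."""
    rects = []
    for token in spec.strip("{}").split():
        m = RANGE_RE.match(token)
        if not m:
            raise ValueError(f"cannot parse range {token!r}")
        xa, ya, xb, yb = (int(g) for g in m.groups()[1:])
        rects.append((m.group(1), min(xa, xb), min(ya, yb),
                      max(xa, xb), max(ya, yb)))
    return rects


def count_positions(rects):
    """Sum rectangle areas per site type, rejecting overlapping rectangles."""
    totals = {}
    for i, (t, x0, y0, x1, y1) in enumerate(rects):
        for t2, a0, b0, a1, b1 in rects[:i]:
            if t == t2 and x0 <= a1 and a0 <= x1 and y0 <= b1 and b0 <= y1:
                raise ValueError(f"overlapping {t} rectangles")
        totals[t] = totals.get(t, 0) + (x1 - x0 + 1) * (y1 - y0 + 1)
    return totals


slices = ("{SLICE_X118Y0:SLICE_X153Y99 SLICE_X118Y250:SLICE_X145Y349 "
          "SLICE_X0Y0:SLICE_X117Y349}")
print(count_positions(parse_ranges(slices)))  # {'SLICE': 47700}
```

The same functions applied to the DSP48, RAMB18 and RAMB36 lines give 780, 760 and 380 positions respectively. The RAMB18 count is exactly twice the RAMB36 count, as one would expect when the two range lists cover the same block RAM columns.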

There are other restrictions on Pblocks as well, but there's no point repeating chapters 6-8 in UG909 here.

Anyhow, it's a good idea to open an implemented design (any design will do), zoom in and out on the device view, and watch how the logic elements are organized in the FPGA. In particular, note that there are columns of logic of the same type, sometimes with a few elements in the middle that break the uniformity (notably special elements, such as the ICAP block, PCIe blocks etc.).

It's worth mentioning that the Pblocks topic is not specific to Partial Reconfiguration. For example, everything in this section applies to using Pblocks for hierarchical design as well.

More on floorplanning

And here come a couple of counterintuitive facts: Even though floorplanning is represented graphically as shapes drawn on the FPGA map, it applies only to the types of logic it constrains. So if almost all slices of the FPGA are allocated for Partial Reconfiguration, islands of other logic elements that are completely surrounded by these slices may very well belong to static logic. For example, there is no problem whatsoever if the ICAP block itself is in the middle of a rectangle that is assigned for Partial Reconfiguration (by constraining its slices etc.).

This isn't all that weird given that the Partial Reconfiguration bitstream targets certain logic elements and leaves others intact. But what about routing? If an ICAP block is stuck in the middle of reconfigurable logic, how are the wires drawn to static logic slices?

This leads us to the second counterintuitive fact: Routing for the static design uses resources inside the reconfigurable region. This routing remains stable throughout the Partial Reconfiguration process, or else it wouldn't be static logic. So inside the reconfigurable region, the routing for the reconfigurable logic changes, but not for the static logic. If there's anything magic about this whole topic, it's this little fact. It's also the reason why using a partial bitstream that isn't compatible with the static logic is likely to disrupt the FPGA completely.

The reverse is not true, of course: The reconfigurable logic uses no resources except for those explicitly listed, as in the XDC example above. As for routing resources, nothing in the static region will change as a partial bitstream is loaded, so in that sense reconfigurable logic never influences the static region. Well, that's almost true. If the shape of the reconfigurable region isn't a plain rectangle, Vivado may let routing get outside the reconfigurable region on Ultrascale devices to improve routing.

What should be evident at this point is that the rules for floorplanning are anything but simple. The good news is that Vivado produces fairly informative Critical Warnings in response to floorplanning rule violations, so it's reasonable to get this done by trial and error.

Parent and Child implementations

The important notion regarding the implementation of reconfigurable logic is that every single path in the FPGA must meet timing, before and after logic is replaced. Hence it's impossible to implement reconfigurable logic separately from the static logic. Rather, the entire FPGA is implemented to pass timing (and other) constraints for each possible reconfigurable module. In other words, with the moduleA and moduleB example above, Vivado implements the full design with moduleA, and then implements the full design with moduleB. As a byproduct, there are regular bitstream files for each of these two options.

It's worth emphasizing: All implementations produce a full initial bitstream and also a partial bitstream, Parent and Child implementations alike. The FPGA's initial configuration can be done with any implementation's full bitstream.

In order to make it possible to move from moduleA to moduleB with Partial Reconfiguration, the logic content, its placement and routing in the static partition must be exactly the same between the two. To achieve this, Vivado implements one design scenario (say, with moduleA) as the Parent Implementation, and all the others (moduleB in the example) as Child Implementations. How this is done exactly is detailed in the last post in this series, but to make a long story short:

Vivado begins with running the Parent Implementation for moduleA with no particular oddity: The static and reconfigurable logic are synthesized separately, and the floorplanning constraints enforce their placement in mutually exclusive sites on the FPGA. Other than that, it's a regular implementation. In particular, placement and routing are performed for optimal results on this specific design, subject to the floorplanning constraints of course.

The next step is to implement the Child Implementation for moduleB. There is no need to synthesize the static design, as it was already synthesized for the Parent Implementation. So only the reconfigurable logic is synthesized.

The implementation is then run in the same way as the Parent Implementation, with one crucial difference: The placement and routing of all static logic are forced to be identical to the result of the Parent Implementation. Given this constraint, the reconfigurable logic is placed and routed for optimal results.

So the key to the relations between the Parent and Child Implementations is that a Child Implementation starts where the Parent Implementation ended, but replaces the reconfigurable logic with its own. The Child Implementation then continues as usual, but without touching anything in the static logic area.

Because all Child Implementations need to adapt themselves to the static logic's placements and routes, it might be harder to meet timing compared with implementing the design the old school way. There are in fact two obstacles:

This should be kept in mind when choosing which of the reconfigurable modules to implement as the Parent Implementation. For example, it might be the setting that is most difficult to meet timing with, or a setting that somehow includes the features of all others. Or possibly, the other way around: A reconfigurable module that effectively includes no logic at all (a "grey box") to achieve a neutral implementation of the static logic.

Vivado's paradigm for compiling a project with Partial Reconfiguration is that it ends when all bitstreams are up-to-date and mutually compatible. In other words, any of the implementations' initial bitstreams can be used for a full FPGA configuration, and this can be followed by loading any of the implementations' partial bitstreams.

Hence the first "Generate Bitstream" launches the Parent and all Child Implementations. In follow-up compilations, Vivado launches only runs that need update, as always.

The Dynamic Function eXchange Wizard

The purpose of this Wizard, which can be launched from the Tools menu, is to define the Parent and Child Implementations, and in particular which one contains which reconfigurable module.

It's easiest to explain this Wizard by looking at the Tcl commands it generates when adding a Child Implementation:

create_reconfig_module -name bpf -partition_def [get_partition_defs pr ] 
add_files -norecurse /path/to/pr_block1.v  -of_objects [get_reconfig_modules bpf]
create_pr_configuration -name config_2 -partitions [list pr_block_ins:bpf ]
create_run child_0_impl_1 -parent_run impl_1 -pr_config config_2 -flow {Vivado Implementation 2020}

I'll go through this Tcl sequence backwards, from the last line to the first:

So in the last line a Child Implementation run is created. The new run is assigned the name "child_0_impl_1" and its parent run is set to be "impl_1". No less important, the configuration of this new run is set to be "config_2".

"config_2" is defined in the third line, saying that "bpf" is the reconfigurable module that goes into "pr_block_ins" reconfigurable partition. "pr_block_ins" has been mentioned above, but what is "bpf"?

In the first line, a reconfig_module is created and named "bpf". This is just any name that is convenient for telling what the logic does. The second line says that a certain Verilog file is added to this reconfigurable module.

So all in all, these four lines create a new Child Implementation and say that a certain Verilog file should be synthesized as its reconfigurable module. Along the way, two Tcl objects were created, "bpf" and "config_2".
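To make the chain of objects explicit (reconfigurable module, source files, configuration, run), here is a Python sketch that generates the same four kinds of Tcl commands for one reconfigurable module. It uses only the commands and options that appear in the Wizard output above, and the default values are copied from that example. The second module "lpf", its source path and the run name are hypothetical. The script only produces text; the commands themselves would still be run inside Vivado.

```python
# Sketch: generate DFX Wizard-style Tcl commands for one reconfigurable
# module. Only commands/options from the example above are used.

def child_run_tcl(rm_name, sources, config_name, run_name,
                  partition_def="pr", instance="pr_block_ins",
                  parent_run="impl_1", flow="Vivado Implementation 2020"):
    # The reconfigurable module object, tied to the partition definition.
    lines = [f"create_reconfig_module -name {rm_name} "
             f"-partition_def [get_partition_defs {partition_def} ]"]
    # Each source file is attached to the reconfigurable module object.
    for path in sources:
        lines.append(f"add_files -norecurse {path} "
                     f"-of_objects [get_reconfig_modules {rm_name}]")
    # The configuration binds the module to the partition's instance,
    lines.append(f"create_pr_configuration -name {config_name} "
                 f"-partitions [list {instance}:{rm_name} ]")
    # and the Child run implements that configuration under the parent run.
    lines.append(f"create_run {run_name} -parent_run {parent_run} "
                 f"-pr_config {config_name} -flow {{{flow}}}")
    return lines


# Hypothetical second module "lpf" with a made-up source path.
print("\n".join(child_run_tcl("lpf", ["/path/to/pr_block2.v"],
                              "config_3", "child_1_impl_1")))
```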

The Dynamic Function eXchange Wizard is a GUI representation of the relations that are bound between design sources, reconfigurable modules, configurations and implementation runs. It's just a convenient way to communicate the information for producing Tcl commands like those shown above.

This arrangement may seem overly complicated, but that's because the example is simple. Among other things, in a realistic design the reconfig_module object is likely to have more source files, and possibly IPs, assigned to it.

But why is the configuration ("config_2") necessary? Why isn't the assignment of "bpf" into "pr_block_ins" made with the create_run command? Once again, that's a legitimate question because this post is restricted to only one reconfigurable partition. If there are several such partitions, a configuration defines which partition gets which reconfigurable module, so it makes sense to give each such combination a name, like config-something.

So if there are multiple partitions, is it required to make an implementation for each possible combination of reconfigurable modules? This question isn't relevant for this series of posts, so feel free to skip to the next section.

Recall that Vivado implements the entire design, the static and reconfigurable logic together, and ensures that it meets timing constraints as a whole. Therefore, if there are multiple partitions, the bulletproof way to use Partial Reconfiguration is to load all partitions with their partial bitstreams, all being outputs of the same implementation run, having the configuration for that specific combination.

Having said that, if the reconfigurable modules have no mutual interaction, i.e. all their ports' paths end at static logic, I can't figure out what could go wrong with treating each partition separately. Indeed, Vivado hasn't explicitly approved timing of the FPGA as a whole if partial bitstreams from different runs are mixed and matched. But since all paths with the static logic have met timing, and the static logic is exactly the same on all implementation runs, isn't that good enough? The documentation doesn't seem to address this issue.
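The arithmetic behind this question can be sketched in Python. With several partitions, the bulletproof approach needs one configuration (and hence one implementation run) per element of the Cartesian product of the module lists. In contrast, merely getting every module implemented at least once takes only as many configurations as the longest list. The partition and module names below are hypothetical. The cheaper option relies on the assumption discussed above, that the reconfigurable modules interact only with static logic.

```python
from itertools import product

# Sketch with hypothetical partition and module names.

def all_configurations(partitions):
    """Every combination: one configuration per element of the product."""
    names = list(partitions)
    return [dict(zip(names, combo))
            for combo in product(*(partitions[n] for n in names))]


def covering_configurations(partitions):
    """Fewest configurations in which every module is implemented at least once."""
    n = max(len(mods) for mods in partitions.values())
    return [{p: mods[i % len(mods)] for p, mods in partitions.items()}
            for i in range(n)]


partitions = {"pr_a_ins": ["bpf", "lpf", "hpf"], "pr_b_ins": ["fft", "fir"]}
print(len(all_configurations(partitions)))       # 6
print(len(covering_configurations(partitions)))  # 3
```

The gap grows quickly: With four partitions of four modules each, the product is 256 configurations, while the covering set is still 4.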

Routing and Partition Pins

There's one piece missing from the puzzle: The routing that connects the static and reconfigurable logic. Recall that the Parent Implementation places and routes the design in the optimal way for its own reconfigurable logic. But then the Child's reconfigurable module needs to fit into the same reconfigurable area and connect with the static design. At least some part of this routing belongs to the static logic, and can't change.

This is where Partition Pins come in. Conceptually, one can think about the reconfigurable logic as a physical device, and the partition pins as the metal pins that connect it to the PCB.

However, in reality, partition pins are just positions in the coordinate system of the FPGA's routing resources. They are the places where the static logic's routing ends, and the reconfigurable logic's routing should continue. Their only importance is that the Parent and Child Implementations agree on their positions.

No physical resources such as LUTs or flip-flops are required to establish these anchor points, and no additional delay is incurred by them. The routing segments leading to and from the partition pins create delay, of course, but the partition pins themselves don't.

The positions of the partition pins are set during the Parent implementation, and Child implementations are forced to follow. In other words, the routing between the static and reconfigurable logic starts where the Parent Implementation said it will, and the Child Implementation can only try its best inside the reconfigurable area. It may turn out that some partition pins are placed in unfavorable positions for the Child's reconfigurable logic, which might cause difficulties meeting timing.

Partition pins are often grouped together somewhere close to the reconfigurable area's perimeter. It seems like Vivado is designed to select routing sites that aren't too specialized for a particular design.

However, partition pins may be found anywhere inside the reconfigurable partition, if that was necessary to meet timing constraints during the Parent Implementation. Recall that the static design is allowed to use routing resources inside the reconfigurable area, so there's no problem if part of a static routing segment goes into it.

To prevent timing issues with partition pins, it's beneficial that the output ports of the reconfigurable module are driven directly by registers, and that its inputs are sampled by registers as well. Likewise, the static logic is better off applying registers the same way on its side. In fact, it's always a good idea to follow this rule, whenever possible without complicating the design.

The Greybox

One more thing about the DFX Wizard is the greybox: In the Edit Configuration window, it's possible to assign a greybox as the reconfigurable module, rather than one of the proper reconfigurable modules. A greybox is a bogus module that is generated by Vivado. It fits the ports of the real reconfigurable module, but instead of real logic, there's one LUT for every port pin. The LUTs that are generated for inputs are connected to nothing on the other end, and the LUTs for outputs produce a zero value. For vector ports, a LUT is created for each bit in the vector.
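As a worked example of the "one LUT per port bit" rule just described, this small Python sketch counts the LUTs a greybox would contain for a given port list. The port names and widths are hypothetical. The sketch applies the rule literally to every port, including a clock.

```python
# Sketch: estimate greybox LUT usage from a port list, following the
# "one LUT per port bit" rule described above. Port names are hypothetical.

def greybox_luts(ports):
    """Return LUT counts for the input and output bits of a greybox."""
    counts = {"in": 0, "out": 0}
    for _name, direction, width in ports:
        # Input LUTs drive nothing; output LUTs produce a constant zero.
        counts[direction] += width  # one LUT per bit of a vector port
    return counts


ports = [("clk", "in", 1), ("din", "in", 16),
         ("dout", "out", 16), ("valid", "out", 1)]
print(greybox_luts(ports))  # {'in': 17, 'out': 17}
```

A few dozen LUTs is negligible compared with any real reconfigurable module. This is exactly why a greybox gives the placer and router so little to struggle with.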

It may not be a good idea to use a greybox in the Parent Implementation, as it makes things too easy for the placer and router. Even if the reconfigurable modules are very different from each other, it's probably better to write a mockup module that challenges the tools somewhat.

But for the sake of creating an initial bitstream file with minimal logic, a child implementation with greyboxes only will do the trick. Recall that all implementations produce full bitstreams, that can all be used as the initial bitstream, since they all have the exact same static logic.

Clearing Bitstreams (Ultrascale only)

This relates only to non-plus Ultrascale devices.

As mentioned above, all implementations end with two bitstreams: One bitstream for the entire design, used for an initial configuration of the FPGA with the related reconfigurable module inside. The second bitstream is for Partial Reconfiguration with the same reconfigurable module.

For Ultrascale devices, there's a third bitstream, the "clearing bitstream", generated for each implementation. This bitstream needs to be sent to the FPGA before the partial bitstream. Note that the clearing bitstream matches the logic currently in the FPGA, and not the one to be loaded next. Hence there's a need to keep track of the FPGA's current situation, which is not necessary with other FPGA families.

Loading the clearing bitstream effectively shuts down the reconfigurable module, even though it doesn't really change the logic. The output ports of this module may show any value (until a new partial bitstream has been loaded and started up).

According to UG909, loading the clearing bitstream of a reconfigurable module that isn't currently loaded may disrupt the static logic as well, and hence get the reconfiguration mechanism stuck.

Xilinx' documentation seems to be vague regarding what happens if a partial bitstream is loaded without the clearing bitstream first. In chapter 9 of UG909 it first says: "Prior to loading in a partial bitstream for a new Reconfigurable Module, the existing Reconfigurable Module must be cleared". So the conclusion would be that the clearing bitstream is mandatory.

But a few lines below, it says: "If a clearing bit file is not loaded, initialization routines (GSR) have no effect". This implies that it's OK not to use the clearing bitstream at all, if it's acceptable that all synchronous elements (RAMs and flip-flops, in principle) wake up in an unknown state. In my own anecdotal experiments with skipping the clearing bitstream, I saw no problems, but that proves nothing.

So with Ultrascale devices, there's definitely a need to keep track of what's loaded in the FPGA. This can be done, for example, by adding an output port to the reconfigurable module, which is assigned to a per-module constant ID code. Something that can make sense regardless.
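The bookkeeping this calls for can be sketched in a few lines of Python. The loader reads the ID code that the currently loaded module drives on its extra output port, sends that module's clearing bitstream, and only then sends the new partial bitstream. If the ID is unknown, it refuses to guess, since the wrong clearing bitstream may disrupt the static logic. Everything here is hypothetical: The send and read_rm_id hooks, the ID values and the bitstream contents stand in for whatever the real system uses (ICAP logic, a register readout, files on disk).

```python
# Sketch of clearing-bitstream bookkeeping for Ultrascale (non-plus) devices.
# `send` and `read_rm_id` are hypothetical hooks: the first delivers bytes to
# the configuration port, the second reads the loaded module's ID code.

class ClearingAwareLoader:
    def __init__(self, send, read_rm_id, id_to_module, bitstreams):
        self.send = send
        self.read_rm_id = read_rm_id
        self.id_to_module = id_to_module  # ID code -> module name
        self.bitstreams = bitstreams      # module -> {"clear": ..., "partial": ...}

    def load(self, new_module):
        current = self.id_to_module.get(self.read_rm_id())
        if current is None:
            # A wrong clearing bitstream may disrupt the static logic.
            raise RuntimeError("cannot identify the currently loaded module")
        # The clearing bitstream matches what is loaded now, not what comes next.
        self.send(self.bitstreams[current]["clear"])
        self.send(self.bitstreams[new_module]["partial"])


# Usage with fake hooks and placeholder bitstream contents.
sent = []
fpga = {"id": 0xB1}
loader = ClearingAwareLoader(
    send=sent.append,
    read_rm_id=lambda: fpga["id"],
    id_to_module={0xB1: "bpf", 0xB2: "lpf"},
    bitstreams={"bpf": {"clear": b"bpf_clear", "partial": b"bpf_partial"},
                "lpf": {"clear": b"lpf_clear", "partial": b"lpf_partial"}})
loader.load("lpf")
print(sent)
```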

Plugin vs. Remote Update use cases

This parent-child methodology is apparently designed for a specific use case, which I'll call the plugin usage: Reducing the FPGA cost by plugging in the reconfigurable module currently required, rather than having all possibilities statically loaded all the time. For example, if the FPGA is used to implement one of a list of image filters, implement that filter as a reconfigurable module, and reconfigure the FPGA to switch from one filter to another.

Xilinx calls Partial Reconfiguration "Dynamic Function eXchange" (DFX), seemingly reflecting the primary intended use of this technique.

When this is the purpose of Partial Reconfiguration, the parent-child flow works well: A complete kit of bitstream files is generated. Any of the initial bitstreams can be used to initialize the FPGA, and all partial bitstreams can be used later for Partial Reconfiguration. When a new version of the project is released, the entire kit (all bitfiles) is replaced.

But there's another usage pattern, which I shall refer to as Remote Update. It's when Partial Reconfiguration is used as a means for version upgrades, possibly in the far future. In this usage scenario, the initial bitstream is released at some point in time, and can't be changed afterwards. Later on, partial bitstreams are released, all of which must be compatible with the initial one. These later releases may very well span across years.

For Remote Update, the parent-child flow may be tricky to work with as is: Even though it's possible to run a Child Implementation to obtain a new partial bitstream without re-running the Parent, an accidental source code update may invalidate the Parent Implementation and cause it to be re-run. As a result, the static logic of the new Parent Implementation will be incompatible with the previous one, and hence the new partial bitstream can't be used with the original initial bitstream.
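One way to guard against this pitfall is plain release bookkeeping. Each released partial bitstream is tagged with the Parent Implementation whose static logic it was built against. Before shipping, a check confirms that the tag matches the initial bitstream already in the field. The Python sketch below shows the idea. The static_id tag and the version names are hypothetical and stand for whatever identifier a release process records when the Parent Implementation runs.

```python
from dataclasses import dataclass

# Sketch of release bookkeeping for the Remote Update use case. `static_id`
# is a hypothetical tag naming the Parent Implementation whose static logic
# a partial bitstream was built against.

@dataclass(frozen=True)
class PartialRelease:
    version: str
    static_id: str


def usable_partials(deployed_static_id, releases):
    """Partial bitstreams that fit the initial bitstream already in the field."""
    return [r.version for r in releases if r.static_id == deployed_static_id]


releases = [PartialRelease("v1.0", "parent-A"),
            PartialRelease("v1.1", "parent-A"),
            # The Parent Implementation was accidentally re-run here:
            PartialRelease("v1.2", "parent-B")]
print(usable_partials("parent-A", releases))  # ['v1.0', 'v1.1']
```

Such a check only detects the problem after the fact. Preventing the re-run in the first place is the subject of the flow manipulation mentioned next.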

So if Partial Reconfiguration is intended as a means to successively update the FPGA design over time, the implementation flow needs some manipulation. This topic is discussed in the last post in this series.

Compressing bitstreams

This isn't directly related to Partial Reconfiguration, but it's sometimes desired to have a short initial bitstream file, in particular for ensuring a quick initial configuration of the FPGA. In this context, Partial Reconfiguration becomes a means to complete the configuration process, possibly from the same data source (e.g. the SPI flash) or from a completely different one (e.g. a PCIe interface).

Compressing the bitstream is allowed for both the initial and partial bitstreams.

This is the line to add to the .xdc constraints file to request a compressed bitstream:

set_property bitstream.general.compress true [current_design]

This concludes the theoretical part. The next post shows the practical steps for configuring a project to use Partial Reconfiguration.

Copyright © 2021-2022. All rights reserved. (59ca02e6)