
Remote Update with Partial Reconfiguration on Vivado

This is the last post in a series of four about Partial Reconfiguration, or Dynamic Function eXchange (DFX) with Xilinx' Vivado. It's mainly intended for those who want to use Partial Reconfiguration in a Remote Update scenario. This post is written assuming you've already read the previous three.

Introduction

Having discussed Partial Reconfiguration in general, and then Vivado's usual procedure for this purpose, this post sets the ground for using this technique for Remote Update of the FPGA's logic.

The main issue with this usage scenario is that the partial bitstream needs to be compatible with the initial bitstream that was generated possibly years before. Hence the original Parent Implementation must be available during the implementation of the reconfigurable logic, or it must be regenerated to produce exactly the same result — that is, with exactly the same place and route.

It may be possible to keep the entire Vivado project intact so that rerunning the Parent Implementation is avoided. It may also be possible to obtain exactly the same result if this implementation is repeated. However, it's difficult to ensure the ability to release partial bitstreams in the future based upon either of these two methods.

There is a reliable solution to this problem, but in order to understand how it works, it's necessary to first be familiar with DCPs and OOCs. A brief introduction to both topics follows.

The Design Checkpoint (DCP)

Recall that Vivado carries out an FPGA design's implementation by virtue of several Design runs, which are typically named synth_1 and impl_1, as well as additional runs that are categorized as Out-of-Context module runs (OOCs).

Each such run consists of executing a Tcl script, which generates a temporary "in-memory project", loads design files, sets properties and attributes, and calls Tcl functions that carry out synthesis, placement, routing, bitstream generation, and other tasks.

This in-memory project creates no files on the disk, and has nothing to do with the Vivado project that is visible in the GUI. It's an object in memory that allows performing many sequential operations with Tcl commands.

As the implementation progresses, Design Checkpoint (DCP) files are written to the disk (by virtue of the write_checkpoint command in Tcl). The content of a DCP file is a snapshot of the in-memory project. In other words, it's a database that reflects the design files that have been loaded and the operations that have been performed on the project since they were loaded.

For example, the synthesis run (which is usually named synth_1) may load all HDL files, constraint files and IP files, and then use a Tcl command called synth_design in order to carry out the synthesis of the HDL files. The result is one big netlist from all these sources (including the IPs). Note however that the netlist is stored in memory, as part of the in-memory project. Accordingly, the synthesis run's Tcl script makes a function call to write_checkpoint in order to create a DCP file, which is the product of the synthesis. This concludes the synth_1 run.
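To give a sense of what such a run's script boils down to, here's a minimal, hand-written sketch of a synthesis run in Tcl. This is not the script that Vivado generates; the file names and the top-level module's name are made up, and the part number is taken from the example further below:

# Create the temporary in-memory project (nothing is written to the disk for it)
create_project -in_memory -part xc7k325tffg900-2

# Load the design sources and constraints into the in-memory project
read_verilog [ glob ./hdl/*.v ]
read_xdc ./constraints/theproject.xdc

# Synthesize everything into one netlist, which is held in memory
synth_design -top theproject -part xc7k325tffg900-2

# Write a snapshot of the in-memory project: this is the netlist DCP
write_checkpoint -force theproject_synth.dcp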

In fact, the DCP that is generated by synth_1 is called a netlist DCP even though it usually also contains constraints from the XDC files.

The implementation run that comes afterwards, which is usually called impl_1, creates a new in-memory project and reads this netlist DCP (among others) as the starting point for the following operations. This run writes several DCP files, each being a snapshot of the project after a processing stage (e.g. optimize design, place, physically optimize, route etc.).

All runs, including synth_1, can load DCPs into their in-memory project. This is what they usually do.
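To illustrate this, the essence of an implementation run can be sketched with Tcl commands like these (once again, this is a simplified, hypothetical sketch, and not the script that Vivado generates; the file names follow the previous sketch):

create_project -in_memory -part xc7k325tffg900-2

# Load the netlist DCP that the synthesis run produced
add_files theproject_synth.dcp

# Build the in-memory netlist from everything that has been loaded
link_design -top theproject -part xc7k325tffg900-2

# Each processing stage is followed by a snapshot of the in-memory project
opt_design
write_checkpoint -force theproject_opt.dcp

place_design
phys_opt_design
write_checkpoint -force theproject_placed.dcp

route_design
write_checkpoint -force theproject_routed.dcp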

Out Of Context (OOC) Module Runs

In Vivado, IPs are usually configured by virtue of a GUI tool. When this is done, a script generates source files (primarily HDL and constraint files) and performs a synthesis on these. This results in a netlist DCP, which is then loaded in the main project's synthesis run and implementation runs. This reduces the time it takes for the implementation of the entire project to complete, by eliminating the need to regenerate the sources of the IPs and carry out their synthesis over and over again.

The Vivado run that takes the raw materials (an IP's configuration information, some HDL files, or whatever there is) and turns them into a netlist DCP is called an Out-Of-Context run in Vivado's terminology. This term is used only in relation to Vivado, and has no other uses I know of. It's actually a rather poor choice of name.

This netlist DCP is loaded by the synthesis run and implementation runs, usually by virtue of the read_ip Tcl command, which in practice boils down to loading the DCP that has been generated by the OOC run.
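To illustrate, handling an IP with Tcl commands goes roughly along these lines. The IP's name and path are made up for this example, and the last two commands are only relevant if the OOC run's products don't exist yet:

# Load the IP's configuration (the .xci file) into the in-memory project
read_ip ./theproject.srcs/sources_1/ip/blkmem/blkmem.xci

# If necessary, generate the IP's source files and synthesize the IP out
# of context, which produces the IP's netlist DCP
generate_target all [get_files blkmem.xci]
synth_ip [get_ips blkmem]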

Vivado also allows selecting an HDL module in the main project, and requesting that its synthesis is carried out as an OOC (by right-clicking the source in the Project Manager's source tree and picking "Set as Out-of-Context for Synthesis…").

The drawback of OOCs is that the synthesizer can perform certain optimizations across the boundaries of modules only when it gets the entire design as a whole. So synthesizing modules individually by virtue of OOCs may reduce performance as well as waste resources.

OOCs and DCPs with Partial Reconfiguration

When a source file has been selected as the top-level for a reconfigurable module, Vivado creates an OOC run for the synthesis of this module and its submodules. This run generates a netlist DCP for use in the relevant implementation. This is true for the Parent Implementation as well as Child Implementations: The reconfigurable module is always represented with a separate DCP.

This netlist DCP is however used differently in the Parent Implementation and the Child Implementations: The netlist DCP that is assigned to the Parent Implementation is loaded by synth_1 as well as impl_1, just as is done with an IP in a regular implementation. The fact that Partial Reconfiguration is involved influences this process mostly through the placement constraints that are imposed by floorplanning.

The Child Implementation, on the other hand, has no dedicated synthesis phase. Rather, it mixes the reconfigurable module's netlist DCP with the final DCP from the Parent Implementation (i.e. the DCP after place and route). More precisely, the Child Implementation takes the Parent Implementation's final DCP, removes the reconfigurable logic's part, and inserts its own reconfigurable module's netlist DCP instead. A bit like removing the middle part of the vegetable for making stuffed zucchini or something of that sort.

And now is the time to break this down into Tcl commands.

The nuts and bolts of Parent-Child implementations

The place to look in order to see how the implementation works under the hood is the Tcl file that has the name of the project (plus a .tcl suffix). This file resides in the same directory as the one where the implementation's files are created.

In particular, the Child Implementation's script is interesting. It's no coincidence that Chapter 3 ("Vivado Software Flow") in the related user guide, UG909, shows and explains that script, even if not saying so directly.

As just mentioned, there isn't very much special about the Parent Implementation, except that it relies on a DCP for the netlist of its reconfigurable module, and that floorplanning constraints are applied. It's more or less like any hierarchical design.

But then the Parent Implementation writes two bitstreams rather than one, by virtue of Tcl commands like this:

write_bitstream -force -no_partial_bitfile theproject.bit 
write_bitstream -force -cell pr_block_ins pr_block_ins_lpf_partial.bit

and then it creates the DCP that is later used by the Child Implementation:

update_design -cell pr_block_ins -black_box
lock_design -level routing
write_checkpoint -force theproject_postroute_physopt_bb.dcp

Recall that while this part is running, there is an in-memory project, which started with loading netlist DCPs, and went through place and route and all other optimizations. These three Tcl lines are executed after writing the bitstreams, so the in-memory project is at its final stage.

This is the right time to punch a hole and do the stuffed zucchini thing: The update_design command turns the reconfigurable module into a black box. In other words, all its logic is removed, giving room for other logic to come in.

Then the placement and routing of the design are locked by virtue of the lock_design command, after which the project's snapshot is written into theproject_postroute_physopt_bb.dcp. "bb" stands for "Black Box", of course.

The relevant part in the Child Implementation's script is as follows:

create_project -in_memory -part xc7k325tffg900-2
set_property design_mode GateLvl [current_fileset]
add_files -quiet .../impl_1/theproject_postroute_physopt_bb.dcp
add_files -quiet .../bpf_synth_1/pr_block.dcp
set_property SCOPED_TO_CELLS pr_block_ins [get_files .../bpf_synth_1/pr_block.dcp]
link_design -top theproject -part xc7k325tffg900-2 -reconfig_partitions pr_block_ins
opt_design 
write_checkpoint -force theproject_opt.dcp
[ ... ]

and from this point it goes on to place and route etc.

Note that the implementation script above consumes just two sources, and both are DCPs: the black-box DCP that was written by the Parent Implementation (theproject_postroute_physopt_bb.dcp), which contains the locked static logic, and the netlist DCP of the reconfigurable module (pr_block.dcp), which is the product of its OOC synthesis run.

When the implementation continues, the fact that the first DCP was locked ensures that nothing of the static logic moves. Nevertheless, the reconfigurable module's place and route is done as usual, based upon the netlist DCP.
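For the sake of completeness, here is a rough sketch of how such a run typically continues (the checkpoint names are assumed; the actual script also writes reports along the way):

# The static logic arrived from the locked DCP with its placement and
# routing fixed, so these stages effectively operate on the
# reconfigurable module only
place_design
phys_opt_design
write_checkpoint -force theproject_placed.dcp

route_design
write_checkpoint -force theproject_routed.dcp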

As a side note, if there are XCI IPs that belong to the reconfigurable module, a line like the following is added for each IP, between the two add_files commands above:

read_ip -quiet .../theproject.srcs/sources_1/ip/blkmem/blkmem.xci

This isn't special to Partial Reconfiguration, though — it's done this way in any implementation.

Just before writing the two bitstreams, the Child Implementation verifies that the routed design it has obtained is compatible with the static logic of the Parent Implementation, in particular regarding place and route.

The Tcl command for this is something like:

pr_verify -full_check -initial /path/to/impl_1/theproject_postroute_physopt.dcp -additional /path/to/child_1_impl_1/theproject_routed.dcp -file child_1_impl_1_pr_verify.log

Note that this compares two DCP files, regardless of the in-memory project. The Parent Implementation's routed DCP is compared with the final DCP of the Child Implementation. The output of this comparison goes to a file named *_pr_verify.log.

This comparison guarantees that the partial bitstream is compatible in the sense that it can be loaded when the parent's bitstream is already loaded in the FPGA. The check goes through the static partition's logic elements as well as the routing.

pr_verify returns with a failure status if there's an incompatibility. If this happens, the creation of bitstreams in Vivado's script is prevented. It's important to keep this step in mind when writing custom implementation scripts.

There's no reason this verification should ever fail, but if it does, the bitstream generation fails with a lot of errors like "ERROR: [Constraints 18-891] HDPRVerify-08: design check point .../impl_1/theproject_postroute_physopt.dcp places instance ... at site SLICE_X118Y125, yet design check point .../impl_2/theproject_routed.dcp does not. Both check point must have the same static placement result".

These error messages will probably reach the limit of 100 messages, and then get silenced.

Solution for the Remote Update scenario

Recall from above that the challenge is that the result of the original Parent Implementation must be available when the implementation of the reconfigurable logic is carried out as a Child Implementation.

The brute-force solution is to make a copy of the entire Vivado project directory, along with any files it might depend on. When the need to generate a new partial bitstream arises, restore all files, force the Parent Implementation into up-to-date status, and create a bitstream for the Child Implementation only. This method is technically OK, but is likely to be annoying in the long run. If you choose this way, be sure to manually run pr_verify against the original Parent Implementation's DCP file, because this method will not detect an unintentional change in the Parent Implementation.
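Such a manual check can be carried out in a standalone Vivado Tcl session, along the lines of the command that was shown earlier. The paths below are, of course, placeholders:

# Compare the original Parent Implementation's routed DCP with the
# routed DCP of the freshly generated Child Implementation
pr_verify -full_check -initial /path/to/original/impl_1/theproject_postroute_physopt.dcp -additional /path/to/child_0_impl_1/theproject_routed.dcp -file manual_pr_verify.log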

There are two other alternatives, both of which are based upon the knowledge of how the Parent Implementation and Child Implementations interact: through two DCP files, as explained above. The obvious advantage of these two alternatives is that you know what you're doing.

The principle behind both alternatives is to ensure that the partial bitstream relies upon the two DCP files that were generated along with the initial bitstream, which I shall refer to as the Golden DCPs.

The first alternative is to carry out the implementation of the partial bitstream as a Tcl script with non-project flow. Essentially, it means running the script that Vivado creates for the Child Implementation, but modifying it to use the Golden DCPs. More precisely, modify the add_files commands that precede link_design, as well as the arguments of the pr_verify command, so that they rely on the Golden DCPs.
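To make this concrete, here's a hedged sketch of how the beginning of such a modified script might look, based upon the Child Implementation's script shown above. The paths of the Golden DCPs and of the reconfigurable module's netlist DCP are placeholders:

create_project -in_memory -part xc7k325tffg900-2
set_property design_mode GateLvl [current_fileset]

# The static logic is taken from the Golden black-box DCP, rather than
# from the Parent Implementation's run directory
add_files -quiet /path/to/golden/theproject_postroute_physopt_bb.dcp

# The freshly synthesized netlist DCP of the reconfigurable module
add_files -quiet /path/to/rm_synth/pr_block.dcp
set_property SCOPED_TO_CELLS pr_block_ins [get_files /path/to/rm_synth/pr_block.dcp]

link_design -top theproject -part xc7k325tffg900-2 -reconfig_partitions pr_block_ins

# ... opt_design, place_design, route_design and write_checkpoint as usual ...

# Verify against the Golden routed DCP before writing the partial bitstream
pr_verify -full_check -initial /path/to/golden/theproject_postroute_physopt.dcp -additional theproject_routed.dcp -file pr_verify.log
write_bitstream -force -cell pr_block_ins pr_block_ins_partial.bit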

The main drawback of this alternative is that running a Tcl script with non-project flow doesn't integrate well with Vivado's GUI, so it becomes considerably more difficult to process messages, open the implemented design for review etc.

The Golden DCPs hack

Ideally, it would have been possible to automatically modify the script that is generated by Vivado for the Child Implementation run, so it would relate to the Golden DCPs. Unfortunately, it doesn't seem like there's a reliable way to do that.

However, Vivado allows defining Tcl scripts for execution before and after certain stages of the implementation. These scripts are executed from the implementation run's script, so they can't be used to change the run's script itself.

Nevertheless, this opens the door to a somewhat ugly method, which is the second alternative: the idea is to overwrite the DCPs of the Parent Implementation with the Golden DCPs. By doing so, the Child Implementation runs normally, but relies on the Golden DCPs, regardless of what the Parent Implementation happened to generate.

The advantage of this method is that the regular work habits with Vivado remain unchanged: Changes are made to the reconfigurable logic, Vivado reruns the OOC for the synthesis of the reconfigurable module, and then the Child Implementation for generating the partial bitstream. Since no changes are made in the static logic, Vivado has no reason to start its related runs.

The Tcl script that implements this Golden DCP copy method is:

if { [catch {
    # The Parent Implementation's directory, relative to the directory of
    # the Child Implementation run that executes this script
    set parentimpldir "[ file normalize "../impl_1"]"

    # The directory where the two Golden DCPs are stored
    set goldendir "[ file normalize "/path/to/golden"]"

    # Overwrite the Parent Implementation's DCPs with the Golden DCPs
    file copy -force "[file normalize "$goldendir/theproject_postroute_physopt_bb.dcp"]" "$parentimpldir/"
    file copy -force "[file normalize "$goldendir/theproject_postroute_physopt.dcp"]" "$parentimpldir/"
} errmsg ] } {
    send_msg_id golden-reconfig-1 error "Failed to copy golden parent reconfiguration file(s): $errmsg"
    return -code error
}

This script assumes that the Parent Implementation is kept in the "impl_1" directory adjacent to the Child Implementation's directory (it most likely is) and that the two Golden DCPs are stored in the directory that the goldendir variable points at.

Right-click the child run (e.g. child_0_impl_1) at the Design Runs tab, pick "Change Run Settings…" and in the dialog that opens, set tcl.pre for Design Initialization (init_design) to the script. Or, in Tcl, if the script was saved as golden_pr.tcl:

add_files -fileset utils_1 -norecurse /path/to/golden_pr.tcl
set_property STEPS.INIT_DESIGN.TCL.PRE [ get_files /path/to/golden_pr.tcl -of [get_filesets utils_1] ] [get_runs child_0_impl_1]

The important thing to keep in mind when using this script is to ignore all information that Vivado presents regarding the Parent Implementation, as well as any files that are produced on its behalf. Vivado may open its Implemented Design GUI and show its reports, but all this may very well be completely unrelated. So this is a potential source of confusion.

A second possible annoyance is that Vivado may run the Parent Implementation every now and then in response to a change in the project's settings, or even in the constraints file. This can be worked around by right-clicking the relevant row in the Design Runs tab, and choosing "Force Up-to-Date". This menu item appears only when the run is in an Out-of-Date state, i.e. it has completed, but Vivado deems that a refresh is needed. The related Tcl command is e.g.

set_property needs_refresh false [get_runs synth_1]

What files to save along with the Golden DCPs

Clearly, the Golden DCPs must be kept in a safe place to maintain the capability to generate compatible bitstream files in the future. Along with these, it's actually a good idea to compress the entire Vivado project into a .tar.gz / .zip file for a quick resumption. Alternatively, or in addition to that, a project archive can be generated with File > Project > Archive… or with something like:

archive_project /path/to/theproject.xpr.zip -force -include_local_ip_cache -include_config_settings

Note however that both the project file as well as others in a Vivado project may contain absolute paths, so deploying the project in another directory, or on another computer, may not work as expected. This is true for the archive as well.

Which Vivado version was used is written in all possible report files, as well as in the .xpr project file, but taking a note of that won't hurt. Possibly retain an installed copy of that version of Vivado, even though there's no apparent reason why a software upgrade would matter. Mixing the Golden DCPs from one Vivado version with the reconfigurable logic's netlist DCP from another version will probably work fine, but this is not something Vivado was designed to do.

To sum this up, the minimal set of files is: the two Golden DCPs (the Parent Implementation's routed DCP and its black-box DCP), the clearing bitstream that relates to the initial bitstream (on UltraScale FPGAs), and the sources of the Parent Implementation.

Note that the clearing bitstream on UltraScale FPGAs relates to the logic that is already in the FPGA. That's why the clearing bitstream that relates to the initial bitstream must be saved. Also note that if the initial bitstream is the result of some Child Implementation, the clearing bitstream of that implementation must be saved.

As for saving the sources of the Parent Implementation, it's important because they determine the connections between the static logic and the reconfigurable logic. For example, if the sources are edited, a port is added to the reconfigurable logic, and this port appears in the instantiation, the set of partition pins changes. Hence the netlist of the reconfigurable logic will have external pins that don't appear in the original static design.

When such a mismatch occurs, the Child Implementation fails with an error message saying something like "ERROR: [Netlist 29-77] Could not replace (cell 'pr_block_bb', library 'work_pr_block_ins_pr_block_ins_4', file 'NOFILE') with (cell 'pr_block', library 'work', file 'pr_block.edf') because of a port interface mismatch; in strict mode, no extra ports are allowed. 8 ports are missing on the original cell. 5 of the missing ports are: 'thingy[7]' 'thingy[6]' 'thingy[5]' 'thingy[1]' 'thingy[0]'".

What actually fails in this case is the link_design command (see above), which glues the static logic and the reconfigurable logic together.

The simple and obvious way to avoid this problem is not to change the static part of the design, nor the port list of the reconfigurable logic.

However, it's actually OK to make changes in the static part of the project: It's just the instantiation of the reconfigurable module that must remain the same. As long as the implementation goes through smoothly, including the verification at the end, there is no problem.

Summary

Even though there isn't a fully smooth solution for the Remote Update usage of Partial Reconfiguration, there are a few available strategies to achieve this goal nevertheless.

The important point is that in principle, the two Golden DCPs are all that is needed to build and verify a partial bitstream for an existing project.

And regardless of how unconventional the strategies presented here might seem, keep in mind that the verification carried out by pr_verify is a comprehensive check of the compatibility between the partial bitstream and the static logic that is already in place. As long as the correct Golden DCP is used for this verification, and the test passes, there is nothing else to worry about. That, plus the Child Implementation achieving its timing constraints, of course, but that's true for any implementation of a design.

Copyright © 2021-2024. All rights reserved.