Newsletter


April 24, 2007

Rigorous Automated Verification Yields High Quality Silicon



Our passion to achieve high quality silicon led us down a new road when it came time for functional verification of a project larger than any previous ASIC in our team's history. In this paper we describe why our functional verification methodology yielded functionally robust silicon that has our customers begging for production parts.

Although we could endlessly blather about our design verification methodology, if our ASIC came back from the fab showing no life or demonstrating mediocre quality, well, the blather would be meaningless, right? But that's not what happened. We plugged our ASIC onto a demonstration board and it worked. After rigorous silicon validation we realized that we had hit our goal of producing robust functional silicon with no surprises! We passionately pursued this goal while deploying Cadence's Specman eRM verification methodology on this PCI Express Switch ASIC project. We would love to take you back into the lab, where we have all 4 ports on our PCI Express switch pumping a huge amount of traffic and driving 8 simultaneous videos on the LCD. Seeing it work is worth a million words, and achieving robust functional silicon is priceless!

Our Challenge
The PCI Express Switch project was launched in 2004. It would be a chip much larger and much more complicated than any project previously tackled in our team's history. Also, the PCI Express protocol presents its own challenges and complexities over its precursor, the PCI protocol. The switch would be our first 3+ million gate design with multiple physical layer interfaces, or PHYs, on-chip and a large number of memories. Since it was a platform development effort, it had the additional challenge of supporting future flavors with feature variations such as a configurable number of ports and throughput levels. Coupled with the physical complexity came a tremendous amount of behavioral complexity. Meeting PCI Express compliance would become a major effort, especially if we wanted our first silicon to be marketable. Just getting the device powered up and initialized, traversing all the power-up states and combinations, is a phenomenal verification effort in itself. That raises the question, "How do we verify the millions of combinations of states that this device can go into?"

For the most part, verification efforts on previous projects utilized directed test cases. Directed tests are typically created to exercise a very specific feature, often re-using identical initialization patterns from other tests to get the DUT to a known state, then injecting a single, basic transaction—all of this done with the checking built into the test. However, given the enormity of the problem, we did not have the time or resources needed to write each test to exercise each individual feature.

We were also concerned that after design changes we could not be sure whether the tests were still hitting their intended targets. Changes in the design might make directed tests ineffective and require them to be modified. The bottom line was that we did not want to get tied down writing directed tests.

The areas we didn't think about also concerned us; corner case scenarios reflecting an unthinkable combination of events could become a show-stopping bug. It was quickly apparent that we needed some type of automated test generation system that would create more stimulus combinations from each simulation while remaining decoupled from the checkers, whose responsibility was to enforce proper DUT behavior. The environment would need to be intelligent, create behavior in a realistic manner and know what it can and cannot do. It was also important to have a flexible test development flow that allowed us to create a general flow of events which mimicked the normal chip behavior while also providing the ability to randomize interesting behaviors, effectively creating different flavors of that flow.

Because we were planning to take advantage of automated test generation, understanding what functional behavior the tests exercised, and more importantly which combinations of features they exercised, would be a cornerstone of our verification process. Functional coverage would be our driving force and our measurement of completeness. This is how we would know if the tests were still hitting their target.

We felt that a bottom-up approach, verifying the individual modules first and then moving upward to verify the entire chip, was the right one. Thus, we needed to develop module-based verification environments that could be easily reused at the chip level. We also needed an environment that could be reused and reconfigured to verify derivatives of this switch. This would help us spread initial development costs over a family of similar products by avoiding having to recreate the verification environment every time.



Our Solution
We turned to Cadence for help and more importantly to their verification methodology. Cadence Consulting Engineers (CEs) introduced our team to a brainstorming process called vPlanning, which helped us quickly define all of our areas of concern. The CEs spent about 3 days with the entire verification and design team in a conference room to help gather up all the areas of concern and sculpt a verification plan that we could then follow to complete the development of our functional coverage groups. We found this extreme planning approach to be surprisingly useful and were able to chip away at any misunderstandings during the planning process.

We also took advantage of Cadence's verification process automation tool called Specman, which executes an Aspect Oriented Programming (AOP) language called "e". The "e" language would be our foundation for developing "e" Verification Components, or eVCs, as shown in Figure 1. eVCs are typically created with all the necessary components such as bus functional models (BFMs), checkers, coverage, stimulus libraries and other handy features needed for verification. The aspect-oriented nature of "e" allows for code reuse, and the code can easily be constrained to achieve the desired behavior. It would also allow us to work in the different device operation modes without too much difficulty.


Figure 1. Specman creates all components of an eVC.

Cadence's methodology for developing reusable verification components was as valuable as the language itself. The eVC Reuse Methodology, or eRM for short, has evolved from over a decade of real use and success both in our company and outside it. Reading the eRM manuals and taking a look at the many eRM examples that come with the Specman install was our first step.

The eRM provides guidance on all aspects needed to develop reusable and efficient eVCs. This includes everything from guidance on developing your environment structure down to information on how to create a reusable directory structure where your eVC code would reside.

Ports, introduced with eRM, allowed us to create symbolic references to signals, which enabled users to attach to one group of signals during module verification and then easily reattach to a different set of signals during chip level verification. Messaging, another feature in eRM, allowed us to increase or reduce the amount of information that was sent to the screen or log files. Messaging eliminated the flood of debug messages typically seen in previous projects.

We agreed that we would have constrainable stimulus files containing all the knobs used to control and guide the behavior of the stimulus and the device configuration. Such a stimulus file, which we called a scenario file, would look different from a typical directed test. In our scenario file, we told the environment just enough to guide it in the right direction while allowing it the freedom to pick some interesting scenarios. This gave us the flexibility to find bugs as well as exercise areas that had not yet been exposed.
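To make the idea of a scenario file concrete, here is a minimal sketch in Python rather than in "e". It is not our actual code: the knob names (num_ports, payload_range, kind_weights) and the transaction kinds are hypothetical, chosen only to show how a handful of constraints can guide generation while a seed picks the details.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Scenario:
    """Hypothetical scenario knobs; the real knob names are not given here."""
    num_ports: int = 4                      # ports that may receive traffic
    payload_range: tuple = (1, 128)         # inclusive payload size bounds
    kind_weights: dict = field(             # relative weight per transaction kind
        default_factory=lambda: {"mem_read": 1, "mem_write": 1, "config": 1}
    )


def generate(scenario, seed, count):
    """Pick `count` transactions that satisfy the scenario's constraints."""
    rng = random.Random(seed)               # same seed gives the same stimulus
    kinds = list(scenario.kind_weights)
    weights = [scenario.kind_weights[k] for k in kinds]
    low, high = scenario.payload_range
    return [
        {
            "port": rng.randrange(scenario.num_ports),
            "kind": rng.choices(kinds, weights=weights)[0],
            "payload": rng.randint(low, high),
        }
        for _ in range(count)
    ]
```

A write-heavy flavor of the same flow would only change a knob, for example Scenario(kind_weights={"mem_read": 1, "mem_write": 8, "config": 1}), and leave everything else to the generator.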

Functional behavior in our environment was easily described and tracked by Specman/e. This would be the basis for a verification environment that was coverage driven. In other words, coverage would dictate where we steered and constrained our stimulus to target the areas that we had not yet exercised. It would also provide a checklist to tell us when we were complete.
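As an illustration of what "coverage as a checklist" means, the following Python sketch counts hits for every combination of a few coverage items and reports the holes that remain. It is a simplified stand-in for a Specman coverage group, not our implementation; the class name CoverGroup and the item names used in the example are assumptions.

```python
from itertools import product


class CoverGroup:
    """Counts hits for every combination (cross) of the listed item values."""

    def __init__(self, **items):
        self.names = list(items)
        # One bin per combination, all starting with zero hits.
        self.hits = {combo: 0 for combo in product(*items.values())}

    def sample(self, **values):
        """Record one observed combination of item values."""
        self.hits[tuple(values[name] for name in self.names)] += 1

    def holes(self):
        """Combinations that have never been observed."""
        return [combo for combo, count in self.hits.items() if count == 0]

    def percent(self):
        """Share of bins hit at least once, as a percentage."""
        covered = sum(1 for count in self.hits.values() if count > 0)
        return 100.0 * covered / len(self.hits)
```

With cg = CoverGroup(port=[0, 1], kind=["read", "write"]), sampling (0, "read") and (1, "write") leaves cg.holes() pointing at the two combinations the stimulus still has to be steered toward.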

In order to determine how many module verification environments to develop, we decided to divide them around natural boundaries where the data and control paths were clearly separate. In the high level diagram of the switch shown in Figure 2, one can see why we decided to develop 5 module level verification environments. These environments would allow us to exhaustively verify the egress port logic, ingress port logic, router, scheduler and the DLL.


Figure 2. The high level diagram shows the components of the PCIe Switch DUT.

Each module verification environment contained a reference model (Figure 3) and eVCs to stimulate all the necessary bus behaviors and provide the necessary protocol checking.
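The relationship between a reference model and the checking can be sketched as follows. This Python example is only an illustration under stated assumptions: the address map, the packet fields and the class names are hypothetical, and the real reference models were far more detailed. The point is that the prediction comes from the model, independent of the stimulus that produced the packet.

```python
from collections import deque


class RouterReferenceModel:
    """Predicts the egress port of a packet from a hypothetical address map."""

    def __init__(self, address_map):
        self.address_map = address_map      # list of (base, limit, port)

    def predict(self, packet):
        for base, limit, port in self.address_map:
            if base <= packet["addr"] <= limit:
                return port
        return None                         # no window claims this address


class Scoreboard:
    """Compares what the DUT emits against what the model predicted."""

    def __init__(self, model):
        self.model = model
        self.expected = {}                  # port -> packets still awaited
        self.errors = []

    def on_ingress(self, packet):
        port = self.model.predict(packet)
        if port is not None:
            self.expected.setdefault(port, deque()).append(packet)

    def on_egress(self, port, packet):
        queue = self.expected.get(port)
        if not queue or queue[0] != packet:
            self.errors.append((port, packet))   # unexpected or out of order
        else:
            queue.popleft()
```

Because the scoreboard never looks at how a packet was generated, the same checker works for directed, realistic random and fully random stimulus.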

Several stimulus sequences, which represented profiles of real PCI Express devices, were created. In order to make sure that we had a good element of realism, a subset of our randomness was dedicated toward realistic transactions. Both realistic random and fully random behavior were areas of focus since these behaviors were very beneficial and found bugs. However, the fully random behavior uncovered many more bugs and also exposed a lot of invalid stimulus. Over time, we matured our stimulus constraints to avoid these illegal combinations so that when an error occurred, it was a real bug.


Figure 3. The verification diagram shows the components of the PCIe DLL.

Our module level verification environment was designed with our chip level environment in mind. Since the blocks were rigorously tested, we realized the chip level effort of testing would not be as difficult and would only target untested features. Figure 4 shows how we reused reference models from the module level to the chip level. Test sequence libraries that fed into the BFMs were written in such a way that they were also reusable at the chip level. Certain elements at the module level that did not apply at the chip level were easily turned off. For example, at chip level, only the monitors and some of the BFMs were active. The active / passive nature of our eVCs allowed us to deactivate BFMs for the chip level verification where they were replaced with the real device.
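The active / passive idea can be sketched in a few lines of Python. This is an illustration only: the class names, the interface names and the build_env helper are hypothetical and do not reflect the eRM API. Every agent keeps its monitor, but only active agents get a BFM to drive stimulus.

```python
class Monitor:
    """Passively observes an interface and records what it sees."""

    def __init__(self):
        self.observed = []

    def observe(self, item):
        self.observed.append(item)


class Bfm:
    """Bus functional model: drives stimulus onto an interface."""

    def __init__(self):
        self.driven = []

    def drive(self, item):
        self.driven.append(item)


class Agent:
    """One interface's verification component; the monitor is always present."""

    def __init__(self, name, active):
        self.name = name
        self.monitor = Monitor()
        self.bfm = Bfm() if active else None    # passive agents only watch

    def send(self, item):
        if self.bfm is None:
            raise RuntimeError(f"{self.name} is passive and cannot drive stimulus")
        self.bfm.drive(item)


def build_env(interfaces, driven_by_testbench):
    """Create one agent per interface, active only where the testbench drives."""
    return {name: Agent(name, active=name in driven_by_testbench)
            for name in interfaces}
```

A module level environment might call build_env(["ingress", "egress"], {"ingress", "egress"}), while a chip level environment could call build_env(["link", "ingress", "egress"], {"link"}) so that the internal interfaces are monitored but driven by the real design.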


Figure 4. Chip level verification environment showing reused verification components taken from the module level verification environment.

Besides creating our own eVCs, we also took advantage of the PCI Express eVC developed by Cadence. The PCI Express eVC provided sequence libraries that were able to issue random transactions that exposed issues in the Link interface. In order to achieve complete compliance in the future, we look forward to making more use of the PCI Express eVC.



Changing how we Measure and Report Progress
Of course, along with this new methodology comes a shift in how the effort is scheduled, how progress is tracked and how we determine when we are finished. This required a paradigm change in the thinking of engineers who come from the traditional discipline of a directed test based methodology. Those unfamiliar with this new coverage driven, automated test generation methodology would expect tests to be created sooner, based on their past experience. They would be primarily concerned with "how many tests do you have today?" vs. "how much more functional coverage is achieved today?"

Once deployed, the power of one test scenario being run with different seeds to achieve wider functional coverage was self-evident and convincing to those who were being exposed to this methodology for the first time. Compared to a traditional directed test based verification environment that relies on a collection of test files which contain both stimulus and checking, a sequence-based environment with automatic checking requires more development time. However, the payoffs are huge. Here is a good example: In a directed test based approach, progress may be tracked linearly by counting the number of new test cases written each day. With Specman, the progress is non-linear. For example, it may take 5 days to develop a more robust and intelligent environment, and on the 6th day you'll be able to generate 100 tests. After 1 week, you may have thousands of tests. Progress would no longer be measured in the number of test files executed, but in the number of packets sent to the switch, the number of random seeds used to create those scenarios, and the amount of functional coverage collected. The only things that would limit the number of unique simulations were the number of machines available on our load sharing facility, LSF, and the number of licenses available. Thousands of simulations could be run nightly.
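The non-linear payoff of running one scenario under many seeds can be illustrated with a small Python sketch. It is a toy model, not our regression flow: the transaction kinds, the port count and the function names are assumptions, and each "simulation" is reduced to the set of (port, kind) combinations it happened to exercise.

```python
import random

KINDS = ("mem_read", "mem_write", "config")     # hypothetical transaction kinds
PORTS = range(4)                                # a 4-port switch


def run_seed(seed, packets=5):
    """Stand-in for one simulation: returns the (port, kind) bins it hit."""
    rng = random.Random(seed)
    return {(rng.choice(PORTS), rng.choice(KINDS)) for _ in range(packets)}


def regress(seeds):
    """Run the same scenario once per seed and accumulate the coverage."""
    covered = set()
    history = []                                # bins covered after each seed
    for seed in seeds:
        covered |= run_seed(seed)
        history.append(len(covered))
    return covered, history
```

Calling regress(range(200)) runs the single scenario 200 times; history shows cumulative coverage climbing with each added seed even though no new test was written.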

After some time, we had the majority of our environment complete and had our first test scenario written. That single test scenario ran with multiple seeds, which kept exposing bugs in the design and kept our design team very busy while we continued to further develop the environment and create more test scenarios. Once test scenarios were deployed, we tracked progress with functional coverage. Figure 5 shows a sample of some of the functional coverage behavior that was tracked during our verification execution. With a quick glance we could easily see which features needed attention.


Figure 5. Specman functional coverage GUI tracks key functional behavior in the PCIe Switch.

Tracking exposed bugs by severity level and number of occurrences, measuring diversity of traffic, and tracking progress in terms of functional coverage and code coverage helped create an accurate picture of the efficacy of the methodology as well as the progress being made on a daily basis.

We also conducted frequent verification reviews with the design team to report progress. After we presented the coverage to date as well as the stimulus permutations achieved via random seeds, these reviews would frequently result in the RTL engineer saying something to the effect of "Wow, I didn't even consider that case" or "Is that really possible?" Yes, it is possible, and it would not have been simulated had the stimulus been created by the directed-test methodology of the past. Kudos went to the Specman constraint solver and random-seed simulations that walked the stimulus through unimaginable paths. Sometimes the bugs uncovered by the random stimulus were more than bugs; they were unimplemented areas in the DUT. This underscores the necessity of having stimulus and checking that are independently derived from the functional specification, rather than tied to the RTL implementation.

Team Synergy Is Equally Important
While a good set of tools and plenty of documentation and guidance can get you off to a good start, they won't do the work for you. Team synergy was just as important as an advanced verification environment. Having everyone close by so we could bounce ideas off each other and troubleshoot issues was critical. As the team adopted the new automated verification methodology, it had its fair share of collective learning, and it was important to have regular face-to-face communication among all team members to ensure consistent adherence to the methodology. The slightest deviation from consistent, across-the-board implementation of this methodology could force the effort to be redone or cause less-than-optimal deployment of constraint-driven stimulus generation.

Any account of a design verification effort is incomplete without mentioning the rubber-meets-road value of emulation. Our design team was fortunate to be married to an applications team that puts the ASIC RTL into an FPGA platform for at-speed, real-time emulation with off-the-shelf motherboards and endpoint devices. The quality of the design that was handed to the FPGA emulation team allowed their efforts to focus not on minute RTL bugs, but rather on end-to-end compatibility and compliance testing. This was a natural and orthogonal extension to the functional success of the entire project, particularly because RTL simulations were still limited to a few seconds of simulation time (a remarkably rigorous few seconds, we might add).

Another result of the disciplined verification methodology was the ability to replicate bugs seen in the FPGA emulation simply by tweaking stimulus constraints in our Specman environment. Not only did this allow faster FPGA build turnaround time, but it also paved the way for a solid emulation platform, so that the moment ASIC silicon entered the lab, the platform regression testing was completed in no time.

Competition Breathing Down your Neck
The competition is pretty fierce in the PCI Express business and there were several products that appeared to be ahead of us in the game. Knowing that the quality of the first-released silicon would speak for itself, there had to be a balance between "first to market" and "silicon quality." Our goal was not just to finish, but to finish with a product that had as few bugs as possible. This strategy was aligned with the future desire to quickly develop derivative parts to stay ahead of the competition. Cadence's Specman "e" Reuse Methodology helped us to create a verification environment as scalable as our design architecture, thereby enabling verification of derivative parts in a resource-efficient and cost-effective manner.

Summary
With only three verification engineers dedicated to a 3+ million gate design, we were able to develop from scratch a top-notch verification environment comprising over 100,000 lines of code. This enabled a chain reaction of events: from relying more on overnight server-farm simulations while reducing human test-writing time, to delivering higher quality RTL to the FPGA emulation team, to producing customer-worthy first silicon. None of this could have been achieved without strict adherence to eRM, coupled with Cadence's world-class support, to create a successful coverage-driven verification flow on the PCI Express switch. As proven in silicon, we are steadfast in saying our verification methodology surpassed the previous, and tedious, methodologies used by our team. Therefore the future of verification simulation lies not in petty directed test writing, but in constrained-random stimulus and automated checking. This shifts the burden of verification rigor out of test creation and into functional coverage closure via the attack of random-seed stimulus on multiple, parallel machines nightly. Now, the only limit is machine availability and disk space. And that problem is left for the bean counters to solve.

About the Authors:
Scott Morrison is the lead design verification engineer for digital and mixed-signal IP provided by the mixed signal IP development group at Texas Instruments in Dallas, Texas. Scott graduated from the University of Florida in 2003 with a Master of Engineering specializing in Digital Hardware and Signal Processing.
Henry N. Angulo is a senior member of the Technical Staff at Texas Instruments in Dallas, Texas. Henry spent four years as an avionics technician in the USMC working on communication and navigation systems.
Asad Khan is a design verification lead engineer for PCI Express, 1394 and Consumer Electronic Digital Interface Business-related projects at Texas Instruments. He graduated with a BSEE (summa cum laude) in 2001 from the University of Texas at Arlington.