Most blockchain upgrades take a long time. Bitcoin has only been upgraded a handful of times in its 15 years of existence. Ethereum’s upgrade to proof of stake took 7 years of development. There’s a reason for that.
Upgrading a live blockchain network is like changing a plane engine mid-flight. Failure leads to catastrophic results, and upgrades should be undertaken with the utmost caution. As a result, networks often go through extended periods of development.
And implementation takes just a fraction of that development time. Most of development is spent on long phases of testing and monitoring to ensure that everything works, and every edge case has been tested, before pushing the upgrade live on mainnet.
With the Nakamoto upgrade on Stacks, the blockchain working group introduced a lot of new testing capabilities to make the Stacks network more robust and to ensure that this upgrade went smoothly. Core devs rolled out several types of testing capabilities that were new to the Stacks network, and then created new dedicated environments for each of those tests.
New Testing Capabilities on Stacks
Before Nakamoto, testing on Stacks was limited to primary testnet and mostly focused on unit tests, integration tests, and e2e testing. All 3 of those are valuable, but the ecosystem needed much more robust testing to prepare for the Nakamoto upgrade.
Stacks core engineers added many new forms of network testing in the last year to make the chain more robust. These tests were added to prepare for Nakamoto’s release, but the ecosystem now has these testing capabilities moving forward for every future release.
Those new types of testing include:
Mutation Testing
While having good test coverage is a good indicator of code quality, the quality of the tests themselves matters even more. Mutation testing allows devs to check and improve the quality of tests by injecting mutants (bugs) into the code to identify tests that don’t catch the bugs. This allows devs to consistently improve the quality and coverage of tests.
With Nakamoto, the blockchain working group enabled automated mutation testing for every PR. This automation generated several mutations against the existing test suites and allowed us to catch all kinds of bugs, including off-by-one errors, boundary errors, and more.
For example, mutation testing helped detect several gaps in integration tests where several functional paths were either not tested at all or the quality of tests wasn’t sufficient.
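To make the idea concrete, here is a minimal sketch of what a surviving mutant looks like. It is a toy illustration only: the function `can_stack`, its hand-written mutant, and both test suites are hypothetical names invented for this example, not code from the Stacks repository or the tooling the working group used.

```python
# Toy illustration of mutation testing. All names are hypothetical.

def can_stack(amount: int, minimum: int) -> bool:
    """Original code: an amount qualifies when it meets or exceeds the minimum."""
    return amount >= minimum


def can_stack_mutant(amount: int, minimum: int) -> bool:
    """Mutant: `>=` replaced with `>`, a typical boundary mutation."""
    return amount > minimum


def weak_suite(fn) -> bool:
    """Checks only values far away from the boundary."""
    return fn(200, 100) is True and fn(50, 100) is False


def strong_suite(fn) -> bool:
    """Adds the boundary case that separates `>=` from `>`."""
    return weak_suite(fn) and fn(100, 100) is True


def mutant_survives(suite) -> bool:
    """A mutant survives when the suite passes on the original AND the mutant."""
    return suite(can_stack) and suite(can_stack_mutant)


if __name__ == "__main__":
    print("weak suite lets the mutant survive:", mutant_survives(weak_suite))
    print("strong suite lets the mutant survive:", mutant_survives(strong_suite))
```

The weak suite passes on both versions, so the mutant survives and points at a gap in the tests. Adding the boundary case makes the suite fail on the mutant, which is the signal a mutation testing tool looks for.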
Fuzzing
Fuzzing is an automated testing approach that injects randomized, unexpected, and edge-case data into smart contracts. This process uncovers unexpected behavior and vulnerabilities that otherwise take longer to uncover or may be harder to detect.
The blockchain working group set up chaos-driven, stateful fuzzing for rigorous testing of Stacks nodes via Clarity smart contracts. This was valuable in catching various P2P and network bugs, and it helped us improve node behavior considerably. For example, we identified an issue related to stackerdb where block production slowed down under heavy traffic. This was fixed over a few changes (see here and here for examples), and there’s another issue lined up for a later fix.
These methods also helped us uncover a few issues with the pox-4 contract and Clarinet, all of which have been fixed.
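As a rough sketch of what "stateful" fuzzing means, the example below applies a random sequence of operations to a small model and checks invariants after every step. Everything in it is an assumption made for illustration: `ToyLedger`, `BuggyLedger`, and the invariants are hypothetical and are not the pox-4 contract or the fuzzing setup described above.

```python
import random


class ToyLedger:
    """Hypothetical model of an account that can lock and unlock tokens."""

    def __init__(self, balance: int) -> None:
        self.liquid = balance
        self.locked = 0
        self.total = balance

    def lock(self, amount: int) -> None:
        if 0 <= amount <= self.liquid:  # silently reject invalid amounts
            self.liquid -= amount
            self.locked += amount

    def unlock(self, amount: int) -> None:
        if 0 <= amount <= self.locked:
            self.locked -= amount
            self.liquid += amount


class BuggyLedger(ToyLedger):
    """Same model with the bounds check in unlock deliberately removed."""

    def unlock(self, amount: int) -> None:
        self.locked -= amount
        self.liquid += amount


def invariants_hold(ledger: ToyLedger) -> bool:
    """Balances never go negative and tokens are never created or destroyed."""
    return (ledger.liquid >= 0 and ledger.locked >= 0
            and ledger.liquid + ledger.locked == ledger.total)


def fuzz(ledger_cls, steps: int = 500, seed: int = 0):
    """Apply random operations; return the failing call history, or None."""
    rng = random.Random(seed)
    ledger = ledger_cls(100)
    history = []
    for _ in range(steps):
        op = rng.choice(["lock", "unlock"])
        amount = rng.randint(-10, 150)  # includes negative and oversized values
        getattr(ledger, op)(amount)
        history.append((op, amount))
        if not invariants_hold(ledger):
            return history
    return None


if __name__ == "__main__":
    print("correct model:", fuzz(ToyLedger))
    print("buggy model, failing history length:", len(fuzz(BuggyLedger) or []))
```

The returned history is the sequence of calls that led to the violation, which is what makes a stateful fuzzing failure reproducible: the bug depends on the state built up by earlier calls, not on a single input.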
Integration Tests
Stacks core devs also set up a strong suite of integration tests that covered edge cases, and multiple scenarios were run against each commit. This process ensured that there wasn’t any regression in existing behavior while also validating that new changes worked as expected.
These integration tests improved test coverage to include all e2e behavior, code paths, and logic.
Functional Tests
The blockchain working group also tested the core blockchain changes with functional tests. Similar to integration tests, these tests use APIs and test specific use cases to ensure that tools and apps work as expected.
To execute these tests, core devs built an automated testing repo that allowed them to test various stacking scenarios and use cases crucial for builders. These tests were run against all testnets as well as local setups in Clarinet to observe and validate behavior across different setups and environments.
The major areas of focus for functional testing and the outcomes (including bug fixes, improved testing, and other changes) can be found in this checklist. At a high level, that checklist includes:
Chaos Testing
Stacks core devs used Toxiproxy to test various degraded network conditions, such as partial message loss, throttling, and connectivity dropouts. This was integrated into a docker-based regtest environment to allow controlled experimentation across multiple network participants, including miners, signers, and followers.
This allowed core devs to test and experiment by deliberately disrupting communication in RPC channels and P2P links, examining the censoring of “flash blocks” during critical prepare phases, and measuring system behavior through key metrics, including block height progression, block reward cycles, and tracking the number or percentage of nodes that became stuck under adverse conditions.
With chaos testing, the blockchain working group investigated how varying block production speeds and prolonged block intervals (modeled after Bitcoin’s timing constraints) affected the system’s stability. And by pushing the network to its operational limits, core devs were able to gather valuable insights into fault tolerance. Those findings allowed them to tweak the protocol, strengthen infrastructure, and move the system toward greater robustness.
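The kind of metric described here, the percentage of nodes that get stuck under message loss, can be sketched with a small simulation. This is a pure-Python toy model and does not use Toxiproxy or a real Stacks node. The function `simulate`, its parameters, and the rule that a node never catches up after missing a block are all simplifying assumptions.

```python
import random


def simulate(loss_rate: float, nodes: int = 20, blocks: int = 50,
             retries: int = 2, seed: int = 1) -> float:
    """Return the fraction of nodes that fail to reach the final block height."""
    rng = random.Random(seed)
    heights = [0] * nodes
    for height in range(1, blocks + 1):
        for node in range(nodes):
            if heights[node] != height - 1:
                continue  # already stuck: this toy model has no catch-up sync
            # Each delivery attempt is dropped independently with loss_rate.
            delivered = any(rng.random() >= loss_rate
                            for _ in range(retries + 1))
            if delivered:
                heights[node] = height
    stuck = sum(1 for h in heights if h < blocks)
    return stuck / nodes


if __name__ == "__main__":
    for rate in (0.0, 0.1, 0.3, 0.5):
        print(f"loss rate {rate:.0%}: {simulate(rate):.0%} of nodes stuck")
```

Sweeping the loss rate or the retry count in a model like this shows how quickly a network degrades without recovery logic. A real chaos test asks the same question against actual nodes.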
Forking Behavior
The blockchain working group also tested forking behavior of the Stacks chain against various scenarios, which uncovered missing tests and logic paths. For example, while testing a fast block halt following a new burn block (which is the expected behavior), we uncovered a missing integration test.
Miner & Signer Behavior
Alongside testing forking behavior, core devs spent a lot of time testing miner and signer behavior on the Stacks network. The working group identified several use cases and scenarios to test, which highlighted the gaps in existing integration tests (gaps that were eventually filled). This process added a safety net, such that future implementations are built on a well-tested codebase and bugs can be caught upfront.
Those use cases and scenarios include:
- Signer authentication: A missing unit test was added to ensure that only the current tenure miner can propose blocks, validating the signature and key during proposal.
- Signer rollover: A missing integration test was added to verify that when signers change, the miner rejects signatures from the old set of signers, ensuring security and continuity. This also verifies that tenure changes lock in at the chain tip, maintaining consistency and preventing unauthorized modifications.
- Miner behavior during an epoch upgrade: This integration test validates miner behavior during epoch upgrades at specific block heights, ensuring seamless operation and adherence to protocol changes.
- Mempool anchoring: This integration test ensures that miners anchor transactions that cannot be added to the mempool in valid blocks, maintaining blockchain integrity.
- Invalid proposal rejection & handling: These tests validate that the signer rejects invalid proposals from the miner, ensuring that only valid blocks are accepted into the Stacks blockchain. This includes test scenarios where the miner proposes a bad or invalid block, ensuring that the signer does not accept it. These tests also include verifying if the /validate RPC call by the signer effectively screens out invalid blocks.
- Invalid block & unknown parent handling: This led to an additional integration test case to evaluate how the system handles scenarios where the signer encounters an "UnknownParent" error, ensuring resilience and proper error handling in blockchain synchronization.
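The signer rollover scenario above can be sketched as a simple check: signatures only count if they come from the current signer set. The sketch below is a simplified model rather than the Stacks implementation. Signer IDs stand in for real cryptographic signatures, and the weights and the 70% threshold are assumptions chosen for illustration.

```python
from typing import Dict, Iterable

THRESHOLD_PERCENT = 70  # assumed approval threshold, for illustration only


def accepted_weight(signers: Iterable[str], signer_set: Dict[str, int]) -> int:
    """Sum the weight of unique signers that belong to the given signer set."""
    return sum(signer_set[s] for s in set(signers) if s in signer_set)


def block_is_accepted(signers: Iterable[str], signer_set: Dict[str, int],
                      threshold_percent: int = THRESHOLD_PERCENT) -> bool:
    """Accept a block only if current-set signers reach the weight threshold."""
    total = sum(signer_set.values())
    if total == 0:
        return False
    # Integer arithmetic avoids floating-point rounding at the boundary.
    return accepted_weight(signers, signer_set) * 100 >= threshold_percent * total


if __name__ == "__main__":
    old_set = {"a": 40, "b": 40, "c": 20}   # hypothetical previous-cycle signers
    new_set = {"d": 50, "e": 30, "f": 20}   # hypothetical current-cycle signers

    # After rollover, signatures from the old set carry no weight.
    print(block_is_accepted(["a", "b", "c"], new_set))
    # Signatures from the current set are counted normally.
    print(block_is_accepted(["d", "e"], new_set))
```

An integration test for rollover would exercise both cases: the same signatures that were valid in the previous cycle must be rejected once the signer set changes.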
Performance Testing
In addition to all the functional stability testing, core devs also periodically profiled various testnets to understand and optimize network, miner, and signer behavior. These benchmarks helped iteratively improve the overall performance of Stacks. As an added benefit, these optimizations have sped up a genesis sync for a Stacks node from an order of days to an order of hours, which significantly eases the effort required for node operators.
Why Introduce New Testnets?
To leverage all of the new testing capabilities above, the blockchain working group launched multiple pre-launch testnets, each focused on a different kind of testing. These pre-launch testnets were designed to fail fast and catch bugs early while emulating live traffic and various participant setups to mimic the network topologies that can exist on mainnet.
The rationale behind multiple testnets was simple: many different teams in the ecosystem rely on Stacks testnet already. Upgrading that environment comes with its own risks and potential harms.
A lower risk solution was spinning up Nakamoto rules in entirely new testing environments. These new environments also come with benefits: you have more controlled environments, which enables faster feedback loops and greater control of testing variables.
Nakamoto’s Testing Networks
Over the course of testing, the blockchain working group launched and tested on multiple testnets:
Pre-Launch Testnets
As the first line of testing, the blockchain working group launched multiple accelerated testnets with shorter stacking cycles for tighter feedback loops. These testnets were intended for core engineers working on Stacks consensus to test changes frequently and enabled them to reset/upgrade the network as needed. Over the course of Nakamoto’s development, 4 different pre-launch testnets were spun up to isolate and test different classes of problems:
- Core node testing with a setup of just a single miner
- A setup with multiple miners and signers with added network latency to emulate real traffic
- A setup with multiple miners and signers with load testing and fuzzing
- A setup with multiple miners and signers at scale that would be closer to mainnet
Nakamoto Testnet
Alongside the pre-launch testnets, the blockchain working group launched the Nakamoto testnet, a more formal testing environment for Nakamoto. This environment is public and has a 1-week stacking cycle (the same as primary testnet).
Unlike primary testnet, the Nakamoto testnet doesn’t have chainstate and history, which enables devs to reset the entire testnet as needed (a feature that the working group used more than once). This environment was used as a pre-testbed for all release candidates, and the environment gave signers and builders the option to onboard, test, and provide rapid and early feedback on those release candidates. This was a valuable environment for core-eng to break things and move fast.
The pre-launch testnets and the Nakamoto testnet allowed Stacks devs to identify many bugs and improvements before the code ever hit primary testnet. This was a critical improvement for testing and will likely be a process that Stacks devs maintain for any future major network upgrades.
Primary Testnet
After extensive testing on the above testnets, the blockchain working group launched the Nakamoto upgrade on primary testnet on October 10th, 2024. The Stacks primary testnet is the ecosystem’s core testing environment.
Primary testnet was first launched in April 2020 and has extensive history and chainstate. This environment gave the blockchain working group the ability to test the Nakamoto upgrade against a chain with history.
Testing on primary testnet leading up to the hardfork focused on signer-miner interactions, throughput, successful tenure changes, and successful signer set handover beyond 1 stacking cycle.
The working group tested everything they needed to in this environment, and just 18 days later, Nakamoto was ready for prime time on mainnet.
Conclusion
With all of these testing capabilities, the Stacks network is in a much more robust state than it was in a year ago, and the next time the ecosystem goes through an upgrade, these testing capabilities are already in place to make that process easier.
To learn more about Stacks core or contribute, you can follow the repo here.