
ADR 067: Simulator v2

Changelog

June 01, 2023: Initial Draft (@alexanderbez)

Status

DRAFT

Abstract

The Cosmos SDK simulator is a tool that allows developers to test the entirety of their application's state machine through the use of pseudo-randomized "operations", which represent transactions. The simulator also provides primitives that check for non-determinism issues and verify that the application's state machine can be successfully exported and imported using randomized state.

The simulator has played a critical role in the development and testing of the Cosmos Hub and of every Cosmos SDK release since the Hub's launch. Since then, however, the simulator has changed relatively little, and it is overdue for a revamp.

Context

The current simulator, x/simulation, acts as a semi-fuzz testing suite that takes an integer seed for a PRNG. The PRNG is used to generate a sequence of "operations" that are meant to reflect transactions that an application's state machine can process. Through the use of the PRNG, all aspects of block production and consumption are randomized, including the block's proposer, which validators sign or miss the block, and the transaction operations themselves.

Each Cosmos SDK module defines a set of simulation operations that attempt to produce valid transactions, e.g. x/distribution/simulation/operations.go. These operations can sometimes fail depending on the accumulated state of the application within that simulation run. The simulator continues to generate operations until it reaches a set number of operations or a fatal state, and then reports the results. This gives application developers the ability to reliably run full-range application simulations and fuzz tests against their application.

However, there are a few major drawbacks. Most notably, with the advent of ABCI++, specifically FinalizeBlock, the internal workings of the simulator no longer reflect how an application actually executes blocks: operations are executed after FinalizeBlock, whereas they should be executed within it.

Additionally, the simulator is not very extensible. Developers should be able to easily define and extend the following:

Consistency or validity predicates (what are known as invariants today)
Property tests of state before and after a block is simulated

In addition, we also want to achieve the following:

Consolidated weight management, i.e. define weights within the simulator itself via a config and not defined in each module
Observability of the simulator's execution, i.e. have easy to understand output/logs with the ability to pipe those logs into some external sink
Smart replay, i.e. the ability to not only rerun a simulation from a seed, but also the ability to replay from an arbitrary breakpoint
Run a simulation based on real network state

Decision

Instead of refactoring the existing simulator, x/simulation, we propose to create a new package in the root of the Cosmos SDK, simulator, that will be the new simulation framework. The simulator will more accurately reflect the complete lifecycle of an ABCI application.

Specifically, we propose a simulator.Manager, similar in implementation and usage to the manager that exists today, that is responsible for managing the execution of a simulation. The manager will wrap an ABCI application and will be responsible for the following:

Populating the application's mempool with a set of pseudo-random transactions before each block, some of which may contain invalid messages.
Selecting transactions and a random proposer to execute PrepareProposal.
Executing ProcessProposal, FinalizeBlock and Commit.
Executing a set of validity predicates before and after each block.
Maintaining a CPU and RAM profile of the simulation execution.
Allowing a simulation to stop and resume from a given block height.
Simulating the liveness of each validator per block.

From an application developer's perspective, they will only need to provide the modules to be used in the simulator, and the manager will take care of the rest. In addition, they will not need to write their own simulation tests, e.g. non-determinism or multi-seed tests, as the manager will provide these as well.

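type Manager struct {
  app     sdk.Application // ABCI application under simulation
  mempool sdk.Mempool     // mempool that seeds block proposals
  rng     *rand.Rand      // seeded PRNG driving all randomness
  // ...
}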

Configuration

The simulator's testing input will be driven by a configuration file, as opposed to CLI arguments. This will allow for more extensibility and ease of use along with the ability to have shared configuration files across multiple simulations.

Execution

As alluded to previously, after the execution of each block, the manager will generate a series of pseudo-random transactions and attempt to insert them into the mempool via BaseApp#CheckTx. During the ABCI lifecycle of a block, this mempool will be used to seed the transactions into a block proposal as it would in a real network. This allows us to not only test the state machine, but also test the ABCI lifecycle of a block.

Statistics, such as total blocks and total failed proposals, will be collected, logged and written to output after the full or partial execution of a simulation. The output destination of these statistics will be configurable.

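// SimulateBlock drives one block through the ABCI++ lifecycle.
func (m *Manager) SimulateBlock() error {
  rProposer := m.SelectRandomProposer()
  rTxs := m.SelectTxs()

  prepareResp, err := m.app.PrepareProposal(&abci.PrepareProposalRequest{
    Txs:             rTxs,
    ProposerAddress: rProposer,
    // ...
  })
  if err != nil {
    return err
  }

  processResp, err := m.app.ProcessProposal(&abci.ProcessProposalRequest{
    Txs:             prepareResp.Txs,
    ProposerAddress: rProposer,
    // ...
  })
  if err != nil {
    return err
  }
  if !processResp.IsAccepted() {
    // record a failed proposal; the block is not finalized
    return nil
  }

  // execute liveness matrix...

  if _, err := m.app.FinalizeBlock(...); err != nil {
    return err
  }

  _, err = m.app.Commit(...)
  return err
}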

Note that some applications do not define or need their own app-side mempool, so we propose that SelectTxs mimic CometBFT and return FIFO-ordered transactions from an ad-hoc simulator mempool. When an application does define its own mempool, it will simply ignore the transactions provided in the PrepareProposalRequest.
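A sketch of the ad-hoc FIFO simulator mempool and a `SelectTxs` that returns transactions in insertion order. The byte limit is a simplified stand-in for the `MaxTxBytes` bound carried in a prepare-proposal request; `FIFOMempool` and its methods are hypothetical.

```go
package main

import "fmt"

// FIFOMempool is an ad-hoc simulator mempool that returns transactions in
// the order they were inserted.
type FIFOMempool struct {
	txs [][]byte
}

// Insert appends a transaction that has already passed CheckTx.
func (m *FIFOMempool) Insert(tx []byte) {
	m.txs = append(m.txs, tx)
}

// Len reports how many transactions are waiting.
func (m *FIFOMempool) Len() int {
	return len(m.txs)
}

// SelectTxs removes and returns transactions in FIFO order, stopping at the
// first one that would push the total size past maxBytes. Anything not
// selected stays queued for the next block.
func (m *FIFOMempool) SelectTxs(maxBytes int64) [][]byte {
	var selected [][]byte
	var size int64
	i := 0
	for ; i < len(m.txs); i++ {
		n := int64(len(m.txs[i]))
		if size+n > maxBytes {
			break
		}
		size += n
		selected = append(selected, m.txs[i])
	}
	m.txs = m.txs[i:]
	return selected
}

func main() {
	var mp FIFOMempool
	mp.Insert([]byte("tx-a"))   // 4 bytes
	mp.Insert([]byte("tx-bb"))  // 5 bytes
	mp.Insert([]byte("tx-ccc")) // 6 bytes

	block := mp.SelectTxs(10)
	fmt.Printf("selected %d txs, %d left\n", len(block), mp.Len())
}
```

With a 10-byte limit, the 4- and 5-byte transactions fit and the 6-byte one waits for the next block, so `main` prints `selected 2 txs, 1 left`. Stopping at the first transaction that does not fit, rather than skipping ahead, keeps the ordering strictly FIFO.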

Profiling

The manager will be responsible for collecting CPU and RAM profiles of the simulation execution. We propose using Pyroscope to capture profiles and export them to a local file or expose them via an HTTP endpoint. Profiling can be enabled or disabled via configuration.

Breakpoints

Via configuration, a caller can set a height-based breakpoint that allows the simulation to stop at, and later resume from, a given height. This enables debugging of CPU usage, RAM usage, and state at that point.

Validity Predicates

We propose allowing an application to provide the simulator with a set of validity predicates, i.e. invariant checkers, that will be executed before and after each block. This allows the application to assert that certain state invariants hold before and after each block. As a consequence, we also propose removing the existing notion of invariants from module production execution paths and deprecating their usage altogether.

type Manager struct {
  // ...
  preBlockValidator   func(sdk.Context) error
  postBlockValidator  func(sdk.Context) error
}
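The pre- and post-block hooks can be sketched as lists of predicates run around each block. Here `State` is a hypothetical map of balances standing in for state reached through `sdk.Context`, and `NonNegativeBalances` and `SupplyEquals` are example predicates, not existing SDK invariants.

```go
package main

import "fmt"

// State is a hypothetical stand-in for application state: account -> balance.
type State map[string]int64

// Predicate is a validity predicate (invariant checker) over state.
type Predicate func(State) error

// NonNegativeBalances fails if any account has a negative balance.
func NonNegativeBalances(s State) error {
	for addr, bal := range s {
		if bal < 0 {
			return fmt.Errorf("account %s has negative balance %d", addr, bal)
		}
	}
	return nil
}

// SupplyEquals returns a predicate asserting that balances sum to total.
func SupplyEquals(total int64) Predicate {
	return func(s State) error {
		var sum int64
		for _, bal := range s {
			sum += bal
		}
		if sum != total {
			return fmt.Errorf("total supply %d, expected %d", sum, total)
		}
		return nil
	}
}

// RunBlock checks the pre-block predicates, applies the block, then checks
// the post-block predicates, labeling which phase failed.
func RunBlock(s State, pre, post []Predicate, apply func(State)) error {
	for _, p := range pre {
		if err := p(s); err != nil {
			return fmt.Errorf("pre-block: %w", err)
		}
	}
	apply(s)
	for _, p := range post {
		if err := p(s); err != nil {
			return fmt.Errorf("post-block: %w", err)
		}
	}
	return nil
}

func main() {
	st := State{"alice": 100, "bob": 50}
	preds := []Predicate{NonNegativeBalances, SupplyEquals(150)}

	// A supply-conserving transfer passes both checks.
	err := RunBlock(st, preds, preds, func(s State) { s["alice"] -= 30; s["bob"] += 30 })
	fmt.Println("transfer:", err)

	// An overdraft breaks NonNegativeBalances after the block.
	err = RunBlock(st, preds, preds, func(s State) { s["bob"] -= 200; s["alice"] += 200 })
	fmt.Println("overdraft:", err)
}
```

Labeling the failing phase tells the developer whether the block itself broke an invariant or the state was already invalid before the block ran.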

Consequences

Backwards Compatibility

The new simulator package will naturally not be backwards compatible with the existing x/simulation module. However, modules will still be responsible for providing pseudo-random transactions to the simulator.

Positive

Providing more intuitive and cleaner APIs for application developers
More closely resembling the true lifecycle of an ABCI application

Negative

Breaking current Cosmos SDK module APIs for transaction generation

References

Osmosis Simulation ADR
