Version: Next

ADR 070: Unordered Transactions

Changelog

Dec 4, 2023: Initial Draft (@yihuang, @tac0turtle, @alexanderbez)
Jan 30, 2024: Include section on deterministic transaction encoding

Status

ACCEPTED

Abstract

We propose a way to do replay-attack protection without enforcing the order of transactions, without requiring the use of nonces. In this way, we can support un-ordered transaction inclusion.

Context

As of today, the nonce value (account sequence number) prevents replay-attack and ensures the transactions from the same sender are included into blocks and executed in sequential order. However it makes it tricky to send many transactions from the same sender concurrently in a reliable way. IBC relayer and crypto exchanges are typical examples of such use cases.

Decision

We propose to add a boolean field unordered to transaction body to mark "un-ordered" transactions.

Un-ordered transactions will bypass the nonce rules and follow the rules described below instead, in contrary, the default ordered transactions are not impacted by this proposal, they'll follow the nonce rules the same as before.

When an un-ordered transaction is included into a block, the transaction hash is recorded in a dictionary. New transactions are checked against this dictionary for duplicates, and to prevent the dictionary grow indefinitely, the transaction must specify timeout_timestamp for expiration, so it's safe to removed it from the dictionary after it's expired.

The dictionary can be simply implemented as an in-memory golang map, a preliminary analysis shows that the memory consumption won't be too big, for example 32M = 32 * 1024 * 1024 can support 1024 blocks where each block contains 1024 unordered transactions. For safety, we should limit the range of timeout_timestamp to prevent very long expiration, and limit the size of the dictionary.

Transaction Format

message TxBody {
  ...

  bool unordered = 4;
}

Replay Protection

In order to provide replay protection, a user should ensure that the transaction's TTL value is relatively short-lived but long enough to provide enough time to be included in a block, e.g. ~10 minutes.

We facilitate this by storing the transaction's hash in a durable map, UnorderedTxManager, to prevent duplicates, i.e. replay attacks. Upon transaction ingress during CheckTx, we check if the transaction's hash exists in this map or if the TTL value is stale, i.e. before the current block time. If so, we reject it. Upon inclusion in a block during DeliverTx, the transaction's hash is set in the map along with it's TTL value.

This map is evaluated at the end of each block, e.g. ABCI Commit, and all stale transactions, i.e. transactions's TTL value who's now beyond the committed block, are purged from the map.

An important point to note is that in theory, it may be possible to submit an unordered transaction twice, or multiple times, before the transaction is included in a block. However, we'll note a few important layers of protection and mitigation:

Assuming CometBFT is used as the underlying consensus engine and a non-noop mempool is used, CometBFT will reject the duplicate for you.
For applications that leverage ABCI++, ProcessProposal should evaluate and reject malicious proposals with duplicate transactions.
For applications that leverage their own application mempool, their mempool should reject the duplicate for you.
Finally, worst case if the duplicate transaction is somehow selected for a block proposal, 2nd and all further attempts to evaluate it, will fail during DeliverTx, so worst case you just end up filling up block space with a duplicate transaction.

type TxHash [32]byte

const PurgeLoopSleepMS = 500

// UnorderedTxManager contains the tx hash dictionary for duplicates checking,
// and expire them when block production progresses.
type UnorderedTxManager struct {
  // blockCh defines a channel to receive newly committed block time
  blockCh chan time.Time

  mu sync.RWMutex
    // txHashes defines a map from tx hash -> TTL value, which is used for duplicate
    // checking and replay protection, as well as purging the map when the TTL is
    // expired.
    txHashes map[TxHash]time.Time
}

func NewUnorderedTxManager() *UnorderedTxManager {
  m := &UnorderedTxManager{
        blockCh:  make(chan time.Time, 16),
        txHashes: make(map[TxHash]time.Time),
  }

 return m
}

func (m *UnorderedTxManager) Start() {
  go m.purgeLoop()
}

func (m *UnorderedTxManager) Close() error {
  close(m.blockCh)
  m.blockCh = nil
  return nil
}

func (m *UnorderedTxManager) Contains(hash TxHash)  bool{
  m.mu.RLock()
  defer m.mu.RUnlock()

  _, ok := m.txHashes[hash]
  return ok
}

func (m *UnorderedTxManager) Size() int {
  m.mu.RLock()
  defer m.mu.RUnlock()

  return len(m.txHashes)
}

func (m *UnorderedTxManager) Add(hash TxHash, expire time.Time) {
  m.mu.Lock()
  defer m.mu.Unlock()

  m.txHashes[hash] = expire
}

// OnNewBlock send the latest block time to the background purge loop, which
// should be called in ABCI Commit event.
func (m *UnorderedTxManager) OnNewBlock(blockTime time.Time) {
  m.blockCh <- blockTime
}

// expiredTxs returns expired tx hashes based on the provided block time.
func (m *UnorderedTxManager) expiredTxs(blockTime time.Time) []TxHash {
  m.mu.RLock()
  defer m.mu.RUnlock()

  var result []TxHash
  for txHash, expire := range m.txHashes {
    if blockTime.After(expire) {
      result = append(result, txHash)
    }
  }

  return result
}

func (m *UnorderedTxManager) purge(txHashes []TxHash) {
  m.mu.Lock()
  defer m.mu.Unlock()

  for _, txHash := range txHashes {
    delete(m.txHashes, txHash)
  }
}


// purgeLoop removes expired tx hashes in the background
func (m *UnorderedTxManager) purgeLoop() error {
  for {
        latestTime, ok := m.batchReceive()
        if !ok {
            // channel closed
            return
        }

        hashes := m.expiredTxs(latestTime)
        if len(hashes) > 0 {
            m.purge(hashes)
        }
    }
}


// channelBatchRecv try to exhaust the channel buffer when it's not empty,
// and block when it's empty.
func channelBatchRecv[T any](ch <-chan *T) []*T {
    item := <-ch  // block if channel is empty
    if item == nil {
        // channel is closed
        return nil
    }

    remaining := len(ch)
    result := make([]*T, 0, remaining+1)
    result = append(result, item)
    for i := 0; i < remaining; i++ {
        result = append(result, <-ch)
    }

    return result
}

AnteHandler Decorator

In order to facilitate bypassing nonce verification, we have to modify the existing IncrementSequenceDecorator AnteHandler decorator to skip the nonce verification when the transaction is marked as un-ordered.

func (isd IncrementSequenceDecorator) AnteHandle(ctx sdk.Context, tx sdk.Tx, simulate bool, next sdk.AnteHandler) (sdk.Context, error) {
  if tx.UnOrdered() {
    return next(ctx, tx, simulate)
  }

  // ...
}

In addition, we need to introduce a new decorator to perform the un-ordered transaction verification and map lookup.

const (
    // DefaultMaxTimeoutDuration defines the default maximum duration an un-ordered transaction
    // can set.
    DefaultMaxTimeoutDuration = time.Minute * 40
)

type DedupTxDecorator struct {
  m *UnorderedTxManager
  maxTimeoutDuration time.Time
}

func (d *DedupTxDecorator) AnteHandle(ctx sdk.Context, tx sdk.Tx, simulate bool, next sdk.AnteHandler) (sdk.Context, error) {
  // only apply to un-ordered transactions
  if !tx.UnOrdered() {
    return next(ctx, tx, simulate)
  }

  headerInfo := d.env.HeaderService.HeaderInfo(ctx)
    timeoutTimestamp := unorderedTx.GetTimeoutTimeStamp()
    if timeoutTimestamp.IsZero() || timeoutTimestamp.Unix() == 0 {
        return ctx, errorsmod.Wrap(sdkerrors.ErrInvalidRequest, "unordered transaction must have timeout_timestamp set")
    }
    if timeoutTimestamp.Before(headerInfo.Time) {
        return ctx, errorsmod.Wrap(sdkerrors.ErrInvalidRequest, "unordered transaction has a timeout_timestamp that has already passed")
    }
    if timeoutTimestamp.After(headerInfo.Time.Add(d.maxTimeoutDuration)) {
        return ctx, errorsmod.Wrapf(sdkerrors.ErrInvalidRequest, "unordered tx ttl exceeds %s", d.maxTimeoutDuration.String())
    }

    // in order to create a deterministic hash based on the tx, we need to hash the contents of the tx with signature
    // Get a Buffer from the pool
    buf := bufPool.Get().(*bytes.Buffer)
    // Make sure to reset the buffer
    buf.Reset()

    // Use the buffer
    for _, msg := range tx.GetMsgs() {
        // loop through the messages and write them to the buffer
        // encoding the msg to bytes makes it deterministic within the state machine.
        // Malleability is not a concern here because the state machine will encode the transaction deterministically.
        bz, err := proto.Marshal(msg)
        if err != nil {
            return ctx, errorsmod.Wrap(sdkerrors.ErrInvalidRequest, "failed to marshal message")
        }

        buf.Write(bz)
    }

  // check for duplicates
    // check for duplicates
    if d.txManager.Contains(txHash) {
        return ctx, errorsmod.Wrap(sdkerrors.ErrInvalidRequest, "tx %X is duplicated")
    }

    if d.env.TransactionService.ExecMode(ctx) == transaction.ExecModeFinalize {
        // a new tx included in the block, add the hash to the unordered tx manager
        d.txManager.Add(txHash, ttl)
    }

  return next(ctx, tx, simulate)
}

Transaction Hashes

It is absolutely vital that transaction hashes are deterministic, i.e. transaction encoding is not malleable. If a given transaction, which is otherwise valid, can be encoded to produce different hashes, which reflect the same valid transaction, then a duplicate unordered transaction can be submitted and included in a block.

In order to prevent this, the decoded transaction contents is taken. Starting with the content of the transaction we marshal the transaction in order to prevent a client reordering the transaction. Next we include the gas and timeout timestamp as part of the identifier. All these fields are signed over in the transaction payload. If one of them changes the signature will not match the transaction.

State Management

On start up, the node needs to ensure the TxManager's state contains all un-expired transactions that have been committed to the chain. This is critical since if the state is not properly initialized, the node will not reject duplicate transactions and thus will not provide replay protection, and will likely get an app hash mismatch error.

We propose to write all un-expired unordered transactions from the TxManager's to file on disk. On start up, the node will read this file and re-populate the TxManager's map. The write to file will happen when the node gracefully shuts down on Close().

Note, this is not a perfect solution, in the context of store v1. With store v2, we can omit explicit file handling altogether and simply write the all the transactions to non-consensus state, i.e State Storage (SS).

Alternatively, we can write all the transactions to consensus state.

Consequences

Positive

Support un-ordered and concurrent transaction inclusion.

Negative

Requires additional storage overhead and management of processed unordered transactions that exist outside of consensus state.

References

https://github.com/cosmos/cosmos-sdk/issues/13009

ADR 070: Unordered Transactions

Changelog​

Status​

Abstract​

Context​

Decision​

Transaction Format​

Replay Protection​

AnteHandler Decorator​

Transaction Hashes​

State Management​

Consequences​

Positive​

Negative​

References​