Postmortem: EP9 deployment

What Happened

On March 21, 2022, an onchain vote was posted that executed EP8, EP9 and EP10. EP8 and EP10 are simple USDC token transfers, while EP9 involved replacing the pricing oracle for the .eth registrar controller, enabling a new pricing scheme for recently expired domains.

The new oracle was thoroughly tested, but based on top of as-yet undeployed changes to the .eth registrar controller. One of these changes involved altering the return value from the price oracle; instead of returning a single value with the total registration fee, it was expected to return two uint256es, containing the registration fee and the premium, respectively. Due to lack of good release management, the master branch on the ens-contracts Github repo was used for ongoing development.

As a result, the version of the price oracle that was deployed used this new interface, and returned a 2-tuple for prices, while the currently deployed version of the .eth registrar controller still expected a single value. Because Solidity and the EVM do not have runtime type checking, the new price oracle contract was accepted as a valid replacement contract, and when executed the second return value was simply ignored by the controller. If the vote had passed, this would have resulted in premiums effectively being ignored, meaning all expired domains could be registered immediately at the standard price.

This issue was discovered shortly after the vote was passed when testing the new oracle against the frontend in a Tenderly fork. As soon as it was discovered, delegates were informed of the existence of an issue and the need to prevent the proposal from passing, resulting in major delegates who had not yet voted voting ‘no’ on the proposal.

Timeline

All timestamps on March 21 2022 in UTC.

  • 10:31 PM: Nick compiles, deploys and verifies an instance of the Exponential Premium Price Oracle on mainnet at 0x0B7CbeE19E219050e38B419273229fd24590555a based on the current master branch of the ens-contracts repository.
  • 10:40 PM: Nick posts an onchain vote for EP8, EP9 and EP10.
  • 10:46 PM: Nick posts to Twitter and the DAO forum asking delegates to vote on both proposals onchain.
  • 10:55 PM: Nick asks Makoto to verify that the frontend will handle registrations correctly once the new price oracle is active.
  • March 22 1:25 AM: Makoto reports that the premium price with the new oracle appears to be returning 0 for all names.
  • 1:45 PM: Nick uses Tenderly to diagnose the issue; determines the problem is with the modified interface for the price oracle.
  • 1:54 PM: Nick communicates on the forum and twitter that delegates should vote against the proposal. Nick and Alisha begin reaching out to large delegates to make them aware of the issue before voting.

Lessons Learned

  • TNL needs better release management for the ens-contracts repository. The default branch and published packages must reflect the current state of contracts onchain, with only changes that do not change the generated code (such as comments, formatting, and minor interface refactorings) permitted. A GitHub release should be made each time a new deployment is made to mainnet. Changes that are finalised but not yet deployed should be kept in feature branches, which are only merged to the default branch when the new contract is deployed to mainnet.
    Because the current state of the master branch on GitHub includes several unreleased changes that cannot be backed out to a clean branch that reflects the current chain state, TNL will adopt this for all future features from here on.
  • TNL needs a better release/acceptance testing process; unit and integration tests assume a clean state, but the onchain state may not reflect this. All deployments should come with testing steps (manual or automated) that can be executed on a Tenderly fork of mainnet before proposing a vote. Votes should only be proposed after these tests have been executed successfully against a mainnet fork.
  • Executable proposals need better release management. "TBD"s such as account addresses must be resolved before a proposal can advance from Draft to Active for a snapshot vote. All contracts should be deployed and verified before a Snapshot vote can be taken, not only before an onchain vote.
  • When a contract accepts another contract as a constructor argument or via a ‘setter’, contract authors should consider using an EIP-165 interface check to verify that it implements the correct interface before accepting it. While this check would not have permitted this specific issue (return types are not part of function hashes), it would prevent other similar issues.
  • Anyone responsible for posting votes onchain should default to delaying the vote if steps are discovered to be incomplete or missing, rather than rushing them through to get a vote ready at a desired time.
  • OZ Governor should allow voters to change their vote after it is placed. If this issue had been detected later in the vote, with many major delegates having already voted, it may have been impossible to prevent the proposal going through.
  • Care should be exercised when combining proposals. Combining proposals can result in pressure to get an unready proposal done in time in order for the other associated votes to be advanced; it also increases the risk of an error when unrelated issues are combined into a single votes. The DAO should draw up rules outlining in which circumstances it is acceptable to combine multiple votes into one.

What went right

  • The use of Tenderly forks and simulations made it easy to detect this issue on mainnet, and the debugger made it simple to get to the root cause of the issue. Quick discovery and diagnosis was crucial in reducing the impact of the issue.
  • Major delegates were reachable and responded promptly when asked for help.
12 Likes

TNL, thanks for catching this!! Letting delegates change their votes I think should only be done if also GovernorPreventLateQuorum is implemented at the same time.

5 Likes

First, thank you and @matoken.eth for catching this early and preventing a disaster. Second, I really appreciate the transparency and accountability. It goes a really long way.

9 Likes

Thanks for the timely and informative postmortem. Bug was well spotted, aftermath was well managed, and crisis averted…just in the @nick of time :wink:

I think this is a typo. If I understand correctly, the issue was discovered shortly before the vote passed, right?

Curious, but what was the intended plan to release those changes? Is the .eth registrar controller owned by the root multisig or the dao? I’m assuming the changes were just code improvements (precluding the need for the temp-check/proposal process), but do these just sit in master until they can be bundled into bigger “on-chain vote-justifying” changes?

Were the changes deployed to Ropsten/Kovan/Rinkeby/Goerli?

Was there time-pressure involved? If so, why? I can understand why we’d want EP-5 to pass quickly, but EP-8,9,10 don’t appear to have the same degree of urgency.

If the bug was caught after the vote passed, would we have needed to go through the on-chain vote process again to revert? How quickly would that have taken? Is it possible for each vote to have some revert procedure like “functions X,Y,Z can be conditionally executed by multi-sig A,B,C within the next 5 days”?

Can you remind me why proposals are combined? Is it just to save gas? Also, I’m a bit confused why there were “social votes” for EP-8,9,10, but also an on-chain vote. What would happen if the on-chain vote doesn’t agree with the social vote?

Thanks again for seeing this through. Hope your heart rate has had a chance to return to normal :slight_smile:

2 Likes

Shortly after the vote was posted, it should say.

It’s owned by the DAO. We’ve been waiting to finish up some related features and testing before proposing the change to the DAO.

No, and that’s a good point. This may not have detected the issue anyway, however, since testnet deployments may be ahead of mainnet in other areas at times.

As described in the EP9 proposal, a number of names have already sold at the maximum premium of $100k, so we know it is too low right now. There was also a desire on my part to get this in the same batch of changes so votes could happen while peoples’ attention was on it.

We would. The whole process takes 9 days (7 days of voting + 2 days of timelock), so the gap would have been up to 9 days (less if we pipelined it - posted the new proposal while the old one was still pending)

Something like this would be possible, but would introduce more complexity and layers of indirection.

Proposals are combined to save delegates’ gas.

The social vote for an executable proposal is to gauge whether it will succeed, so we don’t waste everyone’s ether voting on a proposal that will fail. In the unlikely event the snapshot vote passes but the onchain one fails, the proposal is rejected.

1 Like

Should one of the testnets be a designated “staging environment” for ENS, with a guarantee that it’s never ahead of mainnet, except for a current proposal?

So the testnet would always be on the same code as mainnet. Then when an executable proposal is initiated (or shortly beforehand), those code changes are deployed to the testnet too. Then we’d be able to have the community vet those changes if they wish, on that testnet.

“Want to test out EP9? Go here to this staging UI deployment, switch your network to this testnet, and have at it!”

1 Like

@matoken.eth for President 2024

1 Like

We do generally try to keep all the testnets in sync with mainnet, though there inevitably has to be a couple of differences; there’s no testnet DAO, so contracts are controlled directly by an EOA. We could certainly nominate one network as a preview deployment network.

Not Goerli. Impossible to get hold of testnet ETH (unless ENS can set up its own faucet for ENS Devs and Contributors). Someone offered libi.eth $3000 USD for 100,000 Goerli ETH. That tells you the story. I have good experience with Ropsten and Rinkby though.

you should be able to mint goerli eth at https://faucet.paradigm.xyz/

1 Like

Thank you for the very well written postmortem, Git coordination can be hard, especially in a decentralized environment.

2 Likes