OP Fault Proof upgrade breaks op-verifier and op-gateway implementations in the EVM Gateway

TLDR

On March 19, 2024, Optimism initiated a fault-proof upgrade on the Optimism Sepolia network that broke the functionality of the op-verifier and op-gateway due to the replacement of L2OutputOracle by DisputeGameFactory.

We are currently developing Opti.domains and are eager to contribute to developing and maintaining the Optimism verifier and gateway implementations. We are implementing an upgrade to the op-verifier and op-gateway to support DisputeGameFactory and have opened a pull request on the repository.


On March 19, 2024, Optimism initiated a significant upgrade on the Optimism Sepolia network to introduce a fault-proof system. Unfortunately, this upgrade broke the functionality of the op-verifier and op-gateway components of the ENS EVM Gateway. Below is an overview of the upgrade’s key points.

The L2OutputOracle is being entirely removed and replaced by the OptimismPortal and DisputeGameFactory. The L2OutputOracle smart contract is currently used by the trusted Proposer role to store L2 state output proposals. Presently, developers use these outputs to prove that their withdrawals actually happened on L2. But with fault proofs, developers will have to change how their client software proves withdrawals in the first step of the two-step withdrawal process.

You can see proof on Etherscan that the L2OutputOracle has ceased operating on Optimism Sepolia: https://sepolia.etherscan.io/address/0x90E9c4f8a994a250F6aEfd61CAFb4F2e895D458F

Despite the lack of documentation on upgrading from L2OutputOracle to DisputeGameFactory, we have reverse-engineered the Optimism code and implemented the necessary upgrades to the op-verifier and op-gateway in this pull request. Here is a summary of the changes introduced:

| L2OutputOracle | DisputeGameFactory |
| --- | --- |
| l2OutputIndex | disputeGameIndex |
| getL2Output(…) | gameAtIndex(…) |
| outputRoot | gameProxy.rootClaim().raw() |
| latestOutputIndex() | gameCount() + findLatestGames(…) |
| outputProposal.l2BlockNumber | disputeGame.l2BlockNumber() |
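To illustrate the last mapping in the table, here is a hedged sketch of how a `latestOutputIndex()` call can be replaced by a `findLatestGames`-style backwards scan over dispute games. The `GameRecord` shape and function names are illustrative assumptions, not the actual contract ABI:

```typescript
// Hypothetical in-memory model of dispute games returned by the factory.
// In the real DisputeGameFactory these fields come from gameAtIndex(...)
// and the game proxy; the names here are placeholders for illustration.
interface GameRecord {
  index: number;        // position in the factory's game list
  gameType: number;     // e.g. the "respected" game type used for proofs
  l2BlockNumber: number;
}

// Scan backwards from the newest game (as findLatestGames does) and return
// the index of the most recent game of the wanted type, or -1 if none.
function findLatestGameIndex(games: GameRecord[], wantedType: number): number {
  for (let i = games.length - 1; i >= 0; i--) {
    if (games[i].gameType === wantedType) return games[i].index;
  }
  return -1;
}
```

The backwards scan matters because games of other types may be created after the one you want, so the highest index overall is not necessarily usable.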

To facilitate a smooth transition, it’s crucial to incorporate an activation point system into the op-verifier smart contract. Moreover, to ensure our gateway remains adaptable to future updates, the op-verifier contract should be designed with an upgradeable pattern. However, it’s been observed that the EVM Gateway is intended to be immutable, leading us to forego these enhancements for the EVM Gateway. Should you be interested in exploring the implementation of an activation point or an upgradeable smart contract, please let us know.

We have already applied the upgradeable design pattern to all contracts within our Opti.Domains CCIP Gateway and plan to introduce an activation point to our Opti.Domains CCIP Gateway shortly. This approach will enable us to upgrade without disrupting the user experience.

The source code for Opti.Domains CCIP Gateway is available at:

The Opti.Domains team is eager to continue contributing to developing and maintaining the Optimism verifier and gateway implementations.

The fault-proof system is scheduled to go live on Optimism Mainnet in Q3 2024.

We also plan to open a temp check on scaling ENS to Optimism and generating revenue for the ENS DAO early next week.

We would like to work with ENS DAO and hope to receive a warm welcome into the ENS family.

Feel free to comment below if you have any comments.


This is a very welcome contribution - thank you! We haven’t received a lot of support from the Optimism team on this upgrade, and with short notice and limited engineering resources to dedicate to it, it was a significant issue for us. We very much appreciate you putting this together on such short notice. I’ll review the PR and provide feedback early next week.

Is there a technical outline of what Opti.domains is implementing or planning to implement?

My project is still very rough, but your tests expressed in blocksmith.js would be just a few lines. My example CCIP test is for OffchainTunnel—I think this is related to your DiamondResolver?


Opti.domains was initially created to explore cross-chain domains that simultaneously exist across multiple blockchain networks, beginning with the Optimism Superchain. This venture benefits from a unique advantage due to our smaller user base, which reduces the risks associated with research and development in this innovative domain.

To address specific challenges identified in the Ethereum Name Service (ENS) Resolver, we developed the DiamondResolver. This innovation was guided by ERC-2535: Diamonds, Multi-Facet Proxy to overcome three major issues:

  1. The issue of lost records whenever users migrate to a newer version of the resolver, which necessitates substantial gas fees to re-establish those records.
  2. Users often don’t update to the newest resolver, which makes it hard to ship new features to them. With the Diamond pattern and our diamond inheritance technology, we can push upgrades to all resolvers at once without requiring user intervention.
  3. The requirement for developers to inherit every method from the public resolver to create their custom resolver. Ideally, developers should have the flexibility to customize and integrate only the methods they require.

Following the Optimism Governance vote on the Scale ENS to Optimism mission, we have embarked on designing and implementing our CCIP gateway. This initiative aims to connect ENS holders from the Ethereum Mainnet to Optimism, allowing them to set ENS records on Optimism and benefit from cheap gas fees. Compared to the off-chain approach, the layer 2 approach allows us to drive incentives more easily.

Our CCIP Gateway makes it easy for users to switch to our layer 2 solution without noticing any differences. This ensures that even older dApps can work with our resolver, even if they don’t specifically support CCIP. We’ve added these important features to our layer 1 CCIP resolver:

  1. Fallback for Reading Resolvers: If a user has set a record in an official ENS resolver, it will be directly accessed from that resolver instead of through the CCIP gateway. This allows older dApps that don’t support CCIP to keep working as usual.
  2. Fallback for Writing Resolvers: Users can still use the ENS UI to set records on L1, though this comes with extra gas costs. However, if the ENS UI incorporates our specific adjustments, it could eliminate these extra costs.
  3. Upgradability with DiamondResolver: This takes advantage of the features mentioned above, allowing us to update our resolver in step with Optimism’s developments safely.
  4. Unified Resolver Standard: We’ve adopted a single standard for all resolver types, ensuring that one codebase can work across any resolver. Developers only need to create their resolver for L2 and can then extend it to L1 with minimal additional coding.
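The first feature above (fallback for reading resolvers) can be sketched as a simple two-tier lookup. This is a minimal model, not the actual contract logic; the record-store shape and key format are assumptions for illustration:

```typescript
// Model each resolver's records as a flat map keyed by `${node}:${recordKey}`.
// In the real system the L1 side is the official ENS resolver's storage and
// the L2 side is fetched via the CCIP gateway; here both are plain maps.
type Records = Map<string, string>;

function resolveWithFallback(
  l1Records: Records,
  l2Records: Records,
  node: string,
  key: string
): string | undefined {
  const k = `${node}:${key}`;
  // Prefer the official L1 resolver, so legacy dApps that don't support
  // CCIP-Read keep working unchanged; otherwise fall through to L2.
  return l1Records.get(k) ?? l2Records.get(k);
}
```

The design choice here is that an L1 record always shadows its L2 counterpart, which is what keeps the behavior indistinguishable for older dApps.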

Regarding your blocksmith.js project, we are experiencing an issue with the CCIP gateway integration test when using Foundry, as the gateway is implemented in TypeScript. Therefore, we are considering using your project for the integration test. Thank you for the introduction!


We were using SURL for our CCIP resolver & gateway tests but had to settle for vm.prank and hardcoded test results, as our CCIP gateways use fully offchain IPNS/IPFS storage signed by the domain manager instead of web2/CCIP gateways in the middle.

I’m a huge fan of ERC-2535, but it’s still hard to explain all that diamond-industry jargon to other devs :sweat_smile: I see you’re even skipping all the ERC-2535 loupe & facet lookups and directly using basic facets wrapped in old proxy storage slots to keep it backwards compatible with block explorers.

We’re playing with our own jargon-free version for the future; we had to skip ERC-2535 for our old CCIP resolver to save deployment costs… GitHub - namesys-eth/carbon-2535: Simplified, industrial jargon free implementation of EIP2535 (Diamond Standard)

This part is true & understandable… we’re already having these issues, so we’ve had to add extra backwards compatibility in contracts & UI to deploy our new v2 CCIP resolvers with extra features. The only downside is that we’re leaving immutable contracts for multisig-upgradable contracts.


What prevents you or anyone who compromises your keys from using this to replace the resolver with a malicious one?

For DiamondResolver, which controls the CCIP Resolver on the Ethereum mainnet, we propose implementing a Timelock with a minimum duration of three days and granting a veto role to trusted parties, such as the ENS DAO, ENS Working Groups, and even trusted individuals—for instance, nick.eth—for rapid response purposes.

This can be done by using openzeppelin-contracts/contracts/governance/TimelockController.sol at release-v5.0 · OpenZeppelin/openzeppelin-contracts · GitHub

We also hope to receive more timely notifications from Optimism regarding upgrades, instead of being informed just a few days before they occur.


For the Opti.Domains registry component that is deployed across every L2 chain, we cannot implement a timelock since upgrades are approved off-chain by signing a digest from EOA wallets without a fixed chain ID. In this scenario, it may be necessary to establish a security council comprised of trusted parties, with a signature threshold required to approve upgrades and deployments.

Our Diamond implementation is based on Solidstate, available at Solidstate’s GitHub repository. While the DiamondReadable still contains loupe functions, these are not utilized in the function lookup process. To comply with the minimal requirements for a Diamond implementation, it is only necessary to extend the Proxy and accurately implement the _getImplementation function (or a similar function). This function must return the correct implementation for each function signature.
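The minimal requirement described above (a proxy with a correct `_getImplementation`-style lookup) can be modelled as a selector-to-facet routing table. This is an illustrative sketch in the gateway's language, not the Solidstate Solidity code; addresses and selectors are plain strings here:

```typescript
// Minimal model of Diamond routing: all the proxy really needs is a correct
// function-selector -> implementation lookup. Loupe functions (DiamondReadable)
// are optional introspection metadata and play no part in dispatch.
class DiamondRouter {
  private facets = new Map<string, string>(); // selector -> facet address

  addFacet(facetAddress: string, selectors: string[]): void {
    for (const s of selectors) this.facets.set(s, facetAddress);
  }

  // Analogue of _getImplementation: resolve which facet handles a call,
  // reverting (throwing) when no facet is registered for the selector.
  getImplementation(selector: string): string {
    const facet = this.facets.get(selector);
    if (facet === undefined) throw new Error(`no facet for selector ${selector}`);
    return facet;
  }
}
```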

While this is certainly better than it being owned by an EOA, I’d argue that a much better solution is to make resolvers immutable, and require user consent to upgrade!

There are options that prioritize this - for example, deploying a lightweight proxy for each user, that can be updated to point at the latest implementation when a user wants to upgrade.

It would be difficult and expensive for public resolver users to upgrade due to high gas fees. People in crypto usually don’t take action without an incentive, and it’s impossible to provide incentives for every upgrade… Additionally, since our dependency, Optimism, is itself upgradeable, an Optimism upgrade could halt the resolution of new records. Requiring users to pay gas fees again would result in a bad user experience.

We are also trying to minimize the number of transactions required on the Ethereum mainnet to just one, in order to maximize the adoption rate.


For the CCIP Resolver, users have the option to opt out of automatic upgrades by setting their resolver to the public resolver facet implementation contract instead of the diamond.

A hidden feature of our diamond structure allows users to clone it, deploying a lightweight proxy that points to the diamond implementation. Users can later replace all facets and remove the fallback address to opt out of automatic upgrades. Related code can be found at modular-ens-contracts/src/current/diamond/DiamondCloneFactory.sol at master · Opti-domains/modular-ens-contracts · GitHub

Even though Ethereum secures $50B+ in TVL, it also didn’t let users choose whether to opt in to the PoS merge. So, I don’t think being upgradeable is that bad.

It’d be a relatively cheap transaction - cheaper than initial setup.

The upgrade itself is the incentive. If they don’t need the new features, there’s no reason they need to upgrade.

On the contrary, a hard fork requires consent of all economic participants. You can choose not to run it, and to fork off into a PoW chain along with others who don’t agree.

Could you describe exactly how this veto role would work?

Interesting, I didn’t know the name for this concept. OffchainTunnel is ERC-2535 + ERC-3668 + TOR protocol + an on-chain registrar for facets.


Just riffing on this a bit: the PR upgrade issue stems from the fact that it’s doing two things that probably should be separate. Similar to how off-chain names work: a resolver is a router that describes where its storage is located. For the PR, it indicates (1) “my data is on-chain” and (2) “my name is Raffy”.

For an owned name, like raffy.eth, I should be able to write to my own storage whenever I want, regardless of what the registry says (or wrapper says.) It’s my personal storage for names that point to it. Storage contracts don’t need upgrades as they’re just dumb wrappers around sload/sstore.

If I set the PR as my resolver, its job should be to route requests for my name to my storage (since I’m the owner). So it’s not really a PublicResolver, it’s an OwnedOnchainResolver, which is just a trampoline between ENS and storage. Managed names (where storage != owner) and persistent names (where storage survives transfer) are 2 additional resolver types that involve more complex routing or more complex storage.

This timelock implementation features 3 roles:

  • PROPOSER_ROLE - Queues transactions in the timelock.
  • EXECUTOR_ROLE - Executes the proposed transactions.
  • CANCELLER_ROLE - Vetoes transactions by removing them from the queue.

Upon deployment, we will set our multisig as the proposer and executor. Additionally, each EOA signer of our multisig will be designated as an executor, and the deployer’s wallet will be set as an admin.

We will then use grantRole (implemented in AccessControl) to add the initial cancellers who are responsible for vetoing the transactions.

After that, the admin role will be transferred to the Timelock contract itself to safeguard the grantRole function under the Timelock mechanism.

If a malicious transaction is queued, the cancellers can remove that transaction from the queue immediately through the cancel(bytes32 id) function. Only one canceller is needed to veto the transaction.
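The queue-and-veto flow described above can be sketched as a pure model following TimelockController semantics: operations are scheduled with a minimum delay, a single canceller can remove one before it matures, and a cancelled or immature operation cannot execute. This is a simplified illustration, not the OpenZeppelin contract itself:

```typescript
// Pure model of the timelock veto flow. Times are unix-style seconds passed
// in explicitly so the model stays deterministic and testable.
class TimelockModel {
  private queued = new Map<string, number>(); // operation id -> earliest execution time

  constructor(private minDelay: number) {}

  // PROPOSER_ROLE: queue an operation; it matures after minDelay.
  schedule(id: string, now: number): void {
    this.queued.set(id, now + this.minDelay);
  }

  // CANCELLER_ROLE: a single canceller is enough to veto a queued operation.
  cancel(id: string): void {
    this.queued.delete(id);
  }

  // EXECUTOR_ROLE: succeeds only if the operation is queued and mature.
  execute(id: string, now: number): boolean {
    const readyAt = this.queued.get(id);
    if (readyAt === undefined || now < readyAt) return false; // cancelled or not ready
    this.queued.delete(id);
    return true;
  }
}
```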

Okay. If any of the Working Groups is to oversee this, it should be the Meta-Governance Working Group. Please note, this is neither an endorsement nor a rejection of your proposal. It’s simply a personal suggestion I’m putting forward should the DAO choose to implement this solution.

Disagree; this is in Ecosystem’s purview, because it’s not concerned with the governance of the DAO.

Yes, it should be in the Ecosystem group. Nonetheless, our resolver can operate autonomously without the need for governance voting by letting users set our resolver for their domain names, similar to what others (such as NameSys and Namestone) are doing.

However, if a large number of users adopt our system, it would be beneficial to integrate a special mechanism into the official ENS UI to handle L1 resolver storage more efficiently for users using our resolver. Remember, our resolver includes a fallback mechanism, allowing records that require security to be set on L1.

Regarding concerns about upgradeability, I’m considering ensuring our resolver always prioritizes resolving L1 records in the official public resolvers. This approach means that records that require security, such as wallet addresses, can be securely stored on L1. Meanwhile, L2 records, which we plan to use for social profiles, currently have no on-chain use cases and thus do not require the same level of security.
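The routing policy above could be expressed as a simple security split per record key. The particular key categories below are an assumption for illustration, not a finalized list:

```typescript
// Hypothetical policy: record keys whose values are consumed on-chain
// (and thus need L1 security) resolve from L1; everything else (e.g.
// social-profile text records) may resolve from the cheaper L2 store.
const L1_ONLY_KEYS: Set<string> = new Set(["addr", "contenthash"]);

function pickSource(recordKey: string): "L1" | "L2" {
  return L1_ONLY_KEYS.has(recordKey) ? "L1" : "L2";
}
```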

How does this sound?

The world would be a much better place if devs reused on-chain libraries instead of redeploying the same code again and again… :vulcan_salute:

A side thought on upgradable contracts: NameSys/ccip2.eth’s current offchain resolver is “secure by design”, but I’m not so sure the same design pattern will work for L2 state/storage proofs, as we need more R&D to find possible loopholes in gateways and upgradable resolvers…

We have upgradable libs/parts in the form of offchain gateway manager contracts, allowing only half-upgradability: it’s built such that we can upgrade our gateway manager, but we can’t break the contracts into resolving bad records.

a) all records are signed by the owner/manager of domain.eth
b) we can upgrade/break the resolver by changing gateway manager rules, but we can’t maliciously resolve records that aren’t signed by the current manager or that lack the signed approval of domain.eth’s manager.

For state/storage proofs there should be more R&D exploring possible scenarios for MITM in case of rogue gateways and malicious resolver upgrades.

An on-chain resolver may approve third parties for setting resolver records on behalf of the user. This prevents us from leveraging these properties.

From my POV, all of the trust is split between the contract that verifies the proofs and the contract that uses the verified data. Assuming those are functional, a rogue gateway can only censor (deny) your data. This is a problem when you’re not in the direct control of the choice of gateway, however this can be alleviated by having multiple gateways. It seems relatively straightforward to ensure verifier correctness.

To support trustless gateways, EIP-3668 should be upgraded to support a new callback-originating revert that indicates the next endpoint should be used: e.g. contract → revert OffchainLookup() → endpoint #1 → callback() → revert OffchainNext() → endpoint #2… This allows a verifier to reject a bunk proof without aborting the CCIP-Read session. As long as one gateway is functioning, you only experience increased latency. There should also be a requirement to shuffle the endpoint set. AFAICT, all of the power is currently held by the gateway and expressed via the response status code.
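The proposed OffchainNext flow can be sketched client-side as a shuffled fallback loop over endpoints, where a rejected proof (modelled here as a throw) moves the session to the next endpoint instead of aborting. This is a hypothetical sketch of the proposal, not an existing EIP-3668 mechanism:

```typescript
// A gateway takes a query and either returns a verified response or throws
// (the analogue of the verifier rejecting the proof / revert OffchainNext()).
type Gateway = (query: string) => string;

function fetchWithFallback(gateways: Gateway[], query: string): string {
  // Shuffle so no single endpoint permanently holds all the power.
  const order = [...gateways].sort(() => Math.random() - 0.5);
  let lastError: unknown;
  for (const gw of order) {
    try {
      return gw(query); // proof verified: success
    } catch (e) {
      lastError = e;    // bad proof or unreachable: try the next endpoint
    }
  }
  throw new Error(`all gateways failed: ${lastError}`);
}
```

As long as one endpoint in the set is honest and reachable, the client only pays extra latency, which is the property argued for above.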

I think the default is to be extremely skeptical of any contract that uses gateway data. You must see the source code to validate what’s between the verifier and the final callback. This implies that the best solutions (gateway + L1 verifier + L1 resolver + L2 storage) should be monolithic in the sense that they provide a complete ENS solution. Having many ad-hoc contracts that use the same gateway is a lot of surface area.

I say L1+L2 because I think the best solutions are of this type until we have ZK solutions, since a storage proof is useless without DA and L2s will likely be the second best fine-grained persistent storage after Ethereum itself.

Related to my post about MerkleResolver and your stuff with IPFS using stored signatures, I think there’s lots of unexplored territory in terms of how data is stored on L2 and what trade-offs can be made to reduce proof and gateway complexity. Although a general gateway should have slot-level resolution to support arbitrary contracts, keyed bytearrays are probably all you need for ENS. Additionally, it’s probably better to have a complex storage contract that maintains a bunch of internal checks than a dumb storage contract that relies on swiss-cheese proofs.