OP Fault Proof upgrade breaks the op-verifier and op-gateway implementations in the EVM Gateway

For DiamondResolver, which controls the CCIP Resolver on the Ethereum mainnet, we propose implementing a Timelock with a minimum duration of three days and granting a veto role to trusted parties, such as the ENS DAO, ENS Working Groups, and even trusted individuals—for instance, nick.eth—for rapid response purposes.

This can be done using OpenZeppelin’s TimelockController: openzeppelin-contracts/contracts/governance/TimelockController.sol at release-v5.0 (OpenZeppelin/openzeppelin-contracts on GitHub).

We also hope to receive more timely notifications from Optimism regarding upgrades, instead of being informed just a few days before they occur.


For the Opti.Domains registry component, which is deployed on every L2 chain, we cannot implement a timelock because upgrades are approved off-chain by signing a digest from EOA wallets without a fixed chain ID. In this scenario, it may be necessary to establish a security council composed of trusted parties and to require a signature threshold to approve upgrades and deployments.

Our Diamond implementation is based on Solidstate, available at Solidstate’s GitHub repository. While the DiamondReadable still contains loupe functions, these are not utilized in the function lookup process. To comply with the minimal requirements for a Diamond implementation, it is only necessary to extend the Proxy and accurately implement the _getImplementation function (or a similar function). This function must return the correct implementation for each function signature.
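The lookup described above can be sketched as a toy Python model. This is not Solidstate's code: the class, the facet labels, and the optional fallback are hypothetical, and the two selectors are assumed to be the standard ENS `addr(bytes32)` and `text(bytes32,string)` selectors.

```python
from typing import Dict, Optional


class DiamondProxy:
    """Toy model of a diamond proxy: routes a 4-byte selector to a facet."""

    def __init__(self, fallback: Optional[str] = None) -> None:
        self._facets: Dict[bytes, str] = {}  # selector -> facet address
        self._fallback = fallback            # optional catch-all implementation

    def set_facet(self, selector: bytes, facet: str) -> None:
        if len(selector) != 4:
            raise ValueError("a function selector is exactly 4 bytes")
        self._facets[selector] = facet

    def get_implementation(self, calldata: bytes) -> str:
        """Stand-in for _getImplementation: look up the call's selector."""
        selector = calldata[:4]
        facet = self._facets.get(selector, self._fallback)
        if facet is None:
            raise LookupError(f"no facet for selector 0x{selector.hex()}")
        return facet


# Assumed selectors for addr(bytes32) and text(bytes32,string).
ADDR = bytes.fromhex("3b3b57de")
TEXT = bytes.fromhex("59d1d43c")

proxy = DiamondProxy()
proxy.set_facet(ADDR, "0xAddrFacet")  # hypothetical facet labels
proxy.set_facet(TEXT, "0xTextFacet")

# A call to addr(node) is routed to the facet registered for that selector.
print(proxy.get_implementation(ADDR + bytes(32)))
```

No loupe function is involved in the routing itself; the mapping from selector to implementation is the only piece the proxy needs.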

While this is certainly better than it being owned by an EOA, I’d argue that a much better solution is to make resolvers immutable, and require user consent to upgrade!

There are options that prioritize this - for example, deploying a lightweight proxy for each user, that can be updated to point at the latest implementation when a user wants to upgrade.

It would be difficult and expensive for PublicResolver (PR) users to upgrade due to the high gas fees. People in crypto usually don’t take action without an incentive, and it’s impossible to offer incentives for every upgrade. Additionally, because our dependency, Optimism, is itself upgradeable, an upgrade on its side could halt the resolution of new records. Requiring users to pay gas fees again would result in a bad user experience.

We are also trying to minimize the number of transactions required on the Ethereum mainnet to just one, in order to maximize the adoption rate.


For the CCIP Resolver, users have the option to opt out of automatic upgrades by setting their resolver to the public resolver facet implementation contract instead of the diamond.

A hidden feature of our diamond structure allows users to clone it, deploying a lightweight proxy that points to the diamond implementation. Users can later replace all facets and remove the fallback address to opt out of automatic upgrades. Related code can be found in modular-ens-contracts/src/current/diamond/DiamondCloneFactory.sol on the master branch of Opti-domains/modular-ens-contracts on GitHub.

Even though Ethereum secures $50B+ in TVL, it also didn’t let users choose whether to opt in to the PoS merge. So I don’t think being upgradeable is that bad.

It’d be a relatively cheap transaction - cheaper than initial setup.

The upgrade itself is the incentive. If they don’t need the new features, there’s no reason they need to upgrade.

On the contrary, a hard fork requires consent of all economic participants. You can choose not to run it, and to fork off into a PoW chain along with others who don’t agree.

Could you describe exactly how this veto role would work?

Interesting, I didn’t know the name for this concept. OffchainTunnel is ERC-2535 + ERC-3668 + TOR protocol + an on-chain registrar for facets.


Just riffing on this a bit: the PR upgrade issue is related to the fact that it’s doing (2) things, that probably should be separate. Similar to how off-chain names work: a resolver is a router that describes where its storage is located. For the PR, it indicates (1) “my data is on-chain” and (2) “my name is Raffy”.

For an owned name, like raffy.eth, I should be able to write to my own storage whenever I want, regardless of what the registry (or the wrapper) says. It’s my personal storage for names that point to it. Storage contracts don’t need to be upgraded, as they’re just dumb wrappers around sload/sstore.

If I set the PR as my resolver, its job should be to route requests for my name to my storage (since I’m the owner). So it’s not really a PublicResolver, it’s an OwnedOnchainResolver, which is just a trampoline between ENS and storage. Managed names (where storage != owner) and persistent names (where storage survives transfer) are 2 additional resolver types that involve more complex routing or more complex storage.
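To make the split concrete, here is a rough Python sketch of the "resolver as router, storage as dumb key-value store" idea. All class and method names are hypothetical; it only models the owned case, where reads follow the current registry owner.

```python
from typing import Dict, Optional, Tuple


class OwnedStorage:
    """Dumb per-account storage: anyone may write, but only under their own address."""

    def __init__(self) -> None:
        self._slots: Dict[Tuple[str, str, str], str] = {}

    def write(self, sender: str, name: str, key: str, value: str) -> None:
        # No registry check: the sender only ever touches their own namespace.
        self._slots[(sender, name, key)] = value

    def read(self, owner: str, name: str, key: str) -> Optional[str]:
        return self._slots.get((owner, name, key))


class OwnedOnchainResolver:
    """Trampoline: routes a lookup for a name to the storage of its current owner."""

    def __init__(self, registry: Dict[str, str], storage: OwnedStorage) -> None:
        self.registry = registry  # name -> owner, stand-in for the ENS registry
        self.storage = storage

    def resolve(self, name: str, key: str) -> Optional[str]:
        owner = self.registry.get(name)
        if owner is None:
            return None
        return self.storage.read(owner, name, key)


registry = {"raffy.eth": "0xRaffy"}
storage = OwnedStorage()
resolver = OwnedOnchainResolver(registry, storage)

storage.write("0xRaffy", "raffy.eth", "name", "Raffy")
storage.write("0xMallory", "raffy.eth", "name", "Mallory")  # ignored by routing
print(resolver.resolve("raffy.eth", "name"))
```

Because the routing lives in the resolver and the data lives in storage, replacing the resolver would not require rewriting any records in this model.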

This timelock implementation features 3 roles:

  • PROPOSER_ROLE - Queues transactions in the timelock.
  • EXECUTOR_ROLE - Executes the proposed transactions.
  • CANCELLER_ROLE - Vetoes transactions by removing them from the queue.

Upon deployment, we will set our multisig as the proposer and executor. Additionally, each EOA signer of our multisig will be designated as an executor, and the deployer’s wallet will be set as an admin.

We will then use grantRole (implemented in AccessControl) to add the initial cancellers who are responsible for vetoing the transactions.

After that, the admin role will be transferred to the Timelock contract itself to safeguard the grantRole function under the Timelock mechanism.

If a malicious transaction is queued, the cancellers can remove that transaction from the queue immediately through the cancel(bytes32 id) function. Only one canceller is needed to veto the transaction.
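The queue, veto, and execute flow can be modelled with a small Python toy. It is not OpenZeppelin's TimelockController: the class, the role sets, the account labels, and the SHA-256 operation id are all hypothetical stand-ins, and only the three-day minimum delay and the single-canceller veto come from the proposal above.

```python
import hashlib
from typing import Dict, Set

MIN_DELAY = 3 * 24 * 60 * 60  # three days, in seconds


class ToyTimelock:
    def __init__(self, proposers: Set[str], executors: Set[str], cancellers: Set[str]) -> None:
        self.proposers = proposers
        self.executors = executors
        self.cancellers = cancellers
        self.ready_at: Dict[str, int] = {}  # operation id -> earliest execution time

    @staticmethod
    def op_id(target: str, data: bytes) -> str:
        # Stand-in for the on-chain operation hash.
        return hashlib.sha256(target.encode() + data).hexdigest()

    def schedule(self, sender: str, target: str, data: bytes, now: int) -> str:
        if sender not in self.proposers:
            raise PermissionError("missing PROPOSER_ROLE")
        op = self.op_id(target, data)
        if op in self.ready_at:
            raise ValueError("operation already scheduled")
        self.ready_at[op] = now + MIN_DELAY
        return op

    def cancel(self, sender: str, op: str) -> None:
        # A single canceller is enough to veto a queued operation.
        if sender not in self.cancellers:
            raise PermissionError("missing CANCELLER_ROLE")
        if op not in self.ready_at:
            raise KeyError("operation is not queued")
        del self.ready_at[op]

    def execute(self, sender: str, target: str, data: bytes, now: int) -> str:
        if sender not in self.executors:
            raise PermissionError("missing EXECUTOR_ROLE")
        op = self.op_id(target, data)
        if op not in self.ready_at:
            raise KeyError("operation is not queued (never scheduled, or vetoed)")
        if now < self.ready_at[op]:
            raise RuntimeError("timelock delay has not elapsed")
        del self.ready_at[op]
        return op


timelock = ToyTimelock(
    proposers={"multisig"},
    executors={"multisig", "signer1"},
    cancellers={"ens-dao", "nick.eth"},
)
op = timelock.schedule("multisig", "DiamondResolver", b"upgrade-facet", now=0)
timelock.cancel("nick.eth", op)  # veto before the delay elapses
```

After the veto, an executor calling `execute` for the same operation fails even once the three days have passed, because the operation is no longer in the queue.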

Okay. If any of the Working Groups is to oversee this, it should be the Meta-Governance Working Group. Please note, this is neither an endorsement nor a rejection of your proposal. It’s simply a personal suggestion I’m putting forward should the DAO choose to implement this solution.

Disagree; this is in Ecosystem’s purview, because it’s not concerned with the governance of the DAO.

Yes, it should be in the Ecosystem group. Nonetheless, our resolver can operate autonomously without the need for governance voting by letting users set our resolver for their domain names, similar to what others (such as NameSys and Namestone) are doing.

However, if a large number of users adopt our system, it would be beneficial to integrate a special mechanism into the official ENS UI to handle L1 resolver storage more efficiently for users using our resolver. Remember, our resolver includes a fallback mechanism, allowing records that require security to be set on L1.

Regarding concerns about upgradeability, I’m considering ensuring our resolver always prioritizes resolving L1 records in official public resolvers. This approach means that records that require security, such as wallet addresses, can be securely stored on L1. Meanwhile, L2 records, which we plan to use for social profiles, do not currently have on-chain use cases and thus do not require the same level of security.
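A minimal sketch of that lookup order, assuming both record sets can be viewed as simple key-value maps. The function name, the dictionaries, and the sample values are hypothetical.

```python
from typing import Dict, Optional, Tuple

Key = Tuple[str, str]  # (name, record key)


def resolve(name: str, key: str,
            l1_records: Dict[Key, str],
            l2_records: Dict[Key, str]) -> Optional[str]:
    """Prefer the L1 record; fall back to L2 only when L1 has nothing."""
    value = l1_records.get((name, key))
    if value is not None:
        return value
    return l2_records.get((name, key))


# Security-sensitive record on L1, social record on L2.
l1 = {("alice.eth", "addr"): "0x1111"}
l2 = {("alice.eth", "addr"): "0x9999", ("alice.eth", "com.twitter"): "@alice"}

print(resolve("alice.eth", "addr", l1, l2))         # L1 value wins
print(resolve("alice.eth", "com.twitter", l1, l2))  # served from L2
```

With this ordering, a compromised or wrongly upgraded L2 path could not override an address the user has set on L1.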

How does this sound?

The world would be a much better place if devs reused on-chain libraries instead of redeploying the same code again and again… :vulcan_salute:

Some side thoughts on upgradable contracts… Namesys/ccip2.eth’s current offchain resolver is “secure by design”. I’m not so sure the same design pattern will work for L2 state/storage proofs, as we need more R&D to find possible loopholes in gateways and upgradable resolvers…

Our upgradable libs/parts are offchain gateway manager contracts, which allow only partial upgradability: it’s built such that we can upgrade our gateway manager, but we can’t break the contracts into resolving bad records.

a) All records are signed by the owner/manager of domain.eth.
b) We can upgrade/break the resolver by changing the gateway manager rules, but we can’t maliciously resolve records that are not signed by the current manager or that lack the signed approval of domain.eth’s manager.

For state/storage proofs there should be more R&D exploring possible scenarios for MITM in case of rogue gateways and malicious resolver upgrades.

An on-chain resolver may approve third parties for setting resolver records on behalf of the user. This prevents us from leveraging these properties.

From my POV, all of the trust is split between the contract that verifies the proofs and the contract that uses the verified data. Assuming those are functional, a rogue gateway can only censor (deny) your data. This is a problem when you’re not in the direct control of the choice of gateway, however this can be alleviated by having multiple gateways. It seems relatively straightforward to ensure verifier correctness.

To support trustless gateways, EIP-3668 should be upgraded to support a new callback-originating revert that indicates the next endpoint should be used: e.g. contract → revert OffchainLookup() → endpoint #1 → callback() → revert OffchainNext() → endpoint #2… This allows a verifier to reject a bunk proof without aborting the CCIP-Read session. As long as one gateway is functioning, you only experience increased latency. There should also be a requirement to shuffle the endpoint set. AFAICT, all of the power is currently held by the gateway and expressed via the response status code.

I think the default is to be extremely skeptical of any contract that uses gateway data. You must see the source code to validate what’s between the verifier and the final callback. This implies that the best solutions (gateway + L1 verifier + L1 resolver + L2 storage) should be monolithic in the sense that they provide a complete ENS solution. Having many ad hoc contracts that use the same gateway is a lot of surface area.

I say L1+L2 because I think the best solutions are of this type until we have ZK solutions, since a storage proof is useless without DA and L2s will likely be the second best fine-grained persistent storage after Ethereum itself.

Related to my post about MerkleResolver and your stuff with IPFS using stored signatures, I think there’s lots of unexplored territory in terms of how data is stored on L2 and what trade-offs can be made to reduce proof and gateway complexity. Although a general gateway should have slot-level resolution to support arbitrary contracts, keyed bytearrays are probably all you need for ENS. Additionally, it’s probably better to have a complex storage contract that maintains a bunch of internal checks than a dumb storage contract that relies on swiss-cheese proofs.

Chained CCIP-Reads are already supported!

Correct, but in a trustless gateway setup, a rogue gateway can provide a variety of status codes that will terminate the CCIP-Read session.

Shuffling the endpoints lets you dodge the rogue gateway via retry.

However, the callback function should be able to signal that the response was unacceptable and that further endpoints should be tried.

An additional OffchainLookup revert during the callback would be insufficient as the protocol doesn’t reveal the state of the endpoint iterator.

Instead of OffchainNext, it could just be an overload of OffchainLookup, where the endpoint array contains some signal value, like ["$next"]. This would be backwards-compatible as it would just appear like an error to unaware clients.
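A client that understands the signal might look roughly like the following Python simulation. Everything here is hypothetical: `ccip_read`, `fetch`, and `callback` are stand-ins for a real client, gateway request, and contract callback, and the status handling assumes ERC-3668's rule that a 4xx response stops the session while a 5xx response moves on to the next URL.

```python
from typing import Callable, List, Tuple

NEXT_SIGNAL = ["$next"]  # the proposed sentinel value for the urls array


class OffchainLookup(Exception):
    """Stand-in for the OffchainLookup revert; only the urls field is modelled."""

    def __init__(self, urls: List[str]) -> None:
        super().__init__(urls)
        self.urls = urls


def ccip_read(urls: List[str],
              fetch: Callable[[str], Tuple[int, str]],
              callback: Callable[[str], str]) -> str:
    for url in urls:
        status, body = fetch(url)
        if 400 <= status < 500:
            raise RuntimeError(f"{url} returned {status}: session aborted")
        if status != 200:
            continue  # 5xx: try the next endpoint
        try:
            return callback(body)
        except OffchainLookup as lookup:
            if lookup.urls == NEXT_SIGNAL:
                continue  # callback rejected the response: resume the iterator
            raise  # a genuine nested lookup is out of scope for this sketch
    raise RuntimeError("all endpoints failed or were rejected")


RESPONSES = {"https://a": (500, ""), "https://b": (200, "bad"), "https://c": (200, "good")}


def fetch(url: str) -> Tuple[int, str]:
    return RESPONSES[url]


def callback(body: str) -> str:
    if body != "good":
        raise OffchainLookup(NEXT_SIGNAL)  # verifier rejects a bunk proof
    return body


print(ccip_read(["https://a", "https://b", "https://c"], fetch, callback))
```

An unaware client would simply fail to fetch from `$next` and report an error, which is the backwards-compatible behaviour described above.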


For open gateways that provide arbitrary storage proofs following some standard, a different solution would be the ability to identify these types of interchangeable endpoints and substitute them with client-side choices.


Edit: If a data URL was a valid CCIP endpoint (maybe it is?), then the contract could do its own iteration and randomization by providing pairs for each actual endpoint: [url_i, 'data:text/json,{"data":...}']. If url_i fails, the data URL will succeed, and the next URL can be tried: [url_i+1, "data:..."]. If url_i succeeds but is a faux response, the callback can just revert again with [url_i+1, "data:..."].

Ah, I see what you mean. Yes, some way for the contract to canonically indicate that the response was unacceptable would be ideal here.

Instead of recursive CCIP-Read, we can also use multiple gateways as a multisig…

0 > Start by reverting OffchainLookup with a randomized list of gateways.

E.g. we have 3 gateways.
The 1st gateway reads the record data from its DB/L2, then calls another gateway for a signature.

  • If the 1st gateway fails, CCIP-Read automatically falls back to the next gateway in the lookup array.

The 2nd gateway reads from its own DB/L2 endpoints and replies to the 1st gateway with signed data.

  • If the 2nd gateway fails, the 1st gateway can request the 3rd gateway.

The 1st gateway can pack its data + signature together with the 2nd gateway’s signature in {data:…}, so the resolver contract can check that the same data is signed by 2/3 of the unique gateways.
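The 2-of-3 check on the resolver side could be sketched like this in Python. HMAC-SHA256 with shared keys is used purely as a stand-in for the ECDSA signatures a real resolver would recover on-chain; the gateway names, keys, and function names are hypothetical.

```python
import hashlib
import hmac
from typing import Dict, List, Tuple

# Hypothetical gateway keys; a real contract would store signer addresses instead.
GATEWAY_KEYS: Dict[str, bytes] = {"gw1": b"key-1", "gw2": b"key-2", "gw3": b"key-3"}
THRESHOLD = 2


def sign(gateway: str, data: bytes) -> bytes:
    """What each gateway does before replying."""
    return hmac.new(GATEWAY_KEYS[gateway], data, hashlib.sha256).digest()


def verify(data: bytes, signatures: List[Tuple[str, bytes]]) -> bool:
    """Accept only if THRESHOLD distinct known gateways signed the same data."""
    valid = set()
    for gateway, signature in signatures:
        key = GATEWAY_KEYS.get(gateway)
        if key is None:
            continue  # unknown signer
        expected = hmac.new(key, data, hashlib.sha256).digest()
        if hmac.compare_digest(signature, expected):
            valid.add(gateway)  # a set, so repeated signers count once
    return len(valid) >= THRESHOLD


record = b"domain.eth|addr|0x1234"
bundle = [("gw1", sign("gw1", record)), ("gw2", sign("gw2", record))]
print(verify(record, bundle))
```

Counting unique signers matters: without it, one gateway could submit its own signature twice and meet the threshold alone.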

We’re testing basic multisig gateways for the .sol + ENS integration; dev work is paused for now, as we need proper CCIP-Write specs for 2-way read-write.

** Still WIP, untested code.


I created a version of this that works pretty well: resolveworks/OffchainNext.sol

It’s a contract base which has a helper function offchainLookup() that invokes revert OffchainLookup with some additional logic:

  • It “randomizes” using the current block.number and tries pairs [uri_i, data:...].
  • If the response is 200, it calls the original callback, and if that reverts with an error that satisfies _shouldTryNext() -> bool, it will try the next endpoint until all endpoints are tried, then revert OffchainEOL.
  • By default, _shouldTryNext() returns true for revert OffchainNext().

It has a blocksmith test, where I give it 5 endpoints:

  1. a random website
  2. server that signs
  3. server that throws
  4. server that returns the wrong answer
  5. server that returns the right answer

I call the example CCIP-Read function in a loop and it succeeds 100% of the time, with a randomized order.
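The behaviour can be imitated off-chain with a short Python simulation. This is not the OffchainNext.sol code: the endpoint functions, the expected answer, and the seeded shuffle (standing in for the block.number "randomization") are hypothetical, and any non-200 response is treated as falling through to the data: half of the pair.

```python
import random
from typing import Callable, List, Optional, Tuple

Endpoint = Callable[[], Tuple[int, Optional[str]]]
EXPECTED = "42"


class OffchainNext(Exception):
    """Callback signal: this response is unacceptable, try the next endpoint."""


class OffchainEOL(Exception):
    """Every endpoint has been tried without an acceptable response."""


def random_website() -> Tuple[int, Optional[str]]:
    return 200, "<html>not a CCIP response</html>"


def throwing_server() -> Tuple[int, Optional[str]]:
    return 500, None


def wrong_answer() -> Tuple[int, Optional[str]]:
    return 200, "41"


def right_answer() -> Tuple[int, Optional[str]]:
    return 200, EXPECTED


def callback(body: Optional[str]) -> str:
    if body != EXPECTED:
        raise OffchainNext()
    return body


def offchain_lookup(endpoints: List[Endpoint], block_number: int) -> str:
    order = list(endpoints)
    random.Random(block_number).shuffle(order)  # stand-in for on-chain shuffling
    for endpoint in order:
        status, body = endpoint()
        if status != 200:
            continue  # the data: URL of the pair answers, so the loop moves on
        try:
            return callback(body)
        except OffchainNext:
            continue
    raise OffchainEOL()


ENDPOINTS = [random_website, throwing_server, wrong_answer, right_answer]
print(all(offchain_lookup(ENDPOINTS, block) == EXPECTED for block in range(100)))
```

As long as one endpoint in the set returns an acceptable answer, the order of the shuffle only affects how many attempts are needed.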


adraffy/CCIPRewriter.sol is a resolver that modifies how a name is resolved when it is input as [name].[basename]. This is very similar to the XOR, where [name].[basename] resolves name via ENSIP-10 and then calls the non-ENSIP-10 equivalents instead (i.e. an exclusively on-chain version of the name).

For CCIPRewriter, [name].{base32(endpoint)}.[basename] resolves name via ENSIP-10, then wraps OffchainLookup() and rewrites urls = [endpoint]. An on-chain name or a name that doesn’t revert is unaffected.
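Building such a name client-side might look like the sketch below. The exact base32 variant the contract expects is not stated here, so this assumes RFC 4648 base32, lowercased and without padding; the helper names and the localhost endpoint are hypothetical.

```python
import base64


def encode_endpoint_label(endpoint: str) -> str:
    """Assumed encoding: RFC 4648 base32, lowercase, '=' padding stripped."""
    raw = base64.b32encode(endpoint.encode("utf-8")).decode("ascii")
    return raw.rstrip("=").lower()


def decode_endpoint_label(label: str) -> str:
    """Inverse of encode_endpoint_label: restore case and padding, then decode."""
    padded = label.upper() + "=" * (-len(label) % 8)
    return base64.b32decode(padded).decode("utf-8")


def rewritten_name(name: str, endpoint: str, basename: str) -> str:
    """Assemble [name].{base32(endpoint)}.[basename]."""
    return f"{name}.{encode_endpoint_label(endpoint)}.{basename}"


endpoint = "http://localhost:8000"  # hypothetical local gateway for debugging
full_name = rewritten_name("raffy.eth", endpoint, "ccipr.eth")
print(full_name)
print(decode_endpoint_label(encode_endpoint_label(endpoint)))
```

Encoding the endpoint as a single label keeps characters such as ":" and "/" out of the name, which is why some label-safe encoding is needed at all.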

Two obvious use-cases: debugging (rewrite to localhost) and privacy (rewrite to local gateway.)

Example:

I had this behavior available in my resolver (using debugger → custom ccip rewrite) and found it very useful, but this technique makes it usable anywhere ENSIP-10 is supported.


Edit: since gas was 6 gwei, I deployed a version to mainnet at ccipr.eth
