ERC-3668 edge case for clientside http/ssl errors

raffy · October 23, 2024, 5:11am

I agree, and I brought up the same point with ricmoo.

But if we update the client to fix the handling of 4XX and invalid 2XX responses, we need to ensure that a JSON data URL is a valid endpoint, since OffchainTryNext() and OffchainLookupUnanswered() can be emulated with data URLs and recursive CCIP-Read using pairs of endpoints:

endpoints = [a, b, c]

revert OffchainLookup w/ [a, data:0x]
   if response == 0x or invalid, revert OffchainLookup w/ [b, data:0x]
       if response == 0x or invalid, revert OffchainLookup w/ [c, data:0x]
           if response == 0x or invalid, <No Answer>
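
As a rough illustration of the pairing scheme above (not taken from any implementation discussed in this thread), here is a minimal TypeScript sketch. It assumes the client has been updated so that a failing endpoint falls through to the next URL, and that a data URL answering 0x is accepted as an endpoint. The names Fetcher, lookupPair and resolveWithPairs are hypothetical, and the network call is modelled as a synchronous callback only to keep the sketch self-contained.

```typescript
// Result of querying one endpoint: response bytes as hex, or null on failure
// (HTTP 4XX/5XX, invalid body, network error).
type Fetcher = (url: string) => string | null;

const EMPTY = "0x"; // what the paired data URL "answers" with

// One OffchainLookup session using the pair [endpoint, data:0x].
// An updated client tries the URLs in order and returns the first usable response.
function lookupPair(endpoint: string, fetcher: Fetcher): string {
  const response = fetcher(endpoint);
  return response !== null ? response : EMPTY;
}

// Contract-side recursion: a callback that sees 0x (or a response it deems
// invalid) reverts with a new OffchainLookup for the next endpoint.
// Running out of endpoints corresponds to <No Answer>.
function resolveWithPairs(
  endpoints: string[],
  fetcher: Fetcher,
  accepts: (response: string) => boolean
): string | null {
  for (const endpoint of endpoints) {
    const response = lookupPair(endpoint, fetcher);
    if (response !== EMPTY && accepts(response)) return response;
  }
  return null; // <No Answer>
}
```

Each loop iteration stands in for one revert/callback round trip, so the number of CCIP-Read sessions grows with the number of endpoints tried.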

My thinking was just that if we update the client, might as well make the spec better.

0xc0de4c0ffee · October 23, 2024, 5:19am

I mean, it’s an old “locked” ERC. If we can update more than a basic fix, I’d like to push for direct ipfs://.., ipns://.., bzz://.. support, as it’s currently limited to http/s only.

It’s already possible to do that with the current spec. We shouldn’t add every possible iteration to the spec, even if it’s possible to update old ERCs. It’s up to devs how they want to use that callback as a fallback.
gateways = [ https://gateway1/…, datauri:hex(gateway1 failed)]
callback >> retry >> fallback to gateway 2.

nick.eth · October 29, 2024, 1:40pm

It’s difficult to evaluate an entirely new implementation against the current one. Can you describe how it’s different, and why the existing one can’t have PRs instead? Tagging @taytems .

No general objections to this as a new EIP extending 3668. Forwards-compatibility seems like an issue here: contracts may avoid using the new functionality because it will break legacy 3668-only clients, and instead opt for degraded support that they know won’t break things. Perhaps some method could be incorporated that lets the client signal to the contract that it supports this new EIP?

How do you send an error to a contract?

How responses are verified is entirely up to implementations; that’s not specified by 3668.

How would the callback know it was being called for a 4xx? Intuitively this seems bad.

raffy · October 29, 2024, 6:02pm

I only wrote it as a testbed for CCIP-Read, since it’s a complex example of wrapping and potentially recursive calls, but it turned out to be a pretty nice implementation.

Initially I was thinking of the first URL, but after brainstorming with Premm a bit, the last URL could be a signal value (indicating the new feature set), and that would be maximally backwards compatible without requiring data URL support.

Since callback(bytes response, bytes extraData) expects calldata, response can be supplied as just bytes4(selector) to indicate an error.

A successful call to a function declared as f() returns (bytes4) would, via CCIP-Read, return 0xXXXXXXXX0000.....0000 (the 4 bytes padded to a full 32-byte word).

I think the question is the difference between a CCIP-Read server that emits a 4XX, where only the client and never the contract is told about the problem, and one that will pass {data: "0x..."} to the callback regardless of status code. For example, a 404 could supply {data: "0xc5723b51"}, which is error NotFound().

I don’t think this feature is necessary, it just seems more useful, since the new protocol would effectively silence the error, as it will just try the next endpoint rather than killing the session.

For new stuff: since the contract decides if it accepts the request or wants another response, the only issue I can think of would be some kind of HTTP middleware throwing an error between the server and the client, and responding with 4XX with malicious {data: "0x..."} that the contract somehow accepts.

For backwards compat, nearly every contract expects the response to be ABI-encoded, so bytes.length % 32 > 0 will likely revert.
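
The length argument can be sketched in a few lines of TypeScript. This is only an illustration of the byte-length reasoning: classifyResponse and its category names are hypothetical, and whether a given contract actually reverts on a non-aligned response depends on what its callback decodes. The 0xc5723b51 value used in the usage note is the NotFound() selector quoted above.

```typescript
// Number of bytes in a 0x-prefixed hex string; throws on malformed input.
function byteLength(hex: string): number {
  if (!/^0x([0-9a-fA-F]{2})*$/.test(hex)) {
    throw new Error(`not a hex byte string: ${hex}`);
  }
  return (hex.length - 2) / 2;
}

type Shape =
  | "empty"               // 0x
  | "bare-selector"       // exactly 4 bytes, e.g. an error with no arguments
  | "word-aligned"        // multiple of 32 bytes, the usual ABI-encoded return data
  | "selector-plus-words" // 4 + 32n bytes, e.g. an ABI-encoded error with arguments
  | "other";

function classifyResponse(hex: string): Shape {
  const n = byteLength(hex);
  if (n === 0) return "empty";
  if (n === 4) return "bare-selector";
  if (n % 32 === 0) return "word-aligned";
  if (n % 32 === 4) return "selector-plus-words";
  return "other";
}
```

For example, classifyResponse("0xc5723b51") yields "bare-selector", while a bytes4 return value padded to 32 bytes yields "word-aligned", so the two cannot be confused by length alone.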

taytems · October 30, 2024, 12:00am

The general idea here seems pretty useful in allowing the contract to be aware of and control the response in the case of an error. I’ve run into the issue before (specifically with the UniversalResolver sender).

I’d note that the proposed solution would make it harder to debug a faulty endpoint, since an HTTP error is no longer surfaced as the error that gets thrown, but is instead passed to the contract’s handling of OffchainLookupUnanswered(). That isn’t crucial though, and could be fixed with better logging, etc.

I don’t really see how this is an improvement over the pending UniversalResolver v3 changes. The implementation is simpler, yes, but it also provides a significantly smaller feature set without any of the existing edge case handling.

0xc0de4c0ffee · October 30, 2024, 4:16am

This is clearly a bug in the ERC-3668 spec…

A direct bug fix in ERC-3668 is possible.

We’re working on a new ERC-7700 draft to wrap all types of cross-chain & dweb storage providers; @NameSys will push that new version soon.
The basic outline is to wrap everything with a single new revert selector, e.g. (bytes4(erc7700.selector)+bytes4(erc3668.selector)+…data), adding proper interface checks for read, write & callback usage, auto retry of the next gateway after 4xx & 5xx, and support for ipfs://, bzz://, ipns:// types in gateways, if the 3668 revert data is wrapped in the 7700 selector.

nick.eth · November 6, 2024, 4:46pm

I don’t think this is true? The callback expects the response from the CCIP-read function, which will not usually be calldata.

Sending errors to contracts seems like a very odd change in the usual flow, and not something I would expect to happen, either.

I think this amounts to ignoring the status code of the response as long as it contains a valid body? I think that’s a bad idea.

raffy · November 7, 2024, 7:28pm

Okay, not calldata, but an abi-encoded response, which should be word-aligned. My repo implements this solution and I don’t see any issue disambiguating a response from an error. It’s the same as returning an error over multicall, as we’ve discussed previously.

My design only needs to encode 1 signal value: OffchainLookupUnanswered(). If I can’t assume this is word-aligned, there are other ways of communicating this information to the contract (see below).

It’s perfectly fine if it doesn’t do this, but that just means that either the status 2XX or the response is ignored. Servers would instead indicate an error to the contract by responding 200. These are equivalent to me.

I just think that the contract should dictate which response is accepted and ultimately decide what to do if no response is sufficient.

Recall, with this change, the contract’s callback:

1. can revert OffchainTryNext(sender) to reject a response w/o terminating the OffchainLookup session.
2. may be supplied bytes that are not word-aligned (this can already happen).
3. will be supplied OffchainLookupUnanswered() if no response is accepted.
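
The client-side flow implied by these points can be sketched as follows. This is a simplified model rather than the implementation from the repo mentioned above: the callback is represented as a plain TypeScript function, CallbackResult and runSession are hypothetical names, and UNANSWERED is a placeholder value standing in for the encoded OffchainLookupUnanswered() error, whose real selector is not given in this thread.

```typescript
// What the contract's callback does with one candidate response.
type CallbackResult =
  | { kind: "accept"; value: string }  // callback returned normally
  | { kind: "tryNext" }                // revert OffchainTryNext(sender)
  | { kind: "fatal"; reason: string }; // any other revert ends the session

type Callback = (response: string) => CallbackResult;

// Placeholder only: stands in for the bytes of OffchainLookupUnanswered().
const UNANSWERED = "0xffffffff";

// Offer each gateway response to the callback in order. A "tryNext" result
// moves on without terminating the session; anything else ends it.
function runSession(responses: string[], callback: Callback): CallbackResult {
  for (const response of responses) {
    const result = callback(response);
    if (result.kind !== "tryNext") return result;
  }
  // No response was accepted: tell the contract and let it decide.
  const last = callback(UNANSWERED);
  return last.kind === "tryNext"
    ? { kind: "fatal", reason: "no gateways left to try" }
    : last;
}
```

A callback that does not recognise the unanswered signal would typically fail to decode it and revert, which maps to the "fatal" case and matches today's behaviour.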

Taking your feedback, 2 and 3 could be replaced with: if the hash of the response bytes is exactly 0x... then the CCIP-Read client is indicating OffchainLookupUnanswered(). Ideally, that response would be less than 32 bytes, causing any abi.decode to blow up, and therefore be backwards-compatible (since any existing CCIP-Read session that got no response is currently fatal.)

nick.eth · November 8, 2024, 3:00pm

Okay. Still not a fan, though I see your point. It certainly shouldn’t do this unless it can detect that the contract definitely implements this new standard, though.

I’m not really sure what you mean by this.

Why not have 400s and 500s return an error in the same way you document above, with the response embedded?

raffy · November 8, 2024, 8:55pm

This also seems fine but I’m not sure what the contract would do with a human-readable error.

For an error, I see 3 possible responses:

1. 400 “bad request”
2. 400: {"data": "0x..."} (my suggestion: just ignore the status code and propagate whatever it sent, letting the contract decide)
3. 200: {"data": "0x..."} (your suggestion, I thought: a standard abi-encoded response that encodes that there was an error, which the contract can decode, but which the HTTP layer isn’t aware of)
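
To make the difference between these cases concrete, here is a small TypeScript sketch of how a client might decide what to hand to the contract. It is illustrative only: extractData and the two policy names are hypothetical, "strict" simply means that only 2XX bodies are used, and "ignore-status" models the suggestion of propagating any well-formed {"data": "0x..."} body.

```typescript
type Policy = "strict" | "ignore-status";

// Returns the hex payload to pass to the contract's callback, or null when
// nothing usable was received and the client should move on (or fail).
function extractData(status: number, body: string, policy: Policy): string | null {
  const ok = status >= 200 && status < 300;
  if (policy === "strict" && !ok) return null;

  let parsed: unknown;
  try {
    parsed = JSON.parse(body);
  } catch {
    return null; // plain-text body such as "bad request"
  }
  if (typeof parsed !== "object" || parsed === null) return null;

  const data = (parsed as { data?: unknown }).data;
  const isHex = typeof data === "string" && /^0x([0-9a-fA-F]{2})*$/.test(data);
  return isHex ? (data as string) : null;
}
```

Under "ignore-status", a 400 with {"data": "0xc5723b51"} reaches the contract exactly like a 200 with the same body; under "strict" it does not.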

My thinking was, unless the server returns {"data": "0x...."} the contract can’t really digest it. It could wrap it in Error(string) but what’s the contract going to do with it?

Currently, if it’s 4XX, the 3668 session just terminates. Instead of silently ignoring the error, it seemed useful (if it was properly formatted) to relay that information to the contract.

I’m not against an HTTP status code wrapper, if that’s what you’re suggesting, it just seems like if the server is functioning, it should be relaying errors via {"data": "0x..."} rather than status code. If the server isn’t functioning, likely the error (and body) aren’t usable by the contract.

For example, with a trusted gateway, if the error isn’t signed, why would you trust it? And for an untrusted gateway, why trust anything (including the status code) about the response?

nick.eth · November 11, 2024, 12:56pm

I am not a fan of any solution that involves ignoring the status code, or sending 200s when it’s actually an error.

What I was suggesting is that if you want to be able to handle errors in the contract, you can pass them in as error objects containing the data decoded from the response (if data could be decoded from the response).

What’s the alternative to trusting the error? The process can’t proceed further without a valid response.

0xc0de4c0ffee · November 11, 2024, 1:13pm

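// Try each gateway in order until one responds with 200.
for (const url of gateways) {
    const res = await fetch(url)
    if (res.status === 200) break
}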

Both 4XX & 5XX errors should automatically fall back to the next gateway in the list.
With the current spec, if the first gateway throws a 4xx, the whole lookup process stops.
This matters even more because we’re selecting random gateways / shuffling the gateway list.

We also need this if we want to add data URIs such as data:application/json;charset=utf-8,%7B%22data%22%3A%220xfffffffff%22%7D as the 2nd gateway, so the callback can tell that the 1st gateway is throwing 4xx/5xx.
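
A minimal TypeScript sketch of that fallback, assuming a client that treats any non-200 reply as "try the next gateway" and resolves data URIs locally. decodeDataUrl and firstReply are hypothetical helpers, only percent-encoded (non-base64) data URIs are handled, and the HTTP call is a synchronous stand-in for a real request.

```typescript
type GatewayReply = { status: number; body: string };
type HttpGet = (url: string) => GatewayReply; // stand-in for a real (async) HTTP call

// Decode a percent-encoded data: URI into its payload text.
// Returns null for anything that is not a data: URI or that uses base64.
function decodeDataUrl(url: string): string | null {
  if (url.slice(0, 5) !== "data:") return null;
  const comma = url.indexOf(",");
  if (comma < 0) return null;
  const header = url.slice(5, comma);
  if (header.split(";").indexOf("base64") >= 0) return null; // not handled here
  return decodeURIComponent(url.slice(comma + 1));
}

// Try each gateway in order; any non-200 reply falls through to the next one.
function firstReply(urls: string[], httpGet: HttpGet): string | null {
  for (const url of urls) {
    const inline = decodeDataUrl(url);
    if (inline !== null) return inline; // a data: URI always "answers"
    const reply = httpGet(url);
    if (reply.status === 200) return reply.body;
  }
  return null; // every gateway failed
}
```

With the list [https gateway, data URI], a 404 from the first gateway means the callback receives the JSON embedded in the data URI, which it can recognise as the "1st gateway failed" marker.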

raffy · November 11, 2024, 5:54pm

The feature I want is to allow untrusted/unreliable gateways in the gateway set.

To allow that, the arbiter of truth has to be the contract, since a malicious gateway can just fake its response and status code.

This implies that gateway errors at the HTTP level are meaningless to the protocol, so I concluded the status code can just be ignored, and all that matters is obtaining a correctly encoded response that the contract accepts, otherwise the call is a failure.

I say “ignore” because it seems like nice developer UX if gateway errors also set their status code:

If a signing key is involved, errors should be signed. Any HTTP failure simply moves on to the next endpoint.

For a proving gateway, the appropriate proofs need to be supplied even in the failure case, otherwise, why trust it? Linea’s gateway suffers from this problem: their gateway can claim any storage value doesn’t exist and their verifier accepts it without proof.

nick.eth · November 13, 2024, 10:38am

Ignoring HTTP status codes is generally a bad practice, though. I’m not sure why you object to just passing them through - along with response data - to the contract and letting it decide?

Right, but you still have to handle the case where an error response is returned without valid proof data.

raffy · November 13, 2024, 11:42pm

That’s ultimately handled by OffchainLookupUnanswered(): no gateway provided a response that the contract accepted.

(Note: this is equivalent to having the last endpoint being a data URL where the payload is some unique value the contract can detect.)

Individual errors for each gateway would come in two types:

Server errors: e.g. “unknown selector”, “unknown name”, “divide by zero”, etc., which could be signed or proven up to that point
- This only works if it’s properly encoded: {"data": "0x..."}
HTTP errors: e.g. “no response”, “server unavailable”

Neither of these should kill the gateway iteration unless the contract says so.

HTTP errors certainly could be wrapped with HTTPError(uint16 code, string message) or whatever, but I thought you were against passing error data to the contract.

I claim the common implementation would be the following since there’s not much you can do with that error:

if (bytes4(response) == HTTPError.selector) {
    revert OffchainTryNext(address(this));
}

Unaware contracts (except for pass-through) would fail to decode that response, revert, and terminate the process, which is the current behavior.


nick.eth · November 14, 2024, 11:13am

I questioned the approach, but you provided justification. I’m okay with it as long as there’s some detection mechanism so that 3668-only contracts won’t be sent error data they may not understand.

raffy · April 15, 2025, 3:02am

I’d like to recap this topic, as there is now a good solution to this problem that is both backwards compatible and does not require modifications to EIP-3668.

First, ENSIP-21: Batch Gateway Offchain Lookup Protocol (BGOLP) is now standard. It codifies the Batch Gateway implementation utilized by the Universal Resolver.

With that standardized, local BGOLPs can be embedded into client-side frameworks (ethers, viem, etc.). ENSIP-21 defines x-batch-gateway:true as the special-purpose URL which indicates to an aware client that (1) the request is BGOLP and (2) the rest of the gateways can be ignored and the local gateway can handle the request.
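
The routing decision described here can be sketched as below. Only the x-batch-gateway:true URL comes from ENSIP-21 as summarised above; planRequest, the Plan type and the choice to drop the special URL for clients without a local gateway are assumptions made for illustration, not how ethers or viem necessarily implement it.

```typescript
const BATCH_GATEWAY_URL = "x-batch-gateway:true"; // special-purpose URL from ENSIP-21

type Plan =
  | { kind: "local" }                   // handle the batch request in-process
  | { kind: "remote"; urls: string[] }; // use the ordinary HTTP gateways

// Decide how to serve an OffchainLookup given its URL list.
function planRequest(urls: string[], hasLocalBatchGateway: boolean): Plan {
  const remote = urls.filter((u) => u !== BATCH_GATEWAY_URL);
  const signalled = remote.length !== urls.length;
  if (hasLocalBatchGateway && signalled) {
    return { kind: "local" }; // the remaining gateways can be ignored
  }
  // Assumption: a client without a local gateway skips the special URL
  // and falls back to whatever external gateways were listed.
  return { kind: "remote", urls: remote };
}
```

This is why an external gateway still needs to be listed for backwards compatibility: a client that cannot serve the request locally has nothing else to fall back on.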

viem just merged an update that has a built-in BGOLP. viem already uses the Universal Resolver internally, and now when you call any ENS function in viem and don’t specify custom gateway(s), the local BGOLP handles the request automatically.

Second, at the bottom of ENSIP-21, I outlined a technique for fault tolerance. To be fully backwards compatible, you need to provide an external gateway too (to cover the case where the caller doesn’t have a local BGOLP), but you can now speculatively evaluate offchain servers that may be offline or failing.

Third, @ensdomains/ens-contracts/contracts/ccipRead/ directory contains some useful helper code:

CCIPReader.sol makes it simple to call any CCIP-Read enabled contract.
- ccipRead() is pretty straightforward to understand, it’s just staticcall() + callback.
CCIPBatcher.sol codifies interaction with a BGOLP.
- ccipBatch() is more complicated as you provide an array of (target, call) and then ccipRead() the ccipBatch() function. There probably should be a helper function so you can just invoke ccipBatch() directly.

See the new UniversalResolver.sol or AliasResolver.sol for more examples.

It might make sense to embed a BGOLP implementation into the standard offchain server.

If this is useful for non-ENS applications, it can be turned into an EIP.