[Draft] ENSIP-17: DataURI Format in Contenthash

Hello ENS,

NameSys is proposing an extension to ENSIP-07 that introduces the data:uri format in the ENS Contenthash field, allowing dynamic data streaming using CCIP-Read and Wildcard Resolution. So far, ENS Contenthash has only allowed static content by linking to decentralised hosting such as IPFS or Arweave. The proposed improvement will further enhance the utility of ENS domains by enabling a rich ecosystem of dynamic content in the Contenthash. Please feel free to read through the draft, ask questions, seek clarifications, give suggestions or propose edits.

PR for this proposal lives here: [Proposal] ENSIP-17: DataURI Format in Contenthash by sshmatrix · Pull Request #165 · ensdomains/docs · GitHub


ENSIP-17: DataURI Format in Contenthash

RFC-2397 Compliant DataURI Format in Contenthash

   
Author(s): sshmatrix.eth, ethlimo.eth, freetib.eth
Status: Draft
Submitted: 2023-10-31

Abstract

This ENSIP introduces DataURI format in Contenthash field (ENSIP-07) for compatible ENS resolvers. DataURI format (RFC-2397) is desired and suitable for enabling dynamic dWeb content for ENS domains using on-chain and/or off-chain resources.

Motivation

ENS contenthash (ENSIP-07) currently enables linking to static content which is strictly off-chain. The off-chain content is entirely dependent on off-chain providers, and updating this content for ENS-based decentralised websites typically requires updating the on-chain contenthash explicitly (except for IPNS). ENS domains’ avatar text records and their ERC-721/-1155 interfaces already support generated DataURI bytes (data:uri) to resolve JSON and image metadata. This specification enables a similar data:uri format in the ENS contenthash field, allowing ENS Resolvers to fetch and serve on-chain and/or off-chain data. The off-chain resources for the DataURI content may use CCIP-Read and an appropriate utf-8 decoder to render the encoded bytes. This specification allows complete support for dynamic data in ENS Contenthash using CCIP-Read (EIP-3668) and Wildcard Resolution (ENSIP-10).

Specification

This specification is an extension of ENSIP-07 to support in-line bytes of data conforming to the data:uri scheme (RFC-2397) as ENS Contenthash. No changes are needed in the current ENS Resolvers, since contenthash bytes are parsed as utf-8 characters by default. Only a standardisation needs to be enacted for web3 providers to begin resolving ENS Contenthash in the data:uri scheme. The details of the proposed standardisation are as follows:

Decoded String

  • DataURI is string-formatted according to RFC-2397:
data:<media>/<type>;<encoding>,<payload>

Encoded Bytes

  • The raw string-formatted data is returned as hexadecimal-encoded bytes. The encoded value returned by a DataURI-compatible contenthash is always prefixed with the 5-byte identifier 0x646174613a, followed by the remaining variable-length data bytes.
stringToHex("data:") = 0x646174613a
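
As an illustration of the encoding rule above, here is a minimal JavaScript sketch that converts a data URI string to its hex-encoded bytes and back, checking for the 0x646174613a prefix. The names dataUriToHex and hexToDataUri are hypothetical helpers, not part of any ENS library; only the standard TextEncoder and TextDecoder globals are assumed.

```javascript
// Hex of the ASCII string "data:" (the 5-byte identifier from the spec).
const DATA_PREFIX = "646174613a";

// Encode a data URI string as 0x-prefixed hex of its utf-8 bytes.
function dataUriToHex(uri) {
  if (!uri.startsWith("data:")) throw new Error("not a data URI");
  const bytes = new TextEncoder().encode(uri);
  return "0x" + Array.from(bytes, (b) => b.toString(16).padStart(2, "0")).join("");
}

// Decode hex bytes back to a data URI string, rejecting other prefixes.
function hexToDataUri(hex) {
  const clean = hex.replace(/^0x/, "").toLowerCase();
  if (!clean.startsWith(DATA_PREFIX)) throw new Error("missing data: prefix");
  const bytes = Uint8Array.from(clean.match(/../g), (pair) => parseInt(pair, 16));
  return new TextDecoder().decode(bytes);
}

console.log(dataUriToHex("data:text/plain,Hello World"));
```

Calling hexToDataUri on the printed value should return the original string, so the two helpers round-trip.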

Examples

| Decoded String | Encoded Bytes |
| --- | --- |
| data:text/plain;base64,SGVsbG8gV29ybGQ | 0x646174613a746578742f706c61696e3b6261736536342c534756736247386756323979624751 |
| data:text/plain,Hello World | 0x646174613a746578742f706c61696e2c48656c6c6f20576f726c64 |
| data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAgAAAAIAQMAAAD+wSzIAAAABlBMVEX///+/v7+jQ3Y5AAAADklEQVQI12P4AIX8EAgALgAD/aNpbtEAAAAASUVORK5CYII | 0x646174613a696d6167652f706e673b6261736536342c6956424f5277304b47676f414141414e5355684555674141414167414141414941514d414141442b77537a4941414141426c424d5645582f2f2f2b2f76372b6a5133593541414141446b6c45515651493132503441495838454167414c6741442f614e7062744541414141415355564f524b3543594949 |
| data:image/svg+xml,<svgxmlns='http://www.w3.org/2000/svg'height='30'width='200'><textx='0'y='15'fill='red'>IamSVG</text></svg> | 0x646174613a696d6167652f7376672b786d6c2c3c737667786d6c6e733d27687474703a2f2f7777772e77332e6f72672f323030302f737667276865696768743d2733302777696474683d27323030273e3c74657874783d273027793d2731352766696c6c3d27726564273e49616d5356473c2f746578743e3c2f7376673e |
| data:text/xml,<?xml version='1.0'?><note>I am XML</note> | 0x646174613a746578742f786d6c2c3c3f786d6c2076657273696f6e3d27312e30273f3e3c6e6f74653e4920616d20584d4c3c2f6e6f74653e |
| data:text/html,Hello, <div>I am HTML</div> | 0x646174613a746578742f68746d6c2c48656c6c6f2c203c6469763e4920616d2048544d4c3c2f6469763e |

With this simple standardisation, web3 providers may now serve data:uri content from on-chain or off-chain resources allowing dynamic content on ENS dWebsites.

Implementation

GitHub : namesys-eth/datauri-eth-resolver (Work-In-Progress)

References

[1] ENSIP-07: Contenthash Field

[2] ENSIP-10: Wildcard Resolution

[3] EIP-3668: CCIP Read: Secure Off-Chain Data Retrieval

[4] RFC-2397: The “data” URL Scheme

Copyright

Copyright and related rights waived via CC0.



This is the type of EIP that we love to see and support wholeheartedly :rocket:


contenthash has been coded as uvarint(proto) + payload... so shouldn’t this convention be followed?

0xE3 = ipfs
0xE4 = swarm
0xE5 = ipns

I guess defining proto 0x64 (d) as a DataURL would work as-is, but it wouldn’t have a known length so it’s not embeddable without an external wrapper (although I guess that’s not a requirement of multicodec.)

The following would avoid the base64 overhead:

  uvarint(0xDD) 
+ uvarint(len(mime)) + mime utf8 bytes // eg. "text/html"
+ uvarint(len(payload)) + payload bytes 

Although for simplicity, I’m a fan of bypassing the multicodec stuff (as long as the first uvarint decodes correctly) and just embedding raw utf-8 data.
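
A rough JavaScript sketch of the length-prefixed layout above, for illustration only. The 0xDD code is the placeholder value used in this post and is not a registered multicodec entry; uvarint and encodeInline are hypothetical helper names.

```javascript
// Unsigned LEB128 varint, as used by multiformats: 7 bits per byte,
// high bit set on every byte except the last.
function uvarint(n) {
  const out = [];
  while (n >= 0x80) {
    out.push((n % 128) | 0x80);
    n = Math.floor(n / 128);
  }
  out.push(n);
  return out;
}

// uvarint(0xDD) + uvarint(len(mime)) + mime + uvarint(len(payload)) + payload
function encodeInline(mime, payload) {
  const mimeBytes = new TextEncoder().encode(mime);
  return Uint8Array.from([
    ...uvarint(0xdd), // placeholder protocol code (assumption)
    ...uvarint(mimeBytes.length), ...mimeBytes,
    ...uvarint(payload.length), ...payload,
  ]);
}

const encoded = encodeInline("text/html", new TextEncoder().encode("hi"));
console.log(Buffer.from(encoded).toString("hex"));
```

Because the payload is stored as raw bytes with an explicit length, no base64 step is needed and the value has a known end.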


Should it also support URLs for 30X redirection?

uvarint(0xDD) + uvarint(0/*url*/) + uvarint(19) + "https://ens.domains"

:pray: @raffy, thanks for the feedback.

The multicodec table already lists plaintextv2, with hex("pla") = 0x706c61 as its multiaddr prefix:

plaintextv2, multiaddr, 0x706c61, draft

We tried the 0xe2 IPLD namespace before, with a PR on ens/content-hash.js, then abandoned it for now to work on simpler specs… Requesting that direct CAR files/data be included is a more complicated option. As alternatives, we also tried some old on-chain IPFS generators in the ENS resolver, but they require external services to manually read the on-chain data during CCIP-Read and pin it so that it is semi-dynamically resolvable (*not really scalable).

So we’re proposing the hex(“data:”) prefix as a simple option that is backwards compatible with the data URIs used in NFTs/avatars. E.g., the contenthash for 1234.hello-nft.eth can resolve bytes(tokenURI(1234)) directly as data:application/json,{...metadata}.

It’s a good idea to request that hex(“data:”) be included in multicodec soon, but for now we’re trying to use the default/fallback profile in ens-contenthash.js.


There’s an old alternative for that; uniswap.eth has been using it for years…
** We really don’t recommend using old stuff, but it works.
E.g.,
base16 = f0172000f6170702e756e69737761702e6f7267
base58 = 12uA8M8Ku8mHUumxHcu7uee
base32 = bafzaad3bobyc45lonfzxoylqfzxxezy

01-72-00-0f-6170702e756e69737761702e6f7267
version - libp2p - identity - length - hex(“app.uniswap.org”)

  • edit: For ENS, this should additionally be prefixed with namespace + length… It comes with a deprecation warning. 0xe5010172000f6170702e756e69737761702e6f7267

Delighted to see this - but as I mentioned in the PR, and @raffy observes, this definitely needs to be encoded as a valid multicodec value.


:pray: Thanks for the feedback!

We have looked into possible ways to make this draft ENSIP compatible with multicodec. Our findings are presented below as different implementations, with and without multicodec. We are open to any of these implementations and will update this draft ENSIP as required.

A) Bypass Multicodec:

First, we’d like to point to the current state of ens/content-hash.js. When using hex("data:") = 0x646174613a as the prefix, encoding doesn’t work, and the decoder nearly works but removes the first byte in the process. Please see the example below:

import {encode, decode } from "@ensdomains/content-hash";
console.log(encode("data:text/plain;base64,SGVsbG8gV29ybGQ"))
//> 00000000000000000000

console.log(decode("646174613a746578742f706c61696e3b6261736536342c534756736247386756323979624751"))
//> ata:text/plain;base64,SGVsbG8gV29ybGQ

An extra 0x00 prefix identifier, acting as a spacer/pseudo-namespace, lets the decoder return the full string and could prevent any future collision with multicodec:

console.log(decode("00646174613a746578742f706c61696e3b6261736536342c534756736247386756323979624751"))
//> data:text/plain;base64,SGVsbG8gV29ybGQ

Quoting @raffy here,

This will bypass multicodec for DataURIs in Contenthash. ens/content-hash.js and any gateways/clients can easily implement this with basic checks, i.e. checking whether the prefix is 0x00646174613a before encoding and decoding, so that ENS clients and gateways can use DataURIs directly without leaving any room for current or future collisions with multicodec formats. This approach will be ENS-specific, and we can change our ENSIP draft to reflect this.
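
The basic check described above might look like the following in Node.js. This is only a sketch, assuming the contenthash is handled as a hex string; encodeSpacedDataUri and decodeSpacedDataUri are hypothetical names and are not functions of ens/content-hash.js.

```javascript
// 0x00 spacer followed by hex("data:").
const SPACER_PREFIX = "00646174613a";

// Prepend the 0x00 spacer to the utf-8 bytes of a data URI.
function encodeSpacedDataUri(uri) {
  if (!uri.startsWith("data:")) throw new Error("not a data URI");
  return "0x00" + Buffer.from(uri, "utf8").toString("hex");
}

// Return the data URI string, or null when the contenthash uses another
// format and should go through the regular multicodec path instead.
function decodeSpacedDataUri(contenthash) {
  const hex = contenthash.replace(/^0x/, "").toLowerCase();
  if (!hex.startsWith(SPACER_PREFIX)) return null;
  return Buffer.from(hex.slice(2), "hex").toString("utf8");
}

console.log(decodeSpacedDataUri(encodeSpacedDataUri("data:text/plain,Hello World")));
```

Returning null rather than throwing lets a client fall back to its existing IPFS, IPNS or Swarm handling when the prefix does not match.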

:white_check_mark: This is our preferred implementation but we are not married to it.

B) Multicodec Compatible Formats

If multicodec must be used, then we’d like to propose the following options:

1) raw data type with IPFS namespace:

IPFS namespace is compatible with DataURIs using raw data type.

import { CID } from 'multiformats/cid'
import { identity } from 'multiformats/hashes/identity'
import * as raw from "multiformats/codecs/raw";
const utf8 = new TextEncoder();

let data = utf8.encode('data:text/plain;base64,SGVsbG8gV29ybGQ')
let cid = CID.create(1, raw.code, await identity.digest(data))

IPFS Format :

base32: bafkqajtemf2gcotumv4hil3qnrqws3r3mjqxgzjwgqwfgr2wonreoodhkyzds6lci5iq
base16: f01550026646174613a746578742f706c61696e3b6261736536342c534756736247386756323979624751
Contenthash: 0xe30101550026646174613a746578742f706c61696e3b6261736536342c534756736247386756323979624751
| Namespace | Version | Multiaddr | Multihash | Length | Data |
| --- | --- | --- | --- | --- | --- |
| ipfs | 1 | raw | identity | 38 | data:text/plain;base64,SGVsbG8gV29ybGQ |
| 0xe301 | 0x01 | 0x55 | 0x00 | 0x26 | 0x646174613a746578742f706c61696e3b6261736536342c534756736247386756323979624751 |

CID Inspector : https://cid.ipfs.tech/#bafkqajtemf2gcotumv4hil3qnrqws3r3mjqxgzjwgqwfgr2wonreoodhkyzds6lci5iq

Public Gateway : https://ipfs.io/ipfs/bafkqajtemf2gcotumv4hil3qnrqws3r3mjqxgzjwgqwfgr2wonreoodhkyzds6lci5iq

Since this method uses the IPFS namespace, ens/content-hash.js and any compatible gateway or client must check whether the encoded payload uses raw as the multicodec with identity (blank) as the multihash; the shorthand prefix for this is 0xe301015500. Clients can decode the remaining raw data as a utf-8 string. If this data is not data:uri formatted, it should be auto-rendered as plaintext; for correctly formatted data:uri payloads, clients and gateways can render according to the MIME type included in the DataURI payload.
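
A minimal Node.js sketch of that check, under the assumption that the contenthash arrives as a hex string. readUvarint and decodeInlineContenthash are hypothetical helpers written for this example; they are not part of ens/content-hash.js or multiformats.

```javascript
// Read one unsigned varint starting at `offset`; returns [value, nextOffset].
function readUvarint(bytes, offset) {
  let value = 0;
  let shift = 0;
  let i = offset;
  while (true) {
    if (i >= bytes.length) throw new Error("truncated varint");
    const b = bytes[i++];
    value += (b & 0x7f) * 2 ** shift;
    if ((b & 0x80) === 0) return [value, i];
    shift += 7;
  }
}

// Returns the inlined utf-8 string, or null if the contenthash is not
// ipfs namespace + CIDv1 + raw codec + identity multihash.
function decodeInlineContenthash(contenthash) {
  const bytes = Buffer.from(contenthash.replace(/^0x/, ""), "hex");
  let pos = 0;
  for (const want of [0xe3, 0x01, 0x55, 0x00]) {
    let got;
    [got, pos] = readUvarint(bytes, pos);
    if (got !== want) return null;
  }
  let len;
  [len, pos] = readUvarint(bytes, pos);
  if (pos + len !== bytes.length) throw new Error("length mismatch");
  return new TextDecoder().decode(bytes.subarray(pos, pos + len));
}

const text = decodeInlineContenthash(
  "0xe30101550026646174613a746578742f706c61696e3b6261736536342c534756736247386756323979624751"
);
// A client would render by MIME type if this is a data URI, else as plaintext.
console.log(text, text.startsWith("data:") ? "(data URI)" : "(plaintext)");
```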

2) plaintextv2 data with IPFS or IPLD namespace:

This is similar to the previous option but using plaintextv2 instead of raw as multicodec.

import { CID } from 'multiformats/cid'
import { identity } from 'multiformats/hashes/identity'
const utf8 = new TextEncoder();

let data = utf8.encode('data:text/plain;base64,SGVsbG8gV29ybGQ');
let cid = CID.create(1, 0x706c61, await identity.digest(data))

plaintextv2 format using IPFS namespace

Base32 : bahq5rqidaatgiylume5hizlyoqxxa3dbnfxdwytbonstmnbmkndvm43ci44govrshf4wer2r
Base16 : f01e1d8c1030026646174613a746578742f706c61696e3b6261736536342c534756736247386756323979624751
Contenthash : 0xe30101e1d8c1030026646174613a746578742f706c61696e3b6261736536342c534756736247386756323979624751

| Namespace | Version | Multiaddr | Multihash | Length | Data |
| --- | --- | --- | --- | --- | --- |
| ipfs | 1 | plaintextv2 | identity | 38 | data:text/plain;base64,SGVsbG8gV29ybGQ |
| 0xe301 | 0x01 | 0xe1d8c103 | 0x00 | 0x26 | 0x646174613a746578742f706c61696e3b6261736536342c534756736247386756323979624751 |

CID Inspector : https://cid.ipfs.tech/#bahq5rqidaatgiylume5hizlyoqxxa3dbnfxdwytbonstmnbmkndvm43ci44govrshf4wer2r

Public Gateway : https://ipfs.io/ipfs/bahq5rqidaatgiylume5hizlyoqxxa3dbnfxdwytbonstmnbmkndvm43ci44govrshf4wer2r

plaintextv2 format using IPLD namespace

Base32 : bahq5rqidaatgiylume5hizlyoqxxa3dbnfxdwytbonstmnbmkndvm43ci44govrshf4wer2r
Base16 : f01e1d8c1030026646174613a746578742f706c61696e3b6261736536342c534756736247386756323979624751
Contenthash : 0xe20101e1d8c1030026646174613a746578742f706c61696e3b6261736536342c534756736247386756323979624751

| Namespace | Version | Multiaddr | Multihash | Length | Data |
| --- | --- | --- | --- | --- | --- |
| ipld | 1 | plaintextv2 | identity | 38 | data:text/plain;base64,SGVsbG8gV29ybGQ |
| 0xe201 | 0x01 | 0xe1d8c103 | 0x00 | 0x26 | 0x646174613a746578742f706c61696e3b6261736536342c534756736247386756323979624751 |

CID Inspector : https://cid.ipfs.tech/#bahq5rqidaatgiylume5hizlyoqxxa3dbnfxdwytbonstmnbmkndvm43ci44govrshf4wer2r

Public Gateway : https://dweb.link/api/v0/dag/get?arg=bahq5rqidaatgiylume5hizlyoqxxa3dbnfxdwytbonstmnbmkndvm43ci44govrshf4wer2r

:exclamation: NOTE: plaintextv2 is still in draft, and IPFS gateways CANNOT yet render it properly, resulting in a 500 error. Trying to use it as IPLD might require changing the encoding process too. We do not prefer this method.

C) CARv1 strings

CARv1 files as strings can represent IPFS data or IPLD files and directories, but this implementation is more complex than the previous options, so we’ll only mention it as a footnote. We don’t have the bandwidth to implement this. If ENS devs are happy to explore this for a future implementation, it’ll be one of the best options for fully on- or off-chain generators and IPFS data storage.

Based on this, we are happy to get more feedback and then make changes to the draft ENSIP! :pray:

This is a cross-post from GitHub


I’m confused why IPFS is involved at all. Why can’t you either use an existing multicodec identifier or define a new one for URIs?

We had thought about the option of a new namespace in our draft but we skipped it since IPFS/IPLD namespace with plaintext or raw encoded payload contained within the CID is sufficiently unique, and backwards compatible with IPFS gateways returning plaintext data.

Proposed IPFS (raw-ipld/plaintext) : 0xe301015500 + <data.length> + <data>
  Normal IPFS      (dag-pb/sha256) : 0xe301017012 + <hash.length> + <hash of data or dag>

However, we’re open to requesting a new ENS-specific namespace for this ENSIP only. In this regard, please suggest a short code (>=2 bytes) for this and we’ll open a PR against the multicodec table soon. Something like:

  • hex('ens') = 0x656e73 sounds like a good option to us; equivalent namespace is VARINT(0x656e73) = 0xf3dc9503
  • Non-ASCII option 0xda7a is also good, which will lead to a VARINT(0xda7a) = 0xfab403 namespace
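
The varint values quoted above can be reproduced with a few lines of JavaScript. uvarintHex is a hypothetical helper implementing the unsigned LEB128 encoding used by multiformats; it is shown only to make the arithmetic checkable.

```javascript
// Encode a non-negative integer as an unsigned varint and return it as hex.
// Each byte carries 7 bits, least-significant group first; the high bit
// marks "more bytes follow".
function uvarintHex(n) {
  let hex = "";
  while (n >= 0x80) {
    hex += ((n % 128) | 0x80).toString(16);
    n = Math.floor(n / 128);
  }
  return "0x" + hex + n.toString(16).padStart(2, "0");
}

console.log(uvarintHex(0x656e73)); // 0xf3dc9503, i.e. hex('ens')
console.log(uvarintHex(0xda7a));   // 0xfab403
console.log(uvarintHex(0x706c61)); // 0xe1d8c103, plaintextv2
console.log(uvarintHex(0xe3));     // 0xe301, the ipfs namespace
```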

With the raw multiaddr, the two options above would be encoded as follows:

| | Namespace | Version | Multiaddr | Multihash | Length | Example |
| --- | --- | --- | --- | --- | --- | --- |
| | | 1 | raw | identity | 38 | data:text/plain;base64,SGVsbG8gV29ybGQ |
| 0xda7a | 0xfab403 | 0x01 | 0x55 | 0x00 | 0x26 | 0x646...751 |
| hex('ens') | 0xf3dc9503 | 0x01 | 0x55 | 0x00 | 0x26 | 0x646...751 |

Please suggest other options besides these two!

There are currently no DataURI-related multiaddrs or namespaces, and we do not want to introduce one in this context due to a lack of manpower and funding to follow up on sidequests. The DataURI class is too broad, and it would also require mime/type codecs, which have been pending in issues and PRs for a very long time. See below:

feat: assign codes for MIME types by Stebalien · Pull Request #159 · multiformats/multicodec · GitHub

Mimetypes as codes · Issue #4 · multiformats/multicodec · GitHub

What do you mean by the last part? I don’t understand how using an ipld or ipfs content identifier format is ‘backwards compatible’ with IPFS gateway return data, which is something entirely different.

This shouldn’t be ENS-specific; you just need a multicodec code that represents ‘the encoded data is a URI of some kind’. URIs don’t have mimetypes, so that shouldn’t be an issue.

We are convinced that we need to request a new multiaddr and multicodec first. We’ll close this ENSIP since we’ve discovered an alternative which is compatible with ENSIP-07 and serves our data:uri requirements for now :partying_face: We may revive this in the future when we have more resources at hand :pray: Feedback is much appreciated!
