[Draft] ENSIP-17: DataURI Format in Contenthash

NameSys · October 31, 2023, 7:27am

Hello ENS,

NameSys is proposing an extension to ENSIP-07 that introduces the data:uri format in the ENS Contenthash field, allowing dynamic data streaming using CCIP-Read and Wildcard Resolution. So far, ENS Contenthash only allows static content by linking to decentralised hosting such as IPFS, Arweave etc. This proposed improvement will further enhance ENS domains’ utility by enabling a rich ecosystem of dynamic content in the Contenthash. Please feel free to go through the draft and ask questions, seek clarifications, give suggestions or propose edits.

PR for this proposal lives here: [Proposal] ENSIP-17: DataURI Format in Contenthash by sshmatrix · Pull Request #165 · ensdomains/docs · GitHub

ENSIP-17: DataURI Format in Contenthash

RFC-2397 Compliant DataURI Format in Contenthash


Author(s)	sshmatrix.eth, ethlimo.eth, freetib.eth
Status	Draft
Submitted	`2023-10-31`

Abstract

This ENSIP introduces DataURI format in Contenthash field (ENSIP-07) for compatible ENS resolvers. DataURI format (RFC-2397) is desired and suitable for enabling dynamic dWeb content for ENS domains using on-chain and/or off-chain resources.

Motivation

ENS contenthash (ENSIP-07) currently enables linking to static content which is strictly off-chain. The off-chain content is entirely dependent on off-chain providers, and updating this content for ENS-based decentralised websites typically requires updating the on-chain contenthash explicitly (except for IPNS). ENS domains’ avatar text records and their ERC-721/-1155 interfaces already support generated DataURI bytes (data:uri) to resolve JSON and image metadata. This specification enables a similar data:uri format in the ENS contenthash field, allowing ENS Resolvers to fetch and serve on-chain and/or off-chain data. The off-chain resources for the DataURI content may use CCIP-Read and an appropriate utf-8 decoder to render the encoded bytes. This specification allows complete support for dynamic data in ENS Contenthash using CCIP-Read (EIP-3668) and Wildcard Resolution (ENSIP-10).

Specification

This specification is an extension of ENSIP-07 to support in-line bytes of data conforming to the data:uri scheme (RFC-2397) as ENS Contenthash. There are no changes to be made in the current ENS Resolvers since contenthash bytes are parsed as utf-8 characters by default. Only a standardisation needs to be enacted for web3 providers to begin resolving ENS Contenthash in data:uri scheme. Simple details of the proposed standardisation are as follows:

Decoded String

DataURI is string-formatted according to RFC-2397:

data:<media>/<type>;<encoding>,<payload>

Encoded Bytes

The raw string-formatted data is returned as encoded hexadecimal bytes. The encoded value returned by DataURI-compatible contenthash is always prefixed with the 5-byte identifier 0x646174613a followed by the remaining variable encoded databytes.

`stringTohex("data:")` = `0x646174613a`
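
As a minimal sketch of the encoding described above, the conversion between the decoded string and the encoded bytes could look like this in TypeScript. The helper names `encodeDataUriContenthash` and `decodeDataUriContenthash` are hypothetical and not part of any ENS library; the sketch only covers the bare `hex("data:")`-prefixed form proposed in this draft.

```typescript
// Hypothetical helpers for illustration; not part of any ENS library.
const DATA_URI_PREFIX = "0x646174613a"; // hex("data:")

// Encode a data URI string as 0x-prefixed hex of its UTF-8 bytes.
function encodeDataUriContenthash(uri: string): string {
  const bytes = new TextEncoder().encode(uri);
  let hex = "0x";
  for (let i = 0; i < bytes.length; i++) {
    hex += (bytes[i] < 16 ? "0" : "") + bytes[i].toString(16);
  }
  return hex;
}

// Decode the hex back to a string; returns null when the "data:" prefix is absent.
function decodeDataUriContenthash(hex: string): string | null {
  const h = hex.toLowerCase();
  if (h.slice(0, DATA_URI_PREFIX.length) !== DATA_URI_PREFIX) return null;
  const body = h.slice(2);
  if (body.length % 2 !== 0) return null; // malformed hex
  const bytes = new Uint8Array(body.length / 2);
  for (let i = 0; i < bytes.length; i++) {
    bytes[i] = parseInt(body.slice(2 * i, 2 * i + 2), 16);
  }
  return new TextDecoder().decode(bytes);
}
```

Checking for the prefix before decoding is what lets a client tell this format apart from the existing contenthash encodings.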

Examples

Decoded String	Encoded Bytes
`data:text/plain;base64,SGVsbG8gV29ybGQ`	`0x646174613a746578742f706c61696e3b6261736536342c534756736247386756323979624751`
`data:text/plain,Hello World`	`0x646174613a746578742f706c61696e2c48656c6c6f20576f726c64`
`data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAgAAAAIAQMAAAD+wSzIAAAABlBMVEX///+/v7+jQ3Y5AAAADklEQVQI12P4AIX8EAgALgAD/aNpbtEAAAAASUVORK5CYII`	`0x646174613a696d6167652f706e673b6261736536342c6956424f5277304b47676f414141414e5355684555674141414167414141414941514d414141442b77537a4941414141426c424d5645582f2f2f2b2f76372b6a5133593541414141446b6c45515651493132503441495838454167414c6741442f614e7062744541414141415355564f524b3543594949`
`data:image/svg+xml,<svgxmlns='http://www.w3.org/2000/svg'height='30'width='200'><textx='0'y='15'fill='red'>IamSVG</text></svg>`	`0x646174613a696d6167652f7376672b786d6c2c3c737667786d6c6e733d27687474703a2f2f7777772e77332e6f72672f323030302f737667276865696768743d2733302777696474683d27323030273e3c74657874783d273027793d2731352766696c6c3d27726564273e49616d5356473c2f746578743e3c2f7376673e`
`data:text/xml,<?xml version='1.0'?><note>I am XML</note>`	`0x646174613a746578742f786d6c2c3c3f786d6c2076657273696f6e3d27312e30273f3e3c6e6f74653e4920616d20584d4c3c2f6e6f74653e`
`data:text/html,Hello, <div>I am HTML</div>`	`0x646174613a746578742f68746d6c2c48656c6c6f2c203c6469763e4920616d2048544d4c3c2f6469763e`

With this simple standardisation, web3 providers may now serve data:uri content from on-chain or off-chain resources allowing dynamic content on ENS dWebsites.

Implementation

GitHub : namesys-eth/datauri-eth-resolver (Work-In-Progress)

References

[1] ENSIP-07: Contenthash Field

[2] ENSIP-10: Wildcard Resolution

[3] EIP-3668: CCIP Read: Secure Off-Chain Data Retrieval

[4] RFC-2397: The “data” URL Scheme

Copyright

Copyright and related rights waived via CC0.

Magnum.eth · October 31, 2023, 8:07pm

This is the type of EIP that we love to see and support wholeheartedly

raffy · November 1, 2023, 12:47am

contenthash has been coded as uvarint(proto) + payload... so shouldn’t this convention be followed?

0xE3 = ipfs
0xE4 = swarm
0xE5 = ipns

I guess defining proto 0x64 (d) as a DataURL would work as-is, but it wouldn’t have a known length so it’s not embeddable without an external wrapper (although I guess that’s not a requirement of multicodec.)

The following would avoid the base64 overhead:

  uvarint(0xDD) 
+ uvarint(len(mime)) + mime utf8 bytes // eg. "text/html"
+ uvarint(len(payload)) + payload bytes

Although for simplicity, I’m a fan of bypassing the multicodec stuff (as long as the first uvarint decodes correctly) and just embedding raw utf-8 data.

Should it also support URLs for 30X redirection?

uvarint(0xDD) + uvarint(0/*url*/) + uvarint(19) + "https://ens.domains"

0xc0de4c0ffee · November 1, 2023, 3:07am

@raffy, thanks for feedback.

There’s plaintextv2 as hex("pla") = 0x706c61 multiaddr prefix in multicodec.

522	plaintextv2	multiaddr	0x706c61	draft

We tried 0xe2 IPLD before, with a PR on ens/content-hash.js, then abandoned it for now to work on simpler specs… Requesting that direct CAR files/data be included is a more complicated option. As alternatives, we also tried some old on-chain IPFS generators in the ENS resolver, but those require external services to manually read the data from on-chain during ccip-read and pin it so that it is semi-dynamically resolvable (*not really scalable).

So we’re proposing hex(“data:”) prefix to be simple and backwards compatible with data uris used in NFTs/avatar. eg, contenthash for 1234.hello-nft.eth can resolve bytes(tokenURI(1234)) directly as data:application/json,{...metadata}.

It’s a good idea to request that hex(“data:”) be included in multicodec soon, but for now we’re trying to use the default/fallback profile in ens-contenthash.js

github.com/ensdomains/content-hash

src/profiles.ts

e3f00395e


      
              decode: decodes.ipfs,
            },
            ipns: {
              encode: encodes.ipns,
              decode: decodes.ipns,
            },
            arweave: {
              encode: encodes.arweave,
              decode: decodes.base64,
            },
            default: {
              encode: encodes.utf8,
              decode: decodes.utf8,
            },
          } as const;

0xc0de4c0ffee · November 1, 2023, 3:40am

There’s old alternative for that, uniswap.eth is still using it for years…
** really don’t recommend using old stuffs, but it works.
eg,
base16 = f0172000f6170702e756e69737761702e6f7267
base58 = 12uA8M8Ku8mHUumxHcu7uee
base32 = bafzaad3bobyc45lonfzxoylqfzxxezy

01-72-00-0f-6170702e756e69737761702e6f7267
version - libp2p - identity - length - hex(“app.uniswap.org”)

edit: For ENS this should be extra prefixed with namespace + length… Comes with deprecated warning. 0xe5010172000f6170702e756e69737761702e6f7267

github.com/ensdomains/content-hash

src/index.test.ts

e3f00395e


      
            test("legacy PeerID => CIDv1", () => {
              expect(decode(ipns_peerID_B58_contentHash)).toEqual(ipns_CIDv1);
            });
            test("ED25519 => CIDv1 with libp2p-key codec", () => {
              expect(decode(ipns_ED25519_contentHash)).toEqual(ipns_libp2pKey_CIDv1);
            });
            test("DNSLINK", () => {
              // DNSLink is fine to be used before ENS resolve occurs, but should be avoided after
              // Context: https://github.com/ensdomains/ens-app/issues/849#issuecomment-777088950
              // For now, we allow decoding of legacy values:
              const deprecated_dnslink_contentHash =
                "e5010170000f6170702e756e69737761702e6f7267";
              const deprecated_dnslink_value = "app.uniswap.org";
          
              expect(decode(deprecated_dnslink_contentHash)).toEqual(
                deprecated_dnslink_value
              );
            });
          });
          test("onion", () => {
            expect(decode(onion_contentHash)).toEqual(onion);

nick.eth · November 2, 2023, 9:12am

Delighted to see this - but as I mentioned in the PR, and @raffy observes, this definitely needs to be encoded as a valid multicodec value.

NameSys · November 4, 2023, 12:17pm

Thanks for the feedback!

We have looked into possible ways for this draft ENSIP to be compatible with multicodec. These are our findings in form of different implementations with and without multicodec. We are open to either implementation in the end and update this draft ENSIP as required.

A) Bypass Multicodec:

First, we’d like to point to the current state of ens/content-hash.js. When using hex("data:") = 0x646174613a as prefix, encoding doesn’t work and the decoder nearly works but it removes the first byte in the process. Please see example below,

import {encode, decode } from "@ensdomains/content-hash";
console.log(encode("data:text/plain;base64,SGVsbG8gV29ybGQ"))
//> 00000000000000000000

console.log(decode("646174613a746578742f706c61696e3b6261736536342c534756736247386756323979624751"))
//> ata:text/plain;base64,SGVsbG8gV29ybGQ

The extra 0x00 prefix identifier as a spacer/pseudo namespace could prevent any future collision with multicodec.

console.log(decode("00646174613a746578742f706c61696e3b6261736536342c534756736247386756323979624751"))
//> data:text/plain;base64,SGVsbG8gV29ybGQ

Quoting @raffy here,

This will bypass multicodec for DataURIs in Contenthash. ens/content-hash.js and any gateways/clients can easily implement this with basic checks, i.e. checking for if prefix is 0x00646174613a before encoding and decoding so that ENS clients and gateways can use DataURIs directly without leaving any room for current or future collisions with multicodec formats. This approach will be ENS specific and we can change our ENSIP draft to reflect this.

This is our preferred implementation but we are not married to it.

B) Multicodec Compatible Formats

If multicodec must be used, then we’d like to propose the following options:

1) `raw` data type with IPFS `namespace`:

IPFS namespace is compatible with DataURIs using raw data type.

import { CID } from 'multiformats/cid'
import { identity } from 'multiformats/hashes/identity'
import * as raw from "multiformats/codecs/raw";
const utf8 = new TextEncoder();

let data = utf8.encode('data:text/plain;base64,SGVsbG8gV29ybGQ')
let cid = CID.create(1, raw.code, await identity.digest(data))

IPFS Format :

base32: bafkqajtemf2gcotumv4hil3qnrqws3r3mjqxgzjwgqwfgr2wonreoodhkyzds6lci5iq
base16: f01550026646174613a746578742f706c61696e3b6261736536342c534756736247386756323979624751
Contenthash: 0xe30101550026646174613a746578742f706c61696e3b6261736536342c534756736247386756323979624751

Namespace	Version	Multiaddr	Multihash	Length	Data
`ipfs`	`1`	`raw`	`identity`	`38`	`data:text/plain;base64,SGVsbG8gV29ybGQ`
`0xe301`	`0x01`	`0x55`	`0x00`	`0x26`	`0x646174613a746578742f706c61696e3b6261736536342c534756736247386756323979624751`

CID Inspector : https://cid.ipfs.tech/#bafkqajtemf2gcotumv4hil3qnrqws3r3mjqxgzjwgqwfgr2wonreoodhkyzds6lci5iq

Public Gateway : https://ipfs.io/ipfs/bafkqajtemf2gcotumv4hil3qnrqws3r3mjqxgzjwgqwfgr2wonreoodhkyzds6lci5iq

Since this method uses the IPFS namespace, ens/content-hash.js and any compatible gateway or client must check whether the encoded payload uses raw as multicodec with identity (blank) as multihash; the shorthand prefix for this is 0xe301015500. Clients can decode the remaining raw data as a utf-8 string. If this data is not data:uri formatted, it should be auto-rendered as plaintext; for a correctly formatted data:uri, clients and gateways can render it according to the mime type included in the DataURI payload.
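
The prefix check described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the ens/content-hash.js implementation: `decodeInlineRaw`, `hexToBytes` and `readUvarint` are hypothetical names, and the sketch assumes the identity multihash length is a standard unsigned varint followed by exactly that many payload bytes.

```typescript
// Hypothetical helpers for illustration only.
const RAW_IDENTITY_PREFIX = "e301015500"; // ipfs namespace, CIDv1, raw codec, identity multihash

function hexToBytes(hex: string): Uint8Array {
  if (hex.length % 2 !== 0) throw new Error("odd-length hex");
  const out = new Uint8Array(hex.length / 2);
  for (let i = 0; i < out.length; i++) {
    out[i] = parseInt(hex.slice(2 * i, 2 * i + 2), 16);
  }
  return out;
}

// Read an unsigned varint starting at `offset`; returns [value, nextOffset].
function readUvarint(buf: Uint8Array, offset: number): [number, number] {
  let value = 0;
  let shift = 0;
  let pos = offset;
  while (pos < buf.length) {
    const b = buf[pos++];
    value += (b & 0x7f) * Math.pow(2, shift);
    if ((b & 0x80) === 0) return [value, pos];
    shift += 7;
  }
  throw new Error("truncated varint");
}

// Returns null when the contenthash is not an inline raw/identity CID.
function decodeInlineRaw(contenthash: string): { text: string; isDataUri: boolean } | null {
  const hex = contenthash.toLowerCase().replace(/^0x/, "");
  if (hex.slice(0, RAW_IDENTITY_PREFIX.length) !== RAW_IDENTITY_PREFIX) return null;
  const bytes = hexToBytes(hex);
  const [len, start] = readUvarint(bytes, RAW_IDENTITY_PREFIX.length / 2);
  if (start + len !== bytes.length) throw new Error("payload length mismatch");
  const text = new TextDecoder().decode(bytes.subarray(start));
  // Non-data: payloads would fall back to plaintext rendering.
  return { text, isDataUri: text.slice(0, 5) === "data:" };
}
```

Applied to the Contenthash value listed above, this yields the original `data:text/plain;base64,…` string, while an ordinary dag-pb/sha256 contenthash returns `null` and can be handled as before.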

2) `plaintextv2` data with IPFS or IPLD `namespace`:

This is similar to the previous option but using plaintextv2 instead of raw as multicodec.

import { CID } from 'multiformats/cid'
import { identity } from 'multiformats/hashes/identity'
const utf8 = new TextEncoder();

let data = utf8.encode('data:text/plain;base64,SGVsbG8gV29ybGQ');
let cid = CID.create(1, 0x706c61, await identity.digest(data))

`plaintextv2` format using IPFS namespace

Base32 : bahq5rqidaatgiylume5hizlyoqxxa3dbnfxdwytbonstmnbmkndvm43ci44govrshf4wer2r
Base16 : f01e1d8c1030026646174613a746578742f706c61696e3b6261736536342c534756736247386756323979624751
Contenthash : 
0xe30101e1d8c1030026646174613a746578742f706c61696e3b6261736536342c534756736247386756323979624751

Namespace	Version	Multiaddr	Multihash	Length	Data
`ipfs`	`1`	`plaintextv2`	`identity`	`38`	`data:text/plain;base64,SGVsbG8gV29ybGQ`
`0xe301`	`0x01`	`0xe1d8c103`	`0x00`	`0x26`	`0x646174613a746578742f706c61696e3b6261736536342c534756736247386756323979624751`

CID Inspector : https://cid.ipfs.tech/#bahq5rqidaatgiylume5hizlyoqxxa3dbnfxdwytbonstmnbmkndvm43ci44govrshf4wer2r

Public Gateway : https://ipfs.io/ipfs/bahq5rqidaatgiylume5hizlyoqxxa3dbnfxdwytbonstmnbmkndvm43ci44govrshf4wer2r

`plaintextv2` format using IPLD namespace

Base32 : bahq5rqidaatgiylume5hizlyoqxxa3dbnfxdwytbonstmnbmkndvm43ci44govrshf4wer2r
Base16 : f01e1d8c1030026646174613a746578742f706c61696e3b6261736536342c534756736247386756323979624751
Contenthash : 
0xe20101e1d8c1030026646174613a746578742f706c61696e3b6261736536342c534756736247386756323979624751

Namespace	Version	Multiaddr	Multihash	Length	Data
`ipld`	`1`	`plaintextv2`	`identity`	`38`	`data:text/plain;base64,SGVsbG8gV29ybGQ`
`0xe201`	`0x01`	`0xe1d8c103`	`0x00`	`0x26`	`0x646174613a746578742f706c61696e3b6261736536342c534756736247386756323979624751`

CID Inspector : https://cid.ipfs.tech/#bahq5rqidaatgiylume5hizlyoqxxa3dbnfxdwytbonstmnbmkndvm43ci44govrshf4wer2r

Public Gateway : https://dweb.link/api/v0/dag/get?arg=bahq5rqidaatgiylume5hizlyoqxxa3dbnfxdwytbonstmnbmkndvm43ci44govrshf4wer2r

NOTE: plaintextv2 is still in draft and IPFS gateways CANNOT yet render it properly resulting in 500 error. Trying to use it as IPLD might require changing the encoding process too. We do not prefer this method.

C) `CARv1` strings

CARv1 files as strings can represent IPFS data or IPLD files and directories, but this implementation is more complex than the previous options, so we’ll only mention it as a footnote. We don’t have the bandwidth to implement this. If ENS devs are happy to explore this for future implementation, it’ll be one of the best options for fully on- or off-chain generators and IPFS data storage.

Based on this, we are happy to get more feedback and then make changes to the draft ENSIP!

NameSys · November 6, 2023, 11:22am

This is a cross-post from GitHub

I’m confused why IPFS is involved at all. Why can’t you either use an existing multicodec identifier or define a new one for URIs?

We had thought about the option of a new namespace in our draft but we skipped it since IPFS/IPLD namespace with plaintext or raw encoded payload contained within the CID is sufficiently unique, and backwards compatible with IPFS gateways returning plaintext data.

Proposed IPFS (raw-ipld/plaintext) : 0xe301015500 + <data.length> + <data>
  Normal IPFS      (dag-pb/sha256) : 0xe301017012 + <hash.length> + <hash of data or dag>

However, we’re open to requesting a new ENS-specific namespace for this ENSIP only. In this regard, please suggest a short code (>=2 bytes) for this and we’ll PR that in the multicodec table soon. Something like

hex('ens') = 0x656e73 sounds like a good option to us; equivalent namespace is VARINT(0x656e73) = 0xf3dc9503
Non-ASCII option 0xda7a is also good, which will lead to a VARINT(0xda7a) = 0xfab403 namespace

The above two options with raw multiaddr should implement like:

	Namespace	Version	Multiaddr	Multihash	Length	Example
		`1`	`raw`	`identity`	`38`	`data:text/plain;base64,SGVsbG8gV29ybGQ`
`0xda7a`	`0xfab403`	`0x01`	`0x55`	`0x00`	`0x26`	`0x646...751`
`hex('ens')`	`0xf3dc9503`	`0x01`	`0x55`	`0x00`	`0x26`	`0x646...751`
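
For reference, the VARINT values quoted for these candidate namespaces can be reproduced with a small unsigned-varint encoder. This is only a sketch of the standard 7-bits-per-byte encoding; `encodeUvarint` is a hypothetical helper name.

```typescript
// Encode a non-negative integer as an unsigned varint (7 bits per byte,
// least-significant group first, high bit set on every byte except the last).
function encodeUvarint(value: number): string {
  if (value < 0 || Math.floor(value) !== value) {
    throw new Error("expected a non-negative integer");
  }
  let hex = "";
  let v = value;
  do {
    let byte = v % 128;
    v = Math.floor(v / 128);
    if (v > 0) byte += 128; // continuation bit
    hex += (byte < 16 ? "0" : "") + byte.toString(16);
  } while (v > 0);
  return "0x" + hex;
}

console.log(encodeUvarint(0x656e73)); // hex('ens') candidate
console.log(encodeUvarint(0xda7a));   // non-ASCII candidate
console.log(encodeUvarint(0xe3));     // existing ipfs namespace
```

The same routine also gives the `0xe1d8c103` prefix used for plaintextv2 (`0x706c61`) earlier in the thread.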

Please suggest us more options other than these two!

There are currently no DataURI-related multiaddr or namespaces and we do not want to introduce one in this context due to lack of manpower and funding to follow up on sidequests. DataURI class is too broad and it’ll also require mime/type codecs which are pending on issues or PR for a very long time. See below:

feat: assign codes for MIME types by Stebalien · Pull Request #159 · multiformats/multicodec · GitHub

Mimetypes as codes · Issue #4 · multiformats/multicodec · GitHub

nick.eth · November 6, 2023, 11:29am

NameSys:

We had thought about the option of a new namespace in our draft but we skipped it since IPFS/IPLD namespace with plaintext or raw encoded payload contained within the CID is sufficiently unique, and backwards compatible with IPFS gateways returning plaintext data.

What do you mean by the last part? I don’t understand how using an ipld or ipfs content identifier format is ‘backwards compatible’ with IPFS gateway return data, which is something entirely different.

This shouldn’t be ENS-specific; you just need a multicodec code that represents ‘the encoded data is a URI of some kind’. URIs don’t have mimetypes, so that shouldn’t be an issue.

NameSys · November 6, 2023, 2:14pm

We are convinced that we need to request a new multiaddr and multicodec first. We’ll close this ENSIP since we’ve discovered an alternative which is compatible with ENSIP-07 and serves our data:uri requirements for now. We may revive this in the future when we have more resources at hand. Feedback is much appreciated!

raffy · January 12, 2024, 10:12am

I was going to make a different thread but I’ll bump this one instead.

Provided again for reference:

ENSIP-7: Contenthash field - ENS Documentation
ERC-1577: https://github.com/ethereum/ercs/blob/master/ERCS/erc-1577.md

The “known” multicodecs are listed here: https://github.com/multiformats/multicodec/blob/master/table.csv

Example: arweave is listed as:

name: arweave-ns
tag: namespace
code: 0xb29910
status: draft
description: Arweave Namespace

I don’t know what tag means or what “namespace” is.

The current ensdomains/content-hash library supports the following protocols:

github.com

ensdomains/content-hash/blob/master/src/map.ts

export const codeToName = {
  0xe3: "ipfs",
  0xe5: "ipns",
  0xe4: "swarm",
  0x01bc: "onion",
  0x01bd: "onion3",
  0xb19910: "skynet",
  0xb29910: "arweave",
} as const;

export const nameToCode = {
  ipfs: 0xe3,
  ipns: 0xe5,
  swarm: 0xe4,
  onion: 0x01bc,
  onion3: 0x01bd,
  skynet: 0xb19910,
  arweave: 0xb29910,
} as const;


Example: predomain.eth’s contenthash is encoded as <uvarint(0xB29910)><bytes32> where those bytes are TX_ID (which apparently is a sha256 hash.) To resolve that content, you need to query an arweave gateway, which involves finding a public gateway and reading their API docs for how to make a query (eg. Base64URL encode the hash).
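
To make the arweave case concrete, here is a rough sketch of the decoding steps just described. It assumes the payload is exactly 32 bytes after the `uvarint(0xB29910)` prefix (which encodes to `90b2ca05`) and that the gateway expects an unpadded Base64URL string; `arweaveTxIdFromContenthash` is a hypothetical helper, and how the resulting id is queried depends on the gateway chosen.

```typescript
// Hypothetical helper for illustration only.
const ARWEAVE_PREFIX = "90b2ca05"; // uvarint(0xb29910)

function arweaveTxIdFromContenthash(contenthash: string): string {
  const hex = contenthash.toLowerCase().replace(/^0x/, "");
  if (hex.slice(0, ARWEAVE_PREFIX.length) !== ARWEAVE_PREFIX) {
    throw new Error("not an arweave contenthash");
  }
  const body = hex.slice(ARWEAVE_PREFIX.length);
  if (body.length !== 64) throw new Error("expected a 32-byte payload");

  // Build a binary string so btoa() can Base64-encode the raw bytes.
  let binary = "";
  for (let i = 0; i < 32; i++) {
    binary += String.fromCharCode(parseInt(body.slice(2 * i, 2 * i + 2), 16));
  }
  // Base64URL: '+' becomes '-', '/' becomes '_', trailing padding is removed.
  return btoa(binary).replace(/\+/g, "-").replace(/\//g, "_").replace(/=+$/, "");
}
```

A 32-byte payload always produces a 43-character id with this encoding.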

My point: trying to figure this stuff out purely from documentation is hard.

I just want to make a few random comments:

We should update ENSIP-7 to include all the known protocols. We should include examples of how you decode and query the data, links to documentation, etc. For the CID examples, we should also put conditionals on the CID codec types. Like what happens if you get an IPFS CID that isn’t a dag-pb? Or an IPNS CID that isn’t a libp2p-key? I can do this if appropriate.

Why wasn’t the AddressResolver used to store this data? This is a historical question.

Is contenthash just for websites? Or are there other use-cases?

Since there’s no way to resolve the contenthash without multiple levels of interpretation (protocol → hash decoding → query hash → interpret), can we address ENSIP-17 and @NameSys’s request by simply having a contenthash that is a pointer to a text record? This could either be <uvarint(protocol)><utf8 bytes of text key> or just <uvarint(protocol)> which uses the fixed key "contenthash". Then, contenthash can just be any URI, which allows data:.., ipfs://..., bzz://..., arweave://..., https://, or whatever. This would be both self-describing and pre-format the data for easy querying. And if the URL is 64 bytes or less, there’s no difference in storage gas.

There should also be discussion about how these protocols are dispatched. Many apps silently use cloudflare which doesn’t support IPNS and doesn’t give the user the opportunity to use their own gateway or traverse content that the app doesn’t understand. However, some browsers (and browser plugins) can handle these protocols natively.

nick.eth · January 12, 2024, 11:03am

Good idea. If you’re happy to take it on, that’d be much appreciated!

Which data?

In the broadest possible sense of ‘website’ - content that can be fetched (ultimately) over HTTP - yes.

What benefit is gained by the additional layer of indirection, rather than having the contenthash just encode the URI directly?

raffy · January 12, 2024, 8:59pm

Okay, I’ll work on an update PR.

I was thinking contenthash() interface could have been some unused coinType instead.

The content could be added without any encoding. People with an url text record could set their contenthash to point to that record (eg. uvarint(proto) + bytes("url")). Similar to how a primary-contact record could reference email, phone, twitter, etc.

The parallel I was considering was the power of DNS TXT records vs. all the specific record types. However, I agree, directly encoded is fine too.

data: seems really interesting with L2 stuff.

And a separate protocol that sidesteps the URL-encoding.

0xc0de4c0ffee · January 15, 2024, 6:35am

+1 for expanding docs with examples. We can help around with fancy CIDs

IPFS:
There’s no need to add “conditional” for different codecs. eg, dag-json, dag-cbor or even raw codecs are already prefixed with IPFS namespace and auto handled by IPFS gateways. There are tons of other codecs “not-implemented” by ipfs gateways that most likely won’t fall directly under “IPFS” namespace.

IPLD:
IPFS nodes and gateways don’t support ../ipld/<cid> request directly & ENS isn’t supporting IPLD yet. Support IPLD Contenthash - #7 by 0xc0de4c0ffee … Technically “IPFS” namespace is IPLD with “fs”, IPFS gateways don’t have ../ipld/<cid> path as IPLD is too broad with long list of codecs, & dapps are free to implement their own IPLD schemas that can only be resolved by requesting ipfs gateway.tld ../api/v0/dag/get?arg=<cid>.

IPNS:
It can be libp2p/ed25519, or secp256k1 keys or old deprecated dnslink contenthash, so the underlying gateway’s ipns/<cid/dnslink> resolving process is the same for all. *note: it’d be nice to re-use eth/secp256k1 keys for ipns but it isn’t widely used/supported so we’re using libp2p/ed25519 keygen with deterministic secp256k1 signature’s hash as seed. ipns address by 0xc0de4c0ffee · Pull Request #10 · paulmillr/micro-key-producer · GitHub

Other dweb storage/contenthash have their own namespace so it should be auto handled by those gateways after contenthash is decoded. Arweave & Arweave NS format is one example… IPNS is still slow & patchy with republishing and scaling issue so we’re also looking into “fake” IPNS DHT by storing IPNS data in L2…

We’re using raw codec (generated on-chain) as an alternative to the data:uri type mentioned in this draft/“abandoned” ENSIP. IPFS gateways have to do guesswork as there’s no mime/content/type attached in CID. If raw data starts with < it’ll try to render that as html… dag-json is an easy json format but we don’t get direct dag-cbor/link traversing support in dag-json only… so we have to use dag-json wrapped in dag-cbor… IPFS gateways auto support dag-cbor so it’s all good even without data:uri support.

Here’s one example of raw/html ipfs cid from our tests…
https://ipfs.io/ipfs/bafkqb3qchruhi3lmhy6gqzlbmq7dy5djorwgkptomfwwk43zomwwk5difzsgk5rtfzsxi2b4f52gs5dmmu7dy3lforqsa3tbnvst2itemvzwg4tjob2gs33oeiqgg33oorsw45b5ejwg63thebsgk43dojuxa5djn5xcepr4nvsxiyjanb2hi4bnmvyxk2lwhurhezlgojsxg2bcebrw63tumvxhipjcgm5vkusmhutwq5duobztulzpnzqw2zltpfzs2zlunaxgo2lunb2weltjn4tsepr4nvsxiyjaobzg64dfoj2hspjcn5ttu2lnmftwkiramnxw45dfnz2d2itior2ha4z2f4xw4ylnmvzxs4znmv2gqlthnf2gq5lcfzuw6l3mn5tw6ltqnztsepr4f5ugkylehy6ge33epe7dy2bshzjgkzdjojswg5djnztsa5dpea6gcidiojswmpjcnb2hi4dthixs63tbnvsxg6ltfvsxi2bom5uxi2dvmixgs3zchzxgc3lfon4xgllforuc4z3joruhkyronfxtyl3bhyxdyl3igi7dyl3cn5shspr4f5uhi3lmhy

raffy:

data: seems really interesting with L2 stuff.
raffy:
  uvarint(0xDD) 
+ uvarint(len(mime)) + mime utf8 bytes // eg. "text/html"
+ uvarint(len(payload)) + payload bytes 
And a separate protocol that sidesteps the URL-encoding.

Our core idea of using RFC-2397 datauris as hex in contenthash instead of multicodecs is to simplify the whole process… Just use hex(“data:…”) to encode and decode back. We don’t have to wait for multicodec support for everything. As we’ve mentioned before, mime/contenttype in CID has been pending for a long time for CID?v2; the original issue has been open since 2016, and the last active PR out there is from 2022.

nick.eth · January 15, 2024, 10:45am

That seems clunky!

Having to set a specially encoded contenthash record, so that you don’t have to encode your text record, still seems less simple than just setting an encoded contenthash record containing the URL, though.

raffy · January 17, 2024, 6:29am

I wasn’t aware that data:text/plain;charset=utf8,💩️ is considered valid if we store the URL in UTF-8, but that’s +50% data size in the general case — 50% chance of 2-bytes per byte + escape overhead.

new URL('data:text/plain;charset=utf8,💩️').toString();
// data:text/plain;charset=utf8,%F0%9F%92%A9%EF%B8%8F
// same as encodeURIComponent('💩️')

data:application/octet-stream,... is also possible, however looking at RFC-3986 this seems like a mistake due to all the escape logic or requires base64 which is +33% data size — 4-bytes per 3 bytes.

new URL('data:application/octet-stream,\x20').toString();     // "" => expected " "
new URL('data:application/octet-stream,\x20\x01').toString(); // " %01"
new URL('data:application/octet-stream,\x30\x20').toString();     // "0" => expected "0 "
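
The size trade-off between the two encodings can be estimated with a short sketch. It assumes a strict encoder that escapes every byte outside the RFC 3986 unreserved set as `%XX` (browsers escape fewer characters inside data URLs, so treat the percent-encoded figure as an upper bound); both function names are hypothetical.

```typescript
// Length of the payload when every byte outside the RFC 3986 unreserved
// set (A-Z, a-z, 0-9, '-', '.', '_', '~') is written as %XX.
function percentEncodedLength(bytes: Uint8Array): number {
  let n = 0;
  for (let i = 0; i < bytes.length; i++) {
    const b = bytes[i];
    const unreserved =
      (b >= 0x30 && b <= 0x39) || // 0-9
      (b >= 0x41 && b <= 0x5a) || // A-Z
      (b >= 0x61 && b <= 0x7a) || // a-z
      b === 0x2d || b === 0x2e || b === 0x5f || b === 0x7e;
    n += unreserved ? 1 : 3;
  }
  return n;
}

// Length of padded Base64 output: 4 characters per 3 input bytes.
function base64Length(byteCount: number): number {
  return 4 * Math.ceil(byteCount / 3);
}

// The 7 UTF-8 bytes of the emoji plus variation selector used above.
const emojiBytes = new Uint8Array([0xf0, 0x9f, 0x92, 0xa9, 0xef, 0xb8, 0x8f]);
console.log(percentEncodedLength(emojiBytes), base64Length(emojiBytes.length));
```

For payloads that are mostly non-ASCII or binary, Base64 stays close to 4/3 of the input size while percent-encoding approaches 3 characters per byte.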

After considering a few alternatives, I think we should use the following encoding which requires one new multicodec for "url".

codec = 0x12345; // or whatever we pick

// header
uvarint(codec) + uvarint(type) + encoded

// URL (type = 0)
encoded = url.bytes // url is encoded according to RFC-3986 which is ASCII
// ie. encodeURI() except for the ipv6 bracket stuff

However, since this is inefficient for data URLs, we add a type = 1 variant which has a "mime":

// data URL (type = 1)
let mime = "image/jpeg"
let data: bytes[] // anything
encoded = uvarint(mime.utf8Length) + mime.utf8Bytes + data.bytes

The type field can also double as version field for future upgrades.

This allows literal data stored in on-chain to be shared between the contenthash and other use-cases w/o any transcoding.
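
A minimal sketch of the two variants above, to show the byte layout. The codec value `0x12345` is the placeholder from this post and is not a registered multicodec; `encodeUrl`, `encodeDataUrl` and the other helper names are hypothetical.

```typescript
const URL_CODEC = 0x12345; // placeholder from the post, not a registered multicodec

// Unsigned varint as a byte array.
function uvarint(value: number): number[] {
  const out: number[] = [];
  let v = value;
  do {
    let byte = v % 128;
    v = Math.floor(v / 128);
    if (v > 0) byte += 128;
    out.push(byte);
  } while (v > 0);
  return out;
}

function utf8(text: string): number[] {
  const bytes = new TextEncoder().encode(text);
  const out: number[] = [];
  for (let i = 0; i < bytes.length; i++) out.push(bytes[i]);
  return out;
}

// type = 0: header followed by the bytes of an already-escaped (ASCII) URL.
function encodeUrl(url: string): number[] {
  return uvarint(URL_CODEC).concat(uvarint(0), utf8(url));
}

// type = 1: header, length-prefixed MIME type, then the raw payload bytes.
function encodeDataUrl(mime: string, data: number[]): number[] {
  const m = utf8(mime);
  return uvarint(URL_CODEC).concat(uvarint(1), uvarint(m.length), m, data);
}

function toHex(bytes: number[]): string {
  let hex = "";
  for (let i = 0; i < bytes.length; i++) {
    hex += (bytes[i] < 16 ? "0" : "") + bytes[i].toString(16);
  }
  return hex;
}

console.log(toHex(encodeDataUrl("image/gif", [0, 0, 0])));
console.log(encodeUrl("data:image/gif;base64,AAAA").length);
```

For the three-byte GIF payload, the type = 1 form comes to 17 bytes, against 30 bytes when the same content is stored as a Base64 data URL under type = 0.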

The following code parses ALL ENS-supported contenthash codec/protocols:

// Reader is a hypothetical byte reader (not defined in this post) exposing
// uvarint(), cid(), bytes() and read().
function parseContentHash(v) { // v: the raw contenthash bytes
  const reader = new Reader(v);
  switch (reader.uvarint()) {
    case 0xE3: return {type: 'ipfs', cid: reader.cid()}; // require cid.codec = dag-pb
    case 0xE4: return {type: 'swarm', cid: reader.cid()}; // require cid.codec = swarm-manifest
    case 0xE5: return {type: 'ipns', cid: reader.cid()}; // require cid.codec = libp2p-key, cid.version = 1
    case 0x1BC: return {type: 'onion', address: Base36.encode(reader.bytes())}; // require length = 16, deprecated in 2021
    case 0x1BD: return {type: 'onion', address: Base36.encode(reader.bytes())}; // require length = 56
    case 0xB19910: return {type: 'skylink', id: Base64URL.encode(reader.bytes())}; // require length = 46, this service is dead?
    case 0xB29910: return {type: 'arweave', hash: reader.bytes(32)};
    case 0x12345: {
        switch (reader.uvarint()) {
            case 0: return {type: 'url', url: new URL(String.fromCharCode(...reader.bytes()))}; // throws if invalid
            case 1: return {type: 'data-url', mime: reader.read(reader.uvarint()), data: reader.read()};
            default: throw new Error('unknown url type');
        }
    }
    default: throw new Error('unknown contenthash codec');
  }
}

function protocolURLFromDecodedContentHash(info) {
    switch (info.type) {
        case 'ipfs': return `ipfs://${info.cid.toString('k')}`; // v0 = Base58BTC, v1 = Base32 (k)
        case 'ipns': return `ipns://${info.cid.toString('k')}`;
        case 'swarm': return `bzz://${info.cid.toString('k')}`;
        case 'onion': return `onion://${info.address}`;
        case 'arweave': return `arweave://${Base64URL.encode(info.hash)}`;
        case 'url': return info.url.toString();
        case 'data-url': return `data:${info.mime};base64,${btoa(String.fromCharCode(...info.data))}`;
    }
}

This doesn’t require a new library and is implementable with vanilla JS.

Encoded examples:

uvarint(0x12345) uvarint(0) "https://www.chonk.com/"
uvarint(0x12345) uvarint(0) "data:image/gif;base64,AAAA";
uvarint(0x12345) uvarint(1) uvarint(9) "image/gif" <0x000000> (same as above)

In the contenthash() website use-case, a data URL just serves that data, and an http/https URL is a 30X redirect.

If the URL corresponds to a unknown protocol (not data/http/https, ie. unfetchable), it can be ignored.

In some cases, a data URL can be regurgitated w/o any interpretation, eg. https://raffy.eth.limo could technically just respond with a jpeg? Although care should be taken by content providers to avoid passing unsafe content (eg. just follow basic browser accept rules or only “serve” known mimes). Since no Content-Disposition is allowed, there’s no file extension risk (like .exe).

However, if a filename is required for some future purpose, this same setup could be extended with type = 2, for (mime, name, data) → uvarint(mime.utf8Length) + mime.utf8Bytes + uvarint(name.utf8Length) + name.utf8Bytes + data.bytes. For a future use-case, we could store a file in addr() records exactly like contenthash that point to an IPFS file, a URL, or an inline-data URL using the exact same scheme.

Somewhat related: there could also be a bytes version of the avatar-string defined as a codec.

codec = 0x54321;

uvarint(codec) + uvarint(type) + encoding

 ERC-721: type = 0 => uvarint(chain) + address(contract) + uvarint(token)
ERC-1155: type = 1 => uvarint(chain) + address(contract) + uvarint(token)

Since "avatar" already suffers from protocol overload (invalid URLs like ipfs:/, ipfs://ipfs/Qm..., etc.)

Example encoding for 10K mainnet NFT:

uvarint(0x54321) uvarint(0) + uvarint(1) + bytes20 + uvarint(10000)
This is only 3+1+1+20+2 = 27 bytes or 1 slot!
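
The byte count above can be checked with a short sketch. `avatarRecordLength` is a hypothetical helper, `0x54321` is the placeholder codec from this post, and plain JS numbers limit the sketch to token ids below 2^53.

```typescript
// Number of bytes an unsigned varint needs for a given value.
function uvarintLength(value: number): number {
  let n = 1;
  let v = Math.floor(value / 128);
  while (v > 0) {
    n++;
    v = Math.floor(v / 128);
  }
  return n;
}

// uvarint(codec) + uvarint(type) + uvarint(chain) + 20-byte address + uvarint(token)
function avatarRecordLength(codec: number, type: number, chain: number, token: number): number {
  return (
    uvarintLength(codec) +
    uvarintLength(type) +
    uvarintLength(chain) +
    20 +
    uvarintLength(token)
  );
}

console.log(avatarRecordLength(0x54321, 0, 1, 10000));
```

Larger chain ids or token ids add one byte for every additional 7 bits, so the record can grow past a single slot in those cases.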

Additionally, we could parse the addr() version of "avatar" with the exact same logic as contenthash().

Also "small-avatar" (thumbnailed version).

nick.eth · January 17, 2024, 10:58am

raffy:

I wasn’t aware that data:text/plain;charset=utf8,💩️ is considered valid if we store the URL in UTF-8, but that’s +50% data size in the general case — 50% chance of 2-bytes per byte + escape overhead.
new URL('data:text/plain;charset=utf8,💩️').toString();
// data:text/plain;charset=utf8,%F0%9F%92%A9%EF%B8%8F
// same as encodeURIComponent('💩️')

That’s unfortunate. I had to double-check the RFC, but that is indeed what RFC2397 specifies.

raffy:

data:application/octet-stream,... is also possible, however looking at RFC-3986 this seems like a mistake due to all the escape logic or requires base64 which is +33% data size — 4-bytes per 3 bytes.
new URL('data:application/octet-stream,\x20').toString();     // "" => expected " "
new URL('data:application/octet-stream,\x20\x01').toString(); // " %01"
new URL('data:application/octet-stream,\x30\x20').toString();     // "0" => expected "0 "

The mime-type doesn’t specify the encoding; what you want here is data:text/plain;charset=utf8;base64,8J+SqQ==.
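
As a rough illustration of the size trade-off being discussed, the sketch below compares the same UTF-8 payload carried as raw bytes, percent-encoded text, and base64. `payloadSizes` is a hypothetical helper, and the emoji is written as an escape without the variation selector, matching the base64 string above.

```javascript
// Sizes of three ways to carry the same UTF-8 payload.
function payloadSizes(text) {
  const raw = new TextEncoder().encode(text);        // UTF-8 bytes
  const percentEncoded = encodeURIComponent(text);   // what a data: URL body needs without ;base64
  const base64 = btoa(String.fromCharCode(...raw));  // what ;base64 carries
  return { raw: raw.length, percentEncoded: percentEncoded.length, base64: base64.length };
}

console.log(payloadSizes('\u{1F4A9}')); // { raw: 4, percentEncoded: 12, base64: 8 }
```

For such a short payload the base64 padding dominates; on longer inputs the base64 size settles at roughly 4 characters per 3 bytes.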

This seems sensible at first glance, though it might make more sense to define two different multicodec values - url and dataUrl. In fact, the latter isn’t truly a URL - it’s a file with a mimetype, and perhaps the multicodec value name should reflect that.

0xc0de4c0ffee · January 18, 2024, 7:21am

it’s supposed to be used by wildcard resolvers with an on-chain datauri generator, so users adding datauris directly as contenthash to their resolver isn’t the primary use case.

As previously mentioned, we’re already using dag-json, dag-cbor links & raw codecs under the ipfs namespace as a datauri alternative. dag-pb only mode is good for compatibility as some public gateways have their own /ipfs api wrapped… but in the default ipfs api case there’s a range of supported codecs and some codecs are not implemented yet.

this one is tricky, allowing sub/domain.eth to 30x redirect is similar to ipfs redirects feature, ipfs doesn’t support base ipfs://<cid> redirect to another <cid>, but we can do ipfs://<cid>/google/{xyz} >> 301 redirect to https://google.com?q={xyz}.

it could be verified and used similar to ipns/dnslink… _enslink text record = domain.eth on domain.xyz and verify that and use domain.eth = domain.xyz? there should be multicodec/multiaddr to encode such domain.xyz/ipv4/v6… but i’m not sure about namespace…

raffy:

case 0x12345: {
    switch (reader.uvarint()) {
        case 0: return {type: 'url', url: new URL(String.fromCharCode(...reader.bytes()))}; // throws if invalid
        case 1: return {type: 'data-url', mime: reader.read(reader.uvarint()), data: reader.read()};
        default: throw new Error('unknown url type');
    }
}

string2Hex(“data:”) = 646174613A, that’s auto “namespace’d” so there’ll be no collision with other multicodec/namespace. Using rfc2397 instead of multicodec we don’t need varint/length data in there.
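
A minimal sketch of what that prefix check could look like on the client side, assuming the contenthash bytes are simply the UTF-8 bytes of the data URI. `readDataUri` and `toHex` are hypothetical helpers, not part of any existing ENS library.

```javascript
const DATA_PREFIX = [0x64, 0x61, 0x74, 0x61, 0x3a]; // "data:" in ASCII

// Convert an optionally 0x-prefixed hex string to bytes.
function hexToBytes(hex) {
  const clean = hex.replace(/^0x/, '');
  if (clean.length % 2 !== 0 || /[^0-9a-fA-F]/.test(clean)) throw new Error('invalid hex');
  const out = new Uint8Array(clean.length / 2);
  for (let i = 0; i < out.length; i++) out[i] = parseInt(clean.slice(2 * i, 2 * i + 2), 16);
  return out;
}

// Returns the data URI string if the contenthash starts with "data:", otherwise null
// (the caller would then fall back to normal multicodec decoding).
function readDataUri(contenthashHex) {
  const bytes = hexToBytes(contenthashHex);
  const isDataUri =
    bytes.length >= DATA_PREFIX.length && DATA_PREFIX.every((b, i) => bytes[i] === b);
  return isDataUri ? new TextDecoder().decode(bytes) : null;
}

// Helper for the example only: UTF-8 string to 0x-prefixed hex.
const toHex = (s) =>
  '0x' + Array.from(new TextEncoder().encode(s), (b) => b.toString(16).padStart(2, '0')).join('');

console.log(readDataUri(toHex('data:text/plain,hello'))); // data:text/plain,hello
console.log(readDataUri('0xe301017000'));                 // null
```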

Another way to do half baked client-side redirect…
data:text/html,<meta http-equiv="refresh" content="0; url=https://www.google.com/">

raffy · January 18, 2024, 10:30am

Oops, I missed this. Can you show an example of this? What exactly can 0x129 (dag-json, ipld, MerkleDAG json) be? And how is it used? Similarly, with dag-cbor? I guess both of those give you structured data but for what purpose?

I think you’re correct and I’m wrong about the codec restrictions on the CID. The ENS client just needs to know how to put the CID into a canonical form for a website use-case, which at the moment, seems to be the base58 for CIDv0 and base36 for CIDv1 (so it can act as a domain.)

So this is codec = 0x55 (raw, ipld, raw) where the “hash” is codec = 0 (identity, multihash, raw) and the content is <html><head><title>namesys-eth.dev3.eth</title><meta name="description" content="long description"><meta http-equiv="refresh" content="3;URL='https://namesys-eth.github.io'"><meta property="og:image" content="https://namesys-eth.github.io/logo.png"></head><body><h2>Redirecting to <a href="https://namesys-eth.github.io">namesys-eth.github.io</a>.</h2></body></html>.

I think the “url” codec would directly address this use case: uvarint(0x12345) + uvarint(1) + uvarint(9) + "text/html" + "<html>..."
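
To make that layout concrete, here is a hedged round-trip sketch of the type = 1 (data-url) form. `0x12345` is the placeholder codec used in this thread, not a registered multicodec, and `encodeDataUrl` / `decodeDataUrl` are illustrative names only.

```javascript
const URL_CODEC = 0x12345; // placeholder codec from the thread

// Unsigned LEB128 varint encode.
function uvarint(n) {
  const out = [];
  while (n >= 0x80) {
    out.push((n & 0x7f) | 0x80);
    n = Math.floor(n / 128);
  }
  out.push(n);
  return out;
}

// Unsigned LEB128 varint decode; returns [value, nextOffset].
function readUvarint(bytes, offset) {
  let value = 0;
  let scale = 1;
  for (;;) {
    if (offset >= bytes.length) throw new Error('truncated uvarint');
    const b = bytes[offset++];
    value += (b & 0x7f) * scale;
    if (b < 0x80) return [value, offset];
    scale *= 128;
  }
}

// uvarint(codec) + uvarint(1) + uvarint(mime.length) + mime + data
function encodeDataUrl(mime, data) {
  const mimeBytes = new TextEncoder().encode(mime);
  return Uint8Array.from([
    ...uvarint(URL_CODEC), ...uvarint(1), ...uvarint(mimeBytes.length), ...mimeBytes, ...data,
  ]);
}

function decodeDataUrl(bytes) {
  let codec, type, mimeLength, off = 0;
  [codec, off] = readUvarint(bytes, off);
  if (codec !== URL_CODEC) throw new Error('unexpected codec');
  [type, off] = readUvarint(bytes, off);
  if (type !== 1) throw new Error('not a data-url');
  [mimeLength, off] = readUvarint(bytes, off);
  if (off + mimeLength > bytes.length) throw new Error('truncated mime');
  const mime = new TextDecoder().decode(bytes.subarray(off, off + mimeLength));
  return { type: 'data-url', mime, data: bytes.subarray(off + mimeLength) };
}

const html = new TextEncoder().encode('<html>...</html>');
const encoded = encodeDataUrl('text/html', html);
console.log(encoded.length - html.length); // 14 bytes of overhead: 3 + 1 + 1 + 9
console.log(decodeDataUrl(encoded).mime);  // text/html
```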

That seems fine to me. I was thinking since a data URL technically fits under both codecs (where the raw-byte version is far more space efficient and useful), they could be shared.

Another related question would be: is there a difference between a URL that would be a redirect vs a URL that would be used for a reverse-proxy? Another variant might be "load this url inside of an <iframe>". Is that part of the contenthash description or something else? eg. redirectURL, proxyURL, dataURL.

0xc0de4c0ffee · January 22, 2024, 10:06am

We’re not touching dag-json “0x129”, there’s extra plaintext JSON (UTF-8-encoded) “0x0200”…

Codec	Data	CID
JSON	`{"hello":"world"}`	ipfs://bagaaiaarpmrgqzlmnrxseorco5xxe3deej6q
RAW	`<h1>Hello World</h1>`	ipfs://bafkqafb4nayt4sdfnrwg6icxn5zgyzb4f5udcpq
DAG-CBOR	`{"json": jsonCID, "html": htmlCID }`	ipfs://bafyqaqvcmruhi3lm3avfqgiaafkqafb4nayt4sdfnrwg6icxn5zgyzb4f5udcptenjzw63wyfjlqaamaaqabc6zcnbswy3dpei5ce53pojwgiit5

Dag-cbor is “link” so we can access that as ipfs://dag_cborCID/html and ipfs://dag_cborCID/json

if you want to look up the bytes without the ipfs namespace prefix 0xe301…
Json : 01800400117b2268656c6c6f223a22776f726c64227d
HTML : 015500143c68313e48656c6c6f20576f726c643c2f68313e
dag-cbor : 01710042a26468746d6cd82a581900015500143c68313e48656c6c6f20576f726c643c2f68313e646a736f6ed82a570001800400117b2268656c6c6f223a22776f726c64227d

or use cid inspector >
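
The byte strings above can also be unpacked by hand. The sketch below assumes the standard CIDv1 layout (version, content codec, multihash function, digest length, digest, each header field a uvarint) and only handles the identity multihash (0x00) used in these examples. `decodeIdentityCid` is an illustrative name, not an IPFS library function.

```javascript
// Convert a hex string (no namespace prefix) to bytes.
function hexToBytes(hex) {
  const clean = hex.replace(/^0x/, '');
  if (clean.length % 2 !== 0 || /[^0-9a-fA-F]/.test(clean)) throw new Error('invalid hex');
  const out = new Uint8Array(clean.length / 2);
  for (let i = 0; i < out.length; i++) out[i] = parseInt(clean.slice(2 * i, 2 * i + 2), 16);
  return out;
}

// Unsigned LEB128 varint decode; returns [value, nextOffset].
function readUvarint(bytes, offset) {
  let value = 0;
  let scale = 1;
  for (;;) {
    if (offset >= bytes.length) throw new Error('truncated uvarint');
    const b = bytes[offset++];
    value += (b & 0x7f) * scale;
    if (b < 0x80) return [value, offset];
    scale *= 128;
  }
}

// Names for the codecs that appear in the table above.
const CODEC_NAMES = { 0x55: 'raw', 0x71: 'dag-cbor', 0x0200: 'json' };

function decodeIdentityCid(hex) {
  const bytes = hexToBytes(hex);
  let version, codec, hashFn, length, off = 0;
  [version, off] = readUvarint(bytes, off);
  [codec, off] = readUvarint(bytes, off);
  [hashFn, off] = readUvarint(bytes, off);
  [length, off] = readUvarint(bytes, off);
  if (version !== 1) throw new Error('expected CIDv1');
  if (hashFn !== 0x00) throw new Error('not an identity multihash');
  if (bytes.length - off !== length) throw new Error('digest length mismatch');
  return { codec, name: CODEC_NAMES[codec] ?? 'unknown', data: bytes.subarray(off) };
}

const json = decodeIdentityCid('01800400117b2268656c6c6f223a22776f726c64227d');
console.log(json.name, new TextDecoder().decode(json.data)); // json {"hello":"world"}

const html = decodeIdentityCid('015500143c68313e48656c6c6f20576f726c643c2f68313e');
console.log(html.name, new TextDecoder().decode(html.data)); // raw <h1>Hello World</h1>
```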
