[Draft] ENSIP-17: DataURI Format in Contenthash

Hello ENS,

NameSys is proposing an extension to ENSIP-07 which introduces the data:uri format in the ENS Contenthash field, allowing dynamic data streaming using CCIP-Read and Wildcard Resolution. ENS Contenthash so far only allows static content by linking to decentralised hosting such as IPFS, Arweave etc. This proposed improvement will further enhance ENS domains' utility by enabling a rich ecosystem of dynamic content in the Contenthash. Please feel free to go through the draft and ask any questions, seek clarifications, give suggestions or propose edits.

PR for this proposal lives here: [Proposal] ENSIP-17: DataURI Format in Contenthash by sshmatrix · Pull Request #165 · ensdomains/docs · GitHub


ENSIP-17: DataURI Format in Contenthash

RFC-2397 Compliant DataURI Format in Contenthash

   
Author(s) sshmatrix.eth, ethlimo.eth, freetib.eth
Status Draft
Submitted 2023-10-31

Abstract

This ENSIP introduces DataURI format in Contenthash field (ENSIP-07) for compatible ENS resolvers. DataURI format (RFC-2397) is desired and suitable for enabling dynamic dWeb content for ENS domains using on-chain and/or off-chain resources.

Motivation

ENS contenthash (ENSIP-07) currently enables linking to static content which is strictly off-chain. The off-chain content is entirely dependent on off-chain providers, and updating this content for ENS-based decentralised websites typically requires updating the on-chain contenthash explicitly (except for IPNS). ENS domains' avatar text records and their ERC-721/-1155 interfaces already support generated DataURI bytes (data:uri) to resolve JSON and image metadata. This specification enables a similar data:uri format in the ENS contenthash field, allowing ENS Resolvers to fetch and serve on-chain and/or off-chain data. The off-chain resources for the DataURI content may use CCIP-Read and an appropriate utf-8 decoder to render the encoded bytes. This specification allows complete support for dynamic data in ENS Contenthash using CCIP-Read (EIP-3668) and Wildcard Resolution (ENSIP-10).

Specification

This specification is an extension of ENSIP-07 to support in-line bytes of data conforming to the data:uri scheme (RFC-2397) as ENS Contenthash. There are no changes to be made in the current ENS Resolvers since contenthash bytes are parsed as utf-8 characters by default. Only a standardisation needs to be enacted for web3 providers to begin resolving ENS Contenthash in data:uri scheme. Simple details of the proposed standardisation are as follows:

Decoded String

  • DataURI is string-formatted according to RFC-2397:
data:<media>/<type>;<encoding>,<payload>

Encoded Bytes

  • The raw string-formatted data is returned as encoded hexadecimal bytes. The encoded value returned by DataURI-compatible contenthash is always prefixed with the 5-byte identifier 0x646174613a followed by the remaining variable encoded databytes.
stringTohex("data:") = 0x646174613a

Examples

| Decoded String | Encoded Bytes |
| --- | --- |
| data:text/plain;base64,SGVsbG8gV29ybGQ | 0x646174613a746578742f706c61696e3b6261736536342c534756736247386756323979624751 |
| data:text/plain,Hello World | 0x646174613a746578742f706c61696e2c48656c6c6f20576f726c64 |
| data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAgAAAAIAQMAAAD+wSzIAAAABlBMVEX///+/v7+jQ3Y5AAAADklEQVQI12P4AIX8EAgALgAD/aNpbtEAAAAASUVORK5CYII | 0x646174613a696d6167652f706e673b6261736536342c6956424f5277304b47676f414141414e5355684555674141414167414141414941514d414141442b77537a4941414141426c424d5645582f2f2f2b2f76372b6a5133593541414141446b6c45515651493132503441495838454167414c6741442f614e7062744541414141415355564f524b3543594949 |
| data:image/svg+xml,<svgxmlns='http://www.w3.org/2000/svg'height='30'width='200'><textx='0'y='15'fill='red'>IamSVG</text></svg> | 0x646174613a696d6167652f7376672b786d6c2c3c737667786d6c6e733d27687474703a2f2f7777772e77332e6f72672f323030302f737667276865696768743d2733302777696474683d27323030273e3c74657874783d273027793d2731352766696c6c3d27726564273e49616d5356473c2f746578743e3c2f7376673e |
| data:text/xml,<?xml version='1.0'?><note>I am XML</note> | 0x646174613a746578742f786d6c2c3c3f786d6c2076657273696f6e3d27312e30273f3e3c6e6f74653e4920616d20584d4c3c2f6e6f74653e |
| data:text/html,Hello, <div>I am HTML</div> | 0x646174613a746578742f68746d6c2c48656c6c6f2c203c6469763e4920616d2048544d4c3c2f6469763e |

With this simple standardisation, web3 providers may now serve data:uri content from on-chain or off-chain resources allowing dynamic content on ENS dWebsites.
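
The encoding in the specification can be sketched in a few lines of vanilla JS, assuming the contenthash is nothing more than the UTF-8 bytes of the data: URI written as hex. The helper names `dataUriToHex` and `hexToDataUri` are illustrative and are not part of the proposal.

```javascript
// Sketch only: helper names are hypothetical, not defined by the ENSIP.
const DATA_PREFIX = '646174613a'; // hex of "data:"

// Encode a data: URI string as 0x-prefixed hex of its UTF-8 bytes.
function dataUriToHex(uri) {
  if (!uri.startsWith('data:')) throw new Error('not a data: URI');
  const bytes = new TextEncoder().encode(uri);
  return '0x' + Array.from(bytes, (b) => b.toString(16).padStart(2, '0')).join('');
}

// Decode contenthash hex back to the data: URI; returns null if the prefix is absent.
function hexToDataUri(hex) {
  const h = hex.replace(/^0x/, '').toLowerCase();
  if (!h.startsWith(DATA_PREFIX)) return null;
  const bytes = Uint8Array.from(h.match(/../g), (pair) => parseInt(pair, 16));
  return new TextDecoder().decode(bytes);
}

console.log(dataUriToHex('data:text/plain,Hello World'));
```

A client would call `hexToDataUri` first and fall back to its existing contenthash decoder when it returns null.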

Implementation

GitHub : namesys-eth/datauri-eth-resolver (Work-In-Progress)

References

[1] ENSIP-07: Contenthash Field

[2] ENSIP-10: Wildcard Resolution

[3] EIP-3668: CCIP Read: Secure Off-Chain Data Retrieval

[4] RFC-2397: The "data" URL Scheme

Copyright

Copyright and related rights waived via CC0.



This is the type of EIP that we love to see and support wholeheartedly :rocket:


contenthash has been coded as uvarint(proto) + payload... so shouldn't this convention be followed?

0xE3 = ipfs
0xE4 = swarm
0xE5 = ipns

I guess defining proto 0x64 (d) as a DataURL would work as-is, but it wouldn't have a known length so it's not embeddable without an external wrapper (although I guess that's not a requirement of multicodec.)

The following would avoid the base64 overhead:

  uvarint(0xDD) 
+ uvarint(len(mime)) + mime utf8 bytes // eg. "text/html"
+ uvarint(len(payload)) + payload bytes 

Although for simplicity, I'm a fan of bypassing the multicodec stuff (as long as the first uvarint decodes correctly) and just embedding raw utf-8 data.
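
A rough sketch of the length-prefixed layout suggested above, assuming the standard unsigned-varint encoding (7 bits per byte, least-significant group first). The value 0xDD is only the placeholder from this post and is not a registered multicodec; `encodeInline` is a made-up name.

```javascript
// Unsigned varint: 7 data bits per byte, high bit set on every byte except the last.
function uvarint(n) {
  const out = [];
  while (n >= 0x80) {
    out.push((n & 0x7f) | 0x80);
    n = Math.floor(n / 128);
  }
  out.push(n);
  return out;
}

// uvarint(0xDD) + uvarint(len(mime)) + mime + uvarint(len(payload)) + payload
function encodeInline(mime, payload) {
  const mimeBytes = new TextEncoder().encode(mime);
  return Uint8Array.from([
    ...uvarint(0xdd), // placeholder codec from the post
    ...uvarint(mimeBytes.length), ...mimeBytes,
    ...uvarint(payload.length), ...payload,
  ]);
}

const toHex = (bytes) => Array.from(bytes, (b) => b.toString(16).padStart(2, '0')).join('');
console.log(toHex(encodeInline('text/html', new TextEncoder().encode('Hi'))));
```

Because both fields carry their own length, the payload can be arbitrary bytes with no base64 or percent-escaping.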


Should it also support URLs for 30X redirection?

uvarint(0xDD) + uvarint(0/*url*/) + uvarint(19) + "https://ens.domains"

:pray: @raffy, thanks for feedback.

There's plaintextv2 as hex("pla") = 0x706c61 multiaddr prefix in multicodec.

522 plaintextv2 multiaddr 0x706c61 draft

We tried 0xe2 IPLD before with a PR on ens/content-hash.js, then abandoned it for now to work on simpler specs... Requesting direct CAR file/data to be included is a more complicated option. As alternative options we also tried some old on-chain IPFS generators in ENS resolver that'll require external services to manually read data during ccip-read from on-chain and pin that to be semi-dynamically resolvable (*not really scalable).

So we're proposing the hex("data:") prefix to be simple and backwards compatible with data uris used in NFTs/avatar. eg, contenthash for 1234.hello-nft.eth can resolve bytes(tokenURI(1234)) directly as data:application/json,{...metadata}.

It's a good idea to request hex("data:") to be included in multicodec soon, but for now we're trying to use the default/fallback profile in ens-contenthash.js


There's an old alternative for that; uniswap.eth has been using it for years...
** really don't recommend using old stuff, but it works.
eg,
base16 = f0172000f6170702e756e69737761702e6f7267
base58 = 12uA8M8Ku8mHUumxHcu7uee
base32 = bafzaad3bobyc45lonfzxoylqfzxxezy

01-72-00-0f-6170702e756e69737761702e6f7267
version - libp2p - identity - length - hex("app.uniswap.org")

  • edit: For ENS this should be extra prefixed with namespace + length... Comes with deprecated warning. 0xe5010172000f6170702e756e69737761702e6f7267
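
The byte layout above can be reproduced with a short sketch. It assumes the name fits in a single length byte (under 128 bytes), and the function name is invented for illustration.

```javascript
// Sketch: wrap a DNS name in an identity-multihash libp2p-key CID under the IPNS namespace.
function dnsNameToIpnsContenthash(name) {
  const nameBytes = new TextEncoder().encode(name);
  if (nameBytes.length > 127) throw new Error('sketch handles single-byte lengths only');
  const bytes = [
    0xe5, 0x01,       // uvarint(0xe5): ipns namespace
    0x01,             // CID version 1
    0x72,             // libp2p-key codec
    0x00,             // identity multihash (no hashing)
    nameBytes.length, // digest length
    ...nameBytes,     // the name itself
  ];
  return '0x' + bytes.map((b) => b.toString(16).padStart(2, '0')).join('');
}

console.log(dnsNameToIpnsContenthash('app.uniswap.org'));
```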

Delighted to see this - but as I mentioned in the PR, and @raffy observes, this definitely needs to be encoded as a valid multicodec value.


:pray: Thanks for the feedback!

We have looked into possible ways for this draft ENSIP to be compatible with multicodec. These are our findings in the form of different implementations with and without multicodec. We are open to either implementation in the end and will update this draft ENSIP as required.

A) Bypass Multicodec:

First, we'd like to point to the current state of ens/content-hash.js. When using hex("data:") = 0x646174613a as prefix, encoding doesn't work and the decoder nearly works but it removes the first byte in the process. Please see example below,

import {encode, decode } from "@ensdomains/content-hash";
console.log(encode("data:text/plain;base64,SGVsbG8gV29ybGQ"))
//> 00000000000000000000

console.log(decode("646174613a746578742f706c61696e3b6261736536342c534756736247386756323979624751"))
//> ata:text/plain;base64,SGVsbG8gV29ybGQ

The extra 0x00 prefix identifier as a spacer/pseudo namespace could prevent any future collision with multicodec.

console.log(decode("00646174613a746578742f706c61696e3b6261736536342c534756736247386756323979624751"))
//> data:text/plain;base64,SGVsbG8gV29ybGQ

Quoting @raffy here,

This will bypass multicodec for DataURIs in Contenthash. ens/content-hash.js and any gateways/clients can easily implement this with basic checks, i.e. checking whether the prefix is 0x00646174613a before encoding and decoding, so that ENS clients and gateways can use DataURIs directly without leaving any room for current or future collisions with multicodec formats. This approach will be ENS specific and we can change our ENSIP draft to reflect this.
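
A minimal sketch of the prefix check described for option A. The `fallbackDecode` argument stands in for whatever multicodec decoder a client already has; it is an assumption and not an API of ens/content-hash.js.

```javascript
const DATA_URI_PREFIX = '00646174613a'; // 0x00 spacer + hex("data:")

// Return the data: URI when the spacer prefix matches, otherwise defer to the existing decoder.
function decodeContenthash(hex, fallbackDecode) {
  const h = hex.replace(/^0x/, '').toLowerCase();
  if (h.startsWith(DATA_URI_PREFIX)) {
    // Drop the 0x00 spacer (two hex characters) and read the rest as UTF-8.
    const bytes = Uint8Array.from(h.slice(2).match(/../g), (pair) => parseInt(pair, 16));
    return { type: 'data-uri', value: new TextDecoder().decode(bytes) };
  }
  return fallbackDecode(h);
}

const example = '00646174613a746578742f706c61696e3b6261736536342c534756736247386756323979624751';
console.log(decodeContenthash(example, () => null).value);
```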

:white_check_mark: This is our preferred implementation but we are not married to it.

B) Multicodec Compatible Formats

If multicodec must be used, then we'd like to propose the following options:

1) raw data type with IPFS namespace:

IPFS namespace is compatible with DataURIs using raw data type.

import { CID } from 'multiformats/cid'
import { identity } from 'multiformats/hashes/identity'
import * as raw from "multiformats/codecs/raw";
const utf8 = new TextEncoder();

let data = utf8.encode('data:text/plain;base64,SGVsbG8gV29ybGQ')
let cid = CID.create(1, raw.code, await identity.digest(data))

IPFS Format :

base32: bafkqajtemf2gcotumv4hil3qnrqws3r3mjqxgzjwgqwfgr2wonreoodhkyzds6lci5iq
base16: f01550026646174613a746578742f706c61696e3b6261736536342c534756736247386756323979624751
Contenthash: 0xe30101550026646174613a746578742f706c61696e3b6261736536342c534756736247386756323979624751
| Namespace | Version | Multiaddr | Multihash | Length | Data |
| --- | --- | --- | --- | --- | --- |
| ipfs | 1 | raw | identity | 38 | data:text/plain;base64,SGVsbG8gV29ybGQ |
| 0xe301 | 0x01 | 0x55 | 0x00 | 0x26 | 0x646174613a746578742f706c61696e3b6261736536342c534756736247386756323979624751 |

CID Inspector : https://cid.ipfs.tech/#bafkqajtemf2gcotumv4hil3qnrqws3r3mjqxgzjwgqwfgr2wonreoodhkyzds6lci5iq

Public Gateway : https://ipfs.io/ipfs/bafkqajtemf2gcotumv4hil3qnrqws3r3mjqxgzjwgqwfgr2wonreoodhkyzds6lci5iq

Since this method uses the IPFS namespace, ens/content-hash.js and any compatible gateway or client must check whether the encoded payload uses raw as multicodec with identity (blank) as multihash; the shorthand prefix for this is 0xe301015500. Clients can decode the remaining raw data as a utf-8 string. If this data is not data:uri formatted, it should be auto-rendered as plaintext; for correctly formatted data:uri, clients and gateways can render according to the mime or type included in the DataURI payload.
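
The check described above might look roughly like this on the client side. It is a sketch under the assumptions in this post (raw codec, identity multihash, varint length), and `decodeRawIdentity` is an illustrative name.

```javascript
const RAW_IDENTITY_PREFIX = 'e301015500'; // ipfs namespace, CIDv1, raw, identity

function readUvarint(bytes, offset) {
  let value = 0, shift = 0, i = offset;
  for (;;) {
    if (i >= bytes.length) throw new Error('truncated varint');
    const b = bytes[i++];
    value += (b & 0x7f) * 2 ** shift;
    if ((b & 0x80) === 0) return [value, i];
    shift += 7;
  }
}

// Returns null for other contenthash types, else the decoded text plus the media type to render with.
function decodeRawIdentity(contenthash) {
  const h = contenthash.replace(/^0x/, '').toLowerCase();
  if (!h.startsWith(RAW_IDENTITY_PREFIX)) return null;
  const rest = h.slice(RAW_IDENTITY_PREFIX.length);
  const bytes = Uint8Array.from(rest.match(/../g) || [], (pair) => parseInt(pair, 16));
  const [length, start] = readUvarint(bytes, 0);
  const text = new TextDecoder().decode(bytes.slice(start, start + length));
  const comma = text.indexOf(',');
  // Not a data: URI, so fall back to plaintext rendering.
  if (!text.startsWith('data:') || comma < 0) return { mime: 'text/plain', base64: false, text };
  const params = text.slice(5, comma).split(';');
  return { mime: params[0] || 'text/plain', base64: params.includes('base64'), text };
}

console.log(decodeRawIdentity('0xe30101550026646174613a746578742f706c61696e3b6261736536342c534756736247386756323979624751'));
```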

2) plaintextv2 data with IPFS or IPLD namespace:

This is similar to the previous option but using plaintextv2 instead of raw as multicodec.

import { CID } from 'multiformats/cid'
import { identity } from 'multiformats/hashes/identity'
const utf8 = new TextEncoder();

let data = utf8.encode('data:text/plain;base64,SGVsbG8gV29ybGQ');
let cid = CID.create(1, 0x706c61, await identity.digest(data))

plaintextv2 format using IPFS namespace

Base32 : bahq5rqidaatgiylume5hizlyoqxxa3dbnfxdwytbonstmnbmkndvm43ci44govrshf4wer2r
Base16 : f01e1d8c1030026646174613a746578742f706c61696e3b6261736536342c534756736247386756323979624751
Contenthash : 
0xe30101e1d8c1030026646174613a746578742f706c61696e3b6261736536342c534756736247386756323979624751
| Namespace | Version | Multiaddr | Multihash | Length | Data |
| --- | --- | --- | --- | --- | --- |
| ipfs | 1 | plaintextv2 | identity | 38 | data:text/plain;base64,SGVsbG8gV29ybGQ |
| 0xe301 | 0x01 | 0xe1d8c103 | 0x00 | 0x26 | 0x646174613a746578742f706c61696e3b6261736536342c534756736247386756323979624751 |

CID Inspector : https://cid.ipfs.tech/#bahq5rqidaatgiylume5hizlyoqxxa3dbnfxdwytbonstmnbmkndvm43ci44govrshf4wer2r

Public Gateway : https://ipfs.io/ipfs/bahq5rqidaatgiylume5hizlyoqxxa3dbnfxdwytbonstmnbmkndvm43ci44govrshf4wer2r

plaintextv2 format using IPLD namespace

Base32 : bahq5rqidaatgiylume5hizlyoqxxa3dbnfxdwytbonstmnbmkndvm43ci44govrshf4wer2r
Base16 : f01e1d8c1030026646174613a746578742f706c61696e3b6261736536342c534756736247386756323979624751
Contenthash : 
0xe20101e1d8c1030026646174613a746578742f706c61696e3b6261736536342c534756736247386756323979624751
| Namespace | Version | Multiaddr | Multihash | Length | Data |
| --- | --- | --- | --- | --- | --- |
| ipld | 1 | plaintextv2 | identity | 38 | data:text/plain;base64,SGVsbG8gV29ybGQ |
| 0xe201 | 0x01 | 0xe1d8c103 | 0x00 | 0x26 | 0x646174613a746578742f706c61696e3b6261736536342c534756736247386756323979624751 |

CID Inspector : https://cid.ipfs.tech/#bahq5rqidaatgiylume5hizlyoqxxa3dbnfxdwytbonstmnbmkndvm43ci44govrshf4wer2r

Public Gateway : https://dweb.link/api/v0/dag/get?arg=bahq5rqidaatgiylume5hizlyoqxxa3dbnfxdwytbonstmnbmkndvm43ci44govrshf4wer2r

:exclamation: NOTE: plaintextv2 is still in draft and IPFS gateways CANNOT yet render it properly resulting in 500 error. Trying to use it as IPLD might require changing the encoding process too. We do not prefer this method.

C) CARv1 strings

CARv1 files as strings can represent IPFS data or IPLD files and directories, but this implementation is more complex than the previous options so we'll only mention this as a footnote. We don't have the bandwidth to implement this. If ENS devs are happy to explore this for future implementation, it'll be one of the best options for fully on- or off-chain generators and IPFS data storage.

Based on this, we are happy to get more feedback and then make changes to the draft ENSIP! :pray:

This is a cross-post from GitHub


I'm confused why IPFS is involved at all. Why can't you either use an existing multicodec identifier or define a new one for URIs?

We had thought about the option of a new namespace in our draft but we skipped it since IPFS/IPLD namespace with plaintext or raw encoded payload contained within the CID is sufficiently unique, and backwards compatible with IPFS gateways returning plaintext data.

Proposed IPFS (raw-ipld/plaintext) : 0xe301015500 + <data.length> + <data>
  Normal IPFS      (dag-pb/sha256) : 0xe301017012 + <hash.length> + <hash of data or dag>

However, we're open to requesting a new ENS-specific namespace for this ENSIP only. In this regard, please suggest a short code (>=2 bytes) for this and we'll PR that in the multicodec table soon. Something like

  • hex('ens') = 0x656e73 sounds like a good option to us; equivalent namespace is VARINT(0x656e73) = 0xf3dc9503
  • Non-ASCII option 0xda7a is also good, which will lead to a VARINT(0xda7a) = 0xfab403 namespace

The above two options with raw multiaddr should be implemented like:

| | Namespace | Version | Multiaddr | Multihash | Length | Example |
| --- | --- | --- | --- | --- | --- | --- |
| | | 1 | raw | identity | 38 | data:text/plain;base64,SGVsbG8gV29ybGQ |
| 0xda7a | 0xfab403 | 0x01 | 0x55 | 0x00 | 0x26 | 0x646...751 |
| hex('ens') | 0xf3dc9503 | 0x01 | 0x55 | 0x00 | 0x26 | 0x646...751 |
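
The VARINT values quoted above can be checked with a small sketch. The two namespace codes are only the candidates floated in this post, and the function names are illustrative.

```javascript
// Unsigned varint of n, returned as a hex string.
function uvarintHex(n) {
  let out = '';
  while (n >= 0x80) {
    out += ((n & 0x7f) | 0x80).toString(16).padStart(2, '0');
    n = Math.floor(n / 128);
  }
  return out + n.toString(16).padStart(2, '0');
}

// namespace + CIDv1 (0x01) + raw (0x55) + identity (0x00) + length + data
function namespacedRawContenthash(namespaceCode, text) {
  const data = new TextEncoder().encode(text);
  const dataHex = Array.from(data, (b) => b.toString(16).padStart(2, '0')).join('');
  return '0x' + uvarintHex(namespaceCode) + '01' + '55' + '00' + uvarintHex(data.length) + dataHex;
}

console.log(uvarintHex(0x656e73)); // candidate hex('ens') namespace
console.log(uvarintHex(0xda7a));   // candidate non-ASCII namespace
console.log(namespacedRawContenthash(0xda7a, 'data:text/plain;base64,SGVsbG8gV29ybGQ'));
```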

Please suggest us more options other than these two!

There are currently no DataURI-related multiaddr or namespaces and we do not want to introduce one in this context due to lack of manpower and funding to follow up on sidequests. DataURI class is too broad and it'll also require mime/type codecs which have been pending in issues and PRs for a very long time. See below:

feat: assign codes for MIME types by Stebalien · Pull Request #159 · multiformats/multicodec · GitHub

Mimetypes as codes · Issue #4 · multiformats/multicodec · GitHub

What do you mean by the last part? I don't understand how using an ipld or ipfs content identifier format is 'backwards compatible' with IPFS gateway return data, which is something entirely different.

This shouldn't be ENS-specific; you just need a multicodec code that represents 'the encoded data is a URI of some kind'. URIs don't have mimetypes, so that shouldn't be an issue.

We are convinced that we need to request a new multiaddr and multicodec first. We'll close this ENSIP since we've discovered an alternative which is compatible with ENSIP-07 and serves our data:uri requirements for now :partying_face: We may revive this in the future when we have more resources at hand :pray: Feedback is much appreciated!


I was going to make a different thread but I'll bump this one instead.

Provided again for reference:

The "known" multicodecs are listed here: https://github.com/multiformats/multicodec/blob/master/table.csv

Example: arweave is listed as:

  • name: arweave-ns
  • tag: namespace
  • code: 0xb29910
  • status: draft
  • description: Arweave Namespace

I don't know what tag means or what "namespace" is.

The current ensdomains/content-hash library supports the following protocols:


Example: predomain.eth's contenthash is encoded as <uvarint(0xB29910)><bytes32> where those bytes are TX_ID (which apparently is a sha256 hash.) To resolve that content, you need to query an arweave gateway, which involves finding a public gateway and reading their API docs for how to make a query (eg. Base64URL encode the hash).
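
As a concrete illustration of those steps, here is a sketch that strips the arweave namespace and Base64URL-encodes the 32-byte id. The gateway base URL is an assumption used only as an example, and the function names are made up.

```javascript
const ARWEAVE_PREFIX = '90b2ca05'; // uvarint(0xB29910)

// Extract the 32-byte transaction id and Base64URL-encode it (no padding).
function arweaveTxId(contenthash) {
  const h = contenthash.replace(/^0x/, '').toLowerCase();
  if (!h.startsWith(ARWEAVE_PREFIX)) throw new Error('not an arweave contenthash');
  const body = h.slice(ARWEAVE_PREFIX.length);
  if (body.length !== 64) throw new Error('expected a 32-byte transaction id');
  const bytes = body.match(/../g).map((pair) => parseInt(pair, 16));
  return btoa(String.fromCharCode(...bytes))
    .replace(/\+/g, '-')
    .replace(/\//g, '_')
    .replace(/=+$/, '');
}

// Assumption: arweave.net is used here purely as an example public gateway.
function arweaveGatewayUrl(contenthash, gateway = 'https://arweave.net') {
  return `${gateway}/${arweaveTxId(contenthash)}`;
}

console.log(arweaveGatewayUrl('0x90b2ca05' + '00'.repeat(32)));
```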

My point: trying to figure this stuff out purely from documentation is hard.


I just want to make a few random comments:

We should update ENSIP-7 to include all the known protocols. We should include examples of how you decode and query the data, links to documentation, etc. For the CID examples, we should also put conditionals on the CID codec types. Like what happens if you get an IPFS CID that isn't a dag-pb? Or an IPNS CID that isn't a libp2p-key? I can do this if appropriate.

Why wasn't the AddressResolver used to store this data? This is a historical question.

Is contenthash just for websites? Or are there other use-cases?

Since there's no way to resolve the contenthash without multiple levels of interpretation (protocol → hash decoding → query hash → interpret), can we address ENSIP-17 and @NameSys's request by simply having a contenthash that is a pointer to a text record? This could either be <uvarint(protocol)><utf8 bytes of text key> or just <uvarint(protocol)> which uses the fixed key "contenthash". Then, contenthash can just be any URI, which allows data:.., ipfs://..., bzz://..., arweave://..., https://, or whatever. This would be both self-describing and pre-formats the data for easy querying. And if the URL is 64 bytes or less, there's no difference in storage gas.

There should also be discussion about how these protocols are dispatched. Many apps silently use cloudflare which doesnā€™t support IPNS and doesnā€™t give the user the opportunity to use their own gateway or traverse content that the app doesnā€™t understand. However, some browsers (and browser plugins) can handle these protocols natively.

Good idea. If you're happy to take it on, that'd be much appreciated!

Which data?

In the broadest possible sense of 'website' - content that can be fetched (ultimately) over HTTP - yes.

What benefit is gained by the additional layer of indirection, rather than having the contenthash just encode the URI directly?


Okay, Iā€™ll work on an update PR.

I was thinking contenthash() interface could have been some unused coinType instead.


The content could be added without any encoding. People with an url text record could set their contenthash to point to that record (eg. uvarint(proto) + bytes("url")). Similar to how a primary-contact record could reference email, phone, twitter, etc.

The parallel I was considering was the power of DNS TXT records vs. all the specific record types. However, I agree, directly encoded is fine too.

data: seems really interesting with L2 stuff.

And a separate protocol that sidesteps the URL-encoding.


+1 for expanding docs with examples. We can help around with fancy CIDs :pray:

IPFS:
There's no need to add "conditional" for different codecs. eg, dag-json, dag-cbor or even raw codecs are already prefixed with IPFS namespace and auto handled by IPFS gateways. There are tons of other codecs "not-implemented" by ipfs gateways that most likely won't fall directly under "IPFS" namespace.

IPLD:
IPFS nodes and gateways don't support ../ipld/<cid> request directly & ENS isn't supporting IPLD yet. Support IPLD Contenthash - #7 by 0xc0de4c0ffee ... Technically "IPFS" namespace is IPLD with "fs", IPFS gateways don't have ../ipld/<cid> path as IPLD is too broad with long list of codecs, & dapps are free to implement their own IPLD schemas that can only be resolved by requesting ipfs gateway.tld ../api/v0/dag/get?arg=<cid>.

IPNS:
It can be libp2p/ed25519, or secp256k1 keys or old deprecated dnslink contenthash so underlying gateway's ipns/<cid/dnslink> resolving process is same for all. *note: it'd be nice to re-use eth/secp256k1 keys for ipns but it isn't widely used/supported so we're using libp2p/ed25519 keygen with deterministic secp256k1 signature's hash as seed. ipns address by 0xc0de4c0ffee · Pull Request #10 · paulmillr/ed25519-keygen · GitHub

Other dweb storage/contenthash have their own namespace so it should be auto handled by those gateways after contenthash is decoded. Arweave & Arweave NS format is one example... IPNS is still slow & patchy with republishing and scaling issue so we're also looking into "fake" IPNS DHT by storing IPNS data in L2...

We're using raw codec (generated on-chain) as alternative to data:uri type mentioned in this draft/"abandoned" ENSIP. IPFS gateways have to do guesswork as there's no mime/content/type attached in CID. if raw data starts with < it'll try to render that as html... dag-json is easy json format but we don't get direct dag-cbor/link traversing support in dag-json only... so we've to use dag-json wrapped in dag-cbor... IPFS gateways auto support dag-cbor so it's all good even without data:uri support.

Our core idea of using RFC-2397 datauris as hex in contenthash instead of multicodecs is to simplify whole process... Just use hex("data:...") to encode and decode back. We don't have to wait for multicodec support for everything. As we've mentioned before mime/contenttype in CID is loong pending for CID?v2, original issue is still open since 2016, last active PR is out there from 2022.


That seems clunky!

Having to set a specially encoded contenthash record, so that you don't have to encode your text record, still seems less simple than just setting an encoded contenthash record containing the URL, though.

I wasn't aware that data:text/plain;charset=utf8,💩️ is considered valid if we store the URL in UTF-8, but that's +50% data size in the general case (50% chance of 2-bytes per byte + escape overhead).

new URL('data:text/plain;charset=utf8,💩️').toString();
// data:text/plain;charset=utf8,%F0%9F%92%A9%EF%B8%8F
// same as encodeURIComponent('💩️')

data:application/octet-stream,... is also possible, however looking at RFC-3986 this seems like a mistake due to all the escape logic or requires base64 which is +33% data size (4-bytes per 3 bytes).

new URL('data:application/octet-stream,\x20').toString();     // "" => expected " "
new URL('data:application/octet-stream,\x20\x01').toString(); // " %01"
new URL('data:application/octet-stream,\x30\x20').toString();     // "0" => expected "0 "

After considering a few alternatives, I think we should use the following encoding which requires one new multicodec for "url".

codec = 0x12345; // or whatever we pick

// header
uvarint(codec) + uvarint(type) + encoded

// URL (type = 0)
encoded = url.bytes // url is encoded according to RFC-3986 which is ASCII
// ie. encodeURI() except for the ipv6 bracket stuff

However, since this is inefficient for data URLs, we add a type = 1 variant which has a "mime":

// data URL (type = 1)
let mime = "image/jpeg"
let data: bytes[] // anything
encoded = uvarint(mime.utf8Length) + mime.utf8Bytes + data.bytes

The type field can also double as version field for future upgrades.

This allows literal data stored in on-chain to be shared between the contenthash and other use-cases w/o any transcoding.
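
To make the two variants concrete, here is a sketch of an encoder for the layout above. 0x12345 is the placeholder codec from this post (nothing is registered under it), and `encodeUrl` / `encodeDataUrl` are illustrative names.

```javascript
const URL_CODEC = 0x12345; // placeholder, "or whatever we pick"

function uvarint(n) {
  const out = [];
  while (n >= 0x80) {
    out.push((n & 0x7f) | 0x80);
    n = Math.floor(n / 128);
  }
  out.push(n);
  return out;
}

// type = 0: the URL as ASCII. new URL() throws on invalid input and normalises the string.
function encodeUrl(url) {
  const ascii = new TextEncoder().encode(new URL(url).toString());
  return Uint8Array.from([...uvarint(URL_CODEC), ...uvarint(0), ...ascii]);
}

// type = 1: length-prefixed mime followed by the raw bytes, with no base64 or escaping.
function encodeDataUrl(mime, data) {
  const mimeBytes = new TextEncoder().encode(mime);
  return Uint8Array.from([
    ...uvarint(URL_CODEC), ...uvarint(1),
    ...uvarint(mimeBytes.length), ...mimeBytes,
    ...data,
  ]);
}

const toHex = (bytes) => Array.from(bytes, (b) => b.toString(16).padStart(2, '0')).join('');
console.log(toHex(encodeDataUrl('image/gif', [0, 0, 0])));
```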


The following code parses ALL ENS-supported contenthash codec/protocols:

// NOTE: Reader, Base36 and Base64URL are assumed helpers that this post does not define.
function parseContentHash(v) { // v: contenthash bytes
  const reader = new Reader(v);
  switch (reader.uvarint()) {
    case 0xE3: return {type: 'ipfs', cid: reader.cid()}; // require cid.codec = dag-pb
    case 0xE4: return {type: 'swarm', cid: reader.cid()}; // require cid.codec = swarm-manifest
    case 0xE5: return {type: 'ipns', cid: reader.cid()}; // require cid.codec = libp2p-key, cid.version = 1
    case 0x1BC: return {type: 'onion', address: Base36.encode(reader.bytes())}; // require length = 16, deprecated in 2021
    case 0x1BD: return {type: 'onion', address: Base36.encode(reader.bytes())}; // require length = 56
    case 0xB19910: return {type: 'skylink', id: Base64URL.encode(reader.bytes())}; // require length = 46, this service is dead?
    case 0xB29910: return {type: 'arweave', hash: reader.bytes(32)};
    case 0x12345: { // placeholder code for the proposed "url" codec
        switch (reader.uvarint()) {
            case 0: return {type: 'url', url: new URL(String.fromCharCode(...reader.bytes()))}; // throws if invalid
            case 1: return {type: 'data-url', mime: reader.read(reader.uvarint()), data: reader.read()};
            default: throw new Error('unknown url type');
        }
    }
    default: throw new Error('unknown contenthash codec');
  }
}

function protocolURLFromDecodedContentHash(info) {
    switch (info.type) {
        case 'ipfs': return `ipfs://${info.cid.toString('k')}`; // v0 = Base58BTC, v1 = Base36 (k)
        case 'ipns': return `ipns://${info.cid.toString('k')}`;
        case 'swarm': return `bzz://${info.cid.toString('k')}`;
        case 'onion': return `onion://${info.address}`;
        case 'arweave': return `arweave://${Base64URL.encode(info.hash)}`;
        case 'url': return info.url.toString();
        case 'data-url': return `data:${info.mime};base64,${btoa(String.fromCharCode(...info.data))}`;
    }
}

This doesn't require a new library and is implementable with vanilla JS.
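
For illustration, a minimal stand-in for the `Reader` that the parser assumes could look like the sketch below. The class shape and method names are guesses at what the pseudocode intends, and only the proposed "url" branches are exercised; 0x12345 remains a placeholder codec.

```javascript
// Hypothetical Reader: a cursor over bytes with varint and slice helpers.
class Reader {
  constructor(bytes) {
    this.bytes = Uint8Array.from(bytes);
    this.pos = 0;
  }
  uvarint() {
    let value = 0, shift = 0;
    for (;;) {
      if (this.pos >= this.bytes.length) throw new Error('truncated varint');
      const b = this.bytes[this.pos++];
      value += (b & 0x7f) * 2 ** shift;
      if (b < 0x80) return value;
      shift += 7;
    }
  }
  // Read n bytes, or everything that remains when n is omitted.
  read(n = this.bytes.length - this.pos) {
    if (this.pos + n > this.bytes.length) throw new Error('out of bounds');
    const out = this.bytes.slice(this.pos, this.pos + n);
    this.pos += n;
    return out;
  }
}

function parseUrlCodec(bytes) {
  const reader = new Reader(bytes);
  if (reader.uvarint() !== 0x12345) throw new Error('not the url codec');
  const ascii = (b) => String.fromCharCode(...b);
  switch (reader.uvarint()) {
    case 0: return { type: 'url', url: new URL(ascii(reader.read())) }; // throws if invalid
    case 1: return { type: 'data-url', mime: ascii(reader.read(reader.uvarint())), data: reader.read() };
    default: throw new Error('unknown url type');
  }
}

const gif = [0xc5, 0xc6, 0x04, 1, 9, ...new TextEncoder().encode('image/gif'), 0, 0, 0];
console.log(parseUrlCodec(gif));
```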

Encoded examples:

  • uvarint(0x12345) uvarint(0) "https://www.chonk.com/"
  • uvarint(0x12345) uvarint(0) "data:image/gif;base64,AAAA";
  • uvarint(0x12345) uvarint(1) uvarint(9) "image/gif" <0x000000> (same as above)

In the contenthash() website use-case, a data URL just serves that data, and an http/https URL is a 30X redirect.

If the URL corresponds to a unknown protocol (not data/http/https, ie. unfetchable), it can be ignored.

In some cases, a data URL can be regurgitated w/o any interpretation, eg. https://raffy.eth.limo could technically just respond with a jpeg? Although care should be taken by content providers to avoid passing unsafe content (eg. just follow basic browser accept rules or only "serve" known mimes). Since no content-disposition is allowed, there's no file extension risk (like .exe).

However, if a filename is required for some future purpose, this same setup could be extended with type = 2, for (mime, name, data) → uvarint(mime.utf8Length) + mime.utf8Bytes + uvarint(name.utf8Length) + name.utf8Bytes + data.bytes. For a future use-case, we could store a file in addr() records exactly like contenthash that point to an IPFS file, a URL, or an inline-data URL using the exact same scheme.


Somewhat related: there could also be a bytes version of the avatar-string defined as a codec.

codec = 0x54321;

uvarint(codec) + uvarint(type) + encoding

 ERC-721: type = 0 => uvarint(chain) + address(contract) + uvarint(token)
ERC-1155: type = 1 => uvarint(chain) + address(contract) + uvarint(token)

Since "avatar" already suffers from protocol overload (invalid URLs like ipfs:/, ipfs://ipfs/Qm..., etc.)

Example encoding for 10K mainnet NFT:

  • uvarint(0x54321) uvarint(0) + uvarint(1) + bytes20 + uvarint(10000)
    This is only 3+1+1+20+2 = 27 bytes or 1 slot!
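
The byte count can be confirmed with a quick sketch. 0x54321 is the placeholder codec from this post, the function name is invented, and the sketch only handles token ids that fit in a JS safe integer.

```javascript
function uvarint(n) {
  const out = [];
  while (n >= 0x80) {
    out.push((n & 0x7f) | 0x80);
    n = Math.floor(n / 128);
  }
  out.push(n);
  return out;
}

// uvarint(codec) + uvarint(type) + uvarint(chain) + address(contract) + uvarint(token)
function encodeNftAvatar(type, chainId, contractHex, tokenId) {
  const addr = contractHex.replace(/^0x/, '');
  if (addr.length !== 40) throw new Error('expected a 20-byte address');
  const addrBytes = addr.match(/../g).map((pair) => parseInt(pair, 16));
  return Uint8Array.from([
    ...uvarint(0x54321), // placeholder avatar codec
    ...uvarint(type),    // 0 = ERC-721, 1 = ERC-1155
    ...uvarint(chainId),
    ...addrBytes,
    ...uvarint(tokenId), // safe integers only in this sketch
  ]);
}

// Zero address used as a stand-in contract.
const encoded = encodeNftAvatar(0, 1, '0x' + '00'.repeat(20), 10000);
console.log(encoded.length);
```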

Additionally, we could parse the addr() version of "avatar" with the exact same logic as contenthash().

Also "small-avatar" (thumbnailed version).


That's unfortunate. I had to double-check the RFC, but that is indeed what RFC2397 specifies.

The mime-type doesn't specify the encoding; what you want here is data:text/plain;charset=utf8;base64,8J+SqQ==.
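
For reference, a small sketch of how that base64 form is produced from the UTF-8 bytes of the text (the function name is illustrative):

```javascript
// Base64-encode the UTF-8 bytes so the data: URL stays pure ASCII.
function textToBase64DataUrl(text) {
  const bytes = new TextEncoder().encode(text);
  return 'data:text/plain;charset=utf8;base64,' + btoa(String.fromCharCode(...bytes));
}

// U+1F4A9 on its own, without the trailing variation selector.
console.log(textToBase64DataUrl('\u{1F4A9}'));
```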

This seems sensible at first glance, though it might make more sense to define two different multicodec values - url and dataUrl. In fact, the latter isn't truly a URL - it's a file with a mimetype, and perhaps the multicodec value name should reflect that.

it's supposed to be used by wildcard resolvers with on-chain datauri generator, so users adding datauris directly as contenthash to their resolver isn't primary use case.

As previously mentioned we're already using dag-json, dag-cbor links & raw codecs under ipfs namespace as datauri alternative. dag-pb only mode is good for compatibility as some public gateways have their own /ipfs api wrapped... but in default ipfs api case there's range of supported codecs and some codecs are not implemented yet.

this one is tricky, allowing sub/domain.eth to 30x redirect is similar to ipfs redirects feature, ipfs doesn't support base ipfs://<cid> redirect to another <cid>, but we can do ipfs://<cid>/google/{xyz} >> 301 redirect to https://google.com?q={xyz}.

it could be verified and used similar to ipns/dnslink... _enslink text record = domain.eth on domain.xyz and verify that and use domain.eth = domain.xyz? :thinking: there should be multicodec/multiaddr to encode such domain.xyz/ipv4/v6... but i'm not sure about namespace...

string2Hex("data:") = 646174613A, that's auto "namespace'd" so there'll be no collision with other multicodec/namespace. Using rfc2397 instead of multicodec we don't need varint/length data in there.

Another way to do half baked client-side redirect... :laughing:
data:text/html,<meta http-equiv="refresh" content="0; url=https://www.google.com/">


Oops, I missed this. Can you show an example of this? What exactly can 0x129 (dag-json, ipld, MerkleDAG json) be? And how is it used? Similarly, with dag-cbor? I guess both of those give you structured data but for what purpose?

I think youā€™re correct and Iā€™m wrong about the codec restrictions on the CID. The ENS client just needs to know how to put the CID into a canonical form for a website use-case, which at the moment, seems to be the base58 for CIDv0 and base36 for CIDv1 (so it can act as a domain.)

So this is codec = 0x55 (raw, ipld, raw) where the "hash" is codec = 0 (identity, multihash, raw) content is <html><head><title>namesys-eth.dev3.eth</title><meta name="description" content="long description"><meta http-equiv="refresh" content="3;URL='https://namesys-eth.github.io'"><meta property="og:image" content="https://namesys-eth.github.io/logo.png"></head><body><h2>Redirecting to <a href="https://namesys-eth.github.io">namesys-eth.github.io</a>.</h2></body></html>.

I think the "url" codec would directly address this use case: uvarint(0x12345) + uvarint(1) + uvarint(9) + "text/html" + "<html>..."


That seems fine to me. I was thinking since a data URL technically fits under both codecs (where the raw-byte version is far more space efficient and useful), they could be shared.


Another related question would be: is there a difference between a URL that would be a redirect vs a URL that would be used for a reverse-proxy? Another variant might be " load this url inside of an <iframe>". Is that part of the contenthash description or something else? eg. redirectURL, proxyURL, dataURL.


We're not touching dag-json "0x129", there's extra plaintext JSON (UTF-8-encoded) "0x0200"...

Dag-cbor is "link" so we can access that as ipfs://dag_cborCID/html and ipfs://dag_cborCID/json

if you want to lookup bytes *without ipfs namespace prefix 0xe301...
Json : 01800400117b2268656c6c6f223a22776f726c64227d
HTML : 015500143c68313e48656c6c6f20576f726c643c2f68313e
dag-cbor : 01710042a26468746d6cd82a581900015500143c68313e48656c6c6f20576f726c643c2f68313e646a736f6ed82a570001800400117b2268656c6c6f223a22776f726c64227d
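
The CID bytes above can be picked apart with a short sketch. It assumes the identity multihash (content stored inline) and does not attempt to decode the dag-cbor structure itself; `decodeIdentityCid` is an illustrative name.

```javascript
// Split CID bytes into version, codec, and inline content (identity multihash only).
function decodeIdentityCid(hex) {
  const bytes = Uint8Array.from(hex.match(/../g), (pair) => parseInt(pair, 16));
  let pos = 0;
  const uvarint = () => {
    let value = 0, shift = 0;
    for (;;) {
      if (pos >= bytes.length) throw new Error('truncated varint');
      const b = bytes[pos++];
      value += (b & 0x7f) * 2 ** shift;
      if (b < 0x80) return value;
      shift += 7;
    }
  };
  const version = uvarint();
  const codec = uvarint();    // 0x55 raw, 0x0200 json, 0x71 dag-cbor
  const hashCode = uvarint(); // 0x00 = identity
  const length = uvarint();
  if (hashCode !== 0) throw new Error('sketch only handles the identity multihash');
  const content = bytes.slice(pos, pos + length);
  return { version, codec, length, content };
}

const html = decodeIdentityCid('015500143c68313e48656c6c6f20576f726c643c2f68313e');
console.log(html.codec.toString(16), new TextDecoder().decode(html.content));
```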

or use cid inspector >

:vulcan_salute:
