I wasn't aware that data:text/plain;charset=utf8,💩️
is considered valid if we store the URL in UTF-8, but that's +50% data size in the general case → 50% chance of 2 bytes per byte + escape overhead.
new URL('data:text/plain;charset=utf8,š©ļø').toString();
// data:text/plain;charset=utf8,%F0%9F%92%A9%EF%B8%8F
// same as encodeURIComponent('š©ļø')
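To make the overhead concrete, here's the same emoji measured both ways (TextEncoder and encodeURIComponent are standard vanilla JS):

```javascript
// U+1F4A9 (4 UTF-8 bytes) + U+FE0F variation selector (3 bytes) = 7 bytes raw,
// but every escaped byte becomes a 3-character %XX sequence.
const emoji = '\u{1F4A9}\uFE0F'; // the 💩️ from the example above
const raw = new TextEncoder().encode(emoji).length;  // 7 bytes as UTF-8
const escaped = encodeURIComponent(emoji).length;    // 21 chars as %XX escapes
```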
data:application/octet-stream,...
is also possible; however, looking at RFC-3986, this seems like a mistake due to all the escape logic, or it requires base64, which is +33% data size → 4 bytes per 3 bytes.
new URL('data:application/octet-stream,\x20').toString(); // "" => expected " "
new URL('data:application/octet-stream,\x20\x01').toString(); // " %01"
new URL('data:application/octet-stream,\x30\x20').toString(); // "0" => expected "0 "
After considering a few alternatives, I think we should use the following encoding, which requires one new multicodec for "url".
codec = 0x12345; // or whatever we pick
// header
uvarint(codec) + uvarint(type) + encoded
// URL (type = 0)
encoded = url.bytes // url is encoded according to RFC-3986 which is ASCII
// ie. encodeURI() except for the ipv6 bracket stuff
However, since this is inefficient for data URLs, we add a type = 1 variant which has a "mime":
// data URL (type = 1)
let mime = "image/jpeg"
let data: bytes[] // anything
encoded = uvarint(mime.utf8Length) + mime.utf8Bytes + data.bytes
The type field can also double as a version field for future upgrades. This allows literal data stored on-chain to be shared between the contenthash and other use-cases w/o any transcoding.
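Both variants can be sketched in a few lines of vanilla JS. This assumes the placeholder codec 0x12345 and unsigned-LEB128 uvarints (as in multiformats); the function names are illustrative, not a spec:

```javascript
const CODEC_URL = 0x12345; // placeholder codec from above

function uvarint(n) { // unsigned LEB128
  const out = [];
  while (n > 0x7f) { out.push((n & 0x7f) | 0x80); n >>>= 7; }
  out.push(n);
  return out;
}

// type = 0: plain URL, stored as RFC-3986 ASCII
function encodeURLRecord(url) {
  const ascii = new TextEncoder().encode(new URL(url).toString());
  return Uint8Array.from([...uvarint(CODEC_URL), ...uvarint(0), ...ascii]);
}

// type = 1: inline data with a mime
function encodeDataRecord(mime, data) {
  const m = new TextEncoder().encode(mime);
  return Uint8Array.from([
    ...uvarint(CODEC_URL), ...uvarint(1),
    ...uvarint(m.length), ...m, ...data,
  ]);
}
```

For example, `encodeURLRecord('https://www.chonk.com/')` is 3 (codec) + 1 (type) + 22 (URL) = 26 bytes.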
The following code parses ALL ENS-supported contenthash codecs/protocols:
function parseContentHash(bytes[] v) {
reader = new Reader(v)
switch (reader.uvarint()) {
case 0xE3: return {type: 'ipfs', cid: reader.cid()}; // require cid.codec = dag-pb
case 0xE4: return {type: 'swarm', cid: reader.cid()}; // require cid.codec = swarm-manifest
case 0xE5: return {type: 'ipns', cid: reader.cid()}; // require cid.codec = libp2p-key, cid.version = 1
case 0x1BC: return {type: 'onion', address: Base32.encode(reader.bytes())}; // require length = 16, deprecated in 2021
case 0x1BD: return {type: 'onion', address: Base32.encode(reader.bytes())}; // require length = 56
case 0xB19910: return {type: 'skylink', id: Base64URL.encode(reader.bytes())}; // require length = 46, this service is dead?
case 0xB29910: return {type: 'arweave', hash: reader.bytes(32)};
case 0x12345: {
switch (reader.uvarint()) {
case 0: return {type: 'url', url: new URL(String.fromCharCode(...reader.bytes()))}; // throws if invalid
case 1: return {type: 'data-url', mime: new TextDecoder().decode(reader.read(reader.uvarint())), data: reader.read()};
default: throw new Error('unknown url type');
}
}
default: throw new Error('unknown contenthash codec');
}
}
function protocolURLFromDecodedContentHash(info) {
switch (info.type) {
case 'ipfs': return `ipfs://${info.cid.toString('k')}`; // v0 = Base58BTC, v1 = Base36 (k)
case 'ipns': return `ipns://${info.cid.toString('k')}`;
case 'swarm': return `bzz://${info.cid.toString('k')}`;
case 'onion': return `onion://${info.address}`;
case 'arweave': return `arweave://${Base64URL.encode(info.hash)}`;
case 'url': return info.url.toString();
case 'data-url': return `data:${info.mime};base64,${btoa(String.fromCharCode(...info.data))}`;
}
}
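The Reader used above is assumed; a minimal vanilla-JS sketch compatible with the parser (uvarint = unsigned LEB128; read() with no argument returns everything remaining) might look like:

```javascript
// Minimal byte reader for the parser above (sketch only; cid() would
// need a real CID implementation and is omitted here).
class Reader {
  constructor(v) { this.v = v; this.pos = 0; }
  uvarint() { // unsigned LEB128
    let x = 0, shift = 0, b;
    do {
      b = this.v[this.pos++];
      x += (b & 0x7f) * 2 ** shift; // avoids 32-bit overflow on 3+ byte codecs
      shift += 7;
    } while (b & 0x80);
    return x;
  }
  read(n = this.v.length - this.pos) { // n bytes, or everything remaining
    return this.v.slice(this.pos, this.pos += n);
  }
  bytes(n) { return this.read(n); }
}
```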
This doesn't require a new library and is implementable with vanilla JS.
Encoded examples:
uvarint(0x12345) uvarint(0) "https://www.chonk.com/"
uvarint(0x12345) uvarint(0) "data:image/gif;base64,AAAA"
uvarint(0x12345) uvarint(1) uvarint(9) "image/gif" <0x000000>
(same as above)
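Spelled out byte-by-byte (uvarint(0x12345) encodes as c5 c6 04 under LEB128, and the codec is still a placeholder), the type = 1 example is:

```javascript
// Byte-level view of the type = 1 example above.
const record = Uint8Array.of(
  0xc5, 0xc6, 0x04,                         // uvarint(0x12345)
  0x01,                                     // uvarint(1): data-URL variant
  0x09,                                     // uvarint(9): mime length
  ...new TextEncoder().encode('image/gif'), // mime
  0x00, 0x00, 0x00                          // data: 3 zero bytes = base64 "AAAA"
);
// total: 3 + 1 + 1 + 9 + 3 = 17 bytes
```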
In the contenthash() website use-case, a data URL just serves that data, and an http/https URL is a 30X redirect.
If the URL corresponds to an unknown protocol (not data/http/https, ie. unfetchable), it can be ignored.
In some cases, a data URL can be regurgitated w/o any interpretation, eg. https://raffy.eth.limo could technically just respond with a jpeg? Although care should be taken by content providers to avoid passing unsafe content (eg. just follow basic browser accept rules or only "serve" known mimes). Since no Content-Disposition is allowed, there's no file extension risk (like .exe).
However, if a filename is required for some future purpose, this same setup could be extended with type = 2, for (mime, name, data) → uvarint(mime.utf8Length) + mime.utf8Bytes + uvarint(name.utf8Length) + name.utf8Bytes + data.bytes. As a future use-case, we could store a file in addr() records exactly like contenthash: pointing to an IPFS file, a URL, or an inline data URL using the exact same scheme.
Somewhat related: there could also be a bytes version of the avatar string defined as a codec.
codec = 0x54321;
uvarint(codec) + uvarint(type) + encoded
ERC-721: type = 0 => uvarint(chain) + address(contract) + uvarint(token)
ERC-1155: type = 1 => uvarint(chain) + address(contract) + uvarint(token)
Since "avatar" already suffers from protocol overload (invalid URLs like ipfs:/, ipfs://ipfs/Qm..., etc.), a canonical bytes encoding would sidestep that string-parsing mess.
Example encoding for a 10K mainnet NFT:
uvarint(0x54321) uvarint(0) + uvarint(1) + bytes20 + uvarint(10000)
This is only 3+1+1+20+2 = 27 bytes or 1 slot!
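As a sanity check on that size, here's a sketch under the same assumptions (placeholder codec, LEB128 uvarints, illustrative names):

```javascript
// Sketch of the proposed avatar encoding (0x54321 is a placeholder codec).
function uvarint(n) { // unsigned LEB128
  const out = [];
  while (n > 0x7f) { out.push((n & 0x7f) | 0x80); n >>>= 7; }
  out.push(n);
  return out;
}

function encodeNFTAvatar(type, chain, contract, token) {
  return Uint8Array.from([
    ...uvarint(0x54321), // codec (3 bytes)
    ...uvarint(type),    // 0 = ERC-721, 1 = ERC-1155
    ...uvarint(chain),   // chain id
    ...contract,         // 20-byte address
    ...uvarint(token),   // token id
  ]);
}
```

For token 10000 of a mainnet ERC-721 contract, this comes out to the 3 + 1 + 1 + 20 + 2 = 27 bytes claimed above.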
Additionally, we could parse the addr() version of "avatar" with the exact same logic as contenthash(). The same goes for "small-avatar" (a thumbnailed version).