New proposal for audio text record


We propose a process for retrieving audio media associated with an ENS domain.


A static PFP is insufficient for a personal, evolving web3 life. We are seeing growing demand for multimedia formats such as music, audio, and video because they are more dynamic and expressive. We call this Multimedia Identity (MI). One can associate one or more MIs with an existing ENS domain and play the audio or video interoperably across metaverse and social platforms. This proposal focuses on audio.


The specification for the audio text record is similar to that of the avatar text record (ENSIP-12: Avatar Text Records). The adjustments are as follows:

  • Retrieving the audio URI: Add an audio text field that lets users link an ENS domain name to audio content. The client MUST first look up the resolver for the name and call .text(namehash(name), 'audio') on it to retrieve the audio URI for the name.

  • Audio format: Clients MUST support audio with MIME types of audio/x-wav, audio/mpeg, and audio/mp4. Clients MAY support additional audio types.

  • ENS Media Retrieval API: Add an endpoint that returns the value set under the audio text record, i.e., <domain_name>
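The client-side steps above can be sketched roughly as follows. This is only an illustration, not part of the specification: `get_audio_uri`, `is_required_audio_type`, and the `lookup_text` callable are hypothetical names, and `lookup_text` stands in for the real resolver call .text(namehash(name), 'audio'). The sketch also assumes that an empty string means the record is unset.

```python
from typing import Callable, Optional

# MIME types the proposal says clients MUST support.
REQUIRED_AUDIO_TYPES = frozenset({"audio/x-wav", "audio/mpeg", "audio/mp4"})

# Hypothetical signature: (name, key) -> text record value.
# A real client would wrap resolver.text(namehash(name), key) here.
TextLookup = Callable[[str, str], str]


def get_audio_uri(name: str, lookup_text: TextLookup) -> Optional[str]:
    """Return the URI stored under the 'audio' key, or None if it is unset."""
    # Assumption: an empty (or whitespace-only) value means "no record set".
    value = lookup_text(name, "audio").strip()
    return value or None


def is_required_audio_type(content_type: str) -> bool:
    """Check a Content-Type header value against the required MIME types."""
    # Drop parameters such as "; codecs=..." and normalise case.
    mime = content_type.split(";", 1)[0].strip().lower()
    return mime in REQUIRED_AUDIO_TYPES


if __name__ == "__main__":
    # Stand-in for on-chain records; the name and URL are placeholders.
    records = {("alice.eth", "audio"): "https://example.com/hello.mp3"}
    print(get_audio_uri("alice.eth", lambda n, k: records.get((n, k), "")))
    print(is_required_audio_type("audio/mpeg"))
```

A client that supports additional audio types would extend the set rather than replace it, since the three listed types are the required minimum.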


Alice owns virtual land in a metaverse. When she is not online, she delegates control to a virtual character. To make the character more interesting, she gives it the ability to talk in her designated voice by setting the audio text record. She then adds text records to her subdomains as below, so that her virtual character can say these three sentences.

When Bob visits Alice’s land, he can interact with Alice’s virtual character and hear what Alice has to say. Bob sees a list of the subdomains Alice owns under her ENS domain. Suppose Bob selects “Hello there, you’re a new face around here”. The selection is passed to the ENS Media Retrieval API, which returns the metadata (including the final audio file) so Bob can hear it.

Backwards Compatibility

Not applicable.

Security Considerations



Copyright and related rights waived via CC0

Initiator: BambooJuice.eth; xomia.eth
Status: draft


Can this include 3D assets as well?

I think having an “audio avatar” is an interesting idea. I’d recommend coming up with a key that is more specific than “audio,” though.

As it stands, it would be as if the avatar field were called “image,” which doesn’t give devs integrating it an idea of what the image’s purpose is. Calling it “avatar” makes it clear what the image represents. I’d encourage some exploration on what the equivalent for this concept would be!


I find this idea really interesting, but having one record/subdomain per audio sentence doesn’t seem ideal.
Perhaps the audio record could instead point to a distributed text file or folder containing information about the different audio files?

That would make it easier for people to use standardized audio sets containing many sentences, as well as reduce the gas fees they would have to pay 🙂
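To make the idea concrete, here is a rough sketch of what reading such a file could look like. Everything about the format is an assumption for illustration only: the proposal defines no manifest, and the `clips`, `label`, and `uri` field names, the `parse_audio_manifest` function, and the example URI are all hypothetical.

```python
import json
from typing import Dict


def parse_audio_manifest(text: str) -> Dict[str, str]:
    """Map each sentence label to its audio URI.

    Assumes a hypothetical JSON layout:
    {"clips": [{"label": "...", "uri": "..."}, ...]}
    """
    data = json.loads(text)
    if not isinstance(data, dict):
        raise ValueError("manifest must be a JSON object")

    clips: Dict[str, str] = {}
    for clip in data.get("clips", []):
        # Skip entries that are missing either field.
        if isinstance(clip, dict) and "label" in clip and "uri" in clip:
            clips[str(clip["label"])] = str(clip["uri"])
    return clips


if __name__ == "__main__":
    # Placeholder manifest; the URI is not a real resource.
    example = json.dumps({
        "clips": [
            {"label": "Hello there, you're a new face around here",
             "uri": "https://example.com/audio/hello.mp3"},
        ]
    })
    for label, uri in parse_audio_manifest(example).items():
        print(label, "->", uri)
```

With a layout like this, a single audio record would point at the manifest, and adding a new sentence would mean updating the file rather than writing another on-chain record.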