Virtual Presence Technical Note 3 2009/03/04 User Identity Data About this Note Number 3 Version 4 Date 2009/03/04 Category Informational Status Draft Short Name User Identity Data Document VPTN-3.txt Authors Heiner Wolf, hw, wolf.heiner@gmail.com Working Group - Dependencies - Supersedes - Superseded By - Abstract A virtual presence client needs information to display people who meet. It needs a name, an image, maybe an animated avatar, and more. This document describes the storage and exchange of public user identity data. The virtual presence identity data format is optimized for VP applications, where many people need the public data of their peers, some only once, some repeatedly, where changes happen frequently and must be propagated quickly with minimum bandwidth. Table of Contents 1. Introduction.......................................................2 2. Concepts...........................................................2 2.1 Identity Data...................................................2 2.2 Identity Update.................................................2 3. Specification......................................................3 3.1 Identity Data...................................................3 3.1.1 External Data..............................................3 3.1.2 Inline Data................................................4 3.1.3 Item Content Type..........................................4 3.1.4 Item Order.................................................4 3.1.5 Item Encoding..............................................4 3.1.6 Item Digest................................................4 3.1.7 Properties.................................................4 3.1.8 Nickname...................................................6 3.1.9 Avatar.....................................................6 3.1.10 Identity Digest............................................6 3.2 Identity Update.................................................6 3.2.1 Identity URL...............................................7 3.2.2 Identity Digest............................................7 3.2.3 Identity ID................................................7 4. Requirements.......................................................8 4.1 Data Format.....................................................8 4.2 Caching and Updates.............................................8 4.3 Storage and Protocol............................................8 4.4 Ownership, Control, and Privacy.................................8 5. Security Considerations............................................9 5.1 Identity Data...................................................9 5.2 Identity ID.....................................................9 User Identity Data 1 Virtual Presence Technical Note 3 2009/03/04 6. References.........................................................9 7. Revisions..........................................................9 1. Introduction Any time users meet, they need at least a name to display their peer. Graphical client software needs more. It shows an image or an avatar. Clients also need other data for various purposes, e.g. availability status, reputation, social status, a unique identifier, friends, inventory, communication addresses, and probably more data types in the future, than we can imagine now. The data must be available to any peer. It must be available quickly. It should be cached to be re-used later, when people meet again. It must be updated quickly, if it changes, even if the change happens while there is no association between the clients. The data must always be up to date with a minimum use of bandwidth, because there are many users to meet and many changes to process. Clients could subscribe for updates to be notified when user data changes. This would keep there local cache always up to date. Subscriptions are a perfect solution for instant messaging clients, where clients always show the latest information of peers on the roster (buddy list). But in a VP application, client software needs the information only, if people meet. If user data changes, while users do not see each other, the change can go unnoticed. Actually it should not be propagated, because it will not be displayed anyway. The client software needs the latest information only if users meet. But then it the data must be available quickly with low overhead. 2. Concepts This section describes the concepts of VP user data exchange. Basically, the VP user data exchange makes sure, that users see each other exactly like they want to be seen by their peers. Users completely control their appearance. Users decide where they store their data. All the data, that makes up the appearance on other people's screens is summarized as the public "identity" of the user. 2.1 Identity Data The identity usually has a nickname, an image, maybe a homepage URL, or communication addresses. It may contain or reference an electronic business card like a XMPP vCard. It may have an animated 3D model as avatar. It may even include a reference to a social network rating system. It is up to the user to provide the data in the identity (identity provider). And it is up to the user who receives the identity data to display it (identity consumer). Clients are free to display the information they want to display. Users can change their data any time. They can change the nickname, update the image, and any other item. 2.2 Identity Update A simple way to communicate changes is a version number. The receiver stores the version number with the data. A new version number means, that the data changed. A unique digest of the data is User Identity Data 2 Virtual Presence Technical Note 3 2009/03/04 very similar, but more general. The only requirement is, that the digest changes when the data changes. Actually a version number is a special kind of digest. When users meet, then their clients exchange an identifier, which indicates the state of their user data. The identifier is a version number or a digest, or any other short text sequence, which identifies the state of the user data. If users change their data, e.g. the avatar image or a nickname, then they have to make sure, that their client communicates a different digest. 3. Specification 3.1 Identity Data The identity is an XML document. The top level identity-node contains multiple item-nodes. Item-nodes either carry the item data as inner text or they reference external data. In addition, item-nodes may have these attributes: - id: a unique identifier of the item inside the identity - contenttype: indicates how to use the item, e.g. "avatar" - mimetype: data type, e.g. "image/gif" - order: a number, which indicates the display preference, if there are multiple items of the same contenttype - size: size of the item in bytes - encoding: encoding of the node text, if any - digest: a version identifier, which indicates the version of the item - src: a URL, which points to the data. Example: ]]> 3.1.1 External Data External data is referenced by the src-attribute. The attribute value is a URL. User Identity Data 3 Virtual Presence Technical Note 3 2009/03/04 If the src-attribute is present, then the mimetype-attribute and the size-attribute may be used as hints, but the real values of data size and MIME-type are determined by the response when fetching the actual data. 3.1.2 Inline Data If there is no src-attribute, then the item data is the inner text of the item-node. The item data must be valid XML text. An optional encoding- attributes allows for base64-encoding of binary data. 3.1.3 Item Content Type The contenttype-attribute indicates how the item is to be used. The client is free to ignore the attribute, but it helps to identify which item is to be shown as static image, which item contains the nickname, etc. Possible values of the contenttype-attribute: - "avatar": an static avatar image - "properties": a list of key-value pairs - ... 3.1.4 Item Order An identity may contain multiple items of the same content type. There might be an "avatar"-item with mimetype="avatar/gif" and another one with mimetype="avatar/flash". Both need the appropriate decoder software. If a client has only one of the required avatar decoders, then it will usually select the item, that can be displayed rather than displaying none. But a client which has both decoders may use the order-attribute to determine which avatar is preferred by the user who provided the identity. 3.1.5 Item Encoding Binary data must be encoded, if it is inline (inner text of the item-node). Possible values of the encoding-attribute: - "plain": not encoded (default) - "base64": the data is base64 encoded. A decoder should dismiss embedded line breaks ("\r" and/or "\n"), tabs and white space. - "URL": URL encoding with "&" and "=" separators, and %HH encoding of characters not allowed in HTTP-URLs. 3.1.6 Item Digest Each item-node has a digest-attribute. The value of the digest- attribute must change, if the item data changes. 3.1.7 Properties The "properties"-item contains short textual values. The item data is a list of key-value pairs. The "properties"-item may have mimetype="text/xml". Example: User Identity Data 4 Virtual Presence Technical Note 3 2009/03/04 Note: inline data must be inside a CDATA section. Example: ]]> Property keys currently used include: - Nickname: a short label which may replace the nickname supplied by the chat protocol - Gender: predefined values are "female", "male", "dontknow", "donttell", any other value allowed - NicknameLink: a URL to be opened if people click the nickname - Birthdate: free form date - Profession: free form text - Zodiaksign: predefined values are "Capricorn", "Aquarius", "Pisces ", "Aries", "Taurus", "Gemini", "Cancer", "Leo", "Virgo", "Libra", "Scorpio", "Sagittarius", any other value allowed - Eyecolor: free form text - Country: ISO country code - Languages: comma separated list of ISO country codes (e.g. "en") or language codes (e.g. "en_UK") - Hobbies: free form text - Interests: free form tags which describe private and professional interests - Statement: short text message to the world - Homepage: URL All keys are optional. Note: there are many free form text properties. They are meant for users, not for the machine. Standardized tags and/or a categorizations for hobbies, interests, profession, eye color, etc. may be defined separately using other keys or other identity content types. Note: the "properties"-item may have a mimetype="text/plain". In this case the data is a line-feed separated list of key=value pairs. This variant is deprecated. Example: Nickname=Tassadar Gender=male User Identity Data 5 Virtual Presence Technical Note 3 2009/03/04 3.1.8 Nickname The nickname is very important, because clients will usually display a nickname and an image. Since all chat protocols support the nickname natively, a nickname is always available. The nickname of the identity may override the nickname supplied by the chat protocol. The nickname is part of the item of contenttype="properties". The nickname is not globally unique. The nickname should not exceed 50 characters. 3.1.9 Avatar An item-node of contenttype="avatar" contains the data to show an avatar image. The avatar may be of any type. There may be animated avatars There may be multiple "avatar"-items. At least one of the "avatar"-items should be an image. The "avatar"- image should be mimetype="image/gif", "image/jpeg", or "image/png". The dimensions should be should not exceed 100x100 pixel. The data size should not exceed 10 kB. All graphical clients should be able to display this "avatar"-image. Note: clients may discover an alternate avatar-item of contenttype="avatar2". The "avatar2"-item has the same properties as the "avatar"-item. 3.1.10 Identity Digest The identity digest is communicated to other clients. The identity digest must change, if any of the items changes or if the list of items changes, e.g. if an item is added or removed. The identity digest may be added to the identity-node as digest- attribute. This is not required by identity consumers. It makes the identity self-contained and simplifies identity-updates for identity providers. Example: ... Note: the identity digest can be computed as a hash (message digest, e.g. MD5) of a concatenation of all item digests. 100 bytes should be enough. It should not exceed 1 kB. 3.2 Identity Update Clients communicate the identity of their users. They exchange the address of the identity file, the current identity digest, and a unique user ID. The details depend on the chat protocol. Clients send and consume the identity triple: - identity URL - identity digest User Identity Data 6 Virtual Presence Technical Note 3 2009/03/04 - identity ID A client, which receives such an identity-triple checks the identity digest of the user identified by the identity ID. If the digest is different from the one, that has been received earlier, then the client fetches the identity file using the identity URL. After receiving the identity file, The client checks the item digest of all items for changes and fetches external item data, if required. XMPP Example: ... (other child-nodes) In XMPP the identity is an extension node of the presence-node. The XML name space is "firebat:user:identity" ("Firebat" is the internal project name of a client). The name space will be changed in the future to follow XSF and IETF recommendations. The example uses an OpenID as unique identity ID (id). Anything, that uniquely identifies a user will do. The identity URL (src) points to the identity XML document. 3.2.1 Identity URL The identity URL points to the identity document (3.1 "Identity Data"). The only URL scheme currently defined is "http". The identity URL is mandatory. 3.2.2 Identity Digest The identity digest is a short (compared to the identity data) textual identifier, which uniquely identifies a state of the identity data. The identity digest might be a version number or a message digest of all item digests. When users change their identity data, then they must take care, that the identity digest changes and that their client communicates the new identity digest. The identity digest may be stored in the identity file as a digest- attribute of the identity-node. The identity digest is optional, but clients should send digest and ID to enable caching of the identity data. Other clients may refuse to fetch identity data supplied without digest and ID. 3.2.3 Identity ID The identity ID is the unique character string. The ID and the digest are used to cache the identity data. User Identity Data 7 Virtual Presence Technical Note 3 2009/03/04 The identity ID may be an email address, the hash of an email address, a URL, or any other globally unique character string. Clients use the ID to associate an identity URL with a user, even if the identity URL changes. A user may change the identity URL (e.g. by changing the identity provider), but may still be recognized by other clients. The identity digest is optional, but clients should send digest and ID to enable caching of the identity data. Other clients may refuse to fetch identity data supplied without digest and ID. 4. Requirements This section lists the original requirements, which lead to the identity storage format described in this document. 4.1 Data Format The format should be extensible to other data types. The format should be extensible to new features. The format should support multiple independent items. The format should support hierarchical storage. The format should be able to include items directly or by external reference. The encoding should be simple so, that it can be written by users. The encoding should be a very common data format. The encoding should be able to embed other various types. 4.2 Caching and Updates The format should enable item caching of individual items. The format should enable cache updates for individual items. Update notification should work with minimum overhead. The data should be up to date very quickly if people meet. 4.3 Storage and Protocol The storage should use a very common protocol, preferably HTTP, because HTTP is also the document and VPI protocol. The storage should be distributed. There should not be a single storage server for user data. The storage address should be a single address, e.g. a URL. 4.4 Ownership, Control, and Privacy Users should completely control their appearance. Users should be able to change the storage address quickly, if they move to another provider, without loosing their identity. Users should be able to control their appearance. Anonymous access of user data should be possible. In addition the User Identity Data 8 Virtual Presence Technical Note 3 2009/03/04 can be personalized access to control the disclosure of information selectively. 5. Security Considerations 5.1 Identity Data The identity data is public. Anyone can access the data. There should be a selective disclosure which differentiates between requesting users. 5.2 Identity ID While the identity ID is primarily used for caching, it can be used to identify users between visits. Clients may change identity ID, digest, and URL from time to time. But frequent changes render identity data caching useless. This would result in greatly increased traffic. Rather, users might accept the fact, that their virtual presence actually makes them present to other users and that their peers may remember meeting them. The positive effects of meeting people and recognizing friends probably outweighs privacy implications, as they do in the real world. 6. References [1] - 7. Revisions 1 hw 2007/08/19 Created 2 hw,jw 2008/02/27 text/xml properties 3 hw 2008/04/26 Security Considerations about identity ID and minor clarifications 4 hw 2009/03/04 Example to show the XML properties User Identity Data 9