Virtual Presence Technical Note 2                          2008.07.04

The Location Mapping Specification

About this Note

  Number          2
  Version         7
  Date            2008.07.04
  Category        Informational
  Status          Draft
  Short Name      Location Mapping
  Document        VPTN-2.txt
  Authors         Heiner Wolf, hw, wolf.heiner@gmail.com
  Working Group   -
  Dependencies    -
  Supersedes      -
  Superseded By   -

Abstract

This document describes the location mapping for virtual presence.

Table of Contents

  1. Introduction
  2. A Simple Example
  3. Configuration Details
    3.1 Location
      3.1.1 Location Match
      3.1.2 Name
      3.1.3 Name Hash
      3.1.4 Name Prefix
      3.1.5 Service
      3.1.6 URL
      3.1.7 Destination
      3.1.8 Ignore
      3.1.9 Hidden
      3.1.10 Select
      3.1.11 Select Options
      3.1.12 Select Option Tags
      3.1.13 Selecting Options
      3.1.14 Private Rooms
      3.1.15 Delay
      3.1.16 Topology
      3.1.17 Zone
    3.2 Delegate
      3.2.1 Delegate to VPI file
      3.2.2 Delegate to Document
    3.3 Info
  4. The Location Mapping System
    4.1 Local Phase
    4.2 Global Phase
  5. VPI File Format
    5.1 VPI Files
    5.2 VPI in HTML Documents
  6. Requirements
    6.1 Web Site Variety
    6.2 Location Variety
    6.3 Technical Requirements
  7. Security Considerations
  8. References
  9. Revisions

1. Introduction

The web is a vast cloud of documents. It is roughly organized in clusters of related documents (sometimes called web sites). Virtual presence (VP) takes place on web pages and uses the content of the web. If users navigate to the same page, then VP makes them aware of each other. This is the high level view. Technically, they join the same chat room and the chat server tells them about each other.

Location mapping is the key component that lets users join the same chat room when they navigate to the same URL. The location mapping process associates a chat room with the document URL: it maps the document URL to a chat room address. This note describes the mapping scheme.

2. A Simple Example

The mapping relies on regular expressions:

- A URL is parsed by a regular expression, and
- the regex matches are used to construct the chat room address.
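
The two steps above can be sketched in a few lines of Python, using the sample URL, expression, and chat service of this section. This is a minimal sketch, not part of the specification: the function names expand and map_url are hypothetical, the expression is assumed to be matched from the start of the URL, and the room address is assumed to be formed as scheme:name@host, as in the XMPP example of this section.

```python
import re


def expand(template, match):
    """Replace \\0 ... \\9 in template with the corresponding regex group."""
    def group_text(ref):
        index = int(ref.group(1))
        # Assumption: a reference to a group that does not exist, or that
        # did not take part in the match, expands to an empty string.
        if index > match.re.groups:
            return ""
        return match.group(index) or ""
    return re.sub(r"\\(\d)", group_text, template)


def map_url(document_url, match_regex, name_template, service):
    """Return the chat room URL for document_url, or None if it does not match."""
    match = re.match(match_regex, document_url)
    if match is None:
        return None
    room_name = expand(name_template, match)
    # Assumption: the service has the form "scheme:host" and the room
    # address is written as "scheme:name@host".
    scheme, _, host = service.partition(":")
    return f"{scheme}:{room_name}@{host}"


room = map_url(
    "http://www.sample-domain.org/notes/VPTN-1.txt",
    r"http://www\.sample-domain\.org/(.+)/.*",
    r"\1",
    "xmpp:chat.sample-domain.org",
)
print(room)  # xmpp:notes@chat.sample-domain.org
```

The replacement is done by a small helper instead of the template syntax of the regex library, because "\0" has to stand for the whole match here.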

The URL:

  http://www.sample-domain.org/notes/VPTN-1.txt

is matched by the regular expression:

  http://www\.sample-domain\.org/(.+)/.*

Using the replacement expression for the room name:

  \1

we get the chat room name:

  notes

Using the chat server address and protocol:

  xmpp:chat.sample-domain.org

we get the URL of the associated chat room:

  xmpp:notes@chat.sample-domain.org

The VP clients of all visitors to the web page know that people meet in this chat room. In fact, people visiting any document in the '/notes' folder will meet in the same room.

The configuration data for this mapping looks like:

  \1
  xmpp:chat.sample-domain.org

This reads as:

- there is a virtual "location" for all URLs which "match" the regular expression,
- the first match ("\1") of the regular expression is used as the "name" of the virtual location, and
- the virtual location "service" is provided by the chat server "chat.sample-domain.org" using the XMPP protocol.

The data is called virtual presence information (VPI), because it defines and configures a virtual location. Such configuration data may be stored as an XML file on the document server. It may also be embedded into a document.

By providing the VPI for its URLs, the web site operator controls where and how visitors communicate when they visit the web site. If the document server does not provide VPI, then a default VPI is used.

3. Configuration Details

The above example shows the basic location mapping. This section describes all options.

3.1 Location

The location-node and its child nodes define the location mapping. The location-node is a first order element.

3.1.1 Location Match

The match-attribute of the location-node selects the URL space to which the location-node applies. The value of the match-attribute is a regular expression. All text inside the location-node is subject to replacement by matches of the regular expression.
Any node name, attribute name, attribute value, and node text (CDATA) may be changed. Replacement expressions start with "\" (\0, \1 ... \9).

The match-attribute may be omitted. A location-node without a match-attribute matches all URLs.

Examples:

3.1.2 Name

The name-node defines the chat room name. The room name is the inner text of the name-node. The name-node is a child of a location-node.

Examples:

  sample-room
  \1

3.1.3 Name Hash

The room name may be hashed using the hash-attribute of the name-node. The purpose of the hash is to obscure room names that are based on URL components, so that URLs or URL fragments are not transmitted.

Possible values:

- "SHA1": hexadecimal representation (two hex digits per byte) of a SHA-1 message digest
- "true": (same as "SHA1")

Examples:

  sample-room -> 03a440b86ae31f837a78c219104a0888f095f984
  sample-room -> 03a440b86ae31f837a78c219104a0888f095f984

3.1.4 Name Prefix

The name may be prefixed using the prefix-attribute of the name-node. The prefix is applied after a potential hash. The purpose of the prefix is to assign a recognizable (e.g. sortable) prefix to chat rooms despite the hash.

Examples:

  sample-room
  \1

3.1.5 Service

The service-node defines the address of the chat service which hosts the chat room. The service address is the inner text of the service-node. The service address comprises the address scheme and a scheme specific address, usually a server address. The service-node is a child of a location-node.

Examples:

  xmpp:chat.sample-domain.org
  irc://foobar.org:6665

3.1.6 URL

The url-node contains the complete URL of the chat room. This option may be used instead of the name-node and the service-node. It is simpler, but the prefix- and hash-attributes of the name-node cannot be applied to the url-node. The inner text of the url-node is the location mapping result. The url-node is a child of a location-node.

Examples:

  xmpp:sample-room@chat.sample-domain.org
  irc://foo-domain.org:6665#bar-channel

3.1.7 Destination

The destination defines the document URL which represents the chat room. It is the web browsing destination which should be used to enter the same chat room.

In most cases, location mapping assigns a chat room to a group of document URLs. In other words: multiple URLs are mapped to the same chat room. If a VP component needs a single URL which represents all URLs of the chat room, then it uses the destination-node, if available. The destination-node is a child of a location-node.

Examples:

  http://www.sample-domain.com/start.htm
  http://\1
  \1

Possible use cases:

- Chat room names are usually not suited for end user consumption. If a user interface wants to label a chat room, then it may use the destination.
- If a user wants to invite another user to the same location, then the easiest way is to send a URL. The invited user navigates to the page, gets the same chat room name, and enters the chat room.
- A list of URLs (Web 2.0: tag cloud) which shows visited web pages would contain destinations.

3.1.8 Ignore

Use the ignore-node to opt out of VP. There is no chat room associated with URLs matching a location-node with an ignore-child. The ignore-node is a child of a location-node.

Examples:

3.1.9 Hidden

The hidden-node tells VP components that the document URL should not be disclosed. The hidden-node is a child of a location-node.

Examples:

Possible use cases:

- A list of URLs (Web 2.0: tag cloud) which shows visited web pages should not contain URLs which match a location-node with a hidden-child.

3.1.10 Select

The select-node enables multiple parallel rooms. It offers a list of options from which clients may choose an additional suffix to the chat room name. The selection of the suffix may be based on user input, e.g. a language preference. The suffix is appended to the chat room name after an optional hashing (see 3.1.3 "Name Hash").

The location mapping process still returns a single chat room address for a document URL. But the location mapping result may differ between clients, because clients may use different user preferences.

It is best practice for the client software to choose an option and then store the decision permanently, so that a user who returns repeatedly to the same document URL always joins the same chat room.

The client may ignore the select-node and join the base room. The base room and the rooms with suffix are peers. There is no hierarchy implied.

All components of the room name must be compatible with valid room names of the chat protocol. It is recommended to limit the room name, prefix, and suffix to ASCII letters, numbers, and simple signs. The name should match this regular expression: [a-zA-Z0-9._-]+.

3.1.11 Select Options

Each option-node offers a potential suffix which may be appended to the room name.

An option-node must have a suffix-attribute. The value of the suffix-attribute is the suffix to be appended.

An option-node may have a title-attribute. The title-attribute is a user-readable label.

An option-node may have a description-attribute. The description-attribute is a user-readable description of the room.

The client software may present the choice of rooms upon entry or it may use predefined user preferences.

3.1.12 Select Option Tags

An option-node may have tag-child nodes. Tag nodes provide selection criteria for the client software. A room whose option has a tag-node with inner text "lang:en" is assumed to be populated by English writing users. A room whose option has a tag-node with inner text "sports" is assumed to be populated by users interested in sports. Multiple tags narrow the selection. A user may configure the client software to prefer a room with specific tags.
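
Independent of how an option is chosen, its suffix ends up in the room name as described in 3.1.3 "Name Hash", 3.1.4 "Name Prefix", and 3.1.10 "Select": the name is hashed first (if requested), then the prefix is put in front and the suffix is appended. The following is a minimal sketch of that order. The function name room_name and its parameters are hypothetical, and UTF-8 is assumed as the encoding of the name before hashing. In the character class the hyphen is written last, so that it is taken literally and not as a range.

```python
import hashlib
import re

# Recommended alphabet for room names: ASCII letters, digits, ".", "_", "-".
VALID_NAME = re.compile(r"[a-zA-Z0-9._-]+")


def room_name(name, use_hash=False, prefix="", suffix=""):
    """Compose a room name: optional SHA-1 hash, then prefix, then suffix."""
    if use_hash:
        # Assumption: the name is encoded as UTF-8 before hashing.
        name = hashlib.sha1(name.encode("utf-8")).hexdigest()
    full = f"{prefix}{name}{suffix}"
    if VALID_NAME.fullmatch(full) is None:
        raise ValueError(f"room name is not limited to simple ASCII: {full!r}")
    return full


print(room_name("sample-room", suffix="-en"))  # sample-room-en
print(len(room_name("sample-room", use_hash=True)))  # 40
```

The check is applied to the complete name, so a valid base name combined with an invalid prefix or suffix is rejected as well.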

Common tags are:

- "lang:xx": where xx is an ISO language code
- "lang:xx_YY": where xx_YY is a locale code
- "chat" for those who want to chat
- "flirt" for those who want to chat for specific purposes
- "busy" for those who do not want to chat
- "politics"
- "economy"
- "business"
- "sports"
- "entertainment"
- "technology"
- "games"
- ...

If there are multiple options which match the user preferences equally well, then the client may select one of the options by random choice.

Examples:

3.1.13 Selecting Options

Select options provide a choice of available rooms. The client may choose a room or present the list of options to the user. If several options match the user preferences, then the client may choose one of the matches randomly.

The select-node may also force the client to select a room by supplying a tag-attribute.

Example:

Explanation: three of the select options are tagged with "foo". The select-node reads as "select tag foo". This means that the client should select one of the options which have the "foo" tag. All five options should be available in the user interface of the client, so that the user may switch.

The language preference is also supported. If options are tagged with a language and with the select-node's tag-attribute, then the client should select one of the options where the tag-attribute and the language match best.

Example:

Explanation: walking all options, the client checks for the language and the select-tag ("foo"). The select-tag forces the client to select one of the "foo" options where the language matches. An English (en) user will automatically get suffix "en1" or "en2". A German user (de) will get "de1" or "de2". Any other language does not match a language tag. Hence, any other language will get suffix "1" or "2".
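
The selection described in this explanation can be sketched as follows. The example markup is not reproduced in this text, so the option list below is a hypothetical reconstruction that is merely consistent with the explanation (six options tagged "foo", plus "politics" and "sports"). The names Option, candidates, and choose_suffix, and the fallback to options without a language tag, are assumptions and not part of the specification.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Option:
    suffix: str
    tags: set = field(default_factory=set)

    def language(self):
        """Return the value of the option's "lang:" tag, or None."""
        for tag in sorted(self.tags):
            if tag.startswith("lang:"):
                return tag[len("lang:"):]
        return None


def candidates(options, select_tag, user_lang):
    """Return the options a client may pick automatically."""
    # Step 1: the select-tag limits the automatic choice to tagged options.
    forced = [o for o in options if select_tag in o.tags]
    # Step 2: prefer options whose language tag matches the user.
    same_lang = [o for o in forced if o.language() == user_lang]
    if same_lang:
        return same_lang
    # Step 3 (assumption): otherwise use the options without a language tag.
    return [o for o in forced if o.language() is None]


def choose_suffix(options, select_tag, user_lang, rng=random):
    """Pick one candidate at random; None means "join the base room"."""
    pool = candidates(options, select_tag, user_lang)
    return rng.choice(pool).suffix if pool else None


# Hypothetical option list, consistent with the explanation above.
OPTIONS = [
    Option("en1", {"foo", "lang:en"}), Option("en2", {"foo", "lang:en"}),
    Option("de1", {"foo", "lang:de"}), Option("de2", {"foo", "lang:de"}),
    Option("1", {"foo"}), Option("2", {"foo"}),
    Option("politics", {"politics"}), Option("sports", {"sports"}),
]

print(sorted(o.suffix for o in candidates(OPTIONS, "foo", "en")))  # ['en1', 'en2']
print(sorted(o.suffix for o in candidates(OPTIONS, "foo", "fr")))  # ['1', '2']
```

A client that follows the best practice of 3.1.10 "Select" would store the suffix returned by choose_suffix and reuse it on later visits instead of choosing again.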

Suffixes "politics" and "sports" will only be available in the user interface and will not be chosen automatically.

3.1.14 Private Rooms

Basically, clients can join any chat room they like. But they see each other only if they join the same room. Hence, clients must agree on the mapping from document URL to chat room address. Location mapping rules implement the agreement. Clients consent to the location mapping by implementing its rules.

However, clients could use other agreements. Users could agree on a chat room address independent of the location mapping process described here, e.g. by email. If both configure their clients to join the same chat room when navigating to the same URL, then they meet in a different room than all other clients which stick to the global location mapping.

Users may communicate suffixes via email or chat and then configure their clients to use the suffix. The client software may keep a database of custom suffixes. It may append the suffix to a chat room name derived from the location mapping process as if the suffix had been configured as a select option. Custom room name suffixes are a way to create custom rooms beyond the pre-configured select options, but still sponsored by the global location mapping.

Example:

Assume that all document URLs of the web site:

  www.sample-domain.org

are mapped to the chat room name:

  sample-rooom

and the chat service:

  xmpp:chat.foo.org

Assume that a user configured the client to append the suffix:

  -my-lounge

to any room name of the web site "www.sample-domain.org". Then the final chat room address is:

  sample-rooom-my-lounge@chat.foo.org

If the name is hashed according to 3.1.3 "Name Hash", then the final chat room address is:

  a5fd29db3240979b97c6f41ff3a4ffd10dff51e9-my-lounge@chat.foo.org

3.1.15 Delay

The delay-node tells clients to wait before entering the chat room associated with the document URL.
This feature may be used to filter out spurious visitors who would join the room for a short time before leaving again. This behavior has been observed on search engine domains, where people drop by just to move on to the search results, thus generating useless, even disturbing VP traffic.

The value of the sec-attribute specifies the time to wait before entry in seconds. A value of "infinite" means that the client should not join the chat room. The sec-attribute is required.

An optional bypass-attribute controls whether the client shows a button for manual entry. The only allowed value is "true". If the bypass-value is "true" and the user chooses to enter, then the client should enter the chat room immediately.

The delay-node is a child of a location-node.

Examples:

Note: a sec-attribute with value "infinite" without a bypass-attribute does not make much sense.

3.1.16 Topology

The topology-node is informational. It may be used by the client software to show the user how the location mapping is configured on the web site. The only attribute currently defined is the level-attribute. The value of the level-attribute tells whether there is a room per domain, a room per document, per language, etc.

The real description of the location mapping is given by the location-node's match-attribute and the name-node. But this information is not suited for end users. So, authors of VPI may add a topology-node to tell visitors about their mapping.

Possible values of the level-attribute:

- "multidomain": a room covers multiple domains or web sites
- "domain": a room for the top level internet domain
- "host": a room per host name
- "site": a room for the entire web site
- "category": a room per category of the web site
- "channel": a room per channel (e.g. a radio station's site)
- "thread": a room per thread (e.g. in a forum)
- "article": a room per article (e.g. a newspaper)
- "query": rooms are assigned depending on the URL-query
- "unique": each URL maps to a different room
- "language": a room per visitor language
- "selection": the room selection is on the web site
- "none": no room defined

The topology-node is a child of a location-node.

Example:

3.1.17 Zone

The zone-node combines multiple rooms into a zone. A location-node may spawn multiple rooms, e.g. if a mapping creates a room per newspaper article. Still, these rooms may have very similar, if not identical configuration.

An example is a client feature that lets users opt out of certain web sites. If opt in/opt out could only be configured for every room individually, then users would have to opt out of every room (read: newspaper article). Clearly, if the user decides not to be visible on a newspaper site, then the user usually means the entire site rather than a single article. Once the user opts out, the setting should be applied to all articles, effectively making the user invisible on the web site even though it has thousands of rooms.

Hence, there are reasons to group rooms together, so that a configuration setting affects the entire group rather than only the current room.

Example:

Possible attributes of the zone-node:

- "id": a unique zone identifier, e.g. the domain name (required)
- "title": a human readable zone name for display (recommended, defaults to "id")
- "description": short description (optional)
- "src": (reserved for future use)

The zone-node is a child of a location-node. Multiple location-nodes may have the same zone identifier.

3.2 Delegate

While a client searches a VPI file for a matching location-node, it may be redirected to another VPI file. The delegate-node is a first order element. Like the location-node, it has its own match-attribute (see 3.1.1 "Location Match").
If the client hits a matching delegate-node, then the processing continues in the file indicated by the information found inside the delegate-node.

The delegate-node must have a child node, either a uri-node or a document-node.

3.2.1 Delegate to VPI file

The uri-node refers to another VPI file. The inner text of the uri-node contains the URL of the next VPI file. The URL may be absolute or relative.

Examples:

  http://www.sample-domain.com/vpi.xml
  default.xml
  \2.xml

3.2.2 Delegate to Document

A document-node refers to the document for which the location mapping is performed. The next VPI data is to be found in the document referenced by the document URL (see 5.2 "VPI in HTML Documents").

Example:

3.3 Info

The Info-node provides meta information about the VPI file, such as an expiry time or an email address of the responsible person. The Info-node is optional.

Attributes of the Info-node:

- "Version": a numerical value which indicates the file version for version number controlled cache updates
- "Description": a free text description of this file
- "AdminContact": email address of the responsible person
- "TechContact": email address of the technical contact
- "TimeToLive": minimum lifetime of the information in seconds
- "ExpiryTime": maximum lifetime in seconds
- "RefreshInterval": the cache update interval in seconds
- "TransferRetryTime": cache update retry interval in seconds

All attributes are optional. A default TimeToLive of 3600 seconds applies.

Example:

4. The Location Mapping System

The client software needs virtual presence information to do the location mapping. It gets the VPI from the global location mapping system (LMS). The LMS is a distributed storage of VPI.

A client in search of location mapping rules first checks the web server of the document URL (local phase).
If there is no VPI available, then it retrieves globally defined rules (global phase).

4.1 Local Phase

Web sites may provide VPI for their URL space. They do so by providing XML files which contain VPI data. The file name is:

  _vpi.xml

This file may be stored at any ("/"-separated) path in the folder hierarchy of a web server. The rules it contains are applied to all URLs inside the same folder.

Client software tries to find the VPI file in the folder of the document URL. If there is no VPI file, then the client shortens the path to the next level and tries again until it hits the base URL of the web server ("/" path).

Example:

Assume the document URL is:

  http://www.sample-domain.org/notes/VPTN-1.txt

Then the client will try to fetch:

  http://www.sample-domain.org/notes/_vpi.xml

If this file is not available or if it does not contain valid XML, then it will fetch:

  http://www.sample-domain.org/_vpi.xml

If the client software finds VPI which matches the document URL, then the local phase ends under control of the web site operator. If the client software does not find matching VPI, then it starts the global phase.

4.2 Global Phase

The global phase begins at the global LMS root:

  http://lms.virtual-presence.org/root.xml

This file contains pointers (3.2 "Delegate") to other VPI files. The client matches the document URL against any location-match or delegate-match attribute it encounters on its way until a location-node provides mapping rules.
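
The lookup order of the local and global phases can be sketched as follows. This is only an illustration under stated assumptions: VPI files are modelled as in-memory lists of Python dictionaries instead of XML, and the fetch callback, the rule keys ("kind", "match", "uri", "name", "service"), the regular expressions in the sample rules, and the hop limit are all hypothetical. The sketch also assumes that nodes are tried in file order, that relative delegate URLs are resolved against the URL of the current file, and that an available local file without a matching rule leads directly to the global phase.

```python
import re
from urllib.parse import urljoin, urlsplit

LMS_ROOT = "http://lms.virtual-presence.org/root.xml"


def local_candidates(document_url):
    """List the _vpi.xml URLs to try, from the document folder up to "/"."""
    parts = urlsplit(document_url)
    base = f"{parts.scheme}://{parts.netloc}"
    folders = parts.path.split("/")[1:-1]  # drop leading "" and the file name
    urls = []
    while True:
        path = "/".join(folders)
        urls.append(f"{base}/{path + '/' if path else ''}_vpi.xml")
        if not folders:
            return urls
        folders.pop()


def find_location(document_url, start_url, fetch, max_hops=10):
    """Follow delegate rules from start_url until a location rule matches."""
    file_url = start_url
    for _ in range(max_hops):  # hypothetical guard against delegate loops
        for rule in fetch(file_url) or []:
            # A rule without "match" applies to all URLs.
            match = re.match(rule.get("match", ""), document_url)
            if match is None:
                continue
            if rule["kind"] == "location":
                return rule, match
            # A delegate: expand \0 ... \9 and continue in the referenced file.
            target = re.sub(r"\\(\d)",
                            lambda ref: match.group(int(ref.group(1))) or "",
                            rule["uri"])
            file_url = urljoin(file_url, target)
            break
        else:
            return None  # no rule in this file matched
    return None


def resolve(document_url, fetch):
    """Local phase first, then the global phase."""
    for start in local_candidates(document_url):
        if fetch(start) is not None:  # first available local VPI file
            found = find_location(document_url, start, fetch)
            if found is not None:
                return found
            break
    return find_location(document_url, LMS_ROOT, fetch)


# Hypothetical stand-in for the files on lms.virtual-presence.org.
FILES = {
    LMS_ROOT: [{"kind": "delegate", "uri": r"\1.xml",
                "match": r"https?://[^/]*\.([a-z]+)(?:[:/]|$)"}],
    "http://lms.virtual-presence.org/org.xml": [
        {"kind": "delegate", "uri": "default.xml"}],
    "http://lms.virtual-presence.org/default.xml": [
        {"kind": "location", "name": r"\3",
         "service": "xmpp:location.virtual-presence.org",
         "match": r"(\w+)://([^/]*\.)?([^/.]+\.[^/.:]+)"}],
}

URL = "http://www.sample-domain.org/notes/VPTN-1.txt"
print(local_candidates(URL))
rule, match = resolve(URL, FILES.get)
print(rule["service"], match.group(3))
```

Here fetch returns None for a file that is not available, so a dictionary lookup is enough to simulate a web site without local VPI. A real client would download and parse the XML files and cache the results.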

Example (continued from above):

The client fetches:

  http://lms.virtual-presence.org/root.xml

where the document URL matches:

  \1.xml

The delegate-node invokes:

  http://lms.virtual-presence.org/org.xml

where the document URL matches:

  default.xml

The delegate-node invokes:

  http://lms.virtual-presence.org/default.xml

where the document URL may match:

  xmpp:location.virtual-presence.org
  \3

So, the final VPI applied to the document URL selects the domain name of the document URL (3.1.1 "Location Match"). The result is hashed (3.1.3 "Name Hash"). The client joins the chat room on the XMPP chat server "location.virtual-presence.org" (3.1.5 "Service").

5. VPI File Format

5.1 VPI Files

Multiple location-nodes and delegate-nodes may be stored in a VPI file. The order of nodes is significant. The nodes are wrapped by a top level vpi-node with a namespace specification and preceded by an XML declaration.

Examples:

  \1
  xmpp:chat.sample-domain.org

  \1
  xmpp:chat.sample-domain.org

5.2 VPI in HTML Documents

The VPI may be stored inside an HTML document. The VPI is stored in a META HTTP-EQUIV HTML header tag.

Example:

Note: a VP client usually reacts quickly to a navigation event of a web browser. The client searches VPI as described above, applies the mapping rule, and enters a chat room. This frequently happens before the document has been fully rendered by the web browser, because there is much less data involved and the VPI data might even be cached. This means that the client software may join a chat room before it can access the VPI inside an HTML document. It is recommended to prevent the client from using external VPI by providing a VPI file which delegates to the document (see 3.2.2 "Delegate to Document").

Note: the quotes inside the VPI must not collide with the quotes of the enclosing HTML.

6. Requirements

This section lists the original requirements which led to the location mapping scheme described in this document.

6.1 Web Site Variety

There are many different ways to organize web sites. And there are many different ways to structure URLs. Some web sites:

- have static file based URLs,
- have URLs similar to file system paths,
- are based solely on the URL query part,
- span multiple DNS domains,
- present identical content on different DNS names,
- present identical content using different URLs for each user.

The location mapping scheme takes this variety into account.

6.2 Location Variety

A single web page URL may be a single virtual location, but usually it is part of a web site. The web site may:

- appear to the user as a single virtual location,
- have different sections which users regard as different locations,
- have different sections which the web site operator regards as multiple locations,
- consist of multiple web sites which are regarded as a single location.

The location mapping scheme takes these perceptions into account. It allows groups of URLs to be split into separate rooms and different URLs to be combined into a single room.

6.3 Technical Requirements

The location mapping system is also designed to:

- use existing technologies, protocols, and formats,
- allow web sites to control the mapping for their URL space,
- enable VP on web sites even if they do not participate actively,
- be independent of the chat protocol,
- support user and operator defined private rooms,
- be easily implemented by web site operators.

7. Security Considerations

Virtual presence makes people aware of each other in virtual spaces. It reduces privacy for the benefit of cooperation and awareness. But a VP system should protect as much of the user's privacy as possible. It should not disclose URLs without approval of the user. It should not transmit URLs over unencrypted connections. It should protect URLs which contain session or other private information.
The LMS has been designed so that independent clients may join the same chat room based on their web browser URLs without sending the URLs over the network.

8. References

  [1] -

9. Revisions

  1  hw  2007.08.13  Created
  2  hw  2007.09.27  Limit name, prefix, suffix to simple ASCII
  3  hw  2007.10.31  Moved to top level, added node, fixed typos,
                     rearranged headings to fit delegate and Info
  4  hw  2008.01.20  Small text changes for delegate as top level node
  5  hw  2008.05.15  Added zone
  6  hw  2008.07.01  Fixing word link error
                     Added: Selecting Options
  7  hw  2008.07.04  Fixed sample-room SHA1 example