Telepresence: Advancing The Future
An Ontario Telepresence Project White Paper

Table of Contents

The Threshold of Frustration
The Myth of the "Super Appliance"
Telepresence: Using Human Centred Design
Positioning for the Future: Broadband Services

The Threshold of Frustration

If current commercial products are any indicator of our ability to design and deploy usable products and services, then technologists as a whole have little credibility as designers of future applications. Consider the three graphs in Figure 1 below. The first graph, "A", is a stylized depiction of the growth of technology (exponential growth over time). In "B" we show the growth of the promised functionality (also exponential). It is worth marking a few points on graph "B". Today we are living in a world of the telephone, photocopier and VCR. Few of us know how to fully exploit the functionality incorporated into these seemingly "simple" tools. In fact, these devices are typical of technology-centred designs: needlessly complex, arcane, and with user interfaces that are independent of, and incompatible with, one another.

Let us now examine graph "C". Human beings are not developing more neurons, more hours in the day or more capacity to learn. In fact, we have a limited ability to absorb new things. In adult life, new skills often come at the expense of older skills. For instance, to play a musical instrument well, one has to use the time and energy allocated to practicing other skills, such as horseback riding.

We call this fundamental limit the "Threshold of Frustration." Technological wizardry and functionality hidden above the threshold of frustration are inaccessible (and thus invisible) to users. With the VCR, telephone, and photocopier clearly exceeding the threshold, what credibility do designers have to deliver promised functionality like multi-media, virtual reality or computer supported collaborative work (CSCW)?


Figure 1: Threshold of Frustration

The only way to apply the technology of graph "A" and deliver the functionality of graph "B" is to design for the innate cognitive, motor and social skills of ordinary users. In the emerging knowledge-based economy, only those products and services which have the user as the focus of the design and implementation will be successful and sustainable in the long term.

The Myth of the "Super Appliance"

Many vendors are seeking to capitalize on the convergence of information, telecommunications and consumer electronics. If current trends are any indication, the main approach taken by these vendors is one of concentrating functionality into what could be called "super appliances". By this, we mean televisions which are also entertainment systems, shopping centres and on-line video stores, or computers that are also answering machines, telephones, video editing suites and electronic books, all rolled into one.

The applications that are being designed to run on these "super appliances" are designed and optimized with limited interaction between them. The model used in their conception is technology-based as is shown in Figure 2 below.

Figure 2: Traditional Collaboration Model

This model categorizes collaborative technologies according to two dimensions, time and place, where each dimension has two values (same/different). While each tool in the matrix runs on the same workstation, no tool is aware of the state of the user, of the natural transitions necessary from tool to tool, or of the state of any other tool. Thus, the user is forced to adapt his/her work flow to the tools available, rather than the tools adapting to the changing parameters of work.

While this model is in wide use in the CSCW community and is useful in establishing a taxonomy of tools and systems, we observe that it is inherently technology-based and thus inappropriate for use in the design and implementation of systems to successfully support work groups.

Telepresence: Using Human Centred Design

In the Ontario Telepresence Project we took precisely the opposite approach. Rather than starting with the "workstation" as the focus of our implementation, we chose to employ sociologists and psychologists to first study the users in their environment, and to analyze the nature of the work and the social interactions that were central to the culture of the organization. Only when the social ecology of the workplace was understood did we begin our design of applications to support selected activities.

When we designed applications for various internal and external field study sites, we chose to use a large number of simple and specialized devices distributed in space and located where users needed them. Although these devices were separate, they worked in concert because they were networked over the same computer-controlled audio/visual network.

The net result was a system architecture and deployment methodology that was strong (due to the specificity of individual devices) yet general (due to the overall functionality offered by the networked family as a whole). This is an approach which we have pioneered, and we feel that its adoption or incorporation into design will have a major impact on the usability and success of future systems.


A key motivation driving our work is a desire to make the transitions between the various tasks and activities of the workplace "seamless." By this we mean effortless, natural transitions between people, tasks, times and places. Seamlessness operates in a number of dimensions:

  • between foreground and background tasks and activities;
  • in moving attention between person and task in collaborative work;
  • between local and distant environments;
  • between computer-mediated human-human communication and human-computer communication; and
  • bridging between the artifacts (such as documents) in our computer-based information space and those in our physical space.

We can illustrate how these notions fit together through the introduction of an alternate model for workgroup interaction described below:

A New Workgroup Model

We introduce a complementary model that helps clarify shared work at a distance from a human perspective.

Figure 3: New Workgroup Model


Here, the rows are human-human communication and human-computer communication. The columns are foreground activities and background activities. Foreground activities are tasks which are intentional, i.e., they require a human to initiate them. Speaking on the telephone and typing into a computer are two examples.

"Background" tasks take place in the periphery i.e., "behind" those in the foreground. Examples include being aware of someone in the next office typing, or the light in your kitchen going on automatically when you enter it, as opposed to manually flicking the switch (a foreground intentional act).

From this simple model, we can derive some valuable observations. First, while tools exist to serve activities in the left column, this is much less true of the right column: nearly all work in technology-mediated human-human and human-computer interaction falls in the left column. It is our fundamental belief that the real "sweet spots" in distributed work effectiveness lie on the right-hand side of the model.
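
The two-by-two model can also be written down as a small data structure, which makes the row and column of each activity explicit. The following Python fragment is only an illustrative sketch: the names Party, Attention, ACTIVITIES, quadrant and activities_in are our own, and the placement of each example activity is our reading of the examples given in the text, not a definitive taxonomy.

```python
from enum import Enum


class Party(Enum):
    """Rows of the model: who is communicating with whom."""
    HUMAN_HUMAN = "human-human"
    HUMAN_COMPUTER = "human-computer"


class Attention(Enum):
    """Columns of the model."""
    FOREGROUND = "foreground"  # intentional, initiated by the user
    BACKGROUND = "background"  # peripheral, no explicit action needed


# Example activities taken from the text. Their placement is an
# assumption made for illustration only.
ACTIVITIES = {
    "speaking on the telephone": (Party.HUMAN_HUMAN, Attention.FOREGROUND),
    "typing into a computer": (Party.HUMAN_COMPUTER, Attention.FOREGROUND),
    "hearing a colleague typing next door": (Party.HUMAN_HUMAN, Attention.BACKGROUND),
    "light turning on automatically": (Party.HUMAN_COMPUTER, Attention.BACKGROUND),
}


def quadrant(activity):
    """Return a readable label for the cell an activity falls in."""
    party, attention = ACTIVITIES[activity]
    return f"{party.value} / {attention.value}"


def activities_in(attention):
    """List the example activities that fall in one column of the model."""
    return sorted(name for name, (_, a) in ACTIVITIES.items() if a is attention)
```

Calling activities_in(Attention.BACKGROUND) lists the right-column examples, the column which the model suggests is under-served by current tools.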

One could argue that in supporting human-human interaction, telephones and video conferencing do a fairly good job. One can hold fairly rich conversations, see each other, judge moods, etc. So why is there still such a sense of distance between people, despite such technology? Our belief is that this is due to the fact that such technologies do not share some of the key affordances that occur naturally when people work in close physical proximity. Regardless of the fidelity of the video phone, I still have no sense of who is in when I call. I can't "bump into" people in the hall, know who is available and who is busy, or take advantage of synergistic opportunities when just the right combination of people happen to be at the water cooler at a particular time. Yet, in shared physical space, all of these are commonly available almost effortlessly in the background, due to our "peripheral awareness."

Reflecting the Social Ecology of the Workplace

Based on this observation, we propose to develop and evaluate a means of sharing the periphery, the background social ecology, through the use of appropriate technological tools and prostheses. Moreover, we will explore means to support seamless integration of these background tools with existing (and new) foreground applications. Through this approach, we believe we will achieve a significant improvement in the sense of co-presence, or "telepresence" operating over wide geographic distances.

One example of such a technology is referred to in the upper right quadrant of the model (Figure 3): the Portholes system originally developed by Xerox PARC / Rank Xerox EuroPARC (this system has since been commercialized by Telepresence Systems Inc. as ProRata). Portholes is a system which takes video "snapshots" of members of a community every 5 minutes, and circulates them to the computer screens of the members of that same community, as shown in the figure below. Hence, all members have an increased awareness of who is in, what they are doing and whether they might be available. Participants give permission to be viewed by selecting their door state: if they close their office door, no image is allowed. The snapshots also provide a means of combating the all too human tendency towards "out of sight, out of mind." All members of the community have a visual presence, regardless of actual geographical location.



Figure 4: Portholes

Portholes is an excellent example of a background "awareness server," of which there are many others currently under experimentation.
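
The behaviour described above (a snapshot every 5 minutes, withheld when a member's door is closed) can be sketched in a few lines of Python. This is not the Portholes or ProRata implementation: Member, door_open, capture and collect_round are hypothetical names introduced for illustration, and the camera grab is replaced by a text label.

```python
from dataclasses import dataclass

# "Every 5 minutes", as stated in the text.
SNAPSHOT_INTERVAL_SECONDS = 5 * 60


@dataclass
class Member:
    name: str
    door_open: bool  # hypothetical flag standing in for the "door state"


def capture(member):
    """Placeholder for a camera grab; returns a label instead of pixels."""
    return f"snapshot of {member.name}"


def collect_round(members):
    """Return one round of images, honouring each member's door state.

    A closed door yields None, so no image of that member is circulated.
    """
    return {m.name: capture(m) if m.door_open else None for m in members}
```

A scheduler would call collect_round once per SNAPSHOT_INTERVAL_SECONDS and distribute the result to every member's screen. At that interval each member contributes at most 288 snapshots per day.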

Likewise, along the Human-Computer interaction dimension, there are also background technologies. The example cited in the bottom right quadrant of the figure is "smart house" technology. These are technologies such as those which turn down the heat on weekends, automatically water your plants, close blinds, turn on lights, etc., under computer control. Among others, Telepresence Systems, a spin-off company from the Ontario Telepresence Project, will seek to uncover new tools to facilitate background communication among users, providing them with all-important contextual information for their business transactions.

Current Videoconferencing Systems are only a Start

Current market offerings of videoconferencing systems provide excellent value in that they can provide many of the benefits of face-to-face meetings. As the costs of these systems plummet, video communication will become more common in business and government offices. However, it is important to note that, by itself, videoconferencing only provides users with a tool to support human-to-human, foreground communications. The other quadrants of the model described above are not addressed by this technology.

Moreover, even in simple conference-room to conference-room communications, there is room for refinement in design and implementation. For example, consider for a moment how we use a conference room. A conference room has walls, furniture and audio-visual aids that enable people using the room to interact in a special way. The presenter stands at the front of the room while he/she presents. The other participants sit around the table or in chairs along the walls. The presenter and the audience members establish eye-contact with one another, interrupt (or stay silent) as is socially appropriate, make quiet aside remarks to one another, etc. The presenter moves around the front of the room writing on a whiteboard, transparency or flip chart, pacing, gesturing in the air, etc. The presenter will often sit down at the conference table while an audience member stands up and becomes the presenter.

Other than in custom-built rooms, current videoconferencing systems have not yet capitalized on this rich milieu for interaction. For example, if a formal presentation is being given, there is no distinction between "front" and "back" of the room (see below), no way to use white boards or other large shared drawing surfaces, and no ability to establish eye-contact or have an aside conversation with an individual remote person.

Working with our industrial partners, we sought to explore how to use these common artifacts of the conference room in meetings with remote participants. We have built some experimental conference rooms which seek to adapt technology to the social ecology of conference rooms. Among the many innovations we have introduced is the notion of "back-to-front" video conferencing, depicted in Figure 5. Remote attendees sit at the "back" of the room, each on a different monitor. Eye contact is maintained between the presenter and the remote attendees through one or more cameras associated with the monitors. If a remote participant should wish to make a presentation to the group, he/she can be switched to the large front monitor for the duration of the presentation, the same social action as if he/she were physically present.

Figure 5: Back-to-Front Videoconferencing

Seamless Movement Between the Human, Computer, the Foreground and Background

Our belief is that the real power of this model comes not from merely populating the individual quadrants, but from providing the means to make seamless transitions from quadrant to quadrant, as illustrated by the arrows in the version of the model below:

Figure 6: Seamless Movement Between Cells of the Model

Let us illustrate this point with an example that will relate to a problem familiar to many: trying to arrange a conference call among a number of colleagues, all of whom are busy, hard to reach, and at different sites.

1) Using our tools, the user would glance at their Portholes window to determine if the people appeared to be available. If so, they would use Portholes to contact them and the problem is solved - we would have made a transition from the top right to the top left quadrant (via the bottom left, when interacting with Portholes).
2) However, what if the more typical case were true, and nobody appeared to be available? In this case we instruct an "agent" on our machine to let us know when the parties are available. This is done by simply selecting the appropriate people by pointing at their Portholes images, and selecting an operator, such as "set up video conference when available."
3) Moving to the bottom right quadrant, in the background - while you resume other work - the agent "looks" at the incoming Portholes images, scanning for any changes. Through simple image processing it can detect comings and goings in the remote offices. When all parties appear to be available, the agent initiates a foreground dialogue with the user, suggesting that now might be an opportune time for the meeting.
4) If the user agrees, the user initiates the meeting, and the conversation begins. In a seamless manner, one has moved counter-clockwise from the top right to the top left quadrant. High value and functionality is obtained with minimal complexity for the user. A prosthesis which makes up for many of the problems of distance is provided.

Our belief is that this is just one example of many, and that the architecture which we are pursuing affords exploring such synergies in an effective and coherent manner.
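
The background behaviour of the agent in this scenario can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the project's implementation: looks_occupied and first_opportunity are hypothetical names, images are simplified to flat lists of grey levels, and the "simple image processing" is stood in for by a mean difference against an empty-office reference image with an arbitrary threshold.

```python
def looks_occupied(image, empty_reference, threshold=0.2):
    """Guess occupancy by comparing a snapshot with an empty-office image.

    Both images are equal-length lists of grey levels (0-255). The mean
    absolute difference and the threshold are illustrative assumptions.
    """
    diff = sum(abs(a - b) for a, b in zip(image, empty_reference))
    return diff / (255 * len(image)) > threshold


def first_opportunity(rounds, wanted, references):
    """Return the index of the first snapshot round in which everyone in
    `wanted` appears to be in, or None if that never happens.

    A missing or None snapshot (for example, a closed door) counts as
    "not available".
    """
    for index, snapshots in enumerate(rounds):
        if all(
            snapshots.get(person) is not None
            and looks_occupied(snapshots[person], references[person])
            for person in wanted
        ):
            return index
    return None
```

In a running system the loop would consume snapshot rounds as they arrive, and a non-None result would trigger the foreground dialogue suggesting that now might be an opportune time for the meeting.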

Positioning for the Future: Broadband Services

This model reveals some important technological characteristics, as well as aspects of usage. This is illustrated in the following refinement of the basic figure.

Figure 7: Refinement of the Model which Reveals Telecommunications Network Implications

What we have added here are labels that characterize the bandwidth of the two columns. We observe that activities in the left column are high bandwidth, but bursty, whereas those in the right are relatively low bandwidth, but persistent. For example, a video phone call is high bandwidth, but we may only make 5 calls a day. On the other hand, distributing Portholes or ProRata images is persistent, running constantly in the background. However, the bandwidth required to distribute images is relatively low. Viewed in the context of seamlessly moving from quadrant to quadrant, what we have is a means of capturing the notion of "bandwidth on demand." Furthermore, the model which emerges from this approach is in many ways richer than those commonly used, such as video on demand.
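
The contrast between "bursty" and "persistent" traffic can be made concrete with a little arithmetic. The Python sketch below is illustrative only. The 5 calls a day and the 5-minute snapshot interval come from the text; every other figure used with it (call length, call bit rate, image size, community size) is a hypothetical input chosen by the reader, not a measurement from the project.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def video_call_profile(calls_per_day, minutes_per_call, kbit_per_s):
    """Foreground traffic: a high rate, but only while a call is active."""
    active_seconds = calls_per_day * minutes_per_call * 60
    return {
        "peak_kbit_per_s": kbit_per_s,
        "duty_cycle": active_seconds / SECONDS_PER_DAY,
    }


def snapshot_profile(members, kbit_per_image, interval_s):
    """Background traffic: a low average rate, but running all day."""
    return {
        "mean_kbit_per_s": members * kbit_per_image / interval_s,
        "images_per_member_per_day": SECONDS_PER_DAY // interval_s,
    }


# Hypothetical inputs, for illustration only:
#   5 calls a day (from the text), 10 minutes each at 384 kbit/s (assumed);
#   20 members sending an 80 kbit image every 300 s (size and count assumed).
calls = video_call_profile(calls_per_day=5, minutes_per_call=10, kbit_per_s=384)
snapshots = snapshot_profile(members=20, kbit_per_image=80, interval_s=300)
```

With these assumed inputs the call traffic is active for only a few percent of the day, while the snapshot traffic never stops but averages a small fraction of the call rate. Different assumptions change the numbers, not the shape of the contrast.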


From the telecommunications perspective, what we have is a usage-based model which argues strongly that a traditional telephony model (i.e., foreground calls, video conferencing, etc.) is not adequate to support telepresence (including telework, distance learning, etc.). Equally important, this observation has emerged from a methodology which places the emphasis on use, not technology - a methodology which will carry on through the project spin-offs.