Copyright © 2023 World Wide Web Consortium. W3C® liability, trademark and permissive document license rules apply.
This document introduces an API for cropping a video track derived from display-capture of the current tab.
This section describes the status of this document at the time of its publication. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at https://www.w3.org/TR/.
This First Public Working Draft represents the direction the Web Real-Time Communications Working Group intends to explore to solve the use case of partial capture of browsing contexts. The Working Group is particularly interested in feedback from potential adopters of the API on how well this direction matches that use case. This document was published by the Web Real-Time Communications Working Group as a Working Draft using the Recommendation track.
Publication as a Working Draft does not imply endorsement by W3C and its Members.
This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
This document was produced by a group operating under the W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.
This document is governed by the 2 November 2021 W3C Process Document.
As well as sections marked as non-normative, all authoring guidelines, diagrams, examples, and notes in this specification are non-normative. Everything else in this specification is normative.
The key words MUST and MUST NOT in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.
This document uses the definition of the following concepts from [SCREEN-CAPTURE]: display-surface and browser display-surface.
This section is non-normative.
Complex applications often comprise multiple documents in distinct iframes, all displayed within the same browsing context. Consider such an application. Assume one of these documents, CAPTURING-DOC, uses getDisplayMedia() or getViewportMedia to capture the entire current browsing context. If this document then wishes to crop the video track to the coordinates of some sub-section CAPTURE-TARGET of a collaborating document CAPTURED-DOC, how can CAPTURING-DOC do so performantly and reliably? Recall especially that changes in layout due to scrolling, zooming or window resizing present additional challenges.
Consider a combo-application consisting of two major parts hosted in different iframes within the same tab: a video-conferencing application and a productivity-suite application. Assume the video-conferencing application uses existing/upcoming APIs such as getDisplayMedia() and/or getViewportMedia and captures the entire tab. Now it needs to crop away everything other than a particular section of the productivity-suite. It needs to crop away its own video-conferencing content, any speaker notes and other private and/or irrelevant content in the productivity-suite, before transmitting the resulting cropped video remotely.
Moreover, consider that it is likely that the two collaborating applications are cross-origin from each other. They can post messages, but all communication is asynchronous, and it's easier and more performant if information is transmitted sparingly between them. That precludes solutions involving posting of entire frames, as well as solutions which are too slow to react to changes in layout (e.g. scrolling, zooming and window-size changes).
It is worth noting that most applications would likely prefer to use getViewportMedia in such scenarios. However, as of this writing, getViewportMedia is still unspecified and unimplemented. It will have non-trivial requirements whose adoption will take some time and effort. As such, many applications will likely use a combination of getDisplayMedia() and Region Capture for some time to come.
The region-capture mechanism comprises two parts:
A way to mark an Element as a potential target for the cropping mechanism.
A way to start cropping a video track to such an Element, or to stop such cropping and revert a track to its uncropped state.
We define two crop-states for video tracks: cropped and uncropped. Tracks start out uncropped, and may turn to cropped when cropTo is successfully called on them.
The cropping mechanism presented in this document (cropTo) relies on CropTarget rather than on direct node references. This serves a dual purpose.
CropTarget is an intentionally empty, opaque identifier. Its purpose is to be handed to cropTo as input.
WebIDL
[Exposed=(Window,Worker), Serializable]
interface CropTarget {
  [Exposed=Window, SecureContext] static Promise<CropTarget> fromElement(Element element);
};
There is no consensus yet on whether fromElement should be exposed beyond secure contexts.
fromElement()
Calling fromElement with an Element of a supported type associates that Element with a CropTarget. This CropTarget may be used as input to cropTo. We define a valid CropTarget as one returned by a call to CropTarget.fromElement() in a document that is still active.
When fromElement is called with a given element, the user agent creates a CropTarget with element as input. The user agent MUST return a Promise p. The user agent MUST resolve p only after it has finished all the necessary internal propagation of state associated with the new CropTarget, at which point the user agent MUST be ready to receive the new CropTarget as a valid parameter to cropTo.
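A minimal sketch of how an application might respect this ordering: it waits for the promise to resolve before passing the CropTarget on. The helper name produceAndSendCropTarget, the send callback and the injectable cropTargetApi parameter are hypothetical and exist only for illustration.

```javascript
// Hypothetical helper: obtains a CropTarget for `element` and only then passes it on.
// `cropTargetApi` defaults to the global CropTarget interface; it is a parameter
// purely so that the sketch can be exercised outside a browser.
async function produceAndSendCropTarget(element, send, cropTargetApi = globalThis.CropTarget) {
  if (!cropTargetApi || typeof cropTargetApi.fromElement !== "function") {
    throw new Error("CropTarget.fromElement is not available");
  }
  // Once this promise resolves, the user agent is ready to accept the
  // CropTarget as an argument to cropTo().
  const cropTarget = await cropTargetApi.fromElement(element);
  send(cropTarget);
  return cropTarget;
}
```

Sending the CropTarget only after the await avoids handing a not-yet-usable identifier to the capturing document.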
When cloning an Element on which fromElement was previously called, the clone is not associated with any CropTarget. If fromElement is later called on the clone, a new CropTarget will be assigned to it.
There is no consensus yet on whether producing a CropTarget should be done by invoking an asynchronous method like CropTarget.fromElement(), or a CropTarget constructor that accepts an Element as input. This is further discussed on issue #17.
To create a CropTarget with element as input, run the following steps:
Let cropTarget be a new object of type CropTarget.
Let weakRef be a weak reference to element.
Create cropTarget.[[Element]] initialized to weakRef.
cropTarget keeps a weak reference to the element it represents. In other words, cropTarget will not prevent garbage collection of its element.
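The steps above can be modelled in script with a WeakRef. This is an illustrative sketch only: a real CropTarget is created by the user agent, and the class name ModelCropTarget and its resolveElement method are hypothetical.

```javascript
// Illustrative model only; not how a user agent implements CropTarget.
class ModelCropTarget {
  #element; // stands in for the [[Element]] internal slot

  constructor(element) {
    // Hold the element weakly so that this object never keeps it alive.
    this.#element = new WeakRef(element);
  }

  // Returns the element, or undefined once it has been garbage-collected.
  resolveElement() {
    return this.#element.deref();
  }
}
```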
CropTarget objects are serializable. The serialization steps, given value, serialized, and a boolean forStorage, are:
If forStorage is true, throw a new DOMException object whose name attribute has the value "DataCloneError".
Set serialized.[[CropTargetElement]] to value.[[Element]].
The deserialization steps, given serialized and value, are:
Set value.[[Element]] to serialized.[[CropTargetElement]].
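The serialization and deserialization steps can be sketched as a pair of plain functions. This is a model under stated assumptions, not a user agent's code: internal slots are represented by ordinary properties (element, cropTargetElement), and both function names are hypothetical.

```javascript
// Illustrative model of the serialization steps.
function serializeCropTarget(value, serialized, forStorage) {
  if (forStorage) {
    // Serialization for storage is refused.
    throw new DOMException("CropTarget cannot be stored", "DataCloneError");
  }
  serialized.cropTargetElement = value.element; // [[CropTargetElement]] <- [[Element]]
}

// Illustrative model of the deserialization steps.
function deserializeCropTarget(serialized, value) {
  value.element = serialized.cropTargetElement; // [[Element]] <- [[CropTargetElement]]
}
```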
Recall that, as per [SCREEN-CAPTURE], when getDisplayMedia() is called, it returns a Promise<MediaStream>, and that this MediaStream contains exactly one video track, whose type is MediaStreamTrack.
We specify that if the user chooses to capture a browser display-surface, the user agent MUST instantiate the video track as either MediaStreamTrack, or as some sub-class of MediaStreamTrack, and that cropTo MUST be exposed on this track. For simplicity's sake, this document assumes that a subclass called BrowserCaptureMediaStreamTrack is used by the user agent.
The track MUST be initially uncropped.
WebIDL
[Exposed=Window]
interface BrowserCaptureMediaStreamTrack : MediaStreamTrack {
  Promise<undefined> cropTo(CropTarget? cropTarget);
  BrowserCaptureMediaStreamTrack clone();
};
cropTo()
Calls to this method instruct the user agent to start/stop cropping a video track to the bounding client rectangle of cropTarget.[[Element]]. Since the track is restricted to the visible viewport of the display-surface, the captured area will be the intersection of the visible viewport and the element's bounding client rectangle.
Whenever cropTo is invoked, the user agent MUST execute the following algorithm:
If cropTarget is neither a valid CropTarget nor null, the user agent MUST return a Promise rejected with an UnknownError.
Let p be a new Promise.
Run the following steps in parallel:
If cropTarget is neither undefined nor a valid CropTarget, reject p with a NotAllowedError and abort these steps.
If cropTarget is either undefined or a valid CropTarget, the user agent MUST update this video track's crop-state according to cropTarget:
If cropTarget is undefined, the user agent MUST stop cropping. This video track reverts to the uncropped state.
If cropTarget is a valid CropTarget, the user agent MUST start cropping this video track to the element associated with that CropTarget. This means that for each new frame produced on the track, the user agent calculates the bounding box of the pixels belonging to the element, and crops the frame to the coordinates of this bounding box.
Call the track's state before this method invocation PRE-STATE, and after this method invocation POST-STATE. The user agent MUST resolve p when it is guaranteed that no more frames cropped (or uncropped) according to PRE-STATE will be delivered to the application, and that any additional frames delivered to the application will therefore be cropped (or uncropped) according to either POST-STATE or a later state.
The timing of the cropTo promise resolution and the timing of the actual cropping of video frames are observable to JavaScript through MediaStreamTrack transforms. It is expected that the first newly cropped video frame will be enqueued on the MediaStreamTrack ReadableStream just after the cropTo promise is resolved.
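From the application's side, starting and stopping a crop might look like the sketch below. The helper applyCrop is hypothetical; it passes null to remove the crop, on the assumption that the nullable CropTarget? argument in the IDL accepts it.

```javascript
// Hypothetical helper: crops `track` to `cropTarget`, or removes the crop when
// `cropTarget` is null. Resolves to the name of the resulting crop-state.
async function applyCrop(track, cropTarget = null) {
  if (typeof track.cropTo !== "function") {
    throw new TypeError("This track does not support cropTo()");
  }
  // After this promise resolves, no frame reflecting the previous crop-state
  // is expected to be delivered to the application.
  await track.cropTo(cropTarget);
  return cropTarget === null ? "uncropped" : "cropped";
}
```

Awaiting the promise before acting on the track, for example before transmitting it remotely, avoids exposing frames from the previous crop-state.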
clone()
When a BrowserCaptureMediaStreamTrack is cloned, the user agent MUST produce a track which is initially uncropped, regardless of the crop-state of the original track.
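Because a clone starts out uncropped, an application that wants a cropped clone has to request the crop again. A minimal sketch, in which the helper name cloneWithCrop is hypothetical:

```javascript
// Hypothetical helper: clones a track and applies a crop to the clone.
async function cloneWithCrop(track, cropTarget) {
  const clone = track.clone();    // the clone is initially uncropped
  await clone.cropTo(cropTarget); // so the crop must be requested again
  return clone;
}
```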
We define an Element for which a CropTarget was produced (through a call to fromElement) as a potential crop-target.
We define a potential crop-target which is targeted by a successful call to cropTo as the crop-session target.
Consider a frame produced on a cropped video track. The user agent calculates the intersection of (i) the top-level browsing context's viewport and (ii) the bounding box of all pixels belonging to the crop-session target. This intersection is defined as the crop-session target's coordinates for that frame.
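The intersection described above is an ordinary rectangle intersection. The sketch below assumes rectangles are plain objects of the form { x, y, width, height } expressed in the same coordinate space; the function name cropCoordinates is hypothetical.

```javascript
// Computes the intersection of the viewport and the target's bounding box.
// Returns null when the two rectangles share no area.
function cropCoordinates(viewport, targetBox) {
  const left = Math.max(viewport.x, targetBox.x);
  const top = Math.max(viewport.y, targetBox.y);
  const right = Math.min(viewport.x + viewport.width, targetBox.x + targetBox.width);
  const bottom = Math.min(viewport.y + viewport.height, targetBox.y + targetBox.height);
  if (right <= left || bottom <= top) {
    return null; // empty intersection
  }
  return { x: left, y: top, width: right - left, height: bottom - top };
}
```

For a 1280x720 viewport at the origin and a target box of { x: 1000, y: -50, width: 400, height: 300 }, this yields { x: 1000, y: 0, width: 280, height: 250 }: the parts of the target that extend beyond the top and right edges of the viewport are clipped away.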
Consider a video track VT cropped to a given crop-session target TARGET. We define the behavior of the crop-session of the VT in the face of changes undergone by TARGET.
We define as an empty crop-session target the case where a crop-session target is attached to the DOM, yet consists of zero pixels which are drawn inside of the top-level browsing context's viewport.
Some examples of when this could happen include:
The user agent MUST NOT produce new frames on tracks with an empty crop-session target. For such a track, the user agent MUST resume the production of frames if the track either becomes uncropped or its crop-session target stops being empty.
We define as a disconnected crop-session target a crop-session target that has been detached from the DOM.
The difference between an empty crop-session target and a disconnected crop-session target is that a disconnected one may become unreachable, in which case it would not produce any new frames. Nevertheless, the user agent MUST treat a disconnected crop-session target the same way it treats an empty crop-session target. The application may call cropTo on the track with either undefined or a new CropTarget, thereby allowing the production of frames on the track to be resumed.
Code in the capture-target:
// Top-level await assumes this code runs in a module script.
const mainContentArea = document.getElementById('mainContentArea');
const cropTarget = await CropTarget.fromElement(mainContentArea);
sendCropTarget(cropTarget);

function sendCropTarget(cropTarget) {
  // Can send the crop-target to another document in this tab
  // using postMessage() or using any other means.
  // Possibly there is no other document, and this is just consumed locally.
}
Code in the capturing-document:
async function startCroppedCapture(cropTarget) {
  const stream = await navigator.mediaDevices.getDisplayMedia();
  const [track] = stream.getVideoTracks();
  if (!track.cropTo) {
    // cropTo is not exposed on this track, for example because the user
    // did not choose to capture a browser display-surface.
    handleError(stream);
    return;
  }
  await track.cropTo(cropTarget);
  transmitVideoRemotely(track);
}
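When the two documents are cross-origin, the CropTarget could travel over postMessage(). The following is a hedged sketch of the receiving side only. The message shape { cropTarget }, the helper name makeCropTargetReceiver, the callback rememberCropTarget and both example origins are assumptions made for illustration.

```javascript
// Hypothetical glue for the capturing document: builds a "message" event handler
// that accepts a CropTarget only from the expected origin.
function makeCropTargetReceiver(expectedOrigin, onCropTarget) {
  return function handleMessage(event) {
    if (event.origin !== expectedOrigin) {
      return false; // ignore messages from unexpected senders
    }
    const cropTarget = event.data && event.data.cropTarget;
    if (!cropTarget) {
      return false; // not a crop-target message
    }
    onCropTarget(cropTarget);
    return true;
  };
}

// In a browser, the handler could be registered as:
//   window.addEventListener("message",
//       makeCropTargetReceiver("https://slides.example", rememberCropTarget));
// and the capture-target could send with:
//   parent.postMessage({ cropTarget }, "https://meet.example");
```

Checking the origin before using the payload keeps the capturing document from acting on messages sent by unrelated frames.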