Meeting minutes
<atsushi> Day1 minutes
https://github.com/immersive-web/capture/issues/1
<bajones> Ada, you're a little quiet on the video stream
ada: I think this is an important issue for immersive-ar sessions. Right now the only way to capture is to request raw camera access and composite the content on top. Using a very powerful API for a simple capture use case.
… it seems a bit dangerous for users to get used to allowing raw camera access for a simple use case that the OS could handle better
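[Editor's sketch of the capture path ada describes, based on the raw-camera-access incubation in Chromium; the feature name and API shape may differ across browsers, and `gl`, `refSpace`, and `compositeIntoCapture` are assumed page-side helpers.]

```js
// Today's only capture route: request full raw camera access, then composite
// the page's own rendering on top of the camera image every frame.
const session = await navigator.xr.requestSession('immersive-ar', {
  requiredFeatures: ['camera-access'],   // a very powerful permission, just for capture
});
const binding = new XRWebGLBinding(session, gl);

function onXRFrame(time, frame) {
  const pose = frame.getViewerPose(refSpace);
  for (const view of pose.views) {
    if (view.camera) {
      // Raw camera pixels for this view; the page must composite its rendered
      // content over this texture itself to produce a "capture" frame.
      const cameraTexture = binding.getCameraImage(view.camera);
      compositeIntoCapture(cameraTexture, view);   // hypothetical helper
    }
  }
  session.requestAnimationFrame(onXRFrame);
}
session.requestAnimationFrame(onXRFrame);
```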
alcooper: something we proposed a while back, we have a repo somewhere and did some thinking about it, we would like to push it forward more
cabanier: on the headset you could end up capturing someone else
… there will definitely be a permission prompt needed but maybe also a sound
… not aware of anyone asking for this feature
ada: did people ask for raw camera access?
cabanier: no one except for Nick
… but people have asked for more information about the environment
bajones: it concerns me that people are using raw camera access to capture the session, it's overly complex and requires you to do a lot of compositing
<alcooper> https://
bajones: in the past we considered that most systems have an easy gesture to do this kind of capture
… I wouldn't mind looking at the proposal again, but what I recall of that is that it was a way to request capture via the share API, the capture being made by the OS
… I would be very concerned if we had evidence of people using raw camera access for this
alcooper: dropped the link to that explainer above ^
… can move it to the official repo
… the shape of the API is to start capture and get a handle to an object that you can share but maybe not access, so we wouldn't need as strict of a prompt
… I think AR is really where this makes the most sense, for VR the page is already in control of everything
bajones: I will point out that even in VR, if you're using layers it could get non-trivial to composite yourself
… so it could be a nice feature, even if it's more critical for AR
ada: <polling the room>
ada: maybe alcooper should move the proposal to the repo
alcooper: will do!
https://github.com/immersive-web/webxr/issues/1338
ada: the issue here is that the input device the user is holding is also being rendered in XR, looking for the best way to tell the session that a control does not need to be rendered
… the only other example of this is screen input
… but when you have a tracked pointer, how do you express things like the absence of squeeze events without a profile
… one way to do this would be to send an empty glb file in that case
… that wouldn't render anything
… it could be pretty generic
<ada> ['generic-hand-invisible', 'generic-hand']
bialpio: one comment, I think we should have profile names on transient input sources
… we should be sending generic-touch-screen or something
bajones: I think we do communicate profiles for transient input events. in the past we used external signals, like "if the environment mode is not opaque you don't need to render controllers"
… usually a reasonable assumption
… in the case of transient input on a touchscreen, usually in an AR session, we're not opaque
… in the case of this laptop, or for hands, we might be in a VR mode but have an outline for hands. so the render mode is opaque but you shouldn't draw controllers
… the input profile with the empty glb is a reasonable way to trick people into doing the right thing
… but I wouldn't mind an explicit signal
… having the profile named something -invisible might be enough of a signal
… I don't know if we need an explicit boolean for that, the input profile string is _probably_ enough
cabanier: I agree with bajones that this feels slightly hacky
… in the case of the laptop, WebXR knows that hands are present
… and some experiences render custom controls, but you probably don't want to render them either
… so it sounds better if there was something on the session, or somewhere
<bajones> +1
Brandel: I think having the specific name -invisible might not be a good fit. is it invisible or independently rendered?
… it's not that it's invisible is that it's rendered elsewhere
… (outside of the XR session)
cabanier: I don't know if you saw ??? that does hand tracking with the laptop camera, and in that case you want to say that the input is hands but you don't want to show them
ada: commenting on the issue to recap, adding an enum but should we also tweak the profiles?
bajones: even if we have an explicit signal it's probably worth doing both
… the worst case scenario is that experiences with custom controllers will render them anyway
… and we can't fully avoid that, but we can make it easy for people
… the input profile could probably be called -invisible, since the asset is, but the explicit signal should be called something else: system-rendered? but the system might _not_ be rendering, you might see the physical thing
… user-visible doesn't sound right either, it's confusing
… I don't know but I like Brandel's point
… the flag shouldn't say "invisible"
ada: I'll make a note on the issue
<bajones> +100
Brandel: I agree on the asset / profile naming, but I would prefer "empty" instead of "invisible"
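[Editor's sketch of how a page could consume the naming convention discussed above; 'generic-hand-invisible'/'-empty' are ideas floated in the meeting, not finalized profiles, there is no explicit "don't render" signal in the spec today, and `drawControllerModel` is an assumed app-side helper.]

```js
// Skip rendering a controller model when a profile name signals an
// invisible/empty asset (naming still under discussion).
function shouldRenderController(inputSource) {
  return !inputSource.profiles.some(
    (p) => p.endsWith('-invisible') || p.endsWith('-empty'));
}

for (const source of xrSession.inputSources) {
  if (shouldRenderController(source)) {
    drawControllerModel(source);   // hypothetical app-level helper
  }
}
```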
https://github.com/immersive-web/real-world-meshing/issues/3
cabanier: in our system and OpenXR, objects in the world can be a plane or a mesh
… or both, but the site doesn't know which plane belongs to a mesh
… the site could look at the space associated with each mesh or plane, and assume the same space means the same object
… but there might be a better way to report this
bialpio: one thing about using the space: for plane we say that all points in the plane should be at z=0 in the space, which might be a constraint
… also do you need to render both the mesh and the planes, or rendering planes-only gives you a simplified view?
… what will the app be expected to render in those cases
… are the OpenXR bindings any indication?
… for some use cases it might be important to know if the planes are enough or if you need to look at the mesh
cabanier: OpenXR gives us all the spaces in the scene, and we can filter them by plane or mesh, they just have different metadata
… but the space returned is the same
… to answer your question the mesh will also describe the plane and be a more precise representation
<ada> ack
bajones: I can see scenarios where you would render a mesh, like for occlusions
… but are there scenarios where you would render a plane? or is it just for hit-testing?
cabanier: if you have both you might not need the planes at all
bajones: if you report both it might be for fallback
… so older experiences still get planes
… but if you request mesh data you probably don't need to mix it with plane data
bialpio: as far as use cases, for plane detection we were thinking about hit testing and areas
… like "will your object fit"?
… this is a different use-case as meshing
… we should at least give freedom to the implementation
… but then developers will have to plan for both scenarios
… maybe we could spec something where you can request meshing but fallback to planes?
… how can we be helpful for developers?
ada: it shouldn't be too complex to generate a bounding box for the mesh
bialpio: one remark: we had similar conversations while discussing hit test, the "entity type"
… you would get a type with the hit test result
… the idea was to give you a separate opaque object to correlate when hit comes from the same object
… you could use JS equality to know that the mesh comes from the same object as the plane
cabanier: yes that was in my proposal
bialpio: since it ties to hit-testing we should do something similar
cabanier: agreed
bialpio: In these past conversations the concern was about tying the spec to a specific subsystem
… you could have point clouds, vs plane detection, vs mesh
… but why would the site care
cabanier: I wasn't really thinking about hit testing, more about drawing things
… in a way meshes are more modern but not every object has a mesh, a wall wouldn't get a mesh
bialpio: that's also related to object annotations on the Meta device
… how would it work for other systems?
cabanier: it would probably be the same for systems that scan
bialpio: but how to know if a plane is just a simplified mesh?
cabanier: meshes are pretty new still
… but using the space would be _a_ solution for now
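[Editor's sketch of the space-matching heuristic cabanier describes, using API names from the plane and mesh detection proposals (`detectedPlanes`, `detectedMeshes`, `planeSpace`, `meshSpace`), which may still change.]

```js
// Treat a plane and a mesh as belonging to the same real-world object if
// their associated spaces coincide (within a small tolerance).
function samePlaceInWorld(frame, refSpace, spaceA, spaceB, epsilon = 1e-3) {
  const a = frame.getPose(spaceA, refSpace);
  const b = frame.getPose(spaceB, refSpace);
  if (!a || !b) return false;
  const dx = a.transform.position.x - b.transform.position.x;
  const dy = a.transform.position.y - b.transform.position.y;
  const dz = a.transform.position.z - b.transform.position.z;
  return Math.hypot(dx, dy, dz) < epsilon;
}

function correlate(frame, refSpace) {
  for (const mesh of frame.detectedMeshes) {
    for (const plane of frame.detectedPlanes) {
      if (samePlaceInWorld(frame, refSpace, mesh.meshSpace, plane.planeSpace)) {
        // Likely the same object reported twice: prefer the mesh for rendering
        // or occlusion, the plane for hit-testing / "will it fit" checks.
      }
    }
  }
}
```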
https://github.com/immersive-web/webxr-hand-input/issues/120
cabanier: we talked about this a bit, on the quest pro because controllers are self-tracked we can report both hands and controllers
… but do we need to expose this as a capability?
… are there any use cases? an obvious one is to use the controller as a pen
… is it worth the effort to expose, and should it be a feature flag?
ada: it couldn't be on the same XR input, because the grip spaces would be different
ada: it generally feels weird (TODO: ada to add some details :))
ada: I guess you could have handedness for accessories too
… definitely something we should work on
… we'll have to solve this eventually
bajones: not sure I agree with ada. I think it's reasonable and desirable to have the hand information and controller info on the same XRInput
bajones: the "dangling controller" is an interesting edge case
… one thing we discussed in the past: we could provide an estimated hand pose with a controller
… we probably want some sort of signal that this is just an estimation
… and then you can have a case where you get hand poses that are tracked and not estimated
… the only weirdness is in the case where I have a controller + hand pose and I get a select event? what is it?
… I guess the system will use the controller as a higher quality signal for select, but probably OS specific
… in any case, apart from the "dangling controller scenario", sounds like that should work
cabanier: one use case where they should be different is quest pro controllers being used as a pen
… because you hold the controller differently
… what bajones said about estimating the hands and rendering them is true, we have an API for that but not in WebXR
… I like ada's proposal of calling this "accessories", it could be a controller or something else
… in OpenXR these are exposed as "extra controllers", I think it makes sense to do the same
ada: you mean the hands are extra controllers?
cabanier: instead of 2 controllers you just get 4
ada: then we might want to have "primary controllers" in your sources list
cabanier: today the controllers already disappear
cabanier: but we didn't discuss if we _should_ expose this
ada: I think it's interesting!
… it's the kind of things people are going to want to do
… especially in the future with full body detection?
ada: what does the room think?
<cabanier> +1
<ada> +1
ada: just do +1 / -1 in IRC
<yonet> +1
+1
<bajones> +1
<vicki> +1
ada: I think it's worth looking at
<alcooper> +1
cabanier: definitely nobody is against it :)
… but is it useful?
ada: it is!
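[Editor's sketch for context: a page can already tell hands from controllers per input source using the existing `hand` and `gamepad` attributes; under the "accessories / extra controllers" idea above, both kinds could simply appear in the list at the same time, e.g. 4 sources instead of 2.]

```js
// Classify today's input sources; with the proposed behavior, hands and
// self-tracked controllers could both be present simultaneously.
function classifyInputs(session) {
  const hands = [];
  const controllers = [];
  for (const source of session.inputSources) {
    if (source.hand) {
      hands.push(source);          // articulated hand joints available
    } else if (source.gamepad) {
      controllers.push(source);    // buttons/axes, e.g. a self-tracked controller
    }
  }
  return { hands, controllers };
}
```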
ada: this is a new PR, as promised! now that Vision Pro was announced
ada: it's a small change
<ada> https://
ada: the idea behind this input type, currently named "transient intent"
… was related to gaze tracking
… but it could be useful for general intents
… and it's transient
… the input only exists for the duration of the gesture
… on Vision Pro when you pinch it uses your gaze for the target ray, but then it uses your hand
… it's a bit like tracked pointers but the target ray space comes "out of your face"
… it could be a fit for different input types, like accessibility
… the conversation on the PR addresses some comments from bajones
… he pointed out that the target ray space not updating is not really useful
… one way would be for the developer to provide a depth map of the interactive objects to the XRInput at the start of the event, so that the system knows where you might want to interact
… we could then update the input ray accordingly
… but that's a huge API
… another alternative would be to make the assumption that you probably interacted with something 1-2m away and then estimate
… the last thing would be to modify the target ray space to act as a child of the grip space
… (gestures to explain)
… if the developer attaches the object to the target ray space instead of the grip space it would still work as expected
… these are the options I have in mind
… we should discuss and bikeshed the enum itself
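[Editor's sketch of what consuming this input could look like, assuming the enum keeps a name like the PR's "transient-intent" (still being bikeshedded; the string below is a placeholder) and that `refSpace` and `hitTestScene` are app-side helpers.]

```js
session.addEventListener('selectstart', (event) => {
  const source = event.inputSource;
  if (source.targetRayMode !== 'transient-intent') return;   // placeholder name

  // Capture the target ray once, at the start of the gesture: on this kind
  // of input the ray comes from the user's gaze and may not update afterwards.
  const rayPose = event.frame.getPose(source.targetRaySpace, refSpace);
  const picked = hitTestScene(rayPose);   // hypothetical app-level picking

  // Subsequent dragging would follow source.gripSpace (the pinching hand)
  // in later frames, per option 3 discussed above.
});
```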
bialpio: I think I understand why the space is not updated, but poses are a property of the spaces
… and we need to update them even if nothing changed
… the other concern is that we can create hit test subscription out of spaces
ada: yes but the input is transient
bialpio: need to think about hit testing a bit more
… but we should make sure that the XR system knows the origin of that space at all time during the gesture
… someone might subscribe to the hit test, and we need to know the latest state of the space
ada: in this case the third option might work out
bialpio: yeah that might be the easiest way for this to work
ada: it is important that hit test works
bialpio: do we need to spec out that something _should not change_
ada: or _might not change_
… for gaze tracking we don't expose continuous gaze
… for obvious privacy reasons
bialpio: we could not change it, but keep things open
… it's not guaranteed not to change
ada: we should probably say something about solution #3 for cases where it won't update continuously
… it shouldn't just work well for button presses but also gestures
bajones: lots of thoughts
… but first thank you ada!
… not sure I fully understand the concerns about spaces updating or not
… first thing, should this be a "gaze input", we should strive to match the pointer type to what fits it best
… and introducing a new enum as needed
… on Vision Pro it's a gaze input that happens to be transient
… on some platforms it might not be
… to extend it to assistive devices it would be desirable for them to advertise as gaze input
… if there's nothing particular to call out
… as long as there's a reasonable behavior analog
ada: just to clarify, you think we should also use gaze like the existing gaze input
bajones: yes it's probably okay to have the behavior you described on a gaze pointer
… people don't have that much expectation about how it should behave
… people are using this to inform "should I render something" and "where should I render it"
… same for screen input; when something is advertised as tracked-pointer, people render an input that goes with it
… so we should see which behavior we most closely align with
… a transient gaze input is not an issue
ada: but it has a grip space too
bajones: that's good to refine the behavior
… so if you take a system like this (Vision Pro), advertised as gaze but with a grip present
… that's a strong signal for the page
… I do have questions about how this interacts with hand input
… tracked input + transient eye-based input
… what's the grip pose at that point
ada: I have experience with this
… the demo I tested this with is a chess game, to see how to interact with fine objects
… the gaze has enough granularity for this to work
… I also built this with pure hand tracking
… and ended up with both on
… but it just worked together pretty well
… for buttons and menu it was nice to just look & pinch
… they just work together very well
… the hand based input didn't generate unwanted squeeze events
… a developer could just use each for different interactions
bajones: just to clarify, in that mode, the hand input gives you full poses but never generates a select event
ada: correct
bajones: that should be okay, the spec specifies that you need at least one primary input for select events
… the eye-based primary input would be that
… it's transient but that's probably okay
… again thank you for this work
… I want to discuss the manipulation of the ray when you select _again_
… I'm concerned about the automatic parenting of the 2 spaces
… that would be the thing for precise interactions
… but I'm worried that it's contextually sensitive and wouldn't generalize
… it makes a lot of sense to do gaze, pinch, drag
… but then if I <rotates hand on Y> how would the ray pivot?
… there's a lot of nuance to that
… very specific to the application, only appropriate in some cases
… in the chess scenario I might want to pivot the piece in place, rather than swinging it around
… but in the positional audio sample, I probably want the speaker to pivot around my hand
… not sure the browser is the right place to make this decision
ada: so maybe we should go back to having the target ray space not update
bajones: if it stays static, we lose simple click and drag gestures
… buttons are definitely the interaction that I see most
… I do wonder if you could take the initial eye ray, then lock it to the head
… then the application could opt into something more precise
… not sure if I advocate for that or not
… it would allow the existing applications to work, but weird
ada: I would rather having things obviously not work
bajones: it's what you get with cardboard but it's expected
… there's a subset of experiences that won't be updated
… it would be sad if they didn't work on shiny new hardware
… but it might be an awkward fallback
ada: video intermission
… showing the chess demo
cabanier: so what happens if you enable hands at the same time?
ada: you get both, but hands don't send select events
cabanier: so you get the input source until you let go of the gesture?
ada: yes
… they come through as not-handed
cabanier: you say it's a hand?
ada: currently we say it's a single button
… and not rendering anything
… but this is hardcoded (the not rendering anything part)
… otherwise you get a generic controller rendered doing the pinch
… but if we do what bajones suggested and use gaze it might just work
… what's the input profile for gaze?
ada: we know the line we're looking down, but not the depth
cabanier: so you don't know the point in space
ada: no
cabanier: in openXR there is an extension that gives you that: the point in space you're looking at
ada: do you think you would use that if we had it?
cabanier: yes I would prefer using that
ada: so you'd start the ray from the point you're looking at
cabanier: yes that would be the origin of the ray, not exactly sure
ada: it might be weird if you're deeper and you want to do a raycast
… I can look into that
cabanier: if it has hands enabled as well, people are probably going to use that
ada: nothing changes with hands enabled
… you get a third input (2 hands + additionals)
… in this case up to 2 extra transient inputs
cabanier: how does it work with existing experiences?
ada: works pretty well!
… it works well with both hands + the gaze
… if you want you can test it with the positional audio demo
alcooper: bajones made the suggestion of just using gaze
… it was my first instinct too
… especially with transient clearly in the name
… having scrolled through the spec there's a bit of a note about the gaze
… we might want to soften the language
… but adding a new type might add confusion
ada: you think it's okay to have 2 simultaneous gaze inputs?
alcooper: that would probably be fine, the spec does allow for transient inputs already
bajones: we already had situations where you can have 2 different select events at the same time
… I don't think it would be a problem having 2 different gazes
… they will just track until the select is over
… they usually don't do anything specific for "things going on at the same time"
dino: who's actually going to ship gaze tracking on the web?
cabanier: not sure
… we haven't exposed eye tracking
… I was thinking about it
… quest pro already have hand tracking
… if we introduce a system gesture, it could break some assumptions
dino: so you don't yet have plans to provide raw gaze information
cabanier: no we don't
dino: does anyone?
bajones: I'm not sure...
… mozilla advocated against it a while back
… I don't think anyone is planning on it (raw gaze)
alcooper: when we say gaze we're talking about the enum value
… not raw gaze
bajones: I would imagine that in the case of an eye tracked pointer, you might actually lock to the head at the periphery, not tracking your eyeball
… looking at the chrome codebase, cardboard doesn't surface an input profile
… but we should still have an input profile for this scenario
alcooper: this goes back to the grip space being null if it's not trackable
… since you shouldn't render anything
bajones: if this was anything more than a 1-button input
… you probably want an input profile
… if a single button produces a ray you can probably get away without an input profile
… input profiles should be used for rendering and capabilities
… we touched on this in the other issue
vicki: I think gaze came up previously when we discussed the name
… just want to clarify that this API proposal is not about "Vision Pro input"
… and as dino said no one is implementing gaze tracking
bajones: it's true that gaze is a bit of a misnomer
… it was the name chosen to imply that the input was coming from the user's head
… it's not about the mechanism, how it's being produced
… more of a signal to the dev, don't draw anything at the source!
… it's what it's trying to imply
… so reusing the same name, while uncomfortable, just says it came from the user's head
… it's a hint to the developer
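[Editor's sketch of the existing `targetRayMode` hint bajones describes; the values are the spec's current enum, while the draw helpers are hypothetical.]

```js
// targetRayMode is a rendering hint, not a statement about the sensing
// mechanism: 'gaze' means "don't draw anything at the ray origin".
function renderInputSource(source) {
  switch (source.targetRayMode) {
    case 'gaze':
      drawCursorAlongRay(source.targetRaySpace);   // hypothetical helper
      break;
    case 'tracked-pointer':
      drawControllerOrHand(source);                // hypothetical helpers
      drawPointerRay(source.targetRaySpace);
      break;
    case 'screen':
      // Transient touch input: usually nothing to draw at all.
      break;
  }
}
```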
cabanier: thinking about how we could implement this without breaking everybody
… very few experiences expect hands
… on quest pro we could do this when hands aren't requested
… it seems that if you do request hands we shouldn't
ada: hands on meta quest emit events?
cabanier: yes
<leonard> @Ada: I need to leave for the next hour (until 1600 CEDT). I will rejoin after that.
<ada> Channel for the next call is #webapps
<yonet> Anyone need the links here
<Brandel> yes please!
<ada> Zoom: ?wpk=wcpk%7B0%7D%26%26%26%26wcpk0a39d36aab24f17be9f1bf6b5792a2d6&_x_zm_rtaid=6bk_xSkOTTOIdMqQCNeA7Q.1694530119754.1552d5b557fefc954976377efe79b6f8&_x_zm_rhtaid=237
<ada> Day 2 minutes: https://
<yonet> Discord is webapps
<ada> we are in #webapps
<Brandel> ada: following that link takes me to a page
<Brandel> asking for a passcode
<xiaoqian> https://
<alcooper> I joined from the calendar link atsushi sent in email as well: https://
<atsushi> s|https://
<ada> We're heading back to the room!
https://github.com/immersive-web/webxr/issues/1339
ada: developers need to know where to place their content, cannot assume things about placement now
… it would be useful to have a space such that when you place things at that space's origin, it'd be in a good spot
… it'd be interesting to have it be customizable on the fly
… e.g. user could configure it appropriately
… its value could change per session so that it won't be used for fingerprinting
… it could also work for someone who's laying down
… so the interactive content space would shift accordingly
… and the user wouldn't need to sit up to interact with the content
… unsure if it needs to be a feature or just something available by default
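[Editor's note: a purely hypothetical sketch of the idea ada describes; no such reference-space type exists in WebXR today, and `uiRoot` is an assumed scene-graph node.]

```js
// Hypothetical: a space whose origin is "a comfortable spot for interactive
// content", adjustable by the user agent per session.
const comfortSpace = await session.requestReferenceSpace('comfort');   // hypothetical type

function placeUI(frame, refSpace) {
  const pose = frame.getPose(comfortSpace, refSpace);
  if (pose) {
    // Position menus / interactable content at this origin; the UA could shift
    // it (e.g. for a user lying down) without the app needing to know why.
    uiRoot.setTransform(pose.transform);   // hypothetical scene-graph helper
  }
}
```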
bajones: curiosity, is it z-space?
ada: dell + z-space
bajones: not sure if it's the right tool for this specific case
… maybe, but I think they need their own separate space to track in
… I talked to them previously and they seemed to be OK w/ having to work with content built specifically for their form factor
… so old content compat may not be too big concern
… interesting idea, the spot for interactive content will be different for different people
… concern is that it goes both ways, on one hand if it's an explicit space, then there's a wide range of experiences that don't have access to this feature
… you essentially opt into an accessibility feature, which isn't great
… there may be cases where exposing an accessibility feature will be abused in a competitive space
… but we don't want to allow opting out of accessibility too
… I like the idea, I like it could be used for accessibility, but we shouldn't want accessibility to rely on this feature
… if you want this to be truly useful for accessibility, you need to allow to move the viewer space, which could enable cheating in competitive space
… but apps could implement protections against cheating
ada: they don't have to use the comfort space - they can opt out
bajones: correct, but we should consider accessibility features separately from spaces (?)
… I want to make sure we don't conflate it with being an accessibility tool, but this shouldn't be the only way towards accessibility
ada: it will make life better for everyone and also could be used for accessibility
Brandel: another aspect of the interactable zone this makes me think of is the Magic Leap device
… it had 2 focal planes - 45cm and 1m
… in case it's possible to project (?) at 2 different distances
cabanier: unsure about our plans, quest has a couple of modes to accommodate people in different positions
… there's bed mode, unsure what will happen if you enter WebXR then
… but it is reasonable that everything tilts
bajones: how do you enable it
cabanier: unsure if it persists, but I'd assume if you enter WebXR in bed mode, it would persist and not force a different space on you
… maybe there's no local-floor
… there's an airplane mode with no expectations that you'll move
ada: I'd be ok if it were up to the user agent to change this space to accommodate the user
bajones: curious about Brandel's scenario - do you have thoughts about how the comfort space (?) may be determined
… examples would be useful
Brandel: HTC Vive and Quest have a fixed focal distance so it may be uncomfortable to have an object presented as if it's at 50cm
… but on ML it could be at different focal distances
… when varying focal systems come into play it could be possible to leverage those capabilities
bialpio: don't we mandate that the y-axis is opposite of the gravity vector?
ada: that would still be the case
… if you're playing building blocks, you might want to move the gravity
bialpio: I was worried that we had already backed ourselves into a corner
… if it's a regular xrspace, we can change it
… for reference spaces we can not
bajones: no, that's not correct
… at one point the y-axis was tied to gravity, but then HoloLens was used on the space station so we removed it
bialpio: I think we need to check the other modules
bajones: we removed it from the spec. It used to be there
… (quotes spec)
… the spec has non-normative text that says how the y axis should work
… this comfort space should still have a y-axis relative to the y-axes of the other spaces
… I would expect all reference spaces to be aligned
(https://www.w3.org/TR/webxr-hit-test-1/#:~:text=(towards%20negative%20gravity%20vector).)
ada: will kick the spec process off
cabanier: RE ML & focal planes, one of them was an infinity plane and the way they switched is they looked at your eyes to decide
ada: you may want to place it at the focal distance of the headset
https://github.com/immersive-web/real-world-meshing/issues/1
bajones: what are people using and what features are causing implementation difficulties, is it worth the complexity if no one is using those, etc.
… I called out "inline" because they are causing issues for us and their usage is low
… the WebXR samples use them, yonet uses them, and maybe a couple of other places, but for the most part it is handled by frameworks where WebXR is just an additional mode to switch to
… perhaps inline is not needed
https://github.com/immersive-web/webxr/issues/1341
bajones: unclear if it's fulfilling the purpose it was designed for
… may not be worth the complexity
… e.g. layers stuff caused us to ask how will it interact with inline
… & we have mostly ignored those questions
cabanier: chrome has telemetry, can you share some numbers?
… is it already published somewhere?
bajones: (looking) unsure what we can share
… looking at one graph, cannot give numbers
… very low proportion of sessions
cabanier: maybe they use polyfill?
bajones: low single-digits
… if used then maybe through libraries & polyfill
… if we do want to remove it, it'd be a very long process with long deprecation period
… the problematic case is content that was abandoned
… so if this is not useful and creating impl. burden then it may make sense to deprecate
… worst case they'll lose inline preview prior to entering the experience
alcooper: viewer-only sessions may not hit telemetry
… but since it's done blink-side, it means we can polyfill
yonet: I use it to lazy-load resources, it may be useful to keep it
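[Editor's sketch for context: the kind of inline usage being discussed for deprecation, written with today's standard WebXR API; the preview drawing and yonet's lazy-loading would hook into the frame callback.]

```js
// An inline session renders a non-immersive preview into a page canvas before
// the user enters an 'immersive-vr' / 'immersive-ar' session.
const canvas = document.querySelector('canvas');
const gl = canvas.getContext('webgl', { xrCompatible: true });

const inlineSession = await navigator.xr.requestSession('inline');
await inlineSession.updateRenderState({
  baseLayer: new XRWebGLLayer(inlineSession, gl),
});
const viewerSpace = await inlineSession.requestReferenceSpace('viewer');

inlineSession.requestAnimationFrame(function onFrame(time, frame) {
  const pose = frame.getViewerPose(viewerSpace);
  // ...draw the preview and/or lazy-load assets here.
  inlineSession.requestAnimationFrame(onFrame);
});
```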
Brandel: let's go with first step of deprecation and see who complains?
bajones: there's been a decent chunk of browser ecosystem that didn't implement WebXR
… so you still had to rely on a polyfill or three.js / babylon
… but we are approaching the moment where more browsers implement it
… so now might be the right moment to go with deprecation
ada: it'd be a good thing to start on it now so that when we launch things we can launch with the right state
… there were discussions about other kinds of inline or other kinds of sessions
cabanier: it could be useful to have inline stereo
… 3d movie does not need head pose
… there is already plenty of content that could work in 3d
Brandel: I wonder about whether head pose is going to be needed to make the experience comfortable
… we haven't fully resolved it yet
bajones: scenarios cabanier pointed out are way simpler cases overall
… render into 2 views and that's it
… more interesting case for inline is having punched-through, magic-window-style content similar to what we chatted about yesterday during <model>
… this could be the thing we want for z-space
… worth talking about it at some point
… but it may be too complicated for someone who just wants to display the stereo content
ada: it could be useful to have the simple case that can be upgraded
cabanier: the reason to use WebXR is that there are already frameworks that talk WebXR
… if you want to watch stereo it'd be good to have curved screens
… I don't know what'd happen if you requested it on a flat screen - would it curve?
ada: you could go from inline into "fullscreen" immersive session
cabanier: it'd be useful but it may not be appropriate for this group - stereo video is pretty messy now
… it's pretty bad experience now
ada: mentions layers
cabanier: even just video element is messy
cabanier: we tried displaying stereo when in fullscreen but content gets displayed side by side
ada: unsure which group to take it to
cabanier: I don't think there were any meetings about it during this TPAC
ada: are we pro-deprecating inline? let's make a PR?
ada: untracked inline stereo session would be interesting
cabanier: we need to discuss this more
ada: that's it for today! next round at 11:30 on Fri
<cabanier> ada: hand tracking using the camera