1 Introduction

The co-existence of humans and robots in unstructured environments plays a key role in bringing robotic assistance to the mainstream. We have seen unprecedented improvements in autonomous mobile robot navigation in the presence of static and dynamic obstacles, and this research has paved the way for industries such as autonomous driving to move from proof-of-concept to fieldable systems. While the progress has been significant, several challenges remain that provide unique opportunities to bring the reliability, safety, and performance of autonomous mobile platforms up to human-level performance.

Here, we describe an approach centered on making predictions of the environment as a critical element of mobile robot navigation in highly dynamic and cluttered spaces. Our underlying assumption is that predictions of spatial structures and of other agents in the scene play a vital role in how humans successfully navigate similarly dynamic and dense environments. Specifically, we believe prediction allows humans to plan trajectories that extend beyond their visual line-of-sight as well as to anticipate the movement of other humans in the environment. This suggests errors in prediction can capture ‘surprise’ events that enable humans to alter the risk of their own internal navigation policy.

Here, we summarize and extend the works presented in [1, 2] by providing mobile agents with a version of these predictive capabilities. We develop techniques using generative neural networks that leverage spatial structures as a prior for making map predictions that extend beyond the robot’s line-of-sight. We extend this to include map prediction with multiple hypotheses and an information-theoretic robot exploration algorithm. Finally, we extend prediction to include pedestrian motion and develop a risk-sensitive control algorithm for navigating around humans. Our objective is to show that prediction is a key enabler of more efficient and robust mobile navigation policies.

The remainder of this paper is organized as follows. In Sect. 2, we provide a brief overview of existing work in prediction and robot navigation in human crowds. In Sect. 3, we compare and contrast several approaches to occupancy map prediction. In Sect. 4, we extend approaches to occupancy map prediction to include generation of multiple hypotheses for efficient robot exploration. In Sect. 5, we extend prediction to include pedestrians for adaptive crowd navigation.

2 Related Work

2.1 Deep Learning for Generative Models

Deep neural networks have been used in a number of promising ways to achieve high performance in domains such as vision, speech, and more recently robotic manipulation [3, 4]. Oh et al. used feedforward and recurrent neural networks to perform action-conditional video prediction on Atari games with promising results [5]. Deep networks have also been used in image completion, e.g., by Ulyanov et al. [6]. In addition, generative adversarial networks (GANs) have emerged as a promising method for image generation [7]. Isola et al. proposed an approach for training conditional GANs that translate one image into another [8].

2.2 Deep Learning for Navigation

More recently, several papers have described approaches that combine elements of deep neural networks with autonomous navigation. These include using deep neural networks for model predictive control [3]. Tamar et al. proposed Value Iteration Networks, which embed a planner inside a deep neural network architecture [9]. Several papers investigate the use of deep reinforcement learning to develop collision-free planning without the need for an internal map; however, these are still restricted by the sensor’s FOV [10, 11].

2.3 Crowd Navigation

Crowd navigation has been studied extensively in recent literature (e.g., [12]) and can be classified into three general areas: (1) algorithms that react to moving obstacles in real time, (2) trajectory based approaches that plan paths by anticipating future motion of obstacles, and (3) reinforcement learning based approaches that learn a policy to navigate in crowded environments. Reaction based methods include works such as reciprocal velocity obstacles (RVO) [13] and optimal reciprocal collision avoidance (ORCA) [14]. Trajectory based approaches, such as [15, 16], explicitly propagate estimates of future motion over time and perform trajectory optimization on those future states for collision avoidance. Additionally, several recent works use variations of reinforcement learning to learn policies capable of crowd navigation (e.g., [17,18,19]). Everett et al. [18] developed a decentralized approach to multiagent collision avoidance using a value network that estimates the time to goal for a given state transition. Chen et al. [19] further extended this work by adding an attention mechanism and a novel pooling method to handle a variable number of humans in the scene. Kahn et al. [20] investigate adaptive navigation policies based on uncertainty; however, they only considered environment uncertainty with static obstacles and not navigation in the presence of pedestrians.

3 Occupancy Map Prediction

A major limitation of existing robot navigation algorithms is the limited field-of-view of traditional sensors. This is in contrast with biological systems, which routinely make predictions of their environment. For example, imagine walking through a hallway or corridor. Often, we can predict the space around a corner because we have seen similar hallways and corridors in the past. This prediction allows us to generate trajectories that extend beyond our line-of-sight. In this section, we describe an approach that allows robot systems to generate predicted occupancy maps that extend beyond the sensor’s line-of-sight. We evaluate several neural network architectures, loss functions, and datasets to determine which approaches generate the most accurate predicted occupancy maps. Formally, we are attempting to learn a function that maps an input occupancy map to an expanded occupancy map that extends beyond the FOV of the sensor:

$$ f : x \rightarrow y_i $$

where \(x\) represents the state, in this case the input occupancy map as an image, \(y_i\) represents the output occupancy map, and \(i \in \mathbb{R}\) represents the percent increase of the expanded occupancy map. Components of the function \(f\) include an encoding function \(f_{enc}(x) \rightarrow h \in \mathcal{H}\), which maps the state space (input occupancy maps) to a hidden state, and a decoding function \(f_{dec}(h) \rightarrow y_i\), which maps the hidden state to an expanded, predicted occupancy map.

In our experiments, we compare several different neural network architectures including:

  1. a feedforward network based on a U-Net architecture (unet_ff)

  2. a feedforward network based on the ResNet architecture (resnet_ff)

  3. a GAN using the feedforward network from (1) as the generative network (gan)

3.1 U-Net Feedforward Model

The U-Net feedforward model is based on the network architecture defined by Ronneberger et al. [21] and uses skip connections that directly connect layers \(i\) and \(n-i\), enabling the network to bypass the bottleneck of the downsampling layers in order to perform an identity operation. Similar to [8], the encoder network consists of 8 convolution, batch normalization, and ReLU layers, where each convolution uses a \(4 \times 4\) filter with a stride of 2. The number of filters for the 8 layers in the encoder network is: (64, 128, 256, 512, 512, 512, 512, 512). The decoder network consists of 8 upsampling layers with the following numbers of filters: (512, 1024, 1024, 1024, 1024, 512, 256, 128).
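To make the encoder/decoder structure concrete, the sketch below shows a two-level PyTorch version of this pattern with a single skip connection; the class and variable names and toy channel sizes are ours for illustration, not the exact unet_ff implementation.

```python
import torch
import torch.nn as nn

def enc_block(c_in, c_out):
    # 4x4 convolution with stride 2 halves the spatial resolution
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, kernel_size=4, stride=2, padding=1),
        nn.BatchNorm2d(c_out),
        nn.ReLU(inplace=True))

def dec_block(c_in, c_out):
    # transposed convolution doubles the spatial resolution
    return nn.Sequential(
        nn.ConvTranspose2d(c_in, c_out, kernel_size=4, stride=2, padding=1),
        nn.BatchNorm2d(c_out),
        nn.ReLU(inplace=True))

class TinyUNet(nn.Module):
    """Two-level U-Net illustrating the skip connection between
    layers i and n-i; the full model uses 8 levels on each side."""
    def __init__(self):
        super().__init__()
        self.e1, self.e2 = enc_block(1, 64), enc_block(64, 128)
        self.d1 = dec_block(128, 64)
        self.d2 = dec_block(128, 1)  # input channels doubled by the skip concat

    def forward(self, x):
        h1 = self.e1(x)   # kept for the skip connection
        h2 = self.e2(h1)  # bottleneck
        u1 = self.d1(h2)
        # skip connection: concatenate the encoder feature map, bypassing
        # the bottleneck (a real model would end in Tanh/Sigmoid)
        return self.d2(torch.cat([u1, h1], dim=1))
```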

3.2 ResNet Feedforward Model

The ResNet feedforward model is based on the work by Johnson et al. [22], which consists of 2 convolution layers with stride 2, 9 residual blocks as defined by [23], and two deconvolution layers with a stride of \( \frac{1}{2} \). This network was selected primarily for its ability to learn identity functions, which is key to image translation, as well as for its success in image-to-image translation demonstrated by the CycleGAN network [24].
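The residual pattern that makes identity functions easy to learn is summarized below; this is a minimal sketch of the block pattern from [23], not the exact block configuration used in [22].

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    """One residual block: output = x + F(x). When F(x) is near zero the
    block reduces to the identity, which is what makes identity mappings
    easy to learn."""
    def __init__(self, channels=256):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels))

    def forward(self, x):
        return x + self.body(x)  # identity shortcut
```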

3.3 GAN Model

The GAN network is based on the pix2pix architecture [8], which has demonstrated impressive results in general-purpose image translation, including generating street scenes, building facades, and maps from aerial images. This network uses the U-Net feedforward model defined in Sect. 3.1 as the generator and a 6-layer discriminator network with filter sizes: (64, 128, 256, 512, 512, 512).

Fig. 1. Qualitative sample map predictions using the unet_ff architecture compared to the ground truth map.

3.4 Occupancy Map Prediction Experiment and Results

Our approach to testing occupancy map prediction using the networks defined above first involved generating a dataset and then performing qualitative and quantitative analysis of the predicted images compared to the ground truth.

A dataset of approximately 6000 images of occupancy map subsets was created by simulating a non-holonomic robot moving through a two-dimensional map with a planar LIDAR sensor in C++ with ROS and the OctoMap library [25]. Two maps were created in Solidworks with path widths varying between 3.5 m and 10 m. These were converted into OctoMap’s binary tree format using binvox [26, 27] followed by OctoMap’s binvox2bt tool. The result is an occupancy map with all unoccupied space set as free. We require the space outside of the walls to be marked as unknown to provide a ground truth for our estimated maps; these ground truth maps were created by fully exploring the original occupancy maps (Fig. 1).

The robot is modeled as a Dubins car, with state vector \(\mathbf{x} = [x, y, \theta]\) and inputs \(\mathbf{u} = [v, \dot{\theta}]\), where \((x, y)\) is the robot’s position, \(v\) is the velocity, and \(\theta\) and \(\dot{\theta}\) are the heading angle and angular velocity, respectively. For simplicity, the robot is constrained to move at a fixed forward velocity of 0.5 m/s. A planar LIDAR sensor with a scanning area of 270\(^\circ\) and a range of 20 m is used to simulate returns given the robot’s current pose against the ground truth map. These simulated returns are used to create the “estimated” occupancy map. Path planning is done with nonlinear model-predictive control and direct transcription at 10 Hz. At each time step, a subset of the maps (both the estimated and ground truth) is saved; a 5 m by 5 m square centered around the robot’s pose was chosen with a resolution of 0.05 m. The robot’s current state and action space are also logged at each time step. Occupancy maps are expanded over time, so our simulation performs a continuous trajectory and the dataset is built consecutively instead of by randomly sampling throughout a map. A total of six trajectories were simulated: four paths were used for training data (5221 images) and two were used as a test set (1090 images). Ground truth datasets of the expanded occupancy maps were also generated. These expanded occupancy maps range from 1.10x to 2.00x expansion in increments of 0.10x; e.g., a 2.00x expansion results in a 10 m by 10 m square subset centered around the robot.
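For reference, the Dubins-car kinematics reduce to a simple forward-Euler update; the sketch below is ours, written in Python rather than the C++/ROS simulator.

```python
import numpy as np

def dubins_step(state, u, dt=0.1):
    """Propagate the state [x, y, theta] one step under input
    u = [v, theta_dot]; in our setup v is fixed at 0.5 m/s."""
    x, y, theta = state
    v, theta_dot = u
    return np.array([x + v * np.cos(theta) * dt,
                     y + v * np.sin(theta) * dt,
                     theta + theta_dot * dt])

# one step at 10 Hz with a gentle left turn
state = dubins_step(np.array([0.0, 0.0, 0.0]), [0.5, 0.1])
```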

Table 1. SSIM analysis for simulation data

We trained each variant of the neural network from scratch on the expanded ground truth occupancy maps for 200 epochs with a batch size of 1. A total of 15 training sessions were performed to evaluate each of the three neural network architectures across five expansion increases (1.10x, 1.30x, 1.50x, 1.70x, and 2.00x). We use the Adam optimizer with an initial learning rate of 0.0002 and momentum parameters \(\beta_1 = 0.5, \beta_2 = 0.999\). In the feedforward models, L1 loss was used as proposed in PatchGAN [8], and in the GAN model an L1 + discriminator loss was used. The decoder layers of the network used a dropout rate of 0.50, and weights were initialized from a Normal distribution (\(\mu = 0, \sigma = 0.2\)). All models were implemented using PyTorch [28].
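A minimal PyTorch training loop matching these settings might look like the following sketch; the random tensors stand in for (input map, expanded ground-truth map) pairs, and TinyUNet refers to the illustrative model above rather than the full unet_ff.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# placeholder data standing in for (input map, expanded map) image pairs
data = TensorDataset(torch.rand(8, 1, 64, 64), torch.rand(8, 1, 64, 64))
loader = DataLoader(data, batch_size=1)

model = TinyUNet()  # stand-in for unet_ff; resnet_ff and gan are analogous
opt = torch.optim.Adam(model.parameters(), lr=2e-4, betas=(0.5, 0.999))
l1 = torch.nn.L1Loss()

for epoch in range(200):
    for x, y in loader:
        opt.zero_grad()
        loss = l1(model(x), y)  # the GAN variant adds a discriminator term
        loss.backward()
        opt.step()
```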

Table 1 provides the structural similarity index metric (SSIM) for each of the networks. Based on the SSIM metric, the U-Net feedforward model outperforms the other networks at 1.10x and 1.30x expansion, confirming the qualitative assessment. The quality of the prediction generally decreases as the expansion percentage increases, and at expansions of 1.50x and above the three networks achieve similar performance. This leads us to believe skip connections across the layers of an autoencoder network play a critical role in generating accurate predictions of future maps.
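SSIM between a predicted and ground-truth map can be computed directly, e.g., with scikit-image; the arrays below are placeholders.

```python
import numpy as np
from skimage.metrics import structural_similarity

pred = np.random.rand(100, 100)   # predicted occupancy map (placeholder)
truth = np.random.rand(100, 100)  # expanded ground-truth map (placeholder)
score = structural_similarity(pred, truth, data_range=1.0)  # 1.0 = identical
```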

4 Uncertainty-Aware Occupancy Map Prediction

In the previous section, we explored properties of the neural network architectures, loss functions, and datasets used to validate the approach of occupancy map prediction. We now extend our neural network architectures to generate multiple hypotheses in order to measure the uncertainty of the predictions [2]. To capture this uncertainty, we modify the single-hypothesis network to output multiple hypothesis predictions by branching \(N\) heads, each capable of making its own prediction. Inspired by [29], the loss function becomes a weighted sum in which the loss of the best-performing head is weighted by \(1-\epsilon\) and the losses of the remaining heads are each weighted by \(\epsilon / (N-1)\). In our experiments, we set \(\epsilon = 0.05\).
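A sketch of this weighted loss follows; it is our transcription of the scheme above, and the function names are ours.

```python
import torch

def multi_hypothesis_loss(head_losses, eps=0.05):
    """Weight the best-performing head's loss by 1 - eps and share
    eps equally across the remaining N - 1 heads."""
    losses = torch.stack(head_losses)  # per-head scalar losses, shape (N,)
    n = losses.numel()
    weights = torch.full_like(losses, eps / (n - 1))
    weights[losses.argmin()] = 1.0 - eps
    return (weights * losses).sum()

# e.g., with an L1 loss per head:
# loss = multi_hypothesis_loss([l1(h, y) for h in heads])
```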

Fig. 2. (a) Input occupancy map based on the lidar’s FOV, (b) prediction using a single hypothesis, resulting in a blurred image, (c) and (d) two hypotheses generated using multiple hypothesis prediction, (e) image representing the variance between the multiple hypotheses.

We show that measuring the uncertainty allows us to develop an information-theoretic exploration policy that drives the robot towards regions of high uncertainty. We modify the unet_ff architecture described in Sect. 3 to generate multiple hypotheses. By forming multiple hypotheses, we can compute the differences across the predicted images to estimate the regions of highest uncertainty. We first evaluate the benefits of making multiple predictions rather than single predictions using real-world datasets provided by Google Cartographer [30], as illustrated in Fig. 2. Both the average structural similarity index and the peak signal-to-noise ratio improved when generating multiple predictions (Table 2).

Table 2. Quantitative analysis with multiple hypotheses prediction

We next evaluate our information-theoretic exploration policy. We use the Gazebo simulation environment, where the goal of the robot is to efficiently explore the environment to generate a map. Our approach is to drive the robot towards the regions of highest uncertainty and compare against existing map exploration algorithms. As an example, in Fig. 2(a), we show the input to our map prediction algorithm. In (b), we show that making only a single prediction when multiple futures could exist results in a blurry image. In (c) and (d), we show that making multiple predictions allows our algorithm to represent the possible futures as distinct hypotheses. In (e), we show the difference across the hypotheses, representing regions of high uncertainty.

We then cluster the image representing regions of high uncertainty and compute the region with the highest variance across the hypotheses. We select this region as the next area to explore and compare with several alternative methods of robot exploration. As seen in Fig. 3, and further summarized in Table 3, we show a significant reduction in the overall path length needed to explore the space compared to the alternative approaches. This shows uncertainty-aware prediction of occupied space can play a critical role in efficient exploration of unknown spaces.
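The core of this exploration heuristic can be sketched as follows; for brevity the sketch skips the clustering step described above and steers directly towards the highest-variance cell.

```python
import numpy as np

def uncertainty_map(hypotheses):
    """Per-cell variance across the N predicted occupancy maps."""
    return np.stack(hypotheses, axis=0).var(axis=0)  # (H, W)

def next_exploration_target(hypotheses):
    """Pick the cell of highest variance as the next exploration goal
    (the full method clusters the variance image first)."""
    var = uncertainty_map(hypotheses)
    return np.unravel_index(var.argmax(), var.shape)  # (row, col)
```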

5 Adaptive Crowd Navigation

In this section, we extend our uncertainty-aware occupancy map prediction to include prediction of humans in the environment. Our underlying assumption is that enabling prediction of pedestrians can play a critical role in adaptive crowd navigation policies that will ultimately lead to fewer collisions. As illustrated in Fig. 4, our goal is to show errors in pedestrian prediction can serve as a measure of ‘surprise’ (i.e. policy uncertainty) to help detect novel pedestrian motion not seen during training. By detecting novel pedestrian motion, we can alter the risk of the robot’s control policy thereby reducing the probability of collision.

Fig. 3. Trajectory of the robot during exploration using (a) Frontier Exploration [31], (b) Information-Theoretic Bayesian Optimization [32, 33], (c) Frontier Exploration using Distance [34], (d) our method.

Table 3. Total path length

Fig. 4. Adaptive crowd navigation policy that uses pedestrian intent and prediction error to adjust the risk profile of a control policy.

5.1 Pedestrian Prediction

Our pedestrian prediction algorithm is summarized in Fig. 5. This architecture consists of a generator and a discriminator network. The generator network includes a recurrent encoder network, a variational autoencoder, a recurrent decoder network, and an intent predictor. The discriminator network classifies sample trajectories as either real or fake (i.e., produced by the generator). Our main contribution lies in combining a probabilistic interpretation of the desired goal with embeddings learned from past trajectories. This approach allows us to make predictions of future pedestrian trajectories that meet or exceed state-of-the-art algorithms. The details of our pedestrian prediction approach are presented in [1].

5.2 Crowd Navigation

For our adaptive crowd navigation algorithm, we leverage the CrowdSim simulation environment provided by [19]. CrowdSim supports a flexible interface that enables learning robot policies through simulated pedestrian motion. The pedestrian controller supports an optimal reciprocal collision avoidance (ORCA) model to simulate pedestrian motion [14]. This model is parameterized by the preferred velocity, the maximum distance and time over which to account for neighboring agents’ behavior, the pedestrian’s radius, and the maximum velocity. In addition, CrowdSim provides an OpenAI Gym-like environment [35] for experimenting with reinforcement learning based policies that control a robot’s actions to reach a target goal while avoiding obstacles.

Fig. 5. Our pedestrian prediction network architecture. The generator network consists of a recurrent encoder network, a variational autoencoder, an intent prediction module, and a recurrent decoder network. The discriminator network consists of a recurrent encoder network to distinguish between real and fake trajectories.

We use a similar state space as described by [19, 36] which consists of the following parameters with respect to the robot position as the origin and the x-axis pointing towards the goal: distance from robot position to goal, robot’s preferred velocity, actual velocity and radius. For each pedestrian, the state includes position, velocity, radius, and distance between pedestrian and robot.

The action space consists of 3 discrete speeds and 6 discrete rotation angles for a total of 18 actions. The reward function encourages the robot to successfully reach the target while avoiding collisions with other pedestrians, and is defined similar to [18, 19] as:

$$ R(\mathbf{s}, \mathbf{a}) = \begin{cases} -0.25 & \text{if } d_{min} < 0 \\ -0.1 - d_{min}/2 & \text{else if } d_{min} < 0.2 \\ 1 & \text{else if robot reached goal} \\ 0 & \text{otherwise} \end{cases} $$

where \(d_{min}\) is the minimum distance separating the robot and the humans during the previous timestep.
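For clarity, this reward function transcribes directly to code; in the sketch below, `reached_goal` is a hypothetical flag assumed to come from the simulator.

```python
def reward(d_min, reached_goal):
    """Reward from the case definition above; d_min is the minimum
    robot-human separation over the previous timestep, in metres."""
    if d_min < 0:        # collision
        return -0.25
    elif d_min < 0.2:    # uncomfortably close
        return -0.1 - d_min / 2
    elif reached_goal:
        return 1.0
    return 0.0
```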

To implement our adaptive crowd navigation policy, we train a risk-averse and an aggressive policy. The aggressive policy uses a preferred velocity of 2.0 m/s, and the risk-averse policy is limited to 1.0 m/s. Our first set of experiments focuses on the performance of the CADRL [36] and SARL [19] algorithms with pedestrian motion that is within the training distribution and then outside of it. We trained both algorithms with preferred robot velocities set to 2.0 m/s and a static pedestrian motion model. The starting and ending positions of the pedestrians were sampled uniformly inside a square of width 10 m. We followed this experiment with a series of tests in which the adaptive control policy uses various methods of detecting novel pedestrian behaviors.

Our first evaluation compares SARL and CADRL with a one-class SVM with a radial basis function kernel [37]. We train the one-class SVM based on the fixed pedestrian motion profile and evaluate its ability to detect novel distributions of pedestrian motion data. Our subsequent tests use deep learning based approaches for novelty detection, including Social GAN [38] and our intent-aware pedestrian prediction algorithm. We trained both deep learning algorithms with the same fixed pedestrian motion profile that was used when training the original RL policies. We then tested pedestrian prediction with a changing distribution of pedestrian motion, described in Table 4, while allowing the robot to navigate towards the goal. We compute an estimate of novelty by thresholding the prediction error, measured using the final displacement error (FDE), by a value \(\alpha\). If the error exceeds \(\alpha\), the policy switches from aggressive to risk-averse behavior with the goal of avoiding collisions. The value of \(\alpha\) was chosen by computing the mean and standard deviation of the FDE on the training set; in our experiments, \(\alpha\) was set to 3 standard deviations above the training-set mean to exclude outliers.
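The novelty test itself is a simple threshold on the FDE; the following sketch follows the 3-sigma rule described above, with function names of our own choosing.

```python
import numpy as np

def fde(pred_traj, true_traj):
    """Final displacement error between a predicted and an observed
    trajectory, each an array of (x, y) points."""
    return np.linalg.norm(np.asarray(pred_traj)[-1] - np.asarray(true_traj)[-1])

def novelty_threshold(train_fdes, k=3.0):
    """alpha = mean + k * std of the training-set FDEs (k = 3 here)."""
    return np.mean(train_fdes) + k * np.std(train_fdes)

def select_policy(error, alpha):
    # fall back to the slower, risk-averse policy on novel motion
    return "risk_averse" if error > alpha else "aggressive"
```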

Table 4. ORCA model parameters

Our metrics for comparing the relative performance of each algorithm consist of the number of successful trials, the number of collisions, the average navigation time, the discomfort level, and the average reward. The discomfort level is defined as the frequency with which the separating distance falls below the desired separation distance, in this case 0.2 m. The results after running 500 test cases are reported in Table 5. We use the notation \(method\text{-}p\), where method describes the method used (i.e., SVM, SARL, etc.) and p is the number of pedestrians in the scene.

The first two rows of Table 5 evaluate the CADRL and SARL algorithms using pedestrian motion that is within the training distribution. These algorithms generally performed well, with SARL resulting in fewer collisions. Rows 3 and 4 then show the results of CADRL and SARL in the presence of out-of-distribution, novel pedestrian motion, for which both algorithms have significantly higher collision and discomfort rates. The one-class SVM used to detect novel pedestrian motion showed only a modest improvement, reducing the number of collisions by 2 compared to the SARL algorithm.

The subsequent rows of Table 5 compare deep neural network approaches to detecting novel pedestrian motion. Specifically, we compare our intent-aware approach with Social GAN. Using Social GAN improved on the one-class SVM, further reducing the number of collisions by 20. Our intent-aware pedestrian prediction algorithm provided the best performance across almost all of the metrics: we further reduced the number of collisions by 5 compared to Social GAN, and by 30 overall compared to the non-adaptive SARL policy. In addition to reducing the number of collisions, we also reduce the overall discomfort rate and increase the overall reward. Further, we show that these benefits scale as the number of pedestrians in the scene increases from 5 to 20.

Table 5. Quantitative analysis of collision avoidance

6 Hardware Prototype

We further assess the real-world applicability of our algorithms by evaluating them in a proof-of-concept physical test environment (Fig. 6). Our physical test bed consists of the MIT Rapid Autonomous Complex-Environment Competing Ackermann-steering Robot (RACECAR) navigating through several pedestrians. The platform consists of a Hokuyo UST-10LX LiDAR, a Sparkfun IMU, a Traxxas 1/10-scale chassis, and an onboard NVidia Jetson processor with GPU.

We use a leg detector algorithm based on an SVM classifier to detect pedestrians with respect to the camera frame and a custom SLAM library to generate maps and estimate the robot’s position. Using the pose of the robot with respect to the global frame and of the pedestrians with respect to the robot, we transform the pedestrians into the global coordinate frame, allowing us to run our trained navigation policies directly from simulation without requiring further training on the physical hardware. As part of this hardware demonstration, we verified that the trained policy can transfer from simulation to the physical robot, execute in real time, operate on noisy sensors, and successfully reach its goal while navigating around pedestrians to avoid collisions.
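This pedestrian-to-global transform is a standard 2D rigid-body transform; a sketch (ours) given the SLAM pose estimate:

```python
import numpy as np

def pedestrian_to_global(robot_pose, ped_xy_robot):
    """Map a pedestrian position from the robot frame into the global
    frame, with robot_pose = (x, y, theta) estimated by SLAM."""
    x, y, theta = robot_pose
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s], [s, c]])  # rotation by the robot heading
    return R @ np.asarray(ped_xy_robot) + np.array([x, y])
```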

Fig. 6. Hardware demonstration using the MIT RACECAR navigating around moving pedestrians.

7 Conclusion

In this paper, we highlight the importance of integrating predictive capabilities into mobile robotic systems, as we believe this is a critical capability needed to improve the robustness and efficiency of navigation policies. We first experiment with the ability to predict beyond the robot’s line-of-sight using spatial structures as priors. We investigated several neural network architectures and evaluated the general trade-offs between performance and the generation of high-quality predictions. Next, we show extensions to occupancy map prediction that make multiple hypotheses. Using the multiple hypotheses, we compute regions of high uncertainty as a heuristic for efficient map exploration. We further extend this approach to include prediction of pedestrian motion. We show that by measuring the error between predicted and observed trajectories, we can detect novel pedestrian motion. This enables a risk-sensitive control policy resulting in significantly fewer collisions compared to alternative methods.

While the work presented here shows promising results, there are a number of possible extensions for further exploration. We can extend our map prediction algorithms to predict semantic information beyond occupied spaces. We can also improve information-theoretic exploration by using additional forms of uncertainty estimation, including bootstrapping [39] and stochastic dropout [40, 41]. For pedestrian prediction, we can incorporate semantic information, group and social dynamics, and additional forms of intent estimation to generate better predictions. Finally, we can extend our adaptive, risk-sensitive navigation policy to continually learn in the presence of novel pedestrian motion.