1 Introduction

The potential impact of cooperative perception within a group of autonomous vehicles is unquestionable in many application domains, such as rescue missions [1, 2], forest fire monitoring [3], wildlife tracking, border control [4, 5], surveillance [5, 6], reconnaissance [7], and battle damage assessment. One of the most common and versatile means of perception in multi-robot cooperative tasks is visual sensing with one or more cameras capable of acquiring visual information for cooperative approaches. However, as technology miniaturization has advanced significantly, there has been a tendency to reduce the vehicles’ dimensions and payload [6]. This brings new research challenges to the vision community, with a natural transition to a bearing-only vision setup or to a smaller rigid baseline for stereo systems, which has an inherent impact on application scenarios where the goal is to estimate the position of targets whose depth exceeds the baseline. A smaller rigid stereo baseline results in a 3D position estimation error that increases quadratically with depth [8, 9]. Therefore, in the context of rescue and border control missions, as depicted in Fig. 1, where the goal is to detect and estimate the position of a target in 3D, the main question is:

How is it possible to produce 3D information based on bearing-only vision information using a team of robot observers?

Fig. 1. Motivation application scenarios. Left: cooperative perception target tracking for the PCMMC FCT project. Right: cooperative search and rescue task for the FP7 project ICARUS in La Spezia, Italy.

This paper proposes to address this issue with a novel multi-robot heterogeneous cooperative perception framework, defined as Uncertainty-based Multi-Robot Cooperative Triangulation (UCoT), capable of estimating the 3D dynamic target position based on bearing-only measurements. The multi-robot cooperative triangulation method has been introduced in [10] and is extended in this paper in order to handle the uncertainty of the observation model provided by each robot by weighting the contribution of each monocular bearing ray in a probabilistic manner.

The motivation for the proposed framework emerged from open issues in the state-of-the-art on cooperative perception, to which the presented framework contributes:

  • providing a multi-robot cooperative method to estimate 3D information based on bearing-only vision information;

  • integrating all sources of uncertainty associated with the position, attitude and image pixel target position of each bearing-only sensor in a probabilistic manner, in order to weight the contribution of each visual sensor to the cooperative triangulation;

  • improving a bearing-only vision technique applied to Unmanned Aerial Vehicles (UAVs) in order to estimate 3D information based on the flat earth assumption [11].

Regarding the first point, in a bearing-only vision setup it is intrinsically difficult to estimate depth and absolute scale [12]; therefore, estimating the 3D target position without a known target size is a research challenge [13]. Techniques such as monocular Structure-from-Motion (SFM) or Visual Simultaneous Localization and Mapping (VSLAM) [14] have achieved good results in depth estimation in indoor and even in outdoor map-building scenarios [9]. However, they also impose constraints, such as high computational requirements, low camera dynamics, features preferably available between frames for the batch recursive process, and a large field of view. Based on the motivation scenario, the available methods are therefore not adequate to support scenarios where both robots and targets have dynamic behavior. Methods for 3D target estimation with a bearing-only vision configuration based on the flat earth assumption [11] were developed for the particular case of aerial vehicles, with the depth information being provided by the vehicle’s altitude without taking the terrain morphology into consideration. The consequences of estimating the target position under this assumption are less accurate 3D information and the inability to estimate the position of targets that are not moving on the ground.

There are several techniques that can be used in the context of multi-robot cooperative 3D perception based on bearing-only vision information. One example is CoSLAM [15], an offline collaborative VSLAM method capable of estimating the pose between cameras with overlapping views based on the conventional SFM method. Other examples are the work of Sukkarieh [16], in the ANSER project, where the depth problem is solved with artificial landmarks of known size, and the method of Zhu [17], where two robots with omnidirectional cameras estimate a flexible baseline based on an object of known size.

The paper is organized as follows: Sect. 2 presents the proposed framework, the multi-robot architecture, as well as the mathematical formulation for the UCoT. The proposed framework is validated in Sect. 3 by presenting the results obtained in an outdoor scenario based on cooperative perception with a Micro Aerial Vehicle (MAV) and an Unmanned Ground Vehicle (UGV) tracking a target. Section 4 provides the conclusions and outlines future work topics.

2 Bearing-Only Multi-Robot Cooperative Triangulation

This section introduces our multi-robot cooperative triangulation framework to estimate the 3D target position based on bearing-only measurements. The relative position and orientation provided by each robot are handled by the framework, and based on the geometric constraints the 3D target position is estimated by establishing a flexible and dynamic geometric baseline for the cooperative triangulation. Two methods are formulated in this section: the Mid-Point Multi-Robot Cooperative Triangulation (MidCoT) and the Uncertainty-based Multi-Robot Cooperative Triangulation (UCoT). In the first method, the framework takes the midpoint of the shortest segment connecting the two rays (the segment perpendicular to both), assuming that both bearing-only cameras contribute equally to the estimation of the target position, see Fig. 3. In the second method, all sources of uncertainty associated with the position, attitude and image-plane pixel target position are addressed, and the covariance provided by the intersecting rays is evaluated in order to weight, in a probabilistic manner, the contribution of each ray to the estimation of the target position, see Figs. 3 and 4.

The uncertainty of the observation model provided by each robot, including the robot uncertainty about its own pose, is described using first-order uncertainty propagation, with the assumption that all sources of uncertainty can be modeled as uncorrelated Gaussian noise.

In the following sections, the notation \(_{from}^{to}\mathrm {\pmb {\xi }_{n}}\) will be used to denote the transformation matrix from one coordinate frame to another. We denote the robot body frame by \(\{B\}\) and the global frame, expressed in ECEF coordinates, by \(\{W\}\). Upper-case bold notation represents matrix variables, lower-case bold represents vectors, and lower case represents scalar variables.

2.1 Multi-Robot Architecture Framework

The envisioned multi-robot architecture framework, outlined in Fig. 2, describes the layers required to ensure an altruistic cooperative perception commitment, in which vehicles share useful information using a communication middleware. The architecture is independent of the source of the position and attitude information, as well as of the vision system.

Fig. 2. Architecture framework for multi-robot cooperative triangulation.

The architecture framework is composed of the following components:

  • Localization, responsible for providing the robot pose to the local state component, described by the transformation matrix \(^{W}_{B}\mathrm {\pmb {\mathcal {\xi }}}\) related to the global frame. In the outlined architecture, this information is provided by an IMU as \({\pmb {u}} \mathop {=}\limits ^{\mathrm {def.}}\begin{bmatrix} \phi&\theta&\psi \end{bmatrix}^T\), where \( (\phi ,\theta ,\psi )\) are respectively the roll, pitch and yaw angles, and by a GPS as \( \pmb {\varsigma } \mathop {=}\limits ^{\mathrm {def.}}\begin{bmatrix} \lambda&\varphi&h \end{bmatrix}^T\), where \((\lambda ,\varphi ,h)\) are respectively the latitude, longitude, and altitude;

  • Local State, provides an output 3-tuple \(\langle ^{W}{\pmb {C}}, ^{W}\mathrm {\pmb {\mathcal {R}}},\{^{W}{\pmb {d}}\}\rangle \) related to the global frame, composed of the camera position \(^{W}{\pmb {C}}\), attitude \(^{W}\mathrm {\pmb {\mathcal {R}}}\) and ray vectors \(\{^{W}{\pmb {d}}\}\), which represent the direction vectors from the points \(\{\pmb {m}\}\) detected by the monocular vision system;

  • Feature Correspondence, responsible for evaluating the tuples shared by other robots against the local state component. This evaluation is based on the Euclidean distance between two points projected in the global frame from the intersecting rays \(\lambda _{i}^{W}{\pmb {d}_{i}}\) and \(\lambda _{j}^{W}{\pmb {d}_{j}}\) with the perpendicular vector \(^{W}{\pmb {d}_{\bot }}\) [10], as depicted in Fig. 4;

  • Multi-Robot Cooperative Triangulation, responsible for the 3D target estimation and covariance \(\pmb {\varSigma }_{Target}\) related to all sources of uncertainty, as described in Eq. (11) for the UCoT method.
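
To make the data exchanged by these components concrete, the sketch below illustrates the 3-tuple produced by the Local State component and the back-projection of a detected pixel into a global-frame bearing ray. This is only a minimal illustration under assumptions not stated in the paper: a calibrated pinhole camera with intrinsic matrix K, and identifiers chosen here for readability.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class LocalState:
    """3-tuple shared by each robot: camera position, attitude and ray
    directions, all expressed in the global frame {W}."""
    C_w: np.ndarray            # camera position ^W C, shape (3,)
    R_w: np.ndarray            # camera attitude ^W R, shape (3, 3)
    rays_w: list               # unit direction vectors {^W d}, one per detection {m}

def pixel_to_ray(m, K, R_w):
    """Back-project a detected pixel m = (u, v) to a unit bearing vector in {W},
    assuming a calibrated pinhole camera with intrinsic matrix K (assumption)."""
    m_h = np.array([m[0], m[1], 1.0])       # homogeneous pixel coordinates
    d_cam = np.linalg.inv(K) @ m_h          # viewing direction in the camera frame
    d_w = R_w @ d_cam                       # rotate into the global frame {W}
    return d_w / np.linalg.norm(d_w)        # bearing-only: keep direction, drop scale
```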

2.2 Uncertainty Based Multi-Robot Cooperative Triangulation

The formulation proposed in this paper is an extension of the well-known stereo rigid-baseline mid-point triangulation method proposed by Trucco [18], as depicted in Eq. (1), which is expressed in the reference frame of the left camera. The proposed method removes the rigid baseline in order to support a flexible and dynamic geometric baseline composed of the information provided by each bearing-only vision system, camera position \(^{W}{\pmb {C}}\) and attitude \(^{W}\mathrm {\pmb {\mathcal {R}}}\) relative to the global frame, as depicted in Fig. 3.

Fig. 3. Relative pose between robots \(i\) and \(j\) with bearing-only vision systems, estimating the 3D target position. The camera geometry changes over time and provides a flexible and dynamic baseline able to ensure cooperative triangulation.

$$\begin{aligned} \lambda _{i}\, ^{B}{\pmb {d}_{i}} - \lambda _{j}\, \pmb {\mathcal {R}}^{T}\, {}^{B}{\pmb {d}_{j}} + \alpha \,(^{B}{\pmb {d}_{i}} \wedge \pmb {\mathcal {R}}^{T}\, {}^{B}{\pmb {d}_{j}} ) = \pmb {\mathcal {T}} \end{aligned}$$
(1)

Therefore, based on a dynamic baseline approach, related to the body frame, with a pair of bearing-only vision systems defined as \(i,j\), and assuming that each camera knows its own rotation and translation matrices \(\pmb {\mathcal {T}}_{i}\), \(\pmb {\mathcal {R}}_{i}\), \(\pmb {\mathcal {T}}_{j}\), \(\pmb {\mathcal {R}}_{j}\) relative to its body reference frame, it is possible to obtain \(^{B}\mathrm {\pmb {\mathcal {R}}}=\pmb {\mathcal {R}}_{j}^{T}\pmb {\mathcal {R}}_{i}\) and \(^{B}\mathrm {\pmb {\mathcal {T}}}=\pmb {\mathcal {R}}_{j}^{T}(-\pmb {\mathcal {T}}_{j}) - \pmb {\mathcal {R}}_{i}^{T}(-\pmb {\mathcal {T}}_{i})\). Replacing \(^{B}\mathrm {\pmb {\mathcal {R}}}\) and \(^{B}\mathrm {\pmb {\mathcal {T}}}\) in Eq. (1) yields

$$\begin{aligned} \begin{aligned} \lambda _{i} ^{B}{\pmb {d}_{i}} - \lambda _{j}^{B}{\pmb {d}_{j}} + \alpha (^{B}{\pmb {d}_{i}}\wedge ^{B}{\pmb {d}_{j}})= {}^{B}\mathrm {\pmb {\mathcal {T}}}_{i} - ^{B}\mathrm {\pmb {\mathcal {T}}}_{j} \end{aligned} \end{aligned}$$
(2)

with \(\lambda _{i} ^{B}{\pmb {d}_{i}}\) and \(\lambda _{j} ^{B}{\pmb {d}_{j}}\) being the direction vectors related to the body frame, and \(^{B}{\pmb {d}_{\bot }}=^{B}{\pmb {d}_{i}}\wedge ^{B}{\pmb {d}_{j}}\) the 3D vector perpendicular to both rays \(i\) and \(j\). Solving the linear system of Eq. (2) for the coefficients \(\lambda _{i}, \lambda _{j}, \alpha \), the triangulated point \(^{B}\mathrm {\pmb {\mathcal {P}}}\) lies on the segment joining \(^{B}{\pmb {C}}_{i} + \lambda _{i} ^{B}{\pmb {d}_{i}}\) and \(^{B}{\pmb {C}}_{j} + \lambda _{j} ^{B}{\pmb {d}_{j}}\)

$$\begin{aligned} \begin{aligned} ^{B}\mathrm {\pmb {\mathcal {P}}}&= \varOmega _{i}\;^{B}\mathrm {\pmb {\mathcal {P}}}_{i} + \varOmega _{j}\;^{B}\mathrm {\pmb {\mathcal {P}}}_{j} \\&= \varOmega _{i}(^{B}{\pmb {C}}_{i} + \lambda _{i} ^{B}{\pmb {d}_{i}}) + \varOmega _{j}(^{B}{\pmb {C}}_{j} + \lambda _{j} ^{B}{\pmb {d}_{j}}) \end{aligned} \end{aligned}$$
(3)

with \(^{B}{\pmb {C}}_{i}\) and \(^{B}{\pmb {C}}_{j}\) being the camera positions related to the body frame and \(\varOmega _{i}\), \(\varOmega _{j}\) the weights to be derived. The 3D target position related to the global frame, as shown in Fig. 3, is obtained through the transformation matrix \(^{W}_{B}\mathrm {\pmb {\xi }}\) from the body frame to the world frame, as

$$\begin{aligned} \begin{aligned} ^{W}\mathrm {\pmb {\mathcal {P}}}&= \varOmega _{i}(^{W}{\pmb {C}}_{i} + \lambda _{i} ^{W}{\pmb {d}_{i}}) + \varOmega _{j}(^{W}{\pmb {C}}_{j} + \lambda _{j} ^{W}{\pmb {d}_{j}}) \end{aligned} \end{aligned}$$
(4)

and setting \(\varOmega _{i}=\varOmega _{j}=\frac{1}{2}\) establishes the method defined as MidCoT.
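
As a reading aid, the following sketch solves the linear system of Eq. (2) for \((\lambda _{i}, \lambda _{j}, \alpha )\) and evaluates the weighted combination of Eq. (4); with equal weights it reproduces MidCoT, and it also accepts the probabilistic weights of Eq. (10) derived below for UCoT. It is a minimal numerical illustration (assuming non-parallel rays already expressed in a common frame, with the baseline taken as the vector between camera centers), not the authors' implementation.

```python
import numpy as np

def cooperative_triangulation(C_i, d_i, C_j, d_j, w_i=0.5, w_j=0.5):
    """Triangulate two bearing-only rays C + lambda*d (Eqs. (2)-(4)).

    Solves  lambda_i*d_i - lambda_j*d_j + alpha*(d_i x d_j) = C_j - C_i
    and returns the weighted combination of the closest points on each ray.
    Equal weights give MidCoT; the weights of Eq. (10) give UCoT.
    """
    d_perp = np.cross(d_i, d_j)                   # ^W d_perp, perpendicular to both rays
    A = np.column_stack((d_i, -d_j, d_perp))      # 3x3 coefficient matrix of Eq. (2)
    lam_i, lam_j, _ = np.linalg.solve(A, C_j - C_i)
    P_i = C_i + lam_i * d_i                       # closest point on ray i
    P_j = C_j + lam_j * d_j                       # closest point on ray j
    return w_i * P_i + w_j * P_j                  # Eq. (4)
```

Note that np.linalg.solve raises an error for (near-)parallel rays, which corresponds to a degenerate baseline geometry.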

Fig. 4. Covariance 3D ellipses from the intersection rays \(\pmb {\varSigma }_{\pmb {\mathcal {P}}_{i}}\) and \(\pmb {\varSigma }_{\pmb {\mathcal {P}}_{j}}\). The MidCoT estimate is represented by the blue dot and the UCoT estimate by the blue cross. The purple line is the vector \(^{W}{\pmb {d}_{\bot }}\) perpendicular to \(\lambda _{i}^{W}{\pmb {d}_{i}}\) and \(\lambda _{j}^{W}{\pmb {d}_{j}}\).

The geometric intersection in 3D between \(\lambda _{i}^{W}{\pmb {d}_{i}}\), \(\lambda _{j}^{W}{\pmb {d}_{j}}\) and the perpendicular vector \(^{W}{\pmb {d}_{\bot }}\) is shown in Figs. 3 and 4, as well as the mid-point derived from Eq. (4) between \(^{W}\mathrm {\pmb {\mathcal {P}}}_{i}\) and \(^{W}\mathrm {\pmb {\mathcal {P}}}_{j}\) (represented by the blue dot in the covariance ellipse, see Fig. 4). To ensure that all sources of uncertainty are taken into consideration when estimating the 3D target position, \(\pmb {\varSigma }_{Target}\) is estimated based on the assumption that there is uncertainty in the input pixel localization \(\sigma _{\pmb {m}}\), in the camera positions \(\sigma _{\pmb {\varsigma }}\) and in the attitudes \(\sigma _{\pmb {u}}\) relative to the global frame, all of them modelled as uncorrelated zero-mean Gaussian random variables.

$$\begin{aligned} \sigma _{\pmb {\varsigma }}=\begin{bmatrix} \sigma _{\lambda } & 0 & 0\\ 0 & \sigma _{\varphi } & 0\\ 0 & 0 & \sigma _{h} \end{bmatrix} \qquad \sigma _{\pmb {u}}=\begin{bmatrix} \sigma _{\phi } & 0 & 0 \\ 0 & \sigma _{\theta } & 0 \\ 0 & 0 & \sigma _{\psi } \end{bmatrix} \qquad \sigma _{\pmb {m}}=\begin{bmatrix} \sigma _{m_{x}} & 0 \\ 0 & \sigma _{m_{y}} \end{bmatrix} \end{aligned}$$
(5)

Using first-order uncertainty propagation, it is possible to approximate the distribution of the variables defined in Sect. 2.1, stacked in the input state vector \(\pmb {\mathcal {\nu }}_{(i,j)} = [\pmb {\varsigma }_{i},\pmb {u}_{i},\pmb {m}_{i},\pmb {\varsigma }_{j}, \pmb {u}_{j},\pmb {m}_{j}]\) that feeds Eq. (4), as a multivariate Gaussian. The covariance matrix \(\pmb {\varSigma }_{Target}\) approximately models the uncertainty in the 3D target estimation, computed from the noisy measurements of the Multi-Robot Cooperative Triangulation, as follows:

$$\begin{aligned} \pmb {\varSigma }_{Target} = \pmb {J_{\mathcal {P}}}\, \pmb {\varLambda }_{i,j}\, \pmb {J_{\mathcal {P}}}^{T} \end{aligned}$$
(6)

where \(\pmb {J_{\mathcal {P}}} \) stands for the Jacobian matrix of \(^{W}\mathrm {\pmb {\mathcal {P}}}\) in Eq. (4), given by

$$\begin{aligned} \pmb {J_{\mathcal {P}}}_{[3\times 16]} = \pmb {\nabla }_{(\pmb {\mathcal {\nu }}_{(i,j)})} \; ^{W}\mathrm {\pmb {\mathcal {P}}} \end{aligned}$$
(7)

with \(\pmb {\varLambda }_{i,j}\) being the input covariance matrix, block diagonal in all sources of uncertainty from Eq. (5) for each monocular vision system

$$\begin{aligned} \pmb {\varLambda }_{i,j[16\times 16]} = \begin{bmatrix} \sigma _{\pmb {\varsigma }i[3\times 3]} & & & & & \\ & \sigma _{\pmb {u}i[3\times 3]} & & & & \\ & & \sigma _{\pmb {m}i[2\times 2]} & & & \\ & & & \sigma _{\pmb {\varsigma }j[3\times 3]} & & \\ & & & & \sigma _{\pmb {u}j[3\times 3]} & \\ & & & & & \sigma _{\pmb {m}j[2\times 2]} \end{bmatrix} \end{aligned}$$
(8)
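
A compact way to reproduce the propagation of Eqs. (6)-(8) numerically is sketched below. The Jacobian is approximated here by central differences rather than derived analytically, which is only an illustrative substitute for Eq. (7); the 16-dimensional input ordering follows Eq. (8), and the function names are ours, not the paper's.

```python
import numpy as np

def numerical_jacobian(f, nu, eps=1e-6):
    """Central-difference Jacobian of f at the stacked 16-dim input state
    nu = [varsigma_i, u_i, m_i, varsigma_j, u_j, m_j]."""
    nu = np.asarray(nu, dtype=float)
    y0 = np.asarray(f(nu), dtype=float)
    J = np.zeros((y0.size, nu.size))
    for k in range(nu.size):
        step = np.zeros_like(nu)
        step[k] = eps
        J[:, k] = (np.asarray(f(nu + step)) - np.asarray(f(nu - step))) / (2.0 * eps)
    return J

def propagate_uncertainty(f, nu, Lambda):
    """First-order propagation of Eq. (6): Sigma_Target = J_P Lambda J_P^T, where
    f maps the input state to the triangulated point ^W P of Eq. (4) and Lambda is
    the 16x16 block-diagonal input covariance of Eq. (8)."""
    J = numerical_jacobian(f, nu)      # numerical stand-in for the Jacobian of Eq. (7)
    return J @ Lambda @ J.T
```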

As stated at the beginning of this paper, this work aims at addressing all sources of uncertainty provided by each intersection ray, using a probabilistic weight provided by the estimated uncertainty of each ray. Therefore, as previously described, using the first-order uncertainty propagation, it is possible to estimate the covariance \(\pmb {\varSigma }_{\pmb {\mathcal {P}}i}\) and \(\pmb {\varSigma }_{\pmb {\mathcal {P}}j}\) related to \(^{W}\mathrm {\pmb {\mathcal {P}}}_i\) and \(^{W}\mathrm {\pmb {\mathcal {P}}}_j\) as follows:

$$\begin{aligned} \pmb {\varSigma }_{\pmb {\mathcal {P}}i} = \pmb {J_{\mathcal {P}}}_i \pmb {\varLambda }_{i,j} \pmb {J_{\mathcal {P}}}_i^T&\; \; \; \pmb {J_{\mathcal {P}}}_{i[3\times 16]} =\pmb {\nabla }_{(\pmb {\mathcal {\nu }}_{(i,j)})} \; ^{W}\mathrm {\pmb {\mathcal {P}}}_i\nonumber \\ \pmb {\varSigma }_{\pmb {\mathcal {P}}j} = \pmb {J_{\mathcal {P}}}_j \pmb {\varLambda }_{i,j} \pmb {J_{\mathcal {P}}}_j^T&\; \; \; \pmb {J_{\mathcal {P}}}_{j[3\times 16]} =\pmb {\nabla }_{(\pmb {\mathcal {\nu }}_{(i,j)})} \; ^{W}\mathrm {\pmb {\mathcal {P}}}_j \end{aligned}$$
(9)

where \(\pmb {J_{\mathcal {P}}}_i\) and \(\pmb {J_{\mathcal {P}}}_j\) are respectively the Jacobian matrices of \(^{W}\mathrm {\pmb {\mathcal {P}}}_i\) and \(^{W}\mathrm {\pmb {\mathcal {P}}}_j\), and \(\pmb {\varLambda }_{i,j}\) is the input covariance matrix from Eq. (8). Therefore, with the uncertainty of each intersecting ray, \(\pmb {\varSigma }_{\pmb {\mathcal {P}}i}\) and \(\pmb {\varSigma }_{\pmb {\mathcal {P}}j}\), and the perpendicular vector \(^{W}{\pmb {d}_{\bot }}\), the probabilistic weight of each ray is expressed as

$$\begin{aligned} \begin{matrix} \varOmega _{i}=\frac{(^{W}{\pmb {d}_{\bot }} \; \pmb {\varSigma }_{\pmb {\mathcal {P}}j}\; ^{W}{\pmb {d}_{\bot }}^{T})^2}{(^{W}{\pmb {d}_{\bot }} \; \pmb {\varSigma }_{\pmb {\mathcal {P}}i} \; ^{W}{\pmb {d}_{\bot }}^{T})^2 + (^{W}{\pmb {d}_{\bot }} \; \pmb {\varSigma }_{\pmb {\mathcal {P}}j} \; ^{W}{\pmb {d}_{\bot }}^{T})^2}&\; \; \varOmega _{j}=\frac{(^{W}{\pmb {d}_{\bot }} \; \pmb {\varSigma }_{\pmb {\mathcal {P}}i}\; ^{W}\mathrm {\pmb {d}_{\bot }}^{T})^2}{(^{W}\mathrm {\pmb {d}_{\bot }} \; \pmb {\varSigma }_{\pmb {\mathcal {P}}i} \; ^{W}{\pmb {d}_{\bot }}^{T})^2 + (^{W}{\pmb {d}_{\bot }} \; \pmb {\varSigma }_{\pmb {\mathcal {P}}j} \; ^{W}{\pmb {d}_{\bot }}^{T})^2} \end{matrix} \end{aligned}$$
(10)

Using these weights in Eq. (4), we effectively obtain the optimal solution along \(^{W}{\pmb {d}_{\bot }}\), and the resulting method is the Uncertainty-based Multi-Robot Cooperative Triangulation (UCoT). This cooperative triangulation method ensures that the uncertainty of each bearing-only sensor is weighted, through \(\varOmega _{i},\varOmega _{j} \), in a probabilistic manner in the 3D target estimation.

$$\begin{aligned} ^{W}\mathrm {\pmb {\mathcal {P}}} =&\frac{(^{W}\mathrm {\pmb {d}_{\bot }} \; \pmb {\varSigma }_{\pmb {\mathcal {P}}j}\; ^{W}{\pmb {d}_{\bot }}^{T})^2}{(^{W}{\pmb {d}_{\bot }} \; \pmb {\varSigma }_{\pmb {\mathcal {P}}i} \; ^{W}{\pmb {d}_{\bot }}^{T})^2 + (^{W}{\pmb {d}_{\bot }} \; \pmb {\varSigma }_{\pmb {\mathcal {P}}j} \; ^{W}{\pmb {d}_{\bot }}^{T})^2}(^{W}{\pmb {C}}_i + \lambda _{i} ^{W}{\pmb {d}_{i}}) \nonumber \\&+ \frac{(^{W}{\pmb {d}_{\bot }} \; \pmb {\varSigma }_{\pmb {\mathcal {P}}i}\; ^{W}{\pmb {d}_{\bot }}^{T})^2}{(^{W}{\pmb {d}_{\bot }} \; \pmb {\varSigma }_{\pmb {\mathcal {P}}i} \; ^{W}{\pmb {d}_{\bot }}^{T})^2 + (^{W}{\pmb {d}_{\bot }} \; \pmb {\varSigma }_{\pmb {\mathcal {P}}j} \; ^{W}{\pmb {d}_{\bot }}^{T})^2}(^{W}{\pmb {C}}_j + \lambda _{j} ^{W}{\pmb {d}_{j}}) \end{aligned}$$
(11)
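
The weighting of Eqs. (10) and (11) can be summarised in a few lines: each ray is weighted by the variance of the other ray's closest point projected onto \(^{W}{\pmb {d}_{\bot }}\), so the more uncertain ray contributes less. The sketch below assumes the covariances \(\pmb {\varSigma }_{\pmb {\mathcal {P}}i}\), \(\pmb {\varSigma }_{\pmb {\mathcal {P}}j}\) and the perpendicular vector are already available (e.g., from the propagation sketch above); it is an illustration, not the authors' code.

```python
import numpy as np

def ucot_weights(d_perp, Sigma_Pi, Sigma_Pj):
    """Probabilistic ray weights of Eq. (10)."""
    s_i = float(d_perp @ Sigma_Pi @ d_perp) ** 2    # (d_perp Sigma_Pi d_perp^T)^2
    s_j = float(d_perp @ Sigma_Pj @ d_perp) ** 2    # (d_perp Sigma_Pj d_perp^T)^2
    return s_j / (s_i + s_j), s_i / (s_i + s_j)     # (Omega_i, Omega_j)

# Eq. (11): the UCoT estimate is the weighted combination of the two ray points, e.g.
#   omega_i, omega_j = ucot_weights(d_perp, Sigma_Pi, Sigma_Pj)
#   P_w = omega_i * P_i + omega_j * P_j
```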

3 Implementation and Results

This section describes the outdoor scenario, the robotic platforms and the experimental results from the implementation of the proposed single-robot and multi-robot cooperative perception methods in two experimental cases. The goal was to evaluate the 3D position estimation of a static and of a dynamic target (Fig. 5).

Fig. 5. Left: UGV TIGRE. Right: Asctec Pelican MAV.

The outdoor scenario chosen to evaluate the proposed framework is a non-urban area with several landscape elements, including vegetation and rocks, as depicted in Fig. 1. The heterogeneous robots used in both experimental cases were the UGV TIGRE [19] and the Asctec Pelican MAV. Both robots run the Linux operating system and communicate over a wireless link. Regarding the TIGRE’s main hardware, it carries two cameras with a resolution of \(1278\times 958\) in a rigid stereo baseline (\(\sim 0.76\) m), a Novatel GPS receiver and a Microstrain IMU. The Pelican MAV is a commercial platform to which a downward-looking monocular camera with a resolution of \(1280\times 1024\) was added.

The evaluation procedure in both experimental cases was composed of four methods used to estimate the 3D target position, including the proposed framework. The methods under evaluation are:

  • TIGRE - Stereo rigid baseline, based on the Mid-Point Stereo Triangulation from Trucco [18], which estimates the 3D target position with the available rigid stereo baseline;

  • Pelican - Monocular 3D target estimation, which follows the flat earth assumption method to obtain the 3D target position. The method was developed by Beard [11] and depth is provided by the MAV altitude;

  • Mid-Point Multi-Robot Cooperative Triangulation (MidCoT), which assumes that each robot provides a bearing-only measurement; the framework establishes a dynamic baseline between the UGV and the MAV, with an equal weight contribution, \(\varOmega _{i}=\varOmega _{j}=\frac{1}{2}\), from each bearing-only measurement, as expressed in Eq. (4);

  • Uncertainty-based Multi-Robot Cooperative Triangulation (UCoT), which, based on the uncertainty of each bearing-only measurement, \(\pmb {\varSigma }_{\pmb {\mathcal {P}}i}\) and \(\pmb {\varSigma }_{\pmb {\mathcal {P}}j}\), weights in a probabilistic manner the contribution of each bearing-only vision sensor to the 3D target position estimate, as expressed in Eq. (11).

The experimental results from each method are shown in Figs. 6 and 7, with the left side depicting the trajectories of the UGV (blue triangle), the MAV (magenta circle) and the target (red star). The estimated target position from the UCoT method is represented by the green circle. The 3D target estimation error for each method, as a function of the Euclidean distance between the camera and the target, is presented on the right side of both figures. The accuracy evaluation in both experimental cases was performed with a Real-Time Kinematic (RTK) GPS attached to the target as an exogenous ground-truth system. The 3D covariance ellipses from the intersecting rays \(\pmb {\varSigma }_{\pmb {\mathcal {P}}i}\) and \(\pmb {\varSigma }_{\pmb {\mathcal {P}}j}\) are shown in Fig. 8 for both experimental cases, for three instants with different Euclidean distances between the camera and the target. The trajectories of the robots are represented by the blue triangles for the UGV and in magenta for the MAV.
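
For reference, the error statistics reported in Table 1 correspond to simple per-sample Euclidean errors against the RTK ground truth; a minimal sketch of that evaluation is shown below (the array names are illustrative, not from the paper).

```python
import numpy as np

def estimation_error_stats(P_est, P_rtk):
    """Mean and standard deviation of the 3D estimation error (in meters),
    given Nx3 arrays of estimated and RTK ground-truth target positions."""
    err = np.linalg.norm(P_est - P_rtk, axis=1)   # per-sample Euclidean error
    return err.mean(), err.std()
```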

In the static target experimental case, the UGV was positioned at a distance of \(\sim 35\) m and the MAV was hovering over the static target at a distance of \(\sim 15\) m. Both robots were given a target location task: the MAV hovered over the target and the UGV performed an approach maneuver towards the estimated target position. In the dynamic target experimental case, the target was moving at a velocity of \(\sim 0.8\) m/s and both robots had the same target location task as previously described, now with a safe-distance rule of \(\sim 2\) m between the robots and the target.

Fig. 6. Experimental case for a static target: 3D target estimation error for each method, relative to the distance between the camera and the target.

Fig. 7. Experimental case for a dynamic target: 3D target estimation error for each method, relative to the distance between the camera and the target.

Fig. 8. Covariance \(\pmb {\varSigma }_{\pmb {\mathcal {P}}i}\) and \(\pmb {\varSigma }_{\pmb {\mathcal {P}}j}\) from the intersection rays, related to three instants in static (top) and dynamic (bottom) target tracking with the UCoT.

The results in both experimental cases for the single-robot perception methods are in accordance with the expected behavior, especially for the stereo rigid baseline, where the accuracy of the 3D target estimation is very low for targets whose depth distance exceeds the available baseline. In the case of the flat earth assumption applied to the Pelican MAV to estimate the 3D target position, the results also reveal an error similar to that of the stereo baseline, due to the terrain morphology of the outdoor scenario.

The MidCoT results presented in Figs. 6 and 7 show a 3D target estimation error that is independent of the distance to the target when compared to the single-robot perception methods. This behavior is reflected in the improved standard deviation \(\sigma \) shown in Table 1. However, the overall mean \(\mu \) is less favorable when compared to the single-robot perception methods. In this method, the 3-tuple rays \(\langle ^{W}{\pmb {C}}_{n}, ^{W}_{B}\mathrm {\pmb {\xi }}_{n},\{^{W}{\pmb {d}}_{n}\}\rangle \) are shared between robots and have the same weight \(\varOmega _{n}\) in the multi-robot cooperative triangulation. By assuming an equal contribution, the impact of the sensor uncertainty provided by each robot is disregarded. For the experimental case with a static target, the impact of both robots having the same weight \(\varOmega _{n}\) in the triangulation can be observed in Fig. 6 when the Euclidean distance between the UGV and the static target varies between 5 and 10 m and the 3D target estimation error does not decrease as expected. Even though the UGV is closer to the target, its contribution has the same impact as the MAV 3-tuple information, which is affected by a high uncertainty in the estimated position (\(\sim 3\) m). This is due to the low-cost GPS on board, and also to the fact that the relative height to the target remains at \(\sim 15\) m. The same behavior can be observed during the experimental case with a moving target (Fig. 7), because the distance between the moving target and the UGV varies between 5 and 15 m, and the resulting error in the 3D target estimation is dominated by the uncertainty of the MAV.

Table 1. 3D target estimation mean \(\mu \) and standard deviation \(\sigma \) error in meters for each method in the static and dynamic target experimental cases.

The UCoT results presented in Figs. 6 and 7 show an accuracy improvement in the 3D target estimation when compared to the three previous methods. The equal weight contribution \(\varOmega _{n}\) from each robot, described as a limitation of MidCoT, is overcome in UCoT because the camera position and attitude uncertainty of each ray is incorporated, in a probabilistic manner, into the weight \(\varOmega _{n}\). The overall improvement is depicted in Table 1, with a lower mean \(\mu \) and standard deviation \(\sigma \) error in the 3D target position estimation.

4 Conclusions and Future Work

This paper proposes a decentralized multi-robot cooperative framework to estimate the 3D position of highly dynamic targets based on bearing-only vision measurements. The uncertainty of the observation model provided by each robot is effectively addressed to achieve cooperative triangulation by weighting the contribution of each monocular bearing ray in a probabilistic manner.

The impact of the multi-robot cooperative triangulation framework was evaluated in an outdoor scenario with a team of heterogeneous robots composed of a UGV and a MAV.

For future work, we intend to improve the feature correspondence component through a multi-robot epipolar line with a dynamic narrow-band search space based on the uncertainty associated with the robot pose and pixel error. The contribution to the framework will be a dynamic, narrow, hyperboloid-shaped confidence search space for correspondence points, correlated with the sensors' uncertainty.