外连接是在数据库查询中经常用到的特性,通过连接可以对同一数据库的一个集合执行左外连接,并连接集合的文档进行筛选操作。MongoDB聚合的$lookup会给每个输入的文档新增一个数组字段,连接集合中符合匹配条件的文档会放在新增的数组字段中,重塑后的文档会被传递到管道的下一阶段。下面详细介绍下聚合连接的用法。
$lookup的用法
{
$lookup:
{
from: <要连接的集合>,
localField: <输入集合的字段>,
foreignField: <要连接集合的字段>,
let: { <var_1>: <表达式>, …, <var_n>: <表达式> },
pipeline: [ <要运行的管道操作> ],
as: <输出数组的字段名>
}
}
- from:指定要连接的集合(要求同一个库)。另外,此处也可以使用
$documents
关键字,相当于使用常量文档。 - localField:指定输入集合中的字段,这个字段会与连接集合的字段进行匹配。如果指定的字段不存在,则会用空值替代(注意:字段名写错不会报错哦,会用空值替代)
- foreignField:链接集合中进行匹配的字段,如果字段不存在,也会被当做空值。
- let:可选,let可以引入输入文档的字段用于pipeline管道参数。引入变量需要使用
$$<variable>
进行访问。 - pipeline:用于from参数指定集合的管道操作,它决定了连接集合的返回结果,当管道元素为0即pipeline=[]时返回结果。pipeline阶段不能包含
$out
和$merge
。pipeline不能直接访问外连文档中的字段,需要用let选项来声明后才能使用。 - as:指定要添加到输入文档中的新数组字段的名称。新数组字段包含 from 集合中的匹配文档。如果输入文档中已存在指定的名称,现有字段将被覆盖。
let引入变量的注意事项:
- let引入的变量可以用于pipeline参数的各个阶段包括内嵌的$lookup阶段。
- pipeline的
$match
阶段需要使用$expr
操作符才能访问引入的变量 - 当在
$expr
操作符中使用$eq
、$lt
、$lte
、$gt
、$gte
等比较操作符时,可以使用到引用集合的索引,但是要注意下面的限制:
- 无法使用复合索引
- 不能用于数组或未定义类型
- 比较操作涉及的字段路径不能大于一。
- pipeline中的非
$match
阶段访问变量不需要$expr
操作符
举例
最简单的单字段连接
创建order集合,并添加下列数据
db.orders.insertMany( [
{ "_id" : 1, "item" : "almonds", "price" : 12, "quantity" : 2 },
{ "_id" : 2, "item" : "pecans", "price" : 20, "quantity" : 1 },
{ "_id" : 3 }
] )
创建inventory集合,并添加下列数据:
db.inventory.insertMany( [
{ "_id" : 1, "sku" : "almonds", "description": "product 1", "instock" : 120 },
{ "_id" : 2, "sku" : "bread", "description": "product 2", "instock" : 80 },
{ "_id" : 3, "sku" : "cashews", "description": "product 3", "instock" : 60 },
{ "_id" : 4, "sku" : "pecans", "description": "product 4", "instock" : 70 },
{ "_id" : 5, "sku": null, "description": "Incomplete" },
{ "_id" : 6 }
] )
集合order通过item字段和inventory的sku字段进行连接:
db.orders.aggregate( [
{
$lookup:
{
from: "inventory",
localField: "item",
foreignField: "sku",
as: "inventory_docs"
}
}
] )
聚合结果:
{
"_id" : 1,
"item" : "almonds",
"price" : 12,
"quantity" : 2,
"inventory_docs" : [
{ "_id" : 1, "sku" : "almonds", "description" : "product 1", "instock" : 120 }
]
}
{
"_id" : 2,
"item" : "pecans",
"price" : 20,
"quantity" : 1,
"inventory_docs" : [
{ "_id" : 4, "sku" : "pecans", "description" : "product 4", "instock" : 70 }
]
}
{
"_id" : 3,
"inventory_docs" : [
{ "_id" : 5, "sku" : null, "description" : "Incomplete" },
{ "_id" : 6 }
]
}
数组字段外连接
如果 localField 是数组,会把localField的每个元素与foreignField 匹配,且无需 $unwind,举个例子。
创建classes集合并添加下列记录:
db.classes.insertMany( [
{ _id: 1, title: "Reading is ...", enrollmentlist: [ "giraffe2", "pandabear", "artie" ], days: ["M", "W", "F"] },
{ _id: 2, title: "But Writing ...", enrollmentlist: [ "giraffe1", "artie" ], days: ["T", "F"] }
] )
创建 members 集合并添加下列记录:
db.members.insertMany( [
{ _id: 1, name: "artie", joined: new Date("2016-05-01"), status: "A" },
{ _id: 2, name: "giraffe", joined: new Date("2017-05-01"), status: "D" },
{ _id: 3, name: "giraffe1", joined: new Date("2017-10-01"), status: "A" },
{ _id: 4, name: "panda", joined: new Date("2018-10-11"), status: "A" },
{ _id: 5, name: "pandabear", joined: new Date("2018-12-01"), status: "A" },
{ _id: 6, name: "giraffe2", joined: new Date("2018-12-01"), status: "D" }
] )
下面是把classes和members进行连接,并且根据enrollmentlist字段与name字段进行匹配,匹配的时候会把enrollmentlist数组字段的每个值与name进行匹配,name在enrollmentlist中的记录会被匹配到:
db.classes.aggregate( [
{
$lookup:
{
from: "members",
localField: "enrollmentlist",
foreignField: "name",
as: "enrollee_info"
}
}
] )
聚合结果:
{
"_id" : 1,
"title" : "Reading is ...",
"enrollmentlist" : [ "giraffe2", "pandabear", "artie" ],
"days" : [ "M", "W", "F" ],
"enrollee_info" : [
{ "_id" : 1, "name" : "artie", "joined" : ISODate("2016-05-01T00:00:00Z"), "status" : "A" },
{ "_id" : 5, "name" : "pandabear", "joined" : ISODate("2018-12-01T00:00:00Z"), "status" : "A" },
{ "_id" : 6, "name" : "giraffe2", "joined" : ISODate("2018-12-01T00:00:00Z"), "status" : "D" }
]
}
{
"_id" : 2,
"title" : "But Writing ...",
"enrollmentlist" : [ "giraffe1", "artie" ],
"days" : [ "T", "F" ],
"enrollee_info" : [
{ "_id" : 1, "name" : "artie", "joined" : ISODate("2016-05-01T00:00:00Z"), "status" : "A" },
{ "_id" : 3, "name" : "giraffe1", "joined" : ISODate("2017-10-01T00:00:00Z"), "status" : "A" }
]
}
$lookup与$mergeObjects结合使用,合并联查后的数组
$mergeObjects
可以把多个文档合并到一个文档。在$lookup
后,可以增加个$mergeObjects
阶段,可以把连接后的文档与输入文档合并。
创建order集合并插入两条数据:
db.orders.insertMany( [
{ "_id" : 1, "item" : "almonds", "price" : 12, "quantity" : 2 },
{ "_id" : 2, "item" : "pecans", "price" : 20, "quantity" : 1 }
] )
创建items集合并插入几条数据
db.items.insertMany( [
{ "_id" : 1, "item" : "almonds", description: "almond clusters", "instock" : 120 },
{ "_id" : 2, "item" : "bread", description: "raisin and nut bread", "instock" : 80 },
{ "_id" : 3, "item" : "pecans", description: "candied pecans", "instock" : 60 }
] )
下面的操作首先在$lookup
阶段通过两个集合的item字段进行连接,然后在$replaceRoot
阶段使用$mergeObjects
合并items和orders集合的文档。
db.orders.aggregate( [
{
$lookup: {
from: "items",
localField: "item", // orders集合的item字段
foreignField: "item", // items集合的item字段
as: "fromItems"
}
},
{
$replaceRoot: { newRoot: { $mergeObjects: [ { $arrayElemAt: [ "$fromItems", 0 ] }, "$$ROOT" ] } }
},
{ $project: { fromItems: 0 } }
] )
聚合后的结果:
{
_id: 1,
item: 'almonds',
description: 'almond clusters',
instock: 120,
price: 12,
quantity: 2
},
{
_id: 2,
item: 'pecans',
description: 'candied pecans',
instock: 60,
price: 20,
quantity: 1
}
执行多重连接和相关子查询
管道可在连接集合上执行,并包含多个连接条件。连接条件可以引用输入集合中的字段,也可以引用外连集合中的字段。这样就可以在两个集合之间进行相关子查询。举例:
创建orders集合,并添加记录:
db.orders.insertMany( [
{ "_id" : 1, "item" : "almonds", "price" : 12, "ordered" : 2 },
{ "_id" : 2, "item" : "pecans", "price" : 20, "ordered" : 1 },
{ "_id" : 3, "item" : "cookies", "price" : 10, "ordered" : 60 }
] )
创建warehouses集合并添加记录:
db.warehouses.insertMany( [
{ "_id" : 1, "stock_item" : "almonds", warehouse: "A", "instock" : 120 },
{ "_id" : 2, "stock_item" : "pecans", warehouse: "A", "instock" : 80 },
{ "_id" : 3, "stock_item" : "almonds", warehouse: "B", "instock" : 60 },
{ "_id" : 4, "stock_item" : "cookies", warehouse: "B", "instock" : 40 },
{ "_id" : 5, "stock_item" : "cookies", warehouse: "A", "instock" : 80 }
] )
在 orders.item 和 warehouse.stock_item 字段上使用关联子查询。确保库存项目数量能满足订购数量。
db.orders.aggregate( [
{
$lookup:
{
from: "warehouses",
let: { order_item: "$item", order_qty: "$ordered" },
pipeline: [
{ $match:
{ $expr:
{ $and:
[
{ $eq: [ "$stock_item", "$$order_item" ] },
{ $gte: [ "$instock", "$$order_qty" ] }
]
}
}
},
{ $project: { stock_item: 0, _id: 0 } }
],
as: "stockdata"
}
}
] )
聚合后的结果:
{
_id: 1,
item: 'almonds',
price: 12,
ordered: 2,
stockdata: [
{ warehouse: 'A', instock: 120 },
{ warehouse: 'B', instock: 60 }
]
},
{
_id: 2,
item: 'pecans',
price: 20,
ordered: 1,
stockdata: [ { warehouse: 'A', instock: 80 } ]
},
{
_id: 3,
item: 'cookies',
price: 10,
ordered: 60,
stockdata: [ { warehouse: 'A', instock: 80 } ]
}
等价的SQL语句
SELECT *, stockdata
FROM orders
WHERE stockdata IN (
SELECT warehouse, instock
FROM warehouses
WHERE stock_item = orders.item
AND instock >= orders.ordered
);
执行不相关子查询
聚合管道$lookup
阶段可以在连接的集合上执行管道,它允许非相关子查询。非相关子查询不引用连接的文档字段。
创建absences集合并添加文档
db.absences.insertMany( [
{ "_id" : 1, "student" : "Ann Aardvark", sickdays: [ new Date ("2018-05-01"),new Date ("2018-08-23") ] },
{ "_id" : 2, "student" : "Zoe Zebra", sickdays: [ new Date ("2018-02-01"),new Date ("2018-05-23") ] },
] )
创建holidays集合并添加文档
db.holidays.insertMany( [
{ "_id" : 1, year: 2018, name: "New Years", date: new Date("2018-01-01") },
{ "_id" : 2, year: 2018, name: "Pi Day", date: new Date("2018-03-14") },
{ "_id" : 3, year: 2018, name: "Ice Cream Day", date: new Date("2018-07-15") },
{ "_id" : 4, year: 2017, name: "New Years", date: new Date("2017-01-01") },
{ "_id" : 5, year: 2017, name: "Ice Cream Day", date: new Date("2017-07-16") }
] )
将absences集合与holidays集合中的 2018 年假期信息连接起来:
db.absences.aggregate( [
{
$lookup:
{
from: "holidays",
pipeline: [
{ $match: { year: 2018 } },
{ $project: { _id: 0, date: { name: "$name", date: "$date" } } },
{ $replaceRoot: { newRoot: "$date" } }
],
as: "holidays"
}
}
] )
聚合结果:
{
_id: 1,
student: 'Ann Aardvark',
sickdays: [
ISODate("2018-05-01T00:00:00.000Z"),
ISODate("2018-08-23T00:00:00.000Z")
],
holidays: [
{ name: 'New Years', date: ISODate("2018-01-01T00:00:00.000Z") },
{ name: 'Pi Day', date: ISODate("2018-03-14T00:00:00.000Z") },
{ name: 'Ice Cream Day', date: ISODate("2018-07-15T00:00:00.000Z")
}
]
},
{
_id: 2,
student: 'Zoe Zebra',
sickdays: [
ISODate("2018-02-01T00:00:00.000Z"),
ISODate("2018-05-23T00:00:00.000Z")
],
holidays: [
{ name: 'New Years', date: ISODate("2018-01-01T00:00:00.000Z") },
{ name: 'Pi Day', date: ISODate("2018-03-14T00:00:00.000Z") },
{ name: 'Ice Cream Day', date: ISODate("2018-07-15T00:00:00.000Z")
}
]
}
等价的SQL:
SELECT *, holidays
FROM absences
WHERE holidays IN (
SELECT name, date
FROM holidays
WHERE year = 2018
);
执行简明关联子查询
从 MongoDB 5.0 开始,$lookup
支持简明关联子查询语法,改进了集合间的连接。新的简洁语法不再要求在$match
阶段的$expr
操作符中对外部字段和本地字段进行相等匹配。举例说明:
创建restaurants集合并添加记录
db.restaurants.insertMany( [
{
_id: 1,
name: "American Steak House",
food: [ "filet", "sirloin" ],
beverages: [ "beer", "wine" ]
},
{
_id: 2,
name: "Honest John Pizza",
food: [ "cheese pizza", "pepperoni pizza" ],
beverages: [ "soda" ]
}
] )
创建orders集合并添加记录
db.orders.insertMany( [
{
_id: 1,
item: "filet",
restaurant_name: "American Steak House"
},
{
_id: 2,
item: "cheese pizza",
restaurant_name: "Honest John Pizza",
drink: "lemonade"
},
{
_id: 3,
item: "cheese pizza",
restaurant_name: "Honest John Pizza",
drink: "soda"
}
] )
下面的聚合在执行pipeline
前,会先通过orders.restaurant_name和orders和restaurants进行匹配,然后再分别使用$$orders_drink
和$beverages
访问的orders.drink和restaurants.beverages字段,并在他们之间执行$in
数组匹配。
db.orders.aggregate( [
{
$lookup: {
from: "restaurants",
localField: "restaurant_name",
foreignField: "name",
let: { orders_drink: "$drink" },
pipeline: [ {
$match: {
$expr: { $in: [ "$$orders_drink", "$beverages" ] }
}
} ],
as: "matches"
}
}
] )
orders.drink和restaurants.drink字段中的"soda"值匹配。输出显示了匹配数组,并包含restaurants集合中所有已加入的匹配字段,执行的结果如下:
{
"_id" : 1, "item" : "filet",
"restaurant_name" : "American Steak House",
"matches" : [ ]
}
{
"_id" : 2, "item" : "cheese pizza",
"restaurant_name" : "Honest John Pizza",
"drink" : "lemonade",
"matches" : [ ]
}
{
"_id" : 3, "item" : "cheese pizza",
"restaurant_name" : "Honest John Pizza",
"drink" : "soda",
"matches" : [ {
"_id" : 2, "name" : "Honest John Pizza",
"food" : [ "cheese pizza", "pepperoni pizza" ],
"beverages" : [ "soda" ]
} ]
}