外连接是在数据库查询中经常用到的特性,通过连接可以对同一数据库的一个集合执行左外连接,并连接集合的文档进行筛选操作。MongoDB聚合的$lookup会给每个输入的文档新增一个数组字段,连接集合中符合匹配条件的文档会放在新增的数组字段中,重塑后的文档会被传递到管道的下一阶段。下面详细介绍下聚合连接的用法。

$lookup的用法

{
   $lookup:
      {
        from: <要连接的集合>,
        localField: <输入集合的字段>,
        foreignField: <要连接集合的字段>,
        let: { <var_1>: <表达式>, …, <var_n>: <表达式> },
        pipeline: [ <要运行的管道操作> ],
        as: <输出数组的字段名>
      }
}
  • from:指定要连接的集合(要求同一个库)。另外,此处也可以使用$documents关键字,相当于使用常量文档。
  • localField:指定输入集合中的字段,这个字段会与连接集合的字段进行匹配。如果指定的字段不存在,则会用空值替代(注意:字段名写错不会报错哦,会用空值替代)
  • foreignField:链接集合中进行匹配的字段,如果字段不存在,也会被当做空值。
  • let:可选,let可以引入输入文档的字段用于pipeline管道参数。引入变量需要使用$$<variable>进行访问。
  • pipeline:用于from参数指定集合的管道操作,它决定了连接集合的返回结果,当管道元素为0即pipeline=[]时返回结果。pipeline阶段不能包含$out$merge。pipeline不能直接访问外连文档中的字段,需要用let选项来声明后才能使用。
  • as:指定要添加到输入文档中的新数组字段的名称。新数组字段包含 from 集合中的匹配文档。如果输入文档中已存在指定的名称,现有字段将被覆盖。
    let引入变量的注意事项:
  • let引入的变量可以用于pipeline参数的各个阶段包括内嵌的$lookup阶段。
  • pipeline的$match阶段需要使用$expr操作符才能访问引入的变量
  • 当在$expr操作符中使用$eq$lt$lte$gt$gte等比较操作符时,可以使用到引用集合的索引,但是要注意下面的限制:
  • 无法使用复合索引
  • 不能用于数组或未定义类型
  • 比较操作涉及的字段路径不能大于一。
  • pipeline中的非$match阶段访问变量不需要$expr操作符

举例

最简单的单字段连接

创建order集合,并添加下列数据

db.orders.insertMany( [
   { "_id" : 1, "item" : "almonds", "price" : 12, "quantity" : 2 },
   { "_id" : 2, "item" : "pecans", "price" : 20, "quantity" : 1 },
   { "_id" : 3  }
] )

创建inventory集合,并添加下列数据:

db.inventory.insertMany( [
   { "_id" : 1, "sku" : "almonds", "description": "product 1", "instock" : 120 },
   { "_id" : 2, "sku" : "bread", "description": "product 2", "instock" : 80 },
   { "_id" : 3, "sku" : "cashews", "description": "product 3", "instock" : 60 },
   { "_id" : 4, "sku" : "pecans", "description": "product 4", "instock" : 70 },
   { "_id" : 5, "sku": null, "description": "Incomplete" },
   { "_id" : 6 }
] )

集合order通过item字段和inventory的sku字段进行连接:

db.orders.aggregate( [
   {
     $lookup:
       {
         from: "inventory",
         localField: "item",
         foreignField: "sku",
         as: "inventory_docs"
       }
  }
] )

聚合结果:

{
   "_id" : 1,
   "item" : "almonds",
   "price" : 12,
   "quantity" : 2,
   "inventory_docs" : [
      { "_id" : 1, "sku" : "almonds", "description" : "product 1", "instock" : 120 }
   ]
}
{
   "_id" : 2,
   "item" : "pecans",
   "price" : 20,
   "quantity" : 1,
   "inventory_docs" : [
      { "_id" : 4, "sku" : "pecans", "description" : "product 4", "instock" : 70 }
   ]
}
{
   "_id" : 3,
   "inventory_docs" : [
      { "_id" : 5, "sku" : null, "description" : "Incomplete" },
      { "_id" : 6 }
   ]
}

数组字段外连接

如果 localField 是数组,会把localField的每个元素与foreignField 匹配,且无需 $unwind,举个例子。
创建classes集合并添加下列记录:

db.classes.insertMany( [
   { _id: 1, title: "Reading is ...", enrollmentlist: [ "giraffe2", "pandabear", "artie" ], days: ["M", "W", "F"] },
   { _id: 2, title: "But Writing ...", enrollmentlist: [ "giraffe1", "artie" ], days: ["T", "F"] }
] )

创建 members 集合并添加下列记录:

db.members.insertMany( [
   { _id: 1, name: "artie", joined: new Date("2016-05-01"), status: "A" },
   { _id: 2, name: "giraffe", joined: new Date("2017-05-01"), status: "D" },
   { _id: 3, name: "giraffe1", joined: new Date("2017-10-01"), status: "A" },
   { _id: 4, name: "panda", joined: new Date("2018-10-11"), status: "A" },
   { _id: 5, name: "pandabear", joined: new Date("2018-12-01"), status: "A" },
   { _id: 6, name: "giraffe2", joined: new Date("2018-12-01"), status: "D" }
] )

下面是把classes和members进行连接,并且根据enrollmentlist字段与name字段进行匹配,匹配的时候会把enrollmentlist数组字段的每个值与name进行匹配,name在enrollmentlist中的记录会被匹配到:

db.classes.aggregate( [
   {
      $lookup:
         {
            from: "members",
            localField: "enrollmentlist",
            foreignField: "name",
            as: "enrollee_info"
        }
   }
] )

聚合结果:

{
   "_id" : 1,
   "title" : "Reading is ...",
   "enrollmentlist" : [ "giraffe2", "pandabear", "artie" ],
   "days" : [ "M", "W", "F" ],
   "enrollee_info" : [
      { "_id" : 1, "name" : "artie", "joined" : ISODate("2016-05-01T00:00:00Z"), "status" : "A" },
      { "_id" : 5, "name" : "pandabear", "joined" : ISODate("2018-12-01T00:00:00Z"), "status" : "A" },
      { "_id" : 6, "name" : "giraffe2", "joined" : ISODate("2018-12-01T00:00:00Z"), "status" : "D" }
   ]
}
{
   "_id" : 2,
   "title" : "But Writing ...",
   "enrollmentlist" : [ "giraffe1", "artie" ],
   "days" : [ "T", "F" ],
   "enrollee_info" : [
      { "_id" : 1, "name" : "artie", "joined" : ISODate("2016-05-01T00:00:00Z"), "status" : "A" },
      { "_id" : 3, "name" : "giraffe1", "joined" : ISODate("2017-10-01T00:00:00Z"), "status" : "A" }
   ]
}

$lookup与$mergeObjects结合使用,合并联查后的数组

$mergeObjects可以把多个文档合并到一个文档。在$lookup后,可以增加个$mergeObjects阶段,可以把连接后的文档与输入文档合并。
创建order集合并插入两条数据:

db.orders.insertMany( [
   { "_id" : 1, "item" : "almonds", "price" : 12, "quantity" : 2 },
   { "_id" : 2, "item" : "pecans", "price" : 20, "quantity" : 1 }
] )

创建items集合并插入几条数据

db.items.insertMany( [
  { "_id" : 1, "item" : "almonds", description: "almond clusters", "instock" : 120 },
  { "_id" : 2, "item" : "bread", description: "raisin and nut bread", "instock" : 80 },
  { "_id" : 3, "item" : "pecans", description: "candied pecans", "instock" : 60 }
] )

下面的操作首先在$lookup阶段通过两个集合的item字段进行连接,然后在$replaceRoot阶段使用$mergeObjects合并items和orders集合的文档。

db.orders.aggregate( [
   {
      $lookup: {
         from: "items",
         localField: "item",    // orders集合的item字段
         foreignField: "item",  // items集合的item字段
         as: "fromItems"
      }
   },
   {
      $replaceRoot: { newRoot: { $mergeObjects: [ { $arrayElemAt: [ "$fromItems", 0 ] }, "$$ROOT" ] } }
   },
   { $project: { fromItems: 0 } }
] )

聚合后的结果:

{
  _id: 1,
  item: 'almonds',
  description: 'almond clusters',
  instock: 120,
  price: 12,
  quantity: 2
},
{
  _id: 2,
  item: 'pecans',
  description: 'candied pecans',
  instock: 60,
  price: 20,
  quantity: 1
}

执行多重连接和相关子查询

管道可在连接集合上执行,并包含多个连接条件。连接条件可以引用输入集合中的字段,也可以引用外连集合中的字段。这样就可以在两个集合之间进行相关子查询。举例:
创建orders集合,并添加记录:

db.orders.insertMany( [
  { "_id" : 1, "item" : "almonds", "price" : 12, "ordered" : 2 },
  { "_id" : 2, "item" : "pecans", "price" : 20, "ordered" : 1 },
  { "_id" : 3, "item" : "cookies", "price" : 10, "ordered" : 60 }
] )

创建warehouses集合并添加记录:

db.warehouses.insertMany( [
  { "_id" : 1, "stock_item" : "almonds", warehouse: "A", "instock" : 120 },
  { "_id" : 2, "stock_item" : "pecans", warehouse: "A", "instock" : 80 },
  { "_id" : 3, "stock_item" : "almonds", warehouse: "B", "instock" : 60 },
  { "_id" : 4, "stock_item" : "cookies", warehouse: "B", "instock" : 40 },
  { "_id" : 5, "stock_item" : "cookies", warehouse: "A", "instock" : 80 }
] )

在 orders.item 和 warehouse.stock_item 字段上使用关联子查询。确保库存项目数量能满足订购数量。

db.orders.aggregate( [
   {
      $lookup:
         {
           from: "warehouses",
           let: { order_item: "$item", order_qty: "$ordered" },
           pipeline: [
              { $match:
                 { $expr:
                    { $and:
                       [
                         { $eq: [ "$stock_item",  "$$order_item" ] },
                         { $gte: [ "$instock", "$$order_qty" ] }
                       ]
                    }
                 }
              },
              { $project: { stock_item: 0, _id: 0 } }
           ],
           as: "stockdata"
         }
    }
] )

聚合后的结果:

{
  _id: 1,
  item: 'almonds',
  price: 12,
  ordered: 2,
  stockdata: [
    { warehouse: 'A', instock: 120 },
    { warehouse: 'B', instock: 60 }
  ]
},
{
  _id: 2,
  item: 'pecans',
  price: 20,
  ordered: 1,
  stockdata: [ { warehouse: 'A', instock: 80 } ]
},
{
  _id: 3,
  item: 'cookies',
  price: 10,
  ordered: 60,
  stockdata: [ { warehouse: 'A', instock: 80 } ]
}

等价的SQL语句

SELECT *, stockdata
FROM orders
WHERE stockdata IN (
   SELECT warehouse, instock
   FROM warehouses
   WHERE stock_item = orders.item
   AND instock >= orders.ordered
);

执行不相关子查询

聚合管道$lookup阶段可以在连接的集合上执行管道,它允许非相关子查询。非相关子查询不引用连接的文档字段。
创建absences集合并添加文档

db.absences.insertMany( [
   { "_id" : 1, "student" : "Ann Aardvark", sickdays: [ new Date ("2018-05-01"),new Date ("2018-08-23") ] },
   { "_id" : 2, "student" : "Zoe Zebra", sickdays: [ new Date ("2018-02-01"),new Date ("2018-05-23") ] },
] )

创建holidays集合并添加文档

db.holidays.insertMany( [
 { "_id" : 1, year: 2018, name: "New Years", date: new Date("2018-01-01") },
 { "_id" : 2, year: 2018, name: "Pi Day", date: new Date("2018-03-14") },
 { "_id" : 3, year: 2018, name: "Ice Cream Day", date: new Date("2018-07-15") },
 { "_id" : 4, year: 2017, name: "New Years", date: new Date("2017-01-01") },
 { "_id" : 5, year: 2017, name: "Ice Cream Day", date: new Date("2017-07-16") }
] )

将absences集合与holidays集合中的 2018 年假期信息连接起来:

db.absences.aggregate( [
   {
      $lookup:
         {
           from: "holidays",
           pipeline: [
              { $match: { year: 2018 } },
              { $project: { _id: 0, date: { name: "$name", date: "$date" } } },
              { $replaceRoot: { newRoot: "$date" } }
           ],
           as: "holidays"
         }
    }
] )

聚合结果:

{
  _id: 1,
  student: 'Ann Aardvark',
  sickdays: [
    ISODate("2018-05-01T00:00:00.000Z"),
    ISODate("2018-08-23T00:00:00.000Z")
  ],
  holidays: [
    { name: 'New Years', date: ISODate("2018-01-01T00:00:00.000Z") },
    { name: 'Pi Day', date: ISODate("2018-03-14T00:00:00.000Z") },
    { name: 'Ice Cream Day', date: ISODate("2018-07-15T00:00:00.000Z")
    }
  ]
},
{
  _id: 2,
  student: 'Zoe Zebra',
  sickdays: [
    ISODate("2018-02-01T00:00:00.000Z"),
    ISODate("2018-05-23T00:00:00.000Z")
  ],
  holidays: [
    { name: 'New Years', date: ISODate("2018-01-01T00:00:00.000Z") },
    { name: 'Pi Day', date: ISODate("2018-03-14T00:00:00.000Z") },
    { name: 'Ice Cream Day', date: ISODate("2018-07-15T00:00:00.000Z")
    }
  ]
}

等价的SQL:

SELECT *, holidays
FROM absences
WHERE holidays IN (
   SELECT name, date
   FROM holidays
   WHERE year = 2018
);

执行简明关联子查询

从 MongoDB 5.0 开始,$lookup支持简明关联子查询语法,改进了集合间的连接。新的简洁语法不再要求在$match阶段的$expr操作符中对外部字段和本地字段进行相等匹配。举例说明:
创建restaurants集合并添加记录

db.restaurants.insertMany( [
   {
      _id: 1,
      name: "American Steak House",
      food: [ "filet", "sirloin" ],
      beverages: [ "beer", "wine" ]
   },
   {
      _id: 2,
      name: "Honest John Pizza",
      food: [ "cheese pizza", "pepperoni pizza" ],
      beverages: [ "soda" ]
   }
] )

创建orders集合并添加记录

db.orders.insertMany( [
   {
      _id: 1,
      item: "filet",
      restaurant_name: "American Steak House"
   },
   {
      _id: 2,
      item: "cheese pizza",
      restaurant_name: "Honest John Pizza",
      drink: "lemonade"
   },
   {
      _id: 3,
      item: "cheese pizza",
      restaurant_name: "Honest John Pizza",
      drink: "soda"
   }
] )

下面的聚合在执行pipeline前,会先通过orders.restaurant_name和orders和restaurants进行匹配,然后再分别使用$$orders_drink$beverages 访问的orders.drink和restaurants.beverages字段,并在他们之间执行$in数组匹配。

db.orders.aggregate( [
   {
      $lookup: {
         from: "restaurants",
         localField: "restaurant_name",
         foreignField: "name",
         let: { orders_drink: "$drink" },
         pipeline: [ {
            $match: {
               $expr: { $in: [ "$$orders_drink", "$beverages" ] }
            }
         } ],
         as: "matches"
      }
   }
] )

orders.drink和restaurants.drink字段中的"soda"值匹配。输出显示了匹配数组,并包含restaurants集合中所有已加入的匹配字段,执行的结果如下:

{
   "_id" : 1, "item" : "filet",
   "restaurant_name" : "American Steak House",
   "matches" : [ ]
}
{
   "_id" : 2, "item" : "cheese pizza",
   "restaurant_name" : "Honest John Pizza",
   "drink" : "lemonade",
   "matches" : [ ]
}
{
   "_id" : 3, "item" : "cheese pizza",
   "restaurant_name" : "Honest John Pizza",
   "drink" : "soda",
   "matches" : [ {
      "_id" : 2, "name" : "Honest John Pizza",
      "food" : [ "cheese pizza", "pepperoni pizza" ],
      "beverages" : [ "soda" ]
   } ]
}