目录
5. 批处理数据
6. 重命名集合
7. 删除数据
8. 引用数据库
9. 使用与索引相关的函数
大部分摘自《MongoDB大数据处理权威指南》(第3版)。
5. 批处理数据
MongoDB允许批量执行写入操作。通过这种方式,可首先定义数据集,再一次性写入它们。批量写入操作只能处理单一集合,可用于插入、更新或删除数据。
在批量写入数据之前,首先需要告诉MongoDB如何写入数据:有序还是无序。以有序方式执行操作时,MongoDB会按顺序执行操作列表。如果在处理一个写入操作时发生错误,就不处理剩下的操作。使用无序写入操作时,MongoDB以并行方式执行操作。如果在处理一个写入操作时发生错误,MongoDB将继续处理剩余的写入操作。
按以下步骤执行批处理:
- 初始化操作列表,有序操作的初始化函数是db.collection.initializeorderedBulkOp(),无序操作的初始化函数是db.collection.initializeUnorderedBulkOp()。
- 将要执行的操作插入操作列表。
- 使用execute()命令执行操作。
- 使用getOperations()评估输出(可选)。
下面看MongoDB官方文档中的例子:
> var bulk = db.users.initializeOrderedBulkOp();
> bulk.insert( { user: "abc123", status: "A", points: 0 } );
> bulk.insert( { user: "ijk123", status: "A", points: 0 } );
> bulk.insert( { user: "mop123", status: "P", points: 0 } );
> bulk.find( { status: "D" } ).remove();
> bulk.find( { status: "P" } ).update( { $set: { comment: "Pending" } } );
> bulk.execute();
BulkWriteResult({
"writeErrors" : [ ],
"writeConcernErrors" : [ ],
"nInserted" : 3,
"nUpserted" : 0,
"nMatched" : 1,
"nModified" : 1,
"nRemoved" : 0,
"upserted" : [ ]
})
>
上面的批处理中顺序执行了3个插入、1个删除、1个更新操作。注意列表中最多可以包含1000个操作,超过此限制时,MongoDB会自动分割列表,把它们放在几个包含1000个操作的组中。执行后的数据如下:
> db.users.find();
{ "_id" : ObjectId("5bad83a52a4ee8fc88cee341"), "user" : "abc123", "status" : "A", "points" : 0 }
{ "_id" : ObjectId("5bad83a52a4ee8fc88cee342"), "user" : "ijk123", "status" : "A", "points" : 0 }
{ "_id" : ObjectId("5bad83a52a4ee8fc88cee343"), "user" : "mop123", "status" : "P", "points" : 0, "comment" : "Pending" }
>
一旦使用execute()命令执行了批操作,就能够审查执行的写入操作。可以评估是否成功写入了所有数据,以及按什么顺序写入的。此外,一旦在写入期间出现问题,输出也有助于了解所执行的操作。使用getOperations()命令执行审查:
> bulk.getOperations();
[
{
"originalZeroIndex" : 0,
"batchType" : 1,
"operations" : [
{
"_id" : ObjectId("5bad83a52a4ee8fc88cee341"),
"user" : "abc123",
"status" : "A",
"points" : 0
},
{
"_id" : ObjectId("5bad83a52a4ee8fc88cee342"),
"user" : "ijk123",
"status" : "A",
"points" : 0
},
{
"_id" : ObjectId("5bad83a52a4ee8fc88cee343"),
"user" : "mop123",
"status" : "P",
"points" : 0
}
]
},
{
"originalZeroIndex" : 3,
"batchType" : 3,
"operations" : [
{
"q" : {
"status" : "D"
},
"limit" : 0
}
]
},
{
"originalZeroIndex" : 4,
"batchType" : 2,
"operations" : [
{
"q" : {
"status" : "P"
},
"u" : {
"$set" : {
"comment" : "Pending"
}
},
"multi" : true,
"upsert" : false
}
]
}
]
>
返回的数组包含operations键下处理的所有数据,batchType键表示执行的操作类型,1、2、3分别表示insert、update、delete操作。在无序列表中处理各类操作时,MongoDB会将这些操作按类型(插入、更新、删除)分组来提高性能。因此,应确保应用不依赖操作的执行顺序。有序列表的操作只会组合相同类型的连续操作,所以它们仍是按顺序处理的。
在一次处理大量数据时,批处理操作是非常有用的,还不会影响事先可用的数据集。
6. 重命名集合
> show collections;
audit
audit100
media
products
users
> db.users.renameCollection("users1");
{ "ok" : 1 }
> show collections;
audit
audit100
media
products
users1
>
MongoDB没有提供直接的修改数据库名称的函数,但可以使用renameCollection()把一个数据库中所有的集合移动到新库下,就相当于把整个库重命名了。
> show dbs;
admin 0.000GB
config 0.000GB
library 0.000GB
local 0.000GB
> use admin
switched to db admin
> db.runCommand({renameCollection: "test.test", to: "test1.test"});
{
"ok" : 0,
"errmsg" : "source namespace does not exist",
"code" : 26,
"codeName" : "NamespaceNotFound"
}
> db.runCommand({renameCollection: "library.users1", to: "test.users1"});
{ "ok" : 1 }
> show dbs;
admin 0.000GB
config 0.000GB
library 0.000GB
local 0.000GB
test 0.000GB
> use test;
switched to db test
> show collections;
users1
> use library;
switched to db library
> show collections;
audit
audit100
media
products
>
7. 删除数据
deleteOne()删除匹配条件的一个文档:
> db.users1.find({"status" : "A"});
{ "_id" : ObjectId("5bad83a52a4ee8fc88cee341"), "user" : "abc123", "status" : "A", "points" : 0 }
{ "_id" : ObjectId("5bad83a52a4ee8fc88cee342"), "user" : "ijk123", "status" : "A", "points" : 0 }
> db.users1.deleteOne({"status" : "A"});
{ "acknowledged" : true, "deletedCount" : 1 }
> db.users1.find({"status" : "A"});
{ "_id" : ObjectId("5bad83a52a4ee8fc88cee342"), "user" : "ijk123", "status" : "A", "points" : 0 }
>
deleteMany()删除匹配条件的多个文档:
> db.users1.find();
{ "_id" : ObjectId("5bad83a52a4ee8fc88cee342"), "user" : "ijk123", "status" : "A", "points" : 0 }
{ "_id" : ObjectId("5bad83a52a4ee8fc88cee343"), "user" : "mop123", "status" : "P", "points" : 0, "comment" : "Pending" }
> db.users1.deleteMany({});
{ "acknowledged" : true, "deletedCount" : 2 }
> db.users1.find();
>
db.users1.deleteMany({})将删除users1集合中的所有文档。
删除集合:
> db.users1.drop();
true
>
删除数据库:
> db.getName();
test
> db.dropDatabase();
{ "ok" : 1 }
>
注意dropDatabase()删除当前正在使用的数据库,所以一定要在执行前检查使用的是哪个数据库。
8. 引用数据库
MongoDB提供了两种方式实现文件间的引用:手动引用或使用DBRef标准。
(1)手动引用
手动引用通过在一个文档中使用另一个文档中的_id实现。下面是一个手动引用的例子。添加一个新文档,并在其中指定发布者的信息(注意_id字段):
> apress = ( { "_id" : "Apress", "Type" : "Technical Publisher", "Category" : ["IT",
... "Software","Programming"] } );
{
"_id" : "Apress",
"Type" : "Technical Publisher",
"Category" : [
"IT",
"Software",
"Programming"
]
}
> db.publisherscollection.insertOne(apress);
{ "acknowledged" : true, "insertedId" : "Apress" }
>
添加发布者信息之后,在media集合中添加一个的文档,它将指定Apress作为发布者的名字:
> book = ( { "Type" : "Book", "Title" : "Definitive Guide to MongoDB 3rd ed., The", "ISBN" : "978-1-4842-1183-0", "Publisher" : "Apress","Author" : ["Hows, David","Plugge, Eelco","Membrey,Peter","Hawkins, Tim"] } );
{
"Type" : "Book",
"Title" : "Definitive Guide to MongoDB 3rd ed., The",
"ISBN" : "978-1-4842-1183-0",
"Publisher" : "Apress",
"Author" : [
"Hows, David",
"Plugge, Eelco",
"Membrey,Peter",
"Hawkins, Tim"
]
}
> db.media.insertOne(book);
{
"acknowledged" : true,
"insertedId" : ObjectId("5bad93732a4ee8fc88cee344")
}
>
现在可以使用数据库引用了(这种引用的使用方式和RDBMS中的join毫无可比性)。先将含有发布者信息的文档赋给一个变量:
> book = db.media.findOne({"Publisher" : "Apress"});
{
"_id" : ObjectId("5baae32464e6602b766d94c0"),
"Type" : "Book",
"Title" : " Different Title",
"ISBN" : "978-1-4842-1183-0",
"Publisher" : "Apress",
"Author" : [ ]
}
>
为了获得发布者的信息,可以结合使用findOne()函数和点操作符:
> db.publisherscollection.findOne( { _id : book.Publisher } );
{
"_id" : "Apress",
"Type" : "Technical Publisher",
"Category" : [
"IT",
"Software",
"Programming"
]
}
>
(2)使用DBRef引用数据
DBRef提供了文档之间的引用数据的更正式的规范。使用DBRef的主要原因是,引用中文档所在集合可能发生变化。如果引用的一直都是相同的集合,那么手动引用数据也可以。
使用DBRef可以将数据库引用存储为标准的嵌入对象(JSON/BSON)。使用一种标准方式代表引用,意味着驱动和数据框架可以添加辅助方法,以标准的方法操作引用。
添加DBRef引用值的语法如下:
{ $ref : <collectionname>, $id : <id value>[, $db : <database name>] }
<collectionname>代表集合名称;<id value>代表被引用对象的_id字段;通过使用可选的$db可以引用其它数据库中的文档。下面是一个使用了DBRef的样例。首先清空两个集合并添加一个新文档:
> db.publisherscollection.drop();
true
> db.media.drop();
true
> apress = ( { "Type" : "Technical Publisher", "Category" : ["IT","Software","Programming"] } );
{
"Type" : "Technical Publisher",
"Category" : [
"IT",
"Software",
"Programming"
]
}
> db.publisherscollection.save(apress);
WriteResult({ "nInserted" : 1 })
>
通过变量名显示其内容:
> apress
{
"Type" : "Technical Publisher",
"Category" : [
"IT",
"Software",
"Programming"
],
"_id" : ObjectId("5bad9c3b2a4ee8fc88cee345")
}
>
现在在media集合中添加一个引用该数据的文档:
> book = { "Type" : "Book", "Title" : "Definitive Guide to MongoDB 3rd ed., The", "ISBN" : "978-1-4842-1183-0", "Author": ["Hows, David","Membrey, Peter","Plugge,Eelco","Hawkins, Tim"], Publisher : [ new DBRef ('publisherscollection',apress._id) ] };
{
"Type" : "Book",
"Title" : "Definitive Guide to MongoDB 3rd ed., The",
"ISBN" : "978-1-4842-1183-0",
"Author" : [
"Hows, David",
"Membrey, Peter",
"Plugge,Eelco",
"Hawkins, Tim"
],
"Publisher" : [
DBRef("publisherscollection", ObjectId("5bad9c3b2a4ee8fc88cee345"))
]
}
> db.media.save(book);
WriteResult({ "nInserted" : 1 })
> db.media.findOne();
{
"_id" : ObjectId("5bad9d5f2a4ee8fc88cee346"),
"Type" : "Book",
"Title" : "Definitive Guide to MongoDB 3rd ed., The",
"ISBN" : "978-1-4842-1183-0",
"Author" : [
"Hows, David",
"Membrey, Peter",
"Plugge,Eelco",
"Hawkins, Tim"
],
"Publisher" : [
DBRef("publisherscollection", ObjectId("5bad9c3b2a4ee8fc88cee345"))
]
}
>
修改文档所在集合:
> show collections;
media
publisherscollection
> db.publisherscollection.renameCollection("test");
{ "ok" : 1 }
> show collections;
media
test
> db.media.findOne();
{
"_id" : ObjectId("5bad9d5f2a4ee8fc88cee346"),
"Type" : "Book",
"Title" : "Definitive Guide to MongoDB 3rd ed., The",
"ISBN" : "978-1-4842-1183-0",
"Author" : [
"Hows, David",
"Membrey, Peter",
"Plugge,Eelco",
"Hawkins, Tim"
],
"Publisher" : [
DBRef("publisherscollection", ObjectId("5bad9c3b2a4ee8fc88cee345"))
]
}
> db.media.updateOne( { "_id" : ObjectId("5bad9d5f2a4ee8fc88cee346")}, {$set:{Publisher : [ new DBRef ('test',apress._id) ]}} );
{ "acknowledged" : true, "matchedCount" : 1, "modifiedCount" : 1 }
> db.media.findOne();
{
"_id" : ObjectId("5bad9d5f2a4ee8fc88cee346"),
"Type" : "Book",
"Title" : "Definitive Guide to MongoDB 3rd ed., The",
"ISBN" : "978-1-4842-1183-0",
"Author" : [
"Hows, David",
"Membrey, Peter",
"Plugge,Eelco",
"Hawkins, Tim"
],
"Publisher" : [
DBRef("test", ObjectId("5bad9c3b2a4ee8fc88cee345"))
]
}
> db.test.findOne( { _id : ObjectId("5bad9c3b2a4ee8fc88cee345") } );
{
"_id" : ObjectId("5bad9c3b2a4ee8fc88cee345"),
"Type" : "Technical Publisher",
"Category" : [
"IT",
"Software",
"Programming"
]
}
>
9. 使用与索引相关的函数
(1)创建索引
> db.media.createIndex( { Title : 1 } );
{
"createdCollectionAutomatically" : false,
"numIndexesBefore" : 1,
"numIndexesAfter" : 2,
"ok" : 1
}
>
基于media集合的所有文档的Title值创建出一个索引,:1表示升序,:-1表示降序。
- 在数组内嵌的键上创建索引
> db.media.insertOne( { "Type" : "CD", "Artist" : "Nirvana","Title" : "Nevermind", "Tracklist" : [ { "Track" : "1", "Title" : "Smells Like Teen Spirit", "Length" : "5:02" }, {"Track" : "2","Title" : "In Bloom", "Length" : "4:15" } ] } );
{
"acknowledged" : true,
"insertedId" : ObjectId("5badbcfd2a4ee8fc88cee347")
}
> db.media.createIndex( { "Tracklist.Title" : 1 } );
{
"createdCollectionAutomatically" : false,
"numIndexesBefore" : 2,
"numIndexesAfter" : 3,
"ok" : 1
}
>
- 以整个子文档为键创建索引
> db.media.createIndex( { "Tracklist" : 1 } );
{
"createdCollectionAutomatically" : false,
"numIndexesBefore" : 3,
"numIndexesAfter" : 4,
"ok" : 1
}
>
上面的语句为数组中的所有元素构建索引。这意味着可以使用该索引搜索数组中的任何对象。这些键的类型被称为多键。
- 创建复合索引
> db.media.createIndex({"Tracklist.Title": 1, "Tracklist.Length": -1});
{
"createdCollectionAutomatically" : false,
"numIndexesBefore" : 4,
"numIndexesAfter" : 5,
"ok" : 1
}
>
(2)索引相关的命令
- 强制使用某个索引查询数据
> db.media.find( { ISBN: " 978-1-4842-1183-0"} ) . hint ( { ISBN: -1 } );
Error: error: {
"ok" : 0,
"errmsg" : "error processing query: ns=library.mediaTree: ISBN $eq \" 978-1-4842-1183-0\"\nSort: {}\nProj: {}\n planner returned error: bad hint",
"code" : 2,
"codeName" : "BadValue"
}
>
可以看到,在没有定义索引的情况下,hint函数会出错。
> db.media.createIndex({ISBN: 1}, {background: true});
{
"createdCollectionAutomatically" : false,
"numIndexesBefore" : 5,
"numIndexesAfter" : 6,
"ok" : 1
}
> db.media.find( { ISBN: "978-1-4842-1183-0"} ) . hint ( { ISBN: 1 } );
{ "_id" : ObjectId("5bad9d5f2a4ee8fc88cee346"), "Type" : "Book", "Title" : "Definitive Guide to MongoDB 3rd ed., The", "ISBN" : "978-1-4842-1183-0", "Author" : [ "Hows, David", "Membrey, Peter", "Plugge,Eelco", "Hawkins, Tim" ], "Publisher" : [ DBRef("test", ObjectId("5bad9c3b2a4ee8fc88cee345")) ] }
>
如果创建了ISBN键上的索引,查询执行成功。注意第一个命令中的background参数将保证索引在后台完成。默认情况下,索引的建立是在前台进行的,这会阻塞其它写入操作。background选项允许在后台建立索引,而不会阻塞其它写入操作。
为确保使用的是指定的索引,可以使用explain()函数,返回所选择的查询计划的相关信息。其中indexBounds值将显示出所使用的索引:
> db.media.find( { ISBN: "978-1-4842-1183-0"} ) . hint ( { ISBN: 1 } ).explain();
{
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "library.media",
"indexFilterSet" : false,
"parsedQuery" : {
"ISBN" : {
"$eq" : "978-1-4842-1183-0"
}
},
"winningPlan" : {
"stage" : "FETCH",
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"ISBN" : 1
},
"indexName" : "ISBN_1",
"isMultiKey" : false,
"multiKeyPaths" : {
"ISBN" : [ ]
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"ISBN" : [
"[\"978-1-4842-1183-0\", \"978-1-4842-1183-0\"]"
]
}
}
},
"rejectedPlans" : [ ]
},
"serverInfo" : {
"host" : "hdp4",
"port" : 27017,
"version" : "4.0.2",
"gitVersion" : "fc1573ba18aee42f97a3bb13b67af7d837826b47"
},
"ok" : 1
}
>
- 限制查询匹配
函数min()和max()用于限制查询匹配,只有在指定的min和max键之间的索引键才会返回。
> db.media.insertOne( { "Type" : "DVD", "Title" : "Matrix, The", "Released" : 1999} );
{
"acknowledged" : true,
"insertedId" : ObjectId("5badc41c2a4ee8fc88cee348")
}
> db.media.insertOne( { "Type" : "DVD", "Title" : "Blade Runner", "Released" : 1982 } );
{
"acknowledged" : true,
"insertedId" : ObjectId("5badc4222a4ee8fc88cee349")
}
> db.media.insertOne( { "Type" : "DVD", "Title" : "Toy Story 3", "Released" : 2010} );
{
"acknowledged" : true,
"insertedId" : ObjectId("5badc4282a4ee8fc88cee34a")
}
> db.media.ensureIndex( { "Released": 1 } );
{
"createdCollectionAutomatically" : false,
"numIndexesBefore" : 6,
"numIndexesAfter" : 7,
"ok" : 1
}
>
上面的命令先插入了3条记录,然后在Released键上创建升序索引。在MongoDB 3.0以后版本中,ensureIndex()是createIndex()的别名。现在可以使用max()和min()命令:
> db.media.find() . min ( { Released: 1999 } ) . max ( { Released : 2010 } );
{ "_id" : ObjectId("5badc41c2a4ee8fc88cee348"), "Type" : "DVD", "Title" : "Matrix, The", "Released" : 1999 }
> db.media.find() . min ( { Released: 1999 } ) .max ( { Released : 2010 } ). hint ( { Released : 1 } );
{ "_id" : ObjectId("5badc41c2a4ee8fc88cee348"), "Type" : "DVD", "Title" : "Matrix, The", "Released" : 1999 }
>
注意,结果中包含min()值,但不包含max()值。一般来说,建议使用$gt和$lt(分别是大于和小于)而不是min()和max(),因为前者不要求存在索引。函数min()和max()主要用于复合键。