Details
Type: Bug
Resolution: Done
Priority: Major - P3
Description
Improve docs on sharded data imports / ingesting large amounts of data.
===============================
ARCHIVED
===============================
>>>>>>>>>>>>>>> Gregor's original description
Add a simple example of mongodump / mongorestore in a sharded environment, such as this one, to here
Here I have some data files for the databases local, test, database1 and database2:
$ ls -l ./data/a/
total 1572864
-rw------- 1 gregor staff 67108864 22 Aug 10:22 database1.0
-rw------- 1 gregor staff 134217728 22 Aug 10:21 database1.1
-rw------- 1 gregor staff 16777216 22 Aug 10:22 database1.ns
-rw------- 1 gregor staff 67108864 22 Aug 10:23 database2.0
-rw------- 1 gregor staff 134217728 22 Aug 10:21 database2.1
-rw------- 1 gregor staff 16777216 22 Aug 10:23 database2.ns
drwxr-xr-x 2 gregor staff 68 22 Aug 10:52 journal
-rw------- 1 gregor staff 268435456 22 Aug 10:23 local.0
-rw------- 1 gregor staff 16777216 22 Aug 10:23 local.ns
-rwxr-xr-x 1 gregor staff 0 22 Aug 10:52 mongod.lock
-rw------- 1 gregor staff 67108864 22 Aug 10:24 test.0
-rw------- 1 gregor staff 16777216 22 Aug 10:24 test.ns
There is no mongod process using these files, so I will use mongodump to create a BSON backup from the files directly. If there were a mongod running, I would run mongodump against that process instead, using the --host argument.
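For the running-instance case mentioned above, a minimal sketch might look like the following. The variable names (DUMP_HOST, DUMP_PORT, DUMP_OUT, DRY_RUN) and the function name are hypothetical and exist only for this illustration; the host and port values are assumptions, not taken from a real deployment. By default the function only prints the command it would run.

```shell
# Sketch: dump from a running mongod (or mongos) instead of from data files.
# All variable names below are hypothetical, chosen for this example only.
DUMP_HOST="${DUMP_HOST:-localhost}"   # assumed host of the running instance
DUMP_PORT="${DUMP_PORT:-27017}"       # assumed port of the running instance
DUMP_OUT="${DUMP_OUT:-dataout}"       # output directory, as in the ticket
DRY_RUN="${DRY_RUN:-1}"               # 1 = only print the command

dump_from_host() {
    if [ "$DRY_RUN" = "1" ]; then
        # Show what would be executed without touching any server.
        echo "mongodump --host $DUMP_HOST --port $DUMP_PORT -o $DUMP_OUT"
    else
        mongodump --host "$DUMP_HOST" --port "$DUMP_PORT" -o "$DUMP_OUT"
    fi
}
```

Calling dump_from_host with DRY_RUN=0 would run the dump for real; the resulting directory layout should match the dataout listing shown next.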
$ mongodump --dbpath ./data/a -o dataout
$ ls -l dataout/
total 0
drwxr-xr-x 2 gregor staff 68 22 Aug 10:58 *
drwxr-xr-x 4 gregor staff 136 22 Aug 10:58 database1
drwxr-xr-x 4 gregor staff 136 22 Aug 10:58 database2
drwxr-xr-x 5 gregor staff 170 22 Aug 10:58 test
Now I have the data in BSON format, one directory per database. If I want to import only database1 into my sharded setup, I use this command:
$ mongorestore --host localhost --port 27017 ./dataout/database1
This assumes I have a mongos running on localhost on port 27017.
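The restore step could be wrapped in a small helper so that individual database directories from the dump are restored through the mongos one at a time. This is only a sketch: MONGOS_HOST, MONGOS_PORT, DRY_RUN and restore_db are hypothetical names introduced for illustration, and the directory-as-argument form of mongorestore is the one used in this ticket (newer tool versions may expect different options).

```shell
# Sketch: restore one database directory from a dump through a mongos.
# MONGOS_HOST, MONGOS_PORT and DRY_RUN are hypothetical names for this example.
MONGOS_HOST="${MONGOS_HOST:-localhost}"
MONGOS_PORT="${MONGOS_PORT:-27017}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the command

restore_db() {
    # $1 = dump root (e.g. ./dataout), $2 = database name (e.g. database1)
    dir="$1/$2"
    if [ ! -d "$dir" ]; then
        echo "skip: $dir is not a directory" >&2
        return 1
    fi
    if [ "$DRY_RUN" = "1" ]; then
        echo "mongorestore --host $MONGOS_HOST --port $MONGOS_PORT $dir"
    else
        mongorestore --host "$MONGOS_HOST" --port "$MONGOS_PORT" "$dir"
    fi
}
```

For example, restore_db ./dataout database1 would print (or, with DRY_RUN=0, run) the same command as above. Naming the database explicitly also avoids picking up the stray `*` directory in the dump output.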
If I have previously enabled sharding on that database and sharded a collection in it, then I will see this in the sharding status. In this case I have only one chunk:
{ "_id" : "database1", "partitioned" : true, "primary" : "shard0002" }
    database1.foo chunks:
        shard0002 1
    { "_id" : { $minKey : 1 } } -->> { "_id" : { $maxKey : 1 } } on : shard0002 Timestamp(1000, 0)
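The sharding setup assumed above (sharding enabled on database1, and database1.foo sharded on _id, as the chunk bounds suggest) could have been done along these lines. This is a hedged sketch using the legacy mongo shell with --eval; MONGOS_HOST, MONGOS_PORT, DRY_RUN, SHARD_JS and shard_setup are hypothetical names for this example, and by default the command is only printed.

```shell
# Sketch: enable sharding and shard database1.foo on _id via the mongos.
# All variable and function names are hypothetical, for illustration only.
MONGOS_HOST="${MONGOS_HOST:-localhost}"
MONGOS_PORT="${MONGOS_PORT:-27017}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the command

# JavaScript passed to the mongo shell: enable sharding on the database,
# shard the collection on _id, then print the sharding status.
SHARD_JS='sh.enableSharding("database1"); sh.shardCollection("database1.foo", { _id: 1 }); sh.status();'

shard_setup() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "mongo --host $MONGOS_HOST --port $MONGOS_PORT --eval '$SHARD_JS'"
    else
        mongo --host "$MONGOS_HOST" --port "$MONGOS_PORT" --eval "$SHARD_JS"
    fi
}
```

Running this before the restore means the collection is already sharded when the data arrives, though with a single initial chunk all documents still land on one shard first, as the status output above shows.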
But there is data in the collection:
mongos> db.foo.stats()
{
    "sharded" : true,
    "ns" : "database1.foo",
    "count" : 1,
    "numExtents" : 1,
    "size" : 36,
    "storageSize" : 8192,
    "totalIndexSize" : 8176,
    "indexSizes" : {
        "_id_" : 8176
    },
    "avgObjSize" : 36,
    "nindexes" : 1,
    "nchunks" : 1,
    "shards" : {
        "shard0002" : {
            "ns" : "database1.foo",
            "count" : 1,
            "size" : 36,
            "avgObjSize" : 36,
            "storageSize" : 8192,
            "numExtents" : 1,
            "nindexes" : 1,
            "lastExtentSize" : 8192,
            "paddingFactor" : 1,
            "systemFlags" : 1,
            "userFlags" : 0,
            "totalIndexSize" : 8176,
            "indexSizes" : {
                "_id_" : 8176
            },
            "ok" : 1
        }
    },
    "ok" : 1
}
mongos>
Issue Links
- related to DOCS-2119 Fix mongodump --dbpath example (Closed)