[SERVER-9339] rollback files missing after rollback Created: 12/Apr/13  Updated: 11/Jul/16  Resolved: 23/Apr/13

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: 2.4.0
Fix Version/s: 2.2.7, 2.4.4, 2.5.0

Type: Bug Priority: Critical - P2
Reporter: Dmitry Klionsky Assignee: Eric Milkie
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Linux CentOS 2.6.32-358.2.1.el6.x86_64 #1 SMP Wed Mar 13 00:26:49 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux


Attachments: File kelly_2013_01_01.mt_messages.2013-04-12T08-31-25.0.bson     File mongodb-10.10.0.222.log     File mongodb-10.10.0.223.log     File mongostat-10.10.0.222.log     File mongostat-10.10.0.223.log     File mt_messages.json     File rollback5.js    
Issue Links:
Depends
Related
is related to SERVER-9756 network partition where minority was ... Closed
Operating System: ALL
Steps To Reproduce:

Steps to reproduce (documents are simplified to give you the idea):

1. 10.10.0.222 is Primary, 10.10.0.223 is Secondary, and 10.10.0.155 is Arbiter.

2. Start writing 5000 request and response documents.
Requests are like:
db.mt_messages.update({key:Value}, {$set:{req:ReqValue}}, true, false)
Responses are like:
db.mt_messages.update({key:Value}, {$set:{resp:RespValue}}, true, false)
Both requests and responses are upserts, so when both have been received the document should look like (see the sketch after this list):

{key:Value, req:ReqValue, resp:RespValue}

3. Kill 10.10.0.222 which is Primary
4. 10.10.0.223 becomes Primary
5. Kill 10.10.0.223 which is Primary
6. Start 10.10.0.222 with replSet
7. 10.10.0.222 becomes Primary

8. Start 10.10.0.223 without replSet and make an export
$ mongoexport -d kelly_2013_01_01 -c mt_messages -o mt_messages.json
9. Stop 10.10.0.223

10. Start 10.10.0.223 with replSet
11. 10.10.0.223 rolls back as expected and becomes Secondary
12. After applying all rollbacks I end up with lost data even though the missing data exists in the export file mt_messages.json.
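
To make the request/response merge in step 2 concrete, here is a minimal standalone sketch on a throwaway collection (collection name and values are illustrative):

> db.sketch.update({key: 1}, {$set: {req: "req1"}}, true, false)   // request upsert creates the document
> db.sketch.update({key: 1}, {$set: {resp: "resp1"}}, true, false) // response upsert merges into the same document
> db.sketch.find()
{ "_id" : ObjectId("..."), "key" : 1, "req" : "req1", "resp" : "resp1" }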

Now let's see the data.

kelly:PRIMARY> db.mt_messages.count()
5000

kelly:PRIMARY> db.mt_messages.find({rqt:{$exists:false}}).count()
31

kelly:PRIMARY> db.mt_messages.find({rps:{$exists:false}}).count()
5000

kelly:PRIMARY> db.mt_messages.find({rqt:{$exists:false}})

{ "_id" : ObjectId("5167c605efb160392cf4fbc2"), "imi" : "11589144", "omi" : "11582862", "ri" : "25dfe2d6-a34b-11e2-994b-000c29a75b4d", "rps" : "success", "rpt" : ISODate("2013-01-02T18:02:53.745Z") } { "_id" : ObjectId("5167c604efb160392cf4fbb6"), "imi" : "11589157", "omi" : "11582851", "ri" : "25e77082-a34b-11e2-994b-000c29a75b4d", "rps" : "success", "rpt" : ISODate("2013-01-02T18:02:53.486Z") } { "_id" : ObjectId("5167c604efb160392cf4fbbb"), "imi" : "11589154", "omi" : "11582857", "ri" : "25e6a206-a34b-11e2-994b-000c29a75b4d", "rps" : "success", "rpt" : ISODate("2013-01-02T18:02:53.583Z") } { "_id" : ObjectId("5167c604efb160392cf4fbbc"), "imi" : "11589151", "omi" : "11582859", "ri" : "25e4cc38-a34b-11e2-994b-000c29a75b4d", "rps" : "success", "rpt" : ISODate("2013-01-02T18:02:53.623Z") } { "_id" : ObjectId("5167c604efb160392cf4fbba"), "imi" : "11589153", "omi" : "11582854", "ri" : "25e6233a-a34b-11e2-994b-000c29a75b4d", "rps" : "success", "rpt" : ISODate("2013-01-02T18:02:53.522Z") } { "_id" : ObjectId("5167c604efb160392cf4fbbe"), "imi" : "11589143", "omi" : "11582858", "ri" : "25dee606-a34b-11e2-994b-000c29a75b4d", "rps" : "success", "rpt" : ISODate("2013-01-02T18:02:53.647Z") } { "_id" : ObjectId("5167c604efb160392cf4fbb8"), "imi" : "11589152", "omi" : "11582853", "ri" : "25e57e44-a34b-11e2-994b-000c29a75b4d", "rps" : "success", "rpt" : ISODate("2013-01-02T18:02:53.502Z") } { "_id" : ObjectId("5167c604efb160392cf4fbbf"), "imi" : "11589137", "omi" : "11582865", "ri" : "25dd3b76-a34b-11e2-994b-000c29a75b4d", "rps" : "success", "rpt" : ISODate("2013-01-02T18:02:53.673Z") } { "_id" : ObjectId("5167c605efb160392cf4fbe6"), "imi" : "11589102", "omi" : "11582905", "ri" : "25cf1b72-a34b-11e2-994b-000c29a75b4d", "rps" : "success", "rpt" : ISODate("2013-01-02T18:02:54.173Z") } { "_id" : ObjectId("5167c605efb160392cf4fbe7"), "imi" : "11589105", "omi" : "11582906", "ri" : "25d0161c-a34b-11e2-994b-000c29a75b4d", "rps" : "success", "rpt" : ISODate("2013-01-02T18:02:54.169Z") } { "_id" : ObjectId("5167c605efb160392cf4fbe5"), "imi" : "11589113", "omi" : "11582897", "ri" : "25d3438c-a34b-11e2-994b-000c29a75b4d", "rps" : "success", "rpt" : ISODate("2013-01-02T18:02:54.167Z") } { "_id" : ObjectId("5167c605efb160392cf4fbe2"), "imi" : "11589114", "omi" : "11582895", "ri" : "25d371d6-a34b-11e2-994b-000c29a75b4d", "rps" : "success", "rpt" : ISODate("2013-01-02T18:02:54.151Z") } { "_id" : ObjectId("5167c605efb160392cf4fbe0"), "imi" : "11589112", "omi" : "11582896", "ri" : "25d2f706-a34b-11e2-994b-000c29a75b4d", "rps" : "success", "rpt" : ISODate("2013-01-02T18:02:54.153Z") } { "_id" : ObjectId("5167c605efb160392cf4fbe1"), "imi" : "11589118", "omi" : "11582892", "ri" : "25d4d4fe-a34b-11e2-994b-000c29a75b4d", "rps" : "success", "rpt" : ISODate("2013-01-02T18:02:54.148Z") } { "_id" : ObjectId("5167c605efb160392cf4fbde"), "imi" : "11589115", "omi" : "11582893", "ri" : "25d3be5c-a34b-11e2-994b-000c29a75b4d", "rps" : "success", "rpt" : ISODate("2013-01-02T18:02:54.133Z") } { "_id" : ObjectId("5167c605efb160392cf4fbdf"), "imi" : "11589119", "omi" : "11582891", "ri" : "25d560b8-a34b-11e2-994b-000c29a75b4d", "rps" : "success", "rpt" : ISODate("2013-01-02T18:02:54.127Z") } { "_id" : ObjectId("5167c605efb160392cf4fbda"), "imi" : "11589116", "omi" : "11582889", "ri" : "25d43594-a34b-11e2-994b-000c29a75b4d", "rps" : "success", "rpt" : ISODate("2013-01-02T18:02:54.114Z") } { "_id" : ObjectId("5167c605efb160392cf4fbd9"), "imi" : "11589127", "omi" : "11582885", "ri" : "25d8f070-a34b-11e2-994b-000c29a75b4d", "rps" : "success", 
"rpt" : ISODate("2013-01-02T18:02:54.113Z") } { "_id" : ObjectId("5167c605efb160392cf4fbd7"), "imi" : "11589122", "omi" : "11582882", "ri" : "25d77d3a-a34b-11e2-994b-000c29a75b4d", "rps" : "success", "rpt" : ISODate("2013-01-02T18:02:54.093Z") } { "_id" : ObjectId("5167c605efb160392cf4fbd5"), "imi" : "11589131", "omi" : "11582869", "ri" : "25db5f2c-a34b-11e2-994b-000c29a75b4d", "rps" : "success", "rpt" : ISODate("2013-01-02T18:02:54.090Z") }

Type "it" for more
kelly:PRIMARY>

We are missing 31 updates.
Let's search for the first one, "imi":"11589144", which is unique enough for this purpose.

On 10.10.0.222 the response is registered:
$ grep 11589144 /var/log/mongodb.log
Fri Apr 12 11:29:57.026 [conn29] update kelly_2013_01_01.mt_messages query: { ri: "25dfe2d6-a34b-11e2-994b-000c29a75b4d", imi: "11589144" } update: { $set: { omi: "11582862", rpt: new Date(1357149773745), rps: "success" } } nscanned:0 nupdated:1 fastmodinsert:1 keyUpdates:0 locks(micros) w:148 0ms

On 10.10.0.223 the request is registered:
$ grep 11589144 /var/log/mongodb.log
Fri Apr 12 11:30:12.765 [conn18] update kelly_2013_01_01.mt_messages query: { ri: "25dfe2d6-a34b-11e2-994b-000c29a75b4d", imi: "11589144" } update: { $set: { ct: "funnel", ci: "493b3678-9dc8-11e2-8cce-00269e42f7a5", ui: "user", gi: "7dc235d0-c938-4b66-8f8c-c9037c7eace7", t: "regular", e: "default", b: "User", sa: { a: "375296660002", t: 1, n: 1 }, da: { a: "375293615363", t: 1, n: 1 }, rd: false, rqt: new Date(1357149779760) } } nscanned:1 nmoved:1 nupdated:1 keyUpdates:0 locks(micros) w:433 0ms

On 10.10.0.223 the whole document is available according to the export file:
$ grep 11589144 mt_messages.json
{ "_id" : { "$oid" : "5167c605efb160392cf4fbc2" }, "b" : "User", "ci" : "493b3678-9dc8-11e2-8cce-00269e42f7a5", "ct" : "funnel", "da" : { "a" : "375293615363", "t" : 1, "n" : 1 }, "e" : "default", "gi" : "7dc235d0-c938-4b66-8f8c-c9037c7eace7", "imi" : "11589144", "omi" : "11582862", "rd" : false, "ri" : "25dfe2d6-a34b-11e2-994b-000c29a75b4d", "rps" : "success", "rpt" : { "$date" : 1357149773745 }, "rqt" : { "$date" : 1357149779760 }, "sa" : { "a" : "375296660002", "t" : 1, "n" : 1 }, "t" : "regular", "ui" : "user" }

On 10.10.0.223:
$ bsondump kelly_2013_01_01.mt_messages.2013-04-12T08-31-25.0.bson 2>/dev/null | grep 11589144
IT'S NOT HERE! IT'S GONE!

Participants:

 Description   

I have a mongo replica set made up of 3 members: one Primary, one Secondary, and one Arbiter. They are located at 10.10.0.222, 10.10.0.223, and 10.10.0.155.

kelly:PRIMARY> rs.status()
{
    "set" : "kelly",
    "date" : ISODate("2013-04-12T08:28:41Z"),
    "myState" : 1,
    "members" : [
        {
            "_id" : 0,
            "name" : "10.10.0.222:27017",
            "health" : 1,
            "state" : 1,
            "stateStr" : "PRIMARY",
            "uptime" : 34,
            "optime" : { "t" : 1365755246, "i" : 2767 },
            "optimeDate" : ISODate("2013-04-12T08:27:26Z"),
            "self" : true
        },
        {
            "_id" : 1,
            "name" : "10.10.0.223:27017",
            "health" : 1,
            "state" : 2,
            "stateStr" : "SECONDARY",
            "uptime" : 24,
            "optime" : { "t" : 1365755246, "i" : 2767 },
            "optimeDate" : ISODate("2013-04-12T08:27:26Z"),
            "lastHeartbeat" : ISODate("2013-04-12T08:28:39Z"),
            "lastHeartbeatRecv" : ISODate("2013-04-12T08:28:40Z"),
            "pingMs" : 0,
            "lastHeartbeatMessage" : "syncing to: 10.10.0.222:27017"
        },
        {
            "_id" : 2,
            "name" : "10.10.0.155:27017",
            "health" : 1,
            "state" : 7,
            "stateStr" : "ARBITER",
            "uptime" : 34,
            "lastHeartbeat" : ISODate("2013-04-12T08:28:39Z"),
            "lastHeartbeatRecv" : ISODate("2013-04-12T08:28:40Z"),
            "pingMs" : 0
        }
    ],
    "ok" : 1
}

They are started in interactive mode and killed by Ctrl-C, so unclean shutdown is not the case. The mongo version is 2.4.0 Linux 64bit with default settings. The database servers' time is synchronized using NTP.

All updates are upserts in `safe' mode, i.e. each write is followed by {getLastError:1}, which as the documentation states also means {w:1}, and all writes go to the Primary.
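
In shell terms the write pattern is roughly this (a sketch only; the real client is the Erlang driver, and the field values here are illustrative):

kelly:PRIMARY> db.mt_messages.update({ri: "some-uuid", imi: "11111111"}, {$set: {rqt: new Date()}}, true, false)
kelly:PRIMARY> db.runCommand({getLastError: 1, w: 1})   // "safe" mode: acknowledged by the current Primary only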

By data loss I mean incomplete documents after applying rollbacks, for example

{key:Value, req:ReqValue}

After analyzing many logs (application, mongod, mongostat) I realized that this happens when, for example, the request has been sent to 10.10.0.222 while it was still Primary, and the response has been sent to 10.10.0.223 after the failover. The whole document is stored in the mongo export file, but after the rollback on 10.10.0.223 the response is gone.

I can reproduce this issue 10 out of 10 times. The client code is Erlang and it's able to sustain temporary connection failures; if a write fails it is repeated later. I can see that the missing updates are registered in the mongod log files and acknowledged by the current Primary (at some point I traced the Erlang driver and saw the replies in Wireshark). The problem seems to be in the rollback mechanics.



 Comments   
Comment by auto [ 29/Aug/13 ]

Author: Eric Milkie (milkie) <milkie@10gen.com>

Message: SERVER-9339 do not query a node that is not in PRIMARY or SECONDARY state
Branch: v2.2
https://github.com/mongodb/mongo/commit/be66f1b8fc992251f4785adb4c632d3b4677e005

Comment by auto [ 29/Aug/13 ]

Author: Eric Milkie (milkie) <milkie@10gen.com>

Message: SERVER-9339 ensure all rollback update/delete paths are recorded

Conflicts:
src/mongo/db/ops/update.cpp
Branch: v2.2
https://github.com/mongodb/mongo/commit/0364ef4bdd4c17a07801457334aee32b9f601828

Comment by auto [ 01/May/13 ]

Author: Eric Milkie <milkie@10gen.com> (2013-04-30T15:58:46Z)

Message: SERVER-9339 do not query a node that is not in PRIMARY or SECONDARY state
Branch: v2.4
https://github.com/mongodb/mongo/commit/c6a32dc49c05a78ea322ef13ebb90443bbdb44f1

Comment by auto [ 30/Apr/13 ]

Author: Eric Milkie <milkie@10gen.com> (2013-04-30T15:58:46Z)

Message: SERVER-9339 do not query a node that is not in PRIMARY or SECONDARY state
Branch: master
https://github.com/mongodb/mongo/commit/8746fa436394d630efec6ee1f4788c932dabb698

Comment by auto [ 30/Apr/13 ]

Author: Eric Milkie <milkie@10gen.com> (2013-04-22T18:48:02Z)

Message: SERVER-9339 ensure all rollback update/delete paths are recorded

Conflicts:
src/mongo/db/ops/update.cpp
Branch: v2.4
https://github.com/mongodb/mongo/commit/6987dd1dd3f4990cb05212c9c77fd924eca4e8ba

Comment by Eric Milkie [ 25/Apr/13 ]

commit: 30b7ed6470b3c97b04e32b53137ece4314ee8ce8

Comment by Dmitry Klionsky [ 12/Apr/13 ]

My use case was to make sure that my app survives multiple failovers and continues to function as expected.
All seemed ok, until I came across this combination of steps.

>>The rollback directory generally shouldn't be a part of your normal workflow as you would have to deal with merge conflicts on the data, etc...
Right. And I made a utility to deal with the rollbacks, and with its help I figured out that data was being lost, and then I started to dig deeper.

>>It sounds like you should be using w=2 write concerns.
It seems I should.

Comment by Eliot Horowitz (Inactive) [ 12/Apr/13 ]

While we work on that, what is the use case you are working on?

The rollback directory generally shouldn't be a part of your normal workflow as you would have to deal with merge conflicts on the data, etc...

It sounds like you should be using w=2 write concerns.

Comment by Dmitry Klionsky [ 12/Apr/13 ]

Right, it should, but it's not there. Double checked on two machines.

Linux Mint 32 bit
db version v2.4.0
Fri Apr 12 17:27:40.498 git version: ce2d666c04b4a80af58e8bbb3388b0680e8cfeb6

CentOS 64 bit
db version v2.4.0
Fri Apr 12 17:28:59.233 git version: ce2d666c04b4a80af58e8bbb3388b0680e8cfeb6

Comment by Eliot Horowitz (Inactive) [ 12/Apr/13 ]

There should be a rollback directory in db2.

Will see if I can reproduce it not being there.

Comment by Dmitry Klionsky [ 12/Apr/13 ]

Here it is.

Open two terminal tabs Tab1 and Tab2

Tab1

$ mkdir db1 db2 db3
$ mongod --replSet test --dbpath ./db1 --logpath ./db1/mongodb.log --port 1111 &
[1] 9657
$ mongod --replSet test --dbpath ./db2 --logpath ./db2/mongodb.log --port 2222 &
[2] 9773
$ mongod --replSet test --dbpath ./db3 --logpath ./db3/mongodb.log --port 3333 &
[3] 9824

Tab2

$ mongo localhost:1111/test
> rs.initiate()

{ "info2" : "no configuration explicitly specified -- making one", "me" : "mint-ws:1111", "info" : "Config now saved locally. Should come online in about a minute.", "ok" : 1 } test:PRIMARY> rs.add("mint-ws:2222") { "ok" : 1 }

test:PRIMARY> rs.addArb("mint-ws:3333")

{ "ok" : 1 }

test:PRIMARY> rs.status()
{
    "set" : "test",
    "date" : ISODate("2013-04-12T13:40:10Z"),
    "myState" : 1,
    "members" : [
        {
            "_id" : 0,
            "name" : "mint-ws:1111",
            "health" : 1,
            "state" : 1,
            "stateStr" : "PRIMARY",
            "uptime" : 142,
            "optime" : { "t" : 1365773990, "i" : 1 },
            "optimeDate" : ISODate("2013-04-12T13:39:50Z"),
            "self" : true
        },
        {
            "_id" : 1,
            "name" : "mint-ws:2222",
            "health" : 1,
            "state" : 2,
            "stateStr" : "SECONDARY",
            "uptime" : 30,
            "optime" : { "t" : 1365773990, "i" : 1 },
            "optimeDate" : ISODate("2013-04-12T13:39:50Z"),
            "lastHeartbeat" : ISODate("2013-04-12T13:40:10Z"),
            "lastHeartbeatRecv" : ISODate("2013-04-12T13:40:10Z"),
            "pingMs" : 1
        },
        {
            "_id" : 2,
            "name" : "mint-ws:3333",
            "health" : 1,
            "state" : 7,
            "stateStr" : "ARBITER",
            "uptime" : 20,
            "lastHeartbeat" : ISODate("2013-04-12T13:40:10Z"),
            "lastHeartbeatRecv" : ISODate("1970-01-01T00:00:00Z"),
            "pingMs" : 0
        }
    ],
    "ok" : 1
}
test:PRIMARY> db.coll.update({key:1}, {$set:{req:"req1"}}, true)
test:PRIMARY> db.coll.find()
{ "_id" : ObjectId("51680a6825870d2c6f155ddf"), "key" : 1, "req" : "req1" }

Tab 1

$ kill %1
[1] Done mongod --replSet test --dbpath ./db1 --logpath mongodb1.log --port 1111

Tab 2

> db = connect("localhost:2222/test")
connecting to: localhost:2222/test
test
test:PRIMARY> db.coll.update({key:1}, {$set:{rsp:"resp1"}}, true)
test:PRIMARY> db.coll.find()
{ "_id" : ObjectId("51680a6825870d2c6f155ddf"), "key" : 1, "req" : "req1", "rsp" : "resp1" }

Tab 1

$ kill %2
[2]- Done mongod --replSet test --dbpath ./db2 --logpath mongodb2.log --port 2222
$ mongod --replSet test --dbpath ./db1 --logpath mongodb1.log --port 1111 &
[4] 10672

wait 1 min

$ mongod --replSet test --dbpath ./db2 --logpath mongodb2.log --port 2222 &
[5] 10941

Tab 2

test:SECONDARY> db = connect("localhost:1111/test")
connecting to: localhost:1111/test
test
test:PRIMARY> db.coll.find()

{ "_id" : ObjectId("51680a6825870d2c6f155ddf"), "key" : 1, "req" : "req1" }

Tab 1

$ cat ./db2/mongodb.log
...
Fri Apr 12 16:42:59.413 [rsBackgroundSync] replSet syncing to: mint-ws:1111
Fri Apr 12 16:43:00.413 [rsSyncNotifier] replset setting oplog notifier to mint-ws:1111
Fri Apr 12 16:43:04.427 [rsBackgroundSync] replSet we are ahead of the sync source, will try to roll back
Fri Apr 12 16:43:04.427 [rsBackgroundSync] replSet rollback 0
Fri Apr 12 16:43:04.427 [rsBackgroundSync] replSet ROLLBACK
Fri Apr 12 16:43:04.427 [rsBackgroundSync] replSet rollback 1
Fri Apr 12 16:43:04.427 [rsBackgroundSync] replSet rollback 2 FindCommonPoint
Fri Apr 12 16:43:04.428 [rsBackgroundSync] replSet info rollback our last optime: Apr 12 16:41:52:1
Fri Apr 12 16:43:04.428 [rsBackgroundSync] replSet info rollback their last optime: Apr 12 16:40:53:1
Fri Apr 12 16:43:04.428 [rsBackgroundSync] replSet info rollback diff in end of log times: 59 seconds
Fri Apr 12 16:43:04.428 [rsBackgroundSync] replSet rollback found matching events at Apr 12 16:40:53:1
Fri Apr 12 16:43:04.428 [rsBackgroundSync] replSet rollback findcommonpoint scanned : 2
Fri Apr 12 16:43:04.428 [rsBackgroundSync] replSet replSet rollback 3 fixup
Fri Apr 12 16:43:04.454 [rsBackgroundSync] replSet rollback 3.5
Fri Apr 12 16:43:04.454 [rsBackgroundSync] replSet rollback 4 n:1
Fri Apr 12 16:43:04.454 [rsBackgroundSync] replSet minvalid=Apr 12 16:40:53 51680ee5:1
Fri Apr 12 16:43:04.455 [rsBackgroundSync] replSet rollback 4.6
Fri Apr 12 16:43:04.455 [rsBackgroundSync] replSet rollback 4.7
Fri Apr 12 16:43:04.456 [rsBackgroundSync] replSet rollback 5 d:0 u:1
Fri Apr 12 16:43:04.590 [rsBackgroundSync] replSet rollback 6
Fri Apr 12 16:43:04.590 [rsBackgroundSync] replSet rollback 7
Fri Apr 12 16:43:04.655 [rsBackgroundSync] replSet rollback done
Fri Apr 12 16:43:04.655 [rsBackgroundSync] replSet RECOVERING
Fri Apr 12 16:43:04.655 [rsBackgroundSync] replSet syncing to: mint-ws:1111
Fri Apr 12 16:43:05.415 [rsSync] replSet SECONDARY
...
$ ls ./db1
local.0 local.1 local.ns mongodb.log mongod.lock test.0 test.1 test.ns
$ ls ./db2
local.0 local.ns mongodb.log.2013-04-12T13-42-52 test.0 test.ns
local.1 mongodb.log mongod.lock test.1

There's no `rollback' directory! The second upsert is gone! Am I missing something?
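
Just to be explicit about what I was looking for (hypothetical commands -- the directory and file simply do not exist here; the file name is only what a rollback file would typically look like):

$ ls ./db2/rollback
$ bsondump ./db2/rollback/test.coll.<timestamp>.0.bson | grep resp1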

Comment by Eliot Horowitz (Inactive) [ 12/Apr/13 ]

Do you have a script to do this or is it manual? Can you send code?

I'm a little confused by some of the steps, will look again, but have you considered the following:

T0: write to primary A
T1: w=1 response from A to client
T2: A shuts down before write replicated to B
T3: B becomes primary without write
T4: A restarts, rolls back write
T5: write is now gone, but is in the rollback file on A

have you checked the rollback file?

Have you tried this with w=2?
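
For example, in the 2.4 shell a w=2 acknowledged write would look roughly like this (collection and values are illustrative):

db.coll.update({key: 1}, {$set: {req: "req1"}}, true)
db.runCommand({getLastError: 1, w: 2, wtimeout: 5000})   // returns only after the write is on 2 members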

Comment by Dmitry Klionsky [ 12/Apr/13 ]

Step 5 should be `Kill 10.10.0.223 which is Primary'

Generated at Thu Feb 08 03:20:07 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.