[SERVER-16184] repeated WriteBackListener error on out-of-order upgrade Created: 17/Nov/14  Updated: 11/Jul/16  Resolved: 26/Nov/14

Status: Closed
Project: Core Server
Component/s: Logging, Sharding
Affects Version/s: 2.8.0-rc0
Fix Version/s: 2.8.0-rc2

Type: Bug Priority: Major - P3
Reporter: Kay Agahd Assignee: Randolph Tan
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
is duplicated by SERVER-15938 Running enablesharding using 2.8 mong... Closed
Related
Operating System: Linux
Participants:

 Description   

When upgrading a cluster out-of-order, old v2.6 mongoses do not handle the lack of a WBL command in v2.8 mongods gracefully. There should be as clear an error as possible with information about how to proceed (maybe a link to the docs?).

Original Description:
We have a sharded cluster running mongodb v2.6.4 with authentication. Each of the 7 shards has 3 replicaset members. We added one replicaset member running v2.8-rc0 to 3 of the 7 shards without any problem. These 3 v2.8-rc0 replicaset members (called s571, s572, s573) used wiredTiger storage engine, compression set to zlib.
Besides the 3 confiservers, we also have 3 routers, each running v2.6.4.

Out of the sky the routers logged every millisecond the same WriteBackListener error. The logged ObjectId was always the same.
It begun with one of the v2.8 servers (s571). Round about 20 minutes later another v2.8 server was concerned (s572). Another 7 minutes later, also the third v2.8 server (s576) was logged (see logs below).
Why different shard members can be logged with the same ObjectId?

Besides these errors, the whole cluster worked well. One of the v2.8 servers was even elected primary.

The problem was that all routers wrote so many log lines (multiple GB's) that the disk was saturated after a few hours.

Since we didn't know yet the source of the problem, we removed all three v2.8 replicaset members from the three shards which stopped immediately the massive logging of the routers.

In mongodb's documentation we've read that it's possible to have a mixed replicaset consisting of v2.6 and v2.8 replSet members. Is that not true for sharding?

2014-11-16T21:28:30.393+0100 [conn8766077] end connection 172.16.64.13:44839 (3165 connections now open)
2014-11-16T21:28:30.415+0100 [conn8766043] end connection 172.16.64.13:44737 (3164 connections now open)
2014-11-16T21:28:30.457+0100 [mongosMain] connection accepted from 172.16.64.233:55931 #8766473 (3165 connections now open)
2014-11-16T21:28:30.460+0100 [conn8766473]  authenticate db: offerStore { authenticate: 1, user: "pl-shopImp", nonce: "xxx", key: "xxx" }
2014-11-16T21:28:30.635+0100 [conn8766058] end connection 172.16.64.13:44771 (3164 connections now open)
2014-11-16T21:28:30.654+0100 [conn8766030] end connection 172.16.64.13:44703 (3163 connections now open)
2014-11-16T21:28:30.765+0100 [conn8766032] end connection 172.16.64.13:44710 (3162 connections now open)
2014-11-16T21:28:30.799+0100 [conn8766054] end connection 172.16.64.13:44756 (3161 connections now open)
2014-11-16T21:28:30.810+0100 [conn8766025] end connection 172.16.64.13:44678 (3160 connections now open)
2014-11-16T21:28:30.865+0100 [conn8766039] end connection 172.16.64.13:44731 (3159 connections now open)
2014-11-16T21:28:30.869+0100 [conn8766067] end connection 172.16.64.13:44800 (3158 connections now open)
2014-11-16T21:28:30.926+0100 [conn8766028] end connection 172.16.64.13:44691 (3157 connections now open)
2014-11-16T21:28:31.079+0100 [conn8766056] end connection 172.16.64.13:44765 (3156 connections now open)
2014-11-16T21:28:31.090+0100 [conn8766064] end connection 172.16.64.13:44789 (3155 connections now open)
2014-11-16T21:28:31.157+0100 [conn8766034] end connection 172.16.64.13:44716 (3154 connections now open)
2014-11-16T21:28:31.202+0100 [conn8766026] end connection 172.16.64.13:44684 (3153 connections now open)
2014-11-16T21:28:31.554+0100 [mongosMain] connection accepted from 172.16.65.106:49683 #8766474 (3154 connections now open)
2014-11-16T21:28:31.557+0100 [conn8766474]  authenticate db: offerStore { authenticate: 1, user: "pl-shopImp", nonce: "xxx", key: "xxx" }
2014-11-16T21:28:31.563+0100 [conn8765935] end connection 172.16.64.245:52933 (3153 connections now open)
2014-11-16T21:28:31.851+0100 [conn8766078] end connection 172.16.64.13:44841 (3152 connections now open)
2014-11-16T21:28:31.922+0100 [WriteBackListener-s571:27017] writebacklisten command failed!  { ok: 0.0, errmsg: "no such cmd: writebacklisten", code: 59, bad cmd: { writebacklisten: ObjectId('5436835a68ed2a107cf9b3cf') } }
2014-11-16T21:28:31.922+0100 [WriteBackListener-s571:27017] writebacklisten command failed!  { ok: 0.0, errmsg: "no such cmd: writebacklisten", code: 59, bad cmd: { writebacklisten: ObjectId('5436835a68ed2a107cf9b3cf') } }
2014-11-16T21:28:31.923+0100 [WriteBackListener-s571:27017] writebacklisten command failed!  { ok: 0.0, errmsg: "no such cmd: writebacklisten", code: 59, bad cmd: { writebacklisten: ObjectId('5436835a68ed2a107cf9b3cf') } }
2014-11-16T21:28:31.924+0100 [WriteBackListener-s571:27017] writebacklisten command failed!  { ok: 0.0, errmsg: "no such cmd: writebacklisten", code: 59, bad cmd: { writebacklisten: ObjectId('5436835a68ed2a107cf9b3cf') } }
2014-11-16T21:28:31.924+0100 [WriteBackListener-s571:27017] writebacklisten command failed!  { ok: 0.0, errmsg: "no such cmd: writebacklisten", code: 59, bad cmd: { writebacklisten: ObjectId('5436835a68ed2a107cf9b3cf') } }
2014-11-16T21:28:31.925+0100 [WriteBackListener-s571:27017] writebacklisten command failed!  { ok: 0.0, errmsg: "no such cmd: writebacklisten", code: 59, bad cmd: { writebacklisten: ObjectId('5436835a68ed2a107cf9b3cf') } }
2014-11-16T21:28:31.926+0100 [WriteBackListener-s571:27017] writebacklisten command failed!  { ok: 0.0, errmsg: "no such cmd: writebacklisten", code: 59, bad cmd: { writebacklisten: ObjectId('5436835a68ed2a107cf9b3cf') } }
2014-11-16T21:28:31.926+0100 [WriteBackListener-s571:27017] writebacklisten command failed!  { ok: 0.0, errmsg: "no such cmd: writebacklisten", code: 59, bad cmd: { writebacklisten: ObjectId('5436835a68ed2a107cf9b3cf') } }
...
2014-11-16T21:47:07.477+0100 [WriteBackListener-s571:27017] writebacklisten command failed!  { ok: 0.0, errmsg: "no such cmd: writebacklisten", code: 59, bad cmd: { writebacklisten: ObjectId('5436835a68ed2a107cf9b3cf') } }
2014-11-16T21:47:07.478+0100 [WriteBackListener-s571:27017] writebacklisten command failed!  { ok: 0.0, errmsg: "no such cmd: writebacklisten", code: 59, bad cmd: { writebacklisten: ObjectId('5436835a68ed2a107cf9b3cf') } }
2014-11-16T21:47:07.479+0100 [WriteBackListener-s571:27017] writebacklisten command failed!  { ok: 0.0, errmsg: "no such cmd: writebacklisten", code: 59, bad cmd: { writebacklisten: ObjectId('5436835a68ed2a107cf9b3cf') } }
2014-11-16T21:47:07.480+0100 [WriteBackListener-s571:27017] writebacklisten command failed!  { ok: 0.0, errmsg: "no such cmd: writebacklisten", code: 59, bad cmd: { writebacklisten: ObjectId('5436835a68ed2a107cf9b3cf') } }
2014-11-16T21:47:07.480+0100 [WriteBackListener-s571:27017] writebacklisten command failed!  { ok: 0.0, errmsg: "no such cmd: writebacklisten", code: 59, bad cmd: { writebacklisten: ObjectId('5436835a68ed2a107cf9b3cf') } }
2014-11-16T21:47:07.481+0100 [WriteBackListener-s572:27017] writebacklisten command failed!  { ok: 0.0, errmsg: "no such cmd: writebacklisten", code: 59, bad cmd: { writebacklisten: ObjectId('5436835a68ed2a107cf9b3cf') } }
2014-11-16T21:47:07.481+0100 [WriteBackListener-s571:27017] writebacklisten command failed!  { ok: 0.0, errmsg: "no such cmd: writebacklisten", code: 59, bad cmd: { writebacklisten: ObjectId('5436835a68ed2a107cf9b3cf') } }
2014-11-16T21:47:07.481+0100 [WriteBackListener-s572:27017] writebacklisten command failed!  { ok: 0.0, errmsg: "no such cmd: writebacklisten", code: 59, bad cmd: { writebacklisten: ObjectId('5436835a68ed2a107cf9b3cf') } }
2014-11-16T21:47:07.482+0100 [WriteBackListener-s572:27017] writebacklisten command failed!  { ok: 0.0, errmsg: "no such cmd: writebacklisten", code: 59, bad cmd: { writebacklisten: ObjectId('5436835a68ed2a107cf9b3cf') } }
2014-11-16T21:47:07.482+0100 [WriteBackListener-s572:27017] writebacklisten command failed!  { ok: 0.0, errmsg: "no such cmd: writebacklisten", code: 59, bad cmd: { writebacklisten: ObjectId('5436835a68ed2a107cf9b3cf') } }
2014-11-16T21:47:07.482+0100 [WriteBackListener-s572:27017] writebacklisten command failed!  { ok: 0.0, errmsg: "no such cmd: writebacklisten", code: 59, bad cmd: { writebacklisten: ObjectId('5436835a68ed2a107cf9b3cf') } }
2014-11-16T21:47:07.483+0100 [WriteBackListener-s571:27017] writebacklisten command failed!  { ok: 0.0, errmsg: "no such cmd: writebacklisten", code: 59, bad cmd: { writebacklisten: ObjectId('5436835a68ed2a107cf9b3cf') } }
...
2014-11-16T21:54:18.923+0100 [WriteBackListener-s573:27017] writebacklisten command failed!  { ok: 0.0, errmsg: "no such cmd: writebacklisten", code: 59, bad cmd: { writebacklisten: ObjectId('5436835a68ed2a107cf9b3cf') } }
2014-11-16T21:54:18.923+0100 [WriteBackListener-s572:27017] writebacklisten command failed!  { ok: 0.0, errmsg: "no such cmd: writebacklisten", code: 59, bad cmd: { writebacklisten: ObjectId('5436835a68ed2a107cf9b3cf') } }
2014-11-16T21:54:18.923+0100 [WriteBackListener-s571:27017] writebacklisten command failed!  { ok: 0.0, errmsg: "no such cmd: writebacklisten", code: 59, bad cmd: { writebacklisten: ObjectId('5436835a68ed2a107cf9b3cf') } }
2014-11-16T21:54:18.923+0100 [WriteBackListener-s573:27017] writebacklisten command failed!  { ok: 0.0, errmsg: "no such cmd: writebacklisten", code: 59, bad cmd: { writebacklisten: ObjectId('5436835a68ed2a107cf9b3cf') } }



 Comments   
Comment by Githook User [ 25/Nov/14 ]

Author:

{u'username': u'renctan', u'name': u'Randolph Tan', u'email': u'randolph@10gen.com'}

Message: SERVER-16184 repeated WriteBackListener error on out-of-order upgrade

Abort v2.6 mongos when writebackListen command is not supported.
Branch: v2.6
https://github.com/mongodb/mongo/commit/4762a4c64a6d8cf89c4ca0e4a8030d6b41119cd9

Comment by Githook User [ 24/Nov/14 ]

Author:

{u'username': u'renctan', u'name': u'Randolph Tan', u'email': u'randolph@10gen.com'}

Message: SERVER-16184 Fix windows
Branch: master
https://github.com/mongodb/mongo/commit/02077a6e7d7215af65cfd88beda865672d45a442

Comment by Githook User [ 24/Nov/14 ]

Author:

{u'username': u'renctan', u'name': u'Randolph Tan', u'email': u'randolph@10gen.com'}

Message: SERVER-16184 repeated WriteBackListener error on out-of-order upgrade
Branch: master
https://github.com/mongodb/mongo/commit/3abc8d35f91605641b3e694c10ad987f64a84d31

Comment by Kay Agahd [ 17/Nov/14 ]

Thanks Jason for the fast and helpful reply!

Comment by J Rassi [ 17/Nov/14 ]

If you wish to beta-test 2.8.0-rc0, you should upgrade mongos to 2.8.0-rc0 before upgrading any of your mongod instances to 2.8.0-rc0. Upgrading mongos will resolve the errors you're encountering.

Note also that 2.8.0-rc0 is a development release, not a stable release. Do not run 2.8.0-rc0 on a production system.

Comment by Kay Agahd [ 17/Nov/14 ]

I meant "also the third v2.8 server (s573)" not s576 (just a typo)

Generated at Thu Feb 08 03:40:14 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.