[SERVER-9323] client session locks up when mongos socket goes into state warning: waiting for writeback 5166929d0000000000000492 from connection Created: 11/Apr/13  Updated: 08/Feb/23  Resolved: 25/Apr/13

Status: Closed
Project: Core Server
Component/s: Networking, Sharding
Affects Version/s: 2.2.3
Fix Version/s: None

Type: Bug Priority: Critical - P2
Reporter: James Blackburn Assignee: Spencer Brody (Inactive)
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
depends on SERVER-7771 WritebackListener thread can die whil... Closed
Operating System: ALL
Participants:

 Description   

Periodically (once per day, or so), we see a user session lockup. The application is blocked in socket receive on the client side.

On deeper investigation, we see in the mongos log, the following error associated with the clients connection:

Thu Apr 11 09:40:05 [mongosMain] connection accepted from 127.0.0.1:53783 #4647 (5 connections now open)
Thu Apr 11 09:40:05 [conn4647]  authenticate db: admin { authenticate: 1, user: "user", nonce: "c2254607da35344c", key: "0bf3cc880910cfc46a8
Thu Apr 11 09:40:05 [conn4647]  authenticate db: mongoose_user { authenticate: 1, user: "user", nonce: "290a3569e8124a12", key: "8316ad
Thu Apr 11 10:33:34 [conn4647] ChunkManager: time to load chunks for mongoose.centaur: 3ms sequenceNumber: 300 version: 184|1||50fe9e767e7521213b
Thu Apr 11 11:30:21 [conn4647] ChunkManager: time to load chunks for mongoose.centaur: 3ms sequenceNumber: 301 version: 186|1||50fe9e767e7521213b
Thu Apr 11 11:34:03 [conn4647] ChunkManager: time to load chunks for mongoose.centaur: 2ms sequenceNumber: 302 version: 187|3||50fe9e767e7521213b
Thu Apr 11 11:38:21 [conn4647] ChunkManager: time to load chunks for mongoose_user.centaur: 1ms sequenceNumber: 303 version: 40|1||514b1b95b
Thu Apr 11 11:38:33 [conn4647] warning: waiting for writeback 5166929d0000000000000492 from connection dlonapahls210.maninvestments.com:27118:427
Thu Apr 11 11:38:44 [conn4647] warning: waiting for writeback 5166929d0000000000000492 from connection dlonapahls210.maninvestments.com:27118:427
Thu Apr 11 11:38:55 [conn4647] warning: waiting for writeback 5166929d0000000000000492 from connection dlonapahls210.maninvestments.com:27118:427
...
Thu Apr 11 13:31:27 [conn4647] warning: waiting for writeback 5166929d0000000000000492 from connection dlonapahls210.maninvestments.com:27118:42718 for 3179 secs, currently at id 000000000000000000000000
Thu Apr 11 13:31:38 [conn4647] warning: waiting for writeback 5166929d0000000000000492 from connection dlonapahls210.maninvestments.com:27118:42718 for 3190 secs, currently at id 000000000000000000000000
Thu Apr 11 13:31:49 [conn4647] warning: waiting for writeback 5166929d0000000000000492 from connection dlonapahls210.maninvestments.com:27118:42718 for 3201 secs, currently at id 000000000000000000000000
Thu Apr 11 13:32:00 [conn4647] warning: waiting for writeback 5166929d0000000000000492 from connection dlonapahls210.maninvestments.com:27118:42718 for 3212 secs, currently at id 000000000000000000000000
Thu Apr 11 13:32:11 [conn4647] warning: waiting for writeback 5166929d0000000000000492 from connection dlonapahls210.maninvestments.com:27118:42718 for 3223 secs, currently at id 000000000000000000000000
Thu Apr 11 13:32:22 [conn4647] warning: waiting for writeback 5166929d0000000000000492 from connection dlonapahls210.maninvestments.com:27118:42718 for 3234 secs, currently at id 000000000000000000000000
Thu Apr 11 13:32:33 [conn4647] warning: waiting for writeback 5166929d0000000000000492 from connection dlonapahls210.maninvestments.com:27118:42718 for 3245 secs, currently at id 000000000000000000000000



 Comments   
Comment by Tad Marshall [ 07/Aug/13 ]

Hi Simone,

SUPPORT tickets are used when private data must be included in the ticket to address the problem, which is why you cannot view that ticket.

The resolution of SUPPORT-550 was that the root problem was fixed in SERVER-7771, which was fixed in MongoDB version 2.3.2 and the fix was backported to version 2.2.5.

I'll update this ticket with a link to SERVER-7771.

Thank you for bringing this omission to our attention!

If you find that you still have this problem after upgrading to version 2.2.5, please create a new ticket describing the problem you are having.

Tad

Comment by Simone Sciarrati [ 07/Aug/13 ]

Hi, we are having the same issue, I can't access SUPPORT-550, is this fixed in 2.2.4 or 2.2.5 or still unresolved?

S

Comment by Barrie Segal [ 25/Apr/13 ]

James,

Thanks for the info. We'll continue to work on this with you in SUPPORT-550.

Barrie

Comment by James Blackburn [ 24/Apr/13 ]

Hi Dan,

Sorry for the delay. One of our admins has managed to reproduce this twice today. I've attache mongos and mongod logs to:
SUPPORT-550

Do let us know if you need any more information.

Comment by Daniel Pasette (Inactive) [ 19/Apr/13 ]

Hi James, do you have log files from your mongod and mongos from the time period of the error?

Comment by James Blackburn [ 11/Apr/13 ]

The upper bit of the mongos log slightly truncated. Also the waiting field wraps at 3598...

Thu Apr 11 09:40:05 [mongosMain] connection accepted from 127.0.0.1:53783 #4647 (5 connections now open)
Thu Apr 11 09:40:05 [conn4647]  authenticate db: admin { authenticate: 1, user: "user", nonce: "c2254607da35344c", key: "0bf3cc880910cfc46a8a040197c1843b" }
Thu Apr 11 09:40:05 [conn4647]  authenticate db: mongoose_user { authenticate: 1, user: "user", nonce: "290a3569e8124a12", key: "8316ad6b039c4f9f7c67d315eb0ecbeb" }
Thu Apr 11 10:33:34 [conn4647] ChunkManager: time to load chunks for mongoose.centaur: 3ms sequenceNumber: 300 version: 184|1||50fe9e767e7521213b281407 based on: 183|13||50fe9e767e7521213b281407
Thu Apr 11 11:30:21 [conn4647] ChunkManager: time to load chunks for mongoose.centaur: 3ms sequenceNumber: 301 version: 186|1||50fe9e767e7521213b281407 based on: 184|1||50fe9e767e7521213b281407
Thu Apr 11 11:34:03 [conn4647] ChunkManager: time to load chunks for mongoose.centaur: 2ms sequenceNumber: 302 version: 187|3||50fe9e767e7521213b281407 based on: 186|1||50fe9e767e7521213b281407
Thu Apr 11 11:38:21 [conn4647] ChunkManager: time to load chunks for mongoose_user.centaur: 1ms sequenceNumber: 303 version: 40|1||514b1b95bf76a5d0f0c31e5e based on: 38|7||514b1b95bf76a5d0f0c31e5e
Thu Apr 11 11:38:33 [conn4647] warning: waiting for writeback 5166929d0000000000000492 from connection dlonapahls210.maninvestments.com:27118:42718 for 11 secs, currently at id 000000000000000000000000
Thu Apr 11 11:38:44 [conn4647] warning: waiting for writeback 5166929d0000000000000492 from connection dlonapahls210.maninvestments.com:27118:42718 for 22 secs, currently at id 000000000000000000000000
Thu Apr 11 11:38:55 [conn4647] warning: waiting for writeback 5166929d0000000000000492 from connection dlonapahls210.maninvestments.com:27118:42718 for 33 secs, currently at id 000000000000000000000000
Thu Apr 11 11:39:06 [conn4647] warning: waiting for writeback 5166929d0000000000000492 from connection dlonapahls210.maninvestments.com:27118:42718 for 44 secs, currently at id 000000000000000000000000
...
Thu Apr 11 12:37:35 [conn4647] warning: waiting for writeback 5166929d0000000000000492 from connection dlonapahls210.maninvestments.com:27118:42718 for 3553 secs, currently at id 000000000000000000000000
Thu Apr 11 12:37:46 [conn4647] warning: waiting for writeback 5166929d0000000000000492 from connection dlonapahls210.maninvestments.com:27118:42718 for 3564 secs, currently at id 000000000000000000000000
Thu Apr 11 12:37:57 [conn4647] warning: waiting for writeback 5166929d0000000000000492 from connection dlonapahls210.maninvestments.com:27118:42718 for 3575 secs, currently at id 000000000000000000000000
Thu Apr 11 12:38:08 [conn4647] warning: waiting for writeback 5166929d0000000000000492 from connection dlonapahls210.maninvestments.com:27118:42718 for 3586 secs, currently at id 000000000000000000000000
Thu Apr 11 12:38:19 [conn4647] warning: waiting for writeback 5166929d0000000000000492 from connection dlonapahls210.maninvestments.com:27118:42718 for 3597 secs, currently at id 000000000000000000000000
Thu Apr 11 12:38:38 [conn4647] warning: waiting for writeback 5166929d0000000000000492 from connection dlonapahls210.maninvestments.com:27118:42718 for 11 secs, currently at id 000000000000000000000000
Thu Apr 11 12:38:49 [conn4647] warning: waiting for writeback 5166929d0000000000000492 from connection dlonapahls210.maninvestments.com:27118:42718 for 22 secs, currently at id 000000000000000000000000
Thu Apr 11 12:39:00 [conn4647] warning: waiting for writeback 5166929d0000000000000492 from connection dlonapahls210.maninvestments.com:27118:42718 for 33 secs, currently at id 000000000000000000000000
Thu Apr 11 12:39:11 [conn4647] warning: waiting for writeback 5166929d0000000000000492 from connection dlonapahls210.maninvestments.com:27118:42718 for 44 secs, currently at id 000000000000000000000000
Thu Apr 11 12:39:22 [conn4647] warning: waiting for writeback 5166929d0000000000000492 from connection dlonapahls210.maninvestments.com:27118:42718 for 55 secs, currently at id 000000000000000000000000
Thu Apr 11 12:39:33 [conn4647] warning: waiting for writeback 5166929d0000000000000492 from connection dlonapahls210.maninvestments.com:27118:42718 for 66 secs, currently at id 000000000000000000000000
Thu Apr 11 12:39:44 [conn4647] warning: waiting for writeback 5166929d0000000000000492 from connection dlonapahls210.maninvestments.com:27118:42718 for 77 secs, currently at id 000000000000000000000000
Thu Apr 11 12:39:55 [conn4647] warning: waiting for writeback 5166929d0000000000000492 from connection dlonapahls210.maninvestments.com:27118:42718 for 88 secs, currently at id 000000000000000000000000
...
Thu Apr 11 13:47:20 [conn4647] warning: waiting for writeback 5166929d0000000000000492 from connection dlonapahls210.maninvestments.com:27118:42718 for 528 secs, currently at id 000000000000000000000000
Thu Apr 11 13:47:31 [conn4647] warning: waiting for writeback 5166929d0000000000000492 from connection dlonapahls210.maninvestments.com:27118:42718 for 539 secs, currently at id 000000000000000000000000
Thu Apr 11 13:47:42 [conn4647] warning: waiting for writeback 5166929d0000000000000492 from connection dlonapahls210.maninvestments.com:27118:42718 for 550 secs, currently at id 000000000000000000000000
Thu Apr 11 13:47:53 [conn4647] warning: waiting for writeback 5166929d0000000000000492 from connection dlonapahls210.maninvestments.com:27118:42718 for 561 secs, currently at id 000000000000000000000000
Thu Apr 11 13:48:04 [conn4647] warning: waiting for writeback 5166929d0000000000000492 from connection dlonapahls210.maninvestments.com:27118:42718 for 572 secs, currently at id 000000000000000000000000
Thu Apr 11 13:48:15 [conn4647] warning: waiting for writeback 5166929d0000000000000492 from connection dlonapahls210.maninvestments.com:27118:42718 for 583 secs, currently at id 000000000000000000000000

Comment by James Blackburn [ 11/Apr/13 ]

On the client side, the hang happens in:

  File "/users/is/ahlpypi/egg_cache/p/pymongo-2.4.2-py2.6-linux-x86_64.egg/pymongo/collection.py", line 481, in update
    check_keys, self.__uuid_subtype), safe)
  File "/users/is/ahlpypi/egg_cache/p/pymongo-2.4.2-py2.6-linux-x86_64.egg/pymongo/mongo_client.py", line 843, in _send_message
    sock_info)
  File "/users/is/ahlpypi/egg_cache/p/pymongo-2.4.2-py2.6-linux-x86_64.egg/pymongo/mongo_client.py", line 878, in __receive_message_on_socket
    header = self.__receive_data_on_socket(16, sock_info)
  File "/users/is/ahlpypi/egg_cache/p/pymongo-2.4.2-py2.6-linux-x86_64.egg/pymongo/mongo_client.py", line 866, in __receive_data_on_socket
    chunk = sock_info.sock.recv(length)

Generated at Thu Feb 08 03:20:03 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.