[JAVA-263] NullPointerException due to race condition during concurrent access to DBTCPTransport Created: 01/Feb/11  Updated: 17/Mar/11  Resolved: 22/Feb/11

Status: Closed
Project: Java Driver
Component/s: None
Affects Version/s: 2.4, 2.5
Fix Version/s: 2.5

Type: Bug Priority: Critical - P2
Reporter: Mike Copley Assignee: Antoine Girbal
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Single MongoDB instance, Java 6 (OS X 10.6.6).


Backwards Compatibility: Fully Compatible

 Description   

DBTCPConnector._set() is susceptible to a race condition when two threads call it at the same time. One thread can mistakenly conclude that _masterPortPool is set while it is still null, and then go on to cause a NullPointerException in DBTCPConnector$MyPort.get().

Further detail:

I have two threads invoking DB.getCollection("differentCollectionForEachThread").drop() at approximately the same time. These are the first connections to Mongo. Approximately every 2nd run of this causes a NullPointerException in DBTCPConnector$MyPort.get(). I've tried to make a simple test app to reproduce, but can't - timing issues are tricky to replicate.

The problem occurs here:

private boolean _set( ServerAddress addr ) {
    if ( _curMaster == addr ) // should check that _masterPortPool != null
        return false;
    _curMaster = addr; // _curMaster set before _masterPortPool. At this point _masterPortPool is still null.
    _masterPortPool = _portHolder.get( addr );
    return true;
}

Then in MyPort.get() the NPE happens:

_pool = _masterPortPool;
DBPort p = _pool.get(); // NPE

In the above code,

  • Thread 1 enters _set() first, sets _curMaster = addr.
  • Thread 2 enters _set() next, sees _curMaster already == addr so exits early.
  • Thread 2 enters get(), assigns _masterPortPool (still null) to _pool, then calls _pool.get(), which throws the NPE.
  • Thread 1 continues in _set() and sets _masterPortPool to the port pool for addr.
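
Because the timing is hard to hit with real threads, the interleaving listed above can be replayed deterministically on a single thread. The following is only a minimal sketch: InterleavingReplay and its fields are hypothetical stand-ins for the connector state, not the driver's classes, and _set() is split into two methods purely so the steps can be ordered by hand.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the connector state; not the driver's real class.
final class InterleavingReplay {
    private Object curMaster;            // plays the role of _curMaster
    private List<Object> masterPortPool; // plays the role of _masterPortPool

    // First half of the original _set(): returns true when the early exit is taken.
    boolean checkAndAssignMaster(Object addr) {
        if (curMaster == addr) {
            return true;
        }
        curMaster = addr;
        return false;
    }

    // Second half of the original _set(): the pool is assigned only now.
    void assignPool(Object addr) {
        masterPortPool = new ArrayList<Object>();
        masterPortPool.add(addr);
    }

    // Mirrors MyPort.get(): dereferences the pool without a null check.
    int get() {
        return masterPortPool.size();
    }

    // Replays the four steps in the order described in the list above.
    static String replay() {
        InterleavingReplay c = new InterleavingReplay();
        Object addr = new Object();
        c.checkAndAssignMaster(addr);                     // Thread 1: sets curMaster
        boolean earlyExit = c.checkAndAssignMaster(addr); // Thread 2: exits early
        String result;
        try {
            c.get();                                      // Thread 2: pool is still null
            result = "no NPE";
        } catch (NullPointerException e) {
            result = "NPE (earlyExit=" + earlyExit + ")";
        }
        c.assignPool(addr);                               // Thread 1: too late
        return result;
    }
}
```

Calling InterleavingReplay.replay() returns "NPE (earlyExit=true)", which matches the failure described in this report.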

Stack Trace (based on git head, commit a052b4f35af4069121cbf47adc88d6199563c4d4):

java.lang.NullPointerException
at com.mongodb.DBTCPConnector$MyPort.get(DBTCPConnector.java:296)
at com.mongodb.DBTCPConnector.call(DBTCPConnector.java:205)
at com.mongodb.DBApiLayer$MyCollection.__find(DBApiLayer.java:271)
at com.mongodb.DB.command(DB.java:154)
at com.mongodb.DB.command(DB.java:139)
at com.mongodb.DBCollection.drop(DBCollection.java:683)

Suggested fix:

private boolean _set( ServerAddress addr ) {
    if ( _curMaster == addr && _masterPortPool != null )
        return false;
    _curMaster = addr;
    _masterPortPool = _portHolder.get( addr );
    return true;
}
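
The extra null check still leaves two separate fields that are written one after the other. One possible alternative, shown here only as a sketch and not as the change that was committed, is to keep the address and its pool in a single immutable object and publish that object in one step, so a reader sees either no master at all or a complete pair. All names below (MasterSnapshot, SnapshotConnector, the "pool-for-" string standing in for a real port pool) are hypothetical.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-ins; none of these names come from the driver.
final class MasterSnapshot {
    final String addr;     // plays the role of _curMaster
    final String poolName; // plays the role of _masterPortPool

    MasterSnapshot(String addr, String poolName) {
        this.addr = addr;
        this.poolName = poolName;
    }
}

final class SnapshotConnector {
    private final AtomicReference<MasterSnapshot> master =
            new AtomicReference<MasterSnapshot>();

    // Returns true if the master changed, false if addr was already current.
    boolean set(String addr) {
        MasterSnapshot cur = master.get();
        if (cur != null && cur.addr.equals(addr)) {
            return false;
        }
        // Build the complete pair first, then publish it in one step.
        master.set(new MasterSnapshot(addr, "pool-for-" + addr));
        return true;
    }

    // Readers take one snapshot, so addr and pool always belong together.
    String poolName() {
        MasterSnapshot cur = master.get();
        if (cur == null) {
            throw new IllegalStateException("master has not been set");
        }
        return cur.poolName;
    }
}
```

Two threads racing in set() may each build a snapshot, and the last write wins, but neither can expose an address without its pool.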

 Comments   
Comment by Antoine Girbal [ 22/Feb/11 ]

considering resolved until further report

Comment by Antoine Girbal [ 17/Feb/11 ]

did some refactoring for dbtcpconnector.
Basically removed class variables that were redundant and could lead to inconsistent states.
The NPE should be gone.
The masterPortPool can only be null at startup, if it was never set.
Even then:

  • for single server, it will be set to that server in constructor
  • for repl set, it will probably be set by the 1st operation that happens (call to checkmaster)
  • in case all servers of the repl set are down from driver start up, it may remain null, and an appropriate MongoException is thrown.

Comment by auto [ 17/Feb/11 ]

Author: agirbal <antoine@10gen.com>

Message: JAVA-263: added NPE handling in rare edge case
https://github.com/mongodb/mongo-java-driver/commit/403d28d87b78fe84e4945c221272d74b53e5f4f3

Comment by auto [ 17/Feb/11 ]

Author: agirbal <antoine@10gen.com>

Message: JAVA-263: NullPointerException due to race condition during concurrent access to DBTCPTransport, comprehensive refactoring
https://github.com/mongodb/mongo-java-driver/commit/8a955e265e40160742c8782d5e8284d1c0bed7b0

Comment by Antoine Girbal [ 16/Feb/11 ]

looking at it

Comment by Mike Copley [ 16/Feb/11 ]

Sounds like this case needs to be reopened then. Perhaps a more comprehensive synchronized block is required for this task.
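
For illustration, a broader synchronized region of the kind suggested here could look roughly like the sketch below. It is an assumption about the shape of such a fix, not the driver's code: SynchronizedConnector, its lock object, and the StringBuilder standing in for the port pool are all hypothetical. Both the writer and the reader take the same lock, so the reader can never observe the address without its pool.

```java
// Hypothetical stand-in for the connector; not the driver's real class.
final class SynchronizedConnector {
    private final Object lock = new Object();
    private Object curMaster;             // plays the role of _curMaster
    private StringBuilder masterPortPool; // plays the role of _masterPortPool

    // Both fields are checked and updated while holding the lock.
    boolean set(Object addr) {
        synchronized (lock) {
            if (curMaster == addr && masterPortPool != null) {
                return false;
            }
            masterPortPool = new StringBuilder("pool-for-" + addr);
            curMaster = addr;
            return true;
        }
    }

    // The reader takes the same lock and fails clearly instead of with an NPE.
    String poolName() {
        synchronized (lock) {
            if (masterPortPool == null) {
                throw new IllegalStateException("master has not been set");
            }
            return masterPortPool.toString();
        }
    }
}
```

The trade-off is that every read now contends for the lock, which may matter on a hot path such as fetching a port for each operation.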

Comment by David Dawson [ 14/Feb/11 ]

We've seen this error quite a bit. We took 2.5 trunk (with this fix applied) and ran with that for a while to see what results we would get. However, we are still seeing an occasional NPE in this code (although much reduced):

java.lang.NullPointerException
at com.mongodb.DBTCPConnector$MyPort.get(DBTCPConnector.java:302)
at com.mongodb.DBTCPConnector.call(DBTCPConnector.java:206)
at com.mongodb.DBApiLayer$MyCollection.__find(DBApiLayer.java:271)
at com.mongodb.DBCursor._check(DBCursor.java:342)
at com.mongodb.DBCursor._hasNext(DBCursor.java:472)
at com.mongodb.DBCursor.hasNext(DBCursor.java:497)
.......

Comment by auto [ 08/Feb/11 ]

Author: agirbal <antoine@10gen.com>

Message: JAVA-263: didnt mean to make it synchronized
https://github.com/mongodb/mongo-java-driver/commit/eb8d14e0ce975b1db2bb3742d1f25e1227d42825

Comment by Antoine Girbal [ 08/Feb/11 ]

thanks for report, fixed.
There is still a race condition whereby _currentMaster and _masterPortPool may point to different servers from one thread's point of view.
But that should not be a problem and will become consistent quickly.

Comment by auto [ 08/Feb/11 ]

Author: agirbal <antoine@10gen.com>

Message: JAVA-263: NullPointerException due to race condition during concurrent access to DBTCPTransport
https://github.com/mongodb/mongo-java-driver/commit/b0e163ea02057f994464eaec5c5873be99332597

Generated at Thu Feb 08 08:51:52 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.