[JAVA-593] NullPointerException at com.mongodb.WriteResult.getN() Created: 02/Jul/12  Updated: 19/Oct/16  Resolved: 30/Jul/12

Status: Closed
Project: Java Driver
Component/s: None
Affects Version/s: 2.7.3
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Kay Agahd Assignee: Jeffrey Yemin
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Linux 64 Bit


Issue Links:
Related
is related to SERVER-6581 mongos GLE needs to report errmsg whe... Closed

 Description   

We may get a NPE on calling db.getCollection("offers").update(query, updates, false, true);

Caused by: java.lang.NullPointerException: no value for: n
	at org.bson.BasicBSONObject.getInt(BasicBSONObject.java:117) ~[na:na]
	at com.mongodb.WriteResult.getN(WriteResult.java:127) ~[na:na]
	...

We see the error from time to time in our logs and don't know yet how to repoduce it.

While tracing down the issue I found another bug:
https://jira.mongodb.org/browse/SERVER-6279
However, I don't know whether they are related or not.



 Comments   
Comment by Hendy Irawan [ 29/Apr/14 ]

may be related to JAVA-1193

Comment by Jeffrey Yemin [ 30/Jul/12 ]

OK, thanks for the feedback. Closing this out now.

Comment by Kay Agahd [ 30/Jul/12 ]

The error is gone since we replaced all SATA-HD's by SSD's. Since then, the system looks much more fluid. This confirms the suspicion that the above error is only thrown when mongo is overloaded.
We replaced HD's by SSD's because we observed high random IO which mongo was obviously not able to serialize more. For example, sometimes we observed up to 500 write requests per second even though only 3-5 MB per second were written to disk. Now, having SSD's inside, we may have up to 40,000 write requests per second which are still handled well.
In this case, scaling vertically was much cheaper than scaling horizontally since 1 server with HD's costs round about 5,000 EUR, thus each supplementary shard would have cost 15,000 EUR. Using SSD's, we need half of the hardware we used before by having much better performance.

Comment by Kay Agahd [ 10/Jul/12 ]

Thanks Scott. I've created a private JIRA with logs of mongos and the config dump here:
https://jira.mongodb.org/browse/SUPPORT-291

For now, we will use WriteConcern.SAFE to see if it fixes the NPE.

Comment by Scott Hernandez (Inactive) [ 10/Jul/12 ]

There is no indication from the output that it is new since it contains no date so I just wanted to make sure the output hadn't changed.

If you are going to be using WriteResult.getN() to check the status of the write then you should use WriteConcern.SAFE since you are effectively doing that anyway so there is no advantage to doing the error detection yourself, nor any change in performance. Basically, if you care about the status of the write, and check any properties of the WriteResult, you should always be using WriteConcern.SAFE or higher.

Can you upload the logs from the mongos server at covering 16:12:01 and provide a mongodump of the config db? As this is not a java issue it might be best to create a Core Server or Community Private issue to investigate the server part – why there is no error message coming back for the failed write.

Comment by Kay Agahd [ 10/Jul/12 ]

As I've written and as you can see, the string did not change.

We don't use WriteConcern.SAFE to speed-up updates. However, we should read the ok value so the client can repeat its update in case of failure. Is it possible to check the ok value using WriteConcern.NORMAL or do we need WriteConcern.SAFE for this?

Comment by Scott Hernandez (Inactive) [ 10/Jul/12 ]

Now that you have started using the new constructor, without a list of servers since it just a single mongos instance, has that string changed? Is this the new string?

Fundamentally you need to a WriteConcern.SAFE and not NORMAL if you are using the code you provided. Your code never checks to make sure that everything was okay with the write, and it wasn't as you can see from the ok value of 0 (false). This still seems like a strange response from the mongos instance but is valid, yet missing some data for the error message.

Comment by Kay Agahd [ 10/Jul/12 ]

Just for completeness:

16:12:01,052 ERROR OfferStoreMongo:410 [pool-9-thread-18] - Error while updating offers
java.lang.NullPointerException: no value for: n
        at org.bson.BasicBSONObject.getInt(BasicBSONObject.java:118) ~[mongo-java-driver-2.8.0.jar:na]
        at com.mongodb.WriteResult.getN(WriteResult.java:127) ~[mongo-java-driver-2.8.0.jar:na]
        ...
16:12:01,064 ERROR OfferStoreMongo:412 [pool-9-thread-18] - WriteResult.toString: { "serverUsed" : "sx210.ipx/172.16.64.161:27018" , "ok" : 0.0 , "errmsg" : ""}

Comment by Kay Agahd [ 10/Jul/12 ]

Yes, it's exactly the same String that I've posted above at Jul 04 2012 09:13:52 AM UTC

Comment by Scott Hernandez (Inactive) [ 09/Jul/12 ]

It is possible; can you provide the toString() value for the WriteResult when this happens?

Comment by Kay Agahd [ 09/Jul/12 ]

Please reopen. We found the same bug in our logs today even when it worked without problems during the whole weekend. As suggested, we are using the single argument constructor.
The NPE happend when mongo was quite busy. Could it be related?

Comment by Jeffrey Yemin [ 06/Jul/12 ]

Closing for real now.

Comment by Kay Agahd [ 06/Jul/12 ]

Yes it seems to fix the bug - no NPE last night!

Comment by Jeffrey Yemin [ 05/Jul/12 ]

Apologies, shouldn't have closed until you confirmed that.

Comment by Kay Agahd [ 05/Jul/12 ]

I'll keep you up to date if this workaround solved the NPE.

Comment by Jeffrey Yemin [ 05/Jul/12 ]

OK, closing this out then. Thanks for the report.

Comment by Kay Agahd [ 05/Jul/12 ]

I found a solution to my question above: my connection code sends the command replSetGetStatus to the admin db of the first given server address. If the result contains a field

{"info":"mongos"}

then it will use the single argument constructor and returns the mongo instance to the client application.
However, this is not optimal since it requires an additional temporary connection when a list of server addresses is given. It's rather a quick & dirty workaround until the next driver version (2.9.0) is released.

Comment by Kay Agahd [ 04/Jul/12 ]

Thanks, good to know!
Our connection code is also used by applications which connect to a replica-set instead to a mongos. The applications should not know whether they are connecting to a mongod or mongos. So, how can our connection code determine at runtime whether the given server addresses are mongod's or mongos'es, in order to use the single address constructor in case of mongos? Is it possible to connect to the a server to determine its type? How? Thanks!

Comment by Jeffrey Yemin [ 04/Jul/12 ]

Ah, so you're passing a list of mongos addresses to a constructor that is expecting a list of replica set members. The driver currently does not support load balancing across multiple mongos instances. Can you change your code to use the Mongo constructor that takes a single address and see if the error goes away? I suspect that it will.

Note that in the next driver release we will start supporting mongos failover (JAVA-381).

Comment by Kay Agahd [ 04/Jul/12 ]

VoilĂ :

mongos> sh.status()
--- Sharding Status --- 
  sharding version: { "_id" : 1, "version" : 3 }
  shards:
	{  "_id" : "offerStoreDE1",  "host" : "offerStoreDE1/s118:27018,s126:27018,s131:27018" }
	{  "_id" : "offerStoreDE2",  "host" : "offerStoreDE2/s120:27018,s127:27018,s136:27018" }
	{  "_id" : "offerStoreDE3",  "host" : "offerStoreDE3/s117:27018,s124:27018,s129:27018" }
	{  "_id" : "offerStoreDE4",  "host" : "offerStoreDE4/s115:27018,s125:27018,s132:27018" }
	{  "_id" : "offerStoreDE5",  "host" : "offerStoreDE5/s128:27018,s135:27018,s209:27018" }
  databases:
	{  "_id" : "admin",  "partitioned" : false,  "primary" : "config" }
	{  "_id" : "offerStore",  "partitioned" : true,  "primary" : "offerStoreDE2" }
		offerStore.offer chunks:
				offerStoreDE4	1045
				offerStoreDE3	1045
				offerStoreDE1	1046
				offerStoreDE2	1046
				offerStoreDE5	1046
			too many chunks to print, use verbose if you want to force print
	{  "_id" : "test",  "partitioned" : false,  "primary" : "offerStoreDE2" }
	{  "_id" : "*",  "partitioned" : false,  "primary" : "offerStoreDE3" }
 
mongos> 

We instantiate Mongo by using this constructor:

public Mongo( List<ServerAddress> replicaSetSeeds , MongoOptions options )

Param replicaSetSeeds contains three of our mongos addresses [sx210.ipx:27018, sx176.ipx:27018, sx177.ipx:27018].
Param options is a default instance of MongoOptions.
WriteConcern of com.mongodb.DB is set to WriteConcern.NORMAL and slaveOk is set to true.
Our update code:

    private int updateOffers(final DBObject query, Map<String, Object> simpleValues) {
        db.requestStart();
        WriteResult wr = null;
        try {
            final DBObject set = new BasicDBObject();
            final DBObject unset = new BasicDBObject();
            for (Entry<String, Object> e : simpleValues.entrySet()) {
                final String field = e.getKey();
                String col = COLUMNS.get(field);
                if (col == null) {
                    col = field;
                    LOG.warn(".storeExistingOffers: unknown field\"{}\", but trying to apply...", field);
                }
                final Object value = mapValue(field, e.getValue());
                if (value != null)
                    set.put(col, value);
                else
                    unset.put(col, value);
            }
            final DBObject updates = new BasicDBObject("$set", set).append("$unset", unset);
            wr = new RetryHandler<WriteResult>(retryPolicy).tryExecute(new Callable<WriteResult>() {
                @Override
                public WriteResult call() {
                    return db.getCollection(OFFERS).update(query, updates, false, true);
                }
            });
            return wr.getN();
        } catch (MongoException e) {
            LOG.error("Error while updating offers", e);
            return -1;
        }catch (NullPointerException e) {
            LOG.error("Error while updating offers", e);
            if(wr != null) {
                LOG.error("WriteResult.toString: {}", wr.toString());
            }else {
                LOG.error("WriteResult is null!");
            }
            return -1;
        }finally {
            db.requestDone();
        }
    }

COLUMNS is a com.google.common.collect.BiMap to map the field names of our offer object to the field names of our MongoDB.
Method mapValue and RetryHandler should be self-explanatory.

Comment by Scott Hernandez (Inactive) [ 04/Jul/12 ]

What does sh.status() show from the shell, and how are you connecting with java (what Mongo constructor are you using with what options)?

Can you post the code in the try block as well?

Comment by Kay Agahd [ 04/Jul/12 ]

I could reproduce it and print out the WriteResult instance.

The following java code produced the log lines shown below:

}catch (NullPointerException e) {
    LOG.error("Error while updating offers", e);
    if(wr != null) {
        LOG.error("WriteResult.toString: {}", wr.toString());
    }else {
        LOG.error("WriteResult is null!");
    }
    return -1;

logs:

01:22:45,492 ERROR OfferStoreMongo:438 [pool-7-thread-39] - Error while updating offers
java.lang.NullPointerException: no value for: n
        at org.bson.BasicBSONObject.getInt(BasicBSONObject.java:118) ~[mongo-java-driver-2.8.0.jar:na]
        at com.mongodb.WriteResult.getN(WriteResult.java:127) ~[mongo-java-driver-2.8.0.jar:na]
        ...
5,528 ERROR OfferStoreMongo:440 [pool-7-thread-39] - WriteResult.toString: { "serverUsed" : "sx210.ipx/172.16.64.161:27018" , "ok" : 0.0 , "errmsg" : ""}

Comment by Kay Agahd [ 03/Jul/12 ]

Yes, it's a sharded collection.
I've added some more logs to get the toString() value of the WriteResult instance. It's deployed and I'm waiting for the next error. I'll let you know asap.

Comment by Jeffrey Yemin [ 02/Jul/12 ]

Can you change your code to log the toString() value of the WriteResult instance in the case where you get this exception? It's definitely unusual not to have an "n" field in the getlasterror response, so I'd like to see the full response if you can get it.

Comment by Scott Hernandez (Inactive) [ 02/Jul/12 ]

Is this a sharded collection?

Comment by Kay Agahd [ 02/Jul/12 ]

We are using WriteConcern.NORMAL and are calling:

public WriteResult update( DBObject q , DBObject o , boolean upsert , boolean multi )

Comment by Jeffrey Yemin [ 02/Jul/12 ]

Any idea what method you called in DBCollection to get that WriteResult? Are you setting WriteConcern, or using the default?

Generated at Thu Feb 08 08:52:38 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.