[SERVER-5290] fail to insert docs with fields too long to index, and fail to create indexes where doc keys are too big Created: 12/Mar/12  Updated: 08/Nov/21  Resolved: 04/Dec/13

Status: Closed
Project: Core Server
Component/s: Index Maintenance
Affects Version/s: None
Fix Version/s: 2.5.5

Type: Improvement Priority: Major - P3
Reporter: Richard Kreuter (Inactive) Assignee: Eliot Horowitz (Inactive)
Resolution: Done Votes: 2
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
is depended on by DOCS-1948 Document: After writes default to ret... Closed
Duplicate
is duplicated by SERVER-10749 Query results differ depending on the... Closed
is duplicated by SERVER-9256 update fails when field contains more... Closed
is duplicated by SERVER-11209 ERROR: key too large len:6597 max:102... Closed
is duplicated by SERVER-14976 Ghost Documents Closed
is duplicated by SERVER-15464 Mongodb cannot find document with lon... Closed
is duplicated by SERVER-16487 Regex not working properly on indexed... Closed
is duplicated by SERVER-16900 regex query fails on 2.4.5 when field... Closed
is duplicated by SERVER-2633 getLastError and Btree::insert failure Closed
is duplicated by SERVER-9447 Key values too large to index Closed
Related
related to SERVER-12828 Reindexing drops indexes if key conta... Closed
related to SERVER-12982 Could not restore backup data due to ... Closed
related to SERVER-8391 Pre-flight upgrade tool to check for ... Closed
related to SERVER-12834 Create flag to allow mongod to ignore... Closed
related to DOCS-2465 Comprehensive document about backward... Closed
related to SERVER-12233 Command to check for over-long index ... Closed
is related to SERVER-1716 Key too large to index / re-index inc... Closed
is related to SERVER-6417 duplicate _ids possible when values e... Closed
is related to SERVER-12247 Write commands allow un-indexable doc... Closed
is related to SERVER-12406 Replication fails to remove document ... Closed
is related to SERVER-3372 Allow indexing fields of arbitrary le... Closed
is related to SERVER-4271 Shard key (512 bytes) maximum is less... Closed
is related to SERVER-1016 when a key too large for index how do... Closed
is related to SERVER-35218 different result when query hint hint... Closed

Description

Now that writes default to returning errors to the client, we should just fail fast.

Behavior when a document's index key is found to exceed the maximum key size (a shell sketch of the insert case follows this list):

  • Inserting a new document with an over-size index key fails with an error message. The document is not inserted.
  • Updating an existing document so that an index key becomes over-size fails with an error message. The existing document remains unchanged.
  • ensureIndex / reIndex on a collection containing an over-size index key fails with an error message. The index is not created.
  • compact on a collection with over-size index keys succeeds, but documents with over-size keys are not inserted into the index.
  • mongorestore / mongoimport with indexed values that are too large rejects the offending documents; the result is effectively that of inserting each document individually.
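
A minimal shell sketch of the intended fail-fast insert behavior (this assumes the 1024-byte btree index key limit; the collection name foo is illustrative):

// Build a value comfortably over the 1024-byte index key limit.
var big = new Array(2050).join("a"); // 2049-character string
db.foo.insert({ _id : big }); // fails fast with a key-too-large error
db.foo.count(); // 0 -- the document was not stored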

Behavior on secondary nodes:

  • New replica set secondaries will insert the documents and build the indexes on initial sync, with a warning in the logs.
  • Replica set secondaries will replicate documents inserted on a 2.4 primary, but print an error message in the log.
  • Replica set secondaries will apply updates made to documents on a 2.4 primary, but print an error message in the log.

OLD DESCRIPTION:
When inserting a new document, if an indexed field is too long to store in the btree, we don't add the doc to the index, but we do store the doc. This leads to peculiar behaviors (examples below). It would be good to have a mechanism that makes these insertions fail as errors and not store the document (we already do this for unique indexes, after all, so programmers using fire-and-forget writes can't really expect the docs to be present if they haven't checked).

// Create a document with too long an _id, but note that
// any indexed field can evince this problem.
var s="a"; for (i=0;i<10;i++) s+=s; print(s.length);
s+=s; print(s.length); // now 2048 characters -- over the index key limit
db.foo.insert({ _id : s});
 
// You can find the document if you do a tablescan
db.foo.find();
db.foo.count();
 
// But you can't find the document if you use the _id index.
db.foo.find().hint({_id:1});
 
// You also can't find the document if you're looking for it
// by _id:
db.foo.find({ _id : s });
db.foo.find({ _id : s }).hint({ $natural : 1});

The fix for this issue must encompass insert and update, and must also fail ensureIndex calls when this condition is violated (similar to a unique index constraint failing); a sketch of the ensureIndex case follows.
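
A minimal sketch of the ensureIndex case, assuming a hypothetical collection bar that already holds a document whose field exceeds the key limit (s is the over-size string from the example above):

// With no index on x yet, this insert succeeds even after the fix.
db.bar.insert({ x : s });
// The index build must now fail instead of silently skipping the document.
db.bar.ensureIndex({ x : 1 }); // fails: existing key too large to index
db.bar.getIndexes().length; // 1 -- only the _id index; { x : 1 } was not created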

We need to think hard about how this will work when a user upgrades a node whose indexes were built on top of invalid data: when they re-sync a new replica set member, the index creation step would fail.



Comments
Comment by Githook User [ 04/Dec/13 ]

Author: Eliot Horowitz (erh) <eliot@10gen.com>

Message: SERVER-5290: fix os x compile when getting btree max key size
Branch: master
https://github.com/mongodb/mongo/commit/ed76da14b416376045dfe84cf3ad5866bd4feb1f

Comment by Githook User [ 04/Dec/13 ]

Author: Eliot Horowitz (erh) <eliot@10gen.com>

Message: SERVER-5290: remove test that relies on large index entries
Branch: master
https://github.com/mongodb/mongo/commit/7b9746e9ae487bc5e47da042a05fd31adbceaf81

Comment by Githook User [ 04/Dec/13 ]

Author: Eliot Horowitz (erh) <eliot@10gen.com>

Message: SERVER-5290: update test to lower string size now that we don't allow indexes with keys too large
Branch: master
https://github.com/mongodb/mongo/commit/472e289e2a264b177913dc644368727fe6e9e147

Comment by Githook User [ 04/Dec/13 ]

Author: Eliot Horowitz (erh) <eliot@10gen.com>

Message: SERVER-5290: fix typo in comment
Branch: master
https://github.com/mongodb/mongo/commit/4fa4012a60147450b8c201ebb90687596fe673d5

Comment by Githook User [ 04/Dec/13 ]

Author: Eliot Horowitz (erh) <eliot@10gen.com>

Message: SERVER-5290: when trying to insert a document with a key too large to index, fail the insert
will also prevent creating an index on a field that is too large
that is only done for secondaries
Branch: master
https://github.com/mongodb/mongo/commit/cb7bea77aa5796af6016e3b8e0d725f1e4fb1d14

Comment by Daniel Pasette (Inactive) [ 02/Oct/12 ]

A side effect of this issue is that mongodump will skip documents whose _id field is too long to index unless it is run with --forceTableScan, because by default it uses a snapshot query (which traverses the _id index) to dump documents.
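
For example (the database and collection names here are illustrative):

# Force a collection scan so documents with over-size _id values are dumped too.
mongodump --db mydb --collection foo --forceTableScan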

Comment by tony tam [ 01/May/12 ]

I think this is more serious than the description above suggests. The bigger issue is that when documents are inserted while an indexed field is too long, you can end up with a large number of "unfindable" objects. This gets even worse when you spin up a new replica, which will FAIL to sync once the number of unfindable objects exceeds a fixed threshold.

In my situation, we inserted an object and could not find it again. Because it was not found, it was re-inserted. This added more than 1M bad records to a single collection, which caused replication to fail completely.

If a record is going to become unfindable or effectively corrupt the database, the insertion should be rejected by the server. If the client sends the insert as fire-and-forget (FAF), the data loss is to be expected. If a safe write is requested, the client can "do the right thing" with the knowledge that the write cannot be performed.
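
A minimal shell sketch of that safe-write path, assuming the server-side fix is in place and s is an over-size value as in the description:

// A fire-and-forget insert surfaces no error to the client by itself...
db.foo.insert({ _id : s });
// ...but a safe write checks getLastError, which now reports the rejected insert.
printjson(db.getLastErrorObj()); // err is non-null: the insert was rejected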
