[SERVER-15969] WiredTiger aborts with Unicode collection names Created: 05/Nov/14  Updated: 11/Jul/16  Resolved: 10/Nov/14

Status: Closed
Project: Core Server
Component/s: Storage
Affects Version/s: 2.8.0-rc0
Fix Version/s: 2.8.0-rc0

Type: Bug Priority: Blocker - P1
Reporter: Bernie Hackett Assignee: Mathias Stearn
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Operating System: ALL

 Description   

Found this while running PyMongo's test suite. The test creates a collection like so:

db["Employés"].insert({"x": 1})

mongod log:

2014-11-05T09:11:30.336-0800 E STORAGE  [conn349] WiredTiger (22) [1415207490:336763][27686:0x7fe215b76700], session.create: Error parsing 'type=file,leaf_page_max=16k,,key_format=u,value_format=u,collator=mongo_index,app_metadata={ "v" : 1, "key" : { "_id" : 1 }, "name" : "_id_", "ns" : "pymongo_test.Employés" }' at byte 169: Unexpected character: Invalid argument
2014-11-05T09:11:30.337-0800 I -        [conn349] Invariant failure _depth > 0 src/mongo/db/storage/wiredtiger/wiredtiger_recovery_unit.cpp 164
2014-11-05T09:11:30.347-0800 I CONTROL  [conn349] 
 0xfab7d9 0xf49721 0xf2fe23 0xdce27b 0xd544ec 0xd5bb48 0xdcdbc6 0x8c4e86 0x9c43b9 0x9c3e66 0x9c484d 0x9c528d 0x9c65d4 0x9c6cdd 0x9c8ba0 0x9e19a4 0x9e2864 0x9e32f1 0xc0f0db 0xabe0e3 0x81b800 0xf5e091 0x7fe2223f6083 0x7fe2217a803d
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"400000","o":"BAB7D9"},{"b":"400000","o":"B49721"},{"b":"400000","o":"B2FE23"},{"b":"400000","o":"9CE27B"},{"b":"400000","o":"9544EC"},{"b":"400000","o":"95BB48"},{"b":"400000","o":"9CDBC6"},{"b":"400000","o":"4C4E86"},{"b":"400000","o":"5C43B9"},{"b":"400000","o":"5C3E66"},{"b":"400000","o":"5C484D"},{"b":"400000","o":"5C528D"},{"b":"400000","o":"5C65D4"},{"b":"400000","o":"5C6CDD"},{"b":"400000","o":"5C8BA0"},{"b":"400000","o":"5E19A4"},{"b":"400000","o":"5E2864"},{"b":"400000","o":"5E32F1"},{"b":"400000","o":"80F0DB"},{"b":"400000","o":"6BE0E3"},{"b":"400000","o":"41B800"},{"b":"400000","o":"B5E091"},{"b":"7FE2223EE000","o":"8083"},{"b":"7FE2216C0000","o":"E803D"}],"processInfo":{ "mongodbVersion" : "2.7.9-pre-", "gitVersion" : "b0d74105634a3a2040743061c53b0e5b4eff919d modules: subscription", "uname" : { "sysname" : "Linux", "release" : "3.17.1-gentoo-r1", "version" : "#1 SMP PREEMPT Sat Oct 18 14:28:01 PDT 2014", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000", "buildId" : "42EC10887DA958FE24339D04E6801130BDD24593" }, { "b" : "7FFFB2EF2000", "path" : "linux-vdso.so.1", "elfType" : 3, "buildId" : "5D043151BDD5B68B4A648925E84AEDAE5AE53B3C" }, { "b" : "7FE223AA5000", "path" : "/usr/lib64/libnetsnmpmibs.so.30", "elfType" : 3, "buildId" : "5E0C2B072D34DABAE4B49B620E01B3636A3BAFF6" }, { "b" : "7FE2238A1000", "path" : "/lib64/libdl.so.2", "elfType" : 3 }, { "b" : "7FE223633000", "path" : "/usr/lib64/libnetsnmpagent.so.30", "elfType" : 3, "buildId" : "BDACC57ADDBABD2880F40B8898F10BB4B787ED33" }, { "b" : "7FE223429000", "path" : "/lib64/libwrap.so.0", "elfType" : 3 }, { "b" : "7FE223148000", "path" : "/usr/lib64/libnetsnmp.so.30", "elfType" : 3, "buildId" : "4FF3D71573DC1B25D04C273A93FB018C9FD613D2" }, { "b" : "7FE222D71000", "path" : "/usr/lib64/libcrypto.so.1.0.0", "elfType" : 3 }, { "b" : "7FE222B54000", "path" : "/usr/lib64/libsasl2.so.3", "elfType" : 3 }, { "b" : "7FE22290B000", "path" : "/usr/lib64/libgssapi_krb5.so.2", 
"elfType" : 3 }, { "b" : "7FE22260B000", "path" : "/lib64/libm.so.6", "elfType" : 3 }, { "b" : "7FE2223EE000", "path" : "/lib64/libpthread.so.0", "elfType" : 3 }, { "b" : "7FE222185000", "path" : "/usr/lib64/libssl.so.1.0.0", "elfType" : 3 }, { "b" : "7FE221F7D000", "path" : "/lib64/librt.so.1", "elfType" : 3 }, { "b" : "7FE221C79000", "path" : "/usr/lib/gcc/x86_64-pc-linux-gnu/4.8.3/libstdc++.so.6", "elfType" : 3, "buildId" : "03533E073EB196880F712F35CC7432DB5DCC53C2" }, { "b" : "7FE221A63000", "path" : "/usr/lib/gcc/x86_64-pc-linux-gnu/4.8.3/libgcc_s.so.1", "elfType" : 3, "buildId" : "94638F4CBF08E0B78403AC6889486E77C7465FDF" }, { "b" : "7FE2216C0000", "path" : "/lib64/libc.so.6", "elfType" : 3 }, { "b" : "7FE223F2F000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3 }, { "b" : "7FE221320000", "path" : "/usr/lib64/libperl.so.5.20", "elfType" : 3, "buildId" : "C921C557D67DF793C7C0DD8C02F8B9609EC3953A" }, { "b" : "7FE221108000", "path" : "/lib64/libnsl.so.1", "elfType" : 3 }, { "b" : "7FE220ED1000", "path" : "/lib64/libcrypt.so.1", "elfType" : 3 }, { "b" : "7FE220CCE000", "path" : "/lib64/libutil.so.1", "elfType" : 3 }, { "b" : "7FE220AB8000", "path" : "/lib64/libz.so.1", "elfType" : 3 }, { "b" : "7FE2207E5000", "path" : "/usr/lib64/libkrb5.so.3", "elfType" : 3 }, { "b" : "7FE2205B4000", "path" : "/usr/lib64/libk5crypto.so.3", "elfType" : 3 }, { "b" : "7FE2203B0000", "path" : "/lib64/libcom_err.so.2", "elfType" : 3 }, { "b" : "7FE2201A4000", "path" : "/usr/lib64/libkrb5support.so.0", "elfType" : 3 }, { "b" : "7FE21FFA0000", "path" : "/lib64/libkeyutils.so.1", "elfType" : 3 }, { "b" : "7FE21FD89000", "path" : "/lib64/libresolv.so.2", "elfType" : 3 } ] }}
 mongod(_ZN5mongo15printStackTraceERSo+0x29) [0xfab7d9]
 mongod(_ZN5mongo10logContextEPKc+0xE1) [0xf49721]
 mongod(_ZN5mongo15invariantFailedEPKcS1_j+0xD3) [0xf2fe23]
 mongod(_ZN5mongo22WiredTigerRecoveryUnit14registerChangeEPNS_12RecoveryUnit6ChangeE+0x12B) [0xdce27b]
 mongod(_ZN5mongo9KVCatalog14dropCollectionEPNS_16OperationContextERKNS_10StringDataE+0x19C) [0xd544ec]
 mongod(_ZN5mongo22KVDatabaseCatalogEntry19AddCollectionChange8rollbackEv+0x118) [0xd5bb48]
 mongod(_ZN5mongo22WiredTigerRecoveryUnit6_abortEv+0x66) [0xdcdbc6]
 mongod(_ZN5mongo15WriteUnitOfWorkD1Ev+0x16) [0x8c4e86]
 mongod(+0x5C43B9) [0x9c43b9]
 mongod(+0x5C3E66) [0x9c3e66]
 mongod(_ZN5mongo18WriteBatchExecutor13execOneInsertEPNS0_16ExecInsertsStateEPPNS_16WriteErrorDetailE+0x10D) [0x9c484d]
 mongod(_ZN5mongo18WriteBatchExecutor11execInsertsERKNS_21BatchedCommandRequestEPSt6vectorIPNS_16WriteErrorDetailESaIS6_EE+0x22D) [0x9c528d]
 mongod(_ZN5mongo18WriteBatchExecutor11bulkExecuteERKNS_21BatchedCommandRequestEPSt6vectorIPNS_19BatchedUpsertDetailESaIS6_EEPS4_IPNS_16WriteErrorDetailESaISB_EE+0x34) [0x9c65d4]
 mongod(_ZN5mongo18WriteBatchExecutor12executeBatchERKNS_21BatchedCommandRequestEPNS_22BatchedCommandResponseE+0x39D) [0x9c6cdd]
 mongod(_ZN5mongo8WriteCmd3runEPNS_16OperationContextERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderEb+0x1C0) [0x9c8ba0]
 mongod(_ZN5mongo12_execCommandEPNS_16OperationContextEPNS_7CommandERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderEb+0x34) [0x9e19a4]
 mongod(_ZN5mongo7Command11execCommandEPNS_16OperationContextEPS0_iPKcRNS_7BSONObjERNS_14BSONObjBuilderEb+0xC84) [0x9e2864]
 mongod(_ZN5mongo12_runCommandsEPNS_16OperationContextEPKcRNS_7BSONObjERNS_11_BufBuilderINS_16TrivialAllocatorEEERNS_14BSONObjBuilderEbi+0x1F1) [0x9e32f1]
 mongod(_ZN5mongo11newRunQueryEPNS_16OperationContextERNS_7MessageERNS_12QueryMessageERNS_5CurOpES3_b+0x108B) [0xc0f0db]
 mongod(_ZN5mongo16assembleResponseEPNS_16OperationContextERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortEb+0xBE3) [0xabe0e3]
 mongod(_ZN5mongo16MyMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortEPNS_9LastErrorE+0xE0) [0x81b800]
 mongod(_ZN5mongo17PortMessageServer17handleIncomingMsgEPv+0x441) [0xf5e091]
 libpthread.so.0(+0x8083) [0x7fe2223f6083]
 libc.so.6(clone+0x6D) [0x7fe2217a803d]
-----  END BACKTRACE  -----
2014-11-05T09:11:30.347-0800 I -        [conn349] 
 
***aborting after invariant() failure
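Read bottom-up, the backtrace shows `~WriteUnitOfWork` calling `WiredTigerRecoveryUnit::_abort`, which runs `AddCollectionChange::rollback`, which calls `KVCatalog::dropCollection`, which in turn calls `registerChange` and trips `_depth > 0`. That is consistent with a rollback handler trying to register a new change after the unit of work's depth has already dropped to zero. The sketch below is a hypothetical, heavily simplified model of that sequence (all class and method names are made up for illustration; this is not MongoDB's code), assuming depth is decremented before rollback handlers run:

```python
class InvariantFailure(Exception):
    """Raised in place of mongod's abort-on-invariant behavior."""


class ToyRecoveryUnit:
    """Hypothetical, simplified model of a recovery unit; not MongoDB's code."""

    def __init__(self):
        self.depth = 0      # nesting depth of open units of work
        self.changes = []   # registered changes, rolled back on abort

    def begin_unit_of_work(self):
        self.depth += 1

    def register_change(self, change):
        # Same condition as the failing invariant in the log: "_depth > 0".
        if not self.depth > 0:
            raise InvariantFailure("_depth > 0")
        self.changes.append(change)

    def end_unit_of_work(self, committed):
        self.depth -= 1
        if self.depth == 0 and not committed:
            self._abort()

    def _abort(self):
        # Roll back registered changes, newest first, with depth already 0.
        pending, self.changes = self.changes, []
        for change in reversed(pending):
            change.rollback()


class ToyAddCollectionChange:
    """Models AddCollectionChange: undoing a collection create drops it again."""

    def __init__(self, recovery_unit):
        self.recovery_unit = recovery_unit

    def rollback(self):
        # Models KVCatalog::dropCollection registering its own change.
        self.recovery_unit.register_change(object())


def simulate_failed_insert():
    ru = ToyRecoveryUnit()
    ru.begin_unit_of_work()                          # WriteUnitOfWork constructed
    ru.register_change(ToyAddCollectionChange(ru))   # catalog entry added
    # Creating the _id index table fails (WiredTiger returns EINVAL), so the
    # unit of work is destroyed without being committed.
    try:
        ru.end_unit_of_work(committed=False)         # ~WriteUnitOfWork
    except InvariantFailure as exc:
        return f"Invariant failure {exc}"
    return "rolled back cleanly"
```

In this model `simulate_failed_insert()` ends in the same invariant as the log, because the rollback path itself needs an open unit of work that no longer exists.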



 Comments   
Comment by Bernie Hackett [ 10/Nov/14 ]

Appears to be fixed. Built against git hash 0549652e913b4c39dc00ec10bd1895c085b27bf3.

Comment by Eliot Horowitz (Inactive) [ 10/Nov/14 ]

Bernie, can you try with a build from after 7am Monday?

Comment by Michael Cahill [ 10/Nov/14 ]

I've fixed a WiredTiger bug in UTF-8 handling that was probably causing the original error. The fix is in WiredTiger's develop branch, here:

https://github.com/wiredtiger/wiredtiger/commit/c4f14ea06104009afe55e0e126e453e86b825475

Please let me know if there are any more problems after that change makes its way downstream.

Comment by Andy Schwerin [ 07/Nov/14 ]

> Yes, it would be terrific if MongoDB can guarantee UTF-8 to the WiredTiger API, but we can certainly talk it over, depending on the effort involved.

We officially require that field names and collection names be UTF-8-encoded, non-control Unicode code points, but all we currently enforce is that they not contain embedded NUL bytes.
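To make the gap between the stated requirement and the enforced check concrete, here is a minimal sketch of both rules applied to raw name bytes. The function names are hypothetical, reading "non-control" as "not in Unicode general category Cc" is an assumption, and this is not mongod's validation code:

```python
import unicodedata


def passes_current_check(name: bytes) -> bool:
    # What is enforced today, per the comment above: no embedded NUL bytes.
    return b"\x00" not in name


def meets_official_requirement(name: bytes) -> bool:
    # What is officially required: valid UTF-8 made of non-control code
    # points. "Control" is assumed here to mean general category Cc.
    try:
        text = name.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return all(unicodedata.category(ch) != "Cc" for ch in text)


utf8_name = "Employ\u00e9s".encode("utf-8")     # what PyMongo sends
latin1_name = "Employ\u00e9s".encode("latin-1")  # a non-validating driver's bytes
```

The Latin-1 bytes pass the NUL-only check but are not valid UTF-8, which is exactly the kind of input that could reach the WiredTiger API if a driver does not validate strings.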

Comment by Keith Bostic [ 07/Nov/14 ]

> I assume this is something MongoDB should be handling?

Yes, it would be terrific if MongoDB can guarantee UTF-8 to the WiredTiger API, but we can certainly talk it over, depending on the effort involved.

Comment by Bernie Hackett [ 07/Nov/14 ]

> One question, what encoding formats do you support, that is, can we rely on seeing UTF-8 in the API, or are there other issues?

Well, PyMongo will always send strings encoded as UTF-8, since that's what the BSON specification requires. That being said, not all drivers validate strings, and not all drivers are maintained by MongoDB.
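For reference, the BSON specification encodes a string element as a type byte (0x02), the key as a NUL-terminated cstring, a little-endian int32 byte length that counts the trailing NUL, the UTF-8 bytes, and a terminating NUL. The sketch below builds one such element by hand using only the standard library. `encode_bson_string_element` is a hypothetical helper rather than a PyMongo API, and the `insert` key assumes the write-command form implied by `_check_write_command_response` in the traceback:

```python
import struct


def encode_bson_string_element(key: str, value: str) -> bytes:
    """Hypothetical helper: encode one BSON string element (type 0x02).

    Layout: type byte, key as a NUL-terminated cstring, little-endian int32
    length of the value including its trailing NUL, UTF-8 bytes, NUL.
    """
    if "\x00" in key:
        raise ValueError("BSON keys are cstrings and cannot contain NUL")
    key_bytes = key.encode("utf-8") + b"\x00"
    value_bytes = value.encode("utf-8") + b"\x00"
    return b"\x02" + key_bytes + struct.pack("<i", len(value_bytes)) + value_bytes


# "Employés" is 8 characters but 9 UTF-8 bytes ("é" is 0xC3 0xA9); with the
# trailing NUL the declared length is 10.
element = encode_bson_string_element("insert", "Employ\u00e9s")
```

Because the length prefix counts bytes rather than characters, a driver that skipped UTF-8 encoding would produce a different, and invalid, element.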

> Update: this is documented in WiredTiger's discussion on configuration strings...

I assume this is something MongoDB should be handling?

Comment by Keith Bostic [ 07/Nov/14 ]

Update: this is documented in WiredTiger's discussion on configuration strings (http://source.wiredtiger.com/2.4.1/config_strings.html): quoted strings are interpreted as UTF-8 values.

Comment by Keith Bostic [ 07/Nov/14 ]

I just opened WiredTiger issue #1353 to track this one (https://github.com/wiredtiger/wiredtiger/issues/1353); we'll get back to you on this.

One question, what encoding formats do you support, that is, can we rely on seeing UTF-8 in the API, or are there other issues?

Comment by Bernie Hackett [ 07/Nov/14 ]

It no longer aborts; it just returns an obscure error to the client:

python:

$ python -m unittest -v test.test_collection.TestCollection.test_messages_with_unicode_collection_names
test_messages_with_unicode_collection_names (test.test_collection.TestCollection) ... ERROR
 
======================================================================
ERROR: test_messages_with_unicode_collection_names (test.test_collection.TestCollection)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test/test_collection.py", line 2049, in test_messages_with_unicode_collection_names
    db[u"Employés"].insert({"x": 1})
  File "pymongo/collection.py", line 410, in insert
    _check_write_command_response(results)
  File "pymongo/helpers.py", line 205, in _check_write_command_response
    raise OperationFailure(error.get("errmsg"), error.get("code"), error)
OperationFailure: 22: Invalid argument
 
----------------------------------------------------------------------
Ran 1 test in 0.053s
 
FAILED (errors=1)

server log:

2014-11-07T10:46:42.184-0800 E STORAGE  [conn1649] WiredTiger (22) [1415386002:184579][6599:0x7f16ce318700], session.create: Error parsing 'type=file,leaf_page_max=16k,,key_format=u,value_format=u,collator=mongo_index,app_metadata={ "v" : 1, "key" : { "_id" : 1 }, "name" : "_id_", "ns" : "pymongo_test.Employés" }' at byte 169: Unexpected character: Invalid argument
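The parse error points at byte 169. Assuming the offset is 0-based, it lands exactly on the first byte of the UTF-8 encoding of "é" (0xC3), which is the first non-ASCII byte in the configuration string. That is consistent with the UTF-8 handling bug in WiredTiger's config parser mentioned in the 10/Nov comment. A quick check, with the string copied from the log and a hypothetical helper:

```python
# Configuration string copied verbatim from the "Error parsing" log line;
# "\u00e9" is the precomposed "é" in "Employés".
CONFIG = (
    'type=file,leaf_page_max=16k,,key_format=u,value_format=u,'
    'collator=mongo_index,app_metadata={ "v" : 1, "key" : { "_id" : 1 }, '
    '"name" : "_id_", "ns" : "pymongo_test.Employ\u00e9s" }'
)


def first_non_ascii_offset(config: str) -> int:
    """Hypothetical helper: 0-based offset of the first byte >= 0x80 in the
    UTF-8 encoding of config, or -1 if every byte is ASCII."""
    for offset, byte in enumerate(config.encode("utf-8")):
        if byte >= 0x80:
            return offset
    return -1


offset = first_non_ascii_offset(CONFIG)
print(offset, CONFIG.encode("utf-8")[offset:offset + 2])
```

This prints 169 followed by the two bytes of "é", matching the position reported in the log.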

server info:

2014-11-07T10:59:01.238-0800 I CONTROL  [initandlisten] db version v2.7.9-pre-
2014-11-07T10:59:01.238-0800 I CONTROL  [initandlisten] git version: 496c16aa50b8f4d6ef21cfc5fee55c6ffa80bc86 modules: subscription

Comment by Andy Schwerin [ 07/Nov/14 ]

redbeard0531 believes that this issue is resolved for both collection names and index key field names. benety.goh or behackett, can you recheck against master?

Generated at Thu Feb 08 03:39:33 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.