[SERVER-22872] Order by is not working in 3.2.3 Created: 26/Feb/16  Updated: 04/May/16  Resolved: 29/Feb/16

Status: Closed
Project: Core Server
Component/s: Querying
Affects Version/s: 3.2.3
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: ITWEBTF SAXOBANK Assignee: David Storch
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Microsoft Word Query Bug with no pipe symbol.docx     Microsoft Word Query Bug.docx    
Issue Links:
Related
related to SERVER-22890 3.2.3 performance regression Closed
is related to SERVER-15235 Regex query returns incorrect results... Closed
is related to SERVER-16622 RegEx query predicates using the | (v... Backlog
is related to SERVER-19402 Change semantics of sorting by array ... Closed
Operating System: ALL
Steps To Reproduce:

The query:

{ "Keys.Queryables" : { "$in" : [/^PostType\|ArticlePost\|en_GB_/] }, "Keys.SortId" : { "$lt" : ObjectId("56d061a36afc7d277434ca87") } }

The sorting order:

{"Keys.Queryables": -1}

or

{"Keys.Queryables": 1}

Notice that the documents are returned in the same order regardless of requesting ascending or descending.

See the documents and indexes in the attached Word document.

Participants:

 Description   

I query on a collection with three documents, expecting to get documents back ordered by the "Keys.Queryables" field.

This works as expected in 2.6.1, but in 3.2.3 the documents are always returned in ascending order.



 Comments   
Comment by ITWEBTF SAXOBANK [ 29/Mar/16 ]

Hi David,

It looks like I have confused you a bit with my last comment.

It is true that we could avoid using the '|' character and thereby avoid the immediate problem, but our analysis showed that doing this change with our current code and data is not feasible.

Please understand that sorting by an array field was not our first design choice. We struggled with full scans of collections with millions of documents, and after trying out several complex solutions that only partly solved the problem, we found that a solution based on sorting by array fields eliminated the full scans completely.

In other words, we eliminated complexity in our code by utilizing what the database is good at. This is a best practice approach.

To me it makes little sense that you ask us to change our code significantly - again - when it is clear that you broke backwards compatibility in 3.0.

Fixing the bug that causes escaped '|' not to be recognized as such would allow us to update to 3.2.3. However, if you make the next breaking change we will be stuck on that version. Since our main reason for upgrading is that we do not want to be stuck on an old and potentially unsupported version, it makes little sense to upgrade at all.

As a developer, I can partly understand why you want to keep the code clean and simple. Backwards compatibility is never easy.

However, MongoDb is now supposed to be a mature product so it is not acceptable that you are so hesitant to fix backwards compatibility bugs.

For now, our architect and management teams have lost faith in MongoDb.

Regards,
Brian

Comment by David Storch [ 22/Mar/16 ]

Hi itwebtf@saxobank.com,

Glad to hear that the modified example works as expected. Keep in mind that you should be able to work around this issue using the aggregation framework. You may also be able to use direct comparisons to strings rather than a regular expression in order to get the desired behavior.

Now to the important question - will it continue to work this way in future versions?

In general I would strongly caution against sorting by an array field. The problem with the current semantics is that an internal change to the planner can affect the expected sort order! This is a very strange situation to be in. We would much rather that the sort order is a logical property which is determinate for any (data, sort spec) pair. That is why we are considering making the breaking change described in SERVER-19402.

Best,
Dave

Comment by ITWEBTF SAXOBANK [ 01/Mar/16 ]

I have added a modified example in which I do not use a pipe symbol.

It seems to work in 3.2.3 exactly as it did in 2.6.11!

Now to the important question - will it continue to work this way in future versions?

I am asking because you have advocated a breaking change that would break both 3.x and 2.x versions - I don't agree, although I can understand why you would like to fix quite a few 'ordering by array fields' issues this way.

Regards,
Brian

Comment by ITWEBTF SAXOBANK [ 01/Mar/16 ]

When there is no pipe symbol, the query analyzer does not see the query as "complex".

Comment by ITWEBTF SAXOBANK [ 01/Mar/16 ]

Hi Dave,

Thanks for a thorough answer.

It seems that the core of this issue is our usage of the pipe '|' symbol, as in a reg ex the usage of such symbol will change the query from "simple" to "complex".

However, we do not use the pipe symbol in our reg exes, as we escape them and search for the literal character.

As I have mentioned in a comment in SERVER-22890, fixing the bug that causes the escaped pipe character to be recognized as a non-escaped pipe character in reg exes could help.

One question,

  • If the query engine recognizes escaped pipe characters correctly, or alternatively that we choose to use another character, would we achieve deterministic ordering (see below what I mean by "deterministic ordering")?

My gut feeling on SERVER-19402 is that I would not vote for it. I would expect a query to identify exactly one string in the array of matching documents, and I would expect ordering by that one string. It seems that this is the current behavior and any change to that would be breaking.

Regards,
Brian

Comment by David Storch [ 29/Feb/16 ]

Hi itwebtf@saxobank.com,

I have investigated this issue and have some results to report. I will explain some background information, then describe why you're seeing this behavior, and finally recommend some some steps for moving forward. There is quite a bit of detail here, so please let me know if anything is unclear.

Background

The problem query involves sorting by the value of an array field. The semantics of sorting by an array in MongoDB are somewhat surprising in that the value of the sort key depends on the query predicate. Consider a simple collection with no indices that has the following two documents:

> db.c.find()
{ "_id" : 1, "a" : [ 1, 3, 4, 6 ] }
{ "_id" : 2, "a" : [ 1, 2, 5, 6 ] }

Now consider the following query which sorts on the array a:

> db.c.find({a: {$gte: 4}}).sort({a: 1})
{ "_id" : 1, "a" : [ 1, 3, 4, 6 ] }
{ "_id" : 2, "a" : [ 1, 2, 5, 6 ] }

Why does this sort the document with _id 1 before the document with _id 2? The answer is due to how the MongoDB query engine generates sort keys. The engine supposes that there is an index {a: 1}. It then generates the index bounds that this query would scan on this hypothetical index—namely, the interval [4, +Infinity]. The sort key for a document is the first in-range array element. For doc _id 1, this element is "4"; for _id 2, it is "5". Since 4 is less than 5, doc 1 precedes doc 2. You can see the sort keys that the system is generating in 3.2.x versions using a new feature called sortKey $meta projection:

> db.c.find({a: {$gte: 4}}, {key: {$meta: 'sortKey'}}).sort({a: 1})
{ "_id" : 1, "a" : [ 1, 3, 4, 6 ], "key" : { "" : 4 } }
{ "_id" : 2, "a" : [ 1, 2, 5, 6 ], "key" : { "" : 5 } }

Let's consider one more example using the same collection:

> db.c.find({a: {$elemMatch: {$gte: 2, $lte: 3}}}, {key: {$meta: 'sortKey'}}).sort({a: 1})
{ "_id" : 2, "a" : [ 1, 2, 5, 6 ], "key" : { "" : 2 } }
{ "_id" : 1, "a" : [ 1, 3, 4, 6 ], "key" : { "" : 3 } }

The only thing that we changed is the query predicate, but you can see that the sort key generation algorithm yields different sort keys, and hence the resulting sort order is different.

Note also that we are evaluating an open proposal to change the semantics of sorting on an array: see SERVER-19402. This would be a backwards breaking change for some applications, so this proposal is being considered with care as to whether in can be implemented for the 3.4 release.

What changed from 2.6?

The behavior change you observed is due to the fix for SERVER-15235. This ticket describes a query correctness bug related to matching against regular expressions. As part of the fix, we made any regex containing the vertical bar ("|") character use wider index bounds. The rationale for this change is discussed in SERVER-16622. In particular, I would recommend reviewing this comment in which I explained some of the details to another user.

Since your queries include regular expressions with the "|" character, they will end up generating loose index bounds. Specifically, the index bounds will include all strings. This means that when the application-requested sort order is {"Keys.Queryables": 1}, the sort key for a particular document will simply be the smallest string in the Keys.Querables array. Similarly, when the sort is {"Keys.Queryables": -1}, the sort key for a document will be the largest string in the Keys.Querables array. Augmenting your queries with a sortKey $meta projection shows this to be the case. The sort key for each document is contained in the sortKey field in the output below:

=== ascending =========================================================
[
	{
		"_id" : ObjectId("56d0663a6afc7d1fecc17e34"),
		"marker" : 1,
		"Keys" : [
			{
				"PostId" : ObjectId("525536acde7ad909006226ea"),
				"SortId" : ObjectId("4d943457de7ad90724ecf57a"),
				"Queryables" : [
					"User|5229ce3b6b0d0b1294f7bdff|PublicProfileStream|en_GB_1301558359",
					"User|5229ce3b6b0d0b1294f7bdff|PublicProfileStreamWithoutComments|en_GB_1301558359",
					"Slug|video-something-finally-happening-2841_1301558359",
					"Locale|en_GB_1301558359",
					"PostId|525536acde7ad909006226ea_1301558359",
					"User|5229ce3b6b0d0b1294f7bdff|author_1301558359",
					"User|5229ce3b6b0d0b1294f7bdff|author|ArticlePost|en_GB_1301558359",
					"User|5229ce3b6b0d0b1294f7bdff|dashboard|MemberOfStream-DashBoard|en_GB_1301558359",
					"PostType|ArticlePost_1301558359",
					"PostType|ArticlePost|en_GB_1301558359",
					"Flag|MemberOfStream-SectionPage_1301558359",
					"Flag|MemberOfStream-DashBoard_1301558359"
				]
			}
		],
		"sortKey" : {
			"" : "Flag|MemberOfStream-DashBoard_1301558359"
		}
	},
	{
		"_id" : ObjectId("56d0663a6afc7d1fecc17e36"),
		"marker" : 2,
		"Keys" : [
			{
				"PostId" : ObjectId("525536e0de7ad90900624d14"),
				"SortId" : ObjectId("4d9437b3de7ad90724ecff80"),
				"Queryables" : [
					"Slug|savings-vs-profits-0180_1301559219",
					"Locale|en_GB_1301559219",
					"PostId|525536e0de7ad90900624d14_1301559219",
					"PostType|ArticlePost_1301559219",
					"PostType|ArticlePost|en_GB_1301559219",
					"Flag|MemberOfStream-SectionPage_1301559219",
					"Flag|MemberOfStream-DashBoard_1301559219"
				]
			}
		],
		"sortKey" : {
			"" : "Flag|MemberOfStream-DashBoard_1301559219"
		}
	},
	{
		"_id" : ObjectId("56d0663a6afc7d1fecc17e35"),
		"marker" : 3,
		"Keys" : [
			{
				"PostId" : ObjectId("52553d5bde7ad90724ee23b2"),
				"SortId" : ObjectId("520b7218de7ad90a88281055"),
				"Queryables" : [
					"Slug|konjunkturdaten-aus-china-befluegeln-industriemetalle-1784603071_1376481816",
					"Locale|en_GB_1376481816",
					"PostId|52553d5bde7ad90724ee23b2_1376481816",
					"PostType|ArticlePost_1376481816",
					"PostType|ArticlePost|en_GB_1376481816",
					"Flag|MemberOfStream-SectionPage_1376481816",
					"Flag|MemberOfStream-DashBoard_1376481816"
				]
			}
		],
		"sortKey" : {
			"" : "Flag|MemberOfStream-DashBoard_1376481816"
		}
	}
]
=== descending ========================================================
[
	{
		"_id" : ObjectId("56d0663a6afc7d1fecc17e34"),
		"marker" : 1,
		"Keys" : [
			{
				"PostId" : ObjectId("525536acde7ad909006226ea"),
				"SortId" : ObjectId("4d943457de7ad90724ecf57a"),
				"Queryables" : [
					"User|5229ce3b6b0d0b1294f7bdff|PublicProfileStream|en_GB_1301558359",
					"User|5229ce3b6b0d0b1294f7bdff|PublicProfileStreamWithoutComments|en_GB_1301558359",
					"Slug|video-something-finally-happening-2841_1301558359",
					"Locale|en_GB_1301558359",
					"PostId|525536acde7ad909006226ea_1301558359",
					"User|5229ce3b6b0d0b1294f7bdff|author_1301558359",
					"User|5229ce3b6b0d0b1294f7bdff|author|ArticlePost|en_GB_1301558359",
					"User|5229ce3b6b0d0b1294f7bdff|dashboard|MemberOfStream-DashBoard|en_GB_1301558359",
					"PostType|ArticlePost_1301558359",
					"PostType|ArticlePost|en_GB_1301558359",
					"Flag|MemberOfStream-SectionPage_1301558359",
					"Flag|MemberOfStream-DashBoard_1301558359"
				]
			}
		],
		"sortKey" : {
			"" : "User|5229ce3b6b0d0b1294f7bdff|dashboard|MemberOfStream-DashBoard|en_GB_1301558359"
		}
	},
	{
		"_id" : ObjectId("56d0663a6afc7d1fecc17e36"),
		"marker" : 2,
		"Keys" : [
			{
				"PostId" : ObjectId("525536e0de7ad90900624d14"),
				"SortId" : ObjectId("4d9437b3de7ad90724ecff80"),
				"Queryables" : [
					"Slug|savings-vs-profits-0180_1301559219",
					"Locale|en_GB_1301559219",
					"PostId|525536e0de7ad90900624d14_1301559219",
					"PostType|ArticlePost_1301559219",
					"PostType|ArticlePost|en_GB_1301559219",
					"Flag|MemberOfStream-SectionPage_1301559219",
					"Flag|MemberOfStream-DashBoard_1301559219"
				]
			}
		],
		"sortKey" : {
			"" : "Slug|savings-vs-profits-0180_1301559219"
		}
	},
	{
		"_id" : ObjectId("56d0663a6afc7d1fecc17e35"),
		"marker" : 3,
		"Keys" : [
			{
				"PostId" : ObjectId("52553d5bde7ad90724ee23b2"),
				"SortId" : ObjectId("520b7218de7ad90a88281055"),
				"Queryables" : [
					"Slug|konjunkturdaten-aus-china-befluegeln-industriemetalle-1784603071_1376481816",
					"Locale|en_GB_1376481816",
					"PostId|52553d5bde7ad90724ee23b2_1376481816",
					"PostType|ArticlePost_1376481816",
					"PostType|ArticlePost|en_GB_1376481816",
					"Flag|MemberOfStream-SectionPage_1376481816",
					"Flag|MemberOfStream-DashBoard_1376481816"
				]
			}
		],
		"sortKey" : {
			"" : "Slug|konjunkturdaten-aus-china-befluegeln-industriemetalle-1784603071_1376481816"
		}
	}
]

Next steps

I am closing this ticket as Works as Designed, since I don't see evidence of a bug in the system. Please watch/vote for SERVER-19402, since I think that ticket would provide much more meaningful semantics to sorting by an array in MongoDB. For the time being, I would highly recommend against an application design that requires sorting by an array field. As you have discovered, such queries are fragile since, according to MongoDB's historical sort semantics, the outcome of the sort depends on the query predicate as well as the internals of the query engine's bounds generation. Again, this is something we would like to change under SERVER-19402 to prevent issues like this in the future.

As a more immediate workaround, I think you should be able to use the aggregation system to obtain the desired behavior. If the result set is small, another workaround would be to compute the desired sort on the client side.

Best,
Dave

Comment by Ramon Fernandez Marina [ 26/Feb/16 ]

Thanks for your report itwebtf@saxobank.com, I'm able to reproduce the behavior you describe and we're investigating.

Comment by ITWEBTF SAXOBANK [ 26/Feb/16 ]

Correction, I meant that it works on 2.6.11.

Generated at Thu Feb 08 04:01:40 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.