[SERVER-2773] Allow jumps in the btree of indexes Created: 16/Mar/11  Updated: 24/Oct/11  Resolved: 24/Oct/11

Status: Closed
Project: Core Server
Component/s: Index Maintenance
Affects Version/s: None
Fix Version/s: None

Type: New Feature Priority: Minor - P4
Reporter: Yannick Koechlin Assignee: Unassigned
Resolution: Won't Fix Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Backwards Compatibility: Fully Compatible
Participants:

 Description   

It should be possible to jump to prefixes of (indexed) keys with the cursor and then move until that prefix changes (or beyond).



 Comments   
Comment by Ian Whalen (Inactive) [ 24/Oct/11 ]

@yannick, I'm closing as 'wont fix' given the time lapsed since last comment, but please reopen with the requested clarification if you continue to think this should be addressed.

Comment by Eliot Horowitz (Inactive) [ 18/Mar/11 ]

Still don't see why this is different than $gt

Comment by Yannick Koechlin [ 17/Mar/11 ]

ok sorry. here is the full context:

i have built a triple store with pluggable storage backends. The main backend is Kyoto Cabinets B+Tree and then i did an adapter for mongodb.
a triplestore saves RDF Statements in the form (subject,predicate,object). Now i do a 6 fold index for every combination: SPO,SOP,OSP,OPS,PSO,PSO.
to solve queries i now can resolve triples with any of the terms missing. e.g. i want to have all triples with (subject="x", object="y" ) i just go to the SO or OS index and jump to the offset. this gives set a. now there can be a second query where i want to know (subject="x", object="z" ). the intersection of these two queries will give my solution for a set of predicates!

now for mongodb:
inserted the triples as

{ "s":Binary("md5hashOfSubject", MD5_SUBTYPE) , "p":Binary("md5hashOfPredicate", MD5_SUBTYPE), "o":Binary("md5hashOfObject", MD5_SUBTYPE) }

.

then i do all 6 indexes:
db.ensureIndex(

{ "s":1, "p":1,"o":1 }

)
db.spo.ensureIndex(

{ "s":1, "o":1,"p":1 }

)
etc...

so now i want to solve the query from the example above. while i can just stream all results from both subqueries and mergejoin them on the client that has a disadvantage:
one set could be pretty small but its keys are low in the tree of the other set, thus i have to scan the second set for a long time. thus: jump in the tree, since we know the lowest possible key.

so the problem now is, that it generates significant overhead on the client side if i have to create a new cursor for every of these jumps.
as mentioned before, the fastest way of course would be to join the sets directly on the (routing)server, since no decode would happen.

bottom line: i just want to use the distributed b+tree of mongodb as fast and direct as possible.

ps. if i do db.ensureIndex(

{ "s":1, "p":1,"o":1 }

) and then db.ensureIndex(

{ "s":1, "p":1 }

) the second gets generated. necessary?

Comment by Eliot Horowitz (Inactive) [ 17/Mar/11 ]

I'm sorry, but I'm totally lost by your example...

Can you provide sample input and what you want out?

Comment by Yannick Koechlin [ 16/Mar/11 ]

sure:

https://gist.github.com/873350

i simplified this from my working code, hope you get the point.

actually i am doing a set intersection by merge joining. is there a faster way to do it?

but even if, you could imagine that one set is not in mongodb so its still relevant.

edit: of course the possibility of merge joining documents by a key directly on the mongos would be great and even faster

Comment by Eliot Horowitz (Inactive) [ 16/Mar/11 ]

I'm a bit confused, can you send an example of what you're trying to do

Comment by Yannick Koechlin [ 16/Mar/11 ]

It can be emulated, in fact i am doing that.
But you need to generate a new cursor and that makes things more complicated and slower.

At least with pymongo it works that way. This means that you can not build a simple generator like for "doc in cursor:" and jump with generators send() command, since you need to replace the cursor.

Comment by Eliot Horowitz (Inactive) [ 16/Mar/11 ]

Can you provide an example where this can't be emulated with $gt?

Generated at Thu Feb 08 03:01:09 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.