[SERVER-18914] Use column store instead of row store for data table
Created: 10/Jun/15  Updated: 23/Oct/15  Resolved: 23/Oct/15

Status: Closed
Project: Core Server
Component/s: WiredTiger
Affects Version/s: None
Fix Version/s: None

Type: Improvement
Priority: Major - P3
Reporter: Daniel Pasette (Inactive)
Assignee: David Hows
Resolution: Won't Fix
Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Backwards Compatibility: Fully Compatible
Participants:

 Description   

There is a potential performance win in changing the data tables (as opposed to the index or metadata/catalog tables) to use the column-store format instead of row-store (see: http://source.wiredtiger.com/develop/schema.html#schema_format_types).

This is because each document is stored with an internal integer id, so normal access patterns can take advantage of columnar lookup. This would require a great deal of testing to prove out, but we've seen perf improvements of up to 30% with some read-only workloads.



 Comments   
Comment by Githook User [ 16/Sep/15 ]

Author: Mark Benvenuto (markbenvenuto) <mark.benvenuto@mongodb.com>

Message: SERVER-18914: Use x86intrin.h on Posix Platforms
Branch: master
https://github.com/mongodb/mongo/commit/078ce6c607932bc38ea5f9d82f097cd05c311d32

Comment by David Hows [ 16/Sep/15 ]

As none of the recent attempts to reproduce have succeeded, marking as "Gone Away".

Comment by David Hows [ 16/Sep/15 ]

Ran Vanilla + x86intrin

Results are:

MongoDB shell version: 3.1.9-pre-
connecting to: test
Did 50 queries in: 39125 ms
Did 100 explained queries in: 15119 ms

I agree with Martin, and think this can be closed as Gone Away.

Comment by Martin Bligh [ 15/Sep/15 ]

I can't reproduce the original perf change on either 3.1.8 or 3.1.4 ... not sure what happened here.
I suspect we should close this as gone away?

Comment by Mark Benvenuto [ 15/Sep/15 ]

keith.bostic, we are setting HAVE_X86INTRIN_H in the Windows MongoDB build. We are not setting it in any other build at the moment. I will fix our various platform builds.

Comment by Martin Bligh [ 15/Sep/15 ]

I think the explain was only there to force it to iterate over the result; we weren't actually interested in how long it took. We later decided itcount() was better.
I reran the test that previously showed a big perf improvement, to check whether it was just the test setup that's different, and now I get only about a 4% improvement on master.
Before change:

-  16.14%  mongod  mongod               [.] __wt_row_search                                                                                        
   - __wt_row_search                                                                                                                               
      - 97.54% __wt_btcur_search                                                                                                                   
           __curfile_search                                                                                                                        
           mongo::WiredTigerRecordStore::Cursor::seekExact(mongo::RecordId const&)                                                                 
           mongo::WorkingSetCommon::fetch(mongo::OperationContext*, mongo::WorkingSet*, unsigned long, mongo::unowned_ptr<mongo::SeekableRecordCurs
           mongo::FetchStage::work(unsigned long*)                                                                                                 
           mongo::PlanExecutor::getNextImpl(mongo::Snapshotted<mongo::BSONObj>*, mongo::RecordId*)                                                 
         + mongo::PlanExecutor::getNext(mongo::BSONObj*, mongo::RecordId*)                                                                         
      + 2.46% __wt_btcur_search_near                                                                                                               
-   4.14%  mongod  mongod               [.] __wt_btcur_next                                                                                        
     __wt_btcur_next                                                                                                                               
     __curfile_next                                                                                                                                
     mongo::(anonymous namespace)::WiredTigerIndexCursorBase::advanceWTCursor()                                                                    
     mongo::(anonymous namespace)::WiredTigerIndexCursorBase::next(mongo::SortedDataInterface::Cursor::RequestedInfo)                              
     mongo::IndexScan::work(unsigned long*)                                                                                                        
     mongo::FetchStage::work(unsigned long*)                                                                                                       
     mongo::PlanExecutor::getNextImpl(mongo::Snapshotted<mongo::BSONObj>*, mongo::RecordId*)                                                       
     mongo::PlanExecutor::getNext(mongo::BSONObj*, mongo::RecordId*)                                                                               
     mongo::(anonymous namespace)::generateBatch(int, mongo::ClientCursor*, mongo::_BufBuilder<mongo::TrivialAllocator>*, int*, mongo::Timestamp*, 
     mongo::getMore(mongo::OperationContext*, char const*, int, long long, bool*, bool*)                                                           
     mongo::receivedGetMore(mongo::OperationContext*, mongo::DbResponse&, mongo::Message&, mongo::CurOp&)                                          
     mongo::assembleResponse(mongo::OperationContext*, mongo::Message&, mongo::DbResponse&, mongo::HostAndPort const&)                             
     mongo::MyMessageHandler::process(mongo::Message&, mongo::AbstractMessagingPort*)                                                              
     mongo::PortMessageServer::handleIncomingMsg(void*)                                                                                            
     start_thread                                                                                                                                  
-   3.75%  mongod  mongod               [.] mongo::IndexScan::work(unsigned long*)                                                                 
   - mongo::IndexScan::work(unsigned long*)                                                                                                        
      - 99.59% mongo::FetchStage::work(unsigned long*)                                                                                             
           mongo::PlanExecutor::getNextImpl(mongo::Snapshotted<mongo::BSONObj>*, mongo::RecordId*)                                                 
           mongo::PlanExecutor::getNext(mongo::BSONObj*, mongo::RecordId*)                                                                         
           mongo::(anonymous namespace)::generateBatch(int, mongo::ClientCursor*, mongo::_BufBuilder<mongo::TrivialAllocator>*, int*, mongo::Timest
           mongo::getMore(mongo::OperationContext*, char const*, int, long long, bool*, bool*)                                                     
           mongo::receivedGetMore(mongo::OperationContext*, mongo::DbResponse&, mongo::Message&, mongo::CurOp&)                                    
           mongo::assembleResponse(mongo::OperationContext*, mongo::Message&, mongo::DbResponse&, mongo::HostAndPort const&)                       
           mongo::MyMessageHandler::process(mongo::Message&, mongo::AbstractMessagingPort*)                                                        
           mongo::PortMessageServer::handleIncomingMsg(void*)                                                                                      
           start_thread                                                                                                                            
-   3.47%  mongod  mongod               [.] mongo::BSONObjBuilder::append(mongo::StringData, mongo::StringData)                                    
   - mongo::BSONObjBuilder::append(mongo::StringData, mongo::StringData)                                                                           
      - 68.50% mongo::(anonymous namespace)::toBsonValue(unsigned char, mongo::BufReader*, mongo::KeyString::TypeBits::Reader*, bool, mongo::BSONOb
           mongo::KeyString::toBson(char const*, unsigned long, mongo::Ordering, mongo::KeyString::TypeBits const&)                                
           mongo::(anonymous namespace)::WiredTigerIndexCursorBase::next(mongo::SortedDataInterface::Cursor::RequestedInfo)                        
           mongo::IndexScan::work(unsigned long*)                                                                                                  
           mongo::FetchStage::work(unsigned long*)                                                                                                 
           mongo::PlanExecutor::getNextImpl(mongo::Snapshotted<mongo::BSONObj>*, mongo::RecordId*)                                                 
           mongo::PlanExecutor::getNext(mongo::BSONObj*, mongo::RecordId*)                                                                         
           mongo::(anonymous namespace)::generateBatch(int, mongo::ClientCursor*, mongo::_BufBuilder<mongo::TrivialAllocator>*, int*, mongo::Timest
           mongo::getMore(mongo::OperationContext*, char const*, int, long long, bool*, bool*)                                                     
           mongo::receivedGetMore(mongo::OperationContext*, mongo::DbResponse&, mongo::Message&, mongo::CurOp&)                                    
           mongo::assembleResponse(mongo::OperationContext*, mongo::Message&, mongo::DbResponse&, mongo::HostAndPort const&)                       
           mongo::MyMessageHandler::process(mongo::Message&, mongo::AbstractMessagingPort*)                                                        
           mongo::PortMessageServer::handleIncomingMsg(void*)                                                                                      
           start_thread                                                                                                                            
      + 31.50% mongo::KeyString::toBson(char const*, unsigned long, mongo::Ordering, mongo::KeyString::TypeBits const&)       

After change:

-  10.44%  mongod  mongod               [.] __wt_btcur_next                                                                                        
   - __wt_btcur_next                                                                                                                               
      - 99.73% __curfile_next                                                                                                                      
           mongo::(anonymous namespace)::WiredTigerIndexCursorBase::advanceWTCursor()                                                              
           mongo::(anonymous namespace)::WiredTigerIndexCursorBase::next(mongo::SortedDataInterface::Cursor::RequestedInfo)                        
           mongo::IndexScan::work(unsigned long*)                                                                                                  
           mongo::FetchStage::work(unsigned long*)                                                                                                 
           mongo::PlanExecutor::getNextImpl(mongo::Snapshotted<mongo::BSONObj>*, mongo::RecordId*)                                                 
           mongo::PlanExecutor::getNext(mongo::BSONObj*, mongo::RecordId*)                                                                         
           mongo::(anonymous namespace)::generateBatch(int, mongo::ClientCursor*, mongo::_BufBuilder<mongo::TrivialAllocator>*, int*, mongo::Timest
           mongo::getMore(mongo::OperationContext*, char const*, int, long long, bool*, bool*)                                                     
           mongo::receivedGetMore(mongo::OperationContext*, mongo::DbResponse&, mongo::Message&, mongo::CurOp&)                                    
           mongo::assembleResponse(mongo::OperationContext*, mongo::Message&, mongo::DbResponse&, mongo::HostAndPort const&)                       
           mongo::MyMessageHandler::process(mongo::Message&, mongo::AbstractMessagingPort*)                                                        
           mongo::PortMessageServer::handleIncomingMsg(void*)                                                                                      
           start_thread                                                                                                                            
-   6.38%  mongod  mongod               [.] __cursor_valid                                                                                         
   - __cursor_valid                                                                                                                                
      - 99.84% __curfile_search                                                                                                                    
           mongo::WiredTigerRecordStore::Cursor::seekExact(mongo::RecordId const&)                                                                 
           mongo::WorkingSetCommon::fetch(mongo::OperationContext*, mongo::WorkingSet*, unsigned long, mongo::unowned_ptr<mongo::SeekableRecordCurs
           mongo::FetchStage::work(unsigned long*)                                                                                                 
           mongo::PlanExecutor::getNextImpl(mongo::Snapshotted<mongo::BSONObj>*, mongo::RecordId*)                                                 
           mongo::PlanExecutor::getNext(mongo::BSONObj*, mongo::RecordId*)                                                                         
           mongo::(anonymous namespace)::generateBatch(int, mongo::ClientCursor*, mongo::_BufBuilder<mongo::TrivialAllocator>*, int*, mongo::Timest
           mongo::getMore(mongo::OperationContext*, char const*, int, long long, bool*, bool*)                                                     
           mongo::receivedGetMore(mongo::OperationContext*, mongo::DbResponse&, mongo::Message&, mongo::CurOp&)                                    
           mongo::assembleResponse(mongo::OperationContext*, mongo::Message&, mongo::DbResponse&, mongo::HostAndPort const&)                       
           mongo::MyMessageHandler::process(mongo::Message&, mongo::AbstractMessagingPort*)                                                        
           mongo::PortMessageServer::handleIncomingMsg(void*)                                                                                      
           start_thread                                                                                                                            
-   3.92%  mongod  mongod               [.] mongo::IndexScan::work(unsigned long*)                                                                 
   - mongo::IndexScan::work(unsigned long*)                                                                                                        
      - 99.33% mongo::FetchStage::work(unsigned long*)                                                                                             
           mongo::PlanExecutor::getNextImpl(mongo::Snapshotted<mongo::BSONObj>*, mongo::RecordId*)                                                 
           mongo::PlanExecutor::getNext(mongo::BSONObj*, mongo::RecordId*)                                                                         
           mongo::(anonymous namespace)::generateBatch(int, mongo::ClientCursor*, mongo::_BufBuilder<mongo::TrivialAllocator>*, int*, mongo::Timest
           mongo::getMore(mongo::OperationContext*, char const*, int, long long, bool*, bool*)                                                     
           mongo::receivedGetMore(mongo::OperationContext*, mongo::DbResponse&, mongo::Message&, mongo::CurOp&)                                    
           mongo::assembleResponse(mongo::OperationContext*, mongo::Message&, mongo::DbResponse&, mongo::HostAndPort const&)                       
           mongo::MyMessageHandler::process(mongo::Message&, mongo::AbstractMessagingPort*)                                                        
           mongo::PortMessageServer::handleIncomingMsg(void*)                                                                                      
           start_thread                                                                                                                            
      + 0.67% mongo::PlanExecutor::getNextImpl(mongo::Snapshotted<mongo::BSONObj>*, mongo::RecordId*)                                              
+   3.79%  mongod  mongod               [.] std::pair<std::__detail::_Node_iterator<mongo::RecordId, true, true>, bool> std::_Hashtable<mongo::Reco
+   3.66%  mongod  mongod               [.] mongo::IndexScan::~IndexScan()                                                                         
+   3.46%  mongod  mongod               [.] std::_Hashtable<mongo::RecordId, mongo::RecordId, std::allocator<mongo::RecordId>, std::__detail::_Iden
+   3.45%  mongod  mongod               [.] tcmalloc::CentralFreeList::RemoveRange(void**, void**, int)                                            
+   3.19%  mongod  mongod               [.] __wt_col_search                                                                                        
+   3.09%  mongod  libc-2.19.so         [.] __memcpy_sse2_unaligned                                                                                
+   2.52%  mongod  mongod               [.] __wt_btcur_search                 

Comment by Keith Bostic (Inactive) [ 15/Sep/15 ]

david.hows, that's 10%+ on explained queries; how interesting are they?

HAVE_X86INTRIN_H won't affect key_format=r, it's row-store only; it might be worth re-running with WiredTiger-Vanilla + HAVE_X86INTRIN_H.

The only simplification I can think of for the row-store search in this case would be to get rid of the loop setup and use a switch statement (because we know there's a maximum of 9 bytes in the row-store key_format=q encoding). If that's faster, we could configure it for row-store with key_format set to Q, q or r.

Comment by Keith Bostic (Inactive) [ 15/Sep/15 ]

michael.cahill, mark.benvenuto: shouldn't we be setting HAVE_X86INTRIN_H in MongoDB builds on Linux and Windows?

Comment by David Hows [ 15/Sep/15 ]

Ran this against e61e8a9cbd3c5c1e5a46fc74f4b5ab5ce879c115 and couldn't find a difference.

MMAPv1

MongoDB shell version: 3.1.9-pre-
connecting to: test
Did 50 queries in: 41253 ms
Did 100 explained queries in: 14392 ms

WiredTiger - Vanilla

MongoDB shell version: 3.1.9-pre-
connecting to: test
Did 50 queries in: 40788 ms
Did 100 explained queries in: 15291 ms

I made the change in key format anyway and got the following results, which suggest a small perf improvement.
WiredTiger - Modified key_format

MongoDB shell version: 3.1.9-pre-
connecting to: test
Did 50 queries in: 39739 ms
Did 100 explained queries in: 13477 ms

Added the X86INTRIN headers to the compile, and that didn't seem to have any impact (I couldn't find where we set the variable in the existing SCons environment).
WiredTiger - Modified key_format + X86INTRIN

MongoDB shell version: 3.1.9-pre-
connecting to: test
Did 50 queries in: 39780 ms
Did 100 explained queries in: 13156 ms
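
As a rough aid to reading the numbers above, the following sketch computes the percentage reduction in elapsed time for each run relative to a chosen baseline. The timings are copied from the runs reported in this comment; the names `RUNS`, `improvement_pct` and `compare_to` are hypothetical, and single runs like these say nothing about run-to-run variance.

```python
# Timings in milliseconds, copied from the runs reported above:
# (50 queries, 100 explained queries).
RUNS = {
    "MMAPv1": (41253, 14392),
    "WiredTiger - Vanilla": (40788, 15291),
    "WiredTiger - Modified key_format": (39739, 13477),
    "WiredTiger - Modified key_format + X86INTRIN": (39780, 13156),
}


def improvement_pct(baseline_ms, candidate_ms):
    """Percentage reduction in elapsed time relative to the baseline."""
    return 100.0 * (baseline_ms - candidate_ms) / baseline_ms


def compare_to(baseline_name, runs=RUNS):
    """Map each non-baseline run to (queries %, explained queries %)."""
    base_queries, base_explained = runs[baseline_name]
    return {
        name: (
            improvement_pct(base_queries, queries),
            improvement_pct(base_explained, explained),
        )
        for name, (queries, explained) in runs.items()
        if name != baseline_name
    }


if __name__ == "__main__":
    for name, (q_pct, e_pct) in compare_to("WiredTiger - Vanilla").items():
        print(f"{name}: queries {q_pct:+.1f}%, explained {e_pct:+.1f}%")
```

With "WiredTiger - Vanilla" as the baseline, the modified key_format run comes out roughly 2.6% faster on the 50 queries and roughly 11.9% faster on the 100 explained queries.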

Comment by Michael Cahill (Inactive) [ 15/Sep/15 ]

One more point: also check if HAVE_X86INTRIN_H is set in MongoDB builds on Linux (if it is, I can't see where). Try turning it on (or off if I'm wrong about it not being set) with the existing __wt_lex_compare to see what impact it has.

Comment by Michael Cahill (Inactive) [ 15/Sep/15 ]

daveh86, can you please look at the workload in SERVER-18823 and check the performance of index reads with WT vs mmapv1? (I don't think we've done anything to make it better, but just make sure you can reproduce what is reported here).

Then take a look at changing key_format=r for record stores to see what performance is like – can you see an improvement?

Now the hard part: can we get some of that improvement without changing the on-disk format? In particular, can you use perf and/or Zoom to figure out where time is going in row-store lookups, then investigate whether any of it can be shaved off?

One idea I think is worth trying is replacing __wt_lex_compare and __wt_lex_compare_skip with really simple implementations – does that make any difference? Maybe we could selectively choose between a simple implementation for some key formats and the more general version we currently have, if you see the simpler version running faster on this workload.
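
A minimal sketch of what "really simple implementations" could look like, written in Python for illustration only. It is not WiredTiger's code: the function names are hypothetical, and the "skip" variant assumes the caller passes a count of leading bytes already known to match and wants the updated count back.

```python
def lex_compare_simple(a, b):
    """Byte-wise lexicographic comparison of two keys.

    Returns -1, 0 or 1 when a sorts before, equal to, or after b.
    """
    for x, y in zip(a, b):
        if x != y:
            return -1 if x < y else 1
    # All shared bytes match, so the shorter key sorts first.
    return (len(a) > len(b)) - (len(a) < len(b))


def lex_compare_skip_simple(a, b, matched):
    """Same ordering, but skip the first `matched` bytes.

    Assumption: the caller already knows those bytes are equal.
    Returns (result, bytes_matched) so the next comparison can skip more.
    """
    limit = min(len(a), len(b))
    i = min(matched, limit)
    while i < limit:
        if a[i] != b[i]:
            return (-1 if a[i] < b[i] else 1), i
        i += 1
    return (len(a) > len(b)) - (len(a) < len(b)), i
```

A plain loop like this gives a baseline to measure against; any real replacement would need to be benchmarked in C on the workload from SERVER-18823.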

Comment by Martin Bligh [ 15/Jun/15 ]

I believe the current state of this is that we are deciding whether to use clustered indices or this approach.

Comment by Keith Bostic (Inactive) [ 10/Jun/15 ]

I think it's pretty interesting.

Column-store doesn't store the (packed) int64 key in the physical file, of course, so there are additional in-memory and on-disk savings from using column-store.

Column-store still has to do a binary search of internal pages, but the "key" comparison is generally much less expensive than the byte strings of row-store.
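
The difference in comparison cost can be sketched as follows. This is an illustration under stated assumptions, not WiredTiger's page layout: an internal page is modelled as a sorted list holding the first key of each child, `encode_key` is a hypothetical order-preserving 8-byte big-endian encoding (not WiredTiger's packed format), and both function names are made up. Both searches are binary searches; the row-store one compares byte strings, the column-store one compares integers.

```python
import bisect
import struct


def encode_key(recno):
    """Hypothetical order-preserving key encoding (8-byte big-endian)."""
    return struct.pack(">Q", recno)


def row_search(child_first_keys, recno):
    """Find the child slot by comparing encoded byte-string keys."""
    return bisect.bisect_right(child_first_keys, encode_key(recno)) - 1


def col_search(child_first_recnos, recno):
    """Find the child slot by comparing record numbers as integers."""
    return bisect.bisect_right(child_first_recnos, recno) - 1


if __name__ == "__main__":
    starts = [1, 1000, 5000]                 # first record number per child
    keys = [encode_key(s) for s in starts]   # the same boundaries as bytes
    for recno in (1, 999, 1000, 4999, 5000):
        print(recno, row_search(keys, recno), col_search(starts, recno))
```

Both functions land on the same child slot for any record number; only the per-step comparison differs.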

Column-store does support a leaf_value_max setting larger than the page size; it's probably only lightly tested, but there's no reason that code should differ significantly from the corresponding row-store code.

I'm assuming these are variable-length values, with no repeated values; we might want to make it possible to configure RLE compression off, if it's never going to fire.
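
To illustrate why RLE would "never fire" here, a simplified sketch of run-length encoding over adjacent values follows. It is a generic illustration, not WiredTiger's implementation, and the function names are hypothetical: when no two adjacent values are equal, every run has length 1 and nothing is saved.

```python
from itertools import groupby


def rle_encode(values):
    """Collapse runs of adjacent equal values into (value, run_length) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(values)]


def rle_saves_cells(values):
    """True only if at least one run is longer than a single value."""
    return len(rle_encode(values)) < len(values)


if __name__ == "__main__":
    repeated = [b"x", b"x", b"x", b"y"]
    distinct = [b"doc1", b"doc2", b"doc3"]   # stand-ins for unique documents
    print(rle_encode(repeated), rle_saves_cells(repeated))
    print(rle_encode(distinct), rle_saves_cells(distinct))
```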

Row-store has been tuned/tested far more heavily than column-store, so I'd probably give any column-store release a little extra time for a good pounding.

Comment by Alexander Gorrod [ 10/Jun/15 ]

keith.bostic What do you think of this proposal? The idea is this:

MongoDB with WiredTiger currently maintains the data table with an internally managed "RecordId" key that uses q (int64) as the data type.

MongoDB also maintains indexes as separate row store tables. Those row store tables contain the index data and the RecordId as a reference into the data collection. All lookups in MongoDB are done through an index.

The proposal is that we could switch the data table from being a row store q to a column store r. The motivation is that we've seen cases where the cost of doing a binary search on the row store page forms a significant portion of time in a query (see SERVER-18823).

One piece of functionality I'm not sure is implemented in our column store implementation is setting a leaf_value_max larger than the page size.

Generated at Thu Feb 08 03:49:13 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.