Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


MongoDB's performance on aggregation queries

After hearing so many good things about MongoDB's performance, we decided to give MongoDB a try to solve a problem we have. I started by moving all the records we have in several MySQL databases to a single collection in MongoDB. This resulted in a collection with 29 million documents (each of them has at least 20 fields), which takes around 100 GB of disk space. We decided to put them all in one collection since all the documents have the same structure and we want to query and aggregate results across all of them.

I created some indexes to match my queries; otherwise even a simple count() would take ages. However, queries such as distinct() and group() still take far too long.

Example:

// creation of a compound index    
db.collection.ensureIndex({'metadata.system':1, 'metadata.company':1})

// query to get all the combinations of companies and systems
db.collection.group({key: { 'metadata.system':true, 'metadata.company':true }, reduce: function(obj,prev) {}, initial: {} });
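
For readers following along, here is a minimal sketch of what the group() call above computes, together with the equivalent $group stage for MongoDB versions that include the aggregation framework (an assumption; the post does not say which version is running). The `distinctPairs` helper, the `distinctPairsPipeline` name, and the sample documents are hypothetical; the helper only models the logic in plain JavaScript and does not touch the database.

```javascript
// Hypothetical sample documents shaped like the ones described in the question.
const sampleDocs = [
  { metadata: { system: 'billing', company: 'acme' } },
  { metadata: { system: 'billing', company: 'acme' } },
  { metadata: { system: 'crm', company: 'acme' } },
  { metadata: { system: 'crm', company: 'globex' } },
];

// Plain-JavaScript model of group() with an empty reduce function:
// every document is visited once and only the distinct keys are kept.
function distinctPairs(docs) {
  const seen = new Map();
  for (const doc of docs) {
    const key = { system: doc.metadata.system, company: doc.metadata.company };
    seen.set(JSON.stringify(key), key); // the JSON string identifies the pair
  }
  return Array.from(seen.values());
}

// Equivalent pipeline, assuming the aggregation framework is available.
// In the mongo shell it would be run as:
//   db.collection.aggregate(distinctPairsPipeline)
const distinctPairsPipeline = [
  { $group: { _id: { system: '$metadata.system', company: '$metadata.company' } } },
];

console.log(distinctPairs(sampleDocs));
```

Note that the loop touches every document even though the result is tiny, which is why a result list of around 60 entries can still be expensive to produce from 29 million documents.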

I took a look at the mongod log and it has a lot of lines like these (while executing the query above):

Thu Apr  8 14:40:05 getmore database.collection cid:973023491046432059 ntoreturn:0 query: {}  bytes:1048890 nreturned:417 154ms
Thu Apr  8 14:40:08 getmore database.collection cid:973023491046432059 ntoreturn:0 query: {}  bytes:1050205 nreturned:414 430ms
Thu Apr  8 14:40:18 getmore database.collection cid:973023491046432059 ntoreturn:0 query: {}  bytes:1049748 nreturned:201 130ms
Thu Apr  8 14:40:27 getmore database.collection cid:973023491046432059 ntoreturn:0 query: {}  bytes:1051925 nreturned:221 118ms
Thu Apr  8 14:40:30 getmore database.collection cid:973023491046432059 ntoreturn:0 query: {}  bytes:1053096 nreturned:250 164ms
...
Thu Apr  8 15:04:18 query database.$cmd ntoreturn:1 command  reslen:4130 1475894ms

This query took 1475894 ms (about 24.6 minutes), which is far longer than I would expect, given that the result list has only around 60 entries. First of all, is this expected given the large number of documents in my collection? Are aggregation queries in general expected to be this slow in MongoDB? Any thoughts on how I can improve the performance?
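
To put the last log line in perspective, here is a back-of-the-envelope calculation using only the figures quoted in the post. It assumes the query reads the full data set exactly once and that 1 GB = 1000 MB; neither is confirmed by the post, and the variable names are just for illustration.

```javascript
// Figures quoted in the question.
const elapsedMs = 1475894;   // duration reported in the mongod log
const documentCount = 29e6;  // "29 Million documents"
const dataSizeGB = 100;      // "around 100 GB"
const memoryGB = 10;         // RAM on the machine

const elapsedSeconds = elapsedMs / 1000;
const elapsedMinutes = elapsedSeconds / 60;

// Average scan rate, assuming every document is read exactly once.
const docsPerSecond = documentCount / elapsedSeconds;

// Average read rate, assuming 1 GB = 1000 MB and one full pass over the data.
const mbPerSecond = (dataSizeGB * 1000) / elapsedSeconds;

// Upper bound on the share of the data that could sit in memory.
const fractionInMemory = memoryGB / dataSizeGB;

console.log(`elapsed: ${elapsedMinutes.toFixed(1)} min`);
console.log(`scan rate: ${Math.round(docsPerSecond)} docs/s`);
console.log(`read rate: ${mbPerSecond.toFixed(1)} MB/s`);
console.log(`share of data that fits in RAM: ${fractionInMemory * 100}%`);
```

With at most a tenth of the data fitting in memory, these numbers would be consistent with a scan that is limited by disk reads, though the post does not include enough detail to confirm that.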

I am running mongod on a single machine with a dual-core CPU and 10 GB of memory.

Thank you.



1 Answer


The intended way to improve the performance of aggregation queries in MongoDB is to run MapReduce on a sharded database that is distributed over multiple machines.
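
As a rough sketch of what such a MapReduce job could look like for the question's query, assuming a MongoDB version whose mapReduce supports inline output: the function names `mapPair`, `reducePair`, and `runMapReduceLocally` and the sample documents are hypothetical, and the local driver only imitates the map and reduce phases in memory so the functions can be tried without a server.

```javascript
// map: runs once per document; `this` is the document and `emit` is
// supplied by MongoDB when the job runs on the server.
function mapPair() {
  emit({ system: this.metadata.system, company: this.metadata.company }, 1);
}

// reduce: combines all values emitted for one key (here, a document count).
function reducePair(key, values) {
  return values.reduce((total, n) => total + n, 0);
}

// In the mongo shell the job would be submitted roughly as:
//   db.collection.mapReduce(mapPair, reducePair, { out: { inline: 1 } })

// Hypothetical in-memory driver that imitates the two phases locally.
function runMapReduceLocally(docs, map, reduce) {
  const buckets = new Map();
  globalThis.emit = (key, value) => {
    const id = JSON.stringify(key);
    if (!buckets.has(id)) buckets.set(id, { key, values: [] });
    buckets.get(id).values.push(value);
  };
  for (const doc of docs) map.call(doc); // `this` inside map is the document
  delete globalThis.emit;
  return Array.from(buckets.values(), (b) => ({
    _id: b.key,
    value: reduce(b.key, b.values),
  }));
}

// Hypothetical sample documents shaped like the ones in the question.
const exampleDocs = [
  { metadata: { system: 'billing', company: 'acme' } },
  { metadata: { system: 'billing', company: 'acme' } },
  { metadata: { system: 'crm', company: 'globex' } },
];

console.log(runMapReduceLocally(exampleDocs, mapPair, reducePair));
```

On a sharded cluster each shard would run the map phase over its own portion of the data, which is where the hoped-for speed-up comes from; the local driver above does not model that distribution.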

I compared the performance of MongoDB's MapReduce with a group-by select statement in Oracle on the same machine, using a collection/table with approximately 14 million documents/rows. MongoDB was approximately 25 times slower. Assuming performance scales linearly with the number of shards, this means that I would have to shard the data over at least 25 machines to get the same performance from MongoDB that Oracle delivers on a single machine.

Exporting the data from MongoDB with mongoexport.exe, using the exported data as an external table in Oracle, and doing the group-by in Oracle was much faster than using MongoDB's own MapReduce.

