java - MongoDB aggregation takes longer than distinct


Here is how I do a distinct with the MongoDB aggregation framework:

db.big.aggregate([
    { "$project" : { "first_name" : "$first_name" } },
    { "$group" : { "_id" : { "col1" : "$first_name" } } },
    { "$limit" : 50000 }
])

It takes 3 seconds on a collection that has a little more than 2M documents. If I run the following query:

db.big.distinct('first_name') 

I get pretty much the same result in less than 1 second. The issue I have with distinct is that I cannot limit it: if the collection has 1M distinct values for the attribute, all of them are returned. Is there a way to use distinct, or something more performant, and still have a limit on the number of elements returned? I'm using the MongoDB Java driver, so I'd need a solution that works with it.

First, there is no way at the moment to use a limit with distinct, as shown in the related JIRA task.

Second, distinct will generally beat its aggregation framework equivalent on performance. Bucketing/grouping is a heavier operation than checking for distinct values.

Having said that, there is one way to speed up the aggregation framework grouping stage, explained here by Asya Kamsky. You have to have the documents sorted by the grouping key before grouping, and have an index on the sorting key (the $sort in the aggregation framework will use the index).

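// Index on the grouping key so that the leading $sort can use it.
// (ensureIndex is deprecated in newer MongoDB versions; use createIndex there.)
db.big.ensureIndex({ first_name: 1 }, { background: false })

db.big.aggregate([
    { "$sort": { "first_name": 1 } },
    { "$project": { "first_name": "$first_name" } },
    { "$group": { "_id": "$first_name" } },
    { "$limit": 50000 }
])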
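To see why sorted input helps when only a limited number of distinct values is wanted, here is a minimal in-memory sketch in plain Java. It is only an illustration of the general idea, not how the MongoDB server implements $sort and $group, and it does not use the Java driver. The class SortedDistinct and the method distinctWithLimit are hypothetical names made up for this example. When values arrive in sorted order, duplicates are adjacent, so each value only needs to be compared with the previous one, and reading can stop as soon as the limit is reached.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Objects;

// Hypothetical helper for illustration; not part of the MongoDB Java driver.
class SortedDistinct {

    // Collects at most `limit` distinct values from an iterator whose values
    // are assumed to arrive already sorted (for example, read in index order).
    static List<String> distinctWithLimit(Iterator<String> sorted, int limit) {
        List<String> result = new ArrayList<>();
        String previous = null;
        boolean first = true;
        // Stop reading as soon as enough distinct values have been collected.
        while (result.size() < limit && sorted.hasNext()) {
            String current = sorted.next();
            // In sorted input a value is new exactly when it differs from the previous one.
            if (first || !Objects.equals(previous, current)) {
                result.add(current);
                previous = current;
                first = false;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("ann", "ann", "bob", "cat", "cat", "dan");
        System.out.println(distinctWithLimit(names.iterator(), 3)); // prints [ann, bob, cat]
    }
}
```

With unsorted input the same job would need a set of every value seen so far, which is closer to what a general grouping stage has to do.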
