java - MongoDB aggregation takes longer than distinct
Here is how I do a distinct with the MongoDB aggregation framework:
db.big.aggregate([ { "$project" : { "first_name" : "$first_name"}} , { "$group" : { "_id" : { "col1" : "$first_name"}}} , { "$limit" : 50000}])
and it takes 3 seconds on a collection that has a little more than 2M documents. If I run the following query
db.big.distinct('first_name')
I get pretty much the same result in less than 1 second. The issue I have with distinct is that I cannot limit it: if the collection has 1M distinct values for the attribute, all of them will be returned. Is there a way to use distinct, which is more performant, and still limit the number of elements returned? I'm using the MongoDB Java driver, so I'd need a solution that works with it.
First, there is no way at the moment to use a limit with distinct, as shown in this JIRA task.
Second, distinct performance will beat the aggregation framework equivalent. Bucketing/grouping is a heavier operation than just checking distinct values.
Having said that, there is one way to speed up the aggregation framework grouping stage, as explained here by Asya Kamsky. You have to have the documents sorted by the grouping key before grouping, and have an index on the sorting key (the $sort in the aggregation framework can use an index).
// ensureIndex is the older name; newer shells call it createIndex
db.big.ensureIndex({ first_name: 1 }, { background: false })
db.big.aggregate([ { "$sort": { "first_name": 1 } }, { "$project": { "first_name": "$first_name" } }, { "$group": { "_id": "$first_name" } }, { "$limit": 50000 } ])
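To see why sorting by the grouping key can help, here is a minimal Java sketch of the general idea. It is only an illustration and not a description of how MongoDB implements $group internally. When keys arrive already sorted, a new distinct value can be detected by comparing each key with the previous one, and the scan can stop as soon as the limit is reached. The class name SortedDistinct and the method distinctFromSorted are hypothetical names chosen for this example, and the in-memory list stands in for documents read in index order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

// Hypothetical helper for illustration only; not part of the MongoDB Java driver.
final class SortedDistinct {

    // Returns up to `limit` distinct values from a list that is ALREADY sorted.
    // Because equal keys are adjacent, comparing with the previous key is enough,
    // and the loop can stop early once `limit` values have been collected.
    static List<String> distinctFromSorted(List<String> sortedKeys, int limit) {
        List<String> result = new ArrayList<>();
        String previous = null;
        for (String key : sortedKeys) {
            if (result.size() >= limit) {
                break; // early exit: the remaining keys are never examined
            }
            if (result.isEmpty() || !Objects.equals(key, previous)) {
                result.add(key);
                previous = key;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Stand-in for first_name values read in index order.
        List<String> firstNames =
                Arrays.asList("alice", "alice", "bob", "carol", "carol", "dave");
        System.out.println(distinctFromSorted(firstNames, 3)); // prints [alice, bob, carol]
    }
}
```

With unsorted input, the same result would need a set of every value seen so far, which is the heavier bucketing work mentioned above.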