java - MongoDB aggregation takes longer than distinct

- February 15, 2010

Here is how I do a distinct with the MongoDB aggregation framework:

db.big.aggregate([ { "$project" : { "first_name" : "$first_name"}} , { "$group" : { "_id" : { "col1" : "$first_name"}}} , { "$limit" : 50000}])

It takes 3 seconds on a collection that has a little more than 2M documents. If I run the following query

db.big.distinct('first_name')

I get pretty much the same result in less than 1 second. The issue I have with distinct is that I cannot limit it: if the collection has 1M distinct values for this attribute, all of them are returned. Is there a way to use distinct in a more performant way and still have a limit on the number of elements returned? I'm using the MongoDB Java driver, so I'd need a solution that works with it.

First, there is no way at the moment to use a limit with distinct, as shown in the corresponding JIRA task.

Second, distinct will beat the aggregation framework equivalent on performance. Bucketing/grouping is a heavier operation than checking for distinct values.

Having said that, there is one way to speed up the aggregation framework grouping stage, as explained by Asya Kamsky: have the documents sorted by the grouping key before grouping, and have an index on the sorting key (the $sort in the aggregation framework can use an index).

db.big.ensureIndex({ first_name: 1 }, { background: false })

db.big.aggregate([
    { "$sort": { "first_name": 1 } },
    { "$project": {
        "first_name": "$first_name"
    }},
    { "$group": {
        "_id": "$first_name"
    }},
    { "$limit": 50000 }
])
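If the same pattern is needed for more than one field, the pipeline above can be built by a small helper. The following is only a sketch: `distinctPipeline` is a hypothetical name, not part of MongoDB or any driver, and it assumes the field is a top-level field with an ascending index, as in the example above.

```javascript
// Hypothetical helper (not a MongoDB API): builds the
// sort -> project -> group -> limit pipeline shown above for any field.
function distinctPipeline(field, limit) {
  if (typeof field !== "string" || field.length === 0) {
    throw new TypeError("field must be a non-empty string");
  }
  if (!Number.isInteger(limit) || limit <= 0) {
    throw new RangeError("limit must be a positive integer");
  }
  const path = "$" + field; // field path expression, e.g. "$first_name"
  return [
    { $sort: { [field]: 1 } },       // sort on the indexed grouping key first
    { $project: { [field]: path } }, // keep only the field being grouped
    { $group: { _id: path } },       // one output document per distinct value
    { $limit: limit }                // cap the number of values returned
  ];
}

// In the mongo shell this would be used as:
//   db.big.aggregate(distinctPipeline("first_name", 50000))
```

The stages are plain documents, so the same pipeline would have to be expressed with the Java driver's own document types when called from Java. To check whether the `$sort` stage really uses the index, the shell accepts `db.big.aggregate(pipeline, { explain: true })`.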
