Python 2.7 - How to delete duplicate records from a MongoDB database
I have a MongoDB collection with more than 5 million records. I need to delete the duplicate entries. Here is the code I tried:
    from pymongo import MongoClient
    conn = MongoClient("mongodb://127.0.0.1:27017")
    db = conn.test
    cursor = db.coll.aggregate(
        [
            {"$group": {
                "_id": {"instrument name": "$instrument name", "high": "$high", "low": "$low",
                        "v": "$v", "date": "$date", "close": "$close", "open": "$open"},
                "unique_ids": {"$addToSet": "$_id"},
                "count": {"$sum": 1}}}
        ],
        { 'allowDiskUse': 'true' }
    )
    response = []
    for doc in cursor:
        del doc["unique_ids"][0]
        for id in doc["unique_ids"]:
            response.append(id)
    db.coll.remove({"_id": {"$in": response}})
But when I try to execute the code, I get an error like this:
    Traceback (most recent call last):
      File "delete_duplicate.py", line 12, in <module>
        'allowDiskUse': 'true'
    TypeError: aggregate() takes 2 arguments (3 given)
When I run the code on a small data set without allowDiskUse, it deletes the duplicate entries successfully. On the large data set, however, it throws an error saying I need to use allowDiskUse, and when I add it I get the error shown above. I am using MongoDB 3.0. ensureIndex does not work on my platform. Please help me solve this issue.
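In PyMongo, aggregate() accepts options such as allowDiskUse as keyword arguments, not as a second positional dict, which is why the call in the question raises the TypeError. Define the pipeline on its own:

    pipeline = [{
        "$group": {
            "_id": {
                "instrument name": "$instrument name",
                "high": "$high",
                "low": "$low",
                "v": "$v",
                "date": "$date",
                "close": "$close",
                "open": "$open"
            },
            "unique_ids": {"$addToSet": "$_id"},
            "count": {"$sum": 1}
        }
    }]

Then call:

    # Options are passed as keyword arguments; note the Python boolean True
    result = db.coll.aggregate(pipeline, allowDiskUse=True)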
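For a collection of this size, it may also help to delete in batches rather than collecting every duplicate _id into one list. The following is only a rough sketch, not a tested solution for this data set. It assumes PyMongo 3.0 or later, where aggregate() returns a cursor and delete_many() is available. The helper names build_pipeline and remove_duplicates and the batch_size parameter are made up for illustration, and the extra $match stage (which skips groups that have no duplicates) is an addition that is not in the code above.

```python
def build_pipeline(fields):
    """Group on the given fields and keep only groups holding more than one document."""
    group_key = dict((name, "$" + name) for name in fields)
    return [
        {"$group": {
            "_id": group_key,
            "unique_ids": {"$addToSet": "$_id"},
            "count": {"$sum": 1},
        }},
        # Added stage (assumption): drop groups that contain no duplicates
        {"$match": {"count": {"$gt": 1}}},
    ]


def remove_duplicates(coll, fields, batch_size=1000):
    """Delete all but one document per duplicate group; return how many were deleted."""
    deleted = 0
    batch = []
    # allowDiskUse is passed as a keyword argument, as in the answer above
    for doc in coll.aggregate(build_pipeline(fields), allowDiskUse=True):
        # Keep the first _id of each group and queue the rest for deletion
        batch.extend(doc["unique_ids"][1:])
        if len(batch) >= batch_size:
            deleted += coll.delete_many({"_id": {"$in": batch}}).deleted_count
            batch = []
    if batch:
        # Flush whatever is left after the cursor is exhausted
        deleted += coll.delete_many({"_id": {"$in": batch}}).deleted_count
    return deleted


# Hypothetical usage against the collection from the question:
#
#   from pymongo import MongoClient
#   coll = MongoClient("mongodb://127.0.0.1:27017").test.coll
#   remove_duplicates(coll, ["instrument name", "high", "low", "v",
#                            "date", "close", "open"])
```

Batching keeps each $in list small, so memory use on the client stays bounded no matter how many duplicates the collection holds. It is worth trying this on a copy of the data first, since the deletes cannot be undone.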