c# - NEST's method IndexMany to run synchronously
I ran into a small problem using NEST's IndexMany method (bulk index). I found that when I send a large number of items to Elasticsearch to be indexed, the response is returned immediately, but not all of the documents are indexed at that point.
The problem is shown in the following code:
    List<object> objectsToIndex = new List<object>(); // assume 3000 items here
    ElasticClient client = new ElasticClient(settings);
    client.IndexMany(objectsToIndex, indexName, type);

    var readResult = client.Search<T>(e => e
        .Type(type)
        .Index(indexName)
        .Query(q => q
            .Range(r => r.OnField(t => t.Date).GreaterOrEquals(dates[0]).LowerOrEquals(dates[1]))
        )
    );
    // readResult contains 300-500 items

    System.Threading.Thread.Sleep(2000);

    readResult = client.Search<T>(e => e
        .Type(type)
        .Index(indexName)
        .Query(q => q
            .Range(r => r.OnField(t => t.Date).GreaterOrEquals(dates[0]).LowerOrEquals(dates[1]))
        )
    );
    // readResult contains 3000 items, which is right
This is a problem for me, because I need to bulk index the documents and then read them all. Sure, I can run Thread.Sleep(..) after the bulk index, but that is not a solution for me.
The Elasticsearch version is 2.2.0 and the NEST client version is 1.7.2.
So, is there a way to force Elasticsearch/NEST to wait until the documents are indexed before continuing?
NEST 2.x is not compatible with Elasticsearch 1.x; whilst it may work for the most part, it is untested against 1.x, and there are breaking changes between Elasticsearch 1.x and 2.x that are reflected in changes in NEST, for example in server error responses, which will result in a serialization exception at runtime. You should use the latest NEST/Elasticsearch.Net 1.x (currently 1.8.0) with Elasticsearch 1.x.
Newly indexed documents only become searchable after the index has been refreshed, which by default happens every second, so there's a trade-off to be made here between indexing rate and making newly indexed items available for search. By changing the refresh interval from 1 second to something longer such as 30 seconds, or by disabling it whilst indexing (-1) and setting it back to 1 second after finishing, you may see a better indexing rate at the cost of needing to wait longer after indexing for documents to be available for search. In contrast, if having items indexed and available for search as soon as possible is more important, you may send smaller bulk batch sizes and call refresh in the request, such as
    client.Bulk(b => b
        .CreateMany(objectsToIndex, (c, doc) => c
            .Document(doc)
            .Type(type)
            .Index(indexName)
        )
        .Refresh()
    );
with the caveat that calling refresh more often will increase the load on the cluster and indexing is going to take longer.
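To make the "smaller bulk batch sizes" idea concrete, here is a minimal sketch of splitting the full list into batches before sending them. `BulkBatcher` and `Batch` are hypothetical names introduced for illustration, not part of NEST, and the batch size to use is something you would need to measure against your own cluster.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of NEST): splits a sequence into batches.
public static class BulkBatcher
{
    // Yields consecutive batches of at most `batchSize` items each.
    // As an iterator method, argument checks run when enumeration starts.
    public static IEnumerable<List<T>> Batch<T>(IEnumerable<T> items, int batchSize)
    {
        if (items == null) throw new ArgumentNullException(nameof(items));
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));

        var batch = new List<T>(batchSize);
        foreach (var item in items)
        {
            batch.Add(item);
            if (batch.Count == batchSize)
            {
                yield return batch;
                batch = new List<T>(batchSize);
            }
        }

        // Emit the final batch, which may be smaller than batchSize.
        if (batch.Count > 0)
            yield return batch;
    }
}
```

Each batch returned by `BulkBatcher.Batch(objectsToIndex, 500)` could then be passed to the bulk call with refresh in place of the full list, bearing in mind that every refresh adds load.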
If you absolutely must wait until all documents have been indexed, I'd recommend doing a count rather than a search, to reduce the size of the response that needs to be deserialized:
    var countResponse = client.Count<MyClass>(c => c
        .Type(type)
        .Index(indexName)
        .Query(q => q
            .Range(r => r
                .OnField(t => t.Date)
                .GreaterOrEquals(dates[0])
                .LowerOrEquals(dates[1])
            )
        )
    );
    var count = countResponse.Count;
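If you do poll with a count, a bounded loop is safer than a fixed Thread.Sleep. The following is a minimal sketch of such a loop. `IndexWaiter` and `WaitForCount` are hypothetical names introduced here, and the count itself is supplied as a delegate, so the helper makes no assumptions about the NEST API.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Hypothetical helper (not part of NEST): waits for a count to reach a target.
public static class IndexWaiter
{
    // Polls `getCount` until it returns at least `expected` or `timeout` elapses.
    // Returns true if the expected count was reached, false on timeout.
    public static bool WaitForCount(
        Func<long> getCount, long expected, TimeSpan timeout, TimeSpan pollInterval)
    {
        if (getCount == null) throw new ArgumentNullException(nameof(getCount));

        var stopwatch = Stopwatch.StartNew();
        while (true)
        {
            if (getCount() >= expected)
                return true;

            if (stopwatch.Elapsed >= timeout)
                return false;

            Thread.Sleep(pollInterval);
        }
    }
}
```

In the scenario from the question, the delegate would wrap the count request above and return `countResponse.Count`, with `expected` set to 3000. Handling a `false` result (timeout) is left to the caller.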