c# - Millions of rows in the database, only so much needed -
problem summary:
- c# (mvc), entity framework 5.0 , oracle.
- i have couple of million rows in view joins 2 tables.
- i need populate dropdownlists filter-posibilities.
- the options in these dropdownlists should reflect actual contents of view column, distinct.
- i want update dropdownlists whenever select something, new options reflect filtered content, preventing choosing give 0 results.
- its slow.
question: whats right way of getting these dropdownlists populated?
now more detail.
-- goal of page --
the user presented dropownlists filter data in grid below. grid represents view (see "database") results filtered.
each dropdownlist represents filter column of view. once selected, rest of page updates. other dropdownlists contain posible values corresponding columns complies filter applied in first dropdownlist.
once user has selected couple of filters, he/she presses search button , grid below dropdownlists updates.
-- database --
i have view selects columns 2 tables, nothing fancy there. this:
select tbl1.blabla, tbl2.blabla etc etc table1 tbl1, table2 tbl2 bsl.bvz_id = bvz.id , bsl.einddatum null;
there total of 22 columns. 13 varchars (mostly small, 1 - 20, 1 of em has size of 2000!), 6 dates , 3 numbers (one of them size 38 , 1 of them 15,2).
there couple of indexes on tables, among relevant id's clause.
important thing know: cannot change database. maybe set index here , there, nothing major.
-- entity framework --
i created database first edmx in solution , mapped view. there classes both tables, need data both of them, don't know if need them. problem selecting things either table can't apply half of filtering, maybe there smart way's didn't think of yet.
-- view --
my view bound viewmodel. in there have ienumerable each dropdownlist. getter these gets data single ienumerable called nameofviewobjects. this:
public string selectedcolumn1{ get; set; } private ienumerable<selectlistitem> column1options; public ienumerable<selectlistitem> column1options { { if (column1options == null) { column1options= nameofviewobjects.select(item => item.column1).distinct() .select(item => new selectlistitem { value = item, text = item, selected = item.equals(selectedcolumn1, stringcomparison.invariantcultureignorecase) }); } return column1options; } }
the 2 solutions i've tried are:
- 1 - selecting columns in linq query need dropdownlists (the 2000 varchar not 1 of them , there 2 date columns), distinct on them , put results hashset. set nameofviewobjects point towards hashset. have wait 2 minutes complete, after that, populating dropdownlists instant (maybe second each of them).
model.beslissingen = new hashset<nameofviewobject>(dbbes.nameofviewobject .distinctby(item => new { item.varcharcolumn1, item.datecolumn1, item.datecolumn2, item.varcharcolumn2, item.varcharcolumn3, item.varcharcolumn4, item.varcharcolumn5, item.varcharcolumn6, item.varcharcolumn7, item.varcharcolumn8 } ) );
the big problem here object nameofviewobject quite large, , though using distinct here, resulting in less 100.000 results, still uses on 500mb of memory it. unacceptable, because there lot of users using screen (a lot be... 10 max, 5 average simultaniously).
- 2 - other solution use same linq query , point nameofviewobjects towards iqueryable produces. means every time view wants bind dropdownlist ienumerable, fire query find distinct values column in table millions of rows column it's getting values not indexed. takes around 1 minute each dropdownlist (i have 10), takes ages.
don't forget: need update dropdownlists every time 1 of them has it's selection changed.
-- question -- i'm going @ wrong way, or maybe 1 of these solutions should combined indexing of columns use, maybe should use way store data in memory, it's little, there must out there has done before , figured out smart. can please tell me best way handle situation this?
acceptable performance:
- having wait while (2 minutes) while page loads, fast after that.
- having wait couple of seconds every time dropdownlist changes
- the page not use more 500mb of memory
of course should have indexes on columns , combinations in clauses. no index means table scan , o(n) query times. cannot scale under circumstance.
you not need millions of entries in drop down. need smarter filtering database down manageable numbers of entries.
i'd take page google. type ahead helps narrow down entire internet graph groups of 25 or 50 per page, @ top. maybe manage that, too.
perhaps better answer search engine. if java developer might try lucene/solr , indexing. don't know .net equivalent is.
Comments
Post a Comment