performance - How to measure resource usage at the partition level in Service Fabric?


With the Service Fabric tools we can create custom metrics and capacities. This way we can make our own resource models, which the resource balancer uses at runtime. We would like to monitor and use physical resources such as memory, CPU, and disk usage. This works fine as long as we keep using the default load.

But the load is not static for a service/actor, so we would like to use the built-in dynamic load reporting. This is where we run into a problem: ReportLoad works at the level of partitions, yet our partitions all live within the same process on a node. All the methods for monitoring physical resources that we found use the process as the smallest unit of measurement, such as PerformanceCounter. If that value were used, there would be hundreds of partitions reporting the same load, and the load would not be representative of any single partition.

So the question is: how can resource usage be measured at the partition level?

Not only are service instances and replicas hosted in the same process, they also share a thread pool by default in .NET! Every time you create a new service instance, the platform simply creates another instance of your service class (the one that derives from StatefulService or StatelessService) inside the host process. This is great because it's fast, cheap, and you can pack a ton of services into a single host process and onto each VM or machine in your cluster.

But it also means resources are shared, so how do you know how much each replica of each partition is using?

The answer is to report load on virtual resources rather than physical resources. The idea is that you, the service author, can keep track of some measurement about your service and formulate metrics from that information. Here is a simple example of a virtual resource that's based on physical resources:

Suppose you have a web service. You run a load test on your web service and determine the maximum requests per second (RPS) it can handle on various hardware profiles (using Azure VM sizes and completely made-up numbers as an example):

  • A2: 500 RPS
  • D2: 1000 RPS
  • D4: 1500 RPS

Now when you create your cluster, you set your capacities based on the hardware profiles you're using. So if you have a cluster of D2s, each node would define a capacity of 1000 RPS.
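To make the arithmetic behind these capacities concrete, here is a minimal Python sketch. It is only an illustration: in a real cluster the resource balancer does this bookkeeping itself, and the names `NODE_CAPACITY_RPS`, `remaining_capacity`, and `fits` are made up for this example, as are the numbers.

```python
# Hypothetical per-node capacities, taken from the made-up load-test numbers.
NODE_CAPACITY_RPS = {"A2": 500, "D2": 1000, "D4": 1500}


def remaining_capacity(vm_size, reported_loads):
    """Return the RPS headroom left on a node of the given VM size."""
    return NODE_CAPACITY_RPS[vm_size] - sum(reported_loads)


def fits(vm_size, reported_loads, new_load):
    """Return True if a replica reporting new_load still fits on the node."""
    return new_load <= remaining_capacity(vm_size, reported_loads)
```

For example, a D2 node whose replicas already report 300 and 450 RPS has 250 RPS of headroom, so a replica reporting 200 RPS fits and one reporting 300 RPS does not.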

Then each instance (or replica, if stateful) of your web service reports an average RPS value. This is a virtual resource that you can easily calculate per instance/replica. It corresponds to some hardware profile, even though you're not reporting CPU, network, memory, etc. directly. You can apply this to anything you can measure about your services, e.g., queue length, concurrent user count, etc.
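Calculating an average RPS per instance/replica could look roughly like the Python sketch below. It is written under a few assumptions: `RequestRateMeter` and `report_once` are hypothetical names, `report_load` is a stand-in callback for whatever actually reports the metric for the partition (in a .NET service that would be a wrapper around ReportLoad), the metric name "RPS" is arbitrary, and the value is rounded on the assumption that metric values are integers. The meter is not thread-safe; a real service would need a lock or an atomic counter.

```python
import time


class RequestRateMeter:
    """Counts requests for one instance/replica and yields an average RPS."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock          # injectable so the meter can be tested
        self._count = 0
        self._window_start = clock()

    def record_request(self):
        """Call once per handled request."""
        self._count += 1

    def take_average_rps(self):
        """Return the average RPS since the last call and start a new window."""
        now = self._clock()
        elapsed = now - self._window_start
        rps = self._count / elapsed if elapsed > 0 else 0.0
        self._count = 0
        self._window_start = now
        return int(round(rps))


def report_once(meter, report_load):
    """Send the current average to the (hypothetical) load-reporting callback."""
    value = meter.take_average_rps()
    report_load("RPS", value)
    return value
```

Each instance/replica would own one meter, call `record_request()` from its request handler, and call `report_once()` on a timer, so every partition reports only its own traffic even though all of them share one process.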

If you don't want to define a capacity as specific as requests per second, you can take a more general approach by defining physical-ish capacities for common resources, like memory or disk usage. What you're really doing here is defining usable memory and disk for your services rather than the total available. In your services you can keep track of how much of each capacity each instance/replica uses. It's not a total value, though; it's just the stuff you know about. For example, if you're keeping track of data stored in memory, it wouldn't include runtime overhead, temporary heap allocations, etc.

I have an example of this approach in a Reliable Collection wrapper I wrote that reports load metrics strictly on the amount of data you store, by counting bytes: https://github.com/vturecek/metric-reliable-collections. It doesn't report total memory usage, so you have to come up with a reasonable estimate of how much overhead you need and define your capacities accordingly. At the same time, because it doesn't report temporary heap allocations and other transient memory usage, the metrics that are reported should be much smoother and more representative of the actual data you're storing (you don't want to re-balance the cluster just because the .NET GC hasn't run yet, for example).
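The byte-counting idea can be sketched in a few lines of Python. This is not the code from the linked repository, only an illustration of the technique under simple assumptions: keys are strings, values are already-serialized bytes, and the names `MeteredStore` and `usable_capacity` are invented for this example.

```python
class MeteredStore:
    """Key/value store that tracks the size in bytes of the data it holds."""

    def __init__(self):
        self._items = {}
        self._bytes = 0

    @staticmethod
    def _size(key, value):
        # Only the data itself is counted, not runtime or container overhead.
        return len(key.encode("utf-8")) + len(value)

    def set(self, key, value):
        """Store value (bytes) under key (str), adjusting the byte total."""
        old = self._items.get(key)
        if old is not None:
            self._bytes -= self._size(key, old)
        self._items[key] = value
        self._bytes += self._size(key, value)

    def remove(self, key):
        """Remove key if present and subtract its bytes from the total."""
        old = self._items.pop(key, None)
        if old is not None:
            self._bytes -= self._size(key, old)

    @property
    def stored_bytes(self):
        """The value a replica would report as its memory-ish load."""
        return self._bytes


def usable_capacity(total_bytes, overhead_estimate_bytes):
    """Node capacity to define: total memory minus an estimated overhead."""
    return total_bytes - overhead_estimate_bytes
```

Because `stored_bytes` changes only when data is written or removed, the reported value stays flat between writes instead of following garbage-collection cycles.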

