Daniel Berger edited this page May 10, 2018 · 30 revisions

Metrics in Azure

You will probably want to collect metrics on your various resources within Azure at some point. This page explains how to do that.

Available Metrics

To get metadata about the metrics themselves, use the Azure::Armrest::Insights::MetricsService class. You must know the namespace provider and the type of resource that you're using, as well as the resource name and resource group.

The New Way

In September 2016, Azure updated the metrics API to provide a single cross-platform set of metrics definitions. The update also makes accessing metrics much simpler, because you no longer need to access storage accounts and parse out information. The updated API is supported as of azure-armrest 0.6.0.

Caveat: Although this approach is easier and cross-platform, fewer metrics are available overall, so you may still need to use the older approach if you need specific information that isn't provided by the smaller definition list. More on that in a bit.

To see a list of the new metrics definitions, call the :list_definitions method with a resource object or resource ID:

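require 'azure-armrest'

conf = Azure::Armrest::Configuration.new(<credentials>)
mets = Azure::Armrest::Insights::MetricsService.new(conf)
vms  = Azure::Armrest::VirtualMachineService.new(conf)

# Placeholders: substitute your own resource group and VM names.
rgroup    = 'your_group'
linuxvm   = 'your_linux_vm'
windowsvm = 'your_windows_vm'

vm1 = vms.get(linuxvm, rgroup)
vm2 = vms.get(windowsvm, rgroup)

def1 = mets.list_definitions(vm1.id) # or just vm1
def2 = mets.list_definitions(vm2.id) # or just vm2

pp def1.map { |d| d.name.value }
pp def2.map { |d| d.name.value }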

If you inspect the names of each definition list, you will find that they are the same for both the Linux and the Windows VM:

[0] "Percentage CPU",
[1] "Network In",
[2] "Network Out",
[3] "Disk Read Bytes",
[4] "Disk Write Bytes",
[5] "Disk Read Operations/Sec",
[6] "Disk Write Operations/Sec"

Newer API versions add "CPU Credits Remaining" and "CPU Credits Consumed" as well.

Now that we know the names of the metrics definitions, we can retrieve the metrics that we want using the :list_metrics method. A filter isn't strictly necessary, but it is highly recommended because it limits the amount of information retrieved per call.

filter = "(name.value eq 'Percentage CPU' or name.value eq 'Disk Read Bytes') "
filter << "and startTime eq 2017-02-01 and endTime eq 2017-02-03"

metrics = mets.list_metrics(vm1, filter) # or vm1.id

And that's it! No need to mess with storage accounts!
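
If you build filters like this often, a small helper can assemble the string for you. The sketch below is only an illustration: build_metrics_filter is a hypothetical name, not part of azure-armrest, and it simply reproduces the shape of the hand-written filter shown above.

```ruby
# Hypothetical helper (not part of azure-armrest): builds a filter string
# in the same shape as the hand-written one shown above.
def build_metrics_filter(names, start_time, end_time)
  raise ArgumentError, 'at least one metric name is required' if names.empty?

  # Join the metric names into a parenthesized "or" clause.
  name_clause = names.map { |n| "name.value eq '#{n}'" }.join(' or ')
  "(#{name_clause}) and startTime eq #{start_time} and endTime eq #{end_time}"
end

filter = build_metrics_filter(['Percentage CPU', 'Disk Read Bytes'], '2017-02-01', '2017-02-03')
puts filter
```

The resulting string can be passed as the second argument to :list_metrics, just like the hand-written filter.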

As mentioned above, however, there are far fewer metrics definitions using this approach. It is quite possible that you need information that is provided by the older definitions list, in which case you will need to use the older approach to getting metrics.

The Old Way

If you are unfamiliar with the available namespace providers and resource types, please see the ArmrestService#providers or ArmrestService#provider_info methods.

Below is a sample program that looks for metrics for available memory on a particular virtual machine.

require 'azure-armrest'

conf = Azure::Armrest::Configuration.new(<credentials>)
mets = Azure::Armrest::Insights::MetricsService.new(conf)

rgroup  = 'your_group'
vm_name = 'your_vm'

metrics = mets.list('Microsoft.Compute', 'virtualMachines', vm_name, rgroup)

time_grains = {
  "PT1M"  => "1 Minute",
  "PT5M"  => "5 Minutes",
  "PT1H"  => "1 Hour",
  "PT12H" => "12 Hours"
}

metrics.each do |metric|
  # Comment this out to see everything
  next unless metric.name.localized_value == "Memory available"

  puts "Metric Name: " + metric.name.localized_value
  puts "Start: " + Time.parse(metric.start_time).to_s
  puts "End: " + Time.parse(metric.end_time).to_s
  puts "Unit: " + metric.unit
  puts "Aggregation Type: " + metric.primary_aggregation_type

  metric.metric_availabilities.each do |ma|
    puts "=" * 40
    puts "Time Grain: " + time_grains[ma.time_grain]
    puts "Endpoint: " + ma.location.table_endpoint

    ma.location.table_info.each do |ti|
      puts "-" * 40
      puts "Table Name: " + ti.table_name
      puts "Start Time: " + Time.parse(ti.start_time).to_s
      puts "End Time: " + Time.parse(ti.end_time).to_s
    end
  end

  puts
end

Here is some sample output, edited for clarity:

Metric Name: Memory available
Start: 0001-01-01 00:00:00 UTC
End: 0001-01-01 00:00:00 UTC
Unit: Bytes
Aggregation Type: Average

==>
Time Grain: 1 Hour
Endpoint: https://your_storage1.table.core.windows.net/

Table Name: WADMetricsPT1HP10DV2S20160316
Start Time: 2016-03-16 00:00:00 UTC
End Time: 2016-03-26 00:00:00 UTC
----------------------------------------
Table Name: WADMetricsPT1HP10DV2S20160326
Start Time: 2016-03-26 00:00:00 UTC
End Time: 2016-04-05 00:00:00 UTC
<==

==>
Time Grain: 1 Minute
Endpoint: https://your_storage1.table.core.windows.net/

Table Name: WADMetricsPT1MP10DV2S20160316
Start Time: 2016-03-16 00:00:00 UTC
End Time: 2016-03-26 00:00:00 UTC
----------------------------------------
Table Name: WADMetricsPT1MP10DV2S20160326
Start Time: 2016-03-26 00:00:00 UTC
End Time: 2016-04-05 00:00:00 UTC
<==

This example shows four different tables. Two of them store hourly data, and the other two store minute-by-minute data. Each table stores up to 10 days' worth of data.

Notice that you can generally tell which time grain a table uses by looking at the table name, e.g. the "PT1H" in the table name indicates hourly data, while "PT1M" indicates minute-by-minute data. You can also derive the name of the storage account from the endpoint.
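
As a rough sketch, the table names in the sample output can be pulled apart with a regular expression. The pattern is inferred only from the sample names on this page (time grain, then what looks like a retention period in days, then a start date), so treat both the layout and the parse_wad_table_name helper as assumptions rather than a documented format.

```ruby
require 'date'

# Pattern inferred from the sample table names on this page only;
# it is an assumption, not a documented naming scheme.
WAD_TABLE_PATTERN = /\AWADMetrics(?<grain>PT\d+[MH])P(?<days>\d+)DV2S(?<start>\d{8})\z/

# Hypothetical helper: returns nil if the name does not fit the pattern.
def parse_wad_table_name(table_name)
  match = WAD_TABLE_PATTERN.match(table_name)
  return nil unless match

  start_date = Date.strptime(match[:start], '%Y%m%d')
  days = match[:days].to_i

  {
    time_grain: match[:grain],
    days:       days,
    start_date: start_date,
    end_date:   start_date + days # assumes the table spans exactly that many days
  }
end

info = parse_wad_table_name('WADMetricsPT1HP10DV2S20160316')
puts info[:time_grain]
puts info[:start_date]
puts info[:end_date]
```

For the first hourly table in the sample output, this yields a start date of 2016-03-16 and an end date of 2016-03-26, which lines up with the start and end times reported for that table.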

Looking at the endpoint, you can see that these tables are contained within the storage account "your_storage1", and you can view them in the new portal if you want. Just to keep things interesting, multiple VMs can use the same storage account, so you can't assume that all of the data retrieved from those tables is for a single VM, or whatever resource type you're getting metrics on.

Getting Metrics Data

The metrics data itself is contained within the relevant storage account. To get the storage account name, you will need to parse the endpoint that was shown above. You will then need to access the storage account with a key. For this example, we'll select the "Memory used" metric.

metric   = metrics.select{ |m| m.name.localized_value == 'Memory used' }.first
ma       = metric.metric_availabilities.first
table    = ma.location.table_info.last
endpoint = ma.location.table_endpoint

uri = URI.parse(endpoint) # e.g. https://your_storage1.table.core.windows.net/
storage_acct_name = uri.host.split('.').first

puts "Storage account: #{storage_acct_name}"

sas = Azure::Armrest::StorageAccountService.new(conf)
storage_account = sas.get(storage_acct_name, rgroup)
key = sas.list_account_keys(storage_acct_name, rgroup).fetch('key1')

With this information you can now get data from the table. By default, Azure returns a maximum of 1000 rows per request. You can also filter the data.

filter = "CounterName eq '\\Memory\\UsedMemory'"
results1 = storage_account.table_data(table.table_name, key)
results2 = storage_account.table_data(table.table_name, key, :top => 10, :filter => filter)

If you want to get all rows, you can specify :all => true. Keep in mind that this can be a LOT of data, so we recommend always using a :filter option in conjunction with :all.

storage_account.table_data(table.table_name, key, :top => 10, :filter => filter, :all => true)

To get more than 1000 rows without necessarily grabbing everything all at once, you can use continuation tokens:

token = results2.continuation_token
more_results = storage_account.table_data(
  table.table_name,
  key,
  :top => 10,
  :filter => filter,
  :continuation_token => token
)
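
To walk through several pages in a row, the same call can be wrapped in a loop. The following is a minimal sketch under a few assumptions: collect_pages is a hypothetical helper, each page is assumed to behave like an Array and respond to continuation_token, and a nil token is assumed to mean that no further pages exist.

```ruby
# Hypothetical helper: collects rows page by page, up to max_pages requests.
# The block receives the current continuation token (nil for the first
# request) and must return one page of results.
def collect_pages(max_pages: 5)
  rows  = []
  token = nil

  max_pages.times do
    page = yield(token)
    rows.concat(page.to_a)
    token = page.continuation_token
    break if token.nil? # assumption: nil means there are no more pages
  end

  rows
end

# Usage against the real service might look like this (not run here):
#
#   rows = collect_pages(max_pages: 3) do |token|
#     opts = { :top => 10, :filter => filter }
#     opts[:continuation_token] = token if token
#     storage_account.table_data(table.table_name, key, opts)
#   end
```

Capping the number of requests with max_pages keeps a broad filter from pulling an unexpectedly large amount of data.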

Another way to reduce the response size is to use the :select option, which limits the results to just the properties that you care about. Note that you will still receive various bits of OData information.

:select => "Total,Timestamp" # Add this to table_data call.

Inspecting Metrics Data

Finally, let's look at some actual table data. As mentioned earlier, multiple resources can share the same storage account. To determine which VM a particular record refers to, we have to inspect the partition key, or filter on it up front. For this example we inspect and parse it:

results2.each do |data|
  puts "Name: #{data.counter_name.inspect}"
  puts "Timestamp: #{data.timestamp}"
  puts "Min: #{data.minimum}"
  puts "Max: #{data.maximum}"
  puts "Total: #{data.total}"
  puts "Count: #{data.count}"

  # Partition keys in Azure are resource strings that have been joined
  # and padded with text for sorting purposes.

  array = data.partition_key.split(/:\d\d\d\w/)
  index = array.index('virtualMachines')
  vm_name = array[index + 1, array.size - index].join('-')

  puts "VM Name: #{vm_name}\n"
end

And here is some sample output:

Name: "\\Memory\\UsedMemory"
Timestamp: 2016-04-04T16:00:16.993881Z
Min: 190840832.0
Max: 191889408.0
Total: 45828014080.0
Count: 240
VM Name: your_vm_name
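
To make the partition key handling in the loop above more concrete, here is the same split-and-join logic applied to a single key. The key below is made up for illustration and only imitates the padded pattern that the regular expression expects; real keys will differ. The vm_name_from_partition_key helper is likewise hypothetical.

```ruby
# Made-up partition key for illustration only; real keys will differ.
partition_key = ':002Fsubscriptions:002Fabc123:002FresourceGroups:002Frg1' \
                ':002Fproviders:002FMicrosoft:002ECompute' \
                ':002FvirtualMachines:002Fyour:002Dvm'

# Hypothetical helper that mirrors the parsing done in the loop above.
def vm_name_from_partition_key(partition_key)
  # Split on the padding sequences (a colon, three digits, one word character).
  parts = partition_key.split(/:\d\d\d\w/)
  index = parts.index('virtualMachines')
  return nil unless index

  # Everything after 'virtualMachines' belongs to the VM name. As in the
  # loop above, the pieces are rejoined with hyphens.
  parts[(index + 1)..-1].join('-')
end

puts vm_name_from_partition_key(partition_key) # prints "your-vm"
```

Returning nil when 'virtualMachines' is missing avoids an error on rows that belong to some other resource type.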

Understanding the Data

Based on our earlier example you can see that the date range for our metric is between 2016-03-26 and 2016-04-05. You also know that the time grain is 1 hour (PT1H), the metric is specified as an "average" (as opposed to a "total"), and the unit is "bytes". Below is some sample data from that table.

The average, minimum and maximum specify the relevant value for a metric of type "average". The total specifies the value for a metric type of "total". The count indicates the number of individual data points in the average. A quick math check will show that the average is the total divided by the count.

Data points are recorded every 15 seconds for metrics with a time grain of 1 minute, every 30 seconds for metrics with a time grain of 5 or 60 minutes, and every 30 minutes for metrics retrieved with a time grain of 12 hours.

The Timestamp is the precise time at which the data was recorded. The Timestamp (R) is rounded to the time grain. In this case, it's rounded to the hour.

Name: "\\Memory\\UsedMemory"
Timestamp: 2016-04-06T13:00:18.2062982Z
Timestamp (R): 2016-04-06T12:00:00Z
Min: 192937984.0
Max: 192937984.0
Avg: 192937984.0
Total: 46305116160.0
Count: 240
VM Name: your_vm_name

Name: "\\Memory\\UsedMemory"
Timestamp: 2016-04-06T12:00:18.0958001Z
Timestamp (R): 2016-04-06T11:00:00Z
Min: 191889408.0
Max: 192937984.0
Avg: 192920507.73333332
Total: 46300921856.0
Count: 240
VM Name: your_vm_name

Here is another example, this time from a metric whose unit is "percent" instead of bytes.

Name: "\\Memory\\PercentAvailableMemory"
Timestamp: 2016-04-06T11:00:18.0374178Z
Timestamp (R): 2016-04-06T10:00:00Z
Min: 72.0
Max: 73.0
Avg: 72.2875
Total: 17349.0
Count: 240
VM Name: your_vm_name
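
The "total divided by count" relationship can be checked against the three sample records above with a few lines of Ruby. The values are copied from those records; the small tolerance only absorbs floating-point rounding.

```ruby
# Values copied from the sample records above.
records = [
  { name: '\\Memory\\UsedMemory',             total: 46305116160.0, count: 240, avg: 192937984.0 },
  { name: '\\Memory\\UsedMemory',             total: 46300921856.0, count: 240, avg: 192920507.73333332 },
  { name: '\\Memory\\PercentAvailableMemory', total: 17349.0,       count: 240, avg: 72.2875 }
]

records.each do |r|
  computed = r[:total] / r[:count]
  # Allow a tiny tolerance for floating-point rounding.
  matches = (computed - r[:avg]).abs < 1e-6
  puts "#{r[:name]}: total / count = #{computed} (matches Avg: #{matches})"
end
```

All three records should report a match, for the byte-based metric as well as the percentage-based one.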

Viewing the Data Visually

If you look at the new portal, you can edit the "Monitoring" graph below the name of the VM, and you will see a list of items that corresponds to the metric names that the azure-armrest gem provides. This is nice for lining up what you see with our gem vs what Microsoft is showing you with a graphical interface.

Another Way to Get Metrics Types

Diagnostics storage is treated as a VM extension, so information about diagnostics can be gathered from the VM extension information itself. This information is stored as Base64-encoded XML. It is up to you to parse it.

require 'azure-armrest'
require 'base64'

vms = Azure::Armrest::VirtualMachineService.new(conf)
vm = vms.get(vm_name, rgroup)

vm.resources.each do |r|
  puts Base64.decode64(r.properties.settings.xml_cfg)
end

# Example output
<?xml version="1.0" encoding="UTF-8"?>
<WadCfg>
  <DiagnosticMonitorConfiguration overallQuotaInMB="4096">
    <DiagnosticInfrastructureLogs scheduledTransferPeriod="PT1M" scheduledTransferLogLevelFilter="Warning" />
    <PerformanceCounters scheduledTransferPeriod="PT1M">
      <PerformanceCounterConfiguration counterSpecifier="\Memory\AvailableMemory" sampleRate="PT15S" unit="Bytes">
        <annotation displayName="Memory available" locale="en-us" />
      </PerformanceCounterConfiguration>
      <PerformanceCounterConfiguration counterSpecifier="\Memory\PercentAvailableMemory" sampleRate="PT15S" unit="Percent">
        <annotation displayName="Mem. percent available" locale="en-us" />
      </PerformanceCounterConfiguration>

    ...and so on
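
One possible way to parse the decoded XML is with REXML, which ships with standard Ruby installs (as a bundled gem on Ruby 3.0 and later). The sketch below runs against a trimmed copy of the example output above, with closing tags added so that it is well formed; in practice you would pass in the decoded string instead. The choice of hash keys is arbitrary.

```ruby
require 'rexml/document'

# Trimmed copy of the example output above, with closing tags added.
xml = <<~'XML'
  <?xml version="1.0" encoding="UTF-8"?>
  <WadCfg>
    <DiagnosticMonitorConfiguration overallQuotaInMB="4096">
      <PerformanceCounters scheduledTransferPeriod="PT1M">
        <PerformanceCounterConfiguration counterSpecifier="\Memory\AvailableMemory" sampleRate="PT15S" unit="Bytes">
          <annotation displayName="Memory available" locale="en-us" />
        </PerformanceCounterConfiguration>
        <PerformanceCounterConfiguration counterSpecifier="\Memory\PercentAvailableMemory" sampleRate="PT15S" unit="Percent">
          <annotation displayName="Mem. percent available" locale="en-us" />
        </PerformanceCounterConfiguration>
      </PerformanceCounters>
    </DiagnosticMonitorConfiguration>
  </WadCfg>
XML

doc = REXML::Document.new(xml)

# Collect one hash per configured performance counter.
counters = REXML::XPath.match(doc, '//PerformanceCounterConfiguration').map do |el|
  annotation = el.elements['annotation']
  {
    counter:      el.attributes['counterSpecifier'],
    sample_rate:  el.attributes['sampleRate'],
    unit:         el.attributes['unit'],
    display_name: annotation ? annotation.attributes['displayName'] : nil
  }
end

counters.each do |c|
  puts "#{c[:display_name]}: #{c[:counter]} (#{c[:unit]}, sampled every #{c[:sample_rate]})"
end
```

The display names found this way correspond to the localized metric names used earlier on this page, such as "Memory available".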

Further Reading