Automatically collect and track benchmark results over time #633

Open
4 tasks
chippmann opened this issue May 13, 2024 · 0 comments
Labels
performance Related to performance problem

We have discussed this internally a few times and have already outlined the first steps.
The goal is to track the performance of our binding over time in an automated and reproducible way.

As a reminder: we already have a benchmark project that tests the performance of our binding in key areas against typed and untyped GDScript. It already outputs usable JSON with key metrics and raw data.
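
For readers unfamiliar with the metric names used later in this issue (avg, min, max, median, p05, p95), here is a minimal sketch of how such key metrics could be derived from raw samples. It is not the benchmark project's actual code: the function names and the linear-interpolation percentile method are assumptions made for illustration.

```javascript
// Hypothetical helpers, not taken from the benchmark project.

// Percentile by linear interpolation between the two closest ranks
// (an assumed method; the real project may compute this differently).
function percentile(sorted, p) {
  if (sorted.length === 0) throw new Error('no samples');
  const rank = (p / 100) * (sorted.length - 1);
  const lo = Math.floor(rank);
  const hi = Math.ceil(rank);
  return sorted[lo] + (sorted[hi] - sorted[lo]) * (rank - lo);
}

// Summarize raw samples into the metric names the aggregation script reads.
function summarize(samples) {
  const sorted = [...samples].sort((a, b) => a - b);
  const sum = sorted.reduce((acc, v) => acc + v, 0);
  return {
    avg: sum / sorted.length,
    min: sorted[0],
    max: sorted[sorted.length - 1],
    median: percentile(sorted, 50),
    p05: percentile(sorted, 5),
    p95: percentile(sorted, 95),
  };
}
```

Keeping the raw data next to the summary means the percentiles can be recomputed later if a different method turns out to be more suitable.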

While this is useful, it was never a clear indication of our performance, because the benchmarks had to be run manually and were never run on the same machine over time.

Hence we decided on the following steps, which this issue tracks:

  • Set up a self-hosted Windows 11 GitHub runner. We cannot use a GitHub-hosted runner because those do not have consistent performance characteristics (for example, different CPU models with different IPC).
  • Set up a GitHub workflow that runs the benchmarks after changes
  • Aggregate the resulting data automatically in spreadsheets and visualize the performance over time (we discussed Firebase Cloud Functions and Google Sheets for this; more on that later)
  • Make these metrics publicly available for full transparency

The following has already been done:

  • I set up a self-hosted GitHub runner at home on dedicated hardware. Its sole purpose is to run these benchmarks.
  • A test spreadsheet is set up (note: the current data in this sheet is NOT representative, it is just TEST DATA!) and data is aggregated automatically with the following Apps Script (a first draft):
// ID of the target spreadsheet (placeholder)
const spreadsheetId = '<sheet_id>';

function doPost(e) {
  var jsonData = JSON.parse(e.postData.contents);
  var timestamp = new Date().toLocaleString();
  var spreadsheet = SpreadsheetApp.openById(spreadsheetId);
  
  // Removing all existing charts in the Dashboard
  var dashboard = spreadsheet.getSheetByName("Dashboard");
  if(dashboard) {
    var charts = dashboard.getCharts();
    for(var i=0; i<charts.length; i++) {
      dashboard.removeChart(charts[i]);
    }
  }
  
  for(var benchmark in jsonData['data']) {
    for(var language in jsonData['data'][benchmark]) {
      // Naming the sheet as 'benchmark|language'
      var sheetName = benchmark+"|"+language;
      var sheet = spreadsheet.getSheetByName(sheetName);
      
      // If the sheet does not exist, create a new one
      if(!sheet) {
        sheet = spreadsheet.insertSheet(sheetName);
        var header = ['Timestamp', 'avg', 'min', 'max', 'median', 'p05', 'p95'];
        sheet.appendRow(header);
      }
      
      // Convert JSON data to row data, in the same order as the header
      var stats = jsonData['data'][benchmark][language];
      var row = [timestamp, stats['avg'], stats['min'], stats['max'], stats['median'], stats['p05'], stats['p95']];
      
      // Append language data to sheet
      sheet.appendRow(row);
    }
  }

  // Create benchmark comparison line graphs 
  var allSheets = spreadsheet.getSheets();
  if(!dashboard) {
    dashboard = spreadsheet.insertSheet("Dashboard");
  }

  var benchmarkCharts = {};
  var benchmarkChartsSeries = {};

  for(var i=0; i<allSheets.length; i++) {
    var sheetName = allSheets[i].getName();
    var lastRow = allSheets[i].getLastRow();
    // Only chart sheets named 'benchmark|language' that have at least one data row;
    // this skips the Dashboard and any unrelated or empty sheets
    if(sheetName != "Dashboard" && sheetName.indexOf("|") != -1 && lastRow > 1){
      var nameParts = sheetName.split("|");
      var benchmarkName = nameParts[0];
      var languageName = nameParts[1];
      
      // If it's a new benchmark, initialize a new chart builder
      if(!(benchmarkName in benchmarkCharts)) {
        benchmarkChartsSeries[benchmarkName] = [{labelInLegend: languageName}];
        benchmarkCharts[benchmarkName] = dashboard.newChart()
          .asLineChart()
          .setOption('title', 'Performance Trend for ' + benchmarkName)
          .setOption('hAxis.title', 'Time')
          .setOption('vAxis.title', 'Average Score');
      } else {
        benchmarkChartsSeries[benchmarkName].push({labelInLegend: languageName});
      }
      
      // Add the timestamp and avg columns of the current sheet, excluding the header row
      benchmarkCharts[benchmarkName].addRange(allSheets[i].getRange(2, 1, lastRow - 1, 2));
    }
  }
  
  // Build and insert all benchmark charts
  var position = 1;
  for(var benchmark in benchmarkCharts) {
    // Update position for the new chart.
    benchmarkCharts[benchmark]
      .setPosition(position * 20, 1, 0, 0)
      .setOption('series', benchmarkChartsSeries[benchmark]);
    var chart = benchmarkCharts[benchmark].build();
    dashboard.insertChart(chart);
    position++; // Update position for the next chart
  }
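
  // Return a response body so the caller gets a proper reply from the web app
  return ContentService.createTextOutput('ok');
}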
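
The payload shape the script expects can be read off its loops: a top-level `data` object keyed by benchmark, then by language, with the six metrics per entry. As a sketch only, the helper below mirrors that mapping without touching `SpreadsheetApp`, so it can be checked outside Apps Script. The name `payloadToRows`, the benchmark and language keys, and all numbers are made up for illustration.

```javascript
// Column order used by the per-benchmark sheets in the script above.
const METRICS = ['avg', 'min', 'max', 'median', 'p05', 'p95'];

// Hypothetical helper: turn a payload into { sheetName, row } pairs,
// matching the 'benchmark|language' sheet naming of the script above.
function payloadToRows(payload, timestamp) {
  const rows = [];
  for (const [benchmark, languages] of Object.entries(payload.data)) {
    for (const [language, stats] of Object.entries(languages)) {
      rows.push({
        sheetName: benchmark + '|' + language,
        row: [timestamp, ...METRICS.map((metric) => stats[metric])],
      });
    }
  }
  return rows;
}

// Made-up example payload; keys and numbers are illustrative only.
const examplePayload = {
  data: {
    example_benchmark: {
      binding: { avg: 10, min: 8, max: 13, median: 10, p05: 8.5, p95: 12.5 },
      gdscript_typed: { avg: 9, min: 7, max: 12, median: 9, p05: 7.5, p95: 11.5 },
    },
  },
};

const exampleRows = payloadToRows(examplePayload, '2024-05-13 12:00:00');
```

Separating the mapping from the spreadsheet calls like this would also make it easier to swap the storage backend later.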

The following needs to be done:

  • Give other maintainers RDP access for maintenance
  • Add the self-hosted runner to the utopia-rise organisation
  • Set up a workflow to run the benchmarks on changes
  • Set up the final spreadsheet and Apps Script
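
As a rough idea of how the workflow could hand results to the Apps Script web app, here is a minimal Node sketch of an upload step. It assumes Node 18 or newer (for the global `fetch`) and that the script is deployed as a web app reachable by the runner. The function name `uploadResults` and the URL are hypothetical.

```javascript
// Hypothetical upload step for the benchmark workflow.
// `fetchImpl` defaults to the global fetch (available in Node 18+) and can be
// replaced with a stub for testing.
async function uploadResults(webAppUrl, results, fetchImpl = fetch) {
  const response = await fetchImpl(webAppUrl, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(results), // read by doPost via e.postData.contents
    redirect: 'follow',
  });
  if (!response.ok) {
    throw new Error('Upload failed with HTTP status ' + response.status);
  }
  return response.status;
}
```

In a workflow, the web app URL would presumably come from a repository secret rather than being committed, and a failed upload should fail the job so that missing data points are noticed.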

The following points can be improved at a later stage:

  • Possibly migrate from Apps Script and Google Sheets to Firebase Cloud Functions and Firestore (or Supabase equivalents)
  • Run the benchmarks as part of the PR pipeline to surface possible performance problems during review
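
For the PR pipeline idea, one possible check is to compare a PR run against a baseline run and flag benchmarks whose average moved past a tolerance. The sketch below assumes both runs use the payload shape consumed by the script above. The function name, the 5% default tolerance, and the "higher is better" default are assumptions, not decisions made in this issue.

```javascript
// Hypothetical PR check: compare a candidate run against a baseline run.
// Assumes the { data: { benchmark: { language: { avg } } } } shape and, by
// default, that a higher avg is better; pass higherIsBetter = false for timings.
function findRegressions(baseline, candidate, tolerance = 0.05, higherIsBetter = true) {
  const regressions = [];
  for (const [benchmark, languages] of Object.entries(candidate.data)) {
    for (const [language, stats] of Object.entries(languages)) {
      const base = baseline.data[benchmark] && baseline.data[benchmark][language];
      if (!base || !base.avg) continue; // no usable baseline to compare against
      const change = (stats.avg - base.avg) / base.avg; // relative change
      const worse = higherIsBetter ? change < -tolerance : change > tolerance;
      if (worse) regressions.push({ benchmark, language, change });
    }
  }
  return regressions;
}
```

A suitable tolerance depends on how noisy the dedicated runner turns out to be, so it would need to be tuned against real data before the check blocks any PR.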

chippmann added the performance label on May 13, 2024