
CreateSessionFromArray doesn't work #21946

Open
woaixiaoxiao opened this issue Sep 1, 2024 · 3 comments
Labels
platform:mobile — issues related to ONNX Runtime mobile; typically submitted using template
stale — issues that have not been addressed in a while; categorized by a bot

Comments

@woaixiaoxiao

Describe the issue

I now want multiple threads to load the same model and perform inference in a data-parallel manner. To reduce memory usage, I want to avoid having each session individually read the ONNX file from disk into memory. The approach I am currently taking is to first read the ONNX file into memory and then use CreateSessionFromArray to create sessions. I referred to this issue: #8328 during this process. However, it doesn't seem to be working as expected; CreateSessionFromArray does not save memory usage.

To reproduce

You can use the Python script below to generate the ONNX file, and then run the C++ code.

import torch
import torch.nn as nn

class LSTM(nn.Module):
    def __init__(self, input_size, output_size, out_channels, num_layers, device):
        super(LSTM, self).__init__()
        self.device = device
        self.input_size = input_size
        self.hidden_size = input_size
        self.num_layers = num_layers
        self.output_size = output_size

        self.lstm = nn.LSTM(input_size=self.input_size,
                            hidden_size=self.hidden_size,
                            num_layers=self.num_layers,
                            batch_first=True)

        self.out_channels = out_channels

        self.fc = nn.Linear(self.hidden_size, self.output_size)

    def forward(self, x):
        out, _ = self.lstm(x)

        if self.out_channels == 1:
            out = out[:, -1, :]
            out = self.fc(out)
            return out
        
        return out


batch_size = 1
input_size = 20
seq_len = 5
output_size = 10
num_layers = 1000
out_channels = 1

model = LSTM(input_size, output_size, out_channels, num_layers, "cpu")
model.eval() 

input_names = ["input"]    
output_names  = ["output"]  

x = torch.randn((batch_size, seq_len, input_size))
y = model(x)

torch.onnx.export(model, x, 'lstm.onnx', verbose=True, input_names=input_names, output_names=output_names,
  dynamic_axes={'input':[0], 'output':[0]} )

// C++ test program: creates sessions from the exported lstm.onnx
#include "onnxruntime_c_api.h"
#include "onnxruntime_session_options_config_keys.h"
#include <chrono>
#include <cstddef>
#include <fstream>
#include <ios>
#include <iostream>
#include <memory>
#include <onnxruntime_cxx_api.h>
#include <unistd.h>
#include <vector>

std::vector<char> loadModel(const char *model_path) {
  std::ifstream model_file(model_path, std::ios::binary | std::ios::ate);
  if (!model_file.is_open()) {
    throw std::runtime_error("Failed to open model file");
  }

  std::streamsize size = model_file.tellg();
  model_file.seekg(0, std::ios::beg);

  std::vector<char> buffer(size);
  if (!model_file.read(buffer.data(), size)) {
    throw std::runtime_error("Failed to read model file");
  }

  return buffer;
}

inline size_t getCurrentRSS() {
  std::ifstream stat_stream("/proc/self/stat", std::ios_base::in);
  std::string pid, comm, state, ppid, pgrp, session, tty_nr;
  std::string tpgid, flags, minflt, cminflt, majflt, cmajflt;
  std::string utime, stime, cutime, cstime, priority, nice;
  std::string O, itrealvalue, starttime;
  unsigned long vsize;
  long rss;
  stat_stream >> pid >> comm >> state >> ppid >> pgrp >> session >> tty_nr >>
      tpgid >> flags >> minflt >> cminflt >> majflt >> cmajflt >> utime >>
      stime >> cutime >> cstime >> priority >> nice >> O >> itrealvalue >>
      starttime >> vsize >> rss;
  stat_stream.close();
  return rss * sysconf(_SC_PAGE_SIZE);
}

inline size_t checkMemoryUsage(const std::string &point) {
  size_t memory = getCurrentRSS();
  std::cout << "memory usage at " << point << ": " << memory / (1024.0 * 1024.0)
            << " MB" << std::endl;
  return memory;
}

// ref_func does not use CreateSessionFromArray
std::vector<float> ref_func(int thread_num) {
  Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "Default");
  Ort::SessionOptions session_options;
  session_options.SetIntraOpNumThreads(1);
  session_options.SetGraphOptimizationLevel(
      GraphOptimizationLevel::ORT_ENABLE_ALL);

  const char *model_path = "../5_rnn/lstm.onnx";

  size_t before = checkMemoryUsage("before create session");
  // Create multiple sessions, one per thread
  std::vector<std::unique_ptr<Ort::Session>> sessions;
  for (int i = 0; i < thread_num; ++i) {
    sessions.push_back(
        std::make_unique<Ort::Session>(env, model_path, session_options));
  }
  size_t after = checkMemoryUsage("after create session");
  std::cout << "create session memory usage: "
            << (after - before) / (1024.0 * 1024.0) << " MB" << std::endl;
  return std::vector<float>();
}

std::vector<float> test_func(int thread_num) {
  Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "Default");

  Ort::SessionOptions session_options;
  session_options.SetIntraOpNumThreads(1);
  session_options.SetGraphOptimizationLevel(
      GraphOptimizationLevel::ORT_ENABLE_ALL);
  session_options.AddConfigEntry(
      kOrtSessionOptionsConfigUseORTModelBytesDirectly, "1");

  const char *model_path = "../5_rnn/lstm.onnx";
  std::vector<char> model_data = loadModel(model_path);

  std::cout << "model size: " << model_data.size() / (1024.0 * 1024.0) << " MB"
            << std::endl;

  auto before_create_session = checkMemoryUsage("before create session");

  // Create multiple sessions, one per thread
  std::vector<std::unique_ptr<Ort::Session>> sessions;
  for (int i = 0; i < thread_num; ++i) {

    sessions.push_back(std::make_unique<Ort::Session>(
        env, model_data.data(), model_data.size(), session_options));
  }

  auto after_create_session = checkMemoryUsage("after create session");
  std::cout << "create session memory usage: "
            << (after_create_session - before_create_session) /
                   (1024.0 * 1024.0)
            << " MB" << std::endl;

  return std::vector<float>();
}

int main() {
  //   std::cout << "========= does not use CreateSessionFromArray ========= "
  //             << std::endl;
  //   ref_func(8);
  std::cout << "========= use CreateSessionFromArray ========= " << std::endl;
  test_func(8);
  return 0;
}

Urgency

No response

Platform

Linux

OS Version

centos8

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

onnxruntime-linux-x64-gpu-1.19.0

ONNX Runtime API

C++

Architecture

X64

Execution Provider

Default CPU

Execution Provider Library Version

No response

github-actions bot added the platform:mobile label on Sep 1, 2024
@woaixiaoxiao (Author)

One interesting thing is that only one function can be tested at a time; otherwise, the test results may be inaccurate due to memory used by the previous function not being released in time.

[screenshots of the test output attached]

@skottmckay (Contributor)

InferenceSession::Run is stateless and can be called concurrently. Given that, do you need multiple sessions with the same model?

The settings to use bytes directly require an ORT format model. See https://onnxruntime.ai/docs/performance/model-optimizations/ort-format-models.html#convert-onnx-models-to-ort-format
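
For reference, a minimal sketch of the single-session approach suggested above: one Ort::Session shared by several threads, each calling Run concurrently. The model path, input/output names, and the {1, 5, 20} shape follow the export script in this issue; everything else is illustrative and not code from the original report.

// Sketch: share one Ort::Session across threads; Run() can be called concurrently.
#include <onnxruntime_cxx_api.h>
#include <cstdint>
#include <thread>
#include <vector>

int main() {
  Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "shared");
  Ort::SessionOptions opts;
  opts.SetIntraOpNumThreads(1);

  // A single session keeps one copy of the initialized model in memory.
  Ort::Session session(env, "lstm.onnx", opts);

  auto worker = [&session]() {
    std::vector<int64_t> shape{1, 5, 20};  // batch, seq_len, input_size from the export script
    std::vector<float> input(1 * 5 * 20, 0.5f);
    Ort::MemoryInfo mem =
        Ort::MemoryInfo::CreateCpu(OrtArenaAllocator, OrtMemTypeDefault);
    Ort::Value tensor = Ort::Value::CreateTensor<float>(
        mem, input.data(), input.size(), shape.data(), shape.size());
    const char *in_names[] = {"input"};
    const char *out_names[] = {"output"};
    // Run() does not mutate session state, so concurrent calls are safe.
    auto outputs = session.Run(Ort::RunOptions{nullptr}, in_names, &tensor, 1,
                               out_names, 1);
    (void)outputs;
  };

  std::vector<std::thread> threads;
  for (int i = 0; i < 8; ++i) threads.emplace_back(worker);
  for (auto &t : threads) t.join();
  return 0;
}

Separately, kOrtSessionOptionsConfigUseORTModelBytesDirectly only takes effect with an ORT format model; per the docs linked above, an ONNX model can be converted (e.g. with python -m onnxruntime.tools.convert_onnx_models_to_ort) and the resulting .ort bytes passed to the from-array constructor.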

github-actions bot commented Oct 3, 2024

This issue has been automatically marked as stale due to inactivity and will be closed in 30 days if no further activity occurs. If further support is needed, please provide an update and/or more details.

github-actions bot added the stale label on Oct 3, 2024