Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

radix tree prototype for profiler #245

Merged
merged 6 commits into from
Nov 12, 2023

Conversation

q40603
Copy link
Collaborator

@q40603 q40603 commented Oct 20, 2023

This pull request is the next step for 190

Basic Information

The current scope-based timer enables one to simply profile the program.
One can insert a macro at target function to record the execution time and exact hit count.
The scope-based timer used a map with function name as key to record how many times a function is executed.

std::map<std::string, TimedEntry> m_entry;

With such information, a profiling report could be generated.

Euler1DCore.march_alpha2 : count = 51 , time = 0.0365707 (second)
Euler1DCore.setup_march : count = 1 , time = 2.2676e-05 (second)
Euler1DCore::density : count = 51 , time = 0.0020337 (second)
...

Problem to solve

Rather than knowing how many times a single function is executed, I think people will be more interested in how many times a single call path is hit during profiling as a single function may have different callers.
Also, with call path information, I am able to generate a flame graph in the future.
To record the call path, I propose to use a radix tree to dynamically construct the call graph.

Each node acts as an function being executed, and it has:

  • TimedEntry
  • List of children node that represent callees

Prototype expected output

root - Count: 0 - Time: 0
  a - Count: 1 - Time: 5
    b - Count: 1 - Time: 3
    c - Count: 1 - Time: 7
      d - Count: 1 - Time: 9
        e - Count: 1 - Time: 1
    f - Count: 1 - Time: 2.41 

Copy link
Member

@yungyuc yungyuc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Build the code using CMake
  • Put everything under modmesh namespace
  • Use a header file
  • Ending comment
  • Put test code in Google test directory

cpp/modmesh/toggle/RadixTree.cpp Outdated Show resolved Hide resolved
cpp/modmesh/toggle/RadixTree.cpp Outdated Show resolved Hide resolved
cpp/modmesh/toggle/RadixTree.cpp Outdated Show resolved Hide resolved
cpp/modmesh/toggle/RadixTree.cpp Outdated Show resolved Hide resolved
cpp/modmesh/toggle/RadixTree.cpp Outdated Show resolved Hide resolved
@yungyuc yungyuc added the performance Profiling, runtime, and memory consumption label Oct 21, 2023
@yungyuc
Copy link
Member

yungyuc commented Oct 25, 2023

@q40603 Please let me know what's your progress or plan

@q40603
Copy link
Collaborator Author

q40603 commented Oct 25, 2023

@yungyuc
I have fixed the key std::string comparison using a unordered_map to hash the function name. (code not committed yet)
I will later fix the namespace issue and build the code with CMake.
After that, I will profile the modmesh with my prototype.
The ETA should be 2023/10/29 Sunday.

@yungyuc
Copy link
Member

yungyuc commented Nov 5, 2023

@q40603 if you encounter CI issue, try to rebase to the latest master to pick up #248 .

Please let a global comment when you want the code to be reviewed or have anything to discuss.

@q40603
Copy link
Collaborator Author

q40603 commented Nov 7, 2023

  • Build the code using CMake
  • Put everything under modmesh namespace
  • Use a header file
  • Ending comment
  • Put test code in Google test directory

todo

  1. add more test cases in gtest (e.g. tree print)

questions:

  1. I am afraid that I used too much raw pointers

@yungyuc
Copy link
Member

yungyuc commented Nov 8, 2023

  • Build the code using CMake
  • Put everything under modmesh namespace
  • Use a header file
  • Ending comment
  • Put test code in Google test directory

todo

  1. add more test cases in gtest (e.g. tree print)

It's a good idea to add more tests. For printing, I'd suggest to do it in Python, so the first step is to make pybind11 wrapper.

Testing string representation in Python is way easier than doing that in C++.

It is a separate task. You should use another PR to do it. I will write up an issue to keep track of the followup work.

questions:

  1. I am afraid that I used too much raw pointers

Using raw pointers is the right thing to do. Inside the tree implementation we should either use a pointer or an index that works like a raw pointer. The ordinary smart pointers (STL ones) are too heavy-weight for performance.

Could you please make the following finishing touches?

  • Pass all CI
  • Squash and rebase the branch to make history clean

Hopefully the code can be merged in a day or two. But it depends on your time.

@q40603 q40603 force-pushed the quentin/profiler_radix_tree branch 3 times, most recently from 5c344bf to 5d18142 Compare November 9, 2023 09:54
Copy link
Member

@yungyuc yungyuc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Please avoid touching unrelated files. You probably need to rebase to the latest master.

.github/workflows/devbuild.yml Outdated Show resolved Hide resolved
Enhance:
1. generate key for function name to do key comparison
2. move getChild logic from tree to node

enhance:
1. use try_emplace to get function id

enhance:
1. add operator overloading in TimedEntry for printing.
2. add template for better usability

enhance:
1. add ref in comment

enhance:
1. rename getFunctionId to getID for general purpose

enhance:
1. change to header file
2. add testing to gtests dir

radix tree prototype for profiler

Enhance:
1. generate key for function name to do key comparison
2. move getChild logic from tree to node

enhance:
1. use try_emplace to get function id

enhance:
1. add operator overloading in TimedEntry for printing.
2. add template for better usability

enhance:
1. add ref in comment

enhance:
1. rename getFunctionId to getID for general purpose

enhance:
1. change to header file
2. add testing to gtests dir

remove int main in header file

clang format

add parameter dialog

RParameter: support int64/double types

RParameter: allow building parameters list at runtime

modmesh/params.py: add license

RParameter: clang-format fix

modmesh/params.py: fix flake8 issue

check PUI version

modmesh/params.py: fix test example

[hotfix] reopen brew update

freeze llvm version
@q40603 q40603 force-pushed the quentin/profiler_radix_tree branch from 5d18142 to f0a8a19 Compare November 9, 2023 10:11
@yungyuc
Copy link
Member

yungyuc commented Nov 9, 2023

Please format and tidy the code before push. Using clang-format and clang-tidy locally will save your time.

@yungyuc yungyuc changed the title radix tree prototype for profiler [draft] radix tree prototype for profiler Nov 11, 2023
@q40603
Copy link
Collaborator Author

q40603 commented Nov 12, 2023

  • Please avoid touching unrelated files. You probably need to rebase to the latest master.

Ready for review,

  • Formated the code and passed the CI
  • Add more tests in gtest.

@q40603
Copy link
Collaborator Author

q40603 commented Nov 12, 2023

Also, it just came up to me that should I handle recursive function profiling?
Because if the recursive function is executed, the tree will become very large and deep.
But I also wonder if recursive function is used in high performance computing?

@yungyuc
Copy link
Member

yungyuc commented Nov 12, 2023

Also, it just came up to me that should I handle recursive function profiling? Because if the recursive function is executed, the tree will become very large and deep. But I also wonder if recursive function is used in high performance computing?

Yes, we should handle recursion. And you are right that we tend to avoid recursion in HPC. To be precise, we do not like function calls in tight loop, not the recursion itself. Good (fast) recursion may be replaced to loop so that in the optimized code it does not cause runtime issue. However, as you pointed out, in the profiling mode, recursion creates crazy stack entries.

As a profiling tool we need to handle recurse. It is not because the code wants to use recursion, but to be able to analyze where recursion happens so that we can eliminate it. Sometimes we tolerate the non-ideal construct for quick delivery, and the profiling tool needs to show proper information.

@yungyuc yungyuc merged commit 1f72bad into solvcon:master Nov 12, 2023
11 checks passed
@tigercosmos
Copy link
Collaborator

it seems the code is not yet usable. is there any follow-up? @q40603

@tigercosmos
Copy link
Collaborator

I found it's somehow difficult to find the hierarchical representation of a object

say an object is

struct Foo {
  void Bar1();
  Void Bar2();
};

slightly modifying the proposal output, I expect to get something like this:

Foo
    Bar1 - Count: 1 - Time: 3
    Bar2 - Count: 1 - Time: 7

I expect to get a hierarchical representation like Foo::Bar1, so we can leverage the radix tree, but seems that C++ does't natively support it, and there is no common solution for this.

@yungyuc
Copy link
Member

yungyuc commented Dec 1, 2023

@q40603 please comment.

@tigercosmos
Copy link
Collaborator

I already found the answer, will open a PR soon :)

@q40603
Copy link
Collaborator Author

q40603 commented Dec 2, 2023

I found it's somehow difficult to find the hierarchical representation of a object

say an object is

struct Foo {
  void Bar1();
  Void Bar2();
};

slightly modifying the proposal output, I expect to get something like this:

Foo
    Bar1 - Count: 1 - Time: 3
    Bar2 - Count: 1 - Time: 7

I expect to get a hierarchical representation like Foo::Bar1, so we can leverage the radix tree, but seems that C++ does't natively support it, and there is no common solution for this.

@tigercosmos Thanks for bringing this up. From you comment, it seems that you want to use the radix tree to visualize the hierarchical representation of an object?

The radix tree prototype is targeted at improving the TimeRegistry so function call path could be recorded during profiling.

To profile the program, a user is expected to put the macro at the function he or she is interested in, such as Euler1DCore.cpp, a macro is put at the beginning of the function.

In your case, the macro that needed to be put in function Bar1 should be like this

MODMESH_TIME("Foo::Bar1");

Consider the following code.

struct Foo f;
f.Bar1();
f.Bar2();
f.Bar1();

The profiling report should be like this

root
  Foo::Bar1 - Count: 2 - Time: 5.4
  Foo::Bar2 - Count: 1 - Time: 2.1

This was referenced Dec 2, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Profiling, runtime, and memory consumption
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

3 participants