Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[memprof] Add extractCallsFromIR #115218

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

kazutakahirata
Copy link
Contributor

This patch adds extractCallsFromIR, a function to extract calls from
the IR, which will be used to undrift call site locations in the
MemProf profile.

In a nutshell, the MemProf undrifting works as follows:

  • Extract call site locations from the IR.
  • Extract call site locations from the MemProf profile.
  • Undrift the call site locations with longestCommonSequence.

This patch implements the first bullet point above. Specifically,
given the IR, the new function returns a map from caller GUIDs to
lists of corresponding call sites. For example:

Given:

foo() {
f1();
f2(); f3();
}

extractCallsFromIR returns:

Caller: foo ->
{{(Line 1, Column 3), Callee: f1},
{(Line 2, Column 3), Callee: f2},
{(Line 2, Column 9), Callee: f3}}

where the line numbers, relative to the beginning of the caller, and
column numbers are sorted in the ascending order. The value side of
the map -- the list of call sites -- can be directly passed to
longestCommonSequence.

To facilitate the review process, I've only implemented basic features
in extractCallsFromIR in this patch.

  • The new function extracts calls from the LLVM "call" instructions
    only. It does not look into the inline stack.
  • It does not recognize or treat heap allocation functions in any
    special way.

I will address these missing features in subsequent patches.

This patch adds extractCallsFromIR, a function to extract calls from
the IR, which will be used to undrift call site locations in the
MemProf profile.

In a nutshell, the MemProf undrifting works as follows:

- Extract call site locations from the IR.
- Extract call site locations from the MemProf profile.
- Undrift the call site locations with longestCommonSequence.

This patch implements the first bullet point above.  Specifically,
given the IR, the new function returns a map from caller GUIDs to
lists of corresponding call sites.  For example:

Given:

  foo() {
    f1();
    f2(); f3();
  }

extractCallsFromIR returns:

  Caller: foo ->
    {{(Line 1, Column 3), Callee: f1},
     {(Line 2, Column 3), Callee: f2},
     {(Line 2, Column 9), Callee: f3}}

where the line numbers, relative to the beginning of the caller, and
column numbers are sorted in the ascending order.  The value side of
the map -- the list of call sites -- can be directly passed to
longestCommonSequence.

To facilitate the review process, I've only implemented basic features
in extractCallsFromIR in this patch.

- The new function extracts calls from the LLVM "call" instructions
  only.  It does not look into the inline stack.
- It does not recognize or treat heap allocation functions in any
  special way.

I will address these missing features in subsequent patches.
@llvmbot
Copy link
Collaborator

llvmbot commented Nov 6, 2024

@llvm/pr-subscribers-llvm-transforms

Author: Kazu Hirata (kazutakahirata)

Changes

This patch adds extractCallsFromIR, a function to extract calls from
the IR, which will be used to undrift call site locations in the
MemProf profile.

In a nutshell, the MemProf undrifting works as follows:

  • Extract call site locations from the IR.
  • Extract call site locations from the MemProf profile.
  • Undrift the call site locations with longestCommonSequence.

This patch implements the first bullet point above. Specifically,
given the IR, the new function returns a map from caller GUIDs to
lists of corresponding call sites. For example:

Given:

foo() {
f1();
f2(); f3();
}

extractCallsFromIR returns:

Caller: foo ->
{{(Line 1, Column 3), Callee: f1},
{(Line 2, Column 3), Callee: f2},
{(Line 2, Column 9), Callee: f3}}

where the line numbers, relative to the beginning of the caller, and
column numbers are sorted in the ascending order. The value side of
the map -- the list of call sites -- can be directly passed to
longestCommonSequence.

To facilitate the review process, I've only implemented basic features
in extractCallsFromIR in this patch.

  • The new function extracts calls from the LLVM "call" instructions
    only. It does not look into the inline stack.
  • It does not recognize or treat heap allocation functions in any
    special way.

I will address these missing features in subsequent patches.


Full diff: https://github.com/llvm/llvm-project/pull/115218.diff

4 Files Affected:

  • (modified) llvm/include/llvm/Transforms/Instrumentation/MemProfiler.h (+35)
  • (modified) llvm/lib/Transforms/Instrumentation/MemProfiler.cpp (+47)
  • (modified) llvm/unittests/Transforms/Instrumentation/CMakeLists.txt (+1)
  • (added) llvm/unittests/Transforms/Instrumentation/MemProfUseTest.cpp (+108)
diff --git a/llvm/include/llvm/Transforms/Instrumentation/MemProfiler.h b/llvm/include/llvm/Transforms/Instrumentation/MemProfiler.h
index f92c6b4775a2a2..076a2785bbaa77 100644
--- a/llvm/include/llvm/Transforms/Instrumentation/MemProfiler.h
+++ b/llvm/include/llvm/Transforms/Instrumentation/MemProfiler.h
@@ -57,6 +57,41 @@ class MemProfUsePass : public PassInfoMixin<MemProfUsePass> {
   IntrusiveRefCntPtr<vfs::FileSystem> FS;
 };
 
+namespace memprof {
+
+struct LineLocation {
+  LineLocation(uint32_t L, uint32_t D) : LineOffset(L), Column(D) {}
+
+  void print(raw_ostream &OS) const;
+  void dump() const;
+
+  bool operator<(const LineLocation &O) const {
+    return LineOffset < O.LineOffset ||
+           (LineOffset == O.LineOffset && Column < O.Column);
+  }
+
+  bool operator==(const LineLocation &O) const {
+    return LineOffset == O.LineOffset && Column == O.Column;
+  }
+
+  bool operator!=(const LineLocation &O) const {
+    return LineOffset != O.LineOffset || Column != O.Column;
+  }
+
+  uint64_t getHashCode() const { return ((uint64_t)Column << 32) | LineOffset; }
+
+  uint32_t LineOffset;
+  uint32_t Column;
+};
+
+// A pair of a call site location and its corresponding callee GUID.
+using CallEdgeTy = std::pair<LineLocation, uint64_t>;
+
+// Extract all calls from the IR.  Arrange them in a map from caller GUIDs to a
+// list of call sites, each of the form {LineLocation, CalleeGUID}.
+DenseMap<uint64_t, SmallVector<CallEdgeTy, 0>> extractCallsFromIR(Module &M);
+
+} // namespace memprof
 } // namespace llvm
 
 #endif
diff --git a/llvm/lib/Transforms/Instrumentation/MemProfiler.cpp b/llvm/lib/Transforms/Instrumentation/MemProfiler.cpp
index 70bee30fd151f6..fef11d9ffe306f 100644
--- a/llvm/lib/Transforms/Instrumentation/MemProfiler.cpp
+++ b/llvm/lib/Transforms/Instrumentation/MemProfiler.cpp
@@ -795,6 +795,53 @@ struct AllocMatchInfo {
   bool Matched = false;
 };
 
+DenseMap<uint64_t, SmallVector<CallEdgeTy, 0>>
+memprof::extractCallsFromIR(Module &M) {
+  DenseMap<uint64_t, SmallVector<CallEdgeTy, 0>> Calls;
+
+  auto GetOffset = [](const DILocation *DIL) {
+    return (DIL->getLine() - DIL->getScope()->getSubprogram()->getLine()) &
+           0xffff;
+  };
+
+  for (Function &F : M) {
+    if (F.isDeclaration())
+      continue;
+
+    for (auto &BB : F) {
+      for (auto &I : BB) {
+        const DILocation *DIL = I.getDebugLoc();
+        if (!DIL)
+          continue;
+
+        if (!isa<CallBase>(&I) || isa<IntrinsicInst>(&I))
+          continue;
+
+        auto *CB = dyn_cast<CallBase>(&I);
+        auto *CalledFunction = CB->getCalledFunction();
+        if (!CalledFunction || CalledFunction->isIntrinsic())
+          continue;
+
+        StringRef CalleeName = CalledFunction->getName();
+        uint64_t CallerGUID =
+            IndexedMemProfRecord::getGUID(DIL->getSubprogramLinkageName());
+        uint64_t CalleeGUID = IndexedMemProfRecord::getGUID(CalleeName);
+        LineLocation Loc = {GetOffset(DIL), DIL->getColumn()};
+        Calls[CallerGUID].emplace_back(Loc, CalleeGUID);
+      }
+    }
+  }
+
+  // Sort each call list by the source location.
+  for (auto &KV : Calls) {
+    auto &Calls = KV.second;
+    llvm::sort(Calls);
+    Calls.erase(llvm::unique(Calls), Calls.end());
+  }
+
+  return Calls;
+}
+
 static void
 readMemprof(Module &M, Function &F, IndexedInstrProfReader *MemProfReader,
             const TargetLibraryInfo &TLI,
diff --git a/llvm/unittests/Transforms/Instrumentation/CMakeLists.txt b/llvm/unittests/Transforms/Instrumentation/CMakeLists.txt
index 1f249b0049d062..80fac2353be416 100644
--- a/llvm/unittests/Transforms/Instrumentation/CMakeLists.txt
+++ b/llvm/unittests/Transforms/Instrumentation/CMakeLists.txt
@@ -8,6 +8,7 @@ set(LLVM_LINK_COMPONENTS
 )
 
 add_llvm_unittest(InstrumentationTests
+  MemProfUseTest.cpp
   PGOInstrumentationTest.cpp
   )
 
diff --git a/llvm/unittests/Transforms/Instrumentation/MemProfUseTest.cpp b/llvm/unittests/Transforms/Instrumentation/MemProfUseTest.cpp
new file mode 100644
index 00000000000000..21c7537852c4df
--- /dev/null
+++ b/llvm/unittests/Transforms/Instrumentation/MemProfUseTest.cpp
@@ -0,0 +1,108 @@
+//===- MemProfUseTest.cpp - MemProf use tests -----------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "llvm/AsmParser/Parser.h"
+#include "llvm/IR/LLVMContext.h"
+#include "llvm/IR/Module.h"
+#include "llvm/ProfileData/MemProf.h"
+#include "llvm/Support/SourceMgr.h"
+#include "llvm/Transforms/Instrumentation/MemProfiler.h"
+
+#include "gmock/gmock.h"
+#include "gtest/gtest.h"
+
+namespace {
+using namespace llvm;
+
+TEST(MemProf, ExtractDirectCallsFromIR) {
+  // The following IR is generated from:
+  //
+  // void f1();
+  // void f2();
+  // void f3();
+  //
+  // void foo() {
+  //   f1();
+  //   f2(); f3();
+  // }
+  StringRef IR = R"IR(
+define dso_local void @_Z3foov() !dbg !10 {
+entry:
+  call void @_Z2f1v(), !dbg !13
+  call void @_Z2f2v(), !dbg !14
+  call void @_Z2f3v(), !dbg !15
+  ret void, !dbg !16
+}
+
+declare !dbg !17 void @_Z2f1v()
+
+declare !dbg !18 void @_Z2f2v()
+
+declare !dbg !19 void @_Z2f3v()
+
+!llvm.dbg.cu = !{!0}
+!llvm.module.flags = !{!2, !3, !4, !5, !6, !7, !8}
+!llvm.ident = !{!9}
+
+!0 = distinct !DICompileUnit(language: DW_LANG_C_plus_plus_14, file: !1, producer: "clang", isOptimized: true, runtimeVersion: 0, emissionKind: LineTablesOnly, splitDebugInlining: false, debugInfoForProfiling: true, nameTableKind: None)
+!1 = !DIFile(filename: "foobar.cc", directory: "/")
+!2 = !{i32 7, !"Dwarf Version", i32 5}
+!3 = !{i32 2, !"Debug Info Version", i32 3}
+!4 = !{i32 1, !"wchar_size", i32 4}
+!5 = !{i32 1, !"MemProfProfileFilename", !"memprof.profraw"}
+!6 = !{i32 8, !"PIC Level", i32 2}
+!7 = !{i32 7, !"PIE Level", i32 2}
+!8 = !{i32 7, !"uwtable", i32 2}
+!9 = !{!"clang"}
+!10 = distinct !DISubprogram(name: "foo", linkageName: "_Z3foov", scope: !1, file: !1, line: 5, type: !11, scopeLine: 5, flags: DIFlagPrototyped | DIFlagAllCallsDescribed, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !0)
+!11 = !DISubroutineType(types: !12)
+!12 = !{}
+!13 = !DILocation(line: 6, column: 3, scope: !10)
+!14 = !DILocation(line: 7, column: 3, scope: !10)
+!15 = !DILocation(line: 7, column: 9, scope: !10)
+!16 = !DILocation(line: 8, column: 1, scope: !10)
+!17 = !DISubprogram(name: "f1", linkageName: "_Z2f1v", scope: !1, file: !1, line: 1, type: !11, flags: DIFlagPrototyped, spFlags: DISPFlagOptimized)
+!18 = !DISubprogram(name: "f2", linkageName: "_Z2f2v", scope: !1, file: !1, line: 2, type: !11, flags: DIFlagPrototyped, spFlags: DISPFlagOptimized)
+!19 = !DISubprogram(name: "f3", linkageName: "_Z2f3v", scope: !1, file: !1, line: 3, type: !11, flags: DIFlagPrototyped, spFlags: DISPFlagOptimized)
+)IR";
+
+  LLVMContext Ctx;
+  SMDiagnostic Err;
+  std::unique_ptr<Module> M = parseAssemblyString(IR, Err, Ctx);
+  ASSERT_TRUE(M);
+
+  auto Calls = memprof::extractCallsFromIR(*M);
+
+  // Expect exactly one caller.
+  ASSERT_THAT(Calls, testing::SizeIs(1));
+
+  auto It = Calls.begin();
+  ASSERT_NE(It, Calls.end());
+
+  const auto &[CallerGUID, CallSites] = *It;
+  EXPECT_EQ(CallerGUID, memprof::IndexedMemProfRecord::getGUID("_Z3foov"));
+  ASSERT_THAT(CallSites, testing::SizeIs(3));
+
+  // Verify that call sites show up in the ascending order of their source
+  // locations.
+  EXPECT_EQ(CallSites[0].first.LineOffset, 1U);
+  EXPECT_EQ(CallSites[0].first.Column, 3U);
+  EXPECT_EQ(CallSites[0].second,
+            memprof::IndexedMemProfRecord::getGUID("_Z2f1v"));
+
+  EXPECT_EQ(CallSites[1].first.LineOffset, 2U);
+  EXPECT_EQ(CallSites[1].first.Column, 3U);
+  EXPECT_EQ(CallSites[1].second,
+            memprof::IndexedMemProfRecord::getGUID("_Z2f2v"));
+
+  EXPECT_EQ(CallSites[2].first.LineOffset, 2U);
+  EXPECT_EQ(CallSites[2].first.Column, 9U);
+  EXPECT_EQ(CallSites[2].second,
+            memprof::IndexedMemProfRecord::getGUID("_Z2f3v"));
+}
+} // namespace

memprof::extractCallsFromIR(Module &M) {
DenseMap<uint64_t, SmallVector<CallEdgeTy, 0>> Calls;

auto GetOffset = [](const DILocation *DIL) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe move this logic into the LineLocation constructor? Then it will be fully encapsulated in that struct instead of having a bit of the logic outside and a bit inside.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am torn. I understand your point, but then we would need to pass DIL to LineLocation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, any reason you are opposed to passing DIL to it?

llvm/include/llvm/Transforms/Instrumentation/MemProfiler.h Outdated Show resolved Hide resolved

auto *CB = dyn_cast<CallBase>(&I);
auto *CalledFunction = CB->getCalledFunction();
if (!CalledFunction || CalledFunction->isIntrinsic())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will skip indirect calls too. Is that intended?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's intended. In the future, we may be able to make undrifting smarter with respect to indirect calls. For now, I'd like to land the very basic feature.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a comment to note this?

llvm/lib/Transforms/Instrumentation/MemProfiler.cpp Outdated Show resolved Hide resolved
Copy link
Contributor Author

@kazutakahirata kazutakahirata left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated the PR. Please take a look. Thanks!

memprof::extractCallsFromIR(Module &M) {
DenseMap<uint64_t, SmallVector<CallEdgeTy, 0>> Calls;

auto GetOffset = [](const DILocation *DIL) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am torn. I understand your point, but then we would need to pass DIL to LineLocation.


auto *CB = dyn_cast<CallBase>(&I);
auto *CalledFunction = CB->getCalledFunction();
if (!CalledFunction || CalledFunction->isIntrinsic())
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's intended. In the future, we may be able to make undrifting smarter with respect to indirect calls. For now, I'd like to land the very basic feature.

Copy link
Contributor

@snehasish snehasish left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

memprof::extractCallsFromIR(Module &M) {
DenseMap<uint64_t, SmallVector<CallEdgeTy, 0>> Calls;

auto GetOffset = [](const DILocation *DIL) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, any reason you are opposed to passing DIL to it?


auto *CB = dyn_cast<CallBase>(&I);
auto *CalledFunction = CB->getCalledFunction();
if (!CalledFunction || CalledFunction->isIntrinsic())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a comment to note this?

const auto &[CallerGUID, CallSites] = *It;
EXPECT_EQ(CallerGUID, memprof::IndexedMemProfRecord::getGUID("_Z3foov"));
ASSERT_THAT(CallSites, testing::SizeIs(3));

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: move the testing::Pair, testing::FieldsAre and memprof::IndexedMemProfRecord::getGUID to using decls above in the anonymous namespace so that you don't have to fully qualify them each time you use it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants