Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bunch of microoptimizations #1600

Open
wants to merge 8 commits into
base: dev
Choose a base branch
from

Conversation

olefirenque
Copy link
Contributor

No description provided.

@olefirenque olefirenque force-pushed the bunch-of-microoptimizations branch from ca140d7 to b846569 Compare February 1, 2024 21:51

// avoiding extra allocations with a static storage for m_covers
static xr_vector<std::optional<CCoverPoint>> quadTreeStaticStorage;
Copy link
Contributor Author

@olefirenque olefirenque Feb 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if this is a good trade-off between allocs/RSS. It would be nice to hear an opinion on this

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not recommended to use global/static objects which allocate memory dynamically.

Comment on lines 102 to +110
if (!actual())
prepare();
{
static std::mutex prepareMutex;
std::lock_guard lock(prepareMutex);

// Double-checked locking
if (!actual())
prepare();
}
Copy link
Contributor Author

@olefirenque olefirenque Feb 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe it was only non-concurrent part for xr_parallel_for in CSpaceRestrictionShape::fill_shape. Since it is called only once for a sequence of CSpaceRestrictor::inside calls, maybe it would be better to extract this to reduce contention

xr_vector<bool> m_temp;
// vector<bool> is not applicable for `m_temp`
// since it is filled in parallel_for (https://timsong-cpp.github.io/cppwp/container.requirements.dataraces).
xr_vector<int> m_temp;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose it wasn't thread-safe before

if (m_temp[i] && critical_cover(i))
m_covers->insert(xr_new<CCoverPoint>(ai().level_graph().vertex_position(ai().level_graph().vertex(i)), i));
for (auto &p : quadTreeStaticStorage)
if (p.has_value())
Copy link
Contributor Author

@olefirenque olefirenque Feb 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seemed that accessing m_temp and calling critical_cover took some time here

for (u32 i = 0; i < levelVertexCount; ++i)
        if (m_temp[i] && critical_cover(i))
            m_covers->insert(xr_new<CCoverPoint>(ai().level_graph().vertex_position(ai().level_graph().vertex(i)), i));

image

Copy link
Member

@Xottab-DUTY Xottab-DUTY left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some small initial review. I didn't checked changes in xrGame yet.
Please, fix the code style there as per review comments :)

src/xrAICore/Navigation/level_graph_inline.h Outdated Show resolved Hide resolved
src/xrAICore/Navigation/level_graph_inline.h Outdated Show resolved Hide resolved
src/xrAICore/Navigation/level_graph_inline.h Outdated Show resolved Hide resolved
src/Layers/xrRender_R2/r2_R_lights.cpp Outdated Show resolved Hide resolved
src/Layers/xrRender_R2/r2_R_lights.cpp Show resolved Hide resolved
@Xottab-DUTY Xottab-DUTY added Enhancement Renderer AI Artificial Intelligence labels Feb 1, 2024
src/xrGame/space_restriction_shape.cpp Outdated Show resolved Hide resolved
src/xrGame/space_restriction_shape.cpp Outdated Show resolved Hide resolved
src/xrGame/space_restriction_shape.cpp Outdated Show resolved Hide resolved
@olefirenque olefirenque force-pushed the bunch-of-microoptimizations branch from baf96ec to ec99fb2 Compare February 1, 2024 23:49
@olefirenque olefirenque force-pushed the bunch-of-microoptimizations branch from ec99fb2 to d9da42f Compare February 2, 2024 00:08
Comment on lines +94 to +103
{
if (inside(graph.vertex_id(&vertex), true) &&
!inside(graph.vertex_id(&vertex), false))
m_border_chunk.push_back(graph.vertex_id(&vertex));
}
std::lock_guard lock(mergeMutex);
if (m_border.capacity() < m_border.size() + m_border_chunk.size())
m_border.reserve(m_border.size() + m_border_chunk.size());
for (auto x : m_border_chunk)
m_border.push_back(x);
Copy link
Contributor Author

@olefirenque olefirenque Feb 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a little research on this, and it seemed to me that the body of CBorderMergePredicate::operator(...) is a bigger problem than merging a certain number of chunks under lock
image

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, it might be irrelevant since I accidentally used UI Freeze option.
I'm sorry for the misinformation. Apparently, this flame graph represents only those samples that were sampled when the NPC's spawning was lagging.

The UI Freeze event indicates time intervals where the application was unable to respond to user input. More specifically, these are time intervals where window messages were not pumped for more than 200 ms or processing of a particular message took more than 200 ms.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@olefirenque, what profiler did you use?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me this strongly looks like tracy but I might be wrong there

@Xottab-DUTY Xottab-DUTY requested a review from vTurbine February 3, 2024 14:56
@olefirenque olefirenque force-pushed the bunch-of-microoptimizations branch from df2b8ae to 6f1c24d Compare February 3, 2024 15:06
@Xottab-DUTY
Copy link
Member

Xottab-DUTY commented May 23, 2024

@olefirenque, hi! Sorry for a big delay in review & merging!
I've made big improvements to the task manager: now it supports lambda with captures.
Could you rebase this PR to accommodate the changes?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
AI Artificial Intelligence Enhancement Renderer
Projects
Status: To do
Development

Successfully merging this pull request may close these issues.

4 participants