Skip to content

Conversation

@med-ayssar
Copy link
Contributor

@med-ayssar med-ayssar commented Jul 20, 2025

Fix for #3532
The main issue occurs when CompressedSpikes is disabled — the wrong connection gets deleted. The following image illustrates the problem."

Issues:

  • With compressed-spikes off, std::lower_bound was being applied on unsorted vector.

  • With compressed-spikes off, and regardless of (1), disconnecting a connection was not implemented correctly, because the query on the snode_id and tnode_id delivers a wrong lcid.

  • While testing, I discovered that disconnecting a connection, it changes the entry in the sourceTable, and when working with a sorted sourceTable (compressed-spikes=true), we might get an undefined behaviour

connections

@med-ayssar med-ayssar changed the title Fix issue: #3532: Disconnect given compressedSpikes Fix issue: #3532: Disconnect issue with CompressedSpikes:ON/Off Jul 20, 2025
@gtrensch gtrensch added T: Bug Wrong statements in the code or documentation S: Normal Handle this with default priority labels Jul 25, 2025
@gtrensch gtrensch added this to Kernel Jul 25, 2025
@github-project-automation github-project-automation bot moved this to In progress in Kernel Jul 25, 2025
@heplesser heplesser added S: High Should be handled next I: No breaking change Previously written code will work as before, no one should note anything changing (aside the fix) and removed S: Normal Handle this with default priority labels Aug 7, 2025
@heplesser
Copy link
Contributor

@med-ayssar Thanks for your detective work and the PR! I will look at it in detail asap.

Copy link
Contributor

@heplesser heplesser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@med-ayssar Thanks for the PR! I am not entirely sure I follow the logic around nth_equal(), see comments below.

I also wonder if, for the uncompressed case, we should not take a different approach. In that case, I believe, we simply need to run through all connections of a given synapse type linearly and disconnect the first one for which source and target match, and which has not yet been disconnected. I guess that means parallel iteration through source and target tables. The split into find_first_source() and find_first_target() does not make sense in that case.

We also need some tests, including tests where there are multiple connections between the same source-target pair and we delete one at a time.

@github-project-automation github-project-automation bot moved this from In progress to PRs pending approval in Kernel Aug 8, 2025
@med-ayssar
Copy link
Contributor Author

@heplesser: Is it possible to setup a meeting to discuss this? Any time after 17:30 should work for me.

@med-ayssar
Copy link
Contributor Author

@heplesser :You’re a bit fast. I’m just using this time to adjust things while waiting for other processes to compile at work :D, but final version at night :D.

@med-ayssar
Copy link
Contributor Author

med-ayssar commented Sep 22, 2025

@heplesser: By the way, when deleting a connection, we mark the corresponding entry in the source table with MAX::value. As a result, under compressed spike-on conditions, calling std::lower_bound may exhibit undefined behavior because the sources_table is no longer guaranteed to be sorted.

[std::lower_bound](https://en.cppreference.com/w/cpp/algorithm/lower_bound.html#:~:text=exists.-,If%20the%20elements,%2C%20the%20behavior%20is%20undefined.,-(until%20C%2B%2B20):
the elements elem of [first, last) are not partitioned with respect to the expression bool(elem < value), the behavior is undefined.

@heplesser
Copy link
Contributor

@med-ayssar Good catch that you noticed the undefined behavior of lower_bound once we have disconnected sources. I see two solutions here: Either a hand-written lower bound function that handles DISABLED_NODE_ID in a sensible way, but this will likely cause considerable overhead in a function that could be called a lot during structural plasticity simulations. Or to add another bit flag in the Source class to indicate disabled classes explicitly. In that way, we can use std::lower_bound(). We would still have 61 bits left for the node IDs, enough for roughly 10^18 neurons, far more than the 10^11 in the human brain.

@med-ayssar
Copy link
Contributor Author

I found another issue that occurs when compressed spikes is turned off. It should be fixed now, and once the CI is green, I’ll update the description with an explanation of the problem.

@med-ayssar
Copy link
Contributor Author

med-ayssar commented Sep 25, 2025

@heplesser please take a look again.

  • Shuffling the edges list is important. The problem does not occur if connections are deleted in order, but when edges are randomly selected for deletion, it breaks the source table (check the Description).

  • Handling the UB will be the next commit, or does it make sense to have it in its own PR?

@med-ayssar
Copy link
Contributor Author

@heplesser (CC @JanVogelsang ): It seems that in C++20, std::lower_bound no longer has undefined behavior.
https://en.cppreference.com/w/cpp/algorithm/lower_bound.html#:~:text=until%20C%2B%2B20)-,Equivalent%20to,.,-(since%20C%2B%2B20

Am I getting it correctly?

@JanVogelsang
Copy link
Contributor

@heplesser (CC @JanVogelsang ): It seems that in C++20, std::lower_bound no longer has undefined behavior. https://en.cppreference.com/w/cpp/algorithm/lower_bound.html#:~:text=until%20C%2B%2B20)-,Equivalent%20to,.,-(since%20C%2B%2B20

Am I getting it correctly?

Hm good question, it's not 100% . But couldn't we just use a custom Comp-Operator here which handles disabled node ids? Something like:

[]( const Source& source, const size_t node_id ){
  return source.get_node_id() == DISABLED_NODE_ID or source.get_node_id() < node_id;
}

Or what am I missing here? Maybe I didn't understand our problem.

@med-ayssar
Copy link
Contributor Author

  • With a custom-predicate, it is then 100% UB (check the std::lower_bound description).

  • Disabling a node in the source table causes the table to become unsorted when using compressed-spikes -> then UB with std::lower_bound(predicate)

@heplesser
Copy link
Contributor

@med-ayssar Thanks for working this through so carefully! As far as I understand the C++ documentation, there is no difference concerning undefined behavior for std::lower_bound() between C++20 and earlier (or later) versions. I fear we need to separate the compressed and non-compressed cases after all. For the compressed case, we can first search based on the source, and then from there on based on the target, because we know that targets are consecutive for a given source. For non-compressed, I think we just need to do a linear search in which we look at source and target of each element until we find one that matches on both and is not disabled.

I did not quite get your point that disconnecting breaks the ordering of the source table.

@med-ayssar
Copy link
Contributor Author

med-ayssar commented Sep 25, 2025

In my last commits, I separated the search for the connection in the find_connection function.

When we disconnect, we disable the wrong source entry, but now it should work

If you test with the current master, and delete the edges as the given order of GetConnection, and then try the same experiment with randomly selecting edges, you will see the difference, it will break throwing an exception that the connection doesn't exist.

The reason is that a previous deletion of a connection disabled the wrong source-entry.

@JanVogelsang
Copy link
Contributor

Ah yes, true. But HEP's suggestion would fix it (disabled flag).

@med-ayssar
Copy link
Contributor Author

To Summarize:

  1. With compressed-spikes off, std::lower_bound was being applied on unsorted vector -> now it is fixed.

  2. With compressed-spikes off, and regardless of (1), disconnecting a connection was not implemented correctly, because the query on the snode_id and tnode_id delivers a wrong lcid -> now is fixed.

  3. While testing, I discovered that disconnecting a connection, it changes the entry in the sourceTable, and when working with a sorted sourceTable (compressed-spikes=true), we might get an undefined behaviour -> not fixed yet.

@med-ayssar
Copy link
Contributor Author

med-ayssar commented Sep 25, 2025

[Off-topic]: In NEST, disconnecting means marking certain objects (i.e, sourceTable and connectionTable entries, in such case, calling the disable function. Calling nest.GetConnections should then remove those marked entries. Unfortunately, this is not the case. However; the result ofnest.GetConnections is still correct, but the underlying infrastructure might be in an inconsistent state.

It happens here: check the while-loop, and you will see why nothing is removed in some cases, despite they are marked for removal (disabled).

  1. Example 1 : n edges, and mark the first for deletion -> the first won't be deleted, because we start from the end of the table.
  2. when deleting non-consecutive edges -> the function only wants to delete a whole range! -> so no deletion.
size_t
nest::SourceTable::remove_disabled_sources( const size_t tid, const synindex syn_id )
{
  if ( sources_[ tid ].size() <= syn_id )
  {
    return invalid_index; // no source table entry for this synapse model
  }

  BlockVector< Source >& mysources = sources_[ tid ][ syn_id ];
  const size_t max_size = mysources.size();
  if ( max_size == 0 )
  {
    return invalid_index; // no connections for this synapse model
  }

  // lcid needs to be signed, to allow lcid >= 0 check in while loop
  // to fail; afterwards we can be certain that it is non-negative and
  // we can static_cast it to index
  long lcid = max_size - 1;
  while ( lcid >= 0 and mysources[ lcid ].is_disabled() )
  {
    --lcid;
  }
  const size_t first_invalid_lcid = static_cast< size_t >( lcid + 1 ); // loop stopped on first valid entry or -1
  if ( first_invalid_lcid == max_size )
  {
    return invalid_index; // all lcids are valid, nothing to remove
  }

  mysources.erase( mysources.begin() + first_invalid_lcid, mysources.end() );
  return first_invalid_lcid;
}

@heplesser
Copy link
Contributor

@med-ayssar In isolation, you are right that remove_disabled_sources() will not work correctly. It strictly depends on a source table that has been sorted after any connections have been disabled. But then it will work correctly.

@med-ayssar
Copy link
Contributor Author

med-ayssar commented Sep 26, 2025

Fair enough, but remember, the source table is not always sorted ( compressed spikes are off ). Otherwise the PR is ready for reviewing.

@cjotade cjotade self-assigned this Oct 6, 2025
@heplesser heplesser requested a review from cjotade October 7, 2025 19:52
@@ -0,0 +1,83 @@
# -*- coding: utf-8 -*-
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is a regression test, please rename it to test_regression_issue-3532.py.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

I: No breaking change Previously written code will work as before, no one should note anything changing (aside the fix) S: High Should be handled next T: Bug Wrong statements in the code or documentation

Projects

Status: PRs pending approval

Development

Successfully merging this pull request may close these issues.

6 participants