-
Notifications
You must be signed in to change notification settings - Fork 810
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement masking to control how embedded points are updated #620
base: master
Are you sure you want to change the base?
Conversation
82fbb28
to
c35db5f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good so far. I would like to consider different names than pin_mask
. It is a small thing, and naming is hard (I don't have a particularly good alternative suggestion), but good names help users a lot. My best suggestion at the moment is inertia
or sample_inertia
(similar to sample_weight
used in other sklearn models). Of course we would need then then pass through 1.0 - inertia
but that seems tractable. I would welcome your thoughts however.
@@ -2671,7 +2727,7 @@ def fit_transform(self, X, y=None): | |||
r_emb: array, shape (n_samples) | |||
Local radii of data points in the embedding (log-transformed). | |||
""" | |||
self.fit(X, y) | |||
self.fit(X, y, pin_mask) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably best to make this a keyword arg in case we get more fit parameters later (such as a init
fit param).
self.fit(X, y, pin_mask) | |
self.fit(X, y, pink_mask=pin_mask) |
Note that the remaining test failures are due to how coveralls plays with azure, so you can safely ignore them. Another note: I may start a new 0.6dev branch and target this at that so we can merge it in sooner and work on it more easily without messing with the main branch. Let me know if you think that would be a good idea. |
I have now tested this, and I found out that I additionally needed to block this rescaling such that my points (which have coordinates slightly outside [0, 10]) stay fixed. |
Also, I would like to base my pinned initial embedding on the spectral embedding, and I suggest this change to get it. Update: The suggested change was merged into master 👍 |
Matthieu raised the rescaling issue with me elsewhere. It is a little tricky as the actual init does need to land in a reasonable spot, or the resulting embedding can go very badly. Leaving the rescaling in ensured that we had a sensible starting point. Otherwise there is the question of whether we leave it to the user -- it is not hard to accidentally provide a bad initialization that produces unexpected results and is hard to diagnose as to what is going wrong. I was hoping to avoid that if possible. Perhaps a reasonable option would be to come up with some semi-reasonable checks and warn if the provided initialization is troublesome? |
I also think that warning instead of rescaling is the way to go. Perhaps "one of the coordinate ranges is outside [8, 12] (i.e. more than 20 % off)" is a semi-reasonable condition? This assumes that the layout optimization is independent of absolute embedding location. |
A colleague working on an interactive visualization tool had an interesting scenario. A user could (say) inspect and then drag certain points to left/right and "pin" the x-axis of the dragged points while leaving other axes of the dragged points adaptable. Supporting also a 2-D nsamples x dim |
c35db5f
to
c7fe35e
Compare
No description provided.