Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possible bug with ensembleGBdist("euclidean") #24

Open
sgbaird opened this issue Feb 21, 2022 · 4 comments
Open

Possible bug with ensembleGBdist("euclidean") #24

sgbaird opened this issue Feb 21, 2022 · 4 comments

Comments

@sgbaird
Copy link
Collaborator

sgbaird commented Feb 21, 2022

I’ve been using ‘ensembleGBdist’ for some calculations and I’m a bit confused by the results if I use the ‘euclidean’ option. Here are parity plots comparing the distances obtained for the 388 Olmsted GBs using different distance functions:

image image image

So the ensembleGBdist with the ‘omega’ argument seems to give the right distances (compared to the true GBdist4(‘omega’) function), whereas the unsymmetrized scaled Euclidean distance (2*pdist) gives the right values for small distances but has overestimates as expected, but ensembleGBdist with the ‘euclidean’ argument seems to give completely wrong results. It is this last comparison that I’m confused about. Why would ensembleGBdist(‘euclidean’) be so wrong compared to ensembleGBdist(‘omega’), especially when the unsymmetrized Euclidean distance (2*pdist) gives expected results?

@sgbaird
Copy link
Collaborator Author

sgbaird commented Feb 21, 2022

@oliverkjohnson can you include a reproducer for me? I.e. script for me to reproduce these results. I looked at the function, and nothing jumps out at me as obviously wrong/different between the "omega" and "euclidean" versions.

You can add code formatting with MATLAB syntax via (note I escaped the backticks so it wouldn't do the formatting):
```matlab
a = 1
print("")
```

which looks like this:

a = 1
print("")

@sgbaird sgbaird changed the title Possible bug with EnsembleGBdist("euclidean") Possible bug with ensembleGBdist("euclidean") Feb 21, 2022
@oliverkjohnson
Copy link
Collaborator

I was doing it with the Olmsted data, but here's the script with some random input data (results are similar):

% generate some random GBs
nGB = 1e2;
coords = get_ocubo(nGB); % norm sqrt(2)
coords = get_octpairs(coords); %symmetrize (using default reference GBO)
coords = normr(coords); % unit norm

% K=50 ensemble using 'omega' distance
pdO = ensembleGBdist(sqrt(2)*coords,sqrt(2)*coords,50,'omega');

% K=50 ensemble using 'euclidean' distance
pdE = ensembleGBdist(sqrt(2)*coords,sqrt(2)*coords,50,'euclidean');

% unsymmetrized scaled euclidean distance
distanceScalingFactor = 2;
pdInputData = distanceScalingFactor*pdist(coords).'; % sub-diagonal lower-triangular values

% true symmetrized toby distance
k = reshape((1:(nGB.^2)).',nGB,nGB);
[i,j] = ind2sub(size(k),k(tril(true(size(k)),-1)));
pdTrue = GBdist4(sqrt(2)*coords(i,:),sqrt(2)*coords(j,:),32,'omega'); % should be norm sqrt(2)

% Plot
figure
plot(rad2deg(pdTrue),rad2deg(pdO),'k.'); 
axis equal tight; 
xlabel('GBdist4(''omega'') [$d_{\Omega}^{\circ}$]','interpreter','latex'); 
ylabel('ensembleGBdist($K=50$,''omega'') [$d_{\Omega}^{\circ}$]','interpreter','latex');

figure
plot(rad2deg(pdTrue),rad2deg(pdE),'k.'); 
axis equal tight; 
xlabel('GBdist4(''omega'') [$d_{\Omega}^{\circ}$]','interpreter','latex'); 
ylabel('ensembleGBdist($K=50$,''euclidean'') [$d_{\Omega}^{\circ}$]','interpreter','latex');

figure
plot(rad2deg(pdTrue),rad2deg(pdInputData),'k.'); 
axis equal tight; 
xlabel('GBdist4(''omega'') [$d_{\Omega}^{\circ}$]','interpreter','latex'); 
ylabel('2*pdist [$d_{\Omega}^{\circ}$]','interpreter','latex');

@oliverkjohnson
Copy link
Collaborator

I should also note that in order to get ensembleGBdist to return a pairwise distance vector instead of the full distance matrix, I commented out line 54 of ensembleGBdist (the call to squareform).

@sgbaird
Copy link
Collaborator Author

sgbaird commented Feb 22, 2022

@oliverkjohnson thanks for sharing that script. This really helped to get right to debugging. It turns out to have been a bug with how the function I wrote (normr(a-b)) interfaced with pdist as a custom distance metric. I fixed it by using the built-in (which is also the default) euclidean distance of pdist. A bit silly that I didn't do this originally. The only other two places I use ensembleGBdist.m is to display a max distance in plotting.m.

interp/code/plotting.m

Lines 955 to 970 in 9d979e8

%% largest minimum (symmetrized) distance between any 2 GBs
load('oct50000.mat','pts')
npts = 20000;
pts = pts(1:npts,:);
K = 20;
fpath = fullfile(datafolder,['pd' int2str(npts) '-K' int2str(K)]);
%%
pd = ensembleGBdist(pts,[],K, "squareform", true);
save(fpath,'pd','npts','K','-v7.3')
%%
load(fpath,'pd')
%%
mpd = 2*max(pd,[],'all');
disp(['max pd (degrees):' num2str(mpd)])
pd2 = ensembleGBdist(sqrt2norm(pts),[],1, "squareform", true);
mpd2 = 2*max(pd2,[],'all');

I don't appear to have used this number in the published manuscript. When I talk about the max dimension of a VFZ, it's not in an ensemble sense and comes from a different portion of the code.

Thank you for catching this! This was meant to be a convenience function, but it's not very convenient if it spits out garbage with the default distance metric.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants