Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Question about the MATLAB style correlation function in genes.py #27

Open
YCHuang0610 opened this issue Jun 21, 2024 · 2 comments
Open

Comments

@YCHuang0610
Copy link

https://github.com/alegiac95/Imaging-transcriptomics/blob/46adb1df85c37123c77226d023b6d763edae8aca/imaging_transcriptomics/genes.py#L116C1-L125C36

Hi,
I am wondering if there is some thing wrong with the correlation here.

        def correlate(c1, c2):
            """Return the MATLAB style correlation between two vectors."""
            return np.corrcoef(np.hstack((c1, c2)), rowvar=False)[0, 1:]

        _res = pls_regression(gene_exp, imaging_data.reshape(
            imaging_data.shape[0], 1),
                              n_components=self.n_components,
                              n_boot=0, n_perm=0)
        r1 = correlate(_res.get("x_scores"), scan_data.reshape(
            scan_data.shape[0], 1))

In this code: "r1 = correlate(_res.get("x_scores"), scan_data.reshape(scan_data.shape[0], 1))". The _res.get("x_scores") is c1 which is the components from the PLS results, and the scan_data.reshape(scan_data.shape[0], 1)) is c2 which is a imaging data column. And then combined the component columns and the single imaging data column into a combined matrix using np.hstack.

So the last column in this combined matrix here is imaging data and the rest columns are components. And then using np.corrcoef with the rowvar=False we can get a correlation matrix with the element [m, n] represents the correlation coefficient between the column m and column n.

As far as I understand, the r1 represents to the correlation coefficients between the imaging data and different PLS components. However, here in the combined matrix, the imaging data is the last column. And when the code slicing the correlation matrix with the [0, :], it returns the correlation coefficients between the first component (the first column from c1) and the rest of the components as well as the imaging data.

For example, if there are 2 components, the r1 here is [coeffience between component 1 and 2, coeffience between component 1 and imaging data ].

Is that correct?

@alegiac95
Copy link
Owner

Hi, you are correct in how you interpret the results, however in the function i return only the elements from 1, so the correlation between component 1 and 2 is not returned

@YCHuang0610
Copy link
Author

YCHuang0610 commented Jul 4, 2024

Hi, you are correct in how you interpret the results, however in the function i return only the elements from 1, so the correlation between component 1 and 2 is not returned

Thanks for you reply.

However, in the code:

np.corrcoef(np.hstack((c1, c2)), rowvar=False)

which is actually equivalent to:

np.corrcoef(np.hstack((_res.get("x_scores"), scan_data.reshape(scan_data.shape[0], 1))), rowvar=False)

Since scan_data.reshape(scan_data.shape[0], 1) is passed as the last element in line 124:

r1 = correlate(_res.get("x_scores"), scan_data.reshape(scan_data.shape[0], 1))

Let's assume that _res.get("x_scores") has two columns for two components PC_1 and PC_2.
Then the combined matrix of np.hstack((_res.get("x_scores"), scan_data.reshape(scan_data.shape[0], 1))) will be

$$ \begin{bmatrix} PC_1 & PC_2 & Scan \end{bmatrix} $$

So the correlation matrix returned from np.corrcoef(np.hstack((c1, c2)), rowvar=False) will be (to simplify, I use | to implicate the correlation between two elements):

$$ \begin{bmatrix} PC_1 | PC_1 & PC_1 | PC_2 & PC_1 | Scan \\\ PC_2 | PC_1 & PC_2 | PC_2 & PC_2 | Scan \\\ Scan | PC_1 & Scan | PC_2 & Scan | Scan \end{bmatrix} $$

And finally, when the function returns, slice the matrix using [0, 1:] to extract the elements in the first row after the second column.
Which returns:

$$ r_1 = \begin{bmatrix} PC_1 | PC_2 & PC_1 | Scan \end{bmatrix} $$

I am wondering whether the positions of _res.get("x_scores"), scan_data.reshape(scan_data.shape[0], 1) should be reversed to let scan data to be the first element. Like this:

r1 = correlate(scan_data.reshape(scan_data.shape[0], 1), _res.get("x_scores"))

====================================================
Here are some of the example:
191aab0b659f9159fefe246c7a178b3

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants