Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

xcp ref implementation #2895

Open
wants to merge 2 commits into
base: main
Choose a base branch
from
Open

xcp ref implementation #2895

wants to merge 2 commits into from

Conversation

DhanusML
Copy link
Contributor

Notation

Data is a matrix $X\in\mathbb{R}^{p\times n}$. Each column is a $p$-dimensional vector sampled independently. The matrix $X$ is assumed to be stored in column-major fashion.

xcp

Matrix of cross products is a $p\times p$ matrix whose $(i,j)$th entry is given by
$$C_{ij} = \sum_{k=1}^n(x_{ik}-\mu_i)(x_{jk}-\mu_j).$$
Here $x_{ik}$ is the $i$th component of the $k$th random vector.

The implementation requires this to be computed in batches of $X$, where the raw sum is passed between the batch computations. This can be done using the following:

Let $\mu'$ be the mean of the data until the previous batch and let $\mu$ be the mean including the data from current batch. Let $n'$ be the total number of data points until the previous batch and $S'$ be their sum (i.e., $\mu' = S'/n'$ and $\mu = S/n$).

Then the $(i,j)$ entry of $C$ can be computed using
$$C_{ij}\leftarrow C_{ij}+(\mu_j'-\mu_j)S_i' + (\mu_i'-\mu_i)S_j' + (\mu_i\mu_j-\mu_i'\mu_j')n' + \sum_{k=n'+1}^n(x_{ik}-\mu_i)(x_{jk}-\mu_j).$$

This can be simplified to
$$C_{ij} \leftarrow C_{ij} + \frac{S_i' S_j'}{n'} - \frac{S_i S_j}{n} + \sum_{k=n'+1}^n x_{ik}x_{jk}$$
or
$$C \leftarrow C + \frac{S'S'^t}{n'} - \frac{SS^t}{n} + XX^t$$

The level3 BLAS routine GEMM is used to compute the matrix products.

This routine computes the matrix of cross product
of data stored in column major format, in batches.

For matrix X of dimensions p x n, the i,j th entry
of the cross product matrix is

C_ij = \sum_k (x_ik-\mu_i) (x_jk-\mu_k)

where x_ij is the jth element of the ith row, of the matrix X.

Implementation uses the BLAS routine GEMM.

Signed-off-by: Dhanus M Lal <[email protected]>
double alpha;
if (accWtOld != 0)
{
double * sumOld = daal::services::internal::service_malloc<double, cpu>(nFeatures, sizeof(double));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
double * sumOld = daal::services::internal::service_malloc<double, cpu>(nFeatures, sizeof(double));
double* const sumOld = daal::services::internal::service_malloc<double, cpu>(nFeatures, sizeof(double));

sumOld[i] = sum[i];
}
// S_old S_old^t/accWtOld
alpha = 1.0 / accWtOld;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any checks for overflow?

Comment on lines +223 to +225
transb = 'T';
alpha = 1.0;
beta = 1.0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would recommend not to use the same variables. It's confusing and also can be dangerous if someone will forgot to change them.

Comment on lines +223 to +227
transb = 'T';
alpha = 1.0;
beta = 1.0;
blasInst.xgemm(&transa, &transb, &nFeatures, &nFeatures, &nVectors, &alpha, data, &nFeatures, data, &nFeatures, &beta, crossProduct,
&nFeatures);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
transb = 'T';
alpha = 1.0;
beta = 1.0;
blasInst.xgemm(&transa, &transb, &nFeatures, &nFeatures, &nVectors, &alpha, data, &nFeatures, data, &nFeatures, &beta, crossProduct,
&nFeatures);
{
constexpr char transb = 'T';
constexpr double alpha = 1.0;
constexpr double beta = 1.0;
blasInst.xgemm(&transa, &transb, &nFeatures, &nFeatures, &nVectors, &alpha, data, &nFeatures, data, &nFeatures, &beta, crossProduct,
&nFeatures);
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants