GSoC 2025 ‐ Shabareesh Shetty
I’m Shabareesh Shetty, an undergraduate at NMAMIT, Nitte, Udupi. I am from Mangalore, Karnataka, India. I started exploring scientific computation because of my educational background in Robotics and Artificial Intelligence. My interest in robot kinematics and machine learning helped me dive deep into scientific computing. Additionally, I like exploring applications of Artificial Intelligence across different sectors.
In linear algebra, BLAS routines are predominantly employed to perform operations on matrices and vectors. BLAS routines are commonly utilized in the development of advanced linear algebra software owing to their efficiency, portability, and extensive accessibility. The major goals associated with the project are:
- Add JavaScript implementations.
- Add C implementations.
- Re-implement BLAS subroutines by porting the existing lapack-netlib Fortran implementation to the current free-form Fortran-95.
- Write Node.js bindings to allow calling BLAS routines in compiled C/Fortran from JavaScript.

I followed the approach I outlined in my proposal but diverged slightly as I understood the requirements of the project more deeply. I started with the Level 2 routines. In each level, real datatypes took priority, followed by complex datatypes. For real datatypes, real double precision came first, followed by real single precision, and similarly for complex double and single precision. I had proposed implementing both precisions simultaneously, but with the right guidance I understood that I should implement one precision first and, after getting approval for it, move on to the next.
During the community bonding period, I began coding by focusing on the BLAS Level 2 routines, which involve matrix-vector operations, as outlined in tracking issue #2039. I initially started with the C implementations of the Level 2 packages because I had gained a grip on C implementations in the pre-GSoC period and because they had a higher priority. In parallel, I worked on the Level 1 complex routines, and that went better than I expected. Implementing Fortran was initially challenging, but after reading the existing code and with some practice it became easier. I also made contributions to the JavaScript implementations, which I had a good grip on, but those were less prioritized since some of them required research, as they involved new kinds of packages, especially banded matrices.
Native Implementation Level 1 Signature:
sswap( N, x, strideX, y, strideY )
Ndarray Implementation Level 1 Signature:
sswap( N, x, strideX, offsetX, y, strideY, offsetY )
This additional offset parameter in the ndarray implementation gives the user the freedom to select their starting index for the operation along with the stride.
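As a minimal sketch, the semantics of the offset and stride parameters can be illustrated with a plain loop. Note this is a hand-rolled illustration, not the actual stdlib implementation, and `swapNdarray` is a hypothetical name:

```javascript
// Hypothetical sketch (NOT the stdlib implementation) of the ndarray
// variant's semantics: iteration starts at `offsetX`/`offsetY` and
// advances by `strideX`/`strideY`.
function swapNdarray( N, x, strideX, offsetX, y, strideY, offsetY ) {
	var tmp;
	var ix;
	var iy;
	var i;
	ix = offsetX;
	iy = offsetY;
	for ( i = 0; i < N; i++ ) {
		tmp = x[ ix ];
		x[ ix ] = y[ iy ];
		y[ iy ] = tmp;
		ix += strideX;
		iy += strideY;
	}
}

// Swap every other element, starting at index 1 in both arrays:
var x = [ 0, 1, 0, 2, 0, 3 ];
var y = [ 9, 7, 9, 8, 9, 9 ];
swapNdarray( 3, x, 2, 1, y, 2, 1 );
console.log( x ); // => [ 0, 7, 0, 8, 0, 9 ]
console.log( y ); // => [ 9, 1, 9, 2, 9, 3 ]
```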
Only two packages remained to be implemented among the Level 1 real routines; I covered those, and they are in the review stage.
In Level 2 routines, we have an additional parameter, order. The existing reference LAPACK implementation is Fortran-based and hence follows a column-major layout by default. However, in JavaScript, we can give the user the freedom to choose whether to pass the matrix in row-major or column-major order. This flexibility is important since matrices are represented as flat arrays in JavaScript, ensuring contiguous memory allocation. The key innovation, however, comes after this. This part was relatively easy for me, as there were similar existing implementations in the repository, but things became challenging when moving to banded matrices.
This is a small example:
```text
A = [ 1, 2, 3 ]
    [ 4, 5, 6 ]          (2x3)

A = [ 1, 2, 3, 4, 5, 6 ] (row-major)
A = [ 1, 4, 2, 5, 3, 6 ] (column-major)
```
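The two layouts differ only in how a logical element ( i, j ) maps to a flat index. A small sketch (the `get` helper is hypothetical, for illustration only):

```javascript
// Reading element ( i, j ) of the 2x3 matrix above from its flattened
// representation in either layout.
var rowMajor = [ 1, 2, 3, 4, 5, 6 ];
var colMajor = [ 1, 4, 2, 5, 3, 6 ];
var M = 2; // rows
var N = 3; // cols

function get( arr, order, i, j ) {
	if ( order === 'row-major' ) {
		// Row i is contiguous; skip i full rows of length N:
		return arr[ (i*N) + j ];
	}
	// Column j is contiguous; skip j full columns of length M:
	return arr[ (j*M) + i ];
}

console.log( get( rowMajor, 'row-major', 1, 2 ) );    // => 6
console.log( get( colMajor, 'column-major', 1, 2 ) ); // => 6
```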
Reference LAPACK Level 2 signature:
sgemv( trans, M, N, alpha, A, lda, x, strideX, beta, y, strideY )
Native Implementation Level 2 signature (with the additional order parameter):
sgemv( order, trans, M, N, alpha, A, lda, x, strideX, beta, y, strideY )
Similar to the ndarray implementation for the Level 1 routine, we have an offset parameter associated with the matrix and the vector. Additionally, there are use cases where we need to perform operations on a specific submatrix within a larger global matrix. To accommodate this, our ndarray implementation includes the following parameters:
- sa1: stride along the first dimension of matrix A.
- sa2: stride along the second dimension of matrix A.

These parameters give users full flexibility to utilize the BLAS implementation as needed, even allowing for negative stride values depending on the use case. At each stage of implementation, the idea was to reduce code duplication and maintain cache optimality.
Ndarray implementation signature:
sgemv( trans, M, N, alpha, A, strideA1, strideA2, offsetA, x, strideX, offsetX, beta, y, strideY, offsetY )
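To make the role of strideA1/strideA2/offsetA concrete, here is a deliberately simplified, no-transpose sketch of a gemv-style loop. This is NOT the stdlib implementation (which handles transposes, orders, and many optimizations); `gemvNdarray` is a hypothetical name:

```javascript
// Hypothetical no-transpose sketch: element ( i, j ) of A lives at
// flat index oa + (i*sa1) + (j*sa2).
function gemvNdarray( M, N, alpha, A, sa1, sa2, oa, x, sx, ox, beta, y, sy, oy ) {
	var sum;
	var ix;
	var iy;
	var i;
	var j;
	iy = oy;
	for ( i = 0; i < M; i++ ) {
		sum = 0.0;
		ix = ox;
		for ( j = 0; j < N; j++ ) {
			sum += A[ oa + (i*sa1) + (j*sa2) ] * x[ ix ];
			ix += sx;
		}
		y[ iy ] = ( alpha*sum ) + ( beta*y[ iy ] );
		iy += sy;
	}
}

// y = 1.0*A*x + 0.0*y for a row-major 2x3 matrix (sa1 = 3, sa2 = 1):
var A = [ 1, 2, 3, 4, 5, 6 ];
var x = [ 1, 1, 1 ];
var y = [ 0, 0 ];
gemvNdarray( 2, 3, 1.0, A, 3, 1, 0, x, 1, 0, 0.0, y, 1, 0 );
console.log( y ); // => [ 6, 15 ]
```

Passing the matrix in column-major order only changes the stride pair ( sa1 = 1, sa2 = M ), which is exactly why the ndarray signature needs no separate order parameter.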
A short example to understand how stride parameters for matrix A operate:
```text
A = [ 999.0, 1.0, 999.0, 2.0, 999.0, 3.0 ],
    [ 999.0, 4.0, 999.0, 5.0, 999.0, 6.0 ],
    [ 999.0, 7.0, 999.0, 8.0, 999.0, 9.0 ]

A = [ 999.0, 1.0, 999.0, 2.0, 999.0, 3.0, 999.0, 4.0, 999.0, 5.0, 999.0, 6.0, 999.0, 7.0, 999.0, 8.0, 999.0, 9.0 ] (row-major)
```
Now let's say our required sub-matrix for operation is:
```text
A = [ 1, 2, 3 ],
    [ 4, 5, 6 ],
    [ 7, 8, 9 ]
```
Hence, the stride parameters along with offset give flexibility here.
sa1 = 6
sa2 = 2
offsetA = 1
With the initial offset, we reach the position of 1 in the flattened array, and using sa1 = 6 (to step down a row) and sa2 = 2 (to step across a column) we perform the required operation seamlessly.
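The walkthrough above can be verified with a few lines of code. This sketch only extracts the submatrix to show the indexing; it is not part of any stdlib package:

```javascript
// Extract the 3x3 submatrix from the flattened row-major array above
// using sa1 = 6, sa2 = 2, and offsetA = 1.
var A = [ 999.0, 1.0, 999.0, 2.0, 999.0, 3.0, 999.0, 4.0, 999.0, 5.0, 999.0, 6.0, 999.0, 7.0, 999.0, 8.0, 999.0, 9.0 ];
var sa1 = 6;     // step down one row of the submatrix
var sa2 = 2;     // step across one column of the submatrix
var offsetA = 1; // position of the first submatrix element
var out = [];
var i;
var j;
for ( i = 0; i < 3; i++ ) {
	for ( j = 0; j < 3; j++ ) {
		out.push( A[ offsetA + (i*sa1) + (j*sa2) ] );
	}
}
console.log( out ); // => [ 1, 2, 3, 4, 5, 6, 7, 8, 9 ]
```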
Once the implementation was ready for a package, the next step was adding test suites. Most existing BLAS test suites were neither comprehensive nor extensive, but there were reference implementations by previous contributors, and those helped me create comprehensive test suites. I ensured 100% test coverage, and, side by side, I worked toward 100% test coverage for existing packages. Given our unique ndarray implementation, it was important to cover all edge cases, particularly validating how the custom strides, offsets, and other parameters interact. Thus, every possible combination of the parameters involved was checked, and we can now claim a robust test suite for our routines. The test coverage report script command also helped me a lot.
Test coverage report command:
$ make test-cov TESTS_FILTER=".*/math/base/special/floorn/.*"
One trick I used to generate test fixtures was writing my own custom C code using CBLAS and passing my required inputs as parameters.
Here is an example of such custom code for zgemv:
```c
/**
 * Example usage of CBLAS zgemv.
 *
 * Performs y := alpha*A*x + beta*y.
 */
#include <stdio.h>
#include <complex.h>
#include <cblas.h>

int main( void ) {
	// Matrix dimensions:
	int m = 2; // rows
	int n = 2; // cols

	// Scalars:
	double complex alpha = 1.0 + 0.0*I;
	double complex beta = 0.0 + 0.0*I;

	// Matrix A (row-major order):
	double complex A[] = {
		1.0 + 1.0*I, 2.0 + 2.0*I,
		3.0 + 3.0*I, 4.0 + 4.0*I
	};

	// Vector x (length n):
	double complex x[] = { 1.0 + 2.0*I, 3.0 + 4.0*I };

	// Vector y (length m):
	double complex y[] = { 0.0 + 0.0*I, 0.0 + 0.0*I };

	// Perform y := alpha*A*x + beta*y (row-major storage, no transpose):
	cblas_zgemv(
		CblasRowMajor, // row-major storage
		CblasNoTrans,  // no transpose
		m, n,          // dimensions of A
		&alpha, A, n,  // alpha, A, lda = n
		x, 1,          // x, incx
		&beta, y, 1    // beta, y, incy
	);

	// Print the result:
	printf( "Result y = [%.2f%+.2fi, %.2f%+.2fi]\n",
		creal( y[0] ), cimag( y[0] ),
		creal( y[1] ), cimag( y[1] ) );

	return 0;
}
```
The same follows for the other Level 2 routines. As mentioned above, the operation might look similar, but the matrix representation varies across packages. There exist sets such as:
- SGEMV, where A is an M-by-N matrix.
- SSYMV, where A is an N-by-N symmetric matrix.
- STPMV, where A is an N-by-N matrix supplied in packed form.
- STBMV, where A is an N-by-N band matrix.
After getting a few of the real single- and double-precision routines over the finish line, I moved on to implementing a Level 2 complex routine: (zher). This was one of the most challenging parts for me, as there were no reference implementations, which was exactly why I wanted to implement these: to create reference implementations for the future. I also implemented (zher2), another complex Level 2 routine.
Ndarray implementation signature:
zher( uplo, N, α, x, strideX, offsetX, A, sa1, sa2, offsetA )
After doing some work here, I moved to the Level 3 real routines. In the beginning it felt quite difficult, but thanks to the reference package dgemm, as well as guidance from my mentors, it became easier over time, and I implemented all Level 3 real routines per tracking issue #2039; they are in the review stages. The challenging part was building test suites, but I figured those out with my custom code as well as mentor guidance.
Ndarray implementation signature:
dsymm( side, uplo, M, N, α, A, strideA1, strideA2, offsetA, B, strideB1, strideB2, offsetB, β, C, strideC1, strideC2, offsetC )
During this period I also implemented some generic packages, which were out of the scope of tracking issue #2039 but are really important for us in the process of building higher-level APIs. I took this up at my mentors' suggestion, and after implementing a few, I understood their importance.
Ndarray implementation signature:
gger.ndarray( M, N, α, x, strideX, offsetX, y, strideY, offsetY, A, strideA1, strideA2, offsetA )
In the meantime, I also made some mistakes, got a roadmap from my mentors, took a step back, changed my implementation techniques, and came back stronger. One of my mistakes was implementing routines in a rush without getting them reviewed; with guidance and understanding, I slowed down and took steps to reduce errors and produce better implementations.
This is what my GSoC workflow looked like. Every package I implemented has a different story behind it, and many benefited from various kinds of support from my mentors. The thing I liked most about this workflow is the shift in my thinking: from finding problems and pondering solutions to actually implementing and testing them.
Merged PRs:
- 7013: feat: add c implementation for blas/base/dgemv
- 6984: feat: add c implementation for blas/base/sgemv
The above two routines perform one of the matrix-vector operations y = α*A*x + β*y or y = α*A^T*x + β*y, where α and β are scalars, x and y are vectors, and A is an M-by-N matrix. The reference LAPACK implementation contains parameters:
dgemv( trans, M, N, alpha, A, lda, x, strideX, beta, y, strideY ) (double-precision)
sgemv( trans, M, N, alpha, A, lda, x, strideX, beta, y, strideY ) (single-precision)
Native C implementation:
c_dgemv( layout, trans, M, N, alpha, *A, LDA, *X, strideX, beta, *Y, strideY )
c_sgemv( layout, trans, M, N, alpha, *A, LDA, *X, strideX, beta, *Y, strideY )
Ndarray C implementation:
c_dgemv_ndarray( trans, M, N, alpha, *A, sa1, sa2, oa, *X, sx, ox, beta, *Y, sy, oy )
c_sgemv_ndarray( trans, M, N, alpha, *A, sa1, sa2, oa, *X, sx, ox, beta, *Y, sy, oy )
- 6566: feat: add c implementation for blas/base/dsyr
- 7127: feat: add c implementation for blas/base/ssyr
The above two routines perform the symmetric rank 1 operation A = α*x*x^T + A, where α is a scalar, x is an N-element vector, and A is an N-by-N symmetric matrix. The reference LAPACK implementation contains parameters:
dsyr( uplo, N, alpha, x, strideX, a, lda ) (double-precision)
ssyr( uplo, N, alpha, x, strideX, a, lda ) (single-precision)
Native C implementation:
c_dsyr( layout, uplo, N, alpha, *X, sx, *A, LDA )
c_ssyr( layout, uplo, N, alpha, *X, sx, *A, LDA )
Ndarray C implementation:
c_dsyr_ndarray( uplo, N, alpha, *X, sx, ox, *A, sa1, sa2, oa )
c_ssyr_ndarray( uplo, N, alpha, *X, sx, ox, *A, sa1, sa2, oa )
The above routine performs the rank 1 operation A = α*x*y^T + A, where α is a scalar, x is an M-element vector, y is an N-element vector, and A is an M-by-N matrix.
Native JavaScript implementation:
gaxpy( N, alpha, x, strideX, y, strideY )
Ndarray JavaScript implementation:
gaxpy.ndarray( N, alpha, x, strideX, offsetX, y, strideY, offsetY )
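For intuition, the gaxpy operation ( y := alpha*x + y ) in its ndarray form reduces to a single strided loop. This is a hypothetical sketch, not the stdlib package, which additionally supports accessor arrays:

```javascript
// Hypothetical sketch of gaxpy's ndarray form: y := alpha*x + y.
function gaxpyNdarray( N, alpha, x, strideX, offsetX, y, strideY, offsetY ) {
	var ix;
	var iy;
	var i;
	ix = offsetX;
	iy = offsetY;
	for ( i = 0; i < N; i++ ) {
		y[ iy ] += alpha * x[ ix ];
		ix += strideX;
		iy += strideY;
	}
	return y;
}

var x = [ 1, 2, 3 ];
var y = [ 10, 10, 10 ];
gaxpyNdarray( 3, 2.0, x, 1, 0, y, 1, 0 );
console.log( y ); // => [ 12, 14, 16 ]
```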
The intention of this implementation was to add accessor support for the generic package, as we do in other generic packages. I wanted to maintain a clean structure for the packages before moving on to a new one.
The above routine finds the index of the first element having the maximum absolute value.
Native JavaScript implementation:
igamax( N, x, strideX )
Ndarray JavaScript implementation:
igamax.ndarray( N, x, strideX, offsetX )
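A minimal sketch of that operation (again a hand-rolled illustration, not the stdlib package): scan the strided elements, tracking the first index whose absolute value is the largest, and return -1 when N is less than 1.

```javascript
// Hypothetical sketch of igamax's ndarray form: index of the first
// element with the maximum absolute value.
function igamaxNdarray( N, x, strideX, offsetX ) {
	var max;
	var idx;
	var ix;
	var v;
	var i;
	if ( N < 1 ) {
		return -1;
	}
	idx = 0;
	ix = offsetX;
	max = Math.abs( x[ ix ] );
	ix += strideX;
	for ( i = 1; i < N; i++ ) {
		v = Math.abs( x[ ix ] );
		if ( v > max ) {
			max = v;
			idx = i;
		}
		ix += strideX;
	}
	return idx;
}

console.log( igamaxNdarray( 4, [ 1.0, -7.0, 3.0, 7.0 ], 1, 0 ) ); // => 1
```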
The above routine scales a double-precision complex floating-point vector by a double-precision complex floating-point constant and adds the result to a double-precision complex floating-point vector. The reference LAPACK implementation contains parameters:
zaxpy( N, alpha, x, strideX, y, strideY ) (double-precision)
Native C implementation:
c_zaxpy( N, alpha, *X, strideX, *Y, strideY )
Ndarray C implementation:
c_zaxpy_ndarray( N, alpha, *X, strideX, offsetX, *Y, strideY, offsetY )
The above routine performs the rank 1 operation A = α*x*y^T + A, where α is a scalar, x is an M-element vector, y is an N-element vector, and A is an M-by-N matrix.
Native JavaScript implementation:
gger( order, M, N, α, x, sx, y, sy, A, lda )
Ndarray JavaScript implementation:
gger.ndarray( M, N, α, x, sx, ox, y, sy, oy, A, sa1, sa2, oa )
The above routine performs one of the matrix-vector operations y = α*A*x + β*y or y = α*A^T*x + β*y, where α and β are scalars, x and y are vectors, and A is an M-by-N matrix.
Native JavaScript implementation:
ggemv( order, trans, M, N, α, A, LDA, x, sx, β, y, sy )
Ndarray JavaScript implementation:
ggemv.ndarray( trans, M, N, α, A, sa1, sa2, oa, x, sx, ox, β, y, sy, oy )
The above routine performs the symmetric rank 1 operation A = α*x*x^T + A, where α is a scalar, x is an N-element vector, and A is an N-by-N symmetric matrix.
Native JavaScript implementation:
gsyr( order, uplo, N, α, x, sx, A, LDA )
Ndarray JavaScript implementation:
gsyr.ndarray( uplo, N, α, x, sx, ox, A, sa1, sa2, oa )
The above routine performs the matrix-matrix operation C = α*op(A)*op(B) + β*C, where op(X) is either op(X) = X or op(X) = X^T, α and β are scalars, and A, B, and C are matrices, with op(A) an M-by-K matrix, op(B) a K-by-N matrix, and C an M-by-N matrix.
Native JavaScript implementation:
ggemm( ord, ta, tb, M, N, K, α, A, lda, B, ldb, β, C, ldc )
Ndarray JavaScript implementation:
ggemm.ndarray( ta, tb, M, N, K, α, A, sa1, sa2, oa, B, sb1, sb2, ob, β, C, sc1, sc2, oc )
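The core of that operation, in its simplest no-transpose form, is a triple loop over the strided matrices. This is a naive hypothetical sketch for intuition only; the stdlib implementation handles transposes and is optimized for cache behavior:

```javascript
// Hypothetical no-transpose sketch of gemm's ndarray form:
// C := alpha*A*B + beta*C, with element ( i, j ) of each matrix at
// offset + (i*stride1) + (j*stride2).
function gemmNdarray( M, N, K, alpha, A, sa1, sa2, oa, B, sb1, sb2, ob, beta, C, sc1, sc2, oc ) {
	var sum;
	var ic;
	var i;
	var j;
	var k;
	for ( i = 0; i < M; i++ ) {
		for ( j = 0; j < N; j++ ) {
			sum = 0.0;
			for ( k = 0; k < K; k++ ) {
				sum += A[ oa + (i*sa1) + (k*sa2) ] * B[ ob + (k*sb1) + (j*sb2) ];
			}
			ic = oc + (i*sc1) + (j*sc2);
			C[ ic ] = ( alpha*sum ) + ( beta*C[ ic ] );
		}
	}
}

// 2x2 row-major example: C = A*B.
var A = [ 1, 2, 3, 4 ];
var B = [ 5, 6, 7, 8 ];
var C = [ 0, 0, 0, 0 ];
gemmNdarray( 2, 2, 2, 1.0, A, 2, 1, 0, B, 2, 1, 0, 0.0, C, 2, 1, 0 );
console.log( C ); // => [ 19, 22, 43, 50 ]
```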
Out of the scope of BLAS, I have made a contribution in another area too.
- 7301: feat: refactor and add native addon for complex/float64/conj
The above package returns the complex conjugate of a double-precision complex floating-point number.
JavaScript implementation:
conj( z )
This contribution adds a native addon, which was missing in this package.
Apart from these implementations, I also carried out refactoring, cleanups, adding test cases, and so on. These were not critical features but were required to maintain a clean BLAS repository.
This is the list of things I did:
- 7124: test: add test cases for blas/base/dgemv
- 7125: test: add test cases for blas/base/sgemv
- 7128: test: add test cases for blas/base/dsyr
- 7131: test: add test cases for blas/base/ssyr
- 7132: test: add test cases for blas/base/dsyr2
- 7133: test: add test cases for blas/base/ssyr2
- 7163: test: add test cases for blas/base/dtrmv
- 7164: test: add test cases for blas/base/strmv
- 7165: test: add test cases for blas/base/dtrsv
- 7166: test: add test cases for blas/base/strsv
- 7168: docs: change package naming and examples for complex/float32/base/sub
- 7169: docs: change package naming and examples for complex/float32/base/add
- 7176: docs: change variable naming for blas/base/scnrm2
- 7180: docs: change examples for complex/float64/base/div
- 7190: test: change structure of fixtures for blas/base/dtrmv
- 7191: docs: change examples for complex/float64/base/mul-add
- 7192: docs: change examples for complex/float64/base/sub
- 7193: docs: change variable naming for blas/base/scasum
- 7196: docs: update examples for complex/float64/base/scale
- 7207: refactor: rename variable in blas/base/saxpy
- 7208: refactor: rename variable in blas/base/sdsdot
- 7209: refactor: rename variable in blas/base/srotg
- 7210: refactor: rename variable in blas/base/sdot
- 7212: refactor: rename variable in blas/base/sasumpw
- 7213: refactor: rename variable in blas/base/scusumkbn
- 7248: refactor: rename variable in blas/base/snansumkbn
- 7249: refactor: rename variable in blas/base/snansumkbn2
- 7250: refactor: rename variable in blas/base/
- 7252: test: change structure of fixtures for blas/base/strmv
- 7253: docs: update examples for complex/float64/base/add
- 7264: docs: update examples for complex/float64/base/identity
- 7265: docs: update examples for complex/float64/base/mul
- 7266: docs: update examples for complex/float64/base/div
- 7267: docs: update examples for complex/float64/base/mul-add
- 7270: docs: update examples for complex/float64/base/neg
- 7271: docs: update examples for complex/float64/base/sub
- 7288: docs: update examples for complex/float32/base/add
- 7289: docs: update examples for complex/float32/base/identity
- 7290: docs: update examples for complex/float32/base/mul
- 7291: docs: update examples for complex/float32/base/neg
- 7292: docs: update examples for complex/float32/base/scale
- 7293: docs: update examples for complex/float32/base/sub
- 7294: docs: update examples for complex/float32/conj
- 7295: docs: update examples for complex/float64/conj
- 7399: docs: update jsdoc for blas/base/csscal
- 7401: fix: use appropriate variable in error message
- 7449: test: update test expected values
- 7536: docs: fix data type
- 7557: test: add appropriate values in test cases
- 7631: docs: use appropriate data type in string interpolation
- 7689: refactor: update include header guards for blas/base/dsyr
- 7691: docs: fix method invocation in example
- 7708: refactor: update include header guards for single precision complex package
- 7709: refactor: update include header guards for double precision complex package
- 7710: refactor: update include header guards for single precision real package
- 7711: refactor: update include header guards for double precision real package
- 7732: bench: update parameters for ndarray benchmarking
- 7829: docs: fix function alias in example
- 7895: test: add user reference matrix
The list above consists of PRs merged after the commencement of the GSoC period. A few routines I worked on were merged before this period, such as that for csscal.
Open PRs:
- 6847: feat: add blas/base/zdotc
- 6862: feat: add blas/base/dtbmv
- 7016: feat: add c implementation for blas/base/dtrmv
- 7086: feat: add c and fortran implementation for blas/base/zdscal
- 7119: feat: add blas/base/zher
- 7157: feat: add c implementation for blas/base/dtrsv
- 7257: feat: add blas/base/zher2
- 7284: feat: add blas/base/grot
- 7359: feat: add blas/base/dtrsm
- 7366: feat: add blas/base/dtrmm
- 7382: feat: add blas/base/dsymm
- 7412: feat: add blas/base/dsyrk
- 7437: feat: add blas/base/dsyr2k
I also have some draft PRs left; they remain drafts because they depend on the approval of other PRs and to make the reviewing process easier.
We have progressed a lot on BLAS. Currently, the Level 1 real packages are almost fully implemented. The Level 1 complex packages are almost fully implemented too; two packages still need some work. Similarly, the Level 2 real packages have implementations. For the Level 2 complex packages, we have made modest progress, with implementations for four packages: zher, zher2, cher, and cher2. For the Level 3 real packages, we have implementations for all packages, but we have not yet made much progress on the Level 3 complex packages; that work is underway, starting with zgemm.
Note: open PRs, draft PRs that are on the finish line, and merged PRs are all counted as implemented above.
Though there has been good progress on BLAS, work remains:
Level 1 single-precision complex packages:
- Implementation of crotg
- C/Fortran implementation of csscal

Level 1 double-precision complex packages:
- Implementation of zrotg

Level 2 single- and double-precision complex packages:
- As I have mentioned earlier, only 2/17 packages are implemented currently.

Level 3 single- and double-precision complex packages:
- All 9 packages remain to be implemented.
Notes:
1. I have not mentioned the WebAssembly implementations, as a bunch of those remain.
2. I have also not mentioned implementations that have blockers, because we need to clear those first.
Each new implementation brought different challenges and lessons, but when I look at my work from my initial days, the lessons I have learnt have changed me a lot.
Some of the major coding lessons that I have learnt:
- Double-checking my code
- Writing comprehensive commit messages
Apart from these, I have learnt many best practices and techniques, as well as a bunch of technical and mathematical knowledge.
There were times when I was stuck on problems for 3-4 days before my mentors helped me out. Some lessons turned into habits: now I spend time reading documentation and codebases and figuring things out myself. This was a transformation in my mindset!
Each and every step in this journey has an amazing story behind it! The lessons I have learnt, the guidance I have received, the difficulties I have faced, the connections I have built: all together they brought me a long way.
I really want to thank Athan for his constant support and guidance through the end of the program. I learnt many lessons from him, and they gave me amazing results not only in coding but also in my day-to-day life.
Thanks also to Aman for his insights and for sharing his experiences with BLAS; he too helped me a lot throughout this journey.
Thanks to all the GSoC 2025 contributors for being joyful and interactive. All of them are very talented, and this motivated me to work hard to keep up with them.
stdlib gave me a lot more than I expected, increased my capabilities, and gave me a broad spectrum of knowledge. I look forward to continuing to be a part of this amazing organization!