
GSoC 2025 ‐ Shabareesh Shetty

Shabareesh Shetty edited this page Aug 30, 2025 · 18 revisions

About me

I’m Shabareesh Shetty, an undergraduate at NMAMIT, Nitte, Udupi, from Mangalore, Karnataka, India. I started exploring scientific computation because of my educational background in Robotics and Artificial Intelligence. My interest in robot kinematics and machine learning led me to dive deep into scientific computing. I also enjoy exploring applications of Artificial Intelligence across different sectors.

Project overview

In linear algebra, BLAS routines are predominantly employed to perform operations on matrices and vectors. They are commonly utilized in the development of advanced linear algebra software owing to their efficiency, portability, and wide availability. The major goals of the project were:

  • Add JavaScript implementations.
  • Add C implementations.
  • Re-implement BLAS subroutines by porting the existing lapack-netlib Fortran implementation to free-form Fortran-95.
  • Write Node.js bindings to allow calling compiled C/Fortran BLAS routines from JavaScript.

I followed the approach outlined in my proposal but diverged slightly as I came to understand the requirements of the project more deeply. The plan was to start with the Level 2 routines. Within each level, the priority was real datatypes first, followed by complex datatypes. For real datatypes, double precision came first, followed by single precision, and similarly for complex double and single precision. I had proposed implementing both precisions simultaneously, but with the right guidance I understood that I should implement one precision first and, after it was approved, move on to the next.

Project recap

During the community bonding period, I began coding by focusing on the BLAS Level 2 routines, which involve matrix-vector operations, as outlined in tracking issue #2039. I started with the C implementations of the Level 2 packages, both because I had gained a grip on C implementations in the pre-GSoC period and because they had a higher priority. In parallel, I worked on the Level 1 complex routines, and that went better than I expected. Implementing Fortran was initially challenging, but after reading the existing code and with some practice it became easier. I also made contributions to the JavaScript implementations, which I had a good grip on, but those were lower priority, as some of them required R&D, since they were new types of packages, especially banded matrices.

Native Implementation Level 1 Signature:

sswap( N, x, strideX, y, strideY )

Ndarray Implementation Level 1 Signature:

sswap( N, x, strideX, offsetX, y, strideY, offsetY )

This additional offset parameter in the ndarray implementation gives the user the freedom to select their starting index for the operation along with the stride.
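As a rough illustration, the ndarray-style loop might look like the following. This is a minimal plain-JavaScript sketch, not the actual stdlib implementation (which operates on typed arrays and supports accessor arrays):

```javascript
// Sketch of an ndarray-style `sswap`: swap N elements of x and y,
// starting at the given offsets and stepping by the given strides.
function sswap( N, x, strideX, offsetX, y, strideY, offsetY ) {
    var tmp;
    var ix;
    var iy;
    var i;
    ix = offsetX;
    iy = offsetY;
    for ( i = 0; i < N; i++ ) {
        tmp = x[ ix ];
        x[ ix ] = y[ iy ];
        y[ iy ] = tmp;
        ix += strideX;
        iy += strideY;
    }
    return y;
}

// Swap every other element, starting at index 1:
var x = [ 1.0, 2.0, 3.0, 4.0 ];
var y = [ 5.0, 6.0, 7.0, 8.0 ];
sswap( 2, x, 2, 1, y, 2, 1 );
// x => [ 1.0, 6.0, 3.0, 8.0 ]
```

The offset moves the starting index, while the stride controls the step between successive elements, including negative steps for reverse iteration.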

Only two packages remained to be implemented among the Level 1 real routines; I covered those, and they are in the review stage.

In Level 2 routines, we have an additional parameter: order. The existing reference LAPACK implementation is Fortran-based and hence follows a column-major layout by default. In JavaScript, however, we can give the user the freedom to choose whether to pass the matrix in row-major or column-major order. This flexibility matters because matrices are represented as flat arrays in JavaScript, ensuring contiguous memory allocation. Implementing this was made easier by similar existing implementations in the repository, but things became challenging when moving to banded matrices.

This is a small example:

A = [ 1, 2, 3 ]
    [ 4, 5, 6 ] (2X3)
A = [ 1, 2, 3, 4, 5, 6 ] (row-major)
A = [ 1, 4, 2, 5, 3, 6 ] (column-major)
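The mapping between the two layouts can be sketched in plain JavaScript (the variable names here are illustrative, not from the codebase):

```javascript
// Flatten a 2x3 matrix in both row-major and column-major order.
var nrows = 2;
var ncols = 3;
var A = [ [ 1, 2, 3 ], [ 4, 5, 6 ] ];

var rowMajor = [];
var colMajor = [];
var i;
var j;
for ( i = 0; i < nrows; i++ ) {
    for ( j = 0; j < ncols; j++ ) {
        rowMajor[ (i*ncols) + j ] = A[ i ][ j ]; // rows laid out contiguously
        colMajor[ (j*nrows) + i ] = A[ i ][ j ]; // columns laid out contiguously
    }
}
// rowMajor => [ 1, 2, 3, 4, 5, 6 ]
// colMajor => [ 1, 4, 2, 5, 3, 6 ]
```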

Native Implementation Level 2 Signature (the reference LAPACK signature, followed by our variant with the additional order parameter):

sgemv( trans, M, N, alpha, A, lda, x, strideX, beta, y, strideY )
sgemv( order, trans, M, N, alpha, A, lda, x, strideX, beta, y, strideY )

Similar to the ndarray implementation for the Level 1 routine, we have an offset parameter associated with the matrix and the vector. Additionally, there are use cases where we need to perform operations on a specific submatrix within a larger global matrix. To accommodate this, our ndarray implementation includes the following parameters:

  • sa1: stride along the first dimension of matrix A.
  • sa2: stride along the second dimension of matrix A.

These parameters give users full flexibility to utilize the BLAS implementation as needed, even allowing negative stride values depending on the use case. At each stage of implementation, the idea was to reduce code duplication and maintain cache optimality.

Ndarray implementation signature:

sgemv( trans, M, N, alpha, A, strideA1, strideA2, offsetA, x, strideX, offsetX, beta, y, strideY, offsetY )

A short example to understand how stride parameters for matrix A operate:

A = [ 999.0, 1.0, 999.0, 2.0, 999.0, 3.0 ]
    [ 999.0, 4.0, 999.0, 5.0, 999.0, 6.0 ]
    [ 999.0, 7.0, 999.0, 8.0, 999.0, 9.0 ] (3x6)

A = [ 999.0, 1.0, 999.0, 2.0, 999.0, 3.0, 999.0, 4.0, 999.0, 5.0, 999.0, 6.0, 999.0, 7.0, 999.0, 8.0, 999.0, 9.0 ] (row-major)

Now let's say our required sub-matrix for operation is:

A = [ 1, 2, 3 ],
    [ 4, 5, 6 ],
    [ 7, 8, 9 ]

Hence, the stride parameters along with offset give flexibility here.

sa1 = 6
sa2 = 2
offsetA = 1

With the initial offset, we reach the position of 1 in the flat array representation, and using sa1 = 6 (to step to the next row) and sa2 = 2 (to step to the next column) we perform the required operation seamlessly.
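In other words, element (i,j) of the logical sub-matrix lives at flat index offsetA + i*sa1 + j*sa2. The helper `at` below is a hypothetical illustration of that formula, not part of the actual implementation:

```javascript
// Locate element (i,j) of a logical sub-matrix within flat storage.
function at( A, sa1, sa2, offsetA, i, j ) {
    return A[ offsetA + (i*sa1) + (j*sa2) ];
}

var A = [ 999.0, 1.0, 999.0, 2.0, 999.0, 3.0, 999.0, 4.0, 999.0, 5.0, 999.0, 6.0, 999.0, 7.0, 999.0, 8.0, 999.0, 9.0 ];

at( A, 6, 2, 1, 0, 0 ); // => 1.0
at( A, 6, 2, 1, 1, 2 ); // => 6.0
at( A, 6, 2, 1, 2, 1 ); // => 8.0
```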

Once the implementation was ready for a package, the next step was adding test suites. However, most existing BLAS test suites were neither comprehensive nor extensive. Fortunately, there were reference implementations from previous contributors, which helped me create comprehensive test suites, and I ensured 100% test coverage. In parallel, I also worked toward 100% test coverage for the existing packages. Given our unique ndarray implementation, covering all edge cases was important, particularly validating how the custom strides, offsets, and other parameters interact. Each possible combination of parameter variations was checked, so we can now claim a robust test suite for our routines. The test coverage report script was a great help here.

Test coverage report command:

$ make test-cov TESTS_FILTER=".*/math/base/special/floorn/.*"

One trick I used to generate test suites was writing custom C code using CBLAS and passing my required inputs as parameters.

This is an example of such custom code for zgemv:

/**
* Example usage of CBLAS zgemv
*
* Performs y := alpha*A*x + beta*y
*/

#include <stdio.h>
#include <complex.h>
#include <cblas.h>

int main() {
    // Matrix dimensions
    int m = 2;  // rows
    int n = 2;  // cols

    // Scalars
    double complex alpha = 1.0 + 0.0*I;
    double complex beta  = 0.0 + 0.0*I;

    // Matrix A (row-major order)
    double complex A[] = {
        1.0 + 1.0*I, 2.0 + 2.0*I,
        3.0 + 3.0*I, 4.0 + 4.0*I
    };

    // Vector x (length n)
    double complex x[] = { 1.0 + 2.0*I, 3.0 + 4.0*I };

    // Vector y (length m)
    double complex y[] = { 0.0 + 0.0*I, 0.0 + 0.0*I };

    // Perform y := alpha*A*x + beta*y
    // Using Row-major, No transpose
    cblas_zgemv(
        CblasRowMajor,     // Row-major storage
        CblasNoTrans,      // No transpose
        m, n,              // Dimensions of A
        &alpha, A, n,      // alpha, A, lda = n
        x, 1,              // x, incx
        &beta, y, 1        // beta, y, incy
    );

    // Print result
    printf("Result y = [%.2f%+.2fi, %.2f%+.2fi]\n",
           creal(y[0]), cimag(y[0]),
           creal(y[1]), cimag(y[1]));

    return 0;
}

The same approach follows for the other Level 2 routines: the operation may look similar, but the matrix representation varies across packages.

There exist sets such as:

  • SGEMV - where A is an M by N matrix.
  • SSYMV - where A is an N by N matrix.
  • STPMV - where A is an N by N matrix supplied in packed form.
  • STBMV - where A is an N by N band matrix.
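To give a flavor of what "packed form" means: packed storage keeps only one triangle of the matrix, column by column, with no gaps. A minimal sketch of the conventional column-major packed upper-triangular indexing used by LAPACK (the helper name `packedUpper` is mine, not from the codebase):

```javascript
// Index of A(i,j) (0-based, i <= j) in a column-major packed
// upper-triangular array: column j contributes j+1 elements,
// so columns 0..j-1 occupy j*(j+1)/2 slots before it.
function packedUpper( i, j ) {
    return i + ( ( j * ( j+1 ) ) / 2 );
}

// 3x3 upper triangle:
// [ 1, 2, 4 ]
// [ -, 3, 5 ]
// [ -, -, 6 ]
// packs to [ 1, 2, 3, 4, 5, 6 ]:
var AP = [ 1, 2, 3, 4, 5, 6 ];
AP[ packedUpper( 0, 1 ) ]; // => 2
AP[ packedUpper( 2, 2 ) ]; // => 6
```

Band storage follows a similar idea, storing only the diagonals within the bandwidth.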

After getting a few of the real single- and double-precision routines over the finish line, I moved on to implementing a Level 2 complex routine: zher. This was one of the most challenging parts for me, as there were no reference implementations, which was exactly why I wanted to implement these: so that I could create reference implementations for the future. I also implemented zher2, another complex Level 2 routine.

Ndarray implementation signature:

zher( uplo, N, α, x, strideX, offsetX, A, sa1, sa2, offsetA )
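The operation itself is A = α*x*x^H + A, with real α, where only the referenced triangle of the Hermitian matrix A is updated. A simplified sketch for the upper-triangular case, using plain [ re, im ] pairs and unit strides (the real package operates on Complex128Array with full stride and offset support):

```javascript
// Sketch of the Hermitian rank-1 update A := alpha*x*x^H + A
// (upper triangle only), with complex numbers as [ re, im ] pairs.
function zher( N, alpha, x, A ) {
    var xi;
    var xj;
    var i;
    var j;
    for ( j = 0; j < N; j++ ) {
        for ( i = 0; i <= j; i++ ) {
            xi = x[ i ];
            xj = x[ j ];
            // A(i,j) += alpha * x(i) * conj(x(j)):
            A[ i ][ j ][ 0 ] += alpha * ( ( xi[0]*xj[0] ) + ( xi[1]*xj[1] ) );
            A[ i ][ j ][ 1 ] += alpha * ( ( xi[1]*xj[0] ) - ( xi[0]*xj[1] ) );
        }
        A[ j ][ j ][ 1 ] = 0.0; // the diagonal of a Hermitian matrix is real
    }
    return A;
}

var A = [ [ [ 0.0, 0.0 ], [ 0.0, 0.0 ] ], [ [ 0.0, 0.0 ], [ 0.0, 0.0 ] ] ];
zher( 2, 1.0, [ [ 1.0, 1.0 ], [ 2.0, 0.0 ] ], A );
// A(0,1) is now [ 2.0, 2.0 ], i.e. (1+i)*conj(2)
```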

After doing some work here, I moved to the Level 3 real routines. In the beginning it felt quite difficult, but thanks to the reference package dgemm as well as guidance from my mentors, it became easier over time, and I implemented all the Level 3 real routines listed in tracking issue #2039; they are in the review stages. The challenging part was building test suites, but I figured those out with my custom code and mentor guidance.

Ndarray implementation signature:

dsymm( side, uplo, M, N, α, A, strideA1, strideA2, offsetA, B, strideB1, strideB2, offsetB, β, C, strideC1, strideC2, offsetC )

During this period, I also implemented some generic packages, which were out of the scope of tracking issue #2039 but are really important to us in the process of building higher-level APIs. I took these on at my mentors' suggestion, and after implementing a few, I understood their importance.

Ndarray implementation signature:

gger.ndarray( M, N, α, x, strideX, offsetX, y, strideY, offsetY, A, strideA1, strideA2, offsetA )

In the meantime, I also made some mistakes, got a roadmap from my mentors, took a step back, changed my implementation techniques, and came back stronger. One of my mistakes was implementing routines in a rush without getting them reviewed; with guidance, I slowed down and took steps to reduce errors and produce better implementations.

This is what my GSoC workflow looked like. Every package I implemented has a different story behind it, and many benefited from various kinds of support from my mentors. What I liked most about this workflow is the shift in my thinking: from finding problems and imagining solutions to actually implementing and testing them.

Completed work

Merged PRs:

The above two routines perform one of the matrix-vector operations y = α*A*x + β*y or y = α*A^T*x + β*y, where α and β are scalars, x and y are vectors, and A is an M by N matrix. The reference LAPACK implementation contains parameters:

dgemv( trans, M, N, alpha, A, lda, x, strideX, beta, y, strideY ) (double-precision)
sgemv( trans, M, N, alpha, A, lda, x, strideX, beta, y, strideY ) (single-precision)

Native C implementation:

c_dgemv( layout, trans, M, N, alpha, *A, LDA, *X, strideX, beta, *Y, strideY )
c_sgemv( layout, trans, M, N, alpha, *A, LDA, *X, strideX, beta, *Y, strideY )

Ndarray C implementation:

c_dgemv_ndarray( trans, M, N, alpha, *A, sa1, sa2, oa, *X, sx, ox, beta, *Y, sy, oy )
c_sgemv_ndarray( trans, M, N, alpha, *A, sa1, sa2, oa, *X, sx, ox, beta, *Y, sy, oy )

The above two routines perform the symmetric rank 1 operation A = α*x*x^T + A where α is a scalar, x is an N element vector, and A is an N by N symmetric matrix. The reference LAPACK implementation contains parameters:

dsyr( uplo, N, alpha, x, strideX, a, lda ) (double-precision)
ssyr( uplo, N, alpha, x, strideX, a, lda )  (single-precision)

Native C implementation:

c_dsyr( layout, uplo, N, alpha, *X, sx, *A, LDA )
c_ssyr( layout, uplo, N, alpha, *X, sx, *A, LDA )

Ndarray C implementation:

c_dsyr_ndarray( uplo, N, alpha, *X, sx, ox, *A, sa1, sa2, oa )
c_ssyr_ndarray( uplo, N, alpha, *X, sx, ox, *A, sa1, sa2, oa )

The above routine performs the rank 1 operation A = α*x*y^T + A, where α is a scalar, x is an M element vector, y is an N element vector, and A is an M by N matrix.

Native JavaScript implementation:

gaxpy( N, alpha, x, strideX, y, strideY )

Ndarray JavaScript implementation:

gaxpy.ndarray( N, alpha, x, strideX, offsetX, y, strideY, offsetY )
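As a rough sketch (assuming plain arrays; the actual package additionally supports accessor arrays), the ndarray loop boils down to:

```javascript
// Sketch of the ndarray-style gaxpy loop: y := alpha*x + y.
function gaxpy( N, alpha, x, strideX, offsetX, y, strideY, offsetY ) {
    var ix;
    var iy;
    var i;
    ix = offsetX;
    iy = offsetY;
    for ( i = 0; i < N; i++ ) {
        y[ iy ] += alpha * x[ ix ];
        ix += strideX;
        iy += strideY;
    }
    return y;
}

var y = gaxpy( 3, 2.0, [ 1.0, 2.0, 3.0 ], 1, 0, [ 1.0, 1.0, 1.0 ], 1, 0 );
// y => [ 3.0, 5.0, 7.0 ]
```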

The intention of this implementation is to add accessor support for the generic package, as we do in other generic packages. I wanted to maintain a clean structure for the packages before moving on to a new one.

The above routine finds the index of the first element having the maximum absolute value.

Native JavaScript implementation:

igamax( N, x, strideX )

Ndarray JavaScript implementation:

igamax.ndarray( N, x, strideX, offsetX )
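A minimal sketch of the underlying loop (plain arrays, no accessor support; not the actual stdlib source):

```javascript
// Sketch of igamax: return the index (relative to the first iterated
// element) of the first element with the maximum absolute value.
function igamax( N, x, strideX, offsetX ) {
    var max;
    var idx;
    var ix;
    var v;
    var i;
    if ( N < 1 ) {
        return -1;
    }
    ix = offsetX;
    max = Math.abs( x[ ix ] );
    idx = 0;
    for ( i = 1; i < N; i++ ) {
        ix += strideX;
        v = Math.abs( x[ ix ] );
        if ( v > max ) { // strict '>' keeps the *first* maximum
            max = v;
            idx = i;
        }
    }
    return idx;
}

igamax( 4, [ 1.0, -7.0, 3.0, 7.0 ], 1, 0 ); // => 1
```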

The above routine scales a double-precision complex floating-point vector by a double-precision complex floating-point constant and adds the result to a double-precision complex floating-point vector. The reference LAPACK implementation contains parameters:

zaxpy( N, alpha, x, strideX, y, strideY ) (double-precision)

Native C implementation:

c_zaxpy( N, alpha, *X, strideX, *Y, strideY )

Ndarray C implementation:

c_zaxpy_ndarray( N, alpha, *X, strideX, offsetX, *Y, strideY, offsetY )

The above routine performs the rank 1 operation A = α*x*y^T + A, where α is a scalar, x is an M element vector, y is an N element vector, and A is an M by N matrix.

Native JavaScript implementation:

gger( order, M, N, α, x, sx, y, sy, A, lda )

Ndarray JavaScript implementation:

gger.ndarray( M, N, α, x, sx, ox, y, sy, oy, A, sa1, sa2, oa )

The above two routines perform one of the matrix-vector operations y = α*A*x + β*y or y = α*A^T*x + β*y, where α and β are scalars, x and y are vectors, and A is an M by N matrix.

Native JavaScript implementation:

ggemv( order, trans, M, N, α, A, LDA, x, sx, β, y, sy )

Ndarray JavaScript implementation:

ggemv.ndarray( trans, M, N, α, A, sa1, sa2, oa, x, sx, ox, β, y, sy, oy )

The above two routines perform the symmetric rank 1 operation A = α*x*x^T + A where α is a scalar, x is an N element vector, and A is an N by N symmetric matrix.

Native JavaScript implementation:

gsyr( order, uplo, N, α, x, sx, A, LDA )

Ndarray JavaScript implementation:

gsyr.ndarray( uplo, N, α, x, sx, ox, A, sa1, sa2, oa )

The above two routines perform the matrix-matrix operation C = α*op(A)*op(B) + β*C where op(X) is either op(X) = X or op(X) = X^T, α and β are scalars, A, B, and C are matrices, with op(A) an M by K matrix, op(B) a K by N matrix, and C an M by N matrix.

Native JavaScript implementation:

ggemm( ord, ta, tb, M, N, K, α, A, lda, B, ldb, β, C, ldc )

Ndarray JavaScript implementation:

ggemm.ndarray( ta, tb, M, N, K, α, A, sa1, sa2, oa, B, sb1, sb2, ob, β, C, sc1, sc2, oc )
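As a sketch of the underlying operation, a naive triple loop for the row-major, no-transpose, unit-stride case might look as follows (the actual implementation handles all layout and transpose combinations and is structured for cache efficiency):

```javascript
// Naive sketch of gemm: C := alpha*A*B + beta*C, row-major,
// no transpose, where A is MxK, B is KxN, and C is MxN.
function gemm( M, N, K, alpha, A, B, beta, C ) {
    var sum;
    var i;
    var j;
    var k;
    for ( i = 0; i < M; i++ ) {
        for ( j = 0; j < N; j++ ) {
            sum = 0.0;
            for ( k = 0; k < K; k++ ) {
                sum += A[ (i*K) + k ] * B[ (k*N) + j ];
            }
            C[ (i*N) + j ] = ( alpha * sum ) + ( beta * C[ (i*N) + j ] );
        }
    }
    return C;
}

// [ 1 2 ]   [ 5 6 ]   [ 19 22 ]
// [ 3 4 ] * [ 7 8 ] = [ 43 50 ]
var C = gemm( 2, 2, 2, 1.0, [ 1, 2, 3, 4 ], [ 5, 6, 7, 8 ], 0.0, [ 0, 0, 0, 0 ] );
// C => [ 19, 22, 43, 50 ]
```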

Out of the scope of BLAS, I also made a contribution to another part of the repository.

  • [#7301: feat: refactor and add native addon for complex/float64/conj](https://github.com/stdlib-js/stdlib/pull/7301)

The above package returns the complex conjugate of a double-precision complex floating-point number.

JavaScript implementation:

conj( z )

This contribution adds the native addon, which was missing in this package.

Apart from these implementations, I also carried out refactoring, cleanups, adding test cases, and so on. These were not critical to implement but were required to maintain a clean BLAS repository.

This is the list of things I did:

The list above consists of PRs that were merged after the commencement of the GSoC period. A few routines that I worked on were merged before this period, such as csscal.

Open PRs:

I also have some draft PRs; these are left as drafts because they depend on the approval of other PRs and to make the review process easier.

Current state

We have made a lot of progress on BLAS. Currently, the Level 1 real packages are almost fully implemented. The Level 1 complex packages are almost fully implemented too; two packages still need some work. The Level 2 real packages likewise have implementations. For the Level 2 complex packages, we have made modest progress, with implementations for four packages: zher, zher2, cher, and cher2. For the Level 3 real packages, we have implementations for all of them; we have not yet made progress on the Level 3 complex packages, but work is underway, starting with zgemm.

Note: Open PRs, Draft PRs that are on the finish line as well as merged PRs are included under the implementation.

What remains

Though there has been good progress on BLAS, work still remains to be done.

Level 1 single precision complex packages:

  • Implementation of crotg
  • C/FORTRAN implementation of csscal

Level 1 double precision complex packages:

  • Implementation of zrotg

Level 2 single and double precision complex packages:

  • As mentioned earlier, only 2 of the 17 packages are currently implemented

Level 3 single and double precision complex packages:

  • All the 9 packages are remaining to be implemented

Notes:

  1. I have not mentioned the WebAssembly implementations, as a bunch of those implementations remain.

  2. I have also not mentioned implementations that have blockers, because we need to clear the blockers first.

Challenges and lessons learned

Each new implementation came with different challenges and lessons, but when I look back at my work from my initial days, the lessons I have learned have changed me a lot.

Some of the major coding lessons that I have learnt:

  • Double-check code before submitting
  • Write comprehensive commit messages

Apart from these, I have learned many best practices and techniques, as well as a good deal of technical and mathematical knowledge.

There were times when I was stuck on a problem for 3-4 days before my mentors helped me out.

Some lessons turned into habits. I now spend time reading documentation and codebases to figure things out.

This was a transformation in my mindset!

Conclusion

Every step in this journey has an amazing story behind it! The lessons I have learned, the guidance I have received, the difficulties I have faced, and the connections I have built have all brought me a long way.

I really want to thank Athan for his constant support and guidance throughout the program. I learned many lessons from him, and they produced amazing results not only in my coding but also in my day-to-day life.

Thanks also to Aman for his insights and for sharing his experience with BLAS; he too helped me a lot throughout this journey.

Thanks to all the GSoC 2025 contributors for being joyful and interactive. All of them are very talented, which motivated me to work hard to keep up with them.

stdlib gave me much more than I expected: it increased my capabilities and gave me a broad spectrum of knowledge. I look forward to continuing to be part of this amazing organization!
