forked from flame/libflame
Tmp dev #2
Open
pradeeptrgit wants to merge 1,062 commits into dev from tmp-dev
AMD Internal : [CPUPL-4598] Change-Id: Id7fb339ecf3efa2535cf88807773ca928bdbe41c
Test-suite related files have been reformatted using a customized clang-format tool. A Python script to do the formatting has also been added.
Script usage:
python/python3 fla_format_code.py <path to folder or specific file>
Examples:
python fla_format_code.py test/main/src
python fla_format_code.py test/main/src/test_getrf.c
AMD-Internal: CPUPL-4751
Signed-off-by: tprnaidu <[email protected]>
Change-Id: I46a4e2a441309517bae6d27c014a699572384f76
Rename global_thread_mutex to fla_global_thread_mutex. Make it static, as it is used only in FLA_Context.c. AMD-Internal: CPUPL-4957 Change-Id: I7a9d4273c2c2203ef7ef553a01d1075afa245dbc
details: The netlib CMake file uses the cmake_path command, which is supported from CMake version 3.20.0 onward. AMD-Internal: [CPUPL-4890] Signed-off-by: ksaithar <[email protected]> Change-Id: Ib14c7cbd2ee4ad296c37703d9895558254dfbe94
details: Added extreme value test cases. AMD-Internal: [CPUPL-4768] Signed-off-by: ksaithar <[email protected]> Change-Id: Ia648037ca7a191ceec660ecaaf129d577ab710be
Added a new test API to verify LAPACK GELSS API functionality. Disabled the row space checking test for gelss, gels and gelsd. Signed-off-by: vprasada <[email protected]> Change-Id: I5d0ffd416dcb6d2e0412f57094c4ee383a6a8908
Remove variables in code paths where they are set but not used Signed-off-by: Vibhav Gupta <[email protected]> AMD-Internal: CPUPL-4361 Change-Id: If5b1858a178a63bb4b5ba90eb354a562e6fb56fd
…riable Support added for the environment variable AOCL_ENABLE_INSTRUCTIONS. With this, users can select a specific ISA code path for the optimized functions. If the user chooses a higher-level ISA than the target CPU supports, the best architecture supported on the target CPU is chosen. If the user chooses a lower-level ISA, that ISA is used. Any ISA selection lower than AVX2 defaults to the generic reference code path. Valid values for AOCL_ENABLE_INSTRUCTIONS: SSE2, AVX, AVX2, AVX512 and GENERIC. All values are case-insensitive. AMD-Internal: CPUPL-4611 Change-Id: I780278c6a2ebe12ce0e61310917b9d666c8111f5
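The selection rule described in this commit message can be sketched in a few lines of C. This is only an illustration under stated assumptions: the `isa_level` enum, `parse_isa`, `resolve_isa` and `select_isa_from_env` are hypothetical names, not the library's actual code, and falling back to the CPU's best level when the variable is unset or invalid is an assumption the commit message does not state.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical ISA levels, ordered from lowest to highest. */
typedef enum { ISA_GENERIC = 0, ISA_SSE2, ISA_AVX, ISA_AVX2, ISA_AVX512 } isa_level;

/* Case-insensitive string equality; returns 1 when the strings match. */
static int str_ieq(const char *a, const char *b)
{
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == '\0' && *b == '\0';
}

/* Map a user-supplied name to a level; returns -1 for NULL or unknown names. */
static int parse_isa(const char *s)
{
    static const struct { const char *name; isa_level level; } table[] = {
        {"GENERIC", ISA_GENERIC}, {"SSE2", ISA_SSE2}, {"AVX", ISA_AVX},
        {"AVX2", ISA_AVX2}, {"AVX512", ISA_AVX512}};
    if (s == NULL)
        return -1;
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
        if (str_ieq(s, table[i].name))
            return (int)table[i].level;
    return -1;
}

/* Resolve the code path from the requested name and the best level the CPU supports. */
isa_level resolve_isa(const char *requested, isa_level cpu_best)
{
    int req = parse_isa(requested);
    isa_level chosen;
    if (req < 0)
        chosen = cpu_best;              /* assumption: unset/invalid means "use CPU's best" */
    else if ((isa_level)req > cpu_best)
        chosen = cpu_best;              /* request exceeds hardware: clamp to best supported */
    else
        chosen = (isa_level)req;        /* lower request is honored as given */
    /* Anything below AVX2 falls back to the generic reference path. */
    return chosen < ISA_AVX2 ? ISA_GENERIC : chosen;
}

/* Convenience wrapper that reads the environment variable named in the commit. */
isa_level select_isa_from_env(isa_level cpu_best)
{
    return resolve_isa(getenv("AOCL_ENABLE_INSTRUCTIONS"), cpu_best);
}
```

For instance, `resolve_isa("avx512", ISA_AVX2)` yields `ISA_AVX2` in this sketch, while `resolve_isa("sse2", ISA_AVX512)` yields `ISA_GENERIC`.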
…t inputs of gbtrf and gbtrs APIs details: Added early return and incorrect input value test cases. AMD-Internal: [CPUPL-4981] [CPUPL-4982] Signed-off-by: ksaithar <[email protected]> Change-Id: Ic3f0afcd8312c4a0b1dc37dce339368c8901c6d3
details: Added extreme value test cases. AMD-Internal: [CPUPL-4981] Signed-off-by: ksaithar <[email protected]> Change-Id: I72ec8a07d1f433d8b48a564c0c5172894fe70f4b
AMD-Internal: CPUPL-4751 Signed-off-by: tprnaidu <[email protected]> Change-Id: Ia03d15ea194522614b5108e037e3f04706519cd3
AMD Internal : [CPUPL-5031] Change-Id: I09a42edfb1bd12e3b893a2a1ae1f5ff7f15074ff
Added test cases for SYTRF. Signed-off-by: Venkatesha Ch <[email protected]> AMD-Internal: CPUPL-4309 Change-Id: Ic137b5a1455c41417f05b1c80cc8b6137b4b099a
1. LAPACK code added for the DTRTRI and DTRTI2 APIs. 2. Inlined the gemv, dscal, dswap and dtrmv BLAS APIs for small input sizes. AMD Internal : [CPUPL-4604] Change-Id: I34f0d87dc55b21d2c7384ab7133a89026ded560f
In accordance with the definition of xerbla() in netlib, the return type is updated to void. Signed-off-by: Sridhar Govindaswamy <[email protected]> Change-Id: I2e797473e25dd8008a97f0ab8457a1ed0b00ab90
Optimization of code paths of DGESVD for M >= N cases. Function inlining and operations optimization performed. Minimization of work buffer usage for the optimized paths. AMD Internal: CPUPL-4606 Signed off by: Vasanth R ([email protected]) Change-Id: I68dd865e04d4001bd4baa346149de026ba19525b
Optimize DGELS for small sizes between 8 and 100.
Optimization steps:
- DLANGE API vectorized for the functionality that finds the largest number in a matrix.
- Skip further processing in DLARF if the last column is found to be 0 on entry.
- Vectorized code in DLARF, fusing the DGEMV and DGER operations in the path taken for left application of an elementary reflector. AVX2 in this commit; AVX512 will be done in a follow-up commit.
- Add ctest covering performance tests for DGELS with M in the range 10-40 and N = 10.
A few Eigen netlib tests fail the threshold test with the current optimization. The DEV and DVX netlib tests compare the eigenvalues generated by a DGEEV call where only eigenvalues are requested against a call where both eigenvalues and eigenvectors are requested. But the tests compare the outputs directly by value and not by the norm of their difference. Hence those failures are ignored, as the difference in output between the reference and intrinsic paths is in the 15th decimal digit or higher.
AMD-Internal: CPUPL-4593
Change-Id: I866de586a4f57788704c75b862361efa730c940f
Added AOCL-BLAS version of the dgbtf2 API. Here the algorithm directly calls the compute kernels of idamax(), dscal(), dswap() and dger(). This change is applicable only when the AOCL-BLAS feature is enabled. CPUPL-4599 Signed-off-by: Sridhar Govindaswamy <[email protected]> Change-Id: Iaa377a283c89cdb521aa2c73f04e09ac96427d52
Updated the algorithm of dgbtrs to directly call dger() BLIS kernel. This change is applicable only when AOCL_BLAS feature is enabled. CPUPL-4601 Signed-off-by: Sridhar Govindaswamy <[email protected]> Change-Id: Ie3d98908d16742db71d629bb5e6ca5eab62d081a
With the AOCL-BLAS feature enabled, a drot declaration conflict was observed at build time. Removed blis.h from fla_lapack_avx2_kernels.h to resolve this. Signed-off-by: Sridhar Govindaswamy <[email protected]> Change-Id: Id5bf257bf688cc8838c910dcf82d420a1b22c75f
1. Singularity check added for non-unit matrices to fix the issues. AMD Internal : [CPUPL-5070] Change-Id: Id0c2d9fb565d67afed070718608d4ec360e4534a
VU and VL value updates. Signed-off-by: Venkatesha <[email protected]> Change-Id: Ib06075d6f6a796963b4268f487d8532275c364cc
…GTSV Added incorrect inputs, early return and extreme value test cases. AMD-Internal: [CPUPL-5001], [CPUPL-5002], [CPUPL-5003], [CPUPL-5004] Signed-off-by: sujithhp <[email protected]> Change-Id: I0c75116f2d44ce7b57df0dcccb19e412d409caca
The compiler flag -mavx512dq was missing from the autotools build of AOCL-LAPACK and was only set in the CMake-based build. This is fixed now. AMD-Internal: CPUPL-5086 Change-Id: I83051c78a6396a267e32102ed5b4e63d647ca448
AMD Internal : [CPUPL-5088] Change-Id: I1ada887837abb1689725d59fca382f22c911e6aa
Change-Id: I7f754061a237f387848124c6d34955b630669d0f
To prevent symbol redefinition conflicts between BLIS and libFLAME when AOCL_BLAS feature is enabled, renaming dim_t to fla_dim_t in libFLAME library. Signed-off-by: Sridhar Govindaswamy <[email protected]> Change-Id: I535f10d409ccc0009e2f2b1ef1a1ac7a07f01ea4
Internal BLIS kernels expect signed 64-bit integers in LP64 mode. Accordingly, the necessary variables in the dgbtf2 and dgbtrs APIs have been updated. Signed-off-by: Sridhar Govindaswamy <[email protected]> Change-Id: I2cf6ea3437a3b79430e079d65704b23c6691fe50
Temporarily disabled AOCL-BLAS version of ZGETRF. Signed-off-by: Sridhar Govindaswamy <[email protected]> Change-Id: I746ba35d84023ceca98316c9dc468ec10b4a3eb2
…d clang-format tool.
Updates:
Enhanced scripts/fla_format_code.py to accept one or more file or folder paths at a time as input.
Script usage:
python/python3 scripts/fla_format_code.py \
<specific file/folder or list of multiple files/folders paths>
Examples:
python3 scripts/fla_format_code.py test/main/src
python3 scripts/fla_format_code.py test/main/src src/map
python3 scripts/fla_format_code.py test/main/src/test_getrf.c
python3 scripts/fla_format_code.py test/main/src/test_getrf.c src/map/lapack2flamec/FLA_gesdd.c
AMD-Internal: CPUPL-4751
Signed-off-by: tprnaidu <[email protected]>
Change-Id: I89a3ca48ccb843d1b61ea8d2bc7326b367d21c1f
1. In a division involving two pointer variables that hold the same address, one variable sees the updated value and the other does not, which leads to wrong results.
Copying the numerator into a local variable before the division avoids the issue.
AMD Internal : [CPUPL-5915]
Change-Id: I532dc852061ba184438e7f89da8890aaae591e4f
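The commit above does not name the affected routine, so the following is only a generic C illustration of the aliasing hazard it describes. The functions `two_quotients_aliased` and `two_quotients_safe` are hypothetical: when an output pointer holds the same address as the numerator, the first store overwrites the numerator before the second division reads it, and taking a local copy first avoids that.

```c
/* Hazard: if q1 points to the same double as num, the first store
 * changes the numerator that the second division then reads. */
void two_quotients_aliased(const double *num, double d1, double d2,
                           double *q1, double *q2)
{
    *q1 = *num / d1;
    *q2 = *num / d2; /* reads the overwritten value when q1 == num */
}

/* Fix in the spirit of the commit: copy the numerator into a local
 * variable before any division, so later stores cannot affect it. */
void two_quotients_safe(const double *num, double d1, double d2,
                        double *q1, double *q2)
{
    const double n = *num;
    *q1 = n / d1;
    *q2 = n / d2;
}
```

Calling either function with the same variable passed as both `num` and `q1` shows the difference: only the second variant divides the original numerator by `d2`.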
Added netlib LAPACK 3.12 test suite support on Windows. Signed-off-by: Venkatesha <[email protected]> Change-Id: Ieda921505d570df8b4ca16f1ecac2ac5f344d1c3
Updating DTL log to new macro-based statements. Signed-off-by: Venkatesha <[email protected]> AMD-Internal: [CPUPL-6333] Change-Id: I7b7a4ad386382d57c52b2621f98f022b6a505b87
NAN checks added in fringe kernel cases in DLARFG optimized code. This enables propagation of NAN values to the output norm from inputs. Change-Id: I107ac1921d38dd598b61f794dbbffd9a79b10205
Printing of test results moved to validate functions to have flexibility of printing intermediate results. Only failed cases are added for printing. AMD Internal: CPUPL-4319 Change-Id: I2181d95d46f34e30e0e4447c493f312e7a86f5f2
Benchmarked DGESDD for different thread counts and sizes. Size thresholds were derived from the data to choose the optimal number of threads for different sub-modules/APIs. Affected APIs are DORMQR, DORMLQ and DLABRD. Signed-off-by: Vasanthakumar R <[email protected]> AMD-Internal: CPUPL-5828 Change-Id: Ie87b1174eafc3c181fdda916bbfe469337b40f75
1. Updated dsyevd(), dlaed4() and dlaed6() to store machine parameters in static variables. 2. Optimized for-loop blocks in dlaed4(). Change-Id: I8aa14f34b09727cda1d7eb97053c0e7e6181168a Signed-off-by: Sridhar Govindaswamy <[email protected]>
DGELS was not correctly recognizing singular matrices due to precision errors from the dgeqrf and dlarf APIs. This commit makes the following changes to resolve the issue:
1. Changing the SSE/AVX2/AVX512 fmadd instructions in dgeqrf and dlarf to separate multiply and addition operations.
2. Changes to the vector kernels of dgeqrf and dlarf to improve performance.
3. Further, a change has been made in validate_sygvd to correct OV/UV testing.
4. With the current change, existing netlib-test errors are reduced. But 2 additional errors are introduced for DEV and 12 errors for DVX. These errors only show in netlib-tests with the custom ilaenv parameters defined for these tests. With the default ilaenv values of libflame, all these tests pass.
Change-Id: I75f7448fc731bdeffa64515c4f9bf4231b78f652
Signed-off-by: samahmad [email protected]
AMD-Internal: CPUPL-5869
The libflame_interface.hh file is split into multiple files, each containing a different category of APIs. The split files are named according to the categories. Missing APIs were added to libflame_interface.hh, which also includes the other split .hh files via #include directives. AMD-Internal: CPUPL-6373 Change-Id: Id725721470bcefbe63bc3a165824d6ac2d166e04
For a static library build of AOCL-LAPACK, the header path of AOCL-Utils is sufficient. But recent changes made in the usage of AOCL_ROOT forced setting the AOCL-Utils path as well. Fixed the same. AMD-Internal: CPUPL-6421 Change-Id: I4b5d98dffd6f02ea8b0a5406b279adc3474772da
Generating consolidated libflame_interface.hh file during configure time. AMD-Internal: CPUPL-6373 Change-Id: I95a3421a2d19d1f4ff4fd6bb7728fd6b80086e6f
In cmake file to set dependent libraries path, the variable name for AOCL-Utils was wrongly set in Windows path section. This was causing issue in static library generation. Fixed the same. Also fixed incorrect header path for AOCL-Utils under Windows OS config. AMD-Internal: CPUPL-6424 Change-Id: Id0e4223e63ae6102099455eb6aea111af89897f5
Updated validate function to support new status printing macro, in case of n = 0 Signed-off-by: Venkatesha <[email protected]> AMD-Internal: [CPUPL-6423] Change-Id: I1ca509c3ef2bc9a8202a4d619ac97ce139bc5c27
Fix for [unused-variable] and [used uninitialized] found in windows build Signed-off-by: Venkatesha <[email protected]> Change-Id: Ib028834275adbaf0a1bf4b039525dd321f4bb480
This commit makes the following changes:
1. Change libflame package name to aocl-lapack.
2. Change dependent blis package name to aocl-blas.
3. Use aocl-blas/aocl-blas-mt based on threading
config.
4. aocl-lapack pc file is now generated in
"{CMAKE_INSTALL_PREFIX}/pkgconfig/" dir instead
of "{CMAKE_INSTALL_PREFIX}/share/pkgconfig/"
5. The following error is fixed:
"Error in running lapack tests, when linking to
AOCL-BLAS using pkgconfig"
6. Using Python3 module instead of hardcoded
python executable.
Change-Id: I021d67a1ddbb79b6d8d0b5d39a3b676e17fdeb7a
Signed-off-by: samahmad <[email protected]>
AMD-Internal: CPUPL-6426
The vector kernels of LANGE were not correctly handling NaN values as per the reference code. This commit makes the following changes: 1. For each loaded vector, it checks for NaN values and stores the result in a flag register. After the loop it checks the flag; if the flag is true, it returns NaN. 2. Enable extreme value test cases for the LANGE main test. Change-Id: I1c0c7f9ba1d9840f9ebea9aa010a09bdda085209 Signed-off-by: samahmad <[email protected]> AMD-Internal: CPUPL-6417
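The flag-based NaN handling described above can be illustrated with a scalar sketch. This is not the vectorized kernel from the commit: `max_abs_nan_aware` is a hypothetical helper that finds the largest absolute value in an array, records whether any NaN was seen, and returns NaN after the loop if the flag is set. Without the flag, the `>` comparison is false for NaN, so a NaN input would be silently dropped.

```c
#include <math.h>
#include <stddef.h>

/* Largest absolute value of a[0..n-1]; returns NaN if any element is NaN. */
double max_abs_nan_aware(const double *a, size_t n)
{
    double maxval = 0.0;
    int saw_nan = 0; /* scalar stand-in for the "flag register" */

    for (size_t i = 0; i < n; i++) {
        /* Manual absolute value; a NaN input stays NaN. */
        double v = a[i] < 0.0 ? -a[i] : a[i];
        /* NaN is the only value that compares unequal to itself. */
        saw_nan |= (v != v);
        /* This comparison is false for NaN, so NaN never becomes maxval. */
        if (v > maxval)
            maxval = v;
    }
    /* Check the flag once, after the loop. */
    return saw_nan ? NAN : maxval;
}
```

Checking the flag once after the loop keeps the hot loop free of early exits, which is one plausible reason to structure a vector kernel this way.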
-> Fixed DGELSS failure in lapacke_col_major test Argument passed to compute_matrix_norm was wrong, updated with expected argument. -> Fixed windows ilp64 build warnings Signed-off-by: Venkatesha <[email protected]> Change-Id: I4f51faadb026a80f3e9c506177a98ac65e5cf592
Updated the doxygen file input path based on recent cpp changes Signed-off-by: Venkatesha <[email protected]> Change-Id: I395a085b4f14addcbc4e7f022f7bfcf7e7f3903b
Main test suite is displaying zero error for all GESDD tests. Updated the residual variables to display correct values. AMD-Internal: CPUPL-6473 Signed-off-by: dnikku <[email protected]> Change-Id: I222e0bef09de88cbdd47f7d63910ca3cdf46405a
-> This commit fixes defect of aocl-utils dependency not being
reflected in the pkg-config file.
-> pkg-config file is now generated in
{{INSTALL_DIR}}/lib/pkgconfig
Change-Id: Ic1a60e9a118db1e772ff237541c5c2dad7279293
Signed-off-by: samahmad <[email protected]>
AMD-Internal: CPUPL-6436
HINT paths for AOCL-Utils header needed change. Also, fixed the logic so that header path of AOCL-Utils is sufficient for static builds. AMD-Internal: [CPUPL-6489] Change-Id: I1877380b31694d64905780887b9452f1b6d81a10
Added test_ormqr.c, validate_ormqr.c for validating performance and
accuracy of LAPACK API ORMQR.
NOTE: 1) Modified ORGQR, ORG2R input m, n to valid values
in case of config inputs.
2) Added ORGQR workspace query for getting lwork value instead
of using lwork from geqrf.
3) Added code to display the selected interface for main test
in output including matrix layout for lapacke.
AMD-Internal: CPUPL-6485
Change-Id: Ied015eaeb64df3513bb92c49d9c51ece2cb5a51a
Signed-off-by: dnikku <[email protected]>
Added ASAN flag in C_FLAGS, LD_FLAGS to resolve the issue. AMD-Internal: CPUPL-6475 Change-Id: Idd675ac06c13a8d0b46344811448976b399be559
Fixed an issue with incorrect argument being passed in the wrapper code of ZGETRFNPI for upper case and lower case versions. Also, minor spacing issues in parameters of SPFFRT2 and SPFFRTx functions. AMD-Internal: CPUPL-6540 Change-Id: I325bd216c74c2719dfb37fa66f66734c02f2a3c6
Performance regression observed during benchmarking addressed by re-tuning optimal threads allocation for DGESDD sub-modules. AMD Internal: CPUPL-6487 Change-Id: Ic98436b7e4d0575bff89d9148386bbcbdd730ac4
BLAS_LIBRARY is not set correctly when using pkgconfig to find AOCL-BLAS library. Fixed variable name mismatch while updating BLAS_LIBRARY Change-Id: Ic2d35c337c88d1a3e2d7deb0f834e6ca397d257b Signed-off-by: samahmad <[email protected]> AMD-Internal: CPUPL-6436
-> Updated main test suite to use lapacke.h header file instead of current Test Prototype. -> Created new directory test/main/src/lapacke, where all lapacke invoke functions are defined and declared. -> Updated Makefile and CMakeLists.txt accordingly. Signed-off-by: Venkatesha <[email protected]> AMD-Internal: [CPUPL-6542] Change-Id: I61553a8283db81c903c94075ff86bc8d3c559894
Updated main testsuite to work with case insensitive input character arguments AMD-Internal: CPUPL-6554 Signed-off-by: dnikku <[email protected]> Change-Id: Ifc967e2739ff2915353806b6c5944dc28a5b540f
Rolling back the package name aocl-lapack to flame and the dependency name aocl-blas to blis, so that external applications depending on these package names keep working. Change-Id: I2c2a8388d6994fa16133970044d09739c01079d4 Signed-off-by: samahmad <[email protected]> AMD-Internal: CPUPL-6436
Test suite updates to include lapacke.h resulted in test build failure for make build. Updated Makefile to fix the issue. Signed-off-by: Venkatesha <[email protected]> AMD-Internal: [CPUPL-6583] Change-Id: If2b66fb41616c854be22ae687f7e1afebd4a1845
No description provided.