
ci: enable SIMD instructions for x86_64 targets #423

Open: wants to merge 3 commits into master

@Itsusinn (Member)

🤔 This is a ...

  • Performance optimization

@Itsusinn (Member, Author) commented May 21, 2024

https://github.com/Watfaq/clash-rs/releases/tag/v0.1.17

❯ elfx86exts clash-x86_64-unknown-linux-gnu
File format and CPU architecture: Elf, X86_64
MODE64 (call)
CMOV (cmova)
SSE1 (xorps)
SSE2 (movdqa)
BMI (tzcnt)
AES (aeskeygenassist)
AVX (vmovups)
NOVLX (vmovups)
AVX2 (vpbroadcastb)
SSSE3 (pshufb)
PCLMUL (pclmulqdq)
SSE41 (pblendw)
SSE3 (lddqu)
SHA (sha1rnds4)
BMI2 (shlx)
AVX512 (kmovw)
VLX (vpcmpltud)
ADX (adox)
3DNow (femms)
NOT64BITMODE (xchg)
Instruction set extensions used: 3DNow, ADX, AES, AVX, AVX2, AVX512, BMI, BMI2, CMOV, MODE64, NOT64BITMODE, NOVLX, PCLMUL, SHA, SSE1, SSE2, SSE3, SSE41, SSSE3, VLX
CPU Generation: Unknown

It seems SIMD is already enabled by default on x86_64 targets.
But I wonder what will happen if older CPUs run these binaries.
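Having these instructions in the binary does not by itself mean older CPUs will crash. Libraries that ship hand-optimized code commonly probe the CPU at run time and only call an accelerated path when the CPU supports it. Assuming that is where these instructions come from, the pattern looks roughly like the sketch below in plain Rust: one function compiled with `#[target_feature(enable = "avx2")]`, selected via `std::arch::is_x86_feature_detected!`, plus a portable fallback. The function names are hypothetical and not taken from clash-rs or ring.

```rust
/// Portable fallback: on the default x86_64 target this is compiled
/// for the SSE2 baseline only, so it runs on any x86_64 CPU.
pub fn sum_scalar(data: &[u32]) -> u64 {
    data.iter().map(|&x| u64::from(x)).sum()
}

/// Same loop, but this one function is compiled with AVX2 enabled,
/// so the compiler may auto-vectorize it with 256-bit instructions.
///
/// # Safety
/// The caller must ensure the running CPU supports AVX2.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn sum_avx2(data: &[u32]) -> u64 {
    data.iter().map(|&x| u64::from(x)).sum()
}

/// Hypothetical public entry point: takes the AVX2 path only if the
/// running CPU reports AVX2 support, otherwise falls back to the baseline.
pub fn sum_u32(data: &[u32]) -> u64 {
    #[cfg(target_arch = "x86_64")]
    {
        if std::arch::is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was just confirmed at run time.
            return unsafe { sum_avx2(data) };
        }
    }
    sum_scalar(data)
}
```

A binary built this way runs on any x86_64 CPU and only executes AVX2 instructions where they are supported, which avoids shipping a separate binary per CPU.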

Itsusinn closed this May 21, 2024
Itsusinn reopened this May 21, 2024
@Itsusinn (Member, Author)

These instructions might be introduced by ring.

ibigbug previously approved these changes May 21, 2024

@ibigbug (Member) commented May 21, 2024

It looks like we shouldn't target a specific CPU, since that could cause degradation on other platforms? Unless we build a binary for each CPU?

https://github.com/rust-lang/portable-simd/blob/master/beginners-guide.md

@Itsusinn (Member, Author)

> It looks like we shouldn't target a specific CPU, since that could cause degradation on other platforms? Unless we build a binary for each CPU?

I use target-cpu because otherwise about 5 to 8 target features would need to be listed individually. Specifying the target CPU is shorter.

And I don't think it degrades performance on other CPUs. Sandy Bridge is a reasonably modern CPU: it supports SSE1 through SSE4.2 and AVX (plus AES-NI on most models), which covers the default instructions we are using now.
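To check what a given target-cpu setting actually turns on, compared with listing features explicitly via `-C target-feature=+...`, one option is to ask the compiler directly: `rustc --print cfg -C target-cpu=sandybridge` prints the enabled `target_feature` values. The sketch below does the same check from inside a crate using `cfg!`; the function name and the feature list are illustrative choices, not part of clash-rs.

```rust
/// Returns the x86 target features that were enabled when this crate was compiled.
/// `cfg!` is evaluated at compile time, so the result reflects RUSTFLAGS
/// (e.g. `-C target-cpu=...`), not the CPU the program happens to run on.
pub fn compile_time_features() -> Vec<&'static str> {
    let mut enabled = Vec::new();
    // One `cfg!` per feature, because the macro needs a literal feature name.
    if cfg!(target_feature = "sse2") { enabled.push("sse2"); }
    if cfg!(target_feature = "ssse3") { enabled.push("ssse3"); }
    if cfg!(target_feature = "sse4.1") { enabled.push("sse4.1"); }
    if cfg!(target_feature = "sse4.2") { enabled.push("sse4.2"); }
    if cfg!(target_feature = "popcnt") { enabled.push("popcnt"); }
    if cfg!(target_feature = "aes") { enabled.push("aes"); }
    if cfg!(target_feature = "pclmulqdq") { enabled.push("pclmulqdq"); }
    if cfg!(target_feature = "avx") { enabled.push("avx"); }
    if cfg!(target_feature = "avx2") { enabled.push("avx2"); }
    if cfg!(target_feature = "sha") { enabled.push("sha"); }
    enabled
}
```

Building it once with default flags and once with `RUSTFLAGS="-C target-cpu=sandybridge"` and comparing the two lists shows which features the CPU target adds; code compiled with those extra features may crash with an illegal-instruction error on CPUs that lack them.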

@ibigbug (Member) commented May 21, 2024

Okay, makes sense.

@VendettaReborn (Contributor)

It sounds like an easy improvement, but I'd like to share Rob Pike's programming principles.
From my view, the real problem is that we don't even know where the bottleneck is, and we have no measurements, only a theoretical performance boost.
By the way, I'm also thinking about ways to compare clash-rs's performance with clash.meta and sing-box. It would be fun. Maybe we can move this to Discussions.
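One way to get the kind of measurement asked for here is a small timing harness around a CPU-bound hot path, run once on a default build and once on a target-cpu build. The sketch below assumes a hypothetical stand-in workload (`byte_sum`) in place of real clash-rs code paths such as encryption; it measures CPU throughput only and says nothing about network-bound behaviour. For serious numbers, a dedicated benchmarking crate such as criterion would be a better fit.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical helper: runs `f` over `input` `iters` times and returns
/// throughput in MiB/s, for A/B-comparing two builds of the same code.
pub fn throughput_mib_s<F>(input: &[u8], iters: u32, f: F) -> f64
where
    F: Fn(&[u8]) -> u64,
{
    // One warm-up call so first-touch page faults are not counted.
    black_box(f(black_box(input)));

    let start = Instant::now();
    for _ in 0..iters {
        // black_box keeps the optimizer from hoisting or deleting the work.
        black_box(f(black_box(input)));
    }
    // Guard against a zero duration on very fast runs.
    let elapsed = start.elapsed().max(Duration::from_nanos(1));

    let total_bytes = input.len() as f64 * f64::from(iters);
    total_bytes / (1024.0 * 1024.0) / elapsed.as_secs_f64()
}

/// Hypothetical stand-in workload: a byte sum the compiler can auto-vectorize,
/// so its speed may differ between baseline and target-cpu builds.
pub fn byte_sum(data: &[u8]) -> u64 {
    data.iter().map(|&b| u64::from(b)).sum()
}
```

Calling `throughput_mib_s(&buf, 100, byte_sum)` on a buffer of a few MiB in both builds gives two numbers that can actually be compared, instead of a performance boost in theory.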

@Itsusinn (Member, Author)

> From my view, the real problem is that we don't even know where the bottleneck is, and we have no measurements, only a theoretical performance boost.

We do need some performance characterization methods. But I think the performance of clash-rs is limited by users' actual network conditions (CN2, BGP, etc.). The numbers from perf tests on local loopback can never be reached in the real world.

@VendettaReborn (Contributor)

That's true. The network conditions between test machines must be stable enough; in other words, the machines should ideally be on the same LAN.

Itsusinn enabled auto-merge (squash) July 3, 2024 14:06
Itsusinn disabled auto-merge July 3, 2024 14:06
ibigbug dismissed their stale review July 3, 2024 14:43

Need to figure out a perf baseline.

Signed-off-by: iHsin <[email protected]>
3 participants