Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

block domains instead of prefixes #78

Open
lorello opened this issue Apr 27, 2023 · 4 comments
Open

block domains instead of prefixes #78

lorello opened this issue Apr 27, 2023 · 4 comments

Comments

@lorello
Copy link

lorello commented Apr 27, 2023

I really appreciate this small software, I like to add my 2 cents on configuration of filter.txt file.

I need to ignore URLs like:

Probably an antivirus trying to access the webpage for some security check.

Instead of multiply the lines in the configuration file for each subdomain used by the software house, I'd like to ignore the domain kaspersky-labs.com entirely.

I'm not a developer but I think I could change this line:

if strings.HasPrefix(r.Body.BlockedURI, value) {

and use strings.Contains instad of strings.HasPrefix, if you think is an acceptable change I'll try to arrange a PR for this.

@jacobbednarz
Copy link
Owner

thanks for raising the issue! using strings.Contains was originally considered for simplicity and flexibility however it is quite a bit slower which is important for the amount of throughput this tool can see.

cat >bench_test.go <<EOL
package main

import (
	"strings"
	"testing"
)

func BenchmarkStringContains(b *testing.B) {
	for n := 0; n < b.N; n++ {
		strings.Contains("example", "example")
	}
}

func BenchmarkStringPrefix(b *testing.B) {
	for n := 0; n < b.N; n++ {
		strings.HasPrefix("example", "example")
	}
}

EOL
$ go test -bench=.
goos: darwin
goarch: amd64
pkg: github.com/test/test
cpu: Intel(R) Core(TM) i7-1068NG7 CPU @ 2.30GHz
BenchmarkStringContains-8   	463955420	        2.560 ns/op
BenchmarkStringPrefix-8     	1000000000	        1.144 ns/op
PASS
ok  	github.com/test/test	3.143s

if we were to change the matching semantics, i think we'd need to have a think about how we can efficiently do the fuzzy hostname matches.

@lorello
Copy link
Author

lorello commented May 11, 2023

Thanks @jacobbednarz the performance impact is very high, thinking about this, could we try to match just the domain removing the path and then using strings.HasSuffix instead?

In my filters list I have only domains, I don't know if there is also a use case where you need to manage paths. In such case could we add a specific file to ignore just domains?

Instead of a list like:

https://ssl.google-analytics.com
https://peboki.wukedowoki.com
https://connect.facebook.net
https://data1.bevuak.com
https://data2.bevuak.com
https://data3.bevuak.com
https://www.gstatic.com
https://fonts.gstatic.com
https://gc.kis.v2.scr.kaspersky-labs.com
https://ff.kis.v2.scr.kaspersky-labs.com
https://me.kis.v2.scr.kaspersky-labs.com

I could use:

google-analytics.com
wukedowoki.com
facebook.net
bevuak.com
gstatic.com
scr.kaspersky-labs.com

Instead of 11 lines I need only 6 lines, but I don't know if the raw performance of strings.Suffix and strings.Prefix are the same.

@lorello
Copy link
Author

lorello commented Dec 18, 2023

Hi @jacobbednarz I've used for the last months my fork of your project: master...lorello:go-csp-collector:master
In this fork I've implemented a specific list for domains and I'm satisfied with this solution (the feature reduced a lot the length of my list), let me know if your're interested in a PR, I can try to rebase my project on your latest changes.
If not we can close this issue, thanks

@jacobbednarz
Copy link
Owner

i'm happy to review a PR for it. we can address some of the error handling improvements and documentation in that PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants