Have some way of doing bulk requests #42

Open
cudeso opened this issue Mar 11, 2025 · 8 comments

@cudeso

cudeso commented Mar 11, 2025

This is not strictly related to PyPDNS, but rather to the CIRCL Passive DNS service.
It would be useful to have some way of doing bulk requests, similar to how Team Cymru handles bulk WHOIS queries.

Use case:
I have a set of roughly 32k IPs and I want to know whether they have been observed by pdns. Querying them individually via PyPDNS takes a very, very long time.
pdnsresult = pypdns.rfc_query(ip)
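
For readers unfamiliar with the Team Cymru approach mentioned above, here is a minimal sketch of its bulk WHOIS format: all IPs are sent over a single TCP connection, wrapped in `begin` / `end` lines, with `verbose` requesting the extended output. The helper names are illustrative only, and nothing here is a CIRCL or PyPDNS interface.

```python
import socket


def build_bulk_request(ips):
    """Build a Team Cymru style bulk payload: begin / verbose / one IP per line / end."""
    lines = ["begin", "verbose", *ips, "end"]
    return ("\n".join(lines) + "\n").encode("ascii")


def bulk_whois(ips, host="whois.cymru.com", port=43, timeout=30):
    """Send every IP over one TCP connection and return the raw text response."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_bulk_request(ips))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closed the connection: response complete
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")
```

The point of the comparison is that one connection carries the whole list, instead of one HTTP round trip per IP.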

@adulau
Member

adulau commented Mar 11, 2025

Did you paginate, via dribble-paginate-count?

@cudeso
Author

cudeso commented Mar 11, 2025

As far as I could check in the docs, dribble-paginate-count isn't part of the rfc_query request. I'll check it again.

Isn't dribble-paginate-count for paginating the result set? In my use case I have about 32k IPs to query, and at a rough guess only about 10% of them will have an answer/match.

@Rafiot
Member

Rafiot commented Mar 11, 2025

Pagination only paginates the result set, so you will still need to do 32K queries (you can parallelize them, but it will still take a while). One way to speed it up is to query only A and AAAA records, for example.

The other issue to keep in mind with the pagination is that the response order is non-deterministic, so even if you paginate, you won't know if you got the most recent entries until you get the complete set.
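
A rough sketch of the parallel approach, assuming the per-IP lookup is passed in as a callable (for example a thin wrapper around `rfc_query`) and that each record can be read like a dict with an `rrtype` key. Both are assumptions; adapt the field access to whatever record type your PyPDNS version returns.

```python
from concurrent.futures import ThreadPoolExecutor


def only_a_aaaa(records):
    """Keep only A and AAAA records (assumes dict-like records with an 'rrtype' key)."""
    return [r for r in records if r.get("rrtype") in ("A", "AAAA")]


def query_all(ips, lookup, max_workers=8):
    """Call lookup(ip) for every IP in a thread pool.

    Returns (results, failed): results maps each IP that had at least one
    record to its records; failed lists the IPs whose lookup raised.
    """
    def safe_lookup(ip):
        try:
            return ip, list(lookup(ip)), None
        except Exception as exc:  # keep going; one bad query should not stop the batch
            return ip, [], exc

    results, failed = {}, []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the input IPs
        for ip, records, error in pool.map(safe_lookup, ips):
            if error is not None:
                failed.append(ip)
            elif records:
                results[ip] = records
    return results, failed
```

Keeping `max_workers` modest seems wise here, since every worker is still an individual request against the service.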

@cudeso
Author

cudeso commented Mar 12, 2025

I'm just doing a 'stupid' pdns = pypdns.rfc_query(ip) query in
https://github.com/cudeso/tools/blob/master/minimedusa/parse_minimedusa.py
Might have to filter it for A/AAAA to get better results.
It takes about 2h to get the minimedusa results parsed. Not fast, but we only parse it once a day so still acceptable.

@cudeso
Author

cudeso commented Mar 12, 2025

So now, if only CIRCL / @adulau would provide a WHOIS service similar to Team Cymru I could limit outbound firewall rules to *circl.lu only ;-)

@adulau
Member

adulau commented Mar 12, 2025

@cudeso to summarize :

  • A bulk interface to query Passive DNS records (my only concern is that it may be very large for some record types).
  • A GeoOpen-included response for the associated IP, providing something similar to the WHOIS output of Team Cymru.

Some ideas:

@adulau adulau self-assigned this Mar 12, 2025
@Rafiot
Member

Rafiot commented Mar 12, 2025

One thing that would also be very useful for large responses is a way to get only the recent entries, like "ignore entries with a last-seen older than 30 days" (for Lookyloo, that would be good enough), or "give me the 20 most recent A entries".
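
Until something like that exists server-side, the same filters can be approximated client-side once the complete result set has been fetched. This is a sketch only: it assumes records are dicts in the Passive DNS common output format, where `time_last` is a Unix timestamp, and the function names are made up for illustration.

```python
import time

SECONDS_PER_DAY = 86400


def seen_within(records, days, now=None):
    """Drop records whose time_last is older than `days` days."""
    now = time.time() if now is None else now
    cutoff = now - days * SECONDS_PER_DAY
    return [r for r in records if r["time_last"] >= cutoff]


def most_recent(records, rrtype, n):
    """Return the n records of the given rrtype with the latest time_last."""
    matching = [r for r in records if r["rrtype"] == rrtype]
    # Sort explicitly: the response order cannot be relied on
    return sorted(matching, key=lambda r: r["time_last"], reverse=True)[:n]
```

This only saves post-processing, not transfer time, which is why a server-side option would be the real win.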

@cudeso
Author

cudeso commented Mar 14, 2025

> @cudeso to summarize :
>
>   • A bulk interface to query Passive DNS records (my only concern is that it may be very large for some record types).

As @Rafiot remarked, having only the most recent ones returned is OK. For bulk querying, limit results to entries seen in the last 30 or 60 days.

>   • A GeoOpen-included response for the associated IP, providing something similar to the WHOIS output of Team Cymru.
>
> Some ideas:

Yes. An extra key for pdns would be great. Then there's only one external resource to use if you want to do these types of queries.


3 participants