[GSoC'25]: Add multi-vendor system monitoring and enhanced interface statistics #339
base: develop-abhishek
Conversation
…tem queries MIBs, converted sys MIBs to async
…e polling issue added a max polling timeout
Walkthrough
Introduces an asynchronous SNMP polling stack and removes the prior synchronous poller. Converts numerous MIB query methods to async, adds new Cisco/Juniper process queries, updates requirements, and wires async polling into the systemd poller. Extends the DB schema, ingestion, and GraphQL API to store and expose system CPU/memory stats and enhanced interface counters.
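As a rough illustration of the walkthrough's bounded-concurrency design, here is a minimal, self-contained sketch; poll_device, poll_all, and the device names are illustrative stand-ins rather than the project's actual API:

import asyncio
import random

async def poll_device(hostname):
    # Hypothetical stand-in for the real per-device SNMP work.
    await asyncio.sleep(random.random() / 10)
    return True

async def poll_all(devices, max_concurrent=10):
    # A semaphore caps how many device polls run at once.
    semaphore = asyncio.Semaphore(max_concurrent)

    async def bounded(host):
        async with semaphore:
            return await poll_device(host)

    # return_exceptions=True keeps one failing device from aborting the batch.
    results = await asyncio.gather(
        *(bounded(host) for host in devices), return_exceptions=True
    )
    success = sum(1 for r in results if r is True)
    errors = sum(1 for r in results if isinstance(r, Exception))
    return success, errors

if __name__ == "__main__":
    print(asyncio.run(poll_all([f"switch{i}" for i in range(20)])))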
Sequence Diagram(s)
sequenceDiagram
autonumber
participant Timer as systemd timer
participant Poller as switchmap_poller
participant AsyncPoll as async_poll.devices
participant SNMP as async_snmp_manager.Interact
participant MIB as async_snmp_info.Query
participant API as Central API
Timer->>Poller: trigger loop
Poller->>AsyncPoll: run_devices(max_concurrent)
AsyncPoll->>AsyncPoll: Build device list, semaphores
par for each device
AsyncPoll->>SNMP: Validate credentials / session
SNMP-->>AsyncPoll: Authorized session or fail
alt authorized
AsyncPoll->>MIB: everything()
MIB-->>AsyncPoll: aggregated misc/system/l1/l2/l3
AsyncPoll->>API: POST results
API-->>AsyncPoll: 2xx/4xx/5xx
else unauthorized/fail
AsyncPoll-->>AsyncPoll: mark failure
end
end
AsyncPoll-->>Poller: summary (success/fail counts)
sequenceDiagram
autonumber
participant Ingest as DB Ingest Topology.process()
participant SysStat as Topology.systemstat()
participant Tbl as db.table.systemstat
participant Model as SystemStat Model
Ingest->>SysStat: collect CISCO/JUNIPER process metrics
SysStat->>Tbl: device_exists(idx_device)
alt exists
SysStat->>Tbl: update_row(idx_systemstat, row)
else create
SysStat->>Tbl: insert_row([row])
end
Tbl-->>Ingest: upsert complete
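A hedged sketch of the upsert flow in this second diagram; the in-memory table below is a stand-in for db.table.systemstat, with method names taken from the diagram rather than the real module:

class FakeSystemStatTable:
    # In-memory stand-in for db.table.systemstat (illustrative only).
    def __init__(self):
        self._rows = {}  # idx_device -> stored row
        self._next_idx = 1

    def device_exists(self, idx_device):
        # Return the stored row for this device, or None if absent.
        return self._rows.get(idx_device)

    def update_row(self, idx_systemstat, row):
        self._rows[row["idx_device"]] = {**row, "idx_systemstat": idx_systemstat}

    def insert_row(self, rows):
        for row in rows:
            self._rows[row["idx_device"]] = {**row, "idx_systemstat": self._next_idx}
            self._next_idx += 1

def upsert_systemstat(table, row):
    # Update the device's existing SystemStat row if present, else insert one.
    existing = table.device_exists(row["idx_device"])
    if existing:
        table.update_row(existing["idx_systemstat"], row)
    else:
        table.insert_row([row])

table = FakeSystemStatTable()
upsert_systemstat(table, {"idx_device": 7, "cpu_5min": 42, "mem_used": 512, "mem_free": 256})
upsert_systemstat(table, {"idx_device": 7, "cpu_5min": 55, "mem_used": 600, "mem_free": 168})
print(table.device_exists(7))  # one row for device 7, updated in place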
Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~120 minutes
Possibly related PRs
Pre-merge checks and finishing touches
❌ Failed checks (2 warnings)
✅ Passed checks (3 passed)
Tip: 👮 Agentic pre-merge checks are now available in preview! Pro plan users can now enable pre-merge checks in their settings to enforce checklists before merging PRs.
Please see the documentation for more information. Example:
reviews:
  pre_merge_checks:
    custom_checks:
      - name: "Undocumented Breaking Changes"
        mode: "warning"
        instructions: |
          Pass/fail criteria: All breaking changes to public APIs, CLI flags, environment variables, configuration keys, database schemas, or HTTP/GraphQL endpoints must be documented in the "Breaking Change" section of the PR description and in CHANGELOG.md. Exclude purely internal or private changes (e.g., code not exported from package entry points or explicitly marked as internal).
Our Pull Request Approval Process
We have these basic policies to make the approval process smoother for our volunteer team.
Testing Your Code
Please make sure your code passes all tests. Our test code coverage system will fail if either of these two conditions occur:
The process helps maintain the overall reliability of the code base and is a prerequisite for getting your PR approved. Assigned reviewers regularly review the PR queue and tend to focus on PRs that are passing.
Reviewers
When your PR has been assigned reviewers, contact them to get your code reviewed and approved via:
Reviewing Your Code
Your reviewer(s) will have the following roles:
🎯 Contacting the person who assigned the mentors is not advised unless they make a request. Do not @ the person who did the assignment otherwise.
Actionable comments posted: 13
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (6)
switchmap/poller/snmp/mib/cisco/mib_ciscoietfip.py (1)
71-90: Do not truncate MAC addresses; produce normalized format instead.
mac_value[:12] risks corrupting values if general.octetstr_2_string returns colon‑separated strings (e.g., "00:11:22:33:44:55" becomes "00:11:22:33"). Return the full MAC, optionally normalized.
- for key, mac_value in values.items():
-     final["cInetNetToMediaPhysAddress"][key] = mac_value[:12]
+ for key, mac_value in values.items():
+     # keep full MAC; normalize if needed (strip colons)
+     mac = mac_value.replace(":", "").lower() if ":" in mac_value else mac_value.lower()
+     final["cInetNetToMediaPhysAddress"][key] = mac
switchmap/poller/snmp/mib/generic/mib_if_64.py (1)
73-87: Async migration incomplete — convert remaining accessors to async and await snmp_object calls.
rg output shows multiple occurrences of results = self.snmp_object.swalk(...) in switchmap/poller/snmp/mib/generic/mib_if_64.py inside synchronous accessors — these return coroutines and will break at runtime. Required fixes:
- Make layer1() async and await each _get_data(...) call (or await individual fetchers).
- Make all ifHC*/ifHighSpeed accessors async and await self.snmp_object.swalk(...); update their callers.
- Make _get_data(...) async and await func(); update call sites.
- Mirror concurrency from mib_if.py (async gather + semaphore=10) to limit device load.
switchmap/poller/snmp/mib/cisco/mib_ciscovlaniftablerelationship.py (1)
115-121: Fix VLAN index parsing (nodes[-2] bug).
You're extracting the VLAN ID from the wrong OID node. With normalized=False, the instance index is the last node. Using nodes[-2] will yield 3 for all rows (from the base OID), corrupting results. Apply:
- for oid, ifindex in results.items():
-     nodes = oid.split(".")
-     vlan = int(nodes[-2])
-     data_dict[ifindex] = [vlan]
+ for full_oid, ifindex in results.items():
+     nodes = full_oid.split(".")
+     vlan = int(nodes[-1])
+     data_dict[ifindex] = [vlan]
switchmap/poller/snmp/mib/cisco/mib_ciscovtp.py (1)
400-419: Harden VLAN bitset parsing and trunk lookup; fix VLAN off‑by‑one.
- Use bytes(value) for hexlify.
- Derive bit length from payload (don't assume 1024).
- Avoid KeyError on missing trunk status.
- VLAN IDs start at 1.
- data_dict = defaultdict(dict)
- length_in_bits = 1024
+ from collections import defaultdict
+ data_dict = defaultdict(list)
+ length_in_bits = None
@@
- results = await self.snmp_object.swalk(oid, normalized=True)
+ results = await self.snmp_object.swalk(oid, normalized=True)
  for key, value in results.items():
@@
- vlans_hex = binascii.hexlify(value).decode("utf-8")
- binary_string = bin(int(vlans_hex, base))[2:].zfill(length_in_bits)
+ buf = bytes(value)
+ vlans_hex = binascii.hexlify(buf).decode("utf-8")
+ # Use actual payload size if not preset
+ bitlen = (len(buf) * 8) if not length_in_bits else length_in_bits
+ binary_string = bin(int(vlans_hex, base))[2:].zfill(bitlen)
@@
- if trunkstatus[ifindex] == 1:
-     for svlan, state in enumerate(binary_string):
-         vlan = int(svlan)
+ if trunkstatus.get(ifindex) == 1:
+     for svlan, state in enumerate(binary_string, start=1):
+         vlan = svlan
          if int(state) == 1:
-             if ifindex in data_dict:
-                 data_dict[ifindex].append(vlan)
-             else:
-                 data_dict[ifindex] = [vlan]
+             data_dict[ifindex].append(vlan)
switchmap/poller/snmp/mib/cisco/mib_ciscostack.py (1)
112-117: Fix potential crash: defaultdict(dict) key lookup returns dict, leading to unhashable key.
If a port key is missing in dot1dbaseport, dot1dbaseport[int(key)] yields {} (from defaultdict), and data_dict[ifindex] = value will raise TypeError: unhashable type 'dict'. Use a plain dict in _portifindex and guard with .get() in portduplex. Apply:
@@
- results = await self.snmp_object.swalk(oid, normalized=True)
- for key, value in results.items():
-     # Assign duplex value to ifindex key
-     ifindex = dot1dbaseport[int(key)]
-     data_dict[ifindex] = value
+ results = await self.snmp_object.swalk(oid, normalized=True)
+ for key, value in results.items():
+     # Assign duplex value to ifIndex key; skip if mapping unavailable
+     ifindex = dot1dbaseport.get(int(key))
+     if ifindex is None:
+         continue
+     data_dict[ifindex] = value
@@
- data_dict = defaultdict(dict)
+ data_dict = {}
Also applies to: 131-133
switchmap/poller/snmp/mib/generic/mib_lldp.py (1)
315-321: KeyError risk when mapping baseport→ifIndex
Direct indexing self._baseportifindex[key_index] will raise if the map is incomplete (you noted this in comments). Use .get() and guard.
- else:
-     ifindex = self._baseportifindex[key_index]
+ else:
+     ifindex = self._baseportifindex.get(key_index)
🧹 Nitpick comments (61)
.gitignore (1)
246-249: Anchor ignore to repo root and clarify section header.
Without a leading slash, this pattern ignores any run_local_tests.sh anywhere in the tree. If you only mean the root file, anchor it. Also align the header with existing section naming.
-#temp ignores
-run_local_tests.sh
+# Temporary ignores (developer-specific)
+# If only the root copy should be ignored, anchor the pattern:
+/run_local_tests.sh
If the intent is to ignore developer-local variants across the repo instead, consider a generic pattern like *.local.sh and documenting it near the top "Developer Section".
bin/tools/create_db_tables.py (1)
65-73: Remove commented-out seed block; keep repo clean
Since the dummy OUI insertion is intentionally retired, delete the commented code instead of leaving it in place. Apply this diff within this block:
- # oui.insert_row(
- #     IOui(
- #         oui=None,
- #         organization=None,
- #         enabled=1,
- #     )
- # )
- # Skip dummy OUI insertion - real data will be populated from file
+ # Skip dummy OUI insertion — real data will be populated from file
requirements.txt (1)
13-13: Pin aiohttp (and ideally requests) for reproducible builds.
Unpinned network libs risk breakage at deploy time. Please pin aiohttp (and consider pinning requests alongside it). Example:
-requests
-aiohttp
+requests==2.32.3
+aiohttp==3.9.5
If you prefer range pins, enforce an upper bound (e.g., <4.0).
switchmap/poller/snmp/mib/generic/mib_if_64.py (1)
339-371: Redundant safe parameter branches; both paths are identical.
Both ifhcinoctets and ifhcoutoctets execute the same code in the if safe is False branch and the else branch. Either implement actual failsafe logic or remove the parameter/branch. Example simplification:
- def ifhcinoctets(self, safe=False, oidonly=False):
+ async def ifhcinoctets(self, oidonly=False):
      # ...
-     if safe is False:
-         results = self.snmp_object.swalk(oid, normalized=True)
-     else:
-         results = self.snmp_object.swalk(oid, normalized=True)
+     results = await self.snmp_object.swalk(oid, normalized=True)
Apply a similar change to ifhcoutoctets.
Also applies to: 372-404
switchmap/poller/snmp/base_query.py (1)
47-65: Async supported(): good; add explicit type hints and minor micro‑polish.
Implementation is correct. Consider adding a return type and simplifying the boolean assignment.
- async def supported(self):
+ async def supported(self) -> bool:
      """Return device's support for the MIB."""
-     validity = False
-     if await self.snmp_object.oid_exists(self.test_oid) is True:
-         validity = True
-     return validity
+     return await self.snmp_object.oid_exists(self.test_oid)
switchmap/poller/snmp/mib/cisco/mib_ciscoietfip.py (1)
92-129: Nit: fix comment typos. "deximal" → "decimal".
- # Convert deximal value to hex,
+ # Convert decimal value to hex,
switchmap/poller/snmp/mib/generic/mib_ipv6.py (1)
71-90
: Async conversion LGTM; minor doc/comment nits.
- Docstring header says "CISCO-IETF-IP-MIB" though this is IPv6; consider aligning.
- Fix "deximal" → "decimal".
-"""Class interacts with CISCO-IETF-IP-MIB.""" +"""Class interacts with IPV6-MIB (IPv6 NetToMedia).""" @@ - # Convert deximal value to hex, + # Convert decimal value to hex,Also applies to: 92-138
switchmap/poller/snmp/mib/cisco/mib_ciscovlaniftablerelationship.py (1)
105-106: Use list accumulator for possible multi‑VLAN mapping.
Safer to store a list and append to avoid silent overwrite on duplicates.
- data_dict = defaultdict(dict)
+ from collections import defaultdict
+ data_dict = defaultdict(list)
And adjust the loop:
- data_dict[ifindex] = [vlan]
+ data_dict[ifindex].append(vlan)
switchmap/poller/snmp/mib/juniper/mib_juniperprocess.py (3)
49-52: Vendor/log message mismatch and stray print. Avoid:
- except Exception as e:
-     print(f"Error in Cisco system queries: {e}")
-     log.log2warning(1316, f" Error in Cisco Process MIB: {e}")
+ except Exception as e:
+     log.log2warning(1316, f"Juniper process/memory query error: {e}")
81-84: Coerce pyasn1 values to int before summation.
Guard against non‑native ints from SNMP.
- used_memory = sum(results.values())
+ used_memory = sum(int(v) for v in results.values())
93-95: Same here for free memory.
- free_memory = sum(results.values())
+ free_memory = sum(int(v) for v in results.values())
switchmap/poller/snmp/mib/cisco/mib_ciscovtp.py (1)
131-135: Catching bare Exception.
Prefer narrowing to expected SNMP exceptions or at least log the class name for triage.
- except Exception as e:
-     log.log2warning(
-         1001, f"CISCO-VTP layer1 query failed: {name}: {e}"
-     )
+ except Exception as e:
+     log.log2warning(
+         1001,
+         f"CISCO-VTP layer1 query failed: {name}: {e.__class__.__name__}: {e}",
+     )
switchmap/poller/snmp/mib/generic/mib_entity.py (1)
92-111: Make gather resilient to partial failures.
Returning exceptions currently aborts the whole call; handle them and proceed.
  ) = await asyncio.gather(
      self.entphysicalhardwarerev(),
      self.entphysicalfirmwarerev(),
      self.entphysicalsoftwarerev(),
      self.entphysicalname(),
      self.entphysicalmodelname(),
      self.entphysicalserialnum(),
      self.entphysicalclass(),
      self.entphysicaldescr(),
+     return_exceptions=True,
  )
+
+ # Normalize exceptions to empty dicts
+ def _ok(x): return {} if isinstance(x, Exception) else x
+ hw_rev, fw_rev, sw_rev, name, model, serial, classtype, description = map(
+     _ok, (hw_rev, fw_rev, sw_rev, name, model, serial, classtype, description)
+ )
switchmap/server/db/attributes.py (1)
14-14: Remove unused import. graphql.pyutils.description isn't used.
-from graphql.pyutils import description
switchmap/poller/snmp/mib/generic/mib_ip.py (3)
84-109: Fetch both ARP tables concurrently
Small perf win and simpler structure.
- # Get interface ipNetToMediaTable data
- values = await self.ipnettomediatable()
- for key, value in values.items():
-     final["ipNetToMediaTable"][key] = value
-
- # Get interface ipNetToPhysicalPhysAddress data
- values = await self.ipnettophysicalphysaddress()
- for key, value in values.items():
-     final["ipNetToPhysicalPhysAddress"][key] = value
+ ipv4_task = self.ipnettomediatable()
+ ipv6_task = self.ipnettophysicalphysaddress()
+ v4, v6 = await asyncio.gather(ipv4_task, ipv6_task, return_exceptions=True)
+ if isinstance(v4, Exception):
+     log.log2warning(1007, f"IP-MIB IPv4 ARP walk failed: {v4}")
+     v4 = {}
+ if isinstance(v6, Exception):
+     log.log2warning(1008, f"IP-MIB IPv6 ARP walk failed: {v6}")
+     v6 = {}
+ for k, v in v4.items():
+     final["ipNetToMediaTable"][k] = v
+ for k, v in v6.items():
+     final["ipNetToPhysicalPhysAddress"][k] = v
110-146: Add error handling around swalk for IPv4 ARP
Avoids bubbling exceptions that can abort whole layer3 collection.
- results = await self.snmp_object.swalk(oid, normalized=False)
+ try:
+     results = await self.snmp_object.swalk(oid, normalized=False)
+ except Exception as e:
+     log.log2warning(1009, f"ipNetToMediaTable walk failed: {e}")
+     return data_dict
147-203: Likewise add error handling around the IPv6 ARP walk
- results = await self.snmp_object.swalk(oid, normalized=False)
+ try:
+     results = await self.snmp_object.swalk(oid, normalized=False)
+ except Exception as e:
+     log.log2warning(1010, f"ipNetToPhysicalPhysAddress walk failed: {e}")
+     return data_dict
switchmap/poller/snmp/mib/cisco/mib_ciscovlanmembership.py (1)
70-95: Run vmVlan and vmPortStatus concurrently to reduce poll latency.
These two awaits are independent; gather them for a small but free win and consistency with other async MIBs. Apply:
- # Get interface vmVlan data
- values = await self.vmvlan()
- for key, value in values.items():
+ # Get interface vmVlan and vmPortStatus data concurrently
+ vmvlan_values, vmportstatus_values = await asyncio.gather(
+     self.vmvlan(), self.vmportstatus()
+ )
+ for key, value in vmvlan_values.items():
      final[key]["vmVlan"] = value
- # Get interface vmPortStatus data
- values = await self.vmportstatus()
- for key, value in values.items():
+ for key, value in vmportstatus_values.items():
      final[key]["vmPortStatus"] = value
Add import at top of file:
+import asyncio
switchmap/poller/snmp/mib/cisco/mib_ciscoc2900.py (1)
70-94: Parallelize the two per-port queries.
Both queries are independent; gather them to cut round-trip time. Apply:
@@
- # Get interface c2900PortDuplexStatus data
- values = await self.c2900portduplexstatus()
- for key, value in values.items():
-     final[key]["c2900PortDuplexStatus"] = value
-
- # Get interface c2900PortLinkbeatStatus data
- values = await self.c2900portlinkbeatstatus()
- for key, value in values.items():
-     final[key]["c2900PortLinkbeatStatus"] = value
+ # Get both datasets concurrently
+ duplex_values, linkbeat_values = await asyncio.gather(
+     self.c2900portduplexstatus(), self.c2900portlinkbeatstatus()
+ )
+ for key, value in duplex_values.items():
+     final[key]["c2900PortDuplexStatus"] = value
+ for key, value in linkbeat_values.items():
+     final[key]["c2900PortLinkbeatStatus"] = value
Add import at top of file if missing:
+import asyncio
switchmap/poller/snmp/async_poller.py (2)
85-87: Unreachable return after fatal log.
log.log2die likely raises; the subsequent return _data is dead code. Remove it, or downgrade to a non-fatal log if you intend to return.
- if bool(self.snmp_object) is False:
-     log.log2die(1001, f"No valid SNMP object for {self._hostname} ")
-     return _data
+ if bool(self.snmp_object) is False:
+     log.log2die(1001, f"No valid SNMP object for {self._hostname} ")
44-46: Docstring punctuation nit. Typo: "Returns;" → "Returns:".
- Returns;
+ Returns:
131-159
: Ensure bridge mapping is available when dot1qPvid is called directly.If dot1qpvid() is invoked without prior layer1(), baseportifindex may be None. Load it defensively.
@@ if oidonly is True: return oid - results = await self.snmp_object.swalk(oid, normalized=True) + # Ensure bridge mapping is prepared + await self._get_bridge_data() + + results = await self.snmp_object.swalk(oid, normalized=True) for key, value in results.items(): ifindex = self.baseportifindex.get(int(key)) if ifindex is not None: data_dict[ifindex] = value
160-169: Correct docstring: keys are VLAN IDs, not ifIndex.
dot1qVlanStaticName is indexed by VLAN; update wording.
- """Return dict of Q-BRIDGE-MIB dot1qVlanStaticName per port.
+ """Return dict of Q-BRIDGE-MIB dot1qVlanStaticName per VLAN.
@@
-     data_dict: Dict of dot1qVlanStaticName using ifIndex as key
+     data_dict: Dict of dot1qVlanStaticName using VLAN ID as key
180-183: Safer decoding of OctetStrings.
Avoid bytes(value) on str; decode bytes/bytearray and fall back to str otherwise.
- for key, value in results.items():
-     data_dict[key] = str(bytes(value), encoding="utf-8")
+ for key, value in results.items():
+     name = (
+         value.decode("utf-8", errors="replace")
+         if isinstance(value, (bytes, bytearray))
+         else str(value)
+     )
+     data_dict[key] = name
switchmap/server/db/table/systemstat.py (5)
3-3: Remove unused imports. and_ and null are not used.
-from sqlalchemy import select, update, and_, null
+from sqlalchemy import select, update
10-35: Clarify behavior and clean up comments.
- Function name suggests a boolean, but it returns a row object or False; update the docstring to match behavior or rename the function later.
- Remove #! comments (they trigger Ruff EXE00x "shebang" warnings).
  def idx_exists(idx):
-     """Determine whether primary key exists.
+     """Fetch SystemStat by primary key.
      Args:
          idx: idx_systemstat
      Returns:
-         results: SystemStat object
+         result: SystemStat object or False if not found
      """
@@
-     # Return
+     # Return first match (or False)
      for row in rows:
-         #! i think we are just finding if row exist we are ret bool
          result = _rows.systemstat(row)
-         #! why breaking just after appending result for single row
-         #! are we just checking if there exists a single row or not
          break
      return result
73-78: Avoid set() on possibly unhashable objects; dedupe deterministically.
set(rows) can raise TypeError if objects override eq/hash. Deduplicate by a stable key instead.
- # Remove any duplicates
- rows = list(set(rows))
+ # Remove duplicates by device index (last one wins)
+ _by_device = {}
+ for r in rows:
+     _by_device[r.idx_device] = r
+ rows = list(_by_device.values())
81-88: Guard against missing attributes.
If a caller passes an unexpected object, this will raise. Consider validating required fields.
- for row in rows:
+ for row in rows:
+     if not hasattr(row, "idx_device"):
+         continue
      inserts.append(
          {
              "idx_device": row.idx_device,
              "cpu_5min": row.cpu_5min,
              "mem_used": row.mem_used,
              "mem_free": row.mem_free
          }
      )
94-121: Don't update the foreign key unless intentionally re-linking.
Updating idx_device on an existing SystemStat can silently reassign the record. Omit it unless there's a real migration.
  .values(
      {
-         "idx_device": row.idx_device,
          "cpu_5min": None if row.cpu_5min is None else row.cpu_5min,
          "mem_used": None if row.mem_used is None else row.mem_used,
          "mem_free": None if row.mem_free is None else row.mem_free
      }
  )
switchmap/server/db/ingest/update/device.py (2)
8-8: Remove unused import.
-from sqlalchemy import false
156-165: Replace #! notes with standard comments.
These trigger Ruff EXE00x warnings and aren't actionable in code.
- #! this is also tricky to understand, understand this as well
+ # Tracks completion status for SystemStat updates.
@@
- def systemstat(self):
-     """Provide the value of the 'systemstat' property."""
+ def systemstat(self):
+     """Value of the 'systemstat' property."""
@@
- def systemstat(self,value):
-     """Set the 'systemstat' property."""
+ def systemstat(self, value):
+     """Set the 'systemstat' property."""
switchmap/poller/snmp/mib/cisco/mib_ciscoprocess.py (5)
52-58: Simplify exception handling.
With return_exceptions=True, asyncio.gather won't raise. Drop the outer try/except; log per-metric if needed.
- try:
-     cpu_data, memory_used_data, memory_free_data = await asyncio.gather(
+ cpu_data, memory_used_data, memory_free_data = await asyncio.gather(
      self.cpmcputotal5minrev(),
      self.memorypoolused(),
      self.memorypoolfree(),
      return_exceptions=True
- )
+ )
@@
- except Exception as e:
-     print(f"Error in Cisco system queries: {e}")
-     log.log2warning(1316, f" Error in Cisco Process MIB: {e}")
- return final
72-80: Remove noisy prints; use debug logging if needed.
- results = await self.snmp_object.swalk(oid, normalized=True)
- print(f"results: {results.items()}")
- print(f"results: {results}")
+ results = await self.snmp_object.swalk(oid, normalized=True)
83-110: Tighten exception handling and types for memory used.
- try:
-     results = await self.snmp_object.swalk(oid, normalized=True)
-     used_memory = sum(results.values())
-
-     return used_memory
- except Exception as e:
-     print(f"Error querying Cisco MemoryPoolUsed: {e}")
-     return None
+ try:
+     results = await self.snmp_object.swalk(oid, normalized=True)
+ except Exception:
+     log.log2warning(1316, "Cisco MemoryPoolUsed walk failed")
+     return None
+ try:
+     return int(sum(int(v) for v in results.values()))
+ except Exception:
+     return None
122-141: Remove unused variable and mirror used-branch handling.
- # Initialize key variables
- data_dict = defaultdict(dict)
-
  # Process OID - Enhanced memory pool free (high capacity)
  oid = ".1.3.6.1.4.1.9.9.48.1.1.1.6"
@@
- try:
-     results = await self.snmp_object.swalk(oid, normalized=True)
-     free_memory = sum(results.values())
-
-     return free_memory
-
- except Exception as e:
-     print(f"Error querying Cisco MemoryPoolFree: {e}")
-     return None
+ try:
+     results = await self.snmp_object.swalk(oid, normalized=True)
+ except Exception:
+     log.log2warning(1316, "Cisco MemoryPoolFree walk failed")
+     return None
+ try:
+     return int(sum(int(v) for v in results.values()))
+ except Exception:
+     return None
60-81: Minor: consistent return type and key casting.
int(key) assumes a numeric OID suffix; keep, but consider guarding with try/except to skip bad items.
switchmap/poller/async_poll.py (3)
79-87: Count "skipped" separately to avoid inflating failures.
Skip-file returns False and is counted as failed. Track skipped explicitly for clearer summaries.
- success_count = sum(1 for r in results if r is True)
- error_count = sum(1 for r in results if isinstance(r, Exception))
- failed_count = len(results) - success_count - error_count
+ success_count = sum(1 for r in results if r is True)
+ error_count = sum(1 for r in results if isinstance(r, Exception))
+ skipped_count = sum(1 for r in results if r == "SKIPPED")
+ failed_count = len(results) - success_count - error_count - skipped_count
@@
- f"{success_count} succeeded, {failed_count} failed, "
- f"{error_count} errors"
+ f"{success_count} succeeded, {failed_count} failed, "
+ f"{error_count} errors, {skipped_count} skipped"
And in device(), return "SKIPPED" instead of False when the skip-file is present.
- log.log2debug(1404, log_message)
- return False
+ log.log2debug(1404, log_message)
+ return "SKIPPED"
150-170: Treat all 2xx as success and set a client timeout.
- async with session.post(url, json=data) as res:
-     if res.status == 200:
+ timeout = aiohttp.ClientTimeout(total=30)
+ async with session.post(url, json=data, timeout=timeout) as res:
+     if 200 <= res.status < 300:
          log_message = (
              f"Successfully polled and posted data "
              f"for {hostname}"
          )
          log.log2debug(1407, log_message)
      else:
189-192: Add retry with backoff for transient SNMP/HTTP timeouts (per prior preference).
Implement 2–3 retries on asyncio.TimeoutError / aiohttp.ClientError and SNMP timeouts around initialize_snmp() and query().
Would you like me to submit a follow-up patch adding exponential backoff (e.g., 3 retries at 1s/2s/4s) specifically for timeout errors?
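A minimal sketch of what that backoff could look like, using the 1s/2s/4s schedule mentioned above; with_backoff and the coroutine-factory calling convention (so each attempt awaits a fresh coroutine) are illustrative, not existing project API:

import asyncio

import aiohttp

async def with_backoff(make_coro, retries=3, base_delay=1.0):
    # Retry transient timeout/client errors with exponential backoff.
    for attempt in range(retries):
        try:
            return await make_coro()
        except (asyncio.TimeoutError, aiohttp.ClientError):
            if attempt == retries - 1:
                raise  # out of retries; surface the error
            await asyncio.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s

A call site would pass a factory, e.g. await with_backoff(lambda: session.post(url, json=data)), so every retry issues a fresh request rather than re-awaiting a spent coroutine.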
switchmap/poller/snmp/mib/juniper/mib_junipervlan.py (4)
167-174: Guard baseport→ifIndex mapping.
Use .get to avoid KeyError when bridge data is incomplete.
- for key, value in results.items():
-     ifindex = self.baseportifindex.get(int(key))
-     if ifindex is not None:
-         data_dict[ifindex] = value
+ for key, value in results.items():
+     ifindex = self.baseportifindex.get(int(key))
+     if ifindex is not None:
+         data_dict[ifindex] = value
191-207: Avoid KeyError on missing vlan_map/baseport entries.
Both lookups should be safe; skip entries when mappings are absent.
- for key in sorted(results.keys()):
+ for key in sorted(results.keys()):
      # The key is the full OID. Split this into its component nodes
      nodes = key.split(".")
@@
-     vlan_id = nodes[-2]
-     vlan_tag = self.vlan_map[int(vlan_id)]
+     vlan_id = nodes[-2]
+     vlan_tag = self.vlan_map.get(int(vlan_id))
+     if vlan_tag is None:
+         continue
@@
-     baseport_value = nodes[-1]
-     ifindex = self.baseportifindex[int(baseport_value)]
+     baseport_value = nodes[-1]
+     ifindex = self.baseportifindex.get(int(baseport_value))
+     if ifindex is None:
+         continue
226-233: Robust decode of VLAN names.
Handle non-bytes gracefully.
- for vlan_id, value in results.items():
+ for vlan_id, value in results.items():
      # Get VLAN tag
-     vlan_tag = self.vlan_map[int(vlan_id)]
+     vlan_tag = self.vlan_map.get(int(vlan_id))
+     if vlan_tag is None:
+         continue
@@
-     data_dict[vlan_tag] = str(bytes(value), encoding="utf-8")
+     if isinstance(value, (bytes, bytearray)):
+         data_dict[vlan_tag] = bytes(value).decode("utf-8", errors="ignore")
+     else:
+         data_dict[vlan_tag] = str(value)
247-258: Minor: type-cast to int explicitly.
Ensure values are ints even if the agent returns strings.
- for key, value in results.items():
+ for key, value in results.items():
      # Process OID
-     data_dict[int(key)] = int(value)
+     try:
+         data_dict[int(key)] = int(value)
+     except Exception:
+         continue
switchmap/poller/snmp/mib/cisco/mib_ciscocdp.py (2)
70-100: Parallelize the three CDP walks and harden against failures
Run the three helpers concurrently and log per-call failures. This matches the async pattern used elsewhere and reduces poll latency.
+from switchmap.core import log
+import asyncio
@@
- # Get interface cdpCacheDeviceId data
- values = await self.cdpcachedeviceid()
- for key, value in values.items():
-     final[key]["cdpCacheDeviceId"] = value
-
- # Get interface cdpCachePlatform data
- values = await self.cdpcacheplatform()
- for key, value in values.items():
-     final[key]["cdpCachePlatform"] = value
-
- # Get interface cdpCacheDevicePort data
- values = await self.cdpcachedeviceport()
- if values is not None:
-     for key, value in values.items():
-         final[key]["cdpCacheDevicePort"] = value
+ # Query all in parallel
+ results = await asyncio.gather(
+     self.cdpcachedeviceid(),
+     self.cdpcacheplatform(),
+     self.cdpcachedeviceport(),
+     return_exceptions=True,
+ )
+ method_names = [
+     "cdpCacheDeviceId",
+     "cdpCachePlatform",
+     "cdpCacheDevicePort",
+ ]
+ for name, values in zip(method_names, results):
+     if isinstance(values, Exception):
+         log.log2warning(1301, f"CISCO-CDP layer1 query failed: {name}: {values}")
+         continue
+     if values:
+         for key, value in values.items():
+             final[key][name] = value
123-129: Use direct bytes decode; avoid nested str(bytes(...)) conversions
pysnmp returns bytes; decoding directly is clearer and avoids surprises if value is already bytes.
- data_dict[ifindex] = str(bytes(value), encoding="utf-8")
+ data_dict[ifindex] = value.decode("utf-8", errors="ignore")
@@
- data_dict[ifindex] = str(bytes(value), encoding="utf-8")
+ data_dict[ifindex] = value.decode("utf-8", errors="ignore")
@@
- data_dict[ifindex] = str(bytes(value), encoding="utf-8")
+ data_dict[ifindex] = value.decode("utf-8", errors="ignore")
Also applies to: 152-158, 181-187
switchmap/poller/snmp/mib/generic/mib_lldp.py (1)
138-171: Normalize string cleaning for lldpRemSysName (match other getters)
Use general.cleanstring for consistency with lldpRemSysDesc/PortDesc.
- data_dict[ifindex] = str(bytes(value), encoding="utf-8")
+ data_dict[ifindex] = general.cleanstring(
+     value.decode("utf-8", errors="ignore")
+ )
204-228
: SystemStat model: add engine args for consistency and review FK default
- Consider adding
__table_args__ = {"mysql_engine": "InnoDB"}
for parity with other tables.idx_device
default/server_default of 1 risks orphaning stats on the wrong device if not explicitly set.-class SystemStat(BASE): +class SystemStat(BASE): @@ - __tablename__ = "smap_systemstat" + __tablename__ = "smap_systemstat" + __table_args__ = {"mysql_engine": "InnoDB"} @@ - idx_device = Column( + idx_device = Column( ForeignKey(Device.idx_device, ondelete="CASCADE"), - nullable=False, - index=True, - default=1, - server_default=text("1") + nullable=False, + index=True )switchmap/poller/snmp/async_snmp_info.py (4)
45-46: Top-level container: prefer defaultdict to match downstream assumptions
Past code relied on defaultdict at the top level for safe nested writes.
- data = {}
+ data = defaultdict(lambda: defaultdict(dict))
348-356: Use get_running_loop in coroutines (modern asyncio best practice)
asyncio.get_event_loop() is deprecated in coroutines on newer Python.
- loop = asyncio.get_event_loop()
+ loop = asyncio.get_running_loop()
Apply similarly in the _add_layer1/_layer2/_layer3 helpers.
318-336: _add_data doesn't need to be async
Make it a regular function and drop unnecessary awaits at call sites.
-async def _add_data(source, target):
+def _add_data(source, target):
@@
      return target
And update callers (do this in _add_layer1/_layer2/_layer3):
- data = await _add_data(result, data)
+ data = _add_data(result, data)
372-375: Exception handling style nits
- Narrow exceptions where feasible or at least log full context (the MIB name is already logged).
- Ruff TRY300: move the final return data into an else to emphasize the normal path.
Also applies to: 405-407, 437-439, 469-471
switchmap/poller/snmp/async_snmp_manager.py (4)
216-219: Avoid blind Exception in reachability check.
Catching bare Exception can hide real faults (e.g., coding errors). Limit to expected network/SNMP errors and log. Apply this diff:
- except Exception:
-     # Not Contactable
-     contactable = False
+ except (PySnmpError, asyncio.TimeoutError, OSError) as e:
+     # Not contactable due to SNMP/transport error
+     log.log2debug(1302, f"Reachability check failed: {e}")
+     contactable = False
324-329: Narrow exception scope in _oid_exists_get.
Same concern: a broadened catch masks defects; prefer expected failures.
- except Exception as e:
+ except (PySnmpError, asyncio.TimeoutError, OSError) as e:
      log.log2warning(
          1305, f"OID existence check failed for {oid_to_get}: {e}"
      )
      return False
354-359: Narrow exception scope in _oid_exists_walk.
Mirror the get-path handling.
- except Exception as e:
+ except (PySnmpError, asyncio.TimeoutError, OSError) as e:
      log.log2warning(
          1306, f"Walk existence check failed for {oid_to_get}: {e}"
      )
      return False
836-846: Extract BULK walk limits into named constants.
Hardcoded 50 iterations, 25 reps, and a 10k safety cap are fine defaults; promote them to module-level constants for clarity and easier tuning.
+# Walk tuning constants
+_BULK_MAX_ITERATIONS = 50
+_BULK_MAX_REPETITIONS = 25
+_WALK_RESULT_CAP = 10_000
...
- max_iterations = 50
+ max_iterations = _BULK_MAX_ITERATIONS
...
- 25,
+ _BULK_MAX_REPETITIONS,
...
- if len(results) > 10000:
+ if len(results) > _WALK_RESULT_CAP:
Also applies to: 870-876, 924-933
switchmap/poller/snmp/mib/generic/mib_bridge.py (2)
347-351: Use bytes.hex() for MACs to avoid Unicode decode errors.
general.octetstr_2_string() decodes as UTF‑8 first; raw MAC bytes aren't guaranteed UTF‑8 and may raise UnicodeDecodeError.
- data_dict[new_key] = general.octetstr_2_string(mac_value)
+ # mac_value is bytes; convert to lowercase hex safely
+ data_dict[new_key] = bytes(mac_value).hex()
If you prefer a central fix, update switchmap/core/general.py:octetstr_2_string to return bytes_string.hex(). I can open a follow-up PR if you want.
440-452: Remove unused helper.
_snmp_octetstr_2_string is unused after the async refactor.
-def _snmp_octetstr_2_string(binary_value):
-    """Convert SNMP OCTETSTR to string.
-    ...
-    """
-    # Convert and return
-    result = "".join(["%0.2x" % ord(_) for _ in binary_value.decode("utf-8")])
-    return result.lower()
switchmap/poller/snmp/mib/generic/mib_if.py (4)
115-136: Add egress errors/discards to meet "enhanced interface statistics."
Issue #340 asks for error and discard stats; adding ifOutErrors and ifOutDiscards completes the set.
  queries = [
@@
      (self.ifOutMulticastPkts, "ifOutMulticastPkts"),
      (self.ifLastChange, "ifLastChange"),
      (self.ifInUcastPkts,"ifInUcastPkts"),
      (self.ifOutUcastPkts,"ifOutUcastPkts"),
      (self.ifInErrors, "ifInErrors"),
-     (self.ifInDiscards, "ifInDiscards")
+     (self.ifInDiscards, "ifInDiscards"),
+     (self.ifOutErrors, "ifOutErrors"),
+     (self.ifOutDiscards, "ifOutDiscards"),
  ]
Add these methods near the bottom (mirroring existing patterns):
+ async def ifOutErrors(self, oidonly=False):
+     data_dict = defaultdict(dict)
+     oid = ".1.3.6.1.2.1.2.2.1.20"
+     if oidonly is True:
+         return oid
+     results = await self.snmp_object.swalk(oid, normalized=True)
+     for k, v in results.items():
+         data_dict[int(k)] = v
+     return data_dict
+
+ async def ifOutDiscards(self, oidonly=False):
+     data_dict = defaultdict(dict)
+     oid = ".1.3.6.1.2.1.2.2.1.19"
+     if oidonly is True:
+         return oid
+     results = await self.snmp_object.swalk(oid, normalized=True)
+     for k, v in results.items():
+         data_dict[int(k)] = v
+     return data_dict
185-205: Unused 'safe' param in octet methods.
Both ifInOctets/ifOutOctets ignore safe. Either remove the param or route safe=True to an alternate path; as-is it's dead API surface.
Do any callers pass safe= today? If not, I recommend dropping the arg to avoid confusion.
Also applies to: 216-247
273-276: Decode bytes defensively.
For OctetString fields, use .decode(..., errors="replace") to avoid surprises.
- data_dict[int(key)] = str(bytes(value), encoding="utf-8")
+ data_dict[int(key)] = (
+     value.decode("utf-8", errors="replace")
+     if isinstance(value, (bytes, bytearray))
+     else str(value)
+ )
Apply similarly in ifAlias (lines 418-421) and ifName (lines 447-450).
486-509: Prefer bytes.hex() for MAC addresses.
Hex-encoding bytes avoids UTF-8 decode assumptions and is faster/simpler.
- data_dict[int(key)] = general.octetstr_2_string(value)
+ data_dict[int(key)] = bytes(value).hex()
from switchmap.core import general
from switchmap.poller.configuration import ConfigPoller
from switchmap.poller import poll
from switchmap.poller import async_poll
🧩 Analysis chain
Ensure lockfile removal and error logging around async poll run
If async_poll.run_devices raises, the lockfile persists and future runs are blocked. Wrap the call with try/finally and log exceptions.
@@
- # Post data to the remote server
+ # Post data to the remote server
while True:
@@
- # Create lockfile
- open(self.lockfile, "a").close()
-
- # Poll after sleeping
- async_poll.run_devices(max_concurrent_devices=max_concurrent)
-
- # Delete lockfile
- os.remove(self.lockfile)
+ # Create lockfile
+ open(self.lockfile, "a").close()
+ try:
+ # Poll after sleeping
+ async_poll.run_devices(max_concurrent_devices=max_concurrent)
+ except Exception as e:
+ log.log2warning(1126, f"Polling run failed: {e}")
+ finally:
+ # Always remove lockfile
+ try:
+ os.remove(self.lockfile)
+ except FileNotFoundError:
+ pass
Also confirm run_devices is sync or async:
And confirm SNMP concurrency semaphore remains 10 per prior guidance:
Also applies to: 77-77, 92-92
🏁 Script executed:
#!/bin/bash
rg -nP '^(async\s+)?def\s+run_devices\(' switchmap/poller/async_poll.py -n -C2
Length of output: 114
🏁 Script executed:
#!/bin/bash
rg -nP 'asyncio\.Semaphore\(\s*10\s*\)' switchmap/poller/snmp/async_snmp_manager.py -n
Length of output: 51
Ensure lockfile removal and exception logging around async_poll.run_devices
Wrap the run_devices call in try/except/finally so the lockfile is always removed and failures are logged. File: bin/systemd/switchmap_poller (lines 35, 77, 92).
Confirmed: run_devices is synchronous (def run_devices(max_concurrent_devices=None) in switchmap/poller/async_poll.py). Confirmed: SNMP semaphore is asyncio.Semaphore(10) in switchmap/poller/snmp/async_snmp_manager.py:156.
Committable suggestion skipped: line range outside the PR's diff.
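For context, the sync-to-async bridge that makes this possible has roughly the following shape (illustrative; the real run_devices body differs): a plain function drives the async poller with asyncio.run, so the systemd loop never needs its own event loop.

import asyncio

async def _run_devices_async(max_concurrent):
    # Placeholder for the real per-device polling work.
    await asyncio.sleep(0)
    return {"success": 0, "failed": 0}

def run_devices(max_concurrent_devices=None):
    # Synchronous entry point: asyncio.run drives the async stack to completion.
    return asyncio.run(_run_devices_async(max_concurrent_devices or 10))

if __name__ == "__main__":
    print(run_devices())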
# oui.insert_row(
#     IOui(
#         oui=None,
#         organization=None,
#         enabled=1,
#     )
# )
# Skip dummy OUI insertion - real data will be populated from file
🧩 Analysis chain
Fail fast if OUI source file is missing/empty; removal of dummy seed changes init guarantees
By skipping the dummy IOui seed, DB init now depends entirely on setup/data/mac_address_file.txt. If that file is absent/empty or parsing fails, IOui may end up empty and downstream lookups could break. Add prechecks and fail fast, and consider wrapping the populate step with basic error handling.
Example guard (place just before calling _oui.update_db_oui(oui_filepath)):
# Pre-check OUI source before population
if not os.path.isfile(oui_filepath) or os.path.getsize(oui_filepath) == 0:
print(f"ERROR: OUI source not found or empty at: {oui_filepath}")
sys.exit(2)
try:
_oui.update_db_oui(oui_filepath)
except Exception as exc:
print(f"ERROR: Failed to populate OUI data from {oui_filepath}: {exc}")
sys.exit(2)
To verify impacts and file presence, run:
🏁 Script executed:
#!/bin/bash
# 1) Ensure the OUI file exists and is non-empty
test -f setup/data/mac_address_file.txt && stat -c '%n %s bytes' setup/data/mac_address_file.txt || echo "MISSING: setup/data/mac_address_file.txt"
# 2) Inspect code paths that may rely on a sentinel/any OUI rows
rg -nP -C2 '\bupdate_db_oui\b|\bIOui\b|mac_address_file\.txt|OUI' --type=py
Length of output: 22593
Fail fast if OUI source file is missing/empty; add error handling around _oui.update_db_oui in bin/tools/create_db_tables.py
Skipping the dummy IOui seed makes DB init depend solely on setup/data/mac_address_file.txt (which is missing in the repo), and _oui.update_db_oui can raise FileNotFoundError or pandas parse/empty-data errors that are currently uncaught. Add a pre-check and wrap the population call with the guard shown above.
Location: bin/tools/create_db_tables.py (around lines 64–76)
#Async SNMP polling
pysnmp==6.2.6
# pysnmp 7.x also availble but 6.2.6 is more stable as of now
pysnmp note is good; also pin pyasn1 to avoid 6.2.6 breakage and fix typos.
With pysnmp 6.2.6, newer pyasn1 (>=0.6.1) can break APIs. Please pin pyasn1 and tidy the comments.
Apply:
-#Async SNMP polling
-pysnmp==6.2.6
-# pysnmp 7.x also availble but 6.2.6 is more stable as of now
+# Async SNMP polling
+pysnmp==6.2.6
+pyasn1==0.6.0 # required for pysnmp 6.2.x compatibility
+# pysnmp 7.x also available, but 6.2.6 is more stable in our stack
from .mib.cisco import CiscoProcessQuery
🧩 Analysis chain
Align MIB keys and complete Juniper system query before enabling in QUERIES
- Cisco memory keys are currently nested under "CISCO-PROCESS-MIB" in CiscoProcessQuery.system(), but ingestion expects "CISCO-MEMORY-POOL-MIB" for memory. This will result in mem_used/mem_free being None in device.systemstat().
- JuniperProcessQuery.system() appears incomplete (empty gather, Cisco‑specific keys/msgs). Adding it to QUERIES will call an unfinished collector.
Action:
- Move Cisco memory to "CISCO-MEMORY-POOL-MIB".
- Implement Juniper.system() with proper tasks and "JUNIPER-MIB" keys.
Proposed fixes (other files):
Cisco memory keys (mib_ciscoprocess.py):
- if memory_used_data and not isinstance(memory_used_data, Exception):
- final["CISCO-PROCESS-MIB"]["ciscoMemoryPoolUsed"] = memory_used_data
-
- if memory_free_data and not isinstance(memory_free_data, Exception):
- final["CISCO-PROCESS-MIB"]["ciscoMemoryPoolFree"] = memory_free_data
+ if memory_used_data and not isinstance(memory_used_data, Exception):
+ final["CISCO-MEMORY-POOL-MIB"]["ciscoMemoryPoolUsed"] = memory_used_data
+ if memory_free_data and not isinstance(memory_free_data, Exception):
+ final["CISCO-MEMORY-POOL-MIB"]["ciscoMemoryPoolFree"] = memory_free_data
Juniper.system() (mib_juniperprocess.py):
- async def system(self):
- """Get system resource data from Juniper devices."""
- # Initialize key variables
- final = defaultdict(lambda: defaultdict(dict))
-
- # Get CPU and memory data concurrently
- try:
- cpu_data, memory_used_data, memory_free_data = await asyncio.gather(
-
- )
-
- # Populate final results
- if cpu_data and not isinstance(cpu_data, Exception):
- final["JUNIPER-PROCESS-MIB"]["cpmCPUTotal5minRev"] = cpu_data
-
- if memory_used_data and not isinstance(memory_used_data, Exception):
- final["JUNIPER-MEMORY-POOL-MIB"]["juniperMemoryPoolUsed"] = memory_used_data
-
- if memory_free_data and not isinstance(memory_free_data, Exception):
- final["JUNIPER-MEMORY-POOL-MIB"]["juniperMemoryPoolFree"] = memory_free_data
-
- except Exception as e:
- print(f"Error in Cisco system queries: {e}")
- log.log2warning(1316, f" Error in Cisco Process MIB: {e}")
- return final
+ async def system(self):
+ """Get system resource data from Juniper devices."""
+ final = defaultdict(lambda: defaultdict(dict))
+ try:
+ cpu_data, mem_used, mem_free = await asyncio.gather(
+ self.operatingcpu(),
+ self.operatingmemoryused(),
+ self.operatingmemoryfree(),
+ return_exceptions=True,
+ )
+ if cpu_data and not isinstance(cpu_data, Exception):
+ final["JUNIPER-MIB"]["jnxOperatingCPU"] = cpu_data
+ if mem_used and not isinstance(mem_used, Exception):
+ final["JUNIPER-MIB"]["jnxOperatingMemoryUsed"] = mem_used
+ if mem_free and not isinstance(mem_free, Exception):
+ final["JUNIPER-MIB"]["jnxOperatingMemoryFree"] = mem_free
+ except Exception as e:
+ log.log2warning(1316, f"Error in Juniper MIB: {e}")
+ return final
return final
Verification:
Also applies to: 25-26, 38-51
🏁 Script executed:
#!/bin/bash
# Confirm ingestion expects CISCO-MEMORY-POOL-MIB
rg -nP 'CISCO-MEMORY-POOL-MIB|ciscoMemoryPool(Used|Free)' switchmap/server/db/ingest/update/device.py
# Inspect Juniper system() body
rg -nP 'class\s+JuniperProcessQuery|async\s+def\s+system' -n -C3 switchmap/poller/snmp/mib/juniper/mib_juniperprocess.py
Length of output: 812
Move Cisco memory keys to CISCO-MEMORY-POOL-MIB and implement Juniper.system() before enabling in QUERIES
- Ingest expects CISCO-MEMORY-POOL-MIB (device reads ciscoMemoryPoolUsed/ciscoMemoryPoolFree — switchmap/server/db/ingest/update/device.py lines 685, 688, 690–691). Change assignments in switchmap/poller/snmp/mib/cisco/mib_ciscoprocess.py from "CISCO-PROCESS-MIB" to "CISCO-MEMORY-POOL-MIB".
- Implement JuniperProcessQuery.system() (switchmap/poller/snmp/mib/juniper/mib_juniperprocess.py) — the current method only initializes final and does not gather/assign CPU/memory. Add concurrent calls to operatingcpu(), operatingmemoryused(), operatingmemoryfree() and populate the appropriate JUNIPER keys (e.g., jnxOperatingCPU, jnxOperatingMemoryUsed, jnxOperatingMemoryFree). Do not add Juniper to QUERIES until done.
if isinstance(result, Exception):
    log.log2warning(f"{key} failed: {result}")
elif result:
    data[key] = result
print(f"final data: {data}")
Incorrect log2warning signature; this will raise at runtime
log.log2warning requires (code, message). Pass a code and remove the bare print below.
- if isinstance(result, Exception):
- log.log2warning(f"{key} failed: {result}")
+ if isinstance(result, Exception):
+ log.log2warning(1004, f"{key} failed: {result}")
@@
- print(f"final data: {data}")
+ log.log2debug(1003, f"final data sections: {list(data.keys())}")
  # Get ifindex data directly
  oid = ".1.3.6.1.2.1.2.2.1.1"
  ifindex_results = await self.snmp_object.swalk(oid, normalized=True)
  ifindex_data = {int(k): v for k, v in ifindex_results.items()}
  # Get the difference between ifIndex and dot1dBasePortIfIndex
  oid = ".1.3.6.1.2.1.17.1.4.1.2"
- results = self._snmp_object.swalk(oid, normalized=True)
+ results = await self.snmp_object.swalk(oid, normalized=True)
  for _bridge_index, ifindex in results.items():
      bridge_index = int(_bridge_index)
      offset = int(ifindex) - bridge_index
      break

  # Populate the dictionary keyed by dot1dBasePortIfIndex
- for ifindex, _ in sorted(self._ifindex.items()):
+ for ifindex, _ in sorted(ifindex_data.items()):
      bridge_index = ifindex - offset
      data_dict[bridge_index] = ifindex
Incorrect baseport→ifIndex mapping (constant-offset assumption).
dot1dBasePortIfIndex is not guaranteed to be a linear offset from ifIndex. Computing a single offset and applying it globally will mis-map ports on many devices.
Replace the offset logic with a direct mapping from BRIDGE-MIB:
- # Get ifindex data directly
- oid = ".1.3.6.1.2.1.2.2.1.1"
- ifindex_results = await self.snmp_object.swalk(oid, normalized=True)
- ifindex_data = {int(k): v for k, v in ifindex_results.items()}
- # Get the difference between ifIndex and dot1dBasePortIfIndex
- oid = ".1.3.6.1.2.1.17.1.4.1.2"
- results = await self.snmp_object.swalk(oid, normalized=True)
- for _bridge_index, ifindex in results.items():
- bridge_index = int(_bridge_index)
- offset = int(ifindex) - bridge_index
- break
-
- # Populate the dictionary keyed by dot1dBasePortIfIndex
- for ifindex, _ in sorted(ifindex_data.items()):
- bridge_index = ifindex - offset
- data_dict[bridge_index] = ifindex
+ # Build direct mapping: dot1dBasePortIfIndex.{bridgePort} = ifIndex
+ oid = ".1.3.6.1.2.1.17.1.4.1.2"
+ results = await self.snmp_object.swalk(oid, normalized=True)
+ for bridge_index_str, ifindex in results.items():
+ bridge_index = int(bridge_index_str)
+ data_dict[bridge_index] = int(ifindex)
super().__init__(snmp_object, "", tags=["layer3"])
Don’t override supported() to always True; pass a real test_oid and defer to base
Unconditional True can cause unnecessary queries/timeouts on unsupported devices. Provide a test OID and call the base check.
- super().__init__(snmp_object, "", tags=["layer3"])
+ # Use a stable IP-MIB OID to probe support (ipNetToMediaTable)
+ super().__init__(snmp_object, ".1.3.6.1.2.1.4.22.1.2", tags=["layer3"])
@@
- async def supported(self):
- """Return device's support for the MIB.
- ...
- """
- # Support OID
- validity = True
-
- # Return
- return validity
+ async def supported(self):
+ """Return device's support for the MIB."""
+ return await super().supported()
Also applies to: 68-83
🤖 Prompt for AI Agents
In switchmap/poller/snmp/mib/generic/mib_ip.py around lines 66-67 (and similarly lines 68-83), the supported() method currently returns True unconditionally. Change it to provide a concrete test_oid for an IP-related OID and call the base class supported() implementation (or otherwise perform a real SNMP test), so the check defers to the base logic and only returns True when the test OID is present, avoiding needless queries and timeouts on unsupported devices.
cpu_data, memory_used_data, memory_free_data = await asyncio.gather(
)
Broken gather: no awaitables → unpack will crash.
await asyncio.gather() with no args returns (), causing ValueError when unpacking into three vars.
- cpu_data, memory_used_data, memory_free_data = await asyncio.gather(
-
- )
+ cpu_data, memory_used_data, memory_free_data = await asyncio.gather(
+ self.operatingcpu(),
+ self.operatingmemoryused(),
+ self.operatingmemoryfree(),
+ return_exceptions=True,
+ )
📝 Committable suggestion (review carefully before committing; ensure it accurately replaces the highlighted code and test it thoroughly):

cpu_data, memory_used_data, memory_free_data = await asyncio.gather(
    self.operatingcpu(),
    self.operatingmemoryused(),
    self.operatingmemoryfree(),
    return_exceptions=True,
)
final["JUNIPER-PROCESS-MIB"]["cpmCPUTotal5minRev"] = cpu_data | ||
|
||
if memory_used_data and not isinstance(memory_used_data, Exception): | ||
final["JUNIPER-MEMORY-POOL-MIB"]["juniperMemoryPoolUsed"] = memory_used_data | ||
|
||
if memory_free_data and not isinstance(memory_free_data, Exception): | ||
final["JUNIPER-MEMORY-POOL-MIB"]["juniperMemoryPoolFree"] = memory_free_data |
Wrong MIB/metric labels (Cisco names used for Juniper).
Use Juniper names to avoid confusing downstream aggregations.
- if cpu_data and not isinstance(cpu_data, Exception):
- final["JUNIPER-PROCESS-MIB"]["cpmCPUTotal5minRev"] = cpu_data
+ if cpu_data and not isinstance(cpu_data, Exception):
+ final["JUNIPER-MIB"]["jnxOperatingCPU"] = cpu_data
- if memory_used_data and not isinstance(memory_used_data, Exception):
- final["JUNIPER-MEMORY-POOL-MIB"]["juniperMemoryPoolUsed"] = memory_used_data
+ if memory_used_data and not isinstance(memory_used_data, Exception):
+ final["JUNIPER-MIB"]["jnxOperatingMemoryUsed"] = memory_used_data
- if memory_free_data and not isinstance(memory_free_data, Exception):
- final["JUNIPER-MEMORY-POOL-MIB"]["juniperMemoryPoolFree"] = memory_free_data
+ if memory_free_data and not isinstance(memory_free_data, Exception):
+ final["JUNIPER-MIB"]["jnxOperatingMemoryFree"] = memory_free_data
📝 Committable suggestion (review carefully before committing; ensure it accurately replaces the highlighted code and test it thoroughly):

if cpu_data and not isinstance(cpu_data, Exception):
    final["JUNIPER-MIB"]["jnxOperatingCPU"] = cpu_data
if memory_used_data and not isinstance(memory_used_data, Exception):
    final["JUNIPER-MIB"]["jnxOperatingMemoryUsed"] = memory_used_data
if memory_free_data and not isinstance(memory_free_data, Exception):
    final["JUNIPER-MIB"]["jnxOperatingMemoryFree"] = memory_free_data
🤖 Prompt for AI Agents
In switchmap/poller/snmp/mib/juniper/mib_juniperprocess.py around lines 41-47, the code assigns Cisco-style MIB/metric labels; replace them with the correct Juniper MIB names and metric keys. Specifically, change the MIB and metric keys to the Juniper equivalents (e.g., use Juniper CPU and memory names such as jnx... entries instead of the Cisco cpm* names), update the dict keys to match the exact Juniper MIB field names, and validate those names against the Juniper MIB definitions before committing. Only change the label strings (no logic change), and run tests to confirm downstream aggregations use the new Juniper names.
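To show how the corrected labels combine with the exception filtering above, here is a small standalone sketch; the assemble helper and the sample data are illustrative, not the module's actual code:

```python
from collections import defaultdict

def assemble(cpu_data, memory_used_data, memory_free_data):
    """Keep only successful, non-empty results under Juniper-named keys."""
    final = defaultdict(dict)
    metrics = [
        ("jnxOperatingCPU", cpu_data),
        ("jnxOperatingMemoryUsed", memory_used_data),
        ("jnxOperatingMemoryFree", memory_free_data),
    ]
    for name, data in metrics:
        # gather(..., return_exceptions=True) may hand us an Exception.
        if data and not isinstance(data, Exception):
            final["JUNIPER-MIB"][name] = data
    return dict(final)

# The failed memory-used poll is dropped; the rest is kept.
print(assemble({"1": 37}, RuntimeError("timeout"), {"1": 512}))
# {'JUNIPER-MIB': {'jnxOperatingCPU': {'1': 37},
#                  'jnxOperatingMemoryFree': {'1': 512}}}
```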
class SystemStatAttribute:
    """Descriptive attributes of the SystemStat table.

    A generic class to mutualize description of attributes for both queries
    and mutations.
    """

    idx_systemstat = graphene.Int(description="Primary key index")
    idx_device = graphene.Int(description="Device Index Foreign key ")
    cpu_5min = graphene.Int(description=" Device cpu_5min_usage")
    mem_used = graphene.Int(description="Device memory_used")
    mem_free = graphene.Int(description="Device memory_free")
🧩 Analysis chain
Avoid GraphQL Int overflow for memory metrics
GraphQL Int is 32‑bit. mem_used/mem_free in bytes can exceed 2^31‑1, causing serialization errors. Use a safer scalar (e.g., String) or expose values in KB/MB as Int.
Option A: switch to String (bytes)
- mem_used = graphene.Int(description="Device memory_used")
- mem_free = graphene.Int(description="Device memory_free")
+ mem_used = graphene.String(description="Device memory_used (bytes)")
+ mem_free = graphene.String(description="Device memory_free (bytes)")
Option B: keep Int but change units (update ingestion accordingly)
- mem_used = graphene.Int(description="Device memory_used")
- mem_free = graphene.Int(description="Device memory_free")
+ mem_used = graphene.Int(description="Device memory_used (KB)")
+ mem_free = graphene.Int(description="Device memory_free (KB)")
If you prefer a BigInt scalar, confirm availability and wire it in:
🌐 Web query:
Does Graphene (v2/v3) provide a built-in BigInt scalar, or should we use a custom scalar/library?
💡 Result:
Short answer: There’s no guaranteed documented BigInt in the public Graphene docs — implement a custom scalar (or use a community scalar library) unless your installed graphene package exposes a BigInt scalar.
Details:
- Official Graphene docs list only the standard GraphQL scalars (Int, Float, String, Boolean, ID) and show how to create custom scalars. [1]
- The Graphene codebase contains a BigInt scalar implementation in the types module in some releases, so some installed graphene versions may already expose graphene.types.scalars.BigInt — check your installed package/version. [2]
- If your version doesn’t include BigInt, create a custom Scalar (serialize as string/int) or use a maintained scalar library (e.g. graphql-scalars / other community packages). [1][3]
References:
[1] Graphene scalars docs.
[2] Graphene types module (BigInt scalar in source).
[3] graphql-scalars (community scalar library).
Fix GraphQL Int overflow for memory metrics (switchmap/server/db/attributes.py:406-419)
GraphQL Int is 32‑bit; mem_used/mem_free in bytes may exceed 2^31‑1 and cause serialization errors. Change the field type or units.
Option A: switch to String (bytes)
- mem_used = graphene.Int(description="Device memory_used")
- mem_free = graphene.Int(description="Device memory_free")
+ mem_used = graphene.String(description="Device memory_used (bytes)")
+ mem_free = graphene.String(description="Device memory_free (bytes)")
Option B: keep Int but change units (KB)
- mem_used = graphene.Int(description="Device memory_used")
- mem_free = graphene.Int(description="Device memory_free")
+ mem_used = graphene.Int(description="Device memory_used (KB)")
+ mem_free = graphene.Int(description="Device memory_free (KB)")
If you prefer a BigInt scalar: Graphene does not guarantee a built-in BigInt — confirm whether your installed package exposes graphene.types.scalars.BigInt; if not, add a custom BigInt scalar or use a community scalar library (e.g., graphql-scalars) and wire it in before switching.
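If the BigInt route is taken, a minimal custom scalar could look like the sketch below. It assumes graphene 3 with graphql-core 3 (the AST node classes are named IntValue/StringValue in graphene 2), so verify against the installed version before wiring it in. For scale: 4 GiB is 4294967296 bytes, already past the Int ceiling of 2147483647.

```python
import graphene
from graphql.language import ast


class BigInt(graphene.Scalar):
    """64-bit-safe integer scalar; values pass through as Python ints."""

    @staticmethod
    def serialize(value):
        return int(value)

    @staticmethod
    def parse_value(value):
        return int(value)

    @staticmethod
    def parse_literal(node, _variables=None):
        # Accept both integer and quoted-string literals in queries.
        if isinstance(node, (ast.IntValueNode, ast.StringValueNode)):
            return int(node.value)
        return None


class SystemStat(graphene.ObjectType):
    mem_used = BigInt(description="Device memory_used (bytes)")
    mem_free = BigInt(description="Device memory_free (bytes)")
```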
📝 Committable suggestion (review carefully before committing; ensure it accurately replaces the highlighted code and test it thoroughly):

class SystemStatAttribute:
    """Descriptive attributes of the SystemStat table.

    A generic class to mutualize description of attributes for both queries
    and mutations.
    """

    idx_systemstat = graphene.Int(description="Primary key index")
    idx_device = graphene.Int(description="Device Index Foreign key ")
    cpu_5min = graphene.Int(description=" Device cpu_5min_usage")
    mem_used = graphene.String(description="Device memory_used (bytes)")
    mem_free = graphene.String(description="Device memory_free (bytes)")
Please fix the conflicting files |
Please also review the CodeRabbit suggestions.
What kind of change does this PR introduce?
Issue Number:
Fixes #340
Snapshots/Videos:
If relevant, did you update the documentation?
Summary
Does this PR introduce a breaking change?
Checklist
CodeRabbit AI Review
Test Coverage
Other information
Have you read the contributing guide?
Summary by CodeRabbit
New Features
Performance
Chores