Moved wifis confuse location provider #3

Open
mvglasow opened this issue Mar 7, 2015 · 5 comments

@mvglasow

mvglasow commented Mar 7, 2015

My office moved to a new location last year, along with its 7 wifi access points. The new location is approximately 2 km (as the crow flies) from the old one.

When I try to get a location from the location provider in the office, the location returned is about half way between the old and the new one. Further investigation revealed that I pick up 12 wifis in the new location, of which 9 are in the wifi catalog. One is still at the old location (according to the wifi catalog), two are in the vicinity of the new location, while the remaining ones are about halfway between the two locations.

This is probably an issue with server-side processing rather than with the location provider. We'll probably need some logic on the server side that checks for signs of BSSIDs moving position. This could be done by checking the maximum distance between two measurements – when it exceeds a certain threshold, it is a sign that the base station may have moved. The threshold itself could be dependent on signal strength, with a lower threshold being applied to stronger signals.

Wifis that exceed that threshold need to be treated specially. Options include:

  • Reduce the accuracy of the wifi position proportionally to the ratio between threshold and actual distance
  • Analyze the history of measurements: if pairs of subsequent measurements are generally below the threshold but very few pairs exceed it, this is an indicator of a single position change – use only measurements taken after the threshold was last exceeded and discard all earlier ones. If thresholds are frequently exceeded, this may be a sign of a constantly moving wifi which should not be relied upon for location.
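
A minimal sketch of the history analysis described above, under stated assumptions: measurements are `(lat, lon)` tuples ordered oldest first, the threshold is a fixed 500 m (the signal-strength-dependent threshold is left out for brevity), and "very few" is taken to mean at most two jumps. The names `classify_history`, `threshold_m` and `max_jumps` are hypothetical and not part of any openBmap code.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points."""
    r = 6371000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def classify_history(measurements, threshold_m=500.0, max_jumps=2):
    """Classify one BSSID from its measurement history (oldest first).

    Returns (label, usable): label is 'stationary', 'moved' or 'moving';
    usable holds the measurements that should still feed the position.
    """
    # Indices at which a measurement is too far from its predecessor.
    jumps = [i for i in range(1, len(measurements))
             if haversine_m(*measurements[i - 1], *measurements[i]) > threshold_m]
    if not jumps:
        return "stationary", list(measurements)
    if len(jumps) <= max_jumps:
        # Few jumps: keep only what was measured after the last one.
        return "moved", list(measurements[jumps[-1]:])
    # Frequent jumps: treat as a mobile wifi and do not use it at all.
    return "moving", []
```

With three samples at the old office followed by two samples roughly 2 km away, this returns `'moved'` together with the two newer samples only.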
@wish7code
Member

> We'll probably need some logic on the server side that checks for signs of BSSIDs moving position

This is related to another issue, the wifi blacklisting issue, which I would like to address in the near future. See https://code.google.com/p/openbmap/issues/detail?id=47

All the points you raised back then are valid, and I agree this should be handled server-side.
Nevertheless, imho the problem has become even more difficult since then: the number of truly mobile wifis (i.e. car/lorry wifis) has risen dramatically over time, so it is harder to distinguish between 'moved' wifis (new location) and 'moving' wifis (mobile wifis).

Anecdotal evidence:
I now use SatStat quite frequently, which is really helpful in making these issues more transparent. Recently I was overtaken by another car whose wifi had obviously been scanned in Italy but had not been identified as a moving wifi :-) Suddenly the map switched to Italy, only to come back to the true location a couple of seconds later. Bad thing.

wish7code self-assigned this Mar 9, 2015
@mvglasow
Author

mvglasow commented Mar 9, 2015

Happy to hear you are a frequent user of SatStat now :-) I recently had the opposite case: in Italy, on the A4 around Novara, the map suddenly switched to a location just outside of Munich. At first I thought of a mobile wifi that had not been blacklisted, but as I was driving on the same stretch of the A4 again two weeks later, the same thing happened again. I suspect this is not a moved wifi but a BSSID collision (these things happen occasionally with cheap hardware).

If we manage to filter out moving wifis, and if each BSSID that has collisions shows alternating measurements for at least two locations, a collision would look much like a moving wifi and could be caught by the same filter.

However, some filtering will also be necessary in the location provider.

It would help if we had cell data available: we could simply take the coverage area of the cell and analyze the locations of wifis received. Wifis outside the coverage area would get a penalty (i.e. lower weight) proportional to their distance from the border of the coverage area.
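
A rough sketch of that cell-based penalty, assuming the coverage area can be approximated by a circle (centre plus radius) and that the weight falls off as `1 / (1 + excess / scale)`. The function `coverage_weight` and the parameter `scale_m` are illustrative assumptions; the thread does not specify a formula.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points."""
    r = 6371000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def coverage_weight(wifi, cell_centre, cell_radius_m, scale_m=1000.0):
    """Weight in (0, 1] for a wifi at (lat, lon), given the serving cell.

    Wifis inside the (assumed circular) coverage area keep weight 1.0;
    outside it, the weight shrinks with the distance from the border.
    """
    distance = haversine_m(wifi[0], wifi[1], cell_centre[0], cell_centre[1])
    excess = max(0.0, distance - cell_radius_m)
    return 1.0 / (1.0 + excess / scale_m)
```

A wifi catalogued near Munich but received inside a cell around Novara would end up with a weight well below 0.01 here, so it would barely influence the position estimate.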

Another approach would be to group wifis received into pairs and calculate the distance for each pair. If it exceeds a certain threshold (about 2 times the range of a wifi), both wifis get penalized. That way, a single outlier would get a fairly high penalty (once for every other wifi observed) while the other wifis would get just a low one (once for the outlier). However, this approach will not work well for an entire office network as in the case I described above – I'd recommend outlier analysis only in addition to comparisons with cell data.

@n76

n76 commented Jun 27, 2015

Not (yet) using this backend, but the way I am dealing with moved and moving APs in my standalone WiFi backend for the unifiedNLP has several layers:

First, when building the database, any AP with a "_nomap" suffix or certain other patterns like "iphone", "ipad" and "android" in its name is discarded. I am not sure that SSID-based filtering is sustainable, as each transit company in the world that offers onboard WiFi seems to use its own names. Use of portable WiFi hotspots is also growing, and they will not conform to consistent naming conventions. Still, Google promotes the _nomap suffix, so some will use that. And the default cellphone WiFi tethering setups often have iphone or android or samsung in the names.

Second, when using GPS data to build and maintain the database, if a new sample for an AP is more than a "moved threshold" away from the current estimate then the old information is discarded and a "move guard" count is set. The move guard specifies the number of samples consistent with the new location we must sample before we trust the location of that AP. Any AP with the move guard set will be ignored when computing positions.
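
The move guard could be sketched roughly as follows. This is not n76's actual code: the record layout, the 500 m threshold, the guard count of 3 and the running-mean position update are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass

MOVED_THRESHOLD_M = 500.0   # assumed "moved threshold"
MOVE_GUARD_SAMPLES = 3      # assumed number of confirming samples

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points."""
    r = 6371000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

@dataclass
class ApRecord:
    lat: float
    lon: float
    samples: int = 1
    move_guard: int = 0

def update(rec, lat, lon):
    """Fold one GPS-tagged sample into the record for an AP."""
    if haversine_m(rec.lat, rec.lon, lat, lon) > MOVED_THRESHOLD_M:
        # Too far from the current estimate: discard the old information
        # and require confirming samples before trusting the AP again.
        rec.lat, rec.lon, rec.samples = lat, lon, 1
        rec.move_guard = MOVE_GUARD_SAMPLES
        return rec
    # Sample agrees with the estimate: update the running mean
    # and count it against any pending move guard.
    rec.samples += 1
    rec.lat += (lat - rec.lat) / rec.samples
    rec.lon += (lon - rec.lon) / rec.samples
    if rec.move_guard > 0:
        rec.move_guard -= 1
    return rec

def is_trusted(rec):
    """APs with a pending move guard are ignored when computing positions."""
    return rec.move_guard == 0
```

In this sketch the sample that triggers the move does not count towards the guard, so the AP becomes usable again only after `MOVE_GUARD_SAMPLES` further samples near the new location.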

Third, when determining a location from the APs seen, I group the APs together such that APs that are too far apart are in separate groups. Then the group with the largest count is used to determine location. I require the largest group to have at least 2 APs. So a single moved or moving AP will be ignored.
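
One way to picture this grouping step, as a sketch only: APs are merged into the same group whenever they are within an assumed 300 m of any member, and the centroid of the largest group is returned. The function name `locate`, the `max_gap_m` value and the plain centroid are assumptions, not details taken from the backend.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points."""
    r = 6371000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def locate(aps, max_gap_m=300.0):
    """Estimate a position from the known (lat, lon) of the APs in view.

    Returns the centroid of the largest group, or None if that group
    has fewer than two APs.
    """
    groups = []
    for ap in aps:
        near, far = [], []
        for group in groups:
            close = any(haversine_m(*ap, *other) <= max_gap_m for other in group)
            (near if close else far).append(group)
        # Merge the new AP with every group it is close to.
        merged = [ap] + [other for group in near for other in group]
        groups = far + [merged]
    best = max(groups, key=len, default=[])
    if len(best) < 2:
        return None
    return (sum(p[0] for p in best) / len(best),
            sum(p[1] for p in best) / len(best))
```

A single AP catalogued hundreds of kilometres away ends up in a group of its own and is ignored, while two APs that disagree with each other yield no position at all.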

So far I have not noticed a false position due to a moved or moving AP with this logic. However it will fail in some cases like a group of APs being moved or moving. The original poster's moved office could still be problematical unless new GPS based sampling marks the APs as moved. And if you are on a cruise ship with multiple APs whose relative position is constant there will be a problem.

SatStat is a great tool for monitoring the performance of network location operation. Many thanks to @mvglasow for that!

@wish7code
Member

Guess we've got some of the same ideas ;-)

> First, when building the database any AP with a "_nomap" suffix or some other patterns like "iphone", "ipad" and "android" in the name are discarded.

There's a list of stop words in place server-side (e.g. iphone, android, some typical car hotspots such as Audi, Postbus, ...). _nomap wifis are ignored even earlier, as they're filtered out in the Radiobeacon scanning client.

> And use of portable WiFi hotspots is growing and they will not conform to consistent naming conventions.

Exactly my observation; lorry drivers especially seem to love standard hardware, e.g. Huawei sticks with differing SSIDs.

> Second, when using GPS data to build and maintain the database, if a new sample for an AP is more than a "moved threshold" away from the current estimate then the old information is discarded and a "move guard" count is set.

At this point we've got two rather rudimentary checks in place: (1) e.g. if cell or wifi measurements are more than say 10km away from each other, the wifi is ignored server-side. In other words, the cell/wifi will never make it to the server database. (2) On the client side the openbmap UnifiedNLP provider constantly monitors the returned results for plausibility: if the new calculated location is too far away from the current position (e.g. because wifi has moved), the results are discarded.

> Third, when determining a location from the APs seen, I group the APs together such that APs that are too far apart are in separate groups. Then the group with the largest count is used to determine location. I require the largest group to have at least 2 APs. So a single moved or moving AP will be ignored.

Sounds very interesting. As soon as I've got some free time, I'll try to set up a prototype.

> SatStat is a great tool for monitoring the performance of network location operation. Many thanks to @mvglasow for that!

As Michael might be reading here, let's celebrate him for SatStat ;-)
I simply love it!

@mvglasow
Author

> As Michael might be reading here, let's celebrate him for SatStat ;-)

Thanks for the compliment :-) I am indeed following this conversation and have a few cents on that, too.

> First, when building the database any AP with a "_nomap" suffix or some other patterns like "iphone", "ipad" and "android" in the name are discarded.

> There's a list of stop words in place server-side (e.g. iphone, android, some typical car hotspots as Audi, Postbus, ....). _nomap wifis are even ignored before, as they're filtered out in the Radiobeacon scanning client.

Server-side refers to the geolocation API, so it would apply only in online mode, right? Since I work mostly in offline mode, we'd need a solution here, too. Technically it shouldn't be too hard to filter out certain wifis and not query them at all. On the other hand, it's error prone in both directions: blacklisting "Audi" will also block "Audi-Zentrum Ingolstadt" and similar, which would presumably be stationary. This is why I'm skeptical of filtering these things out during collection – but that's a different story.

> And use of portable WiFi hotspots is growing and they will not conform to consistent naming conventions.

+1. The only sure sign of a moving hotspot is that it's moving.

> Second, when using GPS data to build and maintain the database, if a new sample for an AP is more than a "moved threshold" away from the current estimate then the old information is discarded and a "move guard" count is set.

I think we will need something like this for building the database (the offline DB as well as the one used for the location API). The exact way to do it might need some refinement – I'd introduce some kind of reliability indicator based on how frequently the wifi moves and how long it has remained in one position since the last move. A wifi that has been in the same place for years will get a high reliability indicator, the wifis from the moved office would be somewhat less reliable, and a portable hotspot that keeps popping up in a new place once a month will get low reliability.
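
As a purely illustrative sketch, such a reliability indicator could combine the time since the last move with the observed move frequency. The formula, the function name `reliability` and the 90-day `settle_days` constant are assumptions; the thread only describes the desired ordering (long-term stationary, then recently moved, then frequently moving).

```python
def reliability(days_since_last_move, moves_per_year, settle_days=90.0):
    """Reliability of a transmitter position, between 0.0 and 1.0.

    days_since_last_move: how long the wifi has stayed where it is now
                          (non-negative).
    moves_per_year:       how often it has been seen changing position
                          (non-negative).
    settle_days:          assumed time after which a position is
                          considered half settled.
    """
    # Grows towards 1.0 the longer the wifi stays in one place.
    stability = days_since_last_move / (days_since_last_move + settle_days)
    # Shrinks towards 0.0 the more often the wifi moves.
    mobility = 1.0 / (1.0 + moves_per_year)
    return stability * mobility


# Three cases mirroring the examples in the paragraph above.
years_in_place = reliability(1500, 0.0)   # never seen moving
moved_office = reliability(300, 0.5)      # one move in the last two years
portable = reliability(15, 12.0)          # pops up somewhere new every month
```

With these inputs the three values come out in the intended order: highest for the long-term stationary wifi, lower for the moved office, and close to zero for the portable hotspot.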

> At this point we've got two rather rudimentary checks in place: (1) e.g. if cell or wifi measurements are more than say 10km away from each other, the wifi is ignored server-side. In other words, the cell/wifi will never make it to the server database.

However, these samples are the ones that might indicate situations such as the moved office. These samples need to be considered so we can detect movement – see above.

> (2) On the client side the openbmap UnifiedNLP provider constantly monitors the returned results for plausibility: if the new calculated location is too far away from the current position (e.g. because wifi has moved), the results are discarded.

Sounds good – unless the old location was erratic due to moved wifis and the like. One rather simple check that we could do "lookup-side" (client in offline mode, geolocation API in online mode) is to first do a cell-based lookup, then ignore or (better) penalize wifis which are more than a certain distance away from the cell's location.

Moving cell towers might be an issue here (yes, I've read the article, too). I guess we need some more information on the ID they use – if they change IDs as they change position, each position would appear to be a different cell and thus not be an issue. Since the full cell ID contains some geographical information in the form of the LAC, I would expect IDs to change if the tower is moved into the area of a different LAC. This means that the ID could only be maintained as long as the tower is in the area of the same LAC.

We could solve that by doing cell lookups based on just MCC-MNC-LAC and applying an even higher penalty to a wifi that is outside the area of the current LAC.

Besides that, we'll probably need to extend the profiling discussed above to cell towers. A wifi that is "too far" away from the current cell would not get penalized if the tower is mobile (though LAC penalty could still be applied).

> Third, when determining a location from the APs seen, I group the APs together such that APs that are too far apart are in separate groups. Then the group with the largest count is used to determine location. I require the largest group to have at least 2 APs. So a single moved or moving AP will be ignored.

> So far I have not noticed a false position due to a moved or moving AP with this logic. However it will fail in some cases like a group of APs being moved or moving. The original poster's moved office could still be problematical unless new GPS based sampling marks the APs as moved.

An idea that I've had is to do the following on the lookup side:

  • Examine each wifi seen against each of the others. If they are more than a certain distance apart, apply a penalty to both (the penalty could depend on distance).
  • Then use the inverse of the penalty as a weight when calculating the position estimate.

A single outlier would get heavily penalized (once for every other wifi around). The other wifis would still get a penalty, but a much lower one (just one for the outlier but not between each other).
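
The pairwise penalty could look roughly like the sketch below. Since a literal inverse would be undefined for a penalty of zero, the weight is assumed to be `1 / (1 + penalty)`; the 300 m limit, the `scale_m` divisor and the name `weighted_position` are likewise assumptions.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points."""
    r = 6371000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def weighted_position(aps, max_dist_m=300.0, scale_m=1000.0):
    """Weighted centroid of the (lat, lon) positions of the wifis in view.

    Returns (position, weights); position is None for an empty list.
    """
    if not aps:
        return None, []
    penalties = [0.0] * len(aps)
    for i in range(len(aps)):
        for j in range(i + 1, len(aps)):
            excess = haversine_m(*aps[i], *aps[j]) - max_dist_m
            if excess > 0:
                # Both wifis of an implausible pair are penalized,
                # more heavily the further apart they are.
                penalties[i] += excess / scale_m
                penalties[j] += excess / scale_m
    weights = [1.0 / (1.0 + p) for p in penalties]
    total = sum(weights)
    lat = sum(w * ap[0] for w, ap in zip(weights, aps)) / total
    lon = sum(w * ap[1] for w, ap in zip(weights, aps)) / total
    return (lat, lon), weights
```

With three wifis clustered together and one catalogued about 2 km away, the outlier collects three penalties and the others one each, so the estimate is pulled towards the cluster but not all the way onto it.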

The moved office example would still cause some distortion – the extent depends on the ratio between "good" and "bad" wifis in range. Of course, we could combine this approach with the above, further increasing the penalty for wifis which are very far away from the current cell.

As a further refinement we could consider the "position reliability" described further above. Assuming each transmitter has a reliability expressed as a percentage (100% is best, 0% is worst), then the penalty formula for each wifi could be something like:

penalty = excess_distance * other_reliability * (1 - own_reliability)

summed up over all other wifis in view.

That would penalize wifis:

  • the further they are away from other wifis
  • the more wifis they exceed a certain distance from
  • the more reliable the positions of these other wifis are considered
  • the less reliable their own position is considered
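
The penalty formula above, written out as a small sketch. Here `excess_distance` is taken to be the distance beyond an assumed 300 m limit (zero if the pair is closer than that), and reliabilities are expressed as fractions between 0.0 and 1.0 rather than percentages; both choices are assumptions.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points."""
    r = 6371000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def penalties(aps, max_dist_m=300.0):
    """Penalty per wifi; aps is a list of (lat, lon, reliability)."""
    result = []
    for i, (lat_i, lon_i, own_reliability) in enumerate(aps):
        penalty = 0.0
        for j, (lat_j, lon_j, other_reliability) in enumerate(aps):
            if i == j:
                continue
            distance = haversine_m(lat_i, lon_i, lat_j, lon_j)
            excess_distance = max(0.0, distance - max_dist_m)
            # penalty = excess_distance * other_reliability * (1 - own_reliability),
            # summed over all other wifis in view.
            penalty += excess_distance * other_reliability * (1.0 - own_reliability)
        result.append(penalty)
    return result
```

Note that under this formula a wifi with reliability 1.0 is never penalized, however far it is from the others.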

3 participants