The accessibility tools field is increasingly adopting machine learning. It is unclear whether test cases, as they are currently written, are useful for determining whether an ML-based accessibility tool is a reasonably good implementation of an ACT rule. Some possible issues are:
A tool could be trained specifically on the test cases in the ACT rule. Such an implementation might get those specific test cases right, but still have a very high rate of false positives. Should that be considered a valid implementation?
Because test cases are very small, they lack the context of real-world problems. For example, a tool that relies on machine vision to recognise buttons will not do well on unstyled test cases. Many test cases are unstyled because the styling isn't directly relevant to the rule, so this is likely to cause problems.
Because ML tools are inherently heuristic-based, it is quite possible for a highly accurate implementation to still get some of the test cases wrong. Is this a problem? Is there some threshold that should be applied when deciding whether an implementation is complete?
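To make the threshold question concrete, here is a minimal sketch of what such a check could look like. It is only an illustration, not a proposal for how ACT implementations are or should be evaluated. The names `CaseResult`, `agreement_rate`, `false_positives`, and `is_complete`, the default threshold of 0.9, and the choice to reject any false positive outright are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CaseResult:
    """Outcome of running a tool on one test case (hypothetical structure)."""
    test_case_id: str
    expected: str  # "passed", "failed", or "inapplicable"
    actual: str    # what the tool reported


def agreement_rate(results: List[CaseResult]) -> float:
    """Fraction of test cases where the tool's outcome matches the expected one."""
    if not results:
        raise ValueError("no test case results to evaluate")
    matches = sum(1 for r in results if r.actual == r.expected)
    return matches / len(results)


def false_positives(results: List[CaseResult]) -> List[CaseResult]:
    """Cases the tool reported as failed although they were not expected to fail."""
    return [r for r in results if r.actual == "failed" and r.expected != "failed"]


def is_complete(
    results: List[CaseResult],
    min_agreement: float = 0.9,          # assumed threshold, for illustration only
    allow_false_positives: bool = False,  # assumed policy, for illustration only
) -> bool:
    """Decide whether an implementation counts as complete under the assumed policy."""
    if not allow_false_positives and false_positives(results):
        return False
    return agreement_rate(results) >= min_agreement
```

Under these assumptions, a tool that misses one failure out of ten test cases would still count as complete, while a tool that reports a single false positive would not. Whether either of those outcomes is desirable is exactly the open question.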