My team has a robust digital accessibility program and processes for WCAG conformance in our apps. Because of this, there are definitely accessibility defects that get caught and addressed in order of impact and business priority like any other bug. Obviously we want to aim for 100% accessibility for our users, but it's a continual work in progress as new enhancements or changes are released.
I'm stuck on the appropriate measurement to indicate support. If we have 50 common tasks and the 10 most central tasks are solid, but some supporting (but also common) tasks have a contrast failure or a missing accessibility label, does that mean the whole app doesn't support the feature? If "completing the task" is the rubric, there is a whole range of interpretations for that.
In a complex app, I anticipate that a group like ours will have strong support for many of the Accessibility Nutrition Labels features across tasks and devices, but will realistically never be 100% free of defects for a given Apple accessibility feature, even among core tasks.
As I consider the next steps for Nutrition Labels, I do not see anything in the documentation that gives a sort of baseline or measurement for inclusion. We plan to test all steps to complete a task, and log defects accordingly with an assigned timeline for fixing them (as would be true for functional defects).
Quoting from the Additional Guidance section:
If bugs are significant enough to change your answer according to the Accessibility Nutrition Labels evaluation criteria, then don’t indicate support for the accessibility feature. To avoid misleading users, you shouldn’t enter answers that aren’t aligned with the evaluation criteria.
Ultimately each app owner is responsible for determining what their core, common tasks are, and whether their app is in line with the criteria for each label.
Whether a particular bug prevents the user from completing a common task is subjective and depends on the skill of the user, but some common sense applies. For example, a missing label on a Delete button that triggers a permanent, destructive action is obviously more serious than a missing label on a non-destructive menu. Likewise, a single contrast failure at a 4.49:1 contrast ratio (just below the minimum recommended threshold of 4.5:1) will not be as impactful to the user as a more extreme, or more pervasive, problem in the app.
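To put numbers on the contrast example above, here is a minimal sketch of the WCAG 2.x contrast ratio calculation in Python. It is only meant for triaging how far a failure sits from the threshold, not as a replacement for your audit tooling. The function names (relative_luminance, contrast_ratio, shortfall) are hypothetical names chosen for this illustration, and the 4.5:1 threshold assumes normal-size text at WCAG AA.

```python
from typing import Tuple

RGB = Tuple[int, int, int]

# Assumption: WCAG 2.x AA minimum for normal-size text.
AA_NORMAL_TEXT = 4.5


def _linearize(channel: int) -> float:
    """Convert one 8-bit sRGB channel to linear light (WCAG 2.x definition)."""
    c = channel / 255.0
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(color: RGB) -> float:
    """Weighted sum of the linearized red, green, and blue channels."""
    r, g, b = (_linearize(v) for v in color)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(foreground: RGB, background: RGB) -> float:
    """Contrast ratio between two colors, from 1.0 (identical) up to 21.0."""
    l1 = relative_luminance(foreground)
    l2 = relative_luminance(background)
    lighter, darker = max(l1, l2), min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)


def shortfall(foreground: RGB, background: RGB,
              threshold: float = AA_NORMAL_TEXT) -> float:
    """How far below the threshold a color pair falls; 0.0 if it passes."""
    return max(0.0, threshold - contrast_ratio(foreground, background))


if __name__ == "__main__":
    white = (255, 255, 255)
    for name, grey in [("#767676", (118, 118, 118)), ("#777777", (119, 119, 119))]:
        ratio = contrast_ratio(grey, white)
        print(f"{name} on white: {ratio:.2f}:1, short by {shortfall(grey, white):.2f}")
```

With this sketch, #777777 text on a white background comes out at roughly 4.48:1, a near miss much like the 4.49:1 case, while light grey on white would fall short by a much larger margin. Logging the shortfall alongside how many screens are affected is one way to separate a marginal, isolated failure from an extreme or pervasive one when prioritizing defects.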
Also from the end of the Tips section:
You may consider exceptions to the recommendations here if users with disabilities would find them reasonable. For example, you might hide a button from VoiceOver in one view if it duplicates functionality that’s already available and discoverable elsewhere.
If you aren't sure whether an issue like the missing label or the contrast failure prevents a user from successfully completing a task, consider hiring users with various disabilities, either as full-time employees or as contracted part-time participants in your regular TestFlight evaluation.
The closing guidance on each criteria page may be helpful with your evaluation, too:
Even after you’re able to indicate support for [accessibility feature] in the common tasks of your app, there are likely further improvements you’ll be able to make to the accessibility of your app. Re-evaluate your app's support for [accessibility feature] every time you update your app. Set a goal to make your app more accessible to more people in every release.
I'm hopeful it's clear that this acknowledges that 100% perfection is not achievable, and therefore not expected. However, "if bugs are significant enough to change your answer according to the Accessibility Nutrition Labels evaluation criteria, then don’t indicate support for the accessibility feature."
Good luck!