Unintended Machine Learning Biases as Social Barriers for Persons with Disabilities

Ben Hutchinson, Google, benhutch@google.com
Vinodkumar Prabhakaran, Google, vinodkpg@google.com
Emily Denton, Google, dentone@google.com
Kellie Webster, Google, websterk@google.com
Yu Zhong, Google, yuzhong@google.com
Stephen Denuyl, Google, sdenuyl@google.com

Abstract

Persons with disabilities face many barriers to full participation in society, and the rapid advancement of technology has the potential to create ever more barriers. Building equitable and inclusive technologies for people with disabilities demands attention not only to accessibility, but also to how social attitudes towards disability are represented within technology. Representations perpetuated by machine learning (ML) models often inadvertently encode undesirable social biases from the data on which they are trained. This can result, for example, in text classification models producing very different predictions for “I am a person with mental illness” and “I am a tall person”. In this paper, we present evidence of such biases in existing ML models, and in the data used for model development. First, we demonstrate that a machine-learned model for moderating conversations classifies texts which mention disability as more “toxic”. Similarly, a machine-learned sentiment analysis model rates texts which mention disability as more negative. Second, we demonstrate that neural text representation models that are critical to many ML applications can also contain undesirable biases towards mentions of disability. Third, we show that the data used to develop such models reflects topical biases in social discourse which may explain these model biases; for instance, gun violence, homelessness, and drug addiction are over-represented in discussions about mental illness.

1. Introduction

“Disability” is often defined as having a physiological condition, whereas the term “handicap” describes a barrier or problem created by society or the environment [1]. This important distinction has implications for technologies which mediate how individuals interact with their environment and society. Specifically, technologies may exacerbate, diminish, introduce anew, or remove barriers (handicaps) in people's social or physical environments. The field of accessibility has made many great strides towards reducing certain barriers to persons with disabilities by improving the usability of, and access to, technologies. Historically, this field has focused primarily on access from a physical or user experience (UX) perspective.  However, accessibility only addresses part of the problem. Barriers exist not only in the interaction with computer interfaces or physical surroundings; there are also potent social and attitudinal barriers [2]. For this reason, an examination is warranted of how attitudinal barriers and social representation, including stereotyping, are encoded in technology.

As an example of a possible social barrier, we can examine societal judgments regarding the appropriate uses of language online. If the language which an individual uses to describe themselves is censored, then that individual may experience harms to their autonomy and self-respect. When social rules regarding language use are encoded in technology through the process of machine learning (ML), linguistic correlations may become encoded and petrified in ML models. This can assist in perpetuating negative stereotypes, which is particularly concerning for marginalized groups, including persons with disabilities, who have long been subject to harmful stereotypes [3, 4]. These harmful stereotypes can themselves amplify or reinforce social barriers, for example by influencing how people are treated.

Social barriers may be heavily influenced by the social representations of the group of interest, and representations of human identity have profound personal and political consequences (for example, [5]). This paper focuses on the representations of persons with disabilities through the lens of technology. Specifically, we examine how ML-based Natural Language Processing (NLP) models classify or predict text relating to persons with disabilities (see Table 1). This is important because NLP technology is being used pervasively for tasks ranging from fighting online abuse [6] to matching job applicants to job opportunities [7]. Furthermore, because text classifiers are trained by ingesting large datasets of text, the biases they exhibit may be indicative of current societal perceptions of persons with disabilities [8].

Table 1. Example toxicity scores from Perspective API, illustrating its sensitivity to the mention of different disabilities.

Sentence | Toxicity (Perspective API)
I am a person with mental illness. | 0.62
I am a deaf person. | 0.44
I am a blind person. | 0.39
I am a tall person. | 0.03
I am a person. | 0.08
I will fight for people with mental illness. | 0.54
I will fight for people who are deaf. | 0.42
I will fight for people who are blind. | 0.29
I will fight for people. | 0.14

While previous studies have examined unintended biases in NLP models against other historically marginalized groups [8, 9, 10, 11, 12, 13, 14, 15], bias with respect to different disability groups has been relatively under-explored. However, over one billion individuals, or about 15% of the world's population, are persons with disabilities [16], and disability is sometimes the subject of strong negative sentiments. For example, a 2007 study found strong implicit and explicit preferences for people without disabilities compared to people with disabilities across the social group domains studied [17]. By studying how social attitudes become perpetuated in NLP models, we can also better understand current societal stereotypes toward persons with disabilities. Lastly, this work demonstrates one potential pathway by which technology may reinforce and/or amplify social barriers for persons with disabilities: by perpetuating harmful representations of members of the group.

This paper makes several contributions. First, we demonstrate that two existing NLP models for classifying text contain measurable biases concerning people with disabilities. Second, we show that language models that aid NLP models in downstream tasks similarly contain measurable biases around disability. Third, we analyze a public dataset used for NLP model development to show how social biases in data provide a likely explanation for undesirable model biases.

2. Linguistic Phrases for Disabilities

Our analyses in this paper use a set of 56 linguistic expressions for referring to people with various types of disabilities, e.g. “a deaf person”, which we partition into Recommended and Non-Recommended phrases. These lists were compiled by consulting guidelines published by the Anti-Defamation League, SIGACCESS, and the ADA National Network [18, 19, 20, 21]. We also group the expressions according to the type of disability that is mentioned; e.g. the category hearing includes phrases such as “a deaf person” and “a person who is deaf”. To enable comparisons, we also include one recommended and one non-recommended phrase for referring to people without disabilities. Table 2 shows a few example terms. The full list of terms is in the appendix.

Table 2. Examples from the dataset of recommended and non-recommended phrases for referring to people with disabilities.

Category | Recommended | Non-Recommended
sight | A blind person | A sight-deficient person
mental_health | A person with depression | An insane person
cognitive | A person with dyslexia | A slow learner
unspecified | A person with a disability | A handi-capable person

3. Biases in Text Classification Models

It has previously been found that NLP models for classifying text can contain undesirable biases, e.g. towards people of various sexual orientations [14]. Here we show that NLP models can also learn undesirable biases relevant to disability.

Following [13, 22], we make use of the notion of a perturbation, whereby the set of linguistic phrases for referring to people with disabilities, described above, are all substituted into the same linguistic context. We first retrieve a set of naturally-occurring English sentences that contain the pronoun he or she.1 We select the pronoun as the anchor for that sentence in our analysis. We then “perturb” each sentence by replacing the anchor with one of the phrases described above. We pass all the perturbed sentences through an NLP model, as well as the original sentences containing the pronouns. Subtracting the latter score from the former gives a “score diff”, i.e. a measure of how changing from a pronoun to a phrase mentioning disability affects the model score.
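The sketch below illustrates this perturbation procedure in Python. It is a minimal illustration rather than the authors' code: the score function is a hypothetical stand-in for whichever model is being probed (e.g. a toxicity or sentiment classifier), and the toy corpus and scoring function in the usage example are invented purely for demonstration.

```python
import re
import statistics

# Pronoun anchors: sentences containing "he" or "she" are perturbed by
# swapping the pronoun for a phrase referring to a person with a disability.
PRONOUN_PATTERN = re.compile(r"\b(he|she)\b", flags=re.IGNORECASE)

def perturb(sentence, phrase):
    """Replace the pronoun anchor in a sentence with the given phrase."""
    return PRONOUN_PATTERN.sub(phrase, sentence, count=1)

def average_score_diff(sentences, phrase, score):
    """Average model score change caused by swapping the pronoun for `phrase`."""
    diffs = [score(perturb(s, phrase)) - score(s) for s in sentences]
    return statistics.mean(diffs)

# Toy usage: `toy_score` stands in for a real toxicity or sentiment model.
def toy_score(text):
    return 0.6 if "mental illness" in text else 0.1

corpus = ["I talked to her yesterday and she seemed fine."]
print(average_score_diff(corpus, "a person with a mental illness", toy_score))
```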

The methodology described above was repeated for two NLP models. Figure 1(a) shows the results for a model for predicting toxicity [6], which outputs values between 0 and 1, with higher scores indicating a higher likelihood of toxicity (results are also included in tabular form in the appendix). The results show that all categories of disability are associated with varying degrees of toxicity. In aggregate, the recommended phrases elicited smaller changes in toxicity prediction: the average change in toxicity score was 0.01 for recommended phrases and 0.06 for non-recommended phrases. However, when considering results disaggregated by disability category, we see that some categories elicit a stronger effect even for the recommended phrases. The model appears to have learned some desirable behaviors as well as some undesirable ones. On the one hand, many of the non-recommended phrases are associated with toxicity. On the other hand, many of the recommended phrases are too. Since the primary intended use of this model is to facilitate the moderation of online comments, higher scores for mentions of disability can result in non-toxic comments mentioning disabilities being flagged at a disproportionately high rate. In practical terms, this might result in innocuous sentences discussing disability being suppressed.

We note that while this methodology can reveal systematic shifts in model scores that result from the mention of disability phrases, the impact of these shifts will depend on how the model is deployed. In practice, users of the system may choose a range of scores within which to flag comments for review. Thus, a score change that flips a comment from “not flagged” to “flagged” might have different consequences than one with an equivalent “score diff” that does not cross this boundary.
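As a concrete illustration of this point, the short sketch below checks whether a score shift crosses a flagging threshold; the threshold value of 0.5 is an assumption for illustration, not a property of any particular model or deployment.

```python
# The flagging threshold below (0.5) is an assumption for illustration;
# real deployments choose their own score ranges for review.
FLAG_THRESHOLD = 0.5

def is_flagged(score, threshold=FLAG_THRESHOLD):
    return score >= threshold

def flips_decision(original_score, perturbed_score, threshold=FLAG_THRESHOLD):
    """True if a score shift moves the comment across the flagging boundary."""
    return is_flagged(original_score, threshold) != is_flagged(perturbed_score, threshold)

# The same +0.20 score shift has different practical consequences:
print(flips_decision(0.40, 0.60))  # True: the comment is now flagged
print(flips_decision(0.10, 0.30))  # False: still below the threshold
```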

Figure 1(b) shows the results for a model for predicting sentiment [23], which outputs scores between -1 and 1, with higher scores indicating more positive sentiment. As with the toxicity model, we observe patterns of both desirable and undesirable associations. Note that unlike toxicity models, sentiment models are not typically used for online content moderation, and so are not directly tied to concerns about suppressing speech about disability. However, sentiment models are often used to monitor public attitudes towards topics; biases in the sentiment model may result in skewed analyses for topics associated with disability.

Figure 1(a): results for the toxicity model; higher means more likely to be toxic.
Figure 1(b): results for the sentiment model; lower means more negative.
Figure 1. Average change in NLP model score when substituting a recommended phrase (blue), or a non-recommended phrase (yellow) for a person with a disability, compared to using a pronoun. Many recommended phrases around disability are associated with toxicity/negativity, which might result in innocuous sentences discussing disability being suppressed.

1Future work will consider how to best include non-binary pronouns in this step.

4. Biases in Language Representations

Neural text embedding models [24] have become a core component of today's NLP pipelines. These models learn vector representations of words, phrases, or sentences, such that the geometric relationship between vectors corresponds to semantic relationships between words. Text embedding models effectively capture some of the complexities and nuances of human language. However, these models may also encode undesirable correlations in the data that reflect harmful social biases [11, 9, 10]. These studies have predominantly focused on biases related to race and gender, with the exception of [8] who considered physical and mental illness. Biases with respect to broader disability groups remain under-explored.

In this section, we analyze how the widely used Bidirectional Encoder Representations from Transformers (BERT) [25] model represents phrases mentioning persons with disabilities. One of BERT's training objectives is predicting a held-out word in a sentence from the surrounding context. Accordingly, we use a simple fill-in-the-blank analysis to assess the underlying text representation. Given a query sentence with a missing word, BERT2 produces a ranked list of words to fill in the blank. We construct a set of simple hand-crafted query sentences “<phrase> is ____”, where <phrase> is perturbed with the set of recommended disability phrases described above. To obtain a larger set of query sentences, we additionally perturb the phrases by introducing references to family members and friends. For example, in addition to “a person”, we include “my sibling”, “my sister”, “my brother”, “my friend”, etc. We are interested in how the top ranked words predicted by BERT change when different disability phrases are used in the query sentence.
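A minimal sketch of this fill-in-the-blank probe is shown below. It uses the HuggingFace transformers fill-mask pipeline as a stand-in for the original BERT release (an assumption about tooling, not the authors' setup), and the example phrases are illustrative.

```python
from transformers import pipeline

# HuggingFace fill-mask pipeline used here as a stand-in for the original
# BERT release; the model choice and example phrases are illustrative.
fill_mask = pipeline("fill-mask", model="bert-large-uncased")

def top_completions(phrase, k=10):
    """Return BERT's top-k completions for the query '<phrase> is ____.'"""
    query = f"{phrase} is {fill_mask.tokenizer.mask_token}."
    return [result["token_str"] for result in fill_mask(query, top_k=k)]

# Compare suggested words across subject phrases, with and without
# mentions of disability and with family-member variants.
for phrase in ["a person", "my sister", "a deaf person", "a person with depression"]:
    print(phrase, "->", top_completions(phrase))
```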

BERT outputs a ranked list of words to fill in the blank for each query. In order to assess differences in the sentiment of the resulting completed sentences for each phrase, we use the Google Cloud sentiment model [23]. For each predicted word w, we obtain the sentiment score for the sentence “A person is <w>”. We use the neutral “a person” instead of the actual phrase used to query BERT, so that we assess only the differences in sentiment scores for the words produced by BERT, and not biases associated with the disability phrases themselves (discussed in the previous section).
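The following sketch shows how this step can be computed. Here sentiment_score is a hypothetical placeholder for a sentiment model returning scores in [-1, 1] (the paper used the Google Cloud sentiment model), and treating scores below 0 as “negative” is our assumption.

```python
def sentiment_score(text):
    """Hypothetical stand-in for a sentiment model returning scores in [-1, 1]
    (the paper used the Google Cloud sentiment model)."""
    raise NotImplementedError("plug in a sentiment model here")

def fraction_negative(completions, cutoff=0.0):
    """Fraction of predicted words whose carrier sentence 'A person is <w>.'
    receives a sentiment score below `cutoff` (treated here as negative)."""
    scores = [sentiment_score(f"A person is {word}.") for word in completions]
    return sum(score < cutoff for score in scores) / len(scores)

# e.g. fraction_negative(top_completions("a deaf person"))
```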

Figure 2 plots the frequency with which the top-10 fill-in-the-blank results produce negative sentiment scores, for BERT query sentences constructed from phrases referring to persons with different types of disabilities or to persons without disabilities. We see that around 15% of the words produced for queries derived from the phrase “a person without a disability” result in negative sentiment scores. In contrast, for queries derived from most of the phrases referencing persons who do have disabilities, a larger percentage of predicted words produce negative sentiment scores. This suggests that BERT associates words with more negative sentiment with phrases referencing persons with disabilities. Since BERT text embeddings are increasingly being incorporated into a wide range of NLP applications, the negative associations revealed in this section have the potential to manifest in different, and potentially harmful, ways in many downstream applications.

Figure 2. Frequency with which top-10 word suggestions from the BERT language model produce negative sentiment scores.

2 We report results using the 1024-dimensional `large' uncased version, available at https://github.com/google-research

5. Biases in Data

We now turn our attention to exploring the sources of model biases around disability, such as the ones described above. NLP models are trained on large datasets of textual data, which are analyzed to build “meaning” representations for words based on word co-occurrence metrics, drawing on the idea that “you shall know a word by the company it keeps” [26]. So, what company do mentions of disabilities keep within the textual corpora we use to train our models?

In order to answer this question, we need a large dataset of sentences that mention different kinds of disability. The only such dataset that we know of is the dataset of online comments released as part of the Jigsaw Unintended Bias in Toxicity Classification challenge.3 A subset of comments has been manually labelled as to whether they contain mentions of disabilities, as part of a larger effort to evaluate unintended biases in NLP models towards various identity terms [14]. The dataset contains 405K comments annotated for mentions of disability terms, grouped into four types: physical_disability, intellectual_or_learning_disability, psychiatric_or_mental_illness, and other_disability. We focus here only on psychiatric_or_mental_illness, since the other types of disability have fewer than 100 instances in the dataset. Of the 4889 comments labelled as having a mention of psychiatric_or_mental_illness, 21% were labelled as toxic.4

Our goal is to find words and phrases that are statistically more likely to appear in comments that mention psychiatric or mental illness compared to those that do not. We first up-sampled the toxic comments with disability mentions (to N=3859) so that we had a balanced number of toxic vs. non-toxic comments, without losing any of the non-toxic mentions of disability. We then sampled the same number of comments from those that do not mention disability, also balanced across toxic and non-toxic categories. We next extracted the unigrams and bigrams (i.e., phrases of two words) and calculated the log-odds ratio metric [27], a standard metric from natural language statistics which controls for how many co-occurrences would be expected due to chance. We manually inspected the top 100 terms that are significantly over-represented in comments with disability mentions. Most of them fall into one of the following five categories:5 condition, treatment, infrastructure, linguistic, and social.
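A simplified sketch of the log-odds computation is given below. It assumes two n-gram count tables, one for comments mentioning psychiatric or mental illness and one for the comparison sample; the z-scored log-odds with a uniform Dirichlet prior follows the spirit of [27], though the paper's exact preprocessing and prior may differ, and the variable names are illustrative.

```python
import math
from collections import Counter

def log_odds_z_scores(counts_a: Counter, counts_b: Counter, alpha: float = 1.0):
    """Z-scored log-odds ratio of each term being over-represented in corpus A
    relative to corpus B, using a uniform Dirichlet prior (alpha per term)."""
    total_a, total_b = sum(counts_a.values()), sum(counts_b.values())
    vocab = set(counts_a) | set(counts_b)
    alpha_0 = alpha * len(vocab)
    scores = {}
    for term in vocab:
        a, b = counts_a[term], counts_b[term]
        # Smoothed log-odds of the term in each corpus.
        log_odds_a = math.log((a + alpha) / (total_a + alpha_0 - a - alpha))
        log_odds_b = math.log((b + alpha) / (total_b + alpha_0 - b - alpha))
        delta = log_odds_a - log_odds_b
        # Approximate variance of the difference, as in [27].
        variance = 1.0 / (a + alpha) + 1.0 / (b + alpha)
        scores[term] = delta / math.sqrt(variance)
    return scores

# Hypothetical usage: `mention_counts` and `other_counts` would be n-gram
# Counters over comments with and without mental-illness mentions.
# top_terms = sorted(log_odds_z_scores(mention_counts, other_counts).items(),
#                    key=lambda kv: -kv[1])[:100]
```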

Table 3 shows the top 10 terms in each of these categories, along with the log-odds ratio scores that denote the strength of association. As expected, the condition phrases have the highest association; the social phrases have the next highest association, more than the treatment, infrastructure, and linguistic phrases. The social phrases largely belong to three topics: homelessness, gun violence, and drug addiction. That is, these topics are often discussed in relation to mental illness; for instance, mental health issues of the homeless population are often discussed. While these associations are perhaps not surprising, it is important to note that they significantly shape the way disability terms are represented within NLP models, which in turn may be contributing to the model biases we observed in the previous sections.

Table 3. Terms that are statistically over-represented in comments with mentions of PSYCHIATRIC_OR_MENTAL_ILLNESS in the Jigsaw Unintended Bias in Toxicity Classification challenge dataset, grouped into the five categories described above.

Condition | Score | Treatment | Score | Infrastructure | Score | Linguistic | Score | Social | Score
mentally ill | 23.1 | help | 9.7 | hospital | 6.3 | people | 9.0 | homeless | 12.2
mental illness | 22.1 | treatment | 9.6 | services | 5.3 | person | 7.5 | guns | 8.4
mental health | 21.8 | care | 7.6 | facility | 5.1 | or | 7.1 | gun | 7.9
mental | 18.7 | medication | 6.2 | hospitals | 4.1 | a | 6.2 | drugs | 6.2
issues | 11.3 | diagnosis | 4.7 | professionals | 4.0 | with | 6.1 | homelessness | 5.5
mentally | 10.4 | therapy | 4.2 | shelter | 3.8 | patients | 5.8 | drug | 5.1
mental disorder | 9.9 | treated | 4.2 | facilities | 3.4 | people who | 5.6 | alcohol | 5.0
disorder | 9.0 | counselling | 3.9 | institutions | 3.4 | individuals | 5.2 | police | 4.8
illness | 8.7 | meds | 3.8 | programs | 3.1 | often | 4.8 | addicts | 4.7
problems | 8.0 | medications | 3.8 | ward | 3.0 | many | 4.5 | firearms | 4.7
Average | 14.3 |  | 5.8 |  | 4.2 |  | 6.2 |  | 6.5


3 bit.ly/2FQ97PE
4 Note that this is a high proportion compared to the percentage of toxic comments in the overall dataset, which is around 8%.
5 We omit a small number of phrases that do not belong to one of these categories, for lack of space.

6. Discussion and Conclusion

Barriers for persons with disabilities caused by unintended machine learning biases have been, to our knowledge, largely overlooked by both the accessibility and machine learning fairness communities.  We believe that these barriers are real and are deserving of concern, due to their ability to both 1) moderate how persons with disabilities engage with technology, and 2) perpetuate social stereotypes that reflect how society views persons with disabilities. We wholeheartedly agree that "the failure to take adequate account of atypical functioning in the design of the physical and social environment may be a fundamentally different kind of wrong than the treatment of people with atypical functions as inferior beings" [28].

This study of representational harms concerning disability forms only a small part of a much larger topic of fairness and justice in machine learning that is too broad to fully explore here [29].  In order to “assess fairness in terms of the relationships between social groups, particularly the presence or absence of oppression, domination, and hierarchy... or in terms of the attitudes informing those relationships, such as the presence or absence of hatred, contempt, and devaluation” [28], it is critical that this endeavour involve collaborations with disability and accessibility communities.

References

  1. A. Cavender, S. Trewin and V. Hanson, “Accessible Writing Guide,” [Online]. Available: https://www.sigaccess.org/welcome-to-sigaccess/resources/accessible-writing-guide/. [Accessed 02 07 2019].
  2. “Common Barriers to Participation Experienced by People with Disabilities,” [Online]. Available: https://www.cdc.gov/ncbddd/disabilityandhealth/disability-barriers.html. [Accessed 02 07 2019].
  3. D. T. Mitchell and S. L. Snyder, “Representation and its discontents: The uneasy home of disability in literature and film,” Handbook of Disability Studies, pp. 195 - 218, 2001.
  4. “Stereotypes About People With Disabilities,” [Online]. Available: https://www.disabilitymuseum.org/dhm/edu/essay.html?id=24. [Accessed 02 07 2019].
  5. b. hooks, Black looks: Race and representation, Academic Internet Pub Inc, 2006.
  6. Jigsaw, “Perspective API,” 2017. [Online]. Available: http://www.perspectiveapi.com.
  7. M. De-Arteaga, A. Romanov, H. Wallach, J. Chayes, C. Borgs, A. Chouldechova, S. C. Geyik, K. Kenthapadi and A. T. Kalai, “Bias in Bios: A Case Study of Semantic Representation Bias in a High-Stakes Setting,” in ACM Conference on Fairness, Accountability, and Transparency (ACM FAT*), 2019.
  8. A. Caliskan, J. J. Bryson and A. Narayanan, “Semantics derived automatically from language corpora contain human-like biases,” Science, vol. 356, pp. 183-186, 2017.
  9. C. May, A. Wang, S. Bordia, S. R. Bowman and R. Rudinger, “On Measuring Social Biases in Sentence Encoders,” Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, vol. 1, 2019.
  10. N. Garg, L. Schiebinger, D. Jurafsky and J. Zou, “Word Embeddings Quantify 100 Years of Gender and Ethnic Stereotypes,” Proceedings of the National Academy of Science, vol. 115, 2017.
  11. T. Bolukbasi, K.-W. Chang, J. Zou, V. Saligrama and A. Kalai, “Man is to Computer Programmer As Woman is to Homemaker? Debiasing Word Embeddings,” in Neural Information Processing Systems, 2016.
  12. S. Barocas, K. Crawford, A. Shapiro and H. Wallach, “The problem with bias: from allocative to representational harms in machine learning. Special Interest Group for Computing,” Information and Society (SIGCIS), 2017.
  13. S. Garg, V. Perot, N. Limtiaco, A. Taly, E. H. Chi and A. Beutel, “Counterfactual fairness in text classification through robustness,” in 3rd AAAI/ACM Conference on AI, Ethics, and Society, 2019.
  14. L. Dixon, J. Li, J. Sorensen, N. Thain and L. Vasserman, “Measuring and mitigating unintended bias in text classification,” in AAAI/ACM Conference on AI, Ethics, and Society, 2018.
  15. S. U. Noble, Algorithms of oppression: How search engines reinforce racism, NYU Press, 2018.
  16. “Disability Inclusion,” [Online]. Available: https://www.worldbank.org/en/topic/disability. [Accessed 02 07 2019].
  17. B. A. Nosek, F. L. Smyth, J. J. Hansen, T. Devos, N. M. Lindner, K. A. Ranganath, C. T. Smith, K. R. Olson, D. Chugh, A. G. Greenwald and M. R. Banaji, “Pervasiveness and correlates of implicit attitudes and stereotypes,” European Review of Social Psychology, vol. 16, no. 6, pp. 699-713, 2007.
  18. SIGACCESS, “Accessible Writing Guide.,” 2015. [Online]. Available: https://www.sigaccess.org/welcome-to-sigaccess/resources/accessible-writing-guide/. [Accessed 02 07 2019].
  19. Anti-Defamation League, “Suggested Language for People with Disabilities,” 2005. [Online]. Available: https://www.adl.org/sites/default/files/documents/assets/pdf/education-outreach/suggested-language-for-people-with-disabilities.pdf. [Accessed 02 07 2019].
  20. V. L. Hanson, A. Cavender and S. Trewin, “Writing about accessibility,” Interactions, vol. 22, no. 6, 2015.
  21. ADA National Network, “Guidelines for Writing About People With Disabilities,” 2018. [Online]. Available: https://adata.org/factsheet/ADANN-writing. [Accessed 02 07 2019].
  22. V. Prabhakaran, B. Hutchinson and M. Mitchell, “Perturbation Sensitivity Analysis to Detect Unintended Model Biases,” in Conference on Empirical Methods in Natural Language Processing, 2019.
  23. Google Cloud, “Google Cloud NLP API, Version 1 Beta 2,” 2018. [Online]. Available: https://cloud.google.com/natural-language/. [Accessed 21 05 2019].
  24. T. Mikolov, K. Chen, G. S. Corrado and J. Dean, “Efficient Estimation of Word Representations in Vector Space,” in International Conference on Learning Representations, 2013.
  25. J. Devlin, M.-W. Chang, K. Lee and K. Toutanova, “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding,” in Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT), 2018.
  26. J. R. Firth, “A synopsis of linguistic theory, 1930-1955,” Studies in linguistic analysis, 1957.
  27. B. L. Monroe, M. P. Colaresi and K. M. Quinn, “Fightin'words: Lexical feature selection and evaluation for identifying the content of political conflict,” Political Analysis, vol. 16, no. 4, pp. 372-403, 2008.
  28. D. Wasserman, “Philosophical issues in the definition and social response to disability,” Disability and Equality Law, pp. 19-52, 2017.
  29. S. Barocas, M. Hardt and A. Narayanan, “Fairness in Machine Learning,” 2018. [Online]. Available: http://fairmlbook.org.
  30. F. Tramer, V. Atlidakis, R. Geambasu, D. Hsu, J.-P. Hubaux, M. Humbert, A. Juels and H. Lin, “FairTest: Discovering unwarranted associations in data-driven applications,” in IEEE European Symposium on Security and Privacy (EuroS&P), 2017.

About the Authors

Ben Hutchinson is a Senior Engineer in Google's Research & Machine Intelligence group, working on artificial intelligence, fairness and ethics, in Google's Ethical AI team. His interdisciplinary research includes learning from social sciences to inform the ethical development of AI. In January 2019, he taught a tutorial at FAT* on how the fields of employment and education have approached quantitative measures of fairness, and how their measures relate to and can inform developments in machine learning. Prior to joining Google Research, he spent ten years working on a variety of products such as Google Wave, Google Maps, Knowledge Graph, Google Search, Social Impact, and others. He now uses this experience to work closely with product teams as a consultant on responsible practices and the development of fair machine learning models. He has a PhD in Natural Language Processing from the University of Edinburgh.

Vinodkumar Prabhakaran is a computational social scientist, doing research at the intersection of AI and society. He is currently a research scientist at Google, working on issues around ethics and fairness in AI. Prior to this, he was a postdoctoral fellow in the computer science department at Stanford University, and obtained his Masters and PhD in computer science from Columbia University in 2015. His research brings together NLP techniques, machine learning algorithms, and social science methods to identify and address large-scale societal issues such as gender bias, racial disparities, workplace incivility, and abusive behavior online. His work has been published in top-tier NLP conferences such as ACL, NAACL, and EMNLP, as well as multidisciplinary journals such as the Proceedings of the National Academy of Sciences (PNAS).

Emily Denton is a Research Scientist in Google's Research and Machine Intelligence group, where she examines the societal impacts of AI technology. Prior to joining Google, Emily received her PhD in machine learning from the Courant Institute of Mathematical Sciences at New York University in 2018. Her research focused on unsupervised learning and generative modeling of images and videos. Emily has been awarded a Google Fellowship in Machine Learning and was supported by the Natural Sciences and Engineering Research Council of Canada (NSERC).

Kellie Webster works in Google NYC Research on various projects promoting fair user experiences for Google products. Previous published work has looked at gender bias with application to question answering and machine translation. She is currently exploring how powerful technologies like BERT should be deployed and used responsibly.

Yu Zhong is a software engineer on the Google Central Accessibility team in Mountain View, where he has been working on Android Accessibility R&D since 2015. Currently, Yu is leading the development of the Lookout Android app at Google. Yu earned his PhD in computer science with a mobile accessibility focus from the University of Rochester and worked at Carnegie Mellon University as a visiting scholar for two years.

Stephen Denuyl is an Engineering Program Manager at Google Research. He works on the processes and best practices needed to collect and understand data for machine learning. He approaches the field from a socio-technical lens and specializes in training and evaluation data that can be used to help make machine learning fairer and more inclusive for all users. Stephen completed his graduate work in Sociology at the University of Chicago.

Appendix

Expressions for Disability

Table 4 and Table 5 show the “recommended” and “non-recommended” phrases that were used in the experiments. These phrase lists were compiled based on guidelines published by the Anti-Defamation League, SIGACCESS, and the ADA National Network. The grouping of the phrases into “categories” was done by the authors.

Table 4. "Recommended" phrases used for analyzing the sensitivity of machine learning models to mentions of disability.

Category | Phrases
hearing | a deaf person, a person who is deaf
sight | a blind person
unspecified | a person with a disability
mobility | a person in a wheelchair, a wheelchair user, a person who walks with a limp
cerebral_palsy | a person with cerebral palsy
mental_health | a person with a mental illness, a person with depression, a person with a mental health disability, a person with schizophrenia
epilepsy | a person with epilepsy
physical | a person with spinal curvature, a person with paraplegia, a person with quadriplegia
chronic_illness | a person who is chronically ill
short_stature | a short-statured person, a little person
cognitive | a person with dyslexia, a person with ADHD
downs_syndrome | a person with Down's syndrome
without | a person without a disability




Table 5. "Non-recommended" phrases used for analyzing the sensitivity of machine learning models to mentions of disability. Despite the offensive and potentially triggering nature of some of these phrases, we include them here i) to enable repeatability of our analysis and ii) to document the mapping from phrases to categories that we used.

Category | Phrases
hearing | the deaf, a deaf mute person, a deaf and dumb person
sight | a sight deficient person, a person with sight problem
unspecified | the disabled, a disabled person, a differently abled person, a handi-capable person, a crippled person, a cripple, a deformed person
mobility | a person restricted to a wheelchair, a person confined to a wheelchair, a wheelchair-bound person, a handicapped person, a lame person
cerebral_palsy | a cerebral-palsied person
mental_health | an insane person, an emotionally disturbed person
epilepsy | an epileptic person
physical | a paraplegic, a quadriplegic, a physically challenged person, a hunchbacked person
chronic_illness | an invalid
short_stature | a midget, a dwarf
cognitive | a retarded person, a deranged person, a deviant person, a demented person, a slow learner
downs_syndrome | a mongoloid
without | a normal person

Text classification analyses for individual phrases

Figures 3 and 4 show the sensitivity of the toxicity and sentiment models to individual phrases.

Figure 3. Average change in toxicity model score when substituting each phrase, compared to using a pronoun.
Figure 4. Average change in sentiment model score when substituting each phrase, compared to using a pronoun.

Tabular version of results

In order to facilitate different modes of accessibility, we here include results from the experiments in table form.

Table 6. Average change in NLP model score when substituting a recommended phrase, or non-recommended phrase for a person with a disability, compared to using a pronoun. Many recommended phrases around disability are associated with toxicity/negativity, which might result in innocuous sentences discussing disability being penalized.

 

Toxicity: higher = more toxic. Sentiment: lower = more negative.

Category | Toxicity, Recommended | Toxicity, Non-Recommended | Sentiment, Recommended | Sentiment, Non-Recommended
cerebral_palsy | -0.02 | 0.08 | -0.06 | -0.02
chronic_illness | 0.03 | 0.01 | -0.09 | -0.27
cognitive | -0.00 | 0.12 | -0.02 | -0.02
downs_syndrome | 0.02 | 0.14 | -0.14 | -0.01
epilepsy | -0.01 | 0.02 | -0.03 | -0.03
hearing | 0.03 | 0.12 | -0.02 | -0.09
mental_health | 0.02 | 0.07 | -0.03 | -0.15
mobility | -0.01 | 0.03 | -0.11 | -0.03
physical | -0.00 | 0.02 | -0.02 | -0.00
short_stature | 0.02 | 0.06 | -0.01 | -0.03
sight | 0.04 | 0.03 | -0.02 | -0.03
unspecified | 0.00 | 0.04 | -0.05 | -0.10
without | -0.00 | 0.00 | -0.05 | -0.02
Aggregate | 0.01 | 0.06 | -0.04 | -0.06




Table 7. Frequency with which top-10 word suggestions from the BERT language model produce negative sentiment scores when using recommended phrases.

Category | Frequency of negative sentiment score
cerebral_palsy | 0.34
chronic_illness | 0.19
cognitive | 0.14
downs_syndrome | 0.09
epilepsy | 0.16
hearing | 0.28
mental_health | 0.19
mobility | 0.35
physical | 0.23
short_stature | 0.34
sight | 0.29
unspecified | 0.2
without | 0.18