
Ai-Da Robot poses for pictures with a self portrait in the Houses of Parliament in London before making history as the first robot to speak at the House of Lords. PA
VOICES

International Women's Day: Does AI have a woman problem?

Fair representation in datasets is a little-known battle front in the fight for equality, writes Sarah Jane Delany.

ON INTERNATIONAL WOMEN’S Day, we must ask – does AI have a woman problem? 

A global tech company receives thousands of job applications every year. Imagine an AI system that processes these applications to screen out unsuitable candidates. Suppose that system has been trained on a dataset that includes all successful and unsuccessful CVs from the preceding 10 years. So far, so efficient.

Now imagine your name is Michael. No one by the name of Michael has been hired by this tech company in the last 10 years. The AI screening system ‘learns’ that all candidates with the name Michael have been unsuccessful in their applications. The system concludes that people called Michael must be unsuitable for recruitment.

This is an absurd example: AI systems of this type, already in widespread use, are anonymised and do not screen by name. However, there are many other details buried in CVs that can act as 'signals' to an AI system, flagging as 'unsuitable' categories of people who were rejected in the past for reasons unrelated to job suitability.

We already know that women are underrepresented in many employment areas, including tech. What details peculiar to rejected female candidates might ‘teach’ an AI system to reject them again?
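To make the mechanism concrete, the hypothetical sketch below trains a toy CV classifier on invented historical data (the names, CV text and the use of scikit-learn are illustrative assumptions, not a real screening system). Because every past 'Michael' in this toy dataset happened to be rejected, the model attaches a negative weight to the name itself.

```python
# A minimal sketch of how a screening model can learn a spurious 'signal'
# from historical hiring data. The CVs, labels and tokens are invented;
# scikit-learn is used purely for illustration.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Toy historical data: every past applicant named Michael happened to be rejected.
cvs = [
    "michael python developer cloud",
    "michael java engineer testing",
    "aoife python developer cloud",
    "niamh java engineer testing",
]
hired = [0, 0, 1, 1]  # 1 = hired, 0 = rejected

vectoriser = CountVectorizer()
X = vectoriser.fit_transform(cvs)
model = LogisticRegression().fit(X, hired)

# Inspect what the model has 'learned': the name token carries a negative
# weight even though it says nothing about job suitability.
for token, weight in zip(vectoriser.get_feature_names_out(), model.coef_[0]):
    print(f"{token:>10s}  {weight:+.2f}")
```

Remove the name and the model will simply latch onto the next detail that correlates with past rejections, which is exactly the concern with proxy signals buried in CVs.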

We have evidence of this learned prejudice already. In 2015, tech giant Amazon's machine-learning specialists discovered that their new AI-powered recruitment system had a woman problem.

The company’s hiring tool used AI to rate candidates utilising data from 10 years of job applications. Most CVs in the dataset were from male applicants – a common phenomenon in the tech sector.

Not surprisingly, the system 'preferred' male candidates. It downgraded CVs that included the word 'women's' (e.g. 'women's basketball team' or 'Women in STEM programme').

Graduates of all-women’s colleges in the US were automatically demoted.

Amazon engineers reprogrammed the system to remove this particular example of bias. This episode reveals, however, a fundamental weakness of AI systems that 'learn' from datasets.

What data sets are we using and what are they teaching our AI?

Where else might biased or incomplete datasets create problems in our day-to-day lives, in ways we are not even aware of?

My work at Technological University Dublin is concerned with this very question. AI cannot be representative if the data it learns from is not representative. 

The Equal Status Acts 2000-2018 cover nine grounds of discrimination: gender, marital status, family status, age, disability, sexual orientation, race, religion, and membership of the Traveller community.

As we move to automate many public-facing systems – social welfare, government, law, healthcare screening, recruitment, parole, education, banking – we must ensure that we are not building decision-making machines that have learned from datasets that leave out or militate against members of any of these nine groups.

Image recognition software is a case in point.

As a judge in the BT Young Scientists Competition, I recently reviewed a project by Solomon Doyle of Dundalk Grammar School. Doyle created a mobile app to diagnose malignant skin lesions by analysing a photo and searching for similar characteristics in a database of images of diagnosed malignancies.

Doyle optimised his system to improve accuracy for people of colour – he discovered biases within existing software that had been trained primarily on images of white skin. Doyle went on to win the Analog Technology Award.

How do we correct for these biases?

Firstly, we need diversity in tech. Humans build AI, and where we have diverse groups developing software, we are more likely to identify and eradicate biases as they emerge.

Secondly, we must put processes in place to evaluate datasets before we use them to build decision-making systems.
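What such a check might look like will vary from system to system, but the sketch below gives a flavour: a quick audit of a wholly invented tabular training set, summarising how each gender is represented in the data and in its historical outcomes. The column names and figures are assumptions for illustration only.

```python
# Hypothetical dataset audit before training: how is each group represented,
# and how do historical outcomes differ between groups? Column names and
# values are invented for illustration.
import pandas as pd

data = pd.DataFrame({
    "gender": ["F", "F", "F", "M", "M", "M", "M", "M"],
    "hired":  [0,   0,   1,   1,   1,   0,   1,   1],
})

summary = data.groupby("gender")["hired"].agg(count="size", positive_rate="mean")
summary["share_of_dataset"] = summary["count"] / len(data)
print(summary)
# A skewed share_of_dataset or positive_rate is a prompt to investigate
# before the dataset is used to train a decision-making system.
```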

Natural language models, of the sort that underpin chatbots, have been shown to reflect gender bias present in their training data.

This bias can carry through to the downstream tasks that machine-learning models built on that training data are meant to accomplish.

Several techniques have been proposed to mitigate gender bias in training data.
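One commonly cited family of techniques is counterfactual data augmentation, in which gendered words in the training text are swapped to produce balanced counterparts. The sketch below illustrates that general idea with a tiny invented word list; it is not the specific approach used in the studies described next.

```python
# Simplified counterfactual data augmentation: swap gendered terms to
# rebalance the training text. The word list is a tiny illustrative subset;
# real systems use larger curated lexicons and handle grammar carefully.
SWAPS = {"he": "she", "she": "he", "his": "her", "her": "his",
         "man": "woman", "woman": "man", "actor": "actress", "actress": "actor"}

def counterfactual(sentence: str) -> str:
    """Return a copy of the sentence with gendered terms swapped."""
    return " ".join(SWAPS.get(word, word) for word in sentence.lower().split())

corpus = ["He is a talented engineer", "She left her job as an actress"]
augmented = corpus + [counterfactual(s) for s in corpus]
print(augmented)
# Training on the augmented corpus exposes the model to both gendered
# versions of each sentence, reducing one source of imbalance.
```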

In one study at TU Dublin, we compare different gender bias mitigation approaches on a classification task to see which are most effective.

In a second study, we compare and evaluate different approaches to labelling datasets for gender. Building knowledge in this field is essential to support AI researchers in handling the datasets they use to build the systems we increasingly rely upon to make decisions on our behalf.

Thirdly, we must evaluate the decision-making systems themselves for fairness and inclusion.

In the case of gender bias, does the system behave differently for women than it does for men? The same question can be asked, and extended, for any of the groups covered by the nine grounds of discrimination.
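One simple way to put numbers on that question is to compare the rate at which a trained system selects people from each group, sometimes described as a demographic parity check. The sketch below uses invented predictions and group labels purely to show the comparison.

```python
# Hypothetical fairness check: does the system select one group at a
# noticeably different rate than another? Predictions and group labels
# are invented for illustration.
import numpy as np

predictions = np.array([1, 0, 1, 0, 0, 1, 1, 0, 1, 1])   # 1 = recommended
groups      = np.array(["F", "F", "F", "F", "F", "M", "M", "M", "M", "M"])

rates = {g: predictions[groups == g].mean() for g in np.unique(groups)}
print(rates)                                   # selection rate per group
print("gap:", abs(rates["F"] - rates["M"]))    # demographic parity difference
```

A large gap here, or a similar gap in error rates, is a signal that the system behaves differently for the two groups and needs further scrutiny.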

The work of creating a fairer society for all consists of examining the reproduction of prejudice wherever we find it.

Artificial Intelligence is a brave new world, but without vigilance it will carry forward the worst aspects of a blinkered old one.

Sarah Jane Delany is a Professor of Inclusive Computer Science at TU Dublin and a research collaborator with the Insight SFI Research Centre for Data Analytics
