To reduce AI bias, IBM has created a more diverse dataset of one million faces
Although the technology itself is neutral, human biases inevitably creep in during the development of artificial intelligence (AI). To reduce this bias, IBM Research has just released a more diverse dataset of one million faces. In recent years, with the popularity of smartphones, facial recognition has been widely used in many fields. Yet in some tests, even seemingly excellent AI systems have failed.
Given that many of these failures correlate with certain skin tones or ages, IBM Research hopes to further reduce this bias.
Obviously, this is a multi-layered problem, due in large part to a lack of forethought by developers and creators. Without a comprehensive face dataset, an AI will inevitably pick up bias during training.
With the new million-image Diversity in Faces (DiF) dataset, AI developers will be able to account for a much wider range of facial features. The paper explains:
For facial recognition to perform as required (fairly and accurately), the training data must provide sufficient balance and coverage. It should be large and diverse enough to capture the inherent variation among faces. The images must reflect the diversity of facial features we see in the world.
Reportedly, these faces were drawn from a larger collection of 100 million Flickr Creative Commons images. Another machine learning system was run over that collection to find as many faces as possible, which were then isolated and cropped before the real work began.
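The article does not describe the pipeline itself, but the isolate-and-crop step can be sketched roughly as follows. This is a minimal sketch: the `crop_faces` helper, the box format `(x, y, width, height)`, and the margin value are all assumptions, and a real pipeline would obtain the boxes from a face detector rather than hard-coding them.

```python
import numpy as np

def crop_faces(image, boxes, margin=0.2):
    """Isolate and crop each detected face, padding the bounding
    box by a relative margin and clamping to the image borders."""
    h, w = image.shape[:2]
    crops = []
    for (x, y, bw, bh) in boxes:
        mx, my = int(bw * margin), int(bh * margin)
        x0, y0 = max(0, x - mx), max(0, y - my)
        x1, y1 = min(w, x + bw + mx), min(h, y + bh + my)
        crops.append(image[y0:y1, x0:x1])
    return crops

# A dummy 100x100 "image" with one detector box.
image = np.zeros((100, 100, 3), dtype=np.uint8)
faces = crop_faces(image, [(30, 30, 40, 40)], margin=0.1)
print(faces[0].shape)  # (48, 48, 3)
```

Clamping to the image borders matters because a padded box near the edge of a photo would otherwise index outside the array.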
Diverse and accurate labeling
Because these images will be ingested by other machine learning algorithms, they require diverse and accurate labels.
The DiF dataset contains a million faces, each with metadata describing features such as the distance between the eyes and the size of the forehead.
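One of the simplest such measures, inter-eye distance, can be computed directly from facial landmark points. The landmark names and the `(x, y)` point format below are illustrative assumptions, not the dataset's actual schema.

```python
import math

def inter_eye_distance(landmarks):
    """Euclidean distance between the left and right eye centers,
    in pixels. `landmarks` maps feature names to (x, y) points."""
    (lx, ly) = landmarks["left_eye"]
    (rx, ry) = landmarks["right_eye"]
    return math.hypot(rx - lx, ry - ly)

landmarks = {"left_eye": (40.0, 52.0), "right_eye": (76.0, 52.0)}
print(inter_eye_distance(landmarks))  # 36.0
```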
By combining the measures above, a system can match an image against an individual's facial signature, but it must still be asked whether the algorithm works equally well for every ethnic group.
With that in mind, the IBM team put together a set of annotations that go beyond simple measurements to describe relationships between them, such as the ratio of the area above the eyes to the area below the nose, skin tone, and contrast.
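The relational annotations could be derived from an image and its landmarks along these lines. Both measures here are hypothetical stand-ins for illustration: a region-area ratio split at the eye line and the nose, and a crude mean skin-tone sample from a small patch around the nose tip.

```python
import numpy as np

def annotate_face(image, landmarks):
    """Derive two illustrative relational annotations: the ratio of
    the facial area above the eye line to the area below the nose,
    and the mean color of a 5x5 patch around the nose tip."""
    h, w = image.shape[:2]
    eye_y = int((landmarks["left_eye"][1] + landmarks["right_eye"][1]) / 2)
    nose_x, nose_y = (int(v) for v in landmarks["nose_tip"])
    above_eyes = eye_y * w            # pixels above the eye line
    below_nose = (h - nose_y) * w     # pixels below the nose tip
    patch = image[nose_y - 2 : nose_y + 3, nose_x - 2 : nose_x + 3]
    return {
        "region_ratio": above_eyes / below_nose,
        "mean_skin_tone": patch.mean(axis=(0, 1)).tolist(),
    }

image = np.full((100, 100, 3), 180, dtype=np.uint8)
landmarks = {"left_eye": (40, 40), "right_eye": (60, 40), "nose_tip": (50, 60)}
print(annotate_face(image, landmarks))
```

Ratios like these are useful precisely because they are relational: unlike raw pixel distances, they stay comparable across images of different resolutions.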
In addition, each subject's age can be estimated automatically; human annotators were also asked to label each face as male or female and to guess its age. Of course, there will be some deviation in these labels, but all of it can be understood at a broader scale than with any other publicly available facial recognition training dataset.
John R. Smith, an IBM researcher who led the research, said in an email:
Culturally and biologically, the boundaries between races are not clear-cut. We chose to focus on coding schemes that can be measured reliably, to support diversity analysis at scale.