The Performance Audit Report

24 | Cathy O'Neil, Algorithm Auditor, Author of Weapons of Math Destruction

March 1, 2023
Episode Summary

Our guest is Cathy O'Neil, author of Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy.

Cathy joined us to discuss critical aspects for auditors to consider, including:

  • The human element: why understanding 'for whom an algorithm might fail' is more important than the detailed technical design of an algorithm.
  • Algorithm testing: the importance of post-deployment testing to ensure there are no unintended adverse outcomes.
  • Setting standards: the need for industry-wide standards so that regulators can ensure there is no bias or discrimination in how algorithms operate.
  • Laggards and leaders: the four big industries/domains using algorithms that need to step up auditing in their space, and how lawmakers and regulators are catching up.
  • Emerging AI (e.g. language models/chatbots): why user trust is key, and the pros and cons of using these models.


Episode Transcript

Conor: 

Hello and welcome to this special episode that will feature in both The Assurance Show and The Performance Audit Report. My name's Conor McGarrity.

Yusuf: 

And I'm Yusuf Moolla, and we are really excited today to be talking to Cathy O'Neil, author of Weapons of Math Destruction, a New York Times bestseller that has been praised in several well-known publications, has won the Euler Book Prize, and was longlisted for the National Book Award. Cathy is a mathematician turned quant turned algorithm auditor. She received her PhD in mathematics from Harvard and has worked in finance, tech, and academia, including launching a data journalism program at Columbia University. Cathy recently founded ORCAA, an algorithm auditing company. Cathy, thanks for joining us.

Cathy: 

Thanks for having me.

Yusuf: 

What inspired you to write the book, Weapons of Math Destruction?

Cathy: 

I entered finance. I was a quant at a hedge fund called D.E. Shaw, just as the credit crisis started, actually. I entered in June or July of 2007, and then within two months, September 2007, the first ripples in the credit markets were happening. It wasn't really obvious to the outside world for another year, but it was obvious within finance immediately. I just never had a moment there where I was like, oh, this is good. It was more like, what the hell is going on, from the get-go. And the explanations were varied, but most of them centered around the housing market and a bunch of very risky derivatives and securities that nobody really understood, that were opaque, but were trusted because of things called credit ratings, like AAA credit ratings from Moody's, S&P, and Fitch. So I had an early introduction to this essential trust that people had in mathematical risk scores, and in mathematical modeling in general. Larry Summers was working at D.E. Shaw. I was supposed to work on his project, so I was having meetings with him, and he was one of those guys who had said, oh, we can trust derivatives because they distribute risk in a good way, in a trustworthy way. But he also didn't understand anything at all about the way the credit markets were going, were trembling. Fast forward a few years: I had left finance, I joined Occupy. I was disgusted by the part I had played in that system, and I started working as a data scientist thinking, oh, this will be better. It wasn't better. It was the same, actually. We had a lot of trust. People trusted us. We weren't predicting markets, we were predicting people. But that's kind of worse, because it meant that if we had a flawed algorithm, it wasn't just going to be obviously problematic at a public markets level; rather, it was going to be problematic for individuals who were applying for jobs and getting rejected unfairly, or people who were given higher APRs on their credit cards than they should be. By the way, I should also mention that, working as a data scientist, I realized pretty quickly that the most obvious statistical signals I was picking up with my predictive models were basically money first, then gender, then race. And I was like, oh, okay. So essentially I am making lucky people luckier and unlucky people unluckier here, if you will, and I'm doing it secretly, and people are trusting me. And so it was that combination of things where I was like, oh, this is bad. I'm in another system that's crappy and that's making things worse. So that's really why I wrote the book, because I was like, okay, enough is enough. Is there a way to do better than this? I wanted to alarm people. Obviously, I tried within my field, but guess what? Nobody wanted to be alarmed. Nobody wanted to lose that opportunity to make good money and to be trusted and to be hailed as some kind of genius and hero. So the last point was that I wanted to tell the public in general.

Yusuf: 

A lot of your book focuses on auditing algorithms. What exactly is algorithm auditing?

Cathy: 

Well, there are essentially three different types of algorithmic auditing from my perspective, but they all center on this question: does this algorithm work as expected? Algorithms are generally seen as opaque, complicated, sophisticated objects. But if you just think of them as things that take in inputs and put out outputs, usually risk scores, almost entirely risk scores in my experience, then you can ask questions in plain English and see what the answers are. That's the kind of work I do: translating plain English questions into statistical tests. You could say, for example, for a credit algorithm that's trying to determine whether somebody deserves a student loan: does this treat white people fairly versus black people, or black people fairly versus white people? And you have to define what exactly that means. What is the definition of fair? In this case, maybe we care about false positives or false negatives; you have to make that very precise. You have to say something like, is the false positive rate higher for black people than for white people? And then you have to define, well, what does it mean for two people to be similarly qualified, but one of them is denied when the other one isn't? The work there is defining qualification. Then you do a test, and it's not that different from the tests that sociologists have historically done. The sociologists would send equally qualified applications for law students that didn't really exist, theoretical law students, to summer law internship programs. And they would change small amounts of information, like the names, or some other thing that theoretically shouldn't matter. Then they would see whether people with white-sounding or male-sounding names would get interviews more often than people with black-sounding or female-sounding names. And basically the answer is always yes, in the sense that there always is that sexism or racism or ageism or whatever it is they're looking for. And the question is, how much is unacceptable? What is the threshold of acceptability? So those are all questions you might ask in that situation. I just do it statistically with a computer. And it's actually much faster and easier, because you don't have to draw up a bunch of applications. You just literally send a bunch of queries to the same predictive algorithm, and the predictive algorithm spews out the risk scores of everybody involved, and that defines the thresholds. So there are lots of questions there, obviously, and they're human questions. They're human value questions. Number one, of course: what is race? Race is a social construction, but of course racism is very real, so we have to grapple with that, and we have to grapple with gender. Gender is also, in a lot of ways, a construction, but sexism is also real. And then, what does it mean to be qualified, and why is it false positives or false negatives that you're worried about rather than something else? So there are lots of choices in that. And really, my job, if you think about what I just described, the easy part is the test. My job is mostly being a translator of values into choices at a statistical level, and then making sure that everyone in the room understands those choices. And by the way, the final thing I'd say is that I never do only one test, because there's ambiguity in those choices. I would do a battery of tests across all such choices.
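To make that concrete, here is a minimal sketch of one test in such a battery: a false positive rate comparison between two groups against a black-box scoring function. The data fields, the toy scorer, and the threshold are illustrative assumptions, not ORCAA's actual methodology.

```python
# Minimal, illustrative sketch: compare false positive rates between two
# groups for a black-box risk-scoring function. Field names, the toy scorer,
# and the threshold are assumptions for illustration only.

from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Applicant:
    features: Dict[str, float]  # inputs the algorithm under audit actually sees
    repaid: bool                # ground truth used to define "qualified"
    group: str                  # e.g. race, self-reported or inferred

def false_positive_rate(applicants: List[Applicant],
                        score: Callable[[Dict[str, float]], float],
                        threshold: float) -> float:
    """Share of qualified applicants (repaid=True) whose risk score crosses
    the denial threshold, i.e. who are wrongly flagged as risky."""
    qualified = [a for a in applicants if a.repaid]
    if not qualified:
        return float("nan")
    flagged = sum(score(a.features) >= threshold for a in qualified)
    return flagged / len(qualified)

def fpr_gap(applicants: List[Applicant],
            score: Callable[[Dict[str, float]], float],
            threshold: float, group_a: str, group_b: str) -> float:
    """One test in the battery: FPR(group_a) minus FPR(group_b)."""
    of = lambda g: [a for a in applicants if a.group == g]
    return (false_positive_rate(of(group_a), score, threshold)
            - false_positive_rate(of(group_b), score, threshold))

# Toy scorer standing in for the real algorithm, plus two sample applicants.
toy_score = lambda f: 0.9 if f["debt"] > f["income"] else 0.2
sample = [
    Applicant({"income": 40_000, "debt": 10_000}, repaid=True, group="A"),
    Applicant({"income": 40_000, "debt": 50_000}, repaid=True, group="B"),
]
print(fpr_gap(sample, toy_score, threshold=0.5, group_a="B", group_b="A"))  # 1.0
```

In practice, as Cathy notes, you would run many such tests, varying the definition of "qualified", the groups compared, and the error metric, rather than relying on a single number.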

Yusuf: 

So, while you were talking there, and also quite heavily in your book, you talk about a correlation between, for example, credit ratings and wealth, and how that then correlates with race. Those are obviously potential proxy indicators. Can you tell us a bit more about this? Because this is possibly the least understood thing. It's quite easy to look at data and know, by looking at individual fields, what they mean. But those correlations may not be apparent, may not be easy to recognize.

Cathy: 

Yes, certainly. Credit scores are a big deal. They matter a lot, and they're used for things that are completely inappropriate by their definition, things like whether you deserve a job or housing. But they were invented to be sort of legal alternatives for bank loan officers to use to decide whether someone deserved a loan, or how risky they were for a loan. And this was in reaction, in fact, to all of the demonstrated racism and sexism that bank officers often used. The history of it is fascinating, but they were created to be legal, to be non-racist, non-sexist. And indeed they aren't very sexist, but they are racist. When I say that, what I mean is that they're correlated with race. So they're definitely proxies for race, and to use them in something like a housing algorithm, like, do you deserve this housing, is problematic. It basically excludes people of color. But why? Well, on its face, scores like this are for the most part measuring your ability to pay your bills and your actual practice of paying your bills, and so they're proxies for, are you going to pay back this loan? If you're thinking of giving somebody a credit card, it's quite a strong proxy, an almost incredibly direct proxy. Are you paying your electricity bill? Are you paying your other credit card bills? Does your bank account have money in it? Are you getting paid regularly? Do you have other outstanding loans that you haven't paid back? I'm not an expert on exactly what they collect, but it's along those lines. Of course, in the States we don't have universal health care, so there's a huge number of very large medical bills that nobody can afford if they don't have insurance, and historically uninsured people are a large percentage of the population. Those folks tend to be poorer, and so the people who get stuck in their finances because of a large medical bill they didn't see coming tend to be poorer. And of course we also have a correlation between race and poverty. And that's just medical bills alone, unforeseen medical problems and the bills that result. That's not the only example, but the larger point is that there's a kind of tax on poor people that makes their finances more chaotic, leading to lower FICO scores. And then the larger correlation is that people of color tend to be poorer. So we have that correlation there. By the way, what I really just argued is that there's a correlation between poverty and bad FICO scores, and then of course a correlation between poverty and people of color. That's how we're getting that correlation. There might be more to it. I would love to do this analysis: is there a correlation of FICO scores to race above and beyond the correlation to poverty? I don't know.
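As a rough illustration of the analysis Cathy says she would love to do, the sketch below regresses a synthetic credit score on income plus a race indicator; a nonzero race coefficient after controlling for income would be one simple reading of "above and beyond the correlation to poverty". The data is entirely synthetic and the setup is an assumption for illustration, not an analysis from the episode or the book.

```python
# Sketch (illustrative only, synthetic data): is there an association between
# credit score and race after controlling for income? The race coefficient in
# a linear regression is one simple way to frame "above and beyond".

import numpy as np

rng = np.random.default_rng(0)
n = 1_000
income = rng.normal(50_000, 15_000, n)               # synthetic income
race = rng.integers(0, 2, n)                         # synthetic 0/1 indicator
score = 500 + 0.002 * income + rng.normal(0, 30, n)  # toy score driven by income only

# Design matrix: intercept, income, race indicator
X = np.column_stack([np.ones(n), income, race])
beta, *_ = np.linalg.lstsq(X, score, rcond=None)

print(f"race coefficient controlling for income: {beta[2]:.2f}")
# Near zero here by construction; on real data a clearly nonzero coefficient
# would suggest a race association beyond what income alone explains.
```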

Yusuf: 

So it could be just because of the way the scores are calculated, and maybe the fact that they're all very quantitative and based on historical observation of a particular cohort that would be getting finance, as opposed to being a bit more qualitative and looking at other factors.

Cathy: 

Yeah, I mean, speaking loosely, I would say that the FICO score is measuring the chaos in your life. To have a perfect FICO score, you have to have no chaos. You have to have everything perfectly under control, with all the insurance you could possibly imagine, so that nothing happens unexpectedly that you can't afford. Your finances have to be absolutely perfect, and it's rare that you can accomplish that. So it's not just about good planning; it's also about an overall sense of security, and that means being embedded in a system where, if something really bad happens, you can depend on wealthy relatives. That's another thing we know happens less for people of color: they have less inherited wealth and intergenerational wealth. So all I'm saying is that it's part of a larger system, not just an individual's ability to anticipate rent.

Conor: 

I just wanted to touch on that notion of bias as a potential perverse outcome from algorithms and how that plays out. I think it was 2016 when the book was published, and presumably there's been huge growth in the use of algorithms since. What are some of the examples of potential bias in public services that you've encountered in your work since the book?

Cathy: 

When I was writing the book, I was amazed by how widespread really crappy algorithms were. But of course, by the time it was published, and since then, it's become almost comical, if it weren't as tragic as it is, to see terrible algorithms deployed for really, really important things. You name a country, there's been an embarrassing story. The Netherlands had a welfare fraud scandal where people were denied their living expense payments. That happened in Australia too, for people who were accused of fraud in some kind of welfare system. In the UK, during Covid, high school kids couldn't take their A-level tests, so an algorithm distributed scores to them based basically on how kids in that school had done in the past. It was just outrageous. A lot of kids who were very good students, theoretically, weren't able to go to the college they'd already been accepted to because of this assigned A-level result. It was outrageous. It's both unsurprising and yet a very sad story about the continued trust that people seem to imbue in automated systems, as if they're more trustworthy than God. Why do we think that? I still don't know the answer to that, Conor. I think there's something about human nature where, as soon as it becomes technical, we close our eyes and just fall asleep. As soon as we think something's beyond us in some kind of metaphysical way, we stop asking questions, and I don't understand it. But it is a real problem for us, because as we're using these algorithms, they destroy lives. They often destroy lives. I will just add, by the way, that they don't destroy all of the lives. One of the other aspects of this weird trust that we have in these terrible systems is that those systems are actually optimized by the deployer. In other words, they aren't deployed with no testing whatsoever; they're deployed with a very narrow notion of success. So in the case of the fraud detection and welfare-type algorithms, you would think they're so bad, you'd think they weren't tested at all. And it is shameful how little testing they underwent, but they were tested for, would this save money for us? There's one way you could be sure these algorithms were tested, which is, does it make my job easier if I use this? Whoever is deploying it has already thought through what works for them. And for that reason, Conor, the way my company ORCAA does algorithmic audits is directly related to that point I just made, which is to say: yes, it works for you, that's why you deployed it. For whom does this fail? And honestly, that's the only thing we do. It's not rocket science. We literally just ask that question, for whom does this fail, a million times. We ask it in a bunch of different ways. We say, who are the stakeholders? What would it mean to them if it failed? Let's talk to them. Let's have an actual conversation with the people who are impacted by this, and ask them what they're worried about for this system, and let's investigate whether their concerns are grounded. So it's basically expanding the notion of whether an algorithm works. It doesn't work if it works for the people who deploy it but for nobody else. It works if it works for everybody. And of course, the sad fact is that no algorithm really works for everybody, but at the very least you have to be explicit about balancing the stakeholders' needs and the stakeholders' harms against each other.

Conor: 

Just wanna tease out one particular thing there. As auditors, we are gatekeepers, so to speak, or we have a view in the public interest, as you've just described. So where do we start as auditors? What's our role in looking at algorithms?

Cathy: 

Most of the work I do is human: ferreting out what the things are that could go wrong, how bad they would be, who they would impact, and what they would look like. I want to emphasize that that's not technical. This is not a technical discussion. It only gets technical at the very last step, when you're like, okay, this group is worried about a particular type of harm, and we've translated it into precisely what they mean by that, or a couple of versions of precisely what they could mean by that, and now we have to go and actually see if this algorithm is doing that to them. So it becomes technical because, for example, if I'm trying to audit an insurance algorithm, they don't collect gender or race information about their customers, so I have to infer that. I have inference methodologies, and that's technical, but it's not that technical, and everyone understands what I'm trying to do. I can even explain the kinds of failures that these inference methodologies have, and maybe you can account for those kinds of failures. So that's a step in the process. But you actually have to talk to people about what could go wrong. You know what I mean? It's not rocket science. There are a few steps where I'm like, oh, now we did this statistical test, and how do we decide whether it's statistically significant or whatever. But that's the smallest part of it, to be honest.
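For context on what an inference methodology can look like, one well-known approach in this space is Bayesian Improved Surname Geocoding (BISG), which combines surname and neighborhood statistics to estimate race probabilities when the attribute isn't collected. The sketch below is a simplified toy version of that idea with invented probability tables; it is not ORCAA's method, and real implementations use Census data and also account for base rates.

```python
# Toy illustration of a BISG-style inference step: combine P(race | surname)
# with P(race | census tract) and renormalize to estimate race probabilities
# when the protected attribute isn't collected. Tables are invented purely
# for illustration; real implementations use Census surname/geography data.
# Simplified: a full BISG posterior also divides by the base-rate prior P(race).

ILLUSTRATIVE_SURNAME = {            # P(race | surname), made up
    "GARCIA":     {"white": 0.05, "black": 0.01, "hispanic": 0.92, "other": 0.02},
    "WASHINGTON": {"white": 0.09, "black": 0.87, "hispanic": 0.02, "other": 0.02},
}
ILLUSTRATIVE_GEO = {                # P(race | census tract), made up
    "tract_A": {"white": 0.70, "black": 0.10, "hispanic": 0.15, "other": 0.05},
    "tract_B": {"white": 0.20, "black": 0.60, "hispanic": 0.15, "other": 0.05},
}

def infer_race_probs(surname: str, tract: str) -> dict:
    """Combine the two conditional tables and renormalize (naive Bayes step)."""
    s = ILLUSTRATIVE_SURNAME[surname.upper()]
    g = ILLUSTRATIVE_GEO[tract]
    joint = {race: s[race] * g[race] for race in s}
    total = sum(joint.values())
    return {race: p / total for race, p in joint.items()}

print(infer_race_probs("Washington", "tract_B"))
```

The known failure modes Cathy alludes to show up directly here: the estimate is only as good as the surname and geography tables, and it can systematically misclassify individuals even when it is roughly right in aggregate.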

Conor: 

Do you see a role for regulators and lawmakers, and for trying to bring them together to establish some sort of standards for algorithmic auditing? Or what is their role in this space?

Cathy: 

Yeah, that's actually what I do. That's my favorite part of my job. I also have invitational audits: we have companies that come to us and say, we have this algorithmic system, we want to make sure it's not harming people. And that's great; we love doing those kinds of audits. But my favorite kind of audit is when I'm working with a regulator to decide what it means for a company I regulate, or an industry I regulate, to be compliant with this particular law in the age of algorithms. So, all of these companies that I regulate use algorithms to decide, what's the premium for this insurance product, and how do I decide whether it's racist, because there's an anti-discrimination law that I have to enforce? And I'm like, yeah, that's what we do. That's exactly what we do. And it's better to do it for an industry than for a particular company, because if you're doing it for a particular company, you could say, well, this isn't perfect, nothing's perfect, but we don't know whether it's bad or pretty good relative to your peers, and we don't know what the standards are. All I can say to an individual company is, try to get better. Whereas if you're talking to a regulator, the regulator can actually ask the same question of all companies in the industry and then set standards, and then say, these companies are doing pretty well, this company is a straggler, that company's doing even better. And by the way, because of the company that's doing even better, we know we can all get there, so we can set standards and push them towards better. So that's the exciting thing about it. And I'll just go up a level: when big data started, the most recent incarnation of this data revolution, the whole point was that computers can be fairer than humans. And that was a bullshit suggestion, given that we weren't making them better; we were just following the data and repeating historical practices of bias. So it was empty as a promise. But if we do the work of saying, actually, this isn't good enough, it's biased, let's make it better, let's set standards and let's make those standards tighter over time, and you're going to have to work at it to make this work, then we could actually fulfill that promise that computers are better than humans. When I say better, I don't mean unbiased, because there's no such thing as unbiased, but biased in a way that we can live with and that squares with our values, and that we can improve over time. But it will take this kind of work.

Conor: 

Is there any particular industry that you think is more progressive or forward-leaning in trying to establish those standards, or in trying to set some idea of what good or fair looks like?

Cathy: 

No, not that I know of. I think of there being four big industries that have to adjust to this new world pretty soon because of long-established anti-discrimination laws. Number one is credit, number two is insurance, number three is housing, and number four is hiring. All four of those big industries have a long way to go, but at least there is activity in most of them, with the exception of housing, which I think is kind of a black hole. The housing algorithms are problematic and nobody's addressing that. In the other ones, I believe there are efforts. But I will just say that all of those industries rely heavily on the internet ad ecosystem. And having worked in the internet ad ecosystem as a data scientist, let me just tell you that it is perpendicular to the notion of fairness. As I said to you guys earlier, as a data scientist I was measuring people primarily by wealth, then by gender, then by race, and making lucky people luckier. That meant I was giving better opportunities to people who were wealthier, whiter, and more male. That's how that system works. The entire internet runs on the engine of ads, and those have really nothing to do with fairness. It has to do with making lucky people luckier and making money off of that system. And to the extent that all four of those industries attract customers through the ad ecosystem, which is how it happens, that is a problem. That's a real problem. Let me just give an example: a hiring algorithm. If you're looking to hire, you have to go through the internet; that's how people find jobs. And those matchmaking systems for job seekers are problematic. If we're starting at the moment when people have applied for a job, making sure that those who have applied aren't being filtered unfairly, but we don't actually measure the problem of who gets to apply in the first place, who gets to know about this job, that's going to be a longer-standing problem. And I think that's my fear. My fear is that we're going to charge the industry with the work of making sure that their algorithms are relatively OK, but we're not going to deal with the marketing itself. And the marketing is a shit show.

Conor: 

And that whole equity of access for different cohorts or different parts of society is something that auditors around the world are trying to grapple with as well, in terms of what they look at and how they look at that equity issue. So it's really interesting what you just said there.

Cathy: 

There's one optimistic thing I could say about this, which is that Facebook recently released a white paper, which you can think of as like a blog post, on something called, I think, the variance control system or something like that. Anyway, it's a way for them to make sure that they're not doling out housing ads only to white people. It basically checks on, I think, gender and race for housing, credit, and hiring, so for job applications. So Facebook internally, for those ads that are identified as living in a particular regulated space, is making sure that it's not doling them out unfairly. And that's good. And that's because of a big Department of Justice... I don't think they would have voluntarily done that, but that's just my personal opinion. But it's like, okay, great, so this is happening within Facebook for some of the industries, but I don't see it happening at Google, which is a bigger advertiser. So it's just a tiny little spot. And Facebook might actually have more information about the people it's advertising to than a Google would, or than a match.com would. It's hard to imagine, given the freewheeling and almost anonymous aspect of the internet, this happening at scale. I mean, obviously Facebook is at scale at some level, but it's not the whole internet. Anyway, the larger point is that when I think about what it would take for this kind of marketing access to be done in a fair way, it's really hard. There's just so much that we don't know. And of course there are privacy concerns; people don't want you to know who they are when they're surfing the web. So these two worlds, advertising for goods and advertising for opportunities, financial opportunities, maybe they need to be separated. But I'm not the architect that's going to solve this problem.

Conor: 

One of the important things is that if a company or an organization can show how it's going about its business and say, we're trying to do a fair thing here by everybody we deal with, that's a really good starting point, I guess.

Cathy: 

Oh yeah, no, for sure. I do want that to happen. I don't want to make it sound like it's inevitably impossible to solve this problem. For sure, you do what you can do. Once people have applied to the job, you make sure your hiring algorithm isn't problematic.

Yusuf: 

Something that's obviously come to the fore very recently, and that there's lots of interest in, is language models, right? And chatbots in particular. So we're talking OpenAI and GPT and, you know, that range. Of course there are a lot of good use cases, but there's already talk about how they're built. So what data is used to feed those language models in the first place? Wikipedia, for example, was, I don't know, 80 percent male in terms of contribution of articles. And obviously there's lots of history that may need to be written in a different way. So essentially these language models are using historical data that exists on the internet, and that could be repeated. Obviously it's different to some of the domains we've been talking about up to now in this conversation, but what do you see as the potential risks there? Could these be used as weapons of math destruction in any way?

Cathy: 

A couple of years ago, when I was a Bloomberg columnist, just for fun, I tested one of these language models. It was supposed to guess your gender from a writing sample. So I input a blog post I had written in the past about being a quant, like, this is how you do this particular technique, and it measured it as 99% male. Then I found a male blogger who was writing about cooking, and it rated that as 99% female. And I was like, okay, yeah, I get it. This is incredibly shitty. It's almost like a prejudice machine, and thanks so much for that, that's just what we need. To me it seemed just kind of funny and stupid. And I still kind of think that ChatGPT is funny and stupid, to be honest. On the question of whether to take it seriously: a bunch of people have tried to make me take it seriously, but I can't, Yusuf. You should try to make me take it seriously, but the reason I can't is that, and I think I've demonstrated this in all of what I've said so far, the ingredient that's needed to make it a weapon of math destruction is trust. We don't have trust in these things, and that's a good thing, obviously, because they're just stupid. So as long as we don't have trust in them, we're not going to put them in charge of anything. And when I say that, of course, nobody ever really puts a computer in charge of things; that's a slip of the tongue, a metaphorical notion. But we're not going to let the computer direct us to do evil things if we don't trust it. Weapons of math destruction happen when the outputs of problematic algorithms are trusted by people to make important decisions about people that can harm them or deny them opportunities they deserve. Unless that's happening with ChatGPT, and tell me if it is, I'm just like, okay, so it's dumb.

Yusuf: 

I've used it to ask questions, and then moved on from ChatGPT to something else that actually listed the sources, so that we can find out where it got its information from, and not just a random answer that changes every time you ask the question. But part of the problem is, depending on how it's being used, it could just be repeating history over and over and over again. And the more of that content is created, the more that content is strengthened. So if you continuously ask a particular question about, let's say, an event that happened, it's generating more data about that event, and that just reinforces the notion that the event happened in that particular way. Does that make any sense?

Cathy: 

No, not to me, and I'll tell you why. I can close my eyes and imagine a ChatGPT talking at length about the Holocaust not having happened, maybe continuously. But unless people are like, oh, it said that, therefore it must be true, I don't care. You can imagine all sorts of bullshit happening, let's just say infinitely. But if no one's reading it, and no one important is making decisions based on it with power, then who cares?

Yusuf: 

So that's the issue: if we aren't making decisions based on it, and there isn't a groundswell, cool, sure, it's just another source of information, or a source of search, right? But if it starts being used and relied on without fact-checking, without asking, does this actually make sense, then we do have a problem. And so if there's just inherent trust, we have a problem.

Cathy: 

I'm 100% with you. But that's a big if. And I would also point out that I don't think it's invented any new problems. People point out that, oh, now we have ChatGPT, we can very cheaply produce propaganda. And I'm like, actually, we have plenty of cheap propaganda sources in this country, in this world. We could ask Mechanical Turk workers to write propaganda for us and it would probably be better quality than ChatGPT. I'm just saying, okay, it's very cheap to make bullshit, but we actually have plenty of cheap bullshit in this world. So unless we get people to start trusting this stuff and using it... Or imagine a bunch of kind of gullible folks on Facebook reading misinformation and acting on it. But again, that's not new, right? If ChatGPT added to the chorus of misinformation on Facebook, I'm not even sure it would be worse than what's already there, which seems infinite in supply already. So I just don't see how it's worse than what we already have. That's not an advertisement for what we have, but unless there's trust there, and I don't see trust by powerful people... One of the arguments I disagree with, which was in the New York Times, is, oh, now lobbying will be a lot easier. What? Nobody reads public opinion if it's just infinite, right? All it will mean is that if you want to lobby, in effect, you're going to have to have a personal meeting, with a check. You know what I mean? So it's the opposite of something that will change lobbying. Lobbying is already about money rather than ideas. So it's not a persuasive argument. The larger point I'm trying to get at is that we all have a limited amount of attention, and ChatGPT giving us a cheap supply of bullshit isn't going to affect us if we don't pay attention to it. We already don't want to pay attention to things. There are already plenty of reasons for me to ignore emails, and those are emails from actual humans writing to me about actual things, and I'm already ignoring them. So ChatGPT isn't going to get my attention.

Yusuf: 

I'm with you.

Conor: 

I couldn't help myself. I did ask ChatGPT what questions we should ask you today, Cathy, to see what it came up with, and they weren't very helpful. I didn't think they were very helpful, but there you go.

Cathy: 

See, it didn't work. It didn't get my attention.

Conor: 

Yeah.

Yusuf: 

Look, obviously this is not necessarily related to bias, but there may be uses; replacement for search, for example, is one of the uses they're coming up with now. But if you do have overreliance on it, overtrust, then you may have a problem later on.

Conor: 

Cathy, your ongoing work with ORCAA: where is it now, and what projects have you got lined up for the future?

Cathy: 

Well, we are working with a bunch of AGs, attorneys general, and a bunch of insurance commissioners. We're really excited about that kind of work for the reasons I told you before: figuring out what it means for a given algorithm to be compliant with a given anti-discrimination law. It's exactly the kind of work that I think has to be done, and I'm really excited to be doing it. I'm building sort of a framework. So I'm writing a paper for a law journal, I guess it's called a law review article, to learn the language, about the framework that I think is helpful in this context. What does it mean to hold algorithms accountable by law? It comes down to tracking a conversation between lawyers, and I think it's a really important perspective. There's a whole field right now of computer scientists trying to decide what accountability in algorithms looks like, and I think they've kind of gone off the rails, to be honest, because they're spending a lot of time on technical questions and then making it seem like the question of what fairness means is a formulaic question. Like, oh, here's a bunch of formulas that policymakers should adopt. Well, no, that's never going to be what happens. Policymakers are not going to be learning higher math to figure out which formula to adopt for fairness. What fairness actually looks like in reality is a question like, is it okay to use a FICO score against somebody when you're deciding car insurance? It's really a question of negotiation, a values-based negotiation, between the lawyers of the industries that want to make money and the lawyers in the regulatory agency who are trying to protect the public and the market in general. So that's the kind of activism I'm doing: making sure that we go about this right. More standards. The mission of ORCAA is to set standards for accountability. I started today's talk with you guys talking about Moody's, S&P, and Fitch, the AAA-rated mortgage-backed securities. There were just no standards. It was awful. It was a sellout to the investment banks that were packaging those shitty mortgages. And I don't want to see algorithmic auditing go down that same path. So the point is to set good standards for what independent audits look like.

Conor: 

As auditors, we're used to standards more broadly, but I just want to flip that a little bit and ask you: the auditing profession is probably at a more nascent stage of turning its mind to what its role is in looking at algorithms. Are there any pitfalls, or things that maybe auditors should stay away from, while we try to build capability?

Cathy: 

You know, one of the things I realized when I worked in financial risk, after I left the hedge fund, I worked at RiskMetrics doing value at risk, and I revamped their credit default swap risk model, was, first of all, stale assumptions about distributions of returns. That was one of the problems with credit default swaps. But the larger problem was that we fixed on 95 VaR. This is a technical term, but the way people measured their risk before and even after the credit crisis was, what's the worst that could happen on one out of 20 days? And that was called 95 VaR, the 95th percentile of value at risk. Why did we choose 95 and stick to it? That was stupid. I'm not saying value at risk had no substance, no value at all. But codifying where we were going to measure it was a huge mistake, because what it meant is that people were allowed to embed as much risk as they wanted in their portfolio as long as it appeared to be very rare. So basically all the risk was stuffed into the last 5%, if you will, metaphorically. The larger point I'm trying to make, though, is that if we codify measurements of risk, then people will game them. People will game those systems. So that's one of the things we must avoid in setting standards: making our measurements of harm or risk gameable. And codifying, telling people in advance exactly how they're going to be measured, is the number one way to make it gameable. That's one of the many things we have to avoid, Conor. Measuring the wrong thing, or deciding that we're only going to measure these four things and the rest are blind spots to our system, that's another problem. But we'll stop there.
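For readers unfamiliar with the term, 95% value at risk is simply the loss threshold that daily returns exceed about one day in twenty. Below is a minimal historical-simulation sketch, using synthetic returns purely for illustration; it is not RiskMetrics' model, just the basic calculation behind the term.

```python
# Minimal sketch of historical 95% value at risk (VaR): the loss exceeded on
# roughly 1 in 20 days. Synthetic returns are used purely for illustration.
# The point Cathy makes is that fixing the 95th percentile lets risk hide in
# the unmeasured 5% tail.

import numpy as np

rng = np.random.default_rng(42)
daily_returns = rng.normal(0.0005, 0.01, 2_000)   # synthetic daily P&L (fraction of portfolio)

def historical_var(returns: np.ndarray, confidence: float = 0.95) -> float:
    """Loss (as a positive fraction of portfolio value) at the given confidence level."""
    return -np.quantile(returns, 1.0 - confidence)

print(f"95% one-day VaR: {historical_var(daily_returns):.4f}")
print(f"99% one-day VaR: {historical_var(daily_returns, 0.99):.4f}")
# Nothing about the 95% figure constrains how bad the worst 5% of days can be,
# which is exactly the gameability problem described above.
```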

Yusuf: 

Beyond bias, beyond hiring bias or customer-focused bias, what advice do you have for executives in particular who use algorithms to make business decisions?

Cathy: 

Well, I would of course ask them to have their algorithms checked for problems, in the sense that I described earlier, which is to say: for whom does this fail? Make sure it's legal. Figure out what the blind spots are. I sometimes use the metaphor of a cockpit in an airplane. If you walked onto an airplane and noticed that there were no dials in the cockpit whatsoever, and no pilots, you might be worried. That's the metaphor for most algorithms right now. There's just no way to know whether they're working safely. There's no altitude, there's no gas pressure. We just don't know if this is flying well or about to crash. So for every algorithm, especially if it's high-stakes and important, we should have a bunch of ways of measuring it to make sure things are going well, including: is it treating different types of people similarly well, whatever that means; is it asking the right question; how do we check that; what's the ground truth, and how do we verify it; is it, generally speaking, accurate? Unless you have that cockpit designed and built for an algorithm, I probably wouldn't even use it.
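As a loose illustration of that cockpit idea, here is a hypothetical sketch of a periodic report with a few "dials" recomputed per group for a deployed decision system. The metric names, field names, and structure are assumptions for illustration, not a prescribed standard.

```python
# Hypothetical "cockpit" for a deployed scoring algorithm: a handful of dials
# recomputed on each batch of decisions, broken out by group. Illustrative only.

from collections import defaultdict

def cockpit_report(decisions):
    """decisions: iterable of dicts with keys
       'group', 'predicted_risky' (bool), 'actually_defaulted' (bool)."""
    by_group = defaultdict(list)
    for d in decisions:
        by_group[d["group"]].append(d)

    report = {}
    for group, rows in by_group.items():
        n = len(rows)
        denials = sum(r["predicted_risky"] for r in rows)
        correct = sum(r["predicted_risky"] == r["actually_defaulted"] for r in rows)
        report[group] = {
            "volume": n,                # dial 1: are we even seeing this group?
            "denial_rate": denials / n, # dial 2: who is being screened out
            "accuracy": correct / n,    # dial 3: is the model still right
        }
    return report

example = [
    {"group": "A", "predicted_risky": False, "actually_defaulted": False},
    {"group": "B", "predicted_risky": True,  "actually_defaulted": False},
]
print(cockpit_report(example))
```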

Yusuf: 

So before we close: there's a new book that you've released called The Shame Machine: Who Profits in the New Age of Humiliation. I have to admit I haven't read it yet, even though the last one was a one-sitting book, but we'll get to this one quickly. Are you able to give us a quick intro to that book?

Cathy: 

It's a very different book from Weapons. It's about shame. It's about how we, as a society, use shame to avoid dealing with certain problems, and to blame individuals for things they really have no control over, like poverty or addiction or obesity. It came out about a year ago, and it's a tough time to talk about shame because I think a lot of people are just recovering from Covid and it's kind of a heavy subject. But it's been heartening to me to see just how relevant it is, because obviously my message isn't, let's continue to do this; my message is that we actually have to deal with these problems as society-wide problems, not blame individuals for them. And I do think all three of those, as well as many others, for example, the way social media makes young women in particular, but many, many people, so body-conscious as to be distracted and even suicidal sometimes, have to be dealt with at a higher level rather than just being thought of as individual problems. I do think it's becoming more and more relevant. So I do hope you get a chance to take a look.

Yusuf: 

Will do, thank you. Cathy, how can people connect with you and find out more about ORCAA and the work that you do? What's the best place to find you?

Cathy: 

Yeah, orcaarisk.com. I am, of course, very happy to talk to people about their algorithms and the auditing they need on them. Thank you.

Yusuf: 

Cathy, thanks again for joining us today.

Cathy: 

Thanks for having me, Conor and Yusuf. It's been a great conversation.

Conor: 

Thank you, Cathy.


Conor McGarrity
Your host

A specialist in performance audit, Conor is an author, podcaster, and senior risk consultant with two decades of experience, including leadership positions in several statutory bodies.
Yusuf Moolla
Your host

Podcaster, author, and senior risk consultant, Yusuf helps auditors confidently use data for more effective, better quality audits.