
Ep. 24: Mark Perlin of Cybergenetics

Summer of 50 PGH Tech Stories

For twenty years, Cybergenetics has been solving complex DNA evidence. Most forensic items are mixtures of two or more people, often in small amounts. Many crime labs can't get information from these DNA items. They incorrectly call them "inconclusive," or report a wrong match statistic. Cybergenetics "unmixes" DNA mixtures. The Pittsburgh company's TrueAllele technology uses math and computers to find the right answer. A big match number shows that someone left their DNA, while a small statistic (a lot less than one) shows they didn't. Courts and investigators need match numbers.

Get the whole story from founder Mark Perlin. Plus, check out an article detailing proposed legislation to take away your right to use independent forensic software.


Summer of 50 PGH Tech Stories is powered by Comcast.







Everybody, this is Jonathan Kersting, with the Pittsburgh Technology Council bringing you our Summer of 50 Pittsburgh Tech Stories with Comcast. And some of the stories we do, some of them really just get to me in this crazy way. And that's what's happening here with Cybergenetics. This, to me, is the ultimate of how technology empowers, like, justice. It's just, I can't put words to describe it, that's why we're going to talk to Mark Perlin from Cybergenetics. This is a company you probably never heard of, but they're doing such transformative work behind the scenes when it comes to criminal justice. When it comes to all things around just DNA and being able to prove who did or didn't do a crime, and they're doing this like at the highest levels possible. And to know they're in Pittsburgh doing this to me is just so exciting. So Mark, welcome to the show today. I've got so many questions for you because I think you're one of the unsung heroes of Pittsburgh with some of the work that you guys are doing.


Well, thank you very much. It's a pleasure to be on. 


Very, very cool stuff. So quickly, just give us your background real fast. Now I know you got a pretty cool background, and you're doing the work that you do with Cybergenetics.


Well, I have education in different areas. A PhD in computer science from Carnegie Mellon, where I was faculty for 10 years. I have a math PhD and a medical degree from University of Chicago. And putting that all together really puts Cybergenetics, which we formed about 25 years ago, in a very good position to build technologies based on math, computers, and biology, that could solve problems that people can't.


Absolutely, absolutely. And so tell us about the quick overview of Cybergenetics, and give us the nickel tour and your role with Cybergenetics and everything like that.


Okay, I was faculty at Carnegie Mellon in computer science in the early 90s working on the Human Genome Project, and I invented some technologies that let us accelerate gene discovery and automate diagnosis of disease. And it turned out that those technologies also had use in forensics. So in 1994, I left Carnegie Mellon and founded Cybergenetics around the corner on Craig Street, and we were approached by the British government in 1998. We were mainly working with drug companies and academics on diagnostics with genetics and gene discovery, drug discovery. And they asked a question, could we get rid of their backlog of 350,000 reference samples where they had almost 100 people working three shifts in a building in Birmingham and replace it with our genetic software? And the answer was, yes. And we did it. And they released this, and basically 95 people found something else to do. And a few desktop computers gave much better results much faster without people. And this was the problem of how do you interpret DNA data? This was simple data from a cheek swab. So even with something that simple, you have two or three people who are looking at the data generated by the lab, trying to interpret it, and accurately get the right genetic type of that person, the genotype. That was the first major automation we did, and we continued. Another 350,000 samples a year with a few TrueAllele computers in Birmingham, England, automating the DNA for the British DNA database.


That's back in 1998. Like 22 years ago. I didn't realize you guys were 22 years into this thing. That blows my mind.


Well, what happened is that once we solved that problem 20 years ago, they said, all right, that's the cheek swab problem. That's for a reference sample where the data is easy. What about evidence, which are usually mixtures of two or three or six people all mixed in together. We can't solve that at all. People don't know what to do. We throw out almost all the data and almost everything's inconclusive. We don't get any information. Can you solve that? And so we did.


You’re so matter of fact about this Mark.


So what we did, we used Bayesian probability, Bayesian modeling, a type of machine learning that will reach deeply into the data, and think about it, try out 100,000 different solutions, and determine what's more probable, what's less probable, and fully answer the question in an accurate, automated way. But it never sees an answer. It doesn't know who the suspect is. It just generates a solution for every possible person who could live or not live, and then makes a comparison and tells you with a number, what's the support that a person’s DNA is present or not in the evidence?
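To make the idea in this answer concrete, here is a toy sketch of "try every solution, weigh how probable each one is, then compare to a person." It is not TrueAllele's actual model. The single locus, the allele frequencies, the fixed 70/30 mixture weight, the Gaussian noise level, and every function name below are assumptions made up for illustration. The genotype is inferred from the data alone, and only afterward is a person's genotype compared against it.

```python
import math
from itertools import combinations_with_replacement

# Assumed allele frequencies at one made-up locus (illustrative values only).
ALLELE_FREQ = {"A": 0.5, "B": 0.3, "C": 0.2}
GENOTYPES = list(combinations_with_replacement(sorted(ALLELE_FREQ), 2))


def genotype_prior(genotype):
    """Population probability of an unordered allele pair (Hardy-Weinberg)."""
    a, b = genotype
    p, q = ALLELE_FREQ[a], ALLELE_FREQ[b]
    return p * p if a == b else 2 * p * q


def expected_peaks(major, minor, weight):
    """Expected share of the signal at each allele for a two-person mixture."""
    peaks = dict.fromkeys(ALLELE_FREQ, 0.0)
    for allele in major:
        peaks[allele] += weight / 2
    for allele in minor:
        peaks[allele] += (1 - weight) / 2
    return peaks


def likelihood(observed, major, minor, weight, sigma):
    """Toy Gaussian likelihood of the observed peak shares (an assumption)."""
    expected = expected_peaks(major, minor, weight)
    sq_err = sum((observed[a] - expected[a]) ** 2 for a in ALLELE_FREQ)
    return math.exp(-sq_err / (2 * sigma ** 2))


def major_posterior(observed, weight=0.7, sigma=0.05):
    """Posterior over the major contributor's genotype, summing out the minor."""
    scores = dict.fromkeys(GENOTYPES, 0.0)
    for major in GENOTYPES:
        for minor in GENOTYPES:
            scores[major] += (genotype_prior(major) * genotype_prior(minor)
                              * likelihood(observed, major, minor, weight, sigma))
    total = sum(scores.values())
    return {g: s / total for g, s in scores.items()}


def match_statistic(posterior, person):
    """Posterior probability of the person's genotype divided by its prior."""
    genotype = tuple(sorted(person))
    return posterior[genotype] / genotype_prior(genotype)


# Hypothetical observed peak shares for a two-person mixture.
observed = {"A": 0.35, "B": 0.50, "C": 0.15}
posterior = major_posterior(observed)  # inferred before any person is compared
print(match_statistic(posterior, ("A", "B")))
print(match_statistic(posterior, ("C", "C")))
```

A ratio above one means the data made that person's genotype more probable than it is in the general population, and a ratio below one means less probable. A real system would cover many loci, an unknown number of contributors, and unknown mixture weights, which is where the large number of candidate solutions comes from.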


Right? Wow. It's just fascinating. And so like, tell us, you know, more about TrueAllele the technology and really like some of your customers. Obviously, you're working with the British government there. I'm assuming you're working with customers of all types of governments, with lawyers, all types of customers.


So the groups that actually buy TrueAllele as a product are crime labs, about a dozen groups in the US that use TrueAllele with their own computers in house. TrueAllele is a setup. So there is a server computer that can have from say 12 to 100 processors all solving problems simultaneously in real time. And then there's client software, so that an analyst can ask questions, and in a visual way, upload their data and then visually review the answers to look at those genotypes and the match results that tell you, hey, if the number is a million, that suggests that somebody's DNA is there. But if the number is one in a million, as a statistic that points away from them. So those are the crime lab customers. And then we also work with lawyers. We work with prosecutors, defenders, innocence groups. We work with police and other investigators. We worked on the World Trade Center project 15 years ago, helping to identify 18,000 victim remains using the TrueAllele Casework technology, and all of this advanced Bayesian statistics.
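The "a million" versus "one in a million" reading of a match number can be sketched in a few lines. This is only an illustration of the direction of support described above. The function name and the wording of the output are assumptions, not anything taken from the TrueAllele client software.

```python
import math


def describe_match(statistic: float) -> str:
    """Say which way a match statistic points, using its base-10 logarithm."""
    if statistic <= 0:
        raise ValueError("a match statistic must be positive")
    log_stat = math.log10(statistic)
    if log_stat > 0:
        direction = "supports the person's DNA being present"
    elif log_stat < 0:
        direction = "points away from the person"
    else:
        direction = "is uninformative either way"
    return f"log10 = {log_stat:+.1f}: {direction}"


print(describe_match(1_000_000))      # a match number of a million
print(describe_match(1 / 1_000_000))  # a match number of one in a million
```

Working on a log scale makes the symmetry visible: a million and one in a million sit the same distance from the neutral value of one, on opposite sides.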


Mark, you are giving me goosebumps here because like, I mean, this is just so powerful. I mean, everything from the World Trade Center, to finding out who the victims were so their families can have some peace of mind, to checking out DNA to prove if somebody actually did a crime or not based on their DNA actually being there or not. Like, the impact that you're having on society is just massive, I think. I mean, how many court cases has your technology been a part of? It's got to be countless by this point?


Well, including all the crime labs, it's thousands. Cybergenetics has been to court around 100 times. Occasionally groups will challenge the evidence and ask a judge, “Is this reliable?” And then we bring to court a hundred documents with the validation studies that have been done. There have been over three dozen validation studies that show that TrueAllele is reliable. It finds the right people. It doesn't find the wrong people. The method is reproducible. Eight of those studies have been published in peer-reviewed journals. So, we've been to these admissibility hearings over two dozen times in the United States alone. And judges need to be persuaded through evidence about the reliability based on testing the system.


You bring data to it right? At the end of the day, you're bringing data.


That's the concept in science and law. You test systems, whether they're robots or chemistries, machines, test tubes, or software. You test your systems on real data, and you measure how well they work. You measure their error rates. You do all of this testing. You follow standards or guidelines the government sets forth to make sure that you're testing your systems properly. We test our system. The crime labs test our system. Other people test systems. Every case we're involved in, we offer to the other side the opportunity to test our TrueAllele system at no cost. You can always test our system. We're scientists, and we believe in transparency.


I can just say, I mean, it's transparency 100% all the way that makes this so powerful. You got nothing to hide. It's like the data is the data. And here's how it's being, you know, presented. So that's just amazing. 


Also, the methods that we use, we published. We've been publishing them for 25 years. We disclosed them in patent filings 20 years ago. We disclosed them in papers 20 years ago. So we have a host of papers that describe the methods, the validations, the results, how our algorithms and methods are applied. So it's an open book as to what it is that we're doing.


Amazing stuff. Let's talk about like an actual case. I think it's always fun to bring this down. You tell us such great stories about how you know your technology has been used. And tell us about what happened in Texas in the Lydell Grant DNA exoneration case. I think this is a pretty cool, pretty cool way for us to explore this.


Well, that was pretty recent. This started last summer. So the Innocence Project of Texas contacted Cybergenetics. We did our usual procedure, which is a free screening. We will always let the computer look at somebody's DNA data at no cost and send them back a preliminary report. If they don't like it, if it disagrees with what they want, no harm done, they can move on. Now in this case the Innocence Project saw that we showed two things. One, Lydell Grant was not connected with the murder of Aaron Scheerhoorn. His DNA was not on these items. But we also found somebody else's DNA. So the Innocence Project of Texas was able to take our results and start arguing: first, that Lydell wasn't connected to the crime, and secondly, let's do a search of the FBI’s CODIS database, which has 14 million convicted offenders on there, and see if maybe the true killer corresponded to this DNA genotype that we found.


Okay, so they're in the search?


Well, they don't like running searches. That has to be done usually by crime labs, done in a certain way. And the rules are set up so that it's very hard to revisit the past. For example, a lab can't go back and visit a lab 10 years ago, when the work was done. But we worked very closely with a Virginia, sorry, with a South Carolina TrueAllele lab. And we spent a year putting protocols together. We thought we were complying completely with whatever the FBI wanted. And we followed those rules, and the South Carolina lab ran a search with the TrueAllele profile we found. We found the killer in Georgia. And the response to showing that Lydell Grant was not in the evidence and that the true killer was somebody else in Georgia was total silence. Absolutely nobody cared.


He ignored it? I mean, you're coming back with something saying this is wrong and they're ignoring that?


So the district attorney in Houston wasn't interested because you know, people believe they've convicted the right people.


And they don't want to look at that thinking that Yeah, something happened and it's their fault.


I think they truly believe what they believe. And it's a mindset you get into. The FBI was not happy that we'd run a search and found somebody was not the person in jail. We were yelled at a little bit by different groups. But ultimately, the truth won out. And the Innocence Project of Texas was able, over a few months, to persuade the investigators, the police down in Houston to look into it again. And they went, and they found the individual that TrueAllele had identified in Georgia. And he confessed to the murder. And finally, Lydell Grant was released from prison. And he still hasn't been exonerated. Nobody has sent us tremendous thanks except the Innocence Project of Texas.


Aren’t you on his Christmas card list now or something like that? I mean, come on.


Maybe for the Innocence Project. But that concept of using better technology on the same government data, to get a much better result to find the truth in DNA that science can deliver, and then actually help justice is not really a thankful task, right?


No, it's not. I want it to be a thankful task. But obviously there's powers that be that don't want that. So I'm assuming you've probably run into lots of opposition over the years, obviously. And it seems to have shaken things up at the national level, because I know that there's some proposed legislation, you know, questioning whether or not to use outside services to test and everything like that.


Well, there's an interesting bill. It's called the Justice Through Forensic Algorithms Act or something. And what's interesting about the bill is that it's based on very false premises. And it has very radical solutions. First, it assumes that whatever companies do when they develop innovative technologies, they publish their algorithms, they make the software available for testing, that somehow the trade secret source code that keeps them in business and lets them not go out of business, because they maintain protected trade secrets, is somehow the secret as opposed to the method or the testing. Now, what's strange is that Cybergenetics and all the other commercial companies in this area that we know of, will let a defendant, under confidentiality, review the source code. So, we wonder why you're even trying to pass a law to revoke trade secrets that keep all software companies in business.


Isn't it? Right? I mean, that’s the thing that is their business like yeah.


And why do you want to. The purpose of trade secrets is to promote innovation. You already can see the source code. Why do you want to pass a law that revokes trade secrets? The bill goes on. For a hundred years judges have served as gatekeepers for reliable evidence of all kinds: scientific, forensic. And if someone challenges evidence from the opposition, then you have to prove your methods are reliable through testing and error rates, through peer-reviewed publication. And good technology can do that. And it’s done by impartial judges. It gives you the right as a defendant to bring technology that you want to use to help you in court. 


Right? Exactly. Like I mean, it’s part of your defense.


The second prong is to get rid of the judges and the lawyers and your rights. And to put all of this control of admissibility of reliable evidence in the hands of a small federal agency that actually promotes products. So, they would be in a position to promote the products that they support and legally block better methods. 




That seems odd. They also promote having standards for testing. Sounds great, except they've been around for five years and everybody follows them. They would centralize testing and take it away from the hundreds of groups that do it now in a decentralized way. So there are other things that the law proposes, but basically, the things that we value in science and justice, your right to have an impartial judge to present evidence, to have transparency in the process, to be able to defend yourself, and not have the same government that's accusing you of a crime with their methods, that government can block other methods that might be more effective. There's nothing about the bill that makes much sense to us.


Yeah, no. I mean, what can people do about this? I mean, should we be reaching out to our elected officials and telling them that this is something we don't want to be thinking about?


That would be a good thing. We're talking with the groups that understand TrueAllele and the situation, whether they're public defenders, whether they're prosecutors, whether they're lawyers, or they're just not lawyers, and they think the bill is a terrible idea. And writing to your congressman, and saying, you know, I've read about this bill. I've watched this podcast, I've read the links that are attached to it. And I don't understand why I want to surrender my rights and good science in the name of government protecting me.


Absolutely. I mean, what kind of legislation do we really need to help justice? I know there's lots of gaps out there. In your eyes what could actually make sure that people are treated equally in front of the law. 


Well, the biggest problem we see is a lack of transparency and accountability from government, from the data. It's not the methods; those are quite transparent. It's that crime labs and federal agencies lock the data that's needed up in laboratories and databases, so independent scientists, experts, people with better technologies or other technologies can't get access to it. If you've been accused of a crime and you can't get access to the data the crime lab's accusing you with (not so much of a problem in Pittsburgh, but it's a problem elsewhere in the country where defenders have to fight to get that data), you can't defend yourself. Prosecutors who want that data in another case may struggle to get it from a crime lab. If you want to go back once you realize that scientifically, crime labs have misinterpreted or under-interpreted complex DNA mixtures in a million or more cases, and you want to get justice by revisiting those old cases, you won't get it. There's no law giving you access to undo the damage that government hiding that data has done in the past. And as we saw in the Lydell Grant case, we got our one CODIS search, the only one historically ever done outside of a crime lab in the US, and we found the killer, exonerated the innocent man, got him out of prison. The result was more rules to stop independent scientists from seeking justice through better science. So, a better legislation would be to provide transparency for government data, government labs, government databases, so citizens can see what's going on with forensic evidence and revisit the past and open up the future to get better justice.


I mean, did you ever imagine 22 plus years ago, when you started this up that this is where this would be? That you would take it to this level? That it is used around the world, and that I mean, I just feel like this thing has grown to be something quite large and quite impactful. And I'm always curious when someone starts something, you're trying to do something. Something's interesting. You see, you see a solution to a problem, but think of the positive impact that you've had. I think it's pretty amazing. Do you think it was going to be like this?


No, not at all. We were working on some scientific problems. We were asking how much information could math and computers and science and modeling get out of DNA data to unravel, solve impossible problems, and get the answer as a single number. And our initial thought was, well, this is really good. People will just use it, but that's not what happens in technology. It took us 10 years to refine it. And we first used it in court here in southwestern Pennsylvania with the attorney general's office in the Kevin Foley case with John Yelenic, a dentist who was murdered out in Blairsville. That was 2009. And it was the DNA under his fingernails and TrueAllele’s analysis that really made the case, that connected the killer to the crime. We thought okay, now crime labs will buy this, and we'll move on to some other problem. What happened instead is things were slow in government. We had some groups that really liked it. It was a huge fight. And by the time we’d proven that it worked, demonstrated reliability, published papers, won admissibility hearings, now there are 10 other groups adopting our technology. And selling it in the marketplace, which is always interesting. So on the one hand, it's really good that our ideas are out there and helping as innovators. As a business, we certainly didn't capture the market because once there's a good idea out there, larger companies go out and compliment you by sharing your work.


You're such a gentleman there Mark. I like the way you said that. Very much so. To me it's just an amazing story like a quiet little company on Craig Street there working on some of the world's toughest problems and literally saving people's lives at the end of the day. And that's why I'm so pumped to have just learned a little bit about what Cybergenetics does here, and so proud that you guys are in Pittsburgh. And as I've been saying, all these great stories I'm telling with Comcast, you're making Pittsburgh proud Mark. Simple as that. Making Pittsburgh proud. 


Thank you very much. 


Great stuff, Cybergenetics. Check these guys out. It's an amazing, amazing company.