The Future of ML Governance and Data Management with Kevin Petrie

MLOps Weekly Podcast, Episode 21

Kevin Petrie, VP of Research, Eckerson Group

For episode 21 of the MLOps Weekly Podcast, Simba Khadder and Kevin Petrie, VP of Research at Eckerson Group, delve into the challenges and opportunities of integrating MLOps in traditional enterprises. They discuss strategies to overcome technical debt in implementation, the pivotal role of data in the success of ML projects, navigating regulatory compliance in machine learning, and the future of AI governance.

Connect with Kevin on LinkedIn!

Listen on Spotify

Transcript:


[00:00:06.040] - Simba Khadder 

Hey, everyone, Simba Khadder here, and you're listening to the MLOps Weekly Podcast.

[00:00:09.850] - Simba Khadder 

Today, I'm speaking with Kevin Petrie. Kevin is the VP of Research at Eckerson Group, where he manages their research agenda and writes about topics such as data integration, data observability, and machine learning. For 25 years, Kevin has deciphered what technology means to practitioners. He's been an industry analyst, an instructor, a marketer, a services leader, and a tech journalist.

[00:00:28.260] - Simba Khadder 

He launched a data analytics services team for EMC Pivotal in the Americas and in EMEA, and ran field training at the data integration software provider Attunity, which is now part of Qlik. He's a frequent speaker and the co-author of two books about data management. He also loves helping startups educate their communities about emerging technologies.

[00:00:46.020] - Simba Khadder 

Kevin, so great to have you today. I just gave a quick intro on you, but I'd love to get your story of how you got to the position you are in today. 

[00:00:53.700] - Kevin Petrie 

Simba, really appreciate the opportunity to be here. I run the research group here at Eckerson Group.  We're a boutique research and consulting firm focused on data analytics. I came from the vendor side,  among other things. 

[00:01:05.780] - Kevin Petrie 

I was with EMC for about 10 years. During that time, I ran an analytics services team with EMC Pivotal. Then I was a client of Wayne Eckerson for a number of years when I was with a data integration vendor called Attunity, which got acquired by Qlik, and I really enjoyed reading Wayne Eckerson's reports. We're both folks that love the written word and love to simplify complex technology and teach it to business people.

[00:01:32.010] - Kevin Petrie 

That's fundamentally what I love about what I do now: looking at bleeding-edge opportunities and trying to translate them for IT team leaders and business leaders, helping them understand what it means to them, so that we're technically deep but we're also helping folks understand the high level. That's a little bit about what makes me tick. It was a circuitous route, but I really enjoy where I landed here about three and a half years ago.

[00:01:58.450] - Simba Khadder 

When we think of enterprises, I'm curious to understand or get your take on... Everyone's talking about AI and ML. What makes it specifically hard for an enterprise? There are a lot of things, but I'd love to get your take on the pillars that specifically make it hard to get to the bleeding edge in AI and ML if you are a large enterprise.

[00:02:19.810] - Kevin Petrie 

Great question. I think that there are a few elements that make this stuff hard. One is that, as I mentioned,  we have two sides of the house here at Eckerson Group. We have research, where we're working with innovative companies to talk about the very latest bleeding-edge tools and opportunities. Then we've got consulting. 

[00:02:39.940] - Kevin Petrie 

We noticed about a five to seven-year gap between what cutting-edge vendors are talking about and what the average traditional enterprise, meaning an enterprise born before the cloud boom, say before 2010, is actually doing. There are a lot of stubborn, hard problems that have persisted for a long time. They're not sexy. They're not as interesting to talk about on a keynote stage, but they boil down to silos, fiefdoms, and a lot of technical debt.

[00:03:11.150] - Kevin Petrie 

You've got this long tail of old stuff that, because of data gravity, sovereignty requirements, migration complexity, and cost, you're not going to be able to move to the cloud. That's one thing. You've got these hybrid, complex environments.

[00:03:24.550] - Kevin Petrie 

Another is that you have folks who have built up a fair number of habits within a certain business unit that's different from the rest of the organisation, and it's hard to reconcile that. It's hard to retrain people. There's also process. Process can get ingrained.

[00:03:41.660] - Kevin Petrie 

Struggling with that technical debt related to people, process, and technology, I think, is the fundamental thing that makes any new technology initiative hard. That's certainly true with AI/ML, because you can take very cutting-edge tools and empower a smaller business unit to do some great things. Fantastic.

[00:04:01.380] - Kevin Petrie 

But you want to think about the second-order implications: getting something just slightly wrong in a fraud prevention ML algorithm could have some pretty serious downstream implications. You want to think about making certain decisions in terms of data governance for one business unit and how that might cascade to, or create, tougher regulatory compliance processes for the rest of the business.

[00:04:24.810] - Kevin Petrie 

I think, in a nutshell, it's really dealing with history. It's the fact that you don't have a fresh start if you're a company that was born before the cloud came around.

[00:04:33.150] - Simba Khadder 

I love that split you had of people, process, and technology. We sometimes, ourselves, when we're selling, we'll say organisational problems and technology problems, and we create that distinction. First, how deeply tied are all these things? When you think about, let's say, the weights of those problems, if I'm like, "Hey, I can focus on getting new technologies in, or I can focus on process," is there one you could focus on? Is it even possible to focus on one of them?

[00:05:00.070] - Kevin Petrie 

I think it helps to start with the people: make sure that you have the right people, and that you're educating and motivating them in the right way, so that they can take advantage of the latest technology and adapt their process. If you start with the people, you can address, I'd say, a third to half or more of the problem related to taking advantage of new technology opportunities.

[00:05:25.850] - Kevin Petrie 

Then it's time to go to process and figure out, "What needs to change?" Then you go to technology and  say, "What are the very latest tools that can help us achieve the business objectives that we have filtering  down to IT from executive leadership?" That's one way to look at it. 

[00:05:41.090] - Kevin Petrie 

There is a tendency, of course, because we're technology people and we rightfully get excited about new stuff, to start with the technology, the latest bells and whistles: advanced algorithms that can predict customer actions, recommend customer purchases, prevent fraud, predict prices, and personalise content on the web. Those are very exciting bells and whistles, and those are pretty powerful algorithms that are now available off the shelf from a lot of public libraries.

[00:06:10.270] - Kevin Petrie 

The bigger question is, do you have the right data to feed that and do you have the right people and  process to support it? 

[00:06:16.380] - Simba Khadder 

You mentioned you're almost fighting against your company's built-up debt. When I'm in that situation, on one hand, I see the explosion of innovation coming from AI and ML, and there are so many almost obvious use cases to empower the business. At the same time, I'm dragging along potentially even hundreds of years of debt, both technical and organisational, et cetera, trying to make this happen. What's the pattern you've seen work to bring innovation to a larger organisation that's, let's say, fighting against a lot of existing technical debt?

[00:06:56.150] - Kevin Petrie 

What works, I think, is starting with something that is bite-size. Start with a problem that is demonstrable, find a group that's in pain, and then spin up a tiger team, a group of innovative, forward-thinking folks who have fewer process dependencies on the rest of the business, and see what they can cook up. It might be that you've got a website, or part of your website, focused on a new customer segment or a new offering, and you're less tied down to the rest of the business. You can create some pretty cool content personalisation algorithms or customer recommendation algorithms.

[00:07:35.950] - Kevin Petrie 

If you can start to innovate in a modular way, and if you can demonstrate some quick success, that can create confidence, it can give you a learning curve, and it can help you demonstrate business results to the rest of the business and get the right political support, the right momentum, to go broader and start to roll out some of the things you learned in terms of people, process, and technology to the rest of the business.

[00:08:03.780] - Kevin Petrie 

I think starting small and looking for a quick win, those are common, almost cliché phrases, but they're very appropriate here, because we all have a tendency, rightfully, to fixate on the very latest cutting-edge tools. There's some really incredible, powerful stuff out there, but looking for a quick win is a good way to get started and make sure you don't take on too much at the outset.

[00:08:25.660] - Simba Khadder 

That makes a lot of sense. It's almost this idea of creating momentum and building off of that momentum to get where you need to get. It's because there are always going to be roadblocks. There are always going to be hurdles. There's always going to be some level of organisational debt to just fight your way through. The more momentum you have, the easier it is to just push through all that.

[00:08:44.570] - Simba Khadder 

One piece of the equation, which I know comes up a ton, is the data. The data is almost, in some way, like a projection of all the debt that gets created. Broadly, if you're working in ML or AI, it all starts with the data. If you don't have good data, there's not really much you can do. What are the common pain points and pillars that you see around the data part of the ML/AI process?

[00:09:09.420] - Kevin Petrie 

I'll start by agreeing with you wholeheartedly that data is 90% of the problem and 90% of the opportunity, because even with the very latest large language models, you look at what Databricks said when they came out with Dolly. They said, "You know what? We can take older, more rudimentary models, but if we apply them to a relatively small but clean set of inputs, we can generate some pretty stunning results."

[00:09:34.240] - Kevin Petrie 

It just shows that the innovation in algorithms, that's not the big problem right now. The big problem is the data. If you look at the data, there are some fundamental recurring themes here. There are silos.

[00:09:46.210] - Kevin Petrie 

If you look at structured data, we'll start with that, there are siloed, sometimes conflicting or disparate views of entities such as customers, such as suppliers, such as employees. There's a need for basic master data management there. There's a need for consolidation, or at least reconciliation, so that you have one version of the truth wherever possible, as opposed to multiple. There's also a need to look at unstructured data and figure out how to enrich your insights on business opportunities, on entities, and so forth by extracting insights from the unstructured data and co-mingling it with the structured data.
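
To make that reconciliation idea concrete, here is a minimal sketch in Python of the match-and-merge step at the heart of master data management. Everything in it is hypothetical: the field names, the silos, and the matching rule (normalised email, most recent update wins) are illustrative stand-ins, not anything described in the conversation.

```python
# Hypothetical sketch: reconcile conflicting customer records from two
# silos into one "golden record" per customer.

def normalise_email(email):
    # A crude matching key; real MDM tools use far richer matching logic.
    return email.strip().lower()

def merge_records(silo_a, silo_b):
    """Match on normalised email; keep the most recently updated record."""
    golden = {}
    for rec in silo_a + silo_b:
        key = normalise_email(rec["email"])
        current = golden.get(key)
        if current is None or rec["updated"] > current["updated"]:
            golden[key] = rec
    return list(golden.values())

# Two silos holding conflicting views of the same customer.
crm = [{"email": "Jane@Acme.com", "name": "Jane Doe", "updated": "2023-01-10"}]
billing = [{"email": "jane@acme.com ", "name": "J. Doe", "updated": "2023-04-02"}]

print(merge_records(crm, billing))  # one record instead of two conflicting ones
```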

[00:10:26.840] - Kevin Petrie 

I think those are some fundamental issues, related broadly speaking to data quality, that persist time and time again. The notion of data-centric AI, I think, really makes sense, because philosophically it starts to really get at that problem of making sure that you have clean, well-organised input. Labeling of inputs goes a long way as well.

[00:10:48.100] - Simba Khadder 

I totally agree. Data-centric AI is really the only kind. I almost feel like that's all we do. Everything is more about making the data usable. The models are getting to the point where they're almost just straight-up APIs. It's more about orchestrating the data, moving the data, getting to the right data, cleaning it up, et cetera, much more so than having the most cutting-edge model, which almost feels like it's being commoditised.

[00:11:14.320] - Simba Khadder 

On the data side, I've noticed almost two trends that, in a funny way, almost seem to counteract each other. One is this huge centralisation of data: let's get all of our data into this data lake. At the same time, I'm seeing this concept. It's loosely data mesh, but it's very much this idea of, "Hey, all these teams are maintaining their own databases, which allows them to move faster, allows them to iterate faster."

[00:11:41.940] - Simba Khadder 

Then the goal becomes a unifying abstraction, almost like an interface or a protocol, over the disparate data sets. Is that fair? Do you see that, too? Do you see those things fundamentally rubbing against each other, or can they both exist in the same organisation?

[00:11:58.750] - Kevin Petrie 

Really good question. It gets at a lot of the conversations we have within our research team here. There is definitely a pendulum effect over time. If we look at the rush to the cloud during, say, 2015 to 2020, there was a desire to consolidate onto a cloud data lake or a cloud data warehouse. Now the notion of a combined, whatever we want to call it, lakehouse has certainly emerged as this common pattern of SQL on top of object stores.

[00:12:27.590] - Kevin Petrie 

There was a desire to consolidate as much analytical data as you can into one repository. But the reality is that data consolidation, I want to say, failed. We still have very decentralised data sets.

[00:12:41.310] - Kevin Petrie 

The reason is that there's the long tail of old stuff that, because of data gravity, inertia, what have you, is going to stay on mainframes and older on-premises systems. It might be a fraction of what you have, but it's going to be there for a while. You've also got the desire to...

[00:12:58.090] - Kevin Petrie 

Most companies now work with multiple cloud service providers. They're trying to optimise certain  workloads on one cloud versus another, maybe gain some pricing leverage, maybe take advantage of  certain offerings in certain regions, so that multi-cloud trend is continuing. 

[00:13:14.510] - Kevin Petrie

There are things that limit what you can really achieve with data consolidation. That recognition of data decentralisation is part of what has created this enthusiasm about the notion of a data mesh, because now you can say it's okay that we have these islands of data. Let's start to empower people out in the business units to own that data and provide it to the rest of the business.

[00:13:36.450] - Kevin Petrie 

But I think the shift now is people saying, "That's fine. If we accept that the data is somewhat decentralised, we need to have a common semantic layer, a common management plane on top." That's where you see, I think, some more investment now, in order to figure out how to manage things across decentralised environments in a somewhat uniform way.
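
As a rough illustration of that semantic-layer idea, here is a minimal Python sketch of one logical interface routing a query across physically separate stores. The silo names, schema, and in-memory SQLite stand-ins are all hypothetical; a real management plane would sit over warehouses, lakes, and on-prem systems.

```python
# Hypothetical sketch: a thin "semantic layer" that fans one logical
# query out across decentralised silos and merges the results.
import sqlite3

class VirtualCustomerView:
    def __init__(self, silos):
        self.silos = silos  # silo name -> database connection

    def find_customer(self, email):
        results = []
        for name, conn in self.silos.items():
            rows = conn.execute(
                "SELECT id, email, segment FROM customers WHERE email = ?",
                (email,),
            ).fetchall()
            # Tag each row with its system of origin for lineage.
            results += [{"silo": name, "id": r[0], "segment": r[2]} for r in rows]
        return results

# Two in-memory stores standing in for, say, an on-prem system and a cloud warehouse.
marketing, sales = sqlite3.connect(":memory:"), sqlite3.connect(":memory:")
for conn, row in [(marketing, (1, "a@x.com", "retail")),
                  (sales, (42, "a@x.com", "enterprise"))]:
    conn.execute("CREATE TABLE customers (id, email, segment)")
    conn.execute("INSERT INTO customers VALUES (?, ?, ?)", row)

view = VirtualCustomerView({"marketing": marketing, "sales": sales})
print(view.find_customer("a@x.com"))  # one logical lookup, two physical stores
```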

[00:13:58.480] - Simba Khadder 

I totally agree. I think that's what we've seen. We've even structured our own product around it. We call it the virtual feature store, because it's almost like: what would happen if you could virtualise this application layer without having to forcibly centralise the data?

[00:14:13.070] - Simba Khadder 

We've seen a lot of the uptake come from companies that are on-prem and in the cloud, where it just doesn't make sense. It's just not possible, especially when you're dealing with really sensitive data. The work you have to do to get that data and lift it off of your mainframe onto wherever is just enormous.

[00:14:32.560] - Simba Khadder 

I want to talk a bit about that, too, because there's also this component of regulatory risk and governance and all the components that come around that aspect. First, I'd love to just get your lay of the land. If you are a large enterprise, you have financial data, you have a variety of different user data. What is that like? What regulations come up a ton? What things should people in those positions be thinking about as they're working with this data?

[00:15:04.060] - Kevin Petrie 

The regulatory environment continues to evolve. I think there are some pretty common patterns. If you're a global organisation, you've been dealing with GDPR in the European Union for some time now, and that's going to force you to make sure that you're only taking actions with customer data that they have explicitly authorised.
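
One way to picture "only taking authorised actions": a consent check that filters records before they ever reach an analytics or ML workload. This is a minimal, hypothetical sketch; the consent schema and purpose names are invented for illustration.

```python
# Hypothetical sketch: drop records whose subjects never consented to
# the purpose at hand, before the data enters a pipeline.
customers = [
    {"id": 1, "country": "DE", "consents": {"marketing", "analytics"}},
    {"id": 2, "country": "FR", "consents": {"billing"}},
]

def authorised_for(purpose, records):
    """Keep only records whose subject consented to this purpose."""
    return [r for r in records if purpose in r["consents"]]

training_set = authorised_for("analytics", customers)
print([r["id"] for r in training_set])  # [1] -- customer 2 never consented
```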

[00:15:25.650] - Kevin Petrie

The strong corollary to that is the California Consumer Privacy Act, the CCPA. We've got several other  state versions cropping up in the United States that are similar. Broadly speaking, they have the same  principles. 

[00:15:39.260] - Kevin Petrie 

If you get down to the regulatory fine print, it is hard for companies now, because they've got to figure out, "I'm complying with CCPA; does that mean I've got it licked for all the United States? Does that mean I'm okay with GDPR?"

[00:15:51.480] - Kevin Petrie 

I was at an event in January with the CEO of L.L.Bean, and he was lamenting the fact that they had to  have multiple compliance teams solving multiple compliance requirements. He said, "Why don't we get a  universal standard here, at least for the United States?" Globally would be great. 

[00:16:07.870] - Kevin Petrie 

The first trend is that consumer privacy trend, making sure that you're only taking authorised actions with  data. That applies to both business intelligence and new analytics workloads such as data science. That's  one broad set of activities that's important. 

[00:16:22.910] - Kevin Petrie 

The next one is figuring out, "What are we going to do with artificial intelligence?" Because now we need to make sure that we have some visibility. We need to get past the black-box problem so that we actually know what actions we're taking with the customer data, and then we can explain it to people.

[00:16:39.120] - Kevin Petrie 

The European Union has some draft legislation looking at data science, looking at artificial intelligence. It'll take a while for that to shape up. But I think what companies are taking a much harder look at are more basic questions. This explosion of interest in large language models since ChatGPT came onto the scene in November has, I think, stunned the world and stunned enterprises, who look at their teams and see, according to our surveys, nearly half of data engineers already using ChatGPT to help them do their jobs.

[00:17:10.730] - Kevin Petrie 

They're starting to say, "Whoa, great. If you can get some productivity benefits, it's fantastic." But this is  also the Wild West because with ungoverned inputs, you're going to get ungoverned outputs, and it  creates a lot of risk from a compliance perspective, from an accuracy perspective.

[00:17:26.710] - Kevin Petrie 

I think the next wave of innovation is going to focus on AI/ML governance and looking at some basic  questions, which is, do I know why this algorithm told me to do something, and do I trust it? Those are  easy questions to ask, very hard questions to answer. I think there's going to be a lot of focus on it from  companies and from vendors. 

[00:17:49.350] - Simba Khadder 

When we work with certain types of companies, I find that a lot of the problems around governance are handled via, essentially, meetings and committees. Before I get a feature in production, I need to talk to this team and this team, et cetera. You mentioned there are always productivity gains. There are always business opportunities that you can take advantage of if you are using this new technology.

[00:18:13.650] - Simba Khadder 

First, what's the state of the world now? How is governance in practice actually being applied at these enterprises? That's one, and then two, I'd love to get a sense of the utopian state. What should it look like? What would be, in theory, the best way this would look in the future?

[00:18:32.170] - Kevin Petrie 

I think, broadly speaking, across the companies that we work with, there are some very common trends. One is to get much more serious about cataloging their data and assembling the right metadata for all their different data sets throughout the organisation. It's hard to do; like everything, it's easier said than done. That's one effort that's underway.
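
For a sense of what "assembling the right metadata" can mean in practice, here is a minimal, hypothetical sketch of a catalog entry. The fields are illustrative, not any particular catalog product's schema.

```python
# Hypothetical sketch: the kind of metadata a team might record per data set.
from dataclasses import dataclass, field

@dataclass
class CatalogEntry:
    name: str
    owner: str
    location: str
    contains_pii: bool       # drives governance and consent handling
    tags: list = field(default_factory=list)

entry = CatalogEntry(
    name="customers_v2",
    owner="crm-team@example.com",
    location="s3://lake/curated/customers_v2/",
    contains_pii=True,
    tags=["golden-record", "gdpr-scope"],
)
print(entry)
```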

[00:18:52.010] - Kevin Petrie 

Another is to modernise and move as much as you can onto a common cloud-based repository for analytics. Cataloging and cloud consolidation are big trends. There's also a desire now to invest quite a bit in training and in fostering best practices. There's a focus on data literacy. There's also a focus on enabling people at the grassroots level.

[00:19:23.230] - Kevin Petrie 

The people in your organisation who are starting to use ChatGPT or Bard or Bloom, whatever it is, if they're starting to use those tools, they should be doing so in consistent ways, trained on some easy-to-understand guidelines about what's in bounds and what's out of bounds. We find that creating a centre of excellence to help foster consistent best practices for using both new and established technologies is the right way to get that going.

[00:19:55.650] - Kevin Petrie

Those are some of the big trends that we see in terms of getting your arms around this governance  problem. 

[00:20:02.390] - Simba Khadder 

It's funny, it comes back to your same people, process, technology. The part that almost feels like it's skipped over because it's not that sexy is the people part. How do we get people in our org, even if we don't necessarily have this perfect technology, to a point where they just get it? They're well trained, they understand how we think about things, and they can make the right decision even given the ambiguity that exists. Do you buy that? Is that fair to how you think about it?

[00:20:32.140] - Kevin Petrie 

I think that we're all still a little bit in shock about the capabilities, and I keep going back to large language  models, but it's forced us to ask some hard questions about AI overall. We're all still in shock about how  powerful it is, but I think that's starting to settle down. 

[00:20:45.340] - Kevin Petrie 

Some of the common patterns that are starting to emerge are that vendors and companies are going to create smaller language models, SLMs, that have curated, governed inputs, and they're solving small, tactical, focused problems. It's not a big Wild West. Rather, it's something that a company working closely with a vendor can make sure they're solving in a governed way.

[00:21:11.240] - Kevin Petrie 

I think things will settle down on that front. It's going to take some time, and it'll probably take some missteps by a lot of companies in the meantime as well.

[00:21:20.140] - Simba Khadder 

I have a deck that has a slide where I defined an SLM, a small language model.

[00:21:25.390] - Kevin Petrie 

There you go. You beat me to it. I was thinking of coining the term. I'm sure others are right on board with you.

[00:21:30.640] - Simba Khadder 

I was joking. I'm like, "Well, if GPT is an LLM, does that make Bard an SLM?" Maybe we're just starting to string together the traditional ML and the more AI-focused new wave, the new paradigm that's emerged in the last six months to a year.

[00:21:49.970] - Simba Khadder 

I'm curious to ask about that. In the enterprise, we're seeing GPT provide this new paradigm. The idea of a prompt, as a data scientist, wasn't really a concept before. The idea of thinking about your prompt, constructing prompts, that was never something we'd think about, just because the models worked very differently.

[00:22:09.220] - Simba Khadder 

Now, if I am a team, and there are a lot of companies that have chatbots, they have chatbot teams. The chatbot teams are working with what you'd call more traditional methods of NLP. Is ChatGPT going to wipe out a lot of traditional ML use cases? Is everything going to get replaced with GPT or similar LLMs? How do you think about that? Where are the lines and distinctions between where it makes sense and where it doesn't?

[00:22:34.960] - Kevin Petrie 

My guess is that there will be a profound transformation, but it'll take time to really shake out, and the  implications are hard to predict right now. GPT got there first but also had some of the most interesting  examples of ungoverned outputs. I don't know that they're necessarily going to win the war. Google is  doing quite a bit, and Google, I think, has a very high interest in making sure that these are governed  inputs and outputs. 

[00:22:59.420] - Kevin Petrie 

I think what's more likely to happen is that you will have SLMs, small language models, in governed rollouts, using commercial technology from vendors, using homegrown stuff from open-source communities, or a mixture thereof, where companies start to roll out much more focused language models. That'll probably become the norm, because people have higher confidence in it than in at least the current versions of ChatGPT.
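
To illustrate what "governed rollouts" might look like on the input side, here is a minimal, hypothetical Python sketch of a policy gate in front of a model call. The blocked patterns and the call_model stub are invented stand-ins, not any vendor's actual guardrails.

```python
# Hypothetical sketch: screen prompts against simple policy rules
# before any language model is invoked.
import re

BLOCKED_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),  # US SSN-like identifiers
    re.compile(r"\b\d{16}\b"),             # bare 16-digit card numbers
]

def is_governed(prompt):
    # A prompt passes only if no sensitive-looking pattern appears in it.
    return not any(p.search(prompt) for p in BLOCKED_PATTERNS)

def call_model(prompt):
    # Stand-in for a real (small) language model call.
    return f"[model response to: {prompt!r}]"

def governed_complete(prompt):
    if not is_governed(prompt):
        raise ValueError("Prompt blocked by input-governance policy")
    return call_model(prompt)

print(governed_complete("Summarise Q3 churn drivers"))
```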

[00:23:29.290] - Simba Khadder 

It's really interesting. It makes a ton of sense. I've actually not heard this take put as strongly. In my head, it's very much the clearest path forward for an enterprise. If you're a large bank, figuring out how to make LLMs work is a lot harder than figuring out how to make an SLM work.

[00:23:47.750] - Simba Khadder 

I want to come back to the COEs. Obviously, a lot of larger enterprises have COEs, and I've seen many different ways of trying to implement them. I'm curious, from your perspective, what makes a centre of excellence successful? How can you successfully show and nurture these best practices across a large org? If you think SLMs are the way, how do you build a team that really can show that, one, and two, disseminate it to the rest of the organisation?

[00:24:16.420] - Kevin Petrie 

Great question, because it's real easy to spin up something cross-functional that involves dotted lines and voluntary time on top of a day job, and it just goes away into the ether after a few months. But it helps if you start by saying, "We need executive buy-in, and we need executive commitment of dedicated time at the top to actually foster this community, this centre of excellence."

[00:24:37.480] - Kevin Petrie 

Then, within the different business units and different IT organisations, we need line managers who are told, "Your people are going to dedicate X% of their time to this." Then you've created enough time, enough motivation, and enough buy-in that people give it the calories it needs. You need to balance, as with everything, this need for centralisation and control with the need for innovation and decentralisation.

[00:25:03.660] - Kevin Petrie 

Start with broad-brush guiding principles, then boil down to specific policies and best practices. Find people at the individual contributor level or the team manager level who are having success putting those guidelines to work, celebrate that success, help them share those best practices with their peers, and start to foster a community where people have a vested interest in learning from one another, because ultimately they're now helping one another be more productive in their day jobs.

[00:25:33.760] - Kevin Petrie 

I think those are some of the common practices that we see. What's interesting is that none of it is distinct to AI/ML or to other specific technologies. Rather, this is the art of human management, which is always elusive. There are some timeless principles that are still hard to master.

[00:25:50.990] - Simba Khadder 

I love how you frame it. It's almost like making innovation a process, something that you do. I still have so many questions. I feel like we could continue to talk about this all day.

[00:26:00.820] - Simba Khadder 

I would love to just leave off with this: for someone who wants to follow you and your research, and I know you write a lot and share a lot of work about this, where should they look for you?

[00:26:11.170] - Kevin Petrie

I am on LinkedIn quite a bit. I post each day and love to engage community members there. I find that  some of our best research comes from people that we find on LinkedIn through polls, through messaging,  and so forth. I encourage folks to look for me on LinkedIn. 

[00:26:29.130] - Kevin Petrie 

If you think I'm wrong about something I say, tell me. Let's engage. I would love to have those conversations. Pro and con, I find that it really enriches our research process and, ultimately, the value that we can provide to our community of practitioners.

[00:26:42.580] - Simba Khadder 

Awesome. We'll include all those links in the description. This has been such a pleasure. Thank you so much for hopping on and sharing all these insights with us.

[00:26:50.250] - Kevin Petrie 

Awesome. Thank you, Simba. Really enjoyed it.
