MLOps Weekly Podcast

Episode 8
The MLOps Mindset with Stefan Krawczyk
Former Model Lifecycle Lead, Stitch Fix

The MLOps Mindset with Stefan Krawczyk

August 2, 2022


[00:00:06.000] - Simba Khadder

Hey, I'm Simba Khadder, and you're listening to the MLOps Weekly podcast. Today I'm chatting with Stefan Krawczyk, former Manager of Data Platform at Stitch Fix. He built, and later open-sourced, the Hamilton framework while he was there. Prior to that, he worked at LinkedIn, Nextdoor, and an NLP enterprise start-up that, in his own words, crashed and burned. Stefan, so good to have you on the show today.


[00:00:27.690] - Stefan Krawczyk

Yeah, thanks for having me, Simba. Excited.


[00:00:29.580] - Simba Khadder

I gave a quick outline of how you got to MLOps, but I'd love to hear it in your own words. What was the story that got you into MLOps?


[00:00:35.640] - Stefan Krawczyk

How did I get into MLOps? Well, I originally did computer science at Stanford with a specialization, so I knew I wanted to do something around this new field of ML and AI. At LinkedIn, I pivoted to prototyping content-based recommendation products, and I got firsthand experience of what it's like to build a model and try to get it out to production. At Nextdoor, I built similar things and related tooling: the first version of the data warehouse, so we could get data to build and train models with, plus online experimentation and A/B testing frameworks. That's what led me to go to the startup, because I wanted to get better at machine learning frameworks; I really wanted to understand how to operate them and how to build them. Then the opportunity came up with Stitch Fix to come and help build a machine learning platform, which is, I think, why MLOps was coined: people were trying to build machine learning platforms, and that's what I was interested in doing. While at Stitch Fix, I was building a little more than just a platform; we were trying to help data scientists build and get things to production themselves.


[00:01:36.660] - Stefan Krawczyk

When the MLOps term was coined, it was great. It's the easiest way I can explain to someone what my team and I do: we're trying to help data scientists operationalize machine learning in a way that's self-service.


[00:01:48.740] - Simba Khadder

I love a lot of these terms. Like, [inaudible 00:01:50] with feature store. You define a feature store to someone and they're like, "Oh yeah, we have that." Everyone built all these things. Anyone who's been doing machine learning in production has been doing MLOps forever. We just didn't call it that, and we didn't really even think about it that way; we didn't break it down that way. It was just the platform for machine learning. For us, the feature store was something we called our data platform for machine learning. The term "feature store" came later, and we said, "Yeah, that's what we built, it's a feature store."


[00:02:20.760] - Stefan Krawczyk

Yeah. I think it's similar to how DevOps came about. Rather than handing off to someone else, you're now trying to do things yourself, with best practices, to get things to production. In that sense, adding a feature store and thinking about features and models and data, it's all one ecosystem, so it makes sense. I was happy that whoever came up with the term came up with it. I'm like, "This is great. This makes my job much easier to explain." It's thinking about software engineering best practices, and again bringing in the DevOps mindset of how you operationalize things without breaking production. You want to stop bad things from happening, so what's a good way to think about it and do it?


[00:02:53.160] - Simba Khadder

You've sort of answered this question, but I wanted to ask it explicitly for anyone listening. How would you define MLOps?


[00:03:00.180] - Stefan Krawczyk

Sort of riffing on what I just said before: for me, the whole goal, the mindset of MLOps, is to stop bad things from happening in production, but you're approaching it in such a way that you're bringing in what's termed developer operations, or DevOps: best practices for deploying and getting to production. But you're also thinking about software engineering best practices, because machine learning is data and code that you ship and package together. It's really the mindset of how you stop bad things from happening in production, and then what techniques, ways, and framings you can employ to make that happen.


[00:03:32.070] - Simba Khadder

I love that you brought up DevOps. It's interesting to see everyone's view of MLOps versus DevOps. They're obviously different things, but some people say they're completely different, other people are like, it's just the same thing applied in a different place, and some people I've talked to are like, it's 90% the same, MLOps doesn't even have to exist. How would you define the difference between MLOps and DevOps?


[00:03:53.370] - Stefan Krawczyk

For me, DevOps, when I encountered it, was: rather than it being someone else's job function or role to deploy my code, I, as a developer, am now deploying it myself. MLOps, to me, is what you get if you take that DevOps mindset and ask, how do you enable someone to develop and deploy models as part of their job function? That's the MLOps mindset. Now, I want to say that DevOps is probably, right now, actually a subset of MLOps, because if you think about deploying to production, there's maybe a little bit of disjointness, since some DevOps deployments would never be machine learning deployments. But right now one is a subset of the other and helps feed into it; any best practices in DevOps, I'm pretty sure you can bring back into MLOps.


[00:04:37.620] - Simba Khadder

I love that. It's funny, in the last episode we had James, who's an investor. He said, "One open question is the size of the DevOps market versus the MLOps market." You're not exactly saying that, but I love the idea that the MLOps market is the superset: it's DevOps and more. Putting models in production hits every pain point of DataOps, DevOps, and all of these other things in between, so you're right. I can't imagine someone having a really good MLOps workflow without also having an amazing DevOps workflow, same with DataOps. It's almost like you need both as a foundation, and then you need to find a way to apply that to your machine learning team, to get it all working together as MLOps. Could you share some of the ways the differences manifest? Putting a service in production versus putting a model in production, could you talk about those differences? Maybe a story of something that happened that could only happen in machine learning, that you had to solve?


[00:05:32.220] - Stefan Krawczyk

What's a model? Think about it from first principles: you want to predict something, so there's going to be some function somewhere that you pass data into, and that function actually has some internal state. That is the [inaudible 00:05:46] model, and together with the inputs, you're transforming and computing something with that internal state and outputting a result. In traditional software engineering, with apps or services, there isn't really any internal state like that. It's all pretty invariant and very easy to reason about the inputs you get and the outputs you produce. As long as you have great unit test coverage, not much can go wrong in production with your app. But with MLOps and putting a machine learning model into production, you have to really think about all the different things that could go wrong; there are so many components to putting a model into production that can go wrong. Whereas with a traditional app, the thing is invariant over time. As time goes on, as long as the inputs you coded for come in as expected, you don't have to update or change the app. With a model, that's not necessarily true.


[00:06:50.550] - Stefan Krawczyk

Particularly since, if your inputs shift in values over time, then maybe the internal state that you have in the model becomes less relevant and actually starts to output different results. Whereas in a regular app, everything is pretty much hardcoded and it's very easy to reason about; with a model, that's not quite the case.


[00:07:12.690] - Stefan Krawczyk

It usually happens that you're also updating models, probably far more often than you're updating that internal app logic. Think of that function endpoint and how many different versions of it there are over the course of time. As you update the model, you have to get that update into the app somehow. How do you do that in a way that doesn't break things? Once it's in there, how do you reason about and ensure that things aren't changing for the worse, or tell whether they're changing for the better? You need more measurement and observability around it. Then what happens in production also flows back into creating a better model. You have a loop; when you're deploying a model, you should always be thinking about this feedback loop. Whereas with a traditional app, you can be one and done, and other than adding new features to the app, which is where DevOps practices really help, once you've created a feature it's probably not going to change very much unless the business changes.


[00:08:12.060] - Stefan Krawczyk

That changes at a much slower pace than the pace at which a model evolves.


[00:08:20.400] - Simba Khadder

At my last company, we built a lot of recommender systems, which I'm sure you did quite a bit of at Stitch Fix. One of the problems we always ran into is that there's no such thing as the perfect recommender system. Sometimes, for example, someone would say, "Oh, this user got a really bad recommendation." It's like, "Well, how do you fix that?" One option is that you write an if-statement and band-aid it; another option is you try to retrain the model to fix it, but then it becomes Whac-A-Mole. You can't unit test it, because it's what the model has learned. We had to do all kinds of wild stuff; there's a lot we built to make it possible, and you hit issues that would never happen, like you said, in standard programming. There, if something is wrong, you can go and debug it. If a model is outputting garbage sometimes, what do you do? There's not really a good answer. You can retrain, you can try things and hope it fixes it, but there's no perfect way to do it.


[00:09:17.190] - Simba Khadder

You can't just go in and say, "Hey, I'm going to change this weight to 1.7 and then it will all work out." I think that's where a lot of these problems come into play.


[00:09:27.600] - Stefan Krawczyk

The way that I think about it is that a model is statistical. Rather than there being a binary outcome, which is what you get with traditional programming, with a model there's a statistical set of outcomes. In which case, there are some areas where the results aren't that good and you're not going to know what to do. It definitely brings some uncertainty into running a production service.


[00:09:53.220] - Simba Khadder

We've already talked about it, but I want to bring it back to the forefront: in the early days of MLOps, we didn't even call it MLOps. Nowadays there's a whole set of startups, there's a whole space, there are all these categories that we've defined. How has MLOps changed over the years, from your perspective?


[00:10:12.420] - Stefan Krawczyk

Good question. I think back in the day, people such as myself were embedded engineers in a team, and we basically did things end to end. Then, as companies started bringing in ML, the thinking was that you need a bit of a platform to centralize some of these costs. I think that's where the move to ML platforms and building ML platforms came from; Uber's Michelangelo is a classic example of this. People realized there's a whole ecosystem around it, and that you need to build a fully blown solution like Michelangelo to get any value. From that mindset and practice, MLOps emerged. How has it changed? I think there's definitely more widespread adoption of machine learning, with people realizing that they have similar problems, and also realizing that there isn't one solution that fits every single use case. Ad serving technology and the machine learning models there are very different from predicting health outcomes; they have different velocities of model retraining and things like that. I think there's been more of an emergence and acknowledgment of that from industry.


[00:11:20.520] - Stefan Krawczyk

There are all these different problems; you can't build a single platform that will serve everyone. People have tried, and I think those attempts haven't gone too well. Also, the people doing MLOps have changed over the years. It used to be very CS heavy, computer science backgrounds. Now I think it's anyone coming from the other end, from "Hey, I know how to build a model", say an applied physicist or a statistician, asking how they can now think about deploying and pushing things to production. There are potentially fewer people with classical software engineering training getting into it than before. If you were a machine learning engineer, you were almost guaranteed to have a software engineering background; now I don't think that's the case. That impacts the framing, who's using the tooling, and how MLOps tooling is pitched.


[00:12:11.640] - Simba Khadder

I love that breakdown. There are almost three stages that you mentioned. Stage one was when MLOps wasn't a thing: you were just an engineer on a machine learning team, or a team that happened to be doing machine learning, and your goal was to get this into production. Stage two was, we'll build a platform team that does this for the whole company. And stage three is where we're at now, or what we're getting into, which is that we don't really need a custom platform per company. A lot of these things look the same in different places, with lots of nuance. What we really need is almost a set of tools, like DevOps tools, that we can bring in and configure to fit our workflow. It is interesting to compare DevOps to this, because it's very similar. Stage one was, you have an engineer on the team whose goal was just to get it into production somehow; maybe that person's job, or a [inaudible 00:13:00] engineer's job, was to get it into prod and make sure it doesn't go down, and if it goes down, you get a [inaudible 00:13:04] and you go figure it out. Stage two was, let's go build our own DevOps platform, like Google built Borg, and there are all kinds of other examples of DevOps toolsets that were created.


[00:13:15.540] - Simba Khadder

Now we're in that third stage, which is, we don't need to go and build Borg: Kubernetes exists, all the HashiCorp tools exist, CI/CD exists. There's all this open source tooling that we can pick up and bring together into our workflow. It's a bit of the same thing happening over again, just applied to a different problem space.


[00:13:37.770] - Stefan Krawczyk

Yeah, totally. I think it says something about the maturity of how solutions permeate through industry.


[00:13:44.490] - Simba Khadder

With all the changes that are happening, how do you yourself keep up with everything? It feels like there's a new blog post, a new something, every single day.


[00:13:52.440] - Stefan Krawczyk

Yeah, it can be overwhelming. My go-to is generally Twitter. Twitter is one way I discover things; there are also engineering blogs from tech companies, and the rise of Substack and all the various newsletters. I subscribe to a bunch of them that I thought were interesting. What I end up doing is skimming headlines and bookmarking things I want to read later. The things that get the most buzz generally catch your eye. I probably [inaudible 00:14:22] because of the way I look at things, but otherwise, what's useful from the things I've bookmarked is trying to get an understanding of the mindset: why was this thing created? Because, to the point that there isn't one solution that fits all, people create solutions for different reasons. I think it's useful to vicariously live through those posts and try to understand why something was created in the first place. Was it an organizational thing that led to this particular solution versus building another one? That's an interesting question to ask sometimes, because with MLOps it's not true that every company has the same topology, in which case different solutions might be selling to different people.


[00:15:01.080] - Stefan Krawczyk

Should I build this, is [inaudible 00:15:02], it really depends on your company topology. At least for staying up to date, I try to step back and ask the question: why was it created, and what was the environment that created it? Maybe there's something to that, smaller companies do this, or big companies have these problems. Those are at least the things I love to take away from reading posts.


[00:15:21.960] - Simba Khadder

Can you name-drop a few people that people should follow, or some Substacks you like, or anything? Just to name a couple, for people listening who want to at least get started.


[00:15:32.700] - Stefan Krawczyk

Your mileage may vary. I'm terrible at remembering names; I just know what comes into my inbox. The last one I remember reading was TheSequence on Substack. Otherwise, on Twitter I follow Sarah from Amplify Partners, who tweets about the data ecosystem and the ML ecosystem. If you know of any of the open source projects, follow their Twitter handles, and maybe follow some of their core contributors; you'll get to see updates, or maybe they'll go to a conference and you can pick things up that way. I follow Josh Wills on the data engineering side, and the ML and AI gurus like Andrew Ng and Andrej Karpathy. I try to hit people from the various big tech companies; there's Sebastián Ramírez from FastAPI, since I've been living in a Python-based world and focusing on Python-based things, so I follow a bunch of those people there. I try to follow some people from the DevOps side and the MLOps side. You can follow me, or you, Simba. For the actual machine learning infrastructure, to see where industry is and the interesting things that are happening, follow AI and ML research people.


[00:16:37.620] - Simba Khadder

Everyone should probably follow a mix from each area: you need to understand what's happening in AI [inaudible 00:16:41] MLOps specifically, from practitioners. You're going to need to cast a wide net to make sure you catch the things that matter. There's a lot of noise and there's just a lot being figured out. It's almost like trying to keep up on papers, you just can't. You have to find a way to figure out what is and isn't worth keeping up with. You have a class coming out [inaudible 00:17:04] on August 22nd. Can you share some more about it?


[00:17:08.640] - Stefan Krawczyk

Yeah, I'm partnering with Sphere. They're a recent YC batch company, and the class is called Mastering Model Deployment and Inference. The idea is an executive-ed-style class for practitioners: four two-hour sessions. The goal is to give people the skill set, or at least the know-how, to improve latency and throughput by thinking about how you select appropriate inference architectures; how you reduce outages and the mean time to resolution, and what some common model observability approaches are for doing so; and what the overall macro-architecture is, and what the impact of your machine learning architecture is with respect to reliability, scalability, and getting models to production. You're thinking about questions such as: what components should my model deployment system have? Where will my current approach to deployment and inference break down? It isn't going to be all lecture-based. There'll be me lecturing a little bit, but then some group work and group discussion. Hopefully there'll be other machine learning engineers, or people who deploy stuff to production, so there'll be some interesting networking, and some interesting questions asked, "I have this problem," et cetera, that you can maybe put to your classmates. Hopefully, by the end of the class, learners will be able to answer questions something like the following: what are the components that my model deployment system should have?


[00:18:27.810] - Stefan Krawczyk

I think it's an interesting one. Not everyone needs every single component, because it really depends on your SLA and what the cost of an outage is. Our discussion is about what components you should have, and then maybe even helping you take a critical look at: what is my current approach to deployment and inference, and where is it going to break down? What is going to be painful for the business or for me, and what do I want to do about it? What are some architectural pressures, changes, or patterns that I could use to help make a decision? What are some architectures, patterns, or tools that can help me reduce my outages or my mean time to resolution? It's a class that tries to pack a lot into four sessions, but the idea is that it's framework agnostic, and that you can take away a general mindset, ways of thinking, and patterns that you can apply to your particular context.


[00:19:16.020] - Simba Khadder

We'll have a link at the bottom if you want to check it out, and you'll be able to learn more there. Let's talk about Stitch Fix. You built out a lot of the ML infrastructure there. Maybe you could share: what did the workflow look like at Stitch Fix?


[00:19:30.360] - Stefan Krawczyk

If you don't know, Stitch Fix has 100-plus data scientists, whose role was to iterate, prototype, productionize [inaudible 00:19:38] and be on call for models. We were trying to build tooling and abstractions to enable them to do software engineering [inaudible 00:19:44], pull things off the shelf, create their workflow, and get things to production without software engineers [inaudible 00:19:50], rather than them having to engineer things themselves. That was the mindset and the thing we were going for. We were always competing with people doing things themselves, in which case the task of my team was to build better tooling and get people to adopt our stuff. By the end of it, we were working on a YAML config plus Python code driven way of creating machine learning pipelines. The idea was: how do you enable people to more easily manage and create pipelines? Because that was a bit of a problem. At Stitch Fix, everyone wrote their machine learning pipelines in different ways. How do you standardize that? The idea was, if we could get people to write these configs, then under the hood they would essentially compile down to Airflow jobs.


[00:20:34.710] - Stefan Krawczyk

How do we build this API layer so that we're not leaking too much of the context, but we're standardizing and trying to simplify how people specify things, in a way that helps ensure things aren't too coupled, so that if they want to reuse something or change something, it's not going to take a lot of pain and effort? That was all built on top of a little framework we called the Model Envelope. It was analogous to MLflow models or ModelDB type technology. With the model envelope metaphor, what I'm trying to suggest is, you have a model and an envelope, and you're shoving things into it, not only the model but things about it. Then we package it up in the envelope so that we can use it in various contexts without you having to write any code to do so. At Stitch Fix, you could write a model, save it, and have it deployed into production in under an hour, because we could auto-generate the web service code for you. Getting a model to production, at least, was pretty easy.


[00:21:32.280] - Stefan Krawczyk

Getting features to production was a little harder; you potentially had to implement things in two places, and we were trying to fix that and make it only one. Then we were also trying to simplify the management of model pipelines, since over time, as a team grows, you generally inherit and accumulate more machine learning models and more pipelines. How do you manage that? That's where that effort was going: to help teams not have to incur tech debt, by relying on the platform rather than on their own code.


[00:21:59.820] - Simba Khadder

A lot of it was standardizing: once you have the weights, the model itself, it's standardizing everything around it. How is it deployed? What inputs does it need, and all that. Making it so that once you fill in these configuration files for me, I'll make it so in production.


[00:22:15.630] - Stefan Krawczyk

Yeah. The Model Envelope was, you could say, independent from the system for defining your model training pipelines, but essentially, yeah. With the DevOps and MLOps mindset, we were trying to put in the hooks and things such that people didn't have to make different decisions about, say, how you log things in a web service. We standardized it by saying, "We'll just generate the web service for you." What are the things we should save about the model? Hence, we introspected the Python environment to make sure we captured the Python dependencies exactly as they were, so that we could, one, always reproduce the model, and two, have a pretty good idea of what's required to run it in production. Then, with the system to simplify model training and model training pipelines, the issue was that a lot of people wrote model training code that was highly coupled to their context, and it was very hard to share. How do you fix that? You needed to decouple how you provide inputs to the model training process, in which case the software abstraction was config to split the two apart: how you create and featurize data and provide it to training, and making the process of creating a model pretty standardized in a way that is agnostic of how data gets to it.
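To illustrate the environment-introspection idea in isolation, here is a minimal sketch that captures the interpreter version and the exact installed package versions using only the standard library. This is not the Model Envelope's actual code; the snapshot_environment helper and the JSON layout are hypothetical, but importlib.metadata and platform are real standard-library modules.

```python
import json
import platform
import sys
from importlib.metadata import distributions


def snapshot_environment() -> dict:
    """Capture the Python version, platform, and exact installed package versions.

    Saving this next to a model artifact makes it possible to rebuild an
    environment that can reproduce the model or serve it in production.
    """
    deps = {dist.metadata["Name"]: dist.version for dist in distributions()}
    return {
        "python_version": sys.version,
        "platform": platform.platform(),
        "dependencies": dict(sorted(deps.items())),
    }


if __name__ == "__main__":
    # In an envelope-style workflow, this metadata would be written alongside
    # the serialized model, e.g. an envelope.json next to model.pkl.
    print(json.dumps(snapshot_environment(), indent=2))
```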


[00:23:22.560] - Simba Khadder

What's a design decision that you made in building this, for someone listening who is perhaps building a model serving system or something similar? What's something you decided, that you can maybe share, that worked really well?


[00:23:33.090] - Stefan Krawczyk

The easiest one is probably to talk about the Model Envelope. One of the ideas was: we don't have an API that someone triggers to deploy a model at the end of their pipeline. If you look at a bunch of these frameworks, you save the model, but then somewhere along the line you have a deploy-model step. We explicitly made a decision not to have that; instead, people need to go to a web service UI and write a little set of rules that would trigger deployment. That made it very easy for us to insert a CI/CD step so we could enforce things. For anyone who wanted to deploy something, we could make it very easy: "Oh, you want to deploy this model? Okay, well, maybe you deploy to staging first." Do some things with staging, say that it's good, and then we'll kick off a production deployment. Whereas if we had allowed people to stick a deploy-model call at the end of the script, we wouldn't have had as much control with respect to inserting, upgrading, and changing how models are deployed. So we made a very explicit decision that you can't trigger deployment programmatically; you need to do it through our service, where you set up some rules.


[00:24:39.120] - Stefan Krawczyk

[inaudible 00:24:41] worked really well for us. We were able to ensure that if we wanted to change how things are deployed or how things are triggered, we only had to change our service. We didn't have to change anyone else's pipelines or code or anything. That was one very explicit decision we made.


[00:24:55.800] - Simba Khadder

That's fascinating, a very interesting decision. I feel like that can apply in a lot of different places. A lot of MLOps [inaudible 00:25:01] anything, you say send, they say deploy, and making it so that you almost can't do that. It has to be integrated into the [inaudible 00:25:10] system, very opinionated. Yeah, it's fascinating. I also want to talk about another big project that you've obviously shared a lot about: Hamilton. Maybe you could share more. For those who don't know, what is Hamilton and how does it fit into the system?


[00:25:23.700] - Stefan Krawczyk

Hamilton is a declarative dataflow paradigm in Python. What do I mean by that? It's declarative in that, as you're writing code, you're declaring an output and then declaring its inputs; you're not writing procedural code when you're writing the dataflow. What's a dataflow? Dataflow is basically an academic term for modeling how data and computation flow. You can think of it as analogous to a workflow or a pipeline; you're basically building a workflow or a pipeline. Where Hamilton fits in, and where it came from, is that it was actually one of our earlier projects, and it was to help a team doing time series feature engineering with their code base. It was one of the oldest teams at Stitch Fix. They were a team that created operational forecasts that the business made decisions on. They were always under the gun to produce numbers and forecasts so the business could make decisions, in which case they weren't a team that had time to address tech debt or anything like that.


[00:26:19.140] - Stefan Krawczyk

Essentially, they were in such a state that their feature engineering code was spaghetti code. That's partly because with time series feature engineering you are creating a lot of features. Think of the data frame that you're going to be training or fitting a model on: it's thousands of columns wide. It's not necessarily big data, but it's very wide, and it's very wide because of the feature engineering process, because you're usually deriving features from other features. You're basically chaining features together to create other features. If you do it in a procedural way, once you get to a certain scale, your code can very easily devolve into spaghetti code, especially if you're using pandas. Hamilton was built to mitigate that and ensure that things are [inaudible 00:26:57], that they're documentation friendly, and to help them with their workflow of creating features and generating this featurized data frame. Back to Hamilton: rather than writing procedural code where you're doing column C equals column A plus column B, you would rewrite that as a function where the function name is column C, the input arguments are column A and column B, and then you'd have your logic to sum columns A and B.


[00:27:24.630] - Stefan Krawczyk

You can use the function docstring to document it. Anywhere that you have an assignment in a script, you're rewriting it into a function, and then you have to write a little more: there's a little driver script, but the driver script's purpose is basically to stitch together these functions, because each function name declares an output. If you want to actually create column C, or use column C in some other thing, you write all these functions. We then crawl the Python code and pass it into this, what we call, driver object to create a directed acyclic graph. Basically it's a dependency chain: if I want to compute column C, I know I need A and B as inputs. A and B can either be defined as functions themselves, so you'd look for functions named column A and column B, or they can be provided as inputs. Hamilton decouples the modeling of the dataflow, the pipeline, from materialization. You're writing these declarative functions that declare a workflow or a pipeline or a dataset, how data [inaudible 00:28:21]. Then you're writing this driver script that actually defines the DAG, and that's where you're providing configuration and inputs.


[00:28:29.220] - Stefan Krawczyk

At the end, you're specifying what you want computed. With Hamilton, you can model a superset of transforms, and in the driver you only have to request the things that you need; because we have a DAG, a directed acyclic graph, we can walk the graph in a way that we only compute what we need. This means you can integration test things very easily. You don't have to have a monolithic script where, if you add something, you need to run everything to test it. With Hamilton, it's very easy to just test that one thing you added, end to end. It's also very easy to unit test, because you write everything as functions, and those functions don't leak how data gets into them, so it's very easy to write a unit test that passes in the right data to exercise the logic. That's been running in production [inaudible 00:29:12] for over two and a half years. We open-sourced it in October, and we're now adding a few more things to it. If you want to scale onto [inaudible 00:29:19], it's very easy to do so.
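To make the column-C example concrete, here is a minimal sketch of what those declarative functions and the driver script can look like with the open-source Hamilton library. The module and column names (my_features, column_a, column_b, column_c, column_d) are made up for illustration, and the exact driver API may differ slightly between versions, so treat this as a sketch of the pattern rather than canonical usage.

```python
# my_features.py -- each function declares one output column; its parameter
# names say which upstream columns (or driver inputs) it depends on.
import pandas as pd


def column_c(column_a: pd.Series, column_b: pd.Series) -> pd.Series:
    """Column C is the sum of columns A and B."""
    return column_a + column_b


def column_d(column_c: pd.Series) -> pd.Series:
    """A feature chained off of column C, showing derived features."""
    return column_c * 2
```

```python
# run.py -- the driver crawls the module, builds the DAG, and only computes
# what you request.
import pandas as pd
from hamilton import driver

import my_features

dr = driver.Driver({}, my_features)  # (config dict, feature module(s))
inputs = {
    "column_a": pd.Series([1.0, 2.0, 3.0]),
    "column_b": pd.Series([10.0, 20.0, 30.0]),
}
# Request only the columns you need; Hamilton walks the DAG to compute them.
df = dr.execute(["column_c", "column_d"], inputs=inputs)
print(df)
```

Because column_c is just a plain function, a unit test can call it directly with small hand-made Series, which is the testability benefit Stefan describes.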


[00:29:21.510] - Stefan Krawczyk

You don't have to do anything; you just have to change some driver code. We also recently added some basic ability to do runtime data quality checks. A common complaint was, "Hey, my pipelines are running, I think the code looks good, but the output is crap. What's going on?" Right now with Hamilton, it's very easy, with a decorator just above the function, to set some expectations, such that at execution time we can run a quick check to ensure [inaudible 00:29:46] the types are there, there are no NaNs, or there are less than 5% NaNs, and things like that. I think Hamilton right now is a pretty interesting tool for anyone doing feature engineering, and especially if you're doing time series feature engineering.
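As a sketch of the decorator-based checks described above: Hamilton ships a check_output decorator in hamilton.function_modifiers for exactly this. The specific keyword arguments shown here (data_type, allow_nans, importance) are my assumption of the common ones and may vary by version, so consult the Hamilton docs before relying on them.

```python
import numpy as np
import pandas as pd
from hamilton.function_modifiers import check_output


# At execution time Hamilton validates this function's output: it should be
# float-typed and contain no NaNs; importance="warn" logs instead of failing.
@check_output(data_type=np.float64, allow_nans=False, importance="warn")
def column_c(column_a: pd.Series, column_b: pd.Series) -> pd.Series:
    """Column C, with runtime data quality expectations attached."""
    return column_a + column_b
```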


[00:30:00.300] - Simba Khadder

One thing, based on listening to a lot of what you've been saying, is that I have a very similar opinion to you: most MLOps problems are abstraction problems, while people treat them like infrastructure problems. The hard problem in MLOps, in my opinion, is getting the right abstraction in the workflow. You have to be [inaudible 00:30:18] in some places and very configurable in others. In some places you have to be like, "Hey, you need to do things this way, that's the only way to keep this versioned, organized, testable, et cetera." In other places, like in your example, [inaudible 00:30:31] MLOps like this: do whatever you want. If you want to use TensorFlow, if you want to use PyTorch, we don't really care, because that's where the data science needs are. What we care about is all the coordination and metadata that comes in to allow you to do that without having to manually say, "this is this and it goes here." We set some parameters, some framework, so that if you follow the framework, it's implied where it will go, what version it is, what the name is, and all the other stuff that you want in an MLOps pipeline.


[00:31:02.990] - Stefan Krawczyk

Yeah, totally. To prevent bad things from happening in production, you really need all that extra metadata. The abstraction question is, how do you get people to provide it? Either procedurally they have to add things in, in which case people forget or don't do it at all because it's extra work, or you do it in a way that pulls things out automatically. Part of the theme of my team at Stitch Fix was: how can we do that in a way where people just don't have to think about it and it's the right way to do it? Hamilton is a bit of this theme as well. You write functions out and they're automatically unit testable just by design; you don't have to think about it later. It's definitely an abstraction problem if you want to do MLOps well.


[00:31:45.530] - Simba Khadder

We've talked about so many different aspects of MLOps, from serving to data pipelines to how it's changed over the years. What's something you're most excited about in the MLOps space right now?


[00:31:55.820] - Stefan Krawczyk

That's a good question. There are so many open source tools, and also so many startups and companies trying to do something in the space. If I were a practitioner who likes to build models, I'd be excited, because it's never been easier to get something up and running without having to build all of it yourself. One of the things I'm looking at is how low-code solutions impact MLOps, and how much code people are actually going to be writing to get models to production. Is it going to be config-based stuff? Is it going to be integrated with SQL? You see companies, the folks out of Uber, I think Ludwig [inaudible 00:32:32], trying to use SQL-like syntax to help people do modeling. There are a bunch of companies like that, and there are companies trying to appeal to people who want to write a little bit more code. To me, it's exciting to see all these different approaches emerge, and I'm wondering who's going to win. Which one is going to ultimately win out?


[00:32:53.840] - Stefan Krawczyk

I wouldn't be surprised if everyone wins, because there are so many different companies that have machine learning but are structured and function differently, with different SLAs, in which case it could be a win for all of them. Otherwise, there's just the Cambrian explosion, you could say, of MLOps tools. I'm excited to see how things evolve, what eventually dies off, and what actually sticks.


[00:33:12.990] - Simba Khadder

I love that. I think there's so much more we could cover, but we do have to eventually wrap up. I would love almost a TL;DR: what is a tweet-length takeaway that someone listening to this podcast should leave with?


[00:33:26.120] - Stefan Krawczyk

Use Hamilton, take my course, right? No. The tweet-length takeaway is, I think, understanding the environment in which you operate… that's not quite tweet-length. You should understand the impact and the cost of what you want to prevent. That's maybe one way to frame what we've been talking about. MLOps is very specific to your environment. You want to prevent bad things from happening, and therefore the solutions you want to implement are related to your environment and what the cost of bad things happening is. You should understand the impact and cost of what you want to prevent.


[00:33:59.030] - Simba Khadder

I love that. Thanks so much for hopping on, Stefan. This has been such a cool conversation. Thanks so much.


[00:34:02.780] - Stefan Krawczyk

Thanks for having me, Simba.
