Hi folks!
Featureform v0.10 looked more like a major release (as opposed to an itty-bitty minor release), with a lot of exciting new functionality and support, including:
Because of the sheer volume of new features and enhancements the team shipped in this release alone (while also co-sponsoring the UC Berkeley Hackathon and handling conference duties at the Databricks Data + AI Summit), we’ll be spreading the highlights over a couple of posts so we can do justice to the full scope of support and capabilities data scientists now get with Featureform v0.10!
We also want to give a big shout out to our community members and our customers for their ongoing support and feedback.
Feel free to drop us a line in our Slack community with your thoughts on v0.10!
In this post let’s focus on the dashboard!
Whether working solo or as part of a large enterprise organization, data scientists no longer need to search for the source-of-truth about where the data sources, feature transformations, and training sets live.
Instead, they can view all of their organization’s resources currently registered with Featureform in a single place. A user can see which models are using which features, which features are stored on which providers, and explore other cross-sections of the metadata.
In other words, the dashboard provides a comprehensive view of data lineage, transformation logic, ownership, and more.
A common pain-point for data scientists is keeping track of data, model, & code artifacts, including changes & experimentation runs.
So how does Featureform v0.10 continue to help solve these problems?
Resources that data scientists are able to view through the dashboard (assuming they have the access to do so) include:
One key area of the dashboard we’ve enhanced is search: as of v0.10, you can search across all resources, both by name and by metadata such as tags.
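To make the idea of searching by name and by tags concrete, here is a minimal sketch in plain Python. It is not Featureform's implementation or API: the `Resource` class, the `search` function, and the sample catalog are all hypothetical, and the matching rule (case-insensitive substring) is an assumption made purely for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Resource:
    """Hypothetical stand-in for a registered resource (not a Featureform class)."""
    name: str
    kind: str  # e.g. "feature", "source", "training_set"
    tags: List[str] = field(default_factory=list)


def search(resources: List[Resource], query: str) -> List[Resource]:
    """Return resources whose name or any tag contains the query (case-insensitive)."""
    q = query.lower()
    return [
        r for r in resources
        if q in r.name.lower() or any(q in t.lower() for t in r.tags)
    ]


# Made-up catalog, purely for illustration.
catalog = [
    Resource("avg_home_price", "feature", ["pricing", "housing"]),
    Resource("orders", "source", ["finance"]),
    Resource("referral_training", "training_set", ["email", "pricing"]),
]

print([r.name for r in search(catalog, "pricing")])
```

Here the query matches two resources through their tags even though neither has "pricing" in its name, which is the kind of cross-resource lookup the search bar is meant to make easy.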
Not only can all these resources be easily viewed within the Featureform dashboard but so too can the different variants associated with each resource, as well as all the metadata provided with the resources, including:
One major enhancement as of v0.10 is the ability to assign tags to resources directly through the dashboard UI. Tags are unique to the resource and can be used to group resources.
Navigate to a resource in the Featureform dashboard and there’s a rich amount of information available (without having to @here the question “where pricing dataset used to power email referrals model plz halp 😅” in the company Slack #data-newbies channel).
For example, as a newly hired data scientist, I want to understand more about a feature (or set of features) being used to power a model for predicting home prices.
I also want to know what data is being used where and for what purposes.
Clicking on a source, I can immediately learn the following:
Aside from getting a sneak peek at the first 150 rows in both local and hosted mode, users can also copy values directly from the source table to check data types and values.
We’re committed to making the user experience for data scientists seamless and this can be seen in a number of ways, especially in our commitment to clean design.
Another area where this commitment to legibility goes hard is the updates we’ve made in pretty formatting for your Python and SQL transformations (as of v0.10).
No need to copy the transformations to VSCode & hit save just to let your linter clean up the super-long-SQL-query-with-nested-CTEs juusssst right so you can see that yes, the referrals model is in fact using that orders table that everyone thought was super defunct and deprecated but is in fact still going since Jason from Finance & Accounting left to go start a farm (funded largely in part by their “day-in-the-life-of-a-tech-agro” TikToks).
The Featureform v0.10 release builds on existing investment (in both open-source and enterprise Featureform) to help individual practitioners move toward greater collaboration, both within their data science teams and with external teams.
Data science teams have needs that include:
One of the most important ways we do this is by providing:
For features and training sets, the dashboard reports metrics for the selected variant, namely throughput, latency, and errors.
Teams can leverage the existing monitoring capabilities provided by Kubernetes or integrate their own monitoring solutions to provide near real-time information on feature pipelines being used for production inference and training.
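As a rough illustration of what throughput, latency, and errors mean for a serving endpoint, the sketch below derives all three from a list of request records. Everything in it is hypothetical (the `Request` class, the `summarize` function, the field names, and the sample numbers); it does not reflect how Featureform or any monitoring stack actually collects these metrics.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Request:
    """Hypothetical record of one feature-serving request."""
    latency_ms: float
    ok: bool


def summarize(requests: List[Request], window_seconds: float) -> Dict[str, float]:
    """Compute throughput, mean latency, and error count over a time window."""
    total = len(requests)
    errors = sum(1 for r in requests if not r.ok)
    mean_latency = sum(r.latency_ms for r in requests) / total if total else 0.0
    return {
        "throughput_rps": total / window_seconds,  # requests per second
        "mean_latency_ms": mean_latency,
        "errors": errors,
    }


# Made-up sample: three requests observed over a 60-second window.
sample = [Request(10.0, True), Request(30.0, True), Request(20.0, False)]
stats = summarize(sample, window_seconds=60.0)
print(stats)
```

A production setup would more likely report latency percentiles than a mean, but the three quantities are the same ones surfaced per variant in the dashboard.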
Feature stores aren’t just specialized databases where machine learning features get parked (and eventually go to die from neglect).
Feature platforms are integral to the data science workflow at every stage, from when a data scientist is first beginning to experiment locally with a dataset in a Jupyter notebook to when they’re ready to deploy their feature engineering pipelines and materialize them in an offline store.
Auto-generated variants, named runs, and immutable variants ensure that as long as the data scientist hits apply, all their work will be registered with Featureform, won’t write over existing features, and will be readily available and visible through the dashboard.
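The guarantee described above (new work always gets registered and never overwrites an existing variant) can be sketched with a toy registry. This is an illustrative model only, not Featureform's code: the `Registry` class, its `apply` method, the `auto-N` naming scheme, and the `VariantExistsError` exception are all assumptions invented for this example.

```python
from typing import Dict, Optional, Tuple


class VariantExistsError(Exception):
    """Raised when a different definition is applied to an existing variant."""


class Registry:
    """Toy registry where each (name, variant) pair is immutable once written."""

    def __init__(self) -> None:
        self._variants: Dict[Tuple[str, str], str] = {}
        self._counter = 0

    def apply(self, name: str, definition: str, variant: Optional[str] = None) -> str:
        if variant is None:
            # Auto-generate a variant name that is not already taken.
            while True:
                self._counter += 1
                variant = f"auto-{self._counter}"
                if (name, variant) not in self._variants:
                    break
        key = (name, variant)
        if key in self._variants:
            if self._variants[key] == definition:
                return variant  # re-applying the same definition is a no-op
            raise VariantExistsError(f"{name}:{variant} already exists")
        self._variants[key] = definition
        return variant


registry = Registry()
first = registry.apply("avg_home_price", "SELECT AVG(price) FROM homes")
second = registry.apply("avg_home_price", "SELECT MEDIAN(price) FROM homes")
print(first, second)
```

Because each unnamed apply lands in a fresh variant, the second experiment sits alongside the first instead of replacing it, and an attempt to change a named variant in place fails loudly rather than silently rewriting history.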
Interested in learning more about v0.10 of Featureform or looking for access control and governance capabilities?
Book a demo of the Featureform platform here!
See what a virtual feature store means for your organization.