Featureform v0.🔟 Highlight 🔦: The New & Improved Dashboard

July 13, 2023

Hi folks!

Featureform v0.10 looked more like a major release (as opposed to an itty-bitty minor release), with a number of exciting new features and integrations, including:

Vector database support: Providers like Redis, Weaviate, and Pinecone for LLM-based workflows, in both local and hosted mode.
Dashboard upgrade and makeover: New functionality and enhancements to the Featureform dashboard for both users and administrators, including metadata management for resource tags, previews of transformation results, and clear visibility into transformation logic.
API improvements for data science development: Native support for dataframe operations, including reading a directory of files.
Updated documentation: The energy and drive from the UC Berkeley Hackathon participants encouraged us to update our contribution guidelines and add more examples leveraging Featureform to build modern ML applications.

Because of the sheer volume of new features and enhancements the team developed for this release alone (while also co-sponsoring the UC Berkeley Hackathon and handling conference duties at the Databricks Data + AI Summit), we’ll be spreading the highlights over a couple of posts so you can fully appreciate the scope of support and capabilities data scientists can now experience with Featureform v0.10!

We also want to give a big shout-out to our community members and our customers for their ongoing support and feedback.

Feel free to drop us a line in our Slack community with your thoughts on v0.10!

In this post, let’s focus on the dashboard!

The Featureform Dashboard: The Homebase for Data Science Workflows

Whether working solo or as part of a large enterprise organization, data scientists no longer need to search for the source-of-truth about where the data sources, feature transformations, and training sets live.

Instead, they can view all of their organization’s resources currently registered with Featureform in a single place. A user can see which models are using which features, which features are stored on which providers, and explore other cross-sections of the metadata.

In other words, the dashboard provides a comprehensive view of data lineage, transformation logic, ownership, and more.

How We Support Versioning & Documentation For Data Scientists

A common pain-point for data scientists is keeping track of data, model, & code artifacts, including changes & experimentation runs.

So how does Featureform v0.10 continue to help solve these problems?

Viewing So Many Resources, In So Little Time

Resources that data scientists are able to view through the dashboard (assuming they have the access to do so) include:

Training Sets: Sets of features matched with the respective labels to be served for training.

Features & Labels: Including the transformation logic as well as the feature values themselves.

Models: The model name registered at the time of serving features or training sets to create a logical grouping of models and their dependencies.

Sources: These include both primary sources (the original data sources that are passed into Featureform and used in transformations) and transformation sources (the data resulting from the transformations defined using SQL or Python and registered with Featureform).

Providers: The providers being used as offline or online stores (including Redis, Snowflake, BigQuery, etc.).

Users: The individual data scientists who create, share, or reuse the features, training sets and models.

One key area of the dashboard we’ve enhanced is search: as of v0.10, you can search across all resources, both by name and by metadata such as tags.
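
To make "search by name and by tag" concrete, here is a minimal sketch in plain Python. It is not Featureform's search implementation; the `Resource` class, the `search` function, and the sample catalog are all hypothetical and exist only to illustrate the idea.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Resource:
    """Hypothetical stand-in for a registered resource's metadata."""
    name: str
    kind: str  # e.g. "feature", "source", "training_set"
    tags: frozenset = field(default_factory=frozenset)


def search(resources, query):
    """Return resources whose name or any tag contains the query (case-insensitive)."""
    q = query.lower()
    return [
        r for r in resources
        if q in r.name.lower() or any(q in t.lower() for t in r.tags)
    ]


# Made-up example catalog.
catalog = [
    Resource("avg_home_price", "feature", frozenset({"pricing", "housing"})),
    Resource("orders", "source", frozenset({"finance"})),
    Resource("referral_training", "training_set", frozenset({"pricing"})),
]

pricing_hits = [r.name for r in search(catalog, "pricing")]
print(pricing_hits)
```

Here the "pricing" query matches two resources purely through their tags, which is the kind of lookup that name-only search would miss.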

(Occasionally) Editing Resource Metadata

Not only can all these resources be easily viewed within the Featureform dashboard but so too can the different variants associated with each resource, as well as all the metadata provided with the resources, including:

Name
Variant
Schedule
Description
Tags & Properties

One major enhancement as of v0.10 is the ability to assign tags to resources directly through the dashboard UI. Tags are unique to the resource and can be used to group resources.

Sneak-Peeking The Data

Navigate to a resource in the Featureform dashboard and there’s a wealth of information available (without having to @here the question “where pricing dataset used to power email referrals model plz halp 😅” in the company Slack #data-newbies channel).

For example, as a newly hired data scientist, I want to understand more about a feature (or set of features) being used to power a model for predicting home prices.

I also want to know what data is being used where and for what purposes.

Clicking on a source, I can immediately learn the following:

What the data looks like with our “Preview Result” enhancement (as of v0.10);
What training sets, features or labels the data source is being used in;
Additional context about the source (with the “Description” field);
How ready the source is for further use (with “Status”);
Where the data came from (with “Origin”);
And the associated provider, including where the source is materialized (with “Provider”).

Aside from sneak-peeking the first 150 rows in both local and hosted mode, users can also directly copy values from the source table to check data types and values.
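
As a rough illustration of what a capped preview involves, the sketch below reads only the first 150 rows of a CSV source instead of loading the whole thing. This is plain Python, not the dashboard's code; `preview_rows`, `PREVIEW_LIMIT`, and the in-memory sample data are assumptions made for the example.

```python
import csv
import io
from itertools import islice

PREVIEW_LIMIT = 150  # row cap mentioned in the post


def preview_rows(csv_file, limit=PREVIEW_LIMIT):
    """Read the header plus at most `limit` data rows without loading the whole file."""
    reader = csv.reader(csv_file)
    header = next(reader, [])
    rows = list(islice(reader, limit))
    return header, rows


# Made-up data: 1,000 rows of home prices held in memory for the example.
data = io.StringIO(
    "home_id,price\n" + "\n".join(f"{i},{100000 + i}" for i in range(1000))
)
header, rows = preview_rows(data)
print(header, len(rows), rows[0])
```

Because `islice` stops pulling from the reader once the cap is reached, the cost of the preview stays flat no matter how large the underlying source is.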

I Can See Clearly Now The Formatters Are Here

We’re committed to making the user experience for data scientists seamless and this can be seen in a number of ways, especially in our commitment to clean design.

Another area where this commitment to legibility goes hard is the updates we’ve made in pretty formatting for your Python and SQL transformations (as of v0.10).

(No, it’s not available for Apple Watch. Stop asking. Some of us already need Lasik.)

No need to copy the transformations to VS Code & hit save just to let your linter clean up the super-long-SQL-query-with-nested-CTEs juusssst right so you can see that yes, the referrals model is in fact using that orders table that everyone thought was super defunct and deprecated but is in fact still going since Jason from Finance & Accounting left to go start a farm (funded in large part by their “day-in-the-life-of-a-tech-agro” TikToks).

Breaking Silos Throughout Teams By Improving Collaboration & Streamlining Cross-Functional Workflows

The Featureform v0.10 release builds on existing investment (in both open-source and enterprise Featureform) to help individual practitioners move toward greater collaboration, both within their data science teams and with external teams.

Data science teams have needs that include:

Collaboration: Collaborating with other members of the DS team (and potentially even external partners) on projects, with visibility into progress or health of data science assets.
Cross-Functional Workflows: Interfacing with non-DS teams (including other engineering teams, as well as non-eng teams like legal & marketing).
Documentation & Discoverability: Sharing & distributing knowledge asynchronously, while getting ahead of human bottlenecks & the accumulation of tribal knowledge.
Compatibility With Existing Product Stack: Doing everything with as little overhead and as few steps as possible.

One of the most important ways we do this is by providing:

Real-Time Visibility of Feature Pipeline Health

For features and training sets, the dashboard displays metrics for each variant, namely throughput, latency, and errors.
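
For readers less familiar with these three metrics, here is a small sketch of how they could be derived from a window of serving requests. It does not reflect how Featureform collects or stores its metrics; the `Request` record, the `summarize` function, and the sample numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """Hypothetical record of one feature-serving request."""
    latency_ms: float
    ok: bool


def summarize(requests, window_seconds):
    """Compute throughput (req/s), mean latency (ms), and error count for a window."""
    total = len(requests)
    errors = sum(1 for r in requests if not r.ok)
    mean_latency = sum(r.latency_ms for r in requests) / total if total else 0.0
    return {
        "throughput_rps": total / window_seconds,
        "mean_latency_ms": mean_latency,
        "errors": errors,
    }


# Made-up sample: four requests observed over a 2-second window.
sample = [
    Request(10.0, True),
    Request(20.0, True),
    Request(30.0, False),
    Request(20.0, True),
]
stats = summarize(sample, window_seconds=2)
print(stats)
```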

Teams can leverage the existing monitoring capabilities provided by Kubernetes or integrate their own monitoring solutions to provide near real-time information on feature pipelines being used for production inference and training.

Unifying Feature Development, Experimentation & Production

Feature stores aren’t just specialized databases where machine learning features get parked (and eventually go to die from neglect).

Feature platforms are integral to the data science workflow at every stage, from when a data scientist is first beginning to experiment locally with a dataset in a Jupyter notebook to when they’re ready to deploy their feature engineering pipelines and materialize them in an offline store.

Auto-generated variants, named runs, and immutable variants ensure that as long as the data scientist hits apply, all their work will be registered with Featureform, won’t write over existing features, and will be readily available and visible through the dashboard.
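
The combination of auto-generated and immutable variants can be pictured with a toy registry like the one below. This is a conceptual sketch only, not Featureform's API or internals: the `VariantRegistry` class, its `apply` method, and the `auto-` naming scheme are all made up for illustration.

```python
import uuid


class VariantRegistry:
    """Hypothetical in-memory registry keyed by (name, variant)."""

    def __init__(self):
        self._store = {}

    def apply(self, name, definition, variant=None):
        # Auto-generate a variant label when none is given (naming scheme is made up).
        variant = variant or f"auto-{uuid.uuid4().hex[:8]}"
        key = (name, variant)
        if key in self._store and self._store[key] != definition:
            # Immutability: an existing variant is never silently overwritten.
            raise ValueError(f"{name}:{variant} already exists with a different definition")
        self._store[key] = definition
        return variant

    def variants(self, name):
        """List every variant registered under a resource name."""
        return sorted(v for (n, v) in self._store if n == name)


registry = VariantRegistry()
registry.apply("avg_price", "SELECT AVG(price) FROM homes", variant="v1")
auto = registry.apply("avg_price", "SELECT AVG(price) FROM homes WHERE sold")

try:
    # Re-using "v1" with different logic is rejected rather than overwriting it.
    registry.apply("avg_price", "SELECT MAX(price) FROM homes", variant="v1")
    overwritten = True
except ValueError:
    overwritten = False

print(registry.variants("avg_price"), overwritten)
```

In this sketch, every experiment lands under its own variant, so earlier work stays intact and both versions remain visible side by side.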

Interested in learning more about v0.10 of Featureform or looking for access control and governance capabilities?

Book a demo of the Featureform platform here!
