Tutorial project
Before starting the tutorial, we strongly recommend that you review the previous sections of the GETTING STARTED guide. They provide the foundation you need to follow the instructions and progress smoothly through the tutorial project.
The tutorial takes 10 to 15 minutes to complete, depending on your existing knowledge of Vectice. In completing the tutorial, you will learn:
To effectively navigate the Vectice application, projects, iterations, and assets
To organize your projects in phases and iterations
To use the Vectice Python API to log key artifacts inside iterations
To create phase documentation from the assets you logged
To request a review in Vectice after completing your phase
Imagine you're a data scientist or machine learning engineer working at a large retailer. Your manager has just informed you that the unit sales prediction model was flagged as using unethical data during the last quarterly business review. You must urgently retrain the model by dropping a column from the original dataset. Unfortunately, the colleague who developed the model is unavailable. As a newcomer, you'll use Vectice to understand the project history and proceed with retraining.
This tutorial has two parts.
In the first part of the tutorial, you will understand the work completed to build the original model. You will use the Vectice API in the modeling notebook to retrain and log the new model to Vectice.
In the second part of the tutorial, you will return to the Vectice app to see the documentation that was automatically generated from the work you completed in the first part.
Let's go!
Log back into Vectice if you were logged out.
Check your original Vectice invitation email if you are unsure how to log back in.
We will use the global search capabilities to find the tutorial project you need to work on.
Vectice search lets you discover assets and projects created by your team in the past. You can use advanced dataset and model filters, such as model techniques or data source types, to find assets that can be reused in your own project.
To find the tutorial project, simply type "Forecast" in the global search bar on the top left of the screen.
If the search returns more than one result, pick the most recent project in your personal workspace. Your personal workspace is prefixed with a period followed by your name (e.g., .tsmith).
Open the tutorial project; you will see an overview of the Forecast Project, with the various phases, contributors, and recent activity.
The project follows a standard machine learning project lifecycle, encompassing the iterative nature of both the development and production phases. It consists of six phases, starting with Data Understanding and concluding with the Quarterly Business Review phase.
Vectice provides a single interface to organize the key milestones, decisions, and assets created during the entire lifecycle of ML projects to foster alignment and collaboration.
Click on one of the phase names, for example "Data Understanding", to open it.
Take a few minutes to familiarize yourself with the "Data Understanding" phase and other phases of the project, gaining an overall understanding of the work history. For now, focus on getting a high-level view of the project.
Vectice allows teams to easily customize the various phases of the project and create project templates to standardize best practices. This is outside the scope of this tutorial. If you are interested in more details, refer to the templates best practices guide.
Now, let's dive deeper into the modeling phase, as our task involves retraining the existing model. We will explore this phase in detail to gain a complete understanding of the steps that were taken.
Go to the "Modeling" phase. Iteration 1 was starred and selected for the model training. Click on it, and you will see something similar to the screen below.
From the iteration page, you can retrieve the modeling history, assets, and lineage of the original model.
Click on the scikit-learn Ridge Regression model that you see inside the Build Model section of the iteration.
This will display a view of the previously trained v1 of the model, including its lineage and performance metrics.
Next, let's retrieve the modeling data to verify if it contains a postal code column.
From the lineage section on the model version page, click on ProductSales Modeling next to the database icon. ProductSales Modeling represents the dataset used to train and test this model version.
Once you open the ProductSales Modeling dataset you will see the following.
The Dataset Resources will list all the columns (features) used to train this model version.
Use the search bar to search for the Postal Code column. Perfect, you identified the column that you will need to remove, and that was indeed part of the training data.
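The removal you will perform in the notebook can be sketched as follows. This is a minimal illustration that uses only the Python standard library, not the code from the tutorial notebook. The sample rows and the Quantity and Sales column names are made up for the example; only the Postal Code column name comes from this tutorial. In a pandas workflow, the equivalent step would be DataFrame.drop(columns=["Postal Code"]).

```python
import csv
import io


def drop_column(csv_text: str, column: str) -> str:
    """Return a copy of csv_text without `column`; raise KeyError if it is absent."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = reader.fieldnames or []
    if column not in fieldnames:
        raise KeyError(f"column {column!r} not found")

    # Keep every column except the one being removed, preserving order.
    kept = [name for name in fieldnames if name != column]

    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=kept, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        # extrasaction="ignore" silently skips the dropped column's value.
        writer.writerow(row)
    return out.getvalue()


# Made-up sample data, for illustration only.
sample = "Postal Code,Quantity,Sales\n00000,2,10.5\n11111,1,4.0\n"
cleaned = drop_column(sample, "Postal Code")
print(cleaned)
```

Raising an error when the column is missing, rather than silently continuing, makes it obvious if the dataset you loaded is not the one you inspected in the app.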
Go back to the iteration you analyzed by using the "Creation Details" panel on the right of the screen and click "Iteration 1".
Now that you are familiar with the dataset, model, and code that were previously used, it's time to retrain the model.
For convenience, we have provided a phase for you named Model Retraining, where you will log your work as you retrain the model.
Vectice offers the flexibility to create, rename and reorder phases using drag-and-drop so your project can reflect the non-linear nature of ML initiatives and be customized to your needs.
Open the Model Retraining phase. It should be empty.
Copy the Phase Id from the Vectice app. We will use this Phase Id in the Model Retraining notebook we will provide in the next section.
In Vectice, every project consists of one or more phases. Each phase is assigned a Phase Id that never changes after the phase is created. The Phase Id uniquely identifies a specific phase across all projects and workspaces in your Vectice instance.
Before you can execute the Model Retraining notebook to retrain the model, you will need an API key to connect to Vectice.
The instructions below assume you have already created your API key and have it available. If you have lost your API key, simply create a new key by following the instructions on the "Create an API key" page in the left menu under GETTING STARTED.
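One way to keep the API key and the Phase Id out of the notebook itself is to read them from environment variables before connecting. The sketch below is an assumption about how you might organize this, not part of the Vectice API: the variable names VECTICE_API_TOKEN and VECTICE_PHASE_ID, the VecticeSettings class, and the load_settings helper are all hypothetical, and the values shown are placeholders. The tutorial notebook shows the actual connection calls.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class VecticeSettings:
    """Hypothetical container for the two values the notebook asks for."""
    api_token: str
    phase_id: str


def load_settings(env=None) -> VecticeSettings:
    """Read the API key and Phase Id from a mapping (os.environ by default)."""
    env = os.environ if env is None else env
    required = ("VECTICE_API_TOKEN", "VECTICE_PHASE_ID")  # hypothetical names
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("Missing settings: " + ", ".join(missing))
    return VecticeSettings(
        api_token=env["VECTICE_API_TOKEN"],
        phase_id=env["VECTICE_PHASE_ID"],
    )


# Placeholder values; replace them with your own key and the copied Phase Id.
settings = load_settings(
    {"VECTICE_API_TOKEN": "your-api-key", "VECTICE_PHASE_ID": "your-phase-id"}
)
print(settings.phase_id)
```

Failing early with a clear message is helpful here, because a missing key would otherwise only surface later as a connection error.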
Below, you will find our Model Retraining tutorial notebook available for download from our GitHub public repo. You can use this notebook from your preferred Python environment. Alternatively, if you prefer to run the notebook in a hosted environment, we also provide the same notebook in Colab for your convenience.
We recommend having a split screen of the modeling notebook and the Vectice UI to easily navigate between the two and see the changes being made while you are executing the cells in the notebook.
After retraining the model and logging your assets inside an iteration, you can now view your auto-documented work in the UI. Let's dive in!
Return to the tutorial project in your personal workspace. Select the Model Retraining phase and the Iteration tab to find and select your iteration. You should see a new iteration containing the updated dataset, your retrained model, and the note and attachments you logged from your notebook.
You can also find these datasets and models in the Datasets and Models tab of your project.
Click the ProductSales Modeling dataset, Version 2, and use the search bar to verify that the "Postal Code" column has been removed from the training dataset.
You will also notice the dataset is "Version 2". When you derive a dataset from another via the API, it automatically logs the lineage and increments the version if the metadata has changed.
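The idea of incrementing a version only when the metadata changes can be illustrated with a small sketch. This is a conceptual example, not how Vectice implements versioning: the metadata_fingerprint and next_version functions, the choice of column names plus row count as "metadata", and the sample values are all assumptions made for illustration.

```python
import hashlib
import json


def metadata_fingerprint(columns, row_count):
    """Hash a simple description of a dataset (column names and row count)."""
    payload = json.dumps(
        {"columns": list(columns), "rows": row_count}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def next_version(current_version, old_fingerprint, new_fingerprint):
    """Increment the version only if the metadata fingerprint changed."""
    if new_fingerprint != old_fingerprint:
        return current_version + 1
    return current_version


# Made-up metadata: the second dataset no longer has the Postal Code column.
original = metadata_fingerprint(["Postal Code", "Quantity", "Sales"], 100)
derived = metadata_fingerprint(["Quantity", "Sales"], 100)

print(next_version(1, original, derived))   # metadata changed
print(next_version(1, original, original))  # metadata unchanged
```

In this sketch, dropping a column changes the fingerprint and therefore the version, while logging an identical dataset again leaves the version as it was.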
Return to the Model Retraining phase and make sure the Documentation tab is selected.
Write Retraining Iteration at the beginning of the documentation page and convert it into an H1 header.
Next, add an iteration widget below the Retraining Iteration header by selecting Insert > Iteration.
Finally, click the Add an iteration widget and select your last iteration to display its content.
Add another header and call it "Model Performance Comparison".
Next, add a model widget.
Click the Add Model button, select the scikit-learn Ridge Regression model and click on Insert.
Confirm that version 2 of the model performs very similarly to version 1. You can then move on to requesting a review to get the new model version approved.
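If you prefer to check "very similar performance" numerically rather than by eye, a comparison along the following lines is one option. This is a sketch under stated assumptions: the similar_performance helper, the 5% relative tolerance, the metric names, and the metric values are all placeholders chosen for illustration, not results from the tutorial models.

```python
import math


def similar_performance(old_metrics, new_metrics, rel_tol=0.05):
    """Report, per shared metric, whether the two values are within rel_tol."""
    shared = old_metrics.keys() & new_metrics.keys()
    if not shared:
        raise ValueError("the two versions share no metrics to compare")
    return {
        name: math.isclose(old_metrics[name], new_metrics[name], rel_tol=rel_tol)
        for name in sorted(shared)
    }


# Placeholder numbers only; read the real values from the model widget.
version_1 = {"rmse": 100.0, "mae": 60.0}
version_2 = {"rmse": 102.0, "mae": 61.0}

report = similar_performance(version_1, version_2)
print(report)
print(all(report.values()))
```

The tolerance is a judgment call for your team; a stricter or looser threshold may suit a different model or business context.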
For the purpose of this tutorial, you will request a review from yourself. Start by assigning yourself as the phase owner.
Go to the Model Retraining phase and select Phase Setting. Under Phase Detail, select yourself as the Owner.
Then, select Reviews and click Request a review.
Request a review from yourself and leave a message. You will see a pending review request similar to the image below.
If you do not receive emails from Vectice, they may be disabled by your organization. In this case, check your notifications in Vectice for the review request.
You can then approve the review to complete the Model Retraining phase of the tutorial by selecting Submit review -> Approve -> Submit.
Once you have completed the Model Retraining notebook, return to this document to complete Part 2 of the tutorial, where you will learn how to see the updates you made from the notebook in the Vectice app.
Congrats on completing the tutorial! You have learned how to easily navigate Vectice to complete the following tasks:
Log in and search for the Forecast Unit project to review the project history
Familiarize yourself with phases, data, Vectice approaches, and code repo
Duplicate the modeling phase, retrieve the existing notebook, drop the necessary column, and log the new modeling dataset
Create a new model version and add a note to denote the dropped column
Add widgets for the training dataset and model version, and upload the updated notebook
Request a review from yourself in the UI to complete the phase
Discover how to Invite your Colleagues to Vectice by using the instructions on the next page.