Iterative Development Guide

This guide shows data scientists how to sync their iterative development work to Vectice.

In iterative development, a model is built and tested in repeated cycles. In each iteration, the model's hyperparameters are tuned and the model is retrained and tested, until it is ready for deployment.

To get started, we will walk you through an iterative development process using Vectice.

Step 1: Installation & Configuration

Install and import any packages you need for your model development, including the vectice library. If you have not installed the vectice library, view the Install Vectice Library guide for more details.

Once you have installed and imported vectice into your script, configure your work environment to connect to the Vectice API. To learn how, view the Vectice Configuration guide.
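One common way to keep credentials out of a script is to read them from environment variables. The sketch below is only an illustration: the environment variable names and the connection_settings helper are hypothetical, and the keyword names (api_token, host, workspace, project) are taken from the vectice.connect call in the Full Workflow Example of this guide. Refer to the Vectice Configuration guide for the supported configuration options.

```python
import os


def connection_settings(env=os.environ) -> dict:
    """Collect connection settings from environment variables.

    The variable names below are hypothetical placeholders, not names
    defined by Vectice. Rename them to match your own environment.
    """
    names = {
        "api_token": "MY_VECTICE_API_TOKEN",
        "host": "MY_VECTICE_HOST",
        "workspace": "MY_VECTICE_WORKSPACE",
        "project": "MY_VECTICE_PROJECT",
    }
    # Fail early with a clear message if any setting is missing or empty.
    missing = [var for var in names.values() if not env.get(var)]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    return {key: env[var] for key, var in names.items()}
```

With the variables set, the result could be passed along as my_project = vectice.connect(**connection_settings()).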

Now that you have connected to the Vectice API and linked your script to a project, we can start an iteration.

Step 2: Retrieve a phase

To start an iteration, you must first retrieve a phase. Here, we assume you have already connected to a project and stored it in a variable named project, as shown in the configuration guide. To retrieve a phase, enter the code below with your phase name.

phase = project.phase("phase name")

Step 3: Initialize Iteration

To initialize an iteration, your phase must have at least one step defined. Otherwise, you will receive an error. To create a step, view our Setting up a Project guide.

After retrieving a phase, initialize an iteration with the code below:

my_iteration = phase.iteration()

This retrieves your ongoing iteration or, if you do not have one, creates a new one. Each iteration contains the sequence of steps defined for the phase and acts as a guardrail for data scientists to provide their updates.

Each user can have only one active iteration per phase, but may have active iterations in several phases at the same time. A single phase can therefore contain multiple active iterations, as long as each one belongs to a different user.
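The retrieve-or-create behavior and the one-active-iteration-per-user rule can be pictured with a small in-memory model. This is a toy sketch for illustration only: ToyPhase and its methods are hypothetical and do not reflect how Vectice is implemented or how its API is shaped.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class ToyPhase:
    """Toy stand-in for a phase; NOT the Vectice implementation."""

    name: str
    # Maps each user to the number of their active iteration in this phase.
    _active: dict = field(default_factory=dict)
    _counter: count = field(default_factory=lambda: count(1))

    def iteration(self, user: str) -> int:
        """Return the user's active iteration, creating one if none exists."""
        if user not in self._active:
            self._active[user] = next(self._counter)
        return self._active[user]

    def complete(self, user: str) -> None:
        """Close the user's active iteration so the next call creates a new one."""
        self._active.pop(user, None)


phase = ToyPhase("phase one")
first = phase.iteration("alice")   # no active iteration yet, so one is created
again = phase.iteration("alice")   # the same active iteration is retrieved
other = phase.iteration("bob")     # a different user gets a separate iteration
phase.complete("alice")
renewed = phase.iteration("alice")  # a new iteration is created after completion
```

In this model, calling iteration() twice for the same user returns the same iteration until it is completed, while two users working in the same phase each get their own.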

Step 4: Retrieve a step

Once an iteration is declared, you can retrieve a step within the phase you want to develop.

When retrieving a step in your current iteration, you must prefix the step name with step_. For example, to retrieve the "Collect initial data" step, you can access the iteration property using my_iteration.step_collect_initial_data.

my_step = my_iteration.step_name_of_step_in_UI
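The naming convention can be sketched as a small helper. This is a minimal sketch that assumes the attribute name is the step name lowercased, with spaces and other non-alphanumeric characters replaced by underscores, as suggested by the "Collect initial data" example above. The step_attribute_name function is hypothetical and not part of the vectice library, and the exact rules Vectice applies to unusual characters may differ.

```python
import re


def step_attribute_name(step_name: str) -> str:
    """Derive the assumed attribute name for a step as it appears in the UI."""
    # Lowercase, then collapse every run of non-alphanumeric characters into "_".
    slug = re.sub(r"[^a-z0-9]+", "_", step_name.lower()).strip("_")
    return f"step_{slug}"


attribute = step_attribute_name("Collect initial data")
print(attribute)  # prints "step_collect_initial_data"
```

If you need to look up a step from a name held in a variable, Python's built-in getattr(my_iteration, attribute) is one option, under the same naming assumption.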

Assets created during your machine learning work can be linked to steps, which provides automatic progress updates. Those updates appear in the UI within the phase iteration you are working on.

Step 5: Development

The development process is where the data science magic takes place. You can develop, test, and validate your models, all while using simple lines of code to share your updates and metadata of your work.

The following are a few capabilities that you can utilize to add visibility into your model development.

Register a Dataset

You can register all datasets used during development to the Vectice UI. This includes raw data, cleaned data, and modeling data. For more information on how to register datasets during development, view How to register datasets.

Register a model

Your models are continuously improving in each iteration. You can register each model version to the Vectice UI. For more information on how to register models during development, view How to register models.

Code Capture

Vectice captures your local code if there is a .git folder or repository available. Only locally changed files not already in Vectice are captured with each new git commit.
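To check whether your script sits inside a Git repository before relying on code capture, you can look for a .git entry in the working directory or one of its parents. The sketch below uses only the Python standard library. The find_git_root helper is hypothetical, and this is not necessarily how Vectice itself detects a repository.

```python
from pathlib import Path
from typing import Optional


def find_git_root(start: Path) -> Optional[Path]:
    """Return the nearest directory at or above `start` that contains `.git`."""
    start = start.resolve()
    for candidate in (start, *start.parents):
        # `.git` is a directory in a normal clone and a file in worktrees
        # and submodules, so test for existence rather than is_dir().
        if (candidate / ".git").exists():
            return candidate
    return None
```

For example, find_git_root(Path.cwd()) returns the repository root when the current directory is inside a repository, and None otherwise.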

Automatic code capture is enabled by default. To disable the automatic capture of your local code, you can set code_capture to False, as shown below:

import vectice
vectice.code_capture = False

View How to capture code for more information on code capture during development.

Version Control

Vectice enables the version tracking of datasets, models, and code used during development. Stored metadata for datasets and models includes version IDs, descriptions, lineage, properties, resources, attachments, model algorithms, metrics, status, and dataset types.

Asset versions and metadata are accessible in the UI within the Datasets and Models sections of the Project.

All assets are automatically versioned, making it easy to follow a project's progression and compare results from multiple iterations.

Step 6: Complete an iteration

Once you have completed the expected goals specified in a step, you can close the iteration as below. This line of code will send information to the web UI in real time.

my_iteration.complete()

Once an iteration is complete, you can start the next iteration to continue iterative development until all steps are complete. You can view all the steps and their status by using:

my_iteration.steps()

Once the final step is complete, the iteration is automatically marked as complete in the Vectice UI.

You may revisit the details of the iteration for a retrospective. If satisfied, you can summarize your outcomes or start another iteration.

Full Workflow Example

import numpy as np
import pandas as pd
from sklearn import datasets

import vectice
from vectice import Dataset, FileResource

# Connect to your project by entering your
# API token, endpoint, workspace, and project name.
my_project = vectice.connect(
    api_token="YOUR_API_TOKEN",
    host="YOUR_ENDPOINT",
    workspace="YOUR_WORKSPACE",
    project="YOUR_PROJECT_NAME",
)

# Retrieve the project phase to begin the model development.
phase = my_project.phase("phase one")

# Initialize the iteration (retrieves the ongoing one or creates a new one)
my_iteration = phase.iteration()

# Data science magic goes here:
iris = datasets.load_iris()
data = pd.DataFrame(data=np.c_[iris['data'], iris['target']],
                    columns=iris['feature_names'] + ['target'])
data.to_csv("origin_iris.csv")

# Use the FileResource to wrap your local origin dataset
origin_dataset = FileResource(path="origin_iris.csv")

# Register the origin dataset to Vectice associated with the step "Origin data"
my_iteration.step_origin_data = Dataset.origin(name="Iris Origin", resource=origin_dataset)

# Complete and close the current iteration
my_iteration.complete()

The above is just an example of how to perform various actions. The best practice is to have one file (or notebook) per phase of the Project.

All registered iterations are rendered in the UI in real time, showing metadata such as the iteration number, status, owner, latest step, last updated date, and created date.

See the guides in the Python API Docs section for more information on using the Vectice API for development.
