Iterative Development Guide

This guide shows data scientists how to sync their iterative development work to Vectice.

In iterative development, a model is developed and tested in repeated cycles, or iterations. In each iteration, the model is refined and tested and its hyperparameters are tuned, until a fully functional model is ready for deployment.

To get started, we will walk you through an iterative development process using Vectice.

Step 1: Installation & Configuration

Install and import any packages you need for your model development, including the vectice library. If you have not installed the vectice library, view the Install Vectice Library guide for more details.

Once you have installed vectice and imported it into your script, connect to the Vectice API.

import vectice

connection = vectice.connect(config="vectice_config.json")

Now that you are connected to the Vectice API, you can retrieve a phase and start an iteration.

Step 2: Retrieve a phase

To start an iteration, you must first retrieve a phase. Retrieve it by passing your phase ID to connection.phase().

# Retrieve your phase by its ID (replace PHA-XXX with your phase ID)
phase = connection.phase("PHA-XXX")

Step 3: Initialize Iteration

Each iteration contains the sequence of steps defined in the phase. These steps act as a guardrail that guides data scientists as they provide updates.

To initialize an iteration, your phase must have at least one step defined. Otherwise, you will receive an error.

After retrieving a phase, initialize an iteration with the create_or_get_current_iteration() method, as shown below.

my_iteration = phase.create_or_get_current_iteration()
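Because this call fails when the phase has no steps, you may want your script to stop early with a clearer message. The sketch below wraps the documented call in a hypothetical helper, start_iteration. This guide does not name the exception class Vectice raises in that case, so the sketch assumes nothing about it and catches any exception.

```python
def start_iteration(phase):
    """Hypothetical helper: create or get the current iteration of `phase`.

    Assumption: Vectice raises some exception when the phase has no steps.
    Its exact class is not documented in this guide, so any exception is caught
    and re-raised with a more actionable message (the original is kept as the cause).
    """
    try:
        return phase.create_or_get_current_iteration()
    except Exception as exc:
        raise RuntimeError(
            "Could not start an iteration. "
            "Check that the phase has at least one step defined."
        ) from exc
```

Usage: my_iteration = start_iteration(connection.phase("PHA-XXX")).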

The current iteration is the iteration you updated most recently. If the current iteration is not writable, or if no iterations exist, Vectice either creates a new iteration or lists your writable iterations. If you have several "In progress" iterations from the past, Vectice does not guess which one you mean: it displays a list of writable iterations, and you select one with the {phase}.iteration("iteration name or ID") method.
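The selection rules above can be read as a small decision procedure. The sketch below is a toy, in-memory model of one reading of those rules, not Vectice's implementation: the Iteration class, the resolve_current_iteration function, and the set of writable statuses are all hypothetical. In particular, it assumes that when the last updated iteration is not writable, any remaining writable iterations are listed for you to choose from.

```python
from dataclasses import dataclass
from datetime import datetime

# Hypothetical: statuses treated as writable in this toy model only.
WRITABLE_STATUSES = {"In progress"}


@dataclass
class Iteration:
    name: str
    status: str
    updated_at: datetime


def resolve_current_iteration(iterations):
    """Toy model of the documented rules (not Vectice's code).

    Returns ("use", iteration), ("create", None) or ("choose", [iterations]).
    """
    if not iterations:
        return ("create", None)  # no iterations exist yet
    # The current iteration is the most recently updated one.
    last_updated = max(iterations, key=lambda it: it.updated_at)
    if last_updated.status in WRITABLE_STATUSES:
        return ("use", last_updated)
    # Otherwise, fall back to the other writable iterations, if any.
    writable = [it for it in iterations if it.status in WRITABLE_STATUSES]
    if not writable:
        return ("create", None)
    return ("choose", writable)  # let the user pick with phase.iteration(...)
```

In the "choose" case, the name or ID of the iteration you pick is what you would pass to {phase}.iteration().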

You may have multiple writable iterations across different phases, and a single phase can also hold several writable iterations, but only one of them can be active per user.

Step 4: Retrieve a step

Once an iteration is initialized, you can retrieve a step from the current phase iteration.

To see which steps are available to your iteration, print the list of step names with the following method:

my_iteration.list_steps()

To retrieve a step in your current iteration, select it by copying and pasting the step's shortcut name, as shown below:

my_iteration.step_step_name

If you copy a step name from the UI, add the prefix step_ before it. For example, the full workflow example below refers to the step "Origin data" as step_origin_data.
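If you build step names in code, a small helper can turn a UI step name into its shortcut attribute. This is a sketch under an assumption: the conversion rule (lower-case the name, replace runs of spaces and punctuation with underscores, add step_) is inferred from the "Origin data" / step_origin_data pair in this guide, and step_attribute is a hypothetical name. The names printed by list_steps() remain the authoritative source.

```python
import re


def step_attribute(ui_step_name: str) -> str:
    """Hypothetical helper: map a UI step name to its shortcut attribute.

    Assumed rule: lower-case the name, replace each run of characters that are
    not letters or digits with a single underscore, then add the step_ prefix.
    """
    slug = re.sub(r"[^0-9a-z]+", "_", ui_step_name.strip().lower()).strip("_")
    return f"step_{slug}"


print(step_attribute("Origin data"))
```

The result can be used with getattr, for example getattr(my_iteration, step_attribute("Origin data")), which is equivalent to writing my_iteration.step_origin_data.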

The assets you create during your machine learning work can be linked to steps, which then serve as a vehicle for automatic progress updates. These updates appear in the UI within the phase iteration you are working on.

Step 5: Development

The development process is where the data science magic takes place. You can develop, test, and validate your models while using a few simple lines of code to share updates and metadata about your work.

The following are a few capabilities that you can utilize to add visibility into your model development.

Register a Dataset

You can register all datasets used during development to the Vectice UI. This includes raw data, cleaned data, and modeling data. For more information on how to register datasets during development, view How to register datasets.
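The three kinds of data named above can be prepared locally before you register them. This sketch, assuming pandas and scikit-learn are installed and using hypothetical file names, saves raw, cleaned, and modeling versions of the iris data to the current directory. Each file could then be wrapped with FileResource, as in the full workflow example. This guide only shows Dataset.origin, so see How to register datasets for how to register the cleaned and modeling versions.

```python
import pandas as pd
from sklearn import datasets
from sklearn.model_selection import train_test_split

# Raw data: the origin dataset, saved exactly as loaded
iris = datasets.load_iris(as_frame=True)
raw: pd.DataFrame = iris.frame  # feature columns plus a "target" column
raw.to_csv("origin_iris.csv", index=False)

# Cleaned data: drop duplicate rows and rows with missing values
clean = raw.drop_duplicates().dropna()
clean.to_csv("clean_iris.csv", index=False)

# Modeling data: split the cleaned data into training and testing sets
train, test = train_test_split(
    clean, test_size=0.2, random_state=42, stratify=clean["target"]
)
train.to_csv("train_iris.csv", index=False)
test.to_csv("test_iris.csv", index=False)
```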

Register a model

Your models are continuously improving in each iteration. You can register each model version to the Vectice UI. For more information on how to register models during development, view How to register models.
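Each iteration typically produces a new model version. The sketch below, assuming scikit-learn is installed, trains one logistic regression per hypothetical value of the regularization strength C and records each version's properties and metrics in plain dictionaries. That is the kind of information you would register for each version; the Vectice registration call itself is not shown here, so see How to register models for it.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Hypothetical candidate values for C, one model version per value
model_versions = []
for version, c in enumerate([0.01, 0.1, 1.0], start=1):
    clf = LogisticRegression(C=c, max_iter=1000)
    clf.fit(X_train, y_train)
    accuracy = clf.score(X_test, y_test)  # mean accuracy on the test split
    model_versions.append(
        {"version": version, "properties": {"C": c}, "metrics": {"accuracy": accuracy}}
    )

# Pick the version with the highest test accuracy
best = max(model_versions, key=lambda v: v["metrics"]["accuracy"])
print(best)
```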

Code Capture

Vectice captures your local code if a .git folder (a git repository) is available. With each new git commit, only the locally changed files that are not already in Vectice are captured.

Automatic code capture is enabled by default. To disable the automatic capture of your local code, you can set code_capture to False, as shown below:

import vectice
vectice.code_capture = False

View How to capture code for more information on code capture during development.
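Since code capture depends on a .git folder, it can help to check for one before you start. This standard-library sketch looks for a .git entry in a directory and its parents. find_git_root is a hypothetical pre-check, not part of the vectice library, and it does not reproduce how Vectice itself detects repositories.

```python
from pathlib import Path
from typing import Optional


def find_git_root(start: str = ".") -> Optional[Path]:
    """Return the nearest directory at or above `start` that contains .git, else None."""
    path = Path(start).resolve()
    for candidate in [path, *path.parents]:
        # .git is usually a folder, but it can be a file (worktrees, submodules)
        if (candidate / ".git").exists():
            return candidate
    return None


root = find_git_root()
if root is None:
    print("No .git folder found: there is no repository for code capture to use.")
else:
    print(f"Git repository found at {root}")
```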

Version Control

Vectice enables the version tracking of datasets, models, and code used during development. Stored metadata for datasets and models includes version IDs, descriptions, lineage, properties, resources, attachments, model algorithms, metrics, status, and dataset types.

Asset versions and metadata are accessible in the UI within the Datasets and Models sections of the Project.

All assets are automatically versioned, making it easy to follow a project's progression and compare results from multiple iterations.

Step 6: Complete an iteration

Once you have completed the goals specified in the iteration's steps, you can close the iteration as shown below. This call sends the update to the web UI in real time.

my_iteration.complete()

Once an iteration is complete, you can start the next iteration to continue iterative development until all steps are complete.

Once the final step is complete, the iteration is automatically marked as complete in the Vectice UI.

You may revisit the details of the iteration for a retrospective. If satisfied, you can summarize your outcomes or start another iteration.

Full Workflow Example

from sklearn import datasets
import numpy as np
import pandas as pd
import vectice
from vectice import FileResource, Dataset

# Connect to Vectice
connection = vectice.connect(config="vectice_config.json")

# Retrieve the project phase to begin model development
phase = connection.phase("PHA-XXX")

# Initialize (or retrieve) the current iteration of the phase
my_iteration = phase.create_or_get_current_iteration()

# Data science magic goes here:
iris = datasets.load_iris()
data = pd.DataFrame(data=np.c_[iris['data'], iris['target']],
                    columns=iris['feature_names'] + ['target'])
# Save the origin dataset locally, without the pandas index column
data.to_csv("origin_iris.csv", index=False)

# Use FileResource to wrap your local origin dataset
origin_dataset = FileResource(paths="origin_iris.csv")

# Register the origin dataset to Vectice associated with the step "Origin data"
my_iteration.step_origin_data = Dataset.origin(name="Iris Origin", resource=origin_dataset)

# Complete and close the current iteration
my_iteration.complete()

The above is just an example of how to perform various actions. As a best practice, use one file (or notebook) per phase of the project.

All registered iterations are rendered in the UI in real time, showing metadata such as the iteration number, status, owner, latest step, last updated date, and created date.

See the guides in the Python API Docs section for more information on using the Vectice API during development.