Iterative development guide

This guide shows data scientists how to sync their iterative development work to Vectice.

In iterative development, a model is developed and tested in repeated cycles. In each iteration, the model's hyperparameters are tuned and the model is retrained and tested, until it is ready for deployment.

To get started, we will walk you through an iterative development process using Vectice.

Step 1: Install and configure

Install and import any packages you need for your model development, including the vectice library. If you have not installed the vectice library, view the Install Vectice Library guide for more details.

Once you have installed and imported vectice into your script, connect to the Vectice API.

import vectice

connect = vectice.connect(api_token="your-api-key") #Paste your API key
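
Hardcoding a key in a script makes it easy to commit by accident. As a minimal sketch, the key could instead be read from an environment variable. The helper name load_api_token and the variable name VECTICE_API_TOKEN are assumptions made for this example, not names taken from this guide.

```python
import os


def load_api_token(env_var: str = "VECTICE_API_TOKEN") -> str:
    """Return the API key stored in the given environment variable.

    The default variable name is a hypothetical choice for this sketch.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"Set the {env_var} environment variable to your Vectice API key.")
    return token


# Usage (requires the vectice library and a valid key):
# import vectice
# connect = vectice.connect(api_token=load_api_token())
```

Failing early with a clear message is a deliberate choice here: an empty key would otherwise only surface as a connection error later on.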

Now that you are connected to the Vectice API, we can start an iteration.

Step 2: Retrieve a phase

To start an iteration, you must retrieve a phase. To retrieve a phase, connect using your phase ID.

# Connect to your phase using your phase ID
phase = connect.phase("PHA-XXX") #Paste your own phase ID

Step 3: Initialize iteration

After retrieving a phase, we want to begin an iteration. We will initialize an iteration with the create_or_get_current_iteration() method, as shown below.

iteration = phase.create_or_get_current_iteration()

The current iteration is the iteration you updated most recently. If no iterations exist, or the current iteration is not writable, a new iteration is created. If you have several "In progress" iterations from earlier work, Vectice makes no assumption about which one you mean: it displays the list of writable iterations, and you select one with the phase.iteration("iteration name or ID") method.

You may have multiple writable iterations across various phases. Multiple writable iterations can exist at a phase level, but only one can be active per user.
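
The selection rules described above can be modeled in a few lines of plain Python. This is only one reading of the prose, not Vectice's implementation: IterationRecord and pick_current_iteration are hypothetical names, and the real library works on its own objects.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class IterationRecord:
    """Hypothetical stand-in for an iteration; not a Vectice class."""
    name: str
    writable: bool


def pick_current_iteration(iterations: List[IterationRecord]) -> Optional[IterationRecord]:
    """Return the single writable iteration, or None when a new one should be created.

    Raises ValueError listing the candidates when several iterations are writable.
    """
    writable = [it for it in iterations if it.writable]
    if not writable:
        return None  # nothing to reuse: the caller would create a new iteration
    if len(writable) == 1:
        return writable[0]
    names = ", ".join(it.name for it in writable)
    raise ValueError(f"Several writable iterations, pick one explicitly: {names}")
```

The third branch mirrors the "no assumptions" rule: with more than one candidate, the choice is handed back to the user instead of being guessed.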

Step 4: Log your assets

You can log your assets while developing, testing, and validating your models, and share updates and artifacts from your work with a few lines of code.

To log your assets, use the iteration's log method as follows:

#Log an asset
iteration.log(asset)

#Log a comment
iteration.log("this is a comment")

You can also use sections to organize your assets:

iteration.log(asset, section="your-section-title")

The following are a few capabilities that you can utilize to add visibility into your model development.

Log a dataset

You can register all datasets used during development to the Vectice app. This includes raw data, cleaned data, and modeling data. For more information on how to register datasets during development, view How to log datasets.

Log a model

Your models are continuously improving in each iteration. You can register each model version to the Vectice app. For more information on how to register models during development, view How to log models.
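
A model version is usually logged together with the metric values measured in that iteration. As an illustration of where such a value might come from, the sketch below computes a root mean squared error in plain Python. The function name rmse and the sample numbers are made up for this example.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error between two equally long, non-empty sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Illustrative values only
score = rmse([3.0, 5.0, 2.0], [2.0, 5.0, 4.0])

# The value can then be wrapped in a metric, as in the Example section:
# Metric("RMSE", score)
```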

Log development code

Vectice captures your local code if there is a .git folder or repository available. Only locally changed files not already in Vectice are captured with each new git commit.

Automatic code capture is enabled by default. To disable the automatic capture of your local code, you can set code_capture to False, as shown below:

import vectice
vectice.code_capture = False

View How to log code for more information on code capture during development.
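
Since code capture depends on a .git folder being available, it can help to check for one before running a script. The helper below is a sketch using only the standard library. The name find_git_root is hypothetical, and walking up through parent directories follows the usual git convention; whether Vectice searches in the same way is an assumption.

```python
from pathlib import Path
from typing import Optional


def find_git_root(start: Path) -> Optional[Path]:
    """Return the closest directory at or above `start` that contains .git, else None."""
    start = start.resolve()
    for candidate in [start, *start.parents]:
        # .git is a folder in a normal clone and a file in worktrees or submodules
        if (candidate / ".git").exists():
            return candidate
    return None


# Usage: check the directory the script is run from
# print(find_git_root(Path.cwd()))
```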

Version control

Vectice enables version tracking of the datasets, models, and code used during development. Stored artifacts for datasets and models include version IDs, descriptions, lineage, properties, resources, attachments, model algorithms, metrics, status, and dataset types.

Asset versions and artifacts are accessible in the Vectice app within the Datasets and Models sections of the Project.

All assets are automatically versioned, making it easy to follow a project's progression and compare results from multiple iterations.

Step 5: Complete an iteration

Once you have finished the iteration, you can mark it complete. This line of code sends the information to the Vectice app in real time.

iteration.complete()

You may revisit the details of the iteration for a retrospective. If satisfied, you can summarize your outcomes or start another iteration.
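
If a script fails halfway through, you may prefer that the iteration stays open instead of being marked complete. One possible pattern, sketched here under that assumption, is a small context manager that calls complete() only when the enclosed block finishes without an error. The helper complete_on_success is hypothetical and not part of the vectice library; it relies only on the complete() method shown above.

```python
from contextlib import contextmanager


@contextmanager
def complete_on_success(iteration):
    """Yield the iteration and call its complete() method only if the block raised no error."""
    yield iteration
    # Reached only when the with-block exits normally; an exception skips this line
    iteration.complete()


# Usage (requires a phase retrieved as in Step 2):
# with complete_on_success(phase.create_or_get_current_iteration()) as iteration:
#     iteration.log("this is a comment")
```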

Example

import numpy as np
import pandas as pd
import vectice
from vectice import FileResource, Dataset, Model, Metric
from sklearn import datasets

# Connect to Vectice
connect = vectice.connect(api_token="your-api-key")

# Retrieve the project phase
phase = connect.phase("PHA-XXX")

# Initialize iteration
iteration = phase.create_or_get_current_iteration()

# Prepare origin dataset
iris = datasets.load_iris()
data = pd.DataFrame(data=np.c_[iris['data'], iris['target']],
                      columns=iris['feature_names'] + ['target'])
data.to_csv("origin_iris.csv")

# Use FileResource to wrap your local origin dataset
origin_resource = FileResource(paths="origin_iris.csv")
origin_dataset = Dataset.origin(name="Iris Origin", resource=origin_resource)

# Prepare model artifacts
model_name = "your-model-name"
model_metrics = [Metric("RMSE", 153), Metric("Clusters number", 5)]

# Declaring your model artifacts
model = Model(name=model_name, metrics=model_metrics)

# Log the origin dataset and model to Vectice
iteration.log(origin_dataset)
iteration.log(model, section="Model Experiment #1")

# Completes and closes the current iteration
iteration.complete()

The above is just an example of how to perform various actions. The best practice is to have one file (or notebook) per phase of the Project.

All registered iterations are rendered in the Vectice app in real time, showcasing the artifacts you logged.

View the guides in the Python API Docs section for more information on how to use the Vectice API for development.
