How to log datasets
Learn how to log your datasets to Vectice.
Make sure to complete the prerequisites before getting started with logging datasets to Vectice. To learn more, view our Getting Started with Vectice API guide.
The Vectice API enables you to log all datasets used during development to the Vectice app. This includes origin datasets, cleaned datasets, and modeling datasets.
Origin datasets
Origin datasets refer to your datasets containing raw data.
Cleaned datasets
Cleaned datasets refer to your datasets that have been cleaned and prepared for data modeling or data analysis.
Modeling datasets
Modeling datasets combine training, testing, and validation data in a single dataset.
Dataset logging
Now that you know more about the different dataset types and available resources, we will walk you through logging your wrapped datasets.
Each wrapped dataset is logged at the iteration level. Use the static methods for each dataset type (i.e., origin(), clean(), and modeling()) to log your datasets to Vectice.
Log origin dataset
To log the columnar data of your origin dataset, use any resource to log it to your current iteration. For this example, we will use FileResource() to log a local origin dataset.
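The step above can be sketched roughly as follows. This is a minimal sketch, not the definitive call sequence: it assumes the vectice package is installed and that iteration is the current iteration object obtained as described in the Getting Started guide. The name, resource, and paths keyword arguments, the iteration.log() call, the helper name log_origin_dataset, and the file and dataset names are assumptions for illustration and may differ in your Vectice version.

```python
def log_origin_dataset(iteration, path="origin_dataset.csv"):
    """Wrap a local file as an origin dataset and log it to the iteration."""
    # Imported inside the function so the sketch can be defined
    # even where the vectice package is not installed.
    from vectice import Dataset, FileResource

    # Assumed keyword names: `paths` for the resource, `name` and `resource`
    # for the dataset. The dataset name is a placeholder.
    origin_dataset = Dataset.origin(
        name="Customer data Origin",
        resource=FileResource(paths=path),
    )
    # Assumed logging call; check the API reference for your Vectice version.
    iteration.log(origin_dataset)
    return origin_dataset
```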
Log cleaned dataset
To log the columnar data of your cleaned dataset, use any resource to log it to your current iteration. For this example, we will use FileResource() to log a local cleaned dataset.
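A comparable sketch for a cleaned dataset is shown here. As before, it assumes the vectice package is installed and that iteration is your current iteration object. The keyword arguments, the iteration.log() call, the helper name log_cleaned_dataset, and the file and dataset names are illustrative assumptions rather than confirmed signatures.

```python
def log_cleaned_dataset(iteration, path="cleaned_dataset.csv"):
    """Wrap a local file as a cleaned dataset and log it to the iteration."""
    # Imported inside the function so the sketch can be defined
    # even where the vectice package is not installed.
    from vectice import Dataset, FileResource

    # clean() is the static method for cleaned datasets; the keyword
    # names and the dataset name are assumptions for illustration.
    cleaned_dataset = Dataset.clean(
        name="Customer data Cleaned",
        resource=FileResource(paths=path),
    )
    # Assumed logging call; check the API reference for your Vectice version.
    iteration.log(cleaned_dataset)
    return cleaned_dataset
```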
Log modeling dataset
To log the columnar data of your modeling datasets used for training, testing, and validation, use any resource to log it to your current iteration. We will use FileResource() to log this example dataset.
Training and testing resources are required to log a modeling dataset. Validation resources are optional.
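The required and optional resources described above can be sketched as follows. This is a hedged illustration: it assumes the vectice package is installed and that iteration is your current iteration object. The training_resource, testing_resource, and validation_resource keyword names, the iteration.log() call, and the helper name log_modeling_dataset are assumptions and may differ in your Vectice version.

```python
def log_modeling_dataset(iteration, train_path, test_path, validation_path=None):
    """Log a modeling dataset; the validation resource is only added if given."""
    # Imported inside the function so the sketch can be defined
    # even where the vectice package is not installed.
    from vectice import Dataset, FileResource

    # Training and testing resources are always passed.
    resources = {
        "training_resource": FileResource(paths=train_path),
        "testing_resource": FileResource(paths=test_path),
    }
    # The validation resource is optional, so include it only when provided.
    if validation_path is not None:
        resources["validation_resource"] = FileResource(paths=validation_path)

    # Keyword names and the dataset name are assumptions for illustration.
    modeling_dataset = Dataset.modeling(name="Customer data Modeling", **resources)
    # Assumed logging call; check the API reference for your Vectice version.
    iteration.log(modeling_dataset)
    return modeling_dataset
```

Calling log_modeling_dataset(iteration, "train.csv", "test.csv") logs the dataset without a validation resource; pass a third path to include one.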
Full code example
This example demonstrates the full workflow for logging your modeling dataset artifacts to Vectice.
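One possible end-to-end sketch is given below. It splits a cleaned CSV file into training and testing files with the Python standard library, then logs them as a modeling dataset. The helper names split_csv and log_modeling_workflow, the file names, the 80/20 split, and the dataset name are hypothetical. The Vectice keyword arguments and the iteration.log() call are assumptions, and iteration is expected to be the current iteration object obtained as described in the Getting Started guide.

```python
import csv
import random


def split_csv(source_path, train_path, test_path, test_ratio=0.2, seed=42):
    """Split a CSV file into training and testing files, keeping the header in both."""
    with open(source_path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        rows = list(reader)

    # Shuffle with a fixed seed so the split is reproducible.
    random.Random(seed).shuffle(rows)
    n_test = max(1, int(len(rows) * test_ratio))
    splits = {test_path: rows[:n_test], train_path: rows[n_test:]}

    for path, subset in splits.items():
        with open(path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(header)
            writer.writerows(subset)
    # Return (number of training rows, number of testing rows).
    return len(rows) - n_test, n_test


def log_modeling_workflow(iteration, cleaned_path, train_path="train.csv", test_path="test.csv"):
    """Split the cleaned data, then log the result as a modeling dataset."""
    # Imported inside the function so the sketch can be defined
    # even where the vectice package is not installed.
    from vectice import Dataset, FileResource

    split_csv(cleaned_path, train_path, test_path)

    # Keyword names and the dataset name are assumptions for illustration.
    modeling_dataset = Dataset.modeling(
        name="Customer data Modeling",
        training_resource=FileResource(paths=train_path),
        testing_resource=FileResource(paths=test_path),
    )
    # Assumed logging call; check the API reference for your Vectice version.
    iteration.log(modeling_dataset)
    return modeling_dataset
```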
Once your wrapped dataset is logged in Vectice, you can view its dataset artifacts by clicking the Datasets tab in the Vectice app.
Best practices
When logging datasets, append the dataset type (Origin, Cleaned, or Modeling) to the end of the corresponding dataset name so it is easy to identify in the Vectice app.
Before cleaning or transforming your datasets, log your origin dataset to Vectice. This way, you can always refer back to it if needed.
As you iterate through different modeling approaches, log multiple versions of your modeling dataset. This will help you to keep track of the changes you have made and to compare the results of different approaches.
Document your work thoroughly, including your data sources, cleaning and modeling processes, and any assumptions or decisions you make. This will help you to communicate your work to others and to ensure that your analysis is transparent and reproducible. Document key milestones via:
The Vectice app in your Phase documentation
The Vectice API by adding an iteration comment
Mark your most valuable iterations and assets in the Vectice app by selecting the star next to the corresponding iteration and assets before beginning the phase review. This will make it easier for stakeholders and subject matter experts to identify the iterations and assets in review.
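The naming convention from the first best practice above could be applied with a small helper like the one below. This is only a sketch: the function name typed_dataset_name and the space-separated format are hypothetical choices, not part of the Vectice API.

```python
DATASET_TYPES = ("Origin", "Cleaned", "Modeling")


def typed_dataset_name(base_name, dataset_type):
    """Append the dataset type to the end of the dataset name."""
    if dataset_type not in DATASET_TYPES:
        raise ValueError(f"dataset_type must be one of {DATASET_TYPES}")
    return f"{base_name.strip()} {dataset_type}"
```

For example, typed_dataset_name("Customer churn", "Origin") produces "Customer churn Origin", which can then be passed as the dataset name when logging.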