Datasets


Last updated 5 months ago


In Vectice, datasets represent the dataset metadata you log during model development.

Please note that Vectice does not store your actual datasets.

Vectice records dataset metadata, such as origin, version, and artifacts, from various sources, and automatically manages dataset versions for lineage tracking.

The Vectice API enables you to log all dataset artifacts used during development to the Vectice app, including your origin, cleaned, and modeling datasets.

| Dataset type | Description |
| --- | --- |
| Origin datasets | Datasets containing raw data. |
| Cleaned datasets | Datasets that have been cleaned and prepared for data modeling or data analysis. |
| Modeling datasets | Datasets that combine training, testing, and validation data in a single dataset. |
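To make the Modeling row more concrete, here is a minimal sketch, assuming your records fit in a Python list, of how the data behind one modeling dataset can be divided into training, testing, and validation portions before you log it. The helper `split_modeling_records` and its default fractions are hypothetical illustrations, not part of the Vectice API.

```python
import random


def split_modeling_records(records, train=0.7, test=0.2, seed=0):
    """Split records into training, testing, and validation portions.

    Hypothetical helper: the fractions and seed are illustrative assumptions.
    Whatever is not assigned to training or testing becomes validation data.
    """
    if train < 0 or test < 0 or train + test > 1:
        raise ValueError("train and test must be non-negative and sum to at most 1")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded shuffle, so splits are reproducible
    n_train = int(len(shuffled) * train)
    n_test = int(len(shuffled) * test)
    return {
        "training": shuffled[:n_train],
        "testing": shuffled[n_train:n_train + n_test],
        "validation": shuffled[n_train + n_test:],
    }


splits = split_modeling_records(range(100))
print({name: len(rows) for name, rows in splits.items()})
```

With 100 records and the default fractions, the three portions hold 70, 20, and 10 records. Fixing the seed keeps the split identical across runs, which makes the resulting dataset versions easier to compare.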

Things you should know

  • Data scientists can capture basic statistics when logging their datasets' artifacts to Vectice. These statistics include the mean, median, variance, quartiles, and more. Learn more in our Capturing Dataset Statistics section.
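As a rough illustration of what these summary statistics are, and not of how Vectice computes them, the sketch below uses Python's standard `statistics` module (Python 3.8+ for `quantiles`) on a single numeric column. The helper name `summarize_column` is hypothetical.

```python
import statistics


def summarize_column(values):
    """Summarize one numeric column (hypothetical helper, illustration only).

    Requires at least two values, because sample variance and quartiles
    are undefined for a single data point.
    """
    q1, q2, q3 = statistics.quantiles(values, n=4)  # the three quartile cut points
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "variance": statistics.variance(values),  # sample variance (n - 1 denominator)
        "q1": q1,
        "q2": q2,
        "q3": q3,
    }


print(summarize_column([1, 2, 3, 4, 5, 6, 7, 8]))
```

For `[1, 2, 3, 4, 5, 6, 7, 8]`, the mean and median are both 4.5 and the sample variance is 6. The second quartile always matches the median.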

Datasets best practices

  • Append the dataset type (Origin, Cleaned, or Modeling) to the end of each logged dataset's name so it is easy to identify in the Vectice app.

  • Create a phase whose primary objective is registering origin datasets. This is usually part of the Data Preparation phase of a CRISP-DM project.

  • Project phases that aim to clean and process raw data can include a requirement that asks members to log the cleaned datasets. This is typically completed in the Data Preparation phase of a CRISP-DM project.

  • Before beginning the phase review, mark your most valuable iterations and assets in the Vectice app by selecting the star next to each one. This makes it easier for stakeholders and subject matter experts to identify the iterations and assets under review.
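If your team builds dataset names in code, a small helper can enforce the naming convention from the first best practice above. This is a minimal sketch under stated assumptions: `DatasetType`, `dataset_name`, and the underscore separator are hypothetical choices, not part of the Vectice API.

```python
from enum import Enum


class DatasetType(Enum):
    """The three dataset types described on this page."""
    ORIGIN = "Origin"
    CLEANED = "Cleaned"
    MODELING = "Modeling"


def dataset_name(base_name: str, dataset_type: DatasetType, sep: str = "_") -> str:
    """Append the dataset type to the end of a dataset name (hypothetical helper)."""
    suffix = sep + dataset_type.value
    if base_name.endswith(suffix):
        return base_name  # already suffixed, so avoid names like "sales_Origin_Origin"
    return base_name + suffix


print(dataset_name("customer_churn", DatasetType.CLEANED))
```

The separator is a team convention; what matters is that the type appears at the end of the name so datasets sort and read consistently in the app.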

Advanced Dataset guides

Select the star next to your most valuable iterations before completing a phase, even if the phase is not reviewed. This lets you identify the most valuable iterations for that phase.

How to log datasets to Vectice?

Logging datasets is the first step toward capturing knowledge in Vectice. You can log all datasets used during development, including your origin datasets (raw datasets), cleaned datasets, and modeling datasets.

See our How to log datasets guide for more information on logging datasets during iterative development.

  • Dataset resources
  • Dataset properties
  • Dataset lineage and versions
  • How to log datasets
  • Capturing Dataset Statistics