Vectice Docs
24.4

Dataset lineage and versions


Dataset lineage

To track the lineage of a derived dataset, use the derived_from parameter to list the datasets (or dataset IDs) it was derived from.

from vectice import Dataset, FileResource

# Define the raw source data as an origin dataset
origin_dataset = Dataset.origin(
    name="my origin dataset",
    resource=FileResource(paths="origin.csv"),
)

# Define the cleaned data; derived_from records that it was built from origin_dataset
clean_dataset = Dataset.clean(
    name="my clean dataset",
    resource=FileResource(paths="clean_dataset.csv"),
    derived_from=[origin_dataset],
)
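Conceptually, derived_from turns your datasets into a lineage graph that can be walked back to its origin. The sketch below illustrates that idea in plain Python under that assumption; it does not use or reproduce the Vectice API. `DatasetRecord`, `to_ids`, `lineage`, and the `DS-…` IDs are hypothetical names chosen for illustration.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class DatasetRecord:
    """Hypothetical stand-in for a logged dataset (not the Vectice Dataset class)."""
    id: str
    name: str
    derived_from: List[str] = field(default_factory=list)  # parent dataset IDs


def to_ids(parents: List[Union[DatasetRecord, str]]) -> List[str]:
    """Accept parent datasets or dataset IDs, mirroring how derived_from is described."""
    return [p.id if isinstance(p, DatasetRecord) else p for p in parents]


def lineage(dataset_id: str, registry: Dict[str, DatasetRecord]) -> List[str]:
    """Return all upstream dataset IDs, nearest parents first (breadth-first walk)."""
    seen: List[str] = []
    queue = deque(registry[dataset_id].derived_from)
    while queue:
        parent = queue.popleft()
        if parent in seen:  # skip parents reachable through more than one path
            continue
        seen.append(parent)
        queue.extend(registry[parent].derived_from)
    return seen


# Hypothetical IDs; a parent can be passed as a record or as a plain ID
origin = DatasetRecord("DS-1", "my origin dataset")
clean = DatasetRecord("DS-2", "my clean dataset", to_ids([origin]))
modeling = DatasetRecord("DS-3", "my modeling dataset", to_ids([clean, "DS-1"]))
registry = {d.id: d for d in (origin, clean, modeling)}

print(lineage("DS-3", registry))  # ['DS-2', 'DS-1']
```

Because each dataset only stores its direct parents, the full history is recovered by following those links upstream rather than by copying it into every derived dataset.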

Dataset versions

A dataset version is a dataset logged under the same name as a dataset you have already logged in Vectice. Logging a dataset under an existing name automatically increments its version, preserving the dataset's history.
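The versioning rule can be pictured as a registry keyed by dataset name, where each log under an existing name appends a new version. This is a minimal sketch under that assumption, not how Vectice stores versions internally; `DatasetRegistry`, `log`, `history`, and the file `clean_dataset_v2.csv` are hypothetical, and numbering versions from 1 is an assumption of the sketch.

```python
from collections import defaultdict
from typing import Dict, List


class DatasetRegistry:
    """Hypothetical in-memory registry illustrating version-by-name (not Vectice internals)."""

    def __init__(self) -> None:
        # Maps a dataset name to the resources logged under it, oldest first
        self._versions: Dict[str, List[str]] = defaultdict(list)

    def log(self, name: str, resource: str) -> int:
        """Log a dataset; reusing a name appends a new version and returns its number."""
        history = self._versions[name]
        history.append(resource)
        return len(history)  # versions are numbered 1, 2, 3, ... in this sketch

    def history(self, name: str) -> List[str]:
        """Return every logged resource for a name, without creating an empty entry."""
        return list(self._versions.get(name, []))


registry = DatasetRegistry()
print(registry.log("my clean dataset", "clean_dataset.csv"))     # 1
print(registry.log("my clean dataset", "clean_dataset_v2.csv"))  # 2
print(registry.log("my origin dataset", "origin.csv"))           # 1
print(registry.history("my clean dataset"))  # ['clean_dataset.csv', 'clean_dataset_v2.csv']
```

Keying versions by name is what lets a re-log extend an existing history instead of creating an unrelated dataset; a different name always starts a new history at version 1.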
