How to use dataset resources
Use the resources listed below to wrap data from any data source. This enables you to log your dataset's columnar data and artifacts to Vectice.
Vectice stores the artifacts of your datasets, not your actual datasets.
Resource()
Wrap your dataset's columnar data and artifacts from your storage location. It can be extended for any data source (for example, Redshift or RDS).
FileResource(...)
Wrap your dataset's columnar data and artifacts from a local file.
GCSResource(...)
Wrap your dataset's columnar data and artifacts from your Google Cloud Storage (GCS) source.
S3Resource(...)
Wrap your dataset's columnar data and artifacts from your AWS S3 source.
BigQueryResource(...)
Wrap your dataset's columnar data and artifacts from your BigQuery source.
DatabricksTableResource(...)
Wrap your dataset's columnar data and artifacts from your Databricks source.
Resource usage examples
Below we highlight how you can use the available Resources to wrap your dataset's columnar data and artifacts, then log your dataset to Vectice.
A Custom Data Source
To wrap data from a custom data source, create a custom resource that inherits from the base Resource class and implements your own _build_metadata() and _fetch_data() methods.
View our guide How to add a custom data source for more information and examples.
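The pattern can be sketched with a minimal stand-in. The stub base class below only illustrates the two methods you override; in real code you inherit from the actual vectice Resource class, and _build_metadata() returns Vectice metadata objects rather than the plain dict used here.

```python
from abc import ABC, abstractmethod

# Stand-in for the real vectice Resource base class, used here only to
# illustrate the two methods a custom resource must implement.
class Resource(ABC):
    @abstractmethod
    def _build_metadata(self):
        """Describe the wrapped data (name, size, columns, ...)."""

    @abstractmethod
    def _fetch_data(self):
        """Return the underlying artifacts, keyed by name."""

# Hypothetical custom resource wrapping an in-memory table (for example,
# rows you fetched from Redshift or RDS with your own client code).
class InMemoryResource(Resource):
    def __init__(self, name, rows):
        self.name = name
        self.rows = rows

    def _build_metadata(self):
        # Plain dict for illustration; the real method returns Vectice
        # metadata objects.
        return {"name": self.name, "row_count": len(self.rows)}

    def _fetch_data(self):
        return {self.name: self.rows}

resource = InMemoryResource("ratings", [{"user": 1, "score": 5}])
print(resource._build_metadata())  # {'name': 'ratings', 'row_count': 1}
```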
Local Data Source
Use FileResource() to wrap columnar data that you have stored in a local file.
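A minimal sketch, assuming the vectice package is installed and the `paths` and `dataframes` keyword names from the Vectice Python API (verify both against the API reference for your version):

```python
from vectice import FileResource  # requires the vectice package
import pandas as pd

# Hypothetical local file; passing the dataframe alongside the path is
# assumed to let Vectice capture the columnar schema.
df = pd.read_csv("train.csv")
resource = FileResource(paths="train.csv", dataframes=df)
```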
Google Cloud Storage data source
Use GCSResource() to wrap columnar data that you have stored in Google Cloud Storage (GCS).
You can optionally retrieve the file size, creation date, and last-updated date (used for auto-versioning) for up to 5,000 files.
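A minimal sketch, assuming the `uris` keyword name from the Vectice Python API (check the API reference for your version):

```python
from vectice import GCSResource  # requires the vectice package

# Hypothetical bucket and object; `uris` is assumed to accept a single
# gs:// URI or a list of URIs.
resource = GCSResource(uris="gs://my-bucket/datasets/train.csv")
```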
AWS S3 data source
Use S3Resource() to wrap data that you have stored in AWS S3.
You can optionally retrieve the file size, creation date, and last-updated date (used for auto-versioning) for up to 5,000 files.
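A minimal sketch, assuming the `uris` keyword name from the Vectice Python API (check the API reference for your version):

```python
from vectice import S3Resource  # requires the vectice package

# Hypothetical bucket and key; `uris` is assumed to accept one or more
# s3:// URIs.
resource = S3Resource(uris="s3://my-bucket/datasets/train.csv")
```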
BigQuery data source
Use BigQueryResource() to wrap data that you have stored in Google's BigQuery.
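A minimal sketch, assuming the `paths` keyword name from the Vectice Python API (check the API reference for your version):

```python
from vectice import BigQueryResource  # requires the vectice package

# Hypothetical table; `paths` is assumed to take a
# "project.dataset.table" path.
resource = BigQueryResource(paths="my-project.sales.transactions")
```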
Databricks data source
Use DatabricksTableResource() to wrap data that you have stored in Databricks.
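A minimal sketch, assuming the `paths` keyword name from the Vectice Python API (check the API reference for your version):

```python
from vectice import DatabricksTableResource  # requires the vectice package

# Hypothetical table; `paths` is assumed to take the table name as
# registered in Databricks.
resource = DatabricksTableResource(paths="catalog.schema.transactions")
```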