How to use dataset resources

Use the resources listed below to wrap data from any data source. Wrapping lets you register your dataset's columnar data and metadata with Vectice.

Vectice stores the metadata of your datasets, not your actual datasets.

| Resource | Description |
| --- | --- |
| Resource() | Wrap your dataset's columnar data and metadata from your storage location. It can be extended to support any data source (for example, Redshift or RDS). |
| FileResource(...) | Wrap your dataset's columnar data and metadata from a local file. |
| GCSResource(...) | Wrap your dataset's columnar data and metadata from your Google Cloud Storage (GCS) source. |
| S3Resource(...) | Wrap your dataset's columnar data and metadata from your AWS S3 source. |
| BigQueryResource(...) | Wrap your dataset's columnar data and metadata from your BigQuery source. |
| DatabricksTableResource(...) | Wrap your dataset's columnar data and metadata from your Databricks source. |

Resource Usage Examples

The examples below show how to use each available resource to wrap your dataset's columnar data and metadata, so that you can later register the dataset with Vectice.

A Custom Data Source

To wrap data from a custom data source, create a custom resource that inherits from the base Resource class and implements its own _build_metadata() and _fetch_data() methods.

View our guide How to add a custom data source for more information and examples.
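
The subclassing pattern can be sketched without Vectice installed. In the sketch below, `BaseResource` is a hypothetical stand-in for the Vectice `Resource` base class, and `InMemoryTableResource`, its constructor arguments, and the dictionary shapes returned by both methods are assumptions made purely for illustration. The real base class defines its own constructor and metadata types, so treat this as an outline of the override pattern rather than code to copy.

```python
from __future__ import annotations

from abc import ABC, abstractmethod


class BaseResource(ABC):
    """Hypothetical stand-in for the Vectice Resource base class."""

    @property
    def metadata(self) -> dict:
        # Built on demand from whatever the subclass reports.
        return self._build_metadata()

    @property
    def data(self) -> dict:
        return self._fetch_data()

    @abstractmethod
    def _build_metadata(self) -> dict:
        """Describe the dataset without copying its rows."""

    @abstractmethod
    def _fetch_data(self) -> dict:
        """Return the columnar data, keyed by table name."""


class InMemoryTableResource(BaseResource):
    """Hypothetical custom resource backed by a list of row dictionaries."""

    def __init__(self, table_name: str, rows: list[dict]):
        self.table_name = table_name
        self.rows = rows

    def _build_metadata(self) -> dict:
        # Collect every column name that appears in at least one row.
        columns = sorted({key for row in self.rows for key in row})
        return {"name": self.table_name, "rows": len(self.rows), "columns": columns}

    def _fetch_data(self) -> dict:
        return {self.table_name: self.rows}


resource = InMemoryTableResource(
    "orders", [{"id": 1, "total": 9.5}, {"id": 2, "total": 3.0}]
)
print(resource.metadata)
```

Keeping metadata construction and data retrieval in separate methods means the description of a dataset can be produced independently of loading its rows.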

Local Data Source

Use FileResource() to wrap columnar data that you have stored in a local file.

from vectice import FileResource

my_resource = FileResource(paths="my_resource_path")
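
To make the distinction between data and metadata concrete for a local file, here is a minimal sketch that uses only the Python standard library. `describe_file` is a hypothetical helper, and the fields it returns are an assumption about what file-level metadata might look like. It is not how Vectice computes or stores metadata.

```python
import csv
import os
import tempfile
from datetime import datetime, timezone


def describe_file(path: str) -> dict:
    """Hypothetical helper: summarize a CSV file without keeping its rows."""
    stats = os.stat(path)
    with open(path, newline="") as handle:
        # Only the header row is read; the data rows are left untouched.
        header = next(csv.reader(handle), [])
    return {
        "name": os.path.basename(path),
        "size_bytes": stats.st_size,
        "updated": datetime.fromtimestamp(stats.st_mtime, tz=timezone.utc).isoformat(),
        "columns": header,
    }


# Write a tiny CSV so the example is self-contained.
with tempfile.TemporaryDirectory() as folder:
    csv_path = os.path.join(folder, "my_resource.csv")
    with open(csv_path, "w", newline="") as handle:
        csv.writer(handle).writerows([["id", "total"], [1, 9.5], [2, 3.0]])
    summary = describe_file(csv_path)

print(summary["name"], summary["size_bytes"], summary["columns"])
```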

Google Cloud Storage Data Source

Use GCSResource() to wrap columnar data that you have stored in Google Cloud Storage.

Optionally, you can retrieve the file size, creation date, and updated date (used for auto-versioning) for up to 5000 files.

from vectice import GCSResource

my_resource = GCSResource(
    uris="gs://my_bucket_name/my_folder/my_filename"
)

AWS S3 Data Source

Use S3Resource() to wrap data that you have stored in AWS S3.

Optionally, you can retrieve the file size, creation date, and updated date (used for auto-versioning) for up to 5000 files.

from vectice import S3Resource

my_resource = S3Resource(
    uris="s3://my_bucket_name/my_resource_path"
)
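
Both the `gs://` and `s3://` URIs used in these examples follow the same bucket-then-path layout, so a malformed URI can be caught before it is passed to a resource. The sketch below is one way to do that with the standard library. `split_storage_uri` is a hypothetical helper and is not part of Vectice.

```python
from __future__ import annotations

from urllib.parse import urlsplit


def split_storage_uri(uri: str) -> tuple[str, str, str]:
    """Hypothetical helper: split a gs:// or s3:// URI into (scheme, bucket, path)."""
    parts = urlsplit(uri)
    if parts.scheme not in {"gs", "s3"}:
        raise ValueError(f"Expected a gs:// or s3:// URI, got: {uri!r}")
    if not parts.netloc:
        raise ValueError(f"Missing bucket name in URI: {uri!r}")
    # The object path is everything after the bucket, without the leading slash.
    return parts.scheme, parts.netloc, parts.path.lstrip("/")


print(split_storage_uri("gs://my_bucket_name/my_folder/my_filename"))
print(split_storage_uri("s3://my_bucket_name/my_resource_path"))
```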

BigQuery Data Source

Use BigQueryResource() to wrap data that you have stored in Google's BigQuery.

from vectice import BigQueryResource

my_resource = BigQueryResource(
    paths="bigquery-public-data.stackoverflow.posts_questions",
)
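
The path in this example has the three-part `project.dataset.table` form that BigQuery uses for table identifiers. As a hedged sketch, the hypothetical helper below (not part of Vectice) checks a path against that form before it is used. It assumes every path you pass is fully qualified with all three parts.

```python
from typing import NamedTuple


class TableId(NamedTuple):
    project: str
    dataset: str
    table: str


def parse_table_path(path: str) -> TableId:
    """Hypothetical helper: split 'project.dataset.table' into its parts."""
    parts = path.split(".")
    # Require exactly three non-empty segments.
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"Expected 'project.dataset.table', got: {path!r}")
    return TableId(*parts)


table_id = parse_table_path("bigquery-public-data.stackoverflow.posts_questions")
print(table_id.project, table_id.dataset, table_id.table)
```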

Databricks Data Source

Use DatabricksTableResource() to wrap data that you have stored in Databricks.

from vectice import DatabricksTableResource

my_resource = DatabricksTableResource(
    paths="path/to/resources"
)