Log a custom data source

Learn how to wrap datasets from a custom data source.

To wrap data from any data source, create a custom resource class that inherits from the Vectice base Resource class. Then implement the _build_metadata() and _fetch_data() methods to collect the metadata and file contents from your custom data source.

Custom resource example code

Below is a custom resource template you can adapt to build your own data resource:

from __future__ import annotations  # lets the `str | list[str]` hint parse on Python < 3.10

from vectice import Resource, DatasetSourceOrigin, FilesMetadata

class MyCustomResource(Resource):
    _source_name = "Data source name"
    
    def __init__(self, paths: str | list[str]):
        super().__init__(paths=paths)

    def _build_metadata(self) -> FilesMetadata:
        files = ...  # list the files found at self._paths in your custom storage
        total_size = ...  # compute the total size of those files
        return FilesMetadata(
            size=total_size,
            origin=self._source_name,
            files=files,
            usage=self.usage,
        )

    def _fetch_data(self) -> dict[str, bytes]:
        files_data = {}
        for file in self.metadata.files:
            file_contents = ...  # fetch file contents from your custom storage
            files_data[file.name] = file_contents
        return files_data

From this point, you can use your custom resource class to wrap data from any data source (e.g., Redshift, RDS, or Snowflake).
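
To make the two methods more concrete, here is a minimal, self-contained sketch of how they might be filled in. It does not import Vectice: BaseResource, FilesInfo, and FileInfo are hypothetical stand-ins for Resource, FilesMetadata, and its file entries, and FAKE_STORAGE is an in-memory dictionary playing the role of your data source. The stand-in base class also builds its metadata eagerly in the constructor, which is an assumption made for brevity and may differ from how Vectice does it.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical in-memory "storage" standing in for a real data source.
FAKE_STORAGE: dict[str, bytes] = {
    "sales/2023.csv": b"id,amount\n1,10\n",
    "sales/2024.csv": b"id,amount\n2,25\n3,5\n",
    "hr/staff.csv": b"id,name\n1,Ada\n",
}


@dataclass
class FileInfo:
    """Stand-in for a per-file metadata entry (not the Vectice class)."""
    name: str
    size: int


@dataclass
class FilesInfo:
    """Stand-in for FilesMetadata (not the Vectice class)."""
    size: int
    origin: str
    files: list[FileInfo]


class BaseResource:
    """Stand-in for vectice.Resource: stores paths and calls the hooks."""

    _source_name = "unknown"

    def __init__(self, paths: str | list[str]):
        # Accept a single path or a list of paths, as in the template.
        self._paths = [paths] if isinstance(paths, str) else list(paths)
        # Assumption: metadata is built eagerly here to keep the sketch short.
        self.metadata = self._build_metadata()

    def _build_metadata(self) -> FilesInfo:
        raise NotImplementedError

    def _fetch_data(self) -> dict[str, bytes]:
        raise NotImplementedError


class InMemoryResource(BaseResource):
    _source_name = "In-memory demo"

    def _build_metadata(self) -> FilesInfo:
        # Treat each path as a prefix and list the matching storage keys.
        names = sorted(
            key for key in FAKE_STORAGE
            if any(key.startswith(prefix) for prefix in self._paths)
        )
        files = [FileInfo(name=n, size=len(FAKE_STORAGE[n])) for n in names]
        return FilesInfo(
            size=sum(f.size for f in files),
            origin=self._source_name,
            files=files,
        )

    def _fetch_data(self) -> dict[str, bytes]:
        # Return the raw bytes of every file listed in the metadata.
        return {f.name: FAKE_STORAGE[f.name] for f in self.metadata.files}


resource = InMemoryResource(paths="sales/")
data = resource._fetch_data()
```

In a real resource, the dictionary lookups would be replaced by calls to your storage client, while the overall shape stays the same: _build_metadata() describes the files, and _fetch_data() returns their contents keyed by file name.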

To learn how to log your dataset from your custom data source, view our How to log datasets guide.
