orbitra.lake.client

Functions

get_lake_client

get_lake_client(environment: str = 'prod', credential: Optional[TokenCredential] = None) -> OrbitraLakeClient
Get the Orbitra Lake client for the given environment.
Args:
  • environment: Environment to use ("prod" or "dev"). Defaults to "prod".
  • credential: Synchronous Azure credential for API operations.
Returns:
  • Configured lake client instance.

Classes

OrbitraLakeClient

Client for interacting with the Orbitra Lake database.
Methods:

list_namespaces

list_namespaces(self) -> list[str]
List all namespaces in the database.
Returns:
  • list[str]: A list of namespace names.

create_table

create_table(self, namespace: str, table: TableSchema) -> TableSchema
Create a new table in the specified namespace.
Args:
  • namespace: The namespace where the table should be created.
  • table: The schema of the table to create.
Returns:
  • The schema of the created table.
Raises:
  • LakeError: If the table already exists or if the namespace does not exist.

list_tables

list_tables(self, namespace: str) -> list[str]
List all tables in the specified namespace.
Args:
  • namespace: The namespace to list tables from.
Returns:
  • list[str]: A list of table names in the specified namespace.
Raises:
  • LakeError: If the namespace does not exist.

get_table_metadata

get_table_metadata(self, namespace: str, table_name: str) -> TableSchema
Retrieve the metadata of a table.
Args:
  • namespace: The namespace where the table is located.
  • table_name: The name of the table to retrieve metadata for.
Returns:
  • The schema of the table if it exists.
Raises:
  • LakeError: If the table does not exist.

add_column_to_table

add_column_to_table(self, namespace: str, table_name: str, column: str, column_type: AllowedColumnTypes) -> TableSchema
Add a new column to an existing table.
Args:
  • namespace: The namespace where the table is located.
  • table_name: The name of the table to add the column to.
  • column: The name of the new column to add.
  • column_type: The data type of the new column.
Returns:
  • The updated schema of the table after adding the new column.
Raises:
  • LakeError: If the column is invalid, already exists or if the table does not exist.

remove_column_from_table

remove_column_from_table(self, namespace: str, table_name: str, column: str) -> TableSchema
Remove a column from an existing table.
Args:
  • namespace: The namespace where the table is located.
  • table_name: The name of the table to remove the column from.
  • column: The name of the column to remove.
Returns:
  • The updated schema of the table after removing the column.
Raises:
  • LakeError: If the table or column does not exist or if it is a reserved column.

add_or_update_table

add_or_update_table(self, namespace: str, table: TableSchema, allow_column_removal: bool = False) -> TableSchema
Add or update a table. The table is created if it does not exist and updated otherwise. Updates are allowed only on regular columns; partition columns cannot be changed.
Args:
  • namespace: The namespace where the table is located.
  • table: The schema of the table to add or update.
  • allow_column_removal: Whether to allow column removal.
Returns:
  • The updated schema of the table after adding or updating it.
Raises:
  • LakeError: If there are changes in partition columns or if a column is removed and allow_column_removal is False.
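
The update rules can be sketched locally. The set-based schema representation and the `validate_table_update` helper are hypothetical illustrations of the constraints described above, not the client's internals:

```python
def validate_table_update(
    old_columns: set[str],
    new_columns: set[str],
    old_partitions: set[str],
    new_partitions: set[str],
    allow_column_removal: bool = False,
) -> None:
    """Raise ValueError when an update would violate the rules:
    partition columns must be unchanged, and columns may only be
    removed when allow_column_removal is True."""
    if old_partitions != new_partitions:
        raise ValueError("Partition columns cannot be changed")
    removed = old_columns - new_columns
    if removed and not allow_column_removal:
        raise ValueError(f"Column removal not allowed: {sorted(removed)}")
```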

overwrite_data

overwrite_data(self, namespace: str, table_name: str, df: pd.DataFrame) -> OverwriteTableDataResponse
Overwrite partition data in a table. Data is overwritten per partition, based on the partition columns present in the DataFrame. If the table has no partition columns, the entire table is overwritten.
Args:
  • namespace: The namespace where the table is located.
  • table_name: The name of the table to overwrite data in.
  • df: The DataFrame containing the data to overwrite in the table.
Returns:
  • A response object containing information about the modified partitions and inserted rows.
Raises:
  • LakeError: If the table does not exist or if the DataFrame contains invalid data.
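
The partition-overwrite behaviour can be simulated with plain lists of row dicts. `overwrite_partitions` is a hypothetical stand-in for what the service does, not the client's implementation:

```python
def overwrite_partitions(existing, incoming, partition_cols):
    """Replace every partition that appears in `incoming` with the
    incoming rows; partitions absent from `incoming` are untouched.
    With no partition columns, the whole table is replaced."""
    if not partition_cols:
        return list(incoming)
    touched = {tuple(r[c] for c in partition_cols) for r in incoming}
    kept = [r for r in existing
            if tuple(r[c] for c in partition_cols) not in touched]
    return kept + list(incoming)
```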

delete_data

delete_data(self, namespace: str, table_name: str, partition_filters: list[PartitionFilter]) -> str
Delete data from a table based on partition filters. Filters are combined with an AND operation. If the table has no partition columns, the entire table's data is deleted.
Args:
  • namespace: The namespace where the table is located.
  • table_name: The name of the table to delete data from.
  • partition_filters: A list of partition filters to apply for the delete operation. Must be empty if the table has no partition columns.
Returns:
  • An operation ID for tracking the delete operation.
Raises:
  • LakeError: If the table does not exist or if the partition filters are invalid.
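
The AND semantics of the filters can be shown with a local sketch; the (column, value) tuple representation is a hypothetical stand-in for `PartitionFilter`:

```python
def matches_all(row, filters):
    """True when the row satisfies every (column, value) filter,
    mirroring the AND combination of partition filters."""
    return all(row.get(col) == val for col, val in filters)

def delete_rows(rows, filters):
    """Keep only rows that do NOT match all filters. An empty filter
    list matches every row, i.e. the whole table's data is deleted."""
    return [r for r in rows if not matches_all(r, filters)]
```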

overwrite_data_by_custom_columns

overwrite_data_by_custom_columns(self, namespace: str, table_name: str, custom_columns: list[str], df: pd.DataFrame) -> OverwriteDataByCustomColumnsResponse
Overwrite data in a table by custom columns. Rows matching the custom-column values present in the given DataFrame are deleted, then the DataFrame is inserted.
Args:
  • namespace: The namespace where the table is located.
  • table_name: The name of the table to overwrite data into.
  • custom_columns: A list of columns to use as custom columns.
  • df: The DataFrame containing the data to overwrite.
Returns:
  • A response object containing information about the modified custom values and inserted rows.
Raises:
  • LakeError: If the table does not exist or if the custom columns are invalid.

get_table_data

get_table_data(self, namespace: str, table_name: str, scan_filters: list[Filter]) -> pd.DataFrame
Retrieve data from a table based on scan filters. Filters are combined with an AND operation; an empty filter list retrieves all data from the table.
Args:
  • namespace: The namespace where the table is located.
  • table_name: The name of the table to retrieve data from.
  • scan_filters: A list of column filters to apply for the query.
Returns:
  • pd.DataFrame: A DataFrame containing the data retrieved from the table.
Raises:
  • LakeError: If the table does not exist or if the scan filters are invalid.
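
The filter semantics can be sketched on a local DataFrame. `apply_scan_filters` and the (column, value) tuples are hypothetical stand-ins for the client's `Filter` objects, shown only to illustrate the AND combination and the empty-list behaviour:

```python
import pandas as pd

def apply_scan_filters(df: pd.DataFrame, filters) -> pd.DataFrame:
    """AND-combine equality filters into one boolean mask; an empty
    filter list leaves the mask all-True, returning every row."""
    mask = pd.Series(True, index=df.index)
    for col, val in filters:
        mask &= df[col] == val
    return df[mask]
```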

run_query

run_query(self, namespace: str, query: str, engine: Literal['local', 'remote'] = 'local') -> pd.DataFrame
Run a query using the selected query engine.
Args:
  • namespace: The namespace to run the query in.
  • query: The query to run.
  • engine: The engine to use for the query.
Returns:
  • pd.DataFrame: A DataFrame containing the retrieved data.
Raises:
  • LakeError: If the query is invalid or if the engine is not supported.

save_raw_bytes_to_blob

save_raw_bytes_to_blob(self, bytes_io: io.BytesIO, full_filename: str, namespace: str) -> bool
Save a bytes object as a raw blob in Azure Blob Storage.
Args:
  • bytes_io: The bytes object to persist.
  • full_filename: The blob path, including virtual directories, e.g. "finance/2025/09/transactions.parquet".
  • namespace: Namespace used to compose the container name. The effective container is settings.orbitra_lake_raw_container_prefix + namespace.
Returns:
  • True if the bytes object was stored, False if it already exists and is the same.
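
The container-naming rule and the True/False return contract can be illustrated with an in-memory stand-in; the dict-backed `blob_store` and both helpers below are hypothetical, with no Azure involved:

```python
import io

def raw_container_name(prefix: str, namespace: str) -> str:
    """Compose the effective container name: prefix + namespace."""
    return prefix + namespace

# Hypothetical in-memory blob store standing in for Azure Blob Storage.
blob_store: dict[tuple[str, str], bytes] = {}

def save_bytes(container: str, full_filename: str, data: io.BytesIO) -> bool:
    """Store the payload; return False when an identical blob is
    already present, True when it was (over)written."""
    payload = data.getvalue()
    key = (container, full_filename)
    if blob_store.get(key) == payload:
        return False
    blob_store[key] = payload
    return True
```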

save_raw_df_to_blob

save_raw_df_to_blob(self, df: pd.DataFrame, full_filename: str, namespace: str) -> bool
Save a DataFrame as a Parquet blob in Azure Blob Storage. The DataFrame is serialized to Parquet in memory and uploaded to the configured storage account and container. Existing blobs will be overwritten.
Args:
  • df: The DataFrame to persist.
  • full_filename: The blob path, including virtual directories, e.g. "finance/2025/09/transactions.parquet".
  • namespace: Namespace used to compose the container name. The effective container is settings.orbitra_lake_raw_container_prefix + namespace.
Returns:
  • True if the DataFrame was stored, False if it already exists and is the same.

read_raw_bytes_from_blob

read_raw_bytes_from_blob(self, full_filename: str, namespace: str) -> io.BytesIO
Read a raw bytes object from the specified blob storage location.
Args:
  • full_filename: The full path and filename of the raw bytes object to read from the raw storage container.
  • namespace: Namespace used to compose the container name. The effective container is settings.orbitra_lake_raw_container_prefix + namespace.
Returns:
  • io.BytesIO: The raw bytes object read from the blob storage.

read_raw_df_from_blob

read_raw_df_from_blob(self, full_filename: str, namespace: str) -> pd.DataFrame
Read a raw Parquet file from the specified blob storage location and return its contents as a pandas DataFrame.
Args:
  • full_filename: The full path and filename of the Parquet file to read from the raw storage container.
  • namespace: Namespace used to compose the container name. The effective container is settings.orbitra_lake_raw_container_prefix + namespace.
Returns:
  • pd.DataFrame: The contents of the Parquet file as a pandas DataFrame.

get_raw_file_system

get_raw_file_system(self, namespace: str) -> AbstractFileSystem
Get a filesystem interface for the raw storage. This method returns a filesystem interface (e.g., AzureBlobFileSystem or LocalFileSystem) for direct file operations on the raw storage beyond the standard save/read operations.
Args:
  • namespace: Logical namespace used to compose container/directory name.
Returns:
  • A filesystem interface for accessing raw storage.

set_processed_flag

set_processed_flag(self, full_filename: str, namespace: str, is_processed: bool) -> None
Set the processed flag for a raw file.
Args:
  • full_filename: The full path and filename of the raw file to set the processed flag for.
  • namespace: Namespace used to compose the container name. The effective container is settings.orbitra_lake_raw_container_prefix + namespace.
  • is_processed: The processed flag value to set.

get_processed_flag

get_processed_flag(self, full_filename: str, namespace: str) -> bool
Get the processed flag for a raw file.
Args:
  • full_filename: The full path and filename of the raw file to get the processed flag for.
  • namespace: Namespace used to compose the container name. The effective container is settings.orbitra_lake_raw_container_prefix + namespace.
Returns:
  • The processed flag value.
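
A typical ingestion loop marks files processed after loading them. The in-memory `_processed` dict below is a hypothetical stand-in for the flag metadata, and the assumption that an unflagged file reads as False is mine, not stated by the API:

```python
# Hypothetical in-memory stand-in for the processed-flag metadata.
_processed: dict[tuple[str, str], bool] = {}

def set_processed_flag(full_filename: str, namespace: str,
                       is_processed: bool) -> None:
    """Record the processed state of a raw file."""
    _processed[(namespace, full_filename)] = is_processed

def get_processed_flag(full_filename: str, namespace: str) -> bool:
    """Return the processed state; assumed False when never set."""
    return _processed.get((namespace, full_filename), False)
```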