orbitra.lake.client
Functions
get_lake_client
environment: Environment to use (“prod” or “dev”). Defaults to “prod”.
credential: Synchronous Azure credential for API operations.
- Configured lake client instance.
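As a minimal sketch of obtaining a client, the wrapper below validates the environment name and then calls get_lake_client. The wrapper name make_lake_client is hypothetical, the import path is assumed from the module name at the top of this page, and DefaultAzureCredential from azure-identity is only one possible synchronous credential; whether credential is optional is not stated here.

```python
def make_lake_client(environment: str = "prod"):
    """Hypothetical convenience wrapper around get_lake_client."""
    # The documented values are "prod" and "dev"; reject anything else early.
    if environment not in ("prod", "dev"):
        raise ValueError(f'environment must be "prod" or "dev", got {environment!r}')

    # Imported lazily so the check above runs even without the packages installed.
    from azure.identity import DefaultAzureCredential  # a synchronous Azure credential
    from orbitra.lake.client import get_lake_client  # assumed import path

    # Keyword names follow the parameter list above.
    return get_lake_client(environment=environment, credential=DefaultAzureCredential())
```

Calling make_lake_client("dev") would return a client for the development environment, provided both packages are installed and Azure authentication is configured.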
Classes
OrbitraLakeClient
Client for interacting with the Orbitra Lake database.
Methods:
add_column_to_table
namespace: The namespace where the table is located.
table_name: The name of the table to add the column to.
column: The name of the new column to add.
column_type: The data type of the new column.
- The updated schema of the table after adding the new column.
LakeError: If the column is invalid, already exists or if the table does not exist.
add_or_update_table
append_data
namespace: The namespace where the table is located.
table_name: The name of the table to append data to.
df: The DataFrame containing the data to append.
- A response object with inserted_rows and operation_id.
LakeError: If the table does not exist or if the DataFrame contains data not matching the table schema.
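A short sketch of how the append_data response might be consumed. The helper name append_and_report is hypothetical, and reading inserted_rows and operation_id as attributes is an assumption; the response object may expose them differently.

```python
def append_and_report(client, namespace, table_name, df):
    """Append df to a table and return (inserted_rows, operation_id)."""
    response = client.append_data(namespace=namespace, table_name=table_name, df=df)
    # Assumption: the two documented fields are plain attributes of the response.
    return response.inserted_rows, response.operation_id
```

Keeping the operation_id alongside the row count makes it easier to trace a specific append later.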
create_or_update_saved_query
namespace: The namespace where the saved query is stored.
saved_query: The saved query definition containing name, SQL, and optional description.
- The newly created or updated saved query, including its current version and timestamps.
LakeError: If the SQL is invalid, not a SELECT statement, references non-existent tables, or if a conflicting table/saved query name exists when creating a new saved query.
create_or_update_table
namespace: The namespace where the table is located.
table: The schema of the table to create or update.
allow_column_removal: Whether to allow column removal.
include_hash: Whether to include the orbitra_hash column. Defaults to False.
- The updated schema of the table after creating or updating it.
LakeError: If there are changes in partition columns or if a column is removed and allow_column_removal is False.
- Note: If the table already exists with a different schema, the method removes columns not present in the new schema when allow_column_removal is True.
create_saved_query
namespace: The namespace where the saved query should be created.
saved_query: The saved query definition, including name, SQL text, and optional description.
- The created saved query with populated metadata fields such as version, created_at, and updated_at.
LakeError: If the saved query name already exists as a table or saved query in the namespace, if the SQL is invalid, or if it is not a SELECT statement or references non-existent tables.
create_table
namespace: The namespace where the table should be created.
table: The schema of the table to create.
include_hash: Whether to include the orbitra_hash column. Defaults to False.
- The schema of the created table.
LakeError: If the table already exists or if the namespace does not exist.
delete_data
namespace: The namespace where the table is located.
table_name: The name of the table to delete data from.
partition_filters: A list of partition filters to apply for the delete operation. Must be empty if the table has no partition columns.
- An operation ID for tracking the delete operation.
LakeError: If the table does not exist or if the partition filters are invalid.
delete_saved_query
Deletes a saved query. Its history can still be read through list_saved_query_history and get_saved_query(..., version=...) with include_deleted=True.
Args:
namespace: The namespace where the saved query is stored.
saved_query_name: The name of the saved query to delete.
LakeError: If the saved query does not exist.
get_processed_flag
full_filename: The full path and filename of the raw file to get the processed flag for.
namespace: Namespace used to compose the container name. The effective container is settings.orbitra_lake_raw_container_prefix + namespace.
- The processed flag value.
get_raw_file_system
namespace: Logical namespace used to compose container/directory name.
- A filesystem interface for accessing raw storage.
get_saved_query
If version is provided, this returns the historical snapshot of the saved query at that timestamp. Otherwise, it returns the current definition.
Args:
namespace: The namespace where the saved query is stored.
saved_query_name: The name of the saved query to retrieve.
version: Specific historical version timestamp to retrieve. If None, the latest version is returned.
- The requested saved query metadata and SQL definition.
LakeError: If the saved query does not exist, or if the requested historical version does not exist for that saved query.
get_saved_query_data
namespace: The namespace containing the saved query.
saved_query_name: The name of the saved query to execute.
scan_filters: A list of filters to apply on the resulting columns (e.g. id, date, etc.). Use an empty list to return all rows.
limit: Optional maximum number of rows to return. If None, all matching rows are returned.
engine: Query engine to use. Defaults to “local”.
- pd.DataFrame: The query result as a pandas DataFrame.
LakeError: If the saved query does not exist, if the underlying SQL becomes invalid (for example, due to a missing table), or if filters cannot be applied.
get_table_data
namespace: The namespace where the table is located.
table_name: The name of the table to retrieve data from.
scan_filters: A list of column filters to apply for the query.
limit: The maximum number of rows to retrieve. Defaults to None, which returns all rows.
- pd.DataFrame: A DataFrame containing the data retrieved from the table.
LakeError: If the table does not exist or if the scan filters are invalid.
- Note: Passing scan_filters=[] retrieves all data from the table.
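To illustrate how scan_filters and limit combine, here is a small sketch that previews the first rows of a table. The helper name preview_table is hypothetical; only the parameter names listed for get_table_data are used, and the filter object type is left out because it is not described here.

```python
def preview_table(client, namespace, table_name, n=5):
    """Return at most n rows of a table without applying any filter."""
    return client.get_table_data(
        namespace=namespace,
        table_name=table_name,
        scan_filters=[],  # an empty list means no filtering
        limit=n,          # cap the number of rows retrieved
    )
```

Setting limit is a cheap safeguard when exploring a table whose size is unknown, since an empty filter list with limit=None would retrieve everything.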
get_table_metadata
namespace: The namespace where the table is located.table_name: The name of the table to retrieve metadata for.
- The schema of the table if it exists.
LakeError: If the table does not exist.
list_namespaces
- list[str]: A list of namespace names.
list_saved_queries
namespace: The namespace to list saved queries from.
- list[str]: A list of saved query names available in the namespace. Returns an empty list if no saved queries are found.
list_saved_query_history
namespace: The namespace where the saved query is stored.
saved_query_name: The name of the saved query whose history should be listed.
include_deleted: If True, returns history even if the current saved query has been deleted. If False, raises a LakeError when the saved query does not currently exist.
- list[datetime]: A list of version timestamps in ascending order.
LakeError: If the saved query does not exist and include_deleted is False.
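Because the history is returned in ascending order, the most recent snapshot is the last element. The sketch below combines list_saved_query_history with get_saved_query to fetch it. The helper name latest_snapshot is hypothetical, and it assumes that get_saved_query can serve a historical version of a query that has since been deleted, as the delete_saved_query entry suggests.

```python
def latest_snapshot(client, namespace, saved_query_name):
    """Return the most recent recorded version of a saved query, or None."""
    versions = client.list_saved_query_history(
        namespace=namespace,
        saved_query_name=saved_query_name,
        include_deleted=True,  # keep working even if the query was deleted
    )
    if not versions:
        return None
    # Timestamps are listed in ascending order, so the last one is the newest.
    return client.get_saved_query(
        namespace=namespace,
        saved_query_name=saved_query_name,
        version=versions[-1],
    )
```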
list_tables
namespace: The namespace to list tables from.
- list[str]: A list of table names in the specified namespace.
LakeError: If the namespace does not exist.
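list_namespaces and list_tables can be combined to build a simple catalog of everything in the lake. This is only a sketch; the helper name list_catalog is hypothetical, and it makes one list_tables call per namespace.

```python
def list_catalog(client):
    """Map every namespace name to the list of table names it contains."""
    return {ns: client.list_tables(namespace=ns) for ns in client.list_namespaces()}
```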
overwrite_data
namespace: The namespace where the table is located.
table_name: The name of the table to overwrite data in.
df: The DataFrame containing the data to overwrite in the table.
check_hash: If True, uses hash-based change detection to only overwrite data when hashes differ. Requires the table to have a hash column. Defaults to False.
- A response object containing information about the modified partitions and inserted rows.
LakeError: If the table does not exist, if the DataFrame contains data not matching the table schema, or if check_hash is True and the table has no hash column.
overwrite_data_by_custom_columns
namespace: The namespace where the table is located.
table_name: The name of the table to overwrite data into.
custom_columns: A list of columns to use as custom columns.
df: The DataFrame containing the data to overwrite.
- A response object containing information about the modified custom values and inserted rows.
LakeError: If the table does not exist or if the custom columns are invalid or don’t match the table schema.
read_raw_bytes_from_blob
full_filename: The full path and filename of the raw bytes object to read from the raw storage container.
namespace: Namespace used to compose the container name. The effective container is settings.orbitra_lake_raw_container_prefix + namespace.
- io.BytesIO: The raw bytes object read from the blob storage.
read_raw_df_from_blob
full_filename: The full path and filename of the Parquet file to read from the raw storage container.
namespace: Namespace used to compose the container name. The effective container is settings.orbitra_lake_raw_container_prefix + namespace.
- pd.DataFrame: The contents of the Parquet file as a pandas DataFrame.
remove_column_from_table
namespace: The namespace where the table is located.
table_name: The name of the table to remove the column from.
column: The name of the column to remove.
- The updated schema of the table after removing the column.
LakeError: If the table or column does not exist or if it is a reserved column.
run_query
namespace: The namespace to run the query in.
query: The query to run.
engine: The engine to use for the query.
- pd.DataFrame: A DataFrame containing the retrieved data.
LakeError: If the query is invalid or if the engine is not supported.
save_raw_bytes_to_blob
bytes_io: The bytes object to persist.
full_filename: The blob path, including virtual directories, e.g. "finance/2025/09/transactions.parquet".
namespace: Namespace used to compose the container name. The effective container is settings.orbitra_lake_raw_container_prefix + namespace.
- True if the bytes object was stored, False if it already exists and is the same.
save_raw_df_to_blob
df: The DataFrame to persist.
full_filename: The blob path, including virtual directories, e.g. "finance/2025/09/transactions.parquet".
namespace: Namespace used to compose the container name. The effective container is settings.orbitra_lake_raw_container_prefix + namespace.
- True if the DataFrame was stored, False if it already exists and is the same.
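The boolean return value makes repeated saves safe to reason about: a second save of identical content reports False instead of writing again. A minimal sketch of turning that into a readable status follows; the helper name save_df_once and the status strings are hypothetical.

```python
def save_df_once(client, df, full_filename, namespace):
    """Save df to raw storage and describe what happened."""
    stored = client.save_raw_df_to_blob(
        df=df,
        full_filename=full_filename,  # e.g. "finance/2025/09/transactions.parquet"
        namespace=namespace,
    )
    # True: the blob was written. False: an identical blob already existed.
    return "stored" if stored else "unchanged"
```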
set_processed_flag
full_filename: The full path and filename of the raw file to set the processed flag for.
namespace: Namespace used to compose the container name. The effective container is settings.orbitra_lake_raw_container_prefix + namespace.
is_processed: The processed flag value to set.
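The processed flag lends itself to a process-once pattern over raw files: check the flag, read the file, handle it, then mark it as processed. The sketch below shows one way to wire this together. The helper name process_if_new and the handler callback are hypothetical, and it assumes the flag value is truthy once a file has been processed and that the raw file is Parquet.

```python
def process_if_new(client, full_filename, namespace, handler):
    """Run handler on a raw Parquet file unless it is already flagged as processed.

    Returns True if the file was processed now, False if it was skipped.
    """
    # Assumption: the flag is truthy for files that were already processed.
    if client.get_processed_flag(full_filename=full_filename, namespace=namespace):
        return False

    df = client.read_raw_df_from_blob(full_filename=full_filename, namespace=namespace)
    handler(df)  # caller-supplied processing step

    # Only mark the file after the handler finished without raising.
    client.set_processed_flag(
        full_filename=full_filename,
        namespace=namespace,
        is_processed=True,
    )
    return True
```

Setting the flag last means a failed handler leaves the file unflagged, so a later run picks it up again.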