Christian Screen

Snowflake Time Travel: Not the First Time Traveller, but Let’s Review


The ability to look at stored data across a period of time is not a concept Snowflake invented. IBM DB2 had it long before Snowflake, as did a select few other databases, under the banner of the temporal database concept. I think Snowflake made it more popular and mainstream through its association with data warehousing and analytics specifically, where it was not previously a universal concept for those of us concerned with ETL and staging data, who merely scanned 7 or 14 days back from the latest source retrieval in order to build our facts and dimensions.

Let’s talk Snowflake time travel, which works much like other databases built on the temporal database concept. With Snowflake, getting a historical point of view of data that may have changed over time, compared to the current state of the database, schema, or table, has a solid set of use cases. We may want to look at general changes due to updates, recover from accidental updates, or use a purely retroactive snapshotting concept. Perhaps we need to revert data; I think reverting data is the most common use case for time travel at this point. Similar to reverting data, how about reverting changes, like un-dropping a table, which in Snowflake is possible with the UNDROP command.

But how does time travel actually work? Snowflake saves the state of the data before making any updates to it. This is done by the Snowflake service layer, and the user has no direct control in this area. Time travel can be configured at the database, schema, or table level. The length of data state persistence is controlled by the retention period, which is set by the administrator of the Snowflake account and can vary between 0 and 90 days. All accounts have a default retention period of 1 day; on Enterprise edition and above this can be raised as high as 90 days, while Standard edition is capped at 1 day (setting it to 0 disables time travel).

As an example, the retention period on a table can be set using the following SQL:

ALTER TABLE db.schema.orders SET DATA_RETENTION_TIME_IN_DAYS = 0;

-- use SHOW TABLES to see metadata including retention time 
SHOW TABLES LIKE 'orders%' IN db.schema;
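
Since retention can also be set at the schema or database level, here is a short sketch (the `db` and `db.schema` names are assumptions carried over from the example above); tables inherit the default from their schema or database unless they override it:

```sql
-- set a default retention period for every table created in this schema
ALTER SCHEMA db.schema SET DATA_RETENTION_TIME_IN_DAYS = 30;

-- or set a default for the whole database
ALTER DATABASE db SET DATA_RETENTION_TIME_IN_DAYS = 30;

-- inspect the effective parameter value and the level at which it was set
SHOW PARAMETERS LIKE 'DATA_RETENTION_TIME_IN_DAYS' IN DATABASE db;
```

A table-level setting, like the ALTER TABLE above, always wins over the schema or database default.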

You may have also heard about fail-safe in Snowflake. Fail-safe is a trailing, non-configurable 7-day period of data persistence, beyond time travel, that Snowflake controls on your behalf (storage fees apply here). Basically, once the retention period (0 to 90 days) ends, Snowflake still keeps your data for 7 days, but only for permanent tables (not transient or temporary tables). If you need access to this data, you must contact Snowflake Support, as you cannot time travel to it yourself using the methods we discuss further in this article.

If we want to check the amount of data storage held in the fail-safe (and time travel) part of the Snowflake solution, we can run a simple query (filtering as needed with a WHERE clause):

SELECT TABLE_NAME,
	ACTIVE_BYTES / (1024*1024*1024) AS GENERAL_STORAGE, -- divide to convert bytes into GB
	TIME_TRAVEL_BYTES / (1024*1024*1024) AS TIME_TRAVEL_STORAGE,
	FAILSAFE_BYTES / (1024*1024*1024) AS FAILSAFE_STORAGE
FROM SNOWFLAKE.ACCOUNT_USAGE.TABLE_STORAGE_METRICS;
-- WHERE TABLE_NAME = 'orders';

When we want to use time travel, we have to think about how it will be called from a querying perspective. The UNDROP command is a bit more object-oriented, not row-specific: you would use it if you accidentally ran the DROP command on an object, like a TABLE, and calling UNDROP TABLE <name> will restore the table to your account just like nothing happened. But this can only be done within the retention period window. To see a simple UNDROP in action, check out the syntax for dropping and un-dropping a table object:

-- create a table
CREATE TABLE orders (col1 int, col2 decimal);
INSERT INTO orders (col1, col2) values (1, 9.99);
INSERT INTO orders (col1, col2) values (2, 3.99);

-- drop the table 
DROP TABLE orders;

-- this now fails with an "object does not exist" error
SELECT COUNT(*) FROM orders;

-- undrop the table
UNDROP TABLE orders;

-- verify 
SELECT COUNT(*) FROM orders;
SELECT * FROM orders LIMIT 20;
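
If you are unsure whether a dropped table is still within its retention window, the HISTORY keyword on SHOW TABLES also lists dropped tables (a non-null DROPPED_ON column marks them); a quick sketch using the assumed `db.schema` location:

```sql
-- list tables, including dropped ones that are still recoverable via UNDROP
SHOW TABLES HISTORY LIKE 'orders%' IN db.schema;
```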

When querying the data with time travel in mind, we can use a timestamp, an offset (using AT with OFFSET), or a specific query ID. For the query ID option, we need the Query ID of a statement and can time travel to the table state just before that statement ran using BEFORE with the STATEMENT parameter (or just after it, using AT).

Let’s say that we pushed data into a Snowflake table on a continual basis, and then one day we accidentally updated the table with some bad data, which incorrectly changed a few columns across a number of rows. If we think about using SQL for this, we would want to follow some best practices: set the session timezone to UTC, note the current date/time timestamp as we conduct the work, and identify the objects (tables) where we know the data needs to be recovered or reverted. Then we determine whether OFFSET or BEFORE with TIMESTAMP best gets at the historical data.

Here’s a look at some code showing how to time travel using the BEFORE method, if you have a table named ‘orders’:

-- always set the timezone for your time traveling querying to UTC
ALTER SESSION SET TIMEZONE = 'UTC';

-- get the current timestamp and copy the returned value, for example, 2023-06-12 13:03:33.033 +0000
SELECT CURRENT_TIMESTAMP;

-- view data currently in the table, db.schema.orders
SELECT * FROM db.schema.orders LIMIT 20;

-- time travel using the BEFORE function: pass your copied timestamp in as a string and cast it to a timestamp
SELECT * FROM db.schema.orders BEFORE(TIMESTAMP => '2023-06-12 13:03:33.033'::timestamp);

-- to use OFFSET it would look like this to time travel to 15 minutes ago on that table
SELECT * FROM db.schema.orders AT(OFFSET => -60 * 15);

-- to use Query Id it would look like this to time travel to a previous query id (the one that ran the update or change on the table we aren't fond of)
SELECT * FROM db.schema.orders BEFORE(STATEMENT => '867530ec-111e-3333-0000-000000000000');

Lastly, how would you recover the data from a table and bring it back to life? The most cautious way to do this is to restore the time-travelled data into a separate table and then, after all of the data is verified, either rename the table or clone it. This SQL should give you an idea:

CREATE OR REPLACE TABLE db.schema.orders_tt_recovery
AS
SELECT * FROM db.schema.orders AT(OFFSET => -60 * 5);

-- validate the table and then...

DROP TABLE db.schema.orders;

ALTER TABLE db.schema.orders_tt_recovery RENAME TO db.schema.orders;
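
As an alternative to the CREATE TABLE ... AS SELECT recovery above, Snowflake’s zero-copy cloning can be combined with time travel in a single statement, which avoids rewriting the data. A sketch, using the same assumed `db.schema.orders` table and the example query ID from earlier:

```sql
-- clone the table as it existed 5 minutes ago (zero-copy, no data rewrite)
CREATE TABLE db.schema.orders_clone
  CLONE db.schema.orders AT(OFFSET => -60 * 5);

-- or clone it from just before a specific (bad) statement ran
CREATE TABLE db.schema.orders_clone2
  CLONE db.schema.orders BEFORE(STATEMENT => '867530ec-111e-3333-0000-000000000000');
```

From there the same verify-then-rename steps apply to the clone.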

Hopefully, this gave you some clarity on time traveling with Snowflake, and on when and how to use it to make administering or snapshotting important data much more straightforward than in systems of the past. This is one of the handiest features in the Snowflake arsenal. Try to use it every now and then.
