In this blog post we will focus on loading data from a source into a cloud data platform like Snowflake. There are different tools available for this; one that is gaining traction is dbt (https://docs.getdbt.com/). One of the main highlights of dbt is that it uses SQL for much of the data transformation/loading into a cloud data warehouse like Snowflake, and it layers certain additions on top of plain SQL that make it very flexible for ETL/ELT purposes. There are two ways to use dbt: the Command Line Interface and dbt Cloud. Many configurations are available to make the data transformation process efficient and effective.

The core concept in dbt is the model. dbt uses models extensively to create tables/views in the cloud data warehouse, and the order in which those tables and views are created is handled by dbt through the dependencies between models. Models allow one to define the base objects and their relationships.
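As a minimal sketch (the model and column names here are hypothetical, not from a real project), a dbt model is just a SELECT statement saved as a .sql file. The {{ ref() }} function declares a dependency on another model, which is how dbt works out the creation order mentioned above:

```sql
-- models/customer_orders.sql (hypothetical model name)
-- Materialize this model as a table in the warehouse.
{{ config(materialized='table') }}

select
    c.customer_id,
    c.customer_name,
    count(o.order_id) as order_count
-- ref() points at other models; dbt resolves these references
-- to build the dependency graph and run models in order.
from {{ ref('stg_customers') }} as c
left join {{ ref('stg_orders') }} as o
    on o.customer_id = c.customer_id
group by 1, 2
```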
In order to connect to the different data platforms, dbt provides adapters. These adapters allow dbt to connect to the data source and load data into the target cloud data warehouse. For a list of the available adapters, please check the following link: https://docs.getdbt.com/docs/available-adapters. The adapters are primarily for cloud data warehouses/data lakes like Snowflake, Redshift and BigQuery. In order to start using dbt one has to create a dbt project; to quote from dbt: "A dbt project is a directory of .sql and .yml files, which dbt uses to transform your data."
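As an illustration, when using the CLI the connection details for an adapter live in a profiles.yml file. The sketch below assumes a Snowflake target; the account, role, database and warehouse values are placeholders, not real credentials:

```yaml
# ~/.dbt/profiles.yml (all values below are placeholders)
my_dbt_project:
  target: dev
  outputs:
    dev:
      type: snowflake          # which adapter to use
      account: xy12345.us-east-1
      user: dbt_user
      password: "{{ env_var('DBT_PASSWORD') }}"  # keep secrets out of the file
      role: TRANSFORMER
      database: ANALYTICS
      warehouse: TRANSFORMING
      schema: dbt_dev
      threads: 4
```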
dbt also has the concept of sources, which lets one name and describe the raw tables that have already been loaded into the warehouse and then select from them in models; see https://docs.getdbt.com/docs/building-a-dbt-project/using-sources.
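A minimal sketch of a source definition (the source, database and table names are hypothetical):

```yaml
# models/sources.yml (hypothetical names)
version: 2

sources:
  - name: raw_app          # logical name used in source()
    database: RAW
    schema: app_data
    tables:
      - name: orders
      - name: customers
```

A model can then select from the raw table with {{ source('raw_app', 'orders') }} instead of hard-coding the table name, which lets dbt track lineage from the raw data through to the models built on top of it.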
Typically in an ETL/ELT operation there are some considerations that need to be taken into account when loading data:

- Is the data load into the cloud data warehouse going to be incremental? If so, how many tables follow this loading pattern? (dbt supports this through incremental models; see the sketch below.)
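For the incremental case, here is a hedged sketch of a dbt incremental model (the table and column names are hypothetical). The is_incremental() check is true on runs where the target table already exists, so only new rows are processed:

```sql
-- models/fct_orders.sql (hypothetical name)
{{ config(materialized='incremental', unique_key='order_id') }}

select
    order_id,
    customer_id,
    order_date,
    amount
from {{ source('raw_app', 'orders') }}

{% if is_incremental() %}
-- On incremental runs, only pull rows newer than what is
-- already in the target table ({{ this }} refers to it).
where order_date > (select max(order_date) from {{ this }})
{% endif %}
```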