What Is Microsoft Fabric and Why You Should care?
- Unified Software as a Service (SaaS), offering End-To-End analytics platform
- Gives you a bunch of tools all together, Microsoft Fabric OneLake supports seamless integration, enabling collaboration on this unified data analytics platform
- Scalable Analytics
- Accessibility from anywhere with an internet connection
- Streamlines collaboration among data professionals
- Empowering low-to-no-code approach
Components of Microsoft Fabric
Fabric provides comprehensive data analytics solutions, encompassing services for data movement and transformation, analysis and actions, and deriving insights and patterns through machine learning. Although Microsoft Fabric includes several components, this article will use three primary experiences: Data Factory, Data Warehouse, and Power BI.
Lake House vs. Warehouse: Which Data Storage Solution is Right for You?
In simple terms, the underlying storage format in both Lake Houses and Warehouses is the Delta format, an enhanced version of the Parquet format.
Usage and Format Support
A Lake House combines the capabilities of a data lake and a data warehouse, supporting unstructured, semi-structured, and structured formats. In contrast, a data Warehouse supports only structured formats.
When your organization needs to process big data characterized by high volume, velocity, and variety, and when you require data loading and transformation using Spark engines via notebooks, a Lake House is recommended. A Lakehouse can process both structured tables and unstructured/semi-structured files, offering managed and external table options. Microsoft Fabric OneLake serves as the foundational layer for storing structured and unstructured data
Notebooks can be used for READ and WRITE operations in a Lakehouse. However, you cannot connect to a Lake House with an SQL client directly, without using SQL endpoints.
On the other hand, a Warehouse excels in processing and storing structured formats, utilizing stored procedures, tables, and views. Processing data in a Warehouse requires only T-SQL knowledge. It functions similarly to a typical RDBMS database but with a different internal storage architecture, as each table’s data is stored in the Delta format within OneLake. Users can access Warehouse data directly using any SQL client or the in-built graphical SQL editor, performing READ and WRITE operations with T-SQL and its elements like stored procedures and views. Notebooks can also connect to the Warehouse, but only for READ operations.
An SQL endpoint is like a special doorway that lets other computer programs talk to a database or storage system using a language called SQL. With this endpoint, you can ask questions (queries) to get information from the database, like searching for specific data or making changes to it. It’s kind of like using a search engine to find things on the internet, but for your data stored in the Fabric system. These SQL endpoints are often used for tasks like getting data, asking questions about it, and making changes to it within the Fabric system.
Choosing Between Lakehouse and Warehouse
The decision to use a Lakehouse or Warehouse depends on several factors:
- Migrating from a Traditional Data Warehouse: If your organization does not have big data processing requirements, a Warehouse is suitable.
- Migrating from a Mixed Big Data and Traditional RDBMS System: If your existing solution includes both a big data platform and traditional RDBMS systems with structured data, using both a Lakehouse and a Warehouse is ideal. Perform big data operations with notebooks connected to the Lakehouse and RDBMS operations with T-SQL connected to the Warehouse.
Note: In both scenarios, once the data resides in either a Lakehouse or a Warehouse, Power BI can connect to both using SQL endpoints.
A Glimpse into the Data Factory Experience in Microsoft Fabric
In the Data Factory experience, we focus primarily on two items: Data Pipeline and Data Flow.
Data Pipelines
- Used to orchestrate different activities for extracting, loading, and transforming data.
- Ideal for building reusable code that can be utilized across other modules.
- Enables activity-level monitoring.
To what can we compare Data Pipelines ?
microsoft fabric data pipelines Data Pipelines are similar, but not the same as:
- Informatica -> Workflows
- ODI -> Packages
Dataflows
- Utilized when a GUI tool with Power Query UI experience is required for building Extract, Transform, and Load (ETL) logic.
- Employed when individual selection of source and destination components is necessary, along with the inclusion of various transformation logic for each table.
To what can we compare Data Flows ?
Dataflows are similar, but not same as :
- Informatica -> Mappings
- ODI -> Mappings / Interfaces
Are You Ready to Migrate Your Data Warehouse to Microsoft Fabric?
Here is our solution for implementing the Medallion Architecture with Fabric data Warehouse:
Creation of New Workspace
We recommend creating separate workspaces for Semantic Models, Reports, and Data Pipelines as a best practice.
Creation of Warehouse and Lakehouse
Follow the on-screen instructions to setup new Lakehouse and a Warehouse:
Configuration Setups
- Create a configurations.json file containing parameters for data pipeline activities:
- Source schema, buckets, and path
- Destination warehouse name
- Names of warehouse layers bronze, silver and gold – OdsStage,Ods and Edw
- List of source tables/files in a specific format
- Source System Id’s for different sources
Below is the screenshot of the (config_variables.json) :
File Placement
Place the configurations.json and SourceTableList.csv files in the Fabric Lakehouse.
SourceTableList will have columns such as – SourceSystem, SourceDatasetId, TableName, PrimaryKey, UpdateKey, CDCColumnName, SoftDeleteColumn, ArchiveDate, ArchiveKey
Data Pipeline Creation
Create a data pipeline to orchestrate various activities for data extraction, loading, and transformation. Below is the screenshot of the Data Pipeline and here you can see the different activities like – Lookup, ForEach, Script, Copy Data and Stored Procedure
Bronze Layer Loading
Develop a dynamic activity to load data into the Bronze Layer (OdsStage schema in Warehouse). This layer truncates and reloads data each time.
We utilize two activities in this layer: Script Activity and Copy Data Activity. Both activities receive parameterized inputs from the Configuration file and SourceTableList file. The Script activity drops the staging table, and the Copy Data activity creates and loads data into the OdsStage table. These activities are reusable across modules and feature powerful capabilities for fast data loading.
Silver Layer Loading
Establish a dynamic activity to UPSERT data into the Silver layer (Ods schema in Warehouse) using a stored procedure activity. This procedure takes parameterized inputs from the Configuration file and SourceTableList file, handling both UPDATE and INSERT operations. This stored procedure is reusable. At this time, MERGE statements are not supported by Fabric Warehouse. However, this feature may be added in the future.
Control Table Creation
Create a control table in the Warehouse with columns containing Sequence Numbers and Procedure Names to manage dependencies between Dimensions, Facts, and Aggregate tables. And finally fetch the values using a Lookup activity.
Gold Layer Loading
To load data into the Gold Layer (Edw schema in the warehouse), we develop individual stored procedures to UPSERT (UPDATE and INSERT) data for each dimension, fact, and aggregate table. While Dataflow can also be used for this task, we prefer stored procedures to handle the nature of complex business logic.
Dashboards and Reporting
Fabric includes the Power BI application, which can connect to the SQL endpoints of both the Lakehouse and Warehouse. These SQL endpoints allow for the creation of semantic models, which are then used to develop reports and applications. In our use case, the semantic models are built from the Gold layer (Edw schema in Warehouse) tables.
Upcoming Topics Preview
In the upcoming articles, we will cover topics such as notebooks, dataflows, lakehouse, security and other related subjects.
Conclusion
microsoft Fabric data warehouse stands as a potent and user-friendly data manipulation platform, offering an extensive array of tools for data ingestion, storage, transformation, and analysis. Whether you’re a novice or a seasoned data analyst, Fabric empowers you to optimize your workflow and harness the full potential of your data.
We specialize in aiding organizations in meticulously planning and flawlessly executing data projects, ensuring utmost precision and efficiency.
Curious and would like to hear more about this article ?
Contact us at Info@dataplatr.com or Book time with me to organize a 100%-free, no-obligation call
Follow us on LinkedIn for more interesting updates!!
DataPlatr Inc. specializes in data engineering & analytics with pre-built data models for Enterprise Applications like SAP, Oracle EBS, Workday, Salesforce to empower businesses to unlock the full potential of their data. Our pre-built enterprise data engineering models are designed to expedite the development of data pipelines, data transformation, and integration, saving you time and resources.
Our team of experienced data engineers, scientists and analysts utilize cutting-edge data infrastructure into valuable insights and help enterprise clients optimize their Sales, Marketing, Operations, Financials, Supply chain, Human capital and Customer experiences.