Daasity Data Model: Overview & Design Philosophy

This article provides an outline of the general Daasity Data Model, why we designed the data model this way, and how the Daasity transformation layer works.

Design Philosophy

The Daasity Data Model was designed to be future-proof so that a complete rebuild of reporting would not be needed when either a source system changed or a new source system was added.

Our team's experience switching Email Service Providers (ESPs) and dealing with changing APIs led to this design: a normalized middle layer minimizes the impact of such changes, because only one set of transformations needs to be updated when an upstream system is modified.

Daasity Data Model

Data undergoes three steps:

  1. Data is replicated from the source system into the Extractor Schema.
  2. Data is transformed from the Extractor Schema into a Normalized Schema.
  3. Data is transformed from the Normalized Schema into the Reporting Schemas.
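To make these three steps concrete, the sketch below follows a single order through the layers. The schema and table names (shopify.orders, norm.orders, drp.daily_revenue) are illustrative assumptions, not Daasity's actual object names:

  -- 1. Replication lands raw data in an extractor schema, e.g. shopify.orders.
  -- 2. A transform maps it into a normalized schema.
  CREATE TABLE norm.orders AS
  SELECT
      id          AS order_id,
      created_at  AS ordered_at,
      total_price AS order_total
  FROM shopify.orders;

  -- 3. A second transform builds the reporting layer from the normalized layer.
  CREATE TABLE drp.daily_revenue AS
  SELECT
      CAST(ordered_at AS DATE) AS order_date,
      SUM(order_total)         AS revenue
  FROM norm.orders
  GROUP BY 1;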

Extractor Schemas

The extractor schema is the best representation of the source system (SaaS platform, database or other data source) that is possible in a traditional database structure. Thus, nested data sources (e.g., JSON) may be denested into multiple tables.

This approach enables us to implement an ELT approach and move the transformation logic to a SQL/Python layer where it is easier to access and modify.
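As a hedged illustration of what "denesting into multiple tables" can look like, a raw order payload with an embedded line_items array might land as a parent table and a child table in the extractor schema (table and column names below are hypothetical):

  -- Parent table: one row per order.
  CREATE TABLE extractor.orders (
      order_id     VARCHAR PRIMARY KEY,
      customer_id  VARCHAR,
      created_at   TIMESTAMP,
      total_price  DECIMAL(18,2)
  );

  -- Child table: one row per line item, denested from the JSON array.
  CREATE TABLE extractor.order_line_items (
      order_id     VARCHAR,        -- links back to extractor.orders
      line_item_id VARCHAR,
      sku          VARCHAR,
      quantity     INTEGER,
      price        DECIMAL(18,2)
  );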

Although our storage costs may increase because of the data replication, in the consumer brand industry, the size of data is relatively small, and storage costs are minimal in comparison to the cost of maintaining pipelines that transform from source to end reporting.

Normalization Schemas

The normalization schemas are a core component of the Daasity platform. Developing a normalization schema has a significant impact on analytics development, as it reduces the overall maintenance of the data model and allows you to plan for the future.

For example, our Unified Order Schema (UOS) is built to support a multi-shipment/multi-recipient framework across eCommerce, Marketplace, Retail, and Wholesale, which very few commerce platforms support.

This means that if a commerce platform were to add multi-shipment/multi-recipient functionality, you would only need to change the transformation code from the Extractor Schema to the Normalization Schema; none of the downstream data models and reports would be impacted. This greatly reduces maintenance, as there is one single data model to change (see the sketch below).
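The sketch below shows one way a multi-shipment/multi-recipient order model can be expressed; the table and column names are illustrative assumptions, not the actual UOS definition:

  -- One row per order, regardless of channel.
  CREATE TABLE uos.orders (
      order_id    VARCHAR PRIMARY KEY,
      channel     VARCHAR,           -- eCommerce, Marketplace, Retail, Wholesale
      ordered_at  TIMESTAMP
  );

  -- One row per shipment; an order can have many shipments,
  -- and each shipment can go to a different recipient.
  CREATE TABLE uos.shipments (
      shipment_id  VARCHAR PRIMARY KEY,
      order_id     VARCHAR,
      recipient_id VARCHAR,
      shipped_at   TIMESTAMP
  );

In a structure like this, a source platform adding multi-shipment support only changes the extractor-to-UOS transform; downstream models keep reading the same order and shipment tables.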

Currently we have four normalized data models deployed:

We are currently designing one additional data model to support omnichannel and other complex business cases:

Reporting Schemas

The data reporting schema (DRP) is the source schema for reporting and the layer we connect to a visualization tool such as Looker, Tableau or Sigma Computing. Building the data reporting schema from the normalized schema enables us to encode business logic in this transformation layer, so that changes are driven by changes to business logic rather than by the source system (see the sketch below).
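As a minimal sketch of a normalized-to-reporting transform that carries business logic (here, a net revenue definition), assuming hypothetical table names rather than the actual DRP code:

  CREATE TABLE drp.order_summary AS
  SELECT
      o.order_id,
      o.ordered_at,
      -- Business logic lives here: net revenue = gross minus discounts and refunds.
      o.gross_revenue
          - COALESCE(o.discounts, 0)
          - COALESCE(r.refunded_amount, 0) AS net_revenue
  FROM norm.orders o
  LEFT JOIN (
      SELECT order_id, SUM(amount) AS refunded_amount
      FROM norm.refunds
      GROUP BY 1
  ) r ON r.order_id = o.order_id;

If the net revenue definition changes, only this transform is updated; the normalized layer and the extractor layer are untouched.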

Data Reporting Platform (drp)

DRP is the original reporting data model. It uses the concept of data marts even though the tables are stored in a single schema, which allows us to build the visualization layer for specific user groups so that users can build reports themselves with a lower likelihood of getting incorrect results.

For consumer brands, we construct these tables into several sections:

  • Visitor Traffic and Store Performance: providing the ability to understand traffic across eCommerce, Marketplace and Retail as well as conversion and site performance (eCommerce and Marketplace)
  • Channel and Attribution: providing the ability to understand where your customers came from and how different attribution methodologies change that
  • Marketing: providing the ability to understand how your acquisition marketing is performing
  • Orders & Revenue: providing the ability to understand the component of revenue and perform complex customer/product/order analytics
  • Customer & Lifetime Value: providing the ability to build customer segments and how customers perform over time
  • Subscription: providing the ability to understand the performance of businesses that offer subscription
  • Email & SMS: providing the ability to understand email/SMS performance from both an email/SMS and customer perspective

Users with access to our Customizable Business Logic feature can review our code in the following Github repositories:

  • Base DRP Code - Github repository with our base code
  • Pro DRP Code - Github repository with code specific to our enterprise transformation

Data Marts (dm)

New reporting functionality is being developed in stand-alone data marts so that the code can run independently and leverage the Daasity data orchestration engine, which allows data mart updates to run concurrently.

Similar to our original drp schema, the data mart structure enables us to build the visualization layer on top of each data mart to address specific questions related to that area.

Current data marts in production: