A fundamental and ongoing challenge in eCommerce data management is incorporating new data sources into your analytics as you grow.
Have you been tasked with the challenge of rewriting a lot of your transformation code (whether in SQL, Python, or by using tools like dbt) because your business or a platform you use changed?
Or, if you were on Shopify in 2018, how did you handle the rollout supporting multiple warehouses and retail locations? What about when Shopify launched POS and added the ability to track which sales associate captured individual sales? What if you started on Shopify and then the business decided to sell on Amazon, too?
For a data engineer, data scientist, or data analyst, managing these kinds of transformation changes can be extremely challenging and time consuming.
That’s why the first data model we focused on at Daasity was a schema that could handle orders from any platform, and it's why we called the model a Unified Order Schema or simply UOS.
Even though we started with Shopify, we wanted to build our UOS schema to be future-proof, expecting that Shopify will continue to improve their functionality and handle more use cases, such as for B2B orders, multiple recipients in an order, and product customization/personalization.
I’ll share a quick story about my experience with big data model changes, and why creating a future-proofed data model was so important.
When I first started at ProFlowers, we used an external company to manage and process our data. It worked great until we bought Red Envelope and realized that we needed to support not only a cart but also product customization. Our data model couldn’t handle this additional transformation, so we had to scramble and build out the functionality in about 3 months, which cost us ~$1MM.
What does a future-proofed UOS data model look like?
A data model built for brands in the long term needs to support the following:
- eCommerce platforms (e.g., Shopify, Magento): Supporting eCommerce platforms is a given, as most brands cannot have an online presence without them. That said, platforms will continue to introduce and develop features that will impact the underlying data model, so the data model must accommodate evolution.
- Marketplaces (Amazon): Many brands begin to sell on Amazon as they scale, which provides a new customer acquisition channel. However, data models often cannot support Amazon, as Amazon does not provide the concept of a customer, and there is little fulfillment information, as the marketplace fulfills the product.
- Retail: By including retail, we add additional concepts to our data model: e.g., physical store location, data around employees that helped sales. Additionally, we need to be able to account for the lack of in-store customer data—it’s important to be able to tie a refund to an in-store purchase.
- Wholesale: Our UOS supports sell-out. As analysts, looking at customer shopping behavior, we care about the sell-out.
- Fulfillment: The data model must account for both standard fulfillment as well as complex and edge fulfillment cases, e.g., including split shipments and multiple delivery groups.
ERD for Daasity’s UOS data model
Here is our ERD:
Our core tables are critical to properly populating data around an order.
Even if a commerce platform doesn’t provide 1:1 data concepts for each of these tables, you need to derive the concepts to normalize the data and enable accurate customer, product, and order analytics.
- For instance, brands will use different naming conventions for their merchandising hierarchy: common terminology includes Style/Product, and Item/SKU/Product Variant. Daasity has taken the Shopify approach and labels tables as Product and Product Variant.
List of Core Tables
Orders: We incorporated as many of the Shopify concepts as possible (financial status, fulfillment status) but we deviated from the Shopify order model to account for future changes. We also included several Amazon fields, such as business order (to account for B2B instances), expedited shipping (to identify and account for rushed delivery), and Prime (to ID individuals who are Prime members.
Note: Billing information is stored here, but shipping information is in the fulfillments table. It accounts for multiple delivery/shipping addresses (in case Shopify builds out that functionality at some point).
Order Line Items: This table contains the items that were purchased. A core feature here is the concept of a Listing SKU and a SKU. A Listing SKU is the SKU you put on the site and often across different channels. A SKU is what is fulfilled, which can then be mapped. This enables you to easily look at sales performance for the same product across platforms.
Fulfillments: This table contains shipping information and tracking information. This differs from the standard Shopify design, but it is critical if you want to support multi-delivery groups.
- Multi-delivery is a feature available in Magento and Salesforce Commerce Cloud, and it is important for companies that support corporate type orders (e.g., corporate gifting).
Order Item Fulfillment: This is a linking table that enables you to traverse the other tables from order to fulfillment.You could even use this if you fulfilled an order item with more than one unit into multiple fulfillments by using the [ordered_quantity] field.
Locations: This table enables you to support retail because you order needs to be assigned to a retail location and also to support multiple warehouses by linking to fulfillments
Customers: This table captures the most recent customer data. While the Orders table will include the customer information at the time of purchase, this table will have the most recent customer information.
- For systems (e.g., Amazon) that don’t have the concept of a customer, we use data from Orders to populate this table.
Product Variant: This table contains your SKU (i.e., what the customer actually buys and wants to have fulfilled). We include both this SKU and the Listing SKU to normalize data and enable tracking and cost across platforms, so you can quickly calculate your current gross margin.
Daasity’s secondary tables complete the data picture for a brand.
List of Secondary Tables
Note: A goal of our UOS is to incorporate logic that enables you to “household” customers across systems in order to understand omni-channel performance. This is why the Customers core table has a customer record for each commerce platform.
Customer Household Lookup: This table is where we tie a customer from a commerce platform to a householded customer.
Unique Customers: This table is where we keep email, name and address information on the householded customer.
- This becomes a core table for building LTV so you can look at a customer across all commerce channels
Order Household Lookup: This table enables us to easily link an order to a householded customer
Order Discount Codes: This table tracks discount codes. It enables you to support stackable discount codes and also build some analytics right off this table to track multi-discount code orders.
Order Shipping Service: This table tracks the customer-facing option of what was presented as the shipping method vs. the method that was actually used to fulfill an order.
Payment Activity: An order may have multiple transactions or payments associated with it (e.g., using a gift card to pay for part of the purchase and a credit card for the remainder). The [order_payments] and [transactions] tables enable you to track those details
Note: There is only one additional table in the product section. You can easily add a merchandising hierarchy table to join to [product_variants] and make it easy to roll-up reporting across your hierarchy
Product: Following Shopify logic, this could also be considered a Style.
Note: Refunds can be challenging, especially for retail locations where you may not be able to tie back to the original order. We consider these secondary tables as not all merchants want to allow refunds, and in retail you may need to have an order that has a negative value to capture the refund.
Refund: This table incorporates all refunds (even if there are multiple refunds per order).
Refund Line Items: This table covers each item that was refunded and ties back the refund(s) to the order item(s).
- It is important to track the quantity as the customer may not return all the units of the item.
Note: We believe that tracking inventory levels is critical to certain types of merchandising and planning analytics, so we built our inventory tables to enable capture inventory over time so that you can have records of historical inventory.
Inventory Levels: This table was constructed to track inventory at the day, location, SKU (item) level. This will enable you to track a history of your inventory.