Building Your Own eCommerce Analytics Stack
So, you’ve been tasked with building the analytics stack for a scaling eCommerce brand. You’re probably buried in raw data and Excel spreadsheets, and you’re looking for a straightforward guide that explains the data tools you’ll need to build a full analytics stack, the cost, the expertise you’ll need to launch your stack, and how long you should expect this build to take.
In this article, we’ll cover those points at a high level so that you will have a solid framework for what’s needed.
Part 1: What is an analytics stack?
An analytics stack is a set of tools and technologies that work together to ultimately present data in ways that are most actionable for those making critical business decisions at your brand.
A full analytics stack will cover five major steps to evolve raw data into beautifully presented visual assets or charts:
- Extract: Copying raw data from its original sources (e.g., Shopify, Amazon, Klaviyo, Google Analytics)
- Load: Replicating data into a data warehouse (e.g., Snowflake, RedShift, BigQuery)
- Transform: normalizing the data to build a data model to load into a BI platform or visualization tool
- Visualize: Using a BI (or other visualization) platform to turn data in graphs and charts for the team to easily interpret the data.
- Analyze: The team interprets the metrics and trends in a visualization tool to gain insights to inform new strategies for the company.
If you were going to build this yourself, here is a visualization for reference:
Part 2: Analytics stack tools...and how to make them all work
Extract and Load
Time to build: Appx. 2-4 weeks
Extract and Load Functionality
Fivetran and Stitch have similar functionality. Both tools copy raw data from data connectors (CRMs, ERPs, social media, ads platforms, CS tools, etc.), and load that data into a data warehouse (think Snowflake/Redshift).
The differences between the platforms are more technical, and the tool you choose is more about what you are most comfortable working with and what makes the most sense for your brand.
Fivetran also has greater (though not complete) post-load transformation functionality, as it can be more easily integrated with dbt (more on dbt later).
Extract and Load Pricing
Fivetran and Stitch both offer pricing models based around rows of data extracted per month. Pricing for both Fivetran and Stitch scales logarithmically. So, a company that uses 1 billion rows is not paying 100 times more per month than a company using 10 million rows.
Stitch’s pricing model is based on “raw” rows, that is, the total number of rows that store data. Stitch offers two pricing tiers: Standard and Enterprise. Stitch Standard supports up to 300 million rows of data per month; larger data needs require a custom pricing solution on the platform.
Fivetran charges for MARs (Monthly Active Rows), and customers purchase credits that correlate to the number of MARs. Fivetran has three price tiers, each of which has additional features (such as progressively higher data security, faster support, and more frequent data syncs). The per-credit cost increases with each tier from ~$1/row to $2/row.
Extract and Load Experience Required
Fivetran, Stitch, and other extract and load platforms do require some technical experience in order to ensure that the data is extracted and loaded into a data warehouse properly. For example, the user needs to be able to read and understand Entity Relationship Diagrams (ERDs), set up and connect the data warehouse to the extract and load tool, and ensure that everything is configured properly.
Your brand may also require non-supported data connectors/data sources on the extraction side or needs particular customizations to the data that is being ingested, then relevant integration or development will be required. This becomes essential for eCommerce brands because Fivetran, Stitch, and other extract and load tools tend not to support common eCommerce platforms, such as Amazon Marketplace, Gorgias, Skubana, and Attentive.
If you or somebody on your team isn’t able to make the desired changes, then you will probably have to hire a data engineer with more specific experience. Depending on your location, a data engineer with relevant eCommerce experience costs $125k-$150k annual salary (Glassdoor).
It’s important to consider these sorts of variables when you’re calculating the budget required for your analytics stack. If you only allow the budget for the data platforms themselves and you then need to hire external resources to build and maintain customizations, you can quickly find yourself extremely over budget.
Just as we focused on Fivetran and Stitch out of many extract and load tools, we’ll focus on Snowflake, Redshift, (AWS), and BigQuery (GCP). These are Online Analytical Processing (OLAP) systems, which allow for queries of large data sets and are geared for BI and analytics.
Time to build: Appx. 1 week
Data Warehouse Functionality
Snowflake, Redshift, and BigQuery are going to be the three main cloud-based data warehouse options providers that eCommerce companies consider. They can be connected to an extract and load tool and allow for transform code to be built in.
However, it’s crucial to understand nuances among these options in the context of pricing and the experience necessary to implement them into your analytics stack.
Data Warehouse Pricing
Data warehouse providers tend to charge for both data storage and data queries/calculations/updates.
Snowflake offers either on-demand or pre-paid pricing models, both of which have three pricing tiers (Standard, Enterprise, and Business Critical). The on-demand model is more expensive than the pre-paid model. Both models factor in the three facets of Snowflake’s system, called the Data Cloud. The three facets are Storage, Virtual Warehouses (for loading and querying), and Cloud Services (permanent state management and coordination of Snowflake, such as loading data using Snowpipe).
Snowflake’s storage charges by the Terabyte and costs vary by region. For example, a company located in Oregon will pay $40/TB/month ($23 if pre-purchased), and a company located in Tokyo will pay $46/TB/month ($26 if pre-purchased). Storage costs are usually a smaller portion of the total Snowflake costs (5% to less than 20%).
The pricing of Snowflake’s Virtual Warehouses and Cloud Services are both based on Snowflake Credits (a unit of measure based on time spent using Virtual Warehouse and/or Cloud Service resources).
The cost of Snowflake Credits also vary by region, and the overall pricing tier (the higher the tier, the higher the cost per credit). The larger the Snowflake Virtual Warehouse, the more credits it will consume per hour (this ranges from 1 credit/hr for the smallest option to 128 credit/hr for the largest option) and the more credits it will require for cloud services—because the larger warehouses are slower.
Because Snowflake charges based on runtime, knowing how much time your brand will need to run queries every day will be crucial to cost optimization. If you have a strong idea of your expected daily runtime, you may be able to opt for the pre-paid pricing model, which will save your brand money every month. Additionally, it’s crucial to understand whether you will need to move data between regions because Snowflake may charge for transfers, depending on the location specifics.
BigQuery also offers on-demand or pre-paid pricing models, but it does not include pricing tiers. Instead, it offers one overall pricing model that charges for data storage and data analysis. BigQuery categorizes data analysis as querying and anything else that activates data, such as some data manipulation language (DML) and data definition language (DDL).
Loading data into BigQuery is either free or paid depending on your input method (more on that here), but all data in BigQuery’s storage must be paid for. The storage costs break down into two categories: active storage ($0.02/GB/mo) and long-term storage ($0.10/GB/mo). Data in BigQuery is charged as long-term when it has not been activated in 90 days.
BigQuery was once not optimized around DML statements (i.e., modifying data within tables), but Google addressed this limitation in early 2020. This was relevant for eCommerce brands because eCommerce data is frequently updated (e.g., multiple updates around order status).
As of 2021, there are some edge cases for larger volume merchants—such as large concurrent INSERT statements—who should follow some of BigQuery’s recommendations for table optimization. This might only be an issue if you have a high traffic website with a high volume of streaming data; if you track eCommerce information like user data and order information, this likely will not be an issue for you.
Redshift offers a different data storage approach than Snowflake and BigQuery, in that you’re actually renting nodes (physical servers) from Amazon when you purchase storage space. The nodes come in many sizes, which facilitates storage scaling and provides a predictable monthly storage cost.
Storage costs vary by region, but in the United States, the price ranges from $0.024 to $0.027 per GB per month.
Data Warehouse Experience Required
Of these three data options, Snowflake generally requires the lowest level of technical expertise, and it is the most straightforward to implement: essentially, you need some SQL and data warehouse experience.
Both BigQuery and Redshift require greater expertise than Snowflake, as you will need someone that is comfortable technically with databases for setup.
As we mentioned in the Extract and Load section, the experience you and your team have with data warehouses will factor into whether you need to hire someone to assist with implementation and maintenance, and this can have a considerable impact on your budget for this section of your eCommerce data stack.
This is usually the longest part in configuring your analytics stack: transforming the raw data (that your Extraction tool loaded into your warehouse) and optimizing it for the BI/Visualization tool of your choice. If you aren’t using a BI tool, you will still want to transform your data so your analyst team has an optimized data infrastructure they can consistently access.
Time to build: Appx. 3-6 months
dbt is a popular cloud-based external transform tool to kick off your transform code. External transforms can be done in Amazon Web Services, or directly in BI tools, but for reasons we’ll get into, it is often beneficial for eCommerce brands to run in-cloud transformation code.
If you only have one person using dbt, the tool is free. For smaller teams, pricing roughly starts at $50/month/user and scales from there.
Transform Experience Required
While dbt makes it easier for an analyst to get started, building a data model for a growing brand from scratch takes a wealth of experience in not only SQL code but also eCommerce knowledge. The nuances of understanding how standard metrics like gross margin and customer lifetime value are calculated for an eCommerce business can trip up even experienced analysts.
In addition to knowing how each metric needs to be calculated, there are a wide variety of tools that a growing DTC and omnichannel brand uses. This further complicates the transformation process to understand performance across sales and marketing channels. For example, a standard attribution model across marketing channels or understanding product performance across distribution channels are quite intricate to accurately track and model.
Time to build: 2-3 weeks
Looker, Tableau, and other BI tools are designed to pull in a brand’s transform code into their platform. The quality of the code and logic determines the quality of reports that can be built in the platform. You want the most accurate fresh data to properly flow into the BI platform so the data tables, metrics, and visualizations that you create will be relevant, accurate, and filled with actionable insights.
Your team’s experience with SQL will determine which BI tool is best for your team. After syncing your data to the BI platform, the magic happens. The team will be able to join data, build reports, and create dashboards to draw insights on how to improve the data.
Keeping your data well organized is what propels this process forward and empowers more of the team to build reports for their respective teams.
Pricing for BI tools range from $3-5k per year to $100k+ and changes based on your needs.
Visualize/Analyze Experience Required
If you properly build your ELT process, then the team will spend more of their time in the BI tool of your choice creating reports and dashboards. While SQL knowledge and being able to interpret ERDs is always helpful for the “No SQL editor” visualization tools, the team will be able to use the BI platform without having to know how to write SQL code. The bulk of the heavy lifting is already done if you have an optimized transformation code ready.
While we recommend that everyone at the company familiarize themselves with creating reports, we know that most of the time that falls on you, the analyst/analytics team.
Part 3: A Custom eCommerce analytics stack vs Daasity’s pre-built eCommerce analytics stack
We’ve now gone through a high-level view of what is required to build an eCommerce analytics stack from scratch.
You need an Extract and Load tool like Fivetran/Stitch, a Data Warehouse tool like Snowflake, a Transform tool like dbt (or custom transform code), and a BI platform for visualizations that will allow you and your team to easily analyze mission critical data, metrics, and trends.
But, let us ask you: how soon do you want to gain value from analyzing your data?
The reason we ask is because Daasity (when combined with Looker) is an all-in-one ELT+BI tool that automatically manages your data and allows you to run custom reports, build custom metrics and dashboards, and get the most actionable data you need, when you need it.
The reason why we know this process so well (and how long it takes) is that our founding team used to build custom instances for early clients. After building this process a couple of times, the team realized there was a more efficient way for eCommerce companies to access their data.
Daasity has built the ELT functionality (your data warehouse) for you, and has partnered closely with Looker to cover the BI section (the data modeling).
Daasity’s data warehouse is optimized for Looker, which means that brands can customize dashboards and metrics to unlock insights, understand trends, and combine store data (e.g., Shopify+Amazon) in ways that simply isn’t possible on other platforms.
The other good news is that Daasity + Looker will require a whole heck of a lot less time to implement.
Consider these two timelines:
While you could build your own data warehouse, you will need to hire at least one other person but likely two (or more), you will need to integrate complicated platforms together (none of which are optimized for eCommerce, and all of which require special customizations to function with eCommerce tools), make them work as a well-oiled machine, and ensure your data is ready to be called on whenever you need it.
Instead of hiring a data team (Data Analysts, Engineers and Scientists), spinning up 3-5 separate tools and building this out over 9-12 months, we’ve built an all-in-one product to get eCommerce companies access to their data faster.
Daasity is built solely for eCommerce, and is optimized for all facets of your business—from integrations, to data warehousing, to how specific metrics like gross margin are calculated for eCommerce, to the logic necessary to easily customize your data model.
Because Daasity is purpose-built for eCommerce, your data pipeline and data model is a clean and seamless flow, without the frequent hiccups, missing data, maintenance, troubleshooting, and API updates that your team will have to handle itself (even if you do have a well-built custom data analytics stack).
That said, if you are interested in continuing the process of building your own analytics stack, we sincerely hope this guide has helped out; every eCommerce brand is different, and you may find utility in building everything from the ground up.
However, if you need to derive insights from your data in the next month or don’t have the data team to build this yourself give us a holler, and you can talk to an Analytics Specialist here.
We’re here to help make every eCommerce merchant data-empowered. We hope that you’ll be one of them.