“My On-Prem Data Warehouse Is Just Fine Where It Is.” October 06, 2020

Perhaps some of your servers and infrastructure have already made their way to Microsoft Azure, Amazon’s AWS, or Google Cloud. For some reason, though, we still find customers holding back on making the same move with their data warehouses and data marts. If those systems are performing adequately now, it’s tempting to leave them where they are until a change is forced. That wait could be a huge missed opportunity in the making.

Many of the technologies and approaches your organization is embracing now or soon will be, like social media tracking, customer-facing apps, and IoT, introduce significantly more data into your operation than your hardware and software may be able to keep up with. If you haven’t yet future-proofed your data architecture to accommodate what’s coming, a cloud data warehousing solution could be a lifesaver.

The following aren’t new ideas, but at The Lytic Group we find that reporting databases and data warehouses are often the last train out of on-prem and into the cloud. So what’s the hurry to get your decision-support data and your BI or analytics data into the cloud?


#1  You could be delegating away a lot of your stress.

Imagine the weight lifted off your shoulders as the data manager or administrator of your organization’s computing assets when you hand the burden of their reliability over to the Microsofts and Googles of the world. House your BI solution in the cloud, and that uptime responsibility belongs to the largest tech giants on the planet, not to you. Among the weak links you eliminate:

  • Old, failing, or secretly faulty hardware.
  • The bill for new hardware.
  • Building and paying for hardware redundancy yourself.
  • Downtime for hardware upgrades/replacements.

All gone. And all very easy wins.


#2  Ridiculously Easy Scaling

With on-prem hardware, the pressures of capacity planning demand that just-right balance of durability, performance, and cost. Get it wrong, and you have a procurement process, end-user complaints, business impact, embarrassment, and lost time standing between you and correcting it.

To my mind, quick scaling is the biggest draw for moving your data warehousing workloads, ETL/ELT, and analytics layers (cubes) into the cloud. This is the part that soothes the anxiety about the boatloads of additional data you’re being asked to handle and analyze.

In the cloud you are a decision and a few clicks away from multiplying your current capacity. Was last week’s 4X performance kick not enough? You’re minutes from upping it to 10X. If your needs for computing power fluctuate day-to-day or even hour-to-hour, you can schedule the compute boost you need just for your analytics layer refreshes or a major ingestion of large fact tables, then follow it with a scheduled drop to cheaper resources for the simpler activities that fill the remainder of the day; a sketch of what that scheduling can look like follows below. Can’t be bothered with figuring out schedules? Try the auto-scaling options that all the big cloud platforms offer to dynamically figure out what resources you need, and when.
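
To make that concrete, here is a minimal sketch of a scheduled resize using Snowflake’s Python connector. Snowflake is just one example platform, and the warehouse name ANALYTICS_WH, the sizes, and the connection values are all hypothetical; Azure Synapse, Redshift, and BigQuery each have their own equivalent, but the idea is the same.

```python
# A minimal sketch of scheduled scaling, assuming Snowflake as the platform.
# The snowflake-connector-python package provides connect(); the warehouse
# name ANALYTICS_WH and the connection values below are hypothetical.
import snowflake.connector

def set_warehouse_size(size: str) -> None:
    """Resize the compute pool so horsepower matches the workload at hand."""
    conn = snowflake.connector.connect(
        account="my_account",   # hypothetical account identifier
        user="etl_service",     # hypothetical service account
        password="***",
        role="SYSADMIN",
    )
    try:
        # The new size applies to queries issued after the change.
        conn.cursor().execute(
            f"ALTER WAREHOUSE ANALYTICS_WH SET WAREHOUSE_SIZE = '{size}'"
        )
    finally:
        conn.close()

# Scale up ahead of the nightly cube refresh and large fact-table loads...
set_warehouse_size("XLARGE")
# ...then drop back to cheaper compute for routine daytime reporting.
set_warehouse_size("SMALL")
```

A cron job or whatever orchestration tool you already run (Airflow, Azure Data Factory, and the like) can fire these calls on the same schedule as your loads.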


#3  You already know it’s cheaper

Time is money. Time to tweak the operating system, time to provision the hardware, time to do a security lockdown, time to test performance: it all comes at a cost to the business. Those hidden costs pile on top of the personnel and hardware necessary to keep your shop humming along.

In his “Top 10 Data Blunders” presentation at the 2019 MIT Citi Conference, MIT adjunct computer science professor Michael Stonebraker said of cloud platforms: “They're deploying servers by the millions; you're deploying them by the tens of thousands. They're just way further up the cost curve and are offering huge economies of scale.” (https://mitsloan.mit.edu/ideas-made-to-matter/10-big-data-blunders-businesses-should-avoid)

Data warehousing workloads in particular, unlike other “always on” resources, offer a huge opportunity to reap savings by scheduling when your compute is even turned on. And, back to the discussion above, you can dedicate different compute pools, at different strengths and price points, to user-facing BI consumption versus ETL/ELT and aggregation jobs; see the sketch after this paragraph.
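
Here is a minimal sketch of that split, again assuming Snowflake as the example platform; the warehouse names BI_WH and ETL_WH, their sizes, and the credentials are hypothetical. The AUTO_SUSPEND setting is what captures the “compute turned off” savings: an idle pool stops billing after the timeout, and AUTO_RESUME wakes it on the next query.

```python
# A minimal sketch of separate compute pools, again assuming Snowflake.
# The names BI_WH and ETL_WH, sizes, and credentials are hypothetical.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account", user="admin_service", password="***", role="SYSADMIN"
)
cur = conn.cursor()

# A small, cheap pool for everyday user BI queries.
cur.execute(
    "CREATE WAREHOUSE IF NOT EXISTS BI_WH WITH WAREHOUSE_SIZE = 'SMALL' "
    "AUTO_SUSPEND = 60 AUTO_RESUME = TRUE"
)
# A bigger pool that only bills while ETL/ELT and aggregation jobs run:
# AUTO_SUSPEND parks it after 60 idle seconds, AUTO_RESUME wakes it on demand.
cur.execute(
    "CREATE WAREHOUSE IF NOT EXISTS ETL_WH WITH WAREHOUSE_SIZE = 'XLARGE' "
    "AUTO_SUSPEND = 60 AUTO_RESUME = TRUE"
)
conn.close()
```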


#4  You’ll never be behind again.

Maybe you meant to upgrade your database server version last year, but some project’s overruns ran down your budget. So now the vendor is sunsetting support for your version.

Perhaps you promised your data consumers faster hardware for their data warehouse this year. But then, you know, COVID-19.

What if you had ZERO concerns about “getting around to” keeping all of your capacity and product licensing current? What if your promise to your stakeholders, decision-makers and customers could keep itself?

Redundancy? Check.

Always-current product versioning? Yup.

Hardware that keeps up with your data needs? Always.


I’d strongly suggest you go ahead and at least start dabbling in the cloud for your data warehousing and BI data storage needs. Proofs of concept are relatively easy, harmless, and cheap with the tools built into these platforms today, and they come nowhere near heavy lifts like planning a move of your network files, users, or applications. You will wonder why you waited so long.
