Jigsaw Blog

25

Sep
2013
0 comments

OLAP or SQL

On the rest of the website you will have read about the Analytical Applications, the dashboards and how easy analysis enabled business users to gain benefit that XLCubed products and services are all about.

The truth is that XLCubed is only useful based on data and this section attempts to explain the options available to you and briefly share the experience of 25 years of the author's experience of delivering business intelligence applications. Over the years these applications have been called Executive Information Systems, Management Reports, Performance Management but they all came down to trying to provide insight into business data.

The amount of data that business now has in 2009 compared to 1984 is staggering and over those decades different technologies have emerged to cope with the volume of data and demanding needs of the end users.

Today the choice for a business intelligence application is largely perceived as between a Relational and/or Multi-Dimensional databases (recently "NEW" columar style databases have been coming to market but these are outside the scope of this discussion).

There are numerous flavours of each of the database styles and an ever increasing set of products to make them go faster but let's characterize them as:

Relational = SQL = Tables
Multi-dimensional = OLAP = Cubes

In the most simplistic terms Relational databases design history were for transaction based applications (i.e. getting data in) and Multi-dimensional for reporting (getting data out). In the 90's OLAP databases were largely perceived as very expensive and only for the biggest of companies. Microsoft started to change that when they provided an OLAP database for free with SQLServer 7 in 1997 but I think it's fair to say that the majority of BI remains based on SQL.

So what's the difference and where should you consider using each type for BI projects.

Let's talk first about OLAP Cubes. A Cube consists of Dimensions and Measures. Dimensions are descriptive elements about business organised into hierarchies e.g. Geography. These examples show the XLCubed Excel member selector and hierarchy examples.

olap1 olap2

 

Measures are facts about the business - basically items that you want to know about - Sales, Revenue, Profit, Price etc and the Cube is the combination of these two such that any the Measure at any point in a combination of dimensions is known. In the vast majority of cases Cubes are sourced from relational databases so SQL remains a key part of an overall solution.
A cube is constructed through an ETL - (E)xtract, (T)ransform and (L)oad process

  • Extract - Get the data from the source systems
  • Transform - Convert the data into a common format
  • Load - Get it into the cube

This is typically called the "Build" process and relies primarily on good relational database design, SQL skills and a perform build process is essential to a high quality BI solution. Too often we see Cube build's taking too long - hours, days even - and alarm bells ring.

We also recommend adding a final step to the build process - known as the Validate Step - which basically means that you should make sure that your cube adds up to your source data as red faces ensue if they don't.

The cube will also contain calculation's which can be simple Profit= Sales - Costs or extremely complex - Rolling 7 Day Volume weighted margins. These are exposed to the user as new measures or members.

End Users then query the cube - rarely directly - but through a front end tool (such as XLCubed) which constructs the query and the end user is hidden from the complexity making it easy to understand.

The technology behind cubes ensures that if you follow good design principles that the results of queries are blindingly fast. Navigation through the data becomes intuitive resulting in speed of thought analysis and discovery of actionable items that make a difference.

Sounds just what you need for Business Intelligence - and in many cases you will be right - but there are important considerations to assess with respect to application design. Most, if not all, of the edge cases where cubes have challenges require experience, lateral thought and compromise to be successful. Our consultants and customers have examples of success in almost every case.

Much of this discussion is based around our decade of experience with Microsoft Analysis Services.

Your breadth of reporting needs - Cubes perform best with manageable sets of dimensions- rule of thumb 16 Dimensions - from both a manageability (human beings understanding) and performance. HyperCubes with 100's of dimensions can and are successful but require careful design and typically clarity of explanation.
Data Volumes and Dimensional Rate of Change - Very high data volumes (tens of millions of dimension members), 100's of millions of facts require careful thought and design particularly with respect to dimensional hierarchies.
How up to date do you need your data - The more up to date the more challenging the cube build becomes.
How much hierarchical navigation do your end users need - Cubes excel where there are lots of hierarchies. If there are no real hierarchies then the benefits lessen. If people only want to report on detail data then there really is very little point.
How much text data do you want to report on - Cubes do numbers great - Text not so well.
What is the nature of your reporting needs - If you simply are producing list style reports which have little need for further interactive analysis then Cubes aren't for you.
What is the nature of what you want to do to your measures:


  • Adding Up the Hierarchies - Great
  • Averaging Up the Hierarchies - Great
  • Inception to user defined date - Care to be taken
  • On the Fly Volume Weighting - Care to be taken
  • Complex Triangulation calculation - Care to be taken

Ultimately it comes down to how many numbers of having to be added up in the course of a query. If it's millions to billions then be careful.

Weighed against the benefits of the cube analytical capabilities is the "Cost" of creating the cube environment both from a software cost and a manpower cost. There has been a perception in the market that Cube and data warehouse projects take a long time are expensive and don't deliver return.

This does not have to be the case and personally I think the failure of such projects is much more to do with the definition of the business problem and design failures along the way.

Unless a business knows what it wants to do with its information, understands its organization and operational infrastructure and the quality of its source data systems (of which there will be many) then a project is typically doomed.

XLCubed consultants are trained to ask:

  • First - the business question's first (The Why - where is the benefit going to come from)
  • Second - Where does the data come from ( The how - we try very hard in this question to look at the detail of the data as I'm afraid the data never lies)
  • Third - The technology infrastructure and environment

Business Intelligence projects can be completed in days, weeks and months but without looking at projects this way a tightrope is walked. Based on this analysis the cost benefit of the approach can be taken and a decision taken. Sometimes an evolutionary approach can be taken with a key focus being realizing benefits early.

XLCubed products support both OLAP & SQL and our consultants are fully versed in the costs and benefits.

So in summary:

OLAP = Best for interactive analysis and High Level Aggregated reporting with average latency
SQL = Best for detail list style analysis and very high latency

In our applications we always look at the costs vs. the benefit and more than often utilize a combination.