The Needs Analysis Checklist: From Data to a Cube

At the beginning of the summer, when I first started my internship at Marketing Technologies Group (MTG), I knew next to nothing about Essbase.  Then, after taking part in the Essbase Bootcamp, I was assigned the task of building an Essbase cube for a topic of my choice.  The possibilities seemed endless, and even after I made the somewhat morbid decision to create a database of mortality statistics, I still faced a lot of uncertainty.  The gap between raw data and a useful and effective Essbase cube is huge.   According to MTG’s standard methodology, I turned to MTG’s Need’s Analysis Checklist, which provides an objective-based, step-by-step planning process.  With its help, I was able to better understand the project and design requirements so that I could ultimately build a more effective cube.

In this blog, I will walk you through the Needs Analysis Checklist as I approached it in working on my mortality database.

Users

The first step is to think about who is going to use the database.  Are there a few users or many?  What type of experience do they have?  To what departments do they belong and what are their job functions?  A successful project is one whose users find it to be useful, so to make sure the project is successful, you must first understand your users.

In my case, I was my own user.  To make the project more interesting, however, I pretended that I worked for the World Health Organization(WHO) or was in some other position that gave me the ability to guide decisions about health initiatives.

Business Objectives

The second step is to identify the users’ business objectives.  What questions do they want the cube to answer?  How will the cube be used to make decisions?  To what improvements will it lead?  The answers to these questions have great significance.  They guide the design of the cube and provide benchmarks on which to measure the project’s success.  If you find yourself struggling to answer these questions, the cube may not be worth building.

I considered what my WHO alter ego wanted to gain from this project and came up with the following.

  • The cube should answer the following questions:
  1. What causes of death are most prevalent for particular regions, genders, and age groups?
  2. What genders and age groups are most at risk in particular regions?
  3. How do the death rates of various regions compare?
  4. Where are the death rates from easily preventable diseases still high?
  5. What are the death rate trends?  Which death rates are going up over time and which are going down?
  • The cube will be used to make decisions as follows:
  1. Information about the number of deaths due to various preventable diseases will inform decisions about how to allocate funds to various health initiatives.  Variance statistics among regions will highlight diseases that are much more deadly in certain regions of the world.  These diseases and regions will be likely candidates for effective programs.
  2. Statistical comparisons among regions, age groups, and genders will help make decisions about which populations various programs should target.
  3. Year-to-year comparisons will provide insight about the success of existing programs and will be used to make decisions about whether to continue, change, or eliminate these programs.
  • The cube will lead to the following improvements:
  1. More targeted, effective, and efficient spending on health initiatives

Metrics

The next step is to decide what measurements you will need to meet your business objectives.  In many cases, these are financial measurements.  In my case I decided on the following:

  1. Deaths in thousands (The format of the data I would be loading)
  2. Deaths  (A more intuitive way to look at the raw numbers)
  3. Crude Death Rate (For comparison among regions, genders, age groups, and years with different populations.  Crude Death Rate is the number of deaths per thousand people.)
  4. Population (Used to calculate Crude Death Rate)
  5. Percentage of deaths due to given cause (To illustrate the relative impact of each cause of death; also useful for comparisons among populations)
  6. Percent Variance (To show how the crude death rate deviates from the norm and identify candidates for health initiatives by highlighting abnormally high death rates)

Dimension Summary

The best place to start is with the business objectives.  What other dimensions are necessary in order to complete the desired analysis?

After deciding what dimensions you need, consider the hierarchy structure of each.  At this stage, the Oracle Essbase Bootcamp suggests making label outlines for each dimension.  Label outlines consist of the dimension name followed by two parallel hierarchies.  The first hierarchy lists a generic name for each generation while the second lists a real-world example for each generation.

Based on my business objectives, I knew that I wanted to make comparisons across years, ages, genders, regions, and causes, so I decided to make each of these categories a dimension.  The next step was to map out the hierarchies.  All Years, All Ages, and Both Genders were easy because they had very few members.  For All Causes, I used the hierarchy that already existed in my data source.  For All Regions, however, my data source just had a list of a couple hundred countries.  I knew that I wanted to make larger scale comparisons among parts of the world, so I decided to organize the countries into regions and sub-regions.  I created my label outlines and organized them in the table below.

After you map out your hierarchies, make a note of how large you expect each dimension to be.  You may also want to make a tentative decision about sparse and dense storage configuration.  You’ll need to load your data before you’ll be able to make a final decision, but setting the Accounts and Time dimensions (Measures and All Years in my case) to dense and the other dimensions to sparse is usually a good place to start.

Output

After you create the cube, how will users access the data?  What views will provide them with the information they need?  In answering these questions, you’ll want to refer once again to the business objectives.  Consider what reports each user will need and what layout of page, column, and row dimensions/members will make the information easiest to understand.  Also, consider the formatting of the reports.  For instance, will you create tables or graphs?  Tables provide a granular look at exact figures, but graphs may be a better way to quickly understand the “big picture.”

Information about output requirements can be organized in a table.  The table below lists only a few of the reports that I decided to make, but it should provide an idea about how to approach this step in the planning process.

Input

After you develop plans to organize and use the data, it’s time to consider the input data.  What are the sources, formats, and sizes of your input files?  How frequently are input files created?  In terms of the dimensions that you have outlined, how is your data organized?  As with the information about output, this information can be organized in a chart.  Information about my three sources is organized below.

It’s also important to compare and contrast your input sources.  In particular, can you use one load rule for all your input files or do different sources need different rules?  The formatting of my three files was comparable.  The 2008 file was different in that it provided data for males and females on the same worksheet.  However, the 2004 filed could easily be combined into a file with the same format as the 2008 file.

Another point to consider is whether the data in the input files will be loaded at level 0.  The best practice is to load data at level 0.  However, in my case, it was not possible to do so.   My population data had different dimensionality than my death data; while the population data was specific to each year, age, gender, and region, it was not specific to a cause.  Thus, I had to load my population data at generation one of my All Causes dimension.  Normalization calculations are generally needed when data is loaded high.  In the next section, I will discuss the normalization calculations that my database required.

Transformations

The transformations section asks you to consider all the needed calculations.  These include calculations to be set in the outline, like aggregation properties and member formulas, as well as calculations to be included in the calc script.  Consider errors that would occur if you ran the default calc script.  How can you correct these errors?

Normalization Calcs

Normalization calcs are needed when data is loaded high.   If you are going to load data high, work through the logic of the necessary corrections.

As aforementioned, I loaded population data to generation one of All Causes.   In order to calculate Crude Death Rate for cause categories and individual causes at lower levels of the hierarchy, I needed my calc script to copy the population statistics from generation one down to the other generations of All Causes.   I then needed to prevent further aggregation across All Causes as such aggregation would incorrectly add the population data.  However, I did need to aggregate All Causes at some point because I needed to aggregate the Deaths in Thousands data.  To resolve this problem, I decided to turn off AggMissg, aggregate the All Causes dimension, and then perform my normalization calc.

Essbase Outline Calcs

When planning the outline calculations, consider each dimension.  Where possible, add dynamic calcs to reduce database size.

I began with my Measures dimension.  First, I considered aggregation.  Since it made no sense to aggregate the various measures, I set their consolidation property to Ignore (~).  I also set Measures to Label Only.  I then focused on my four calculated measures and decided that each could be dynamically calculated by a formula.  I planned out the formulas as below.

  • Deaths = Deaths in thousands * 1000
  • Crude Death Rate = Deaths/Population * 1000
  • Percentage of deaths = Deaths % Deaths -> All Causes
  • Percent Variance= %Var(Crude Death Rate, Crude Death Rate->All Regions)

Next, I considered my other five dimensions.  Except for the issue with aggregating population as discussed in the Normalization section, each dimension had a very standard aggregation.  I set all the members of these five dimensions to aggregate by addition.

Procedures and Hardware/Software Environment

After creating a cube, you’ll need to provide your users with information about how to use it.  Provide instructions on loading data, calculating, updating the outline, and producing reports.  Also consider providing information about the required hardware and software.

Conclusion

After completing all these steps, you should better understand your cube’s purpose and the process you will follow to make your data serve that purpose.  While completing the Needs Analysis Checklist requires time and thought, it prevents costly errors, enables a more efficient build process, and produces a more useful result.  It helped me to successfully create my first database, and it helps consultants at MTG to create cubes for clients.  Working through the Needs Analysis Checklist is well worth the time.

Share
Posted in Essbase | Leave a comment

Web Analysis Subscription Controls

One of Hyperion Web Analysis’ strengths is creating visually appealing reports, or dashboards, that combine various components.  The most effective dashboards have components that work together.  For instance, a spreadsheet could display actual and budget profit data for all products in a given region while a bar chart in the same report could display profit variance percent for all products in the same region.  If users want to change the point of view and, for example, see information for a different region, they should be able to make a new selection (e.g., select a new region) just once and have that selection applied to all components of the report (e.g., the spreadsheet and chart).  Subscription controls provide this functionality in an easy and visual way.

Subscription controls come in various types from Radio Button Group Subscriptions to Slider Subscriptions to Combo Box Subscriptions.  When you create a subscription control, you link it to the data sources of whatever components you want to control and then select members of a dimension among which you want to toggle.  Making a selection with a subscription control icon applies that selection to all of the components whose data sources you selected.  This makes it easy to quickly change one or several factor across a variety of displays.

Create a Subscription Control:

  • Open or create a report that contains at least two components.

  • In Design view, drag and drop a subscription icon, such as the Combo Box Subscription icon , into the content pane.
  • Use Ctrl-click to select the data sources for all the components that you wish to control.  Click OK.

  • Select the dimension that you want to control. Click OK.

  • Click Add and select the members that you want to toggle among. Click OK.

  • Click OK again.  Switch to Analyze view and use the subscription control to control for which member of the selected dimension data is shown.  To control the POV for additional dimensions, add additional subscription controls.

Share
Posted in Essbase, Reporting | Leave a comment

New Calc Functions for Date Measures in Essbase 11.1.1: @DateDiff, @DatePart and @DateRoll

Date math can be extremely useful in solving a variety of business problems such as accounts receivable and accounts payable forecasting, project management and loan and interest calculations.  It has always been possible to do date math in Essbase the same way Excel does date math, using serial dates.  For example April 13, 2011 can be represented as 40646 – the number of days since January 1, 1900.  The only difference is that Excel is much better about formatting the results as the date.  The user has to do that in Essbase queries.

In version 11 Essbase introduces text and date measures and a handful of Calc functions for manipulating them.  In this article we’ll address three Essbase BSO Calc language functions for dealing with dates.

Essbase V 11 New Date Functions

Function Returns
@DATEDIFF Difference between two dates.
@DATEPART The number of the Year, Quarter, Month, Week, Day, DayOfYear or weekday
@DATEROLL adds or subtracts the specified number of time periods to the input

In this simple example we want to forecast revenue based on outstanding invoices and customer payment history. That is, we will add the average number of days it takes each customer to pay to the outstanding invoices to calculate expected future payment date.  This may or may not be a reliable approach to forecasting.  Our objective here is simply to practice the mechanics of the calculations.

Setting up the outline to use using date measures requires several steps:

This spreadsheet shows the input data, invoices by customer.

Input:

  • InvoiceDate: Date of invoice
  • PaidDate: Date paid for already paid invoices

Output:

  • PaidAge : Number of days old when paid
  • RevenueDate: Expected payment date
  • RevenueMonth : Month number when paid

 

We will do the calculations in three steps:

  1. CalculatePaidAge, the age when paid for each already paid invoice
  2. Calculate the average PaidAge by customer
  3. Calculate the RevenueDate and RevenueMonth for unpaid invoices

Here’s the calc script:

/***** Declare a temporary variable to hold the average age when paid  *********/Var vAvgDays;

/******* Set up the calc member block since we are using and IF statement in a script**** */

“RevenueDate”(

If(“PaidDate” >0)  /***********  for already paid invoices  ********/

“PaidAge”=@DateDiff(“InvDate”,”PaidDate”,DP_DAY);  /** PaidDate – Invoice Date   **/

ELSE     /*************  for unpaid invoices    ***********/

/**  Set the temporary variable equal to the average PaidAge for each customer   ****/

vAvgDays=@AVGRANGE ( SKIPMISSING, “PaidAge”,@ISIBLINGS (@CURRMBR(“Customers”)));

/********  Set PaidAge equal to the temporary variable   ********/

PaidAge=vAvgDays;

/****  add the average days to the invoice date to calc the expected paid date */

“RevenueDate”=  @DateRoll(“InvDate”,DP_DAY,vAvgDays);

/***  Extract the month number from the RevenueDate   *****/

“RevMonth”=@DatePart(“RevenueDate”,DP_MONTH);

ENDIF;

)

We will use a temporary variable – vAvgDays because we don’t really need to store the average days when paid.  We only need to use it in the calc for RevenueDate.   In this case we will store it anyway so we can check each step of the calc.

We need a calc member block for three reasons.  It is required when using IF Statements in a script.  It’s required when assigning a value to a temporary variable.  It’s also needed for “interdependent calcs” that is, to force the order of calculation so that the vAvgDays is calculated then used in the RevenueDate calc immediately.   The normal order (without a calc member block)  would be to calc all the vAvgDays values first, then go on to the RevenueDates.  The problem is that the vAvgCalc variable only holds one number at a time so you need to force the “interdependent” calc.

Here are the results:

This might be a nice example of the functions but it is a pretty simplistic approach to forecasting.  If you wanted to improve it you might use @CORRELATE to measure how PaidDays is affected by invoice size, time or year, product, salesperson or other significant features of the sale.   You might use a @Variance to determine which customers are consistent enough to use their historical average reliably.   You might even calculate several scenarios to forecast  possible futures of “slow payments”, “fast payments” and “normal payments”.  Using Essbase technology to solve real world business problems requires the technical skill to get the calcs right in addition to the business acumen to know what needs to be calculated in the first place.

Appendix: Text and Date Measure Functions Syntax

The following is loosely adapted from the Oracle Essbase Technical reference where you can find a much more complete treatment.  The material is repeated here for the reader’s convenience.

@DATEPART()

Returns the date part as a number. Date parts are Year/Quarter/Month/Week/Day/DayOfYear/Weekday

Syntax

@DATEPART ( date, date_part_ex )

Example

RevenueMonth=@DATEPART(“InvoiceDate”,DP_MONTH);

@DATEDIFF()

Returns to difference between two dates in terms of the specified date_part

Syntax

@DATEDIFF ( date1, date2, date_part )

Example

InvoiceAge=@DATEDIFF(“InvoiceDate”,@TODAY(),DP_DAY);

@DATEROLL()

Returns a date a specified number of periods from the input date

Syntax

@DATEROLL ( date, date_part, number )

Example 1

RevenueDate=@DATEROLL(“InvoiceDate”,DP_DAY,AvgAgeWhenPaid);

Example 2

Fix(“Jan”,”Actual”,”New York”)

“Price”(

If(“CloseDate”>@DATEROLL(@TODAY(),DP_DAY,15))

“Price”=”Price”*1.1;

ENDIF

)

ENDFIX

Share
Posted in Essbase, Tips | Leave a comment

Creating From-To Dimensions in Essbase

Have you ever wanted to create a “From-To” matrix in Essbase to follow hierarchy changes you make in your cube’s dimensions? We recently concluded a project where our client acquired another entity and needed to create a new chart of accounts. These new accounts would include accounts and their data from both entities.

The problem was following the data as it made its way from various old accounts into new accounts. Without a powerful solution, an entire team of accountants would have to manually enter values into excel and use checksums to verify their work. This was something our clients wanted to avoid.

Thankfully, following changes in dimensional hierarchies is easy to do using EIS with a solid relational star schema. The end product in Essbase is a cube with two dimensions that represent two different views of one old dimension.

First, copy your existing dimension table e.g. dbo.dim_dept, to a new table e.g., dbo.dim_dept_new, in your relational data source. Next, change the hierarchies as desired.

Move to your EIS OLAP Model and join the new table to your fact table using the same key column you used for the original version of the table. So, if your join was

fact_table.dept_code = dim_dept.dept_code

it should now be

fact_table.dept_code = dim_dept_new.dept_code

Save your OLAP Model and move to your Metaoutline. Add the new dimension and required columns. In order to prevent a conflict in naming conventions, transform the member names in the new dimension by adding the prefix “New.” Two examples could be:

1.       “Admin Dept” to “New Admin Dept”

2.       “Dept12345” to “New Dept12345”

Save your Metaoutline, build, and enjoy!

Share
Posted in Essbase, Tips | Leave a comment

Faster Testing: Taking Audit Sheets To the Next Level

Finance users and IT administrators can both benefit from anything that makes testing faster.  “Audit Sheets” use 3 Excel sheets with the same rows, columns and POV (page members in spreadsheet add-in).    Enter a formula like  =Test!B3-Control!B3 and copy it to all the data cells to create a “differences sheet” to compare a correct target “control data” sheet to a “test data” sheet.

Absolute Values

You can’t easily test an entire Essbase cube in Excel but the more data you test the better.  That means some pretty big audit sheets that you can’t easily scan visually.   You can sum up the differences into one cell and if the result is zero the data is validated.

But what if there are offsetting errors e.g. +100 in New York -> Basses and -100 in California  -> Trombones?   Instead of the formula above, use an absolute value formula like  =ABS(Test!B3-Control!B3) which makes all the errors positive.  That means they won’t offset each other.

Excel Data Filters

Another way to quickly find errors on your differences sheet is to use Excel data filters  to display only the rows with errors. You might want to add a column to the far right that sums the other columns so you can apply your filter in one place.  In Excel 2007 select the column labels then click Data|Filter. Click on the drop down and select Number Filters.

Summary Sheets

Sometimes you will have several sets of audit sheets in the same workbook.  You can sum the differences on differences sheet to one cell.  Then use a formula to display all the sums on a “summary sheet”.    Then you only have to look in one place to see which set of data had errors.  If you are using the spreadsheet add-in then the Essbase Query Designer can be used to retrieve all sheets at once (you still have to connect to each one-yikes).  Depending on version, Smart View makes it pretty easy to retrieve all sheets.

Synchronized Dynamic Audit Sheets

You can create the row and column (and page) members on the test data and differences sheets using formula that point to the rows and columns (and page members) on the control data sheet.  Then if you change the members on the control sheet (and refresh the test data) the three sheets are synchronized.

Difference Cubes

Audit sheets are a great tool but you can take it a big step higher using “difference cubes”.  Difference cubes create an entire cube full of differences by subtracting every cell in a Test cube from the analogous cell in a Control cube.  IMHO Difference cubes are a secret weapon in the war on bad data. [Check out my webcast Difference Cubes: Making Testing Fast & Easy for more useful tips.]

Share
Posted in Essbase, Tips | Leave a comment

Managing Dimension Table Integrity with SQL

When working in EIS or Essbase Studio, maintaining integrity between your fact table and dimension tables is important for maintaining financial data integrity.  While both programs have a way for you to build dimension members to a miscellaneous hierarchy in your cube, you can check to see what will “fall out” by comparing your fact table to your dimension tables in your data mart.  Here is code that you can use in SQL Server to accomplish this:

select distinct Account_ID from Fact_Table a
where not exists
(
select distinct Account_ID from Accounts_Dim_Table b
where a.Account_ID = b.Account_ID
)

With Account_ID as your key column, this code looks at the fact table and pulls out the individual accounts that are not found in the dimension table.  You can use this code for any dimension and even roll it into a stored procedure that checks all your dimensions at once and reports the results back to you.  This way, you can see what your dimension tables are missing before building your cube.

Share
Posted in Tips | Leave a comment

Maximizing Version 11- What’s New with Partitions

BSO cubes could be the target of partitions way back in version 7. But now ASO cubes can be targets too. In this webcast we will explore and demonstrate some new partitioning features including ASO as the target.

Date: Thursday, April 14, 2011

Time: 12:00 pm EST

Duration: 1 hour

Host: Ron Moore, MTG

Register

Share
Posted in Event, Free Essbase Webinar | Leave a comment

Webinar: Maximizing Version 11 – New Calculations in Essbase 11

We will explore and demonstrate several new calculation features introduced in Version 11 including @XWRITE, calculations with dates, typed measures and varying attributes. You can also create custom error messages with @Return.

Date: Tuesday, March 22, 2011

Time: 12:00 pm EST

Duration: 1 hour

Host: Ron Moore, MTG

Register

Share
Posted in Event, Free Essbase Webinar | Leave a comment

Webinar: Partition Basics

Partitions allow Essbase cubes to exchange data so you can view data or even calculate across cubes. This opens up many optimization and design options. In this webcast we will discuss the three types of partitions and consider when to use each. We will discuss strengths and limitations of partitions designs and we will demonstrate a couple of simple partitions. We will also talk about partitioning cubes with different outlines including different dimensions.

Date: Thursday, March 17, 2011

Time: 12:00 pm EST

Duration: 1 hour

Host: Ron Moore, MTG

Register

Share
Posted in Event, Free Essbase Webinar | Leave a comment

Quick and Dirty Dimensional Modeling Using Excel Pivot Tables

One of the first decisions that confronts Essbase developers is deciding what categories or columns can be modeled as hierarchies and which must be modeled as two separate dimensions.  Dimensions are many-to-many relationships.  Hierarchies are many-to-one relationships.  Modeling a hierarchy as two separate dimensions will produce a larger slower Essbase database.  Attempting to model two separate dimensions as a hierarchy will generally result in rejected records during the dimension build.

While the following method has limitations it can be a quick and relatively easy way to analyze a data file or metadata file to identify the relationship between any two columns and therefore what can be modeled as hierarchies in what must be modeled as separate dimensions.

The following example analyzes only two columns.  A complete analysis would include each pair of columns in data files and metadata files to detect many to many relationships at upper levels in addition to the level 0 columns analyzed here.

  • Open the data file in Excel.
  •  If the data file is too large, load as much as is practical.  Using a partial data file you can’t prove that two columns can be modeled as a hierarchy because the many-to-many relationships may not show up in the part of the file you’ve analyzed.  However there’s a good chance they will, so you might be able to prove that the columns must be modeled as two separate dimensions.

 

  • From the menu choose: Data | Pivot Table and Pivot Chart Report | Finish.
  • An empty pivot table is created on the worksheet

 

  • Decide which columns to test.
  • From the pivot table field list drag one of the columns to the Row Fields area.
  • Drag the other column to column Fields area.
  • Drag the data column to the data items area.
  • Note: Use a total of all the data columns for complete analysis in one step.
  • In the example at right you can immediately see the many-to-many relationship.  That is, there are many salespeople per region and many regions per salesperson.

  • If the pivot table is large it is convenient to add count formulas along the bottom (row 51) and right side (column x) of the pivot table, and maximum formulas to summarize the count formulas (W51 and X51).
  • If both maxima are greater than one, a many-to-many relationship exists. 

 

Here is what the formulae should look like.

Share
Posted in Data Warehousing, Dimensional Modeling, Essbase | Tagged , | Leave a comment