At the beginning of the summer, when I first started my internship at Marketing Technologies Group (MTG), I knew next to nothing about Essbase. Then, after taking part in the Essbase Bootcamp, I was assigned the task of building an Essbase cube for a topic of my choice. The possibilities seemed endless, and even after I made the somewhat morbid decision to create a database of mortality statistics, I still faced a lot of uncertainty. The gap between raw data and a useful and effective Essbase cube is huge. According to MTG’s standard methodology, I turned to MTG’s Need’s Analysis Checklist, which provides an objective-based, step-by-step planning process. With its help, I was able to better understand the project and design requirements so that I could ultimately build a more effective cube.
In this blog, I will walk you through the Needs Analysis Checklist as I approached it in working on my mortality database.
Users
The first step is to think about who is going to use the database. Are there a few users or many? What type of experience do they have? To what departments do they belong and what are their job functions? A successful project is one whose users find it to be useful, so to make sure the project is successful, you must first understand your users.
In my case, I was my own user. To make the project more interesting, however, I pretended that I worked for the World Health Organization(WHO) or was in some other position that gave me the ability to guide decisions about health initiatives.
Business Objectives
The second step is to identify the users’ business objectives. What questions do they want the cube to answer? How will the cube be used to make decisions? To what improvements will it lead? The answers to these questions have great significance. They guide the design of the cube and provide benchmarks on which to measure the project’s success. If you find yourself struggling to answer these questions, the cube may not be worth building.
I considered what my WHO alter ego wanted to gain from this project and came up with the following.
- The cube should answer the following questions:
- What causes of death are most prevalent for particular regions, genders, and age groups?
- What genders and age groups are most at risk in particular regions?
- How do the death rates of various regions compare?
- Where are the death rates from easily preventable diseases still high?
- What are the death rate trends? Which death rates are going up over time and which are going down?
- The cube will be used to make decisions as follows:
- Information about the number of deaths due to various preventable diseases will inform decisions about how to allocate funds to various health initiatives. Variance statistics among regions will highlight diseases that are much more deadly in certain regions of the world. These diseases and regions will be likely candidates for effective programs.
- Statistical comparisons among regions, age groups, and genders will help make decisions about which populations various programs should target.
- Year-to-year comparisons will provide insight about the success of existing programs and will be used to make decisions about whether to continue, change, or eliminate these programs.
- The cube will lead to the following improvements:
- More targeted, effective, and efficient spending on health initiatives
Metrics
The next step is to decide what measurements you will need to meet your business objectives. In many cases, these are financial measurements. In my case I decided on the following:
- Deaths in thousands (The format of the data I would be loading)
- Deaths (A more intuitive way to look at the raw numbers)
- Crude Death Rate (For comparison among regions, genders, age groups, and years with different populations. Crude Death Rate is the number of deaths per thousand people.)
- Population (Used to calculate Crude Death Rate)
- Percentage of deaths due to given cause (To illustrate the relative impact of each cause of death; also useful for comparisons among populations)
- Percent Variance (To show how the crude death rate deviates from the norm and identify candidates for health initiatives by highlighting abnormally high death rates)
Dimension Summary
The best place to start is with the business objectives. What other dimensions are necessary in order to complete the desired analysis?
After deciding what dimensions you need, consider the hierarchy structure of each. At this stage, the Oracle Essbase Bootcamp suggests making label outlines for each dimension. Label outlines consist of the dimension name followed by two parallel hierarchies. The first hierarchy lists a generic name for each generation while the second lists a real-world example for each generation.
Based on my business objectives, I knew that I wanted to make comparisons across years, ages, genders, regions, and causes, so I decided to make each of these categories a dimension. The next step was to map out the hierarchies. All Years, All Ages, and Both Genders were easy because they had very few members. For All Causes, I used the hierarchy that already existed in my data source. For All Regions, however, my data source just had a list of a couple hundred countries. I knew that I wanted to make larger scale comparisons among parts of the world, so I decided to organize the countries into regions and sub-regions. I created my label outlines and organized them in the table below.
After you map out your hierarchies, make a note of how large you expect each dimension to be. You may also want to make a tentative decision about sparse and dense storage configuration. You’ll need to load your data before you’ll be able to make a final decision, but setting the Accounts and Time dimensions (Measures and All Years in my case) to dense and the other dimensions to sparse is usually a good place to start.
Output
After you create the cube, how will users access the data? What views will provide them with the information they need? In answering these questions, you’ll want to refer once again to the business objectives. Consider what reports each user will need and what layout of page, column, and row dimensions/members will make the information easiest to understand. Also, consider the formatting of the reports. For instance, will you create tables or graphs? Tables provide a granular look at exact figures, but graphs may be a better way to quickly understand the “big picture.”
Information about output requirements can be organized in a table. The table below lists only a few of the reports that I decided to make, but it should provide an idea about how to approach this step in the planning process.
Input
After you develop plans to organize and use the data, it’s time to consider the input data. What are the sources, formats, and sizes of your input files? How frequently are input files created? In terms of the dimensions that you have outlined, how is your data organized? As with the information about output, this information can be organized in a chart. Information about my three sources is organized below.
It’s also important to compare and contrast your input sources. In particular, can you use one load rule for all your input files or do different sources need different rules? The formatting of my three files was comparable. The 2008 file was different in that it provided data for males and females on the same worksheet. However, the 2004 filed could easily be combined into a file with the same format as the 2008 file.
Another point to consider is whether the data in the input files will be loaded at level 0. The best practice is to load data at level 0. However, in my case, it was not possible to do so. My population data had different dimensionality than my death data; while the population data was specific to each year, age, gender, and region, it was not specific to a cause. Thus, I had to load my population data at generation one of my All Causes dimension. Normalization calculations are generally needed when data is loaded high. In the next section, I will discuss the normalization calculations that my database required.
Transformations
The transformations section asks you to consider all the needed calculations. These include calculations to be set in the outline, like aggregation properties and member formulas, as well as calculations to be included in the calc script. Consider errors that would occur if you ran the default calc script. How can you correct these errors?
Normalization Calcs
Normalization calcs are needed when data is loaded high. If you are going to load data high, work through the logic of the necessary corrections.
As aforementioned, I loaded population data to generation one of All Causes. In order to calculate Crude Death Rate for cause categories and individual causes at lower levels of the hierarchy, I needed my calc script to copy the population statistics from generation one down to the other generations of All Causes. I then needed to prevent further aggregation across All Causes as such aggregation would incorrectly add the population data. However, I did need to aggregate All Causes at some point because I needed to aggregate the Deaths in Thousands data. To resolve this problem, I decided to turn off AggMissg, aggregate the All Causes dimension, and then perform my normalization calc.
Essbase Outline Calcs
When planning the outline calculations, consider each dimension. Where possible, add dynamic calcs to reduce database size.
I began with my Measures dimension. First, I considered aggregation. Since it made no sense to aggregate the various measures, I set their consolidation property to Ignore (~). I also set Measures to Label Only. I then focused on my four calculated measures and decided that each could be dynamically calculated by a formula. I planned out the formulas as below.
- Deaths = Deaths in thousands * 1000
- Crude Death Rate = Deaths/Population * 1000
- Percentage of deaths = Deaths % Deaths -> All Causes
- Percent Variance= %Var(Crude Death Rate, Crude Death Rate->All Regions)
Next, I considered my other five dimensions. Except for the issue with aggregating population as discussed in the Normalization section, each dimension had a very standard aggregation. I set all the members of these five dimensions to aggregate by addition.
Procedures and Hardware/Software Environment
After creating a cube, you’ll need to provide your users with information about how to use it. Provide instructions on loading data, calculating, updating the outline, and producing reports. Also consider providing information about the required hardware and software.
Conclusion
After completing all these steps, you should better understand your cube’s purpose and the process you will follow to make your data serve that purpose. While completing the Needs Analysis Checklist requires time and thought, it prevents costly errors, enables a more efficient build process, and produces a more useful result. It helped me to successfully create my first database, and it helps consultants at MTG to create cubes for clients. Working through the Needs Analysis Checklist is well worth the time.















