Category Archives: business analytics

Data Warehousing Interview Questions- Part3

1. What is a Fact?

Ans. In data warehousing, a fact table consists of the measurements, metrics or facts of a business process.

It is located at the centre of a star schema or a snowflake schema  surrounded by dimension tables. Where multiple fact tables are used, these are arranged as a fact constellation schema.

A fact table typically has two types of columns, foreign keys to dimension tables and measures those that contain numeric facts. A fact table can contain fact’s data on detail or aggregated level.

2. What are additive, semi-additive and non-additive measures(facts)?

Ans. Non-additive measures – Those measures which cannot be used inside any numeric aggregation function (e.g. SUM(), AVG() etc.).

Usually these are any kind of ratios or percentages. For example the profit margin over the quarters does not simply roll-up to the year as a summation of the individual quarterly values but is recalculated at the parent level.

Another example can be the average height of the citizens of a particular city, while rolling up to the state level these citywise average values cannot be simply summed of and showed at the state level hence this measure is a non-additive measure.

Semi-additive measures – Those measures where only a subset of aggregation function can be applied. For example: Account Balance.

A sum() function on balance does not give a useful result but max() or min() balance might be useful.

Another example can be currency exchange rate. Sum is meaningless on rate; however, average function might be useful.

Additive measures – Those measures which can be used with any aggregation function like Sum(), Avg() etc. For example: Sales Quantity etc.


3. What is a FACTLESS FACT?

Ans. A factless fact table is a fact table that does not have any measures. It is essentially an intersection of dimensions. On the surface, a factless fact table does not make sense, since a fact table is, after all, about facts. However, there are situations where having this kind of relationship makes sense in data warehousing.

Eg: As an example, these can be used for modeling many-to-many relationships or capture events.

For example: Think about a record of student attendance in classes. In this case, the fact table would consist of 3 dimensions: the student dimension, the time dimension, and the class dimension. 

The only measure that you can possibly attach to each combination is “1” to show the presence of that particular combination. However, adding a fact that always shows 1 is redundant because we can simply use the COUNT function in SQL to answer the same questions.

Factless fact tables offer the most flexibility in data warehouse design. For example, one can easily answer the following questions with this factless fact table:

  • How many students attended a particular class on a particular day?
  • How many classes on average does a student attend on a given day?

Without using a factless fact table, we will need two separate fact tables to answer the above two questions. With the above factless fact table, it becomes the only fact table that’s needed.

4. What is Slicing and Dicing?

Ans. Slice and dice refers to a strategy for segmenting, viewing and understanding data in a database.

Slicing refers to filtering data. Dicing refers to grouping data.

Slicing is the act of dividing up the cube to extract this information for a given slice. It is important because it helps the user visualize and gather information specific to a dimension.

Dicing is similar to slicing but it works a little bit differently. When one thinks of slicing, filtering is done to focus on a particular attribute, dicing on the other hand is more a zoom feature that selects a subset over all the dimensions but for specific values of the dimension.

5. What is Business Intelligence?

Ans. Business intelligence usually refers to the information that is available for the enterprise to make decisions on. A data warehousing (or data mart) system is the backend, or the infrastructural, component for achieving business intelligence.

Business intelligence also includes the insight gained from doing data mining analysis, as well as unstructured data.