Pages

Monday, March 18, 2013

Data Warehouse Real time interview questions with answers


1. What is Data warehouse?
 
In 1980, Bill Inmon known as father of data warehousing. "A Data warehouse is a subject oriented, integrated ,time variant, non volatile collection of data in support of management's decision making process".
  1. Subject oriented : means that the data addresses a specific subject such as sales, inventory etc. 
  2. Integrated : means that the data is obtained from a variety of sources.
  3. Time variant : implies that the data is stored in such a way that when some data is changed.
  4. Non volatile : implies that data is never removed. i.e., historical data is also kept.

2. What is the difference between database and data warehouse?

A database is a collection of related data. 
A data warehouse is also a collection of information as well as a supporting system.

3. What are the benefits of data warehousing? 
  1. Historical information for comparative and competitive analysis.
  2. Enhanced data quality and completeness.
  3. Supplementing disaster recovery plans with another data back up source.

4. What are the types of data warehouse?
 
There are mainly three type of Data Warehouse are :
  • Enterprise Data Warehouse
  • Operational data store
  • Data Mart

5. What is the difference between data mining and data warehousing?

Data mining, the operational data is analyzed using statistical techniques and clustering techniques to find the hidden patterns and trends. So, the data mines do some kind of summarization of the data and can be used by data warehouses for faster analytical processing for business intelligence. Data warehouse may make use of a data mine for analytical processing of the data in a faster way.

6. What are the applications of data warehouse? 
  1. Datawarehouse are used extensively in banking and financial services, consumer goods.
  2. Datawarehouse is mainly used for generating reports and answering predefined queries.
  3. Datawarehouse is used for strategic purposes, performing multidimensional analysis.
  4. Datawarehouse is used for knowledge discovery and strategic decision making using data mining tools. 

7. What are the types of datawarehouse applications? 
  1. Info processing
  2. Analytical processing
  3. Data mining 

8. What is metadata?
 
Metadata is defined as the data about data. Metadata describes the entity and attributes description.

9. What are the benefits of Datawarehousing?
 
The implementation of a data warehouse can provide many benefits to an organization.
A data warehouse can :
  • Facilitate integration in an environment characterized by un–integrated applications.
  • Integrate enterprise data across a variety of functions.
  • Integrate external as well as internal data.
  • Support strategic and long–term business planning.
  • Support day–to–day tactical decisions.
  • Enable insight into business trends and business opportunities.
  • Organize and store historical data needed for analysis.
  • Make available historical data, extending over many years, which enables trend analysis.
  • Provide more accurate and complete information.
  • Improve knowledge about the business.
  • Enable cost–effective decision making.
  • Enable organizations to understand their customers, and their needs, as well competitors.
  • Enhance customer servicc and satisfaction.
  • Provide competitive advantage.
  • Provide easy access for end–users.
  • Provide timely access to corporate information.

10. What is the difference between dimensional table and fact table?
 
A dimension table consists of tuples of attributes of the dimension. A fact table can be thought of as having tuples, one per a recorded fact. This fact contains some measured or observed variables and identifies them with pointers to dimension tables.

11. What is data mart?
 
A data mart can contain one fact table to address one subject. In such a case,when a number of data marts are integrated to create a data warehouse, it is important that the facts in each table mean the same thing. Such facts (i.e., measures or metrics) that have the same meaning in different data marts are called conformed facts.

12. What are the different types of datasmarts? 
  1. Stand–alone Data Marts : Data marts that do not interact with other data marts are called stand–alone data marts. On the other hand, data marts can be integrated to create a data warehouse.
  2. Multi–source Data Mart : A data mart for which the input data is obtained from multiple sources is called a multi–source data mart.
  3. Personal Data Mart : A data mart for use by individuals such as Chief Executive Officer (CEO), Chief Technology Officer (CTO) or Chief Financial Officer (CFO) is called Personal Data Mart.
  4. Operational Data Store : ODS is a database system that obtains data from different sources, consolidates it and keeps it at a single location. However, it does not have the sophistication of a data mart or a data warehouse for analytical processing. ODS is generally developed as a pre-cursor for ERP systems.

13. What is fact table?

A fact table is the central table that contains the measures or facts of a business process.

14. What is the mapplet?

Mapplet is a set of transformations that you build in the mapplet designer and you can use in multiple mappings.

15. What is meant by grain?

Granularity of data is a very important factor in the design of fact table. In fact table, we can represent data at different atomic levels, called grains.

16. What is difference between view and materialized view?
 
Views contains query whenever execute views it has read from base table.
A view which is physically stored in a database. Materialized views allow query performance to be enhanced by precalculating costly operations.

17. What is surrogate key?

A system generated artificial primary key that is not derived from any data in the database. It is similar to the object identifier in an object oriented system.

18. What is a folder?

Folder contains repository objects such as sources, targets, mappings, transformation which are helps logically organize our data warehouse.

 19. What is pivot?
 
The pivot (or rotate) operation rotates the axes of a cube to provide an alternative presentation of the data.

20. What is the difference between mapping and session? 
  • Maping : It is a set of source and target definitions linked by transformation objects that define the rules for transformation.
  • Session : It is a set of instructions that describe how and when to move data from source to targets.

21. What is meant bt ETL?
 
The overall data acquisition process, called ETL (extraction, transformation,and loading), is generally grouped into three main components :
  • Extraction : Involves obtaining the required data from the various sources.
  • Transformation : Source data undergoes a number of operations that prepare it for import into the data warehouse (target database). To perform this task, integration and transformation programs are used which can reformat, recalculate, modify structure and data elements, and add lime elements. They can also perform calculations, summarization, de-normal-ization, etc.
  • Loading : Involves physically placing extracted and transformed data in the target database. The initial loading involves a massive data import into the data warehouse. Subsequently, an extraction procedure periodically loads fresh data based on business rules and a pre–determined frequency.

22. What are the types of dimensional schema?
 
There are two types of dimensional schema :
  • Star schema
  • Snowflake schema

23. What is star schema?
 
Star schema, there is only one central fact table, and a set of dimension tables, one for each dimension. In star schema, each dimension is represented by only one table, and each table contains a set of attributes.


24. What is snowflake schema?
 
A snowflake schema avoids the redundancy of star schemas by normalizing the dimension tables. Therefore, a dimension is represented by several tables related by referential integrity constraints.


25. What is meant by starflake schema?
 
A starflake schema is a combination of the star and the snowflake schemas where some dimensions are normalized while others are not.

26. What is Operational Data Store?
 
Operational Data Store (ODS) is a hybrid data architecture to cover the requirements for both analytical and operational tasks.

27. What is the differenece between star schema and snowflake schema?
 
The star schema consists of a fact table with a single table for each dimension. The snowflake schema is a variation on the star schema in which the dimensional tables from a star schema are organized into a hierarchy by normalizing them. A fact constellation is a set of fact tables that share some dimension tables.

28. What is data staging?
 
Data staging is the process of transferring the data from the data sources (operational systems) into the target database of the data warehouse.

29. What is a session?
 
A session is a set of instructions that describes how and when to move data from sources to targets.

30. What is mapping?
 
A mapping is a set of source and target definitions linked by transformation objects that define the rules for data transformation.

31. What is Datadriven?
 
The informatica server follows instructions coded into update strategy transformations with in the session maping determine how to flag records for insert,update,delete or reject. If u do not choose data driven option setting, the informatica server ignores all update strategy transformations in the mapping.

32. What are the three major types of metadata in a data warehouse?
 
Metadata in a data warehouse fall into three major categories :
  • Operational Metadata
  • Extraction and Transformation Metadata
  • End–User Metadata

33. What is OLAP?
 
Allow users to run complex dimensional queries.
Enable users to generate canned queries.
Two categories of online analytical processing are multidimensional online analytical processing (MOLAP) and relational online analytical processing (ROLAP).

 34. What is meant by geographic information system(GIS)?
 
A software system that allows users to define, create, maintain, and control access to a geographic database.

35. What is dimension table?
 
A relational table that contains dimension data.

36. What is the diffrence between OLTP and OLAP?
 
The main differences between OLTP and OLAP are:
  • OLTP systems are for doing clerical/operational processing of data whereas OLAP systems are for carrying out analytical processing of the data.
  • OLTP systems look at data in one dimension; whereas in OLAP systems, data can be viewed in different dimensions and hence interesting business intelligence can be extracted from the data.
  • Operational personnel of an organization use the OLTP systems whereas management uses OLAP systems, though operational personnel may also use portions of OLAP system.
  • OLTP systems contain the current data as well as the details of the transactions. OLAP systems contain historical data, and also data in summarized form.
  • OLTP database size is smaller as compared to OLAP systems. If the OLTP database occupies Gigabytes (GB) of storage space, OLAP database occupies Terabytes (TB) of storage space.

37. What is DTM?
 
DTM transform data received from reader buffer and its moves transformation to transformation on row by row basis and it uses transformation caches when necessary.

38. What is meant by spatial data warehouse?
 
A data warehouse that manipulates spatial data, thus allowing spatial analysis. This is to be contrasted with conventional and temporal data warehouses.

39. What is a Batch?
 
Batches provide a way to group sessions for either serial or parallel execution by the Informatica Server.

40. What are the types of batch?
 
There are two types of batches are :
  • Sequential batch : Runs sessions one after the other.
  • Concurrent batch : Runs sessions at the same time.

41. What is Rolap?
 
Relational OLAP (ROLAP) servers store data in relational databases and support extensions to SQL and special access methods to efficiently implement the multidimensional data model and the related operations.

42. What is Molap?
 
Multidimensional OLAP (MOLAP) servers directly store multidimensional data in special data structures (for instance, arrays) and implement the OLAP operations over those data structures. While MOLAP systems offer less storage capacity than ROLAP systems, MOLAP systems provide better performance when multidimensional data is queried or aggregated.

43. What is Holap?
 
Hybrid OLAP (HOLAP) servers combine both technologies, benefiting from the storage capacity of ROLAP and the processing capabilities of MOLAP.

44. Describe Informatica Architecture?
 
Informatica consist of client and server. Client tools such as Repository manager, Designer, Server manager. Repository data base contains metadata it read by informatica server used read data from source, transforming and loading into target.

45. What is fact?
 
A central component of a multidimensional model that contains the measures to be analyzed. Facts are related to dimensions.

46. What are the types of dimensional table?
 
There are three types of dimensional table are :
  • Conformed dimensional table
  • Junk dimensional table
  • Degenerate dimensional model

47. What are the types of fact table?
 
There are three types of fact table are :
  • Additive fact
  • Semi additive fact
  • Non additive fact

48. What are the types of data mart?
 
There are two types of data mart are :
  • Dependent data mart
  • Independent data mart

 49. What is data source view?
 
A Data source view (DSV) allows one to define the relational schema that will be used in the analysis services database. This schema is derived from the schemas of the various data sources. Cubes and dimensions are created from data source views rather than directly from data source objects.

50. What are the different threads in DTM process? 
  • Master thread
  • Maping thread
  • Pre and post session threads
  • Reader thread
  • Writer thread
  • Transformation thread

No comments:

Post a Comment