Think about a scenario where an organization is using data and text mining techniques against a large multidimensional data warehouse
1 – Multidimensional & Mined Data
The area of business intelligence is dominated by the idea of multidimensional databases (often referred to as data warehouses) that can be made arbitrarily large and highly detailed. Data from many sources can be brought together into consolidated databases to encompass most of the enterprise. It is often said that these dimensional structures can be sliced and diced in a large variety of ways, making it possible to aggregate the data or drill down into it to analyze exactly what is really going on in an enterprise; hence the name business intelligence.
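As a rough illustration of slicing and aggregating, here is a minimal sketch in plain Python. The fact rows, the dimension names (region, product, quarter), and the helper functions roll_up and slice_facts are all hypothetical, made up for this example; a real warehouse would do this with SQL or an OLAP tool.

```python
from collections import defaultdict

# Hypothetical fact rows: (region, product, quarter, sales_amount).
FACTS = [
    ("East", "Widgets", "Q1", 120),
    ("East", "Widgets", "Q2", 150),
    ("East", "Gadgets", "Q1", 80),
    ("West", "Widgets", "Q1", 200),
    ("West", "Gadgets", "Q2", 60),
]
DIMENSIONS = ("region", "product", "quarter")


def roll_up(facts, by):
    """Aggregate the sales measure over the dimensions named in `by`."""
    idx = [DIMENSIONS.index(name) for name in by]
    totals = defaultdict(int)
    for row in facts:
        key = tuple(row[i] for i in idx)
        totals[key] += row[-1]  # last column is the measure
    return dict(totals)


def slice_facts(facts, **fixed):
    """Keep only rows whose dimension values match the fixed values."""
    return [
        row for row in facts
        if all(row[DIMENSIONS.index(name)] == value
               for name, value in fixed.items())
    ]


# Aggregate up to region level, then drill down into one region by product.
by_region = roll_up(FACTS, by=["region"])
east_by_product = roll_up(slice_facts(FACTS, region="East"), by=["product"])
```

The same fact rows answer both questions; only the choice of dimensions to group by or hold fixed changes.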
The area of data mining involves looking into (usually very large) data structures to discern patterns or relationships that might not be obvious, or that couldn’t be seen through traditional querying of the data. This kind of analysis is sometimes described as “discovery-driven.” Text mining is a special case of data mining where the analysis looks into text data that might have otherwise been treated as single blobs of undifferentiated data.
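To make the idea of looking inside a text blob concrete, here is a minimal sketch, assuming a hypothetical set of free-text order notes with a returned/not-returned flag. The data, the function names, and the "mentioned more than once" threshold are illustrative assumptions, not a method from the course readings; real text mining would use far larger samples and proper statistics.

```python
import re
from collections import Counter

# Hypothetical free-text notes attached to orders, with a "returned" flag.
NOTES = [
    ("box arrived damaged, hinge cracked", True),
    ("customer happy, fast delivery", False),
    ("hinge loose after one week", True),
    ("fast delivery, no issues", False),
]


def tokens(text):
    """Split a text blob into a set of lowercase word tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def terms_by_outcome(notes):
    """Count, per outcome, how many notes mention each term."""
    counts = {True: Counter(), False: Counter()}
    for text, returned in notes:
        counts[returned].update(tokens(text))
    return counts


counts = terms_by_outcome(NOTES)
# Terms mentioned in more than one returned-order note and in no other note.
suspects = sorted(
    term for term, n in counts[True].items()
    if n > 1 and term not in counts[False]
)
```

A traditional query that treats each note as a single undifferentiated value could not surface a term such as "hinge" without someone already knowing to search for it.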
Think about a scenario where an organization is using data and text mining techniques against a large multidimensional data warehouse. What is an example of something that they might learn that would be much less likely to be learned in traditional business intelligence activities without these technologies? How do you think the dimensionalization of the data in a data warehouse helps or hinders these opportunities?
Response Guideline
Post your very concise 200-300 word initial response early in the week, and then reply to at least two initial responses of your peers. In your responses, use your own observations to suggest or ask about how one might see the dimensionality of the data or the mining techniques involved as supporting their example. Also respond appropriately to anyone who posts comments or questions against your own postings. Keep all postings this week short and to the point.
2 – Tailoring to Your Company/Industry
This unit offers readings and a presentation about the use of data warehouse and business intelligence technologies in the healthcare sector. Some of this material is very specific to healthcare, while a lot of it is generic enough that similar applications in other sectors would work in much the same way. Consider how the specific aspects would have to change if applied to some other sector in which you might work, such as the one you are looking at for your semester project. The dimensions of the warehouse model would presumably need to be different to support that sector, and a discussion of source data systems would have to change as well.
Discuss ways in which you are tailoring the warehousing design described in these materials to fit your chosen industry in your semester project. What dimensions are you adding or subtracting from the ones shown in the materials? Describe how combinations of the resulting dimensions enable you to support each of the subject areas you’ve defined for your project. In particular, which aspects of your subject areas include data that would need the largest number of dimensions in order to be adequately represented (i.e., your highest-dimensional facts)? Also discuss how the level of detail for data in your warehouse will differ for data inside your company versus data from other companies in your chosen sector.
Response Guideline
Post your very concise 200-300 word initial response early in the week, and then reply to at least two initial responses of your peers. In your responses, use your own observations to suggest or ask about how one might see the dimensionality of the data or the mining techniques involved as supporting their example. Also respond appropriately to anyone who posts comments or questions against your own postings. Keep all postings this week short and to the point.
3 – Controls in the Warehouse
The Biehl reading in this unit describes several controls in a data warehouse design. In particular, consider these three: a) Unexpected/Undesired Values, b) Orphan Tracking, and c) Surrogate Merges. Each of these controls helps ensure that the data in the warehouse will be correct and usable for its intended purposes.
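As one possible reading of the second control, the sketch below assumes that an "orphan" is a fact row whose source key has no matching dimension row at load time, and that the control creates a placeholder dimension entry and records it for later follow-up. This interpretation may not match Biehl's exact definition, and the patient_dim table, the orphans list, and the surrogate_for function are hypothetical names invented for the example.

```python
# Hypothetical dimension: source patient id -> surrogate key.
patient_dim = {"P001": 1, "P002": 2}
# Source ids that showed up in facts before the dimension had them.
orphans = []


def surrogate_for(source_id):
    """Return the surrogate key, creating a tracked placeholder if missing."""
    if source_id not in patient_dim:
        # Assumed behavior: assign the next surrogate key and log the orphan
        # so that it can be reconciled when the real dimension data arrives.
        patient_dim[source_id] = max(patient_dim.values(), default=0) + 1
        orphans.append(source_id)
    return patient_dim[source_id]


# Hypothetical incoming facts: (source patient id, temperature reading).
incoming = [("P001", 98.6), ("P999", 101.2), ("P999", 100.4)]
fact_rows = [(surrogate_for(pid), reading) for pid, reading in incoming]
```

Under this assumption, the two readings for the unknown patient are kept and share one placeholder key instead of being rejected or silently dropped.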
For ONE of these three controls, describe a hypothetical situation in which the condition being mitigated might occur, and the impact the situation might have if uncontrolled by the system.
Response Guideline
Post your response of 1-3 paragraphs (about 200-300 words) early in the week, and then reply to at least two initial responses of your peers, particularly focusing on responses that might differ from your own. Also respond appropriately to anyone who posts questions against your own postings.