Summary of Application Development and Administration

Summary

• Decision-support systems analyze online data collected by transaction-processing systems to help people make business decisions. Since most organizations are extensively computerized today, a very large body of information is available for decision support. Decision-support systems come in various forms, including OLAP systems and data-mining systems.

• Online analytical processing (OLAP) tools help analysts view data summarized in different ways, so that they can gain insight into the functioning of an organization.

OLAP tools work on multidimensional data, characterized by dimension attributes and measure attributes.

The data cube consists of multidimensional data summarized in different ways. Precomputing the data cube helps speed up queries on summaries of data.

Cross-tab displays permit users to view two dimensions of multidimensional data at a time, along with summaries of the data.

Drill down, rollup, slicing, and dicing are among the operations that users perform with OLAP tools.
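The data cube can be made concrete with a small sketch. It assumes the fact rows are Python dicts and uses the hypothetical names `data_cube`, `item_name`, `color` and `number`. It precomputes every group-by over the listed dimensions (2^n of them for n dimensions) and writes the value "all" for each dimension that has been summarized away. Rolling up then means reading a cell with more "all" entries, drilling down means the reverse, and a slice keeps only the cells with a fixed value on one dimension.

```python
from collections import defaultdict
from itertools import combinations

def data_cube(rows, dims, measure):
    """Precompute the sum of `measure` for every subset of `dims` (a full cube).

    Cell keys are tuples over all dims; a dimension that is aggregated out
    carries the special value "all".
    """
    cube = {}
    for r in range(len(dims) + 1):
        for group in combinations(dims, r):  # one group-by per subset of dimensions
            totals = defaultdict(int)
            for row in rows:
                key = tuple(row[d] if d in group else "all" for d in dims)
                totals[key] += row[measure]
            cube.update(totals)
    return cube

sales = [  # hypothetical fact rows
    {"item_name": "skirt", "color": "dark", "number": 8},
    {"item_name": "skirt", "color": "pastel", "number": 35},
    {"item_name": "dress", "color": "dark", "number": 20},
    {"item_name": "dress", "color": "pastel", "number": 10},
]
cube = data_cube(sales, ("item_name", "color"), "number")

# Slice: fix color = "dark" and look at the remaining dimension.
dark_slice = {k: v for k, v in cube.items() if k[1] == "dark"}
```

For example, `cube[("skirt", "all")]` is the rollup of skirts over all colors, and `cube[("all", "all")]` is the grand total.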

• The OLAP component of the SQL:1999 standard provides a variety of new functionality for data analysis, including new aggregate functions; cube and rollup operations; ranking functions; windowing functions, which support summarization over moving windows; and partitioning, with windowing and ranking applied inside each partition.
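The following sketch mimics two of these SQL features in plain Python. It does not call any SQL engine, and the names `rank_within_partitions`, `moving_average`, `dept` and `marks` are hypothetical. The first function ranks rows within each partition. Tied rows share a rank and the next rank skips ahead, which is how SQL's `rank()` behaves. The second function averages over a window that moves along an ordered sequence.

```python
from itertools import groupby

def rank_within_partitions(rows, partition_key, order_key):
    """Roughly: rank() over (partition by P order by O desc).

    Ties share a rank and leave a gap after them (1, 2, 2, 4, ...).
    Assumes order_key values are numeric.
    """
    result = []
    ordered = sorted(rows, key=lambda r: (r[partition_key], -r[order_key]))
    for _, part in groupby(ordered, key=lambda r: r[partition_key]):
        part = list(part)
        prev_rank = 0
        for i, row in enumerate(part):
            if i > 0 and row[order_key] == part[i - 1][order_key]:
                rank = prev_rank  # tie: reuse the previous rank
            else:
                rank = i + 1      # otherwise rank = 1 + number of rows ahead
            prev_rank = rank
            result.append({**row, "rank": rank})
    return result

def moving_average(values, preceding=1, following=1):
    """Roughly: avg(v) over (order by ... rows between 1 preceding and 1 following)."""
    out = []
    for i in range(len(values)):
        window = values[max(0, i - preceding): i + following + 1]
        out.append(sum(window) / len(window))
    return out

students = [
    {"dept": "cs", "student": "a", "marks": 90},
    {"dept": "cs", "student": "b", "marks": 95},
    {"dept": "cs", "student": "c", "marks": 90},
    {"dept": "ee", "student": "d", "marks": 70},
]
ranked = rank_within_partitions(students, "dept", "marks")
smoothed = moving_average([10, 20, 30, 40])
```

Here students a and c tie for second place in "cs", and student d starts over at rank 1 in the separate "ee" partition. At the edges of the sequence the window simply shrinks.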

• Data mining is the process of semiautomatically analyzing large databases to find useful patterns. Data mining has a number of applications, such as predicting values based on past examples, finding associations between purchases, and automatically clustering people and movies.

• Classification deals with predicting the class of test instances from their attributes, using a classifier built from the attributes and the known classes of training instances. Classification can be used, for instance, to predict the credit-worthiness levels of new applicants, or to predict the performance of applicants to a university.

There are several types of classifiers, such as decision-tree classifiers. These perform classification by constructing a tree from the training instances, with class labels at the leaves. For each test instance the tree is traversed to find a leaf, and the class of that leaf is the predicted class.

Several techniques are available to construct decision trees, most of them based on greedy heuristics.
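The sketch below shows both growing a tree and traversing it. It assumes categorical attributes and a hypothetical dict-based tree, where a node is either `{"label": ...}` or `{"attr": ..., "children": {...}}`. The function `build` is greedy: at each node it picks the attribute whose split gives the lowest weighted Gini impurity. Gini is one common greedy criterion, used here as an assumption. The function `classify` walks from the root to a leaf.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0 means the list is pure)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels, attrs):
    """Greedy step: the attribute whose partition has the lowest weighted impurity."""
    def weighted_impurity(attr):
        groups = {}
        for row, label in zip(rows, labels):
            groups.setdefault(row[attr], []).append(label)
        return sum(len(g) / len(labels) * gini(g) for g in groups.values())
    return min(attrs, key=weighted_impurity)

def build(rows, labels, attrs):
    """Recursively grow a tree without pruning; each leaf holds the majority class."""
    if len(set(labels)) == 1 or not attrs:
        return {"label": Counter(labels).most_common(1)[0][0]}
    attr = best_split(rows, labels, attrs)
    children = {}
    for value in {row[attr] for row in rows}:
        idx = [i for i, row in enumerate(rows) if row[attr] == value]
        children[value] = build([rows[i] for i in idx], [labels[i] for i in idx],
                                [a for a in attrs if a != attr])
    return {"attr": attr, "children": children}

def classify(tree, instance):
    """Follow the instance's attribute values down to a leaf and return its class."""
    node = tree
    while "label" not in node:
        node = node["children"][instance[node["attr"]]]
    return node["label"]

# Hypothetical credit-worthiness training data.
train_rows = [
    {"degree": "masters", "income": "high"},
    {"degree": "masters", "income": "low"},
    {"degree": "bachelors", "income": "high"},
    {"degree": "none", "income": "low"},
]
train_labels = ["excellent", "good", "good", "bad"]
tree = build(train_rows, train_labels, ["degree", "income"])
```

In this sketch, an attribute value never seen during training raises a `KeyError` in `classify`. A real classifier would need a fallback for that case, such as returning the majority class at the current node.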

Bayesian classifiers are simpler to construct than decision-tree classifiers, and work better when some attribute values are missing (null).
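Here is a minimal naive Bayes sketch, one common kind of Bayesian classifier. It assumes attributes are independent given the class. It shows why missing values are easy to handle: a `None` attribute is simply left out of the product, both in training and in prediction. The add-one (Laplace) smoothing is an assumption added so that an unseen value does not force a probability of zero. The names `train` and `predict` are hypothetical.

```python
from collections import Counter, defaultdict

def train(rows, labels):
    """Count classes and (class, attribute, value) occurrences, skipping None values."""
    class_counts = Counter(labels)
    # value_counts[cls][attr][value] = training rows of class cls with attr == value
    value_counts = defaultdict(lambda: defaultdict(Counter))
    for row, cls in zip(rows, labels):
        for attr, value in row.items():
            if value is not None:
                value_counts[cls][attr][value] += 1
    return class_counts, value_counts

def predict(model, instance):
    """Return the class maximizing p(c) * product over known attrs of p(value | c)."""
    class_counts, value_counts = model
    total = sum(class_counts.values())
    best, best_score = None, -1.0
    for cls, n in class_counts.items():
        score = n / total  # prior p(c)
        for attr, value in instance.items():
            if value is None:
                continue  # missing attribute: its factor is simply omitted
            counts = value_counts[cls][attr]
            seen = sum(counts.values())
            # add-one smoothing (assumption), reserving one slot for unseen values
            score *= (counts[value] + 1) / (seen + len(counts) + 1)
        if score > best_score:
            best, best_score = cls, score
    return best

model = train(
    [{"degree": "masters", "income": "high"},
     {"degree": "masters", "income": "high"},
     {"degree": "bachelors", "income": "low"},
     {"degree": "bachelors", "income": None}],
    ["excellent", "excellent", "bad", "bad"],
)
```

For example, `predict(model, {"degree": "masters", "income": None})` is decided on degree alone. In contrast, the decision tree in this sketch style would have no branch to follow for the missing value.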

• Association rules identify items that co-occur frequently, for instance, items that tend to be bought by the same customer. Correlations look for deviations from expected levels of association.
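Two-item association rules can be sketched directly from these definitions. The sketch assumes transactions are sets of item names and uses the hypothetical function `pair_rules`. Support is the fraction of transactions containing both items, and confidence is the fraction of transactions with the left-hand item that also contain the right-hand item. The last number is lift, which is one common correlation measure. It compares the observed confidence with what independence would predict: a value above 1 means positive correlation and a value below 1 means negative correlation.

```python
from collections import Counter
from itertools import combinations

def pair_rules(transactions, min_support, min_confidence):
    """Return (lhs, rhs, support, confidence, lift) for rules lhs => rhs on item pairs."""
    n = len(transactions)
    item_count = Counter(i for t in transactions for i in set(t))
    pair_count = Counter(p for t in transactions for p in combinations(sorted(set(t)), 2))
    rules = []
    for (a, b), c in pair_count.items():
        support = c / n
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = c / item_count[lhs]
            if confidence >= min_confidence:
                lift = confidence / (item_count[rhs] / n)  # observed vs expected
                rules.append((lhs, rhs, support, confidence, lift))
    return rules

baskets = [{"bread", "milk"}, {"bread", "milk"}, {"bread", "butter"}, {"milk"}]
found = pair_rules(baskets, min_support=0.5, min_confidence=0.6)
```

In this toy data, bread => milk has confidence 2/3 but lift below 1. The two items co-occur often only because both are common, which is exactly the kind of deviation that correlation analysis looks for.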

• Other types of data mining include clustering, text mining, and data visualization.
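To illustrate clustering, here is a short sketch of k-means, one widely used clustering algorithm chosen here as an example. It assumes numeric points given as tuples and a naive initialization from the first k points. The function name `kmeans` and the iteration count are illustrative choices.

```python
import math

def kmeans(points, k, iterations=10):
    """Alternate between assigning points to the nearest centroid and re-averaging."""
    centroids = list(points[:k])  # naive initialization: the first k points (assumption)
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # new centroid = coordinate-wise mean; keep the old one if a cluster is empty
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

centers, groups = kmeans([(1, 1), (10, 10), (1, 2), (2, 1), (10, 11), (11, 10)], k=2)
```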

• Data warehouses help gather and archive important operational data. Warehouses are used for decision support and analysis on historical data, for instance, to predict trends. Cleansing the data that arrives from the input sources is often a major task in data warehousing. Warehouse schemas tend to be multidimensional, involving one or a few very large fact tables and several much smaller dimension tables.
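A small star schema can be tried out with Python's built-in `sqlite3` module and an in-memory database. All table and column names here are hypothetical. The large fact table `sales` stores the measures plus foreign keys into the small dimension tables `item_info` and `store`. A typical warehouse query joins the fact table to its dimensions and then aggregates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE item_info (item_id INTEGER PRIMARY KEY, itemname TEXT, category TEXT);
CREATE TABLE store (store_id INTEGER PRIMARY KEY, city TEXT, country TEXT);
-- Fact table: one row per sale, referencing the dimension tables.
CREATE TABLE sales (item_id INTEGER REFERENCES item_info,
                    store_id INTEGER REFERENCES store,
                    number INTEGER, price REAL);
INSERT INTO item_info VALUES (1, 'shirt', 'clothes'), (2, 'tv', 'electronics');
INSERT INTO store VALUES (10, 'Paris', 'France'), (20, 'Lyon', 'France'), (30, 'Austin', 'USA');
INSERT INTO sales VALUES (1, 10, 3, 20.0), (1, 30, 1, 25.0), (2, 20, 2, 300.0), (2, 30, 1, 310.0);
""")

# Star join: fact table joined to each dimension, then grouped by dimension attributes.
revenue = conn.execute("""
SELECT s.country, i.category, SUM(f.number * f.price) AS revenue
FROM sales f
JOIN store s ON f.store_id = s.store_id
JOIN item_info i ON f.item_id = i.item_id
GROUP BY s.country, i.category
ORDER BY s.country, i.category
""").fetchall()
```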

• Information retrieval systems are used to store and query textual data such as documents. They use a simpler data model than do database systems, but provide more powerful querying capabilities within the restricted model.

Queries attempt to locate documents that are of interest by specifying, for example, sets of keywords. The query that a user has in mind usually cannot be stated precisely; hence, information-retrieval systems order answers on the basis of potential relevance.

• Relevance ranking makes use of several types of information, such as:

Term frequency: how important each term is to each document.

Inverse document frequency: how rare each term is across the whole document collection, so that terms occurring in many documents contribute less to relevance.

Site popularity. Page rank and hub/authority rank are two ways to assign importance to sites on the basis of links to the site.
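The sketch below shows two of these signals separately. The first function scores documents for a query using TF-IDF. It assumes one common formulation, tf = log(1 + n(d,t)/n(d)) and idf = log(N/n(t)), although many variants exist. The second function computes a popularity score in the style of PageRank by power iteration over a link graph. It assumes a damping factor of 0.85 and spreads the rank of pages with no outgoing links evenly over all pages. The names `rank_documents` and `pagerank` are hypothetical, and the sketch does not combine the two scores.

```python
import math
from collections import Counter

def rank_documents(docs, query_terms):
    """Order document names by summed TF-IDF of the query terms (whitespace tokenizing)."""
    n_docs = len(docs)
    tokenized = {name: text.lower().split() for name, text in docs.items()}
    doc_freq = Counter(t for words in tokenized.values() for t in set(words))
    scores = {}
    for name, words in tokenized.items():
        counts = Counter(words)
        score = 0.0
        for t in query_terms:
            if not words or doc_freq[t] == 0:
                continue
            tf = math.log(1 + counts[t] / len(words))  # importance of t within this document
            idf = math.log(n_docs / doc_freq[t])        # rarity of t across the collection
            score += tf * idf
        scores[name] = score
    return sorted(scores, key=scores.get, reverse=True)

def pagerank(links, damping=0.85, iterations=50):
    """Power iteration: a page is important if important pages link to it."""
    pages = set(links) | {q for targets in links.values() for q in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1 - damping) / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            if targets:
                for q in targets:  # share p's rank among the pages it links to
                    new[q] += damping * rank[p] / len(targets)
            else:
                for q in pages:    # dangling page: spread its rank evenly
                    new[q] += damping * rank[p] / n
        rank = new
    return rank

order = rank_documents({"d1": "database query optimization",
                        "d2": "database systems",
                        "d3": "query processing in database systems"},
                       ["query", "optimization"])
popularity = pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"]})
```

In this example, page "c" receives links from both other pages and ends up with the highest popularity.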

• Similarity of documents is used to retrieve documents similar to an example document. Synonyms and homonyms complicate the task of information retrieval.
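A common way to measure document similarity is the cosine of the angle between term vectors. The minimal sketch below assumes raw term counts and whitespace tokenization; TF-IDF weights could be used instead. The function name `cosine_similarity` is hypothetical. Because it matches exact words only, it also shows why synonyms cause trouble: "car" and "automobile" contribute nothing to the score.

```python
import math
from collections import Counter

def cosine_similarity(text_a, text_b):
    """Cosine of the angle between the term-count vectors of two texts (0 to 1)."""
    a, b = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0
```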

• Precision and recall are two measures of the effectiveness of an information retrieval system.
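Both measures fit in a few lines. Precision is the fraction of retrieved documents that are relevant, and recall is the fraction of relevant documents that were retrieved. The function name and the document identifiers below are hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query's answer set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d2", "d5"])
```

Here two of the four answers are relevant, so precision is 0.5. Two of the three relevant documents were found, so recall is 2/3. Returning more documents tends to raise recall at the cost of precision.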

• Directory structures are used to classify documents, grouping each document with other similar documents.
