Aggregate Functions

By Ahmed Farahat - May 04, 2015

Aggregate Functions

Aggregate functions are functions that take a collection (a set or multiset) of values as input and return a single value. SQL offers ﬁve built-in aggregate functions:

• Average: avg

• Minimum: min

• Maximum: max

• Total: sum

• Count: count

The input to sum and avg must be a collection of numbers, but the other operators can operate on collections of nonnumeric data types, such as strings, as well.

As an illustration, consider the query “Find the average account balance at the Perryridge branch.” We write this query as follows:

select avg (balance)

from account

where branch-name = ’Perryridge’

The result of this query is a relation with a single attribute, containing a single tuple with a numerical value corresponding to the average balance at the Perryridge branch. Optionally, we can give a name to the attribute of the result relation by using the as clause.

There are circumstances where we would like to apply the aggregate function not only to a single set of tuples, but also to a group of sets of tuples; we specify this wish in SQL using the group by clause. The attribute or attributes given in the group by clause are used to form groups. Tuples with the same value on all attributes in the group by clause are placed in one group.

As an illustration, consider the query “Find the average account balance at each

branch.” We write this query as follows:

select branch-name, avg (balance)

from account

group by branch-name

Retaining duplicates is important in computing an average. Suppose that the ac- count balances at the (small) Brighton branch are $1000, $3000, $2000, and $1000. The average balance is $7000/4 = $1750.00. If duplicates were eliminated, we would ob- tain the wrong answer ($6000/3 = $2000).

There are cases where we must eliminate duplicates before computing an aggregate function. If we do want to eliminate duplicates, we use the keyword distinct in the aggregate expression. An example arises in the query “Find the number of depositors for each branch.” In this case, a depositor counts only once, regardless of the number of accounts that depositor may have. We write this query as follows:

select branch-name, count (distinct customer-name)

from depositor, account

where depositor.account-number = account.account-number

group by branch-name

At times, it is useful to state a condition that applies to groups rather than to tuples. For example, we might be interested in only those branches where the average account balance is more than $1200. This condition does not apply to a single tuple; rather, it applies to each group constructed by the group by clause. To express such a query, we use the having clause of SQL. SQL applies predicates in the having clause after groups have been formed, so aggregate functions may be used. We express this query in SQL as follows:

select branch-name, avg (balance)

from account

group by branch-name

having avg (balance) > 1200

At times, we wish to treat the entire relation as a single group. In such cases, we do not use a group by clause. Consider the query “Find the average balance for all accounts.” We write this query as follows:

select avg (balance)

from account

We use the aggregate function count frequently to count the number of tuples in a relation. The notation for this function in SQL is count (*). Thus, to ﬁnd the number of tuples in the customer relation, we write

select count (*)

from customer

SQL does not allow the use of distinct with count(*). It is legal to use distinct with max and min, even though the result does not change. We can use the keyword all in place of distinct to specify duplicate retention, but, since all is the default, there is no need to do so.

If a where clause and a having clause appear in the same query, SQL applies the predicate in the where clause ﬁrst. Tuples satisfying the where predicate are then placed into groups by the group by clause. SQL then applies the having clause, if it is present, to each group; it removes the groups that do not satisfy the having clause predicate. The select clause uses the remaining groups to generate tuples of the result of the query.

To illustrate the use of both a having clause and a where clause in the same query, we consider the query “Find the average balance for each customer who lives in Harrison and has at least three accounts.”

select depositor.customer-name, avg (balance)

from depositor, account, customer

where depositor.account-number = account.account-number and

depositor.customer-name = customer.customer-name and

customer-city = ’Harrison’

group by depositor.customer-name

having count (distinct depositor.account-number) >= 3

Search This Blog

Database Management System course

Aggregate Functions

Aggregate Functions

select count (*)

Comments

Post a Comment

Popular posts from this blog

Database System Architectures:Parallel Systems.

Summary of Query Optimization

Summary of Distributed Databases