Aggregate Functions
Aggregate Functions
Aggregate functions are functions that take a collection (a set or multiset) of values as input and return a single value. SQL offers five built-in aggregate functions:
• Average: avg
• Minimum: min
• Maximum: max
• Total: sum
• Count: count
The input to sum and avg must be a collection of numbers, but the other operators can operate on collections of nonnumeric data types, such as strings, as well.
As an illustration, consider the query “Find the average account balance at the Perryridge branch.” We write this query as follows:
select avg (balance)
from account
where branch-name = ’Perryridge’
The result of this query is a relation with a single attribute, containing a single tuple with a numerical value corresponding to the average balance at the Perryridge branch. Optionally, we can give a name to the attribute of the result relation by using the as clause.
There are circumstances where we would like to apply the aggregate function not only to a single set of tuples, but also to a group of sets of tuples; we specify this wish in SQL using the group by clause. The attribute or attributes given in the group by clause are used to form groups. Tuples with the same value on all attributes in the group by clause are placed in one group.
As an illustration, consider the query “Find the average account balance at each
branch.” We write this query as follows:
select branch-name, avg (balance)
from account
group by branch-name
Retaining duplicates is important in computing an average. Suppose that the ac- count balances at the (small) Brighton branch are $1000, $3000, $2000, and $1000. The average balance is $7000/4 = $1750.00. If duplicates were eliminated, we would ob- tain the wrong answer ($6000/3 = $2000).
There are cases where we must eliminate duplicates before computing an aggregate function. If we do want to eliminate duplicates, we use the keyword distinct in the aggregate expression. An example arises in the query “Find the number of depositors for each branch.” In this case, a depositor counts only once, regardless of the number of accounts that depositor may have. We write this query as follows:
select branch-name, count (distinct customer-name)
from depositor, account
where depositor.account-number = account.account-number
group by branch-name
At times, it is useful to state a condition that applies to groups rather than to tuples. For example, we might be interested in only those branches where the average account balance is more than $1200. This condition does not apply to a single tuple; rather, it applies to each group constructed by the group by clause. To express such a query, we use the having clause of SQL. SQL applies predicates in the having clause after groups have been formed, so aggregate functions may be used. We express this query in SQL as follows:
select branch-name, avg (balance)
from account
group by branch-name
having avg (balance) > 1200
At times, we wish to treat the entire relation as a single group. In such cases, we do not use a group by clause. Consider the query “Find the average balance for all accounts.” We write this query as follows:
select avg (balance)
from account
We use the aggregate function count frequently to count the number of tuples in a relation. The notation for this function in SQL is count (*). Thus, to find the number of tuples in the customer relation, we write
select count (*)
from customer
SQL does not allow the use of distinct with count(*). It is legal to use distinct with max and min, even though the result does not change. We can use the keyword all in place of distinct to specify duplicate retention, but, since all is the default, there is no need to do so.
If a where clause and a having clause appear in the same query, SQL applies the predicate in the where clause first. Tuples satisfying the where predicate are then placed into groups by the group by clause. SQL then applies the having clause, if it is present, to each group; it removes the groups that do not satisfy the having clause predicate. The select clause uses the remaining groups to generate tuples of the result of the query.
To illustrate the use of both a having clause and a where clause in the same query, we consider the query “Find the average balance for each customer who lives in Harrison and has at least three accounts.”
select depositor.customer-name, avg (balance)
from depositor, account, customer
where depositor.account-number = account.account-number and
depositor.customer-name = customer.customer-name and
customer-city = ’Harrison’
group by depositor.customer-name
having count (distinct depositor.account-number) >= 3
Comments
Post a Comment