Friday, February 24, 2012

group by

GROUP BY Clause
Specifies the groups into which output rows are to be placed and, if
aggregate functions are included in the SELECT clause <select list>,
calculates a summary value for each group. When GROUP BY is specified,
either each column in any non-aggregate expression in the select list should
be included in the GROUP BY list, or the GROUP BY expression must match
exactly the select list expression.
I don't understand the implication of having to include all columns in any
non-aggregate expression in the select list.
For example (from the "Commerce" ASP.NET Starter Kit) :
CREATE Procedure CMRC_CustomerAlsoBought
(
@.ProductID int
)
As
/* We want to take the top 5 products contained in
the orders where someone has purchased the given Product */
SELECT TOP 5
CMRC_OrderDetails.ProductID,
CMRC_Products.ModelName,
SUM(CMRC_OrderDetails.Quantity) as TotalNum
FROM
CMRC_OrderDetails
INNER JOIN CMRC_Products ON CMRC_OrderDetails.ProductID =
CMRC_Products.ProductID
WHERE OrderID IN
(
/* This inner query should retrieve all orders that have contained the
productID */
SELECT DISTINCT OrderID
FROM CMRC_OrderDetails
WHERE ProductID = @.ProductID
)
AND CMRC_OrderDetails.ProductID != @.ProductID
GROUP BY CMRC_OrderDetails.ProductID, CMRC_Products.ModelName
ORDER BY TotalNum DESC
CREATE TABLE [dbo].[CMRC_OrderDetails] (
[OrderID] [int] NOT NULL ,
[ProductID] [int] NOT NULL ,
[Quantity] [int] NOT NULL ,
[UnitCost] [money] NOT NULL
) ON [PRIMARY]
GO
CREATE TABLE [dbo].[CMRC_Products] (
[ProductID] [int] IDENTITY (1, 1) NOT NULL ,
[CategoryID] [int] NOT NULL ,
[ModelNumber] [nvarchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[ModelName] [nvarchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[ProductImage] [nvarchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[UnitCost] [money] NOT NULL ,
[Description] [nvarchar] (3800) COLLATE SQL_Latin1_General_CP1_CI_AS NULL
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[CMRC_OrderDetails] ADD
CONSTRAINT [PK_CMRC_OrderDetails] PRIMARY KEY NONCLUSTERED
(
[OrderID],
[ProductID]
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[CMRC_Products] ADD
CONSTRAINT [PK_CMRC_Products] PRIMARY KEY NONCLUSTERED
(
[ProductID]
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[CMRC_OrderDetails] ADD
CONSTRAINT [FK_OrderDetails_Orders] FOREIGN KEY
(
[OrderID]
) REFERENCES [dbo].[CMRC_Orders] (
[OrderID]
) NOT FOR REPLICATION
GO
ALTER TABLE [dbo].[CMRC_Products] ADD
CONSTRAINT [FK_Products_Categories] FOREIGN KEY
(
[CategoryID]
) REFERENCES [dbo].[CMRC_Categories] (
[CategoryID]
)
GOHi John

> I don't understand the implication of having to include all columns in any
> non-aggregate expression in the select list.
>
It is a requirement to do this otherwise you will get the error message such
as
Server: Msg 8120, Level 16, State 1, Line 5
Column 'CMRC_OrderDetails.ProductID' is invalid in the select list because
it is not contained in either an aggregate function or the GROUP BY clause.
In your example as ProductID is the primary key for CMRC_Products it will
make no difference as all combinations of ProductID and ModelName are unique
.
HTH
John
"John Grandy" wrote:

> GROUP BY Clause
> Specifies the groups into which output rows are to be placed and, if
> aggregate functions are included in the SELECT clause <select list>,
> calculates a summary value for each group. When GROUP BY is specified,
> either each column in any non-aggregate expression in the select list shou
ld
> be included in the GROUP BY list, or the GROUP BY expression must match
> exactly the select list expression.
>
> I don't understand the implication of having to include all columns in any
> non-aggregate expression in the select list.
> For example (from the "Commerce" ASP.NET Starter Kit) :
>
> CREATE Procedure CMRC_CustomerAlsoBought
> (
> @.ProductID int
> )
> As
> /* We want to take the top 5 products contained in
> the orders where someone has purchased the given Product */
> SELECT TOP 5
> CMRC_OrderDetails.ProductID,
> CMRC_Products.ModelName,
> SUM(CMRC_OrderDetails.Quantity) as TotalNum
> FROM
> CMRC_OrderDetails
> INNER JOIN CMRC_Products ON CMRC_OrderDetails.ProductID =
> CMRC_Products.ProductID
> WHERE OrderID IN
> (
> /* This inner query should retrieve all orders that have contained the
> productID */
> SELECT DISTINCT OrderID
> FROM CMRC_OrderDetails
> WHERE ProductID = @.ProductID
> )
> AND CMRC_OrderDetails.ProductID != @.ProductID
> GROUP BY CMRC_OrderDetails.ProductID, CMRC_Products.ModelName
> ORDER BY TotalNum DESC
>
> CREATE TABLE [dbo].[CMRC_OrderDetails] (
> [OrderID] [int] NOT NULL ,
> [ProductID] [int] NOT NULL ,
> [Quantity] [int] NOT NULL ,
> [UnitCost] [money] NOT NULL
> ) ON [PRIMARY]
> GO
> CREATE TABLE [dbo].[CMRC_Products] (
> [ProductID] [int] IDENTITY (1, 1) NOT NULL ,
> [CategoryID] [int] NOT NULL ,
> [ModelNumber] [nvarchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
> [ModelName] [nvarchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
> [ProductImage] [nvarchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
> [UnitCost] [money] NOT NULL ,
> [Description] [nvarchar] (3800) COLLATE SQL_Latin1_General_CP1_CI_AS NULL
> ) ON [PRIMARY]
> GO
> ALTER TABLE [dbo].[CMRC_OrderDetails] ADD
> CONSTRAINT [PK_CMRC_OrderDetails] PRIMARY KEY NONCLUSTERED
> (
> [OrderID],
> [ProductID]
> ) ON [PRIMARY]
> GO
> ALTER TABLE [dbo].[CMRC_Products] ADD
> CONSTRAINT [PK_CMRC_Products] PRIMARY KEY NONCLUSTERED
> (
> [ProductID]
> ) ON [PRIMARY]
> GO
> ALTER TABLE [dbo].[CMRC_OrderDetails] ADD
> CONSTRAINT [FK_OrderDetails_Orders] FOREIGN KEY
> (
> [OrderID]
> ) REFERENCES [dbo].[CMRC_Orders] (
> [OrderID]
> ) NOT FOR REPLICATION
> GO
> ALTER TABLE [dbo].[CMRC_Products] ADD
> CONSTRAINT [FK_Products_Categories] FOREIGN KEY
> (
> [CategoryID]
> ) REFERENCES [dbo].[CMRC_Categories] (
> [CategoryID]
> )
> GO
>
>
>
>
>
>|||John,
A grouped query will not make sense if you don't follow this rule.
If you say
GROUP BY ProductID
you will retrieve one result row per ProductID. Suppose the table
you are querying contains 20 rows for ProductID number 123, and
additional columns Quantity and OrderID. If you ask for
select ProductID, OrderID, sum(Quantity)
group by ProductID
what do you want the query processor to do with the OrderID values
from the 20 ProductID=123 rows? The Quantity column will be summed,
but you can't return one result row (as the grouping requests) with 20
OrderID values - you either need to provide an aggregate to use on
the OrderID column or you need to change your mind and accept 20
rows in the result set by adding OrderID to the group by list.
Steve Kass
Drew University
John Grandy wrote:

>GROUP BY Clause
>Specifies the groups into which output rows are to be placed and, if
>aggregate functions are included in the SELECT clause <select list>,
>calculates a summary value for each group. When GROUP BY is specified,
>either each column in any non-aggregate expression in the select list shoul
d
>be included in the GROUP BY list, or the GROUP BY expression must match
>exactly the select list expression.
>
>I don't understand the implication of having to include all columns in any
>non-aggregate expression in the select list.
>For example (from the "Commerce" ASP.NET Starter Kit) :
>
>CREATE Procedure CMRC_CustomerAlsoBought
>(
> @.ProductID int
> )
>As
>/* We want to take the top 5 products contained in
> the orders where someone has purchased the given Product */
>SELECT TOP 5
> CMRC_OrderDetails.ProductID,
> CMRC_Products.ModelName,
> SUM(CMRC_OrderDetails.Quantity) as TotalNum
>FROM
> CMRC_OrderDetails
> INNER JOIN CMRC_Products ON CMRC_OrderDetails.ProductID =
>CMRC_Products.ProductID
>WHERE OrderID IN
>(
> /* This inner query should retrieve all orders that have contained the
>productID */
> SELECT DISTINCT OrderID
> FROM CMRC_OrderDetails
> WHERE ProductID = @.ProductID
> )
>AND CMRC_OrderDetails.ProductID != @.ProductID
>GROUP BY CMRC_OrderDetails.ProductID, CMRC_Products.ModelName
>ORDER BY TotalNum DESC
>
>CREATE TABLE [dbo].[CMRC_OrderDetails] (
> [OrderID] [int] NOT NULL ,
> [ProductID] [int] NOT NULL ,
> [Quantity] [int] NOT NULL ,
> [UnitCost] [money] NOT NULL
> ) ON [PRIMARY]
>GO
>CREATE TABLE [dbo].[CMRC_Products] (
> [ProductID] [int] IDENTITY (1, 1) NOT NULL ,
> [CategoryID] [int] NOT NULL ,
> [ModelNumber] [nvarchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
> [ModelName] [nvarchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
> [ProductImage] [nvarchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
> [UnitCost] [money] NOT NULL ,
> [Description] [nvarchar] (3800) COLLATE SQL_Latin1_General_CP1_CI_AS NULL
> ) ON [PRIMARY]
>GO
>ALTER TABLE [dbo].[CMRC_OrderDetails] ADD
> CONSTRAINT [PK_CMRC_OrderDetails] PRIMARY KEY NONCLUSTERED
> (
> [OrderID],
> [ProductID]
> ) ON [PRIMARY]
>GO
>ALTER TABLE [dbo].[CMRC_Products] ADD
> CONSTRAINT [PK_CMRC_Products] PRIMARY KEY NONCLUSTERED
> (
> [ProductID]
> ) ON [PRIMARY]
>GO
>ALTER TABLE [dbo].[CMRC_OrderDetails] ADD
> CONSTRAINT [FK_OrderDetails_Orders] FOREIGN KEY
> (
> [OrderID]
> ) REFERENCES [dbo].[CMRC_Orders] (
> [OrderID]
> ) NOT FOR REPLICATION
>GO
>ALTER TABLE [dbo].[CMRC_Products] ADD
> CONSTRAINT [FK_Products_Categories] FOREIGN KEY
> (
> [CategoryID]
> ) REFERENCES [dbo].[CMRC_Categories] (
> [CategoryID]
> )
>GO
>
>
>
>
>
>
>|||I have posted a detailed description of how a SELECT statement works.
Look it up and you can see why your mental model is wrong.|||Hi Steve, and thanks for the response.
So, when specifying a GROUP BY clause, the only variable is the order in
which you list the column names. Every GROUP BY clause must provide the
query processor with instructions regarding how to order the rows in the
subgroups of any group. An therefore every non-aggregate column must be
included.
"Steve Kass" <skass@.drew.edu> wrote in message
news:%23v3PIq8KFHA.3064@.TK2MSFTNGP12.phx.gbl...
> John,
> A grouped query will not make sense if you don't follow this rule.
> If you say
> GROUP BY ProductID
> you will retrieve one result row per ProductID. Suppose the table
> you are querying contains 20 rows for ProductID number 123, and
> additional columns Quantity and OrderID. If you ask for
> select ProductID, OrderID, sum(Quantity)
> group by ProductID
> what do you want the query processor to do with the OrderID values
> from the 20 ProductID=123 rows? The Quantity column will be summed,
> but you can't return one result row (as the grouping requests) with 20
> OrderID values - you either need to provide an aggregate to use on
> the OrderID column or you need to change your mind and accept 20
> rows in the result set by adding OrderID to the group by list.
> Steve Kass
> Drew University
>
> John Grandy wrote:
>|||Hi Joe, and thanks for the response.
Where is your description ?
"--CELKO--" <jcelko212@.earthlink.net> wrote in message
news:1111164541.019670.34390@.o13g2000cwo.googlegroups.com...
>I have posted a detailed description of how a SELECT statement works.
> Look it up and you can see why your mental model is wrong.
>|||
John Grandy wrote:

>Hi Steve, and thanks for the response.
>So, when specifying a GROUP BY clause, the only variable is the order in
>which you list the column names. Every GROUP BY clause must provide the
>query processor with instructions regarding how to order the rows in the
>subgroups of any group. An therefore every non-aggregate column must be
>included.
>
I think you've got it, but just to make sure: there's nothing in a GROUP
BY query
that has anything to do with "how to order the rows", though that might
be your
interpretation of how a MIN or MAX aggregate is evaluated. There are other
aggregates, though, like AVG and COUNT, that don't correspond to the first
or last value in some order, as MIN and MAX do.
While changing the order of the column names in the GROUP BY clause
might change the order in which your results appear, think of any such
behavior as coincidental. If you want the result rows in a particular
order,
use an ORDER BY clause.
What the query processor needs to know in a grouping query is what
to do with the table source (those rows specified by what's in the FROM
and WHERE clauses. For each distinct combination of values of the
columns mentioned in the SELECT clause and in the group by clause,
you'll see one row in the result set. This row may correspond to one or
many rows in the table source, depending on how many times the
distinct combination appears. The additional columns of the result
set must all be aggregates, and each will represent a min, max,
avg, count, etc., for the one group of rows the result row corresponds
to.
So grouping queries give you one result row per group. The GROUP
BY clause identifies the columns used to specify each group, and the
remaining result columns are aggregate values for those groups.
SK

>"Steve Kass" <skass@.drew.edu> wrote in message
>news:%23v3PIq8KFHA.3064@.TK2MSFTNGP12.phx.gbl...
>
>
>|||You can find it here:
[url]http://groups.google.nl/groups?hl=nl&lr=&q=Here+is+how+a+SELECT+works+in+SQL+...+a
t+least+in+theory&btnG=Zoeken&meta=group%3Dmicrosoft.public.sqlserver.programming[
/url]
(url may wrap)
HTH,
Gert-Jan
John Grandy wrote:
> Hi Joe, and thanks for the response.
> Where is your description ?
> "--CELKO--" <jcelko212@.earthlink.net> wrote in message
> news:1111164541.019670.34390@.o13g2000cwo.googlegroups.com...|||Great explanation, Steve !
"Steve Kass" <skass@.drew.edu> wrote in message
news:eVVf4UCLFHA.244@.TK2MSFTNGP12.phx.gbl...
>
> John Grandy wrote:
>
> I think you've got it, but just to make sure: there's nothing in a GROUP
> BY query
> that has anything to do with "how to order the rows", though that might be
> your
> interpretation of how a MIN or MAX aggregate is evaluated. There are
> other
> aggregates, though, like AVG and COUNT, that don't correspond to the first
> or last value in some order, as MIN and MAX do.
> While changing the order of the column names in the GROUP BY clause
> might change the order in which your results appear, think of any such
> behavior as coincidental. If you want the result rows in a particular
> order,
> use an ORDER BY clause.
> What the query processor needs to know in a grouping query is what
> to do with the table source (those rows specified by what's in the FROM
> and WHERE clauses. For each distinct combination of values of the
> columns mentioned in the SELECT clause and in the group by clause,
> you'll see one row in the result set. This row may correspond to one or
> many rows in the table source, depending on how many times the
> distinct combination appears. The additional columns of the result
> set must all be aggregates, and each will represent a min, max,
> avg, count, etc., for the one group of rows the result row corresponds
> to.
> So grouping queries give you one result row per group. The GROUP
> BY clause identifies the columns used to specify each group, and the
> remaining result columns are aggregate values for those groups.
> SK
>

No comments:

Post a Comment