Friday, March 9, 2012

GROUP BY Question

Hi,
I am struggling with a GROUP BY query that works fine until I add an
additional column to the select statement when i get duplicates, and I
believe the reason is that the query can not decide what column to aggregate
on. Not sure if this makes sense, but I was wodering if it is possible to
exclude a column from being aggregated.
The query is posted below, In essence if I remove the Club_Galas.MeetingName
from the select statement and the Club_Galas from the FROM statement the
query works OK and I get each individual athletes best result listed
(produces a ranking list based on all results). Would like to add info on
the occasion the record was produced but this causes atheltes names to be
listed multiple times
Any help appreciated
Niclas
Code below ->
Select Users.LastName + ', ' + Users.FirstName, MIN(Club_Results.Result)
as BestTime,Club_ExtUser.dob,Club_Galas.MeetingName
from Club_Results,Users,Club_Events,Club_ExtU
ser,Club_Galas
Where Club_ExtUser.UserID=users.userid
and
Users.UserID=Club_Results.UserID
AND
Club_Events.StrokeID=1
AND
Club_Events.DistanceID=1
AND
Club_Events.CourseID=1
AND
Club_Results.EventID=Club_Events.EventID
AND
Club_Events.GenderID=1
AND
Club_Results.GalaID=Club_Galas.GalaID
AND
Club_Results.ResultTypeID=1
Group by Users.LastName + ', ' +
Users.FirstName,Club_ExtUser.dob,Club_Galas.MeetingName
Order by BestTimeNiclas (lindblom_niclas@.hotmail.com) writes:
> I am struggling with a GROUP BY query that works fine until I add an
> additional column to the select statement when i get duplicates, and I
> believe the reason is that the query can not decide what column to
> aggregate on. Not sure if this makes sense, but I was wodering if it is
> possible to exclude a column from being aggregated.
> The query is posted below, In essence if I remove the
> CG.MeetingName from the select statement and the Club_Galas from
> the FROM statement the query works OK and I get each individual athletes
> best result listed (produces a ranking list based on all results). Would
> like to add info on the occasion the record was produced but this causes
> atheltes names to be listed multiple times
There are certainly solutions to your problem, but since you did not include
information how the tables are related, it's difficult to suggestion. So
standard suggestions follows:
Include
o CREATE TABLE statements for your tables (possibly simplified).
o INSERT statements with sample data that demonstrates the duplicates
you get.
o The desired output given the sample.
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se
Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pr...oads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodin...ions/books.mspx|||Niclas,
You seem to want one result row for each athlete, and in
that row, you want a MeetingName value (which you must
find in the Club_Galas table) as well. You are trying to find
the correct row of the Club_Galas table to look at from the
condition Club_Results.GalaID=Club_Galas.GalaID.
Evidently, for some athletes, more than one row in Club_Galas
satisfies this condition, and so you get multiple rows in your
result for each of these athletes - one row for each matching
row in Club_Galas. If you only want one row per athlete in
the result, you could select min(MeetingName) and remove
MeetingName from the GROUP BY clause. This will give
you just one MeetingName for each athlete - the one that
happens to be first in alphabetical order among the multiple
MeetingName values matching the athlete. It's also possible
that there is some specific MeetingName for each athlete,
but it is not what you are asking for. You are asking for
meeting names that appear in Club_Galas in a row with the
GalaID of a matching Club_Results row.
If you are getting multiple rows per athlete and those rows
are identical in every column, your problem may be that
you are not correctly modeling your scenario - maybe you
have no table that stores the relationship between GalaID
and MeetingName (if this is a 1-1 relationship) correctly,
meaning just once for each pair (GalaID,MeetingName).
A quick but terrible fix in this case would be to add DISTINCT
to the query - terrible, because it fixes the symptom, not
the underlying problem, which is that you are violating
the principle of storing a single fact ("this GalaID goes
with that MeetingName) in more than one place.
Steve Kass
Drew University
"Niclas" <lindblom_niclas@.hotmail.com> wrote in message
news:uHx0c0xkGHA.1344@.TK2MSFTNGP03.phx.gbl...
> Hi,
> I am struggling with a GROUP BY query that works fine until I add an
> additional column to the select statement when i get duplicates, and I
> believe the reason is that the query can not decide what column to
> aggregate on. Not sure if this makes sense, but I was wodering if it is
> possible to exclude a column from being aggregated.
> The query is posted below, In essence if I remove the
> Club_Galas.MeetingName from the select statement and the Club_Galas from
> the FROM statement the query works OK and I get each individual athletes
> best result listed (produces a ranking list based on all results). Would
> like to add info on the occasion the record was produced but this causes
> atheltes names to be listed multiple times
> Any help appreciated
> Niclas
> Code below ->
> Select Users.LastName + ', ' + Users.FirstName, MIN(Club_Results.Result)
> as BestTime,Club_ExtUser.dob,Club_Galas.MeetingName
> from Club_Results,Users,Club_Events,Club_ExtU
ser,Club_Galas
> Where Club_ExtUser.UserID=users.userid
> and
> Users.UserID=Club_Results.UserID
> AND
> Club_Events.StrokeID=1
> AND
> Club_Events.DistanceID=1
> AND
> Club_Events.CourseID=1
> AND
> Club_Results.EventID=Club_Events.EventID
> AND
> Club_Events.GenderID=1
> AND
> Club_Results.GalaID=Club_Galas.GalaID
> AND
> Club_Results.ResultTypeID=1
> Group by Users.LastName + ', ' +
> Users.FirstName,Club_ExtUser.dob,Club_Galas.MeetingName
> Order by BestTime
>|||I tried to remove the MeetingName from the GROUP BY CLAUSE, but this
generates the error message Column 'Club_Galas.MeetingName' is invalid in
the select list because it is not contained in either an aggregate function
or the GROUP BY clause.
The Club_Galas Table, the GalaID is the primary key and has a column called
MeetingName, there is only a single GalaID associated with each MeetingName,
So not sure what you refer to with the Underlaying problem
I will try to provide some more info about the desired result and table
relations
Niclas
"Steve Kass" <skass@.drew.edu> wrote in message
news:%23QGwnx0kGHA.4816@.TK2MSFTNGP05.phx.gbl...
> Niclas,
> You seem to want one result row for each athlete, and in
> that row, you want a MeetingName value (which you must
> find in the Club_Galas table) as well. You are trying to find
> the correct row of the Club_Galas table to look at from the
> condition Club_Results.GalaID=Club_Galas.GalaID.
> Evidently, for some athletes, more than one row in Club_Galas
> satisfies this condition, and so you get multiple rows in your
> result for each of these athletes - one row for each matching
> row in Club_Galas. If you only want one row per athlete in
> the result, you could select min(MeetingName) and remove
> MeetingName from the GROUP BY clause. This will give
> you just one MeetingName for each athlete - the one that
> happens to be first in alphabetical order among the multiple
> MeetingName values matching the athlete. It's also possible
> that there is some specific MeetingName for each athlete,
> but it is not what you are asking for. You are asking for
> meeting names that appear in Club_Galas in a row with the
> GalaID of a matching Club_Results row.
> If you are getting multiple rows per athlete and those rows
> are identical in every column, your problem may be that
> you are not correctly modeling your scenario - maybe you
> have no table that stores the relationship between GalaID
> and MeetingName (if this is a 1-1 relationship) correctly,
> meaning just once for each pair (GalaID,MeetingName).
> A quick but terrible fix in this case would be to add DISTINCT
> to the query - terrible, because it fixes the symptom, not
> the underlying problem, which is that you are violating
> the principle of storing a single fact ("this GalaID goes
> with that MeetingName) in more than one place.
> Steve Kass
> Drew University
> "Niclas" <lindblom_niclas@.hotmail.com> wrote in message
> news:uHx0c0xkGHA.1344@.TK2MSFTNGP03.phx.gbl...
>|||Erland, See some further info below. I hope this clarifies my objectives.
Let me know if further info is required
The results table contains one entry per recorded time per Athlete (UserID)
where the GalaID column contains a reference meeting/competition where the
result was recorded. I would like the query to produce a "Club Records" list
by selecting the fastest time each athlete has recorded in the database for
a particular Event and also display the MeetingName of the occasion where
the result was recorded
Club_Results
=============
ResultID int (PK)
UserID int
GalaID int
EventID int
FinalPosition int
Result DateTime
ResultTypeID int
AddedDate DateTime
ConfirmedDate DateTime
Relations
EventID (Club_Events - Club_Results)
GalaId (Club_Galas - Club_Results)
UserID (Users - Club_Results)
ResultTypeID (Club_ResultTypes - Club_Results)
Club_Galas
==============
GalaID int (PK)
ClubID int
LocationID int
GalaTypeID int
StartDate DateTime
FinishDate DateTime
MeetingName VarChar(50)
LayoutID int
CutoverDate DateTime
Relations
GalaID (Club_Galas - Club_Results)
ClubID (Club_Clubs - Club_Galas)
GalaTypeID (Club_GalaTypes-Club_Galas)
LocationID (Club_Locations - Club_Galas)
"Erland Sommarskog" <esquel@.sommarskog.se> wrote in message
news:Xns97E74C12EBA7Yazorman@.127.0.0.1...
> Niclas (lindblom_niclas@.hotmail.com) writes:
> There are certainly solutions to your problem, but since you did not
> include
> information how the tables are related, it's difficult to suggestion. So
> standard suggestions follows:
> Include
> o CREATE TABLE statements for your tables (possibly simplified).
> o INSERT statements with sample data that demonstrates the duplicates
> you get.
> o The desired output given the sample.
>
> --
> Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se
> Books Online for SQL Server 2005 at
> http://www.microsoft.com/technet/pr...oads/books.mspx
> Books Online for SQL Server 2000 at
> http://www.microsoft.com/sql/prodin...ions/books.mspx|||Niclas (lindblom_niclas@.hotmail.com) writes:
> Erland, See some further info below. I hope this clarifies my objectives.
> Let me know if further info is required
> The results table contains one entry per recorded time per Athlete
> (UserID) where the GalaID column contains a reference
> meeting/competition where the result was recorded. I would like the
> query to produce a "Club Records" list by selecting the fastest time
> each athlete has recorded in the database for a particular Event and
> also display the MeetingName of the occasion where the result was
> recorded
Maybe something like:
SELECT G.MeetingName, U.UserName, E.EventName, R.Result
FROM Club_Results R
JOIN (SELECT UserID, minresult = MIN(Result)
FROM Club_Results
GROUP BY UserID) AS minR ON R.UserID = minR.UserID
AND R.Result = minR.Result
JOIN Club_Galas G ON R.GalaID = G.GalaID
JOIN Club_Events E ON R.EventID = E.EventID
But this is quite much of a wild guess from me. In my previous post
I suggested that you should post:
That suggestion still stands if you want any more useful reply.
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se
Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pr...oads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodin...ions/books.mspx|||Many thanks for your help Erland,
When running the query you suggested (understand that this is a best effort
guess), I get the error message:
The column prefix 'U' does not match with a table name or alias name used in
the query.
I have generated scripts in Enterprise Manager for the tables as requested
(Bottom of this post), not sure how I generate the INSERT statements you ask
for.
The desired result is a table listing each individual athletes lowest time
ranked from lowest to highest time value with a single entry per athlete who
has an occurence in the results table. With my initially posted query, I get
multiple matches per athlete and the query basically produces a list of all
results from lowest to highest time.
Table creation statement below
CREATE TABLE [dbo].[CLUB_Results] (
[ResultID] [int] IDENTITY (1, 1) NOT NULL ,
[UserID] [int] NOT NULL ,
[GalaID] [int] NOT NULL ,
[EventID] [int] NOT NULL ,
[FinalPosition] [int] NOT NULL ,
[Result] [datetime] NOT NULL ,
[ResultTypeID] [int] NOT NULL ,
[AddedDate] [datetime] NOT NULL ,
[ConfirmedDate] [datetime] NOT NULL
) ON [PRIMARY]
GO
CREATE TABLE [dbo].[Users] (
[UserID] [int] IDENTITY (1, 1) NOT NULL ,
[Username] [nvarchar] (100) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,
[FirstName] [nvarchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,
[LastName] [nvarchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,
[IsSuperUser] [bit] NOT NULL ,
[AffiliateId] [int] NULL ,
[Email] [nvarchar] (256) COLLATE SQL_Latin1_General_CP1_CI_AS NULL
) ON [PRIMARY]
GO
CREATE TABLE [dbo].[CLUB_Galas] (
[GalaID] [int] IDENTITY (1, 1) NOT NULL ,
[ClubID] [int] NOT NULL ,
[LocationID] [int] NOT NULL ,
[GalaTypeID] [int] NOT NULL ,
[StartDate] [datetime] NOT NULL ,
[FinishDate] [datetime] NOT NULL ,
[GalaName] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,
[LayoutID] [int] NOT NULL ,
[CutoverDate] [datetime] NOT NULL
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[CLUB_Results] ADD
CONSTRAINT [DF_CLUB_Results_FinalPosition] DEFAULT (0) FOR [FinalPosition],
CONSTRAINT [DF_CLUB_Results_ConfirmedDate] DEFAULT ('1900/01/01 00:00:00')
FOR [ConfirmedDate],
CONSTRAINT [PK_CLUB_Results] PRIMARY KEY NONCLUSTERED
(
[ResultID]
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[Users] ADD
CONSTRAINT [DF_Users_IsSuperUser] DEFAULT (0) FOR [IsSuperUser],
CONSTRAINT [PK_Users] PRIMARY KEY CLUSTERED
(
[UserID]
) ON [PRIMARY] ,
CONSTRAINT [IX_Users] UNIQUE NONCLUSTERED
(
[Username]
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[CLUB_Galas] ADD
CONSTRAINT [PK_CLUB_Galas] PRIMARY KEY CLUSTERED
(
[GalaID]
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[CLUB_Results] ADD
CONSTRAINT [FK_CLUB_Results_CLUB_Events] FOREIGN KEY
(
[EventID]
) REFERENCES [dbo].[CLUB_Events] (
[EventID]
),
CONSTRAINT [FK_CLUB_Results_CLUB_ResultTypes] FOREIGN KEY
(
[ResultTypeID]
) REFERENCES [dbo].[CLUB_ResultTypes] (
[ResultTypeID]
),
CONSTRAINT [FK_CLUB_Results_Users] FOREIGN KEY
(
[UserID]
) REFERENCES [dbo].[Users] (
[UserID]
)
GO
ALTER TABLE [dbo].[CLUB_Galas] ADD
CONSTRAINT [FK_CLUB_Galas_CLUB_Clubs] FOREIGN KEY
(
[ClubID]
) REFERENCES [dbo].[CLUB_Clubs] (
[ClubID]
),
CONSTRAINT [FK_CLUB_Galas_CLUB_GalaType] FOREIGN KEY
(
[GalaTypeID]
) REFERENCES [dbo].[CLUB_GalaType] (
[GalaTypeID]
),
CONSTRAINT [FK_CLUB_Galas_CLUB_Locations] FOREIGN KEY
(
[LocationID]
) REFERENCES [dbo].[CLUB_Locations] (
[LocationID]
)
GO
"Erland Sommarskog" <esquel@.sommarskog.se> wrote in message
news:Xns97E8353B1CA5Yazorman@.127.0.0.1...
> Niclas (lindblom_niclas@.hotmail.com) writes:
> Maybe something like:
> SELECT G.MeetingName, U.UserName, E.EventName, R.Result
> FROM Club_Results R
> JOIN (SELECT UserID, minresult = MIN(Result)
> FROM Club_Results
> GROUP BY UserID) AS minR ON R.UserID = minR.UserID
> AND R.Result = minR.Result
> JOIN Club_Galas G ON R.GalaID = G.GalaID
> JOIN Club_Events E ON R.EventID = E.EventID
> But this is quite much of a wild guess from me. In my previous post
> I suggested that you should post:
>
> That suggestion still stands if you want any more useful reply.
>
> --
> Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se
> Books Online for SQL Server 2005 at
> http://www.microsoft.com/technet/pr...oads/books.mspx
> Books Online for SQL Server 2000 at
> http://www.microsoft.com/sql/prodin...ions/books.mspx|||Niclas (lindblom_niclas@.hotmail.com) writes:
> Many thanks for your help Erland,
> When running the query you suggested (understand that this is a best
> effort guess), I get the error message:
> The column prefix 'U' does not match with a table name or alias name
> used in the query.
Well, I suggested that you should post create table statements as well as
insert statements with sample data (that gives the problem with duplicates
with your original query) and the desired result for the sample. That makes
it very easy to develop a tested query.
Now you did not include that information, so I typed up an untested,
uncompiled query. Thus, there can be compilation errors in it. I gladly
assume that people who ask for help can sort out trivial errors like
missing aliases themselves.
In this post, you have included the tables, but not any data. So that is
your choice. But the thing is, the less effort you are prepared to put into
to help us, the less inclined I and other people are to take a stab at your
problem.
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se
Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pr...oads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodin...ions/books.mspx

No comments:

Post a Comment