Monday, March 12, 2012

Group By versus Distinct

All
Can someone please advise if it is more efficent to use a
select distinct or a group by.
ie.
select distinct(colname)
from table
or
select colname
from table
group by colname
Thanks in Advance.Hi
Don't use the GROUP BY clause without an aggregate function. This is because
you can accomplish the same end result by using the DISTINCT option instead,
and it is faster.
On the other hand DISTINCT will require extra work to perfom the query in
comparison with GROUP BY clause and an aggregate function.
<anonymous@.discussions.microsoft.com> wrote in message
news:8bfa01c404fd$2e8101a0$a001280a@.phx.gbl...
> All
> Can someone please advise if it is more efficent to use a
> select distinct or a group by.
> ie.
> select distinct(colname)
> from table
> or
> select colname
> from table
> group by colname
>
> Thanks in Advance.|||Uri,
Your statement is incorrect. GROUP BY without an aggregate function and
DISTINCT produce the same query plans.
--
Jacco Schalkwijk
SQL Server MVP
"Uri Dimant" <urid@.iscar.co.il> wrote in message
news:OZm%23EHQBEHA.576@.TK2MSFTNGP11.phx.gbl...
> Hi
> Don't use the GROUP BY clause without an aggregate function. This is
because
> you can accomplish the same end result by using the DISTINCT option
instead,
> and it is faster.
> On the other hand DISTINCT will require extra work to perfom the query in
> comparison with GROUP BY clause and an aggregate function.
>
> <anonymous@.discussions.microsoft.com> wrote in message
> news:8bfa01c404fd$2e8101a0$a001280a@.phx.gbl...
>|||Well
I have just test it on the table with >10000 rows
--Distinct
CPU time = 31 ms, elapsed time = 191 ms.
--Group by
CPU time = 36 ms, elapsed time = 255ms.
That means which is faster.
"Jacco Schalkwijk" <NOSPAMjaccos@.eurostop.co.uk> wrote in message
news:#3L4v5QBEHA.2348@.TK2MSFTNGP09.phx.gbl...
> Uri,
> Your statement is incorrect. GROUP BY without an aggregate function and
> DISTINCT produce the same query plans.
> --
> Jacco Schalkwijk
> SQL Server MVP
>
> "Uri Dimant" <urid@.iscar.co.il> wrote in message
> news:OZm%23EHQBEHA.576@.TK2MSFTNGP11.phx.gbl...
> because
> instead,
in
>
>|||That doesn't mean anything. It could be the other way around. (Actually, I
ran DISTINCT vs GROUP BY 3 times and one of the results favoured DISTINCT,
one favoured GROUP BY and one returned the same time for both). CPU time and
elapsed time are influenced by a number of factors outside the query itself,
like other processes running on the server, whether all the data is in
memory or some of it is still on disk, netwrok traffic etc. The Query
execution plan and SET STATISTICS IO ON give you the proper information
about how a query will perform under any circumstances.
Jacco Schalkwijk
SQL Server MVP
"Uri Dimant" <urid@.iscar.co.il> wrote in message
news:ePc1bGRBEHA.212@.TK2MSFTNGP12.phx.gbl...
> Well
> I have just test it on the table with >10000 rows
> --Distinct
> CPU time = 31 ms, elapsed time = 191 ms.
> --Group by
> CPU time = 36 ms, elapsed time = 255ms.
> That means which is faster.
>
> "Jacco Schalkwijk" <NOSPAMjaccos@.eurostop.co.uk> wrote in message
> news:#3L4v5QBEHA.2348@.TK2MSFTNGP09.phx.gbl...
> in
>|||My two cents...
We have our "ten commandments" of query building posted on the wall here and
one of them is:
NEVER USE SELECT DISTINCT
I have found that this rule forces the coders to be careful about joins, thu
s reducing the requirement to clean dup's out of the result set.
and a test on 10000 rows is pretty small - the query optimizer probably took
more time trying to figure out a path then the query itself.
We have production tables of 2.5 million rows - failure to join properly is
a huge risk.
I believe I've also read that GROUP BY and DISTINCT are done a very differen
t times in the building of a result set, and can be further influenced by OR
DER BY, sorting, hashing and the like.
What is your real reason for wanting to use the GROUP BY or DISTINCT anyway?|||Jacco
Why that does not mean anything?
If I want to compare elapsed vs. actual CPU time and I am interested in how
long compilation and optimization took I think I should /must use SET
STATISTICS TIME .
"Jacco Schalkwijk" <NOSPAMjaccos@.eurostop.co.uk> wrote in message
news:ebNLwWRBEHA.2404@.TK2MSFTNGP11.phx.gbl...
> That doesn't mean anything. It could be the other way around. (Actually, I
> ran DISTINCT vs GROUP BY 3 times and one of the results favoured DISTINCT,
> one favoured GROUP BY and one returned the same time for both). CPU time
and
> elapsed time are influenced by a number of factors outside the query
itself,
> like other processes running on the server, whether all the data is in
> memory or some of it is still on disk, netwrok traffic etc. The Query
> execution plan and SET STATISTICS IO ON give you the proper information
> about how a query will perform under any circumstances.
> --
> Jacco Schalkwijk
> SQL Server MVP
>
> "Uri Dimant" <urid@.iscar.co.il> wrote in message
> news:ePc1bGRBEHA.212@.TK2MSFTNGP12.phx.gbl...
and
query
>|||AFAIK in SQL 2000 DISTINCT operator is physically impemented as GROUP BY, so
they both shoud perform the same. And I agree with Steve - prefer GROUP BY
as well, because of the same reasons.
Dejan Sarka, SQL Server MVP
Associate Mentor
Solid Quality Learning
More than just Training
www.SolidQualityLearning.com
"Steve Z" <szlamany@.antarescomputing.com> wrote in message
news:31C87BFA-5F98-4780-A36E-D4ABF84D3505@.microsoft.com...
> My two cents...
> We have our "ten commandments" of query building posted on the wall here
and one of them is:
> NEVER USE SELECT DISTINCT
> I have found that this rule forces the coders to be careful about joins,
thus reducing the requirement to clean dup's out of the result set.
> and a test on 10000 rows is pretty small - the query optimizer probably
took more time trying to figure out a path then the query itself.
> We have production tables of 2.5 million rows - failure to join properly
is a huge risk.
> I believe I've also read that GROUP BY and DISTINCT are done a very
different times in the building of a result set, and can be further
influenced by ORDER BY, sorting, hashing and the like.
> What is your real reason for wanting to use the GROUP BY or DISTINCT
anyway?

No comments:

Post a Comment