Friday, March 30, 2012

Grouping Result

Here's my query:

SELECT col_1, col_2, col_3, col_4 FROM my_table
WHERE (col_1 = @col_1) AND (col_2 = @col_2)
ORDER BY col_1

I want my result rows to have a unique value in col_3. How can I exclude all but one of the rows that share the same value in col_3?

Thanks

/sf

SELECT TOP 1 col_1, col_2, col_3, col_4 FROM my_table
WHERE (col_1 = @col_1) AND (col_2 = @col_2)
ORDER BY col_1

|||Thank you for your answer.

But won't this query output only one row?

I'll try to explain my problem better:

Here's my query:

SELECT col_1, col_2, col_3, col_4 FROM my_table
WHERE (col_1 = @col_1) AND (col_2 = @col_2)
ORDER BY col_1

What I want is to output all rows with a unique value in col_3, and if several rows have the same value in col_3, I want only one of those rows. The following might have worked, but it's not valid SQL:

SELECT col_1, col_2, col_3, col_4 FROM my_table
WHERE (col_1 = @col_1) AND (col_2 = @col_2)
ORDER BY col_1
GROUP BY col_3
|||SELECT MAX(col_1), MAX(col_2), col_3, MAX(col_4) FROM my_table
WHERE (col_1 = @col_1) AND (col_2 = @col_2)
GROUP BY col_3
ORDER BY MAX(col_1)

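I used MAX() for the non-grouped values, but you could use MIN(), SUM() or AVG() instead. GROUP BY has to come before ORDER BY. Note that each aggregate is computed independently, so the values in one output row may come from different source rows.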
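As a rough illustration of the GROUP BY approach, here is a small sketch that runs the query against an in-memory SQLite database through Python's sqlite3 module. SQLite is only a stand-in for SQL Server here, and the sample rows are made up; the table and column names follow the thread.

```python
import sqlite3

# Hypothetical sample data; only the table and column names come from the thread.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (col_1 INT, col_2 INT, col_3 TEXT, col_4 TEXT)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?, ?)",
    [(1, 1, "a", "x"), (1, 1, "a", "y"), (1, 1, "b", "z"), (2, 1, "c", "w")],
)

def one_row_per_col_3(col_1, col_2):
    # Grouping on col_3 yields exactly one output row per distinct value;
    # MAX() picks a representative value for every non-grouped column.
    sql = """
        SELECT MAX(col_1), MAX(col_2), col_3, MAX(col_4)
        FROM my_table
        WHERE col_1 = ? AND col_2 = ?
        GROUP BY col_3
        ORDER BY col_3
    """
    return conn.execute(sql, (col_1, col_2)).fetchall()

rows = one_row_per_col_3(1, 1)
print(rows)
```

The two rows that share col_3 = 'a' collapse into one, and MAX(col_4) keeps 'y' for that group.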

Monday, March 26, 2012

Grouping for Sum

My query:

SELECT Ticket.Barrels, Lease.[RRC Lease Number], Lease.[Lease Name], Lease.[Field Name], Lease.OperatorID, Lease.OilGasOther, Lease.District,
Operator.[Operator Name], SUM(Ticket.Barrels) AS Expr1
FROM ((Ticket INNER JOIN
Lease ON Ticket.LeaseID = Lease.LeaseID) INNER JOIN
Operator ON Lease.OperatorID = Operator.OperatorID)
WHERE (Ticket.SWDNumber = ?) AND (Ticket.TicketDate BETWEEN ? AND ?)
GROUP BY Ticket.Barrels, Lease.[RRC Lease Number], Lease.[Lease Name], Lease.[Field Name], Lease.OperatorID, Lease.OilGasOther, Lease.District,
Operator.[Operator Name]
HAVING (Lease.District = ?) AND (Lease.OilGasOther = ?)

I would like to produce a table as follows:

RRC Number - Lease Name - Field Name -Sum of Barrels

0001 - Lease1 - Field1 - 120

0002 - Lease1 - Field3 - 340

0002 - Lease2 - Field3 - 120

Instead I have some of the data on several rows:

0001 - Lease1 - Field1 - 70

0001 - Lease1 - Field1 - 50

0002 - Lease1 - Field3 - 40

0002 - Lease1 - Field3 - 300

and so on ....

What am I doing wrong?

You need to clean up both the select columns and the group by columns. The Ticket.Barrels column is included in both the select list and the group by, which is why each distinct barrel count gets its own row. Try again without this column and the other unnecessary columns. The District and OilGasOther filters test plain column values rather than aggregates, so they belong in the WHERE clause; they are appended there so the parameter order stays the same.

SELECT Lease.[RRC Lease Number], Lease.[Lease Name],
Lease.[Field Name], SUM(Ticket.Barrels) AS [Sum of Barrels]
FROM ((Ticket INNER JOIN
Lease ON Ticket.LeaseID = Lease.LeaseID) INNER JOIN
Operator ON Lease.OperatorID = Operator.OperatorID)
WHERE (Ticket.SWDNumber = ?) AND (Ticket.TicketDate BETWEEN ? AND ?)
AND (Lease.District = ?) AND (Lease.OilGasOther = ?)
GROUP BY Lease.[RRC Lease Number], Lease.[Lease Name],
Lease.[Field Name]
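To see why grouping on the summed column splits the totals, here is a minimal sketch using Python's sqlite3 module with an in-memory database. The shortened column names (RRC, LeaseName, FieldName) and the two ticket rows are assumptions made for the example, not the poster's real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Lease (LeaseID INT, RRC TEXT, LeaseName TEXT, FieldName TEXT);
    CREATE TABLE Ticket (LeaseID INT, Barrels INT);
    INSERT INTO Lease VALUES (1, '0001', 'Lease1', 'Field1');
    INSERT INTO Ticket VALUES (1, 70);
    INSERT INTO Ticket VALUES (1, 50);
""")

# Barrels in the GROUP BY: every distinct barrel value forms its own group.
split = conn.execute("""
    SELECT l.RRC, SUM(t.Barrels)
    FROM Ticket AS t JOIN Lease AS l ON t.LeaseID = l.LeaseID
    GROUP BY l.RRC, l.LeaseName, l.FieldName, t.Barrels
    ORDER BY t.Barrels
""").fetchall()

# Barrels only inside SUM(): one group per lease, one total.
combined = conn.execute("""
    SELECT l.RRC, SUM(t.Barrels)
    FROM Ticket AS t JOIN Lease AS l ON t.LeaseID = l.LeaseID
    GROUP BY l.RRC, l.LeaseName, l.FieldName
""").fetchall()

print(split, combined)
```

The first query reproduces the 70 and 50 rows from the question; the second gives the single 120 row that was wanted.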

Monday, March 12, 2012

GROUP BY usage

My query:

SELECT TOP 10 stats_page
FROM site_stats
GROUP BY stats_page
ORDER BY id DESC

Produces the error:
Column "site_stats.id" is invalid in the ORDER BY clause because it is not contained in either an aggregate function or the GROUP BY clause.

Can someone please explain why this does not work? And perhaps offer a solution?

Thanks

The error message shows that your query breaks a T-SQL rule for the ORDER BY clause: when GROUP BY is used, ORDER BY may only reference grouped columns or aggregates. So you could write the query like this:

SELECT TOP 10 stats_page
FROM site_stats
GROUP BY stats_page, id
ORDER BY id DESC

|||

Because, you are asking for the top 10 results of stats_page.

It's akin to saying, ok, make me a list of the top 10 most expensive cars in the world. Now sort that list by color.

Each car can come in one of many colors (As each of your stats_page can have multiple id's), but what color (id) are you referring to for each car? Did you want to sort by the least color (alphabetically) that each car is available in? So if the ferrari is only available in green and red, it would come after a mercedes benz that is available in black, purple, and yellow (Since the least color for ferrari is green, and the least color for mercedes is black)? Or did you want to sort by the highest color (alphabetically) so that the ferrari (red) comes before mercedes (yellow)? Or sort by the number of colors each car is available in (2 for ferrari, 3 for mercedes)?

You must choose which ID is to represent each stats_page since there are multiple ID's for each stats_page (If there is not multiple, then you shouldn't be using GROUP BY). You could use an ORDER BY MIN(id) if you want to order by the lowest id for each stats_page, or ORDER BY MAX(id) if you want to order by the highest id, or COUNT(id) for the number of id's, etc.
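A small sketch of that choice, run through Python's sqlite3 module against an in-memory SQLite database (a stand-in for SQL Server). The six sample rows are invented so that each aggregate produces a different ordering.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE site_stats (id INTEGER PRIMARY KEY, stats_page TEXT)")
conn.executemany(
    "INSERT INTO site_stats VALUES (?, ?)",
    [(1, "a.aspx"), (2, "b.aspx"), (3, "a.aspx"),
     (4, "c.aspx"), (5, "b.aspx"), (6, "b.aspx")],
)

def pages_ordered_by(aggregate):
    # Only a fixed set of trusted expressions is interpolated, never user input.
    assert aggregate in ("MIN(id)", "MAX(id)", "COUNT(id)")
    sql = f"SELECT stats_page FROM site_stats GROUP BY stats_page ORDER BY {aggregate}"
    return [row[0] for row in conn.execute(sql)]

by_min = pages_ordered_by("MIN(id)")      # lowest id per page
by_max = pages_ordered_by("MAX(id)")      # highest id per page
by_count = pages_ordered_by("COUNT(id)")  # number of ids per page
print(by_min, by_max, by_count)
```

Each aggregate answers a different question, which is why the database refuses to guess when ORDER BY names the bare id column.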

|||

Iori_Jay:
From the error message we can see your query conflict with a rule of using ORDER BY clause in T-SQL. So you can write a query like this:
SELECT TOP 10 stats_page
FROM site_stats
GROUP BY stats_page,id
ORDER BY id DESC

Even if I use that, I get a syntax error on the "GROUP BY stats_page,id" line.

Motley:
You must choose which ID is to represent each stats_page since there are multiple ID's for each stats_page (If there is not multiple, then you shouldn't be using GROUP BY). You could use an ORDER BY MIN(id) if you want to order by the lowest id for each stats_page, or ORDER BY MAX(id) if you want to order by the highest id, or COUNT(id) for the number of id's, etc.

soo...my new query would be...? Sorry but I'm a little lost here xD|||

lori_Jay:

Motley:
You must choose which ID is to represent each stats_page since there are multiple ID's for each stats_page (If there is not multiple, then you shouldn't be using GROUP BY). You could use an ORDER BY MIN(id) if you want to order by the lowest id for each stats_page, or ORDER BY MAX(id) if you want to order by the highest id, or COUNT(id) for the number of id's, etc.
soo...my new query would be...? Sorry but I'm a little lost here xD

Something like this:

SELECT TOP 10 stats_page, MIN(id)
FROM site_stats
GROUP BY stats_page
ORDER BY MIN(id) DESC

|||I would need a small sample of what your site_stats table data looks like, and what result you want the query to give in order to give you an answer. I'm guessing the query you want isn't really what you have, and I'm unsure of what it is that you want.|||Yay, I'm starting to make progress.

The query now works using:
SELECT TOP 10 stats_page, MIN(id)
FROM site_stats
GROUP BY stats_page
ORDER BY MIN(id) DESC

And to answer motley's question. The site_stats table contains:
id, stats_ip, stats_page, stats_date

All I want to be able to do (in this case) is to see the most frequently visited pages.
i.e. I'd expect something like Default.aspx to be first, as it's visited most frequently, followed by something like login.aspx. The number of times each page has been accessed isn't really necessary.
|||SELECT TOP 10 stats_page, COUNT(*)
FROM site_stats
GROUP BY stats_page
ORDER BY COUNT(*) DESC
|||That works perfectly.

Thanks very much for all your help :)
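For reference, a hedged sketch of the accepted answer, run with Python's sqlite3 module. SQLite has no TOP, so LIMIT 10 stands in for TOP 10; the visit rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE site_stats (id INTEGER PRIMARY KEY, stats_ip TEXT, "
    "stats_page TEXT, stats_date TEXT)"
)
visits = ["Default.aspx", "login.aspx", "Default.aspx",
          "about.aspx", "Default.aspx", "login.aspx"]
conn.executemany(
    "INSERT INTO site_stats (stats_ip, stats_page, stats_date) VALUES (?, ?, ?)",
    [("127.0.0.1", page, "2012-03-12") for page in visits],
)

# One group per page, ordered by how many rows (visits) fall in each group.
top_pages = conn.execute("""
    SELECT stats_page, COUNT(*) AS hits
    FROM site_stats
    GROUP BY stats_page
    ORDER BY hits DESC
    LIMIT 10
""").fetchall()
print(top_pages)
```

Dropping the COUNT(*) column from the select list while keeping it in ORDER BY would return just the page names in the same order.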

Friday, March 9, 2012

Group by query returning too many rows

Can anyone help ?
Why is this query:

select bp, sum(msg) as 'msg'
from dbo.net_report
where gateway = 'sweden'
and convert(varchar, sqldate, 2) > '02.05.01'
and convert(varchar, sqldate, 2) <= '02.05.31'
group by bp

returning 666 rows, while this query

select bp
from dbo.net_report
where gateway = 'sweden'
and convert(varchar, sqldate, 2) > '02.05.01'
and convert(varchar, sqldate, 2) <= '02.05.31'
group by bp

is only returning 20.
The "correct" result is 20 rows, one for each bp.
The first query returns a lot of duplicate bp values.

By the way: which is faster, converting the sqldate field to a varchar and comparing it with another varchar, or converting the varchar to a date and then comparing it with the sqldate field?
|||Do you want the SUM of all the msg attributes or the number of msg (messages?) for each bp? Try using COUNT(*) in place of SUM(msg).

In your case you are probably doing table scans because of the conversion of the date attribute to a varchar. SQL Server can efficiently convert a (var)char value to a date and compare it with the date attribute, and that way you can take advantage of indexes.|||Sorry, I guess I should have made things clearer.
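The point about conversions can be sketched as follows with Python's sqlite3 module. This is only an analogy: SQLite's strftime() stands in for SQL Server's CONVERT, the dates are stored as ISO text, and the rows and the index name idx_sqldate are made up. Both filters select the same rows, but only the second compares the bare column, which is the form that can use an index on it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE net_report (bp TEXT, msg INTEGER, sqldate TEXT)")
conn.execute("CREATE INDEX idx_sqldate ON net_report (sqldate)")
conn.executemany(
    "INSERT INTO net_report VALUES (?, ?, ?)",
    [("HENNES", 1, "2002-05-01"), ("HENNES", 2, "2002-05-02"),
     ("KANAL5", 3, "2002-05-31"), ("KANAL5", 4, "2002-06-01")],
)

# Column wrapped in a function: every row must be converted before comparing.
wrapped = """SELECT bp, msg FROM net_report
WHERE strftime('%Y.%m.%d', sqldate) > '2002.05.01'
  AND strftime('%Y.%m.%d', sqldate) <= '2002.05.31'"""

# Bare column compared with constants: the optimizer is free to use the index.
direct = """SELECT bp, msg FROM net_report
WHERE sqldate > '2002-05-01' AND sqldate <= '2002-05-31'"""

rows_wrapped = sorted(conn.execute(wrapped).fetchall())
rows_direct = sorted(conn.execute(direct).fetchall())

# Print the plans for inspection; the wording varies between SQLite versions.
for label, sql in (("wrapped", wrapped), ("direct", direct)):
    plan = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    print(label, [step[-1] for step in plan])
```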
What I want is the sum of all the numbers stored in the msg column, i.e. the number of msgs for each bp. So the result set should have one row for each bp. This works fine without the sum(msg) part, and the result looks something like this:

HENNES
HENTEXTRA
KANAL5

But when I add the sum(msg) to get the number of messages pr. bp then the result looks like this:

Wow. Something strange just happened. When I ran the query to produce the results I added "order by bp" at the end, and then there was suddenly just one row for each bp, 20 in total. Without it the result set returns 666 rows.

Is the group by clause dependent upon the order in which one retrieves the rows?
|||No. The order by is used to sort the result set and does not affect the group by.

I set up a simple test...

Code:
----------------------------
create table #tmp(f1 varchar(10),f2 int)
insert into #tmp values('A',2)
insert into #tmp values('A',4)
insert into #tmp values('C',3)
insert into #tmp values('C',1)
insert into #tmp values('C',1)
insert into #tmp values('B',2)
insert into #tmp values('B',3)
insert into #tmp values('B',4)

select f1,sum(f2) as 'Sum'
from #tmp
group by f1
order by f1
----------------------------

Is this anything close to what you are working on?
|||That is pretty much what I am working on, except that my view has a lot more columns. At the moment I am really only interested in getting one row for each bp, with one sum of messages for each.
My query does produce the desired results, as long as I have the "order by bp" clause at the end.
So my problem is really solved, but I don't really understand why though. If you want to find out why, and need any more information from me just let me know.
The view I am querying is based on two other views, but I can't see that making much of a difference.

This is the result I was looking for, and the one I get with the "order by bp":

davinci 1333
E-CLIPS 1864
HENNES 1397
KANAL5 6470
MRJET 6
PASSAGEN 70
SIMONTV 12
SPORTAL 828
STARLIFE 1004
TISCALI 2484
YAHOO 3
...
...
20 rows in total

This is some of what I get without the "order by bp":
SPORTAL 8
davinci 11
E-CLIPS 11
davinci 1
E-CLIPS 1
davinci 7
E-CLIPS 7
davinci 9
E-CLIPS 9
davinci 2
E-CLIPS 2
davinci 8
...
...
...
666 rows in total

group by query

Hi

I have the following query:

select tbl.id, nvl(sum(x),0) as A, nvl(count(y),0) as B from ... where tbl.id in (1,2,3) group by tbl.id

And here are the results I am currently seeing:

tbl.id A B
1 232 343
3 3434 343

The table where tbl.id=2 has 0 for both columns so it does not show up.
How can I modify the query so that I will get a result set as the following:

tbl.id A B
1 232 343
2 0 0
3 3434 343
|||> "The table where tbl.id=2 has 0 for both columns so it does not show up"

i'm having trouble believing this

your query must return a row for the 2 group, regardless of whether the 2 row(s) have 0 in the x and/or y columns, or nulls, or anything else

if at least one row for 2 exists, there will be a 2 group in the results, unless it's eliminated by a HAVING clause

rudy
http://r937.com/
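The claim above can be checked with a small sketch using Python's sqlite3 module. The sample rows are invented, and COALESCE replaces Oracle's nvl because SQLite has no nvl. The first query shows that an id whose rows hold 0 or NULL still gets a group, while an id with no rows at all (4 here) gets none. The second query is one common workaround for the no-rows case, an outer join from a list of wanted ids; it is an assumption about what the poster needs, not something proposed in the thread.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER, x INTEGER, y TEXT)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?)",
    [(1, 232, "a"), (2, 0, None), (3, 3434, "b")],
)

# id 2 has a row (with 0 and NULL), so it forms a group; id 4 has no rows.
grouped = conn.execute("""
    SELECT id, COALESCE(SUM(x), 0), COUNT(y)
    FROM tbl
    WHERE id IN (1, 2, 3, 4)
    GROUP BY id
    ORDER BY id
""").fetchall()

# Outer join from the wanted ids keeps ids that have no rows in tbl.
with_missing = conn.execute("""
    SELECT w.id, COALESCE(SUM(t.x), 0), COUNT(t.y)
    FROM (SELECT 1 AS id UNION ALL SELECT 2
          UNION ALL SELECT 3 UNION ALL SELECT 4) AS w
    LEFT JOIN tbl AS t ON t.id = w.id
    GROUP BY w.id
    ORDER BY w.id
""").fetchall()
print(grouped, with_missing)
```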

Sunday, February 26, 2012

group by clause Query help

This is my query

select ano,max(date),a_subject from MY_TAB where table_name='xyz' and
ano=877
group by a_subject,ano order by a_subject

ANO  max(Date)                A_Subject
877  2005-01-20 00:00:00.000  Subject_1
877  1900-01-01 00:00:00.000  Subject_2
877  2004-12-20 00:00:00.000  Subject_3
877  2005-01-19 00:00:00.000  Subject_4
-----------------------
When I put the status column in, it fetches all the rows.

select ano,max(date),a_subject,status from MY_TAB where
table_name='xyz' and ano=877 group by a_subject,ano,status order by
a_subject

ANO  max(Date)                A_Subject  Status
877  2005-01-20 00:00:00.000  Subject_1  Not Started
877  1900-01-01 00:00:00.000  Subject_2  Not Started
877  2004-12-20 00:00:00.000  Subject_3  Completed
877  1900-01-01 00:00:00.000  Subject_3  Not Started
877  1900-01-01 00:00:00.000  Subject_4  Not Started
877  2005-01-19 00:00:00.000  Subject_4  Not Started
----------------------
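now what i want is

ANO  max(Date)                A_Subject  Status
877  2005-01-20 00:00:00.000  Subject_1  Not Started
877  1900-01-01 00:00:00.000  Subject_2  Not Started
877  2004-12-20 00:00:00.000  Subject_3  Completed
877  2005-01-19 00:00:00.000  Subject_4  Not Started

Thanks a lot for your help.

AJ
|||[posted and mailed, please reply in news]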

(aj70000@.hotmail.com) writes:
> select ano,max(date),a_subject from MY_TAB where table_name='xyz' and
> ano=877
> group by a_subject,ano order by a_subject
> ANO max(Date) A_Subject
> 877 2005-01-20 00:00:00.000 Subject_1
> 877 1900-01-01 00:00:00.000 Subject_2
> 877 2004-12-20 00:00:00.000 Subject_3
> 877 2005-01-19 00:00:00.000 Subject_4
> -----------------------
> When I put the status column in, it fetches all the rows.
> select ano,max(date),a_subject,status from MY_TAB where
> table_name='xyz' and ano=877 group by a_subject,ano,status order by
> a_subject
> ANO max(Date) A_Subject Status
> 877 2005-01-20 00:00:00.000 Subject_1 Not Started
> 877 1900-01-01 00:00:00.000 Subject_2 Not Started
> 877 2004-12-20 00:00:00.000 Subject_3 Completed
> 877 1900-01-01 00:00:00.000 Subject_3 Not Started
> 877 1900-01-01 00:00:00.000 Subject_4 Not Started
> 877 2005-01-19 00:00:00.000 Subject_4 Not Started
> ----------------------
> now what i want is
> ANO max(Date) A_Subject Status
> 877 2005-01-20 00:00:00.000 Subject_1 Not Started
> 877 1900-01-01 00:00:00.000 Subject_2 Not Started
> 877 2004-12-20 00:00:00.000 Subject_3 Completed
> 877 2005-01-19 00:00:00.000 Subject_4 Not Started
> Thanks a lot for your help.

With the example you have given, you would get away with:

select ano,max(date),a_subject, MIN(status)
from MY_TAB
where table_name='xyz' and ano=877
group by a_subject,ano
order by a_subject

But this does not work well, if you have Status values like "Anticipating"
which represents intermediate stages.

If you are looking for the status on the selected date, then this might be
better:

SELECT a.ano, a.maxdate, a.a_subject, b.status
FROM (select ano, max(date) AS maxdate, a_subject
from MY_TAB
where table_name='xyz' and ano=877
group by a_subject, ano) AS a
JOIN MY_TAB b ON a.ano = b.ano
AND a.maxdate = b.date
AND a.a_subject = b.a_subject

This assumes that ano/a_subject/date constitutes some form of key in
the table.
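A runnable sketch of the derived-table join, using Python's sqlite3 module with the six rows shown in the question. SQLite stands in for SQL Server, the table_name filter is left out for brevity, and the date column is quoted to avoid any clash with the built-in function of the same name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE MY_TAB (ano INTEGER, a_subject TEXT, "date" TEXT, status TEXT)'
)
conn.executemany(
    "INSERT INTO MY_TAB VALUES (?, ?, ?, ?)",
    [(877, "Subject_1", "2005-01-20", "Not Started"),
     (877, "Subject_2", "1900-01-01", "Not Started"),
     (877, "Subject_3", "2004-12-20", "Completed"),
     (877, "Subject_3", "1900-01-01", "Not Started"),
     (877, "Subject_4", "1900-01-01", "Not Started"),
     (877, "Subject_4", "2005-01-19", "Not Started")],
)

# Inner query: latest date per subject. Outer join: fetch the status
# stored on exactly that row.
latest_status = conn.execute("""
    SELECT a.ano, a.maxdate, a.a_subject, b.status
    FROM (SELECT ano, MAX("date") AS maxdate, a_subject
          FROM MY_TAB
          WHERE ano = 877
          GROUP BY a_subject, ano) AS a
    JOIN MY_TAB AS b ON a.ano = b.ano
                    AND a.maxdate = b."date"
                    AND a.a_subject = b.a_subject
    ORDER BY a.a_subject
""").fetchall()
for row in latest_status:
    print(row)
```

If two rows shared the same ano, a_subject and date, the join would return both, which is why the key assumption matters.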
--
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techin.../2000/books.asp
|||Here is how a SELECT works in SQL ... at least in theory. Real
products will optimize things when they can.

a) Start in the FROM clause and build a working table from all of the
joins, unions, intersections, and whatever other table constructors are
there. The <table expression> AS <correlation name> option allows you to
give a name to this working table which you then have to use for the
rest of the containing query.

b) Go to the WHERE clause and remove rows that do not pass criteria;
that is, that do not test to TRUE (reject UNKNOWN and FALSE). The
WHERE clause is applied to the working set in the FROM clause.

c) Go to the optional GROUP BY clause, make groups and reduce each
group to a single row, replacing the original working table with the
new grouped table. The rows of a grouped table must be group
characteristics: (1) a grouping column (2) a statistic about the group
(i.e. aggregate functions) (3) a function or (4) an expression made up of
those three items.

d) Go to the optional HAVING clause and apply it against the grouped
working table; if there was no GROUP BY clause, treat the entire table
as one group.

e) Go to the SELECT clause and construct the expressions in the list.
This means that the scalar subqueries, function calls and expressions
in the SELECT are done after all the other clauses are done. The
"AS" operator can also give names to expressions in the SELECT
list. These new names come into existence all at once, but after the
WHERE clause, GROUP BY clause and HAVING clause have been executed; you
cannot use them in the SELECT list or the WHERE clause for that reason.

If there is a SELECT DISTINCT, then redundant duplicate rows are
removed. For purposes of defining a duplicate row, NULLs are treated
as matching (just like in the GROUP BY).

f) Nested query expressions follow the usual scoping rules you would
expect from a block structured language like C, Pascal, Algol, etc.
Namely, the innermost queries can reference columns and tables in the
queries in which they are contained.

g) The ORDER BY clause is part of a cursor, not a query. The result
set is passed to the cursor, which can only see the names in the SELECT
clause list, and the sorting is done there. The ORDER BY clause cannot
have an expression in it, or references to other columns, because the
result set has been converted into a sequential file structure and that
is what is being sorted.

As you can see, things happen "all at once" in SQL, not from left to
right as they would in a sequential file/procedural language model. In
those languages, these two statements produce different results:
READ (a, b, c) FROM File_X;
READ (c, a, b) FROM File_X;

while these two statements return the same data:

SELECT a, b, c FROM Table_X;
SELECT c, a, b FROM Table_X;

Think about what a confused mess this statement is in the SQL model.

SELECT f(c2) AS c1, f(c1) AS c2 FROM Foobar;
That is why such nonsense is illegal syntax.
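Steps (b) through (d) above can be observed with a small sketch run through Python's sqlite3 module. The sales table and its rows are hypothetical. WHERE removes rows before the groups are formed, so it changes the sums; HAVING is applied to the grouped table, so it removes whole groups.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("N", 5), ("N", 20), ("S", 8), ("S", 9)],
)

# WHERE runs first: the row ('N', 5) is gone before SUM() sees the group.
where_first = conn.execute("""
    SELECT region, SUM(amount)
    FROM sales
    WHERE amount > 6
    GROUP BY region
    ORDER BY region
""").fetchall()

# HAVING runs on the grouped table: sums use all rows, then groups are dropped.
having_after = conn.execute("""
    SELECT region, SUM(amount)
    FROM sales
    GROUP BY region
    HAVING SUM(amount) > 20
    ORDER BY region
""").fetchall()
print(where_first, having_after)
```

Region N sums to 20 in the first query and 25 in the second, because the filter sits on different sides of the grouping step.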