Friday, February 24, 2012

Group By

I'm trying to aggregate values and grouping the results by defined periods (6
month, 12 month, 18 month, and lifetime). I'd like to do this with one pass,
but using a CASE statement did not aggregrate correctly. The results are
more like 6 months, 7-12 months, and 13-18 months, 19-lifetime. I would like
the results of the 12 month grouping to include all of the 6 month, the 18
month to include all of the 6 and 12 month grouping, etc.
/**/
CASE WHEN pd.accountingdate > 0 THEN 'LIFE'
WHEN pd.accountingdate BETWEEN '10/19/2004' AND '04/19/2007' THEN '18MONTH'
WHEN pd.accountingdate BETWEEN '04/19/2006' AND '04/19/2007' THEN '12MONTH'
WHEN pd.accountingdate BETWEEN '10/19/2006' AND '04/19/2007' THEN '06MONTH'
ELSE NULL
END
/**/
Thanks.
"Jeremy" <Jeremy@.discussions.microsoft.com> wrote in message
news:79FBE5E0-E112-406A-B406-A5D19626FA96@.microsoft.com...
> I'm trying to aggregate values and grouping the results by defined periods
> (6
> month, 12 month, 18 month, and lifetime). I'd like to do this with one
> pass,
> but using a CASE statement did not aggregrate correctly. The results are
> more like 6 months, 7-12 months, and 13-18 months, 19-lifetime. I would
> like
> the results of the 12 month grouping to include all of the 6 month, the 18
> month to include all of the 6 and 12 month grouping, etc.
>
> /**/
> CASE WHEN pd.accountingdate > 0 THEN 'LIFE'
> WHEN pd.accountingdate BETWEEN '10/19/2004' AND '04/19/2007' THEN
> '18MONTH'
> WHEN pd.accountingdate BETWEEN '04/19/2006' AND '04/19/2007' THEN
> '12MONTH'
> WHEN pd.accountingdate BETWEEN '10/19/2006' AND '04/19/2007' THEN
> '06MONTH'
> ELSE NULL
> END
In situations like this, you should post DDL, sample data, and the actual
query. A non-working snippet of a query doesn't really help anyone
understand the complete situation.
Making some assumptions, I believe your problem is related to the use of a
single column to represent different periods. The above case expression
represents what? Is it the 6 month data, the 12 month data, ...? It can't
represent more than one "attribute" - in this case, period.
The solution is to generate separate period values. This can be done in one
of two ways. Either you aggregate the periods as separate columns or you
create a situation where you join the data to be aggregated to a table
containing the periods. In the first example, you get period data as
separate columns within the result set. In the second example, you get
period data as separate rows. You decide which way you want to proceed. It
will facilitate discussion to use either Pubs or Northwind for sample data
and queries since most people have those available (and thus do not require
the posting of DDL or sample data).
|||Understood Scott, I've already considered the approach you've suggested, but
let me expand to help everyone better understand (as you recommended).
Using a case statement gives me the following from the Orders table in
Northwind:
customerid|period|orders|freight
AROUT|06MONTH|8|275.6900
AROUT|12MONTH|2|94.7100
AROUT|18MONTH|3|101.5500
BERGS|06MONTH|7|612.0700
BERGS|12MONTH|5|419.5500
BERGS|18MONTH|4|426.2300
BERGS|LIFE|2|101.6700
/* sample statement */
SELECTcustomerid,
CASEWHEN orderdate BETWEEN '11/06/1997' AND '05/06/1998' THEN '06MONTH'
WHEN orderdate BETWEEN '05/06/1997' AND '05/06/1998' THEN '12MONTH'
WHEN orderdate BETWEEN '11/06/1996' AND '05/06/1998' THEN '18MONTH'
WHEN orderdate > 0 THEN 'LIFE'
ELSE NULL
END period,
COUNT(orderid) orders,
SUM(freight) freight
FROMorders
WHEREcustomerid IN ('AROUT', 'BERGS')
GROUP BY
customerid,
CASE WHEN orderdate BETWEEN '11/06/1997' AND '05/06/1998' THEN '06MONTH'
WHEN orderdate BETWEEN '05/06/1997' AND '05/06/1998' THEN '12MONTH'
WHEN orderdate BETWEEN '11/06/1996' AND '05/06/1998' THEN '18MONTH'
WHEN orderdate > 0 THEN 'LIFE'
ELSE NULL
END
/**/
However to get the results I seek I have to make four passes:
customerid|period|orders|freight
AROUT|06MONTH|8|275.6900
AROUT|12MONTH|8|275.6900
AROUT|18MONTH|13|471.9500
AROUT|LIFE|13|471.9500
BERGS|06MONTH|7|612.0700
BERGS|12MONTH|7|612.0700
BERGS|18MONTH|16|1457.8500
BERGS|LIFE|18|1559.5200
/* sample statement */
SELECTcustomerid,
'06MONTH' period,
COUNT(orderid) orders,
SUM(freight) freight
FROMorders
WHEREcustomerid IN ('AROUT', 'BERGS')
AND orderdate BETWEEN '11/06/1997' AND '05/06/1998'
GROUP BY customerid
UNION
SELECTcustomerid,
'12MONTH' period,
COUNT(orderid) orders,
SUM(freight) freight
FROMorders
WHEREcustomerid IN ('AROUT', 'BERGS')
AND orderdate BETWEEN '11/06/1997' AND '05/06/1998'
GROUP BY customerid
UNION
SELECTcustomerid,
'18MONTH' period,
COUNT(orderid) orders,
SUM(freight) freight
FROMorders
WHEREcustomerid IN ('AROUT', 'BERGS')
AND orderdate BETWEEN '11/06/1996' AND '05/06/1998'
GROUP BY customerid
UNION
SELECTcustomerid,
'LIFE' period,
COUNT(orderid) orders,
SUM(freight) freight
FROMorders
WHEREcustomerid IN ('AROUT', 'BERGS')
AND orderdate > 0
GROUP BY customerid
ORDER BY customerid, period
/**/
The results of the first query are affected by the sequence of the
conditions in the CASE statement. And there's the rub. I'd like to have my
results in rows because I'm aggregrating at least 20 columns and 4 periods
would push it to 80 columns. Thanks for you help.
"Scott Morris" wrote:

> "Jeremy" <Jeremy@.discussions.microsoft.com> wrote in message
> news:79FBE5E0-E112-406A-B406-A5D19626FA96@.microsoft.com...
> In situations like this, you should post DDL, sample data, and the actual
> query. A non-working snippet of a query doesn't really help anyone
> understand the complete situation.
> Making some assumptions, I believe your problem is related to the use of a
> single column to represent different periods. The above case expression
> represents what? Is it the 6 month data, the 12 month data, ...? It can't
> represent more than one "attribute" - in this case, period.
> The solution is to generate separate period values. This can be done in one
> of two ways. Either you aggregate the periods as separate columns or you
> create a situation where you join the data to be aggregated to a table
> containing the periods. In the first example, you get period data as
> separate columns within the result set. In the second example, you get
> period data as separate rows. You decide which way you want to proceed. It
> will facilitate discussion to use either Pubs or Northwind for sample data
> and queries since most people have those available (and thus do not require
> the posting of DDL or sample data).
>
>
|||Understood Scott. I've already considered the approaches you've suggested,
but let me expand to help us better understand my problem as you recommended.
The CASE statement approach gives me the following results from the Orders
table in Northwind:
customerid|period|orders|freight
AROUT|06MONTH|8|275.6900
AROUT|12MONTH|2|94.7100 /* freight should = 370.40*/
AROUT|18MONTH|3|101.5500
BERGS|06MONTH|7|612.0700
BERGS|12MONTH|5|419.5500
BERGS|18MONTH|4|426.2300
BERGS|LIFE|2|101.6700
/* sample code */
SELECTcustomerid,
CASE
WHEN orderdate BETWEEN '11/06/1997' AND '05/06/1998' THEN '06MONTH'
WHEN orderdate BETWEEN '05/06/1997' AND '05/06/1998' THEN '12MONTH'
WHEN orderdate BETWEEN '11/06/1996' AND '05/06/1998' THEN '18MONTH'
WHEN orderdate > 0 THEN 'LIFE'
ELSE NULL
END period,
COUNT(orderid) orders,
SUM(freight) freight
FROMorders
WHEREcustomerid IN ('AROUT', 'BERGS')
GROUP BY
customerid,
CASE
WHEN orderdate BETWEEN '11/06/1997' AND '05/06/1998' THEN '06MONTH'
WHEN orderdate BETWEEN '05/06/1997' AND '05/06/1998' THEN '12MONTH'
WHEN orderdate BETWEEN '11/06/1996' AND '05/06/1998' THEN '18MONTH'
WHEN orderdate > 0 THEN 'LIFE'
ELSE NULL
END
/**/
However, I would like my results to read:
customerid|period|orders|freight
AROUT|06MONTH|8|275.6900
AROUT|12MONTH|10|370.4000
AROUT|18MONTH|13|471.9500
AROUT|LIFE|13|471.9500
BERGS|06MONTH|7|612.0700
BERGS|12MONTH|12|1031.6200
BERGS|18MONTH|16|1457.8500
BERGS|LIFE|18|1559.5200
/* sample code */
SELECTcustomerid,
'06MONTH' period,
COUNT(orderid) orders,
SUM(freight) freight
FROMorders
WHEREcustomerid IN ('AROUT', 'BERGS')
AND orderdate BETWEEN '11/06/1997' AND '05/06/1998'
GROUP BY customerid
UNION
SELECTcustomerid,
'12MONTH' period,
COUNT(orderid) orders,
SUM(freight) freight
FROMorders
WHEREcustomerid IN ('AROUT', 'BERGS')
AND orderdate BETWEEN '05/06/1997' AND '05/06/1998'
GROUP BY customerid
UNION
SELECTcustomerid,
'18MONTH' period,
COUNT(orderid) orders,
SUM(freight) freight
FROMorders
WHEREcustomerid IN ('AROUT', 'BERGS')
AND orderdate BETWEEN '11/06/1996' AND '05/06/1998'
GROUP BY customerid
UNION
SELECTcustomerid,
'LIFE' period,
COUNT(orderid) orders,
SUM(freight) freight
FROMorders
WHEREcustomerid IN ('AROUT', 'BERGS')
AND orderdate > 0
GROUP BY customerid
ORDER BY customerid, period
/**/
The results of the first query are affected by the sequence of the
conditions in the CASE statement. And there's the rub. I need the 18 month
period to include 6 months and 12 months. The only way I know to do that is
through multiple passes. I prefer my results to rows as I'm aggregating at
least 20 columns and if I push the results to columns with 4 different
periods, then I would expect at least 80 columns! Any help is greatly
appreciated. Thanks.
"Scott Morris" wrote:

> "Jeremy" <Jeremy@.discussions.microsoft.com> wrote in message
> news:79FBE5E0-E112-406A-B406-A5D19626FA96@.microsoft.com...
> In situations like this, you should post DDL, sample data, and the actual
> query. A non-working snippet of a query doesn't really help anyone
> understand the complete situation.
> Making some assumptions, I believe your problem is related to the use of a
> single column to represent different periods. The above case expression
> represents what? Is it the 6 month data, the 12 month data, ...? It can't
> represent more than one "attribute" - in this case, period.
> The solution is to generate separate period values. This can be done in one
> of two ways. Either you aggregate the periods as separate columns or you
> create a situation where you join the data to be aggregated to a table
> containing the periods. In the first example, you get period data as
> separate columns within the result set. In the second example, you get
> period data as separate rows. You decide which way you want to proceed. It
> will facilitate discussion to use either Pubs or Northwind for sample data
> and queries since most people have those available (and thus do not require
> the posting of DDL or sample data).
>
>
|||> The results of the first query are affected by the sequence of the
> conditions in the CASE statement. And there's the rub. I'd like to have
> my
> results in rows because I'm aggregrating at least 20 columns and 4 periods
> would push it to 80 columns. Thanks for you help.
The secret is that you need a table of periods - which you then join to your
data while aggregating. This can be done in one pass since each "raw data
row" joins once to each appropriate period. So a row for today joins to the
6 month period while a row from 10 months ago joins to both the 6 and 12
month periods (and so forth). The trick is to generate the period table -
dynamically if needed. Below is a query (that can be used as a derived
table) to do this. Obviously, the variable isn't actually required. You
could create a table-valued function to do the same thing (especially if you
needed to reuse this logic).
declare @.today datetime
set @.today = DATEDIFF(DAY, 0, CURRENT_TIMESTAMP)
select @.today, dateadd(month, -6, @.today)
union all
select @.today, dateadd(month, -12, @.today)
union all
select @.today, dateadd(month, -18, @.today)
You may need to adjust the logic depending on your definition of periods -
the last day of the month (esp Feb in and out of a leap year) is always fun
for these types of things. With the table of periods, you simply join to
the transaction table where date of transaction is between the period start
and end dates.
|||Your solution is perfect! This increases performance and keeps my DBA happy.
I'll add this technique to my bag of tricks. Thanks Scott!
"Scott Morris" wrote:

> The secret is that you need a table of periods - which you then join to your
> data while aggregating. This can be done in one pass since each "raw data
> row" joins once to each appropriate period. So a row for today joins to the
> 6 month period while a row from 10 months ago joins to both the 6 and 12
> month periods (and so forth). The trick is to generate the period table -
> dynamically if needed. Below is a query (that can be used as a derived
> table) to do this. Obviously, the variable isn't actually required. You
> could create a table-valued function to do the same thing (especially if you
> needed to reuse this logic).
> declare @.today datetime
> set @.today = DATEDIFF(DAY, 0, CURRENT_TIMESTAMP)
> select @.today, dateadd(month, -6, @.today)
> union all
> select @.today, dateadd(month, -12, @.today)
> union all
> select @.today, dateadd(month, -18, @.today)
> You may need to adjust the logic depending on your definition of periods -
> the last day of the month (esp Feb in and out of a leap year) is always fun
> for these types of things. With the table of periods, you simply join to
> the transaction table where date of transaction is between the period start
> and end dates.
>
>

No comments:

Post a Comment