Monday, March 12, 2012

Group By, Max(date) query problem

Hello all:

I have an invoice header and detail tables and a customer table using sqlserver 2005. The Detail invoice table has price and product id. The header has the date and customerID.

I need to create a list of the most recent invoice date (and the product price) for each product for each customer. I can't use the group by because when I select both the max(date) and the price (as well as the productID and customerID which all have to be included in the group by) , I get more than one date per product per customer. I can only get the most recent date when I leave out the price and headerdetailID from the field selection.

Any help would be appreciated. Here is sample data & results.

invoiceHeadertable

invheaderID CustomerID InvoiceDate

1 40 1/1/2006

2 40 4/1/2006

3 80 3/1/2006

4 80 7/1/2006

5 80 8/12/2006

invoicedetailtable

invdetail ID invheaderID productcode price

11 1 AA 1.50

12 1 BB 1.30

13 1 CC 1.00

21 2 AA 1.40

23 2 CC 2.00

24 3 AA 2.00

25 3 CC 2.10

26 3 EE 1.10

27 4 AA 1.00

28 4 CC 2.00

29 4 EE 0.99

34 5 EE 1.55

CustomerTable

CustomerID Customername

40 johnCorp

80 maryCorp

Results

customer product most recent invoice(for this product) price

JohnCorp 40 AA 4/1/2006 1.40

JohnCorp 40 BB 1/1/2006 1.30

JohnCorp 40 CC 4/1/2006 2.00

maryCorp 80 AA 7/1/2006 1.40

maryCorp 80 CC 7 /1/2006 2.00

maryCorp 80 EE 8/12/2006 1.55

Something like this 'should' work for you. (Untested)

SELECT
dt.Customer,
dt.Product,
dt.InvoiceDate
d.Price
FROM InvoiceDetailTable d
JOIN ( SELECT
Customer,
Product,
InvoiceDate = max( InvoiceDate )
FROM InvoiceHeaderTable h
JOIN InvoiceDetailTable d
ON h.InvHeaderID = d.InvHeaderID
GROUP BY
Customer,
Product
) dt
ON ( d.Customer = dt.Customer
AND d.Product = dt.Product
AND d.InvoiceDate = dt.InvoiceDate
)

|||Crapola, didn't check your tables first. So ignore the prevoius exercise in 'egg on my face'.|||

IF there was no more than one invoice per day for a customer, you could, in the derived table (dt), include in the SELECT list:

InvHeaderID = max( InvHeaderID )

And then JOIN ON InvHeaderID instead of the three fields I indicated.

Of course, if there were two invoices for the same customer in a day, that would still not be the correct solution.

|||

Well, most of the time there would only be one invoice. But as is always the case, there can be an exception.

I had also wondered if the rank and partition function in sql server 2005 could apply, then one could just use a select query (if this is possible) to return the # 1 invoice per group but I have found no examples showing this used in a group by and where there are multiple tables involved.

Thanks

smhaig

|||

select

ct.CustomerName as Customer,

ct.CustomerID as CustomerID,

idt.ProductCode as Product,

iht.InvoiceDate as MostRecentInvoiceDate,

idt.Price as Price

from InvoiceDetailTable as idt

join InvoiceHeaderTable as iht

on idt.invheaderID = iht.invheaderID

join CustomerTable as ct

on ct.CustomerID = iht.CustomerID

where not exists

(select 1

from InvoiceDetailTable as idtx

join InvoiceHeaderTable as ihtx

on idtx.InvHeaderID = ihtx.InvHeaderID

where idtx.ProductCode = idt.ProductCode

and ihtx.CustomerID = iht.CustomerID

and ihtx.InvoiceDate > iht.InvoiceDate)

order by

ct.CustomerName,

idt.ProductCode

|||

-- Using SQL Server 2005

set nocount on
set dateformat mdy

create table invoiceHeadertable(invheaderID int,CustomerID int,InvoiceDate datetime)
insert into invoiceHeadertable(invheaderID ,CustomerID ,InvoiceDate )
select 1, 40, '1/1/2006' union all
select 2, 40, '4/1/2006' union all
select 3, 80, '3/1/2006' union all
select 4, 80, '7/1/2006' union all
select 5, 80, '8/12/2006'

create table invoicedetailtable(invdetailID int, invheaderID int, productcode char(2), price decimal(5,2))
insert into invoicedetailtable(invdetailID , invheaderID , productcode , price )
select 11, 1, 'AA', 1.50 union all
select 12, 1, 'BB', 1.30 union all
select 13, 1, 'CC', 1.00 union all
select 21, 2, 'AA', 1.40 union all
select 23, 2, 'CC', 2.00 union all
select 24, 3, 'AA', 2.00 union all
select 25, 3, 'CC', 2.10 union all
select 26, 3, 'EE', 1.10 union all
select 27, 4, 'AA', 1.00 union all
select 28, 4, 'CC', 2.00 union all
select 29, 4, 'EE', 0.99 union all
select 34, 5, 'EE', 1.55


create table CustomerTable(CustomerID int, Customername varchar(10))
insert into CustomerTable(CustomerID , Customername )
select 40, 'johnCorp' union all
select 80, 'maryCorp';

with cte(customer,product,[most recent invoice(for this product)],price,rn)
as (
select c.Customername,
d.productcode,
h.InvoiceDate,
d.price,
rank() over(partition by c.Customername,d.productcode order by h.InvoiceDate desc)
from CustomerTable c
inner join invoiceHeadertable h on h.CustomerID=c.CustomerID
inner join invoicedetailtable d on d.invheaderID=h.invheaderID
)
select customer,
product,
[most recent invoice(for this product)],
price
from CTE
where rn=1
order by customer,product

|||

I want to thank Mark and Ron for their solutions and Arnie for getting me to think about two invoices on the same day for the same product and customer (which do exist actually).

I found these solutions to be on the level of advanced lessons for me to study. I have always had trouble with group by when I needed a unique ID on a table where I was selecting a max or min or first one, etc. on another field in the same table.

I have not found any good examples other than very basic ones for rank and partition so if anyone has a good site let me know. Meanwhile I will study what I have as I now have two great ways to solve my problem.

I did not give a duplicate item with my sample data so I will see how this sql 2005 query deals with this. I seem to remember something about ties and ranking and perhaps I could also use select distinct when I select the rank = 1.

The second standard sql query (Ron's) shows me both invoices when there are 2 on same date. It may be that this is the way the data should be displayed if the prices are different, so I will need to check further on that and see if I can tweak these 2 queries to deal with that.

Thank you all again

smHaig

|||

And my thanks to Mark for demonstrating the more modern solution of the two!

Ron

No comments:

Post a Comment