Friday, March 30, 2012

Grouping Records & Assigning Sequential Number

I need to group records and assign a setid to the group. I have a
table with data that looks like this

ColA ColB
94015 01065
94016 01065
94015 01085
94015 01086
33383 00912
32601 00912

I need to create a resultset using just sql to look like this

ColA ColB GRP
94015 01065 1
94016 01065 1
94015 01085 1
94015 01086 1
33383 00912 2
32601 00912 2

The tricky part is resolving the many to many issue. A value in ColA
can belong to multiple values in ColB and a value in ColB can have
multiple values in ColA.Please explain the logic that determines GRP. What rule makes the first four
rows GRP=1 and the next two GRP=2 ?

--
David Portas
SQL Server MVP
--|||"cjm" <cjm136@.optonline.net> wrote in message news:62be3d63.0402120756.f08195b@.posting.google.co m...
> I need to group records and assign a setid to the group. I have a
> table with data that looks like this
> ColA ColB
> 94015 01065
> 94016 01065
> 94015 01085
> 94015 01086
> 33383 00912
> 32601 00912
> I need to create a resultset using just sql to look like this
> ColA ColB GRP
> 94015 01065 1
> 94016 01065 1
> 94015 01085 1
> 94015 01086 1
> 33383 00912 2
> 32601 00912 2
> The tricky part is resolving the many to many issue. A value in ColA
> can belong to multiple values in ColB and a value in ColB can have
> multiple values in ColA.

Not completely sure I understand your grouping criteria but hopefully
this is helpful.

CREATE TABLE T
(
colA VARCHAR(10) NOT NULL,
colB VARCHAR(10) NOT NULL,
PRIMARY KEY (colA, colB)
)

INSERT INTO T (colA, colB)
VALUES ('94015', '01065')
INSERT INTO T (colA, colB)
VALUES ('94016', '01065')
INSERT INTO T (colA, colB)
VALUES ('94015', '01085')
INSERT INTO T (colA, colB)
VALUES ('94015', '01086')
INSERT INTO T (colA, colB)
VALUES ('33383', '00912')
INSERT INTO T (colA, colB)
VALUES ('32601', '00912')

SELECT T.colA, T.colB, B.grp
FROM (SELECT B1.colB, COUNT(*) AS grp
FROM (SELECT colB
FROM (SELECT colA, MIN(colB) AS colB
FROM T
GROUP BY colA) AS A
GROUP BY colB) AS B1
INNER JOIN
(SELECT colB
FROM (SELECT colA, MIN(colB) AS colB
FROM T
GROUP BY colA) AS A
GROUP BY colB) AS B2
ON B2.colB <= B1.colB
GROUP BY B1.colB) AS B
INNER JOIN
(SELECT colA, MIN(colB) AS colB
FROM T
GROUP BY colA) AS A
ON A.colB = B.colB
INNER JOIN
T
ON T.colA = A.colA
ORDER BY B.grp, T.colB, T.colA

colA colB grp
32601 00912 1
33383 00912 1
94015 01065 2
94016 01065 2
94015 01085 2
94015 01086 2

Regards,
jag|||"John Gilson" <jag@.acm.org> wrote in message news:<W_7Xb.20407$Lp.1268@.twister.nyc.rr.com>...
> "cjm" <cjm136@.optonline.net> wrote in message news:62be3d63.0402120756.f08195b@.posting.google.co m...
Thanks JAG for the clever and clean solution!|||Is a given colB value allowed to belong to more than one group? If so then
John's solution looks good but I wasn't clear on this point from your sample
data.

Here's another solution that may or may not give the result you want (thanks
for the DDL and sample data John). I've added an extra row of sample data:

CREATE TABLE T
(
colA VARCHAR(10) NOT NULL,
colB VARCHAR(10) NOT NULL,
grp INTEGER NULL,
PRIMARY KEY (colA, colB)
)

INSERT INTO T (colA, colB)
VALUES ('94015', '01065')
INSERT INTO T (colA, colB)
VALUES ('94016', '01065')
INSERT INTO T (colA, colB)
VALUES ('94015', '01085')
INSERT INTO T (colA, colB)
VALUES ('94015', '01086')
INSERT INTO T (colA, colB)
VALUES ('33383', '00912')
INSERT INTO T (colA, colB)
VALUES ('32601', '00912')

INSERT INTO T (colA, colB)
VALUES ('32601', '01065')

John's query gives:

colA colB grp
---- ---- ----
32601 00912 1
33383 00912 1
32601 01065 1
94015 01065 2
94016 01065 2
94015 01085 2
94015 01086 2

Notice that 01065 appears in both groups. This iterative solution will put
all rows in the same group:

DECLARE @.grp INTEGER

UPDATE T
SET @.grp = grp = COALESCE(@.grp,0) + 1

WHILE @.@.ROWCOUNT>0
UPDATE T
SET grp =
(SELECT MIN(X.grp)
FROM T AS X
WHERE (T.colB = X.colB OR T.colA = X.colA) AND X.grp<T.grp)
WHERE EXISTS
(SELECT *
FROM T AS X
WHERE (T.colB = X.colB OR T.colA = X.colA) AND X.grp<T.grp)

Note that the group numbers using this method are not "sequential" and may
have gaps but it's not clear from your original post exactly what the
sequence should be (if any).

--
David Portas
SQL Server MVP
--|||"David Portas" <REMOVE_BEFORE_REPLYING_dportas@.acm.org> wrote in message news:<jNadnasl3vRMQ7DdRVn-tw@.giganews.com>...
> Is a given colB value allowed to belong to more than one group? If so then
> John's solution looks good but I wasn't clear on this point from your sample
> data.
> Here's another solution that may or may not give the result you want (thanks
> for the DDL and sample data John). I've added an extra row of sample data:
> ...
There is no column called GRP in the table T and I don't want to alter
the table or create a temp table or otherwise UPDATE table T. How
would this be rewritten to return the result set as a query?

You made an important observation that a given colB value should
belong to only ONE group and I want to see the results of your code
with the record you added to the example. Sorry if this is a simple
conversion but I'm still learning.|||I'm not sure this is possible as a single query. It doesn't look like you
can avoid an iterative solution although you could turn it into a
table-valued function. I'm cross-posting to
microsoft.public.sqlserver.programming to see if anyone can come up with
better than this.

(http://groups.google.com/groups?sel...8195b%40posting.
google.com)

CREATE TABLE T(colA VARCHAR(10), colB VARCHAR(10) NOT NULL, PRIMARY KEY
(colA, colB))

INSERT INTO T (colA, colB) VALUES ('94015', '01065')
INSERT INTO T (colA, colB) VALUES ('94016', '01065')
INSERT INTO T (colA, colB) VALUES ('94015', '01085')
INSERT INTO T (colA, colB) VALUES ('94015', '01086')
INSERT INTO T (colA, colB) VALUES ('33383', '00912')
INSERT INTO T (colA, colB) VALUES ('32601', '00912')

/* Additional row makes it a single group: */
INSERT INTO T (colA, colB) VALUES ('32601', '01065')

GO

CREATE FUNCTION TGroupings ()
RETURNS @.t TABLE (colA VARCHAR(10) NOT NULL, colB VARCHAR(10) NOT NULL, grp
INTEGER NULL, PRIMARY KEY (colA,colB))

BEGIN
INSERT INTO @.t (colA, colB)
SELECT colA, colB
FROM T

DECLARE @.grp INTEGER

UPDATE @.t
SET @.grp = grp = COALESCE(@.grp,0) + 1

WHILE @.@.ROWCOUNT>0
UPDATE T
SET grp =
(SELECT MIN(X.grp)
FROM @.t AS X
WHERE (T.colB = X.colB OR T.colA = X.colA) AND X.grp<T.grp)
FROM @.t AS T
WHERE EXISTS
(SELECT *
FROM @.t AS X
WHERE (T.colB = X.colB OR T.colA = X.colA) AND X.grp<T.grp)

UPDATE T
SET grp =
(SELECT COUNT(DISTINCT grp)
FROM @.t AS X
WHERE grp <= T.grp)
FROM @.t AS T

RETURN
END

GO

SELECT * FROM TGroupings()

This is the result with your original test-data:

colA colB grp
---- ---- ----
32601 00912 1
33383 00912 1
94015 01065 2
94015 01085 2
94015 01086 2
94016 01065 2

(6 row(s) affected)

And this is it with my extra row added:

colA colB grp
---- ---- ----
32601 00912 1
32601 01065 1
33383 00912 1
94015 01065 1
94015 01085 1
94015 01086 1
94016 01065 1

(7 row(s) affected)

--
David Portas
SQL Server MVP
--

"cjm" <cjm136@.optonline.net> wrote in message
news:62be3d63.0402201048.537be689@.posting.google.c om...
> "David Portas" <REMOVE_BEFORE_REPLYING_dportas@.acm.org> wrote in message
news:<jNadnasl3vRMQ7DdRVn-tw@.giganews.com>...
> > Is a given colB value allowed to belong to more than one group? If so
then
> > John's solution looks good but I wasn't clear on this point from your
sample
> > data.
> > Here's another solution that may or may not give the result you want
(thanks
> > for the DDL and sample data John). I've added an extra row of sample
data:
> > ...
> There is no column called GRP in the table T and I don't want to alter
> the table or create a temp table or otherwise UPDATE table T. How
> would this be rewritten to return the result set as a query?
> You made an important observation that a given colB value should
> belong to only ONE group and I want to see the results of your code
> with the record you added to the example. Sorry if this is a simple
> conversion but I'm still learning.sql

No comments:

Post a Comment