Non-overlapping ranges in sequential data

From Hashmysql
Jump to: navigation, search

Consider the following structure:

CREATE TABLE `t1` (
  `id` int(11) DEFAULT NULL,
  `v` int(11) DEFAULT NULL
) ENGINE=InnoDB;

And the following data:

INSERT INTO `t1` VALUES (1,1),(1,2),(2,3),(2,4),(2,5),(1,6),(1,7);
+----+---+
| id | v |
+----+---+
|  1 | 1 |
|  1 | 2 |
|  2 | 3 |
|  2 | 4 |
|  2 | 5 |
|  1 | 6 |
|  1 | 7 |
+----+---+

hash-mysql regulars inquired if it would be possible to extract sequential non-overlapping ranges, in relation for each `id`. Here is a sample output, based on the test data:

+----+----------+----------+
| id | min(a.v) | max(a.v) |
+----+----------+----------+
|  1 |        1 |        2 |
|  2 |        3 |        5 |
|  1 |        6 |        7 |
+----+----------+----------+

A proposed solution (thanks Jon!) would be to create two derived tables, each extracting the previous and next element in the chain:

-- Pair the min with the closest max.
SELECT v1.v      AS r1
     , MIN(v2.v) AS r2
     , v1.id     AS id
     , (MIN(v2.v) - v1.v) AS xrange
  FROM (
     SELECT t1.*
       FROM t1
       LEFT JOIN t1 AS t2
         ON t1.v-1 = t2.v
        AND t1.id  = t2.id
      WHERE t2.id IS NULL
       ) v1
  JOIN (
     SELECT t1.*
       FROM t1
       LEFT JOIN t1 AS t2
         ON t1.v+1 = t2.v
        AND t1.id  = t2.id
      WHERE t2.id IS NULL
       ) v2
    ON v1.v <= v2.v
   AND v1.id = v2.id
 GROUP BY id
        , r1
 ORDER BY r1
        , xrange DESC

The output from this solution would look like:

+----+----+----+--------+
| R1 | R2 | ID | XRANGE |
+----+----+----+--------+
|  1 |  2 |  1 |      1 |
|  3 |  5 |  2 |      2 |
|  6 |  7 |  1 |      1 |
+----+----+----+--------+

Running each derived table separately shows the individual logic blocks:

This solution would also cover overlapping ranges.

Window functions, such as lead and lag can be used to handle this a little more cleanly. Unfortunately, MySQL does not implement them, so the derived table approach is required.