If the user_resources
(t1) was a 'normalized table' with one row for each user => resource
combination then the query to get the answer would be as simple as just joining
the tables together.
Alas, it is denormalized
by having the resources
column as a: 'list of resource id' separated by a ';' character.
If we could convert the 'resources' column into rows then a lot of the difficulties go away as the table joins become simple.
The query to generate the output asked for:
SELECT user_resource.user,
resource.data
FROM user_resource
JOIN integerseries AS isequence
ON isequence.id <= COUNT_IN_SET(user_resource.resources, ';') /* normalize */
JOIN resource
ON resource.id = VALUE_IN_SET(user_resource.resources, ';', isequence.id)
ORDER BY
user_resource.user, resource.data
The Output:
user data
---------- --------
sampleuser abcde
sampleuser azerty
sampleuser qwerty
stacky qwerty
testuser abcde
testuser azerty
How:
The 'trick' is to have a table that contains the numbers from 1 to some limit. I call it integerseries
. It can be used to convert 'horizontal' things such as: ';' delimited strings
into rows
.
The way this works is that when you 'join' with integerseries
, you are doing a cross join
, which is what happens 'naturally' with 'inner joins'.
Each row gets duplicated with a different 'sequence number' from the integerseries
table which we use as an 'index' of the 'resource' in the list that we want to use for that row
.
The idea is to:
- count the number of items in the list.
- extract each item based on its the position in the list.
- Use
integerseries
to convert one row into a set of rows extracting the individual 'resource id' from user
.resources
as we go along.
I decided to use two functions:
function that given a 'delimited string list' and an 'index' will return the value at the position in the list. I call it: VALUE_IN_SET
. i.e. given 'A;B;C' and an 'index' of 2 then it returns 'B'.
function that given a 'delimited string list' will return the count of the number of items in the list. I call it: COUNT_IN_SET
. i.e. given 'A;B;C' will return 3
It turns aout that those two functions and integerseries
should provide a general solution to delimited items list in a column
.
Does it work?
The query to create a 'normalized' table from a ';' delimited string in column
. It shows all the columns, including the generated values due to the 'cross_join' (isequence.id
as resources_index
):
SELECT user_resource.user,
user_resource.resources,
COUNT_IN_SET(user_resource.resources, ';') AS resources_count,
isequence.id AS resources_index,
VALUE_IN_SET(user_resource.resources, ';', isequence.id) AS resources_value
FROM
user_resource
JOIN integerseries AS isequence
ON isequence.id <= COUNT_IN_SET(user_resource.resources, ';')
ORDER BY
user_resource.user, isequence.id
The 'normalized' table output:
user resources resources_count resources_index resources_value
---------- --------- --------------- --------------- -----------------
sampleuser 1;2;3 3 1 1
sampleuser 1;2;3 3 2 2
sampleuser 1;2;3 3 3 3
stacky 2 1 1 2
testuser 1;3 2 1 1
testuser 1;3 2 2 3
Using the above 'normalized' user_resources
table, it is a simple join to provide the output required:
The functions needed (these are general functions that can be used anywhere)
note: The names of these functions are related to the mysql FIND_IN_SET function. i.e. they do similar things as regards string lists?
The COUNT_IN_SET
function: returns the count of character delimited items
in the column.
DELIMITER $$
DROP FUNCTION IF EXISTS `COUNT_IN_SET`$$
CREATE FUNCTION `COUNT_IN_SET`(haystack VARCHAR(1024),
delim CHAR(1)
) RETURNS INTEGER
BEGIN
RETURN CHAR_LENGTH(haystack) - CHAR_LENGTH( REPLACE(haystack, delim, '')) + 1;
END$$
DELIMITER ;
The VALUE_IN_SET
function: treats the delimited list
as a one based array
and returns the value at the given 'index'.
DELIMITER $$
DROP FUNCTION IF EXISTS `VALUE_IN_SET`$$
CREATE FUNCTION `VALUE_IN_SET`(haystack VARCHAR(1024),
delim CHAR(1),
which INTEGER
) RETURNS VARCHAR(255) CHARSET utf8 COLLATE utf8_unicode_ci
BEGIN
RETURN SUBSTRING_INDEX(SUBSTRING_INDEX(haystack, delim, which),
delim,
-1);
END$$
DELIMITER ;
Related Information:
The tables (with data):
CREATE TABLE `integerseries` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=500 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
/*Data for the table `integerseries` */
insert into `integerseries`(`id`) values (1);
insert into `integerseries`(`id`) values (2);
insert into `integerseries`(`id`) values (3);
insert into `integerseries`(`id`) values (4);
insert into `integerseries`(`id`) values (5);
insert into `integerseries`(`id`) values (6);
insert into `integerseries`(`id`) values (7);
insert into `integerseries`(`id`) values (8);
insert into `integerseries`(`id`) values (9);
insert into `integerseries`(`id`) values (10);
Resource:
CREATE TABLE `resource` (
`id` int(11) NOT NULL,
`data` varchar(250) COLLATE utf8_unicode_ci DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
/*Data for the table `resource` */
insert into `resource`(`id`,`data`) values (1,'abcde');
insert into `resource`(`id`,`data`) values (2,'qwerty');
insert into `resource`(`id`,`data`) values (3,'azerty');
User_resource:
CREATE TABLE `user_resource` (
`user` varchar(50) COLLATE utf8_unicode_ci NOT NULL,
`resources` varchar(250) COLLATE utf8_unicode_ci DEFAULT NULL,
PRIMARY KEY (`user`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
/*Data for the table `user_resource` */
insert into `user_resource`(`user`,`resources`) values ('sampleuser','1;2;3');
insert into `user_resource`(`user`,`resources`) values ('stacky','3');
insert into `user_resource`(`user`,`resources`) values ('testuser','1;3');