Your schema is looking fairly good. There’s no need for the ID column in your join table—just create a primary key from the ID columns of the other tables (although see Marjan Venema's comment and Should I use composite primary keys or not? for alternative views on this). The following examples show how you can create the tables, add some data, and perform the queries that you requested.
Create tables, complete with foreign key constraints. In short, foreign key constraints help to ensure database integrity. In this example, they prevent items being inserted in the join table (item_tag
), if there are no matching items in the item
and tag
tables:
CREATE TABLE IF NOT EXISTS `item` (
`id` INT UNSIGNED NOT NULL AUTO_INCREMENT ,
`item` VARCHAR(255) NOT NULL ,
PRIMARY KEY (`id`) )
ENGINE = InnoDB;
CREATE TABLE IF NOT EXISTS `tag` (
`id` INT UNSIGNED NOT NULL AUTO_INCREMENT ,
`tag` VARCHAR(255) NOT NULL ,
PRIMARY KEY (`id`) )
ENGINE = InnoDB;
CREATE TABLE IF NOT EXISTS `item_tag` (
`item_id` INT UNSIGNED NOT NULL ,
`tag_id` INT UNSIGNED NOT NULL ,
PRIMARY KEY (`item_id`, `tag_id`) ,
INDEX `fk_item_tag_item` (`item_id` ASC) ,
INDEX `fk_item_tag_tag` (`tag_id` ASC) ,
CONSTRAINT `fk_item_tag_item`
FOREIGN KEY (`item_id` )
REFERENCES `item` (`id` )
ON DELETE CASCADE
ON UPDATE CASCADE,
CONSTRAINT `fk_item_tag_tag`
FOREIGN KEY (`tag_id` )
REFERENCES `tag` (`id` )
ON DELETE CASCADE
ON UPDATE CASCADE)
ENGINE = InnoDB;
Insert some test data:
INSERT INTO item (item) VALUES
('spaniel'),
('tabby'),
('chicken'),
('goldfish');
INSERT INTO tag (tag) VALUES
('bird'),
('pet'),
('dog'),
('cat'),
('reptile'),
('fish'),
('delicious'),
('cheap'),
('expensive');
INSERT INTO item_tag (item_id, tag_id) VALUES
(1,2),
(1,3),
(1,8),
(2,2),
(2,4),
(3,1),
(3,7),
(4,2),
(4,6),
(4,8);
Select all items and all tags:
SELECT item.id, item.item, tag.tag
FROM item
JOIN item_tag ON item_tag.item_id = item.id
JOIN tag ON item_tag.tag_id = tag.id;
+----+----------+-----------+
| id | item | tag |
+----+----------+-----------+
| 1 | spaniel | pet |
| 1 | spaniel | dog |
| 1 | spaniel | cheap |
| 2 | tabby | pet |
| 2 | tabby | cat |
| 3 | chicken | bird |
| 3 | chicken | delicious |
| 4 | goldfish | pet |
| 4 | goldfish | fish |
| 4 | goldfish | cheap |
+----+----------+-----------+
Select items with a specific tag:
SELECT item.id, item.item, tag.tag
FROM item
JOIN item_tag ON item_tag.item_id = item.id
JOIN tag ON item_tag.tag_id = tag.id
WHERE tag = 'pet';
+----+----------+-----+
| id | item | tag |
+----+----------+-----+
| 1 | spaniel | pet |
| 2 | tabby | pet |
| 4 | goldfish | pet |
+----+----------+-----+
Select items with one or more tags. Note that this will return items that have the tags cheap OR pet:
SELECT item.id, item.item, tag.tag
FROM item
JOIN item_tag ON item_tag.item_id = item.id
JOIN tag ON item_tag.tag_id = tag.id
WHERE tag IN ('cheap', 'pet');
+----+----------+-------+
| id | item | tag |
+----+----------+-------+
| 1 | spaniel | pet |
| 1 | spaniel | cheap |
| 2 | tabby | pet |
| 4 | goldfish | pet |
| 4 | goldfish | cheap |
+----+----------+-------+
The above query produces an answer that you might not want, as highlighted by the following query. In this case, there are no items with the house tag, but this query still returns some rows:
SELECT item.id, item.item, tag.tag
FROM item
JOIN item_tag ON item_tag.item_id = item.id
JOIN tag ON item_tag.tag_id = tag.id
WHERE tag IN ('cheap', 'house');
+----+----------+-------+
| id | item | tag |
+----+----------+-------+
| 1 | spaniel | cheap |
| 4 | goldfish | cheap |
+----+----------+-------+
You can fix that by adding GROUP BY
and HAVING
:
SELECT item.id, item.item, tag.tag
FROM item
JOIN item_tag ON item_tag.item_id = item.id
JOIN tag ON item_tag.tag_id = tag.id
WHERE tag IN ('cheap', 'house')
GROUP BY item.id HAVING COUNT(*) = 2;
Empty set (0.00 sec)
GROUP BY
causes all items with the same id (or whatever column you specify) to be grouped together into a single row, effectively removing duplicates. HAVING COUNT
limits the results to those where the count of the matching grouped rows is equal to two. That ensures that only items with two tags are returned - note that this value must match the number of tags specified in the IN
clause. Here’s an example that produces something:
SELECT item.id, item.item, tag.tag
FROM item
JOIN item_tag ON item_tag.item_id = item.id
JOIN tag ON item_tag.tag_id = tag.id
WHERE tag IN ('cheap', 'pet')
GROUP BY item.id HAVING COUNT(*) = 2;
+----+----------+-----+
| id | item | tag |
+----+----------+-----+
| 1 | spaniel | pet |
| 4 | goldfish | pet |
+----+----------+-----+
Note that in the previous example, the items have been grouped together so that you don’t get duplicates. In this case, there’s no need for the tag
column, as that just confuses the results—you already know what tags there are, as you have requested items with those tags. You can therefore simplify things a little by removing the tag
column from the query:
SELECT item.id, item.item
FROM item
JOIN item_tag ON item_tag.item_id = item.id
JOIN tag ON item_tag.tag_id = tag.id
WHERE tag IN ('cheap', 'pet')
GROUP BY item.id HAVING COUNT(*) = 2;
+----+----------+
| id | item |
+----+----------+
| 1 | spaniel |
| 4 | goldfish |
+----+----------+
You could go a step further, and use GROUP_CONCAT
to provide a list of matching tags. This might be handy where you want a list of items that have one or more of the specified tags, but not necessarily all of them:
SELECT item.id, item.item, GROUP_CONCAT(tag.tag) AS tags
FROM item
JOIN item_tag ON item_tag.item_id = item.id
JOIN tag ON item_tag.tag_id = tag.id
WHERE tag IN ('cheap', 'pet', 'bird', 'cat')
GROUP BY id;
+----+----------+-----------+
| id | item | tags |
+----+----------+-----------+
| 1 | spaniel | pet,cheap |
| 2 | tabby | pet,cat |
| 3 | chicken | bird |
| 4 | goldfish | pet,cheap |
+----+----------+-----------+
One problem with the above schema design is that it is possible to enter duplicate items and tags. That is, you could insert bird into the tag
table as many times as you like, and this is not good. One way to fix that is to add a UNIQUE INDEX
to the item
and tag
columns. This has the added benefit of helping to speed up queries which rely on these columns. The updated CREATE TABLE
commands now look like this:
CREATE TABLE IF NOT EXISTS `item` (
`id` INT UNSIGNED NOT NULL AUTO_INCREMENT ,
`item` VARCHAR(255) NOT NULL ,
UNIQUE INDEX `item` (`item`) ,
PRIMARY KEY (`id`) )
ENGINE = InnoDB;
CREATE TABLE IF NOT EXISTS `tag` (
`id` INT UNSIGNED NOT NULL AUTO_INCREMENT ,
`tag` VARCHAR(255) NOT NULL ,
UNIQUE INDEX `tag` (`tag`) ,
PRIMARY KEY (`id`) )
ENGINE = InnoDB;
Now if you try to insert a duplicate value, MySQL will prevent you from doing so:
INSERT INTO tag (tag) VALUES ('bird');
ERROR 1062 (23000): Duplicate entry 'bird' for key 'tag'