I have a table watchlist containing today almost 3Mil records.
mysql> select count(*) from watchlist;
+----------+
| count(*) |
+----------+
| 2957994 |
+----------+
It is used as a log to record product-page-views on a large e-commerce site (50,000+ products). It records the productID of the viewed product, the IP address and USER_AGENT of the viewer. And a timestamp of when it happens:
mysql> show columns from watchlist;
+-----------+--------------+------+-----+-------------------+-------+
| Field | Type | Null | Key | Default | Extra |
+-----------+--------------+------+-----+-------------------+-------+
| productID | int(11) | NO | MUL | 0 | |
| ip | varchar(16) | YES | | NULL | |
| added_on | timestamp | NO | MUL | CURRENT_TIMESTAMP | |
| agent | varchar(220) | YES | MUL | NULL | |
+-----------+--------------+------+-----+-------------------+-------+
The data is then reported on several pages throughout the site on both the back-end (e.g. checking what GoogleBot is indexing), and front-end (e.g. a side-bar box for "Recently Viewed Products" and a page showing users what "People from your region also liked" etc.).
So that these "report" pages and side-bars load quickly I put indexes on relevant fields:
mysql> show indexes from watchlist;
+-----------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-----------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| watchlist | 1 | added_on | 1 | added_on | A | NULL | NULL | NULL | | BTREE | |
| watchlist | 1 | productID | 1 | productID | A | NULL | NULL | NULL | | BTREE | |
| watchlist | 1 | agent | 1 | agent | A | NULL | NULL | NULL | YES | BTREE | |
+-----------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
Without the INDEXES, pages with the side-bar for example would spend about 30-45sec executing a query to get the 7 most-recent ProductIDs. With the indexes it takes <0.2sec.
The problem is that with the INDEXES the product pages themselves are taking longer and longer to load because as the table grows the write operations are taking upwards of 5sec. In addition there is a spike on the mysqld process amounting to 10-15% of available CPU each time a product page is viewed (roughly once every 2sec). We already had to upgrade the server hardware because on a previous server it was reaching 100% and caused mysqld to crash.
My plan is to attempt a 2-table solution. One table for INSERT operations, and another for SELECT operations. I plan to purge the INSERT table whenever it reaches 1000 records using a TRIGGER, and copy the oldest 900 records into the SELECT table. The report pages are a mixture of real-time (recently viewed) and analytics (which region), but the real-time pages tend to only need a handful of fresh records while the analytical pages don't need to know about the most recent trend (i.e. last 1000 views). So I can use the small table for the former and the large table for the latter reports.
My question: Is this an ideal solution to this problem?
Also: With TRIGGERS in MySQL is it possible to nice the trigger_statement so that it takes longer, but doesn't consume much CPU? Would running a cron job every 30min that is niced, and which performs the purging if required be a better solution?
See Question&Answers more detail:
os