1) Separate the SQL dump file into two parts: table creation and data
grep -n "Dumping data" enwiki-latest-redirect.sql
This returns the line number of the marker comment that separates the table-creation part of the file from the data part. Suppose it is 46.
head -46 enwiki-latest-redirect.sql > redirect-creation.sql
tail -n +46 enwiki-latest-redirect.sql > redirect-data.sql
The line number reported by grep is passed to head to extract the table-creation part, and to tail (with a leading +) to extract the data part. Line 46 itself lands in both files, but it is only a SQL comment, so the overlap is harmless.
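The steps above can be scripted as a small shell function, assuming the dump contains exactly one "Dumping data" marker (split_dump is a name invented here, not part of any tool):

```shell
# Sketch of step 1 as a reusable function; split_dump is a
# hypothetical helper name.
split_dump() {
  dump=$1
  # Line number of the first "Dumping data" marker.
  n=$(grep -n -m1 "Dumping data" "$dump" | cut -d: -f1)
  head -n "$n" "$dump" > redirect-creation.sql   # table creation: lines 1..n
  tail -n +"$n" "$dump" > redirect-data.sql      # data: lines n..end
}
```

The marker line ends up in both output files, but since it is a SQL comment, MySQL ignores it.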
2) Remove indexes from the redirect-creation.sql
The following is the original CREATE TABLE
statement in the dump:
CREATE TABLE `redirect` (
`rd_from` int(8) unsigned NOT NULL DEFAULT '0',
`rd_namespace` int(11) NOT NULL DEFAULT '0',
`rd_title` varbinary(255) NOT NULL DEFAULT '',
`rd_interwiki` varbinary(32) DEFAULT NULL,
`rd_fragment` varbinary(255) DEFAULT NULL,
PRIMARY KEY (`rd_from`),
KEY `rd_ns_title` (`rd_namespace`,`rd_title`,`rd_from`)
) ENGINE=InnoDB DEFAULT CHARSET=binary;
The following is the desired version after removing the indexes:
CREATE TABLE `redirect` (
`rd_from` int(8) unsigned NOT NULL DEFAULT '0',
`rd_namespace` int(11) NOT NULL DEFAULT '0',
`rd_title` varbinary(255) NOT NULL DEFAULT '',
`rd_interwiki` varbinary(32) DEFAULT NULL,
`rd_fragment` varbinary(255) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=binary;
Edit redirect-creation.sql to remove the indexes as shown above (note that the comma after the last remaining column must also be removed, or the statement will not parse).
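If you would rather not edit by hand, the same change can be scripted. This is a sketch that assumes index definitions are lines beginning with PRIMARY KEY, UNIQUE KEY, or KEY, as in the CREATE TABLE shown above; strip_indexes is a hypothetical helper name:

```shell
# Sketch: delete index lines from a CREATE TABLE statement and drop
# the comma left dangling on the last column line. Output goes to
# stdout; strip_indexes is an invented name.
strip_indexes() {
  awk '
    # Skip PRIMARY KEY / UNIQUE KEY / KEY definition lines.
    /^[[:space:]]*(PRIMARY[[:space:]]+KEY|UNIQUE[[:space:]]+KEY|KEY)[[:space:]]/ { next }
    { lines[n++] = $0 }
    END {
      for (i = 0; i < n; i++) {
        # Remove the trailing comma on the line just before ")".
        if (i + 1 < n && lines[i + 1] ~ /^\)/) sub(/,[[:space:]]*$/, "", lines[i])
        print lines[i]
      }
    }
  ' "$1"
}
```

Usage: strip_indexes redirect-creation.sql > redirect-creation-noindex.sql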
3) Load the created files
mysql -u root -p wikipedia < redirect-creation.sql
mysql -u root -p wikipedia < redirect-data.sql
4) Create indexes on fields you want after loading the data
create index idx_redirect_rd_from on redirect(rd_from);
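If you want the table to end up matching the original dump, the two indexes removed in step 2 can be restored in a single ALTER TABLE, which builds both in one pass over the data:

```sql
-- Restore the indexes from the original CREATE TABLE in step 2.
ALTER TABLE redirect
  ADD PRIMARY KEY (rd_from),
  ADD KEY rd_ns_title (rd_namespace, rd_title, rd_from);
```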
Following the steps above, I was able to load the categorylinks table in under an hour on my MacBook Pro.