Hopefully someone can see what I'm doing wrong, but here's the story...
My current site URLs are auto-generated by the ecommerce software from the product and category names, so if a product or category name includes a non-alphanumeric character, it is percent-encoded in the URL, which is a pain. For example:
mysite.com/Shop/Furniture-Set-Large-Table%2C-4-Chairs.html
I am moving to a new ecommerce solution, which also auto-generates the URLs from the product name but is clever enough to remove all non-alphanumeric characters. It also converts the URL to lowercase, and I have managed to find an htaccess solution for redirecting uppercase to lowercase. It also does not have the 'Shop' part of the URL, which I have likewise solved via htaccess. For example:
mysite.com/furniture-set-large-table-4-chairs.html
To remove the 'Shop' part:
RedirectMatch 301 ^/Shop/(.*)$ http://www.mysite.com/$1
To replace uppercase with lowercase to prevent a 404 error:
RewriteCond %{REQUEST_URI} [A-Z]
RewriteCond %{REQUEST_FILENAME} !\.(?:png|gif|ico|swf|jpg|jpeg|js|css|php|pdf)$
RewriteRule (.*) ${lc:http://www.mysite.com/$1} [R=301,L]
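For anyone reusing the lowercase rule: the `lc` map it calls has to be defined somewhere, and as far as I know RewriteMap is only allowed in the main server or virtual host config, not in .htaccess itself. A minimal sketch, assuming the map is named `lc` as in my rule:

```apache
# Goes in httpd.conf or the <VirtualHost> block, not in .htaccess.
# Defines a map named "lc" backed by mod_rewrite's built-in tolower function.
RewriteMap lc int:tolower
```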
These both work perfectly.
So I need an htaccess rule, or possibly several, to remove these encoded characters from the URL. I don't need to replace them, just remove them: the old software creates the URL as "Table%2C-4-Chairs" where the new one has "table-4-chairs", so only the %2C needs to be removed.
I need to remove certain character encodings from the URL, such as:
comma (%2C), apostrophe (%27), colon (%3A), etc.
Can anyone advise a suitable htaccess rule or rules for this?
Thanks in advance.
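For reference, here is a rough, untested sketch of the kind of rule I have in mind. It relies on mod_rewrite matching RewriteRule patterns against the already-decoded path, so %2C shows up as a literal comma in the pattern. The character class and the redirect target are my assumptions and would need extending for other characters:

```apache
RewriteEngine On

# The path is URL-decoded before matching, so %2C is ",", %27 is "'" (written
# here as \x27 to avoid quoting issues) and %3A is ":".
# The greedy (.*) means each redirect strips the LAST run of these characters,
# so a URL with several separate runs takes several redirects to clean up.
RewriteRule ^(.*)[,\x27:]+(.*)$ http://www.mysite.com/$1$2 [R=301,L]
```

With this, /Furniture-Set-Large-Table%2C-4-Chairs.html should redirect to /Furniture-Set-Large-Table-4-Chairs.html, and the lowercase rule would then handle the rest.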