Bucketed table is an outdated concept.
You do not need to write CLUSTERED BY in table DDL.
When loading table use distribute by partition key
to reduce pressure on reducers especially when writing ORC, which requires intermediate buffers for building ORC and if each reducer loads many partitions it may cause OOM exception.
When the table is big, you can limit the max file size using bytes.per.reducer like this:
set hive.exec.reducers.bytes.per.reducer=67108864;--or even less
If you have more data, more reducers will be started, more files created. This is more flexible than loading fixed number of buckets.
This will also work better because for small tables you do not need to create smaller buckets.
ORC has internal indexes and bloom filters. Applying SORT you can improve index and bloom filters efficiency because all similar data will be stored together. Also this can improve compression depending on your data enthropy.
If distribution by partition key is not enough because you have some data skew and the data is big, you can additionally distribute by random. It is better to distribute by column if you have evenly distributed data. If not, distribute by random to avoid single long running reducer problem.
Finally your insert statement may look loke this:
set hive.exec.reducers.bytes.per.reducer=33554432; --32Mb per reducer
INSERT OVERWRITE TABLE trades PARTITION (dt)
SELECT trade_id, name, contract_type, ts, dt
FROM raw_trades
DISTRIBUTE BY dt, --partition key is a must for big data
trade_id, --some other key if the data is too big and key is
--evenly distributed (no skew)
FLOOR(RAND()*100.0)%20 --random to distribute additionally on 20 equal parts
SORT BY contract_type; --sort data if you want filtering by this key
--to work better using internal index
Do not use CLUSTERED BY in table DDL because using DISTRIBUTE BY, ORC w indexes and bloom filters + SORT during insert you can achieve the same in more flexible way.
Distribute + sort can reduce the size of ORC files extremely by x3 or x4 times. Similar data can be better compressed and makes internal indexes more efficient.
Read also this: https://stackoverflow.com/a/55375261/2700344
This is related answer about about sorting: https://stackoverflow.com/a/47416027/2700344