Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
1.1k views
in Technique[技术] by (71.8m points)

apache spark - Does Hive preserve file order when selecting data

If I do select * from table1; in which order data will retrieve

File order Or random order

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

Without ORDER BY the order is not guaranteed.

Data is being read in parallel by many processes (mappers), after splits were calculated, each process starts reading some piece of file or few files, depending on splits calculated.

All parallel processes can process different volume of data and running on different nodes, the load is not the same each time, so they start returning rows and finishing at different times, depending on too many factors, such as node load, network load, volume of data per process, etc, etc.

Removing all this factors you can increase the order prediction accuracy. Say, single thread sequential file read may return rows in the same order as they are in the file. But this is not how the database works.

Also according to Codd's relational theory, the order of columns and rows is immaterial.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...