Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
75 views
in Technique[技术] by (71.8m points)

Database choice for large data volume?

I'm about to start a new project which should have a rather large database.

The number of tables will not be large (<15), majority of data (99%) will be contained in one big table, which is almost insert/read only (no updates).

The estimated amount of data in that one table is going to grow at 500.000 records a day, and we should keep at least 1 year of them to be able to do various reports.

There needs to be (read-only) replicated database as a backup/failover, and maybe for offloading reports in peak time.

I don't have first hand experience with that large databases, so I'm asking the ones that have which DB is the best choice in this situation. I know that Oracle is the safe bet, but am more interested if anyone have experience with Postgresql or Mysql with similar setup.

question from:https://stackoverflow.com/questions/629445/database-choice-for-large-data-volume

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

I've used PostgreSQL in an environment where we're seeing 100K-2M new rows per day, most added to a single table. However, those rows tend to be reduced to samples and then deleted within a few days, so I can't speak about long-term performance with more than ~100M rows.

I've found that insert performance is quite reasonable, especially if you use the bulk COPY. Query performance is fine, although the choices the planner makes sometimes puzzle me; particularly when doing JOINs / EXISTS. Our database requires pretty regular maintenance (VACUUM/ANALYZE) to keep it running smoothly. I could avoid some of this by more carefully optimizing autovacuum and other settings, and it's not so much of an issue if you're not doing many DELETEs. Overall, there are some areas where I feel it's more difficult to configure and maintain than it should be.

I have not used Oracle, and MySQL only for small datasets, so I can't compare performance. But PostgreSQL does work fine for large datasets.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

2.1m questions

2.1m answers

60 comments

57.0k users

...