To the contrary, solution 1 and 2 are your best bet. Solution 3 can be considered when the update/creation frequency is very less compared to read frequency of projects and users as even though to update/create, it requires two queries, the ease of reading will be make up for that.
To choose among solution 1 and 2, you need to consider the read frequencies. Will you need projects of a user or uses of a project more frequently and choose according to that. If you feel both are of relatively the same frequency, it is better to keep the user object as less clustered as possible. Whatever option you choose, consider keeping an index
on the array storing the _id
s (of projects or users).
For ex.
userSchema = new Schema(
{//otherstuff
project_ids: [{type: Schema.Types.ObjectId, ref: 'Project'}})
...
})
userSchema.index({'project_ids':1})
or
projectSchema = new Schema(
{//otherstuff
user_ids: [{type: Schema.Types.ObjectId, ref: 'User'}})
...
})
projectSchema.index({'user_ids':1})
Keeping an index on the array of _id
will vastly improve your queries' speed on the side where you fear there will be significant overhead.
But keep the index
only if this relation is an important relation with a lot of queries going on. If this is just a side feature of your project, you can do without
an index too.
If the user can do lots of stuff and has lots of relations, you will be needing that user object constantly throughout your app, so if your app isn't project specific, it would be better to not put the project ids in the user schema. But as we are just putting the ids, it isn't much of an overhead anyway. No need to worry about that.
Reg index on both the arrays: Yes you can ofcourse. But when you go for solution 3, you don't need an index at all as you won't be doing a query to get the list of projects of a user or the list of users in a project. Solution 3 makes reading very easy but writing a bit cumbersome. But as you mentioned that your use case involves reading>>writing
, go with solution 3 but there's always a danger of data inconsistency which you need to take care of.
Indexing just makes things faster. Go through the docs and do a bit of googling. Nothing fancy. Querying over indexed arrays is efficient than normal arrays. For ex. Let us assume you use solution 2.
Store the project ids in the project_ids field.
You can get the projects of a user easily. This is straight forward.
But to get users of project1. You need to a query like this.
User.find({project_ids:project._id},function(err,docs){
//here docs will be the list of the users of project1
})
//The above query might be slow if the user base is large.
//But it can be improved vastly by indexing the project_ids field in the User schema.
Similary for solution 1. Each project has user_ids field.Let us assume we have a user1.
To get the projects of user we do the folowing query
Project.find({user_ids:user1._id},function(err,docs){
//here docs will be the projects of user1
//But it can be improved vastly by indexing the user_ids field in the Project schema.
If you are pondering over solution 1 vs solution 2, solution 1 is better I guess. There might be cases where you need user without his projects but the chances of requiring the project without users is pretty low. But it depends on your exact use case.