As you noted, this isn't scalable and your costs are going to skyrocket since you pay per read/write on Firestore, so I would recommend rethinking your architecture.
I solved a similar problem several years ago for an App Engine website that needed to generate sitemaps for millions of dynamically created pages and it was so efficient that it never exceeded the free tier's limits.
Step 1: Google Storage instead of Firestore
When a page is created, append its URL, on its own line, to a text file in a Google Cloud Storage bucket. If your URLs have a unique ID, you can use it to search for and replace an existing entry.
https://www.example.com/foo/some-long-title
https://www.example.com/bar/some-longer-title
It may be helpful to break the URLs into smaller files. If some URLs start with /foo and others start with /bar, I'd create at least two files, sitemap_foo.txt and sitemap_bar.txt, and store the URLs in their respective files.
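Cloud Storage objects can't be appended to in place, so the simplest approach is read-modify-write: download the file, add the line, and upload it back. A minimal Python sketch, assuming the google-cloud-storage client library; the bucket name my-sitemap-bucket is a placeholder.

from google.cloud import storage

def add_url_to_sitemap(url, sitemap_name):
    # Add one URL, on its own line, to a sitemap text file in the bucket.
    client = storage.Client()
    blob = client.bucket("my-sitemap-bucket").blob(sitemap_name)  # placeholder bucket
    existing = blob.download_as_text() if blob.exists() else ""
    if url in existing.splitlines():
        return  # already listed, nothing to do
    blob.upload_from_string(existing + url + "\n", content_type="text/plain")

# For example, when a new /foo page is created:
# add_url_to_sitemap("https://www.example.com/foo/some-long-title", "sitemap_foo.txt")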
Step 2: Dynamically Generate Sitemap Index
Instead of one enormous XML sitemap, serve a sitemap index that points to your multiple sitemap files.
When /sitemap.xml is requested, generate the index by looping through the sitemap files in your bucket and listing each one like this (a rough handler sketch follows the example):
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://storage.google...../sitemap_foo.txt</loc>
  </sitemap>
  <sitemap>
    <loc>https://storage.google...../sitemap_bar.txt</loc>
  </sitemap>
</sitemapindex>
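A rough, Flask-style sketch of that handler, assuming the sitemap files are publicly readable at their storage.googleapis.com URLs and share the sitemap_ prefix (the bucket name is a placeholder; adjust for your framework):

from google.cloud import storage

def sitemap_index():
    # Build one sitemap index entry for every sitemap_* file in the bucket.
    client = storage.Client()
    bucket = client.bucket("my-sitemap-bucket")  # placeholder bucket
    entries = []
    for blob in bucket.list_blobs(prefix="sitemap_"):
        entries.append(
            "  <sitemap>\n"
            f"    <loc>https://storage.googleapis.com/{bucket.name}/{blob.name}</loc>\n"
            "  </sitemap>"
        )
    xml = (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        + "\n".join(entries)
        + "\n</sitemapindex>"
    )
    return xml, 200, {"Content-Type": "application/xml"}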
Step 3: Remove Broken URLs
Update your 404 handler to search the sitemap files for the requested URL and remove it if found, so dead pages drop out of the sitemap automatically.
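As a rough illustration, assuming the same google-cloud-storage client and placeholder bucket name as above, the handler could scan the sitemap files and rewrite any file that listed the dead URL:

from google.cloud import storage

def remove_url_from_sitemaps(url):
    # Called from the 404 handler: drop a dead URL from any sitemap file listing it.
    client = storage.Client()
    bucket = client.bucket("my-sitemap-bucket")  # placeholder bucket
    for blob in bucket.list_blobs(prefix="sitemap_"):
        lines = blob.download_as_text().splitlines()
        remaining = [line for line in lines if line.strip() != url]
        if len(remaining) != len(lines):
            content = "\n".join(remaining) + "\n" if remaining else ""
            blob.upload_from_string(content, content_type="text/plain")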
Summary
With the above system you'll have a scalable, reliable and efficient sitemap generation system that will probably cost you little to nothing to operate.
Answers to your questions
Q: How many URLs can you have in a sitemap?
A: According to Google, 50,000 URLs or 50 MB uncompressed, whichever comes first.
Q: Do I need to update my sitemap every time I add a new user/post/page?
A: Yes.
Q: How do you write to a single text file without collisions?
A: Collisions are possible, but how many new pages/posts/users are created per second? If it's more than one per second, I'd create a Pub/Sub topic and a function that drains it to update the sitemap files in batches (see the sketch below). Otherwise I'd just update the file directly.
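Purely to illustrate that batched approach, here is a sketch of a scheduled job that pulls pending URLs from a Pub/Sub subscription and applies them in a single write. The project, subscription, bucket, and file names are all assumptions; it uses the google-cloud-pubsub and google-cloud-storage client libraries.

from google.cloud import pubsub_v1, storage

def drain_sitemap_queue():
    # Pull up to 500 queued URLs from the subscription (names are placeholders).
    subscriber = pubsub_v1.SubscriberClient()
    sub_path = subscriber.subscription_path("my-project", "sitemap-updates")
    response = subscriber.pull(request={"subscription": sub_path, "max_messages": 500})
    if not response.received_messages:
        return
    new_urls = [m.message.data.decode("utf-8") for m in response.received_messages]

    # Apply the whole batch to the sitemap file in one read-modify-write.
    blob = storage.Client().bucket("my-sitemap-bucket").blob("sitemap_foo.txt")
    existing = blob.download_as_text() if blob.exists() else ""
    if existing and not existing.endswith("\n"):
        existing += "\n"
    blob.upload_from_string(existing + "\n".join(new_urls) + "\n",
                            content_type="text/plain")

    # Only acknowledge the messages once the sitemap write has succeeded.
    subscriber.acknowledge(request={
        "subscription": sub_path,
        "ack_ids": [m.ack_id for m in response.received_messages],
    })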
Q: Let's say I created a sitemap_users.txt for all of my users...
A: Depending on how many users you have, it may be wise to break it up even further and group them by month/week/day. So you'd have sitemap_users_20200214.txt containing all users created that day. That would most likely keep each file under the 50,000 URL limit.