Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
466 views
in Technique[技术] by (71.8m points)

save IP address in mongoDB

Currently in order to save an IP address I am converting it to number and store it in the collection. Basically I am doing this for logging purposes. This means that I care to store information as fast as possible and with smallest amount of space.

I will be rarely using it for querying.

My ideas that

  • Storing as strings is for sure inefficient.
  • Storing as 4 digits will be slower and will take more space.

Nonetheless I think that this is an adequate method, but is there a better one for my purpose?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

Definitely save IP addresses as numbers, if you don't mind the extra bit of work that it takes, especially if you need to do queries on the addresses and you have large tables/collections.

Here's why:

Storage

  • An IPv4 address is 4 bytes if stored as unsigned integer.
  • An IPv4 address varies between 10 bytes and 18 bytes when written out as a string in dotted octed form. (Let's assume the average is 14 bytes.)

That is 7-15 bytes for the characters, plus 2-3 bytes if you're using a variable length string type, which varies based on the database you're using. If you have a fixed length string representation available, then you must use a 15-character fixed width field.

Disk storage is cheap, so that's not a factor in most use cases. Memory, however, is not as cheap, and if you have a large table/collection and you want to do fast queries, then you need an index. The 2-3x storage penalty of string encoding drastically reduces the amount of records you can index while still keeping the index resident in memory.

  • An IPv6 address is 16 bytes if stored as an unsigned integer. (Likely as multiple 4 or 8 byte integers, depending on your platform.)
  • An IPv6 address ranges from 6 bytes to 42 bytes when encoded as a string in abbreviated hex notation.

On the low end, a loop back address (::1) is 3 bytes plus the variable length string overhead. On the high end, an address like 2002:4559:1FE2:1FE2:4559:1FE2:4559:1FE2 uses 39 bytes plus the variable length string overhead.

Unlike with IPv4, it's not safe to assume the average IPv6 string length will be mean of 6 and 42, because the number of addresses with a significant number of consecutive zeroes is a very small fraction of the overall IPv6 address space. Only some special addresses, like loopback and autoconf addresses, are likely to be compressible in this way.

Again, this is a storage penalty of >2x for string encoding versus integer encoding.

Network Math

Do you think routers store IP addresses as strings? Of course they don't.

If you need to do network math on IP addresses, the string representation is a hassle. E.g. if you want to write a query that searches for all addresses on a specific subnet ("return all records with an IP address in 10.7.200.104/27", you can easily do this by masking an integer address with an integer subnet mask. (Mongo doesn't support this particular query, but most RDBMS do.) If you store addresses as strings, then your query will need to convert each row to an integer, then mask it, which is several orders of magnitude slower. (Bitwise masking for an IPv4 address can be done in a few CPU cycles using 2 registers. Converting a string to an integer requires looping over the string.)

Similarly, range queries ("return all records all records between 192.168.1.50 and 192.168.50.100") with integer addresses will be able to use indexes, whereas range queries on string addresses will not.

The Bottom Line

It takes a little bit more work, but not much (there are a million aton() and ntoa() functions out there), but if you're building something serious and solid and you want to future-proof it against future requirements and the possibility of a large dataset, you should store IP addresses as integers, not strings.

If you're doing something quick and dirty and don't mind the possibility of remodeling in the future, then use strings.

For the OP's purpose, if you are optimizing for speed and space and you don't think you want to query it often, then why use a database at all? Just print IP addresses to a file. That would be faster and more storage efficient than storing it in a database (with associated API and storage overhead).


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...