Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

c++ - Hash a line from a file MD5 in Rcpp

I need to open a file, read a long line, produce a hash (MD5 is fine), and write that hash to a file. The file has millions of lines, and hashing is ideally suited to C++.

Opening the files, reading line by line, and writing to a file are done, but the hash function is not.

I have found some hashing code, but shoehorning it into Rcpp just doesn't seem to work. I have also called the R function from within Rcpp, but for some reason the result is truncated, creating a lot of collisions. The result also changes each time it is run, whereas hashing the same string directly in R gives the same, full-length result every time.

Has anyone successfully implemented a line-by-line MD5 (or SHA, it doesn't matter) hash with Rcpp?

question from:https://stackoverflow.com/questions/65838609/hash-a-line-from-a-file-md5-in-rcpp

1 Answer

Here is a complete answer, relying on two other StackOverflow answers:

  • one for one of a bazillion possible ways to read a file line by line and write the result out
  • and one for the somewhat more interesting part of getting md5 from Boost so that we don't have to link against anything (BH is header-only).

This now works with one simple sourceCpp() call, provided you have the CRAN package BH installed.
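
The line-by-line part is independent of the hash itself. As a minimal sketch using only the standard library, the loop can be written against generic streams with the per-line function passed in. The names `transform_lines` and `transform_text` are hypothetical helpers introduced here for illustration, not part of Rcpp or Boost.

```cpp
#include <cstddef>
#include <functional>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper: reads `in` line by line, applies `fn` to each line,
// and writes one result per line to `out`. Returns the number of lines read.
std::size_t transform_lines(std::istream& in, std::ostream& out,
                            const std::function<std::string(const std::string&)>& fn) {
  std::size_t n = 0;
  std::string line;
  // std::getline strips the trailing '\n'; a '\r' from a CRLF file is kept
  // and would become part of whatever `fn` hashes.
  while (std::getline(in, line)) {
    out << fn(line) << '\n';
    ++n;
  }
  return n;
}

// Hypothetical convenience wrapper for trying the loop on in-memory text.
std::string transform_text(const std::string& text,
                           const std::function<std::string(const std::string&)>& fn) {
  std::istringstream in(text);
  std::ostringstream out;
  transform_lines(in, out, fn);
  return out.str();
}
```

Passing a `std::ifstream` and a `std::ofstream` gives the file-to-file behaviour, and a function that returns the hex MD5 of its argument can be plugged in as `fn`.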

Demo

> sourceCpp("~/git/stackoverflow/65838609/answer.cpp")
> hasher("/home/edd/git/stackoverflow/65838609/answer.cpp", "/tmp/md5.txt")
[1] TRUE
> 

Code

#define BOOST_NO_AUTO_PTR

#include <Rcpp.h>
#include <fstream>
#include <iterator>   // std::back_inserter
#include <string>
#include <boost/uuid/detail/md5.hpp>
#include <boost/algorithm/hex.hpp>

// [[Rcpp::depends(BH)]]

using boost::uuids::detail::md5;

// [[Rcpp::export]]
bool hasher(const std::string& infilename, const std::string& outfilename) {

  // cf https://stackoverflow.com/questions/48545330/c-fastest-way-to-read-file-line-by-line
  //    https://stackoverflow.com/questions/55070320/how-to-calculate-md5-of-a-file-using-boost
  std::ifstream infile(infilename);
  std::ofstream outfile(outfilename);
  std::string line;
  while (std::getline(infile, line)) {
    // line contains the current line
    md5 hash;
    md5::digest_type digest;
    hash.process_bytes(line.data(), line.size());
    hash.get_digest(digest);
    const auto charDigest = reinterpret_cast<const char *>(&digest);
    std::string res;
    boost::algorithm::hex(charDigest, charDigest + sizeof(md5::digest_type), std::back_inserter(res));
    outfile << res << '\n';
  }
  outfile.close();
  infile.close();
  return true;
}
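
The hex step is what turns the 16 raw digest bytes into the 32-character string written on each line. As a rough stand-alone sketch of that conversion, with `to_hex` being a hypothetical helper name and not a Boost function:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: uppercase hex encoding of `n` raw bytes,
// two characters per byte.
std::string to_hex(const unsigned char* data, std::size_t n) {
  static const char digits[] = "0123456789ABCDEF";
  std::string out;
  out.reserve(2 * n);
  for (std::size_t i = 0; i < n; ++i) {
    out.push_back(digits[data[i] >> 4]);    // high nibble
    out.push_back(digits[data[i] & 0x0F]);  // low nibble
  }
  return out;
}
```

Like `boost::algorithm::hex`, this sketch emits uppercase digits, so a comparison against lowercase hashes produced elsewhere should ignore case.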
