Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
536 views
in Technique[技术] by (71.8m points)

posix - Split JSON array into separate files/objects

I have JSON exported from Cassandra in this format.

[
  {
    "correlationId": "2232845a8556cd3219e46ab8",
    "leg": 0,
    "tag": "received",
    "offset": 263128,
    "len": 30,
    "prev": {
      "page": {
        "file": 0,
        "page": 0
      },
      "record": 0
    },
    "data": "HEAD /healthcheck HTTP/1.1

"
  },
  {
    "correlationId": "2232845a8556cd3219e46ab8",
    "leg": 0,
    "tag": "sent",
    "offset": 262971,
    "len": 157,
    "prev": {
      "page": {
        "file": 10330,
        "page": 6
      },
      "record": 1271
    },
    "data": "HTTP/1.1 200 OK
Date: Wed, 14 Feb 2018 12:57:06 GMT
Server: 
Connection: close
X-CorrelationID: Id-2232845a8556cd3219e46ab8 0
Content-Type: text/xml

"
  }]

I would like to split it to separate documents:

{ "correlationId": "2232845a8556cd3219e46ab8", "leg": 0, "tag": "received", "offset": 263128, "len": 30, "prev": { "page": { "file": 0, "page": 0 }, "record": 0 }, "data": "HEAD /healthcheck HTTP/1.1 " }

and

{ "correlationId": "2232845a8556cd3219e46ab8", "leg": 0, "tag": "sent", "offset": 262971, "len": 157, "prev": { "page": { "file": 10330, "page": 6 }, "record": 1271 }, "data": "HTTP/1.1 200 OK Date: Wed, 14 Feb 2018 12:57:06 GMT Server: Connection: close X-CorrelationID: Id-2232845a8556cd3219e46ab8 0 Content-Type: text/xml " }

I wanted to use jq but didn't find way how.

Can you please advise way, how to split it by the document separator?

Thanks, Reddy

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

Using jq, one can split an array into its components using the filter:

.[]

The question then becomes what is to be done with each component. If you want to direct each component to a separate file, you could (for example) use jq with the -c option, and filter the result into awk, which can then allocate the components to different files. See e.g. Split JSON File Objects Into Multiple Files

Performance considerations

One might think that the overhead of calling jq+awk would be high compared to calling python, but both jq and awk are lightweight compared to python+json, as suggested by these timings (using Python 2.7.10):

time (jq -c  .[] input.json | awk '{print > "doc00" NR ".json";}')
user    0m0.005s
sys     0m0.008s

time python split.py
user    0m0.016s
sys     0m0.046s

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...