Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


Silence out regions of audio based on a list of timestamps, using sox and python

I have an audio file and a list of [start, end] timestamp segments.

WHAT I WANT TO ACHIEVE: Say the audio is 6:00 minutes long.
The segments I have are: [[0.0,4.0], [8.0,12.0], [16.0,20.0], [24.0,28.0]]

After I pass these two to sox + python, the output should be audio that is still 6 minutes long but contains sound only during the times covered by the segments.

In other words, I want to pass the timestamps and the original audio to sox + python and get back audio in which everything is silenced except the portions corresponding to the passed segments.

I couldn't achieve the above, but after days of googling I came somewhat close to the opposite. This is what I have:

UPDATED, MORE CONCISE CODE + EXAMPLE:
A sox command template that takes a padding pattern and a trimming pattern:

SOX__SILENCE = 'sox "{inputaudio}" -c 1 "{outputaudio}" {padding}{trimming}'

Random segments for testing:

# random segments:
A = [[0.0, 16.0]]
b = [[1.0, 2.0]]
z = [[1.6, 8.3], [13.2, 33.7], [35.0, 38.0], [42.0, 51.0], [70.2, 73.7], [90.0, 99.2], [123.0, 131.1]]
q = [[0.0, 4.0], [8.0, 12.0], [16.0, 20.0], [24.0, 28.0]]

A small python script to generate padding and trimming.

PADDING:

def get_pad_pattern_from_timestamps(my_segments):
    # Build a sox "pad" pattern with one "duration@start" entry per segment.
    padding = 'pad'
    for segment in my_segments:
        duration = str(segment[1] - segment[0])
        padding = padding + ' ' + duration + '@' + str(segment[0])
    return padding

print(get_pad_pattern_from_timestamps(A))
print(get_pad_pattern_from_timestamps(b))
print(get_pad_pattern_from_timestamps(z))
print(get_pad_pattern_from_timestamps(q))

OUTPUT from the above:

pad 16.0@0.0
pad 1.0@1.0
pad 6.7@1.6 20.5@13.2 3.0@35.0 9.0@42.0 3.5@70.2 9.2@90.0 8.1@123.0
pad 4.0@0.0 4.0@8.0 4.0@16.0 4.0@24.0 4.0@32.0 4.0@40.0

TRIMMING:

def get_trimm_pattern_from_timestamps(my_segments):
    # Build one "trim 0 <start> 0 <duration> <duration>" group per segment.
    trimming = ''
    for segment in my_segments:
        duration = str(segment[1] - segment[0])
        trimming = trimming + ' trim 0 ' + str(segment[0]) + ' 0 ' + duration + ' ' + duration
    return trimming

print(get_trimm_pattern_from_timestamps(A))
print(get_trimm_pattern_from_timestamps(b))
print("\n")
print(get_trimm_pattern_from_timestamps(z))
print("\n")
print(get_trimm_pattern_from_timestamps(q))
print("\n")

OUTPUT FROM TRIMMING:

trim 0 0.0 0 16.0 16.0
 trim 0 1.0 0 1.0 1.0


 trim 0 1.6 0 6.7 6.7 trim 0 13.2 0 20.5 20.5 trim 0 35.0 0 3.0 3.0 trim 0 42.0 0 9.0 9.0 trim 0 70.2 0 3.5 3.5 trim 0 90.0 0 9.2 9.2 trim 0 123.0 0 8.1 8.1


 trim 0 0.0 0 4.0 4.0 trim 0 8.0 0 4.0 4.0 trim 0 16.0 0 4.0 4.0 trim 0 24.0 0 4.0 4.0 trim 0 32.0 0 4.0 4.0 trim 0 40.0 0 4.0 4.0

RUNNING SOX using the above outputs from a terminal:

Padding:  

    sox dinners.mp3 -c 1 testlongpad.mp3 pad 4.0@0.0 4.0@8.0 4.0@16.0 4.0@24.0

Trimming:  

    sox dinners.mp3 -c 1 testrim.mp3 trim 0 0.0 0 16.0 16.0

Pad and trim:

    sox dinners.mp3 -c 1 testlongpadtrim.mp3 pad 4.0@0.0 4.0@8.0 4.0@16.0 4.0@24.0 trim 0 0.0 0 4.0 4.0 trim 0 8.0 0 4.0 4.0 trim 0 16.0 0 4.0 4.0 trim 0 24.0 0 4.0 4.0
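
For reference, here is a minimal sketch of how the pad-and-trim command could be assembled from the SOX__SILENCE template and the two helper functions. The name `build_sox_command` is hypothetical (it is not part of the original script), and the snippet only builds and prints the command; actually running it assumes sox is installed and on the PATH.

```python
import shlex

# Template and helpers restated from the question so the snippet is self-contained.
SOX__SILENCE = 'sox "{inputaudio}" -c 1 "{outputaudio}" {padding}{trimming}'


def get_pad_pattern_from_timestamps(my_segments):
    padding = 'pad'
    for start, end in my_segments:
        padding += ' ' + str(end - start) + '@' + str(start)
    return padding


def get_trimm_pattern_from_timestamps(my_segments):
    trimming = ''
    for start, end in my_segments:
        duration = str(end - start)
        trimming += ' trim 0 ' + str(start) + ' 0 ' + duration + ' ' + duration
    return trimming


def build_sox_command(inputaudio, outputaudio, my_segments):
    # Hypothetical helper: fills the template with both generated patterns.
    return SOX__SILENCE.format(
        inputaudio=inputaudio,
        outputaudio=outputaudio,
        padding=get_pad_pattern_from_timestamps(my_segments),
        trimming=get_trimm_pattern_from_timestamps(my_segments),
    )


q = [[0.0, 4.0], [8.0, 12.0], [16.0, 20.0], [24.0, 28.0]]
command = build_sox_command("dinners.mp3", "testlongpadtrim.mp3", q)
print(command)

# Split into an argument list so it can be run without a shell, e.g.
#   import subprocess; subprocess.run(args, check=True)
args = shlex.split(command)
```

Splitting with shlex keeps the quoted file names intact as single arguments, which matters if a path contains spaces.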

If S is my list of segments, then NS is everything else. With the approach above, I am passing NS, and NS gets removed from the audio.

What I want to achieve is still the same, but the other way around: I want to pass S so that only the portions of audio corresponding to S are retained.

PS: My question is very specific. I am new to audio processing and unsure how to proceed, so kindly don't close the question as too broad. I'd be happy to provide more details for clarification. Lastly, this is not a homework question; it is for a personal project.

Sample Audio : https://www.dropbox.com/s/1p27nfwney42ka2/LAZY_SALON_-03-_Hot_Dinners.mp3?dl=0

Sample segments [[start, end], ...]: [[1.6, 8.3], [13.2, 33.7], [35.0,38.0], [42.0,51.0], [70.2,73.7], [90.0,99.2], [123.0,131.1]]

So when these timestamps are passed to sox/python together with the audio, everything in the audio except the portions in the supplied segments should be silenced out.
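
To make the desired result concrete, here is a rough sketch that does the silencing in pure Python with the standard-library `wave` module instead of sox. This is a different technique from the sox commands above, and it rests on assumptions: the input is an uncompressed PCM WAV file (an mp3 would have to be converted first), the whole file fits in memory, and `silence_outside_segments` is a name made up for this example.

```python
import wave


def silence_outside_segments(in_path, out_path, segments):
    """Write a copy of a PCM WAV file that is silent outside `segments`.

    `segments` is a list of [start, end] pairs in seconds. The output has
    the same length and format as the input.
    """
    with wave.open(in_path, "rb") as src:
        params = src.getparams()
        frames = src.readframes(src.getnframes())

    frame_size = params.sampwidth * params.nchannels
    n_frames = len(frames) // frame_size

    # 8-bit WAV samples are unsigned (silence is 128); wider samples are signed (silence is 0).
    silent_byte = 0x80 if params.sampwidth == 1 else 0x00
    out = bytearray([silent_byte]) * len(frames)

    # Copy back only the frames that fall inside a requested segment.
    for start, end in segments:
        first = max(0, int(round(start * params.framerate)))
        last = min(n_frames, int(round(end * params.framerate)))
        if last > first:
            lo, hi = first * frame_size, last * frame_size
            out[lo:hi] = frames[lo:hi]

    with wave.open(out_path, "wb") as dst:
        dst.setparams(params)
        dst.writeframes(bytes(out))
```

A call such as `silence_outside_segments("dinners.wav", "dinners_silenced.wav", [[1.6, 8.3], [13.2, 33.7]])` would keep only those two stretches audible; the file names here are placeholders.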


1 Answer


I was able to implement this with a workaround.

See: create new list from list of lists in python by grouping

I created a new list containing the regions between the segments and passed that to sox. At the moment, whatever I pass to sox gets removed, so I calculated the regions to be removed and passed those instead. It worked pretty well.
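
The inversion step can be sketched roughly as follows. This is not the exact code I used; `invert_segments` and `total_duration` are names chosen for illustration, and it assumes every segment lies within the length of the audio.

```python
def invert_segments(segments, total_duration):
    """Return the [start, end] regions of [0, total_duration] not covered by `segments`."""
    gaps = []
    cursor = 0.0  # end of the region covered so far
    for start, end in sorted(segments):
        if start > cursor:
            gaps.append([cursor, start])
        cursor = max(cursor, end)  # max() also copes with overlapping segments
    if cursor < total_duration:
        gaps.append([cursor, total_duration])
    return gaps


# Segments to keep in a 6-minute (360 s) file:
keep = [[0.0, 4.0], [8.0, 12.0], [16.0, 20.0], [24.0, 28.0]]
print(invert_segments(keep, 360.0))
# [[4.0, 8.0], [12.0, 16.0], [20.0, 24.0], [28.0, 360.0]]
```

The resulting list is what then gets fed to the pad and trim pattern generators in place of the original segments.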

The solution is still inverted, but I don't have to change anything in the sox command.

I won't accept my own answer. I am hoping someone can come up with a solution that modifies the sox commands rather than recalculating the segments like I did.

