I'm testing subprocesses pipelines with python. I'm aware that I can do what the programs below do in python directly, but that's not the point. I just want to test the pipeline so I know how to use it.
My system is Linux Ubuntu 9.04 with default python 2.6.
I started with this documentation example.
from subprocess import Popen, PIPE
p1 = Popen(["grep", "-v", "not"], stdout=PIPE)
p2 = Popen(["cut", "-c", "1-10"], stdin=p1.stdout, stdout=PIPE)
output = p2.communicate()[0]
print output
That works, but since p1
's stdin
is not being redirected, I have to type stuff in the terminal to feed the pipe. When I type ^D
closing stdin, I get the output I want.
However, I want to send data to the pipe using a python string variable. First I tried writing on stdin:
p1 = Popen(["grep", "-v", "not"], stdin=PIPE, stdout=PIPE)
p2 = Popen(["cut", "-c", "1-10"], stdin=p1.stdout, stdout=PIPE)
p1.stdin.write('test
')
output = p2.communicate()[0] # blocks forever here
Didn't work. I tried using p2.stdout.read()
instead on last line, but it also blocks. I added p1.stdin.flush()
and p1.stdin.close()
but it didn't work either. I Then I moved to communicate:
p1 = Popen(["grep", "-v", "not"], stdin=PIPE, stdout=PIPE)
p2 = Popen(["cut", "-c", "1-10"], stdin=p1.stdout, stdout=PIPE)
p1.communicate('test
') # blocks forever here
output = p2.communicate()[0]
So that's still not it.
I noticed that running a single process (like p1
above, removing p2
) works perfectly. And passing a file handle to p1
(stdin=open(...)
) also works. So the problem is:
Is it possible to pass data to a pipeline of 2 or more subprocesses in python, without blocking? Why not?
I'm aware I could run a shell and run the pipeline in the shell, but that's not what I want.
UPDATE 1: Following Aaron Digulla's hint below I'm now trying to use threads to make it work.
First I've tried running p1.communicate on a thread.
p1 = Popen(["grep", "-v", "not"], stdin=PIPE, stdout=PIPE)
p2 = Popen(["cut", "-c", "1-10"], stdin=p1.stdout, stdout=PIPE)
t = threading.Thread(target=p1.communicate, args=('some data
',))
t.start()
output = p2.communicate()[0] # blocks forever here
Okay, didn't work. Tried other combinations like changing it to .write()
and also p2.read()
. Nothing. Now let's try the opposite approach:
def get_output(subp):
output = subp.communicate()[0] # blocks on thread
print 'GOT:', output
p1 = Popen(["grep", "-v", "not"], stdin=PIPE, stdout=PIPE)
p2 = Popen(["cut", "-c", "1-10"], stdin=p1.stdout, stdout=PIPE)
t = threading.Thread(target=get_output, args=(p2,))
t.start()
p1.communicate('data
') # blocks here.
t.join()
code ends up blocking somewhere. Either in the spawned thread, or in the main thread, or both. So it didn't work. If you know how to make it work it would make easier if you can provide working code. I'm trying here.
UPDATE 2
Paul Du Bois answered below with some information, so I did more tests.
I've read entire subprocess.py
module and got how it works. So I tried applying exactly that to code.
I'm on linux, but since I was testing with threads, my first approach was to replicate the exact windows threading code seen on subprocess.py
's communicate()
method, but for two processes instead of one. Here's the entire listing of what I tried:
import os
from subprocess import Popen, PIPE
import threading
def get_output(fobj, buffer):
while True:
chunk = fobj.read() # BLOCKS HERE
if not chunk:
break
buffer.append(chunk)
p1 = Popen(["grep", "-v", "not"], stdin=PIPE, stdout=PIPE)
p2 = Popen(["cut", "-c", "1-10"], stdin=p1.stdout, stdout=PIPE)
b = [] # create a buffer
t = threading.Thread(target=get_output, args=(p2.stdout, b))
t.start() # start reading thread
for x in xrange(100000):
p1.stdin.write('hello world
') # write data
p1.stdin.flush()
p1.stdin.close() # close input...
t.join()
Well. It didn't work. Even after p1.stdin.close()
was called, p2.stdout.read()
still blocks.
Then I tried the posix code on subprocess.py
:
import os
from subprocess import Popen, PIPE
import select
p1 = Popen(["grep", "-v", "not"], stdin=PIPE, stdout=PIPE)
p2 = Popen(["cut", "-c", "1-10"], stdin=p1.stdout, stdout=PIPE)
numwrites = 100000
to_read = [p2.stdout]
to_write = [p1.stdin]
b = [] # create buffer
while to_read or to_write:
read_now, write_now, xlist = select.select(to_read, to_write, [])
if read_now:
data = os.read(p2.stdout.fileno(), 1024)
if not data:
p2.stdout.close()
to_read = []
else:
b.append(data)
if write_now:
if numwrites > 0:
numwrites -= 1
p1.stdin.write('hello world!
'); p1.stdin.flush()
else:
p1.stdin.close()
to_write = []
print b
Also blocks on select.select()
. By spreading print
s around, I found out this:
- Reading is working. Code reads many times during execution.
- Writing is also working. Data is written to
p1.stdin
.
- At the end of
numwrites
, p1.stdin.close()
is called.
- When
select()
starts blocking, only to_read
has something, p2.stdout
. to_write
is already empty.
os.read()
call always returns something, so p2.stdout.close()
is never called.
Conclusion from both tests: Closing the stdin
of the first process on the pipeline (grep
in the example) is not making it dump its buffered output to the next and die.
No way to make it work?
PS: I don't want to use a temporary file, I've already tested with files and I know it works. And I don't want to use windows.
See Question&Answers more detail:
os