I've been using Python to implement a custom parser and to use the parsed data to format a Word document for internal distribution. All of the formatting has been straightforward so far, but I'm completely stumped on how to insert a checkbox into individual table cells.
I've tried using the Python object functions within python-docx (get_or_add_tcPr(), etc.), which causes MS Word to throw the following error when I try to open the file: "The file xxxx cannot be opened because there are problems with the contents. Details: The file is corrupt and cannot be opened".
After struggling with this for a while, I moved to a second approach: manipulating the word/document.xml file of the output doc directly. I've retrieved what I believe to be the correct XML for a checkbox, saved as replacementXML, and have inserted filler text into the cells to act as a tag that can be searched for and replaced, searchXML. The following runs under Python in a Linux (Fedora 25) environment, but Word displays the same error when I try to open the document; this time, however, the document is recoverable and reverts back to the filler text. I've been able to get this to work with a manually made document and an empty table cell, so I believe this should be possible. NOTE: I've included the whole XML element for the table cell in the searchXML variable, but I've also tried regular expressions and shorter strings rather than only an exact match, since I know the markup could differ from cell to cell.
import os
import re

searchXML = r'<w:tc><w:tcPr><w:tcW w:type="dxa" w:w="4320"/><w:gridSpan w:val="2"/></w:tcPr><w:p><w:pPr><w:jc w:val="right"/></w:pPr><w:r><w:rPr><w:sz w:val="16"/></w:rPr><w:t>IN_CHECKB</w:t></w:r></w:p></w:tc>'

def addCheckboxes():
    # Unpack the .docx so that word/document.xml can be edited as text.
    os.system("mkdir unzipped")
    os.system("unzip tempdoc.docx -d unzipped/")
    with open('unzipped/word/document.xml', encoding="ISO-8859-1") as file:
        filedata = file.read()
    # Replace one placeholder cell per pass so each checkbox gets its own index.
    rep_count = 0
    while re.search(searchXML, filedata):
        filedata = replaceXML(filedata, rep_count)
        rep_count += 1
    with open('unzipped/word/document.xml', 'w') as file:
        file.write(filedata)
    # Repack the edited files and clean up.
    os.system("zip -r ../buildcfg/tempdoc.docx unzipped/*")
    os.system("rm -rf unzipped")

def replaceXML(filedata, rep_count):
    # rep_count is an int, so convert it before concatenating it into the XML.
    n = str(rep_count)
    replacementXML = (
        r'<w:tc><w:tcPr><w:tcW w:w="4320" w:type="dxa"/><w:gridSpan w:val="2"/></w:tcPr>'
        + r'<w:p w:rsidR="00D2569D" w:rsidRDefault="00FD6FDF"><w:pPr><w:jc w:val="right"/></w:pPr>'
        + r'<w:r><w:rPr><w:sz w:val="16"/></w:rPr><w:fldChar w:fldCharType="begin"><w:ffData><w:name w:val="Check1"/>'
        + r'<w:enabled/><w:calcOnExit w:val="0"/><w:checkBox><w:sizeAuto/><w:default w:val="0"/></w:checkBox></w:ffData></w:fldChar></w:r>'
        + r'<w:bookmarkStart w:id="' + n + r'" w:name="Check' + n + r'"/>'
        + r'<w:r><w:rPr><w:sz w:val="16"/></w:rPr><w:instrText xml:space="preserve"> FORMCHECKBOX </w:instrText></w:r>'
        + r'<w:r><w:rPr><w:sz w:val="16"/></w:rPr></w:r>'
        + r'<w:r><w:rPr><w:sz w:val="16"/></w:rPr><w:fldChar w:fldCharType="end"/></w:r>'
        + r'<w:bookmarkEnd w:id="' + n + r'"/></w:p></w:tc>'
    )
    filedata = re.sub(searchXML, replacementXML, filedata, 1)
    return filedata
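One thing that may be worth ruling out with the unzip/zip approach is the repacking step. Running zip -r on unzipped/* from the parent directory stores the entries under an unzipped/ prefix, whereas Word expects word/document.xml at the root of the archive; reading the file as ISO-8859-1 and writing it back with the default encoding could also alter non-ASCII characters. Below is a minimal sketch of the same edit done with the standard-library zipfile module. The helper name rewrite_document_xml and its transform argument are made up for illustration, and it assumes document.xml is UTF-8.

```python
import zipfile


def rewrite_document_xml(src_path, dst_path, transform):
    """Copy a .docx archive, passing word/document.xml through `transform`.

    Hypothetical helper: `transform` takes the XML as a str and returns a str.
    """
    with zipfile.ZipFile(src_path) as src, \
            zipfile.ZipFile(dst_path, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            if item.filename == "word/document.xml":
                # Assumes document.xml is UTF-8; decode and re-encode explicitly.
                data = transform(data.decode("utf-8")).encode("utf-8")
            # Entries keep their original names, so nothing ends up under an
            # extra top-level folder inside the new archive.
            dst.writestr(item.filename, data)
```

It could be called as rewrite_document_xml("tempdoc.docx", "out.docx", fn), where fn is any function that performs the placeholder replacement on the XML string and "out.docx" is a placeholder output path.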
I have a strong feeling that there is a much simpler (and correct!) way of doing this through the python-docx library but for some reason I can't seem to get it right.
Is there a way to easily insert checkbox fields into a table cell in an MS Word doc? And if yes, how would I do that? If no, is there a better approach than manipulating the .xml file?
UPDATE: I've been able to inject XML into the document successfully using python-docx, but the checkbox and the added XML are not appearing.
I've added the following XML into a table cell:
<w:tc>
  <w:tcPr>
    <w:tcW w:type="dxa" w:w="4320"/>
    <w:gridSpan w:val="2"/>
  </w:tcPr>
  <w:p>
    <w:r>
      <w:bookmarkStart w:id="0" w:name="testName">
        <w:complexType w:name="CT_FFCheckBox">
          <w:sequence>
            <w:choice>
              <w:element w:name="size" w:type="CT_HpsMeasure"/>
              <w:element w:name="sizeAuto" w:type="CT_OnOff"/>
            </w:choice>
            <w:element w:name="default" w:type="CT_OnOff" w:minOccurs="0"/>
            <w:element w:name="checked" w:type="CT_OnOff" w:minOccurs="0"/>
          </w:sequence>
        </w:complexType>
      </w:bookmarkStart>
      <w:bookmarkEnd w:id="0" w:name="testName"/>
    </w:r>
  </w:p>
</w:tc>
by using the following python-docx code:
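import docx

# p is the paragraph inside the target cell and n is the bookmark name,
# both defined elsewhere.
run = p.add_run()
tag = run._r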
start = docx.oxml.shared.OxmlElement('w:bookmarkStart')
start.set(docx.oxml.ns.qn('w:id'), '0')
start.set(docx.oxml.ns.qn('w:name'), n)
tag.append(start)
ctype = docx.oxml.OxmlElement('w:complexType')
ctype.set(docx.oxml.ns.qn('w:name'), 'CT_FFCheckBox')
seq = docx.oxml.OxmlElement('w:sequence')
choice = docx.oxml.OxmlElement('w:choice')
el = docx.oxml.OxmlElement('w:element')
el.set(docx.oxml.ns.qn('w:name'), 'size')
el.set(docx.oxml.ns.qn('w:type'), 'CT_HpsMeasure')
el2 = docx.oxml.OxmlElement('w:element')
el2.set(docx.oxml.ns.qn('w:name'), 'sizeAuto')
el2.set(docx.oxml.ns.qn('w:type'), 'CT_OnOff')
choice.append(el)
choice.append(el2)
el3 = docx.oxml.OxmlElement('w:element')
el3.set(docx.oxml.ns.qn('w:name'), 'default')
el3.set(docx.oxml.ns.qn('w:type'), 'CT_OnOff')
el3.set(docx.oxml.ns.qn('w:minOccurs'), '0')
el4 = docx.oxml.OxmlElement('w:element')
el4.set(docx.oxml.ns.qn('w:name'), 'checked')
el4.set(docx.oxml.ns.qn('w:type'), 'CT_OnOff')
el4.set(docx.oxml.ns.qn('w:minOccurs'), '0')
seq.append(choice)
seq.append(el3)
seq.append(el4)
ctype.append(seq)
start.append(ctype)
end = docx.oxml.shared.OxmlElement('w:bookmarkEnd')
end.set(docx.oxml.ns.qn('w:id'), '0')
end.set(docx.oxml.ns.qn('w:name'), n)
tag.append(end)
I can't find the reason why the XML isn't reflected in the output document, but I will update with whatever I find.
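One possible lead: complexType, sequence, choice and element are XML Schema constructs, so the XML in the update describes the CT_FFCheckBox type rather than being document markup, which may be why Word shows nothing for it. The checkbox markup in replacementXML uses field characters instead (fldChar begin carrying ffData/checkBox, an instrText of FORMCHECKBOX, then fldChar end). A minimal sketch that generates that shape as a string and checks it is well formed with the standard library follows. The helper checkbox_paragraph_xml is hypothetical, and whether Word accepts the result in a given document is untested.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace, bound to the usual "w" prefix.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def checkbox_paragraph_xml(index, checked=False):
    """Return a <w:p> string holding one legacy form-field checkbox.

    Hypothetical helper; the element order mirrors replacementXML above.
    """
    name = f"Check{index}"
    default = "1" if checked else "0"
    return (
        f'<w:p xmlns:w="{W_NS}">'
        # The field begins with a fldChar that carries the checkbox settings.
        + '<w:r><w:fldChar w:fldCharType="begin"><w:ffData>'
        + f'<w:name w:val="{name}"/><w:enabled/><w:calcOnExit w:val="0"/>'
        + f'<w:checkBox><w:sizeAuto/><w:default w:val="{default}"/></w:checkBox>'
        + '</w:ffData></w:fldChar></w:r>'
        # A bookmark with a per-checkbox id and name wraps the rest of the field.
        + f'<w:bookmarkStart w:id="{index}" w:name="{name}"/>'
        + '<w:r><w:instrText xml:space="preserve"> FORMCHECKBOX </w:instrText></w:r>'
        + '<w:r><w:fldChar w:fldCharType="end"/></w:r>'
        + f'<w:bookmarkEnd w:id="{index}"/>'
        + '</w:p>'
    )


# Parsing raises ParseError if the generated markup is not well formed.
paragraph = ET.fromstring(checkbox_paragraph_xml(0))
```

With python-docx, my assumption is that the same string could be parsed with docx.oxml.parse_xml and the resulting element's children appended to the cell paragraph's underlying element, instead of building each node with OxmlElement; I have not confirmed this.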