Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python-docx add_style with CTL (Complex text layout) language

What I’m trying to accomplish:

  • Create a paragraph style in python-docx with a user-defined Persian font and size (Persian is a CTL language)

Problem:

  • I can do this with non-CTL languages like English:

    from docx import Document
    from docx.enum.style import WD_STYLE_TYPE
    from docx.shared import Pt
    
    user_font_name = 'FreeMono'
    user_font_size = 14
    
    doc = Document()
    my_style = doc.styles.add_style('style_name',WD_STYLE_TYPE.PARAGRAPH)
    my_font = my_style.font
    my_font.name = user_font_name
    my_font.size = Pt(user_font_size)
    p = doc.add_paragraph('some text',my_style)
    
    # persian_p = doc.add_paragraph('?????',my_style)
    # FreeMono supports Persian language so the problem is not the font
    
    doc.save('file.docx')
    
  • However, if I change the text to Persian, its font doesn’t change to the specified font.

Why this happens:

  • My specified font only changes the Western font family of the style and does nothing to the CTL font family

How I know this:

  • If I open the docx file with LibreOffice, open the style and go to the font section, I can see that my specified font and size are there under “Western Text Font Family” but not under “CTL Font Family”. As a result, my CTL text falls back to the default font.

Additional info:

  1. I’m using LibreOffice on Linux
  2. Changing the default style doesn’t do me any good in this situation because I want the user to specify font name and size.
  3. I have no experience in changing xml files (let alone docx xml files)
  4. python-docx version is 0.8.6


1 Answer


After many hours of poking around the docx file I realized, much to my horror, that the answer lay in the document’s styles.xml file. Here’s a way to fix it for people with similar problems:

Problems with Text Direction:

  • If you’ve ever typed in Arabic or Persian, you may have seen that right-aligning the text doesn’t fix everything. If the text direction itself isn’t changed, the cursor and punctuation marks stay at the far right of the line (instead of following the last letter), and there is no right-justify when you need it. I couldn’t change the text direction in python-docx, even by changing the “textDirection” value in document.xml from ‘lrTb’ (Left-Right/Top-Bottom) to ‘rlTb’, so I created a document in LibreOffice and changed its default paragraph style (‘Normal’) to what I had in mind (RTL text direction, etc.). This also saves time later, because none of it has to be done in Python.

Xml explanation of the font changing problem:

  • The document with the altered default style shows a few differences in its styles.xml file. In the Normal paragraph style, under "w:rPr", there is an additional "w:szCs" element that sets the size of the complex-script font (which you can’t change through style.font.size), and in "w:rFonts" the value of "cs" is now my specified Persian font. Also, the “bidi” attribute of "w:lang" is now “fa-IR” (for Persian). Here’s the XML in question:

    <w:rPr>
    <w:rFonts w:ascii="FreeMono" w:hAnsi="FreeMono" w:cs="FreeFarsi"/>
    <w:sz w:val="40"/>
    <w:rtl/>
    <w:cs/>
    <w:szCs w:val="40"/>
    <w:lang w:val="en-US" w:bidi="fa-IR"/>
    </w:rPr>
    
  • Changing style.font.size only changes the "sz" value (Western font size) and does nothing to the "szCs" value (complex-script font size). Similarly, style.font.name only changes the "ascii" and "hAnsi" values of "w:rFonts" and does nothing to the "cs" value. So to change these values I had to edit the style's XML elements from Python.
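
To make the roles of these attributes concrete, the fragment above can be read with the standard library alone. This is only an illustrative sketch: the helper names `w` and `describe_rpr` are made up for this example, and the namespace URI is the standard WordprocessingML main namespace that the "w:" prefix stands for.

```python
import xml.etree.ElementTree as ET

# Standard WordprocessingML main namespace (what the "w:" prefix maps to).
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

RPR_XML = (
    f'<w:rPr xmlns:w="{W_NS}">'
    '<w:rFonts w:ascii="FreeMono" w:hAnsi="FreeMono" w:cs="FreeFarsi"/>'
    '<w:sz w:val="40"/><w:rtl/><w:cs/><w:szCs w:val="40"/>'
    '<w:lang w:val="en-US" w:bidi="fa-IR"/>'
    '</w:rPr>'
)


def w(name):
    """Build the '{namespace}name' form that ElementTree uses for w:name."""
    return "{" + W_NS + "}" + name


def describe_rpr(xml_text):
    """Return the Western and complex-script settings found in a <w:rPr>."""
    rpr = ET.fromstring(xml_text)
    rfonts = rpr.find(w("rFonts"))
    return {
        "western_font": rfonts.get(w("ascii")),
        "cs_font": rfonts.get(w("cs")),
        # sz and szCs are stored in half-points, so divide by 2 for points
        "western_pt": int(rpr.find(w("sz")).get(w("val"))) / 2,
        "cs_pt": int(rpr.find(w("szCs")).get(w("val"))) / 2,
        "bidi_lang": rpr.find(w("lang")).get(w("bidi")),
    }


info = describe_rpr(RPR_XML)
print(info)
```

Here both sizes are 20 pt because "sz" and "szCs" are both "40". The Western and complex-script values are stored independently, which is why setting one through python-docx leaves the other untouched.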

Solution:

from docx import Document
from docx.shared import Pt

#path to doc with altered style:
base_doc_location = 'base.docx'
doc = Document(base_doc_location)
my_style = doc.styles['Normal']

# define your desired fonts
user_cs_font_size = 16
user_cs_font_name = 'FreeFarsi'
user_en_font_size = 12
user_en_font_name = 'FreeMono'

# get <w:rPr> element of this style
rpr = my_style.element.rPr

#==================================================
'''This probably isn't necessary if you already
have a document with altered style, but just to be
safe I'm going to add this here'''

if rpr.rFonts is None:
    rpr._add_rFonts()
if rpr.sz is None:
    rpr._add_sz()
#==================================================

'''Get the namespace url for the "w" prefix from rpr.
"w:" is the prefix at the start of element and
attribute names in the xml, like these:
    <w:rPr>
    <w:rFonts>
    w:val

rpr.nsmap maps each prefix to a namespace url:
http://schemas.openxmlformats.org/...

In lxml, w:rPr translates to:
{namespace url}rPr

So I made the w_nsmap string like this:'''

w_nsmap = '{'+rpr.nsmap['w']+'}'
#==================================================

'''Because I didn't find any better ways to get an
element based on its tag here's a not so great way
of getting it:
'''
szCs = None
lang = None

for element in rpr:
    if element.tag == w_nsmap + 'szCs':
        szCs = element
    elif element.tag == w_nsmap + 'lang':
        lang = element

'''if there is a szCs and lang element in your style
those variables will be assigned to it, and if not
we make those elements and add them to rpr'''

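# Only append elements that were just created; appending an
# element that already exists would move it to the end of rpr.
if szCs is None:
    szCs = rpr.makeelement(w_nsmap+'szCs',nsmap=rpr.nsmap)
    rpr.append(szCs)
if lang is None:
    lang = rpr.makeelement(w_nsmap+'lang',nsmap =rpr.nsmap)
    rpr.append(lang)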
#==================================================

'''Now to set our desired values to these elements
we have to get attrib dictionary of these elements
and set the name of value as key and our value as
value for that dict'''

szCs_attrib = szCs.attrib
lang_attrib = lang.attrib
rFonts_atr = rpr.rFonts.attrib

'''sz and szCs hold the font size in half-points, as
a string, so for an 11 pt font you have to set sz (for
western fonts) or szCs (for CTL fonts) to "22" '''
szCs_attrib[w_nsmap+'val'] = str(int(user_cs_font_size*2))

'''Now to change cs font and bidi lang values'''
rFonts_atr[w_nsmap+'cs'] = user_cs_font_name
lang_attrib[w_nsmap+'bidi'] = 'fa-IR' # For Persian
#==================================================

'''Because we changed default style we don't even
need to set style every time we add a new paragraph
And if you change font name or size the normal way
it won't change these cs values so you can have a
font for CTL language and a different font for
western language
'''
persian_p = doc.add_paragraph('?????')
en_font = my_style.font
en_font.name = user_en_font_name
en_font.size = Pt(user_en_font_size)
english_p = doc.add_paragraph('some text')

doc.save('ex.docx')

Edit (code improvement):
I commented out the lines that could be improved and put the better versions underneath them.

#rpr = my_style.element.rPr # If None it'll throw errors later
rpr = my_style.element.get_or_add_rPr() # this avoids potential errors
#if rpr.rFonts is None:
#    rpr._add_rFonts()
rFonts = rpr.get_or_add_rFonts()
#if rpr.sz is None:
#    rpr._add_sz()
rpr.get_or_add_sz()

#by importing these you can make elements and set values quicker
from docx.oxml.shared import OxmlElement, qn
#szCs = rpr.makeelement(w_nsmap+'szCs',nsmap=rpr.nsmap)
szCs = OxmlElement('w:szCs')
#lang = rpr.makeelement(w_nsmap+'lang',nsmap =rpr.nsmap)
lang = OxmlElement('w:lang')

#szCs_attrib = szCs.attrib
#lang_attrib = lang.attrib
#rFonts_atr = rpr.rFonts.attrib
#szCs_attrib[w_nsmap+'val'] =str(int(user_cs_font_size*2))
#rFonts_atr[w_nsmap+'cs'] = user_cs_font_name
#lang_attrib[w_nsmap+'bidi'] = 'fa-IR'

szCs.set(qn('w:val'),str(int(user_cs_font_size*2)))
lang.set(qn('w:bidi'),'fa-IR')
rFonts.set(qn('w:cs'),user_cs_font_name)
