When using openpyxl, there are times that I can only refer to columns in Excel worksheets via their column name, not their numerical index. It took me quite a bit of tinkering with failed solutions until I came up with a general approach that works.
I'd like to archive this information for anyone else who needs it.
Solution
Input: Column number (with 1 = A, 2 = B, ...)
Output: Column name in excel
__alphabet = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z']
def convertColumnNumToLetter(column):
converted = ""
#figure out the width of the converted text
columnCount = 1
base = len(__alphabet)
base_exponent = base
exponent = 1
while column > base_exponent:
column = column - base_exponent
exponent = exponent + 1
columnCount = columnCount + 1
base_exponent = base_exponent * base
#calculate the actual column name
column = column - 1
while len(converted) < columnCount:
digit = column % base
column = (column - digit) // base
converted = __alphabet[digit] + converted
return converted
The above calculates the column name by first figuring out the width of the column name, then using a standard base-n calculation with A=0, B=1, ..., Z=25 (forcing padding with 0s up to the length of the column name).
Figuring out the width of the column name:
1 - Width columns (Textual name, index)
A - 0
B - 1
C - 2
...
Z - 25 Note: these can be calculated with a standard base-26 calculation
2 - Width columns (Textual name, index)
AA - 26 = 26 + 0 Note: The 26 here represents the 26 A -> Z columns we just passed
AB - 27 = 26 + 1
AC - 28 = 26 + 2
...
AZ - 51 = 26 + 25
BA - 52 = 26 + 26 + 0 Note: The second 26 represents the AA -> AZ columns we just passed
BB - 53 = 26 + 26 + 1
...
BZ - 77 = 26 + 26 + 25
...
ZZ - ? = 26 + 26 * 25 + 25 Note: The 26 times 25 are all the AA -> BZ columns we just passed
Note: Ignoring the first 26 in each sum for the 2 - Width columns, we can see that the 2 - Width columns are just base-26 values (padded to width-2).
Conclusion
As it turns out, this pattern continues. The above solution, therefore, works based off the observation that the columns of width 1 are < 26, those of width 2 are < 26 + 26^2, those of width 3 are < 26 + 26^2 + 26^3.
If we calculate the column-width, we've noted that we can just calculate the base-26 column name (with A=0, B=1, ..., Z=25), pad the values to the appropriate width, and we receive the correct column name.
For the interested party, here is the inverse function (calculating the column number from the excel column name):
Solution
Input: Column name in excel
Output: Column number (with 1 = A, 2 = B, ...)
def convertLetterToColumnNum(columnName):
converted = 0
columnCount = len(columnName)
base = len(__alphabet)
base_exponent = 1
while len(columnName) != 0:
#strip the right-most digit, convert to index
digit = columnName[-1:]
columnName = columnName[:-1]
digit = __alphabet.index(digit)
#add the value it represents to the total, increment base_exponent
converted = converted + digit * base_exponent
base_exponent = base_exponent * base
#add the offset for having passed all the n-width columns
if len(columnName) != 0:
converted = converted + base_exponent
return converted + 1
Seeing as these are rather quick algorithms, I'd be interested to see if anyone knew of a solution that was demonstrably faster (when calculating a ton of these in the 3-width range, you might need such a thing).
P.S.
It should be noted that these algorithms work for any alphabet that uses the excel algorithm of naming, notably that:
- The alphabet is strictly ordered (A < B < C < D ... < Z for example)
- Columns are named in a gear fashion where the first length-of-alphabet values use 1 gear, the next length-of-alphabet^2 use 2 gears and so on
- Columns are named in order where the n gears all start on the least element and continue upwards
See Question&Answers more detail:
os