Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
159 views
in Technique[技术] by (71.8m points)

excel - How to extract dates from string?

I am trying to extract dates from string by excel formula only. Below is my data sample

Only contains one date 01/05/2021 to this example
Project start date is 01/01/2021 and end date is 31/01/2021
There may multiple date like 1st 01/01/2021 2nd 01/06/2021 and 3rd 31/12/2021

enter image description here

My computer local date format is dd/mm/yyyy. I was trying to accomplish it by FILTERXML() formula. I have tried below formula, also tried few other methods but fails.

=TRANSPOSE(FILTERXML("<t><s>"&SUBSTITUTE(A1," ","</s><s>")&"</s></t>","//s[translate(.,'dd/mm/yyyy','')!=.]"))

My expected output looks like-

enter image description here

question from:https://stackoverflow.com/questions/65952087/how-to-extract-dates-from-string

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

If you want to go purely xpath then you could try to fully validate your pattern dd/mm/yyyy in a few steps:1

=TRANSPOSE(TEXT(FILTERXML("<t><s>"&SUBSTITUTE(A1," ","</s><s>")&"</s></t>","//s[substring(., 3, 1)= '/'][substring(., 6, 1)= '/'][string-length(translate(., '/' , '')) = 8][translate(., '/' , '')*0=0]"),"dd/mm/e"))
  • "<t><s>"&SUBSTITUTE(A1," ","</s><s>")&"</s></t>" - Create a valid XML-construct.
  • //s - Select s-nodes where:
    • [substring(., 3, 1)= '/'] - There is a forward slash at the 3rd index;
    • [substring(., 6, 1)= '/'] - There is a forward slash at the 6th index;
    • [string-length(translate(., '/' , '')) = 8] - The remainder of the node when we replace the forward slashes is of length eight.
    • [translate(., '/' , '')*0=0] - The remainder of the node when we replace the forward slashes is numeric.

Needless to say that if your string does not hold any other forward slashes but those in the dates you can simplify the above significantly1:

=TRANSPOSE(TEXT(FILTERXML("<t><s>"&SUBSTITUTE(A1," ","</s><s>")&"</s></t>","//s[contains(., '/')]"),"dd/mm/e"))

enter image description here

Notice that if "dd/mm/yyyy" is recognized by Excel as dates, the returned array of simply using //s would return the numeric equivalent of these dates. If no other numeric values exist in your string you could benefit from that using Microsoft365 functionality1:

=LET(X,FILTERXML("<t><s>"&SUBSTITUTE(A1," ","</s><s>")&"</s></t>","//s"),TRANSPOSE(TEXT(FILTER(X,ISNUMBER(X)),"dd/mm/e")))

1: Note that you can remove the nested TEXT() function and numberformat your cells to dd/mm/e too.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...