Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
1.6k views
in Technique[技术] by (71.8m points)

vba - Removing a string that includes CRLF characters from body of e-mail

I am trying to remove a string from selected incoming MS Outlook (2016) e-mail.

The string is two sentences in German language. I use the Replace() function. This principally works. (See my full procedure below.)

The two sentences are sometimes separated by CRLF (Carriage Return, Line Feed) characters, and these are not always at the same place. This seems to be the result of these e-mails passing through various devices before they land in my Outlook inbox.

First address the simpler part of the problem

Before addressing the issue of the CRLF appearing on varying places, I want to create a procedure that deals with strings with CRLF at fixed positions.

How the source code of such a string would look:
enter image description here

(Screen shot history: I saved the e-mail as .html on my harddisk, then opened the .html file in Notepad++, to see the CRLF characters.)

The html tags are not that relevant for me. They can remain in the e-mail. (In fact, the formatting tags vary, too, so it is better to not start tackling them at all.) My only concern is to remove the visible part, i.e. the text "Diese E-Mail kommt... vertrauenswürdig halten".

I tried to catch text with line breaks by including the CR LF part as Chr():

strDelete01 = "Diese E-Mail kommt von Personen" & Chr(13) & Chr(10) & "au?erhalb der Stadtverwaltung. Klicken Sie nur auf Links oder Dateianhnge," & Chr(13) & Chr(10) & "wenn Sie die Personenn für vertrauenswürdig halten." 

My procedure does not recognize the string, and accordingly does nothing.

My script so far

Public Sub EditBodyCgReplace()

'Declarations
   Dim obj As Object
   Dim Sel As Outlook.Selection
   Dim DoSave As Boolean
   Dim NewBody As String
   Dim strDelete01 As String
   Dim strDelete02 As String
   Dim strDelete03 As String
   Dim strDelete04 As String

'Fill the variables 
   strDelete01 = "Diese E-Mail kommt von Personen au?erhalb der Stadtverwaltung. Klicken Sie nur auf Links oder Dateianh?nge, wenn Sie die Personen für vertrauenswürdig halten."
   strDelete02 = "################################################################################"
   strDelete03 = <hr>
   strDelete04 = "Diese E-Mail kommt von Personen" & Chr(13) & Chr(10) & "au?erhalb der Stadtverwaltung. Klicken Sie nur auf Links oder Dateianhnge," & Chr(13) & Chr(10) & "wenn Sie die Personenn für vertrauenswürdig halten."

'Note: I am playing here with various types of strings at once. For example, 
'the procedure will also remove <hr> lines and "#####" strings  

'Work with it 
    If TypeOf Application.ActiveWindow Is Outlook.Inspector Then
        Set obj = Application.ActiveInspector.CurrentItem
    Else
        Set Sel = Application.ActiveExplorer.Selection
        If Sel.Count Then
            Set obj = Sel(1)
            DoSave = True
        End If
    End If

    If Not obj Is Nothing Then
        NewBody = Replace(obj.HTMLBody, strDelete01, "")
        NewBody = Replace(obj.HTMLBody, strDelete02, "")
        NewBody = Replace(obj.HTMLBody, strDelete03, "")
        NewBody = Replace(obj.HTMLBody, strDelete04, "")

        If NewBody <> "" Then
            obj.HTMLBody = NewBody
            If DoSave Then
                obj.Save
            End If
        End If
    End If
    End Sub

Question: What can I do to include the CRLF in the search string?

Follow-up question: What can I do to remove such strings with CRLF included in varying places? Is there a way to use regular expressions? Can VBA in Outlook deal with it? - Idea: if regular expressions work, perhaps the entire CRLF issue is not an issue anymore, as the expression would look something like

"Diese E-Mail kommt von * vertrauenswürdig halten."

and thus include anything - including CRLF - in the middle?

Perhaps important

After doing various experiments I am starting to feel that MS Outlook does not use HTML at all in its e-mails?

I observe I can practically not address any html code in the obj.HTMLBody. I can address plain text. I cannot address parts of html such as "<hr ", or at least that is what I believe to be observing. (There was a moment when I could address "<hr>" and thus delete it, but I cannot recreate the conditions where this worked yesterday.)

I can save the e-mails as html files (outside Outlook, somewhere on my harddisk in a separate folder), and in these files I do see the CRLF and other stuff. But perhaps the e-mails, as long as kept in Outlook itself, are stored using some other code?

So what is this code, and how can I address parts of it for deleting?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

I will answer your question in parts as I have the spare time. Someone else may get to the important bit before I do.

I have edited your question. I did not understand a couple of sentences so I looked at the source and found my suspicion was correct, you had included less than characters. Stack Overflow permits a limited number of Html tags. Anything else that looks like an Html tag is ignored. I replaced each "<" with "&lt;" so readers could see your Html. I can add an explanation if you do not understand why this works.

You have:

NewBody = Replace(obj.HTMLBody, strDelete01, "")
NewBody = Replace(obj.HTMLBody, strDelete02, "")
NewBody = Replace(obj.HTMLBody, strDelete03, "")
NewBody = Replace(obj.HTMLBody, strDelete04, "")
If NewBody <> "" Then

Each Replace (except the first) overwrites the value of NewBody created by the previous Replace. You seem to think that if strDelete04 is not found, NewBody will be empty. No, if strDelete04 is not found, NewBody will be a copy of obj.HTMLBody.

You need something like:

NewBody = Replace(obj.HTMLBody, strDelete01, "")
NewBody = Replace(NewBody, strDelete02, "")
NewBody = Replace(NewBody, strDelete03, "")
NewBody = Replace(NewBody, strDelete04, "")
If NewBody <> obj.HTMLBody Then
  ' One or more delete strings found and removed

You say that the CRLFs are not in fixed positions. If so, no simple modification of your code will have the effect you seek. I will show you how to achieve the effect you seek but first I will have to create some emails containing your text so I can test my code.

Part 2

Having looked at your image of the Html more closely, I believe there is a simple solution. The two CRLFs in the text replace spaces. Providing this is always what happens, you can use:

NewBody = Replace(obj.HTMLBody, vbCr & vbLf, " ")

This would remove any CRLF present wherever it appeared within the Html. It would not matter if there were extra CRLFs because any string of whitespace characters (which includes CR and LF) in an Html document is replaced by a single space when the document is displayed.

You finish the removal of the unwanted text with:

Dim strDelete = "Diese E-Mail kommt von Personen au?erhalb " & _
                "der Stadtverwaltung. Klicken Sie nur auf " & _
                "Links oder Dateianh?nge, wenn Sie die Personen " & _
                "für vertrauenswürdig halten."

NewBody = Replace(NewBody, strDelete, "")

If the above does not work, you need a more convenient diagnostic technique. Saving the entire email as Html may be easy but you cannot be quite sure how the result differs from what a VBA macro would see. You wonder if Outlook stores emails in a format other than Html. I cannot imagine why Outlook would convert the incoming SMTP message to some secret format and then convert it back when the user wishes to view it. If Outlook does have a secret format, it is totally hidden from the VBA programmer.

The following is a simple version of the diagnostic tool I use. If you need something more advanced, I can provide it but let us try this first.

Copy the code below to an Outlook module. Select one of these emails and then run macro DsplHtmlBodyFromSelectedEmails. The entire Html body of the email will be output to the Immediate Window in a readable format. I believe I have included all the subroutines called by the macro. I apologise in advance if I have not. If you get a message about an undefined routine, let me know and I will add it to the answer.

Sub DsplHtmlBodyFromSelectedEmails()

  ' Select one or emails then run this macro.  For each selected email, the Received Time, the Subject and the Html body are output to the Immediate Window.  Note: the Immediate Window can only display about 200 lines before
The older lines are lost.

  Dim Exp As Explorer
  Dim Html As String
  Dim ItemCrnt As MailItem

  Set Exp = Outlook.Application.ActiveExplorer

  If Exp.Selection.Count = 0 Then
    Call MsgBox("Please select one or more emails then try again", vbOKOnly)
    Exit Sub
  Else
    For Each ItemCrnt In Exp.Selection
      With ItemCrnt
        If .Class = olMail Then
          Debug.Print .ReceivedTime & " " & .Subject
          Call OutLongTextRtn(Html, "Html", .HtmlBody)
          Debug.Print Html
        End If
      End With
    Next
  End If

End Sub
Sub OutLongTextRtn(ByRef TextOut As String, ByVal Head As String, _
                          ByVal TextIn As String)

  ' * Break TextIn into lines of not more than 100 characters
  '   and append to TextOut.
  ' * The output is arranged so:
  '     xxxx|sssssssssssssss|
  '         |sssssssssssssss|
  '         |ssssssssss|
  '   where "xxxx" is the value of Head and "ssss..." are characters from
  '         TextIn.  The third line in the example could be shorter because:
  '           * it contains the last few characters of TextIn
  '           * there a linefeed in TextIn
  '           * a <xxx> string recording whitespace would have been split
  '             across two lines.

  If TextIn = "" Then
    ' Nothing to do
    Exit Sub
  End If

  Const LenLineMax As Long = 100

  Dim PosBrktEnd As Long     ' Last > before PosEnd
  Dim PosBrktStart As Long   ' Last < before PosEnd
  Dim PosNext As Long        ' Start of block to be output after current block
  Dim PosStart As Long       ' First character of TextIn not yet output

  TextIn = TidyTextForDspl(TextIn)
  TextIn = Replace(TextIn, "lf?", "lf?" & vbLf)

  PosStart = 1
  Do While True
    PosNext = InStr(PosStart, TextIn, vbLf)
    If PosNext = 0 Then
      ' No LF in [Remaining] TextIn
      'Debug.Assert False
      PosNext = Len(TextIn) + 1
    End If
    If PosNext - PosStart > LenLineMax Then
      PosNext = PosStart + LenLineMax
    End If
    ' Check for <xxx> being split across lines
    PosBrktStart = InStrRev(TextIn, "?", PosNext - 1)
    PosBrktEnd = InStrRev(TextIn, "?", PosNext - 1)
    If PosBrktStart < PosStart And PosBrktEnd < PosStart Then
      ' No <xxx> within text to be displayed
      ' No change to PosNext
      'Debug.Assert False
    ElseIf PosBrktStart > 0 And PosBrktEnd > 0 And PosBrktEnd > PosBrktStart Then
      ' Last or only <xxx> totally within text to be displayed
      ' No change to PosNext
      'Debug.Assert False
    ElseIf PosBrktStart > 0 And _
           (PosBrktEnd = 0 Or (PosBrktEnd > 0 And PosBrktEnd < PosBrktStart)) Then
      ' Last or only <xxx> will be split across rows
      'Debug.Assert False
      PosNext = PosBrktStart
    Else
      ' Are there other combinations?
      Debug.Assert False
    End If

    'Debug.Assert Right$(Mid$(TextIn, PosStart, PosNext - PosStart), 1) <> "?"

    If TextOut <> "" Then
      TextOut = TextOut & vbLf
    End If
    If PosStart = 1 Then
      TextOut = TextOut & Head & "|"
    Else
      TextOut = TextOut & Space(Len(Head)) & "|"
    End If
    TextOut = TextOut & Mid$(TextIn, PosStart, PosNext - PosStart) & "|"
    PosStart = PosNext
    If Mid$(TextIn, PosStart, 1) = vbLf Then
      PosStart = PosStart + 1
    End If
    If PosStart > Len(TextIn) Then
      Exit Do
    End If
  Loop

End Sub
Function TidyTextForDspl(ByVal Text As String) As String

  ' Tidy Text for display by replacing white space with visible strings:
  '   Leave single space unchanged
  '   Replace single LF by                 ?lf?
  '   Replace single CR by                 ?cr?
  '   Replace single TB by                 ?tb?
  '   Replace single non-break space by    ?nbs?
  '   Replace single CRLF by               ?crlf?
  '   Replace multiple spaces by           ?n s?       where n is number of repeats
  '   Replace multiple LFs by              ?n lf?      of white space character
  '   Replace multiple CRs by ?cr? or      ?n cr?
  '   Replace multiple TBs by              ?n tb?
  '   Replace multiple non-break spaces by ?n nbs?
  '   Replace multiple CRLFs by            ?n crlf?

  Dim InsStr As String
  Dim InxWsChar As Long
  Dim NumWsChar As Long
  Dim PosWsChar As Long
  Dim RetnVal As String
  Dim WsCharCrnt As Variant
  Dim WsCharValue As Variant
  Dim WsCharDspl As Variant

  WsCharValue = VBA.Array(" ", vbCr & vbLf, vbLf, vbCr, vbTab, Chr(160))
  WsCharDspl = VBA.Array("s", "crlf", "lf", "cr", "tb", "nbs")

  RetnVal = Text

  ' Replace each whitespace individually
  For InxWsChar = 0 To UBound(WsCharValue)
    RetnVal = Replace(RetnVal, WsCharValue(InxWsChar), "?" & WsCharDspl(InxWsChar) & "?")
  Next

  ' Look for repeats. If found replace <x> by <n x>
  For InxWsChar = 0 To UBound(WsCharValue)
    'Debug.Assert InxWsChar <> 1
    PosWsChar = 1
    Do While True
      InsStr = "?" & WsCharDspl(InxWsChar) & "?"
      PosWsChar = InStr(PosWsChar, RetnVal, InsStr & InsStr)
      If PosWsChar = 0 Then
        ' No [more] repeats of this <x>
        Exit Do
      End If
      ' Have <x><x>.  Count number of extra <x>s
      NumWsChar = 2
      Do While Mid(RetnVal, PosWsChar + NumWsChar * Len(InsStr), Len(InsStr)) = InsStr
        NumWsChar = NumWsChar + 1
      Loop
      RetnVal = Mid(RetnVal, 1, PosWsChar - 1) & _
                "?" & NumWsChar & " " & WsCharDspl(InxWsChar) & "?" & _
                Mid(RetnVal, PosWsChar + NumWsChar * Len(InsStr))
      PosWsChar = PosWsChar + Len(InsStr) + Len(NumWsChar)

    Loop
  Next

  ' Restore any single spaces
  RetnVal = Replace(RetnVal, "?" & WsCharDspl(0) & "?", " ")

  TidyTextForDspl = RetnVal

End Function

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

2.1m questions

2.1m answers

60 comments

57.0k users

...