Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
489 views
in Technique[技术] by (71.8m points)

fortran operator overloading: function or subroutine

I recently updated my .f90 code to .f03, and I was expecting to see speedup because my older version involved many allocating and deallocating (7 3D arrays--45x45x45) at each iteration inside a do loop (4000 in total). Using derived types, I allocate these arrays at the beginning of the simulation and deallocate them at the end. I thought that I would see speedup, but it's actually running significantly slower (30 min as opposed to 23 min).

I ran a profiler, and it looks like the add/subtract/multiply/divide operators are taking a relatively long time. Aside from the change in the standard change, the operators are the only difference as far as I can tell. I'm wondering if this is due to the fact that functions are returning new copies of the field quantities during every operation.

So here is my question: might it run faster if I change the functions to subroutines so that these fields are passed by reference (I think?)? Also, if this is faster, and more preferred, then why are all of these examples showing functions for operator overloading instead of using subroutines? I feel like I'm missing something.

References for functions with operator overloading:

http://www.mathcs.emory.edu/~cheung/Courses/561/Syllabus/6-Fortran/operators.html

http://research.physics.illinois.edu/ElectronicStructure/498-s97/comp_info/overload.html

https://web.stanford.edu/class/me200c/tutorial_90/13_extra.html

https://www.ibm.com/developerworks/community/blogs/b10932b4-0edd-4e61-89f2-6e478ccba9aa/entry/object_oriented_fortran_does_fortran_support_operator_overloading55?lang=en

Here's an example of one of my operators:

    function vectorVectorDivide(f,g) result(q)
      implicit none
      type(vectorField),intent(in) :: f,g
      type(vectorField) :: q
      q%x = f%x / g%x; q%y = f%y / g%y; q%z = f%z / g%z
      q%sx = f%sx; q%sy = f%sy; q%sz = f%sz
    end function

Any help or info on this is greatly appreciated!

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

There are two questions here:

  1. In some situations, can I get better performance using a subroutine approach over a function approach?
  2. Why, if performance is worse, would I want to use a function?

An important thing to say about the first question is, you may be best testing things for yourself: there are a lot of specific aspects to this.

However, I have quickly knocked something up which may guide you.

module test

  implicit none

  type t1
     real, allocatable :: x(:)
  end type t1

contains

  function div_fun(f,g) result(q)
    type(t1), intent(in) :: f, g
    type(t1) q
    q%x = f%x/g%x
  end function div_fun

  subroutine div_sub1(f, g, q)
    type(t1), intent(in) :: f, g
    type(t1), intent(out) :: q
    q%x = f%x/g%x
  end subroutine div_sub1

  subroutine div_sub2(f, g, q)
    type(t1), intent(in) :: f, g
    type(t1), intent(inout) :: q
    q%x(:) = f%x/g%x
  end subroutine div_sub2

end module test

With this, I observed that sometimes there was no significant difference between using a function and a subroutine, and sometimes there was. That is, it depends on compilers, flags, and so on.

However, it is important to note what is happening.

For the function the result requires an allocation, and for the subroutine div_sub1 the intent(out) argument requires an allocation. [Assigning the function result adds to things - see later.]

In div_sub2 the allocation is re-used (the "result" argument is intent(inout)) and we suppress an automatic re-allocation by using q%x(:). It's this latter part that is important: compilers often suffer overheads for doing the checking about whether resizing is required. This latter part can be tested by changing the intent of q in div_sub1 to inout.

[Note that, there is an assumption for this div_sub2 approach that sizes aren't changing; this seems supported by your text.]

To conclude for the first question: check for yourself, but wonder whether you are merely "hiding" allocations by using derived types rather than removing them. And you may get very different answers using parameterized derived types.

Coming to the second question, why are functions commonly used? You'll note that I've looked at very specific cases:

q = div_fun(f,g)
call div_sub2(f,g,q)  ! Could be much faster

From the question text and the links (and previous questions you've asked) I'll assume that you have something which overloads the / operator

interface operator (/)
  module procedure div_fun
end interface

allowing

q = f/g               ! Could be slower, but looks good.
call div_sub2(f,g,q)

as we note that to be used as a binary operator (see Fortran 2008 7.1.5, 7.1.6) the procedure must be a function. In response to your comment to a previous revision of this answer

aren't the div_sub1 and div_sub2 binary operators just like the div_fun?

the answer is "no", at least in terms of what Fortran defines as being binary operators (link as above). [Also, div_fun isn't itself a binary operator, it's the combination of the function and the generic interface that form the operation.]

Making the function approach attractive is that a binary operation can be part of an expression:

q = q + alpha*(f/g)                ! Very neat
call div_sub2(f,g,temp1)
call mult_sub(alpha, temp1, temp2)
call add_sub(q, temp2, temp3)
call assign_sub(q, temp3)

Using subroutines can get a little messy. The example above could be tidied slightly by handling "in-place" aspects (or specialist subroutines), but this brings me to a final point. Because a function result is evaluated entirely before its later use (including assignment) we have situations like

f = f/g  ! or f=div_fun(f,g)
call div_sub2(f,g,f) ! Beware aliasing

To conclude for the second question: performance isn't everything.

[Finally, if you mean you are using .f90 and .f03 file suffixes to denote/govern standard compliance, then you may want to see what people think about that.]


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

Just Browsing Browsing

2.1m questions

2.1m answers

60 comments

56.9k users

...