Skip to content

Conversation

@davidedelvento
Copy link

@davidedelvento davidedelvento commented Oct 20, 2025

This is follow-up on #83 and of course for now it's just a talking point not something anywhere close to be merged. I'm putting it out there to start the conversation with something explicit in front of people, to get constructive criticism, before embarking myself in the full work to implement it (in a possibly broken way)

With that attitude, let me start with some questions @jeffhammond and @albandil

  1. In its current form I'm giving priority to BigMPI vs MPI v4 (because if one bothered to configure with the former, they more likely want to use it than not). If there is no BigMPI and the MPI implementation in use is v4, the PR in its current form blindly uses that. Perhaps we want a flag to revert the "non-big" file transfer regardless?
  2. This PR also assumes that I am understanding correctly both BigMPI's MPIX_* and the MPI4's MPI_*_c invocations. I have only experimented with toy examples for both so please don't blindly assume I'm doing the right thing (TM)
  3. This PR also assumes that the macro MpiInt is correctly implemented among the code and I believe I've seen a couple of places where it isn't (but even if it were, it obviously it has never been tested, so "by default" I assume there is at least a bug somewhere until someone or I proves otherwise). As such, part of the work of this PR will definitely include double checking all existing instances of MpiInt as well as the provenance of all the count fields for all MPI invocations. I haven't checked yet if this needs to involve Fortran too.

@albandil
Copy link

MpiInt is correctly implemented among the code

MpiInt is almost certainly not implemented completely. Even if the implementation was correct and complete, its aim was different: To conform to a possible (but in practice non-existing) ILP64 MPI interface, i.e., assuming all integers to be long integers, not only the counts. However, the large-counts MPI API is different. On top of that, I also see some places where MpiInt is not used even though it should (e.g., MPI_Irecv in BI_Arecv uses the attribute N of struct BLACBUFF as a count, but this is Int, not MpiInt).

I believe that the best approach is not to pay attention to the original use of MpiInt and replace that macro by the otherwise omnipresent Int in the few places where it appears. Then, starting with a clean MpiInt-less code, to carefully check all MPI calls that their arguments are correctly typed for the chosen MPI library and perform conversion as needed. This was also suggested in #83.

@davidedelvento
Copy link
Author

MpiInt is almost certainly not implemented completely. Even if the implementation was correct and complete, its aim was different: To conform to a possible (but in practice non-existing) ILP64 MPI interface, i.e., assuming all integers to be long integers, not only the counts. However, the large-counts MPI API is different. On top of that, I also see some places where MpiInt is not used even though it should (e.g., MPI_Irecv in BI_Arecv uses the attribute N of struct BLACBUFF as a count, but this is Int, not MpiInt).

I noticed these things and I hinted to them with my "I believe I've seen a couple of places where it isn't", but I have not payed too much attention to them yet. I will. As such, it may be easier to just scrap the existing MpiInt occurrences and create new ones for count only, I see that. I think we are in the same page here.

I believe that the best approach is not to pay attention to the original use of MpiInt and replace that macro by the otherwise omnipresent Int in the few places where it appears. Then, starting with a clean MpiInt-less code, to carefully check all MPI calls that their arguments are correctly typed for the chosen MPI library and perform conversion as needed.

I am not sure I understand correctly what you are saying here. Int swapping for int or long cannot be used for MPI_Count since that is a MPI-specific thing which could potentially be implemented with something different than long. But you know that and we may be saying the same thing I wrote in the previous paragraph of this comment. To be super-clear my plan is to do the following... disregard the fact that MpiInt already exists, assume I have removed it everywhere and I am reintroducing it according to the following:

  1. if one is compiling with BigMPI or MPIv4 #define MpiInt MPI_Count (as already done in the .h)
  2. otherwise #define MpiInt int (as already done in the .h)
  3. replace all MPI_whatever invocations in the code with _MPI_whatever wrappers (per the example .c file) and make sure all of the latter use MpiInt as a type for count (which isn't done at the moment in the example)
  4. wrap all _MPI_whatever invocations to one of MPIX_whatever, MPI_whatever_c or MPI_whatever depending on how one is compiling the code (as already done in the .h)

Do you have any objections?

I believe this is the same thing suggested in #83 (comment) (but there it wasn't so explicit, hence I'm not totally sure of the original poster intentions).

@albandil
Copy link

Do you have any objections?

No objections, this sounds good to me.

My unclear comment about removing MpiInt first was meant in this way: Currently, MpiInt is also used in places where it is unnecessary and potentially incorrect whenever MpiInt stands for MPI_Count. An example is Cblacs_get in "BLACS/SRC/blacs_get_.c", where an MpiInt is used for the argument flag of MPI_Comm_get_attr. I believe that this argument is int in every implementation of MPI. So, in this particular occurence, MpiInt should be unconditionally reverted to int. There might be a few other places of this kind (related to the original ILP64 MPI interface ambition) that will require similar action.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants