The simplest way to compare two trace files, or two time intervals from the same trace file, is to open two Views and look at them side by side. While this provides a rough overview, a Comparison View lets you calculate the exact differences and speedups between two runs or between two ranges of the same run. To open a Comparison View for two files, open the files and choose-> -> in one of the Views.
Choose the other View from the dialog that appears. Notice that the dialog provides the opportunity to open another file.
Figure 6-1 shows a Comparison View. The two labels right below the View's menu bar indicate which trace files are shown in the Comparison View. These files are referred to as file A and file B throughout the Comparison View.
If you move the mouse pointer into the View's status bar, you will notice that it now consists of two lines: one for file A and another for file B. The labels and controls for file B are shaded, as are the charts that refer to file B only. Note that the Comparison View inherits the time interval, aggregation and filter settings from the regular Views that were chosen to create it.
It is perfectly valid to create a Comparison View from two regular Views showing the same file. This lets you compare either different time intervals or different subsets of processes.
When you create a Comparison View, three Charts open by default: an Event Timeline each for A and B, and a Comparison Function Profile.
A View enforces the same time interval, aggregation and filters on all of its Charts. Comparison Views extend this rule: a Comparison View holds two sets of time interval, aggregation and filters, one for each file.
All Charts described in Chapter 4 can show data from only a single trace file. This remains true in a Comparison View, where each such Chart is tied to exactly one of the two sets of constraints. If you choose -> -> in a Comparison View, two timelines appear, one for each file.
Up to this point, a Comparison View offers no striking advantage over two regular Views side by side. The additional benefit comes from the Comparison Charts that are now available. The Charts menu of a Comparison View contains comparing variants of the profiles that calculate differences and speedups between the two runs. These Chart variants are explained in Section 6.2. In a nutshell, they provide the same displays as the usual profiles, but they can calculate values for A-B, B-A, A/B and B/A.
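The four comparison values can be illustrated with a few lines of Python. This is only a sketch of the arithmetic, not the tool's implementation; the function name `compare`, the sample timings and the function name `Application:solve` are hypothetical, and returning `None` for a zero divisor is an assumption of this example.

```python
def compare(a, b):
    """Return the four comparison values for one metric of run A and run B.

    A quotient is reported as None when its divisor is zero
    (an assumption of this sketch, not documented tool behavior).
    """
    return {
        "A-B": a - b,
        "B-A": b - a,
        "A/B": a / b if b != 0 else None,
        "B/A": b / a if a != 0 else None,
    }


# Hypothetical time spent per function, in seconds, for run A and run B.
time_a = {"MPI:MPI_Allreduce": 4.0, "Application:solve": 10.0}
time_b = {"MPI:MPI_Allreduce": 2.0, "Application:solve": 8.0}

result = {name: compare(time_a[name], time_b[name]) for name in time_a}

for name, values in result.items():
    print(name, values)
```

With these sample numbers, an A/B value above 1 for a time metric means that run B spent less time in that function than run A.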
An additional menu entry-> provides some control over the Comparison View's behavior.
-> -> is switched off by default. Selecting this option causes all timelines to use the same scaling. For example, when you look at 2 seconds of file A and 4 seconds of file B, the timelines for A are shortened so that they occupy half the width of the timelines for B, which makes visual comparison easier.
If, however, the time intervals for A and B differ by more than a factor of one hundred, this setting is ignored and the timelines are aligned as usual to avoid numeric exceptions and distorted diagrams.
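The scaling rule can be sketched as follows. The function `timeline_widths`, the pixel width of 1000 and the exact handling of the boundary case are assumptions made for illustration; only the proportional scaling and the factor-of-one-hundred cutoff come from the description above.

```python
def timeline_widths(len_a, len_b, full_width=1000, max_factor=100.0):
    """Return (width_a, width_b) in pixels for two time intervals in seconds.

    The longer interval gets the full width and the shorter one is scaled
    proportionally. If the intervals differ by more than max_factor (or one
    of them is empty), scaling is skipped and both get the full width.
    """
    longer = max(len_a, len_b)
    shorter = min(len_a, len_b)
    if shorter <= 0 or longer / shorter > max_factor:
        return full_width, full_width
    return (
        round(full_width * len_a / longer),
        round(full_width * len_b / longer),
    )


print(timeline_widths(2.0, 4.0))    # A occupies half the width of B
print(timeline_widths(1.0, 200.0))  # factor 200 exceeds the limit, no scaling
```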
-> -> is switched off by default. If it is switched on, zooming with the mouse in a timeline that belongs to file A also zooms to a corresponding time interval in file B, and vice versa.
The proverbial error in any comparison is to compare apples with oranges. When comparing two program runs, an example would be comparing the time that process A.P0 spent in a function in run A with the time of process B.P0 in run B, without accounting for the fact that B.P0 did only half of the work because run B used twice as many processes.
It is easy to see that, depending on the domain decomposition or load balancing done in the application, the meaningful mapping between the processes of two runs cannot be determined automatically. There might not even be such a mapping: imagine comparing a run that decomposed a cube into 8 processes (2x2x2) with a run that used a 3x3x3 decomposition.
But functions and function groups can be mapped between the runs just by their fully qualified name. This works as long as the structure of modules, namespaces and classes is not changed dramatically.
It is next to impossible even to enumerate all combinations of parameters that might have changed between two runs. Anticipating all of these cases with an automatically adapting GUI does not look promising.
Based on these considerations the mappings of processes, functions, communicators and message tags between the runs are handled differently:
Communicators are mapped by their Ids. Message tags are mapped literally by their value.
The mapping of processes and process groups is controlled by choosing the process aggregations for both files as outlined in Section 6.1.1.
The mapping between functions and function groups is handled automatically as outlined in Section 6.1.2.
Assume that run A had processes A.P0 and A.P1, and run B had B.P0, B.P1, B.P2 and B.P3. Assume further that A.P0 did the same work as B.P0 and B.P1 together, and A.P1 did the same work as B.P2 and B.P3 together. To get a Comparison Message Profile that is meaningful under these assumptions, choose the aggregation as shown in Figure 6-3 and Figure 6-4. Here, a process aggregation into two halves was chosen for run B.
Figure 6-3. Creating a suitable process group for the comparison between a 2 and a 4 processor run in the Process Group Editor
The message profile shows the quotient B/A of the average transfer rate. The rule that the Comparison Message Profile (and in fact the whole Comparison View) uses to map the senders and receivers of the two runs onto each other is quite simple: child number i of run A's process aggregation is always compared with (mapped to) child number i of run B's process aggregation.
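The positional rule can be sketched with the example above. The data layout (tuples of process names) and the helper `map_children` are hypothetical. How the tool treats children without a counterpart is not stated here, so this sketch simply pairs children up to the shorter list.

```python
# Children of each run's process aggregation, in order.
# Run A keeps its two processes; run B is aggregated into two halves.
agg_a = [("A.P0",), ("A.P1",)]
agg_b = [("B.P0", "B.P1"), ("B.P2", "B.P3")]


def map_children(children_a, children_b):
    """Pair child number i of run A with child number i of run B."""
    return list(zip(children_a, children_b))


pairs = map_children(agg_a, agg_b)
for index, (child_a, child_b) in enumerate(pairs):
    print(index, child_a, "<->", child_b)
```

Because the mapping is purely positional, the order of the children in each aggregation determines which groups are compared.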
Functions of the two files A and B are mapped onto each other by their fully qualified name. This is not just the bare function name but a hierarchical name that the Intel® Trace Collector constructs from any information about modules, namespaces and classes that was available at trace time.
For example, the fully qualified name of MPI_Allreduce is MPI:MPI_Allreduce because Intel Trace Collector puts all MPI functions into the group MPI. Function groups that were defined by the user in Intel Trace Analyzer have no influence on these full function names. The Function Group Editor described in Section 5.4 shows the fully qualified name of a function in a small tooltip window when the mouse hovers long enough over an entry.
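Matching by fully qualified name amounts to a key lookup, as the following sketch shows. The dictionaries, the timings and the name `Application:solve` are hypothetical sample data; the MPI names follow the MPI:MPI_Allreduce pattern described above.

```python
# Hypothetical time per function in seconds, keyed by fully qualified name.
funcs_a = {"MPI:MPI_Allreduce": 1.5, "MPI:MPI_Send": 0.4, "Application:solve": 9.0}
funcs_b = {"MPI:MPI_Allreduce": 0.9, "MPI:MPI_Recv": 0.3, "Application:solve": 5.0}

# Functions present in both runs can be compared directly.
matched = sorted(funcs_a.keys() & funcs_b.keys())

# Functions that occur in only one run have no counterpart.
only_a = sorted(funcs_a.keys() - funcs_b.keys())
only_b = sorted(funcs_b.keys() - funcs_a.keys())

print("matched:", matched)
print("only in A:", only_a)
print("only in B:", only_b)
```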
The mapping of function groups is a little more subtle. For each function group within the hierarchy of the automatically created function group "Major Function Groups" in file A, a matching group with the same name and nesting level is looked up in the corresponding hierarchy in B.
For automatically generated groups this works quite well. For example, MPI is always mapped to MPI, even if the groups differ because the two program runs used different subsets of MPI calls. The same is true for groups that were created by instrumentation using the API provided by Intel Trace Collector.
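A minimal sketch of matching groups by name and nesting level is given below. Representing a group as a path of names from the root, and the helpers `group_key` and `match_groups`, are assumptions of this example and not the tool's internal data model; the group name `Application` and its member are hypothetical.

```python
# Each group is identified by its path from the root of the hierarchy;
# the value is the set of functions the group contains in that run.
groups_a = {
    ("Major Function Groups", "MPI"): {"MPI:MPI_Send", "MPI:MPI_Allreduce"},
    ("Major Function Groups", "Application"): {"Application:solve"},
}
groups_b = {
    ("Major Function Groups", "MPI"): {"MPI:MPI_Recv", "MPI:MPI_Allreduce"},
}


def group_key(path):
    """Identify a group by its nesting level and its own name."""
    return (len(path), path[-1])


def match_groups(a, b):
    """Map each group path in A to the matching path in B, or None."""
    index_b = {group_key(path): path for path in b}
    return {path: index_b.get(group_key(path)) for path in a}


mapping = match_groups(groups_a, groups_b)
for path_a, path_b in mapping.items():
    print(path_a, "->", path_b)
```

Note that the group contents play no role in the match: MPI in A is paired with MPI in B although the two runs used different MPI calls.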
When you create new function groups for one file, either by using the Function Group Editor or by using the ubiquitous context menu entries to ungroup existing function groups, matching groups are created for the other file. You can find these read-only groups under the header "Generated Groups" in the Function Group Editor.