4.5. The Function Profile Chart

The Function Profile provides detailed profiling information on the performance data. It consists of four different tabs, namely the Flat Profile tab, the Load Balance tab, the Call Tree tab and the Call Graph tab.

All four tabs use the same column headers with the same semantics and draw on the same raw data. The default column headers on display are Name, TSelf, TTotal, #Calls and TSelf/Call. For a detailed explanation of all available column headers, refer to Section 4.5.5.1. To change the order of the columns, drag the column headers.
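As an illustration of how the default columns relate to each other, the following minimal sketch derives TSelf, TTotal, #Calls and TSelf/Call from a list of function enter and leave events. The event format and the function name profile are hypothetical and are not part of the tool. The sketch assumes non-recursive calls and uses the definitions from Section 4.5.5.1: TSelf excludes time spent in called functions, TTotal includes it.

```python
from collections import defaultdict


def profile(events):
    """Compute per-function TSelf, TTotal and #Calls from enter/leave events.

    `events` is a time-ordered list of (timestamp, kind, function) tuples,
    where kind is "enter" or "leave". This event format is hypothetical.
    """
    stats = defaultdict(lambda: {"TSelf": 0.0, "TTotal": 0.0, "#Calls": 0})
    stack = []  # each frame: [function, enter_time, time_spent_in_callees]
    for time, kind, func in events:
        if kind == "enter":
            stack.append([func, time, 0.0])
            stats[func]["#Calls"] += 1
        else:
            name, start, callee_time = stack.pop()
            total = time - start
            stats[name]["TTotal"] += total
            stats[name]["TSelf"] += total - callee_time
            if stack:
                # Charge this call's total time to the caller's callee time.
                stack[-1][2] += total
    for row in stats.values():
        row["TSelf/Call"] = row["TSelf"] / row["#Calls"] if row["#Calls"] else 0.0
    return dict(stats)


# Made-up example: main calls MPI_Allreduce twice.
events = [
    (0.0, "enter", "main"),
    (1.0, "enter", "MPI_Allreduce"),
    (4.0, "leave", "MPI_Allreduce"),
    (5.0, "enter", "MPI_Allreduce"),
    (6.0, "leave", "MPI_Allreduce"),
    (10.0, "leave", "main"),
]
result = profile(events)
```

In this example, main has a TTotal of 10.0 but a TSelf of only 6.0, because 4.0 time units are spent inside MPI_Allreduce.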

Figure 4-24. Function Profile

To sort a list in ascending or descending order, click on a column header. For example, to see which process spends the most (or the least) time in a function, click TSelf to sort the entries by this column. The arrow symbol in the column header indicates whether the column is sorted in ascending or descending order.

Note: Sorting by the name column does not sort alphabetically. Instead, it sorts in the order given by the layout of the current process or function group.

The number formatting options are preset globally via the Number Formatting Settings dialog (refer to Section 5.13). To increase the number of digits locally by three (or by one), press the "+" key (or CTRL and "+"). Use the "-" key (or CTRL and "-") to revert this action. Note that the exact effect of requesting additional digits depends on the format chosen in the Number Formatting Settings dialog for the respective unit.

4.5.1. Flat Profile

By default, the Flat Profile summarizes all major groups of functions and presents statistics over the processes. The exact contents of these groups depend on the group definitions stored in the trace file or defined by the user; in the file poisson_icomm.single.stf, the only groups are MPI and Application.

The Chart in Figure 4-24 shows that most of the time was spent in MPI, which is considered pure overhead. To see the distribution of execution time over the individual MPI routines, right-click on the MPI entry and select Ungroup Group MPI from the context menu as shown in Figure 4-25.

Figure 4-25. Ungrouping the function group MPI via the context menu

This causes the single MPI entry to be replaced by several entries - one for each MPI function (see Figure 4-26). To regroup the children of MPI, right-click on a child and choose Regroup MPI from the context menu or select Major Function Groups from the Function Group Editor (Views Menu->Advanced->Function Aggregation).
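The effect of ungrouping can be pictured as a change in the aggregation key. The following is a minimal sketch only: the function flat_profile, the timing values and the group assignments are made up for illustration and do not reflect how the tool stores its data.

```python
def flat_profile(tself_by_function, group_of, ungrouped=()):
    """Sum TSelf per function group; functions of ungrouped groups stay separate."""
    rows = {}
    for func, tself in tself_by_function.items():
        group = group_of[func]
        # An ungrouped group is replaced by one row per member function.
        key = func if group in ungrouped else group
        rows[key] = rows.get(key, 0.0) + tself
    return rows


# Hypothetical TSelf values and group definitions.
tself = {"MPI_Allreduce": 4.0, "MPI_Isend": 1.5, "MPI_Wait": 2.5, "cg": 3.0}
group_of = {
    "MPI_Allreduce": "MPI",
    "MPI_Isend": "MPI",
    "MPI_Wait": "MPI",
    "cg": "Application",
}

grouped = flat_profile(tself, group_of)
after_ungroup = flat_profile(tself, group_of, ungrouped={"MPI"})
```

Here grouped contains a single MPI row with 8.0, while after_ungroup lists MPI_Allreduce, MPI_Isend and MPI_Wait individually and leaves Application summarized.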

Figure 4-26. Flat Profile after ungrouping MPI

The default settings ensure that all statistics are summarized over all threads into a single profile. All tabs provide the option of viewing the data for each process separately. To do this, use the combo box at the top of the tab as shown in Figure 4-27. For example, selecting Children of Group All_Processes results in Figure 4-28. The processes are now listed as the top-level entries in the tree (first column). To expand and collapse the processes of interest, use the plus and minus handles (see Figure 4-28).

Figure 4-27. Selecting Profiles per process

Figure 4-28. Showing children of process group All Processes

4.5.2. Load Balance

The Load Balance tab displays the same data as the Flat Profile, except that it is grouped by function instead of by process. This makes it possible to compare the profiles of the same function across several processes. Here, the top-level entries of the tree, given in the first column, are functions. Figure 4-29 shows that TSelf for MPI_Allreduce is noticeably unbalanced across processes.
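One simple way to put a number on such an imbalance is the ratio of the largest per-process TSelf to the mean. This metric is an assumption chosen for illustration; the tool itself only displays the per-process values. The process names and timings below are made up.

```python
def imbalance(tself_by_process):
    """Return max/mean of TSelf across processes (1.0 means perfectly balanced)."""
    values = list(tself_by_process.values())
    mean = sum(values) / len(values)
    return max(values) / mean if mean else 1.0


# Hypothetical TSelf of one function on four processes.
tself_allreduce = {"P0": 2.0, "P1": 2.0, "P2": 6.0, "P3": 2.0}
ratio = imbalance(tself_allreduce)
```

With these values, the ratio is 2.0: process P2 spends twice the average time in the function.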

Figure 4-29. Load Balance for MPI_Allreduce

As in the Flat Profile, the Load Balance summarizes the statistics into a single profile using Group All_Processes. To view the data for each individual process in a given function, use Children of Group All_Processes. Similarly, the functions in the Load Balance tab are ungrouped and regrouped using Ungroup/Regroup on the context menu. Ungrouping displays all major function groups. To group all processes together and view them as a single profile, select Group All_Processes.

Figure 4-30. Pie diagrams in the Load Balance tab

The Load Balance tab can also display the data in the form of pie diagrams (refer to Figure 4-30). Use the button in the top right corner of the tab to switch back and forth between the usual list and the pie diagrams. The pie diagrams make it possible to judge the overall load balance pattern (for TSelf) even among a huge number of processes in a relatively confined space. Above the pie diagrams are two sliders. The left one controls the minimum radius of the pies and the right one controls how many pie diagrams appear in a row.

4.5.3. Call Tree

When the Load Balance and Flat Profile tabs do not show enough detail, use the Call Tree tab to include calling dependencies in your analysis. The Call Tree tab shows the same information as the Flat Profile and Load Balance, but also includes the calling hierarchy.

Select a certain entry in the Call Tree to focus on it. The focus remains on this entry even when the time interval is changed due to scrolling or zooming. It stays selected and visible when possible. If a corresponding entry is absent for the new time interval, then its parent is selected. This feature is very useful in large and deeply nested call trees.
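The fallback behavior can be sketched as a walk up the call path. The representation of entries as tuples and the function resolve_focus are hypothetical. The sketch also assumes that the fallback repeats until an existing ancestor is found, which the description above does not state explicitly.

```python
def resolve_focus(path, visible_paths):
    """Return the deepest prefix of `path` that exists in `visible_paths`."""
    while path:
        if path in visible_paths:
            return path
        path = path[:-1]  # fall back to the parent entry
    return None  # no ancestor exists in the new time interval


# The focused entry main -> cg -> MPI_Allreduce is absent after zooming.
focused = ("main", "cg", "MPI_Allreduce")
visible = {("main",), ("main", "cg")}
new_focus = resolve_focus(focused, visible)
```

In this example, new_focus is ("main", "cg"), the parent of the entry that disappeared.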

Figure 4-31. The Call Tree tab

4.5.4. Call Graph

The Call Graph tab shows a small part of the call graph for each process or process group: a single node (called central function) with its inbound and outbound edges. Each process entry has three children: the Callers, the central function and the Callees.

Figure 4-32. The Call Graph tab

To navigate through the Call Graph, double-click on a caller or callee, or select it and press the space bar or Enter key, to make the respective function the central node.

The time shown for the central function is the same as shown in the Flat Profile tab and the Load Balance tab. The times shown for the callers represent the time spent in the central function when called from the respective function.
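The relationship between the caller times and the time of the central function can be sketched as follows. The sketch assumes that the underlying data is TSelf per call path; both this data layout and the function time_by_caller are hypothetical, and the values are made up.

```python
from collections import defaultdict


def time_by_caller(tself_by_path, central):
    """Sum the TSelf of `central` per direct caller, given TSelf per call path."""
    per_caller = defaultdict(float)
    for path, tself in tself_by_path.items():
        # Only paths that end in the central function and have a caller count.
        if path[-1] == central and len(path) > 1:
            per_caller[path[-2]] += tself
    return dict(per_caller)


# Hypothetical call tree with TSelf per call path.
tself_by_path = {
    ("Application",): 5.0,
    ("Application", "MPI"): 7.0,
    ("Application", "cg"): 2.0,
    ("Application", "cg", "MPI"): 1.0,
}
callers = time_by_caller(tself_by_path, "MPI")
```

Here callers attributes 7.0 to Application and 1.0 to cg. Their sum, 8.0, matches what a flat profile would report for MPI in this made-up data set.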

If a function is used in different contexts (for example, by different algorithms), the Call Graph shows which context causes the function to consume more or less time. Figure 4-32 shows which caller is responsible for most of the time spent in MPI: it is the function group Application (and not Forward, Adjoin, cg or Smoother). Using the Call Graph this way helps to find places in the code that cause expensive calls, even when the call tree gets too big to navigate.

4.5.5. Using the Function Profile

The following sections describe the column headers of the Function Profile and how to define these headers using the Function Profile Settings dialog box.

4.5.5.1. Function Profile Settings

The Function Profile Settings dialog box enables customizing display options for all four tabs of the Function Profile Chart. To access the Settings dialog box, right-click and select Function Profile settings from the context menu.

Figure 4-33. The Function Profile settings dialog

  • Preferences Tab

    In the Preferences tab, there are four groups of options. The first one is the Display Group, which consists of check boxes. Use these check boxes to select the attributes to be displayed. There are a total of eight attributes available, out of which four are selected by default (Time Self, Time Total, #Calls and Time Self per Call). All eight attributes are described below:

    • Time Self (TSelf): Time spent in the given function, excluding time spent in functions called from it

    • Time Total (TTotal): Time spent in the given function, including time spent in functions called from it

    • #Calls: Number of calls to this function. This can be zero even if other attributes are non-zero, because the actual calls to the respective function can occur outside the current time interval.

    • Time Self per Call (TSelf/Call): Time Self averaged over #Calls

    • Time Total per Call: Time Total averaged over #Calls

    • #Processes: Number of processes in this function

    • Time Self per Process: Time Self averaged over #Processes

    • Time Total per Process: Time Total averaged over #Processes

    Using the check boxes, the display of each of the above attributes as text and as a bar graph can be switched on or off independently.

    Use the radio buttons to specify the format for time (seconds or ticks) or to specify time as a percentage of the time interval.

    There are three scaling modes in the Preferences tab, selectable via radio buttons. The default (Visible Items) scales the bars to the respective maximum of all expanded items. All Items uses the global maximum of all values, regardless of whether they are expanded or not. Siblings uses only the maximum of the direct siblings. In all three scaling modes, only values from the same column are taken into account.

    At the bottom of the Preferences tab, there is a Function Colors command button. Clicking on this opens the Function Group Color Editor (see Section 5.5).

  • The Processes Tab

    In the Processes tab, select the processes to be displayed in the Chart by enabling their check boxes. The selected processes are then shown in any of the Function Profile tabs when the As selected in Settings option is chosen from the combo box (see Figure 4-27). An easy way to select all but one process is to select only the process that is not required and then use Invert All to reverse the selection. This has no influence on the current process group of the View; it only focuses the Function Profile on a subset of all processes.

  • Pie Tab

    The Pie tab contains check boxes to switch the individual diagram titles and the global legend on and off.
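The three scaling modes described for the Preferences tab differ only in which values are searched for the maximum that a full-length bar represents. The following sketch illustrates this for a single column. The data layout and the function scale_reference are hypothetical, and the sketch assumes that "expanded items" means the entries currently visible in the tree.

```python
def scale_reference(items, target, mode):
    """Return the value that a full-length bar represents for `target`.

    `items` maps an entry name to (parent, value, visible), all values
    taken from the same column. This layout is an assumption.
    """
    parent = items[target][0]
    if mode == "All Items":
        pool = [value for _, value, _ in items.values()]
    elif mode == "Visible Items":
        pool = [value for _, value, visible in items.values() if visible]
    elif mode == "Siblings":
        # The target and its direct siblings share the same parent.
        pool = [value for p, value, _ in items.values() if p == parent]
    else:
        raise ValueError(f"unknown scaling mode: {mode}")
    return max(pool)


# Made-up values: P0 is expanded, P1 is collapsed (its children are hidden).
items = {
    "P0": (None, 10.0, True),
    "P0/MPI": ("P0", 8.0, True),
    "P0/Application": ("P0", 2.0, True),
    "P1": (None, 12.0, True),
    "P1/MPI": ("P1", 20.0, False),
    "P1/Application": ("P1", 1.0, False),
}

visible_max = scale_reference(items, "P0/Application", "Visible Items")
global_max = scale_reference(items, "P0/Application", "All Items")
sibling_max = scale_reference(items, "P0/Application", "Siblings")
```

For the entry P0/Application with a value of 2.0, the bar would fill 2.0/12.0 of the column in Visible Items mode, 2.0/20.0 in All Items mode and 2.0/8.0 in Siblings mode.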

4.5.5.2. The Context Menu

The context menu, obtained by right-clicking on an item, contains a set of operations that are performed on the clicked item and on the Chart as a whole. The context menu adjusts itself to suit the selected entry in the Chart.

The context menu entry Show All_Processes/xxx in shows the given profile in a different tab. Here, xxx stands for the function group name. For the given example, this would be one of the function groups MPI, Application or Other.

Another context menu entry is the Ungroup option. This ungroups the selected group and shows the distribution of execution time over the individual routines, as is illustrated in Figure 4-25 and Figure 4-26. To revert the ungrouping, right-click a child of a recently ungrouped function group and select Regroup from the context menu. To restore the summarized display after ungrouping a number of times, it is easier to open the Function Group Editor using Views Menu->Advanced->Function Aggregation and select Major Function Groups.

The Find entry searches for a process or function (see Section 5.11).

To save the flat profile data in text form, choose the context menu entry Export (Flat) Data. This opens a File Save dialog box, in which you specify a new filename or choose an existing file to store the data in. The export includes all data of the flat profile, also taking the child processes into account. By default, the data is saved as a .txt file.

Context Menu->Charts opens another sub-menu, which contains entries to print, save, clone and move the Chart (see Section 4.8).

Figure 4-34. A context menu with a submenu

4.5.5.3. Filtering and Tagging

Tagged entries are shown using a bold font for the name column. Entries with tagged descendants are shown with underlined names. This helps to see or find the required entry, especially when the tree is large. For more details on tagging and filtering refer to Section 9.3.
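The font rules can be sketched as a depth-first walk over the tree. The tree layout and the function name_styles are hypothetical. The sketch also assumes that bold takes precedence when an entry is tagged and has tagged descendants, which the description above does not specify.

```python
def name_styles(children, root, tagged):
    """Map each entry below and including `root` to "bold", "underlined" or "plain"."""
    styles = {}

    def visit(node):
        # Returns True if any descendant of `node` is tagged.
        below = False
        for child in children.get(node, ()):
            child_below = visit(child)  # always visit so every entry gets a style
            below = below or child_below or child in tagged
        if node in tagged:
            styles[node] = "bold"
        elif below:
            styles[node] = "underlined"
        else:
            styles[node] = "plain"
        return below

    visit(root)
    return styles


# Made-up tree: only the MPI entry of process P0 is tagged.
children = {
    "All_Processes": ["P0", "P1"],
    "P0": ["P0/MPI", "P0/Application"],
}
styles = name_styles(children, "All_Processes", tagged={"P0/MPI"})
```

In this example, P0/MPI is bold, its ancestors P0 and All_Processes are underlined, and all other entries are plain.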

Figure 4-35. Tagged entries in the function profile