Age-specific mortality curves (see previous exercise) provide much useful information. But they also mask a great deal of additional information, because all individuals that died within a year are lumped together. Barring a major catastrophe, these individuals did not all die at the same time; their deaths were spread throughout the year. But was mortality constant, or was it concentrated during specific seasons? This question is easily answered in species that have annual growth breaks on the skeleton. These growth breaks are almost always due to cold winter temperatures, when metabolic rates become so slow that growth essentially ceases for several weeks to a few months. The amount of growth since the last annual growth break can be used as a quantitative measure of the season of mortality.
The major complication in estimating season of mortality arises from growth rate differences among individuals. For example, in many invertebrates growth rates are highest in juveniles and lowest in adults (who are diverting energy into reproduction). Consequently, small individuals would be expected to have larger absolute growth rates than large individuals. To make a size-independent comparison of season of mortality, it is necessary to calculate the relative growth since the last annual growth break. Relative growth is simply the growth in the current year (measured in mm) divided by the growth for the immediately preceding complete year. Individuals of all sizes that died in early spring would have small relative growth values, while those that died later in the year would have progressively larger values.
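The relative growth calculation can be sketched in a few lines of Python. This is only an illustration: the function name and the measurements are hypothetical, not values from the exercise.

```python
def relative_growth(new_growth_mm, old_growth_mm):
    """Growth since the last annual break divided by growth
    during the immediately preceding complete year."""
    if old_growth_mm <= 0:
        raise ValueError("old growth must be a positive length in mm")
    return new_growth_mm / old_growth_mm

# Hypothetical (new growth, old growth) measurements in mm.
specimens = [(1.0, 8.0), (0.5, 4.0), (3.0, 4.0)]
ratios = [relative_growth(new, old) for new, old in specimens]
print(ratios)  # [0.125, 0.125, 0.75]
```

The first two hypothetical specimens differ in size by a factor of two but have the same relative growth, so they would be assigned to the same season of mortality.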
Multimodal distributions are more common, but much more difficult to analyze. Figure 1 shows the relative growth-frequency distribution for a sample of oysters (Crassostrea virginica) from St. Mary's County (data from B. W. Kent. 1992. Making dead oysters talk: Techniques for analyzing oysters from archaeological sites. Maryland Historical & Cultural Publ., Crownsville, MD, 76 pp.). Clearly there are at least three modes, but the properties of individual modes are not easily extracted from the graph. Analyzing these data requires the use of cumulative frequency plots, which are capable of isolating and statistically describing each mode.
Cumulative frequency plots are based on the observation that when a normal distribution is plotted on probability paper as cumulative frequency (i.e., the frequency of a given class plus that of all smaller classes) a straight line is produced. Because a multimodal plot consists of a series of overlapping normal distributions (i.e., one for each mode), its cumulative frequency plot is a composite of the cumulative frequency plots of the individual modes. The secret to analyzing a multimodal distribution is to use the cumulative frequency plot to recalibrate the data so that individual modes are separated on a second cumulative frequency plot. While the procedure seems terrifyingly complex, it is easily mastered and is applicable to an enormous range of biologically interesting data sets.
The most bothersome aspect of analyzing data with cumulative frequency plots is the use of probability paper. The y-axis on such paper is not linearly scaled and records a complex integration of the cumulative area under a normal distribution. Unfortunately, commercially available graphics software rarely has this capability, forcing the researcher to produce these plots manually. However, there is a simple alternative. By transforming cumulative frequencies into z-values (= normalized, standard values), plots can be easily graphed with any graphing package, since z-values have linear scaling.
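For readers working outside a spreadsheet, the z-transformation can be sketched in Python. This is a minimal illustration under stated assumptions: the class counts are hypothetical, the helper names are invented for this example, and `statistics.NormalDist().inv_cdf` from the standard library stands in for the inverse standard normal function.

```python
from statistics import NormalDist

def cumulative_frequencies(counts):
    """Relative frequency of each class plus that of all smaller classes."""
    total = sum(counts)
    running = 0
    result = []
    for count in counts:
        running += count
        result.append(running / total)
    return result

def z_values(cum_freqs):
    """Inverse standard normal of each cumulative frequency.
    Frequencies of exactly 0 or 1 have no finite z-value, so None is returned."""
    standard_normal = NormalDist()
    return [standard_normal.inv_cdf(p) if 0.0 < p < 1.0 else None
            for p in cum_freqs]

counts = [2, 6, 2]                     # hypothetical class counts
cum = cumulative_frequencies(counts)   # [0.2, 0.8, 1.0]
z = z_values(cum)
print(cum)
print(z)
```

The final class always has a cumulative frequency of 1, which is why it returns `None` here instead of a z-value.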
Performing a cumulative frequency plot analysis of a multimodal data set requires the following nine steps:
1. Change class frequencies into relative frequencies (i.e., divide the number in each class by the total number of individuals). Use these values to generate cumulative frequencies. REMEMBER: the cumulative frequency of a given class is the proportion of the total sample in that class PLUS the proportions of the total sample in all smaller classes.
2. Transform the cumulative frequency values to z-values. Z-values can be determined from Table 1, or by using the NORMSINV function in Excel. The format for this function is =NORMSINV(freq), where freq has a value between 0 and 1. Plot these z-values (dependent variable) against relative growth classes (independent variable) as a line plot. This graph is the primary cumulative frequency plot.
3. Determine inflection points. Inflection points on a cumulative frequency plot occur where the slope abruptly increases. This can be determined either by inspection, or mathematically (see below).
4. Recalibrate non-inflection points with the equation (the C' equation in zplot.xls):

$$P'_N = \frac{P_N - I_L}{I_U - I_L}$$

where:
$P'_N$ = recalibrated cumulative frequency for class N,
$P_N$ = cumulative frequency for class N,
$I_L$ = cumulative frequency at inflection point below class N, and
$I_U$ = cumulative frequency at inflection point above class N.

Classes located at either end of the cumulative frequency plot present a slight problem because they are not located between inflection points. For classes below the first inflection point, use 0 as the value of $I_L$. For classes above the last inflection point, use 1 as the value of $I_U$. The last data point cannot be recalibrated and is ignored, because its recalibrated cumulative frequency is 1, which has no finite z-value.
5. Transform the recalibrated cumulative frequency data to obtain the corresponding z-values.
6. Plot the z-values for the recalibrated data. This graph is the secondary cumulative frequency plot and represents each mode (i.e., the data between successive inflection points) individually. Use linear regression to find the straight line that best fits the points for each mode.
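7. The mean for each mode occurs where the regression line equals a z-value of zero. The regression line (also given in zplot.xls) has the form

$$y = a + bx$$

where y is the z-value, x is the relative growth class, a is the y-intercept and b is the slope. Rearranged to solve for x, this becomes

$$x = \frac{y - a}{b}$$

so the mean is the value of x at y = 0, which is $-a/b$.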
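8. The standard deviation for each mode ($s_M$) is calculated from the regression values where z equals 1 and -1 (see previous equation), which are then used in the equation:

$$s_M = \frac{x_{z=1} - x_{z=-1}}{2}$$

where $x_{z=1}$ and $x_{z=-1}$ are the relative growth values at which the regression line reaches z-values of 1 and -1.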
9. Calculate the percentage of the sample in each mode:

$$P_M = I_U - I_L$$

where $P_M$ = proportion of individuals in mode M (multiply by 100 to express it as a percentage), and $I_L$ and $I_U$ are the cumulative frequencies at the inflection points below and above the mode. For the mode below the first inflection point, the percentage of individuals is simply equal to the value of the first inflection point.

When transformed in this way, the frequency data in Figure 1 form the primary and secondary cumulative frequency plots shown in Figure 2.
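Steps 4 through 9 can be sketched for a single mode in Python. This is a hedged illustration, not the zplot spreadsheet itself: it assumes the recalibration takes the form (P - IL) / (IU - IL), that the regression line is z = a + b*x, and that the standard deviation is half the distance between the x-values at z = 1 and z = -1. The function names and the test data (an artificial mode with mean 0.5 and standard deviation 0.1, lying between inflection points at 0.2 and 0.7) are hypothetical.

```python
from statistics import NormalDist

def recalibrate(p, i_lower, i_upper):
    """Rescale a cumulative frequency so the mode runs from 0 to 1
    between its two inflection points (assumed form of the C' equation)."""
    return (p - i_lower) / (i_upper - i_lower)

def fit_line(xs, ys):
    """Least-squares intercept a and slope b for y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    a = mean_y - b * mean_x
    return a, b

def describe_mode(xs, cum_freqs, i_lower, i_upper):
    """Mean, standard deviation and proportion of the sample for one mode."""
    standard_normal = NormalDist()
    points = []
    for x, p in zip(xs, cum_freqs):
        r = recalibrate(p, i_lower, i_upper)
        if 0.0 < r < 1.0:  # 0 and 1 have no finite z-value and are skipped
            points.append((x, standard_normal.inv_cdf(r)))
    a, b = fit_line([x for x, _ in points], [z for _, z in points])
    x_at_plus_one = (1 - a) / b    # regression line solved for x at z = 1
    x_at_minus_one = (-1 - a) / b  # and at z = -1
    return {
        "mean": -a / b,            # x where z = 0
        "sd": (x_at_plus_one - x_at_minus_one) / 2,
        "proportion": i_upper - i_lower,
    }

# Artificial mode: a normal distribution holding half of the sample.
true_mode = NormalDist(mu=0.5, sigma=0.1)
xs = [0.35, 0.45, 0.55, 0.65]
cum_freqs = [0.2 + 0.5 * true_mode.cdf(x) for x in xs]
mode = describe_mode(xs, cum_freqs, 0.2, 0.7)
print(mode)
```

Because the artificial data follow a normal curve exactly, the fitted line recovers the mean and standard deviation that were used to build them. Real data will scatter around the line.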
Since this procedure is rather complicated, an example is available on the lab computers in the spreadsheet zplot to help clarify the process. The data in this file were used to generate both Figures 1 and 2.
The spreadsheet zplot contains four worksheets, labeled raw data, freq data, freq plot and z plot. For the example used here, only the freq data, freq plot and z plot worksheets contain data. Review the data in the zplot spreadsheet. Make certain you understand how each of the calculations built into this spreadsheet operates. One slightly unusual aspect of the freq plot worksheet is found in columns E and F. Column E calculates the change in cumulative frequency between consecutive classes. Inflection points occur where the changes are the smallest. Column F ranks the changes in cumulative frequency from smallest to largest. This helps identify inflection points, since they will have the lowest rank values.
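The logic of columns E and F can be sketched in Python. This is an approximation of what the spreadsheet does, under assumptions: each change is taken relative to the previous class, ties are broken by position, and the cumulative frequencies are hypothetical.

```python
def rank_changes(cum_freqs):
    """Return (changes, ranks). changes[i] is the increase in cumulative
    frequency from class i to class i + 1; rank 1 marks the smallest change."""
    changes = [cum_freqs[i] - cum_freqs[i - 1]
               for i in range(1, len(cum_freqs))]
    order = sorted(range(len(changes)), key=lambda i: changes[i])
    ranks = [0] * len(changes)
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    return changes, ranks

# Hypothetical cumulative frequencies for six classes.
cum = [0.05, 0.30, 0.32, 0.60, 0.95, 1.00]
changes, ranks = rank_changes(cum)
print(ranks)  # [3, 1, 4, 5, 2]
```

In this hypothetical sample the smallest change lies between the second and third classes, which makes it the likeliest inflection point. The next smallest change is at the upper tail of the distribution, which shows why the ranks should be checked against the plot before an inflection point is accepted.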
Also, note that the linear regression lines are not inserted using the trendline algorithm in Excel. One disadvantage of the trendline function is that it extrapolates well beyond the data set, causing the axes of the graph to be much larger than needed. One way to avoid this problem is to separately calculate the y-intercept (= a) and slope (= b). This can be done using the functions =INTERCEPT(cellY1:cellYn,cellX1:cellXn) and =SLOPE(cellY1:cellYn,cellX1:cellXn).
The freq data worksheet calculates these values at the bottom of the data fields in columns I through L. These values are then used to plot regression line values in columns M through P.
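The same idea, fitting the line separately and evaluating it only across the range of the data, can be sketched in Python. This is an illustration under assumptions: the function is a plain least-squares fit whose argument order mirrors Excel's INTERCEPT and SLOPE (y-values first), and the x- and z-values are hypothetical.

```python
def intercept_and_slope(ys, xs):
    """Least-squares y-intercept (a) and slope (b) for y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    a = mean_y - b * mean_x
    return a, b

# Hypothetical relative growth classes and z-values for one mode.
xs = [0.1, 0.2, 0.3, 0.4]
zs = [-1.5, -0.5, 0.5, 1.5]
a, b = intercept_and_slope(zs, xs)

# Evaluate the line only at the ends of the mode's own data range,
# so the plotted line does not extend past the points it describes.
line = [(x, a + b * x) for x in (min(xs), max(xs))]
print(a, b)
print(line)
```

Plotting only the two end points in `line` keeps the axes scaled to the data, which is the reason the text gives for not using the trendline function.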
NOTE: The zplot spreadsheet has been loaded into the four computers in the lab room.
THE ASSIGNMENT: Analysis of Seasonal Mortality in Anadara staminea
- Once you understand the internal structure and operation of the zplot spreadsheet, clear the data from the cells in the blue fields, but leave the calculations in the other fields intact. You can simply modify this spreadsheet to perform the analyses in this exercise.
- Reexamine the A. staminea specimens studied in Part 1 of this exercise. Using the optical scale on the calipers available in the laboratory, measure: (1) recent growth beyond the last growth break (= new growth), and (2) growth during the previous complete year (= old growth). Some specimens may be too badly worn to retain obvious growth breaks, and should be ignored.
- Place values in the appropriate cells in the zplot spreadsheet.
- Although zplot performs many functions for you, there is no way to anticipate how many modes will be present in your data, and where their inflection points will be. For that reason, it will be necessary for you to enter some equations and functions manually. These include:
- C' equations in column G
- The "=NORMSINV($G_)" function in the Modes columns
- The "=INTERCEPT(cellY1:cellYn,cellX1:cellXn)" and "=SLOPE(cellY1:cellYn,cellX1:cellXn)" functions in a= and b= cells.
- Values of y for the regression equation in the regression fields
- Equations for calculation of percentage of sample for each mode.
- Using this dedicated spreadsheet, you should calculate all of the values you need to obtain the primary and secondary cumulative frequency plots and determine the mean, standard deviation and percentage of sample for each mode.
- Print out copies of your final graphs
- Relative growth frequency histogram
- Primary cumulative relative growth frequency graph
- Secondary (Z - transformed) cumulative relative growth frequency graph
- Indicate mean, standard deviation and percentage of sample for each mode.
- Based on your evaluation of this population of A. staminea reconstruct a plausible life history scenario that explains the observed distribution of seasonal mortality in your data. Feel free to consult library and/or on-line sources of information on the life histories of bivalves to assist you.
- Come to lab next week prepared to present (as a group) your results and interpretation.
Cumulative frequency plot analysis is a very robust technique with wide applicability in paleontology, archeology and ecology. Since cumulative frequency plots can be applied to virtually any multimodal data set, they can be used to investigate a number of interesting questions. Probably the most important use in paleobiology, other than looking at seasonal mortality patterns, is for elucidating age classes of fossil species. While many organisms produce annual growth breaks (e.g., bivalves, such as A. staminea, examined here), many other animals do not. Major fossil groups, such as trilobites, molted the skeleton and completely lacked annual growth breaks. A cumulative frequency plot analysis of size-frequency data can be used to identify and evaluate individual age classes.