3. Literature on comparisons of CA-based and AHP-based preference measurement

Helm et al. (2004b) give an overview of existing empirical comparisons between AHP and variants of TCA. The authors conclude that both the complexity of the evaluation task and the knowledge of the respondent/decision maker regarding preference measurement influence the quality of the results. In line with other papers in this field, we categorize a product evaluation problem as complex if more than six attributes are included in a study (Rao, 2007; Haaijer and Wedel, 2007). In such cases, more sophisticated CA approaches, e.g., Adaptive Conjoint Analysis (ACA), are used in marketing practice. Table 2 provides an update of the overview given by Helm et al. (2004b) and shows that AHP produces better results than CA if the complexity of the evaluation task is high. The study by Meißner, Scholz, and Decker (2008) in particular indicates that the AHP approach can also be used in online research settings with real consumers who are not familiar with the respective preference measurement approach.

Proceedings of the International Symposium on the Analytic Hierarchy Process 2009

Table 2. Empirical studies on the comparison of CA-based and AHP-based preference measurement

–  –  –

In the present study the description of the product is restricted to six attributes only. Thus the complexity of the evaluation problem can be considered as low. Both CBC and AHP are used in a real consumer research setting, where the respondents are not familiar with the respective approaches.

From the existing empirical comparisons reviewed by Helm et al. (2004b), one might expect CBC, as a decompositional approach, to outperform AHP with respect to predictive validity. However, the above-mentioned modifications of AHP might improve the performance of AHP.

4. Design of the empirical study

To meet the requirements of CBC, the most important attributes of a single-cup coffee brewer were determined in a pretest. The number of attributes was restricted to six because otherwise information overload could impair the predictive validity of the decompositional approach (Green and Srinivasan, 1990). The dual questioning technique (Myers and Alpert, 1968) was used to determine the most relevant attributes and levels. Figure 3 shows the respective hierarchy used to describe the single-cup coffee brewers. In order to avoid learning effects and keep the survey length manageable, we used a between-subject design, i.e., each respondent completed either the AHP or the CBC questionnaire. The product description design was the same for both surveys.

Figure 3. Hierarchy of attributes and attribute levels for single-cup coffee brewers

The online questionnaire form was divided into three parts. In the first part, the respondents were interviewed about their consumption of coffee.

Respondents who had not drunk coffee within the last 12 months were excluded from the survey because their purchase intention regarding a single-cup coffee brewer and their product experience could be assumed to be quite low. Further questions concerning the intensity of consumption (cups per week) and prior product experience were added. In the second part of the survey, the respondents were first informed about the attributes and attribute levels describing the different single-cup coffee brewers by means of brief textual and pictorial descriptions. Then, the respondents were randomly assigned to either the AHP or the CBC questionnaire.

M. Meißner, R. Decker/ Measuring Consumer Preferences with CBC and AHP

The CBC survey was conducted with the popular CBC software of Sawtooth Software, Inc., the leading provider of CA tools. The survey was set up according to the recommendations given in the Sawtooth Software manual. The respondents had to answer 13 choice tasks, each including three alternatives and a no-choice option. Hierarchical Bayes estimation (a common standard when applying CBC in preference measurement) was used to estimate part-worth utilities on the individual level. The AHP survey was set up as outlined in Section 2. Since the product evaluation problem included six attributes, we only conducted 12 out of the 15 (=6∙(6-1)/2) pairwise comparisons and used Harker’s (1987) approach to calculate the importance weights. All in all, the AHP questionnaire included 37 (=12+6+6+3+6+1+3) pairwise comparisons.
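The treatment of the three omitted comparisons can be illustrated with a minimal sketch. It assumes the usual description of Harker's approach: each missing entry of the comparison matrix is set to zero, each diagonal entry becomes one plus the number of missing comparisons in its row, and the weights are taken from the principal eigenvector. The function name, the example weights and the choice of skipped pairs are hypothetical and are not taken from the study.

```python
from typing import List, Optional


def harker_weights(matrix: List[List[Optional[float]]],
                   iterations: int = 200) -> List[float]:
    """Priority weights from an incomplete reciprocal comparison matrix.

    Missing judgments are passed as None (assumed convention for this sketch).
    """
    n = len(matrix)
    # Auxiliary matrix: missing entries become 0, and each diagonal entry
    # becomes 1 plus the number of missing comparisons in that row.
    b = [[0.0] * n for _ in range(n)]
    for i in range(n):
        missing = 0
        for j in range(n):
            if i == j:
                continue
            if matrix[i][j] is None:
                missing += 1
            else:
                b[i][j] = float(matrix[i][j])
        b[i][i] = 1.0 + missing
    # Power iteration approximates the principal right eigenvector.
    w = [1.0 / n] * n
    for _ in range(iterations):
        w_new = [sum(b[i][j] * w[j] for j in range(n)) for i in range(n)]
        total = sum(w_new)
        w = [x / total for x in w_new]
    return w


# Hypothetical, perfectly consistent judgments for six attributes.
true_w = [0.30, 0.25, 0.20, 0.10, 0.10, 0.05]
n = len(true_w)
n_pairs = n * (n - 1) // 2  # 15 possible pairwise comparisons
full = [[true_w[i] / true_w[j] for j in range(n)] for i in range(n)]

# Skip 3 of the 15 comparisons (both a_ij and a_ji), leaving 12 judgments.
skipped = [(0, 5), (1, 4), (2, 3)]
incomplete: List[List[Optional[float]]] = [list(row) for row in full]
for i, j in skipped:
    incomplete[i][j] = None
    incomplete[j][i] = None

weights = harker_weights(incomplete)
```

With consistent judgments the sketch recovers the underlying weights even though three comparisons are missing. Real respondents are rarely fully consistent, so estimated weights would then only approximate the weights obtained from a complete matrix.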

As is common practice in marketing research, two holdout choice tasks were included in the last part of the survey to measure the predictive accuracy of the two approaches. Each choice task consisted of three alternatives which were described on all six attributes included in the study. Standard instructions for selecting the best alternative were used (Sawtooth Software, 2008). In order to obtain a measure of test-retest reliability, the second choice task was presented twice, one presentation being placed at the beginning of the preference measurement part of the interview.

Furthermore, three rating scales were added to measure the perceived realism, the difficulty and the enjoyment of the survey. The respective questions read as follows:

–  –  –

At the end of the survey some additional questions on socio-demographics (age, gender, profession, size of the household) were asked. The respondents were invited to participate in the survey via a large public email directory. In total 61 and 58 respondents completed the AHP and CBC questionnaires respectively. Chi-square homogeneity tests indicate that the two samples stem from the same population. Therefore, the comparison of AHP and CBC should lead to meaningful results with respect to convergent and predictive validity.

5. Results

I. Face Validity

Product attributes can be nominal, ordinal or even quantitative (Orme, 2002). In the first case, it is unknown a-priori whether a respondent prefers one level over another. In the other two cases, most respondents would usually prefer the given levels in a certain order. For example, it can be assumed that a low price will be preferred to a higher one, but it can hardly be anticipated which of several designs will be favored. In this study the attributes “price”, “material” and “price of a cup of coffee” can be assumed to be ordinal. The respective part-worth utilities should therefore follow the a-priori ranking of the attribute levels. The frequency of fulfilling this assumption is used as a measure of face validity.
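The face validity measure described above can be sketched as follows. The sketch assumes that the part-worth utilities of each respondent are listed from the most preferred to the least preferred a-priori level and that ties do not count as violations; both conventions, as well as the function names and the example values, are hypothetical.

```python
from typing import List, Sequence


def follows_prior_order(utilities: Sequence[float]) -> bool:
    """True if utilities never increase from the best to the worst a-priori level."""
    return all(a >= b for a, b in zip(utilities, utilities[1:]))


def face_validity_share(respondents: List[Sequence[float]]) -> float:
    """Percentage of respondents whose part-worths follow the a-priori ranking."""
    hits = sum(follows_prior_order(u) for u in respondents)
    return 100.0 * hits / len(respondents)


# Hypothetical part-worths for "price", ordered from lowest to highest price.
price_utilities = [
    [0.40, 0.10, -0.50],
    [0.20, 0.25, -0.45],  # violates the a-priori order (middle price preferred)
    [0.60, 0.00, -0.60],
    [0.10, 0.05, -0.15],
]
share = face_validity_share(price_utilities)
```

In this illustrative data set three of the four respondents follow the assumed order, so the share is 75%.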

Table 3 shows that for the majority of respondents the a-priori assumption is correct in both studies.

However, the face validity is significantly higher for AHP compared to CBC in the case of the attribute “price”. A possible explanation for this difference is that the holistic product presentation in CBC significantly affects price expectations. In some cases respondents might prefer a higher price to a lower price if they infer the quality of the product from the price.

Table 3. Frequency of fulfilling the a-priori assumptions (in %)

–  –  –

II. Convergent Validity

High convergent validity can be presumed if two different measurement methods come to similar results (Helm et al., 2004a). In the following, Spearman’s coefficient is used to quantify the extent of rank correlation between part-worth utilities on the aggregate level. The part-worth utilities of AHP are calculated by multiplying the local priority weights of the attribute levels with the corresponding importance measures. In order to make the AHP part-worth utilities comparable to the ones of CBC, they were transformed by normalizing the local priority weights within each sub-problem such that they sum up to zero (Mulye, 1998).
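The two computational steps named above, the zero-sum normalization and the rank correlation, can be sketched as follows. The sketch assumes that the normalization amounts to subtracting the mean local priority weight within each sub-problem, which may differ in detail from the procedure in Mulye (1998). Spearman's coefficient is computed as the Pearson correlation of average ranks. All function names and numbers are hypothetical.

```python
from math import sqrt
from typing import List, Sequence


def zero_centered(weights: Sequence[float]) -> List[float]:
    """Shift local priority weights so that they sum to zero (assumed normalization)."""
    mean = sum(weights) / len(weights)
    return [w - mean for w in weights]


def ahp_part_worths(local_weights: Sequence[float], importance: float) -> List[float]:
    """Centered local weights multiplied by the attribute importance."""
    return [importance * w for w in zero_centered(local_weights)]


def average_ranks(values: Sequence[float]) -> List[float]:
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared_rank
        pos = end + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rank correlation as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)


# Hypothetical attribute with three levels and an importance of 25%.
levels = ahp_part_worths([0.6, 0.3, 0.1], importance=0.25)

# Hypothetical part-worth vectors from two methods.
rho = spearman([1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 1.0, 4.0, 3.0, 5.0])
```

In the hypothetical example two neighboring pairs of levels swap ranks between the methods, which gives a rank correlation of 0.8.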

Figure 4 depicts the part-worth utilities as well as the attribute importances (in %). As can be seen, the part-worth utilities of both approaches show high structural similarity. Both approaches identify “price”, “price of a cup of coffee” and “material” as the most important attributes on the aggregate level. The rank correlation between the part-worth utilities of both approaches equals .93 and the corresponding correlation of the attribute importances equals .86. The ranking of the part-worth utilities is the same for four of the six attributes. Differences occur for the attributes “brand” and “design”.

The results are quite similar on the aggregate level. However, with respect to the differences regarding the attributes “brand” and “design”, it cannot be said which method provides the better results. We therefore investigate the predictive validity of the two approaches on the individual level.

Figure 4. Aggregate part-worth utilities and attribute importances for single-cup coffee brewers

III. Predictive Validity

In empirical studies the predictive validity is investigated by means of holdout choice tasks for two main reasons (Johnson, 1997): First, they provide an indication of validity, measured by the part-worth utilities’ ability to predict choice decisions. Second, they permit the identification and removal of inconsistent responses. In preference measurement it is advantageous to repeat at least one of the holdout concepts to assess the test-retest reliability. In a between-subject design it is critical to adjust hit rates by the test-retest reliability for each group in order to determine whether one method performs better than the other (Orme, Alpert, and Christensen, 1997). The repeated holdout task can be used to calculate a theoretical upper limit for holdout predictability. In our study one of the holdout tasks was shown twice. The corresponding test-retest reliabilities are p(AHP) = 77.04% and p(CBC) = 77.59%, respectively. According to Wittink and Johnson (1992) the maximum expected hit rate can be calculated as follows:

–  –  –

Two holdout choice tasks with three alternatives each were used to measure the predictive validity.

We compared the overall utilities of the alternatives in the holdout tasks with the actual choices of the respondents in the questionnaire. The first choice hit rate equals the frequency with which a method correctly predicts the single-cup coffee brewer chosen by the respondent. On the aggregate level we estimated the choice shares in the holdout tasks. The mean difference between the predicted choice shares of an alternative and the actual choice shares is captured by the average mean absolute error. As in many other comparative studies we used the first choice rule for market share predictions (Green and Srinivasan, 1990).
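The two validation measures described above can be illustrated with a short sketch. It assumes the first choice rule, i.e., each respondent is predicted to choose the alternative with the highest overall utility, and computes the mean absolute error over the alternatives of a single holdout task. The function names and the four respondents are hypothetical.

```python
from typing import List, Sequence


def first_choice(utilities: Sequence[float]) -> int:
    """Index of the alternative with the highest overall utility."""
    return max(range(len(utilities)), key=lambda k: utilities[k])


def hit_rate(predicted: Sequence[int], actual: Sequence[int]) -> float:
    """Percentage of respondents whose holdout choice was predicted correctly."""
    hits = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * hits / len(actual)


def choice_shares(choices: Sequence[int], n_alternatives: int) -> List[float]:
    """Percentage of respondents choosing each alternative."""
    return [100.0 * sum(c == k for c in choices) / len(choices)
            for k in range(n_alternatives)]


def mean_absolute_error(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Mean absolute deviation between predicted and actual choice shares."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical overall utilities of three holdout alternatives for four respondents.
overall_utilities = [
    [0.20, 0.50, 0.30],
    [0.70, 0.10, 0.20],
    [0.10, 0.30, 0.60],
    [0.40, 0.35, 0.25],
]
actual_choices = [1, 0, 1, 0]  # hypothetical observed holdout choices

predicted_choices = [first_choice(u) for u in overall_utilities]
rate = hit_rate(predicted_choices, actual_choices)
predicted_shares = choice_shares(predicted_choices, 3)
actual_shares = choice_shares(actual_choices, 3)
mae = mean_absolute_error(predicted_shares, actual_shares)
```

Averaging the resulting error over both holdout tasks would give an aggregate figure of the kind reported for the choice shares.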

Table 4. Individual holdout validations

–  –  –

Table 4 shows the first choice hit rates for the two holdout tasks and both preference measurement approaches. Compared to random prediction, which would lead to a first choice hit rate of 33%, both methods perform reasonably well. CBC achieves slightly higher hit rates than AHP, but the difference is not significant at the .05 level. Bearing in mind that the reliability of the given answers was not that high in both samples, the predictive accuracy of both approaches seems to be quite good:

68.03%/86.77% = 78.40% (AHP) and 69.83%/87.14% = 80.14% (CBC) of the reliable responses have been correctly predicted.
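The arithmetic behind these adjusted figures can be checked with a minimal sketch. It assumes that the maximum expected hit rate follows from the test-retest reliability p as (1 + sqrt(2p - 1))/2. This form is an assumption here, but it reproduces the ceilings of 86.77% and 87.14% used above from the reported reliabilities. The function names are hypothetical.

```python
from math import sqrt


def max_expected_hit_rate(reliability: float) -> float:
    """Assumed ceiling on the hit rate for a test-retest reliability in [0.5, 1]."""
    if not 0.5 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0.5 and 1.0")
    return (1.0 + sqrt(2.0 * reliability - 1.0)) / 2.0


def adjusted_hit_rate(hit_rate: float, reliability: float) -> float:
    """Share of the reliable responses that were predicted correctly."""
    return hit_rate / max_expected_hit_rate(reliability)


# Reliabilities and mean hit rates as reported in the text.
ceiling_ahp = max_expected_hit_rate(0.7704)    # about 0.8677
ceiling_cbc = max_expected_hit_rate(0.7759)    # about 0.8714
adjusted_ahp = adjusted_hit_rate(0.6803, 0.7704)  # about 0.784
adjusted_cbc = adjusted_hit_rate(0.6983, 0.7759)  # about 0.801
```

Dividing each observed hit rate by its group-specific ceiling makes the two samples comparable even though their test-retest reliabilities differ slightly.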

Table 5. Aggregate choice share validations

–  –  –

Table 5 shows that AHP significantly (.05 level) outperforms CBC with respect to the mean absolute error of choice shares in the two holdout tasks. On average the choice shares predicted by AHP deviate from the real choice shares by 1.64%. This error is more than three times higher for the CBC predictions.

To sum up, AHP seems to be at least on par with CBC with respect to predictive accuracy. This is a very surprising result if one considers that the majority of today’s market research institutions use CBC to perform market share predictions.

IV. Practical applicability

To compare the practical applicability of AHP and CBC, we used the respondents’ subjective evaluations (Helm et al., 2004b; Meißner et al., 2008) as described in Section 4. In marketing practice, respondents’ ability and willingness to participate in a study are a major concern, as the costs of a survey rise with interview length and with undesired cancellations of the survey. Moreover, the reliability of the results might decrease if respondents become fatigued during the interview.

In the CBC sample the respondents had to choose between three holistic alternatives. Thus the cognitive effort should be higher for CBC than for AHP. In the latter only two attributes or attribute levels have to be evaluated at a time. Therefore questions might be much easier to answer and it can be assumed that respondents evaluate CBC to be more difficult. Due to the holistic presentation of product profiles in CBC, this approach should be evaluated as being more realistic compared to AHP.

According to Table 6 all three measures considered are significantly higher/better for AHP.

Surprisingly, CBC is not rated as being more realistic. One explanation for this might be that the choice tasks are too complex, which results in some kind of information overload. This, in turn, might also impair the enjoyment of the survey. Interview length is another indicator of the practical applicability of an approach. In the present case 7.3 minutes were needed on average to complete the CBC survey and 8.0 minutes for the AHP questionnaire.

Table 6. Average values of measures for practical applicability

–  –  –

In sum, the results indicate that respondents answering the AHP questionnaire would be more motivated than those answering the CBC questionnaire, although the CBC interviews were on average shorter than the AHP interviews. This difference in interview length was not significant at the .05 level.

6. Conclusion

Today, conjoint analytic approaches are marketers’ favorite instrument in research and practice.

