Standardized assessments are widely used to determine access to educational resources, with important consequences for later economic outcomes, and to evaluate and compare educational systems. However, many design features of the tests themselves may trigger psychological reactions that influence performance. In particular, the difficulty of the earlier questions in a test may affect performance on later questions. How should test questions be ordered by difficulty so that test performance offers an accurate assessment of the test taker's aptitudes and knowledge? We conduct a field experiment with about 19,000 participants in collaboration with an online teaching platform, randomly assigning participants to different orders of difficulty. We find that ordering the questions from easiest to most difficult yields the lowest probability of abandoning the test, as well as the highest number of correct answers. We obtain similar results when exploiting the random variation in difficulty across test booklets in the Programme for International Student Assessment (PISA) for 2009, 2012, and 2015, which provides additional external validity to the experiment. We conclude that the order of question difficulty has important policy implications for optimal test design and performance. It may also have important implications for ranking candidates and for evaluating and comparing educational institutions and systems.
Multiple-choice tests are extensively used to measure individuals’ knowledge and aptitudes. We study gender differences in willingness to guess using approximately 10,000 multiple-choice math tests in which, for all participants, omitted answers were rewarded in half of the questions, while in the other half they scored the same as wrong answers. Using a within-participant regression analysis, we show that female participants leave significantly more questions unanswered than males when omitted questions are rewarded. This gender difference, which is stronger among high-ability and older participants, hurts female performance as measured by the final score and position in the ranking. We conclude that gender-neutral scoring rules, which do not differentiate between wrong answers and omitted questions, are important for accurately measuring individuals’ knowledge and aptitudes.
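Why a reward for omitted answers creates room for differences in willingness to guess can be illustrated with a simple expected-score comparison. This is a minimal sketch, not the scoring rule of the tests studied here: the point values, the number of answer options, and the function names are hypothetical, and it assumes a risk-neutral test taker who guesses uniformly among the options still considered plausible.

```python
from fractions import Fraction


def expected_guess_score(n_options: int, correct, wrong) -> Fraction:
    """Expected score of a uniform random guess among n_options candidate answers.

    Hypothetical helper: assumes exactly one option is correct and all are equally likely.
    """
    p_correct = Fraction(1, n_options)
    return p_correct * correct + (1 - p_correct) * wrong


def guessing_pays(n_options: int, correct, wrong, omitted) -> bool:
    """True if a risk-neutral test taker strictly prefers guessing to omitting."""
    return expected_guess_score(n_options, correct, wrong) > omitted


# Omission scored like a wrong answer (0 points): guessing can never lose points.
print(guessing_pays(5, correct=5, wrong=0, omitted=0))  # True
# Omission rewarded with 1 point: a blind guess among 5 options only breaks even...
print(guessing_pays(5, correct=5, wrong=0, omitted=1))  # False
# ...but ruling out one option makes guessing strictly better.
print(guessing_pays(4, correct=5, wrong=0, omitted=1))  # True
```

When omissions score the same as wrong answers, answering is always at least as good as leaving a question blank, so the decision to omit disappears; when omissions are rewarded, the choice depends on how confident a participant feels, which is where differences in willingness to guess can affect the final score.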
In two-stage elimination math contests, participants from four different age groups compete to pass from stage 1 to stage 2 and later to be among the winners. Although female participants have higher math grades at school, the gender gap reverses in both stages of the contests. More importantly, following the same individual participants across stages, we find that the gender gap in performance increases from stage 1 to stage 2 of the competition. We attribute the increase in female underperformance to higher competitive pressure, and we rule out alternative explanations based on selection, discrimination, and differences in reaction to increasing difficulty.
We show that the existence of gender differences in performance is highly sensitive to the task used to measure performance, to existing stereotypes, and to informational conditions. Across sixteen purposely designed treatments, we find that women underperform under competition only when two conditions are met: 1) the task used is perceived as favoring men, and 2) the presence of a rival is strongly primed through information provided before competing. Such sensitivity sheds light on the contradictory evidence on stereotype threat as a cause of gender differences in performance under competition.
We report evidence from a large field experiment that compares the effectiveness of contingent and noncontingent incentives in eliciting costly effort across a wide range of payment levels. The company we worked with sent 7,250 letters asking customers to complete a survey. Some letters promised to pay amounts ranging from $1 to $30 upon compliance (contingent incentives), whereas others already contained the money in the request envelope (noncontingent incentives). Compared to no payment, very small contingent payments lower the response rate, while small noncontingent payments raise it. As expected, response rates rise with the size of the incentive offered. The response rate under noncontingent incentives rises more rapidly for low amounts, but then flattens out and reaches lower levels than under contingent payments. We discuss how the optimal size and type of incentive crucially depend on firms’ objectives.
Using data from modified dictator games and a mixture-of-types estimation technique, we find a clear relationship between a classification of subjects into four types of interdependent preferences (selfish, social welfare maximizers, inequity averse, and competitive) and the beliefs subjects hold about others’ distributive choices in a nonstrategic environment. In particular, selfish individuals are more prone to false-consensus bias than other types, as they can hardly conceive that other individuals would incur costs to change the distribution of payoffs. We also find that selfish individuals are the most stable preference type under repeated play, both when they learn about others’ previous choices (social information) and when they do not, while other preference types are less stable.
Affirmative action policies bias tournament rules in order to provide equal opportunities to a group of competitors who have a disadvantage they cannot be held responsible for. Their implementation alters the underlying incentive structure, which might lower participants’ performance and also produce a less efficient pool of tournament winners. In this paper, we study the empirical validity of such concerns in a case where the disadvantage affects the capacity to compete. We conducted real-effort tournaments between pairs of children from two similar schools that systematically differed in how much prior training they had received on the task at hand. Contrary to these concerns, our results show that affirmative action did not cause a significant performance loss for either advantaged or disadvantaged subjects; instead, it enhanced performance for a large group of participants. Moreover, affirmative action resulted in a more equitable pool of tournament winners, half of whom came from the originally disadvantaged group. Hence, the negative selection effects of the biased tournament rules were (at least partially) offset by performance-enhancing incentive effects.
We present evidence from an experiment in which groups select a leader to compete against the leaders of other groups in a real-effort task that they have all performed in the past. We find that women are selected as leaders much less often than their individual past performance would suggest. We study three potential explanations for the underrepresentation of women, namely, gender differences in overconfidence about past performance, in the willingness to exaggerate past performance to the group, and in the reaction to monetary incentives. We find that men’s overconfidence is the driving force behind the observed prevalence of male leaders.
We compare behavior in modified dictator games with and without role uncertainty. Subjects choose between a selfish action, a costly surplus-creating action (altruistic behavior), and a costly surplus-destroying action (spiteful behavior). While costly surplus-creating actions are the most frequent under role uncertainty (64%), selfish actions become the most frequent without role uncertainty (69%). Also, the frequency of surplus-destroying choices is negligible with role uncertainty (1%) but not without it (11%). A classification of subjects into four types of interdependent preferences (Selfish, Social Welfare maximizing, Inequity Averse, and Competitive) shows that the use of role uncertainty overestimates the prevalence of Social Welfare maximizing preferences in the subject population (74% with role uncertainty versus 21% without it) and underestimates Selfish and Inequity Averse preferences. An additional treatment, in which subjects take an understanding test before participating in the experiment with role uncertainty, shows that the vast majority of subjects (93%) correctly understand the payoff mechanism under role uncertainty, yet surplus-creating actions are still the most frequent. Our results warn against the use of role uncertainty in experiments that aim to measure the prevalence of interdependent preferences.
This paper uses subjects’ diverse self-reported justifications to explain discrepancies between observed heterogeneous behavior and the unique equilibrium prediction in a one-shot traveler’s dilemma experiment. Principal components analysis suggests that iterative reasoning, aspiration levels, competitive behavior, attitudes towards risk and penalties, and focal points may lie behind different choices. These reasons are consistent with the same subjects’ behavior in other tests and experiments in which these particular issues are prominent, which allows us to identify “types” of subjects. Overall, we conclude that subjects’ self-justifications in complex strategic situations contain informational value that may be used to predict behavior in other situations of economic importance.
We ask whether the absence of information about other voters’ preferences allows optimal voting to be interpreted as sincere. We start by classifying voting mechanisms as simple or complex according to the number of message types voters can use to elect alternatives. We show that while in simple voting mechanisms the elimination of information about other voters’ preferences allows optimal voting to be interpreted as sincere, this is no longer always true for complex ones. In complex voting mechanisms, voters’ optimal strategy may vary with the size of the electorate. Therefore, in order to interpret optimal voting as sincere in complex voting mechanisms, we describe the optimal voting strategy when voters not only have no information but also have no pivotal power, i.e., as the size of the electorate tends to infinity.
We report experimental results on a series of ten one-shot two-person 3 × 3 normal form games with a unique pure-strategy equilibrium, played by non-economists. In contrast to previous experiments in which game-theoretic predictions fail dramatically, a majority of actions taken coincided with the equilibrium prediction (70.2%) and were best responses to subjects’ stated beliefs (67.2%). In constant-sum games, 78% of actions taken were predicted by the equilibrium model, outperforming simple K-level reasoning models. We discuss how non-trivial game characteristics related to risk aversion, efficiency concerns, and social preferences may affect the predictive value of different models in simple normal form games.
We study optimal contracts in a simple model where employees are averse to inequity, as modeled by Fehr and Schmidt (1999). A “selfish” employer can profitably exploit envy or guilt by offering contracts that create inequity off-equilibrium, i.e., when employees do not meet his demands. Such contracts resemble team and relative performance contracts. We derive conditions under which inequity aversion is in itself a reason to form work teams of distributionally concerned employees, even in situations in which effort is contractible.
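For readers unfamiliar with Fehr and Schmidt (1999), the two-player version of their inequity-aversion utility subtracts an envy term (weight `alpha`, for earning less than the other player) and a guilt term (weight `beta`, for earning more) from one's own payoff. The function below is a minimal sketch of that standard formula only; it does not reproduce the contracts derived in the paper, and the function name and payoff numbers are hypothetical.

```python
def fehr_schmidt_utility(own: float, other: float, alpha: float, beta: float) -> float:
    """Two-player Fehr-Schmidt (1999) utility: own payoff minus envy and guilt terms.

    Fehr and Schmidt assume beta <= alpha and 0 <= beta < 1.
    """
    envy = alpha * max(other - own, 0.0)   # disutility from earning less than the other player
    guilt = beta * max(own - other, 0.0)   # disutility from earning more than the other player
    return own - envy - guilt


# Hypothetical payoffs: equal pay, being paid less, being paid more.
alpha, beta = 1.0, 0.5
print(fehr_schmidt_utility(10, 10, alpha, beta))  # 10.0: no inequity cost
print(fehr_schmidt_utility(6, 10, alpha, beta))   # 2.0: envy when paid less
print(fehr_schmidt_utility(10, 6, alpha, beta))   # 8.0: guilt when paid more
```

Because both inequity terms lower utility, a contract that produces unequal pay only when an employee falls short of the employer's demands makes falling short more costly, which is roughly the channel the abstract describes.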
In this paper, we study the mechanics of “leading by example” in teams. Leadership is beneficial for the entire team when agents are conformists, i.e., when they dislike effort differentials. We also show how leadership can arise endogenously and discuss what type of leader benefits a team most.
At the beginning of the 1990s, there appeared to be considerable agreement about the kind of economic policies that countries turning to the IMF and the World Bank should pursue. These included macroeconomic stabilisation, microeconomic liberalisation and openness, and were summarised by the concept of a ‘Washington Consensus’. How has the Consensus stood up to the passage of time? This article briefly assesses the track record of Consensus-type policies and shows how the Consensus has evolved. With regard to some of its components, a greater sense of agnosticism may now prevail. Moreover, issues that were little or no part of the Consensus have come to the fore. The implications of these changes for institutional design are also investigated.
Identifying individual levels of rationality is crucial to modeling strategic interaction and understanding behavior in games. Nevertheless, there is no consensus on how best to identify levels of higher-order rationality, and identifying an empirical distribution remains highly elusive. In particular, the games used for the task can have a huge impact on the identified distribution. To tackle this fundamental problem, this paper introduces an axiomatic approach that singles out a simple class of games that minimizes the probability of misidentification errors. It then shows that the axioms are empirically meaningful in a within-subject experiment that compares the distribution of orders of rationality across different games, including standard games from the literature. The games singled out by the axioms exhibit the highest correlation both with the distribution of the most frequent rationality level a subject has been classified with and with an independent measure of cognitive ability. Finally, our sample shows no evidence of within-subject consistency of identified rationality levels across games.
Competitive selection processes may create inefficiencies in the labor market when differences in performance during the selection process are unrelated to differences in performance on the job for which candidates are selected. Using data on the universe of candidates in the highly competitive, high-stakes national entry exam into the medical profession in Spain over the past four decades, we first report the evolution of gender differences in exam performance, which translate into substantial gender gaps in the probability of obtaining a position (ranging from −7% to +9% depending on the period), controlling for individual heterogeneity in ability. We then exploit the large variation in the ratio of available positions to candidates to show that the observed evolution of gender gaps is consistent with the evolution of the competitiveness of the selection process: the more competitive the process, the greater the underperformance of women relative to men, while when competitiveness is low, women outperform men. Since competitiveness is not a job requirement in several professions, planning the number of candidates together with the number of available positions according to the system's needs, rather than other criteria, would yield efficiency gains.
Understanding what affects consumption satisfaction is fundamental to understanding consumer behavior. Measuring satisfaction, however, is not trivial, especially for experience goods, whose perceived quality is often subjective and unobservable prior to consumption. We report the results of a field study (N = 433) conducted in collaboration with a theater that uses pay-what-you-want (PWYW) pricing, inviting audience members to pay at the end of the show. Our analyses indicate that neither expected nor realized enjoyment predicts payments once we control for the gap between expectation and realization. The paper highlights the managerial implications of these findings.
We study the specific role of social distance in determining deceptive behavior, independent of other confounding factors such as information asymmetries. In this experiment, confederates of three different ethnicities, under the same informational conditions, take bicycle taxi rides in Malawi, where drivers have two opportunities to violate social norms of honesty: overcharging, or stealing outright by not returning with the change. We find that social distance per se affects fraudulent behavior, increasing the price charged by 11%. Using the outright stealing measure, we find that deceptive behavior is independent of stake size. Finally, we find evidence that moral priming affects cheating: routes to destinations with moral connotations, such as churches, hospitals, and schools, display fraud 18% less frequently than routes without moral connotations, and on these routes discrimination in overcharging across ethnicities disappears. Our results suggest that in aid programs in developing countries, fraud may be reduced if the identity of the agents interacting with locals is carefully chosen and the purposes of the aid are given proper moral connotations.
Paper available soon
We perform a further experiment to check the robustness of the main result in Rey Biel (2005) to sequential play. We find that equilibrium predictions work even better when the same games are played sequentially: 85% of first movers choose the equilibrium strategy and 85% of second movers best respond to the action taken by first movers. We conclude by identifying constant-sum games as a class of games in which experimental subjects' choices coincide with theoretical predictions, and we argue that in such games distributional and reciprocal preferences do not influence subjects' decisions.
This paper seeks an economic explanation for the fact that vaccines for deadly diseases such as AIDS, malaria and tuberculosis have not yet been discovered. We argue that private laboratories lack proper incentives to invest in research for three reasons: 1) the vast majority of demand comes from poor countries that cannot afford a price high enough to compensate research costs, 2) international pressure to reduce vaccine prices once discovered will be strong and successful, and 3) laboratories act as monopolists in the competing market for treatments for already infected patients, and the appearance of a vaccine will, in the short run, lower the price of treatments and, in the longer run, make the treatment market disappear. We first present a static model to show why laboratories may under-invest in vaccines. We then extend the model to a dynamic context to endogenize the infection rate. Finally, we discuss some mechanisms to provide incentives for private research on vaccines.
We study how giving depends on income and luck, and how culture and information about the determinants of others’ income affect this relationship. Our data come from an experiment conducted in two countries, the US and Spain, which hold different beliefs about how income inequality arises. We find that when individuals are informed about the determinants of income, there are no cross-cultural differences in giving. When uninformed, however, Americans give less than the Spanish. This difference persists even after controlling for beliefs, personal characteristics, and values.
Economic theories and experiments could and should inform each other. An economic theory is more useful if it is not only an intellectual exercise but also relates to empirically relevant behavior. Likewise, experiments are more useful when they are based on a set of alternative, well-defined hypotheses. We argue that these theories need not take the form of a mathematical model. For example, experiments can help in understanding and testing mechanisms that are used in the real world but for which deriving equilibrium behavior analytically is too complicated.